Cluster computing

Monday, February 12, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

In particular, we were looking for a few lines above and below a match in order to include the associated event attributes. This is easy as a streaming operation in the shell with "grep -C<N> literal file". In SQL it becomes slightly more complicated and involves a recursive common table expression. A nested query might work too, provided the identifiers are contiguous.
For example:
SELECT a.*
FROM Table1 as a,
(SELECT id FROM Table1 WHERE message LIKE '%hello%') as b
WHERE a.id BETWEEN b.id - N AND b.id + N;
On the other hand, by using the nearest ids on either side of the match (the max id below it and the min id above it) as sentinels, we can advance the sentinels row by row in a recursive query so that we always include a fixed number of lines above and below the match, even when the identifiers have gaps.
For example:
-- Oracle syntax; :lastname and :lvl are bind variables
with sentinels (prevr, nextr, lvl) as (
  -- anchor: the nearest ids on either side of each matching row
  select nvl((select max(e.employee_id)
              from   hr.employees e
              where  e.employee_id < emp.employee_id),
             emp.employee_id) prevr,
         nvl((select min(e.employee_id)
              from   hr.employees e
              where  e.employee_id > emp.employee_id),
             emp.employee_id) nextr,
         1 lvl
  from   hr.employees emp
  where  emp.last_name = :lastname
  union all
  -- recursive step: move each sentinel one row further out
  select nvl((select max(e.employee_id)
              from   hr.employees e
              where  e.employee_id < prevr),
             prevr) prevr,
         nvl((select min(e.employee_id)
              from   hr.employees e
              where  e.employee_id > nextr),
             nextr) nextr,
         lvl + 1 lvl
  from   sentinels
  where  lvl + 1 <= :lvl
)
-- keep only the widest window, reached at the requested level
select e.employee_id, e.last_name
from   hr.employees e
join   sentinels b
on     e.employee_id between b.prevr and b.nextr
and    b.lvl = :lvl
order by e.employee_id;
adapted from Oracle blog by Chris Saxon
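
The same selection can be sketched outside SQL to show why the sentinel approach tolerates gaps in the identifiers: it counts rows by position rather than by id arithmetic. This is a minimal sketch, not the query above; the function name, the "id" and "message" fields, and the sample rows are all hypothetical.

```python
def rows_with_context(rows, is_match, n):
    """Return every row within n positions of a matching row, ordered by id."""
    ordered = sorted(rows, key=lambda r: r["id"])
    keep = set()
    for pos, row in enumerate(ordered):
        if is_match(row):
            lo = max(0, pos - n)                  # sentinel above the match
            hi = min(len(ordered) - 1, pos + n)   # sentinel below the match
            keep.update(range(lo, hi + 1))
    return [ordered[i] for i in sorted(keep)]


# Hypothetical log rows; note the gaps in the ids.
logs = [
    {"id": 10, "message": "session start"},
    {"id": 13, "message": "user-agent=mobile"},
    {"id": 20, "message": "hello from client"},
    {"id": 21, "message": "scope=profile"},
    {"id": 35, "message": "session end"},
]
context = rows_with_context(logs, lambda r: "hello" in r["message"], 1)
print([r["id"] for r in context])  # [13, 20, 21]
```

A BETWEEN id-1 AND id+1 filter on the same data would return only rows 20 and 21, because id 19 does not exist.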

The lines around a match carry additional attributes. These can be searched directly for information, or they can be treated as tags and counted towards the tally for each label.

In the logs, we can leverage protocols other than HTTP and OAuth. For example, if requests use SAML or other encrypted but shared parameters, we can use those parameters for correlation. Similarly, user agents generally give a lot of information about the origin of a request and can be used to filter requests selectively. In addition to the protocols, applications and devices that contribute request parameters, cookies may also store information that can be searched when it makes it into the logs. Most mobile devices also come with app stores from which packet capture applications for those devices can be downloaded and installed. Although simulators and live debugging reduce the need for packet capture applications, such captures remain a source of information.
Logs from mobile devices can also be shared, especially if they are kept small with a bounded number of entries.

Sunday, February 11, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

45) Looking for a few lines above and below a match in order to include the associated event attributes. This is easy as a streaming operation in the shell with "grep -C<N> literal file". In SQL it becomes slightly more complicated and involves a recursive common table expression. A nested query might work too, provided the identifiers are contiguous.
For example:
SELECT a.*
FROM Table1 as a,
(SELECT id FROM Table1 WHERE message LIKE '%hello%') as b
WHERE a.id BETWEEN b.id - N AND b.id + N;
On the other hand, by using the nearest ids on either side of the match (the max id below it and the min id above it) as sentinels, we can advance the sentinels row by row in a recursive query so that we always include a fixed number of lines above and below the match, even when the identifiers have gaps.

46) Grouping selections and counting now works with the above logic. For example, if we are searching for HTTP requests in a log where each request spans multiple lines, one for each request parameter, then we can use the parameters associated with the matching requests as tags to group the requests. For example:

grep -C7 match file | grep tag | cut -d"=" -f1 | sort | uniq -c | sort -nr
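
A rough Python equivalent of this pipeline is sketched below for cases where the logs are already in memory. The function name and the sample lines are hypothetical; the steps mirror grep -C, grep, cut, and the sort/uniq tally.

```python
from collections import Counter


def tally_tags(lines, match, tag, n):
    """Count tag names found within n lines of any line containing match."""
    keep = set()
    for i, line in enumerate(lines):
        if match in line:                          # grep -C<n> match
            keep.update(range(max(0, i - n), min(len(lines), i + n + 1)))
    names = [
        lines[i].split("=", 1)[0]                  # cut -d"=" -f1
        for i in sorted(keep)
        if tag in lines[i]                         # grep tag
    ]
    return Counter(names).most_common()            # sort | uniq -c | sort -nr


# Hypothetical multi-line request log, one parameter per line.
sample = [
    "GET /authorize",
    "param.client_id=abc",
    "param.scope=profile",
    "GET /token",
    "param.client_id=abc",
    "param.grant_type=refresh_token",
    "GET /health",
]
result = tally_tags(sample, "/token", "param.", 2)
print(result[0])  # ('param.client_id', 2)
```

Each context line is counted once even when two matches have overlapping windows, which is also how grep -C prints overlapping context.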

47) In the absence of existing tags, we can create new tags with a search-and-replace command, using the same logic as above with one more stage in the pipe.
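
As an illustration of creating tags by search and replace, the sketch below rewrites raw user-agent lines into tag lines that a later counting stage can group. The rules, the "device=" tag name and the sample lines are assumptions made for the example, not something taken from real logs.

```python
import re

# Hypothetical rules: a pattern for the raw line and the tag that replaces it.
RULES = [
    (re.compile(r".*User-Agent:.*iPhone.*"), "device=ios"),
    (re.compile(r".*User-Agent:.*Android.*"), "device=android"),
]


def tag_lines(lines):
    """Replace each line that matches a rule with its tag; keep the rest."""
    tagged = []
    for line in lines:
        for pattern, replacement in RULES:
            if pattern.fullmatch(line):
                line = pattern.sub(replacement, line, count=1)
                break
        tagged.append(line)
    return tagged


raw = [
    "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_2)",
    "GET /token",
    "User-Agent: Dalvik/2.1.0 (Linux; U; Android 8.0)",
]
print(tag_lines(raw))  # ['device=ios', 'GET /token', 'device=android']
```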

Saturday, February 10, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

41) Device access calls - When mobile applications exchange requests and responses with the server, they are harder to debug live because the code is usually tried on a simulator. Both iOS and Android allow applications to be simulated and debugged so that they behave the same on an actual device. However, logs provide a convenient mechanism to track the conversations with the server, so long as the conversation can be narrowed down by device, application, customer and session.

42) Device access without customer - Devices may have to perform handshakes before a customer data flow can be initiated. Fortunately, most applications and devices now follow a similar OAuth protocol to handle this. They use a client identifier and secret that are specific to the application and the device. A device-based authorization flow also differs from other OAuth workflows because it runs without user context. These calls are therefore easily searchable by their OAuth parameters.

43) Device with customer context - When devices engage in OAuth conversations with customer context, they usually carry an access token or a refresh token. Refresh tokens are exchanged old for new, so we can enumerate all such conversations based on the old and new tokens issued during the conversation. This line of search is very helpful across all API calls made with OAuth: the calls are usually short-lived and the access token spans more than one call, so finding other calls in the vicinity of a call is just a regular expression or literal search.
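
Enumerating a conversation from old-for-new refresh token exchanges amounts to linking pairs into chains. The following is a small sketch under the assumption that each exchange has already been extracted from the logs as an (old_token, new_token) pair; the function name and token values are hypothetical.

```python
def token_chains(exchanges):
    """Link (old_token, new_token) pairs into chains, oldest token first."""
    next_of = dict(exchanges)
    issued = set(next_of.values())
    chains = []
    for start in next_of:
        if start in issued:
            continue  # this token was itself issued by an earlier exchange
        chain = [start]
        # Follow the exchanges; stop if a token repeats to avoid looping.
        while chain[-1] in next_of and next_of[chain[-1]] not in chain:
            chain.append(next_of[chain[-1]])
        chains.append(chain)
    return chains


# Hypothetical exchanges in the order they appear in the logs.
exchanges = [("r1", "r2"), ("r7", "r8"), ("r2", "r3")]
print(token_chains(exchanges))  # [['r1', 'r2', 'r3'], ['r7', 'r8']]
```

Each chain then gives the set of literals to search for when collecting all the calls of one conversation.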

44) Long-lived customer context - When devices engage in conversations on behalf of the customer and the user agent sessions do not last up to an hour but there is cross-domain access, the number of API calls increases significantly even for the narrowed conversation. In such cases, we shift to higher-level identifiers such as session tokens for single sign-on or identifiers for client context.

Friday, February 9, 2018

Web Assets as a software update

Introduction:
Any application with a web interface requires the usage of resources in the form of markup, stylesheets and scripts. Although they may represent code for the interaction with the end user, they don’t necessarily have to be maintained on the server side and treated the same way as server side code. This document argues for using an update service for any code that is not maintained on the server side. The update service automatically downloads and installs the latest update to the code on a device or a relay server by a pull mechanism rather than the conventional pipeline based push mechanism.

Description:
Content delivery networks are widely used to make web application assets available to a web page regardless of whether it is hosted on mobile, desktop or software as a service. They serve many purposes but primarily function as a set of proxy servers distributed over geographical locations, so that the web page can readily find the assets and download them at high speed regardless of when, where and how the page is displayed. An update service, on the other hand, is generally a feature of a software platform that lets tenants download the latest update from their publisher. The server, by contrast, follows a model in which a single source code base is released from a single point of origin, usually gated by a pipeline, and every consuming device or application points to this server via web redirects. These three software publishing conventions place no restrictions on the size or granularity of individual releases, which are generally determined by what can be achieved within a timeline. Since the most recent update is guaranteed to be compatible with previous versions of the host or device ecosystem and updates mostly move forward, there is little testing or requirement to ensure that a mix of new releases works well on a particular host. Moreover, a number of requests and responses are already made to load a web page, so there is no need for these downloads to meet a minimum size. This brings us to a point where we view assets not as a bundle but as discrete items that can be versioned and made available over a content delivery network. The rules for publishing assets to a set of proxy servers are similar to the rules for releasing code to a virtual server.

Conclusion:
Software may be viewed in terms of both server-side logic and client-updated assets. The granularity of releases for both can be fine-grained and independently verified. The distribution may be finely balanced so that the physical representation of what makes up a web application is much more modular and opt-in for every consumer.

Thursday, February 8, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

37) Cross-API calls - For the API sequence across layers such as HTTP filters, we discussed how to walk down the chain in the logs to find out which layer responded with an error. The mention here is for cross-API calls within the same layer, which determine the response from this layer. Sometimes the information for a response is gathered via cross-API calls, and determining their failures requires inspecting the responses formed in this layer.

38) State sharing between APIs - Most callers and callees share state or keys with each other, and this helps in tracking or studying them in the logs. The count of unique such states indicates the number of distinct conversations between APIs. We can even reuse this to find the exact input or output for a particular customer. Often the customerId is shared in the request parameters themselves, so listing all API calls by customerId should cover this case, but this is not necessarily true for APIs from different departments that may not follow the same rules. In such cases, translating the customerId to the corresponding key or state helps find the API calls.
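
A minimal sketch of both ideas follows: grouping log lines by shared state to count distinct conversations, and translating a customerId into its states to reach calls that never logged the customerId. The "state=" and "customerId=" key formats and the sample lines are assumptions for illustration.

```python
import re
from collections import defaultdict

# Assumed log format: space-separated key=value pairs.
STATE_RX = re.compile(r"\bstate=(\w+)")
CUSTOMER_RX = re.compile(r"\bcustomerId=(\w+)")


def conversations(lines):
    """Group log lines by the state value they share."""
    by_state = defaultdict(list)
    for line in lines:
        m = STATE_RX.search(line)
        if m:
            by_state[m.group(1)].append(line)
    return by_state


def states_for_customer(lines, customer_id):
    """Translate a customerId into the states seen alongside it."""
    states = set()
    for line in lines:
        c = CUSTOMER_RX.search(line)
        s = STATE_RX.search(line)
        if c and s and c.group(1) == customer_id:
            states.add(s.group(1))
    return states


lines = [
    "api=login customerId=c42 state=s1",
    "api=profile state=s1",
    "api=login customerId=c7 state=s2",
    "api=billing state=s2",
    "api=billing state=s1",
]
groups = conversations(lines)
print(len(groups))  # 2 distinct conversations
calls_for_c42 = [l for s in states_for_customer(lines, "c42") for l in groups[s]]
print(len(calls_for_c42))  # 3
```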

39) Incorrect API responses - One of the most notorious failures in a service is when the API fails without an exception. An exception is very helpful for diagnosis and troubleshooting because it identifies a point of failure. In its absence, reconstructing the point of failure by studying requests and responses at the API becomes very difficult. Tracing the API activity may help, but because production logs are rarely at debug level, it would behoove the API to log incorrect responses as well. The results are then easier to diagnose when compared with the successful calls.

40) State pass-through - One of the most successful techniques is for APIs to capture and append state that will be helpful downstream. In the example cited above, the logs had to be enhanced to improve diagnosability. Here the data speaks for itself: it carries all the information we need subsequently, and the operation at any particular layer merely has to look at this state.

#codingexercise
Generate the nth Newman Conway Sequence number. This sequence is shown as
1 1 2 2 3 4 4 4 5 6 7 7
It is defined by the recurrence:
P(n) = P(P(n - 1)) + P(n - P(n - 1))
with the base cases
P (1) = 1
P (2) = 1

// Returns the nth Newman-Conway number for n >= 1 (naive recursion).
int GetNCS(int n)
{
    if (n == 1 || n == 2)
        return 1;
    return GetNCS(GetNCS(n - 1)) + GetNCS(n - GetNCS(n - 1));
}
n = 3:
P(P(2)) + P(3 - P(2))
= P(1) + P(2) = 2
n = 4:
P(P(3)) + P(4 - P(3))
= P(2) + P(2) = 2
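
The naive recursion recomputes the same terms many times. One common alternative, sketched here in Python with a hypothetical function name, fills a table from the base cases upward so that each term is computed once.

```python
def newman_conway(n):
    """Return P(n) of the Newman-Conway sequence using a bottom-up table."""
    if n < 1:
        raise ValueError("n must be >= 1")
    p = [0, 1, 1]  # p[0] is unused padding; P(1) = P(2) = 1
    for i in range(3, n + 1):
        p.append(p[p[i - 1]] + p[i - p[i - 1]])
    return p[n]


print([newman_conway(i) for i in range(1, 13)])
# [1, 1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7]
```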

Wednesday, February 7, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

33) Searching API parameters and results - Often the APIs work correctly but we need to investigate the results and call parameters. In this case, while the API may log its activity, it is often helpful to correlate the request with the request parameters. This association works the same way as if we were looking for the account that made these requests. With the identifier for the associated request, the request parameters can be obtained from the web server logs.
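
The correlation is essentially a join on the request identifier. The sketch below assumes both logs have already been parsed into dictionaries; the field names "request_id", "params" and "result" are hypothetical.

```python
def attach_parameters(api_entries, web_entries):
    """Join API log entries to web server log entries on request_id."""
    params_by_id = {w["request_id"]: w["params"] for w in web_entries}
    return [
        {**a, "params": params_by_id.get(a["request_id"])}  # None if not found
        for a in api_entries
    ]


# Hypothetical parsed entries from the two logs.
api_log = [
    {"request_id": "q1", "result": "ok"},
    {"request_id": "q2", "result": "empty"},
]
web_log = [
    {"request_id": "q2", "params": {"customerId": "c42", "page": "3"}},
    {"request_id": "q1", "params": {"customerId": "c7"}},
]
joined = attach_parameters(api_log, web_log)
print(joined[1]["params"]["page"])  # 3
```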

34) Dependency failure tracking - Often bugs trickle down the layers from the customer interaction. While it may be fairly evident from the call stack of an exception which dependency failed, we are not always lucky enough to get an exception at that level. In such cases, the API results of one layer can be used to narrow down the dependency in the other layer. We apply the same technique as above, but we compare a success case and a failure case to determine the field associated with the failure and consequently the service that failed.
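
Comparing a success case with a failure case can be as simple as listing the fields whose values differ. This is a sketch with hypothetical response fields; mapping a differing field to the service that fills it is left to knowledge of the system.

```python
def differing_fields(success, failure):
    """Return {field: (success_value, failure_value)} for fields that differ."""
    keys = success.keys() | failure.keys()
    return {
        k: (success.get(k), failure.get(k))
        for k in sorted(keys)
        if success.get(k) != failure.get(k)
    }


# Hypothetical API results for a working and a failing account.
ok = {"status": 200, "profile": "loaded", "address": "loaded"}
bad = {"status": 200, "profile": "loaded", "address": None}
print(differing_fields(ok, bad))  # {'address': ('loaded', None)}
```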

35) Dependency log correlation - The downstream service from the case above may also have its own logs. Having narrowed down the dependency, we can now correlate the actions it took for the request. We match the request sent to this dependency and the result it returned against the activities in its own logs to find the exception log.

36) Chains of dependencies - Sometimes the dependency chain is not one level deep but reaches services more than one layer down, as some layers pass the calls along. In such cases, the result at one layer has to be related to the request at the downstream layer, and the logs are searched successively, layer by layer.
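
The layer-by-layer search can be sketched as a walk that follows the downstream request identifier from one layer's log to the next and remembers the deepest failing entry. The layer names and the "status" and "downstream_id" fields are assumptions for illustration.

```python
def deepest_failure(layers, request_id):
    """Follow a request down the layers; return (layer, entry) of the
    deepest entry whose status is not "ok", or None if none failed."""
    found = None
    for name, log in layers:
        entry = log.get(request_id)
        if entry is None:
            break  # the trail ends here
        if entry["status"] != "ok":
            found = (name, entry)
        request_id = entry.get("downstream_id")
        if request_id is None:
            break  # this layer made no further call
    return found


# Hypothetical per-layer logs keyed by request id, outermost layer first.
layers = [
    ("web", {"w1": {"status": "error", "downstream_id": "a9"}}),
    ("api", {"a9": {"status": "error", "downstream_id": "d3"}}),
    ("db", {"d3": {"status": "timeout"}}),
]
print(deepest_failure(layers, "w1"))  # ('db', {'status': 'timeout'})
```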

Tuesday, February 6, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

29) Looking up a specific API for behavior across customerIds - An API may be used for several accounts that differ by type. For example, some accounts may have a mobile number added. Further, some accounts may have two-step verification enabled. Moreover, these accounts may exist in different retail domains. Looking up API behavior across different accounts can help find missed test cases or bugs. The API usage may also explain behavior differences across accounts.

30) Tracking API activity across devices - In the example above, the same API may be called from different devices. Since there may be native applications on these devices, the applications may behave differently in their API calls. It may be harder to debug whether an application is using the APIs correctly and easier to find out on the server side whether the issue is specific to a type of application, such as iOS or Android.

31) Listing the error codes - The API in the example above may return different error codes subject to callers, their call parameters, the device they are calling from, the account they use and the realm they are targeting. The example above differentiated the callers to see if this was a specific caller issue. This example charts the server side responses by error codes to diagnose the issues on the server side.

32) Eliminating a specific error code - The example above helped explain the difference between successes and failures of the API on the server side. Typically the number of successes is far higher than the number of failures, but bugs may exist in the server that cause a consistent error rate in the API. Even though the error rate may be small, detecting the consistent one and studying just those usages might prove useful. Even if the error code is the same, other request parameters or call usages may indicate a symptom.
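
One way to chart responses by error code and spot a consistent error is to compute the rate of each code per time bucket and keep the codes that appear in every bucket. This is a sketch: the hourly bucketing, the error code names and the sample events are assumptions.

```python
from collections import Counter, defaultdict


def error_rates_by_hour(events):
    """events: iterable of (hour, error_code or None for success).
    Returns {error_code: {hour: fraction of that hour's calls}}."""
    totals = Counter()
    errors = defaultdict(Counter)
    for hour, code in events:
        totals[hour] += 1
        if code is not None:
            errors[code][hour] += 1
    return {
        code: {h: per_hour[h] / totals[h] for h in totals}
        for code, per_hour in errors.items()
    }


def consistent_codes(rates):
    """Error codes that occur in every hour, however small the rate."""
    return sorted(c for c, per_hour in rates.items()
                  if all(r > 0 for r in per_hour.values()))


# Hypothetical events: four calls in each of three hours.
events = [
    (0, None), (0, None), (0, None), (0, "E401"),
    (1, "E500"), (1, "E500"), (1, None), (1, "E401"),
    (2, None), (2, None), (2, None), (2, "E401"),
]
rates = error_rates_by_hour(events)
print(consistent_codes(rates))  # ['E401']
```

In this sample, E500 is a burst confined to one hour, while E401 recurs at the same low rate throughout, which makes it the candidate for studying just those usages.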