Cluster computing

Saturday, February 10, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

41) Device access calls - When mobile applications exchange requests and responses with the server, they are harder to debug live because the code is usually tried on a simulator. Both iOS and Android allow applications to be simulated and debugged, on the expectation that they will perform the same on an actual device. However, logs provide a convenient mechanism to track the conversations with the server, so long as the conversation can be narrowed down by device, application, customer and session.

42) Device access without customer - Devices may have to do handshakes before a customer data flow can be initiated. Fortunately, most applications and devices now follow a similar OAuth protocol to handle this. They use a client identifier and secret that are specific to the application and the device. A device-based authorization flow also differs from other OAuth workflows because it runs without a user context. These calls are therefore easily searchable by their OAuth parameters.

43) Device with customer context - When devices engage in OAuth conversations with the customer context, they usually carry an access token or a refresh token. Refresh tokens are exchanged, old for new, so we can enumerate all such conversations by following the old and new tokens issued during the conversation. This line of search is very helpful across all API calls made with OAuth because the calls are usually short-lived and the access token spans more than one call, so finding other calls in the vicinity of a given call is just a regular expression or literal search.

44) Long-lived customer context - When devices engage in conversations on behalf of the customer and the user agent sessions do not last up to an hour but there is cross-domain access, the number of API calls increases significantly even for the narrowed conversation. In such cases, we shift to higher-level identifiers such as session tokens for single sign-on or identifiers for client context.
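The refresh-token search in item 43 can be sketched as follows. This is only an illustration: the log line layout and the field names `old_token` and `new_token` are assumptions, not the format of any particular identity provider.

```python
import re

# Hypothetical log layout (an assumption for this sketch):
#   "<timestamp> event=token_refresh old_token=<id> new_token=<id> ..."
REFRESH = re.compile(r"old_token=(?P<old>\S+)\s+new_token=(?P<new>\S+)")


def token_chain(lines, start_token):
    """Follow old -> new refresh-token exchanges starting from start_token."""
    successor = {}
    for line in lines:
        match = REFRESH.search(line)
        if match:
            successor[match.group("old")] = match.group("new")

    chain = [start_token]
    seen = {start_token}
    while chain[-1] in successor:
        nxt = successor[chain[-1]]
        if nxt in seen:  # guard against a cycle in malformed logs
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


sample = [
    "2018-02-10T10:00:00.000 event=token_refresh old_token=r1 new_token=r2 device=d9",
    "2018-02-10T10:30:00.000 event=api_call token=a7 device=d9",
    "2018-02-10T11:00:00.000 event=token_refresh old_token=r2 new_token=r3 device=d9",
]
print(token_chain(sample, "r1"))  # ['r1', 'r2', 'r3']
```

Once the chain is known, each token in it can be used as a literal search term to pull the API calls made during that part of the conversation.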

Friday, February 9, 2018

Web Assets as a software update

Introduction:
Any application with a web interface requires resources in the form of markup, stylesheets and scripts. Although they may represent code for the interaction with the end user, they don’t necessarily have to be maintained on the server side and treated the same way as server-side code. This document argues for using an update service for any code that is not maintained on the server side. The update service automatically downloads and installs the latest update to the code on a device or a relay server by a pull mechanism rather than the conventional pipeline-based push mechanism.

Description:
Content delivery networks are widely used to make web application assets available to a web page regardless of whether it is hosted on mobile, desktop or software as a service. They serve many purposes but primarily function as a set of proxy servers distributed over geographical locations, so that the web page may readily find the assets and download them at high speed regardless of when, where and how the page is displayed. An update service, on the other hand, is generally a feature of a software platform that lets tenants download the latest update from their publisher. The server, in contrast, follows a model where there is a single source code from a single point of origin, usually gated over a pipeline, and every consuming device or application points to this server via web redirects. These three software publishing conventions place no restrictions on the size or granularity of individual releases, which are generally determined by what can be achieved within a timeline. Since the most recent update is guaranteed to work compatibly with previous versions of the host or device ecosystem and updates are mostly forward progressive, there is very little testing or requirement to ensure that a mix and match of new releases on a particular host works well. Moreover, a number of requests and responses are already being made to load a web page, so there is no need to hold these downloads or responses to a minimum size. This brings us to a point where we view assets not as a bundle but as discrete items that can be versioned and made available over a content delivery network. The rules for publishing assets to a set of proxy servers are similar to the rules for releasing code to a virtual server.
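As a rough illustration of treating assets as discrete, versioned items that a consumer pulls, the comparison step might look like the sketch below. The manifest shape (asset name mapped to a version string) and the function name are hypothetical; a real update service would also need to fetch and verify the files.

```python
def assets_to_pull(installed, published):
    """Return the names of assets whose published version differs from the installed one.

    Both arguments map asset name -> version string (a hypothetical manifest shape).
    Assets that are published but not yet installed are included as well.
    """
    return sorted(
        name
        for name, version in published.items()
        if installed.get(name) != version
    )


installed = {"app.js": "1.4.0", "site.css": "2.0.1"}
published = {"app.js": "1.5.0", "site.css": "2.0.1", "login.html": "1.0.0"}
print(assets_to_pull(installed, published))  # ['app.js', 'login.html']
```

Because each asset is compared on its own, a consumer can opt in to only the assets it uses rather than taking a whole bundle.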

Conclusion:
Software may be viewed both in terms of server-side logic and client-updated assets. The granularity of releases for both can be fine-grained and independently verified. The distribution may be finely balanced so that the physical representation of what makes up a web application is much more modular and opt-in for every consumer.

Thursday, February 8, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

37) Cross-API calls - In the API sequence across layers such as HTTP filters, we discussed how to walk down the chain in the logs to find out which layer responded with an error. The mention here is for cross-API calls within the same layer, which determine the response from this layer. Sometimes the information for a response is gathered via cross-API calls, and determining their failures requires inspection of the responses formed in this layer.

38) State sharing between APIs - Most callers and callees share state or keys with each other, and this helps in tracking or studying them in the logs. The count of unique states indicates the number of distinct conversations between APIs. We can even reuse this to find the exact input or output for a particular customer. Often the customerId is shared in the request parameters itself, so listing all APIs by customerId should cover this case, but this is not necessarily true for APIs from different departments that may not follow the same rules. In such cases, translating the customerId to the corresponding key or state helps find the API calls.

39) Incorrect API responses - One of the most notorious failures in services is when the API fails without an exception. An exception is very helpful for diagnosis and troubleshooting because it pinpoints a point of failure. In its absence, reconstructing the point of failure by studying requests and responses at the API becomes very difficult. Tracing the API activity may help here, but because production logs are rarely at debug level, it would behoove the API to log incorrect responses as well. The results are then easier to diagnose when compared with the other, successful calls.

40) State pass-through - One of the most successful techniques is for APIs to capture and append state that will be helpful downstream. In the example cited above, the logs had to be enhanced to improve diagnosability. Here the data speaks for itself. The data carries all the information we need subsequently, and the operation at any particular layer merely has to look at this state.
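The state-based lookup in item 38 could be sketched along these lines. The log layout and the field names `caller`, `callee`, `state` and `customerId` are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical log layout (an assumption for this sketch):
#   "caller=<api> callee=<api> state=<key> [customerId=<id>]"
PAIR = re.compile(r"caller=(?P<caller>\S+)\s+callee=(?P<callee>\S+)\s+state=(?P<state>\S+)")
CUSTOMER = re.compile(r"customerId=(?P<customer>\S+)")


def conversations(lines):
    """Map (caller, callee) -> set of distinct shared states seen in the logs."""
    seen = defaultdict(set)
    for line in lines:
        match = PAIR.search(line)
        if match:
            seen[(match.group("caller"), match.group("callee"))].add(match.group("state"))
    return seen


def states_for_customer(lines, customer_id):
    """Translate a customerId into the states that appear alongside it."""
    states = set()
    for line in lines:
        pair = PAIR.search(line)
        customer = CUSTOMER.search(line)
        if pair and customer and customer.group("customer") == customer_id:
            states.add(pair.group("state"))
    return states


def calls_for_states(lines, states):
    """Find calls carrying one of the states, even when customerId is not logged."""
    matched = []
    for line in lines:
        pair = PAIR.search(line)
        if pair and pair.group("state") in states:
            matched.append(line)
    return matched


logs = [
    "caller=web callee=orders state=s1 customerId=c42",
    "caller=orders callee=billing state=s1",
    "caller=web callee=orders state=s2 customerId=c77",
]
print(len(conversations(logs)[("web", "orders")]))  # 2 distinct conversations
print(calls_for_states(logs, states_for_customer(logs, "c42")))
```

The second line of the sample has no customerId, yet it is still found for customer c42 because it shares the state s1.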

#codingexercise
Generate the nth Newman Conway Sequence number. This sequence is shown as
1 1 2 2 3 4 4 4 5 6 7 7
It is defined as the recursion :
P(n) = P(P(n - 1)) + P(n - P(n - 1))
and with closure conditions as
P (1) = 1
P (2) = 1

// Direct translation of the recursion; assumes n >= 1.
int GetNCS (int n)
{
    if (n == 1 || n == 2)
        return 1;
    else
        return GetNCS (GetNCS (n - 1)) + GetNCS (n - GetNCS (n - 1));
}
n = 3:
P (P (2)) + P (3 - P (2))
= P (1) + P (2) = 2
n = 4:
P (P (3)) + P (4 - P (3))
= P (2) + P (2) = 2
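
The recursion above recomputes the same terms many times. As an alternative sketch (written in Python, and not part of the original exercise solution), the terms can be filled into a table from the bottom up so that each P(i) is computed once:

```python
def newman_conway(n):
    """Return P(n) of the Newman-Conway sequence using a bottom-up table."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    p = [0, 1, 1]  # index 0 is unused; P(1) = P(2) = 1
    for i in range(3, n + 1):
        p.append(p[p[i - 1]] + p[i - p[i - 1]])
    return p[n]


print([newman_conway(i) for i in range(1, 13)])
# [1, 1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7]
```

The table costs O(n) memory, which is the trade-off for avoiding the repeated recursive calls.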

Wednesday, February 7, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

33) Searching API parameters and result - Often the APIs work correctly but we need to investigate the results and call parameters. In this case, while the API may log its activity, it might often be helpful to correlate the request and the request parameters. This association works the same way as if we were looking for the account which made these requests. With the identifier for the associated request, the request parameters can be obtained from the web server logs.

34) Dependency failure tracking - Often bugs trickle down the layers from the customer interaction. While it may be fairly evident from the call stack of an exception which dependency failed, we might not always be lucky enough to get an exception at that level. In such cases, the API results of one layer can be used to narrow down the dependency in the other layer. We apply the same technique as above, but we compare a success case and a failure case to determine the field associated with the failure and, consequently, the service that failed.

35) dependency log correlation - The downstream service from the case above may also have its own logs. By narrowing the dependency in the above case, we can now correlate the actions taken by this dependency for the request. We match the request and the result to and from this dependency with the activities taken by the dependency to find the exception log.

36) Chains of dependencies - Sometimes the dependency chain is not one level deep but runs through services more than one layer down, as some layers pass the calls along. In such cases, the result at one layer needs to be related to the request at the layer downstream. The logs are then searched successively, layer by layer.
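The success-versus-failure comparison in item 34 might be sketched like this. The response fields and the mapping from field to owning service are hypothetical and only serve to show the idea.

```python
def differing_fields(success, failure):
    """Return {field: (success_value, failure_value)} for fields that differ."""
    keys = set(success) | set(failure)
    return {
        key: (success.get(key), failure.get(key))
        for key in sorted(keys)
        if success.get(key) != failure.get(key)
    }


def suspect_services(success, failure, field_owner):
    """Map the differing fields to the downstream services that populate them."""
    return sorted({
        field_owner[field]
        for field in differing_fields(success, failure)
        if field in field_owner
    })


# Hypothetical API results from one layer, one good and one bad.
good = {"status": 200, "profile": "ok", "address": "ok", "payment": "ok"}
bad = {"status": 200, "profile": "ok", "address": "ok", "payment": None}

# Hypothetical mapping of response fields to the dependency that fills them.
owners = {"profile": "profile-service", "address": "address-service", "payment": "payment-service"}

print(differing_fields(good, bad))           # {'payment': ('ok', None)}
print(suspect_services(good, bad, owners))   # ['payment-service']
```

The suspect service is then the place to continue the search, as items 35 and 36 describe.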

Tuesday, February 6, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

29) Looking up a specific API for behavior across customerIds - An API may be used for several accounts which may differ by type. For example, there may be accounts that have a mobile number added. Further, there may be accounts that have two-step verification enabled. Moreover, these accounts may exist in different retail domains. Looking up API behavior across different accounts can help find missed test cases or bugs. The API usages may also explain differences in behavior across accounts.

30) tracking API activity across devices - In the example above the same API may be called from different devices. Since there may be native applications on these devices, these applications may behave differently in their api calls. It might be harder to debug whether the application is using the APIs correctly and easier to find out on the server side whether the issue is specific to a type of application such as iOS or Android.

31) Listing the error codes - The API in the example above may return different error codes subject to callers, their call parameters, the device they are calling from, the account they use and the realm they are targeting. The example above differentiated the callers to see if this was a specific caller issue. This example charts the server side responses by error codes to diagnose the issues on the server side.

32) Eliminating a specific error code - The example above helped explain the difference between successes and failures of the API on the server side. Typically the number of successes is far higher than the number of failures, but bugs in the server may cause a consistent error rate in the API. Even though the error rate may be small, detecting a consistent one and studying just those usages might prove useful. Even if the error code is the same, other request parameters or call usages may indicate a symptom.
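Charting server-side responses by error code, as in item 31, with the device split from item 30, could look like the following sketch. The log layout and the `api`, `device` and `status` field names are assumptions.

```python
import re
from collections import Counter

# Hypothetical log layout (an assumption for this sketch):
#   "<timestamp> api=<path> device=<type> status=<three-digit code>"
LINE = re.compile(r"api=(?P<api>\S+)\s+device=(?P<device>\S+)\s+status=(?P<status>\d{3})")


def error_codes_by_device(lines, api):
    """Count (status, device) pairs for failed calls (status >= 400) to one API."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match and match.group("api") == api and int(match.group("status")) >= 400:
            counts[(match.group("status"), match.group("device"))] += 1
    return counts


api_logs = [
    "2018-02-06T09:00:01.120 api=/signin device=ios status=200",
    "2018-02-06T09:00:02.300 api=/signin device=android status=401",
    "2018-02-06T09:00:03.450 api=/signin device=android status=401",
    "2018-02-06T09:00:04.010 api=/signin device=ios status=500",
    "2018-02-06T09:00:05.770 api=/profile device=ios status=404",
]
for (status, device), count in error_codes_by_device(api_logs, "/signin").most_common():
    print(status, device, count)
```

A code that keeps appearing for one device type, or at a steady rate across all of them, is the kind of consistent error that item 32 suggests isolating and studying.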

Monday, February 5, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

25) Looking up the authorization provider - Accounts are often used for payment purposes. Different merchants may honor payments from the same account pool. In such cases, the payments provider becomes an OAuth facilitator between merchants on different domains, and the record of access to different merchants may have historical value. Since this protocol allows the facilitator to be identified by a specific client id, that id can be used to query the logs and list the merchants involved.

26) tracking user activity across merchants - In the example above a token for access is issued as an amalgamation of representations for the payment provider, the customer as well as an issuing authority. The customer representation can therefore help with listing the actions taken across merchants at the identity provider especially given that there is no necessity for signing in again at the participating merchants.

27) Listing the scopes - Access to a resource may be governed by fine-grained scopes. These scopes are associated with the access based on a token. The token is granted by the issuing authority specifically for the resources. A search of the logs for all the scopes used in a time window will give all the resource access sought.

28) Listing customers at a participating merchant site - Since tokens carry a representation for the user, it is helpful to list all the customers at a merchant site, especially if a single merchant has been affected. A follow-up with only the customers at that merchant becomes easier with this kind of search query.
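The time-window search for scopes in item 27 can be sketched as below. The log layout is an assumption, including the choice to log scopes as a comma-separated `scope=` field; actual identity provider logs may record them differently.

```python
import re
from datetime import datetime

# Hypothetical log layout (an assumption for this sketch):
#   "<ISO timestamp> client_id=<id> merchant=<name> scope=<comma-separated scopes>"
SCOPE = re.compile(r"scope=(?P<scopes>\S+)")


def scopes_in_window(lines, start, end):
    """Return the sorted set of scopes seen on lines with start <= timestamp < end."""
    found = set()
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue
        try:
            stamp = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # skip lines that do not begin with a timestamp
        match = SCOPE.search(line)
        if match and start <= stamp < end:
            found.update(match.group("scopes").split(","))
    return sorted(found)


token_logs = [
    "2018-02-05T10:15:00 client_id=facilitator-1 merchant=shop-a scope=payments.write,profile.read",
    "2018-02-05T10:45:00 client_id=facilitator-1 merchant=shop-b scope=profile.read",
    "2018-02-05T12:05:00 client_id=facilitator-1 merchant=shop-a scope=orders.read",
]
window_start = datetime(2018, 2, 5, 10, 0)
window_end = datetime(2018, 2, 5, 11, 0)
print(scopes_in_window(token_logs, window_start, window_end))
# ['payments.write', 'profile.read']
```

Filtering the same lines on `client_id` or `merchant` would give the merchant and customer listings described in items 25 and 28.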

Sunday, February 4, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

21) IP addresses of successful and failed login attempts - We have discussed counting successful and failed login attempts by looking at the counter metric in the code associated with these events. However, that counter does not let us study a denial of service attack. Therefore we scan the logs for calls to sign in and count them by IP address. The culprit will usually have a high call volume with a low success rate.

22) Timecharting of successful and failed logons - A denial of service is not the only cause of skew in the ratio of successes to failures. In order to find the others, we may need to see how this ratio changes over consecutive time intervals. We search with a regex for the timestamp on all sign-in events and count them per interval as either successes or failures. A sample regex is (?<timestampA>\d{4}-\d{2}-\d+)T(?<timestampB>\d+:\d+:\d+\.\d+)

23) Finding customer ids of failed login attempts - A very useful piece of information to have is whether failures in sign-in attempts occur with a specific set of accounts.

24) Top 10 most active users - Just as the previous point looks at specific sets of accounts, here we look at the count of customer activities to determine the top few. This may indicate unusual activity depending on the count.
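Items 21 and 22 can be sketched together as below. The log layout and the `event`, `ip` and `result` fields are assumptions. The timestamp pattern follows the sample regex in item 22, but Python's `re` module spells named groups as `(?P<name>...)` rather than `(?<name>...)`, and the group names here are shortened.

```python
import re
from collections import Counter, defaultdict

# Timestamp pattern adapted from item 22 to Python's named-group syntax.
TIMESTAMP = re.compile(r"(?P<date>\d{4}-\d{2}-\d+)T(?P<time>\d+:\d+:\d+\.\d+)")

# Hypothetical sign-in log layout (an assumption for this sketch).
SIGNIN = re.compile(r"event=signin\s+ip=(?P<ip>\S+)\s+result=(?P<result>success|failure)")


def signins_by_ip(lines):
    """Map ip -> Counter of sign-in results, to spot high volume with little success."""
    by_ip = defaultdict(Counter)
    for line in lines:
        match = SIGNIN.search(line)
        if match:
            by_ip[match.group("ip")][match.group("result")] += 1
    return by_ip


def signins_by_hour(lines):
    """Map 'YYYY-MM-DDTHH' -> Counter of sign-in results for each hourly interval."""
    by_hour = defaultdict(Counter)
    for line in lines:
        stamp = TIMESTAMP.search(line)
        match = SIGNIN.search(line)
        if stamp and match:
            hour = stamp.group("time").split(":")[0]
            by_hour[stamp.group("date") + "T" + hour][match.group("result")] += 1
    return by_hour


signin_logs = [
    "2018-02-04T08:00:01.250 event=signin ip=10.0.0.5 result=failure",
    "2018-02-04T08:00:02.100 event=signin ip=10.0.0.5 result=failure",
    "2018-02-04T08:10:00.000 event=signin ip=10.0.0.9 result=success",
    "2018-02-04T09:01:00.000 event=signin ip=10.0.0.5 result=failure",
]
for ip, results in signins_by_ip(signin_logs).items():
    print(ip, dict(results))
for bucket, results in sorted(signins_by_hour(signin_logs).items()):
    print(bucket, dict(results))
```

Swapping the `ip` field for a customer id field in the same counting loop would give the per-account failure listing of item 23 and, with `Counter.most_common(10)`, the top 10 of item 24.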