Cluster computing

Wednesday, February 7, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

33) Searching API parameters and result - Often the APIs work correctly but we need to investigate the results and call parameters. In this case, while the API may log its activity, it might often be helpful to correlate the request and the request parameters. This association works the same way as if we were looking for the account which made these requests. With the identifier for the associated request, the request parameters can be obtained from the web server logs.

34) dependency failure tracking - often bugs trickle down the layers from the customer interaction. While it may be fairly evident from the call stack of an exception which dependency failed, we might not always be lucky to get an exception at that level. In such cases, API results of one layer can be used to narrow down the dependency in the other layer. In this case we apply the same technique as above but we use a success and a failure case to determine the field associated with the failure and consequently the service that failed.

35) dependency log correlation - The downstream service from the case above may also have its own logs. By narrowing the dependency in the above case, we can now correlate the actions taken by this dependency for the request. We match the request and the result to and from this dependency with the activities taken by the dependency to find the exception log.

36) Chains of dependencies - Sometimes the dependency chain is not one level deep but goes into services affecting more than one layers below each other as some layers pass down the calls. In such cases the result translation to request at the downstream layers needs to be related. In such case the logging is searched successively layer by layer.

Tuesday, February 6, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

29) looking up specific API for behavior across customerIds. API may be used for several accounts which may differ based on type - for example, there may be accounts that have a mobile number added. Further there may be accounts that have two step verification enabled. Moreover these accounts may exist in different retail domains. Looking up api behavior across different accounts can help with determining missed test case or bugs. The API usages may also explain behavior difference across accounts.

30) tracking API activity across devices - In the example above the same API may be called from different devices. Since there may be native applications on these devices, these applications may behave differently in their api calls. It might be harder to debug whether the application is using the APIs correctly and easier to find out on the server side whether the issue is specific to a type of application such as iOS or Android.

31) Listing the error codes - The API in the example above may return different error codes subject to callers, their call parameters, the device they are calling from, the account they use and the realm they are targeting. The example above differentiated the callers to see if this was a specific caller issue. This example charts the server side responses by error codes to diagnose the issues on the server side.

32) Eliminating a specific error code - The example above helped explain the difference in success and failures of the API on the server side. Typically the number of success is far higher than the number of failures but bugs may exist in the server to cause a consistent error rate in the API. Even though the error rate may be small, detecting the consistent one may and studying just those usages might prove useful Even if the error code is the same, other request parameters or call usages may indicate a symptom.

Monday, February 5, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

25) looking up authorization provider - Accounts are often used for payment purposes. Different merchants may honor payments from the same account pool. In such cases, the payments provider becomes an Oauth facilitator between merchants on different domains. In such cases, the access to different merchants may become historical value. Since this protocol allows the facilitator to be identified by a specific client id, it can be used to query the logs to list the merchants involved.

26) tracking user activity across merchants - In the example above a token for access is issued as an amalgamation of representations for the payment provider, the customer as well as an issuing authority. The customer representation can therefore help with listing the actions taken across merchants at the identity provider especially given that there is no necessity for signing in again at the participating merchants.

27) Listing the scopes - Access to a resource may be governed by fine grained scope. These scopes are associated with the access based on a token. The token is granted by the issuing authority specifically for the resources. A search of the logs for all the scopes used in a time window will give all the resource access sought.

28) Listing customers at a participating merchant site - Since tokens carry a representation for the user, it is helpful to list all the customers at a mechant site especially if there is a single merchant that has been affected. A followup to only the customers at the merchant becomes easier with this kind of search query.

Sunday, February 4, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

21) IP addresses of successful and failed login attempts - we have discussed counting successful and failed login attempts by looking at the counter metric in the code associated with this events. However that counter does not let us study a denial of service attack. Therefore we scan the logs for calls to sign in and count them by ipaddress. The culprit will usually have a high call volume with little success rate.

22) timecharting of successful and failed logons - The denial of service is not the only cause for skew in success to failed ratio. in order to find the others we may need to see the changes in this ratio in consecutive time intervals. We search for time intervals with regex for timestamp on all events that are sign in and count them in intervals towards either success or failures. A sample reflex is (?<timestampA>\d{4}-\d{2}-\d+)T(?<timestampB>\d+:\d+:\d+.\d+)

23) Finding customer ids of failed login attempts - a very useful information to have is whether failure failures in signisign in attempts occur with a specific set of accounts.

24) Top 10 most active users - just like the previous point talks about different account pools, here we look at the count of customer activities to determine the top few. This may indicate unusual activity depending on the count.

Saturday, February 3, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

17) Search for single sign on activity - Unlike regular signin activity, a single sign on enables a user to navigate different domains and remain logged in. This is the equivalent of seamlessly signing into each domain using a secure token. These single session have their own identifier which are regenerated from the existing signin or a fresh one. Consequently the log search query here follows the sequence of unique identifiers issued.

18) TimeTraveling - This involves chronological sequencing of events for a particular criteria. Since events in the log are progressive, usually a selection of events are already sequenced. however when we have to corroborate race conditions, we need to evaluate their timestamps. Here we search for a timestamp within the event using a regular expression such as (?<timestampA>\d{4}-\d{2}-\d+)T(?<timestampB>\d+:\d+:\d+.\d+) and then search for the matching event from that timestamp.

19) Unleashing an army of fishes - This is a fun query where we determine a set of correlated events and write a search criteria for them, hence the term fishes. Then we evaluate it in offset by offset basis of time intervals and how they move between offsets from start to finish.

20) Number of user agents = With the growing popularity of the number of mobile phone devices, applications and the popularity of voice recognition devices, a website may collect traffic from different user agents. The breakdown of this traffic per source is a useful search query especially for knowing if the website is working fine for one and not for others.

Friday, February 2, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

13) Search for users of a workflow = This determines the set of users who took the same actions corresponding to a workflow. Web applications are developed in features that usually translate as routes to the domain. A series of redirects between different routes is termed a workflow. The feature adds value to the customer by introducing a workflow. The success of the feature is found by the percentage of people who completed the workflow. Finding the customers who took this approach and listing them requires finding requests that include the given new route and then finding customers who have that route and then finding the workflow trail in these customers to see how many completed.

14) Determining external versus internal failures = This determines the set of failures in workflows that resulted from user error versus those that resulted from system error. Systems are designed to be fault tolerant, highly available and quality service but they are complex and include many dependencies among participating services. The purpose of this log search query is two fold : first to determine that the number of system failures is within tolerance and second, to determine the classify the failures that the users encounter.

15) Real time website connections = This determines the number of connections that are made to the server. Some websites report this number on their website itself. For example, certain blog sites indicate the number of visitors. However, for a commercial system this goes beyond visitors and determines the number of active sessions on the webserver

16) Number of user agents = With the growing popularity of the number of mobile phone devices, applications and the popularity of voice recognition devices, a website may collect traffic from different user agents. The breakdown of this traffic per source is a useful search query especially for knowing if the website is working fine for one and not for others.

#codingexercise:

Determine third order Fibonacci series:
T(n) = Fib(Fib(Fib(n)))

generate maze

for (int i=1; i<ROWS; i++) {
for(int j=1;j<COLS;j++) {
String c = (Math.floor((Math.random()*2)%2)) ? "╱" : "╲";
Console.Write(c);
}
Console.Writeline("<br>");
}

Thursday, February 1, 2018

We were looking at some of the search queries that are typical of searching the logs of identity provider:

Some other interesting events for identity include:

9) Query to report on callers from different origins - The Login screen for the identity provider may be visited from different domains. The count of requests from each of these origins can be easily found by looking for the referrer domains and adding a count for each occurrence.

10) Determine users who used two step verification and their success rate. The growth in popularity of one time passcodes over captcha and other challenge questions could be plotted on a graph as a trend by tagging the requests that performed these operations. One of the reasons one time passcodes are popular is that unlike other forms they have less chance of going wrong. The user is guaranteed to get a fresh code and the code will successfully authenticate the user.
OTP is used in many workflows for this purpose.

11) Searching for login attempts. The above scenario also leads us to evaluate the conditions where customers did end up re-attempting where the captcha or their interaction on the page did not work. The hash of failures and their counts will determine which of these is a significant source of error. One of the outcomes of this is that we may discover some forms of challenges as not suitable for the user. In these cases, it is easier to migrate the user to other workflows.

12) Modifications made to account state - Some of the best indicators of fraudulent activity is the pattern of access of account state whether it is to read or write. For example, the address, zip code and payment methods of the account change less frequently than the password for the user. If these do change often for a user and from different source, they may lead to fraud detection. Hence the logs simply evaluate the criteria for change and pipe it to sort | uniq -c | sort -rn to get a stats for the same.

#codingexercise
Find the second order of the Fibonacci Series
H (n) = Fib (Fib (n))