Cluster computing

Monday, February 5, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

25) Looking up the authorization provider - Accounts are often used for payments, and different merchants may honor payments from the same account pool. The payments provider then acts as an OAuth facilitator between merchants on different domains, and the record of which merchants were accessed has historical value. Since the protocol identifies the facilitator by a specific client id, that id can be used to query the logs and list the merchants involved.

26) Tracking user activity across merchants - In the example above, the access token is issued as an amalgamation of representations for the payment provider, the customer and the issuing authority. The customer representation can therefore be used at the identity provider to list the actions taken across merchants, especially since the customer does not need to sign in again at each participating merchant.

27) Listing the scopes - Access to a resource may be governed by fine-grained scopes. These scopes are attached to the access through a token, which the issuing authority grants specifically for those resources. A search of the logs for all the scopes used in a time window therefore gives all the resource access that was sought.

28) Listing customers at a participating merchant site - Since tokens carry a representation for the user, it is helpful to list all the customers at a merchant site, especially when a single merchant has been affected. Following up with only the customers at that merchant becomes easier with this kind of search query.
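
Queries 25, 27 and 28 above can be sketched in a few lines. This is only a minimal sketch, assuming the log events have already been parsed into dictionaries; the field names (time, client_id, merchant, customer, scopes), the sample values and the function names are all hypothetical and do not come from any real identity provider.

```python
from datetime import datetime

# Hypothetical parsed log events; every field name and value is an assumption.
EVENTS = [
    {"time": "2018-02-05T10:00:00", "client_id": "pay-facilitator",
     "merchant": "shop-a", "customer": "c1", "scopes": "profile payments"},
    {"time": "2018-02-05T10:30:00", "client_id": "pay-facilitator",
     "merchant": "shop-b", "customer": "c1", "scopes": "payments"},
    {"time": "2018-02-05T12:00:00", "client_id": "other-client",
     "merchant": "shop-a", "customer": "c2", "scopes": "profile refunds"},
]


def merchants_for_client(events, client_id):
    """Query 25: merchants reached through the facilitator's client id."""
    return sorted({e["merchant"] for e in events if e["client_id"] == client_id})


def scopes_in_window(events, start, end):
    """Query 27: distinct scopes used by events with start <= time < end."""
    seen = set()
    for event in events:
        when = datetime.fromisoformat(event["time"])
        if start <= when < end:
            seen.update(event["scopes"].split())
    return sorted(seen)


def customers_at_merchant(events, merchant):
    """Query 28: distinct customers seen at one merchant site."""
    return sorted({e["customer"] for e in events if e["merchant"] == merchant})


if __name__ == "__main__":
    print(merchants_for_client(EVENTS, "pay-facilitator"))
    print(scopes_in_window(EVENTS, datetime(2018, 2, 5, 10), datetime(2018, 2, 5, 11)))
    print(customers_at_merchant(EVENTS, "shop-a"))
```

Returning sorted lists rather than sets keeps the output stable, which makes it easier to compare one search run with the next.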

Sunday, February 4, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

21) IP addresses of successful and failed login attempts - We have discussed counting successful and failed login attempts by looking at the counter metric in the code associated with these events. However, that counter does not let us study a denial of service attack. Therefore we scan the logs for calls to sign in and count them by IP address. The culprit will usually have a high call volume with a low success rate.

22) Timecharting of successful and failed logons - A denial of service attack is not the only cause of skew in the ratio of successes to failures. To find the others, we may need to see how this ratio changes across consecutive time intervals. We extract the timestamp from every sign-in event with a regex, bucket the events into intervals, and count each one as either a success or a failure. A sample regex is (?<timestampA>\d{4}-\d{2}-\d+)T(?<timestampB>\d+:\d+:\d+\.\d+)

23) Finding customer ids of failed login attempts - A very useful piece of information is whether failures in sign-in attempts occur with a specific set of accounts.

24) Top 10 most active users - Just as the previous point looks at a specific set of accounts, here we look at the count of customer activities to determine the top few. An unusually high count may indicate unusual activity.
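
Queries 21 and 22 above can be sketched as follows. The log line layout, the rule that status 200 means success, the thresholds and the function names are all assumptions made for illustration. Note that Python spells named groups as (?P<name>...) rather than (?<name>...), so the sample regex is adapted accordingly.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log lines; the layout after the timestamp is an assumption.
SAMPLE = [
    "2018-02-04T10:15:30.123 POST /signin ip=10.0.0.1 status=401",
    "2018-02-04T10:15:31.456 POST /signin ip=10.0.0.1 status=401",
    "2018-02-04T10:15:32.789 POST /signin ip=10.0.0.1 status=401",
    "2018-02-04T10:20:00.000 POST /signin ip=10.0.0.2 status=200",
    "2018-02-04T11:05:00.000 POST /signin ip=10.0.0.2 status=401",
    "2018-02-04T11:06:00.000 GET /profile ip=10.0.0.2 status=200",
]

# Timestamp regex from the text, in Python syntax, followed by the assumed fields.
LINE = re.compile(
    r"(?P<timestampA>\d{4}-\d{2}-\d+)T(?P<timestampB>\d+:\d+:\d+\.\d+)"
    r" POST /signin ip=(?P<ip>\S+) status=(?P<status>\d{3})"
)


def outcome(match):
    """Assumption: HTTP 200 is a successful sign in, anything else a failure."""
    return "success" if match.group("status") == "200" else "failure"


def signin_stats_by_ip(lines):
    """Query 21: success and failure counts of sign-in calls per IP address."""
    stats = defaultdict(Counter)
    for line in lines:
        match = LINE.search(line)
        if match:
            stats[match.group("ip")][outcome(match)] += 1
    return stats


def suspects(stats, min_calls=3, max_success_rate=0.1):
    """IPs with a high call volume and a low success rate (thresholds assumed)."""
    flagged = []
    for ip, counts in stats.items():
        total = counts["success"] + counts["failure"]
        if total >= min_calls and counts["success"] / total <= max_success_rate:
            flagged.append(ip)
    return sorted(flagged)


def outcomes_per_hour(lines):
    """Query 22: success and failure counts bucketed by date and hour."""
    buckets = defaultdict(Counter)
    for line in lines:
        match = LINE.search(line)
        if match:
            hour = match.group("timestampB").split(":")[0]
            buckets[match.group("timestampA") + "T" + hour][outcome(match)] += 1
    return buckets
```

Plotting the per-hour buckets side by side is what turns the counts into a timechart; an hour where failures jump without a matching single-IP culprit points to a cause other than denial of service.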

Saturday, February 3, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

17) Search for single sign on activity - Unlike regular sign-in activity, single sign on enables a user to navigate different domains and remain logged in. This is the equivalent of seamlessly signing into each domain using a secure token. These single sign on sessions have their own identifiers, which are regenerated from the existing sign-in or a fresh one. Consequently, the log search query here follows the sequence of unique identifiers issued.

18) TimeTraveling - This involves chronological sequencing of events for a particular criterion. Since events are appended to the log as they occur, a selection of events is usually already in sequence. However, when we have to corroborate race conditions, we need to evaluate their timestamps. Here we search for a timestamp within the event using a regular expression such as (?<timestampA>\d{4}-\d{2}-\d+)T(?<timestampB>\d+:\d+:\d+\.\d+) and then search for the matching event from that timestamp.

19) Unleashing an army of fishes - This is a fun query where we determine a set of correlated events and write a search criterion for each of them, hence the term fishes. Then we evaluate the criteria on an offset by offset basis over time intervals and see how the matches move between offsets from start to finish.

20) Number of user agents - With the growing popularity of mobile devices, applications and voice recognition devices, a website may receive traffic from many different user agents. The breakdown of this traffic per user agent is a useful search query, especially for knowing whether the website works fine for one agent and not for others.
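
The user agent breakdown in query 20 can be sketched as below. The event fields, the agent labels and the choice of treating status 500 and above as a failure are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical parsed requests; field names and agent labels are assumptions.
EVENTS = [
    {"user_agent": "browser", "status": 200},
    {"user_agent": "browser", "status": 200},
    {"user_agent": "mobile-app", "status": 200},
    {"user_agent": "mobile-app", "status": 500},
    {"user_agent": "voice-assistant", "status": 500},
    {"user_agent": "voice-assistant", "status": 500},
]


def breakdown_by_user_agent(events):
    """Return {agent: (request count, failure rate)} for every user agent."""
    totals, failures = Counter(), Counter()
    for event in events:
        agent = event["user_agent"]
        totals[agent] += 1
        if event["status"] >= 500:  # assumed definition of a server-side failure
            failures[agent] += 1
    return {agent: (count, failures[agent] / count) for agent, count in totals.items()}


if __name__ == "__main__":
    for agent, (count, rate) in breakdown_by_user_agent(EVENTS).items():
        print(agent, count, rate)
```

Keeping the failure rate next to the raw count is what shows whether the site is working for one agent and not for others; the count alone only shows where the traffic comes from.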

Friday, February 2, 2018

We were looking at some of the search queries that are collected from the community of those using logs from an identity provider:

Some other interesting events for identity include:

13) Search for users of a workflow - This determines the set of users who took the same actions corresponding to a workflow. Web application features usually translate to routes on the domain, and a series of redirects between different routes is termed a workflow. A feature adds value to the customer by introducing a workflow, and its success is measured by the percentage of people who completed that workflow. Listing these customers requires finding the requests that include the given new route, then finding the customers who made those requests, and then following the workflow trail for each of these customers to see how many completed it.

14) Determining external versus internal failures - This separates the failures in workflows that resulted from user error from those that resulted from system error. Systems are designed to be fault tolerant and highly available and to provide quality service, but they are complex and include many dependencies among participating services. The purpose of this log search query is twofold: first, to determine that the number of system failures is within tolerance and second, to classify the failures that the users encounter.

15) Real time website connections - This determines the number of connections that are made to the server. Some websites report this number on the site itself. For example, certain blog sites indicate the number of visitors. However, for a commercial system this goes beyond visitors and determines the number of active sessions on the web server.

16) Number of user agents - With the growing popularity of mobile devices, applications and voice recognition devices, a website may receive traffic from many different user agents. The breakdown of this traffic per user agent is a useful search query, especially for knowing whether the website works fine for one agent and not for others.
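
The workflow search in query 13 can be sketched as follows. It is a sketch under stated assumptions: requests are already reduced to (customer, route) pairs in log order, the first route of the workflow stands in for the new route introduced by the feature, and a customer counts as having completed the workflow when the routes appear in order in that customer's trail. The route names and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (customer, route) pairs in log order.
REQUESTS = [
    ("c1", "/offers"),
    ("c2", "/offers"),
    ("c1", "/offers/apply"),
    ("c3", "/home"),
    ("c1", "/offers/done"),
    ("c2", "/home"),
]
WORKFLOW = ["/offers", "/offers/apply", "/offers/done"]


def is_subsequence(steps, trail):
    """True when every step occurs in trail in the given order."""
    remaining = iter(trail)
    # "step in remaining" consumes the iterator up to the match, preserving order.
    return all(step in remaining for step in steps)


def workflow_completion(requests, workflow):
    """Return (customers who entered, customers who completed, completion rate)."""
    trails = defaultdict(list)
    for customer, route in requests:
        trails[customer].append(route)
    entered = sorted(c for c, trail in trails.items() if workflow[0] in trail)
    completed = sorted(c for c in entered if is_subsequence(workflow, trails[c]))
    rate = len(completed) / len(entered) if entered else 0.0
    return entered, completed, rate


if __name__ == "__main__":
    print(workflow_completion(REQUESTS, WORKFLOW))
```

The rate is computed over the customers who entered the workflow rather than over all customers, which matches the idea of measuring the feature by the share of people who finished what they started.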

#codingexercise:

Determine third order Fibonacci series:
T(n) = Fib(Fib(Fib(n)))
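
One possible solution is sketched below, assuming the usual convention Fib(0) = 0 and Fib(1) = 1. Since the inner values grow very quickly, the sketch uses the fast doubling identities Fib(2k) = Fib(k) * (2 * Fib(k+1) - Fib(k)) and Fib(2k+1) = Fib(k)^2 + Fib(k+1)^2 instead of a plain loop; the function names are our own.

```python
def fib_pair(n):
    """Return (Fib(n), Fib(n + 1)) using fast doubling."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)      # a = Fib(k), b = Fib(k + 1) with k = n // 2
    c = a * (2 * b - a)          # Fib(2k)
    d = a * a + b * b            # Fib(2k + 1)
    return (d, c + d) if n % 2 else (c, d)


def fib(n):
    """Return Fib(n) with Fib(0) = 0 and Fib(1) = 1."""
    return fib_pair(n)[0]


def third_order_fib(n):
    """Return T(n) = Fib(Fib(Fib(n)))."""
    return fib(fib(fib(n)))


if __name__ == "__main__":
    print([third_order_fib(n) for n in range(7)])
```

Even with fast doubling this is only practical for small n, because T(n) itself has a number of digits that grows roughly like Fib(Fib(n)).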

generate maze

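// Java sketch: print a maze of random diagonals.
// ROWS and COLS are assumed to be defined by the caller.
for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLS; j++) {
        // Choose either diagonal with equal probability.
        String c = (Math.random() < 0.5) ? "╱" : "╲";
        System.out.print(c);
    }
    // End each row with an HTML line break.
    System.out.println("<br>");
}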

Thursday, February 1, 2018

We were looking at some of the search queries that are typical of searching the logs of an identity provider:

Some other interesting events for identity include:

9) Query to report on callers from different origins - The Login screen for the identity provider may be visited from different domains. The count of requests from each of these origins can be easily found by looking for the referrer domains and adding a count for each occurrence.

10) Determining users who used two step verification and their success rate - The growth in popularity of one time passcodes (OTP) over captcha and other challenge questions could be plotted on a graph as a trend by tagging the requests that performed these operations. One of the reasons one time passcodes are popular is that, unlike other forms of challenge, they have less chance of going wrong: the user gets a fresh code each time and that code authenticates the user. OTP is used in many workflows for this purpose.

11) Searching for login attempts - The above scenario also leads us to evaluate the conditions where customers ended up re-attempting because the captcha or their interaction on the page did not work. Hashing the failures and counting each hash will show which of these is a significant source of error. One outcome is that we may discover that some forms of challenge are not suitable for the user. In these cases, it is easier to migrate the user to other workflows.

12) Modifications made to account state - Some of the best indicators of fraudulent activity are the patterns of access to account state, whether to read or to write. For example, the address, zip code and payment methods of the account change less frequently than the password for the user. If these do change often for a user and from different sources, that may indicate fraud. Hence the log search simply selects the events that match the change criteria and pipes them to sort | uniq -c | sort -rn to get the statistics.
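
For query 12, the effect of sort | uniq -c | sort -rn can be approximated in Python with collections.Counter. This is a sketch with hypothetical (customer, field, source) tuples; the set of slow-changing fields and the threshold are assumptions, not values from a real system.

```python
from collections import Counter, defaultdict

# Hypothetical account-change events: (customer, changed field, source address).
CHANGES = [
    ("c1", "address", "10.0.0.1"),
    ("c1", "address", "10.0.0.9"),
    ("c1", "payment_method", "10.0.0.9"),
    ("c2", "password", "10.0.0.2"),
    ("c1", "address", "10.0.0.7"),
]

# Fields that are assumed to change less often than the password.
SLOW_FIELDS = {"address", "zip_code", "payment_method"}


def change_counts(changes):
    """Count changes per (customer, field), most frequent first,
    similar in spirit to `sort | uniq -c | sort -rn`."""
    return Counter((customer, field) for customer, field, _ in changes).most_common()


def flag_customers(changes, max_changes=2):
    """Customers whose slow-changing fields changed often and from several sources."""
    counts = Counter()
    sources = defaultdict(set)
    for customer, field, source in changes:
        if field in SLOW_FIELDS:
            counts[customer] += 1
            sources[customer].add(source)
    return sorted(c for c in counts
                  if counts[c] > max_changes and len(sources[c]) > 1)


if __name__ == "__main__":
    print(change_counts(CHANGES))
    print(flag_customers(CHANGES))
```

Requiring both a high count and more than one source keeps a customer who legitimately moves house and updates everything from one device out of the flagged list.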

#codingexercise
Find the second order of the Fibonacci series:
H(n) = Fib(Fib(n))
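
A straightforward sketch of this exercise, assuming the convention Fib(0) = 0 and Fib(1) = 1, computes Fib with a simple loop and applies it twice. The function names are our own.

```python
def fib(n):
    """Return Fib(n) iteratively, with Fib(0) = 0 and Fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def second_order_fib(n):
    """Return H(n) = Fib(Fib(n))."""
    return fib(fib(n))


if __name__ == "__main__":
    print([second_order_fib(n) for n in range(9)])
```

The loop takes Fib(n) iterations for the outer call, so this is fine for small n but becomes slow quickly as n grows.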

Wednesday, January 31, 2018

We discussed techniques for request chaining here: https://www.blogger.com/blogger.g?blogID=1985795500472842279#editor/target=post;postID=859017183586117344;onPublishedMenu=allposts;onClosedMenu=allposts;postNum=0;src=link
We were looking at some of the search queries that are typical of searching the logs of an identity provider:
Some other interesting events for identity include:

5) Failed attempt to login to a disabled account – Here we look for all failed login events and keep only those that were due to a disabled account. We do this for a given time range and then count the events by attributes such as the specific account they belong to.

6) Failed versus successful logon attempts – While there may be global counters for failed versus successful logon attempts, getting them for individual accounts requires scanning the logs. Here we find all the requests associated with the customer, filter them down to the logon attempts, and for each logon attempt find the corresponding success or failure response by following the sequence of requests from that starting request. Then we split the counts into failures and successes.

7) Getting a list of concurrent users – Generally this requires a time frame in which the associated events are collected and then processed to detect distinct active sessions or customers, either of which may serve the purpose. Again, we filter the requests to find those who have not signed out and then count them.

8) Success rate of workflows – An identity provider may have some common use cases such as register account, sign in, change account details and forgot password. The number of failures encountered in each workflow, counted customer by customer, may also indicate a form of success rate because the total number of customers calling in a 24 hour period is an easy aggregation.
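
The concurrent users search in query 7 can be sketched as below. It assumes events reduced to (timestamp, session, action) tuples with ISO-8601 timestamps in a single time zone, so that string comparison matches chronological order; the action names sign_in and sign_out are hypothetical.

```python
# Hypothetical (timestamp, session id, action) events.
EVENTS = [
    ("2018-01-31T09:00:00", "s1", "sign_in"),
    ("2018-01-31T09:05:00", "s2", "sign_in"),
    ("2018-01-31T09:30:00", "s1", "sign_out"),
    ("2018-01-31T09:45:00", "s3", "sign_in"),
    ("2018-01-31T11:00:00", "s4", "sign_in"),
]


def active_sessions(events, start, end):
    """Sessions that signed in within [start, end) and have not signed out
    by the end of that time frame."""
    active = set()
    for when, session, action in sorted(events):  # tuples sort by timestamp first
        if not (start <= when < end):
            continue
        if action == "sign_in":
            active.add(session)
        elif action == "sign_out":
            active.discard(session)
    return active


if __name__ == "__main__":
    sessions = active_sessions(EVENTS, "2018-01-31T09:00:00", "2018-01-31T10:00:00")
    print(len(sessions), sorted(sessions))
```

Counting customers instead of sessions only needs the customer id in place of the session id in each tuple, since either one may serve the purpose.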

Tuesday, January 30, 2018

Another common practice when searching the logs is to follow activity across services. Fortunately, here too the customerId can be propagated across the services and used to filter the requests associated with the customer.

Some of the other search queries include:
1) Duration of user logon time between logon/logoff events - Here the requests for the user may be selectively filtered for specific security events, and the timestamps for each corresponding pair of events may then be put in a table.

2) Potential suspicious activity detection - Here the routing paths of the requests made by the user are compared with the known set to find anomalies, specifically paths that don't fall in known workflow sequences, which are then raised as suspicious.

3) Detecting callers - clientIds, and clients identified by the programs they use, can help mitigate denial of service attacks mounted by specific clients that don't behave well with others. The number of requests made by a client is compared with that of the others to see whether it is repeatedly trying something that it should not.

4) Finding trends in patterns - Often specific failures trigger specific failure path API calls. These calls can be hashed, and the counts of the hashes may indicate the mitigations most often taken by the user. This is slightly different from directly counting the number of exceptions.
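
The logon duration table in query 1 can be sketched as follows. The sketch assumes the security events have already been filtered down to (timestamp, user, event) tuples, that each logoff pairs with the most recent logon of the same user, and that the event names logon and logoff are used; all of these are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical filtered security events: (timestamp, user, event).
EVENTS = [
    ("2018-01-30T08:00:00", "alice", "logon"),
    ("2018-01-30T08:10:00", "bob", "logon"),
    ("2018-01-30T08:45:00", "alice", "logoff"),
    ("2018-01-30T09:10:00", "bob", "logoff"),
]


def logon_durations(events):
    """Pair each logoff with the user's latest logon and return table rows of
    (user, logon time, logoff time, duration in seconds)."""
    open_logons = {}
    rows = []
    for stamp, user, kind in sorted(events):  # chronological order
        when = datetime.fromisoformat(stamp)
        if kind == "logon":
            open_logons[user] = when
        elif kind == "logoff" and user in open_logons:
            started = open_logons.pop(user)
            rows.append((user, started.isoformat(), when.isoformat(),
                         (when - started).total_seconds()))
    return rows


if __name__ == "__main__":
    for row in logon_durations(EVENTS):
        print(row)
```

A logoff without a preceding logon is skipped here rather than guessed at, and a logon that never sees a logoff simply produces no row.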