Saturday, February 3, 2018

We were looking at some of the search queries collected from the community of those who use the logs from an identity provider:

Some other interesting events for identity include:

17) Search for single sign on activity - Unlike a regular signin, single sign on enables a user to navigate different domains and remain logged in, which is the equivalent of seamlessly signing into each domain using a secure token. These single sign on sessions have their own identifiers, which are regenerated from an existing signin or a fresh one. Consequently the log search query here follows the sequence of unique identifiers issued, as sketched below.
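A minimal sketch, assuming a hypothetical identity.log in which every line carries customerId=... and ssoSessionId=... fields:

# follow the single sign on identifiers issued to one customer, in the order they were issued
customer="alice"
grep "customerId=$customer" identity.log | grep -o 'ssoSessionId=[^ ]*' | awk '!seen[$0]++'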

18) TimeTraveling - This involves the chronological sequencing of events that match a particular criterion. Since events in the log are appended progressively, a selection of events is usually already in sequence. However, when we have to corroborate race conditions, we need to evaluate their timestamps. Here we search for a timestamp within the event using a regular expression such as (?<timestampA>\d{4}-\d{2}-\d{2})T(?<timestampB>\d{2}:\d{2}:\d{2}\.\d+) and then search for the matching events at that timestamp, as sketched below.
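A minimal sketch, assuming GNU grep with PCRE support (-P) and a hypothetical identity.log with ISO-8601 timestamps; the pattern suspect-event stands in for whatever criterion selects the event of interest:

# pull the timestamp out of a suspect event, then find every event recorded at that same instant
ts=$(grep 'suspect-event' identity.log | grep -oP '(?<timestampA>\d{4}-\d{2}-\d{2})T(?<timestampB>\d{2}:\d{2}:\d{2}\.\d+)' | head -n 1)
grep -F "$ts" identity.log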

19) Unleashing an army of fishes - This is a fun query where we determine a set of correlated events and write a search criterion for each of them, hence the term fishes. Then we evaluate the matches offset by offset over time intervals and watch how they move between offsets from start to finish, as sketched below.
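A minimal sketch, assuming a hypothetical identity.log whose first field is an epoch-seconds timestamp; the three patterns below are placeholders for the correlated events and each is counted per 300-second offset:

# each "fish" is one search pattern; count its matches in every five-minute bucket
for fish in "login" "token-refresh" "logout"; do
  echo "== $fish =="
  grep "$fish" identity.log | awk '{ print int($1/300) }' | sort -n | uniq -c
done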

20) Number of user agents - With the growing number of mobile devices and applications and the popularity of voice recognition devices, a website may collect traffic from many different user agents. The breakdown of this traffic per source is a useful search query, especially for knowing whether the website is working fine for one user agent and not for others. A sketch follows.
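A minimal sketch, assuming web access logs in the combined format where the user agent is the last quoted field:

# traffic breakdown per user agent, most frequent first
awk -F'"' '{ print $(NF-1) }' access.log | sort | uniq -c | sort -rn | head -20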


Friday, February 2, 2018

We were looking at some of the search queries collected from the community of those who use the logs from an identity provider:

Some other interesting events for identity include:

13) Search for users of a workflow - This determines the set of users who took the same actions corresponding to a workflow. Web applications are developed as features that usually translate into routes on the domain, and a series of redirects between different routes is termed a workflow. The feature adds value to the customer by introducing a workflow, and the success of the feature is found by the percentage of people who completed the workflow. Finding and listing these customers requires searching for requests that include the given new route, extracting the customers behind those requests, and then following the workflow trail for each of those customers to see how many completed it. A sketch follows.
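A minimal funnel sketch, assuming hypothetical log lines that carry customerId=... and route=... fields, with /feature/start as the workflow's entry route and /feature/done as its final route:

# customers who entered the workflow
grep 'route=/feature/start' identity.log | grep -o 'customerId=[^ ]*' | sort -u > entered.txt
# customers who completed it
grep 'route=/feature/done' identity.log | grep -o 'customerId=[^ ]*' | sort -u > completed.txt
# completed count versus entered count
comm -12 entered.txt completed.txt | wc -l
wc -l < entered.txt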

14) Determining external versus internal failures - This determines the set of failures in workflows that resulted from user error versus those that resulted from system error. Systems are designed to be fault tolerant, highly available and of good service quality, but they are complex and include many dependencies among participating services. The purpose of this log search query is twofold: first, to determine that the number of system failures is within tolerance, and second, to classify the failures that the users encounter. A sketch follows.
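A minimal sketch, assuming access-log lines whose ninth whitespace-separated field is the HTTP status code, so that 4xx responses approximate user errors and 5xx responses approximate system errors:

# split failures into user (4xx) versus system (5xx) and count each status code
awk '$9 ~ /^4/ { user[$9]++ } $9 ~ /^5/ { sys[$9]++ }
     END { for (s in user) print "user", s, user[s]; for (s in sys) print "system", s, sys[s] }' access.log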

15) Real time website connections - This determines the number of connections made to the server. Some websites report this number on the site itself; for example, certain blog sites indicate the number of visitors. For a commercial system, however, this goes beyond visitors and determines the number of active sessions on the webserver. A sketch follows.
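A minimal sketch, assuming hypothetical session-start and session-end events that carry a sessionId=... field; sessions that have started but not yet ended approximate the active connections:

# count sessions that started but have not ended
comm -23 <(grep -o 'session-start sessionId=[^ ]*' identity.log | cut -d= -f2 | sort -u) \
         <(grep -o 'session-end sessionId=[^ ]*' identity.log | cut -d= -f2 | sort -u) | wc -l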

16) Number of user agents - With the growing number of mobile devices and applications and the popularity of voice recognition devices, a website may collect traffic from many different user agents. The breakdown of this traffic per source is a useful search query, especially for knowing whether the website is working fine for one user agent and not for others.


#codingexercise:

Determine third order Fibonacci series:
T(n) = Fib(Fib(Fib(n)))
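A minimal bash sketch, assuming a simple iterative Fib helper that is composed three times:

#!/bin/bash
# third-order Fibonacci: T(n) = Fib(Fib(Fib(n)))
fib() {
  local n=$1 a=0 b=1 t i
  for ((i = 0; i < n; i++)); do t=$((a + b)); a=$b; b=$t; done
  echo "$a"
}
third() { fib "$(fib "$(fib "$1")")"; }
third 6   # Fib(6)=8, Fib(8)=21, so this prints Fib(21)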

generate maze

const int ROWS = 10, COLS = 10;   // assumed dimensions
var rand = new Random();
for (int i = 1; i < ROWS; i++) {
  for (int j = 1; j < COLS; j++) {
    // pick a forward or backward diagonal at random
    string c = (rand.Next(2) == 0) ? "╱" : "╲";
    Console.Write(c);
  }
  Console.WriteLine();
}

Thursday, February 1, 2018


We were looking at some of the search queries that are typical of searching the logs of an identity provider:

Some other interesting events for identity include:

9) Query to report on callers from different origins - The login screen for the identity provider may be visited from different domains. The count of requests from each of these origins can easily be found by looking for the referrer domains and adding a count for each occurrence, as sketched below.
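A minimal sketch, assuming combined-format access logs where the referrer is the second-to-last quoted field; only the referrer's domain is kept before counting:

# requests per referring domain, most frequent first
awk -F'"' '{ print $(NF-3) }' access.log | awk -F/ '{ print $3 }' | sort | uniq -c | sort -rn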

10) Determine users who used two step verification and their success rate - The growth in popularity of one time passcodes over captcha and other challenge questions can be plotted as a trend by tagging the requests that performed these operations. One of the reasons one time passcodes are popular is that, unlike other forms, they have less chance of going wrong: the user is guaranteed to get a fresh code and the code will successfully authenticate the user. OTP is used in many workflows for this purpose. A sketch follows.
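A minimal sketch, assuming hypothetical events tagged otp-challenge and otp-success in identity.log, with the date (YYYY-MM-DD) as the first field, grouped by day to show the trend:

# daily OTP challenges, successes and success rate
awk '/otp-challenge/ { c[$1]++ } /otp-success/ { s[$1]++ }
     END { for (d in c) printf "%s challenges=%d successes=%d rate=%.2f\n", d, c[d], s[d], s[d]/c[d] }' identity.log | sort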

11) Searching for login attempts - The above scenario also leads us to evaluate the conditions where customers ended up re-attempting because the captcha or their interaction on the page did not work. The hash of failures and their counts will determine which of these is a significant source of error. One outcome of this is that we may discover that some forms of challenges are not suitable for the user. In those cases, it is easier to migrate the user to other workflows.

12) Modifications made to account state - Some of the best indicators of fraudulent activity are the patterns of access to account state, whether reads or writes. For example, the address, zip code and payment methods of the account change less frequently than the user's password. If these do change often for a user, and from different sources, they may lead to fraud detection. Hence the logs simply evaluate the criteria for change and pipe them to sort | uniq -c | sort -rn to get stats for the same, as sketched below.
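A minimal sketch, assuming hypothetical audit lines of the form "account-update customerId=... field=... sourceIp=..." in identity.log:

# how often each account field changes, per customer, most frequent first
grep 'account-update' identity.log | grep -oE 'customerId=[^ ]* field=[^ ]*' | sort | uniq -c | sort -rn | head -20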

#codingexercise
Find the second order of the Fibonacci series:
H(n) = Fib(Fib(n))

Wednesday, January 31, 2018

We discussed techniques for request chaining here:  https://www.blogger.com/blogger.g?blogID=1985795500472842279#editor/target=post;postID=859017183586117344;onPublishedMenu=allposts;onClosedMenu=allposts;postNum=0;src=link
We were looking at some of the search queries that are typical of searching the logs of an identity provider:
Some other interesting events for identity include
5) Failed attempt to log in to a disabled account – Here we look for all failed login events and narrow them down to those caused by a disabled account. We do this for a given time range and then count the events based on attributes such as those belonging to a specific account, as sketched below.
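A minimal sketch, assuming hypothetical events "login-failed reason=account-disabled customerId=..." in identity.log and a leading date that lets grep restrict the time range:

# failed logins against disabled accounts on a given day, counted per account
grep '^2018-02-01' identity.log | grep 'login-failed' | grep 'reason=account-disabled' | grep -o 'customerId=[^ ]*' | sort | uniq -c | sort -rn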
6) Failed versus successful logon attempts – While there may be counters for failed versus successful logon attempts globally, per-account numbers require scanning the logs. Here we find all the requests associated with the customer, filter them down to the ones that are logon attempts, and for each logon attempt we find the corresponding success or failure response by following the sequence of requests from each such starting request. Then we split the failure versus success counts.
7) Getting a list of concurrent users – Generally this requires a time-frame in which the associated events are collected and then processed for detecting active distinct sessions or customers – either of which may serve the purpose. Again we filter the requests to see who has not signed out and then count them.
8) Success rate of workflows – An identity provider may have some common use cases such as register account, sign in, change account details and forgot password. The number of failures encountered in each workflow, counted customer by customer, may also indicate a form of success rate because the total number of customers calling in a 24 hour period is an easy aggregation.

Tuesday, January 30, 2018

We discussed techniques for request chaining here:  https://www.blogger.com/blogger.g?blogID=1985795500472842279#editor/target=post;postID=859017183586117344;onPublishedMenu=allposts;onClosedMenu=allposts;postNum=0;src=link
    Another common practice for searching the logs is following activity across services. Fortunately, here too the customerId can be propagated across the services to filter the requests associated with the customer.

    Some of the other search queries include:
    1) duration of user logon time between logon/logoff events - here the requests for the user may be selectively filtered for specific security events and then the timestamps for the corresponding pair of events may be put in a table, as sketched below.
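    A minimal sketch, assuming hypothetical logon and logoff events in identity.log that carry a customerId=... field and whose first field is an epoch-seconds timestamp:

    # pair each logon with the next logoff for one customer and print the session duration in seconds
    customer="alice"
    grep "customerId=$customer" identity.log | awk '/logon/ { start=$1 } /logoff/ && start { print $1-start; start=0 }'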

    2) potential suspicious activity detection - Here the routing paths of the requests made by the user are compared with the known set to find anomalies, specifically paths that don't fall in known workflow sequences, which are then raised as suspicious.

    3) detecting callers - clientIds, and clients identified by the programs they use, can help mitigate denial of service attacks mounted by specific clients that don't behave well with others. The number of requests made by a client is compared with the others in this case to see if they are repeatedly trying something that they should not.

    4) Find trends in patterns - often specific failures trigger specific failure-path api calls. These calls can be hashed, and the counts of the hashes may indicate the mitigations most frequently taken by the user. This is slightly different from directly counting the number of exceptions. A sketch follows.
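    A minimal sketch, assuming hypothetical failure-path log lines in identity.log that carry requestId=... and api=... fields; each request's sequence of calls is collapsed into one signature and the signatures are counted:

    # hash each request's sequence of api calls and count how often each sequence occurs
    grep 'failure-path' identity.log \
      | awk '{ for (i = 1; i <= NF; i++) { if ($i ~ /^requestId=/) r = $i; if ($i ~ /^api=/) a = $i } seq[r] = seq[r] " " a }
             END { for (r in seq) print seq[r] }' \
      | sort | uniq -c | sort -rn | head -10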

    Monday, January 29, 2018

    We discussed techniques for request chaining here:  https://www.blogger.com/blogger.g?blogID=1985795500472842279#editor/target=post;postID=859017183586117344;onPublishedMenu=allposts;onClosedMenu=allposts;postNum=0;src=link
    The best way to establish chaining is with a common attribute specific only to the chaining. 
    One such attribute is the customer id. For example, the following script, untested and for illustration only, shows that the chaining is established automatically as part of the selection:

    #!/bin/bash
    #
    E_BADARGS=65
    if [ $# -ne 1 ]
    then
     echo "Usage: $0 customerId"
     exit $E_BADARGS
    fi
    listfiles=$(ls -d -1 $PWD/manaus*.*)
    unset files
    while IFS= read -r line
    do
        files+=("$line")
    done < <(echo "$listfiles")
    echo "${files[@]}"
    for file in "${files[@]}"
    do
      echo "file="; echo $file;
      search="zgrep -n \"^customerID=$1\" $file | cut -d\":\" -f1"
      echo $search; echo
      # check that zgrep is available before running the search
      command_test=$(whatis zgrep 2>&1 | grep 'nothing appropriate');
      echo "search="; echo $search;
      unset linesno
      linesno=()
      if [[ -z "$command_test" ]]
      then
        echo "zgrep available"
        while IFS= read -r line
        do
            echo $line; echo
            linesno+=("$line")
        done < <(eval "$search")
        echo "linesno="; echo "${linesno[@]}";
        for lineno in "${linesno[@]}"
        do
          # skip anything that is not a line number
          if ! [[ "$lineno" =~ ^[0-9]+$ ]]
          then
             echo "improper line no: $lineno"; echo
             continue;
          fi
          # print the matching line with one line of context on either side
          text=$(zcat $file | head -n $(($lineno+1)) | tail -n 3)
          echo $text; echo
        done
      else
        echo "no suitable command found"
      fi
    done

    : <<'output'
    /home/ravi/manaus.tar.gz
    file=
    /home/ravi/manaus.tar.gz
    zgrep -n "^customerID=food" /home/ravi/manaus.tar.gz | cut -d":" -f1
    zgrep -n "^customerID=food" /home/ravi/manaus.tar.gz | cut -d":" -f1: nothing appropriate.
    search=
    zgrep -n "^customerID=food" /home/ravi/manaus.tar.gz | cut -d":" -f1
    zgrep available
    zgrep -n "^customerID=food" /home/ravi/manaus.tar.gz | cut -d":" -f1
    linesno=
    zgrep -n "^customerID=food" /home/ravi/manaus.tar.gz | cut -d":" -f1
    output

    Sunday, January 28, 2018

    Storage Locker
    Data is just as precious as anything else. While storage frameworks in the cloud and on-premise promise perpetual availability and sound security, they do not offer any differentiation of data to treat it as either sensitive or not. Moreover, they may change their policies every few years and do not offer any guarantee that the data will be handled with minimal intervention.
    Companies exist for record management, secure storage and secure destruction, but they usually service backup data and often manage the archives. Sensitive data, on the other hand, may not live in an archive but can remain in a database or in unstructured data, or even be shared among trusted subsidiaries. Locker services do not differentiate between live and aged data.
    The vulnerabilities, threats and attacks in the cloud are discussed in surveys of cloud security and made publicly available. These include:
    1) shared technology vulnerabilities - increased leverage of resources gives the attackers a single point of attack.
    2) Data breach - with data protection moving from cloud consumer to cloud service provider, the risk for data breach grows
    3) Account or service traffic hijacking - Since the data moves over the internet, anybody who hijacks the account could cause a loss of service
    4) Denial of service - a denial of service attack on the cloud provider affects all
    5) malicious insider - a determined insider can find more ways to attack and cover tracks in a cloud scenario
    6) Internet protocol - IP connectivity is a requirement for data but comes with its own vulnerabilities
    7) injection vulnerabilities - XSS, sql injection and other injection vulnerabilities in the management layer affect even otherwise secure data
    8) API & browser vulnerabilities - vulnerability in the cloud provider's API may also affect data security
    9) Changes to business models - cloud computing may require consumers to change their business models and this introduces regressions from previous security reviews
    10) abusive use - cloud computing invites everyone with zero-cost subscriptions. While it is designed to mitigate denial of service attacks, it does not stop malicious users from trying.
    11) malicious insider - even insiders of a cloud provider could become malicious
    12) availability - the system has to be available at all times and while cloud providers take extraordinary efforts, they may suffer from outages such as power failures