Cluster computing

Saturday, September 4, 2021

This article is dedicated to my teacher.

Kusto search queries

Enterprises realize the productivity impact of searches made by thousands of their employees. They have invested in making search available on their internal portals, machine data, log indexes, and collaboration document servers. The last of these involves hundreds and thousands of artifacts, created and saved daily, that are mostly personal. Authors save them on a local disk, an external file share, or cloud storage depending on the size, but they usually cannot find what they want when they want it. Tools like grep, find, and locate do not scale with the volume of data, and searches often take longer than is acceptable.

File indexing is a technique that works very well for search over offline data. It parses the content of each file into text tokens using an analyzer and stores them in a repository together with the location of the file. The analyzer tokenizes the content into a bag of words, and an index writer then stores this collection in an index. The index keeps track of each word and its occurrence information in sorted order. Searching the index is faster than scanning the documents because it is like finding a name in a phonebook. Some searches can also score each match based on ranking criteria, which can include learning techniques that improve results.
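The indexing flow described above (analyzer, index writer, sorted index, lookup) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `tokenize`, `build_index`, and `search`, and the sample file paths and contents, are invented for this example and do not come from any particular indexing library.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Index writer: map each token to the sorted list of files containing it."""
    postings = defaultdict(set)
    for path, content in documents.items():
        for token in tokenize(content):
            postings[token].add(path)
    # Keep the terms and their postings in sorted order, like a phonebook.
    return {term: sorted(paths) for term, paths in sorted(postings.items())}


def search(index, query):
    """Return the files that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)


# Hypothetical documents, invented for illustration.
documents = {
    "notes/design.txt": "Cluster design notes for the search service",
    "notes/todo.txt": "Search the logs for login events",
    "reports/q3.txt": "Quarterly report on cluster usage",
}
index = build_index(documents)
cluster_hits = search(index, "cluster")
search_hits = search(index, "search logs")
```

A lookup touches only the postings of the query terms instead of reading every file, which is where the speedup over a scan comes from. A production index would also record positions and frequencies for scoring, which this sketch leaves out.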

The same analogy applies to machine data such as logs, metrics, and events. Engineers search these when troubleshooting incidents, for example in software reliability engineering and in the development and testing of software on large and complex systems, and when reconstructing the timeline of events. The tools and expressions take two popular forms: structured query operators and vectorized execution. Public clouds offer some of the largest datasets to search, and one of them uses Kusto queries. Some examples of canned queries are listed below:

1) duration of user logon time between login/logoff events - here the requests for the user may be selectively filtered for specific security events, and the timestamps for each corresponding pair of events may then be put in a table.

source
| where user == 'user1'
| extend logonHours = datetime_diff('hour', logoffTime, loginTime)

2) potential suspicious activity detection - here the routing paths of the requests made by the user are compared with the known set, and requests that do not fall in known workflow sequences are raised as suspicious.
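One way to picture this comparison is a small Python sketch that checks each consecutive pair of routes in a user's request sequence against the transitions seen in known workflows. This is an assumption about how the check could be done, not a description of any specific product: the workflow set, the route names, and the helper names `known_transitions` and `suspicious_steps` are all hypothetical.

```python
# Hypothetical known workflow sequences of routing paths.
KNOWN_WORKFLOWS = {
    ("/login", "/dashboard", "/logout"),
    ("/login", "/reports", "/export", "/logout"),
}


def known_transitions(workflows):
    """Collect every consecutive (from, to) route pair seen in a known workflow."""
    pairs = set()
    for workflow in workflows:
        pairs.update(zip(workflow, workflow[1:]))
    return pairs


def suspicious_steps(requests, workflows):
    """Return the route transitions that appear in no known workflow."""
    allowed = known_transitions(workflows)
    return [pair for pair in zip(requests, requests[1:]) if pair not in allowed]


# Hypothetical request sequence for one user, in timestamp order.
user_requests = ["/login", "/dashboard", "/admin/keys", "/logout"]
flagged = suspicious_steps(user_requests, KNOWN_WORKFLOWS)
```

Here the two transitions that pass through `/admin/keys` are flagged because neither appears in a known workflow. Matching on transitions rather than whole sequences is a simplification chosen for brevity.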

3) detecting callers - identifying clients, including by the programs they use, can help mitigate denial of service attacks mounted by specific clients that do not behave well with others. In this case, the number of requests made by each client is compared with that of the others to see if a client is repeatedly trying something that it should not.

source
| where user == 'user2'
| distinct client_id
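The comparison of request counts across clients described in item 3 can be sketched in Python as follows. This is a rough illustration under stated assumptions: the function name `noisy_clients`, the threshold of three times the median, and the sample client ids are all invented for this example, and a real system would tune the rule to its own traffic.

```python
from collections import Counter
from statistics import median


def noisy_clients(client_ids, factor=3):
    """Flag clients whose request count exceeds `factor` times the median count."""
    counts = Counter(client_ids)
    typical = median(counts.values())
    return sorted(c for c, n in counts.items() if n > factor * typical)


# Hypothetical request log: one client id per request.
request_log = (
    ["client-a"] * 4 + ["client-b"] * 5 + ["client-c"] * 60 + ["client-d"] * 3
)
flagged_clients = noisy_clients(request_log)
```

The median is used as the baseline so that a single very noisy client does not inflate the value it is compared against.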
