Tuesday, January 16, 2018


Techniques for HTTP Request Tracing at a web server
Introduction: When we point our browser to a website, we hardly realize the number of exchanges of HTTP request and responses that load the page. Still it is easy to ask the browser to capture and dump the request responses in an archive file. The web server handles many such sessions from different browsers corresponding to the millions of users using it. Consequently, its logs show a mix of requests from different origins at any given time and there is little reconstruction of history. The following are some of the techniques used to tag the requests so that the sequence determination can be made.
Description: Every request to a web server is a discrete item of work for it. Since these items may be referred to later in the logs, they are issued IDs often called a RequestID or RID for short. No two requests have the same ID and its generally not possible to exceed a trillion IDs in a given duration no matter how high the load after which the sequence can rollover. Therefore, RIDs happen to be convenient to look up the requests in the logs. Logs also happen to be a convenient destination to publish the data for any production category server and is well suited for subsequent translation and interpretation by analysis systems that can pull the data without affecting the business-critical systems.
One such way to trace request responses is to establish a previous next link between the requests it makes. Between a bunch of redirects to itself, a server may stamp the RID in an additional field designated as previous RID. If it’s a new request from a user and we don't have a current session, we have none to start with. As the conversation grows the RID is piggybacked on the responses so the new requests formed at the server can have the previous RID propagated to the current. This let us establish a trace.
Another way is for the server to reuse the session ID that the requests are part of. Unfortunately, the sessions are generally restricted to a domain that the web server is hosted in and does not help with the cross domain scenario unless we include additional IDs. Since its difficult to maintain unique integer ID across disparate servers, session IDs can generally be a Unique Universal identifier which has a predefined format and almost all systems know how to generate.
Another technique is for the server to make a stash of a request which can be looked up in a stash store with a key. The stashes are generally encrypted and can include values that may not be logged for privacy. These stashes may easily be purged the same way as logs are discarded. The stashes are done for high value requests and the stashes may be maintained in a list that is piggybacked and carried forward in each conversation.
The above techniques emphasize two key concepts. The web server is the only one that can determine the sequence either by chaining the requests or by propagating a common identifier be it scoped at the session or across domains. The command line tools that serve to analyze the logs can be smart enough to search by pattern or form rich search queries that can elicit the same information. This separates the concerns and keeps the business critical systems from having to deal with the more onerous tasks. That said, there are few command line tools that can discover and render chaining as most are still aimed at regular expression based search.
Conclusion – Request chaining and chain discovery tools will be helpful to trace requests and responses from logs and stores.
#codingexercise
Get Fibonacci number by tail recursion. A tail recursion is one where the recursion is last statement in execution inside the function
uint GetTailRecursiveFibonacci(uint n, uint a = 0, uint b = 1)
{
    if (n == 0)
        return a;
    if (n == 1)
        return b;
    return GetTailRecursiveFibonacci(n-1, b, a+b);
}
0 1 1 2 3 5 8 13 21 34
TailRecursion does not involve the sum of the recursive parts

No comments:

Post a Comment