Tuesday, June 4, 2013

Finding stack frames from a dump can be solved at multiple levels.
1) Manually walk through the dump. This method reads the dump file header, lists the streams, finds the exception stream, reads the context, and finds the stack pointer. Next it iterates through the streams to find the memory list stream, dumps the list of memory ranges, finds the range corresponding to the stack pointer, reads the memory pages for that range, and dumps the memory at the stack pointer to get the stack frames. For each stack frame, verify that the source corresponding to the current entry actually makes a call to the next frame. If any of these steps fails, display a message that the stack frames could not be fully resolved. To find the source corresponding to each entry, go to the stream with the module list information and, for each stack frame, resolve the module, file, function, and offset. This is exactly what a debugger does. The benefit of walking the dump manually instead of using a debugger is that this can be an in-memory, stream-based operation: dump files arrive in a ZipArchive, and ZipArchive has methods to read the entries as streams without extracting them, whereas the debugger and its SDK do not support reading from a stream. This is not a marginal benefit, because when we are dealing with thousands of dumps we do not waste time copying large files over the network, maintaining the lifetime of the copies, or keeping track of archived locations in our system. This is efficient and doable, but expensive to re-implement outside the debugger.
2) Use the debugger SDK. Here we programmatically call the debugger to read the stack frames for us, operating on a dump that we have extracted to a local copy. The SDK has the benefit that it can be imported directly into a PowerShell CmdLet, improving the automation that is desired for reading the dumps. The caveat is that the SDK requires full trust, and it either requires registration in the GAC of the system on which it runs or a directive to skip verification; the latter is appdomain based, so we need to do it early on. This is not a problem in a test automation environment. Further, a dedicated service can be written that takes the location of each dump as input and reads the stack trace using the pre-loaded debugger SDK. In addition to portability, using the SDK has the advantage that exception handling and propagation are easier since everything is in process. Moreover, the SDK comes with definitions of stack frames and validation logic, which obviates string-based parsing and re-interpretation of shell debugger output. At this level the changes are not as expensive, and we reuse the debugger without having to rewrite the functionality we need from that layer.
3) Use a singleton service that watches a folder for new dumps, reads them using a debugger process or the layer mentioned just above, and stores the stack frames in a data store accessible to all. The service abstracts the implementation and provides APIs that can be used by different clients; automation clients can call the APIs directly for their tasks. This approach has the advantage of providing a single point of maintenance for all the usages.
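As a rough sketch of this third option, the following Python snippet polls a drop folder and shells out to cdb to capture each stack trace; the folder path, the debugger path, and the in-memory store are placeholders standing in for real configuration and a real data store.

import os
import subprocess
import time

DUMP_DIR = r"C:\dumps"            # assumed drop folder
CDB = r"C:\Debuggers\cdb.exe"     # assumed path to the debugger

def stack_from_dump(path):
    # 'k' prints the current thread's stack; 'q' quits the debugger.
    result = subprocess.run([CDB, "-z", path, "-c", "k;q"],
                            capture_output=True, text=True, timeout=300)
    return result.stdout

def watch(store, poll_seconds=5):
    # Poll the drop folder and record the stack of each new dump.
    seen = set()
    while True:
        for name in os.listdir(DUMP_DIR):
            if name.endswith(".dmp") and name not in seen:
                seen.add(name)
                store[name] = stack_from_dump(os.path.join(DUMP_DIR, name))
        time.sleep(poll_seconds)

A real service would persist the frames to a shared data store and front them with the APIs described above rather than an in-memory dictionary.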

Monday, June 3, 2013

Symbol tables

Objects such as variables, functions, and types are named. A declaration associates a name with an object, and this association is called a binding. The binding is visible within a block of code, which is the scope of the declaration. Such declarations are local; if the scope were the entire program, the declarations would be global. If scopes are nested, the declaration closest to the usage as it appears in the syntax tree defines the current use of the name. Scoping based on the syntax tree is called static or lexical binding. In dynamic binding, the declaration most recently encountered during execution defines the current use of the name.
A symbol table is a table that binds names to objects. We start with an empty table. We need to bind a name to an object; if the name is already bound in the symbol table, the new binding takes precedence. We should be able to look up a name in the symbol table to find the object it is bound to. We also need to be able to enter a new scope and to exit a scope, reverting the table to what it was prior to entry.
Symbol tables can be implemented with a persistent, or functional, data structure. In this method, a copy is made of the data structure whenever an operation updates it, preserving the old structure; only a portion of the data structure needs to be copied, and the rest can be shared. There is another approach, the imperative approach, in which the old binding of a name that is overwritten is recorded (pushed) on a stack. When a new scope is entered, a marker is pushed on the stack. When the scope is exited, the bindings on the stack are used to revert to the earlier symbol table. Text from Mogensen.
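Here is a minimal Python sketch of the imperative approach just described; the class and method names are illustrative rather than from Mogensen.

class SymbolTable:
    # Imperative symbol table: overwritten bindings are pushed on an
    # undo stack, and a marker is pushed on scope entry so that exit
    # can revert the table to its state prior to entry.
    _SCOPE = object()   # marks a scope boundary on the undo stack
    _ABSENT = object()  # marks "name had no previous binding"

    def __init__(self):
        self.table = {}
        self.undo = []

    def bind(self, name, obj):
        # Record the old binding (or its absence) before overwriting.
        self.undo.append((name, self.table.get(name, self._ABSENT)))
        self.table[name] = obj

    def lookup(self, name):
        return self.table[name]  # raises KeyError if unbound

    def enter_scope(self):
        self.undo.append(self._SCOPE)

    def exit_scope(self):
        # Pop recorded bindings until the scope marker is reached.
        while (entry := self.undo.pop()) is not self._SCOPE:
            name, old = entry
            if old is self._ABSENT:
                del self.table[name]
            else:
                self.table[name] = old

st = SymbolTable()
st.bind("x", "int")
st.enter_scope()
st.bind("x", "float")   # shadows the outer x
assert st.lookup("x") == "float"
st.exit_scope()         # reverts to the binding prior to entry
assert st.lookup("x") == "int"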
Exposing APIs with REST versus SOAP.

REST stands for Representational State Transfer. SOAP stands for Simple Object Access Protocol.

SOAP invokes methods on objects. REST uses a small set of well-known HTTP methods, and the targets of these methods are called resources. By their nature these requests are stateless.

SOAP requires tools to inspect the message. REST can be intercepted by a web proxy and displayed with a browser and add-ons.

SOAP methods require a declarative address, binding, and contract. REST is URI based and uses qualifiers to address resources.
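As a small illustration of the REST style, the following Python sketch fetches a resource with a plain HTTP GET; the URL and the shape of the response are hypothetical.

import json
import urllib.request

# A REST call is a well-known HTTP verb aimed at a resource URI; there is
# no method envelope to construct, unlike a SOAP invocation.
with urllib.request.urlopen("https://api.example.com/orders/42") as resp:
    order = json.load(resp)   # a representation of the order resource
print(order)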


 
When walking through a file based on its layout, we validate not only the offsets but also the content at each offset. We are very careful as we walk from one field to the next.

Sunday, June 2, 2013

The following requirements recur across applications and are enumerated here.
1) Application data: One or more data stores are typically required for an application unless it is an exclusively in-memory application with no data persistence.
2) Application object model: This forms the core of the business logic and enables flexibility to add and remove logic via inheritance, encapsulation, and composition. It helps in writing unit tests and can interact with a variety of clients for participating in end-to-end tests.
3) Application core components: These are a subset of the overall application object model that is common across different components, such as client and server, and are typically scoped to the namespace of a feature or functionality of the overall application.
4) Application services: This describes the service host creation and instantiation required for housing the services of the application.
5) Application UI: This organizes the view models and the views so that the business objects can be modified to enable the workflows required from the application. The properties on the UI are usually defined by declarative markup in views.
6) Application exception handling: The application handles exceptions at each layer and across layers. Applications usually don't allow unhandled exceptions to propagate to the user; messages are translated into what is meaningful to the user.
7) Application logging: Application logging via standard application blocks or other utilities is necessary both for development and for production support.
8) Application user access: Controlling who has access to what is another important aspect of application design. Applications tend to handle security as close to the entry points as possible. The entry points typically have a login comprising authentication and authorization gates that must be crossed before the user has access.
9) Application performance: Application design requires consideration for performance and often includes fast code paths in addition to general, more expensive ones, as well as external tools or frameworks such as caching.
10) Application messaging: A service bus or other messaging framework is involved in sending messages between the application and its dependencies. These could be other services or other data providers, or external gateways when heterogeneous systems are involved.
Overall, the application source has different projects with well-named, fully qualified namespaces for most of the components mentioned above. Each project created and added to the application source has to follow a set of rules and conventions that keep the source well organized. This keeps changes isolated and consistent with SOLID principles.
 
Confirming the previous post with dumpchk output.

Saturday, June 1, 2013

Applications are fragile, so we make up for it with testing, incremental releases, scaffolding, and shared reusable components. Today it is easy to write applications in a test-driven development style, and there is framework and library support for writing repositories, services, and views. As an example, to implement the stack trace service the way we discussed earlier, we will require a local database, EF, a sample dump, the debugger SDK, and a file watcher. Let's do this now.
We're going to read the dump file ourselves, and we will find the stack trace in the dump file without copying or moving the file or using a debugger. Below is the header of the dump file as given by dumpchk.exe. This header gives the offsets of each field and their sizes. We also know the layout of the file: the header, followed by the runs list, which in turn is followed by the runs themselves. At offset 0x348 we have the context record, and at offset 0xf00 we have the exception record; on an i386 dump this is the 32-bit record (_EXCEPTION_RECORD32), with 4-byte fields. The exception record has the exception code at offset 0x000, the exception flags at 0x004, a pointer to a chained exception record at 0x008, the exception address at 0x00C, the number of parameters at 0x010, and the exception information, an array of up to 15 pointers, at 0x014.
The physical memory block descriptor gives the mapping between physical addresses and file offsets. The structure has a dword for the number of runs, a dword for the total number of pages, and an array of run descriptors, one per run. Each run descriptor has a dword for the base page and a dword for the page count. Runs consist of pages, and each page is 4096 bytes. The context record gives the stack pointer register at offset 0x0c4, and the stack pointer is the start of the stack trace.
As an aside, the PFN database seen below is the memory manager's array for keeping track of each page of physical memory (RAM), with a data structure of around 28 bytes per page.
Once we dump the memory at the stack pointer, we can try to find the module addresses and the function addresses, and check that each stack entry makes a call to the next.
   Filename . . . . . . .memory.dmp
   Signature. . . . . . .PAGE
   ValidDump. . . . . . .DUMP
   MajorVersion . . . . .free system
   MinorVersion . . . . .1057
   DirectoryTableBase . .0x00030000
   PfnDataBase. . . . . .0xffbae000
   PsLoadedModuleList . .0x801463d0
   PsActiveProcessHead. .0x801462c8
   MachineImageType . . .i386
   NumberProcessors . . .1
   BugCheckCode . . . . .0xc000021a
   BugCheckParameter1 . .0xe131d948
   BugCheckParameter2 . .0x00000000
   BugCheckParameter3 . .0x00000000
   BugCheckParameter4 . .0x00000000
The above is for a kernel dump; there are other kinds of dump files.
   ExceptionCode. . . . .0x80000003
   ExceptionFlags . . . .0x00000001
   ExceptionAddress . . .0x80146e1c

   NumberOfRuns . . . . .0x3
   NumberOfPages. . . . .0x1f5e
   Run #1
     BasePage . . . . . .0x1
     PageCount. . . . . .0x9e
   Run #2
     BasePage . . . . . .0x100
     PageCount. . . . . .0xec0
   Run #3
     BasePage . . . . . .0x1000
     PageCount. . . . . .0x1000
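To tie the layout together, here is a minimal Python sketch that validates the signature, reads the exception record and stack pointer, and maps a physical address to a file offset through the run list. The field offsets follow the discussion above; the offset of the physical memory descriptor within the header and the file offset where the raw pages begin are assumptions, marked as such, to be confirmed against the actual header.

import struct

PAGE_SIZE = 0x1000

# Offsets taken from the discussion above; fields are 4 bytes (i386 dump).
CONTEXT_OFFSET = 0x348
SP_OFFSET_IN_CONTEXT = 0x0C4
EXCEPTION_OFFSET = 0xF00
# Assumed values; verify against the actual header before relying on them.
PHYS_MEM_DESC_OFFSET = 0x064   # physical memory block descriptor
RUNS_FILE_START = 0x1000       # where the raw pages begin in the file

def check_signature(buf):
    # Validate content, not just offsets: 'PAGE' then 'DUMP', as dumpchk shows.
    if buf[0:4] != b"PAGE" or buf[4:8] != b"DUMP":
        raise ValueError("not a 32-bit kernel dump")

def read_exception_record(buf):
    code, flags, chained, address, nparams = struct.unpack_from(
        "<5I", buf, EXCEPTION_OFFSET)
    info = struct.unpack_from("<15I", buf, EXCEPTION_OFFSET + 0x14)
    return {"code": code, "flags": flags, "address": address,
            "info": info[:nparams]}

def read_stack_pointer(buf):
    # The stack pointer register sits at 0x0c4 within the context record.
    return struct.unpack_from("<I", buf,
                              CONTEXT_OFFSET + SP_OFFSET_IN_CONTEXT)[0]

def read_runs(buf):
    # A dword for the number of runs, a dword for the total page count,
    # then one (BasePage, PageCount) pair of dwords per run.
    nruns, _npages = struct.unpack_from("<2I", buf, PHYS_MEM_DESC_OFFSET)
    return [struct.unpack_from("<2I", buf, PHYS_MEM_DESC_OFFSET + 8 + 8 * i)
            for i in range(nruns)]

def phys_to_file_offset(runs, phys):
    # Pages are stored run after run; count the pages that precede the
    # run containing this physical page.
    page, within = divmod(phys, PAGE_SIZE)
    preceding = 0
    for base, count in runs:
        if base <= page < base + count:
            return RUNS_FILE_START + (preceding + page - base) * PAGE_SIZE + within
        preceding += count
    raise ValueError("physical address not present in any run")

With the three runs listed above, for example, a physical address in the second run (base page 0x100) lands in the file right after the 0x9e pages of the first run.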