Wednesday, May 15, 2013

Security application
In our previous posts we talked about a security administration application that enables domain-object-based security. We discussed several scenarios, features, and approaches, and in general described a UI application that would enable configuration of user and object security. Today we try to improve upon the notion of user role management and its place in this security application. Typically, web applications leave user management to administrators and to tools outside the application, such as operating system applets. By integrating user management with that of the system, many more features and tools become available for it. There are also products like SiteMinder for single sign-on, and interoperability tools that let you configure users across platforms. Even that is being pushed down to the system level, for example with Active Directory integration, freeing up the application to do more for its business users.
Therefore, unless there is a business need for security, applications don't support these kinds of operations. There may be other reasons to require it, such as when web applications use different membership providers that keep user information in different stores, for example ASP.NET stores, SQL stores, and local file-system-based stores, which call for a common management interface. Moreover, there may be mobile users who require access that needs to be secured. In such cases, the mobile applications may not be hitting the web application UI but the API interfaces, and those methods may also need to be secured for different users and applications.
Overall, there are reasons for mapping users to objects and methods.
Most of the time this mapping is dynamic, like a decision tree or a classifier that dynamically groups users and maps them to resources. This can be a policy server where the different policies or classification rules are registered and maintained. The policies define which groups are associated with which pool of resources. The code that associates users with groups can be a scalar user-defined function that takes incoming users and groups them. These groups have no meaning inside the system other than a scalar value. The resources are what the application knows; they can be classified into organizational units called pools. Users are transient and can change often, so we keep track of the more stable groups and associate users with them. Groups can have certain privilege levels and differ from roles in that roles are a subset of groups, while groups are what pools of resources are assigned to. With a dynamic classification mechanism, users can be switched to one or more groups.
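As a minimal sketch of such a scalar classification function, assuming hypothetical User and Group types and a made-up set of rules (a real policy server would load its registered rules from a policy store):

using System.Collections.Generic;

// Hypothetical types for illustration only.
public sealed class User
{
    public string Name;
    public string Department;
    public int ClearanceLevel;
}

public enum Group { Guest, Member, PowerUser, Administrator }

public static class PolicyServer
{
    // Group -> pool of resources that group may use.
    private static readonly Dictionary<Group, string[]> ResourcePools =
        new Dictionary<Group, string[]>
        {
            { Group.Guest,         new[] { "public-reports" } },
            { Group.Member,        new[] { "public-reports", "team-dashboards" } },
            { Group.PowerUser,     new[] { "public-reports", "team-dashboards", "data-exports" } },
            { Group.Administrator, new[] { "public-reports", "team-dashboards", "data-exports", "admin-console" } },
        };

    // The scalar user-defined function: classifies an incoming user into a group.
    // The rules here are placeholders standing in for registered policies.
    public static Group Classify(User user)
    {
        if (user.ClearanceLevel >= 3) return Group.Administrator;
        if (user.Department == "Engineering") return Group.PowerUser;
        if (user.ClearanceLevel >= 1) return Group.Member;
        return Group.Guest;
    }

    // The application only ever deals in resources, resolved through the group.
    public static IEnumerable<string> ResourcesFor(User user)
    {
        return ResourcePools[Classify(user)];
    }
}

The groups themselves carry no meaning beyond the scalar value the classifier returns; what matters to the application is the mapping from group to resource pool.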
A policy server and access control for a user is a complex topic involving many different organizational units. Take IPsec for network access control, for example: there are many parameters for controlling IP security.

Reminder on GC

The reason the protected Dispose() method has a Boolean parameter is to differentiate between being called from the finalizer and being called explicitly through the public Dispose() method.
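A minimal sketch of the standard dispose pattern, using a hypothetical unmanaged handle for illustration:

using System;

public class ResourceHolder : IDisposable
{
    private IntPtr _handle;        // stand-in for an unmanaged resource
    private bool _disposed;

    public void Dispose()
    {
        Dispose(true);             // disposing == true: called by user code
        GC.SuppressFinalize(this); // the finalizer no longer needs to run
    }

    ~ResourceHolder()
    {
        Dispose(false);            // disposing == false: called by the finalizer
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;

        if (disposing)
        {
            // Safe to touch other managed objects only on this path;
            // during finalization they may already have been finalized.
        }

        // Free unmanaged resources on both paths.
        _handle = IntPtr.Zero;
        _disposed = true;
    }
}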

Tuesday, May 14, 2013

Here we discuss an implementation from previous posts for finding topics based on a set of keywords. Let us say we have a function similar() that returns the set of words that co-occur with a given word in the language corpus, and that we have selected a set of keyword candidates W.
For each of the words, we find the similar co-occurring words and put them in a cluster. A cluster has a root keyword and all the similar words as leaves. When two clusters share common words, the clusters are merged, so clusters are additive. The root of the combined cluster is the combination of the roots of the individual clusters, and similarly its leaves are the combination of the leaves of the individual clusters. We may have to iterate several times until no pair of clusters shares similar words.
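A minimal sketch of the clustering and merge loop, assuming a hypothetical similar() function supplied by the caller:

using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Cluster
{
    public HashSet<string> Roots = new HashSet<string>();
    public HashSet<string> Leaves = new HashSet<string>();
}

public static class TopicClustering
{
    // Build one cluster per candidate keyword, then merge clusters that share
    // any leaf word until no pair of clusters overlaps.
    public static List<Cluster> BuildClusters(
        IEnumerable<string> candidates, Func<string, ISet<string>> similar)
    {
        var clusters = candidates
            .Select(w => new Cluster { Roots = { w }, Leaves = new HashSet<string>(similar(w)) })
            .ToList();

        bool merged = true;
        while (merged)
        {
            merged = false;
            for (int i = 0; i < clusters.Count && !merged; i++)
            {
                for (int j = i + 1; j < clusters.Count && !merged; j++)
                {
                    if (clusters[i].Leaves.Overlaps(clusters[j].Leaves))
                    {
                        clusters[i].Roots.UnionWith(clusters[j].Roots);
                        clusters[i].Leaves.UnionWith(clusters[j].Leaves);
                        clusters.RemoveAt(j);
                        merged = true;   // restart the scan after each merge
                    }
                }
            }
        }
        return clusters;
    }
}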

Application Settings architecture

This is a review of the application settings architecture from MSDN. A setting specified in a custom settings file and embedded as a resource in the assembly is resolved when called from a console application but not from a test project. Hence this review is for a quick recap of the underlying mechanism.
Settings are strongly typed with either application scope or user scope. The default store for settings is the local file system, and custom stores can be added by way of the SettingsProvider attribute.
SettingsBase provides access to settings through a collection. ApplicationSettingsBase adds higher-level loading and saving operations, support for user-scoped settings, reverting a user's settings to the predefined defaults, upgrading settings from a previous application version, and validation.
Settings use the Windows Forms data binding architecture to provide two-way communication of settings updates between the settings object and components. Embedded resources are read through reflection.
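As a minimal sketch of the strongly typed wrapper class this architecture revolves around (the setting names and defaults below are made up for illustration):

using System.Configuration;

// ApplicationSettingsBase supplies loading, saving, Reset and Upgrade
// on top of the raw collection access in SettingsBase.
public sealed class AppSettings : ApplicationSettingsBase
{
    [ApplicationScopedSetting]                 // read-only at run time, shared by all users
    [DefaultSettingValue("https://example.com/api")]
    public string ServiceUrl
    {
        get { return (string)this["ServiceUrl"]; }
    }

    [UserScopedSetting]                        // read/write, stored per user
    [DefaultSettingValue("10")]
    public int RecentItemCount
    {
        get { return (int)this["RecentItemCount"]; }
        set { this["RecentItemCount"] = value; }
    }
}

// Usage:
//   var settings = new AppSettings();
//   settings.RecentItemCount = 25;
//   settings.Save();   // persists user-scoped values to the provider's store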

Monday, May 13, 2013

To get stack frames from streams instead of dump files

Dump files can be arbitrarily large and they are generally stored in compressed format along with other satellite files. File operations such as extraction and copying over a remote network can be expensive. If we are interested only in a stack trace, we probably don't need those operations. Besides, we rely on debuggers to give us the stack trace. Debuggers can attach to a process, launch an executable, or open the three different kinds of dump files to produce a stack trace, but they don't work with compressed files or sections of them. While debuggers have to support many commands from the user, retrieving a specific stack trace requires access only to specific ranges of offsets in the crash dump file, and the stack trace comes from a single thread, so unless all thread stacks have to be analyzed only a small part of the dump is needed. Here we look at how to retrieve a specific stack trace using a stream instead of a file.
Note that getting a stack trace as described here does not require symbols. Symbols only help to make the frames user friendly, and that can be done separately from getting the stack trace. Program debug database (PDB) files together with the raw stack frames are sufficient to pretty-print a stack.
The dump files we are talking about are in a Microsoft proprietary format, but the format is well suited to debugging. Retrieving a physical address from a memory dump is easy: the TEB holds the top and bottom of the stack, and reading the dumped memory between those addresses gives us the stack.
Using streams is an improvement over using files for retrieving this information. Streams can still be written to a local file, so we don't lose any feature we currently have, and they allow us to work with specific ranges of offsets, so the whole file is never needed.
The debugger SDK available with the Debugging Tools has both managed and unmanaged APIs to get a stack trace. These APIs instantiate a debugging client which can produce a stack trace. However, there is no API supporting a stream yet. This is probably because most debuggers prefer to work on local files, since the round trips of an entire debugging session over low-bandwidth, high-latency networks are just not acceptable. For a specific operation such as getting a stack trace, though, this is not a bad idea. In fact, what stream support for GetStackTrace buys us is the ability to save a few more round trips for extraction, save on local storage as well as on creating archive locations, and reduce the file and database footprint.
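As a minimal sketch of the kind of ranged access a stream makes possible, the following locates the thread-list stream by reading only the minidump header and stream directory; the field layout follows the documented MINIDUMP_HEADER and MINIDUMP_DIRECTORY structures, and error handling is omitted:

using System.IO;

public struct DumpStreamLocation
{
    public uint Rva;      // offset of the stream within the dump
    public uint Size;     // size of the stream in bytes
}

public static class MiniDumpReader
{
    private const uint MinidumpSignature = 0x504D444D; // 'MDMP'
    private const uint ThreadListStream = 3;           // MINIDUMP_STREAM_TYPE ThreadListStream

    // Reads only the MINIDUMP_HEADER and the stream directory; nothing else
    // in the dump needs to be fetched. The stream must be seekable and
    // positioned at the start of the dump.
    public static bool TryFindThreadList(Stream dump, out DumpStreamLocation location)
    {
        location = new DumpStreamLocation();
        var reader = new BinaryReader(dump);

        // MINIDUMP_HEADER: Signature, Version, NumberOfStreams, StreamDirectoryRva, ...
        if (reader.ReadUInt32() != MinidumpSignature) return false;
        reader.ReadUInt32();                              // Version
        uint numberOfStreams = reader.ReadUInt32();
        uint streamDirectoryRva = reader.ReadUInt32();

        // MINIDUMP_DIRECTORY entries: StreamType, DataSize, Rva (12 bytes each)
        dump.Seek(streamDirectoryRva, SeekOrigin.Begin);
        for (uint i = 0; i < numberOfStreams; i++)
        {
            uint streamType = reader.ReadUInt32();
            uint dataSize = reader.ReadUInt32();
            uint rva = reader.ReadUInt32();
            if (streamType == ThreadListStream)
            {
                location.Rva = rva;
                location.Size = dataSize;
                return true;
            }
        }
        return false;
    }
}

Each entry in the thread-list stream records, among other things, the thread's stack memory range and the location of its saved context, which is what a stack walk needs.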
Both 32-bit and 64-bit dumps require similar operations to retrieve the stack trace; the 64-bit dump files carry additional information that helps with parsing.
The stack trace, once retrieved, can be made user friendly by looking up symbols. These are parsed from the program debug database; modules and offsets are matched against the symbol text so the stack frames print more readably. This information does not have to be pulled from the files by hand: the Debug Interface Access (DIA) SDK, documented on MSDN, can do it.
Lastly, with a streamlined, read-only retrieval that involves no file copy and no local maintenance of data or metadata, stack trace parsing and reporting can be an entirely in-memory operation.

Assembly Settings

In writing applications and libraries with C#, we frequently encounter the need to define configuration data as settings. We define these in a settings file kept under the Properties folder of the assembly source, so they end up in the Properties namespace. As different libraries are loaded, each assembly may define its own settings that can be used as is or overridden by the calling application. The settings are compiled into the assembly's resources, which can be viewed from the assembly. When more than one assembly is referenced in the current application, these settings are resolved by first looking them up in the local settings file and then in any other settings provider deriving from the abstract SettingsProvider class. The provider that a wrapper class uses is determined by decorating the wrapper class with the SettingsProviderAttribute.
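A minimal sketch of wiring a wrapper class to a custom provider; the provider below is a toy in-memory store, not a complete implementation:

using System.Collections.Generic;
using System.Configuration;

// A toy provider that keeps values in memory; a real one would persist them
// to a database, a web service, or some other store.
public sealed class InMemorySettingsProvider : SettingsProvider
{
    private readonly Dictionary<string, object> _store = new Dictionary<string, object>();

    public override string ApplicationName { get; set; }

    public override void Initialize(string name, System.Collections.Specialized.NameValueCollection config)
    {
        base.Initialize(name ?? "InMemorySettingsProvider", config);
    }

    public override SettingsPropertyValueCollection GetPropertyValues(
        SettingsContext context, SettingsPropertyCollection properties)
    {
        var values = new SettingsPropertyValueCollection();
        foreach (SettingsProperty property in properties)
        {
            var value = new SettingsPropertyValue(property);
            object stored;
            if (_store.TryGetValue(property.Name, out stored))
                value.PropertyValue = stored;
            values.Add(value);
        }
        return values;
    }

    public override void SetPropertyValues(
        SettingsContext context, SettingsPropertyValueCollection values)
    {
        foreach (SettingsPropertyValue value in values)
            _store[value.Property.Name] = value.PropertyValue;
    }
}

// The wrapper class opts into the custom provider with SettingsProviderAttribute.
[SettingsProvider(typeof(InMemorySettingsProvider))]
public sealed class LibrarySettings : ApplicationSettingsBase
{
    [UserScopedSetting, DefaultSettingValue("en-US")]
    public string Culture
    {
        get { return (string)this["Culture"]; }
        set { this["Culture"] = value; }
    }
}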

Sunday, May 12, 2013

Compiler design review

Programs that are written in a high level programming language by programmers need to be translated to a language that machines can understand. A compiler translates this high level programming language into the low level machine language that is required by the computers.
This translation involves the following:
1) Lexical analysis: This is the part where the compiler divides the text of the program into tokens, each of which corresponds to a symbol such as a variable name, keyword, or number.
2) Syntax analysis: This is the part where the tokens generated in the previous step are 'parsed' and arranged in a tree structure (called the syntax tree) that reflects the structure of the program.
3) Type checking: This is the part where the syntax tree is analyzed to determine whether the program violates certain consistency requirements, for example if a variable is used in a context where its type doesn't permit it.
4) Intermediate code generation: This is the part where the program is translated to a simple machine-independent intermediate language.
5) Register allocation: This is the part where the symbolic variable names are translated to numbers, each of which corresponds to a register in the target machine code.
6) Machine code generation: This is the part where the intermediate language is translated to assembly language for a specific architecture.
7) Assembly and linking: This is the part where the assembly language code is translated to a binary representation and the addresses of variables, functions, etc. are determined.
The first three parts are called the frontend and the last three parts form the backend.
There are checks and transformations at each step of the processing in the order listed above, such that each step passes stronger invariants to the next. The type checker, for instance, can assume the absence of syntax errors.
Lexical analysis is done with regular expressions and precedence rules. Precedence rules are similar to algebraic convention. Regular expressions are transformed into efficient programs using a non-deterministic finite automaton (NFA), which consists of a set of states including a starting state, a subset of accepting states, and transitions from one state to another on a symbol c. Because NFAs are non-deterministic, compilers use a more restrictive form called a deterministic finite automaton (DFA). This conversion from a language description written as regular expressions into an efficiently executable representation, a DFA, is done by the lexer generator.
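As a small illustration of tokenization (using .NET regular expressions rather than a generated DFA, and a made-up token set), with precedence given by rule order:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class Token
{
    public string Kind;
    public string Text;
}

public static class Lexer
{
    // Rules are tried in order, so the keyword rule takes precedence over identifiers.
    private static readonly string[,] Rules =
    {
        { "whitespace", @"\s+" },
        { "keyword",    @"\b(if|then|else|while)\b" },
        { "number",     @"\d+" },
        { "identifier", @"[A-Za-z_]\w*" },
        { "operator",   @"[+\-*/=<>()]" },
    };

    public static IEnumerable<Token> Tokenize(string input)
    {
        int position = 0;
        while (position < input.Length)
        {
            bool matched = false;
            for (int i = 0; i < Rules.GetLength(0); i++)
            {
                var match = Regex.Match(input.Substring(position), "^(" + Rules[i, 1] + ")");
                if (!match.Success) continue;

                if (Rules[i, 0] != "whitespace")          // whitespace is discarded
                    yield return new Token { Kind = Rules[i, 0], Text = match.Value };
                position += match.Length;
                matched = true;
                break;
            }
            if (!matched)
                throw new Exception("Lexical error at position " + position);
        }
    }
}

// Tokenize("if x1 > 10 then y = y + 1") yields: keyword 'if', identifier 'x1',
// operator '>', number '10', keyword 'then', identifier 'y', operator '=', ...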
Syntax analysis recombines the tokens that lexical analysis split. This results in a syntax tree which has the tokens as leaves, with their left-to-right sequence the same as in the input text. As in lexical analysis, we rely on building automata; in this case the context-free grammars we write can be converted to recursive programs called stack automata. There are two common ways to generate such automata: the LL parser (the first L indicates the reading direction and the second L indicates the derivation order) and the SLR parser (the S stands for simple).
Symbol tables are used to track the scope and binding of all named objects. They support operations such as initializing an empty symbol table, binding a name to an object, looking up a name in the symbol table, entering a new scope, and exiting a scope.
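A minimal sketch of such a symbol table, implemented here as a stack of scopes so that an inner binding shadows an outer one:

using System.Collections.Generic;

public sealed class SymbolTable<T>
{
    private readonly List<Dictionary<string, T>> _scopes = new List<Dictionary<string, T>>();

    public SymbolTable()                 // initialize an empty symbol table
    {
        EnterScope();
    }

    public void EnterScope()             // enter a new scope
    {
        _scopes.Add(new Dictionary<string, T>());
    }

    public void ExitScope()              // exit the current scope, discarding its bindings
    {
        _scopes.RemoveAt(_scopes.Count - 1);
    }

    public void Bind(string name, T obj) // bind a name to an object in the current scope
    {
        _scopes[_scopes.Count - 1][name] = obj;
    }

    public bool TryLookup(string name, out T obj) // look up a name, innermost scope first
    {
        for (int i = _scopes.Count - 1; i >= 0; i--)
        {
            if (_scopes[i].TryGetValue(name, out obj))
                return true;
        }
        obj = default(T);
        return false;
    }
}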
Bootstrapping a compiler is interesting because the compiler is itself a program, often written in the very language it compiles. We resolve this with a quick-and-dirty compiler or with intermediate compilers.
From the textbook on compiler design by Mogensen.