Cluster computing

Sunday, May 5, 2013

calculating distance measure

Similarity distance measures between terms require the probabilities and conditional probabilities of the terms. We rely on the corpus text to compute these. In addition, we use a naive Bayes classifier to determine the probability of term occurrences. Some of these probabilities were mentioned in an earlier post, but today we look at whether we need to calculate them on the fly as we cluster the terms. Probabilities associated with the corpus text can be calculated before a given text is processed. For example, the probability of selecting an occurrence of a term from a source document, given by the number of occurrences of the term in that document divided by the total number of occurrences of the term in the corpus, is something we can calculate and keep.
The distance measure itself is calculated once for each term that we evaluate from the document. If we choose a distance measure like the Jaccard coefficient, then we evaluate the parts corresponding to each term in the pair. The calculation is a bit different when we use cosine similarity (Wartena 2008) between terms because we now use the sum of the products of the respective probabilities as well as the sums of their squares. The distance measure is calculated as one minus the cosine similarity.
The terms as well as the measure depend on summation of the probabilities over all documents. These documents are those from a collection C where each term occurrence from the term collection T can be found in exactly one source document. This doesn't mean the other documents cannot have occurrences of the same term, just that this particular instance of the term cannot be in multiple documents. So each occurrence is uniquely identified by the term, document pair and position. When we want to find the number of occurrences of a term, we sum the occurrences over all the documents in the collection.
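As a rough illustration of what can be computed ahead of time, here is a minimal Python sketch. The function names and the toy corpus layout (a dict from document id to a list of terms) are assumptions made for the example and are not taken from the paper.

```python
from collections import Counter

def count_occurrences(corpus):
    """Count n(d, t) per document and n(t) over the whole corpus.

    corpus is assumed to be a dict: document id -> list of terms.
    """
    n_dt = {doc: Counter(terms) for doc, terms in corpus.items()}
    n_t = Counter()
    for counts in n_dt.values():
        n_t.update(counts)  # sum occurrences over all documents
    return n_dt, n_t

def source_distribution(term, n_dt, n_t):
    """Probability that an occurrence of term comes from each document:
    n(d, t) / n(t). Returns an empty dict for an unseen term."""
    total = n_t[term]
    if total == 0:
        return {}
    return {doc: counts[term] / total
            for doc, counts in n_dt.items() if counts[term] > 0}
```

Both tables depend only on the corpus, so they can be built once and kept rather than recomputed while clustering.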
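The two measures can be sketched in Python as follows. This assumes each term is represented either by a dict of per-document probabilities (for cosine) or by the set of documents it occurs in (for Jaccard); both representations and the function names are choices made for this example.

```python
import math

def cosine_distance(p, q):
    """One minus the cosine similarity of two distributions.

    p and q are dicts: document id -> probability for one term.
    """
    dot = sum(p[d] * q[d] for d in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # treat an empty distribution as maximally distant
    return 1.0 - dot / (norm_p * norm_q)

def jaccard_distance(docs_a, docs_b):
    """One minus the Jaccard coefficient of two sets of document ids."""
    union = docs_a | docs_b
    if not union:
        return 0.0
    return 1.0 - len(docs_a & docs_b) / len(union)
```

For example, two terms that occur in exactly the same documents with the same probabilities have a cosine distance of about zero, and two terms with no document in common have a distance of one under either measure.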
We also consider Markov chain evolution for finding distributions of co-occurring terms. The first step is the calculation of the term occurrence in a particular document given that the term distribution of the occurrences is p. This we find as a sum over all the terms.
If we have a document distribution instead of the term distribution, we similarly compute the probability of finding a document with a particular term occurrence and their sum over all the documents. This leads to a weighted average of all the term distributions in the documents.
We can combine the above two when we evaluate the chain twice to get a new distribution which we use to find the distribution of the co-occurring terms t and z. By that we mean we find the distribution of one term given the first step of finding the distribution of a previous term. This gives an indication of the density of the document rather than the mere occurrence or non-occurrence of a keyword in a document. Otherwise it is similar to the previous model.
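The two-step evaluation can be sketched in Python as below. The first step goes from the term z to the documents it occurs in, and the second step goes from each of those documents to its terms. The function name, the toy corpus layout and the variable names are assumptions for this example and may not match the notation in the paper.

```python
from collections import Counter, defaultdict

def cooccurrence_distribution(z, corpus):
    """Distribution over terms t that co-occur with term z.

    Step 1: weight each document d by n(d, z) / n(z).
    Step 2: within d, weight each term t by n(d, t) / N(d).
    corpus is assumed to be a dict: document id -> list of terms.
    """
    counts = {doc: Counter(terms) for doc, terms in corpus.items()}
    n_z = sum(c[z] for c in counts.values())
    if n_z == 0:
        return {}
    result = defaultdict(float)
    for c in counts.values():
        if c[z] == 0:
            continue
        doc_weight = c[z] / n_z          # first step: term -> document
        doc_length = sum(c.values())
        for t, n in c.items():           # second step: document -> term
            result[t] += doc_weight * n / doc_length
    return dict(result)
```

The result is a weighted average of the term distributions of the documents that contain z, so it sums to one whenever z occurs in the corpus.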
The document collection is an important factor in the evaluation of the probabilities. A good choice of documents and their processing will improve the results of the keyword analysis here. The corpus text is a comprehensive collection of documents and has already been tagged and parsed. While there could be improvements to the corpus text, such as substituting pronouns with the corresponding nouns so that the frequency and distribution of terms are improved, the existing set of documents in the corpus text, their variance and their size are sufficient for reasonable results from the term set.

Saturday, May 4, 2013

security application discussion continued ...

The Security application discussed in the previous post will enable the following workflows:

The security administrator must be able to navigate any role and see the members included. Users care about roles. Additionally, adding and removing members should be possible from the list, and their details should be visible on double-clicking an item. There need not be grid lines and tab pages to separate the views. Instead we could use CSS and borderless transitions between views. These views are the same for each member and can include information such as groups, roles, resources, access levels etc. Mostly, we want the UI to be clear and simple rather than boxy, with seamless and smooth transitions. A clear white background is preferable to any other color. So all members in a particular role can be listed on the same page with a white background and no borders for the grid. And when the user double-clicks a particular member, the details are shown on the same page. Boxes and borders are great when we compartmentalize the UI parts and for organizing properties on the UI; however, the ask here is for simpler information rendering with options to bring details into the same focus area for the user with minimal peripheral changes. A light stationary hue in the peripheral area actually brightens up the canvas so that the user is drawn towards the simpler format of the information presented somewhere near the center of the page. Technology-wise, the application can be based on XAML, Prism and the .NET stack with little or no other front end technologies. The application can be simpler and nicer, albeit for security administration.

Another workflow that we could envision for this application, other than adding users to roles as discussed above, is to grant users access to domain objects via both label mapping and object hierarchy. Users care about their objects. Note, however, that the premise of the previous discussion was row level granularity and not object access. We could exercise object access and object control outside the database while the database has row level granularity. However, security applications may have workflows to secure both. In the database schema, each record has a row level label that is typically set only once. (You may actually want to forbid changing the labels of those records because you have evaluated the record for the duration of its existence, during which roles and all else have not changed. Updates to the record do not change the identity of the record, whereas changing the labels merely on user input could leave us with an inappropriate label because column constraints may not catch every case. This does not mean labels cannot be changed; in fact, internal methods may exist to take action on the user's behalf.) So now let's look at enforcing object access security, which is probably the primary workflow of this application. As stated earlier, the security admin may want to add security to domain objects and expect it to cascade down to all row level entries. Objects could propagate permissions through both inheritance and composition, but the preferred way is inheritance since no traversal is needed. Coming back to the application, to enable object security in a label based schema, the solution is to flatten all the derived objects to the same concrete entities and have them all be labeled the same via updatable views. So in effect we will be updating the row level entries.
Note however that the inheritance based flow of security is secondary in priority to directly assigning security to individual objects themselves such as test pass and test results.
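The priority between inherited and directly assigned security can be illustrated with a small Python sketch. The class names and label values here are hypothetical; the point is only that a label set on a base type flows to derived types without any traversal code, while a label set on an individual object wins.

```python
class Securable:
    """Hypothetical base domain object; its label is the default."""
    label = "protected"

class PassRecord(Securable):
    """Derived type with no label of its own: inherits 'protected'."""

class ResultRecord(Securable):
    """Derived type that tightens the inherited label."""
    label = "private"

def effective_label(obj):
    # Attribute lookup checks the instance first, then the class
    # hierarchy, so a direct assignment takes priority over inheritance.
    return obj.label

nightly = PassRecord()
result = ResultRecord()
sensitive = PassRecord()
sensitive.label = "reserved"   # direct assignment on one object
```

Here nightly resolves to the inherited label, result to the label of its own type, and sensitive to the label assigned directly to it.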

Friday, May 3, 2013

Control table for Label based security model in database

Let's look at some examples of the control table data when using the label security kit from SQL Server. In our case, where we are designing a UI application for security management of popular test tools, we will go by the use cases to pick and choose the values to populate in the tables. For example, we know that test tool users want to preserve the integrity of test results. Hence the results may be read-write for data entry but read-only for others. Likewise, read-only results should be filterable. Hence read-only users should be able to specify tags. Also, test case cloning may be a common operation requiring the use of templates. Similarly, we know that test cases can be used across suites and may be included in different matrices. Therefore they should be made available to increase reuse. Testers may want the ease of defining security up the object hierarchy and expect it to cascade down. Hence, we use the classifications of reserved, private, protected and public. Further, we can have compartments of none, readonly, readwrite and owner.
We may have only one category and one compartment. Categories can be hierarchical as in our case but compartments are mutually exclusive. The markings that we have for our category are the classifications mentioned above. Note that the default or guest low privileged access corresponding to public marking may not be sufficient for security provisioning of all out of box features and hence it may need to be split or refined into more classifications. The classification hierarchy is expressed in the marking hierarchy table as opposed to the marking table. Next we have the unique label table that assigns a unique label to a combination of markings and roles.
There will be at least one database role for each possible value of a non-hierarchical category with an any or all comparison rule. For hierarchical categories, again there will be one for each possible value, but the roles will also be nested. Some examples of roles are guest, dev, test, production support, reporting, owners, administrator, security administrator etc.
When using a label based security model, it is important to note that the labels are assigned directly to each row of the base table. The labels are small, often a byte or an integer, and can have a non-clustered index on them. Sometimes tags are not kept in the base table but in a junction table of the base identifier and the label marking identifier. This doesn't scale well because it duplicates the identifier column. Moreover, if the identifier column is of type GUID, an index on it is wide and of limited help, and performance can suffer from scanning and full comparison between GUIDs. Therefore, it is advisable to spend the time and effort once to add labels at the row level directly to the base table.
Next we define a view with a list of all security labels present in the database that the current user has access to. Users may or may not have access to specifying their labels with insert/update/delete. Also, label syntax and semantics validation can be offloaded to XSD based checks when labels are represented by typed XML. Representations of labels in XML have an element per category.
We can also create helper functions looking up the labels based on id and vice versa and for resolving the label to whether a user has access to the data.
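To make the helper functions concrete, here is a minimal Python sketch of the lookups and the access check. In the database these would be SQL functions and views; the table contents, the ordering of the classifications and all names below are assumptions for illustration only.

```python
# Assumed ordering of the hierarchical classification, lowest to highest.
RANK = {"public": 0, "protected": 1, "private": 2, "reserved": 3}

# Hypothetical unique label table: label id -> (classification, compartment).
LABELS = {
    1: ("public", "readonly"),
    2: ("protected", "readwrite"),
    3: ("private", "owner"),
}
LABEL_IDS = {marking: label_id for label_id, marking in LABELS.items()}

def label_for(label_id):
    """Look up the markings for a label id."""
    return LABELS[label_id]

def id_for(classification, compartment):
    """Look up the label id for a combination of markings."""
    return LABEL_IDS[(classification, compartment)]

def can_access(clearance, compartments, label_id):
    """A user sees a row if the user's clearance is at least the row's
    classification and the user holds the row's compartment."""
    classification, compartment = LABELS[label_id]
    return RANK[clearance] >= RANK[classification] and compartment in compartments

def visible_rows(rows, clearance, compartments):
    """Filter rows (dicts with a 'label_id' key), as a secured view would."""
    return [r for r in rows if can_access(clearance, compartments, r["label_id"])]
```

Because each row carries only a small label id, the check is a lookup and a comparison per row rather than a join against a junction table.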
Thus we have discussed the database level schema changes for enabling row based security.
Next let's continue to look at the design of the UI application for Security.
The landing page of the security UI will have a split view between resources and users. Resource lockdown and user access management each require a detail view of their own, but the security admin's job can become easier if the landing page is like a dashboard with all the controls visible. Some examples are EMC Archer and SharePoint applications that make governance easier. The security admin ideally wants to enable mapping between users and roles for a tool. Just as likely, (s)he may want an intuitive UI to define one or more of the base tables with appropriate tags. These tasks cannot be left to a designer or SQL scripts. The UI for a security provisioning application is very much required to make the job easier and visual for a security admin.
Next, the role provisioning, promotion and demotion of user accounts, as well as selecting multiple roles for the same account, must be facilitated with proper UI controls. It would be ideal if there were an illustration of the resultant privileges on sample data based on the admin's selection of roles for a given user. This visual rendering of the final privilege set may reinforce what is expected from the changes made.
Lastly, the changes made by an admin should be in the form of a ticket response such as for incident management. This ticket is opened whenever a security change is requested, and the actions associated with the changes are documented in the ticket, ideally automatically by the tool. The tickets not only guard against repudiation but also provide an audit trail.
The UI for a security application could open detailed views on any single account or role or label based on double-clicking with the ability to make and save changes. This gives a glimpse of the UI for security provisioning.

A clustering method for finding keywords in a text

Given a distance function between two terms that measures the similarity between the two terms, we build a tree of clusters which we traverse to insert the term in the cluster with the nearest center. For that cluster we recompute the center as if the record r is inserted into it. If the cluster threshold is exceeded, we can proceed to the next record. If the tree grows beyond a maximum number of clusters because we want to keep only a few clusters, then we can increase the threshold so that the clusters can be merged or accommodate more records.

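B+ Tree:

class Node:
    """A node in the cluster tree; siblings are chained as in a B+ tree."""

    def __init__(self, data, l=None, r=None, center=None):
        self.data = data        # records held by this node
        self.l = l              # left child
        self.r = r              # right child
        self.next = None        # link to the next sibling
        if l is not None:
            l.next = r          # chain the left child to the right child
        self.center = center    # center of the cluster at this node

    def value(self):
        return self.center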
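The insertion and threshold steps described above can be sketched as follows. This is a simplified, flat version (a list of clusters rather than a tree) that uses numeric vectors and Euclidean distance in place of the term distance; the names, the growth factor and the requirement that the threshold be positive are all assumptions for the example.

```python
import math

class Cluster:
    def __init__(self, point):
        self.total = list(point)   # running sum of member vectors
        self.count = 1

    @property
    def center(self):
        return [s / self.count for s in self.total]

    def absorb(self, total, count):
        """Add a record or a whole cluster and so move the center."""
        self.total = [a + b for a, b in zip(self.total, total)]
        self.count += count

def nearest(clusters, point):
    return min(clusters, key=lambda c: math.dist(c.center, point), default=None)

def insert(clusters, point, threshold):
    """Put the record in the nearest cluster, or start a new one if
    even the nearest center is farther away than the threshold."""
    best = nearest(clusters, point)
    if best is not None and math.dist(best.center, point) <= threshold:
        best.absorb(point, 1)
    else:
        clusters.append(Cluster(point))

def merge_close(clusters, threshold):
    """Merge clusters whose centers are within the threshold."""
    merged = []
    for c in clusters:
        best = nearest(merged, c.center)
        if best is not None and math.dist(best.center, c.center) <= threshold:
            best.absorb(c.total, c.count)
        else:
            merged.append(c)
    return merged

def build(points, threshold, max_clusters, growth=1.5):
    """threshold must be positive and max_clusters at least 1."""
    clusters = []
    for p in points:
        insert(clusters, p, threshold)
        while len(clusters) > max_clusters:
            threshold *= growth    # relax the threshold, then merge
            clusters = merge_close(clusters, threshold)
    return clusters, threshold
```

A call such as build(points, 1.0, 2) returns at most two clusters together with the threshold that was finally needed to get there.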

Thursday, May 2, 2013

Designing an application to manage security

Having worked on an insurance administration application in Baltimore, I can recognize several feature requests for a security provisioning application. What sets the insurance administration application apart is that there are several roles for the administrators and legal requirements around HIPAA rights for information confidentiality. Moreover, the administration tasks require many workflows to be explicitly restricted based on roles and rights. As an example, plan maintenance requires checks against specific dates of administrator activity, based on which controls and consequently workflows are disabled. When the UI talks to a middle tier WCF service and data stores behind it, these checks are plumbed all the way through the service and to the data store.
In the application that I worked on, we even decided to keep a separate database which we referred to as the security db. This database was primarily for defining RBAC access, but we were interested in several other things as well. For example, we had a catalog of user controls that were to be enabled or disabled and visible or invisible based on the user context. The user roles were also differentiated based on whether they were for internal or external users. Virtually every section of administration required checks and safeguards between users and their usages so that the plans were protected from becoming invalid.
Let's look at a few of these features now.
First the roles have to be differentiated. Typically they are broken down into increasing levels of privilege but some of the roles can be split into the sections of the workflows as well especially if they do not interact with each other. Roles such as plan data entry, plan administrator, group administrator, account administrator are derived from scopes of influence or segregated on the workflows or business usages. Roles can also be differentiated between Intranet and Internet as well as geography based.
Second the grant and revoke of access to different roles should be made easy. Revoking should be automatic and can be determined from the specific expiration time associated with the grant. Access could be granted to different business objects and tied to their lifetimes or renewed periodically.
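Automatic revocation based on an expiration time can be sketched in Python as below. The class and method names are hypothetical, and an in-memory list stands in for the security database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    expires_at: datetime

class GrantStore:
    def __init__(self):
        self._grants = []

    def grant(self, user, role, now, lifetime):
        """Record a grant that expires after the given lifetime."""
        g = Grant(user, role, now + lifetime)
        self._grants.append(g)
        return g

    def roles_for(self, user, now):
        """Expired grants are ignored, so revocation needs no admin action."""
        return {g.role for g in self._grants
                if g.user == user and g.expires_at > now}

    def purge_expired(self, now):
        """Remove expired grants and return them, e.g. for the audit trail."""
        expired = [g for g in self._grants if g.expires_at <= now]
        self._grants = [g for g in self._grants if g.expires_at > now]
        return expired
```

Renewal is then just another grant with a new lifetime, and the purged grants give the audit trail something to record.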
Third, the application should integrate with Active Directory so that the application need not maintain the user accounts, and their memberships can be offloaded outside the application. User accounts that come in over HTTP and HTTPS may require membership providers, but these are also mapped to roles.
Fourth, the application should have a UI that makes it easy to associate users with resources, their access levels and their privileges. A simple grid may not be sufficient since the security administrator may find it onerous to tick each and every privilege to be granted. At the same time, hierarchy and automatic cascading of privileges via composition and inheritance of objects may come in handy.
Fifth, the application should have a default audit trail so that grants and revokes are easily available together with the page where they are facilitated. Items requiring attention based on audits should be flagged to the security administrator so that appropriate actions can be taken. Usability is a key criterion for security applications and governance just as much as it is for any others.
In fact, security administrators should have a dashboard that should capture and show all items pertaining to security management and this should be the landing page for the administrator.
Sixth, the application should consider that users can wear multiple hats and be a member of different groups at the same time. They can also change from one to another sequentially over time. Such access cannot merely be state based. There needs to be validation associated with adding and removing from each group.
Lastly, the application should tightly control its data. It would not be inappropriate to encrypt a security database.

Prism for WPF

The Patterns and Practices series has an article titled Prism (Composite Application Guidance for WPF).
This helps to build WPF client applications that are flexible and composite. Composite applications are loosely coupled: the parts evolve independently but work together in the overall application.
Prism can help you develop your WPF client application.
Typically Prism is intended for complex UI applications. As such it should be evaluated for use in your application. You can determine the fit if you want the following:
You are building an application that integrates and renders data from multiple sources
You are developing, testing and deploying modules independent of the rest of the application.
Your application has several views and will add more over time.
Your application targets both WPF and Silverlight and you want to share code between the two.
You shouldn't use Prism if your application doesn't require any of the above.
Using Prism is non-invasive because you can integrate existing libraries with the Prism library through a design that favors composition over inheritance. You can incrementally add Prism capabilities to your application, and you can opt in and opt out of these capabilities.
Prism commands are a way to handle user interface actions. WPF allows the presenter or controller to handle the UI logic separately from the UI layout and presentation. A presenter or controller can handle commands while living outside the logical tree. WPF routed commands deliver command messages through the UI elements in the tree, but elements outside the tree will not receive these messages because they only propagate up or down from the focused element to the explicitly stated target element. Additionally, WPF routed commands require a command handler in the code-behind.
DelegateCommand and CompositeCommand are two custom implementations of the ICommand interface that deliver messages outside of the logical tree. DelegateCommand uses its delegates to invoke the CanExecute or Execute method on the target object when the command is invoked. Because the class is generic, it enforces compile-time checking of the command parameters, which traditional WPF commands don't. It removes the need for creating a new command type for every instance where you need commanding.
CompositeCommand has multiple child commands. Action on the composite command is invoked against all its children. When you click a submit all button, all the associated submit commands are invoked. When the Execute or CanExecute method is invoked, it calls the respective method on each child command. Additionally, the ShouldExecute method is provided if one or more of the child commands are to be excluded from the calls. A command can opt out by setting its IsActive property to false. Individual commands are registered and unregistered using the RegisterCommand and UnregisterCommand methods.
Silverlight supports data binding only against the DataContext or against static resources. Silverlight does not support data binding against other elements in a visual tree. This causes issues if you want to bind to a control that is within an items control. In such cases, a solution is to bind the command property to a static resource and set the value of the static resource to the command you want to bind.
The command support in Silverlight is built using an attached behavior pattern. This pattern connects events raised by controls to code on a presenter or a presentation model. It comprises two parts: an attached property and a behavior object. The attached property establishes a relationship between the target control and the behavior object. The behavior object monitors the target control and takes actions based on event or state changes on the control.
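The idea behind the two command types can be shown with a short Python analogy. This is not the Prism API: the classes below only borrow the names and mimic the behavior described above, and details such as returning false for a composite with no active children are choices made for this sketch.

```python
class DelegateCommand:
    """Wraps two callables supplied by a presenter that lives
    outside the visual or logical tree."""

    def __init__(self, execute, can_execute=None):
        self._execute = execute
        self._can_execute = can_execute
        self.is_active = True

    def can_execute(self, parameter=None):
        return self._can_execute is None or self._can_execute(parameter)

    def execute(self, parameter=None):
        self._execute(parameter)

class CompositeCommand:
    """Fans a single action out to all registered child commands."""

    def __init__(self):
        self._commands = []

    def register_command(self, command):
        self._commands.append(command)

    def unregister_command(self, command):
        self._commands.remove(command)

    def should_execute(self, command):
        # Inactive children are skipped rather than unregistered.
        return command.is_active

    def can_execute(self, parameter=None):
        active = [c for c in self._commands if self.should_execute(c)]
        return bool(active) and all(c.can_execute(parameter) for c in active)

    def execute(self, parameter=None):
        for c in list(self._commands):
            if self.should_execute(c):
                c.execute(parameter)
```

A submit all button would be bound to one CompositeCommand, and each open view would register its own submit DelegateCommand with it.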

Wednesday, May 1, 2013

fat client versus thin client

User Interface applications are referred to as clients. Depending on how much business logic is in the client, it can be considered fat or thin. When we add non-functional requirements to the front end such as security, there are many ways in which the application can quickly bloat. One of the ways to reduce the redundancy and streamline the application is to keep fewer controls. When that is not an option, separating out clients for roles could also be considered. Controls require flags for their behavior and these flags and corresponding methods may need plumbing in every layer. Typically this is what adds to code bloat.
In this context, it is probably relevant to mention what a smart client is. A smart client here is built with the Composite UI Application Block. This is Microsoft patterns and practices software which can be used for the following:
Online transaction processing such as for data entry or data distribution centers
rich client portals such as for bank teller applications or one that requires several backend services
UI intensive information-worker standalone applications
All the scenarios mentioned above require rich client interaction and a shell architecture that can host the user interface, the business logic and the centralized control.
The composite UI application block makes it easy for you to develop your client applications in three ways:
1) it allows the application to be based on the concept of modules or plug-ins
2) it allows separation of UI from shell client such that the business logic can be developed without encumbering client complexity.
3) it makes it easy to develop with patterns so that modules are loosely coupled.
Let us take an example of a User interface application for a call center application. This UI will likely have multiple collaborating parts for addressing business processes such as billing, claims or customer information. All of these parts could potentially be developed by different teams or interact with different backend systems and each can be independently developed, versioned and deployed. Yet the application provides a seamless and consistent experience to the users.
Let's take a look at the architecture for this application block. The design of this application block focuses on the following
1) finding and loading modules at application initialization to dynamically build a solution.
2) separating development of user interface and shell from that for business logic
3) achieving reuse and modularity of the code
Consequently the subsystems include the following:
modules for application initialization such as Authentication, Enumerator, Module Loader, and CabApplication
States and events such as event broker, state persistence and commands,
Shell interface such as IWorkspace, IUIElementAdapter, IUIElementAdapterFactoryCatalog and ISmartPartInfo.
The finding and loading of modules is based on a catalog that registers which modules to load and a module loader that actually loads and initializes the components that make up your application. The modules could vary from application to application but the architecture remains the same.
WorkItems describe which collaborating components participate in a use case and share state, events and common services. An event broker is where objects register their event handlers. State is where multiple components can place or retrieve information.
This article is courtesy of the literature on MSDN.
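The role of the event broker and the shared state can be illustrated with a small Python sketch. This only mirrors the idea and is not the Composite UI Application Block API; the class names, topic strings and the claims example are made up for illustration.

```python
from collections import defaultdict

class EventBroker:
    """Lets loosely coupled parts communicate by topic name
    instead of holding references to each other."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def unsubscribe(self, topic, handler):
        self._handlers[topic].remove(handler)

    def publish(self, topic, payload=None):
        # Copy the list so handlers may unsubscribe while being called.
        for handler in list(self._handlers[topic]):
            handler(payload)

class WorkItem:
    """Groups the components of one use case with shared state and events."""

    def __init__(self):
        self.state = {}
        self.events = EventBroker()

# Two parts of a call center screen that know only the work item.
work_item = WorkItem()
billing_log = []
work_item.events.subscribe(
    "customer/selected",
    lambda customer_id: billing_log.append(
        (customer_id, work_item.state.get("region"))),
)
work_item.state["region"] = "east"
work_item.events.publish("customer/selected", 42)
```

After the publish call, the billing part has received the customer id and read the region from the shared state, without either part referencing the other directly.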