Thursday, May 2, 2013

Designing an application to manage security

Having worked on an insurance administration application in Baltimore, I can recognize several feature requests from a security provisioning application. What sets the insurance administration application apart is that there are several administrator roles, plus legal requirements under HIPAA for information confidentiality. Moreover, the administration tasks require many workflows to be explicitly restricted based on roles and rights. As an example, plan maintenance requires checks against specific dates of administrator activity, based on which controls, and consequently workflows, are disabled. When the UI talks to a middle-tier WCF service and the data stores behind it, these checks are plumbed all the way through the service and down to the data store.
In the application that I worked on, we even decided to keep a separate database which we referred to as the security db. This database was primarily for defining RBAC access, but we used it for several other things as well. For example, we had a catalog of user controls that were to be enabled or disabled and visible or invisible based on the user context. The user roles were also differentiated based on whether they were for internal or external users. Virtually every section of administration required checks and safeguards between users and their actions so that the plans were kept valid and safe.
Let's look at a few of these features now.
First, the roles have to be differentiated. Typically they are broken down into increasing levels of privilege, but some roles can also be split along sections of the workflows, especially if those sections do not interact with each other. Roles such as plan data entry, plan administrator, group administrator, and account administrator are derived from scopes of influence or segregated by workflow or business usage. Roles can also be differentiated between intranet and Internet users, as well as by geography.
Second, the grant and revoke of access for different roles should be made easy. Revoking should be automatic and can be determined from the specific expiration time associated with the grant. Access could be granted to different business objects and tied to their lifetimes, or renewed periodically.
Third, the application should integrate with Active Directory so that the application need not maintain user accounts, and membership management can be offloaded outside the application. User accounts authenticated over http or https may require membership providers, but these too are mapped to roles.
Fourth, the application should have a UI that makes it easy to associate users with resources, their access levels and their privileges. A simple grid may not be sufficient since the security administrator may find it onerous to tick each and every privilege to be granted. At the same time, hierarchy and automatic cascading of privileges via composition and inheritance of objects may come in handy.
Fifth, the application should have a default audit trail so that grants and revokes are easily available, together with the page from which they were made. Items requiring attention based on audits should be flagged to the security administrator so that appropriate actions can be taken. Usability is a key criterion for security and governance applications just as much as for any others.
In fact, security administrators should have a dashboard that should capture and show all items pertaining to security management and this should be the landing page for the administrator.
Sixth, the application should consider that users can wear multiple hats and be a member of different groups at the same time. They can also change from one to another sequentially over time. Such access cannot merely be state based. There needs to be validation associated with adding and removing from each group.
Lastly, the application should tightly control its data. It would not be inappropriate to encrypt a security database.
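A minimal sketch of the grant/revoke model from points two and five, in Python. The Grant and SecurityDb names and fields are invented for illustration; a real security db would sit behind a service and persist its audit trail:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Grant:
    user: str
    role: str
    resource: str
    expires_at: Optional[datetime] = None  # None = no expiry

    def is_active(self, now: datetime) -> bool:
        # Revocation is automatic: a grant lapses once its expiry passes.
        return self.expires_at is None or now < self.expires_at

class SecurityDb:
    """Toy stand-in for the separate 'security db' described above."""
    def __init__(self):
        self.grants: list = []
        self.audit: list = []  # default audit trail (feature five)

    def grant(self, user, role, resource, ttl: Optional[timedelta] = None):
        expires = datetime.utcnow() + ttl if ttl else None
        self.grants.append(Grant(user, role, resource, expires))
        self.audit.append(f"GRANT {role} on {resource} to {user}")

    def has_access(self, user, resource, now=None) -> bool:
        now = now or datetime.utcnow()
        return any(g.user == user and g.resource == resource and g.is_active(now)
                   for g in self.grants)
```

An expired grant simply stops answering has_access, so no explicit revoke step is needed for time-boxed access.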

Prism for WPF

The Patterns and Practices series has an article titled Prism (Composite Application Guidance for WPF).
It helps you build WPF client applications that are flexible and composite. Composite applications are loosely coupled: the parts evolve independently but work together in the overall application.
Prism can help you develop your WPF client application.
Typically Prism is intended for complex UI applications. As such it should be evaluated for use in your application. You can determine the fit if you want the following:
You are building an application that integrates and renders data from multiple sources
You are developing, testing and deploying modules independent of the rest of the application.
Your application has several views and will add more over time.
Your application targets both WPF and Silverlight and you want to share code between the two.
You shouldn't use Prism if your application doesn't require any of the above.
Using Prism is non-invasive because you can integrate existing libraries with the Prism Library through a design that favors composition over inheritance. You can incrementally add Prism capabilities to your application, opting in and out of each capability.
Prism commands are a way to handle user interface actions. WPF allows the presenter or controller to handle UI logic separately from the UI layout and presentation, so a presenter or controller can handle a command while living outside the logical tree. WPF-routed commands deliver command messages through the UI elements in the tree, but elements outside the tree will not receive these messages because they only propagate up or down from the focused element to an explicitly stated target element. Additionally, WPF-routed commands require a command handler in the code-behind.
DelegateCommand and CompositeCommand are two custom implementations of the ICommand interface that can deliver messages outside of the logical tree. DelegateCommand uses delegates to invoke the CanExecute or Execute method on the target object when the command is invoked. Because the class is generic, it provides compile-time checking of the command parameters, which traditional WPF commands don't. It also removes the need to create a new command type for every place you need commanding.
CompositeCommand has multiple child commands. An action on the composite command is invoked against all its children: when you click a Submit All button, all the associated Submit commands are invoked. When the Execute or CanExecute method is invoked, it calls the respective method on each child command. Additionally, the ShouldExecute method is provided so that one or more child commands can be excluded from the calls; a child command can unsubscribe by setting its IsActive property to false. Individual commands are registered and unregistered using the RegisterCommand and UnregisterCommand methods.
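As a rough illustration of how these two commands behave, here is a toy Python analogue of DelegateCommand and CompositeCommand. This is not Prism's C# API, just a sketch of the pattern it implements:

```python
class DelegateCommand:
    """Routes Execute/CanExecute to delegates on a target object,
    outside any visual tree (mirrors Prism's DelegateCommand idea)."""
    def __init__(self, execute, can_execute=lambda p: True):
        self._execute = execute
        self._can_execute = can_execute
        self.is_active = True          # consulted by CompositeCommand below

    def can_execute(self, parameter=None):
        return self._can_execute(parameter)

    def execute(self, parameter=None):
        if self.can_execute(parameter):
            self._execute(parameter)

class CompositeCommand:
    """Fans one invocation out to all registered, active child commands."""
    def __init__(self):
        self._children = []

    def register_command(self, cmd):
        self._children.append(cmd)

    def unregister_command(self, cmd):
        self._children.remove(cmd)

    def can_execute(self, parameter=None):
        active = [c for c in self._children if c.is_active]
        return bool(active) and all(c.can_execute(parameter) for c in active)

    def execute(self, parameter=None):
        # e.g. a "Submit All" button invoking every active child Submit command
        for c in self._children:
            if c.is_active:
                c.execute(parameter)
```

A child that sets is_active to False drops out of both Execute and CanExecute without having to unregister.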
Silverlight supports data binding only against the DataContext or against static resources; it does not support data binding against other elements in the visual tree. This causes issues if you want to bind from a control that is within an ItemsControl. In such cases, a solution is to bind the command property to a static resource and set the value of the static resource to the command you want to bind.
The command support in Silverlight is built using an attached behaviour pattern. This pattern connects events raised by controls to code on a presenter or a presentation model. It comprises two parts: an attached property and a behaviour object. The attached property establishes a relationship between the target control and the behaviour object. The behaviour object monitors the target control and takes action based on events or state changes on the control.


 

Wednesday, May 1, 2013

fat client versus thin client

User interface applications are referred to as clients. Depending on how much business logic lives in the client, it can be considered fat or thin. When we add non-functional requirements such as security to the front end, there are many ways in which the application can quickly bloat. One way to reduce redundancy and streamline the application is to keep fewer controls. When that is not an option, separate clients for different roles could also be considered. Controls require flags for their behavior, and these flags and their corresponding methods may need plumbing in every layer; typically this is what adds to code bloat.
In this context, it is probably relevant to mention what a smart client is. A smart client can be built with the Composite UI Application Block, a Microsoft Patterns & Practices offering which can be used for the following:
Online transaction processing, such as for data entry or data distribution centers
Rich client portals, such as bank teller applications or ones that require several backend services
UI-intensive information-worker standalone applications
All the scenarios mentioned above require rich client interaction and a shell architecture that can host the user interface, the business logic, and centralized control.
The composite UI application block  makes it easy for you to develop your client applications  in three ways:
1) it allows the application to be based on the concept of modules or plug-ins
2) it allows separation of the UI and shell from the business logic, so that the business logic can be developed without being encumbered by client complexity.
3) it makes it easy to develop with patterns so that modules are loosely coupled.
Let us take an example of a User interface application for a call center application. This UI will likely have multiple collaborating parts for addressing business processes such as billing, claims or customer information. All of these parts could potentially be developed by different teams or interact with different backend systems and each can be independently developed, versioned and deployed. Yet the application provides a seamless and consistent experience to the users.
Let's take a look at the architecture for this application block. The design of this application block focuses on the following
1) finding and loading modules at application initialization to dynamically build a solution.
2) separating development of user interface and shell from that for business logic
3) achieving reuse and modularity of the code
Consequently the subsystems include the following:
modules for application initialization, such as Authentication, Enumerator, Module Loader, and CabApplication;
states and events, such as the event broker, state persistence, and commands; and
shell interfaces, such as IWorkspace, IUIElementAdapter, IUIElementAdapterFactoryCatalog, and ISmartPartInfo.
The finding and loading of modules is based on a catalog that registers which modules to load, and a module loader that actually loads and initializes the components that comprise your application. The modules could vary from application to application, but the architecture remains the same.
WorkItems describe which collaborating components participate in a use case and share state, events, and common services. An event broker enables objects to register their event handlers. State is where multiple components can place or retrieve information.
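The catalog, module loader, and event broker described above can be sketched as follows. This is a toy Python analogue, not the CAB API; the class names and the "customer-selected" topic are invented for illustration:

```python
class EventBroker:
    """Objects register handlers by topic; publishers need no direct
    references to subscribers, keeping modules loosely coupled."""
    def __init__(self):
        self._handlers = {}

    def subscribe(self, topic, handler):
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload=None):
        for handler in self._handlers.get(topic, []):
            handler(payload)

class ModuleLoader:
    """Loads and initializes the modules registered in a catalog."""
    def __init__(self, catalog, broker):
        self._catalog = catalog   # list of module classes to load
        self._broker = broker

    def load_all(self):
        return [module_cls(self._broker) for module_cls in self._catalog]

class BillingModule:
    """Example plug-in: reacts to a shared event without knowing its source."""
    def __init__(self, broker):
        self.seen = []
        broker.subscribe("customer-selected", self.seen.append)
```

A new module joins the application by being added to the catalog; nothing else in the shell changes.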
This article courtesy of the literature on msdn.

Tuesday, April 30, 2013

Access control in middle tier continued

In addition to the earlier post, we further discuss row level security here. We wanted to do the following:
1) Specify the tables that define the label categories and markings.
2) Create roles for these marking values.
3) Add an integer column to the base table.
4) Define the view.

To define the labeling policies, we start by creating a few tables that define the category, marking, marking hierarchy, unique label, etc.
After we define the structure of labels in our policy, we define the roles. We create a role for each possible value of the non-hierarchical categories that use the any or all comparison rule. For hierarchical categories, we nest the roles to model the hierarchy.
We also create a helper view for our controls.

Next we define the changes to the base tables. These are the tables to which we add row level security. We assign the labelID to the row. Creating a non-clustered index on the labelID  column will be very helpful. As an aside, it should be noted that this column need not be added to the base table and a junction table can be specified with IDs of the base table and the IDs of the labels.

Now we are ready for the last step which is to create the base table view for the end user.

Next, let's look at the insert, update, and delete operations. Everything we have discussed so far enables us to select the rows pertaining to labels, and selects are generally allowed for all users, but CRUD operations may or may not be. Besides, these DML statements may not only be ad hoc but may also come from stored procedures. Moreover, security labels may not be allowed to be set within these operations; usually they are set to a default value and then modified separately. Where users are permitted to specify security labels, they can do so explicitly. In general, updateable views permit only those rows to be edited that the user is allowed to see.
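The row-level filtering and the updateable-view rule can be sketched in a small, database-free Python model. The rows, labels, and role names are invented for illustration; in the real design these would be the base table, the label tables, and the marking-value roles:

```python
# Each row carries a label_id; a user sees a row only if one of the
# user's roles maps to that label (the "base table view" above).
ROWS = [
    {"id": 1, "data": "public claim",  "label_id": 1},
    {"id": 2, "data": "hr-only claim", "label_id": 2},
]
LABEL_ROLES = {1: {"everyone"}, 2: {"hr_admin"}}   # marking-value roles

def visible_rows(user_roles, rows=ROWS):
    """The filtered view: only rows whose label matches a held role."""
    return [r for r in rows if LABEL_ROLES[r["label_id"]] & user_roles]

def update_row(user_roles, row_id, new_data, rows=ROWS):
    """Updateable-view rule: a user may edit only rows they can see."""
    for r in visible_rows(user_roles, rows):
        if r["id"] == row_id:
            r["data"] = new_data
            return True
    return False
```

In SQL the same effect comes from a view whose WHERE clause joins the label column against the caller's role memberships.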

When talking about labels, it is important to observe that labels themselves can be misused. More than one label could mean the same thing; labels may differ merely in spelling; labels may be concatenated or translated; labels may not be consistently used, and so on. In most of these cases, XML-typing the labels prevents these issues.

Other common tasks with labels include the following:
getting an ID to represent a label - helpful for CRUD operations where the label is set,
checking whether a given label already has an ID,
a variant of the above that checks for an ID but does not generate a new one, and
translating labels to IDs and back, or finding the label corresponding to a user.

Finally cell level security and encryption could be used in addition to row level security.
This article courtesy of the literature on msdn.

an easier approach to topic analysis

A quick look at how to implement the Li and Yamanishi topic analysis and segmentation approach.
The steps involved are:
1) topic spotting
2) text segmentation
3) topic identification
The input is the given text and the output is a topic structure based on STMs (stochastic topic models).
In step 1, we select keywords from a given text t based on the Shannon information of each word:
I(w) = -N(w) log P(w), where N(w) denotes the frequency of w in t, and
P(w) is the probability of the occurrence of w as estimated from corpus data.
I(w) is therefore the amount of information contributed by w.
P(w) can be evaluated as follows (word_features and the labeled words list are assumed helpers from earlier nltk work):
import nltk
# words is a list of (word, label) pairs; word_features extracts features
featuresets = [(word_features(n), g) for (n, g) in words]
train_set, test_set = featuresets[500:], featuresets[:500]
classifier = nltk.NaiveBayesClassifier.train(train_set)
classifier.classify(word_features(word))
Clustering helps with topic spotting; starting from each keyword as its own cluster, clusters are merged additively. Text segmentation is carried out independently.
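As a rough sketch of step 1, keyword scoring by Shannon information might look like this. The corpus counts are invented, and the smoothing for unseen words is an assumption, not part of the original method:

```python
import math
from collections import Counter

def keyword_scores(text_tokens, corpus_counts, corpus_total):
    """Score each word w in the text by I(w) = -N(w) * log P(w),
    with N(w) the frequency of w in the text and P(w) estimated
    from corpus data; high-I(w) words are kept as topic keywords."""
    n = Counter(text_tokens)
    scores = {}
    for w, nw in n.items():
        # Crude add-one floor so unseen words don't give log(0).
        p = corpus_counts.get(w, 1) / corpus_total
        scores[w] = -nw * math.log(p)
    return scores
```

Rare, topical words (low P(w)) score far higher than common function words, which is exactly what topic spotting needs.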

Monday, April 29, 2013

Access Control in middle tier

Let's quickly review some techniques of securing domain objects with user access control.
First there are two different approaches:
1) Resource based : Individual resources are secured using windows ACLs
2) Role based : Users are partitioned into roles, members of a particular role share the same privilege within the application.

It used to be that role-based access was preferred over resource-based access because the latter didn't scale, even though resource-based access can leverage the built-in ACL support of underlying systems such as Windows. Further, role-based access enables connection pooling and so on. However, with AD integration and improvements to database servers, administrators and developers alike want individual user access to different resources.

For simplicity, let's take a base table for an entity that we want to secure and say we have a relational mapping to a control table with IDs for privileges. ID 0 means access to none, ID 1 means access to everyone etc. and intermediate groups in between. And the controls table may be a view in itself with multiple tables just like the base table can be a view where only the records visible to the user are available for insert, update and delete.

Then we perform security at each level of the stack in this way:
Authenticate users within your front end application
Map users to roles
Authorize access to operations (not directly to resources) based on role membership
Access the necessary backend resources by using fixed service identities and privileges.

Access can be granted and revoked to data tables as a whole but if row level security is desired, one could add row level labels that are classified based on control table.
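The role-mapping and operation-level authorization steps above might be sketched like this. The roles, operations, and users are invented examples; in the real stack the role lookup would come from AD or a membership provider:

```python
# Roles grant operations, not resources (step three above).
ROLE_OPERATIONS = {
    "plan_admin":  {"read_plan", "update_plan"},
    "plan_viewer": {"read_plan"},
}

# Step two: users mapped to roles (in practice, from AD group membership).
USER_ROLES = {"alice": {"plan_admin"}, "bob": {"plan_viewer"}}

def authorize(user, operation):
    """Authorize access to an operation via role membership; the backend
    is then reached under a fixed service identity, not the user's."""
    roles = USER_ROLES.get(user, set())
    return any(operation in ROLE_OPERATIONS.get(r, set()) for r in roles)
```

Because the check is on operations rather than resources, the backend connection can stay pooled under one service identity.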

 

Sunday, April 28, 2013

probability distribution in topic analysis

Probability distributions are useful for the analysis of topics. As in Wartena's paper on topic detection, a similarity function between terms is defined as a distance between the distributions associated with keywords, obtained by counting co-occurrences in documents. So we elaborate on the probability distributions useful for topic analysis. Let's say we have a set T of term occurrences, each of which can be found in exactly one document d in a collection C. We consider the natural probability distribution Q on C x T that measures the probability of randomly selecting an occurrence of a term, a source document, or both. Let n(d,t) be the number of occurrences of term t in d. Then the probability of randomly selecting an occurrence of term t is q(t) = n(t)/n on T. And if we let N(d) be the sum of all term occurrences in d, then the conditional probability is q(t|d) = qd(t) = n(d,t)/N(d) on T. We can then measure the similarity between two terms as a metric d(i,j) between elements i and j satisfying non-negativity, identity of indiscernibles, symmetry, and the triangle inequality; two or more elements are more similar the closer they are. The similarity measure is the Jensen-Shannon divergence, or information radius, between two distributions p and q, defined as JSD(p || q) = 1/2 D(p || m) + 1/2 D(q || m), where m = 1/2 (p + q) is the mean distribution and D(p || q) is the relative entropy defined by D(p || q) = Sum over t of p_t log(p_t / q_t).
For a developer, implementing a probability distribution such as the above could be broken out into the following cases:
1) Probability distribution  Q(d,t) = n(d,t)/n on C X T. Here n(d,t), as mentioned earlier, is the number of occurrences of term t in d, and n is the total number of term occurrences we started our analysis with.
2) Probability distribution  Q(d) = N(d)/n on C. Here N(d) is the number of term occurrences in d, defined as the sum of n(d,t) over all terms t in document d.
3) Probability distribution  q(t) = n(t)/n on T where n(t) is the number of occurrences of term t across all documents.
Note that n(t) and N(d) are similar in that each sums occurrences over its respective collection.
These we use for the respective conditional probabilities Q(d|t) = Qt(d) = n(d,t)/n(t) on C and
Q(t|d) = qd(t) = n(d,t)/N(d) on T.
 As a recap of the various terms we use the following table
Term Description
t                 is a distinct term whose occurrences are being studied (typically terms are representative of topics)
d                is one of the documents
C               is the collection of documents d1, d2, ... dm being studied
T               is the set of term occurrences t1, t2, ... tn such that each occurrence can be found in exactly one source document
n(d,t)        is the number of occurrences of term t in d
n(t)           is the cumulative number of occurrences of term t
n               is the total number of term occurrences
N(d)         is the cumulative number of term occurrences in d
Q(d,t)       is the distribution n(d,t)/n on C X T and pertains both to documents and terms
Q(d)         is the distribution N(d)/n on C and pertains to documents
q(t)           is the distribution n(t)/n on T and pertains to terms
Q(d|t)       is the source distribution of t and represents the probability that the randomly selected occurrence of term t  has source d
Q(t|d)      is the term distribution of d and is the probability that a randomly selected term occurrence  from document d is an instance of term t
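A small Python sketch of the term distribution q(t|d) and the Jensen-Shannon divergence defined above. The toy corpus is invented; documents are just token lists here:

```python
import math
from collections import Counter

def term_distribution(doc, corpus):
    """q(t|d) = n(d,t) / N(d): probability that a term occurrence
    drawn from document d is an instance of term t."""
    counts = Counter(corpus[doc])          # n(d,t) for each t
    total = sum(counts.values())           # N(d)
    return {t: c / total for t, c in counts.items()}

def kl(p, q):
    """Relative entropy D(p||q) = sum_t p_t log(p_t / q_t)."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: JSD = D(p||m)/2 + D(q||m)/2, m = (p+q)/2.
    Using the mean distribution m keeps both KL terms finite."""
    keys = set(p) | set(q)
    m = {t: (p.get(t, 0) + q.get(t, 0)) / 2 for t in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

JSD is zero for identical term distributions and grows toward log 2 as two documents share fewer terms, which is what makes it usable as the similarity distance between keywords.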