Friday, September 20, 2013

The Caching Application Block is another application block designed specifically for this purpose. Developers often choose libraries such as this along with Lucene.Net or external solutions such as AppFabric. While AppFabric is a distributed caching framework that serves across APIs based on URI hashing and request/response caching, developers may look for caching within their own library implementation. Lucene.Net is more of an inverted-index library that helps with search and storage.
By the way, Lucene can be used to build a source index server for all your source code.
Here we build an index and run queries against it. Lucene.Net has an IndexWriter class that is used to build this index. You need an Analyzer object to instantiate the IndexWriter. Analyzers are language-specific and you can also choose different tokenizers; for now, a SimpleAnalyzer will do. The input to the index is a set of documents. Documents are comprised of fields, and fields are key-value pairs. You can instantiate a SimpleFileIndexer and index the files in a directory. A Directory is a flat list of files. Directories are locked with a LockFactory such as a SingleInstanceLockFactory. Index segments can even be stored in a single compound file.
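As a rough sketch, assuming Lucene.Net 3.0.x (the paths and field names here are only illustrative), indexing the files in a directory looks something like this:

using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

var indexDir = FSDirectory.Open(new DirectoryInfo(@"c:\temp\index"));     // where the index lives on disk
var analyzer = new SimpleAnalyzer();                                      // a SimpleAnalyzer will do for now
using (var writer = new IndexWriter(indexDir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
{
    foreach (var file in System.IO.Directory.GetFiles(@"c:\temp\source"))
    {
        var doc = new Document();                                          // a document is a set of fields
        doc.Add(new Field("path", file, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("contents", File.ReadAllText(file), Field.Store.NO, Field.Index.ANALYZED));
        writer.AddDocument(doc);
    }
    writer.Optimize();                                                     // merge segments before closing
}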
I want to mention that Lucene comes with a Store to handle all the persistence. This store also lets us create RAM-based indices. A RAMDirectory is a memory-resident Directory implementation.
Otherwise it supports all the operations of a regular directory: listing the files, checking for the existence of a file, reporting a file's size in bytes, and opening a file to read bytes from it or write bytes to it as input and output streams.
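A minimal sketch of the RAM-based variant (reusing the usings from the indexing sketch above):

var ramDir = new RAMDirectory();                                           // memory-resident Directory implementation
using (var writer = new IndexWriter(ramDir, new SimpleAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.AddDocument(doc);
}
foreach (var name in ramDir.ListAll())                                     // same Directory operations as on disk
    Console.WriteLine("{0} ({1} bytes)", name, ramDir.FileLength(name));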
Lucene.Net's cache implementations support eviction policies such as LRU as well as a simple HashMap-based store for key-value pairs. If you want a synchronized cache you have to request one explicitly. The caches support simple operations such as Get and Put.
Searching and storage are facilitated by Lucene.Net, but you can also use features such as the SpellChecker and regex-based queries. Query strings are defined as a series of clauses. A clause may be prefixed by a plus or minus sign, indicating whether it is required or excluded, or by a term followed by a colon, indicating the field to be searched (useful when querying across multiple fields). A clause may either be a term, matching all the documents that contain that term, or a nested query enclosed in parentheses. If a range query is used, the QueryParser tries to detect ranges such as date values for a query involving date fields. The QueryParser is not thread-safe. However, you can customize the QueryParser by deriving from the default one, and this can be made modular. The only caveat is that the .NET library-based indexer does not have the enumerateFilesOrDirectories method that the Java library does.
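To run queries against the index built above, a sketch along these lines should work (same Lucene.Net version assumption; the query string only demonstrates the clause syntax described above):

using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

var searcher = new IndexSearcher(indexDir, true);                          // read-only searcher over the index
var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "contents", new SimpleAnalyzer());
var query = parser.Parse("+cache -test path:(indexer)");                   // required, excluded and field clauses
var hits = searcher.Search(query, 10);                                     // top 10 matches
foreach (var scoreDoc in hits.ScoreDocs)
{
    var doc = searcher.Doc(scoreDoc.Doc);
    Console.WriteLine(doc.Get("path"));
}
searcher.Dispose();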

Thursday, September 19, 2013

Database unit-tests
In earlier posts I mentioned using stored procedures with the Enterprise Library Data Access Application Block. Here I want to take a moment to describe some tests at the database unit level. Since the application relies on the data reader and we are free to make changes at the database level, we should have tests for this interface. These tests are called database unit tests, or DUTs for short. In our case, with the stored procedures, we simply execute the stored procedures with different parameters in these tests. They cover both the input and the output parameters and check the results. The parameters, their default values and nulls, if any, are important to the execution of the stored procedure, and the application should be able to rely on the checks within the stored procedure for a proper user response. This means that the stored procedure should not make any assumptions about the call.
Let's take some examples here.
If I have a procedure that validates whether a token matches a client id, then I can pass the client id directly into the predicate in the where clause, and this will handle null values correctly (a test sketch for this case follows below).
Similarly, each CRUD operation can be tested to fail or succeed along the same lines, and if there are two or more stored procedures that are expected to work with one another, then they should be tested together.
Thus the granularity of the tests depends on the usage and purpose of these database objects.
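A sketch of such a test with NUnit and the DAAB might look like the following; the stored procedure, its parameters and the expected return value are hypothetical and only illustrate exercising the null handling from the token/client id example above (assuming Enterprise Library 5.x):

using System;
using System.Data;
using System.Data.Common;
using Microsoft.Practices.EnterpriseLibrary.Data;
using NUnit.Framework;

[TestFixture]
public class ValidateTokenProcedureTests
{
    [Test]
    public void ValidateToken_HandlesNullClientId()
    {
        Database db = DatabaseFactory.CreateDatabase();                    // default instance from configuration
        DbCommand cmd = db.GetStoredProcCommand("uspValidateToken");       // hypothetical procedure
        db.AddInParameter(cmd, "Token", DbType.String, "sample-token");
        db.AddInParameter(cmd, "ClientId", DbType.Int32, DBNull.Value);    // a null should not break the procedure
        object result = db.ExecuteScalar(cmd);
        Assert.AreEqual(0, Convert.ToInt32(result));                       // expect "no match" rather than an error
    }
}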
It is up to the database objects to make the interface strict or inviting by using default values or nulls for the parameters, but this has nothing to do with the fact that we want tests at this level to prevent regressions during application development. The goal of the Enterprise Library has been to make it easy to build applications by providing building blocks wherever applicable. We could choose from the application blocks as appropriate.
Another thing I wanted to bring up is the NUnit tests for application development. These tests at the application level capture and document the user expectations. This is very helpful for keeping a seamless and consistent view for the user as the application evolves. When we make changes to the database and the application accommodates the changes, these tests help to catch regressions at the user level.
Finally, I want to mention that the package should be portable. Most libraries and application code can now be packaged and repackaged as NuGet libraries. NuGet is a package manager that works well with the tools we discuss.
Enterprise Library Data Access Application Block discussion continued.
DAAB supports transactions. The get and save methods for tables are LoadDataSet and UpdateDataSet. LoadDataSet executes a command and returns the results into an existing DataSet. UpdateDataSet accepts a DataSet and saves all the modified data records into the database. ExecuteDataSet executes a command and returns the results in a newly created DataSet. ExecuteNonQuery can be used to invoke a stored procedure or a SQL statement. Most of these are already guarded against SQL injection attacks. Most of these accept a transaction that we can create with
IDbTransaction transaction = connection.BeginTransaction();
and a try/catch block can be used to commit or roll back the transaction.
You can see that the transaction parameter is reused from ADO.NET. It is best not to keep the transaction open for a long time. You can change one row at a time using the UpdateDataSet method or do bulk updates to the entire table. You can execute a query or a stored procedure; in many cases the latter helps encapsulate the logic at the database level. Since the DAAB itself includes unit tests, these are also available for review or for use alongside your own. The unit tests help with test-driven development and come in very handy to ensure the quality of the development work as it proceeds.
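Here is a hedged sketch of that pattern (assuming Enterprise Library 5.x-style APIs; the stored procedures are placeholders):

Database db = DatabaseFactory.CreateDatabase();
using (DbConnection connection = db.CreateConnection())
{
    connection.Open();
    DbTransaction transaction = connection.BeginTransaction();
    try
    {
        DbCommand debit = db.GetStoredProcCommand("uspDebitAccount");      // hypothetical procedures
        db.AddInParameter(debit, "Amount", DbType.Currency, 100m);
        db.ExecuteNonQuery(debit, transaction);

        DbCommand credit = db.GetStoredProcCommand("uspCreditAccount");
        db.AddInParameter(credit, "Amount", DbType.Currency, 100m);
        db.ExecuteNonQuery(credit, transaction);

        transaction.Commit();                                              // both changes succeed together
    }
    catch
    {
        transaction.Rollback();                                            // or neither is applied
        throw;
    }
}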
I want to bring up a couple of important points from the posts made so far on this topic:
1) Application blocks are developed by Microsoft patterns and practices and they are very helpful in developing applications because they provide building blocks.
2) The block pattern allows for encapsulation and isolation of commonly used development functionality and very often embodies practices that have worked in more than one deployment.
3) The seven different application blocks are:
caching application block
configuration application block
cryptography application block
data access application block
exception handling application block
logging and instrumentation application block
security application block
4) The security, caching and logging application blocks have an optional dependency on the data access application block.
5) The blocks can be chosen and included in the application with minimal changes to the application
6) Since they are already developed and tested, each application block saves you time for the purpose it was built for.
7) In the case of the DAAB, the connection management, the abstraction over ADO.NET and database objects and their dependencies, and the avoidance of an object graph in the code enable you to focus on the application and the results from the data access that you are interested in.


Wednesday, September 18, 2013

In addition to the earlier post on the benefits of the Enterprise Library Data Access Application Block (DAAB), I want to bring up some more in this post. I mentioned the removal of the dependency on the object graph and its persistence mechanism with DAAB, but another advantage that may not be so obvious is that the database changes are now decoupled from the application. This is very desirable for rapid application development, because database changes are very likely as new scenarios or refinements come up while the application is being developed. Choosing DAAB over EDM in such cases avoids the refreshes to the code and model that were previously causing code and test churn. As long as the data readers don't see the changes to the database, the application can continue to evolve separately. Many times the database changes affect more than one table or schema. For example, a column for an identifier to an entity in an external system may need to be maintained. Or a new column may need to be added to indicate the source of the entity. Or a computed or translated column may need to be added for lookups. Or the data may exist side by side in two different forms for use by different consumers. Indexes may need to be added or a schema redesign may be involved. To the stored procedures invoked by the application, these changes need not be visible. In fact, DAAB allows the interface between the application and the data provider to be established first.
DAAB doesn't take away any of the benefits of the schema and objects in the database; they can continue to be accessed via the Database object. If anything, the Database object can be created simply with a factory class.

Further, parameterized queries can be constructed for ad hoc commands. This is done by passing the text to GetSqlStringCommandWrapper and invoking AddInParameter. The ExecuteDataSet method executes the DBCommandWrapper's command, retrieves the results, creates a new DataSet and fills it. There are many overloads to the ExecuteDataSet method which allow for a variety of calls.
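In recent versions of the block these show up as GetSqlStringCommand and ExecuteDataSet; a sketch of a parameterized ad hoc query might look like this (the table and column names are placeholders):

Database db = DatabaseFactory.CreateDatabase();
DbCommand cmd = db.GetSqlStringCommand("SELECT Id, Name FROM Customers WHERE Region = @region");
db.AddInParameter(cmd, "@region", DbType.String, "WA");                    // parameterized, so guarded against SQL injection
DataSet ds = db.ExecuteDataSet(cmd);                                       // a new DataSet is created and filled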

XML data can also be retrieved with an XmlReader object and the ExecuteXmlReader method. The stored procedure uses the FOR XML AUTO clause to return the data in XML format.
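ExecuteXmlReader is exposed by the SqlDatabase class, so a sketch would cast to it first (the procedure name is hypothetical):

using Microsoft.Practices.EnterpriseLibrary.Data.Sql;
using System.Xml;

var db = (SqlDatabase)DatabaseFactory.CreateDatabase();
DbCommand cmd = db.GetStoredProcCommand("uspGetCustomersXml");             // hypothetical procedure using FOR XML AUTO
using (XmlReader reader = db.ExecuteXmlReader(cmd))
{
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element)
            Console.WriteLine(reader.ReadOuterXml());                      // ReadOuterXml advances past each row element
        else
            reader.Read();
    }
}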
This post talks about the Enterprise Library Data Access Application Block. I was able to go over the documentation on MSDN. Applications that manage their own connection state, pooling and connections usually run into scalability and resource problems. The enterprise library opens and closes the connection as needed, leaving you to focus on the data reader and the data. You can edit the configuration with a visual tool. A Database object is created using the DatabaseFactory.CreateDatabase method. The configuration tool stores the settings for the Data Access Application Block (DAAB) in the configuration file. The database instances node is used to associate one of the instance types, such as SQL Server or Oracle, with one of the connection strings. The connection string is stored in the configuration file and provides mechanisms to configure the security for the connection. These can be encrypted so that they are not in clear text. As mentioned, the Database object is created first, and this is done with a factory pattern. In ADO.NET, you open a connection, then fill a dataset or retrieve data through a data reader usually typed to the provider. In this application block, all of these are abstracted. You just instantiate the Database object and execute the reader using a command wrapper. This Database class has dozens of methods, most notably the ones that execute a stored procedure or SQL statement; return a dataset, a data reader, a scalar value, an XmlReader or nothing; allow specific parameters to be created and passed in; determine which parameters a command needs, create them and cache them; and enlist commands in transactions.
The Database object has methods like GetStoredProcCommand, GetParameterValue, AddInParameter and AddOutParameter that give detailed control over the command to be executed. ExecuteScalar and ExecuteReader are some of the methods for execution. Since the results from the execution can be read from a data reader, we can directly populate objects with the results without having to create an object graph. This removes a lot of the complexity that comes with object graph refreshes. Direct manipulation of data is possible with methods like LoadDataSet and UpdateDataSet, where you can specify the CRUD operation commands and a transaction if necessary. You can also directly get the data adapter that implements the CRUD operations on the data source.
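A short sketch of that parameter control (the procedure and parameter names are hypothetical):

Database db = DatabaseFactory.CreateDatabase();
DbCommand cmd = db.GetStoredProcCommand("uspGetOrderCount");               // hypothetical procedure
db.AddInParameter(cmd, "CustomerId", DbType.Int32, 42);
db.AddOutParameter(cmd, "OrderCount", DbType.Int32, 4);                    // size in bytes for the output parameter
db.ExecuteNonQuery(cmd);
int orderCount = (int)db.GetParameterValue(cmd, "OrderCount");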

Tuesday, September 17, 2013

Today I tried out the enterpriselibrary.data framework and it was a breeze. To those familiar with Entity Framework, this provides more streamlined access to data. For example, you can wire up the stored procedure results to the collection of models you define so that you can work with them instead of the entire object graph. There is a lot of debate around the performance of Entity Framework, and in earlier blogs I may have alluded to the different levers we have there to improve it. However, the enterprise library comes with the block pattern that these libraries have become popular for. Blocks are reusable patterns across applications, so your development time is cut down, and they come with the reliability and performance these libraries have come to be known for.
I want to bring up the fact that we associate the database by using the convenient DatabaseFactory.CreateDatabase method to work with existing databases in SQL Server. Some data access extensions may need to be written to translate the data reader columns to objects, and this helps because you can translate the results of the stored procedure execution directly into a collection of the objects you have already defined as models, without the onus of the object graph.
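As a sketch, such an extension might look like the following; the Customer model, the column names and the stored procedure are assumptions made only for illustration:

using System.Collections.Generic;
using System.Data;
using Microsoft.Practices.EnterpriseLibrary.Data;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class DataReaderExtensions
{
    // translate the data reader columns into a collection of models
    public static List<Customer> ToCustomers(this IDataReader reader)
    {
        var customers = new List<Customer>();
        while (reader.Read())
        {
            customers.Add(new Customer
            {
                Id = reader.GetInt32(reader.GetOrdinal("Id")),
                Name = reader.IsDBNull(reader.GetOrdinal("Name"))
                        ? null
                        : reader.GetString(reader.GetOrdinal("Name"))
            });
        }
        return customers;
    }
}

// usage: wire the stored procedure results straight to the models, no object graph involved
// Database db = DatabaseFactory.CreateDatabase();
// using (IDataReader reader = db.ExecuteReader(db.GetStoredProcCommand("uspGetCustomers")))
// {
//     List<Customer> customers = reader.ToCustomers();
// }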
In addition, there are no configuration sections involved in the config files, and the assemblies can be installed and added to the solution using the Visual Studio NuGet package manager.

OAuth bearer tokens practice

OAuth bearer tokens may currently be passed in the URL, but the RFC clearly calls out that this should not be done. Therefore, checks and other mechanisms to safeguard these tokens should be in place. As an example, this parameter could be passed in the request body or, better, in the Authorization request header, or the authorization server may handle client validations. More is based on the implementation.
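For instance, a client sketch that keeps the token out of the URL by sending it in the Authorization header (the endpoint and the accessToken variable are placeholders):

using System.Net.Http;
using System.Net.Http.Headers;

var client = new HttpClient();
client.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", accessToken);                  // token travels in the header, not the URL
var response = client.GetAsync("https://api.example.com/resource").Result; // placeholder endpoint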
In general, if the server and the clients communicate via TLS and they have verified the certificate chain, then there is little chance of the token falling into the wrong hands. URL logging or an https proxy are still vulnerabilities, but the man-in-the-middle attack is less of an issue if the client and the server exchange session ids and keep track of each other's session id. As an API implementation, session ids are largely a site or application concern and not the API's, but it is good to validate based on the session id if one is available.
Sessions are unique to the application. Even the client uses refresh tokens or re-authorizations to keep the session alive. At the API level, if sessions were kept track of, they would not be tied to the OAuth revokes and re-authorizations, hence relying on the session id alone is not preferable. At the same time, using the session id as an additional parameter to confirm along with each authorization helps tighten security. It is safe to assume the same session prevails until the next authorization or an explicit revoke. By tying the checks exclusively to the token, we keep this streamlined to the protocol.
OAuth can be improved upon, but it certainly enables redirections that make things easier for the user. In addition, the use of expiring tokens enables clients to reduce the chatter with the authorization server.
In addition, many applications can now redirect to each other for same-user authorizations, so the user has to sign in far less than before. If the user is signed in to a few sites, he can use the existing signed-in status to gain access to other sites. This is not just a mere convenience to the user; it enables the same user to float between sites and also enables applications to integrate and share user profile information for a richer user experience.