Saturday, September 21, 2013

I want to post some samples of select design patterns:
1) Builder pattern - This pattern separates the construction of a complex object from its representation so that the same construction process can create different representations.
public class RTFReader
{
   private readonly TextConverter builder;

   public RTFReader(TextConverter t)
   {
      builder = t;
   }

   public void ParseRTF()
   {
      // token is the stream of tokens read from the RTF source
      var t = token.Next();
      while (t != null)
      {
         switch (t.Type)
         {
            case TokenType.CHAR:
               builder.ConvertCharacter(t.Char);
               break;
            case TokenType.FONT:
               builder.ConvertFont(t.Font);
               break;
            case TokenType.PARA:
               builder.ConvertParagraph(t.Para);
               break;
         }
         // advance to the next token so the loop terminates
         t = token.Next();
      }
   }
}

public abstract class TextConverter
{
   public abstract void ConvertCharacter(char c);
   public abstract void ConvertFont(Font f);
   public abstract void ConvertParagraph(Para p);
}
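
To show how the same parsing process can yield different representations, here is a minimal sketch of one concrete builder. The ASCIIConverter name is made up for illustration, the Font and Para types are the same assumed token types as above, and it assumes using System.Text; for StringBuilder.

// A hypothetical concrete builder: accumulates plain ASCII text,
// discarding font information and keeping only paragraph breaks.
public class ASCIIConverter : TextConverter
{
   private readonly StringBuilder text = new StringBuilder();

   public override void ConvertCharacter(char c)
   {
      text.Append(c);
   }

   public override void ConvertFont(Font f)
   {
      // plain text carries no font information, so this is a no-op
   }

   public override void ConvertParagraph(Para p)
   {
      text.AppendLine();
   }

   public string GetASCIIText()
   {
      return text.ToString();
   }
}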

BTW - despite its name, StringBuilder in .NET is not an example of the Builder pattern.

Factory Pattern: Here we use an interface for creating an object but let the sub-classes decide which class to instantiate. In the .NET library, an example is the WebRequest class, which is used to make a request and receive a response.

public abstract class WebRequest
{
   // factory method (signature only, body elided)
   public static WebRequest Create(string requestUriString);
}
The Create method returns instances of WebRequest subclasses such as HttpWebRequest, FileWebRequest, FtpWebRequest etc., depending on the URI scheme.
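
As a minimal sketch, the factory method can be exercised like this; the URIs are just placeholders:

using System;
using System.Net;

class FactoryDemo
{
    static void Main()
    {
        // The scheme of the URI decides which subclass the factory returns.
        WebRequest httpRequest = WebRequest.Create("http://www.example.com");
        Console.WriteLine(httpRequest.GetType().Name);   // HttpWebRequest

        WebRequest fileRequest = WebRequest.Create("file:///C:/temp/sample.txt");
        Console.WriteLine(fileRequest.GetType().Name);   // FileWebRequest
    }
}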

Adapter Pattern: This is often confused with the Decorator pattern, but they are different.
The Decorator pattern extends functionality dynamically. The Adapter makes one interface work with another, so it changes the interface, unlike the Decorator.

SqlClient gives us an example of the adapter pattern: each data provider supplies an adapter for its specific database. A class adapter uses multiple inheritance to adapt one interface to another.

public sealed class SqlDataAdapter : DbDataAdapter, IDbDataAdapter, IDataAdapter, ICloneable
{
}

The key here is to inherit from one class and implement the other interfaces.
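
As a rough sketch of that idea (the names ILogger, LegacyLogger and LegacyLoggerAdapter are hypothetical, not from any library; assumes using System;), a class adapter inherits the existing implementation and implements the target interface:

// The interface the application expects (target)
public interface ILogger
{
    void Log(string message);
}

// An existing class with an incompatible interface (adaptee)
public class LegacyLogger
{
    public void WriteEntry(string category, string text)
    {
        Console.WriteLine("[{0}] {1}", category, text);
    }
}

// Class adapter: inherit the adaptee, implement the target interface
public class LegacyLoggerAdapter : LegacyLogger, ILogger
{
    public void Log(string message)
    {
        // translate the call the application makes into the call the adaptee understands
        WriteEntry("INFO", message);
    }
}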

Friday, September 20, 2013

In the previous post, we mentioned how we can index and store text with Lucene so that we can build a source index server. I also mentioned a caveat: unlike the Java version, which has a method to add files recursively from a directory, the Lucene.Net library does not come with one. So you build an index this way:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Lucene.Net;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Store;

namespace SourceSearch
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Count() != 1)
            {
                Console.WriteLine("Usage: SourceSearch <term>");
                return;
            }

            var indexAt = SimpleFSDirectory.Open(new DirectoryInfo(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData)));
            using (var indexer = new IndexWriter(
                indexAt,
                new SimpleAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED))
            {

                var src = new DirectoryInfo(@"C:\code\text");

                src.EnumerateFiles("*.cs", SearchOption.AllDirectories).ToList()
                    .ForEach(x =>
                        {
                            using (var reader = File.OpenText(x.FullName))
                            {
                                var doc = new Document();
                                doc.Add(new Field("contents", reader));
                                doc.Add(new Field("title", x.FullName, Field.Store.YES, Field.Index.ANALYZED));
                                indexer.AddDocument(doc);
                            }
                        });

                indexer.Optimize();
                Console.WriteLine("Total number of files indexed : " + indexer.MaxDoc());
            }

            using (var reader = IndexReader.Open(indexAt, true))
            {
                var pos = reader.TermPositions(new Term("contents", args.First().ToLower()));
                while (pos.Next())
                {
                    Console.WriteLine("Match in document " + reader.Document(pos.Doc).GetValues("title").FirstOrDefault());
                }
            }
        }
    }
}

The Caching Application Block is another application block specifically designed for this purpose. Developers often choose libraries such as this along with Lucene.Net, or external solutions such as AppFabric. While AppFabric is a caching framework that serves all APIs based on URI hashing and request and response caching, developers may look for caching within their own library implementation. Lucene.Net is more of an inverted index that helps with search and storage.
By the way, Lucene can be used to build a source index server for all your source code.
Here we build an index and run the queries against it. Lucene.Net has an IndexWriter class that is used to make this index. You need an Analyzer object to instantiate this IndexWriter.  Analyzers are based on language and you can also choose different tokenizers. For now, a SimpleAnalyzer could do. The input to the index is a set of documents. Documents are comprised of fields and fields are key value pairs. You can instantiate a SimpleFileIndexer and index the files in a directory. A Directory is a flat list of files. Directories are locked with a LockFactory such as a SingleInstanceLockFactory. Indices are stored in a single file.
I want to mention that Lucene comes with a Store to handle all the persistence. This store also lets us create  RAM-based indices.  A RAMDirectory is a memory resident Directory implementation.
Other than that, it supports all the operations of a regular directory: listing files, checking for the existence of a file, reporting the size in bytes, and opening a file to read or write bytes as input and output streams.
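
As a minimal sketch of the same indexing flow held entirely in memory (the field names are arbitrary, and the snippet follows the Lucene.Net 3.0-style API used in the listing above):

using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Build a throwaway, memory-resident index
var ramDir = new RAMDirectory();
using (var writer = new IndexWriter(ramDir, new SimpleAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("title", "in-memory sample", Field.Store.YES, Field.Index.ANALYZED));
    doc.Add(new Field("contents", "hello lucene", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);
    writer.Commit();
}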
Lucene.Net's implementation of a Cache supports algorithms such as LRU or a HashMap to store key-value pairs. If you want a synchronized cache, you have to ask for it explicitly. The cache supports simple operations such as Get and Put.
Searching and storage are facilitated by Lucene.Net, but you can also use features such as the SpellChecker and regex queries. Query strings are defined as a series of clauses. A clause may be prefixed by a plus or minus sign, indicating whether it is required or excluded, or by a field name followed by a colon, indicating the field to be searched, which is useful when there are multiple terms. A clause may either be a term, matching all the documents that contain that term, or a nested query enclosed in parentheses. If a range query is used, the QueryParser tries to detect ranges such as date values for a query involving date fields. The QueryParser is not thread-safe. However, you can also customize the QueryParser by deriving from the default one, and this can be made modular. The only caveat is that the .Net library based indexer does not have the enumerateFilesOrDirectories that the Java library does.
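
A minimal sketch of parsing such a query string and searching the index, assuming Lucene.Net 3.0 and reusing the indexAt directory from the listing above; the query text is only an illustration:

using System;
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// "+contents:cache -contents:test" requires the first term and excludes the second
var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "contents", new SimpleAnalyzer());
var query = parser.Parse("+contents:cache -contents:test");

using (var searcher = new IndexSearcher(indexAt, true))
{
    var hits = searcher.Search(query, 10);
    foreach (var scoreDoc in hits.ScoreDocs)
    {
        var doc = searcher.Doc(scoreDoc.Doc);
        Console.WriteLine("Match in document " + doc.Get("title"));
    }
}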

Thursday, September 19, 2013

Database unit-tests
In previous posts I mentioned using stored procedures with the Enterprise Library Data Access Application Block. Here I want to take a moment to describe some tests at the database unit level. Since the application relies only on the data reader, we are free to make changes at the database level, so we should have tests for this interface. These tests are called database unit tests, or DUTs for short. In our case, with the stored procedures, we simply execute the stored procedures with different parameters in these tests. They cover both the input and output parameters and check the results. The parameters, their default values and nulls, if any, are important to the execution of the stored procedure, and the application should be able to rely on the checks within the stored procedure for a proper user response. This means that the stored procedure should not make any assumptions about the call.
Let's take some examples here.
If I have a procedure that validates whether a token matches a client id, then I can pass the client id directly to the predicate in the WHERE clause, and this will handle null values correctly.
Similarly, each CRUD operation can be expected to fail in the ways described above,
and if two or more stored procedures are expected to work with one another, then they should be tested together.
Thus the granularity of the tests depends on the usage and purpose of these database objects.
It's up to the database objects to make the interface strict or inviting by using default values or nulls for the parameters, but this has nothing to do with the fact that we want tests at this level to prevent regressions during application development. The goal of the Enterprise Library has been to make it easy to build applications by providing building blocks wherever applicable. We can choose from the application blocks as appropriate.
Another thing I wanted to bring up is the NUnit tests for application development. These tests at the application level capture and verify user expectations. This is very helpful for keeping a seamless and consistent view for the user as the application evolves. When we make changes to the database and the application accommodates those changes, these tests help detect regressions at the user level.
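
As a rough sketch of what such a database unit test might look like, here is an NUnit test that exercises a stored procedure through DAAB. The procedure name usp_ValidateClientToken, its parameters, and the expected result are hypothetical; assume NUnit and the Enterprise Library Data assembly are referenced and a default database instance is configured for the test database.

using System.Data;
using Microsoft.Practices.EnterpriseLibrary.Data;
using NUnit.Framework;

[TestFixture]
public class ClientTokenProcedureTests
{
    private Database db;

    [SetUp]
    public void SetUp()
    {
        // uses the default instance configured for the test database
        db = DatabaseFactory.CreateDatabase();
    }

    [Test]
    public void ValidateClientToken_WithNullClientId_ReturnsNoRows()
    {
        var cmd = db.GetStoredProcCommand("usp_ValidateClientToken");
        db.AddInParameter(cmd, "ClientId", DbType.Guid, null);
        db.AddInParameter(cmd, "Token", DbType.String, "some-token");

        var result = db.ExecuteDataSet(cmd);

        // the procedure should handle the null client id rather than throw
        Assert.AreEqual(0, result.Tables[0].Rows.Count);
    }
}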
Finally, I want to mention that the package should be portable. Most libraries and application code can now be packaged and repackaged as NuGet packages. NuGet is a package manager that works well with the tools we discuss.
Enterprise Library Data Access Application Block discussion continued.
DAAB supports transactions. The get and save methods for the tables are LoadDataSet and UpdateDataSet. LoadDataSet executes a command and returns the results into an existing dataset. UpdateDataSet accepts a dataset and saves all the modified data records into the database. ExecuteDataSet executes a command and returns the results in a newly created dataset. ExecuteNonQuery can be used to invoke a stored procedure or a SQL statement. Most of these are already guarded against SQL injection attacks. Most of these accept a transaction that we can create with
IDbTransaction transaction = connection.BeginTransaction();
and a try/catch block can be used to commit or roll back the transaction.
You can see that the transaction parameter is reused from ADO.NET. It is best not to keep the transaction open for a long time. You can also change one row at a time using the UpdateDataSet method or do bulk updates to the entire table. You can execute a query or a stored procedure; in many cases, the latter helps encapsulate the logic at the database level. Since the DAAB itself includes unit tests, these are also available for review or for use alongside your own. The unit tests help with test-driven development and are very useful for ensuring the quality of the development work as it proceeds.
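
Here is a minimal sketch of updating a dataset inside such a transaction. The table name and stored procedure names are made up, and each command is assumed to have its parameters mapped to source columns elsewhere.

using System.Data;
using System.Data.Common;
using Microsoft.Practices.EnterpriseLibrary.Data;

Database db = DatabaseFactory.CreateDatabase();

// Load the current rows into a dataset
DataSet customers = new DataSet();
DbCommand selectCmd = db.GetStoredProcCommand("usp_GetCustomers");
db.LoadDataSet(selectCmd, customers, "Customers");

// ... modify rows in customers.Tables["Customers"] here ...

using (DbConnection connection = db.CreateConnection())
{
    connection.Open();
    DbTransaction transaction = connection.BeginTransaction();
    try
    {
        // save inserts, updates and deletes in one transaction
        db.UpdateDataSet(customers, "Customers",
            db.GetStoredProcCommand("usp_InsertCustomer"),
            db.GetStoredProcCommand("usp_UpdateCustomer"),
            db.GetStoredProcCommand("usp_DeleteCustomer"),
            transaction);
        transaction.Commit();
    }
    catch
    {
        transaction.Rollback();
        throw;
    }
}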
I want to bring up a couple of important points from the posts made so far on this topic:
1) Application blocks are developed by Microsoft patterns and practices and they are very helpful in developing applications because they provide building blocks.
2) The block patterns allow for encapsulation and isolation of commonly used development functionality, and they very often embody practices that have worked in more than one deployment.
3) The seven different application blocks are:
caching application block
configuration application block
cryptography application block
data access application block
exception handling application block
logging and instrumentation application block
security application block
4) The security, caching and logging application blocks have an optional dependency on the data access application block.
5) The blocks can be chosen and included in the application with minimal changes to the application
6) Since they are already developed and tested, each application block saves you time for the purpose for which it was built
7) In the case of the DAAB, the connection management, the abstraction of ADO.NET and database objects and their dependencies, and the avoidance of an object graph in the code enable you to focus on the application and the results from the data access that you are interested in.


Wednesday, September 18, 2013

In addition to the earlier post on the benefits of the Enterprise Library Data Access Application Block (DAAB), I want to bring up some more in this post. I mentioned the removal of the dependency on the object graph and its persistence mechanism with DAAB, but another advantage that may not be so obvious is that the database changes are now decoupled from the application. This is very desirable for rapid application development, because database changes are very likely as new scenarios or refinements emerge while the application is being developed. Choosing DAAB over EDM in such cases helps avoid the refreshes to the code and model that previously caused code and test churn. As long as the data readers don't see the changes to the database, the application can continue to evolve separately.

Often the database changes affect more than one table or schema. For example, a column for an identifier to an entity in an external system may need to be maintained. Or a new column may need to be added to indicate the source of the entity. Or a computed or translated column may need to be added for lookups. Or the data may exist side by side in two different forms for use by different consumers. Indexes may need to be added or a schema redesign may be involved. To the stored procedures invoked by the application, these changes need not be visible. In fact, DAAB allows the interface between the application and the data provider to be established first.

DAAB doesn't take away any of the benefits of the schema and objects in the database; they can continue to be accessed via the Database object. If anything, the Database object can be created simply with a factory class.

Further, parameterized queries can be constructed for ad hoc commands. This is done by passing the text to GetSqlStringCommandWrapper and invoking AddInParameter. The ExecuteDataSet method executes the DBCommandWrapper's command, retrieves the results, creates a new DataSet, and fills it. There are many overloads of the ExecuteDataSet method, which allow for a variety of calls.
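A minimal sketch of such a parameterized ad hoc query, assuming a later Enterprise Library version where the wrapper has become GetSqlStringCommand returning a DbCommand; the table and parameter names are made up:

using System.Data;
using Microsoft.Practices.EnterpriseLibrary.Data;

Database db = DatabaseFactory.CreateDatabase();

// Parameterized text avoids string concatenation and SQL injection
var cmd = db.GetSqlStringCommand(
    "SELECT CustomerId, Name FROM Customers WHERE Region = @Region");
db.AddInParameter(cmd, "Region", DbType.String, "WA");

DataSet results = db.ExecuteDataSet(cmd);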

XML data can also be retrieved with an XmlReader object and the ExecuteXmlReader method. The stored procedure uses the FOR XML AUTO clause to return the data in XML format.
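
A minimal sketch, assuming a SQL Server database (ExecuteXmlReader lives on SqlDatabase) and a hypothetical stored procedure that ends with FOR XML AUTO:

using System;
using System.Xml;
using Microsoft.Practices.EnterpriseLibrary.Data;
using Microsoft.Practices.EnterpriseLibrary.Data.Sql;

SqlDatabase db = (SqlDatabase)DatabaseFactory.CreateDatabase();

var cmd = db.GetStoredProcCommand("usp_GetOrdersAsXml"); // SELECT ... FOR XML AUTO
using (XmlReader reader = db.ExecuteXmlReader(cmd))
{
    while (!reader.EOF)
    {
        if (reader.IsStartElement())
            Console.WriteLine(reader.ReadOuterXml());
        else
            reader.Read();
    }
}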
This post talks about the Enterprise Library Data Access Application Block. I was able to go over the documentation on MSDN. Applications that implement their own connection state, pooling and connection handling usually run into scalability and resource problems. The Enterprise Library opens and closes the connection as needed, leaving you to focus on the data reader and the data. You can edit the configuration with a visual tool. A Database object is created using the DatabaseFactory.CreateDatabase method. The configuration tool stores the settings for the DAAB (Data Access Application Block). The database instances node is used to associate one of the instance types, such as SQL or Oracle, with one of the connection strings. The connection string is stored in the configuration file and provides mechanisms to configure the security for the connection; these can be encrypted so that they are not in clear text. As mentioned, the Database object is created first, and this is done with a factory pattern. In ADO.NET, you open a connection and then fill a dataset or retrieve data through a data reader, usually typed to the provider. In this application block, all of these are abstracted: you just instantiate the Database object and execute the reader using a command wrapper. This Database class has dozens of methods, most notably the ones that execute a stored procedure or SQL statement; return a dataset, a data reader, a scalar value, an XmlReader, or nothing; allow specific parameters to be created and passed in; determine which parameters a command needs, create them, and cache them; and enlist commands in transactions.
The database object has some methods like the GetStoredProcCommand, GetParameterValue, AddInParameter, AddOutParameter that give detailed control over the database object
to be executed.  ExecuteScalar and ExecuteReader are some of the methods for execution. Since the results from the execution can be read from a dataReader, we can
directly populate objects with the results without having to create an object graph. This reduces a lot of complexity that comes with the object graph refreshes. Direct
manipulation of data is possible with methods like LoadDataSet and UpdateDataSet where you can specify the CRUD operation command and a transaction if necessary. You
could directly get the data adapter that implements the CRUD operation on the data source.
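
As a final sketch of that flow: create the Database via the factory, build a stored procedure command, add parameters, and populate objects straight from the data reader. The procedure, parameter, and class names here are made up for illustration.

using System.Collections.Generic;
using System.Data;
using Microsoft.Practices.EnterpriseLibrary.Data;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerRepository
{
    public static List<Customer> GetCustomersByRegion(string region)
    {
        Database db = DatabaseFactory.CreateDatabase();
        var cmd = db.GetStoredProcCommand("usp_GetCustomersByRegion");
        db.AddInParameter(cmd, "Region", DbType.String, region);

        var customers = new List<Customer>();
        using (IDataReader reader = db.ExecuteReader(cmd))
        {
            while (reader.Read())
            {
                // populate objects directly from the reader, no object graph needed
                customers.Add(new Customer
                {
                    Id = reader.GetInt32(reader.GetOrdinal("CustomerId")),
                    Name = reader.GetString(reader.GetOrdinal("Name"))
                });
            }
        }
        return customers;
    }
}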