Friday, September 6, 2013

pivoting pushed to SQL

In the previous post, we talked about implementing the pivoting logic in the .NET layer. If we had sufficient access, and if this were a small-scale deployment where privileges to the TFS database can be granted, the same pivoting operation described in the earlier post could be pushed down to SQL.
This is how it would look:

CREATE TABLE [dbo].[Source](
        [ID] [int] IDENTITY(1, 1) NOT NULL,
        [Owner] [nvarchar](50) NOT NULL,
        [WorkItemID] [int] NOT NULL,
        [Priority] [int] NOT NULL) ON [PRIMARY]

INSERT INTO [dbo].[Source] ([WorkItemID], [Owner], [Priority]) VALUES (1, 'abc', 1);
INSERT INTO [dbo].[Source] ([WorkItemID], [Owner], [Priority]) VALUES (2, 'def', 2);
INSERT INTO [dbo].[Source] ([WorkItemID], [Owner], [Priority]) VALUES (3, 'ghi', 2);
INSERT INTO [dbo].[Source] ([WorkItemID], [Owner], [Priority]) VALUES (4, 'jkl', 3);

SELECT [Owner], [1], [2], [3]
FROM
        (SELECT [Owner], [WorkItemID], [Priority] FROM [dbo].[Source]) AS Src
PIVOT
(
        COUNT([WorkItemID])
        FOR [Priority] IN ([1], [2], [3])
) AS PivotTable;



TFS queries over the web

TFS provides web access. Queries can be specified in the following manner.
                     http://Server:Port/q.aspx?pname=Project&wiql=Work Item Query Language.
The Work Item Query Language does not need to be written by hand. A TFS query can be edited in, say, Visual Studio 2012 and then saved as a file. This file is XML and contains the Work Item Query Language. The white spaces in the URI may have to be substituted with %20 and the @aliases with the project names. Incidentally, if you right-click on the query name tab in Visual Studio and click Copy Full Path, you get the HTTP URI for the query. An example looks like this:
                     http://server:8080/tfs/web/qr.aspx?puri=vstfs%3a%2f%2f%2fClassification%2fTeamProject%2f260d948e-6a1e-4841-97a9-805b79c39cf8&path=Project_name%2fShared+Queries%2fAllActiveBugs
However, if you try any of these URIs with, say, Fiddler, you get an Unauthorized or NotFound error.
Note that the server resources are named q.aspx and qr.aspx. The error messages I have encountered when using
http://server:8080/tfs/web/q.aspx are:
TF400907: The URL specified appears to be from an earlier version of Web Access.  The format was not recognized : http://server:8080/tfs/web/q.aspx
There doesn't seem to be any documentation on the web for this error.
If you give the URI as
http://teamsystem:8080/tfs/web/qr.aspx?pname=ProjectName&wiql=SELECT%20[System.Id],%20[System.CreatedDate],%20[System.Title],%20[Microsoft.VSTS.Common.Priority],%20[Microsoft.VSTS.Common.Severity],%20[System.AssignedTo],%20[System.State],%20[System.AreaPath],%20[System.History]%20FROM%20WorkItems%20WHERE%20[System.TeamProject]%20=%20ProjectName%20and%20[System.WorkItemType]%20=%20'Bug'%20and%20[System.State]%20=%20'Active'%20and%20not%20[System.Title]%20contains%20'Build%20Failure'%20ORDER%20BY%20[Microsoft.VSTS.Common.Priority]
TF400367: The request could not be performed due to a host type mismatch. Please check any connection information and verify the information is correct. The request was executed against a Deployment, Application.
The Team Foundation version control command-line tool also does not seem to have an option to execute a query.
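The percent-encoding of the WIQL string described above does not have to be done by hand. A minimal Python sketch (the server, project name, and query below are placeholders, not real endpoints):

```python
from urllib.parse import quote

# Hypothetical server and project; substitute your own.
server = "http://teamsystem:8080/tfs/web/qr.aspx"
wiql = ("SELECT [System.Id], [System.Title] FROM WorkItems "
        "WHERE [System.TeamProject] = 'ProjectName' "
        "and [System.State] = 'Active'")

# quote() replaces spaces and other reserved characters with %XX escapes;
# the safe= argument keeps the characters the WIQL URLs above leave as-is.
url = server + "?pname=ProjectName&wiql=" + quote(wiql, safe="[].'=")
print(url)
```

The printed URI has every space replaced with %20, matching the hand-built examples above.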






Thursday, September 5, 2013

I will describe another way to achieve the task in the previous post. This assumes that you want to package the logic with as few dependencies as possible. One way is to rely only on the source data as it refreshes each day and handle all the steps, from transforming it into a pivot table to sending out the daily mail, in one place. This could be written as a C# executable deployed on a server and invoked as a scheduled job. Since the logic is self-contained, it provides a single point of maintenance.
The steps involved are as follows: 
1) Read or fill a DataSet directly with a WebClient request to the TFS server or the database. If there is access to the database, a stored procedure to pivot the data can be authored. Alternatively, there could be an Excel layer, driven via the Excel APIs, that generates the pivot table and saves it on a worksheet which can then be opened. The pivot table has to be marked for refresh on each opening. If the pivot table alone could be published from the workbook to a web page, the page could be read to take the contents as HTML. This way we could skip step 2.
2) If the data is not already in the pivoted format, it can be pivoted in the .NET layer. Here is the code to transform the data into a pivot:

            // Build a small sample table of (ID, Owner, Priority).
            var input = new DataTable("Item-Priority");
            input.Columns.Add("ID", typeof(int));
            input.Columns.Add("Owner", typeof(string));
            input.Columns.Add("Priority", typeof(int));
            input.Rows.Add(new object[] { 1111, "abc", 1 });
            input.Rows.Add(new object[] { 2222, "def", 2 });
            input.Rows.Add(new object[] { 3333, "def", 2 });
            input.Rows.Add(new object[] { 4444, "jkl", 3 });

            // The pivot has one row per owner and one column per priority.
            var pivot = new DataTable();
            pivot.Columns.Add("Owner", typeof(string));
            var dict = new Dictionary<string, int>(); // owner -> row index in pivot
            int index = 0;

            // First pass: create the owner rows and the priority columns.
            foreach (DataRow row in input.Rows)
            {
                var owner = row["Owner"].ToString();
                var pri = row["Priority"].ToString();
                if (!pivot.Columns.Contains(pri))
                    pivot.Columns.Add(pri, typeof(int));
                if (!dict.ContainsKey(owner))
                {
                    dict.Add(owner, index);
                    var newRow = pivot.NewRow();
                    newRow["Owner"] = owner;
                    pivot.Rows.Add(newRow);
                    index++;
                }
            }

            // Second pass: count the work items per (owner, priority) cell.
            foreach (DataRow row in input.Rows)
            {
                var owner = row["Owner"].ToString();
                var pri = row["Priority"].ToString();
                var cell = pivot.Rows[dict[owner]][pri];
                int old = cell == DBNull.Value ? 0 : (int)cell;
                pivot.Rows[dict[owner]][pri] = old + 1;
            }
            pivot.AcceptChanges();

3) Finally, we can prepare the HTML with the insertions for the pivot table and send it out by mail as follows:
var client = new SmtpClient("your_smtp_server");
var message = new MailMessage();
message.From = new MailAddress("your_email_address");
message.To.Add(new MailAddress("your_recipient_address"));
message.Subject = "your_title";
message.Body = "your_prepared_html";
message.IsBodyHtml = true;
client.Send(message);

Excel and SharePoint can be used to create reports. Excel can publish items, sheets, and workbooks to SharePoint, where they can be accessed on the Internet.

Wednesday, September 4, 2013

Today I want to make a post aside from the ones I have been making lately. Much of this was learned in an effort to automate a report and I want to write it down here. First, an introduction to the objective, then a choice of steps and a discussion of their tradeoffs, followed by a possibly expedient solution.
The objective in this case was to generate daily automated mails of a pivot table. The table comes from a TFS query that lists work items assigned to owners, with item priorities of, say, 1, 2, or 3. The generated pivot table has the count of work items at each priority listed for every owner. This table should flow daily to a distribution list of subscribers so that they can get it in their inbox.
Although there could be several ways to do this task, I will list the things that need to be considered.
The data could change every day.
The pivot table format could change.
TFS queries may not support both the listing of the data and the pivoting.
This is a custom report. By that I mean we cannot use the canned reports from a TFS connection because they don't pivot the data.
The items in a pivot table should be in HTML format for mail out.
The transformation of the data from tables to xslt to any other reporting or the invocation of tools on dedicated local servers are subject to maintenance and add to costs.
Ease of use, such as with Excel, to pivot the table or to draw charts will help with changes in the format of the output over time.
The server that sends out the mails should preferably do this daily. It's alright to do it more than once, and to a designated person, so that one of the mails can be selected to send out to a wider audience. This allows some amount of manual control if desired. This is optional.
The solutions could include, in no particular order:
1) write SQL Server Reporting Services RDL reports so that they can be viewed from the TFS server
2) write a custom tool to generate the output and the mail using libraries and plugins; the executable can be scheduled to be invoked with a task scheduler
3) save TFS queries and make them accessible over the Internet, at which point the source table is already in HTML and available via web requests
Out of the three, the tradeoffs are in terms of flexibility to change, availability of the data and report online, and the cost of ownership involved. Users familiar with these will spot them right away; for those who are new, it could be explained as: the more moving pieces there are, the more it costs to own them and to transform the data flowing between them.
Lastly, the solution that could be considered expedient is to have the Excel application publish just the pivot table item, and not the sheets or the workbook. This item could appear directly on, say, a SharePoint website, which provides HTTP access to the pivot tables. It can then be requested with a web request and mailed out with a script that is invoked by the task scheduler or a reporting server.

SHA-1 hashing

SHA-1 hashes are 160 bits, or 20 bytes, long, usually rendered as 40 hexadecimal digits. The design is similar to Rivest's MD4 and MD5. The hash state is five 32-bit unsigned words, handled big-endian. First, preprocess the message: append the bit 1, then append zero bits until the length is congruent to 448 modulo 512, then append the original message length in bits as a 64-bit big-endian unsigned number. Then process the message in successive 512-bit chunks. Break each chunk into sixteen 32-bit big-endian words, and extend them into eighty words this way: for each of the 16th to 79th words, XOR the words that appear 3, 8, 14, and 16 positions earlier and left-rotate the result by 1. Initialize the five working variables from the current hash values h0 through h4. In the main loop, for i from 0 to 79, the round function applies AND, OR, and XOR to the working variables in a predefined manner that differs across the four equal ranges of twenty rounds, each range with its own constant. The variables are then shuffled: each shifts down by one, the first is recomputed from the round function, and the third is left-rotated by 30. At the end of the loop, add the working variables back into the chunk's hash values. The final hash value is the concatenation of the five state words.
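The steps above can be sketched in Python (for illustration only; in practice use a vetted library implementation):

```python
import struct

def sha1(data: bytes) -> str:
    """Compute a SHA-1 digest following the steps described above."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    def rol(x, n):  # 32-bit left rotate
        return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

    # Preprocessing: append bit 1, pad to 448 mod 512, append bit length.
    ml = len(data) * 8
    data += b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack(">Q", ml)

    # Process each 512-bit chunk.
    for off in range(0, len(data), 64):
        w = list(struct.unpack(">16I", data[off:off + 64]))
        for i in range(16, 80):  # extend 16 words to 80
            w.append(rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
        a, b, c, d, e = h
        for i in range(80):
            if i < 20:   f, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40: f, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60: f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:        f, k = b ^ c ^ d, 0xCA62C1D6
            a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF,
                             a, rol(b, 30), c, d)
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    return "".join(f"{x:08x}" for x in h)

print(sha1(b"abc"))  # the standard test vector
```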
The .NET library packages this up in the SHA1 class's ComputeHash method. A hash can be computed like this:
            var bytesToHash = Encoding.UTF8.GetBytes(content);
            using (var sha1 = SHA1.Create())
            {
                byte[] hashBytes = sha1.ComputeHash(bytesToHash);
                return Convert.ToBase64String(hashBytes);
            }
A keyed variant such as HMACSHA1 or HMACSHA256 can be used the same way when the hash needs to be protected with a secret key.

Tuesday, September 3, 2013

In this post, I'm going to talk about hash tables. Hash tables are popular because a lookup takes constant time on average. A hash function is used to generate a fixed-length representation of the actual data record. Hash functions can make a table lookup faster and can detect duplicated or similar records. If two records have different hash codes, they are certainly different; equal hash codes only suggest that the records may be the same. Hashes don't retain the original data.
In a hash table, the hash function maps the search key to an index, which then gives the place where the data is inserted. Here the index only refers to a bucket since the range of the key values is typically larger than the range of hashes. Each bucket corresponds to a set of records.
Duplicate records are found by going through the set of records in the same bucket. This scanning is necessary because hashing only guarantees that identical records end up in the same bucket; distinct records can land there too. Such occurrences, where different keys map to the same bucket, are called collisions. When collisions are kept to a minimum, the hash function performs well.
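The bucket scheme above can be sketched as a minimal chained hash table in Python (illustrative only; the bucket count and the built-in hash are arbitrary choices):

```python
class ChainedHashTable:
    """A minimal hash table that resolves collisions by chaining."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        # The hash function maps the search key to a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collisions chain onto the same list

    def get(self, key):
        # Scan only the records that hashed to the same bucket.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```

With few buckets, distinct keys are forced into the same list, and `get` falls back to the linear scan described above.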
Incidentally, geometric hashing can be used to find points that are close together in a plane or in three-dimensional space. Here the hash function is interpreted as a partition of that space into a grid of cells. The return value is typically a tuple with two or more indices, such as the coordinates of a cell in the plane.
Universal hashing is a scheme in which a hash function is chosen at random from a family of functions, such that for any two distinct keys the probability of a collision is at most 1/n, where n is the number of different hash values desired. However, it could still have more collisions than a special-purpose hash function tuned to the data.
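A classic example of such a family, for integer keys, is h(x) = ((a*x + b) mod p) mod n with a prime p larger than any key; the sketch below uses the Mersenne prime 2^31 - 1 as an illustrative choice:

```python
import random

def make_universal_hash(n, p=2_147_483_647):
    """Draw one member h(x) = ((a*x + b) % p) % n from the family.

    p must be a prime larger than any key; 2**31 - 1 is used here.
    """
    a = random.randrange(1, p)   # a must be nonzero
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % n

# Each call picks a fresh random function from the family.
h = make_universal_hash(10)
print([h(k) for k in range(5)])  # 5 bucket indices in [0, 10)
```

Once a function is drawn, it is deterministic, so it can be used for the lifetime of one table.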
Hash functions are also used in cryptography because a hash gives a compact representation of the data and can guard against tampering: if the data were modified, it would be hard to produce the same hash.
In cryptographic hash functions such as SHA-1, inputs are mapped more evenly across the entire range of hash values, so they serve as good general-purpose hash functions. Note that this gives a uniform spread, not a random one. A randomizing function can be a better choice for a hash function.
Hash functions have to be deterministic: given the same key, they should produce the same hash again. This does not mean hash functions cannot be used with things that change, such as the time of day or a memory address; whenever a key changes, it can generally be rehashed.