Saturday, August 30, 2014


Trying out Node.js, webmatrix and MongoDB.

var MongoClient = require('mongodb').MongoClient
    , format = require('util').format;

MongoClient.connect('mongodb://127.0.0.1:27017/test', function(err, db) {
    if(err) throw err;

    var collection = db.collection('test_events');
    collection.insert(   [
     { Timestamp: (new Date()).toString(), host: "local", source: "source1", sourcetype: "sourcetype1" },
     { Timestamp: (new Date()).toString(), host: "local", source: "source2", sourcetype: "sourcetype2" },
   ],{ ordered: true }, function(err, docs) {
        collection.count(function(err, count) {
            console.log(format("err = %s", err));
            console.log(format("count = %s", count));
            db.close();
        });
    });
});

C:\Users\Admin>node connect.js
err = null
count = 2

> db.test_events.find()
{ "_id" : ObjectId("54025fbb9c212420085d82fb"), "Timestamp" : "Sat Aug 30 2014 1
6:35:23 GMT-0700 (Pacific Daylight Time)", "host" : "local", "source" : "source1
", "sourcetype" : "sourcetype1" }
{ "_id" : ObjectId("54025fbb9c212420085d82fc"), "Timestamp" : "Sat Aug 30 2014 1
6:35:23 GMT-0700 (Pacific Daylight Time)", "host" : "local", "source" : "source2
", "sourcetype" : "sourcetype2" }
I came across an interesting topic about how to store key values in relational tables and whether we should move to NoSQL just for the sake of storing key values. The trouble with storing key values in relational tables is that the same key has multiple values. If we keep each record for a key value pair soon we have a flooding of the table but more importantly this just seems like  a collection and not an entity which is fundamental to the database model. We could alleviate the storage concerns and call them different entities by say calling one table as keys and another table as values and adding a relation between them.
That said, probably an easy to implement way is to store it as an XML in a single column. This alleviates the problem of having two columns where one is the fieldvalue and the other is the fieldvaluetype. Moreover, the column can be indexed and queried with XPath. This is why it is preferred over JSON.
Another approach is to use the EntityAttributeValue model also called the EAV model. Here the attributes are available as columns and they can be numerous with only a few columns holding values at some time. This is also called sparse matrix.
The thing to note here is that the tendency to add custom properties is not restricted to a single table and can become an epidemic in the system. That is why the data model may need to be redesigned or at least extended if such things crop up. The normalization is important just as much as the convenience of using key-values.
Key Value are generally stored using a hashing function because they are essentially collections. The hashing allows to bucket the keys and collisions are resolved by overflow lookups.
The NoSQL stores such as MongoDB serve more purposes as well. They are better suited for the following use cases:
column stores
key value stores
document stores
graph stores
If we look at the

use EventsDB

go
 

CREATE TABLE dbo.Event

( ID int identity not null,

Timestamp datetime not null

CONSTRAINT IX_Event_Timestamp

PRIMARY KEY CLUSTERED (Timestamp, ID)

WITH (IGNORE_DUP_KEY = OFF),

Host nvarchar(4000) not null,

Source nvarchar(100) not null,

SourceType nvarchar(100) not null,

FieldMap xml null,

);

INSERT INTO dbo.Event VALUES (GETDATE(), HOST_NAME(), N'Source1', N'SourceType1', NULL);

INSERT INTO dbo.Event VALUES (DATEADD(DD, 1, GETDATE()), HOST_NAME(), N'Source2', N'SourceType2', NULL);

UPDATE dbo.Event set FieldMap = '<xml><Source>Source1</Source><SourceType>SourceType1</SourceType></xml>'

WHERE SOURCE=N'Source1'

 
 
UPDATE dbo.Event set FieldMap = '<xml><Source>Source2</Source><SourceType>SourceType2</SourceType></xml>'

WHERE SOURCE=N'Source2'

SELECT * FROM dbo.Event

go

ID Timestamp Host Source SourceType FieldMap
1 2014-08-31 09:22:28.193 ADMIN-PC Source1 SourceType1 <xml><Source>Source1</Source><SourceType>SourceType1</SourceType></xml>
2 2014-09-01 09:22:28.193 ADMIN-PC Source2 SourceType2 <xml><Source>Source2</Source><SourceType>SourceType2</SourceType></xml>


SELECT ID, FieldMap.query('data(/xml/Source)') as Value

FROM dbo.Event

WHERE FieldMap.exist('/xml/SourceType') = 1

Friday, August 29, 2014

Today we will be quickly reviewing SEO - search engine optimization from the SEO guide.
Search engines have two major functions - 1. crawling and building an index 2. providing answers by calculating relevancy and serving results.

The web is connected by links between pages. Crawlers index these pages for fast lookups. Search engines have to provide answers in fractions of seconds because users lose focus after 2-3 seconds. Relevancy and importance from such voluminous data is determined in this time. SEO targets both these metrics. Relevance is a way of ranking what pertains to the user query. Importance is somewhat irrespective of the user query and focuses on the popularity of the site. There are several algorithms used to determine relevance and importance. Each Search engine may have their own.
Google recommends  the following to get better ranking:
  • don't cloak your pages differently to search engines from what they appear to users
  • organize your site with each page reachable by a static link
  • pages are cohesive with respect to the content they provide
  • use redirects and rel="canonical" for duplicate content.
Bing suggests the following:
  • construct clean URLs with keywords
  • provide keyword-rich content and refresh content
  • don't hide the text to be indexed in resources 
 There are three types of queries search users perform. These are :
"Do" transactional queries such as to buy a plane ticket
"Know" informational queries such as to find the name of a restaurant
 "Go" such as site specific queries such as LinkedIn

In spite of the efforts of the search engine, the users attention is not drawn to the results but to the words appearing in bold or titles or brief descriptions - things that are explicitly targeted in paid search listings.
 To get your page noticed, you could:
wrap your images, plugins, video and audio content with text description of the  content.
Search boxes can be supplemented with navigable and crawlable links and include a sitemap. Crawlable links means that the webpages are connected and the hrefs in the html point to say static links. If you want to hide content from the search engine, you could add rel="nofollow" for your hrefs or add meta robots tag or the robots.txt. When using keywords, use specific ones and don't stuff it on the content. keyword density does not help. Keywords in the titles help but don't make it longer than 65-75 characters. Always improve readability and emotional impact.

Search engines have millions of smaller databases that have keyword based indices. This makes it much faster for the engines to retrieve the data. Keywords thus play an important role in the search engine's ranking algorithm. Keywords appearing in title, text, image alt attributes and metadata promote a page's relevance. When keywords appear in the title, be mindful of the length, order and leverage branding.
Meta robot tags such as 'index/noindex' 'follow/nofollow' 'noarchive' 'nosnippet' 'noodp/noydir' restrict spider activity. They could be used judiciously.  The meta description tag is what appears together with a search listing.
URLs could be made in a way such that they are shorter, use keywords, use hyphens to separate keywords and are static.  Use canonicalization so that every unique piece of content has one and only one URL.

Thursday, August 28, 2014

We will be reviewing some simple random walk theorems. We will look at the amount of time that a SSRW is positive. If the random walk is at zero, it takes an even number of steps in either direction to return to zero.
We will look at the distribution when the random walk is positive. We can take an example with coin tosses. When we toss a coin, the probability that the number of heads equals k in 2n coin tosses is maximum when k = n . What we are saying is that we know for sure we can exceed the number of tails when the number of heads and the number of tails are equal. If we wanted to maximize the number for one outcome, we would have to choose the value 0 or 2n for the remainder.  Since we mentioned earlier that the probability a random walk reaches a certain count can be derived from the probabilities from the outcomes of the states from the distribution, we can express the random walk probabilities in terms of  the counts of the probabilities for the individual states. 

Tuesday, August 26, 2014

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public class Test
{
    public static void Main()
    {
        char[,] wordSearch = new char [3,3] { {'B', 'O', 'B'}, {'B', 'O', 'B'}, {'B', 'O', 'B'} };
        var wordList = new List<string>() { "BOB", "BOX", "COB" };
        printWordsInMaze(wordSearch, 3, 3, wordList);
    }
   
    public static void printWordsInMaze(char[,] wordSearch, int row, int col, List<string> wordList)
    {
        string[ ] directions = {"LR", "RL", "U", "D", "DUL", "DUR", "DDL","DDR"};
    for (int i = 0; i < col; i++)
    for (int j = 0; j < row; j++)
    {
        char letter = wordSearch[i,j];
          string pattern;
           List<string> candidates = wordList;
      
          int start = i;
          int end = j;
          foreach (var direction in directions)
          {
              int r = i;
              int c = j;
              pattern = String.Empty;
              switch (direction)
             {
                            case "LR":
                              while (c < col) { pattern += wordSearch[r,c]; c++;}
                              printMatchingWord(start, end, pattern, candidates, direction);
                            break;

                            case "RL":
                              while (c >= 0) { pattern+= wordSearch[r,c]; c--;}
                              printMatchingWord(start, end,pattern, candidates, direction);
                            break;

                            case "U":
                              while (r >= 0) { pattern += wordSearch[r,c]; r--;}
                              printMatchingWord(start, end,pattern, candidates, direction);
                            break;

                            case "D":
                              while (r < col) { pattern += wordSearch[r,c]; r++;}
                              printMatchingWord(start, end, pattern, candidates, direction);
                            break;

                            case "DUL" :
                              while (r >= 0 && c >= 0) { pattern += wordSearch[r,c]; r--; c--;}
                              printMatchingWord(start, end, pattern, candidates, direction);
                            break;

                            case "DUR" :
                              while (r >= 0 && c < col) { pattern += wordSearch[r,c]; r--; c++;}
                              printMatchingWord(start, end, pattern, candidates, direction);
                            break;

                            case "DDL" :
                              while (c >= 0 && r < col) { pattern += wordSearch[r,c]; c--; r++;}
                              printMatchingWord(start, end,pattern, candidates, direction);
                            break;

                            case "DDR" :
                              while (r < col && c < row) { pattern += wordSearch[r,c]; r++; c++;}
                              printMatchingWord(start, end, pattern, candidates, direction);
                            break;
               }
               //i = start;
               //j = end;
              }
         }
    }
   
private static void printMatchingWord(int i, int j, string pattern, List<string> candidates, string direction)
{
                                  foreach (var word in candidates)
                                  {
                                      Console.WriteLine("Pattern={0},word={1}", pattern, word);
                                       if (pattern.StartsWith(word))
                                        Console.WriteLine ("Word = {0} at row = {1} and col = {2} and direction = {3}", word, i,j, direction);
                                }
}

}

  • Success time: 0.06 memory: 33936 signal:0
    Pattern=BOB,word=BOB
    Word = BOB at row = 0 and col = 0 and direction = LR
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BBB,word=BOB
    Pattern=BBB,word=BOX
    Pattern=BBB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 0 and col = 0 and direction = DDR
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=O,word=BOB
    Pattern=O,word=BOX
    Pattern=O,word=COB
    Pattern=OOO,word=BOB
    Pattern=OOO,word=BOX
    Pattern=OOO,word=COB
    Pattern=O,word=BOB
    Pattern=O,word=BOX
    Pattern=O,word=COB
    Pattern=O,word=BOB
    Pattern=O,word=BOX
    Pattern=O,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 0 and col = 2 and direction = RL
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BBB,word=BOB
    Pattern=BBB,word=BOX
    Pattern=BBB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 0 and col = 2 and direction = DDL
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 1 and col = 0 and direction = LR
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BB,word=BOB
    Pattern=BB,word=BOX
    Pattern=BB,word=COB
    Pattern=BB,word=BOB
    Pattern=BB,word=BOX
    Pattern=BB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BO,word=BOB
    Pattern=BO,word=BOX
    Pattern=BO,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BO,word=BOB
    Pattern=BO,word=BOX
    Pattern=BO,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OO,word=BOB
    Pattern=OO,word=BOX
    Pattern=OO,word=COB
    Pattern=OO,word=BOB
    Pattern=OO,word=BOX
    Pattern=OO,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 1 and col = 2 and direction = RL
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=BB,word=BOB
    Pattern=BB,word=BOX
    Pattern=BB,word=COB
    Pattern=BB,word=BOB
    Pattern=BB,word=BOX
    Pattern=BB,word=COB
    Pattern=BO,word=BOB
    Pattern=BO,word=BOX
    Pattern=BO,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BO,word=BOB
    Pattern=BO,word=BOX
    Pattern=BO,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 2 and col = 0 and direction = LR
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BBB,word=BOB
    Pattern=BBB,word=BOX
    Pattern=BBB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 2 and col = 0 and direction = DUR
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OOO,word=BOB
    Pattern=OOO,word=BOX
    Pattern=OOO,word=COB
    Pattern=O,word=BOB
    Pattern=O,word=BOX
    Pattern=O,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=OB,word=BOB
    Pattern=OB,word=BOX
    Pattern=OB,word=COB
    Pattern=O,word=BOB
    Pattern=O,word=BOX
    Pattern=O,word=COB
    Pattern=O,word=BOB
    Pattern=O,word=BOX
    Pattern=O,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 2 and col = 2 and direction = RL
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=BBB,word=BOB
    Pattern=BBB,word=BOX
    Pattern=BBB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=BOB,word=BOB
    Word = BOB at row = 2 and col = 2 and direction = DUL
    Pattern=BOB,word=BOX
    Pattern=BOB,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    Pattern=B,word=BOB
    Pattern=B,word=BOX
    Pattern=B,word=COB
    
Today we will be reviewing the Atom web standards. There is the Atom publishing format and the Atom syndication format. The latter is the XML language used to describe the Atom feeds. The former is the HTTP based protocol for creating and updating the web resources.Web feeds follow a publisher subscriber model. A web feed provider is usually a site owner that publishes resources in say xml format. Subscribers then use feed readers to view their content. Browsers typically support feed readers. One usage is say when bloggers keep track of each others posts via Atom feeds.
A feed can contain both data and metadata. It has  labeled content. It is usually timestamped to communicate the information on when it was created/updated. Each text element can be internationalized using xml:lang  The Atom format came after the RSS format which was struggling with backward compatibility. While RSS is still used by Apple and iTunes, the Atom format has been adopted widely by Google.
To provide a link to an Atom feed, we use the following in HTML:
<link href="atom.xml" type="application/atom+xml" rel="alternate" title="feedtitle" />
The feed is specified as  <feed xmlns="http://www.w3.org/2014/Atom"><metadata/><data/></feed>
.Net MVC webservers allow RSSResult ActionResult to be cached with an expiration time because it is just a file.
The iCalendar format is different from Atom in that it specifies a meeting request and tasks for sharing with a file and an extension named .ics. The content is specified in begin and end tags with the top level being vcalendar. The items are listed in key:value pairs and form a succint description of the meeting. It has syntax to specify the event/journal/free/busy time.
Flair is a javascript based application framework for Adobes AIR html+ajax SDK. Its an acronym for Framework Leveraged AIR. It can be used to specify a list of items say for example based on source ip of request as gathered from the request headers HTTP_X_FORWARDED_FOR and REMOTE_ADDR

Monday, August 25, 2014

In this post, we describe an application that does a time series inverted search on computer logs that are described as raw text, timestamp, host, source and source type. We call these events and we keep them in a table. The events are stored in a table in a SQL database with a clustered index on the time-stamp. The application is a .Net Web API application that connects to the table over entity framework or any object-relational mapping library. The Web API application exposes an API to post the events into the server and an API to do search over the events. When the events are posted into the table, they are sorted by their time. When the search is executed by the table, the time range is used to select the events and returned to the user as search results. Code is being written at http://github.com/ravibeta/csharpexamples/SplunkLite.Net