Fulltext Search versus NLP in databases
Many databases in the 1990s started providing full text search techniques. It was a text retrieval technique that relied on searching all of the words in every document stored in a specific full text database.Database engines and web search engines started providing this capability so that search did not just cover the metadata and the selected sections but also the entire content. Since it was difficult to scan the entire content of a file one by one, it was offloaded to its own process separate from the OLTP to create a concordance. Stop words and stemming helped to keep the index relevant. Querying also improved significantly with fulltext and new forms of queries were made available. Still FullText did not really make use of clustering or page ranking techniques.
The difference in techniques between fulltext and nlp rely largely on the data structures used. While one uses inverted document lists and indexes, the other is making use of word vectors. The latter can be used to create similarity graphs and used with inference techniques. The word vectors can also be used with a variety of data mining techniques. With the availability of managed service databases in the public clouds, we are now empowered to creating nlp databases in the cloud.
Discussion to be continued ...
A text summarization attempt: http://scratchpad.io/ scintillating-spot-7410
API
UI
#codingexercise
Given a rectangular matrix with rowwise and columnwise sorted distinct integers, find a given integer
Tuple<int,int> GetPosition(int[,] A, int Rows, int Cols, int X)
{
var ret = new Tuple<int, int>(){-1,-1};
// x, y coordinates
for (int j =0; j < Cols; j++)
{
int index = binary_search(A, rows, j, X);
if ( index != -1)
{
ret.first = index;
ret.second = j;
break;
}
}
return ret;
}
Sorting nested dictionary : http://ideone.com/SlHBib
Many databases in the 1990s started providing full text search techniques. It was a text retrieval technique that relied on searching all of the words in every document stored in a specific full text database.Database engines and web search engines started providing this capability so that search did not just cover the metadata and the selected sections but also the entire content. Since it was difficult to scan the entire content of a file one by one, it was offloaded to its own process separate from the OLTP to create a concordance. Stop words and stemming helped to keep the index relevant. Querying also improved significantly with fulltext and new forms of queries were made available. Still FullText did not really make use of clustering or page ranking techniques.
The difference in techniques between fulltext and nlp rely largely on the data structures used. While one uses inverted document lists and indexes, the other is making use of word vectors. The latter can be used to create similarity graphs and used with inference techniques. The word vectors can also be used with a variety of data mining techniques. With the availability of managed service databases in the public clouds, we are now empowered to creating nlp databases in the cloud.
Discussion to be continued ...
A text summarization attempt: http://scratchpad.io/
API
UI
#codingexercise
Given a rectangular matrix with rowwise and columnwise sorted distinct integers, find a given integer
Tuple<int,int> GetPosition(int[,] A, int Rows, int Cols, int X)
{
var ret = new Tuple<int, int>(){-1,-1};
// x, y coordinates
for (int j =0; j < Cols; j++)
{
int index = binary_search(A, rows, j, X);
if ( index != -1)
{
ret.first = index;
ret.second = j;
break;
}
}
return ret;
}
Sorting nested dictionary : http://ideone.com/SlHBib
No comments:
Post a Comment