Tuesday, February 26, 2013

FullText revisited

Let's take a look at full text search again. Full text search is about indexing and searching, and it is executed on a single document or on a full text database. It differs from searches based on metadata or on parts of the original texts represented in the database because it tries to match all of the words in all the documents against the pattern given by the user. It builds a concordance of all the words ever encountered and then executes the search on this catalog. The catalog is refreshed in a background task. Words may be stemmed or filtered before they are pushed into the database. Because every word is matched, a search can return many false positives, that is, results that are returned but are not relevant to the intended search. Clustering techniques based on Bayesian algorithms can help reduce false positives.
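As a rough illustration of the concordance idea, here is a minimal sketch of an inverted index in Python. Everything in it is an assumption made for the example: the stop word list, the naive suffix-stripping `normalize` function (a stand-in for a real stemmer), and the sample documents. A real full text engine does far more than this.

```python
import re
from collections import defaultdict

# Hypothetical stop word list; real engines ship much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "on", "of", "and", "in"}

def normalize(word):
    # Naive suffix stripping, standing in for a real stemmer.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    # Lowercase, split into alphanumeric words, filter stop words, then stem.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [normalize(w) for w in words if w not in STOP_WORDS]

def build_index(documents):
    # Map every term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    # Return the ids of documents that contain every term of the query.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

documents = {
    1: "Full text search builds an index of words",
    2: "Searching metadata is faster than searching text",
    3: "The catalog is refreshed in a background task",
}
index = build_index(documents)
print(sorted(search(index, "searching text")))  # [1, 2]
```

Note that document 2 is returned even though it is about metadata search, which is exactly the kind of false positive mentioned above: the words match, but the document may not be what the user intended.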
When results are clustered into categories, a search term can be placed in one or more of them depending on the occurrences of words relevant to each category. Two metrics are used to describe search results: precision and recall. Recall is the fraction of all the relevant documents that the search actually returns, and precision is the fraction of the returned results that are relevant. Some tools aim to improve querying so as to improve the relevancy of the results. These tools utilize different forms of search such as keyword search, field-restricted search, boolean queries, phrase search, concept search, concordance search, proximity search, regular expression search, fuzzy search and wildcard search.
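To make the two metrics concrete, here is a small sketch that computes them from sets of document ids. The function name `precision_recall` and the sample ids are made up for this example.

```python
def precision_recall(retrieved, relevant):
    # retrieved: ids returned by the search
    # relevant:  ids that are actually relevant to the query
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    # Precision: share of the returned results that are relevant.
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were returned.
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {1, 2, 3, 4}   # hypothetical search output
relevant = {2, 4, 5}       # hypothetical ground truth
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four returned documents are relevant, so precision is 0.5, and two of the three relevant documents were found, so recall is 2/3. The two false positives (documents 1 and 3) lower precision, while the missed document 5 lowers recall.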
