Friday, September 20, 2013

Caching application block is another application block specifically designed for this purpose. Developers often choose Libraries such as this along with  Lucene.Net or external solutions as AppFabric. While AppFabric is a caching framework serving across all API based on URI hashing and request and response caching, developers may look for caching within their library implementation. Lucene.net is more an inverted index that helps with search and storage.
By the way, Lucene can be used to build a source index server for all your source code.
Here we build an index and run the queries against it. Lucene.Net has an IndexWriter class that is used to make this index. You need an Analyzer object to instantiate this IndexWriter.  Analyzers are based on language and you can also choose different tokenizers. For now, a SimpleAnalyzer could do. The input to the index is a set of documents. Documents are comprised of fields and fields are key value pairs. You can instantiate a SimpleFileIndexer and index the files in a directory. A Directory is a flat list of files. Directories are locked with a LockFactory such as a SingleInstanceLockFactory. Indices are stored in a single file.
I want to mention that Lucene comes with a Store to handle all the persistence. This store also lets us create  RAM-based indices.  A RAMDirectory is a memory resident Directory implementation.
Other than that it supports all operations as a regular directory in terms of listing the files, checking for the existence of a file, displaying the size in bytes, opening a file and reading bytes from a file or writing bytes to the file as input and output streams.
Lucene.Net's implementation of a Cache supports algorithms such as LRU or a HashMap to store key-value pairs. If you want a synchronizedCache you will have to explicitly mention so. The caches supports simple operations such as Get and Put.
Searching and storage are facilitated by Lucene.Net but you can also use such things as a SpellChecker  and Regex. QueryStrings can be defined as a series of clauses. A Clause may be prefixed by a plus or minus sign indicating whether it is to be included or excluded, or a term following a colon indicating the field to be searched such as with multiple terms.  A clause may either be a term indicating all the documents this term or a nested query, enclosed in paranthesis. If a range query is used, the QueryParser tries to detect ranges such as date values for a query involving date fields. The QueryParser is not threadsafe. However, you could also customize the QueryParser by deriving from the default and this can be made modular. The only caveat is that the .Net library based indexer does not have the enumerateFilesOrDirectories as the Java library does.

No comments:

Post a Comment