Saturday, August 14, 2021

 Azure Cognitive Search  

This article is a continuation of the series of articles that began with a description of the SignalR service. Following the previous article on Azure Stream Analytics, we begin a discussion of Azure Cognitive Search, formerly known as Azure Search.

 

Azure Cognitive Search differs from the Do-It-Yourself techniques discussed earlier in this series in that it is a fully managed search-as-a-service, though it is primarily a full-text search engine. It provides a rich search experience over all types of content, including vision, language, and speech. It provides machine learning features, powered by deep learning models, to contextually rank search results, and it can extract and enrich content using AI-powered algorithms. Content from different sources can be consolidated into a single index.

 

The search service primarily supports indexing and querying. Indexing is associated with the input data path to the search service. It processes the content and converts it into JSON documents. If the content includes mixed files, searchable text can be extracted from those files. Heterogeneous content can be consolidated into a private, user-defined search index. Large amounts of data stored in external repositories, including Blob storage, Cosmos DB, and other storage, can be indexed. The index can be protected against data loss, corruption, and disasters via the same mechanisms that are used for the content. The index is also independent of the service, so if one service goes down, another can read the same index.
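
Content can be pushed into an index directly as JSON documents, or pulled from an external repository by an indexer. The sketch below shows the push model using the Python SDK for the service (azure-search-documents); the endpoint, key, index name, and fields are placeholders, not anything the service prescribes.

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient
    from azure.search.documents.indexes import SearchIndexClient
    from azure.search.documents.indexes.models import (
        SearchIndex, SimpleField, SearchableField, SearchFieldDataType
    )

    endpoint = "https://<service-name>.search.windows.net"  # placeholder
    credential = AzureKeyCredential("<admin-api-key>")       # placeholder

    # An index is a private, user-defined schema of typed fields.
    index = SearchIndex(
        name="hotels",
        fields=[
            SimpleField(name="hotelId", type=SearchFieldDataType.String, key=True),
            SearchableField(name="description", type=SearchFieldDataType.String),
        ],
    )
    SearchIndexClient(endpoint, credential).create_index(index)

    # Heterogeneous content is consolidated by pushing JSON documents
    # that match the schema.
    search_client = SearchClient(endpoint, "hotels", credential)
    search_client.upload_documents(documents=[
        {"hotelId": "1", "description": "Quiet hotel near the convention center."},
    ])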

 

We evaluate the features of Azure Cognitive Search next.

 

The indexing features of Azure Cognitive Search include a full-text search engine, persistent storage of search indexes, integrated AI, and APIs and tools. The data sources for indexing can be arbitrary as long as the data is transferred in the form of JSON documents. Indexers automate data transfer from these data sources and turn it into searchable content in the primary storage. Connectors help the indexers with the data transfer and are specific to data sources such as Azure SQL Database, Cosmos DB, or Azure Blob storage. Complex data types and collections allow us to model any JSON data structure within a search index; the use of collections and complex types helps with one-to-many and many-to-many mappings, as the sketch below shows. Analyzers can be used for linguistic analysis of the data ingested into the indexes.
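
As a minimal sketch of how collections and complex fields mirror a nested JSON shape, again using the Python SDK; the schema below is hypothetical:

    from azure.search.documents.indexes.models import (
        SearchIndex, SimpleField, SearchableField, ComplexField,
        SearchFieldDataType
    )

    index = SearchIndex(
        name="hotels",
        fields=[
            SimpleField(name="hotelId", type=SearchFieldDataType.String, key=True),
            # A collection field captures a one-to-many relationship.
            SearchableField(
                name="tags",
                collection=True,
                type=SearchFieldDataType.Collection(SearchFieldDataType.String),
            ),
            # A complex field nests sub-fields, mirroring a JSON object.
            ComplexField(name="address", fields=[
                SearchableField(name="city", type=SearchFieldDataType.String),
                SimpleField(name="postalCode", type=SearchFieldDataType.String),
            ]),
            # A collection of complex types models many-to-many shapes.
            ComplexField(name="rooms", collection=True, fields=[
                SearchableField(name="description",
                                type=SearchFieldDataType.String,
                                analyzer_name="en.lucene"),
            ]),
        ],
    )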

 

The standard Lucene analyzer is used by default, but it can be overridden with a language analyzer, a custom analyzer, or one of many predefined analyzers that produce the tokens used for search. AI processing for image and text analysis can be applied to an indexing pipeline at the time of extracting text information; some examples of built-in skills include optical character recognition and key phrase extraction. The pipeline can also be integrated with Azure Machine Learning authored skills.
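
A hedged sketch of attaching these built-in skills to a skillset with the Python SDK; the skillset name is a placeholder, and the context and input/output paths shown are the typical defaults for these skills rather than anything taken from this article:

    from azure.search.documents.indexes.models import (
        SearchIndexerSkillset, OcrSkill, KeyPhraseExtractionSkill,
        InputFieldMappingEntry, OutputFieldMappingEntry
    )

    skillset = SearchIndexerSkillset(
        name="content-skillset",
        description="OCR on images, then key phrases from the text",
        skills=[
            # Optical character recognition over images cracked from documents.
            OcrSkill(
                context="/document/normalized_images/*",
                inputs=[InputFieldMappingEntry(
                    name="image", source="/document/normalized_images/*")],
                outputs=[OutputFieldMappingEntry(
                    name="text", target_name="ocrText")],
            ),
            # Key phrase extraction over the document's text content.
            KeyPhraseExtractionSkill(
                context="/document",
                inputs=[InputFieldMappingEntry(
                    name="text", source="/document/content")],
                outputs=[OutputFieldMappingEntry(
                    name="keyPhrases", target_name="keyPhrases")],
            ),
        ],
    )
    # The skillset is then created via SearchIndexerClient.create_skillset
    # and referenced by name from an indexer.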

 

The indexing pipeline can also generate a knowledge store. Instead of sending tokenized terms to an index, it can enrich documents and send them to a knowledge store. This store could be native to Azure in the form of Blob storage or Table storage. The purpose of the knowledge store is to support downstream analysis and processing. With the availability of a separate knowledge store, analysis and reporting stacks can be decoupled from the indexing pipeline.
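
A sketch of what such a projection might look like with the Python SDK; the container name and connection string are placeholders, and the projection shape is illustrative:

    from azure.search.documents.indexes.models import (
        SearchIndexerKnowledgeStore, SearchIndexerKnowledgeStoreProjection,
        SearchIndexerKnowledgeStoreObjectProjectionSelector
    )

    knowledge_store = SearchIndexerKnowledgeStore(
        storage_connection_string="<storage-connection-string>",  # placeholder
        projections=[
            SearchIndexerKnowledgeStoreProjection(
                objects=[
                    # Writes each enriched document as a JSON blob, so
                    # analysis and reporting can run outside the pipeline.
                    SearchIndexerKnowledgeStoreObjectProjectionSelector(
                        storage_container="enriched-docs",
                        source="/document",
                    ),
                ],
            ),
        ],
    )
    # The knowledge store is attached to the skillset definition:
    # SearchIndexerSkillset(..., knowledge_store=knowledge_store)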

Another feature of the indexing pipeline is cached content. Caching limits reprocessing to just the documents that are affected by specific edits to the pipeline; everything else is read back from the cache.
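
The cache is declared on the indexer definition. A sketch of that REST payload, expressed here as a Python dict and sent with the requests library; the indexer, data source, and skillset names, and the preview api-version, are assumptions:

    import requests

    indexer_definition = {
        "name": "content-indexer",
        "dataSourceName": "content-datasource",
        "targetIndexName": "hotels",
        "skillsetName": "content-skillset",
        # The cache persists enrichment output in Blob storage; after an
        # edit to the pipeline, only the affected documents are
        # reprocessed and the rest is served from the cache.
        "cache": {
            "storageConnectionString": "<storage-connection-string>",
            "enableReprocessing": True,
        },
    }
    requests.put(
        "https://<service-name>.search.windows.net/indexers/content-indexer",
        params={"api-version": "2021-04-30-Preview"},
        headers={"api-key": "<admin-api-key>",
                 "Content-Type": "application/json"},
        json=indexer_definition,
    )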

The query pipeline also has several features to enhance the analysis from the Lucene search store. These include free-form text search, relevance tuning, and geo-search. Free-form text search is the primary use case for queries. The simple syntax includes logical operators, phrase operators, suffix operators, precedence operators, and others. Extensions to this search include proximity search, term boosting, and regular expressions. Simple scoring is a key benefit of this search: a set of scoring rules is used to model relevance for the documents, and these rule sets can be built using tags for personalized scoring based on customer search preferences.
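
A sketch of these query features through the Python SDK; the scoring profile name and its tag parameter are hypothetical and would have to be defined on the index first:

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient("https://<service-name>.search.windows.net",
                          "hotels", AzureKeyCredential("<query-api-key>"))

    # Simple syntax: + requires a term, - excludes one, "" groups a phrase.
    results = client.search(
        search_text='"free wifi" +pool -motel',
        query_type="simple",   # "full" unlocks the extended Lucene syntax
        search_mode="all",     # all terms must match, not just any
        top=10,
    )

    # Full Lucene syntax adds proximity search, term boosting, and
    # regular expressions, e.g. '"quiet hotel"~3' or 'beach^2 view'.
    proximity = client.search(search_text='"quiet hotel"~3', query_type="full")

    # A scoring profile defined on the index can bias relevance, for
    # example boosting documents whose tags match user preferences.
    personalized = client.search(
        search_text="pool",
        scoring_profile="personalized",              # hypothetical profile
        scoring_parameters=["favorites-wifi,pool"],  # hypothetical tag boost
    )
    for doc in personalized:
        print(doc["hotelId"], doc.get("@search.score"))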
