Cluster computing: Azure Congitive Search

This article is a continuation of the series of articles starting with the description of SignalR service. In this article, we begin to discuss Azure cognitive search service after the last article on Azure Stream Analytics. We had also discussed Azure Kubernetes service that provides a Kubernetes cluster in a serverless environment hosted in the cloud. The workload can be run in the cloud at the edge or as a hybrid with support for running .NET applications on Windows Server containers. Java applications on Linux containers or microservice applications in different languages and environments. Essentially, this provisions a datacenter on top of Azure stack HCI. The hyper-converged infrastructure is an Azure service that provides security performance and feature updates and allows the data center to be extended to the cloud. When the AKS is deployed on a Windows server 2019 data center the cluster is local to the Windows Server but when it is deployed to the Azure stack HCI it can scale seamlessly because the HCI is hosted on its set of clusters. We also reviewed Azure Stream analytics service that provides a scalable approach to analyzing streams with its notion of jobs and clusters. Let us review the Azure Cognitive search service for it search capabilities.

Azure Cognitive Search differs from the Do-It-Yourself techniques in that it is a fully managed search-as-a-service but it is primarily a full-text search. It provides rich user experience with searching all types of content including vision, language and speech. It provides machine learning features to contextually rank search results. It is powered by deep learning models. It can extract and enrich content using AI-powered algorithms. Different content can be consolidated to build a single index.

The Fulltext search query is based on Lucene functionality that has been customized with extensions and lock downs to enable core scenarios. There are four stages to the query execution involving query parsing, lexical analysis, document matching, and scoring. When the query text comes in, the Query Parser must separate query terms from the query operators and create the query tree to be sent to the search engine. The separated terms are sent to the analyzers which must perform stemming, canonicalization and removals to efficiently utilize the terms. The analyzed terms are sent back to the parser. The terms proceed to the search engine that must store and organize searchable terms extracted from indexed documents. This index lives separately from the document, and it is easy to regenerate it offline from query execution. Finally, the search engine scores and retrieves the contents of the inverted index to display the top matches. A sample program to illustrate this example is included here.

The REST API for Azure Cognitive Search takes a payload with properties such as “search”, “searchFields”, “searchMode”, “filter”, “order by”, and “queryType”. The query is broken down into three sub-queries involving a term query, a phrase query and a prefix query. The search terms can include wild-cards for matching several terms say as prefix. The search engine scans the fields specified in the searchFields property for documents that match one or more of the search terms. The resulting sets are ordered and it is easy to specify geography data type based queries for proximity basis to sorting the results.

Cluster computing

Wednesday, August 11, 2021

Azure Congitive Search

No comments:

Post a Comment