This article continues the series that began with a description of the SignalR service. After the previous article on Azure Stream Analytics, we now turn to the Azure Cognitive Search service. Earlier, we also discussed Azure Kubernetes Service (AKS), which provides a Kubernetes cluster in a serverless environment hosted in the cloud. Workloads can run in the cloud, at the edge, or as a hybrid, with support for .NET applications on Windows Server containers, Java applications on Linux containers, and microservice applications in different languages and environments. Essentially, this provisions a datacenter on top of Azure Stack HCI. The hyper-converged infrastructure is an Azure service that provides security, performance, and feature updates and allows the datacenter to be extended to the cloud. When AKS is deployed on a Windows Server 2019 datacenter, the cluster is local to the Windows Server, but when it is deployed to Azure Stack HCI it can scale seamlessly because the HCI is hosted on its own set of clusters. We also reviewed the Azure Stream Analytics service, which provides a scalable approach to analyzing streams with its notion of jobs and clusters. Let us now review the Azure Cognitive Search service for its search capabilities.
Azure Cognitive Search differs from do-it-yourself techniques in that it is a
fully managed search-as-a-service, although it is primarily a full-text search
engine. It provides a rich user experience for searching all types of content,
including vision, language and speech. It provides machine learning features,
powered by deep learning models, to rank search results contextually. It can
extract and enrich content using AI-powered algorithms. Different kinds of
content can be consolidated to build a single index.
The full-text search query is based on Lucene
functionality that has been customized with extensions and lock-downs to enable
core scenarios. Query execution has four stages: query parsing, lexical
analysis, document matching, and scoring. When the query text comes in, the
query parser separates the query terms from the query operators and creates the
query tree to be sent to the search engine. The separated terms are sent to the
analyzers, which perform stemming, canonicalization and removal of
non-essential words so that the terms can be used efficiently. The analyzed
terms are sent back to the parser. The terms then proceed to the search engine,
which stores and organizes the searchable terms extracted from indexed
documents in an inverted index. This index lives separately from the documents,
and it is easy to regenerate it offline, independently of query execution.
Finally, the search engine retrieves the matching entries from the inverted
index and scores them to display the top matches. A sample program to
illustrate this is included here.
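Separately from the sample program mentioned above, the four stages can be pictured with a toy sketch in Python. This is not how Azure Cognitive Search or Lucene is implemented: the stop-word list, the crude suffix-stripping `stem` function, the `+` operator handling and the TF-IDF-style score are all simplifying assumptions chosen only to make each stage visible.

```python
import math
import re
from collections import defaultdict

# Hypothetical stop-word list; real analyzers use language-specific lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "with"}

def stem(token):
    # Crude stand-in for a real stemmer: strip a plural "s" only.
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "us")):
        return token[:-1]
    return token

def analyze(text):
    """Lexical analysis: lowercase, tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def parse(query):
    """Query parsing: separate the '+' (required) operator from the terms."""
    required, optional = [], []
    for raw in query.split():
        (required if raw.startswith("+") else optional).append(raw.lstrip("+"))
    return required, optional

def search(index, n_docs, query, top=3):
    required, optional = parse(query)
    req_terms = [t for r in required for t in analyze(r)]
    opt_terms = [t for o in optional for t in analyze(o)]

    # Document matching and scoring: sum tf * idf over matching terms.
    scores = defaultdict(float)
    for term in req_terms + opt_terms:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf

    # Keep only documents that contain every required term.
    matches = {
        d: s for d, s in scores.items()
        if all(d in index.get(t, {}) for t in req_terms)
    }
    # Highest score first; ties broken by document id for determinism.
    ranked = sorted(matches.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top]]

docs = {
    1: "Spacious rooms with ocean view",
    2: "Budget hotel near the airport",
    3: "Ocean front resort with spacious suites and a pool",
}
index = build_index(docs)
print(search(index, len(docs), "spacious +ocean"))  # [1, 3]
print(search(index, len(docs), "+pool ocean"))      # [3]
```

Because the index is built by `build_index` from the documents alone, it can be regenerated at any time without involving `search`, which mirrors the separation between indexing and query execution described above.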
The REST API for Azure Cognitive Search takes a payload
with properties such as “search”, “searchFields”, “searchMode”, “filter”,
“orderby”, and “queryType”. The query is broken down into three kinds of
sub-queries: a term query, a phrase query and a prefix query. The search terms
can include wildcards to match several terms at once, for example by prefix.
The search engine scans the fields specified in the searchFields property for
documents that match one or more of the search terms. The result set is
ordered, and fields of a geography data type make it easy to sort the results
by proximity.
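As a hedged sketch of such a payload, the following Python builds (but does not send) a search request using only the standard library. The service name, index name, key, and the field names `description`, `title`, `price` and `location` are hypothetical placeholders, and the `api-version` value is an assumption that should be checked against the version your service supports. The search string combines a term (`Spacious`), a prefix (`air-condition*`) and a required phrase (`"Ocean view"`).

```python
import json
import urllib.request

def build_search_request(service, index_name, api_key, payload,
                         api_version="2020-06-30"):
    """Build a POST request for the Search Documents endpoint (not sent here)."""
    url = (f"https://{service}.search.windows.net/indexes/{index_name}"
           f"/docs/search?api-version={api_version}")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )

# Hypothetical index with description, title, price and location fields.
payload = {
    "search": 'Spacious, air-condition* +"Ocean view"',
    "searchFields": "description, title",  # fields scanned for matches
    "searchMode": "any",                   # match any of the terms
    "filter": "price ge 60 and price lt 300",
    # Sort by distance from a point, using a geography-typed field.
    "orderby": "geo.distance(location, geography'POINT(-159.476235 22.227659)')",
    "queryType": "full",                   # full Lucene query syntax
}

request = build_search_request("my-service", "hotels", "<api-key>", payload)
print(request.get_method(), request.full_url)
# To execute it against a real service: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect and log before any call is made to the service.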