Thursday, April 22, 2021

Utilizing public cloud infrastructure for a text summarization service:

Introduction: We have an existing text summarization service built on a word embeddings model. Its API serves a portal where users can enter text directly or upload files. This article investigates migrating that web service to the public cloud using the cloud provider's managed services.

Description: Migrating such a service to public cloud services takes some planning. First, the deployment has to move from the existing legacy virtual machine to one that is ML friendly and has a GPU, such as “Standard_NC6”. There is a BERT notebook that allows the selection of a GPU suitable for NLP processing. Second, a training dataset is required if we plan to use BERT. Other models are available for the NLP service, such as the PMI model, but it is typical to pick one and use it with the web service. BERT mainly helps with transfer learning, where knowledge gained from earlier training is applied to novel cases not encountered during training.

Azure offers three kinds of NLP services:

Azure HDInsight with Spark and Spark MLlib

Azure Databricks

Microsoft Cognitive Services

Since we want a prebuilt model for our web service, we can use Microsoft Cognitive Services. If we were to create a custom model, we would use Azure HDInsight with Spark MLlib and Spark NLP, which also provide low-level tokenization, stemming, lemmatization, TF-IDF and sentence-boundary detection. Cognitive Services do not support large documents or big data sets.

Cognitive Services provide the following APIs:

Linguistic Analysis API for low-level NLP processing such as tokenization and part-of-speech tagging

Language Understanding Intelligent Service (LUIS) API for entity and intent identification and extraction

Text Analytics API for topic detection, sentiment analysis and language detection

Bing Spell Check API for spell checking

Of these, we only need the Text Analytics API: it extracts key phrases, and if we rank them we can perform text summarization. The ranking may not match the one produced by word embeddings from a softmax classifier, but we no longer have to calculate similarity distances between key phrases. Instead, we let key phrase extraction give us the terms and then extract the sentences that contain those phrases.

There is no text summarization web service API in Azure, but an Agolo service in the Azure Marketplace provides NLP summarization for enterprises; Agolo summarizes news feeds. Azure Databricks does not have an out-of-the-box NLP service but provides the infrastructure to create one. The Mphasis DeepInsights Text Summarizer on the AWS Marketplace summarizes text snippets of 512 words in three sentences of approximately 30 words.
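The key phrase approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service's actual implementation: the function names `split_sentences` and `summarize` are hypothetical, the sentence splitter is deliberately naive, and a sentence's rank is simply the number of distinct key phrases it contains. The key phrases are assumed to have been returned already by the key phrase extraction call.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, key_phrases, max_sentences=3):
    """Return up to max_sentences sentences containing the most key phrases.

    The selected sentences are returned in their original document order.
    """
    phrases = [p.lower() for p in key_phrases]
    scored = []
    for index, sentence in enumerate(split_sentences(text)):
        lowered = sentence.lower()
        # Score = number of distinct key phrases found in the sentence.
        score = sum(1 for p in phrases if p in lowered)
        if score > 0:
            scored.append((score, index, sentence))
    # Highest score first; earlier sentences win ties.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Restore document order so the summary reads naturally.
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]
```

Calling `summarize(text, key_phrases, 2)` keeps the two sentences that mention the most key phrases. A weighted score, for example one that favors longer phrases, could replace the simple count if the ranking turns out to be too coarse.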

A request to the key phrase extraction endpoint looks like this:

curl -v -X POST "https://westus2.api.cognitive.microsoft.com/text/analytics/v3.0/keyPhrases?model-version={string}&showStats={boolean}" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: {subscription key}" \
  --data-ascii "{body}"
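The same call can be made from the web service itself. The sketch below mirrors the curl command using only the Python standard library. It assumes the v3.0 request body is a `documents` array of `id`, `language` and `text` entries and that the response carries a `keyPhrases` list per document; the helper names are hypothetical, and the optional `model-version` and `showStats` query parameters are left out.

```python
import json
import urllib.request

# Assumption: same region and API version as the curl command above.
ENDPOINT = "https://westus2.api.cognitive.microsoft.com/text/analytics/v3.0/keyPhrases"


def build_key_phrase_request(text, subscription_key, language="en"):
    """Build (but do not send) the POST request for a single document."""
    # Assumed body shape: one entry per document in a "documents" array.
    body = {"documents": [{"id": "1", "language": language, "text": text}]}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


def parse_key_phrases(response_json):
    """Collect the key phrases of every document in a response body."""
    payload = json.loads(response_json)
    return [
        phrase
        for doc in payload.get("documents", [])
        for phrase in doc.get("keyPhrases", [])
    ]


def extract_key_phrases(text, subscription_key):
    """Send the request and return the key phrases (needs a valid key and network access)."""
    request = build_key_phrase_request(text, subscription_key)
    with urllib.request.urlopen(request) as response:
        return parse_key_phrases(response.read().decode("utf-8"))
```

Keeping the request construction and the response parsing separate from the network call makes both easy to check without a subscription key.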

