A previous article covered only extractive summarization with the Azure REST APIs. This one is about producing summaries for entire manuscripts:
import requests
import json
import time
# Azure AI Language Service configuration
endpoint = "https://<your-azure-ai-resource-name>.cognitiveservices.azure.com/language/analyze-text/jobs?api-version=2023-04-01"
api_key = "<your-api-key>"
headers = {
    "Content-Type": "application/json",
    "Ocp-Apim-Subscription-Key": api_key
}
def summarize_text(text):
    # Submit an asynchronous summarization job and return the URL to poll for its result
    body = {
        "displayName": "Document Summarization",
        "analysisInput": {
            "documents": [
                {
                    "id": "1",
                    "language": "en",
                    "text": text
                }
            ]
        },
        "tasks": [
            {
                "kind": "ExtractiveSummarization",
                "parameters": {
                    "sentenceCount": 5
                }
            }
        ]
    }
    response = requests.post(endpoint, headers=headers, json=body)
    if response.status_code == 202:
        operation_location = response.headers["Operation-Location"]
        return operation_location
    else:
        raise Exception(f"Failed to start summarization job: {response.text}")
def get_summary_result(operation_location):
    # Poll the job until it succeeds or fails, then join the extracted sentences
    while True:
        response = requests.get(operation_location, headers=headers)
        result = json.loads(response.text)
        if result["status"] == "succeeded":
            summary = result["tasks"]["items"][0]["results"]["documents"][0]["sentences"]
            return " ".join([sentence["text"] for sentence in summary])
        elif result["status"] == "failed":
            raise Exception(f"Summarization job failed: {result}")
        time.sleep(5)  # Wait for 5 seconds before checking again
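def get_text(file_path):
    # Read as UTF-8 explicitly so apostrophes and other non-ASCII characters are not garbled
    with open(file_path, 'r', encoding='utf-8') as file:
        file_contents = file.read()
    return file_contents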
# Main execution
if __name__ == "__main__":
    text_file_path = "1.txt"
    # Read the manuscript text from the plain-text file
    document_text = get_text(text_file_path)
    # Start summarization job
    operation_location = summarize_text(document_text)
    print(operation_location)
    # Get summary result
    summary = get_summary_result(operation_location)
    print("Summary:")
    print(summary)
Sample Output:
"""
https://text-ctl-3.cognitiveservices.azure.com/language/analyze-text/jobs/9afb7002-7930-4448-8bd3-e3cb02287708?api-version=2023-04-01
Summary:
The public cloud offers capabilities to the general public in the form of services from the provider's services portfolio that can be requested as instances called resources. Both for the provider and the general public, IaC is a common paradigm for self-service templates to manage, capture and track changes to a resource during its lifecycle. Public cloud is the epitome of infrastructure both in terms of history and landscape and this book describes principles using references to public cloud. The IaC architecture is almost always dominated by the choice of technology stacks. When the code is delivered, configuration management and infrastructure management provide a live operational environment for testing.
"""
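The polling loop shown earlier keeps running if the job never reaches a terminal status. A minimal sketch of bounded polling follows. The name poll_until_done, its parameters, and the fetch_status callable are hypothetical and introduced only for illustration, and treating "cancelled" as a terminal status is an assumption about the service rather than something taken from the script above.

```python
import time


def poll_until_done(fetch_status, interval_seconds=5.0, timeout_seconds=300.0, sleep=time.sleep):
    """Call fetch_status() until the job succeeds, fails, or the timeout elapses.

    fetch_status is a hypothetical callable that returns the parsed JSON
    (a dict with a "status" key) of the job status URL.
    """
    waited = 0.0
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        # Assumption: "failed" and "cancelled" are the terminal error statuses.
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"Summarization job ended with status {status!r}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still {status!r} after {waited:.0f} seconds")
        sleep(interval_seconds)
        waited += interval_seconds
```

With the script above, fetch_status could be something like lambda: requests.get(operation_location, headers=headers).json(), and the returned result can then be parsed as before.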
The text of a large Word document can be extracted into a plain-text file as shown:
from docx import Document
input_file = 'Document1.docx'
output_file = 'Text1.txt'
def process_large_file(input_file_path, output_file_path):
    try:
        doc = Document(input_file_path)
        print(f"Number of paragraphs: {len(doc.paragraphs)}")
        # Append mode: remove any existing output file before re-running
        with open(output_file_path, 'a', encoding='utf-8') as out:
            for para in doc.paragraphs:
                chunk = para.text
                # Skip empty paragraphs
                if chunk:
                    out.write(chunk)
                    # Text mode translates "\n" to the platform line ending
                    out.write("\n")
    except Exception as e:
        print(f"An error occurred: {e}")
process_large_file(input_file, output_file)
print(f"Text has been extracted from {input_file} and written to {output_file}")
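The service limits how much text a single document in a request may contain, so the extracted text of a whole manuscript may need to be divided before it is submitted. A minimal sketch that groups paragraphs into chunks follows. The function name split_into_chunks and the default max_chars value are assumptions for illustration, not the service's documented limit, which should be checked for the resource tier in use.

```python
def split_into_chunks(text, max_chars=100_000):
    """Group newline-separated paragraphs into chunks of at most max_chars characters."""
    chunks = []
    current = ""
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any single paragraph that is longer than the limit.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = paragraph if not current else current + "\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The paragraph does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to summarize_text in turn and the per-chunk summaries joined, or summarized once more to get a single summary for the manuscript.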
Instead of the "ExtractiveSummarization" value for "kind" in the request, we can use "AbstractiveSummarization". In that case, the parsing of the job result must also change, because the text is returned under "summaries" rather than "sentences":
def get_summary_result(operation_location):
    while True:
        response = requests.get(operation_location, headers=headers)
        result = json.loads(response.text)
        if result["status"] == "succeeded":
            print(repr(result))
            summary = result["tasks"]["items"][0]["results"]["documents"][0]["summaries"]
            return " ".join([sentence["text"] for sentence in summary])
        elif result["status"] == "failed":
            raise Exception(f"Summarization job failed: {result}")
        time.sleep(5)  # Wait for 5 seconds before checking again
A sample output follows:
https://text-ctl-3.cognitiveservices.azure.com/language/analyze-text/jobs/3f246bed-ebfb-4b2b-bcc3-e40582b800d1?api-version=2023-04-01
Summary:
The document discusses the architecture of Infrastructure-as-Code (IaC) within public clouds, highlighting its tiered implementation that includes IaaS, PaaS, and DevOps tools. It emphasizes the role of IaC in managing resources through code, facilitating quick and consistent provisioning, and addressing changes throughout a resource's lifecycle. The architecture is heavily influenced by the choice of technology stacks, with tools like Ansible, Terraform, and Pulumi being prominent choices. The document notes the benefits of IaC in reducing shadow IT, integrating with CI/CD platforms, and standardizing infrastructure across environments, whether cloud-based or on-premises. It distinguishes between configuration management tools, such as CFEngine, and infrastructure management tools, like Terraform and Pulumi, which can be mixed and matched to meet specific organizational needs. The summary encapsulates the essence of IaC's role in modern cloud environments, its impact on DevOps, and its capacity to manage complex infrastructures effectively.
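Since the two kinds of summarization differ only in the key that holds the text, one parser can serve both. A minimal sketch follows, assuming the result has the shape used in the two versions of get_summary_result above. The name extract_summary is hypothetical.

```python
def extract_summary(result):
    """Return the summary text from a succeeded job result of either kind."""
    document = result["tasks"]["items"][0]["results"]["documents"][0]
    # Extractive results carry "sentences"; abstractive results carry "summaries".
    parts = document.get("sentences") or document.get("summaries") or []
    return " ".join(part["text"] for part in parts)
```

Calling extract_summary(result) once the status is "succeeded" would replace the two kind-specific lines in either version of the polling function.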