Thursday, April 10, 2025

 The following script can be used to covert the manuscript of a book into its corresponding audio production.

Option 1: individual chapters

import azure.cognitiveservices.speech as speechsdk

import time

def batch_text_to_speech(text, output_filename):

      # Azure Speech Service configuration

      speech_key = "<use-your-speech-key>"

      service_region = "eastus"

# Configure speech synthesis

speech_config = speechsdk.SpeechConfig(

     subscription=speech_key,

     region=service_region

)

# Set output format to MP3

                          speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio48Khz192KBitRateMonoMp3)

speech_config.speech_synthesis_voice_name = "en-US-BrianMultilingualNeural"

# Create audio config for file output

audio_config = speechsdk.audio.AudioOutputConfig(filename=output_filename)

# Create speech synthesizer

synthesizer = speechsdk.SpeechSynthesizer(

    speech_config=speech_config,

    audio_config=audio_config

)

# Split text into chunks if needed (optional)

# text_chunks = split_large_text(text)

# Synthesize text

result = synthesizer.speak_text_async(text).get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:

    print(f"Audio synthesized to {output_filename}")

elif result.reason == speechsdk.ResultReason.Canceled:

    cancellation_details = result.cancellation_details

    print(f"Speech synthesis canceled: {cancellation_details.reason}")

    if cancellation_details.reason == speechsdk.CancellationReason.Error:

        print(f"Error details: {cancellation_details.error_details}")

def split_large_text(text, max_length=9000):

        return [text[i:i+max_length] for i in range(0, len(text), max_length)]

input_filename = ""

large_text = ""

for i in range(1,100):

        input_filename=f"{i}.txt"

        print(input_filename)

        if input_filename:

          with open(input_filename, "r") as fin:

              large_text = fin.read()

              print(str(len(large_text)) + " " + input_filename.replace("txt","mp3"))

              batch_text_to_speech(large_text, input_filename.replace("txt","mp3"))

Option 2. Whole manuscript:

import requests import json import time from docx import Document import os import uuid

# Azure AI Language Service configuration

endpoint = "https://eastus.api.cognitive.microsoft.com/texttospeech/batchsyntheses/JOBID?api-version=2024-04-01" api_key = "<your_api_key>"

headers = {

 "Content-Type": "application/json",

 "Ocp-Apim-Subscription-Key": api_key

 }

def synthesize_text(inputs):

body = {

 "inputKind": "PlainText", # or SSML

 'synthesisConfig': {

 "voice": "en-US-BrianMultilingualNeural",

 },

 # Replace with your custom voice name and deployment ID if you want to use custom voice.

 # Multiple voices are supported, the mixture of custom voices and platform voices is allowed.

 # Invalid voice name or deployment ID will be rejected.

 'customVoices': {

  # "YOUR_CUSTOM_VOICE_NAME": "YOUR_CUSTOM_VOICE_ID" }, "inputs": inputs,

   "properties": {

     "outputFormat": "audio-48khz-192kbitrate-mono-mp3"

    }

 }

 response = requests.put(endpoint.replace("JOBID", str(uuid.uuid4())), headers=headers, json=body)

 if response.status_code < 400:

    jobId = f'{response.json()["id"]}'

    return jobId

 else:

    raise Exception(f"Failed to start batch synthesis job: {response.text}")

def get_synthesis(job_id: str):

 while True:

 url = f'https://eastus.api.cognitive.microsoft.com/texttospeech/batchsyntheses/{job_id}?api-version=2024-04-01'

     headers = { "Content-Type": "application/json", "Ocp-Apim-Subscription-Key": api_key }

     response = requests.get(url, headers=headers)

  if response.status_code < 400:

status = response.json()['status']

if "Succeeded" in status:

return response.json()

else:

print(f'batch synthesis job is still running, status [{status}]')

time.sleep(5) # Wait for 5 seconds before checking again

def get_text(file_path):

with open(file_path, 'r') as file:

  file_contents = file.read()

print(f"Length of text: {len(file_contents)}")

return file_contents

if name == "main":

input_file_name = ""

large_text = ""

inputs = []

  for i in range(1,100):

   input_file_name=f"{i}.txt"

   print(input_file_name)

   if input_file_name:

document_text = get_text(input_file_name)

inputs += [ { "content": document_text }, ]

jobId = synthesize_text(inputs)

 print(jobId)

 # Get audio result

 audio = get_synthesis(jobId)

 print("Result:")

 print(audio)

#Codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/EV8iyT_-kuVCp1f6IVela_0BRuHHSQwBqNnng7Ztz4cQaA?e=ZHpPON


Wednesday, April 9, 2025

 This is a summary of the book titled “Crash Landing” written by Liz Hoffman and published by Crown in 2023. When the pandemic hit, the 2008 recession was dwarfed. Leaders had to act fast. Billions of dollars changed hands. Some companies made money while others barely endured. Hoffman provided intimate portraits of leaders who navigated these times as the inside story of how some companies survived an economy on the brink. Many were blindsided by the pandemic such as the America’s airline industry. By the time they could grapple with the reality, the crisis was a major one. When money stopped flowing, companies borrowed. The US Government threw in its vast financial firepower at the crisis but the economy that survived was no longer the original before the pandemic.

In late January 2020, the financial elite at the World Economic Forum in Davos, Switzerland, were unaware of the potential impact of COVID-19 on the global economy. The US economy had experienced 10 consecutive years of growth and had a record high in 2019 corporate profits. However, the American economy was particularly vulnerable to a health crisis due to stagnant wages, reduced workers' benefits, and lack of surplus funds. The virus was already affecting Taiwan and Japan, and it would soon appear in Europe and North America. The airline industry, which had enjoyed a champagne decade, was also vulnerable to the virus. In 2020, the globalized world was not interconnected by land or sea, but by air. Direct flights reached twice as many cities as they had 20 years earlier. A 35-year-old man returning from China in mid-January became America's first reported case of COVID-19, unaware of the potentially deadly virus.

In March 2020, the world faced a major crisis due to the COVID-19 pandemic. American executives and Wall Street bankers were not taking the situation seriously, as businesses in other countries did. The virus spreads through the air and resembles an ordinary flu, leading to widespread social distancing. Despite rigorous lockdowns, COVID-19 quickly crossed out of China, leading to the closure of Disney's Shanghai Park, McDonald's, Starbucks, Delta, and Hilton's hotels in China. The world's greatest economy shut down, and Wall Street's financial markets experienced panic. Bill Ackman, founder and CEO of Pershing Square Capital Management, believed the virus might be difficult to control in the US, leading to massive unemployment and civil unrest. Investors began feeling spooked, and stock values fell. The Federal Reserve intervened with an interest rate cut, but the markets remained open. On March 11, 2020, the World Health Organization announced that COVID-19 had reached the level of a global pandemic, leading to stock declines, sports leagues ending, and Disney parks closing.

The COVID-19 pandemic severely impacted the travel industry, leading to a significant drop in revenue per available room, a crucial financial metric in the hotel industry. Hilton, a major hotel chain, had barely survived the 2008 financial collapse and was unable to survive the 2020 crisis. The pandemic exposed the dangers of a financial playbook that had become the default in corporate boardrooms over the previous two decades. Hilton's leaders called in its $1.75 billion line of credit to borrow money and worry that banks themselves could go under. The 2020 financial meltdown differed from the 2008 crisis, as it was not as severe as the 2008 crisis. Wall Street traders were uneasy and the sudden need to work from home worsened volatility. Bank reforms in 2020 limited Wall Street's activities and freedoms, leading to a decline in productivity and a dramatic fall in the S&P 500.

The US government and airline executives faced financial challenges during the COVID economic crisis, aiming to avoid bankruptcy and a complete meltdown. After negotiations, Congress pursued a multibillion-dollar payroll relief package, leading to major hedge funds selling bonds and Airbnb spending billions on COVID refunds. The number of Americans with COVID increased exponentially, and banks borrowed billions. The economy that survived the pandemic is not the one that crashed headlong into it, but it did not fall into depression. The pandemic created value, such as improved telecommunications infrastructure and higher pay for essential workers. However, inflation and interest rates rose, making life difficult for ordinary people. While the pandemic wasn't a total disaster, it required a careful balance between swift action and making the right choices.

#codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/Echlm-Nw-wkggNYlIwEAAAABD8nSsN--hM7kfA-W_mzuWw?e=BQczmz 


Tuesday, April 8, 2025

 Lessons from storage engineering for Knowledge bases and RAGs.

Data at rest and in transit are chunks of binaries that make sense only when there are additional layers built to process them, store them, ETL them or provide them as results to queries and storage engineering has a rich tradition in building datastores, as databases and data warehouses and even making them virtual and hosted in the cloud. Vector databases, albeit the authority in embeddings and semantic similarity, do not operate independently but must be part of a system that serves as a data platform and often spans multiple and hybrid data sources for best results. The old and the new worlds can enter a virtuous feedback loop that can improve the use of new datastores.

Take Facebook Presto, for example, as a success story in bridging structured and unstructured social engineering data. Developed as an open-source distributed SQL query engine, it revolutionized data analytics by enabling seamless querying across structured and unstructured data sources. Presto's ability to perform federated queries allowed users to join and analyze data from diverse sources, such as Hadoop Distributed File System (HDFS), Apache Cassandra, and relational databases, in real-time. This unified approach eliminated the need for multiple specialized tools, bridging the gap between structured and unstructured data. Presto's architecture, optimized for low query latency, employed in-memory processing and pipelined execution, significantly reducing end-to-end latency compared to traditional systems like Hive3. Its scalability and flexibility made it a valuable tool for handling petabyte-scale datasets.

Drawing parallels to technologies that work with structured and vector data, vector databases emerge as a compelling counterpart. These databases are designed to store and retrieve high-dimensional vectors, which are mathematical representations of objects. By mapping structured data into vector space, vector databases facilitate similarity searches and enable AI algorithms to retrieve relevant information efficiently. For example, Milvus, a popular vector database, supports vectorizing structured data and querying it for advanced analytics. This process involves converting structured data into numerical vectors using machine learning models, allowing for nuanced analysis and pattern detection.

Both Presto and vector databases share a common goal: unifying disparate data types for seamless analysis. Some examples of vector databases include:

• Milvus: Milvus is an open-source vector database designed for managing large-scale vector data. It supports hybrid searches, combining structured metadata with vector similarity queries, making it ideal for applications like recommendation systems and AI-driven analytics.

• Weaviate: Weaviate is another open-source vector database that integrates structured data with vector embeddings. It offers semantic search capabilities and allows users to query data using natural language prompts.

• Redis (Redis-Search and Redis-VSS): Redis has extensions for vector search that enable hybrid queries, combining structured data with vector-based similarity searches. It's optimized for high-speed lookups and real-time applications.

• Qdrant: Qdrant is a vector database that supports hybrid queries, allowing structured filters alongside vector searches. It is designed for scalable and efficient AI applications.

Azure Cosmos DB stands out as a versatile database service that integrates vector search capabilities alongside its traditional NoSQL and relational database functionalities. When compared with the above list, here’s how it stands out:

• Hybrid Data Support: Like Milvus, Weaviate, Redis, and Qdrant, Azure Cosmos DB supports hybrid queries, combining structured data with vector embeddings. This makes it suitable for applications requiring both traditional database operations and vector-based similarity searches.

• Integrated Vector Store: Azure Cosmos DB allows vectors to be stored directly within documents alongside schema-free data. This colocation simplifies data management and enhances the efficiency of vector-based operations, a feature that aligns with the capabilities of vector databases.

• Scalability and Performance: Azure Cosmos DB offers automatic scalability and single-digit millisecond response times, ensuring high performance at any scale. This is comparable to the optimized performance of vector databases like Redis and Milvus.

• Vector Indexing: Azure Cosmos DB supports advanced vector indexing methods, such as DiskANN-based quantization, enabling efficient and accurate vector searches. This is like the indexing techniques used in specialized vector databases.

• AI Integration: Azure Cosmos DB is designed to support AI-driven applications, including natural language processing, recommendation systems, and multi-modal searches. This aligns with the use cases of vector databases like Weaviate and Qdrant.

While Azure Cosmos DB provides robust vector search capabilities, it also offers the flexibility of a general-purpose database, making it a compelling choice for organizations looking to unify structured, unstructured, and vector data within a single platform.

#codingexercise: https://1drv.ms/w/c/d609fb70e39b65c8/Echlm-Nw-wkggNYlIwEAAAABD8nSsN--hM7kfA-W_mzuWw?e=BQczmz 

Monday, April 7, 2025

 The following script can be used to covert the manuscript of a book into its corresponding audio production.

Option 1: individual chapters

import azure.cognitiveservices.speech as speechsdk

import time

def batch_text_to_speech(text, output_filename):

      # Azure Speech Service configuration

      speech_key = "<use-your-speech-key>"

      service_region = "eastus"

# Configure speech synthesis

speech_config = speechsdk.SpeechConfig(

     subscription=speech_key,

     region=service_region

)

# Set output format to MP3

                          speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio48Khz192KBitRateMonoMp3)

speech_config.speech_synthesis_voice_name = "en-US-BrianMultilingualNeural"

# Create audio config for file output

audio_config = speechsdk.audio.AudioOutputConfig(filename=output_filename)

# Create speech synthesizer

synthesizer = speechsdk.SpeechSynthesizer(

    speech_config=speech_config,

    audio_config=audio_config

)

# Split text into chunks if needed (optional)

# text_chunks = split_large_text(text)

# Synthesize text

result = synthesizer.speak_text_async(text).get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:

    print(f"Audio synthesized to {output_filename}")

elif result.reason == speechsdk.ResultReason.Canceled:

    cancellation_details = result.cancellation_details

    print(f"Speech synthesis canceled: {cancellation_details.reason}")

    if cancellation_details.reason == speechsdk.CancellationReason.Error:

        print(f"Error details: {cancellation_details.error_details}")

def split_large_text(text, max_length=9000):

        return [text[i:i+max_length] for i in range(0, len(text), max_length)]

input_filename = ""

large_text = ""

for i in range(1,100):

        input_filename=f"{i}.txt"

        print(input_filename)

        if input_filename:

          with open(input_filename, "r") as fin:

              large_text = fin.read()

              print(str(len(large_text)) + " " + input_filename.replace("txt","mp3"))

              batch_text_to_speech(large_text, input_filename.replace("txt","mp3"))

Option 2. Whole manuscript:

import requests import json import time from docx import Document import os import uuid

# Azure AI Language Service configuration

endpoint = "https://eastus.api.cognitive.microsoft.com/texttospeech/batchsyntheses/JOBID?api-version=2024-04-01" api_key = "<your_api_key>"

headers = {

 "Content-Type": "application/json",

 "Ocp-Apim-Subscription-Key": api_key

 }

def synthesize_text(inputs):

body = {

 "inputKind": "PlainText", # or SSML

 'synthesisConfig': {

 "voice": "en-US-BrianMultilingualNeural",

 },

 # Replace with your custom voice name and deployment ID if you want to use custom voice.

 # Multiple voices are supported, the mixture of custom voices and platform voices is allowed.

 # Invalid voice name or deployment ID will be rejected.

 'customVoices': {

  # "YOUR_CUSTOM_VOICE_NAME": "YOUR_CUSTOM_VOICE_ID" }, "inputs": inputs,

   "properties": {

     "outputFormat": "audio-48khz-192kbitrate-mono-mp3"

    }

 }

 response = requests.put(endpoint.replace("JOBID", str(uuid.uuid4())), headers=headers, json=body)

 if response.status_code < 400:

    jobId = f'{response.json()["id"]}'

    return jobId

 else:

    raise Exception(f"Failed to start batch synthesis job: {response.text}")

def get_synthesis(job_id: str):

 while True:

 url = f'https://eastus.api.cognitive.microsoft.com/texttospeech/batchsyntheses/{job_id}?api-version=2024-04-01'

     headers = { "Content-Type": "application/json", "Ocp-Apim-Subscription-Key": api_key }

     response = requests.get(url, headers=headers)

  if response.status_code < 400:

status = response.json()['status']

if "Succeeded" in status:

return response.json()

else:

print(f'batch synthesis job is still running, status [{status}]')

time.sleep(5) # Wait for 5 seconds before checking again

def get_text(file_path):

with open(file_path, 'r') as file:

  file_contents = file.read()

print(f"Length of text: {len(file_contents)}")

return file_contents

if name == "main":

input_file_name = ""

large_text = ""

inputs = []

  for i in range(1,100):

   input_file_name=f"{i}.txt"

   print(input_file_name)

   if input_file_name:

document_text = get_text(input_file_name)

inputs += [ { "content": document_text }, ]

jobId = synthesize_text(inputs)

 print(jobId)

 # Get audio result

 audio = get_synthesis(jobId)

 print("Result:")

 print(audio)


Sunday, April 6, 2025

 Problem #1:

 A stream of arbitrary integers appears in no particular order and without duplicates

  the rank of each integer is determined by the number of smaller integers before and after it up to the current position

  Write a method to get the rank of the current integer in an efficient way.

  eg: 7, 1, 3, 9, 5, 8

Solution:

A max heap is used to keep track of the elements we have seen and to count those that are smaller

using System;

using System.Collections.Generic;

using System.Linq;

public class Test

{

 private static SortedList<int, int> itemRankPair = new SortedList<int, int>();

 public static void Main()

 {

      var items = new List<int>(){7, 1, 3, 9, 5, 8};

      for (int i = 0; i < items.Count; i++)

      {

       var item = items[i];

       Console.WriteLine("Item={0}", item);

       if (itemRankPair.ContainsKey(item) == false)

       {

           itemRankPair.Add(item, GetRank(item));

       }

       Console.WriteLine();

       for (int j = 0; j < i; j++)

       {

           int k = items[j];

           if (k >= item)

           {

            itemRankPair[k] += 1;

            Console.WriteLine("item={0}, Rank={1}", k, itemRankPair[k]);

           }

       }

       Console.WriteLine();

      }

      foreach (var k in itemRankPair.Keys.ToList()){

  Console.WriteLine("item={0}, Rank={1}", k, itemRankPair[k]);

      }

 }

 private static int GetRank(int n)

 {

  int rank = 0;

  foreach( var key in itemRankPair.Keys.ToList())

  {

  if (key < n)

  {

      rank++;

  }

  }

  return rank;

 }

}

# Azure infrastructure-as-code solution: IaCResolutionsPart274.docx

Saturday, April 5, 2025

 These are the steps to create an AI agent. As discussed earlier, they came in various types and forms, but they can be quite capable. From design, implementation, test to deployment, it must be specific to its requirements, make the right selections of model and knowledge base, and remain pertinent and accurate, while avoiding the pitfalls such as bias, hallucinations, and concerns against safety, security and privacy.

The first step is to draw the requirements, which involves 1. identifying the problem, 2. prompt engineering, and 3. determining user interaction. For example, scheduling meetings, answering common questions, and generating creative content requires have different approaches. Clear instructions for guiding the agents’ behavior, outlining what the agent should do in different scenarios and providing specific instructions while allowing flexibility to handle parameters will help with prompts. The way the user interacts with the agent such as a chat is also important.

The second step is to choose the right model. This is a critical step in building an effective AI agent. The models have their advantages such as GPT-4 from OpenAI for advanced NLP, LLaMA 3 by Meta for its efficiency and adaptability, and Google’s PaLM 2 for handling multilingual tasks. Model’s like Meta’s LLaMA are open and offer customization options while OpenAI’s closed/hosted model GPT-4 come with support, maintenance and ease of use. Also, a less complex model such as GPT-3 might suffice for the requirements.

The performance metrics of different models such as accuracy, response time, scalability, and the ability to handle concurrent requests are important to align with the requirements. Consider customization options when you must fine-tune them for your specific tasks, especially when there is a domain-specific language.

Check if the model can easily be integrated with existing systems and tools especially for API support, environments and external databases, web services or interfaces. There might be some cost implications of various models, and some effort required for experimentation and iteration.

The third step involves enabling tools, for say information retrieval, web browsing and function calling, along with the configuration of specific settings for each tool, the testing of its integration, and adherence to security best practices and observability.

The fourth step is the extension of capabilities via custom functions, such as for summarization or reports, even if it involves writing code, testing the implementation and integrating via endpoints or webhooks, defining parameters and configuring invocations, testing and optimization

As with all software, some best practices are upheld. AI agents are only as good as their data which must be comprehensive and free of bias. They could provide transparency in terms of references or statistics or enumeration of thought,action and observations, enhance security with robust access control with defense-in-depth strategy, avoiding complexity that involves common sense, reasoning, or understanding context and not require a whole lot of or absence of human supervision.

#codingexercise: 

https://1drv.ms/w/c/d609fb70e39b65c8/Echlm-Nw-wkggNYlIwEAAAABD8nSsN--hM7kfA-W_mzuWw?e=f29Tjt

Friday, April 4, 2025

 This article is all about AI agents. There are many types, development processes and real-world implementations. There has always been automation for complex workflows with curated artifacts, they have been boutique and never really intended for Large Language Models. While information can be tapped from multiple data sources or a knowledge base, enhanced decision-making processes needed to leverage AI agents. The operational framework of AI agents and the ways they augment LLMs is described here.

AI agents are software entities that complete a task autonomously on behalf of a user, including making requests to other services to improve the reach of standalone LLMs. They can retrieve real-time data from external databases, and APIs, manage interactive sessions with users and automate routine tasks that can be invoked dynamically or on schedule and with different parameters. An agent framework provides the tools and structures necessary to a developer to build robust, scaleable and efficient agent-based systems. This agent framework is an evolution of Reason-Action (ReAct) framework where an LLM is prompted to follow Thought/Action/Observation sequences. The Agent framework extends this by including external tools into the action step. The tools can range from simple calculators and database calls to python code generation and execution and even interactions with other agents. The calling program typically parses the output of the LLM at each step to determine the next steps. As an example, a prompt to find the weather in Seattle, WA involves a thought for needing to access a weather API to get the information, an action to call the weather API with location information and an observation on the response, followed by a thought that the relevant information is 65 degrees Fahrenheit and sunny, an action to report it to the user and an observation that the user is informed. By increasing iterations, articulation of granularity, dynamic adaptability and interoperability, the decision-making process can be arbitrarily enhanced. Compared with traditional software agents, there is very little distraction from syntax and format and more emphasis on semantics and latent meaning by virtue of LLMs and vector databases. This helps them to provide more dynamic, context-aware responses.

LLM agents can be diverse with each tailored to address specific challenges in information processing, decision support, and task automation. The task-specific agents are designed to perform specific, well-defined tasks. The conversational agents leverage natural language not a query language to interact with users. The decision support agents analyze complex data and provide insights. The workflow-automation agents co-ordinate and execute multi-step processes across different systems. The information retrieval agents can search and extract relevant information from large datasets or document repositories. The collaborative agents are creative and work with humans to accomplish complex tasks. The predictive agents use historical data and current trends to forecast future outcomes. The adaptive learning agents improve performance over time by learning from interactions and feedback. By categorizing different types of agents, an organization can streamline their operations, improve customer experiences and gain valuable insights.