Sunday, February 16, 2025

 Public cloud providers offer text-to-speech as a managed service: you create a resource for the corresponding service in the provider's portfolio and call it with that resource's key and region. The following shows how to do that with the Azure Speech service.

Sample implementation:

1. Get text input over a web API:

from flask import Flask, request, jsonify, send_file
import os
import azure.cognitiveservices.speech as speechsdk

app = Flask(__name__)

# Azure Speech Service configuration
SPEECH_KEY = "<your-speech-api-key>"
SERVICE_REGION = "<your-region>"

speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SERVICE_REGION)
speech_config.speech_synthesis_voice_name = "en-US-GuyNeural"  # Set desired voice
# Request MP3 output so the saved file matches its .mp3 extension (the SDK default is WAV)
speech_config.set_speech_synthesis_output_format(
    speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)

@app.route('/text-to-speech', methods=['POST'])
def text_to_speech():
    try:
        # Check if text is provided directly or via file
        if 'text' in request.form:
            text = request.form['text']
        elif 'file' in request.files:
            file = request.files['file']
            text = file.read().decode('utf-8')
        else:
            return jsonify({"error": "No text or file provided"}), 400

        # Generate speech from text
        audio_filename = "output.mp3"
        file_config = speechsdk.audio.AudioOutputConfig(filename=audio_filename)
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)
        result = synthesizer.speak_text_async(text).get()

        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            # Save audio to file
            with open(audio_filename, "wb") as audio_file:
                audio_file.write(result.audio_data)
            return send_file(audio_filename, as_attachment=True)
        else:
            return jsonify({"error": f"Speech synthesis failed: {result.reason}"}), 500
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    # Listen on all interfaces; use "127.0.0.1" to accept local requests only
    app.run(host="0.0.0.0", port=5000)

2. Prerequisites to run the script:

a. Install the dependencies with `pip install flask azure-cognitiveservices-speech`

b. Create an Azure Speech resource in the Azure portal, retrieve the key and region from the resource's Keys and Endpoint section, and use them in place of `<your-speech-api-key>` and `<your-region>` above

c. Save the script and run it on any host with `python <script-name>.py`
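
As an alternative to pasting the key into the script in step b, the key and region can be read from the environment. The following is a minimal sketch, not part of the original script: the variable names `SPEECH_KEY` and `SPEECH_REGION` and the helper `load_speech_settings` are assumptions chosen for illustration.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (key, region) read from the environment.

    SPEECH_KEY and SPEECH_REGION are hypothetical variable names;
    use whatever names your deployment defines.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")
    # Collect the names of any variables that are unset or empty
    missing = [name for name, value in
               (("SPEECH_KEY", key), ("SPEECH_REGION", region)) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return key, region
```

The returned pair could then be passed to `speechsdk.SpeechConfig(subscription=key, region=region)` in place of the hard-coded placeholders.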

3. Sample trial

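a. With a curl request, assuming the server runs locally on port 5000, as `curl -X POST -F "text=Hello, this is a test." http://localhost:5000/text-to-speech --output output.mp3`

b. Or as a file attachment with `curl -X POST -F "file=@example.txt" http://localhost:5000/text-to-speech --output output.mp3`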

c. The generated MP3 audio file can then be played in any audio player.
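
The same trial can be run from Python instead of curl. This is a rough sketch using only the standard library; it assumes the server from step 1 is reachable at `http://localhost:5000`, and the helper names `build_tts_request` and `save_speech` are made up for illustration. The text is sent as a URL-encoded form field, which the endpoint reads through `request.form` just like the curl form field.

```python
from urllib import parse, request


def build_tts_request(text, base_url="http://localhost:5000"):
    """Build a POST request that carries `text` as the form field 'text'."""
    body = parse.urlencode({"text": text}).encode("utf-8")
    return request.Request(
        base_url + "/text-to-speech",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def save_speech(text, out_path="output.mp3", base_url="http://localhost:5000"):
    """Send the request and write the returned audio bytes to out_path."""
    with request.urlopen(build_tts_request(text, base_url)) as response:
        audio = response.read()
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio)
    return len(audio)  # Number of bytes written
```

With the server running, `save_speech("Hello, this is a test.")` would write the response to `output.mp3`.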


Pricing: Perhaps the single most sought-after feature in text-to-speech is a natural-sounding voice, and service providers often mark up the price, or even remove programmability options, for the natural voices they offer. This severely limits the automation of audiobooks. A comparison of costs illustrates the differences between service providers. Public cloud text-to-speech services typically charge about $4 per million characters for standard voices and $16 per million characters for neural voices; a million characters is roughly 4-5 audiobooks. Custom voices cost about $30 per million characters, while dedicated providers such as Natural Voice, with a more readily available portfolio of voices, charge a subscription fee of about $60/month and impose limits on words. This is still costly, but automation of audio production for books is here to stay simply because of the time and effort saved.
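
The per-character arithmetic above can be sketched as a small estimator. The rates are the approximate figures quoted in the paragraph, not current price-list values, and the 250,000-character book length is an assumption derived from the "4-5 audiobooks per million characters" estimate.

```python
# Approximate rates in USD per million characters, as quoted in the text above
RATES_PER_MILLION_CHARS = {"standard": 4.0, "neural": 16.0, "custom": 30.0}


def synthesis_cost(char_count, voice_tier):
    """Estimate the synthesis cost in USD for char_count characters."""
    return char_count / 1_000_000 * RATES_PER_MILLION_CHARS[voice_tier]


# Assumed book length: 250,000 characters (about a quarter of a million)
book_chars = 250_000
for tier in RATES_PER_MILLION_CHARS:
    print(tier, synthesis_cost(book_chars, tier))
```

Under these assumptions a single book costs about $1 with a standard voice, $4 with a neural voice, and $7.50 with a custom voice, which is the figure to weigh against a flat monthly subscription.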

#codingexercise: CodingExercise-02-16-2025.docx
