Public-cloud text-to-speech capabilities can be used by creating a speech resource from the provider's service portfolio and initializing a client against it. The following shows how to do that with the Azure Speech service.
Sample implementation:
1. Get text input over a web API:
from flask import Flask, request, jsonify, send_file
import azure.cognitiveservices.speech as speechsdk

app = Flask(__name__)

# Azure Speech Service configuration
SPEECH_KEY = "<your-speech-api-key>"
SERVICE_REGION = "<your-region>"
speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SERVICE_REGION)
# Request MP3 output (16 kHz, 32 kbit/s, mono)
speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)
speech_config.speech_synthesis_voice_name = "en-US-GuyNeural"  # Set desired voice

@app.route('/text-to-speech', methods=['POST'])
def text_to_speech():
    try:
        # Check if text is provided directly or via file
        if 'text' in request.form:
            text = request.form['text']
        elif 'file' in request.files:
            file = request.files['file']
            text = file.read().decode('utf-8')
        else:
            return jsonify({"error": "No text or file provided"}), 400

        # Generate speech from text. audio_config=None keeps the audio in
        # memory (result.audio_data) instead of playing it or writing a file.
        audio_filename = "output.mp3"
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
        result = synthesizer.speak_text_async(text).get()

        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            # Save audio to file
            with open(audio_filename, "wb") as audio_file:
                audio_file.write(result.audio_data)
            return send_file(audio_filename, as_attachment=True)
        else:
            return jsonify({"error": f"Speech synthesis failed: {result.reason}"}), 500
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
2. Prerequisites to run the script:
a. pip install flask azure-cognitiveservices-speech
b. Create an Azure Speech resource in the Azure portal, then copy the key and region from the resource's Keys and Endpoint section and use them in place of `<your-speech-api-key>` and `<your-region>` above.
c. Save the script as app.py and run it on any host with `python app.py`.
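As an alternative to pasting the key into the script, the two values can be read from the environment. The following is a minimal sketch, not part of the original script: the helper name `load_speech_settings` is hypothetical, and the variable names `SPEECH_KEY` and `SERVICE_REGION` are assumed to be exported in the shell before the app starts.

```python
import os

def load_speech_settings(env=os.environ):
    """Return (key, region) from an environment mapping, or raise if either is missing.

    SPEECH_KEY and SERVICE_REGION are assumed variable names, chosen to match
    the constants used in the script.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SERVICE_REGION")
    missing = [name for name, value in (("SPEECH_KEY", key), ("SERVICE_REGION", region)) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return key, region

# Demonstration with an explicit mapping so the sketch runs without real credentials.
demo_key, demo_region = load_speech_settings({"SPEECH_KEY": "demo-key", "SERVICE_REGION": "eastus"})
print(demo_region)
```

In the script, `SPEECH_KEY, SERVICE_REGION = load_speech_settings()` would then replace the two hard-coded placeholders.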
3. Sample trial
a. Send the text as a form field with curl: `curl -X POST -F "text=Hello, this is a test." http://127.0.0.1:5000/text-to-speech --output output.mp3`
b. Or upload a text file as an attachment: `curl -X POST -F "file=@example.txt" http://127.0.0.1:5000/text-to-speech --output output.mp3`
c. Play the generated output.mp3 file in any audio player.
Sample output: https://b67.s3.us-east-1.amazonaws.com/output.mp3
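The same trial can be driven from Python instead of curl. This is a minimal sketch using only the standard library, assuming the server from step 1 is running locally on port 5000. The helper names `build_tts_request` and `fetch_speech` are hypothetical. The text is sent URL-encoded rather than as multipart form data, which Flask also exposes through `request.form`.

```python
from urllib import parse, request

DEFAULT_URL = "http://127.0.0.1:5000/text-to-speech"  # assumed local server address

def build_tts_request(text, url=DEFAULT_URL):
    """Build a POST request carrying the text as a URL-encoded form field."""
    body = parse.urlencode({"text": text}).encode("utf-8")
    return request.Request(url, data=body, method="POST")

def fetch_speech(text, out_path="output.mp3", url=DEFAULT_URL):
    """Send the text to the endpoint and save the returned audio bytes."""
    with request.urlopen(build_tts_request(text, url)) as resp, open(out_path, "wb") as audio_file:
        audio_file.write(resp.read())
    return out_path

# Building the request does not contact the server, so this part runs offline.
req = build_tts_request("Hello, this is a test.")
print(req.get_method(), req.full_url)
# With the server running: fetch_speech("Hello, this is a test.")
```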
Pricing: Perhaps the single most sought-after feature in text-to-speech is a natural-sounding voice, and service providers often mark up the price, or even remove programmatic access, for the natural voices they offer. This severely limits the automation of audiobooks. A comparison of costs illustrates the differences between service providers. Public-cloud text-to-speech services typically charge $4 per million characters for standard voices and $16 per million characters for neural voices, and a million characters is roughly four to five audiobooks. Custom voices cost about $30 per million characters, while dedicated providers such as Natural Voice, which offer a more readily available portfolio of voices, charge a subscription fee of about $60/month and impose word limits. This is still costly, but automated audio production for books is here to stay simply because of the time and effort it saves.
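The per-character figures above translate into a per-book cost with simple arithmetic. The sketch below is illustrative only: the rates are the approximate ones quoted in this post, the book length of 250,000 characters is an assumption derived from "four books per million characters", and `estimate_cost` is a hypothetical helper name.

```python
# Approximate rates quoted in this post, in US dollars per million characters.
PRICE_PER_MILLION_CHARS = {"standard": 4.0, "neural": 16.0, "custom": 30.0}

def estimate_cost(num_chars, voice="neural"):
    """Estimate the synthesis cost in dollars for num_chars characters."""
    return num_chars / 1_000_000 * PRICE_PER_MILLION_CHARS[voice]

# Assumed book length: one million characters spread over four books.
BOOK_CHARS = 1_000_000 // 4

for voice in PRICE_PER_MILLION_CHARS:
    print(f"{voice}: ${estimate_cost(BOOK_CHARS, voice):.2f} per book")
```

Under these assumptions a single book costs about $1 with a standard voice, $4 with a neural voice, and $7.50 with a custom voice, so the roughly $60/month subscription corresponds to about fifteen neural-voice books per month.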
#codingexercise: CodingExercise-02-16-2025.docx