Azure AI Speech Service and PowerShell

I have been playing around with the Azure AI Speech Service lately. In short, it's a service for text-to-speech and speech-to-text. You can even train it to understand certain voices from recordings, and generate a new text-to-speech voice based on a recording of your own voice. You can read more in-depth details here.

I have been using the text-to-speech function through its API and created some examples of generating speech using PowerShell.

Here are some tests I created:

Before you can test this yourself, you need to create a deployment for Azure AI Speech Service.

Go to the Speech menu under Azure AI in the Azure Portal. Click Create.

Fill in the form, click Review + create, and then click Create.

After it’s created, go to your deployment. Then go to the Keys and Endpoint menu.

Copy one of the subscription keys and the region name.

Now you need to decide which voice you want to use, and whether you want to apply a style to that voice too. You can find the documentation for this here.
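If you would rather browse the voices from PowerShell, the service also has a voices list endpoint. Here is a minimal sketch of filtering that list by locale and style. The function name Select-SpeechVoice is made up for illustration, and I am assuming each returned voice object has ShortName, Locale, Gender and StyleList properties, so check the actual response before relying on it.

```powershell
# Hypothetical helper: filter a list of voice objects by locale and, optionally, by style.
# Assumes each voice has ShortName, Locale, Gender and (for some voices) StyleList.
function Select-SpeechVoice {
    param(
        [Parameter(Mandatory)] [object[]] $Voices,
        [Parameter(Mandatory)] [string]   $Locale,
        [string] $Style
    )
    $Voices |
        Where-Object { $_.Locale -eq $Locale } |
        Where-Object { -not $Style -or ($_.StyleList -contains $Style) } |
        Select-Object ShortName, Gender, @{ Name = 'Styles'; Expression = { $_.StyleList -join ', ' } }
}

# Usage (needs a real key and region, so it is commented out here):
# $Headers = @{ 'Ocp-Apim-Subscription-Key' = $AzureSpeechSubscriptionKey }
# $Voices  = Invoke-RestMethod -Uri "https://$AzureSpeechRegion.tts.speech.microsoft.com/cognitiveservices/voices/list" -Headers $Headers
# Select-SpeechVoice -Voices $Voices -Locale 'en-US' -Style 'whispering'
```

Voices without any styles simply show an empty Styles column, so leaving out -Style lists every voice for the locale.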

Here is an example that will generate a whispering American female voice.

# Key and region from the Keys and Endpoint menu of your Speech resource
$AzureSpeechSubscriptionKey = 'enter your key here'
$AzureSpeechRegion = 'norwayeast'
$Language = 'en-US'
$VoiceName = 'en-US-JennyNeural'
$Style = 'whispering'
# Exchange the subscription key for a short-lived access token
$FetchTokenHeader = @{
    'Content-type'              = 'application/x-www-form-urlencoded'
    'Content-Length'            = '0'
    'Ocp-Apim-Subscription-Key' = $AzureSpeechSubscriptionKey
}
$OAuthToken = Invoke-RestMethod -Method Post -Uri "https://$AzureSpeechRegion.api.cognitive.microsoft.com/sts/v1.0/issueToken" -Headers $FetchTokenHeader
# Show the token received
$OAuthToken
# Request headers: the bearer token and the audio format to return (MP3)
$MyHeader = @{
    'Authorization'            = "Bearer $OAuthToken"
    'X-Microsoft-OutputFormat' = 'audio-16khz-128kbitrate-mono-mp3'
}
$uri = "https://$AzureSpeechRegion.tts.speech.microsoft.com/cognitiveservices/v1"
# SSML body: the speaking style is set with mstts:express-as inside the voice element
$Body = @"
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='$Language'>
<voice name="$VoiceName">
<mstts:express-as style="$Style" styledegree="2">
Hi, my name is Jenny. I am a neural voice. This is what I sound like when I have an American voice and I'm whispering.
</mstts:express-as>
</voice>
</speak>
"@
# The requested output format is MP3, so save the result with an .mp3 extension
Invoke-RestMethod -Method Post -ContentType "application/ssml+xml" -Headers $MyHeader -Body $Body -Uri $uri -OutFile "audio1.mp3"
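Since the SSML body is plain XML, any text containing characters like & or < will break the request unless it is escaped. Here is a minimal sketch of a helper that builds the SSML and escapes the spoken text. The function name New-SpeechSsml and its parameters are made up for illustration, and it assumes styles are expressed with the mstts:express-as element from the Speech SSML documentation.

```powershell
# Hypothetical helper: build an SSML document, escaping the spoken text for XML.
function New-SpeechSsml {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $VoiceName,
        [string] $Language = 'en-US',
        [string] $Style
    )
    # Escape &, <, >, " and ' so the text cannot break the XML.
    $Escaped = [System.Security.SecurityElement]::Escape($Text)
    if ($Style) {
        # Assumption: a speaking style is applied by wrapping the text in mstts:express-as.
        $Escaped = "<mstts:express-as style='$Style'>$Escaped</mstts:express-as>"
    }
    "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='$Language'>" +
    "<voice name='$VoiceName'>$Escaped</voice></speak>"
}

# Example: the ampersand ends up as &amp; in the generated SSML.
New-SpeechSsml -Text 'Tom & Jerry' -VoiceName 'en-US-JennyNeural' -Style 'whispering'
```

The returned string can be passed as the -Body of the text-to-speech request instead of a hand-written here-string.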

Here is an example that will generate an Irish female voice without any style.

# Key and region from the Keys and Endpoint menu of your Speech resource
$AzureSpeechSubscriptionKey = 'enter your key here'
$AzureSpeechRegion = 'norwayeast'
$Language = 'en-IE'
$VoiceName = 'en-IE-EmilyNeural'
# Exchange the subscription key for a short-lived access token
$FetchTokenHeader = @{
    'Content-type'              = 'application/x-www-form-urlencoded'
    'Content-Length'            = '0'
    'Ocp-Apim-Subscription-Key' = $AzureSpeechSubscriptionKey
}
$OAuthToken = Invoke-RestMethod -Method Post -Uri "https://$AzureSpeechRegion.api.cognitive.microsoft.com/sts/v1.0/issueToken" -Headers $FetchTokenHeader
# Show the token received
$OAuthToken
# Request headers: the bearer token and the audio format to return (MP3)
$MyHeader = @{
    'Authorization'            = "Bearer $OAuthToken"
    'X-Microsoft-OutputFormat' = 'audio-16khz-128kbitrate-mono-mp3'
}
$uri = "https://$AzureSpeechRegion.tts.speech.microsoft.com/cognitiveservices/v1"
# SSML body: a plain voice element with no style applied
$Body = @"
<speak version='1.0' xml:lang='$Language'>
<voice name="$VoiceName">
Hi, my name is Emily. I am a neural voice. This is what I sound like when I'm using an Irish voice with no voice style.
</voice>
</speak>
"@
# The requested output format is MP3, so save the result with an .mp3 extension
Invoke-RestMethod -Method Post -ContentType "application/ssml+xml" -Headers $MyHeader -Body $Body -Uri $uri -OutFile "audio1.mp3"
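Both examples repeat the same request setup, so it could be wrapped in a small function. Here is a minimal sketch that builds the request parameters as a hashtable for splatting. The function name New-SpeechRequest and its parameters are made up for illustration. It makes two assumptions: that the text-to-speech endpoint accepts the Ocp-Apim-Subscription-Key header directly (which skips the token step), and that sending the SSML as UTF-8 bytes helps with non-ASCII text, since depending on the PowerShell version a string body may be sent in a different encoding.

```powershell
# Hypothetical helper: build the parameters for a text-to-speech request.
function New-SpeechRequest {
    param(
        [Parameter(Mandatory)] [string] $Ssml,
        [Parameter(Mandatory)] [string] $Region,
        [Parameter(Mandatory)] [string] $SubscriptionKey,
        [string] $OutputFormat = 'audio-16khz-128kbitrate-mono-mp3'
    )
    @{
        Method      = 'Post'
        Uri         = "https://$Region.tts.speech.microsoft.com/cognitiveservices/v1"
        ContentType = 'application/ssml+xml'
        Headers     = @{
            # Assumption: the key can be sent directly instead of a bearer token.
            'Ocp-Apim-Subscription-Key' = $SubscriptionKey
            'X-Microsoft-OutputFormat'  = $OutputFormat
        }
        # Send the SSML as UTF-8 bytes so non-ASCII characters are not re-encoded.
        Body        = [System.Text.Encoding]::UTF8.GetBytes($Ssml)
    }
}

# Usage (needs a real key and region, so it is commented out here):
# $Request = New-SpeechRequest -Ssml $Body -Region $AzureSpeechRegion -SubscriptionKey $AzureSpeechSubscriptionKey
# Invoke-RestMethod @Request -OutFile 'audio1.mp3'
```

Returning a hashtable rather than calling Invoke-RestMethod inside the function makes it easy to inspect the request before sending it.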

I might dig deeper into these API endpoints in the future.

2 thoughts on “Azure AI Speech Service and PowerShell”

  1. Hi,

    This script works perfectly. I have added some small improvements of my own, but I am struggling with one issue. Namely, I can’t get Croatian diacritical marks such as “Č” or “Š” in the output audio file, as the Speech API reads them as plain “C” or “S”, which is not correct. How can I solve it?


    • I had the same issue with Norwegian characters like æøå. I haven't looked into it any further. Didn't OpenAI release new text-to-speech models a few weeks ago? I haven't checked whether they are in Azure OpenAI yet.

