Sunday, December 21, 2014

PowerShell Text-to-Speech (TTS).

Here is a PowerShell one-liner for Text-to-Speech (TTS) using Microsoft's desktop-oriented Speech API (SAPI).

(New-Object -ComObject Sapi.SpVoice).Speak("Hello There!")

It uses New-Object to create a Component Object Model (COM) instance of the SAPI SpVoice class.

Actually, if you plan to use speech in a script, it makes more sense to keep the object around for reuse.

$synth = New-Object -ComObject Sapi.SpVoice
$synth.Speak("Hello Again!")
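Keeping the object around also lets you adjust it between calls. The following is a minimal sketch, assuming at least one SAPI voice is installed. It lists the installed voices and then changes the speaking rate and volume. The values chosen for Rate and Volume are arbitrary examples.

```powershell
# Create the SAPI voice object once and reuse it.
$synth = New-Object -ComObject Sapi.SpVoice

# List the descriptions of the voices SAPI can see on this machine.
foreach ($voice in $synth.GetVoices()) {
    $voice.GetDescription()
}

# Rate ranges from -10 (slowest) to 10 (fastest); Volume from 0 to 100.
$synth.Rate = 2       # example value
$synth.Volume = 80    # example value

$synth.Speak("Hello Again!")
```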

In the Windows jungle there is no escape from King-COM!


Oh... wait. You can also access SAPI via .NET instead of using COM directly. Use Add-Type to load the System.Speech assembly, the .NET wrapper for SAPI.

Add-Type -AssemblyName System.Speech
$synth = New-Object System.Speech.Synthesis.SpeechSynthesizer
$synth.Speak("Hello from dot net")
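The .NET wrapper exposes the same kinds of controls as the COM object. Here is a rough sketch, assuming the System.Speech assembly is available in your PowerShell session and at least one voice is installed. It lists the installed voices, selects the first one by name, and slows the speech down a little. Picking the first voice is just for illustration.

```powershell
Add-Type -AssemblyName System.Speech
$synth = New-Object System.Speech.Synthesis.SpeechSynthesizer

# Show the name and culture of every installed SAPI voice.
$voices = $synth.GetInstalledVoices()
$voices | ForEach-Object { "{0} ({1})" -f $_.VoiceInfo.Name, $_.VoiceInfo.Culture }

# Select a voice by name (here simply the first one found).
$synth.SelectVoice($voices[0].VoiceInfo.Name)

# Rate ranges from -10 to 10; Volume from 0 to 100.
$synth.Rate = -2
$synth.Volume = 100

$synth.Speak("Hello from dot net")
$synth.Dispose()
```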

SAPI is fun to play with but it comes with a limited set of voices and speech recognizers.  If you want to experiment with other voices you'll need to purchase them or switch speech systems.  One option is the Microsoft Speech Platform which supports several additional voices.

Unfortunately, voices and speech recognizers are not compatible between the two Microsoft speech systems. They have slightly different designs reflecting their different use cases. SAPI is designed for desktop platforms and single users. The SAPI speech recognizers are tunable to a specific user, and they support recognition of arbitrary words with a dictation engine. A single running instance of the SAPI speech system can be shared among many applications (i.e. the SAPI provider runs out-of-process). The Speech Platform is server oriented. It is in-process (AKA InProc), so each process that requires speech capabilities will have its own instance of the Speech Platform speech system. You could run multiple speech-capable processes on a single server (e.g. concurrent voice recognition processes on several users' voice mailboxes).

I'm assuming that you have already downloaded and installed the Microsoft Speech Platform SDK, the runtime, and the language packs (speech recognizers and text-to-speech voices) you want to use. Once again, use Add-Type to load the Speech Platform assembly, then create Microsoft.Speech objects in your PowerShell session. The Speech Platform requires you to set the audio output destination before you can hear what is said.

Add-Type -Path "C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll"
$ms_speak = New-Object Microsoft.Speech.Synthesis.SpeechSynthesizer
$ms_speak.SetOutputToDefaultAudioDevice()
$ms_speak.Speak("Hello, again, and again!")
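To see for yourself that the two systems keep separate sets of voices, you can list what each one reports. This is only a sketch. It assumes both System.Speech and the Speech Platform SDK are installed, and that the SDK lives at the same path used above.

```powershell
Add-Type -AssemblyName System.Speech
Add-Type -Path "C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll"

$sapi     = New-Object System.Speech.Synthesis.SpeechSynthesizer
$platform = New-Object Microsoft.Speech.Synthesis.SpeechSynthesizer

# Voices registered for SAPI (desktop).
"SAPI voices:"
$sapi.GetInstalledVoices() | ForEach-Object { "  " + $_.VoiceInfo.Name }

# Voices registered for the Speech Platform (server).
"Speech Platform voices:"
$platform.GetInstalledVoices() | ForEach-Object { "  " + $_.VoiceInfo.Name }

$sapi.Dispose()
$platform.Dispose()
```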

After creating the SpeechSynthesizer object, you can record the speech to a file with:

$ms_speak.SetOutputToWaveFile("hello.wav")
$ms_speak.Speak("Greetings")
$ms_speak.Dispose()

You must call Dispose on the object to commit the speech audio data to the named file.
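One thing to watch for: .NET resolves a relative file name against the process working directory, which is not always the same as PowerShell's current location, so hello.wav may not land where you expect. A slightly more defensive sketch, assuming your current location is a file system folder, builds a full path and uses try/finally so Dispose runs even if Speak throws. The $wavePath variable name is my own.

```powershell
Add-Type -Path "C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll"

# Build an absolute path in PowerShell's current location.
$wavePath = Join-Path (Get-Location).Path "hello.wav"

$ms_speak = New-Object Microsoft.Speech.Synthesis.SpeechSynthesizer
try {
    $ms_speak.SetOutputToWaveFile($wavePath)
    $ms_speak.Speak("Greetings")
}
finally {
    # Dispose closes the wave file whether or not Speak succeeded.
    $ms_speak.Dispose()
}
```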

I recommend reviewing the MSDN documentation for both speech systems. Also, check out the Out-Voice function and this blog post, both by Boe Prox, in which he describes how you can spelunk the two systems from PowerShell with Get-Member. Finally, Language Packs provide SAPI text-to-speech voices and speech recognizers for a few non-English languages.
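If you just want a quick taste of that spelunking, something like the following should work. It is a small sketch of my own, not taken from Boe Prox's post. It pipes a COM object and a .NET object to Get-Member to list what each one offers.

```powershell
# Explore the COM object: methods and properties of SpVoice.
$com = New-Object -ComObject Sapi.SpVoice
$com | Get-Member -MemberType Method, Property

# Explore the .NET object: just the method names of SpeechSynthesizer.
Add-Type -AssemblyName System.Speech
$net = New-Object System.Speech.Synthesis.SpeechSynthesizer
$net | Get-Member -MemberType Method | Select-Object Name
$net.Dispose()
```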
--
P.S. Technically you can use SAPI InProc or shared (out-of-process).
P.P.S. There really is no getting away from COM. It's still one of the architectural pillars of Microsoft server and desktop products.