Speech-to-text analysis transcribes the words spoken in an audio track into text.

Configuration Parameter   Description
CustomLM                  The path and interpolation weight of each custom language model to use.
CustomLMBuildLabel        The build label and interpolation weight of a custom language model to use.
CustomLMCheckInterval     The amount of time to wait before checking to see if the language model specified by CustomLMBuildLabel has been updated.
ErrorMessage              (Deprecated) The message that appears in the transcript when Media Server cannot connect to an IDOL Speech Server.
FilterMusic               Specifies whether to include speech-to-text results for audio segments that Speech Server identifies as music or noise.
Input                     The audio track to process.
Language                  The language pack to use for speech-to-text processing.
MaxConsecutiveTries       The maximum number of attempts that Media Server makes to connect to the servers listed in the SpeechToTextServers parameter.
Mode                      The mode for speech-to-text analysis (you can prioritize accuracy or speed).
ModeValue                 The processing rate. The meaning of this parameter depends on the value of the Mode parameter.
SampleFrequency           The sample frequency of the audio to send to the IDOL Speech Server.
SpeechToTextServers       A list of IDOL Speech Servers to use for speech-to-text.
Type                      The analysis engine to use. Set this parameter to SpeechToText.
UseFrameDuplication       Allows for greater processing speed without significant change in recognition accuracy.
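
To show how the parameters above might fit together in a task section, here is a minimal Python sketch that renders an INI-style section using the standard configparser module. The section name, the helper build_task_section, and every value except Type=SpeechToText are assumptions made for illustration (hypothetical track name, language pack name, host, and port); they are not documented defaults.

```python
import configparser
import io


def build_task_section(section_name, settings):
    """Render one analysis-task section as INI-style text."""
    parser = configparser.ConfigParser()
    # Keep parameter names exactly as written (the default lowercases them).
    parser.optionxform = str
    parser[section_name] = settings
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Placeholder values for illustration only; none of these are documented defaults.
SETTINGS = {
    "Type": "SpeechToText",                    # from the table: selects this engine
    "Input": "Default_Audio",                  # hypothetical audio track name
    "Language": "ENUK",                        # hypothetical language pack name
    "SpeechToTextServers": "localhost:13000",  # hypothetical host and port
    "MaxConsecutiveTries": "3",                # hypothetical retry limit
}

if __name__ == "__main__":
    print(build_task_section("SpeechToText", SETTINGS))
```

Parameters that are not set here (for example Mode and ModeValue) are left out on purpose, because their accepted values are not listed in this section.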

Output Tracks

Output track  Type                Description
Result        SpeechToTextResult  Contains a record for each word.


Field name  Type      Description
id          UUID      A universally unique identifier to identify the section of audio described by the record.
text        TextData  The spoken word converted to text.
confidence  Int       The confidence score for the speech-to-text process.
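
As a sketch of how a consumer might work with these fields, the Python below models one record per word and keeps only the words at or above a confidence threshold. The WordRecord class, the words_above helper, the sample data, and the threshold are all assumptions for illustration. The scale of the confidence score is not stated in this section, and TextData is simplified to a plain string.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class WordRecord:
    """Hypothetical in-memory view of one record in the Result track."""
    id: UUID          # identifies the section of audio the record describes
    text: str         # the spoken word converted to text (simplified TextData)
    confidence: int   # confidence score for the speech-to-text process


def words_above(records, min_confidence):
    """Return the text of each record whose confidence meets the threshold."""
    return [r.text for r in records if r.confidence >= min_confidence]


# Made-up sample records, not real engine output.
SAMPLE = [
    WordRecord(UUID("00000000-0000-0000-0000-000000000001"), "hello", 92),
    WordRecord(UUID("00000000-0000-0000-0000-000000000002"), "um", 31),
    WordRecord(UUID("00000000-0000-0000-0000-000000000003"), "world", 78),
]

if __name__ == "__main__":
    print(" ".join(words_above(SAMPLE, 50)))
```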