Identify Speakers


Before IDOL Speech Server can identify speakers, you must train it. Without training, Speech Server can still divide the audio into different speakers and identify the gender of each speaker, but it cannot identify who is speaking. For information about training IDOL Speech Server, refer to the IDOL Speech Server Administration Guide.

To identify speakers in video

  1. Create a new configuration to send to Media Server with the process action, or open an existing configuration that you want to modify.

  2. In the [Analysis] section, add a new analysis task by setting the AnalysisEngineN parameter. You can give the task any name, for example SpeakerId.

  3. Create a new section to contain the settings for the task, and set the following parameters:

    Type The analysis engine to use. Set this parameter to speakerid.
    Input (Optional) The audio track to process. If you do not specify an input track, Media Server processes the first track of the correct type produced by the ingest engine.
    SpeakerIdServers The host name and ACI port of an IDOL Speech Server. Separate the host name and port with a colon (for example, speechserver:13000). You can specify multiple IDOL Speech Servers by using a comma-separated list. Media Server can connect to only one IDOL Speech Server at a time, but you can provide multiple servers for failover. To specify a default IDOL Speech Server for all speaker identification tasks, set the SpeakerIdServers parameter in the [Resources] section of the Media Server configuration file.
    TemplateSet (Optional) The path to the audio template set file (.ivs file) to use for speaker identification. You must create this file. Specify the path relative to the directory defined by the SpeakerIDDir parameter in the IDOL Speech Server configuration file. If you do not set this parameter, Speech Server cannot identify speakers, but it can still divide the audio into different speakers and detect the gender of each speaker.
    SampleFrequency (Optional) The sample frequency of the audio to send to the IDOL Speech Server for analysis, in samples per second (Hz). IDOL Speech Server accepts audio at either 8000 Hz or 16000 Hz.

    For more information about the parameters that you can use to configure this task, refer to the Media Server Reference.

  4. Save and close the configuration file. HPE recommends that you save your configuration files in the location specified by the ConfigDirectory parameter.
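
The settings described in the procedure above might be combined as in the following sketch. It is illustrative only and has not been tested against a Media Server installation. The task name SpeakerId, the zero-based index in AnalysisEngine0, the template file name speakers.ivs, and the chosen sample frequency are assumptions made for this example. The sketch also assumes that the task-level parameter for the Speech Server address has the same name (SpeakerIdServers) as the [Resources] parameter mentioned in step 3. Confirm the parameter names and values in the Media Server Reference before you use them.

```ini
[Analysis]
// Register the analysis task. The task name (SpeakerId) is arbitrary.
AnalysisEngine0=SpeakerId

[SpeakerId]
// Use the speaker identification engine.
Type=speakerid
// Host name and ACI port of the IDOL Speech Server (value taken from step 3).
SpeakerIdServers=speechserver:13000
// Hypothetical template set file, relative to SpeakerIDDir on the Speech Server.
TemplateSet=speakers.ivs
// Send audio at 16000 Hz (IDOL Speech Server accepts 8000 or 16000).
SampleFrequency=16000
```

Because Input is not set in this sketch, Media Server would process the first audio track produced by the ingest engine. If you omit TemplateSet as well, the task can still divide the audio into speakers and detect gender, but it cannot identify the speakers.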