Acoustics Engineering - Speech Intelligibility

Speech

Speech is our primary method of communication. It is therefore important that uttered speech is received intelligibly. The intelligibility of speech depends (in part) on the acoustical properties of the enclosure in which the speech is transmitted from speaker to listener. Another important factor determining the speech intelligibility is the background noise level.
Although there have been many attempts to objectively quantify speech intelligibility, the most widely used parameter is undoubtedly the Speech Transmission Index (STI) and its derivatives. The STI is based on the relation between perceived speech intelligibility and the intensity modulations in the talker's voice, as described by Houtgast, Steeneken and Plomp. ¹⁾
The STI method is described in the IEC 60268-16 standard.

When a sound source in a room is producing noise that is intensity modulated by a low frequency sinusoidal modulation of 100% depth, the modulation at the receiver position will be reduced due to room reflections and background noise. The Modulation Transfer Function (MTF) describes to what extent the modulation m is transferred from source to receiver, as a function of the modulation frequency F, which ranges from 0.63 to 12.5 Hz. Hence, the MTF depends on the system properties and the background noise.
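
As an illustration of how reverberation and noise each reduce the modulation, the sketch below evaluates the classic closed-form MTF for an idealized room: a purely exponential decay with reverberation time T plus stationary background noise. This is a textbook simplification, not necessarily what Dirac computes internally; the function name and the example values (T = 1 s, SNR = 10 dB) are assumptions chosen for illustration.

```python
import math

# The 14 modulation frequencies, in one-third-octave steps from 0.63 to 12.5 Hz.
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf_reverb_noise(f_mod_hz, rt60_s, snr_db):
    """Modulation reduction m(F) for exponential decay plus steady noise.

    Assumes an ideal diffuse field (hypothetical helper, not a Dirac API).
    """
    # Reverberation acts as a low-pass filter on the intensity envelope.
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod_hz * rt60_s / 13.8) ** 2)
    # Steady noise lowers the modulation depth equally at all F.
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


if __name__ == "__main__":
    for f in MODULATION_FREQS_HZ:
        m = mtf_reverb_noise(f, rt60_s=1.0, snr_db=10.0)
        print(f"F = {f:5.2f} Hz   m = {m:.3f}")
```

The reverberation term falls with increasing F, while the noise term is independent of F, which is why the two causes leave different fingerprints in the MTF.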

With the introduction of Dirac 6 and the Echo Speech Source, speech intelligibility measurements can be performed very quickly and easily. The Echo delivers a calibrated signal that is used by Dirac to calculate different speech intelligibility parameters. The Echo / Dirac combination also performs well with high levels of background noise. It is therefore no longer necessary to work with external equalizers or to set the output level. Quite often, the difficulty of emitting a signal with the correct speech spectrum and level meant that shortcuts were taken, which led to questionable results.

¹⁾ T. Houtgast, H.J.M. Steeneken and R. Plomp, 'Predicting Speech Intelligibility in Rooms from the Modulation Transfer Function. I. General Room Acoustics,' Acustica 46, 60 - 72 (1980).

Download

You can download our speech intelligibility technical note as a pdf.

Modulated noise versus impulse responses

The work of Houtgast and Steeneken is based on an STI method using modulated noise as a test signal. Although Schroeder²⁾ has shown that the MTF can also be calculated from the impulse response, handheld (STIPA) meters use modulated noise as a test signal, whereas most PC-based simulation and measurement software uses the impulse response method. This is because movement of the handheld device during a measurement would make impulse response measurements unsuitable. In general, the modulated noise method is somewhat more resilient to non-linear and time-varying systems, whereas the impulse response method is faster and provides more information.

²⁾ M.R. Schroeder, 'Modulation Transfer Functions: Definition and Measurement,' Acustica 49, 179 - 182 (1981).
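
The impulse response route can be sketched in a few lines: Schroeder's relation gives the MTF as the normalized Fourier transform of the squared impulse response. The code below is a minimal noise-free illustration on a synthetic exponential decay; the function names, the sample rate and the 3 s length are assumptions, and a real measurement would additionally filter into octave bands and account for background noise.

```python
import cmath
import math


def synthetic_decay(rt60_s, fs_hz, duration_s):
    """Hypothetical test signal: amplitude decays by 60 dB in rt60_s seconds."""
    k = math.log(1000.0)  # 60 dB expressed as an amplitude ratio
    return [math.exp(-k * n / fs_hz / rt60_s) for n in range(int(duration_s * fs_hz))]


def mtf_from_impulse_response(h, fs_hz, f_mod_hz):
    """m(F) = |sum h[n]^2 * exp(-j*2*pi*F*n/fs)| / sum h[n]^2."""
    energy = 0.0
    acc = 0j
    for n, sample in enumerate(h):
        p = sample * sample  # squared impulse response (energy envelope)
        energy += p
        acc += p * cmath.exp(-2j * math.pi * f_mod_hz * n / fs_hz)
    return abs(acc) / energy


if __name__ == "__main__":
    h = synthetic_decay(rt60_s=1.0, fs_hz=1000, duration_s=3.0)
    for f in (0.63, 2.0, 12.5):
        print(f"F = {f:5.2f} Hz   m = {mtf_from_impulse_response(h, 1000, f):.3f}")
```

Because any modulation frequency can be evaluated from the same impulse response, this approach is not limited to a fixed set of test modulations.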

STI and STIPA

The STIPA parameter features prominently in the latest edition of IEC 60268-16. It is described as the preferred parameter for almost all measurement situations. However, the STIPA was originally conceived, much like the RASTI, as a way to estimate the STI within a reasonable time. Traditionally the STIPA is measured with modulated noise. A full STI measurement with modulated noise would take at least 15 minutes. However, by using the impulse response approach the STI can be measured just as fast as the STIPA. Because the STI has many more modulation frequencies in each octave band, it has far greater diagnostic power than the STIPA.
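
To make the difference in resolution concrete, the sketch below counts the MTF points behind each parameter (7 octave bands with 14 modulation frequencies each for the full STI, against 2 per band for STIPA) and shows how modulation values are turned into a per-band index. It is deliberately simplified: the octave-band weighting, redundancy factors and masking corrections of IEC 60268-16 are left out, and the function names are illustrative assumptions.

```python
import math

OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
STI_MOD_FREQS_PER_BAND = 14    # full STI
STIPA_MOD_FREQS_PER_BAND = 2   # STIPA


def transmission_index(m):
    """Map one modulation transfer value m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)            # guard against log(0) and 1/0
    snr_eff = 10.0 * math.log10(m / (1.0 - m))   # effective signal-to-noise ratio, dB
    snr_eff = min(max(snr_eff, -15.0), 15.0)     # limit to the -15..+15 dB range
    return (snr_eff + 15.0) / 30.0


def modulation_transfer_index(m_values):
    """Average transmission index over the modulation frequencies of one band."""
    return sum(transmission_index(m) for m in m_values) / len(m_values)


if __name__ == "__main__":
    n_bands = len(OCTAVE_BANDS_HZ)
    print("STI MTF points:  ", n_bands * STI_MOD_FREQS_PER_BAND)
    print("STIPA MTF points:", n_bands * STIPA_MOD_FREQS_PER_BAND)
    print("Example band index:", modulation_transfer_index([0.9, 0.7, 0.5, 0.3]))
```

With only two points per band, a localized dip in the MTF can easily fall between the sampled modulation frequencies.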

A short history of the STI method

1971: The first mention of the STI (Speech Transmission Index, a measure of speech intelligibility) in an article in Acustica by Tammo Houtgast and Herman Steeneken.
1981: Manfred Schroeder publishes an article in Acustica showing that the modulation transfer function (MTF) can be derived from an impulse response.
1985: B&K introduces the Rapid Speech Transmission Index Meter 3361.
1988: The STI method and specifically the RASTI parameter are described in the IEC 60268-16 standard. Subsequent revisions of the standard have added, among other things, gender-specific test signals, redundancy factors and level-dependent masking.
2003: The STIPA parameter is added in revision 3 of the IEC 60268-16 standard.
2011: IEC 60268-16 Ed 4.0 is adopted.

Echo stimuli

MLS versus sweep
The Echo Speech Source contains a male speech signal to help adjust the volume of PA systems. For STI measurements it contains pink MLS signals. Normally, playing MLS signals from a standalone source would present a serious problem, because the slightest difference in clock rate between source and receiver would make it impossible to properly extract the impulse response. Dirac, however, has been able to handle asynchronous MLS signals since version 4. The advantage of MLS over an e-sweep signal is that the MLS is far less intrusive. Also, because a sweep concentrates all its energy at a single frequency at any moment, it is more difficult for amplifiers and speakers to handle this type of signal.

Intermittent stimulus
New in Dirac 6 and also used in the Echo is the intermittent stimulus. With the standard impulse response technique it is difficult to measure a high quality impulse response and at the same time retrieve an accurate (background) noise level from this measurement. The new intermittent stimulus consists of an MLS sequence followed by an equally long period of silence. The full stimulus (MLS + silence) is measured in one pass, and Dirac extracts the impulse response and the background noise into two separate channels of a .wav file. This stimulus allows you to increase the output level and perform pre-averaging to improve the INR of the impulse response in speech intelligibility measurements, while still retaining accurate noise values.
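
The structure of such a stimulus can be sketched as follows: a maximum length sequence generated with a linear feedback shift register, followed by an equally long block of silence. This is only an illustration of the principle. The sequence here is white and very short, whereas the Echo uses pink MLS signals whose actual length, coding and levels are not described here; the function names and the order-4 register with taps (4, 3) are assumptions.

```python
def mls(order, taps):
    """Generate one period (2**order - 1 samples) of a +/-1 MLS with an LFSR.

    `taps` are 1-based register positions that are XORed for the feedback;
    they must correspond to a primitive polynomial, e.g. (4, 3) for order 4.
    """
    state = [1] * order
    seq = []
    for _ in range(2 ** order - 1):
        seq.append(1.0 if state[-1] else -1.0)   # output is the last register bit
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]          # shift the register by one
    return seq


def intermittent_stimulus(order, taps):
    """MLS period followed by an equally long period of silence."""
    sequence = mls(order, taps)
    return sequence + [0.0] * len(sequence)


if __name__ == "__main__":
    stimulus = intermittent_stimulus(4, (4, 3))
    print(len(stimulus), "samples:", stimulus)
```

During the first half the receiver records signal plus noise, from which the impulse response is extracted; during the silent half it records background noise only.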

The Echo signals
One of the signals in the Echo is a speech fragment that can be used to set the volume of a PA system to a ‘normal’ level. The speech signal has a standard level of 60 dB(A), and cannot be used for speech intelligibility measurements. When the background noise level is relatively low and/or the reverberation time is relatively long (SNR × RT > 120 dB·s), a simple continuous MLS stimulus can be used. This signal is available at 60 dB(A) and at a raised level of 72 dB(A). Note that the MLS sequences are coded such that Dirac can always determine the output level, and correct the STI calculations appropriately. For scenarios where the background noise has a significant impact (SNR × RT < 120 dB·s), an intermittent MLS signal is available both at 60 dB(A) and 72 dB(A). The signals generated by the Echo can also be injected directly into a PA system using the BNC output connector. The electrical output always operates at the same level. You can also play the Echo signals via the PA system from a CD or MP3 player. For this purpose we have made the Echo signal available as a separate download.
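
The rule of thumb for choosing between the continuous and the intermittent stimulus can be written as a small helper. This is a sketch of the criterion quoted above, not part of Dirac or the Echo. The function name is hypothetical, and the text does not say what to do when the product is exactly 120 dB·s, so the conservative choice (intermittent) made here is an assumption.

```python
def choose_echo_stimulus(snr_db, rt_s, threshold_db_s=120.0):
    """Pick a stimulus type from the product of SNR (dB) and reverberation time (s)."""
    product = snr_db * rt_s
    if product > threshold_db_s:
        return "continuous MLS"
    # At or below the threshold, background noise matters: use MLS + silence.
    return "intermittent MLS"


if __name__ == "__main__":
    print(choose_echo_stimulus(snr_db=40.0, rt_s=4.0))  # quiet, reverberant hall
    print(choose_echo_stimulus(snr_db=20.0, rt_s=1.0))  # noisy, fairly dry room
```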

ISO 3382-3 open plan offices

For speech intelligibility measurements in open plan offices, ISO 3382-3 suggests you perform 4 measurements at each position. With Dirac 6 you only need a single (system- and level-calibrated) measurement per position. Using the new intermittent stimulus, the speech signal levels, the impulse response and the background noise levels can be acquired in a single pass. Plots of the STI versus the source-receiver distance can be created with a few mouse clicks. You perform the minimum number of measurements and Dirac will give you L_P,A,S, L_P,A,S,4, L_P,A,B, D_2,S, STI, r_D and r_P.

The ISO 3382-3 standard prescribes the use of an omnidirectional sound source such as the OmniPower 4292-L. Also, the STI that is used is a little different from the STI defined in IEC 60268-16, in that the auditory masking and hearing threshold corrections are not used. Dirac contains separate STI parameters for ISO 3382-3 and IEC 60268-16.
In some cases it may be useful to use a directional source for ISO 3382-3 measurements. The Echo speech source can now be used for this purpose as explained in this blog post.

MTF graphs

The modulation transfer function (MTF) displays the modulation depth as a function of the modulation frequency. It is an intermediate result obtained during the calculation of the STI and related parameters. However, it is also an important diagnostic tool to investigate the causes of poor speech intelligibility.
An MTF that is constant over the modulation frequencies indicates that the speech intelligibility is mainly determined by background noise. A continuously decreasing MTF indicates the influence of reverberation and an MTF that decreases first and then increases again indicates the presence of an echo.
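
These three signatures can be turned into a rough automatic check. The sketch below is a heuristic illustration only: the function name, the 0.05 tolerance and the decision order are assumptions, and a real MTF often shows a mixture of noise, reverberation and echo effects that still requires inspection of the graph.

```python
def classify_mtf_shape(m_values, tolerance=0.05):
    """Classify an MTF curve given m values ordered by rising modulation frequency."""
    lowest = min(m_values)
    if max(m_values) - lowest <= tolerance:
        return "flat: background noise dominates"
    drops = m_values[0] - lowest > tolerance      # curve falls from its start
    recovers = m_values[-1] - lowest > tolerance  # curve rises again at the end
    if drops and recovers:
        return "dip then rise: possible echo"
    if drops:
        return "falling: reverberation dominates"
    return "unclear"


if __name__ == "__main__":
    print(classify_mtf_shape([0.60, 0.61, 0.60, 0.59, 0.60]))
    print(classify_mtf_shape([0.95, 0.85, 0.70, 0.50, 0.35]))
    print(classify_mtf_shape([0.90, 0.60, 0.30, 0.60, 0.85]))
```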

Traditional modulated noise based speech intelligibility measurements contain a limited number of modulation frequencies. This means that many problems can remain hidden in the coarsely sampled modulation spectrum. Impulse responses contain the full spectrum of modulation frequencies, and Dirac 6 now has the ability to show them in continuous MTF graphs.