Loudmouth: Improving Intelligibility of Text-to-Speech Synthesizers in the Presence of Noise
Text-to-speech synthesis has become increasingly popular as telephone-based information services such as automated banking, directory assistance, and flight reservations have advanced. Despite the proliferation of applications using text-to-speech synthesis, little has been done to improve speech intelligibility in everyday noisy environments. Increasing the volume alone is not sufficient to achieve acceptable levels of intelligibility. Often, increasing the volume distorts the signal and actually degrades the overall intelligibility of synthesized speech. Although there is a significant body of work on how humans modify their speech in the presence of noise, these findings have yet to be implemented in synthesized speech. Algorithms capable of processing and incorporating these modifications may lead to improved intelligibility in spoken dialogue systems.
We will present our efforts in building the Loudmouth speech synthesizer, which emulates the modifications humans make to their speech in noise. Specifically, we modified the pitch, loudness, and duration of salient words relative to non-salient words and then assessed the resulting improvement in intelligibility in the presence of background noise. We will demonstrate the current implementation and discuss the findings of a perceptual experiment that compared the intelligibility of the Loudmouth synthesizer to that of a standard synthesizer.
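
As a rough illustration of the idea of treating salient words differently from non-salient ones, the sketch below wraps salient words in standard SSML prosody markup. It is not the Loudmouth implementation: the function name mark_salient, the example sentence, the choice of salient words, and the pitch, volume, and rate values are all assumptions made for illustration, and duration is only approximated here by slowing the speaking rate.

```python
from xml.sax.saxutils import escape

# Hypothetical settings; the abstract does not state the values Loudmouth uses.
# A rate below 100% slows speech, which lengthens the word's duration.
SALIENT_PROSODY = {"pitch": "+15%", "volume": "+6dB", "rate": "85%"}


def mark_salient(words, salient, prosody=SALIENT_PROSODY):
    """Return an SSML string in which salient words carry a <prosody> wrapper."""
    attrs = " ".join(f'{name}="{value}"' for name, value in prosody.items())
    parts = []
    for word in words:
        text = escape(word)  # keep &, <, > from breaking the markup
        if word.lower() in salient:
            parts.append(f"<prosody {attrs}>{text}</prosody>")
        else:
            parts.append(text)
    return "<speak>" + " ".join(parts) + "</speak>"


# Assumed example: which words count as salient is chosen by hand here.
ssml = mark_salient("your flight departs at nine".split(), {"flight", "nine"})
print(ssml)
```

Any synthesizer that accepts SSML could then render the marked-up sentence; how well a given engine honors each prosody attribute varies and would need to be checked by listening.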