Robust encoded speech recognition over IP networks

Abstract

In this paper, the robustness of Network Speech Recognition (NSR) systems is analyzed. In NSR, the speech signal is transmitted from the client to the server using a conventional speech codec, and the recognition task is carried out at the server. The use of speech codecs degrades recognition performance, mainly in the presence of acoustic noise and packet losses. First, we study the importance of the possible degradation sources in these systems. We then propose a new NSR solution based on a more robust feature extractor and an efficient packet loss concealment (PLC) algorithm, which compensates for these degradations by means of cepstral compensation and linear interpolation. The experimental results are obtained for a well-known speech codec, AMR at 12.2 kbps, using a noisy database (test set A of Aurora-2) and several packet loss conditions. The results show that our proposal achieves noticeable improvements over the baseline results.
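As a rough illustration of the linear-interpolation idea mentioned above, the following minimal sketch fills lost feature frames from the nearest correctly received frames on either side. It is not the PLC algorithm proposed in this paper: the function name `interpolate_lost_frames`, the boolean `received` mask, and the choice to interpolate directly on per-frame feature vectors are all assumptions made for the example.

```python
import numpy as np

def interpolate_lost_frames(features, received):
    """Fill lost frames by linear interpolation (illustrative sketch only).

    features: array of shape (num_frames, num_coeffs), e.g. cepstral vectors.
              Rows at lost positions are ignored and overwritten.
    received: boolean sequence of length num_frames, True where the frame
              arrived. Both names are hypothetical, not from the paper.
    """
    features = np.asarray(features, dtype=float)
    received = np.asarray(received, dtype=bool)
    if not received.any():
        raise ValueError("no received frames to interpolate from")

    frame_idx = np.arange(len(features))
    good_idx = frame_idx[received]
    out = features.copy()
    # Interpolate each coefficient track independently over time.
    # np.interp holds the nearest received value for losses at either edge.
    for d in range(features.shape[1]):
        out[:, d] = np.interp(frame_idx, good_idx, features[received, d])
    return out

# Two lost frames between two received frames.
feats = [[0.0, 0.0], [99.0, 99.0], [99.0, 99.0], [3.0, 6.0]]
mask = [True, False, False, True]
restored = interpolate_lost_frames(feats, mask)
```

In this sketch a loss burst at the start or end of the utterance simply repeats the nearest received frame, which is one common fallback; other choices are possible.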