Objective Speech Quality Measurement for Chinese Speech.
Fong Loong Choong, Masters Student
Dept of CSSE, University of Canterbury
Fri Jul 08 15:10:00 NZST 2005 in Room 031 MSCS
Abstract
Objective Speech Quality Measurement (OSQM) systems have been found to measure accurately the quality of speech output from sound processing systems, such as codecs and telecommunication systems, for English and some other European languages. However, the quality of sound systems used to process Chinese speech has not been adequately investigated to date. To measure speech quality accurately, speech intelligibility must first be optimised so that it does not influence the measurement. While intelligibility can be high for sound processing systems handling English or some other European languages, this may not hold for Chinese speech because of two of its phonetic characteristics: the consonant-vowel-consonant (CVC) structure and the use of tones. Each of these characteristics can affect the intelligibility of Chinese speech. The intelligibility issue related to the CVC structure is called consonantal intelligibility, while that arising from the use of tones is known as tonal intelligibility. Degradation in these two types of intelligibility may not be taken into account by OSQM systems, resulting in an inaccurate quality rating. The first purpose of this thesis was to evaluate whether OSQM systems account for degradation in Chinese speech intelligibility when computing an objective quality score. The evaluation found that the correlation of both consonantal and tonal intelligibility with quality is low. To address the low correlation between consonantal intelligibility and quality, the second purpose of this thesis was to expose or magnify the discrepancies that cause intelligibility degradation, so as to improve the OSQM systems' sensitivity to consonantal intelligibility. Two methods, high pass filtering and consonant amplification, were proposed for this purpose.
While both methods yielded improvements, it was concluded that consonant amplification is more effective than high pass filtering, as it yielded a better correlation. Given the Chinese population of over one billion, the results of this research could find many applications in the electronics and computer industries, providing reliable tools for benchmarking sound processing systems for Chinese customers.