Luo, Yangchun
Visual Intelligence Lab
Department of Computer Science
Harbin Institute of Technology
July 10, 2007
Computer lip reading is important in voice recognition, intelligent human-machine interfaces, multimedia systems, and face image compression. Lip movement detection is the initial stage of any complete lip reading system, so its accuracy is essential to the performance of the whole system. Real-time lip movement detection matters further because a much faster algorithm can be deployed on systems that previously could not handle the computation. Image database systems, remote video conference systems, and security check systems all stand to benefit from it.
In this paper, we first present refinements to traditional algorithms, such as the chromatic-based method and the template-matching method. We then apply a strategy that combines Haar-like features with the RealBoost algorithm to lip movement detection, and we optimize several key local algorithms. Second, in place of the traditional linear cascade structure, we introduce a two-dimensional matrix cascade structure, which improves not only the capacity to process large training sets but also the actual detection rate and speed. After lip detection, the Kalman filter, a mature and stable technique, is adopted as the tracking algorithm. With these changes, the performance of our lip movement detection improves substantially and reaches the real-time level.
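As an illustration of the kind of feature the boosted detector consumes, the following is a minimal sketch of a two-rectangle Haar-like feature evaluated through an integral image (summed-area table). It is not the implementation used in this work: the function names `integral_image`, `rect_sum`, and `haar_two_rect_vertical` are hypothetical, the image is a plain nested list, and only one feature shape is shown.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) summed-area table for a 2-D list of pixel values.

    Entry ii[y][x] holds the sum of all pixels above and to the left of (x, y).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_vertical(ii, x, y, w, h):
    """Hypothetical two-rectangle feature: top half minus bottom half.

    Assumes h is even so that both halves have the same area.
    """
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return top - bottom


# Usage: a tiny 2 x 2 "image" with a brighter bottom row.
image = [[1, 2],
         [3, 4]]
table = integral_image(image)
feature = haar_two_rect_vertical(table, 0, 0, 2, 2)
```

Because every rectangle sum costs four table lookups regardless of its size, a large pool of such features can be evaluated cheaply at each window position, which is what makes boosted cascades practical for real-time use.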
Based on these algorithms, we build an application system on the Win32 platform with MFC and the OpenCV library. The system contains implementations of the refined algorithms, a replaceable detection-tracking framework, and a graphical user interface. In particular, the replaceable framework is independent of the specific algorithms plugged into it, which makes it flexible and agile.
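To make the idea of a replaceable detection-tracking framework concrete, the sketch below shows one possible shape for it. It is an assumption for illustration only, written in Python rather than the MFC/C++ of the actual system: the names `Detector`, `Tracker`, `Pipeline`, `FixedDetector`, and `HoldTracker` are hypothetical, and the hand-off rule (detect until a box is found, then track until the tracker reports failure) is a guess at a reasonable policy, not the one described in this work.

```python
from abc import ABC, abstractmethod


class Detector(ABC):
    """Any lip detector: returns a bounding box (x, y, w, h) or None."""

    @abstractmethod
    def detect(self, frame):
        ...


class Tracker(ABC):
    """Any tracker: initialized from a detection, then updated per frame."""

    @abstractmethod
    def init(self, frame, box):
        ...

    @abstractmethod
    def update(self, frame):
        """Return the tracked box, or None if the target is lost."""
        ...


class Pipeline:
    """Drives whichever detector and tracker it is given."""

    def __init__(self, detector, tracker):
        self.detector = detector
        self.tracker = tracker
        self.tracking = False

    def process(self, frame):
        # While tracking succeeds, skip the (more expensive) detector.
        if self.tracking:
            box = self.tracker.update(frame)
            if box is not None:
                return box
            self.tracking = False
        # Otherwise fall back to detection and re-seed the tracker.
        box = self.detector.detect(frame)
        if box is not None:
            self.tracker.init(frame, box)
            self.tracking = True
        return box


# Trivial stand-ins used only to exercise the pipeline.
class FixedDetector(Detector):
    def __init__(self, box):
        self.box = box

    def detect(self, frame):
        return self.box


class HoldTracker(Tracker):
    def init(self, frame, box):
        self.box = box

    def update(self, frame):
        return self.box


pipeline = Pipeline(FixedDetector((10, 20, 40, 24)), HoldTracker())
first = pipeline.process(frame=None)   # detection path
second = pipeline.process(frame=None)  # tracking path
```

Since `Pipeline` depends only on the two abstract interfaces, a chromatic detector, a template matcher, or a boosted cascade could be swapped in without touching the driver code.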
Keywords: lip reading; lip movement detection and tracking; Haar-like feature; RealBoost; 2D cascade
Snapshots are taken every five frames.
