Programme AHC 2005

Robust audio indexing for Dutch spoken-word collections

R. Ordelman


Digital audio and video resources are increasingly becoming a natural
part of our everyday life. Due to the rapidly declining costs of
recording and preservation technology, we possess a growing variety of
spoken-word collections, set up by professionals at organisations and
non-professionals at home. However, much of the rich content in these
collections risks remaining inaccessible for lack of robust search
technologies. The focus of this paper is on the history and
development of robust audio indexing technology for searching Dutch
spoken-word collections taken up in a series of multimedia retrieval
projects in different domains, including radio and television
broadcasts, governmental proceedings, historical archives, lectures
and meetings. Audio indexing involves topics such as audio
partitioning, keyword spotting, speech recognition, speaker
identification and information extraction. After introducing the
technology, this paper gives an overview of how development
requirements and processing steps vary across the Dutch collections.
A number of general and language-specific technology problems are
addressed in more detail, such as problems with deteriorated audio in
historical archives due to imperfect analogue recording technology,
language model mismatches due to outdated or 'non-native' word usage
and syntax, compounding, and the recognition of unknown words and
proper names. Available audio indexing performance results for
different collections and different technologies are presented and
compared. It is concluded that despite significant advances in Dutch
indexing technology and demonstrated applicability in several domains,
many general and language-specific issues call for additional
research.

 


Last modified: 16-09-2005 08:48