Cantonese Lexicographical Database Project


Owen Nancarrow

The aims of this project are to make a collection of tape recordings of Cantonese speech, to build an archive of Cantonese texts based on transcriptions of these recordings, to provide English translations of those texts, to construct a corpus of Cantonese words and expressions together with their English equivalents, and to compile an online bilingual dictionary from that corpus. Linguistic databases are now available for many of the world's languages, including English and written Chinese, but Cantonese does not yet have a comparable pool of reliable data. Our first target is a database of 200,000 words, which will then be expanded in stages towards a final target of one million words. Much work has been done on Cantonese grammar and lexis, especially in recent years, and we hope that further research on the language will benefit from a computerized, easily searchable database.