Project Details |
Funding Scheme : | Early Career Scheme | |||||||||||||||||||||
Project Number : | 859713 | |||||||||||||||||||||
Project Title(English) : | Linguistic Analysis of Mid-20th Century Hong Kong Cantonese by Constructing an Annotated Spoken Corpus | |||||||||||||||||||||
Project Title(Chinese) : | 20世紀中葉香港粵語的語料庫建構及研究 | |||||||||||||||||||||
Principal Investigator(English) : | Dr Chin, Chi On | |||||||||||||||||||||
Principal Investigator(Chinese) : | ||||||||||||||||||||||
Department : | ||||||||||||||||||||||
Institution : | The Education University of Hong Kong | |||||||||||||||||||||
E-mail Address : | andychin@eduhk.hk | |||||||||||||||||||||
Tel : | 29487780 | |||||||||||||||||||||
Co - Investigator(s) : |
Panel : | Humanities, Social Sciences | |||||||||||||||||||||
Subject Area : | Psychology and Linguistics | |||||||||||||||||||||
Exercise Year : | 2013 / 14 | |||||||||||||||||||||
Fund Approved : | 757,000 | |||||||||||||||||||||
Project Status : | Completed | |||||||||||||||||||||
Completion Date : | 30-11-2016 | |||||||||||||||||||||
Project Objectives : |
Abstract as per original application (English/Chinese): |
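Traditional Cantonese dialectology relies mainly on fieldwork, documenting a dialect through interviews with elderly speakers. Over the past twenty-odd years, many dialect materials written in Cantonese between the nineteenth and the early twentieth century have come to light, and the diachronic study of Cantonese has become another research focus. These early materials show that Cantonese underwent a number of important changes around the mid-20th century, and the Cantonese of that time very likely retained linguistic features from both before and after those changes. Moreover, more than sixty years have passed since the mid-20th century and Cantonese should have changed considerably, yet few studies have examined the Cantonese of that period, and Cantonese materials from that time are very scarce compared with those from the nineteenth century. We believe that interviews with elderly speakers alone cannot give a full and accurate picture of the earlier language. This project proposes to use a "real time approach" and collect language materials that were actually in use at the time. The corpus will be based on the dialogues of Hong Kong Cantonese films (粵語長片) of the 1950s and 1960s. The project will transcribe the dialogues of fifty films (covering different genres and content, excluding Cantonese opera), carry out preliminary processing such as word segmentation and phonetic annotation, and build a corpus from the results. The corpus will be equipped with a search function and shared online for use by other interested scholars. With these data, we can analyse the vocabulary, semantics and syntax of Hong Kong Cantonese half a century ago and compare it with nineteenth-century and contemporary Cantonese, gaining a better understanding of the development of Cantonese over roughly the past two hundred years. Beyond linguistic research, the data can also be used to explore the relationship between language and Hong Kong's society and culture. |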
Realisation of objectives: | ||||||||||||||||||||||
Summary of objectives addressed: |
Research Outcome | ||||||||||||||||||||||
Major findings and research outcome: | This project adopted a corpus-based approach to study the diachronic development of Hong Kong Cantonese. The corpus data were drawn from the dialogues of 60 Cantonese movies (also known as 粵語長片) produced between 1940 and 1970. A total of 61 hours of speech data was transcribed. The corpus contains about 764k character tokens, 376k word tokens and more than 5,200 word types [objective 1]. This amount of data supports two types of research on the Cantonese language.
1. Examining the linguistic features of Cantonese in the mid-20th century (objectives 2 and 3). One output of the project is a paper on the verbal aspect marker 着, as in sentences like 好揀唔揀,揀着個爛燈盞. This aspect marker is rarely used in contemporary Cantonese and is often left out of studies of the Cantonese aspectual system. The movie dialogues allowed us to extract not just the sentences containing this aspect marker but also the contexts in which it is used, which gives a better picture of its usage (refer to the joint paper by Lai Yik Po and the PI).
2. Conducting studies from the discourse and sociolinguistic perspectives, which the interactive nature of the corpus data makes possible. One of the studies is on classifiers. Previous studies on Cantonese classifiers focus on how they describe the dimensions of objects and on the properties of definiteness. One usage of Cantonese classifiers had not been studied before: the co-referential usage. One example is 你個衰女, in which 你 is co-referential with 衰女, meaning "you, the bad daughter" rather than "your bad daughter". The joint paper by the PI and his RA (Crono Tse) examined the similarities and differences between this classifier usage and the possessive usage. A number of discourse factors are involved in the co-referential usage which are not found in the possessive usage. The recent paper by the PI and his RA (Ou Lili) examines the kinship terms found in the corpus and how the development of these terms reflects the societal development and family structure of mid-20th century Hong Kong. Another paper by the PI is on the rhetorical function of xiehouyu (歇後語), sayings that resemble riddles. By examining the xiehouyu used in the movies, we explored some deeper issues related to human communication, such as why people express only the riddle (謎面) and not the answer (謎底), even though the answer is the speaker's intended meaning. That paper approached this issue in terms of rhetorical function (修辭功能).
The project has also attracted media attention. As listed in Section C, the PI was invited to introduce the corpus, the research outcomes and Cantonese studies. Overall, the project has achieved its three objectives satisfactorily. | |||||||||||||||||||||
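The corpus figures and the context extraction mentioned above can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the function names `corpus_stats` and `concordance`, the data layout (one list of words per utterance) and the word segmentation of the two example sentences are all assumptions made for illustration.

```python
# Hypothetical, hand-segmented utterances taken from the examples in this
# report; the segmentation is an illustrative assumption, not corpus output.
UTTERANCES = [
    ["好", "揀", "唔", "揀", "揀", "着", "個", "爛", "燈盞"],
    ["你", "個", "衰女"],
]


def corpus_stats(utterances):
    """Count character tokens, word tokens and word types."""
    words = [w for utt in utterances for w in utt]
    return {
        "char_tokens": sum(len(w) for w in words),  # every character counted
        "word_tokens": len(words),                  # every word occurrence
        "word_types": len(set(words)),              # distinct words only
    }


def concordance(utterances, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for utt in utterances:
        for i, word in enumerate(utt):
            if word == keyword:
                left = utt[max(0, i - window):i]
                right = utt[i + 1:i + 1 + window]
                hits.append((left, word, right))
    return hits


if __name__ == "__main__":
    print(corpus_stats(UTTERANCES))
    for left, word, right in concordance(UTTERANCES, "着"):
        print(" ".join(left), f"[{word}]", " ".join(right))
```

Word types count each distinct word once, which is why the number of types is far smaller than the number of word tokens.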
Potential for further development of the research and the proposed course of action: |
The data of this project do more than provide material for Cantonese linguistic studies. They also serve as a basis for developing other resources in the area of Digital Humanities. Besides traditional keyword search, the new corpus incorporates thesaurus information from 《實用廣州話分類詞典》, compiled by Mai Yun and Tan Bukang. The thesaurus data make it possible to examine semantically related words. For example, searching for the word 爸爸 will also yield the results for 爹哋, 老竇, etc. Other relevant kinship terms such as 媽媽, 仔女, etc. will also be retrieved and presented for users' reference. This kind of ontological and lexical semantic information is important for studying words that bear cultural elements. For example, the means of transportation of the mid-20th century differ significantly from those of today, and this search algorithm can capture such information. At present, no such ontological tools have been developed for the Cantonese language. The corpus data can thus serve as a basis for this new direction of research in Cantonese linguistics, which has so far focused on the language-internal system. | |||||||||||||||||||||
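The thesaurus-backed search described above can be sketched roughly as follows. This is a minimal illustration, not the corpus's actual search algorithm: the `THESAURUS` layout (a category holding several synonym sets), the category name `kinship`, the functions `expand_query` and `search`, and the sample utterances are all assumptions; only the example words come from this report.

```python
# Hypothetical thesaurus layout: category -> list of synonym sets.
# The real structure of 《實用廣州話分類詞典》 may differ.
THESAURUS = {
    "kinship": [
        {"爸爸", "爹哋", "老竇"},
        {"媽媽"},
        {"仔女"},
    ],
}


def expand_query(word, thesaurus=THESAURUS):
    """Return (synonyms, related terms) for a query word.

    Synonyms share a synonym set with the word; related terms are the
    other words in the same category. Unknown words expand to themselves.
    """
    for synsets in thesaurus.values():
        for synset in synsets:
            if word in synset:
                synonyms = sorted(synset)
                related = sorted(
                    w for other in synsets if other is not synset for w in other
                )
                return synonyms, related
    return [word], []


def search(utterances, word):
    """Find utterances containing the word or any of its synonyms."""
    synonyms, related = expand_query(word)
    targets = set(synonyms)
    hits = [utt for utt in utterances if targets & set(utt)]
    return hits, related


if __name__ == "__main__":
    # Invented sample utterances, already segmented into words.
    sample = [["老竇", "返", "嚟", "喇"], ["媽媽", "喺", "邊"]]
    hits, related = search(sample, "爸爸")
    print(hits)     # utterances matching 爸爸 or one of its synonyms
    print(related)  # other kinship terms offered for reference
```

Keeping synonyms and related terms separate mirrors the two behaviours described in the report: synonyms widen the set of retrieved results, while related terms are only presented for the user's reference.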
Layman's Summary of Completion Report: | This project adopted a real-time, corpus-based approach to study Hong Kong Cantonese in the mid-20th century and the changes it has undergone. The data came from one of Hong Kong's most valuable cultural assets: early Cantonese movies. Dialogues of 60 Cantonese movies produced between 1940 and 1970 were transcribed to produce a textual database which can help us to reconstruct the Cantonese language of six decades ago. The data can also help us to trace the changes in the language since then. In addition, we can look at some socio-cultural issues through the language used in the period concerned; kinship terms are one example. At the same time, the interactive, dialogic nature of the corpus data allows us to look at the use of Cantonese at the pragmatic and discourse levels. This project can be seen as a timely response to the First Inventory of Intangible Cultural Heritage of Hong Kong released by the Hong Kong SAR Government in June 2014. The mid-20th century was a critical period in Hong Kong's history, during which a large number of immigrants with diverse cultural and language backgrounds came to Hong Kong. Nonetheless, Cantonese remains an important language in Hong Kong today and is still spoken as the first language by the majority of Hong Kong's population. Placed in this historical and sociolinguistic context, the project advances our understanding not only of the development of Cantonese but also of the interrelationship between language, society and culture. | |||||||||||||||||||||
Research Output | ||||||||||||||||||||||
Peer-reviewed journal publication(s) arising directly from this research project : (* denotes the corresponding author) |
Recognized international conference(s) in which paper(s) related to this research project was/were delivered : |
Other impact (e.g. award of patents or prizes, collaboration with other research institutions, technology transfer, etc.): |
From January to November 2014, the PI was invited by RTHK to contribute a weekly 5-minute interview every Friday on the programme 《晨光第一線》. There were 41 sessions altogether, on topics concerning language, culture and society. Some of the data were taken from the PI's corpus. For details of the topics, see https://sites.google.com/site/hkseattle/media-interview.
The PI was interviewed by 《灼見名家》 on his research on Cantonese linguistics. The interview discussed the corpus construction and related research issues in particular. URL of the interview (released in May 2017): 《教大學者醉心粵語研究 以懷舊電影創建網上語料庫》 (http://www.master-insight.com/%E6%95%99%E5%A4%A7%E5%AD%B8%E8%80%85%E9%86%89%E5%BF%83%E7%B2%B5%E8%AA%9E%E7%A0%94%E7%A9%B6%E3%80%80%E4%BB%A5%E6%87%B7%E8%88%8A%E9%9B%BB%E5%BD%B1%E5%89%B5%E5%BB%BA%E7%B6%B2%E4%B8%8A%E8%AA%9E%E6%96%99/)
The PI was invited by TVB (August 2017) to appear in two episodes of the programme 《粵講粵㜺鬼》, introducing some old features of Cantonese based on the corpus data. https://www.youtube.com/watch?v=ZqOVfb4dw80 https://www.youtube.com/watch?v=-TZnW2G04as | |||||||||||||||||||||
Realisation of the education plan: |