ENQUIRE PROJECT DETAILS BY GENERAL PUBLIC

Project Details
Funding Scheme : Early Career Scheme
Project Number : 859713
Project Title(English) : Linguistic Analysis of Mid-20th Century Hong Kong Cantonese by Constructing an Annotated Spoken Corpus 
Project Title(Chinese) : 20世紀中葉香港粵語的語料庫建構及研究 
Principal Investigator(English) : Dr Chin, Chi On 
Principal Investigator(Chinese) :  
Department : Department of Linguistics and Modern Language Studies
Institution : The Education University of Hong Kong
E-mail Address : andychin@eduhk.hk 
Tel : 29487780 
Co-Investigator(s) :
Panel : Humanities, Social Sciences
Subject Area : Psychology and Linguistics
Exercise Year : 2013 / 14
Fund Approved : 757,000
Project Status : Completed
Completion Date : 30-11-2016
Project Objectives :
To construct an annotated corpus of mid-20th century Cantonese with data drawn from the dialogues of 50 selected Cantonese movies produced in Hong Kong between 1950 and 1970;
To identify, on the basis of the corpus data, some major and salient characteristics of lexical items (and their usage), syntax, semantics and discourse in mid-20th century Cantonese which are significantly different from contemporary Cantonese and 19th century Cantonese;
To explore the possible factors and mechanisms (language internal, language external, paths of change) for the changes or the emergence of new linguistic features;
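Abstract as per original application (English/Chinese):

Traditional research in Cantonese dialectology relies mainly on fieldwork, drawing on elderly informants to understand the state of the dialect. Over the past twenty-odd years, many dialect materials written in Cantonese from the 19th to the early 20th century have come to light, and the diachronic study of Cantonese has become another research focus. These early materials show that Cantonese underwent a number of important changes around the mid-20th century, and the Cantonese of that time very likely showed linguistic features from both before and after those changes. Meanwhile, more than sixty years have passed since the mid-20th century and Cantonese should have changed considerably, yet few studies have examined the Cantonese of that period, and surviving Cantonese materials from it are very scarce compared with those from the 19th century.

We believe that interviews with elderly informants alone cannot give a full and accurate picture of the earlier language. This project proposes to adopt a "real time approach" and collect language materials that were actually in use at the time.

The project's data will be based on the dialogue of Hong Kong Cantonese feature films of the 1950s and 1960s. The project will transcribe the dialogue of fifty films (covering different genres and content, excluding Cantonese opera), carry out preliminary processing such as word segmentation and phonetic transcription, and build the result into a corpus. The corpus will be equipped with search functions and shared online so that other interested scholars can use it.

With these data we can analyse Hong Kong Cantonese of half a century ago in terms of lexis, semantics and syntax, and compare it with 19th century Cantonese and contemporary Cantonese, gaining a better understanding of the development of Cantonese over the past two hundred years or so. Beyond linguistic research, the data can also be used to explore the relationship between language and Hong Kong society and culture.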
Realisation of objectives:
Summary of objectives addressed:
Objectives; Addressed; Percentage achieved
1. To construct an annotated corpus of mid-20th century Cantonese with data drawn from the dialogues of 50 selected Cantonese movies produced in Hong Kong between 1950 and 1970; Yes; 100%
2. To identify, on the basis of the corpus data, some major and salient characteristics of lexical items (and their usage), syntax, semantics and discourse in mid-20th century Cantonese which are significantly different from contemporary Cantonese and 19th century Cantonese; Yes; 95%
3. To explore the possible factors and mechanisms (language internal, language external, paths of change) for the changes or the emergence of new linguistic features; Yes; 90%
Research Outcome
Major findings and research outcome: This project adopted a corpus-based approach to study the diachronic development of Hong Kong Cantonese. The corpus data were drawn from the dialogues of 60 Cantonese movies (also known as 粵語長片) produced between 1940 and 1970. A total of 61 hours of speech data was transcribed. The corpus has 764k character tokens, 376k word tokens and 5200+ word types [objective 1]. With this amount of corpus data, we are able to carry out a number of studies on the Cantonese language. Two types of research can be carried out with this set of corpus data.

1. Examining the linguistic features of Cantonese in the mid-20th century (objectives 2 and 3). One output of the project is a paper on the verbal aspect marker 着, as in sentences like 好揀唔揀,揀着個爛燈盞. This aspect marker is rarely used in contemporary Cantonese and is often left out of studies of the Cantonese aspectual system. The movie dialogues allowed us to extract not just the sentences containing this aspect marker but also the context in which it is used, which gives a better picture of how the marker is used (refer to the joint paper by Lai Yik Po and the PI).

2. Conducting studies from the discourse and sociolinguistic perspectives, which the interactive nature of the corpus data makes possible. One of the studies is on classifiers. Previous studies on Cantonese classifiers focused on their use in describing the dimensions of objects and on the properties of definiteness. One usage of Cantonese classifiers had not been studied before: the co-referential usage. One example is 你個衰女, in which 你 is co-referential with 衰女, meaning "you, the bad daughter" instead of "your bad daughter". The joint paper by the PI and his RA (Crono Tse) examined the similarities and differences between this classifier usage and the possessive usage. A number of discourse factors are involved in this usage which are not found in the possessive usage. The recent paper by the PI and his RA (Ou Lili) examines the kinship terms found in the corpus and how the development of these terms reflects the societal development and family structure of mid-20th century Hong Kong. Another paper by the PI is on the rhetorical function of xiehouyu 歇後語, which are like riddles. Through the examination of xiehouyu used in the movies, we explored some deeper issues related to human communication, such as why people express only the riddle (謎面) but not the answer (謎底), which is in fact the speaker's intended meaning. That paper attempted to explore this issue in terms of rhetorical function (修辭功能).

The project has also attracted media attention. As listed in Section C, the PI was invited to introduce the corpus as well as the research outcomes and Cantonese studies. Overall, the project has achieved the three objectives satisfactorily.
Potential for further development of the research and the proposed course of action:
The data of this project do not only provide linguistic data for Cantonese linguistics studies. They also serve as a basis for developing other resources in the area of Digital Humanities. Besides traditional keyword search, the new corpus also incorporates thesaurus information from 《實用廣州話分類詞典》 compiled by Mai Yun and Tan Bukang. The thesaurus data provide more information for examining semantically related words. For example, searching for the word 爸爸 will also yield results for 爹哋, 老竇, etc. Other relevant kinship terms such as 媽媽, 仔女, etc. will also be retrieved and presented for users' reference. This kind of ontological and lexical semantic information is important for studying words that carry cultural elements. For example, the means of transport of the mid-20th century differ significantly from those of today, and this search algorithm can capture this kind of information. At present, no such ontological tools have been developed for the Cantonese language. The corpus data can thus serve as a basis for this new direction of research in Cantonese linguistics, which has so far focused on the language-internal system.
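The thesaurus-assisted search described above can be illustrated with a minimal sketch. It assumes a toy thesaurus keyed by hypothetical category labels ("kinship/father" and so on) and a few invented dialogue lines; neither the labels, the function names, nor the sample lines come from the actual corpus or from 《實用廣州話分類詞典》, and the project's own search algorithm may work differently.

```python
# Hypothetical thesaurus: category label -> member words.
# The labels and groupings are illustrative assumptions only.
THESAURUS = {
    "kinship/father": ["爸爸", "爹哋", "老竇"],
    "kinship/mother": ["媽媽"],
    "kinship/children": ["仔女"],
}

# Invented sample lines, NOT taken from the corpus.
TOY_LINES = ["老竇返嚟喇", "媽媽煮緊飯", "我去搭電車"]


def build_index(thesaurus):
    """Map each word to the category it belongs to."""
    return {word: cat for cat, words in thesaurus.items() for word in words}


def expand_query(word, thesaurus, index):
    """Return same-category words and words from sibling categories."""
    category = index.get(word)
    if category is None:
        # Word not in the thesaurus: fall back to plain keyword search.
        return {"synonyms": [word], "related": []}
    top_level = category.split("/")[0]
    related = [
        w
        for cat, words in thesaurus.items()
        if cat != category and cat.split("/")[0] == top_level
        for w in words
    ]
    return {"synonyms": list(thesaurus[category]), "related": related}


def search(lines, terms):
    """Keep the lines that contain at least one of the search terms."""
    return [line for line in lines if any(term in line for term in terms)]


INDEX = build_index(THESAURUS)
```

Calling `expand_query("爸爸", THESAURUS, INDEX)` and passing the resulting synonyms to `search` retrieves lines containing 爹哋 or 老竇 as well as 爸爸, while the "related" list surfaces the other kinship terms for the user's reference.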
Layman's Summary of Completion Report:
This project adopted a real-time, corpus-based approach to study the linguistic situation and change of Hong Kong Cantonese in the mid-20th century. The data of the project came from one of the most valuable cultural assets: early Cantonese movies. Dialogues of 60 Cantonese movies produced between 1940 and 1970 were transcribed to produce a textual database which can help us to reconstruct the Cantonese language of six decades ago. The data can also help us to trace the changes in the language since then. In addition, we can look at some socio-cultural issues through the language used in the period concerned. Kinship terms are one example. At the same time, the interactive, dialogic nature of the corpus data also allows us to look at the use of Cantonese at the pragmatic and discourse levels. This project can be seen as a timely response to the First Inventory of Intangible Cultural Heritage of Hong Kong released by the Hong Kong SAR Government in June 2014. The mid-20th century was a critical period in Hong Kong's history during which a large number of immigrants with diverse cultural and language backgrounds came to Hong Kong. Nonetheless, Cantonese remains an important language in Hong Kong today and is still spoken as the first language by the majority of Hong Kong's population. Placed in this historical and sociolinguistic context, the project advances our understanding not only of the development of Cantonese but also of the inter-relationship between language, society and culture.
Research Output
Peer-reviewed journal publication(s) arising directly from this research project :
(* denotes the corresponding author)
Year of Publication  Author(s)  Title and Journal/Book  Accessible from Institution Repository
黎奕葆, 錢志安  丁邦新先生八秩壽慶論文集, edited by 何大安、姚玉敏、陳忠敏、孫景濤、張洪年  No
錢志安  粵語(四字)歇後語的修辭研究, 《粵語研究》  No
Initiatives of Digital Humanities in Cantonese Studies: A Corpus of Mid-20th Century Hong Kong Cantonese. The paper was submitted to the volume 'Digital Humanities and New Ways of Teaching' in the book series on Digital Culture and Humanities to be published by Springer.  No
Recognized international conference(s) in which paper(s) related to this research project was/were delivered :
Month/Year/City Title Conference Name
12/2015/Hong Kong 從《香港二十世紀中期粵語語料庫》探討粵語多元研究  第二十屆國際粵方言研討會 
12/2015/Hong Kong A Corpus of Mid-20th Century Hong Kong Cantonese  Conference on Digital Humanities 2015 
12/2015/Taipei 粵語數位研究/Digital Research in Cantonese  第六屆數位典藏與數位人文國際研討會 
5/2015/Hong Kong 粵語(四字)歇後語的修辭研究  中國南方語言四音節慣用語研討會 
4/2015/Hong Kong 粵語「名-量-名」結構的同指用法  第十五屆粵語討論會 
2017 從香港粵語親屬稱謂詞窺看香港二十世紀中期家庭文化的演變  第十四屆青年學者國際學術研討會 
Other impact (e.g. award of patents or prizes, collaboration with other research institutions, technology transfer, etc.):
From January to November 2014, the PI was invited by RTHK to contribute a weekly 5-minute interview to the programme 《晨光第一線》 every Friday. There were altogether 41 sessions. The topics were about language, culture and society. Some of the data were taken from the PI's corpus. For the details of the topics, see https://sites.google.com/site/hkseattle/media-interview.

The PI was interviewed by 《灼見名家》 on his research on Cantonese linguistics. Specifically, the corpus construction and related research issues were discussed in the interview. URL of the interview (released in May 2017): 《教大學者醉心粵語研究 以懷舊電影創建網上語料庫》 (http://www.master-insight.com/%E6%95%99%E5%A4%A7%E5%AD%B8%E8%80%85%E9%86%89%E5%BF%83%E7%B2%B5%E8%AA%9E%E7%A0%94%E7%A9%B6%E3%80%80%E4%BB%A5%E6%87%B7%E8%88%8A%E9%9B%BB%E5%BD%B1%E5%89%B5%E5%BB%BA%E7%B6%B2%E4%B8%8A%E8%AA%9E%E6%96%99/)

The PI was invited by TVB (August 2017) to appear in two episodes of the programme 《粵講粵㜺鬼》, introducing some old features of Cantonese based on the corpus data. https://www.youtube.com/watch?v=ZqOVfb4dw80 https://www.youtube.com/watch?v=-TZnW2G04as
Realisation of the education plan:
