PolyU Corpus of Spoken Chinese

v1.3 (released on 1 January 2015)

This corpus is a set of audio recordings of conversational exchanges in Chinese between interviewers and interviewees, covering a wide range of subjects, including travel talk and life experiences. There are presently 28 transcripts, all rendered in Chinese characters.


The creation of this corpus was made possible by the following grants (PI: Dr Foong Ha YAP):

"Stance Marking in Asian Languages: Linguistic and Cultural Perspectives" (RGC GRF Grant 2010-2013, PolyU 5513/10H)

"Non-referential Uses of Nominalization Constructions: Asian Perspectives" (HKPU Internal Grant, 2010-2013, HKPU 1-ZV6W)

"Establishing Common Ground in Public Discourse: An Analysis of Electoral Speeches, Press Conferences and Q&A Sessions in Hong Kong" (PolyU ICRG, 2012-2014, HKPU G-YK85)

To read the transcriptions, you will need Adobe Reader. To decompress a batch of transcriptions, you will need 7-Zip.

We are continuing to update this corpus. More data will be uploaded in future releases.

Suggestions, feedback, queries and comments are welcome and can be directed to Wong Tak-sum at wong_taksum@hotmail.com.

We recommend that you cite our corpus as:

PolyU Corpus of Spoken Chinese, Department of English, Hong Kong Polytechnic University, Modified 4 June 2015, Retrieved DATE, from <http://asianlang.engl.polyu.edu.hk/>.



Cantonese Discourse Data

ID# of Participant


Travel Pictures

Ritual Pictures

Free conversation

Informant 1

Sound Track


Sound Track


Sound Track


Sound Track (4'33")


Informant 4

Sound Track


Sound Track


Sound Track


Sound Track (2'09")


Informant 6

Sound Track


Sound Track


Sound Track


Sound Track (2'44")


Informant 13

Sound Track


Sound Track


Sound Track


Sound Track (5'14")


Informant 14

Sound Track


Sound Track


Sound Track


Sound Track (5'24")


Informant 15

Sound Track


Sound Track


Sound Track


Sound Track (8'59")


Informant 16

Sound Track (9'56") Transcript Sound Track (12'38") Transcript Sound Track (17'37") Transcript

Sound Track (5'32")


Informant 17

Sound Track (11'14")   Sound Track (16'35")   Sound Track (20'34")  

Sound Track (9'7")


Informant 18

Sound Track (8'41") Transcript Sound Track (14'1") Transcript Sound Track (21'49") Transcript

Sound Track (4'31")


Informant 19

Sound Track (7'39") Transcript Sound Track (17'46") Transcript Sound Track (30'23") Transcript

Sound Track (8'28")






All informants (4h 5'35")

Sound Track (37'30") All transcripts Sound Track (1h 1') All transcripts Sound Track (1h 30'23") All transcripts

Sound Track (56'41")

All transcripts
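The stated totals for the Cantonese discourse data can be checked with a short script. The sketch below is illustrative only: the function names are made up for this example, and the four column totals are simply copied from the "All informants" row above. Summing them gives 4h 5'34", one second short of the stated overall total of 4h 5'35", presumably because the per-column figures are rounded.

```python
import re

# Matches durations written like 4'33", 1h 1' or 1h 30'23"
# (the hour part and the seconds part are both optional).
DURATION = re.compile(r"""(?:(\d+)h\s*)?(\d+)'(?:(\d+)")?""")


def to_seconds(text: str) -> int:
    """Convert a duration string such as 1h 30'23" into seconds."""
    # Normalise typographic quotes in case they appear in a duration.
    cleaned = text.strip().replace("\u2019", "'").replace("\u201d", '"')
    match = DURATION.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def format_duration(total_seconds: int) -> str:
    """Format seconds in the page's own style, e.g. 4h 5'34"."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    prefix = f"{hours}h " if hours else ""
    return f"{prefix}{minutes}'{seconds:02d}\""


# Column totals copied from the "All informants" row.
column_totals = ["37'30\"", "1h 1'", "1h 30'23\"", "56'41\""]
total = sum(to_seconds(d) for d in column_totals)
print(format_duration(total))
```

The same helper can be applied to the per-informant durations to re-check each column total.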

Cantonese Debates Hosted by RTHK


Geographical Constituency Areas concerned



18 Aug 2012

Hong Kong Island

1h 6'43"


19 Aug 2012

Kowloon East



25 Aug 2012

New Territories East

1h 7'04"


26 Aug 2012

Kowloon West



01 Sept 2012

New Territories West

1h 7'03"


02 Sept 2012

District Council (Second)



Total time duration

5h 35'08"

All transcripts

Note: Given that a Chinese character may correspond to more than one morpheme and have more than one pronunciation, sometimes there is no one-to-one correspondence between a Chinese character and its pronunciation. Jyutping romanization of the character is thus tagged when there is potential ambiguity.
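The tagging policy described in the note above can be sketched in Python. This is an illustration only: the READINGS table, the function names and the square-bracket tag format are assumptions made for this example, not the corpus's actual lexicon or transcript notation. The only facts relied on are that a Jyutping syllable is a string of lowercase letters followed by a tone digit from 1 to 6, and that the sample characters have the readings shown.

```python
import re

# Hypothetical mini-lexicon for illustration; not taken from the corpus.
READINGS = {
    "長": ["coeng4", "zoeng2"],  # 'long' vs 'to grow'
    "好": ["hou2", "hou3"],      # 'good' vs 'to be fond of'
    "人": ["jan4"],              # 'person' (single reading)
}

# A Jyutping syllable: lowercase letters followed by a tone digit 1-6.
JYUTPING = re.compile(r"([a-z]+)([1-6])")


def split_syllable(syllable: str) -> tuple[str, int]:
    """Split a Jyutping syllable into its segmental part and tone number."""
    match = JYUTPING.fullmatch(syllable)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {syllable!r}")
    return match.group(1), int(match.group(2))


def is_ambiguous(char: str) -> bool:
    """A character is ambiguous here if the lexicon lists more than one reading."""
    return len(READINGS.get(char, [])) > 1


def tag(char: str, reading: str) -> str:
    """Attach the reading only when the character is ambiguous.

    The char[reading] notation is a hypothetical format for this sketch.
    """
    split_syllable(reading)  # validates the romanization
    if reading not in READINGS.get(char, []):
        raise ValueError(f"{reading!r} is not a listed reading of {char!r}")
    return f"{char}[{reading}]" if is_ambiguous(char) else char


print(tag("長", "zoeng2"))
print(tag("人", "jan4"))
```

Under these assumptions an unambiguous character is left untagged, which mirrors the note's statement that romanization is added only where there is potential ambiguity.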

Mandarin Discourse Data

ID# of Participant


Travel Pictures

Ritual Pictures

Free conversation


Sound Track


Sound Track (11'39")

Transcript Sound Track Transcript Sound Track Transcript


Sound Track (4'41")


Sound Track (13'35")


Sound Track (6'36")


Sound Track (3'28")



Sound Track 7.50


Sound Track (12'15")


Sound Track


Sound Track



Sound Track (5'26")


Sound Track (9'45")


Sound Track (6'55")


Sound Track (2'34")



Sound Track


Sound Track (14'49")


Sound Track


Sound Track



Sound Track


Sound Track (4'25")


Sound Track


Sound Track



Sound Track


Sound Track (10'19")


Sound Track


Sound Track



Sound Track 6.12


Sound Track (10'38")


Sound Track


Sound Track



Sound Track (10'17")


Sound Track (21'34")

Transcript Sound Track (9'13") Transcript

Sound Track (10'45")

IE_18 Sound Track (11'9") Transcript Sound Track (8'44") Transcript Sound Track (3'57") Transcript    
IE_19 Sound Track (6'1") Transcript Sound Track (9'20") Transcript Sound Track (12'1") Transcript    
IE_20 Sound Track (7'32") Transcript Sound Track (11'2") Transcript Sound Track (17'37") Transcript Sound Track (6'5") Transcript

Sound Track


Sound Track


Sound Track


Sound Track


Sound Track


Sound Track


Sound Track


Sound Track

IE_FC11             Sound Track Transcript

All informants (3h 7'41")

Sound Track (12'37")

All transcripts

Sound Track (2h 43'48")

All transcripts Sound Track (hh hh' ss") All transcripts

Sound Track (11'16")

All transcripts


List of research publications and presentations that have benefited from data from this corpus


Yap, Foong Ha, Ying Yang and Tak-Sum Wong. (accepted). On the development of sentence final particles (and utterance tags) in Chinese. In The Role of the Left and Right Periphery in Semantic [Studies in Pragmatics Series], Kate Beeching & Ulrich Detges (eds). Bingley, UK: Emerald Publishers.

Yap, Foong Ha, Winnie Chor and Jiao Wang. (2012). On the development of epistemic ‘fear’ markers: An analysis of Mandarin kongpa and Cantonese taipaa. In Covert Patterns of Modality, Werner Abraham & Elisabeth Leiss (eds), 312-342. Cambridge, UK: Cambridge Scholars.

Conference Presentations

Yang, Ying, Foong Ha Yap and Tak-Sum Wong. (2012). “I am sure but I hedge”: fear expression kongpa as a rhetorical interactive strategy in Mandarin conversation. Paper presented at the Workshop on Epistemicity, Evidentiality and Attitude in Asian Languages: Typological, Diachronic and Discourse Perspectives, Hong Kong Polytechnic University, September 3-5.

Yap, Foong Ha and Winnie Oi-wan Chor. (2012). Epistemic downgrading in Cantonese conversations. Paper presented at the 20th Annual Conference of the International Association of Chinese Linguistics (IACL-20), Hong Kong Polytechnic University, September 3-5.

We would appreciate hearing from you if your publications or conference presentations have made use of or referred to results based on this corpus.


We wish to thank the following research team members, who worked on different stages of this corpus:


Preparation of Interview Questions:

CHOR Winnie Oi-wan 



CHAN Shuk-ling Ariel 

YANG Ying Vivien 



CHAN Shuk-ling Ariel 

CHAN Yu-kwan Daniel 

CHING Yuk-yin Jessie 

KONG Pui-yu Polly

LAM Chi-fung

MIN Wei Phyllis

SIU Pui-shan Gloria 

TONG Ka-tai Rosanne 

YUNG Hiu-lam Landia 


Transcription Editors:

WONG Tak-sum Sam (Cantonese)

YANG Ying Vivien (Mandarin)


Corpus Website Supervisor:

WONG Tak-sum Sam

Links to Other Corpora and Databases


• Early Cantonese Colloquial Texts: A Database
• Early Cantonese Tagged Database
• The Early Cantonese Bible Database
• Late 19th Century (1865-1894) Cantonese Christian Writings Database
• A Linguistic Corpus of Mid-20th Century Hong Kong Cantonese Paper
• Hong Kong Cantonese Child Language Corpus (CANCORP)
• The Hong Kong Bilingual Child Language Corpus
• CHILDES Cantonese HKU-70 Corpus
• English Loanwords in Hong Kong Cantonese
• Ideophones in African and Asian Languages
• Hong Kong Cantonese Corpus (HKCanCor) POS Tagset Paper PyCantonese
• The Hong Kong Cantonese Adult Language Corpus (HKCAC) Paper1 Paper2
• Cantonese HKU Corpus Report
• A Parallel Corpus of Spoken Cantonese and Written Chinese Paper Cantonese Mandarin
• Cantonese Chinese Corpus of Oral Narratives (CANON) Paper
• Hong Kong Mid-1990s Newspaper Corpus Paper
• PolyU Corpus of Spoken Chinese
• CantoneseWaC: Cantonese corpus from the Web
• The Malaysia Cantonese Corpus (MYCanCor) Paper
• HKU-70 Corpus
• CantoMap, the Cantonese MapTask corpus
• A corpus of Cantonese verbal comments Paper
• A Counselling Corpus in Cantonese Paper
• SpiCE: Speech in Cantonese and English Paper


Academia Sinica
• Academia Sinica Balanced Corpus of Modern Chinese 19247
• Sinica Treebank Version 3.0
• A Socio-phonetic Study of Spoken Taiwan Mandarin
• Mandarin Topic-oriented Conversation Corpus Paper

Beijing Language and Culture University
• Tagged Corpus of People's Daily Paper1
• Corpus of Active Written Samples for HSK

Beijing Foreign Studies University
• Texts of Recent Chinese (TORCH) 2009
• The IMS Open Corpus Workbench

Chilin (Hong Kong) Limited
• Linguistic Variations in Chinese Speech Communities (LIVAC) Synchronous Corpus

Chinese Academy of Sciences
• SCTB: A Chinese Treebank in Scientific Domain

The Chinese University of Hong Kong
• The Tong Corpus
• The CUHK Discourse Treebank for Chinese Paper

Communication University of China
• Mass Media Language Corpus of Audio and Video Materials
• Management System of Broadcast Media Language Corpus
• Resources of Neology Research

• Chinese Weibo Syntactic Treebank with 50k Sentences

Harbin Institute of Technology
• Harbin Institute of Technology - Center for Information Retrieval (HIT-CIR) Chinese Dependency Treebank

Hong Kong Polytechnic University
• PolyU Treebank

Lancaster University
• CALLHOME Mandarin Chinese Transcripts
• The Lancaster Corpus of Mandarin Chinese (LCMC) version 1 version 2

Ministry of Education
• Corpus On-line: Balanced Corpus of Modern Chinese in Common Use

National Chengchi University
• The NCCU Corpus of Spoken Mandarin

National Taiwan University
• Taiwan Corpus of Child Mandarin (TCCM)

Peking University
• Center for Chinese Linguistics (CCL) Corpus
• Diachronic Retrieval System of Lexical Items of Modern Chinese
• The Peking University Multi-view Chinese Treebank
• The Peking University Chinese Treebank

Tsinghua University
• Tsinghua Chinese Treebank (TCT)

University of California, Los Angeles
• The UCLA Written Chinese Corpus

University of Pennsylvania
• The Chinese Treebank Project (Subscription required)

Video Annotation for Speech Technologies
• VAST Chinese Speech and Transcripts (Subscription required)

Classical Chinese

Academia Sinica
• Corpus of Classical Chinese
• Scripta Sinica Database
• Academia Sinica Tagged Corpus of Old Chinese
• Academia Sinica Tagged Corpus of Middle Chinese
• Academia Sinica Tagged Corpus of Early Mandarin Chinese

Beijing Normal University
• Ancient Chinese Corpus with Word Sense Annotation 古漢語語義標註語料庫

City University of Hong Kong
• CityU Treebank of Classical Chinese Poems
• A Dependency Treebank of Chinese Buddhist Texts Paper1 Paper 2

The Chinese University of Hong Kong
• Chinese Ancient Texts (CHANT) Database Access via PolyU Library
• A Database on the Chu Bamboo Manuscripts of Guodian 郭店楚簡資料庫
• 走馬樓三國吳簡.嘉禾吏民田家 資料庫

The Hong Kong Institute of Education
• A Database of Chinese Buddhist Translation and their Sanskrit Parallels for the Buddhist Chinese Studies

Kyoto University
• Classical Chinese Universal Dependencies Treebank treebank visualizer1 2

Ministry of Education
• Corpus On-line: Corpus of Classical Chinese

Peking University
• Center for Chinese Linguistics (CCL) Corpus

The University of Sheffield
• Sheffield Corpus of Chinese for Diachronic Linguistics Study

University of Washington
• The Huainanzi Corpus


Academia Sinica
• Southern Min Archives: A Database of Historical Change and Language Distribution
• Min and Hakka Language Archives
• The Texts Database of Folk Songs in Southern Min Dialect preface
• iCorpus Mandarin Taiwanese Bilingual Corpus Online System

National Chengchi University
• The NCCU Corpus of Spoken Southern Min

National Chung Cheng University
• A Spoken Corpus of Taiwan Southern Min
• Taiwanese Child Language Corpus (TAICORP)

National Taichung University of Education
• Digital Archive Database for Written Taiwanese (DADWT)
• Taiwanese Concordancer
• Memory of the Written Taiwanese

Other Sinitic Varieties
• Wenzhou Spoken Corpus
• The NCCU Corpus of Spoken Hakka
• Association for Conservation of Hong Kong Indigenous Languages 香港本土語言保育協會
• CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition

Spoken English in Hong Kong
• Hong Kong Corpus of Spoken English
• Hong Kong Corpus of Surveying and Construction Engineering
• Hong Kong Engineering Corpus
• Hong Kong Financial Services Corpus
• Hong Kong Budget Speeches Corpus 1997-2010
• Hong Kong Policy Address Speeches Corpus 1997-2009

Other Parallel Corpora
• Multilingual and Multimodal English-Mandarin-Cantonese-Japanese Parallel Corpus (Subscription Required)
• Parallel Concordancer
• TRAD Chinese-French Parallel Text -- Broadcast News

Links to Cantonese References

On-line Dictionaries
• 香港教育大學粵語自學平台 2020
• 粵語詞彙研究所 2019
• CC-Canto beta 2017
• 粵典 beta 2014
• 粵語詞典──開放詞典網 2009
• CantoDict 2003
• 現代標準漢語與粵語對照資料庫 2001
• 粵語同音字典 1974
• Dictionary of Cantonese Slang
• Glosbe 英文粵語字典在線
• 香港常用語詞典 2014

On-line Pronunciation Dictionaries
• Cantonese Dictionary 2021
• MDBG English to Chinese Dictionary 2021
• 泛粵大典 2019
• Association for Conservation of Hong Kong Indigenous Languages Pronouncing Dictionary 香港本土語言保育協會音字典 2016
• 音資料集叢 2014
• Lexical Items with English Explanations for Fundamental Chinese Learning in Hong Kong Schools 中英對照香港學校中文學習基礎字詞 2008
• 東東讀音小字典 2003
• Chinese Character Database: With Word-formations Phonologically disambiguated According to the Cantonese Dialect 粵語審音配詞字庫 2003
• S. L. Wong's A Chinese Syllabary Pronounced according to the Dialect of Canton 黃錫凌《粵音韻彙》電子版 1996
• 語發音詞典
• CKC Online Chinese Dictionary 縱橫在線中文字典
• CantoneseAID v5.2 粵音檢字法

Printed Dictionaries
• 虞學圃、溫岐石 1782:《江湖尺牘分韻撮要合集》。澳大利亞國家圖書館館藏 韻典網
• Morrison, Robert. 1828. Vocabulary of the Canton Dialect 廣東省土話字彙. Macao: East India Company's Press. Reprint
• Williams, Samuel Wells. 1856. A Tonic Dictionary of the Chinese Language in the Canton Dialect 英華分韻撮要. Canton: Office of the Chinese Repository. Reprint
• Chalmers, John. 1859. An English and Cantonese Pocket-Dictionary 英粵字典. Hong Kong: The London Missionary Society's Press. 1862 1870
• Lobscheid, W. 1866-1868. English and Chinese Dictionary 增訂英華字典. Hong Kong: The Daily Press Office. A-C D-H I-Q
• Eitel, E. G. 1877. A Chinese-English Dictionary in the Cantonese Dialect. London: Trubner & Co. Reprint
• Ball, James Dyer. 1886. The Cantonese Made Easy Vocabulary 廣東話入門辭彙表. Hongkong: Kelly & Walsh, Ld. 1892 1908
• Chalmers, John. 1891. An English and Cantonese Dictionary 英粵字典, 6th ed. Hong Kong: Kelly & Walsh Limited. 1907
• Aubazac, Louis. 1912. Dictionnaire Cantonnais-Francais 粵法字典. Hong Kong: Imprimerie de la Société des Missions-Étrangères.
• Cowles, Roy T. 1914. A Pocket Dictionary of Cantonese: Cantonese-English with English-Cantonese Index 廣州話袖珍字典. Hong Kong: Hong Kong University Press. 1990 1999
• 孔仲南 1933:《廣東俗語攷》。廣州:南方扶輪社。Google Drive
• Meyer, Bernard F. & Wempe, Theodore F. 1934. The Student's Cantonese-English Dictionary. Hong Kong, St. Louis Industrial School Printing Press.
• 黃錫凌 1941:《粵音韻彙:廣州標準音之研究》。上海:中華書局。 1970 1991 1998
• Chiang, Ker-Ch'iu 蔣克秋. 1956. A Practical English-Cantonese Dictionary 實用英粵詞典. Singapore: Chin Fen Book Store.
• 馮思禹 1962:《廣州音字彙》。香港:世界書局。
• Cowles, Roy T. 1965. The Cantonese Speaker's Dictionary. Hong Kong: Hong Kong University Press.
• Huang, Po-Fei Parker. 1970. Cantonese Dictionary. New Haven and London: Yale University Press.
• Lau, Sidney 劉錫祥. 1977. A Practical Cantonese-English Dictionary. Hong Kong: The Government Printer.
• 饒秉才、歐陽覺亞、周無忌 1981:《廣州話方言詞典》。香港:商務印書館。 2013修訂版
• 曾子凡 1982:《廣州話.普通話口語詞對譯手冊》。香港:三聯書店 1991 1998 2001 2002 2014
• 饒秉才 1985:《廣州話.普通話雙音對照漢語字典》。香港:三聯書店
• 張日昇、詹伯慧、甘于恩 主編 1987:《珠江三角洲方言字音對照》。香港:新世紀出版社。
• 張日昇、詹伯慧、甘于恩 主編 1988:《珠江三角洲方言詞彙對照》。香港:新世紀出版社。
• 張勵妍、張賽洋 1987:《國音粵音索音字彙》。香港:中華書局。 1997
• 饒秉才、周無忌 1988:《廣州話標準音字彙》。香港:商務印書館。
• 關傑才 1990:《英譯廣東口語詞典》。香港:商務印書館。 2010
• 香港敎育署語文敎育學院中文系 1990:《常用字廣州話讀音表》。
• 吳開斌 1991:《簡明香港方言詞典》。廣州:花城出版社
• 陳慧英 1994:《實用廣州話詞典》。上海:漢語大詞典出版社
• 蘇翰 1994:《實用廣州音字典》。廣州:中山大學出版社
• 張日昇、詹伯慧、甘于恩 1994:《粵北十縣市粵方言調查報告》。廣州:暨南大學出版社。
• 北京大學中國語言文學系語言學敎研室 1995:《漢語方言詞彙》,第二版。北京:語文出版社。
• Hung, Betty. 1996. Phrases in Cantonese 非常廣東話. Hong Kong: Greenwood Press.
• Lo, Wood Wai & Tam, Fee Yin. 1996. Interesting Cantonese Colloquial Expressions. Hong Kong: The Chinese University Press.
• 饒秉才、歐陽覺亞、周無忌 1997:《廣州話詞典》。廣州:廣東人民出版社。
• 鄭定歐 1997:《香港粵語詞典》。南京:江蘇敎育出版社。
• 麥耘、譚步云 1997:《實用廣州話分類詞典》。廣州:廣東人民出版社。 2011
• 朱永楷 1997:《香港話普通話對照詞典》。北京:漢語大詞典出版社。
• 余秉昭 1997:《同音字彙》,修訂版。香港:新亞洲文化基金會有限公司。
• 魏偉新 1997:《粵講俗語諺語歇後語詞典》。廣州:廣州出版社。
• 白宛如 1998:《廣州方言詞典》。南京:江蘇敎育出版社,《現代漢語方言大詞典.分卷》本。
• 詹伯慧、張日昇 1998:《粵西十縣市粵方言調查報告》。廣州:暨南大學出版社。
• 張勵妍、倪列懷 1999:《港式廣州話詞典》。香港:萬里書店。
• 許寶華、宮田一郎 主編 1999:《漢語方言大詞典》。北京:中華書局。
• 何文匯、朱國藩 1999:《粵音正讀字彙》。香港:香港敎育圖書公司。
• 楊明新 1999:《簡明粵英詞典》。廣州:廣東高等教育出版社
• New Asia - Yale-in-China Chinese Language Center, CUHK. 1991. English-Cantonese Dictionary 英粵字典. Hong Kong: The Chinese University Press.
• 詹伯慧 2002:《廣州話正音字典》。廣州:廣東人民出版社。
• 詹伯慧 2002:《廣東粵方言概要》。廣州:暨南大學出版社。
• 饒秉才 2002:《廣州音字典(普通話對照)》。香港:三聯書店。 2004修訂版
• So, Siu-hing Simon 蘇紹興. 2002. A Glossary of Common Cantonese Colloquial Expressions 英譯廣州話常用口語詞. Hong Kong: The Chinese University Press.
• Lee, Yungkin Philip. 2003. Pocket Cantonese Dictionary. Hong Kong: Periplus Editions Limited.
• 曾子凡、溫素華 2003:《廣州話.普通話 速查字典》。香港:世界圖書出版有公司
• 歐陽覺亞、饒秉才、周耀文 2005:《廣州話、客家話、潮汕話與普通話對照詞典》。廣州:廣東人民出版社。
• Hutton, Christopher & Bolton, Kingsley. 2005. A Dictionary of Cantonese Slang: The Language of Hong Kong Movies, Street Gangs, and City Life. London: Hurst & Company.
• Lo, Tam Fee-yin. 2006. Cantonese Colloquial Expressions 廣州話口語詞. Hong Kong: The Chinese University Press.
• 湯志祥 2006:《廣州話•普通話•上海話 6000 常用詞對照手冊》。香港:中華書局。
• 劉扳盛 2008:《廣州話普通話詞典》。香港:商務印書館。
• 歐陽覺亞、周無忌、饒秉才 2010:《廣州話俗語詞典》。廣州:廣東人民出版社。
• Chiu, Aman 2010:《香港常用俗語小辭典》。香港:青春文化
• Yang, N. 2011. English-Cantonese & Cantonese-English One-to-One Dictionary. Star Foreign Language Books.
• Editors of Hippocrene Books (Ed.). 2012. Cantonese-English/English-Cantonese Dictionary & Phrasebook. Hippocrene Books.
• Numlake. U. P. 2013. English Chinese Cantonese Dictionary. TraffordSG.
• Hippocrene Books (Ed.). 2014. Cantonese-English/English-Cantonese Practical Dictionary. Hippocrene Books.
• Kataoka, Shin 片岡新 & Lee, Cream Yin-Ping 李燕萍. 2014. Putonghua-Cantonese-English Converter. Hong Kong: Greenwood Press.
• 曾焯文 2016:《粵辭正典:健康篇》。香港:文化現場。
• 張勵妍、倪列懷、潘禮美 2018:《香港粵語大詞典》。香港:天地圖書。

Studies in Vocabulary Items and Idioms

• 詹憲慈 1924:《廣州語本字》,手稿。 1995 Wikipedia
• 喬硯農 1966:《廣州話口語詞的研究》。香港:華僑語文出版社。
• 石 人 1983:《東話趣譚》。香港:博益。
• 石 人 1984:《東話再》。香港:博益。
• 宋郁文 1985:《俗語拾趣》。香港:博益。
• 丘學強 1989:《妙語方言》。香港:中華書局。
• 阿 丁 1989:《趣怪香港話》。香港:香港周刊。
• 吳 昊 1990:《懷舊香港話》。香港:創藝文化企業有限公司
• 陳均潤 1991:《港人自講》。
• 文若稚 1992:《廣州方言古語選釋》。澳門:澳門日報出版社。合訂本
• 文若稚 1993:《廣州方言古語選釋(續編)》。澳門:澳門日報出版社。合訂本
• 黃 氏 1993:《粵語古趣談》。香港:文星圖書有限公司。
• 楊子靜 1993:《粵語沉──廣州方言俗語攷》。廣州:廣東高等出版社。
• 彭志銘 1994:《次文化語言:香港新方言概論》。香港:次文化堂
• 吳 昊 1994:《俗文化語言》。香港:次文化堂 上冊 下冊
• 曾子凡 1995:《廣州話.普通話語詞對比研究》,修訂本。香港:香港普通話研習社
• 莊澤義 1995:《省港民間俗語》。香港:海峰。
• 黃 氏 1997:《粵語古趣談續編》。香港:文星圖書有限公司。
• 伯煇 1998:《論粵方言詞本字考釋》。香港:中華書局。
• 伯煇 1998:《生活粵語本字趣談》。香港:中華書局。
• 石人(梁小中) 1999:《趣談廣東話》。香港:一本堂。
• 魯 金 1999:《香江舊語》。香港:次文化堂。
• 陳渭泉 2000:《歇後語趣談》。澳門:凌智廣告公司。
• 陳渭泉 2001:《拙中求趣》。澳門:凌智廣告公司。
• 容 若 2001:《粵語國語好雙語》。香港:次文化堂。
• 陳渭泉 2002:《笑談歇後語》。澳門:凌智廣告公司。
• 陳小雄 2005:《地道廣州話用語》。廣州:羊城晚報出版社
• 梁仲森 2005:《當代香港粵語語助詞的硏究》。香港:香港城市大學語言資訊科學硏究中心。
• 潘永強 2005:《擔天望地──廣府俗語探奇》。香港:中華書局。
• 饒原生 2006:《粵語口頭禪》。廣州:廣東敎育出版社
• 陳雄根、何杏楓、張錦少 2006:《追本窮源:粵語詞彙趣談》。香港:三聯書店
• 吳 昊 2006:《港式廣府話研究》。香港:次文化堂
• 彭志銘 2006:《正字正確》。香港:次文化堂
• 盧活為 2006:《香港話一知半解》。香港:獲益出版事業有限公司
• 余一詠 2007:《粵口語與說文》。香港:自資出版
• 饒原生 2007:《港粵口頭禪趣解》。香港:洪波出版社
• 彭志銘 2007:《正字審查》。香港:次文化堂
• 彭志銘 2007:《小狗懶擦鞋》。香港:次文化堂
• 彭志銘 2008:《香港潮語話齋》。香港:次文化堂
• 曾子凡 2008:《粵語慣用語研究》。香港:香港城市大學出版社
• 朱 薰 2008:《Fun E潮語大敎訓》。香港:萬里書店
• 蘇眞眞 2008:《香港潮語 學習字卡》。香港:Kubrick
• 蘇眞眞 2009:《香港潮語 學習字卡【貳】》。香港:Kubrick
• 彭志銘 2009:《旺角詞話》。香港:次文化堂
• 彭志銘 2009:《廣東俗語正字考》。香港:次文化堂
• 彭志銘 2010:《次文化語言:香港新方言概論》。香港:次文化堂
• 潘永強 2010:《粵語俗話(一)動作篇》。香港:中華書局。
• 潘永強 2010:《粵語俗話(二)人物•事物篇》。香港:中華書局。
• 梁慧敏 2011:《潮語解密》。香港:萬里機構。
• 彭志銘 2011:《香港黑詞典》。香港:次文化堂
• 黃 氏 2012:《粵語古趣談三編》。香港:金石圖書貿易有限公司。
• 蘇萬興 2014:《講開有段古:老餅潮語》。香港:中華書局。
• 彭志銘 2014:《香港粵語頂硬上》。香港:次文化堂
• 陳 雲 2015:《廣東雅言》。香港:次文化堂
• 彭志銘 2015:《老師怕問字》。香港:次文化堂
• 彭志銘 2016:《粵港歇後語鈎沉》。香港:次文化堂

Links to Classical Chinese References

On-line Dictionaries
• Unihan Database 1993
• Chinese Etymology 字源 1994
• 漢字字典 2002
• A Study of the Chinese Characters Recommended for the subject of Chinese Language in Primary Schools 小學中文科常用字研究 2003
• 漢典 2004
• 說文解字注 2006
• 開放康熙字典
• 康熙字典網上版
• International Encoded Han Character and Variants Database 國際電腦漢字及異體字知識庫 2010
• 象形字典 2010
• 引得市 2012
• 小學堂文字學資料庫 2013
• Multi-function Chinese Character Database 漢語多功能字庫 2014
• 甲骨文研究網 2014
• 重編國語辭典──修訂本 2015
• Dictionary of Chinese Character Variants 異體字字典 2017
• 中華語文知識庫 2017
• zi tools 字統 网 2019
• 白雲深處人家 2019
• The Complete Collection of Ancient and Modern Characters 古今文字集成 2021
• 搜詞尋字
• 源查詢
• 國學迷 2021
• 數字化《說文解字》

Links to Tools for Developing Chinese Corpus

On-line Tools
• A Chinese Word Segmentation System with Unknown Word Extraction and POS Tagging 中文斷詞系統
• CKIP Chinese Parser demo version 中文剖析器線上測試
• Chinese Knowledge and Information Processing
• 國家敎育研究院分詞系統
• 語料庫在線──漢語分詞和詞性自動標註
• 漢文の依存文法解析と返り点の関係について
• UDPipe Visualizer with Immediate Catena Tree

Off-line Tools
• The Stanford Word Segmenter
• The Stanford Log-linear Part-of-Speech Tagger
• The Stanford Parser: A statistical parser
• The Stanford Named Entity Recognizer
• The Stanford CoreNLP
• 結巴中文分詞程式
• Natural Language Processing with Python
• PyCantonese: Cantonese Linguistics and NLP in Python
• udchinese 0.5.0: Tokenizer POS-tagger and Dependency-parser for Chinese

Overview on Tools for Natural Language Processing in Chinese
• 中文處理工具簡介
• 中文斷詞技術簡介
• Chinese Natural Language Processing and Speech Processing
• Analyzing Cantopop with PyCantonese - Charles Lam
• Extracting Cantonese data from Hong Kong Chinese corpora



Foong Ha YAP, Principal Investigator

PolyU Corpus of Spoken Chinese

9th February, 2013


This corpus was developed by the Stance Project research team of the Department of English, Hong Kong Polytechnic University. All proprietary rights reside with said University. This corpus is protected by copyright laws and international copyright treaties, as well as other intellectual property laws and treaties. No part of this corpus shall be reproduced or adapted without prior written permission approved by the Hong Kong Polytechnic University.

Last Updated: 22 October 2022, 15:00

Copyright © 2013 Department of English, Hong Kong Polytechnic University. All rights reserved.