Spoken Language and Text Corpora
Tok Pisin is a creole language spoken throughout Papua New Guinea. It is an official language of Papua New Guinea and the most widely used language in the country, with between five and six million speakers. Tok Pisin developed as a trade pidgin and is also referred to as "New Guinea Pidgin" or "Pidgin English". Urban dwellers in particular often communicate in Tok Pisin, and perhaps one million people now use it as a primary language.
The Tok Pisin Project is an initiative funded by the Department of Defence, Science and Technology (DST). DST wants the CoEDL Tok Pisin corpus to build up to around 15 hours of transcribed recordings of naturalistic, conversational material. About 11 hours are currently held in PARADISEC, so another 4 hours or so must come from other sources. No transcriptions exist at present, and DST initially wants a spoken corpus as a subset of the overall Tok Pisin corpus. The intended outcome of the Project is a set of faithful transcriptions of Tok Pisin as it is actually used, to be fed into a machine learning system that matches the sounds in the recordings against the phonemic transcriptions.
The Tok Pisin Project focuses on spoken language. Recordings can be sourced from PARADISEC collection items on Papua New Guinea or from audio recordings of the Radio Australia Pacific Tok Pisin Service. Written language would also be useful for comparison, although this needs to be explored further. A textual corpus could be drawn, with permission, from sources such as Tom Slone's book "One thousand one Papua New Guinean nights : folktales from Wantok newspaper" (https://catalogue.nla.gov.au/Record/1334855) and the Wantok newspaper's website: http://wantokniuspepa.com/index.php/archives/wantok-niuspepa.