We're pleased to publish this guest post by Sarah FitzGerald, a current third-year BA English Language and Linguistics student who took part in the Junior Research Associate programme in Summer 2015.
----------------------------------
Six months ago I went to see Dr Melanie Green. I wanted to discuss third year modules with her. I mentioned as I was about to leave that I had heard about the Junior Research Associate (JRA) scheme and thought it would be interesting to apply but had no idea what I might study. That passing remark ended up leading to the most interesting summer of work I’ve ever done and to so many opportunities which I would never otherwise have been offered.
Melanie is currently building a corpus of Cameroon Pidgin English (CPE) along with Gabriel Ozón at the University of Sheffield and Miriam Ayafor at the University of Yaoundé in Cameroon. She suggested that I apply for JRA funding for a project to create and test a system for tagging CPE for parts of speech (POS). Just in case you are not nodding sagely at this information, and I certainly wasn’t when Melanie first suggested it, what follows is a brief rundown on pidgin languages, the context of CPE, linguistic corpora and POS tagging.
Pidgins evolve due to language contact, often through colonialism, and many modern pidgins developed due to the slave trade. People with no common language have to communicate as best they can when forced to work together. As a result there are many pidgins which are based on the vocabulary of European languages, particularly Dutch, English, French and Portuguese, but which have very different systems of grammar from these languages. Pidgins start to develop their own grammar systems as the children of the original speakers grow up speaking the pidgin as a first language – developing them into creoles. This natural development makes pidgins and creoles useful and interesting to study, particularly as their grammatical systems often have much in common with one another, even when thousands of miles apart.
Cameroon has two official languages, English and French, but more than 200 languages are spoken there. This means that people need a common language (or lingua franca) to communicate effectively. CPE, which is spoken by more than 50% of the population of Cameroon, fills this role. Although it is called Pidgin English, CPE would more accurately be described as a creole language, as it has its own grammar system and is spoken as a first language by a subset of its speakers. Radio talk show hosts often use CPE, and people speak it for trade and use it on social media, but CPE is thought of as uneducated and its use is highly stigmatised. As a result it is rarely written, and it lacks the standardised spelling and the reference books, such as grammars and dictionaries, which might help to destigmatise a language. This is where a corpus comes in.
In linguistics a corpus is a collection of texts, most commonly an electronic collection which can be searched and used to identify patterns and frequency information about a language which would not otherwise be apparent, even to native speakers. It is possible to learn new information about a language by gathering a selection of texts and using existing software to search for frequency or collocation information (which words occur together), but we can learn much more if the corpus is tagged for parts of speech. POS tagging involves attaching a tag to each word which identifies it as a noun, verb, adjective, etc. This allows the corpus to be searched for grammatical patterns which would otherwise be hard to spot.
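For readers who like to see ideas in code, here is a minimal Python sketch of the three kinds of search described above: word frequency, collocation (counted here simply as adjacent word pairs) and a search for a sequence of POS tags. The sentences are toy English examples invented for illustration, not CPE data, and the tag names (DET, ADJ, NOUN, VERB) and function names are assumptions rather than anything used in the actual project.

```python
from collections import Counter

# Toy English sentences, invented for illustration only (not CPE data).
# Each sentence is a list of (word, tag) pairs; the tag names are assumed.
tagged_corpus = [
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("chases", "VERB"),
     ("the", "DET"), ("small", "ADJ"), ("cat", "NOUN")],
]


def word_frequencies(corpus):
    """Count how often each word occurs in the whole corpus."""
    return Counter(word for sentence in corpus for word, _ in sentence)


def bigram_counts(corpus):
    """Count adjacent word pairs, a very simple stand-in for collocation."""
    counts = Counter()
    for sentence in corpus:
        words = [word for word, _ in sentence]
        counts.update(zip(words, words[1:]))
    return counts


def find_tag_pattern(corpus, pattern):
    """Return the word sequences whose tags match `pattern` exactly."""
    pattern = list(pattern)
    n = len(pattern)
    matches = []
    for sentence in corpus:
        for i in range(len(sentence) - n + 1):
            window = sentence[i:i + n]
            if [tag for _, tag in window] == pattern:
                matches.append(tuple(word for word, _ in window))
    return matches


freqs = word_frequencies(tagged_corpus)
bigrams = bigram_counts(tagged_corpus)
adj_noun = find_tag_pattern(tagged_corpus, ("ADJ", "NOUN"))
```

The first two functions only need the words, so they would work on an untagged corpus too. The last one is only possible once every word carries a tag, which is the extra information that POS tagging provides.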
The first question most people have asked me at this point is how you make a corpus of a language which is not written down. Fortunately for me, the wider project aims to transcribe 240,000 words of spoken CPE and the team is well on its way to doing so, which meant that texts were readily available. My task was to work out what tags the language needed and to create a tag set for it. In this I was aided by the grammar of CPE which Melanie and Miriam have recently written (Cameroon Pidgin English, available 2016; I highly recommend it!). I then tagged 6,500 words manually and used this data to train learning software to tag CPE automatically. Day to day this meant sitting at my dining table staring intently at a computer screen. It was painstaking and absorbing, and it left me unable to string sentences together in English by the end of each day, but it was also very rewarding. I got to see something that I had created achieve a 90% success rate in the automatic tagging stage.
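To give a flavour of what training a tagger on hand-tagged data and measuring its success rate involves, here is a minimal Python sketch. It is not the software used in the project, which this post does not name. It is a much simpler baseline, assumed purely for illustration: it remembers the most frequent tag for each word in the hand-tagged data, falls back to the most common tag overall for unseen words, and is scored on sentences it has not seen. The data is toy English invented for the example, not CPE, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn each word's most frequent tag from hand-tagged sentences."""
    tag_counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tag_counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: counts.most_common(1)[0][0]
               for word, counts in tag_counts.items()}
    # Words never seen in training get the most common tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag


def tag(words, lexicon, default_tag):
    """Tag a list of words using the learned lexicon."""
    return [(word, lexicon.get(word.lower(), default_tag)) for word in words]


def accuracy(gold_sentences, lexicon, default_tag):
    """Proportion of tokens whose predicted tag matches the hand-assigned tag."""
    correct = 0
    total = 0
    for sentence in gold_sentences:
        words = [word for word, _ in sentence]
        predicted = tag(words, lexicon, default_tag)
        for (_, gold_tag), (_, predicted_tag) in zip(sentence, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    return correct / total if total else 0.0


# Toy English data, invented for illustration only (not CPE data).
train = [
    [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("cats", "NOUN"), ("see", "VERB"), ("the", "DET"), ("dog", "NOUN")],
]
held_out = [
    [("the", "DET"), ("cats", "NOUN"), ("see", "VERB"),
     ("the", "DET"), ("bird", "NOUN")],
    [("the", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
]

lexicon, default_tag = train_unigram_tagger(train)
score = accuracy(held_out, lexicon, default_tag)
```

Scoring on held-out sentences rather than on the training data matters: a tagger can always repeat what it was shown, so a success rate is only informative when it is measured on text the tagger has not seen.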
Nobody had tagged CPE before, which meant that there wasn’t an instruction manual available when I started (I had to write myself one!), and that was a bit terrifying at first. What I did have was encouragement and support from Melanie, as well as from her colleague Gabriel. They were both willing to throw as much of their combined expertise at me as they could, and some of it must have stuck, as I managed to achieve the aims of my project. This support gave me a safety net which allowed me to try to work things out for myself wherever possible. I have gained so much from this project in terms of skills, experience and confidence in my abilities. I have also gained a greater appreciation of the limitations of my knowledge and understanding. My JRA project is over but I am still learning from it. This month I had the chance to present my work at the JRA poster exhibition. I also got the opportunity to write about my work as part of an article for publication in World Englishes, and I have been able to continue tagging the corpus, as I have been hired to work on the project as a research assistant this year. I am so glad that I spoke to Melanie when I did.
For any students in their second year, I cannot recommend the JRA experience enough. If you are interested but don’t know what you want to study, then talk to a tutor whose research interests you. They are likely to have plenty of ideas, and working on a project in Melanie’s area of expertise has broadened my understanding of linguistics in ways that I doubt an idea of my own devising would have done. It may seem early (applications are in the spring) but it is worth thinking about now – research proposals are hard work!
---Sarah FitzGerald
Well done! I can only imagine all the work that went into doing what you did, and the personal satisfaction you got at the end. I just began writing a research proposal on an aspect of CPE, and I can testify that writing one is a daunting task. But I love a challenge, and I hope to succeed and move on to actually carrying out the research. Your story gives me a lot of encouragement. Thank you, and I wish you more successes as you progress in your career.