Monday, 28 September 2015

ROLLS 30 Sept: Tom Devlin: Coalmining vocabulary & Durham English

Please join us at the Research on Language & Linguistics at Sussex seminar series:

Wednesday, 30 September 2015, 13.00
Fulton 214, University of Sussex 

The influence of coalmining vocabulary on variant usage in Durham English

Tom Devlin, University of Sussex
This research investigates the influence of coalmining vocabulary on variant usage by testing the claim that mining communities preserve distinctive and conservative phonological patterns (Wales 2006: 124). The study explores the degree of advancement of vowels belonging to the START lexical set (Wells 1982) in mining and non-mining words in the speech of sixteen older male speakers from former colliery villages in East Durham in the North East of England. The results show that regardless of the speaker's relationship to coalmining, START vowels are shifted to significantly backer realisations in mining words than in non-mining vocabulary, close to traditional pronunciations noted in historical dialect literature. This outcome is upheld even in identical lexical items with different meanings in mining and non-mining speech.

Wales, K. 2006. Northern English: A Social and Cultural History. Cambridge: CUP.
Wells, J. 1982. The Accents of English. Vol. 2: The British Isles. Cambridge: CUP.

Wednesday, 16 September 2015

Modelling semantic change workshop


Linguistic DNA of Modern Western Thought:

Modelling concepts and semantic change in English 1500–1800





Workshop 1: Computer-Assisted Language Processing

Friday 18th September 2015
University of Sussex, Jubilee Building, Room G22



Registration & coffee (9.00-9.30)

Session 1 (9.30-10.45)
    Susan Fitzmaurice: Introduction to Linguistic DNA
    Research Associates & HRI: Resources, progress, problems and queries

Coffee break (10.45-11.15)

Session 2 (11.15-12.30)
    Diana McCarthy (University of Cambridge): Inducing and contrasting word meanings from different sources
    Kathryn Allan (UCL): ‘Degrees of lexicalization’ in the Historical Thesaurus of the OED

Lunch (12.30-1.30)

Session 3 (1.30-3.15)
    Gabriel Egan (De Montfort University): Instructive failures in authorship attribution by shared phrases in large textual corpora
    Dirk Geeraerts (KU Leuven): Quantitative corpus approaches to lexical and conceptual variation I
    Dirk Speelman (KU Leuven): Quantitative corpus approaches to lexical and conceptual variation II

Coffee break (3.15-3.30)

Session 4 (3.30-5.00)
    Panel discussion: Dawn Archer (Manchester Metropolitan University), Scott Gibbens (Jisc Historical Texts), David Weir (University of Sussex), Pip Willcox (Bodleian Libraries)

Close (5.00)




‘“Degrees of lexicalization” in the Historical Thesaurus of the OED’ by Kathryn Allan

One of the most intriguing issues raised by the Historical Thesaurus of the Oxford English Dictionary (HTOED), which will be addressed by the ‘Linguistic DNA of Modern Western Thought’ project, is the significance of vocabulary size. Why are some semantic fields very densely populated in comparison to others, and why are concepts lexicalised to differing degrees across time? For some concepts, for instance those in fields such as Food and Colour, there are obvious answers. There are no terms for ‘potato’ attested earlier than the late sixteenth century because the plant is not native to Britain and was only brought to the country then, and many terms from the late eighteenth century onwards show the increasing numbers of varieties that have become familiar to speakers in modern times. Similarly, the rise in non-basic colour terms from the Early Modern English period onwards corresponds to the technological changes that enabled the production of dyes, leading to sophisticated methods of creating and recreating precisely differentiated shades (discussed by Carole Biggam and Laura Wright, for example). This example seems to provide fairly clear evidence to support the view suggested in the preface of HTOED that in some cases the ‘degree of lexicalization [of a category] reflect[s] its considerable degree of importance to speakers of the language’. However, in other cases, including many abstract categories, the relationship between semantic field and conceptual domain is much less straightforward, and the emergence of a high number of new terms has no obvious external-world trigger. For example, HTOED records a spike of new terms for ‘sweet (in taste)’ between 1400 and 1700, including several variant forms with a common derivation such as douce, dulcet, dulce, dulcid, dulcorous and dulceous.
Some of these are only attested a small number of times, and none replace the basic term sweet; their appearance is most readily explained as the result of shifts in stylistic norms combined with greater receptivity to Latinate vocabulary in this period. This paper examines some of the difficulties that arise when considering the degree of lexicalisation of different concepts, and especially the complications that emerge from the data itself.

‘Instructive failures in authorship attribution by shared phrases in large textual corpora’ by Gabriel Egan

We can learn much from the mistakes made in recent authorship attribution endeavours that hunt for phrases shared between a work of unknown or contested authorship and the works in large textual corpora available to us digitally. Investigators have long known to suspect our intuitive conviction that a series of apparently unusual phrases cannot be shared between two works merely by chance (in fact they can) and have long acknowledged that a series of ‘negative checks’ is needed to be sure that linguistic constructions that seem rare really are rare. Despite awareness of these pitfalls, spectacular errors have been made recently because i) the methods of searching corpora are fallible in ways unforeseen by the investigator, ii) textual corpora are not necessarily as complete as investigators believe, iii) it is shockingly easy to introduce methodological bias into the experiments, and iv) it is easy to misunderstand and/or misrepresent the statistical significance of particular findings. This talk will discuss what went wrong in a series of recent investigations and draw lessons from them. Chiefly, the finding is that the principles that are supposed to prevail in scientific experiments should also govern work in our field. Our datasets and software source code should be made publicly available so that anyone may replicate our investigations. And our statistical methods should be subject to proper critique by professional statisticians. These rather dry findings will, it is hoped, be leavened by the telling of some amusing stories about what happens when authorship attribution goes wrong.

‘Inducing and Contrasting Word Meanings from Different Sources’ by Diana McCarthy

In Computational Linguistics, work on representing lexical meaning previously focused on manually created inventories. There is, however, now a large body of work that builds models of word meaning directly from corpus data. This has the advantage that one does not rely on advance knowledge of the relevant meanings; instead the knowledge emerges from the data. This provides more scope for contrasting the meanings induced from different sources, where the sources could differ with respect to, for example, textual domain, genre or time. In this talk I will outline some approaches for inducing word meanings and describe work, conducted with collaborators at the University of Melbourne, to induce and compare word meanings from different sources using topic models. I'll particularly focus on our work using these models to detect novel word senses in diachronic corpora. One key drawback with automatic word sense induction is the requirement for a large amount of data for training the models. Since corpora of a sufficient size have only been available in the last twenty years or so, this limits application of these techniques to word meaning change attested within that period and for the types of corpora available. I will also therefore describe some corpus linguistics work, conducted with Sketch Engine, for the National Ecosystem Assessment. In this work we used Sketch Engine to contrast usages of lexemes pertaining to the environment from different sources (academic, government and public). I'll discuss the pros and cons of the different approaches which can, of course, be complementary to one another.

‘Quantitative corpus approaches to lexical and conceptual variation’ by Dirk Geeraerts and Dirk Speelman

In this talk, we intend to present an overview of various types of corpus-based variation studies that we have been conducting in our research team, Quantitative Lexicology and Variationist Linguistics, and that we believe could be interesting for the ‘Linguistic DNA’ project. Specifically, we will introduce the distinction between formal and conceptual onomasiological variation, with a further distinction between direct and indirect approaches to the latter, and suggest that a formal onomasiological and an indirect conceptual onomasiological perspective could be the most relevant ones for the ‘Linguistic DNA’ project. We will illustrate these perspectives, with a methodological focus on the diagnostic concept of ‘onomasiological profile’ and the use of semantic vector spaces.

Monday, 14 September 2015

What's new for 2015-16

Welcome to the new academic year!

As ever, we grow and change. (That's what university's about, right?) We are happy to announce the following additions to English Language and Linguistics.

New people

We're looking forward to meeting our new students this week. Induction activities are under way. Be sure to come say hello at the welcome receptions as well as at official meetings (see your induction timetable).

Another new face belongs to Tom Devlin, who will be teaching a range of modules, particularly those involving sociolinguistics and history of English. His current research concerns the relationship between sound change in Durham mining villages and external and extra-linguistic factors such as contact, speaker and group identity, orientation and perception.

We also welcome a visiting researcher, Professor Young-Shik Huangbo, who works in English phonology at Sungkyul University, Korea.

New modules

At the MA level, we are offering a new spring-term module: Language and Culture in Intercultural Communication.

We also have a new autumn-term undergraduate elective, available to first- and second-year students from any single-honours course (including ours!): The Politics of Language.

Both of these will be taught by Charlotte Taylor.

New look

Arts B corridors have a cheery new paint scheme. Come and visit your tutors' office hours to see who's behind the doors painted 'Honey Mustard', 'Mountain Mist' and 'Proud Peacock' (among others). We'll let you be the judges of whether those colours suit us. 

Stay tuned to this blog for more news from us about new research and events in English Language and Linguistics.