Digital Humanities grant to support our work

Some excellent news to end the year on: we were awarded a grant from the Digital Approaches to the Humanities programme of the Netherlands eScience Center. The eScience Center is the Netherlands’ national centre for academic research software.

One of our longer-term goals is to make an impact on work in NLP and linguistics by building digital tools that support qualitative and quantitative approaches to conversational structure. Andreas Liesenfeld, postdoc in the ElPaCo project: “Support from the eScience Center will help us to improve our existing codebase and work towards broader impact.”

Left: New ways of visualizing conversational flow (osf.io/zd34r). Right: Clustering Dutch conversational words using methods from bioacoustics (osf.io/7t9pn).

The funding means that the project can host a professional software engineer for a year. PI Mark Dingemanse: “Our work is code-heavy and we place special emphasis on new and compelling ways to analyse and visualize conversational structure. To do this well we need to pioneer new computational tools. We’re happy that the eScience Center offers this opportunity to contribute to the strength and longevity of our project.”

Invited talk at Dept of Computational Linguistics, Düsseldorf

ElPaCo team members Andreas Liesenfeld and Mark Dingemanse visit Düsseldorf for an invited talk at the Department of Computational Linguistics. Abstract:

What social interaction and linguistic diversity can tell us about language (and technology)

The primary ecology of natural language is in real-life episodes of human interaction. This is where people learn language and where they use it to coordinate joint actions, build social relations, and exchange information. In contrast, when machines encounter language, it tends to be radically divorced from this habitat and reduced to large amounts of decontextualised non-interactive text. Natural languages are also characterized by diversity at many levels, from sound and sign systems to syntax and semantics. In contrast, the language samples that inform language technology tend to be limited to a handful of well-resourced languages, representing only a tiny sliver of the world’s linguistic diversity.

In this talk we show how a view of language rooted in social interaction sheds new light on turn-taking, pragmatic reasoning, and joint action coordination, with implications for linguistic typology, dialogue modelling, speech recognition, and the design of conversational interfaces. We will provide an overview of current work in our research project Elementary Particles of Conversation and cover a range of methods and results, from computational modelling to comparative linguistics and from distributional pragmatics to dialogue model evaluation.

3 conferences in September

It’s a busy month for the ElPaCo project as we present papers, lectures and tutorials at the Joint Conference on Language Evolution in Kanazawa, KONVENS 2022 in Potsdam, and Interspeech 2022 in Incheon, Korea.

At the Joint Conference on Language Evolution we presented new comparative and computational work on the cross-linguistic shapes and cultural evolution of response tokens (work by Marieke Woensdregt, Andreas Liesenfeld and Mark Dingemanse).

At KONVENS in Potsdam, team members Andreas Liesenfeld, Ada Lopez and Mark Dingemanse presented a tutorial covering work on conversational corpora, conversational AI, and ASR, building on the research programme set out in our ACL paper.

Also at KONVENS, Ada Lopez presented her first first-author paper, which came out of her Research MA lab rotation in the ElPaCo project. Pointing automatic speech recognition (ASR) at conversational corpora in 3 languages, the paper goes beyond word error rate (WER) and traces what goes missing. Check out the short paper here.

At Interspeech 2022 in Incheon, Andreas Liesenfeld and Mark Dingemanse presented a first foray into language-agnostic approaches to identifying and comparing response tokens (aka backchannels) in conversational audio corpora across 16 languages (8 phyla). Find the paper here.

ElPaCo tutorial at KONVENS in Potsdam

ElPaCo members Ada Lopez and Andreas Liesenfeld have travelled to Potsdam to present their work at KONVENS, one of the largest computational linguistics conferences in Germany. Among other things, they presented a half-day tutorial.

The tutorial helps implement one of the ‘valorization’ goals of the Elementary Particles of Conversation project: to help engineers and practitioners get a handle on the subtleties of human interaction.

Featured in ‘Next Big Ideas’ talk at ACL

We wrote a position paper for ACL’s theme session on linguistic diversity: From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology (Dingemanse & Liesenfeld 2022).

We were pleasantly surprised when Thamar Solorio, one of the speakers in the Next Big Ideas plenary session at ACL, highlighted a key line from the conclusion of our paper as her Takeaway Message: “We need language models that are representative of the actual ways in which people use language [and that] give people the feeling they do not have to leave their own linguistic identities at the door”.

Screenshot of a slide from Thamar Solorio's talk, quoting the line from our paper as her Takeaway Message.

By the way, one of the more puzzling ACL reviewer comments we got was precisely about that line (among others), and featured a serious charge that @a_liesenfeld and I now often lob at each other: 🚨 “figurative language in evidence” 🚨!

Video lecture ‘From text to talk’ now available

As ACL was a hybrid conference, we presented in person but also prepared a video lecture to accompany our paper, which we have made freely available.

Capped at 12 minutes, the video doesn’t cover everything that’s in the paper but we’ve crafted it to provide a useful overview of the major themes and contributions.

Invited talk at MilaNLP

Our accepted ACL paper led to an invitation from the MilaNLP group to present our work at their monthly Coding Aperitivo meeting. We had a lot of fun meeting members of the MilaNLP crowd and test-driving our work with a computer science audience.