How frequent are interjections?

The occurrence of interjections in 10-min excerpts of informal dyadic conversations in six spoken languages. Every panel shows the turns of a dyadic exchange; colored dots indicate turns that belong to the top 10 most common one-word standalone turn formats in the language. These excerpts cannot support strong comparative or typological inferences; they are only meant to give an impression of the prevalence of interjections across unrelated languages.
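
A quick way to get a feel for this in one's own data is to count one-word standalone turns. Below is a minimal sketch using pandas; the file name and the 'utterance' and 'language' columns are hypothetical stand-ins for whatever a turn-level export of a corpus provides.

```python
import pandas as pd

# hypothetical turn-level table: one row per turn, with transcribed text
turns = pd.read_csv("turns.csv")

# keep only standalone one-word turns (no internal spaces after trimming)
one_word = turns[turns["utterance"].str.strip().str.count(" ") == 0]

# top 10 one-word turn formats per language
top10 = (
    one_word.groupby("language")["utterance"]
    .value_counts()
    .groupby(level=0)
    .head(10)
)
print(top10)
```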

Dingemanse, M. (2024). Interjections at the heart of language. Annual Review of Linguistics, 10, 257–277. doi: 10.1146/annurev-linguistics-031422-124743 PDF

Anatomy and frequency of interactive repair

A With interactive repair, another participant initiates repair, inviting a repair solution by the first; the repair initiation is a pivot, pointing both back and forward. B While a fitted response is preferred, initiating repair is always a possible next move; likewise, within repair, while a restricted format is preferred, an open format is always an option. C Across diverse languages, formats for interactive repair fall into three types, depending on how they target the trouble in the prior turn and the kind of response they typically invite; these can be ranked from less to more specific in terms of the grasp of the trouble source they display. D Empirical cumulative distribution of independent repair sequences (black curve) as they occur over time in informal conversation in a global sample of 12 languages (grey curves). Across languages, the steepest part of the slope is around 17 s, the average is 84 s, and nearly all sequences occur within a 4-min window of the last.
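
The cumulative curve in panel D is a standard empirical CDF over the intervals between successive repair sequences. A minimal sketch of that computation, with toy onset times standing in for real annotations:

```python
import numpy as np
import matplotlib.pyplot as plt

# toy onset times (in seconds) of repair sequences within one conversation
onsets = np.sort(np.array([12.4, 31.0, 95.2, 140.8, 200.1]))
intervals = np.diff(onsets)              # time since the previous sequence

x = np.sort(intervals)
y = np.arange(1, len(x) + 1) / len(x)    # ECDF: proportion of intervals <= x

plt.step(x, y, where="post")
plt.xlabel("time since previous repair sequence (s)")
plt.ylabel("cumulative proportion")
plt.show()
```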

Dingemanse, M., & Enfield, N. J. (2024). Interactive repair and the foundations of language. Trends in Cognitive Sciences, 28(1), 30–42. doi: 10.1016/j.tics.2023.09.003 PDF

How conversational data challenges speech recognition (ASR)

A Word error rates (WER) for five speech-to-text systems in six languages. B One minute of English conversation as annotated by human transcribers (top) and by five speech-to-text systems, showing that while most do some diarization, all underestimate the number of transitions and none represent overlapping turns (Whisper offers no diarization). C Speaker transitions and distribution of floor transfer offset times (all languages), showing that even ASR systems that support diarization do not represent overlapping annotations in their output.
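
For reference, word error rate is the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the length of the reference. A self-contained sketch (dedicated libraries exist, but the computation itself is small):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic programming table of edit distances
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("so what do you think", "so what you think"))  # one deletion -> 0.2
```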

Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems. Proceedings of the 24th Annual SIGdial Meeting on Discourse and Dialogue, 482–495. doi: 10.18653/v1/2023.sigdial-1.45 PDF

Iconicity measures across tasks

Discriminability of iconicity measures from different tasks. Iconicity ratings have been transformed so that they vary between 0 and 1 (to compare with guessing accuracies). Guesses, where people try to guess the meaning of an iconic word or the word form belonging to a given meaning, appear to be somewhat more evenly spread than ratings. Iconicity ratings by native speakers (rightmost, showing data from Thompson et al. 2020) are on average higher than iconicity ratings by people who don’t speak the language whose words they rate, confirming the notion that native speakers will generally feel that words of their own language are more iconic. (Figure by Bonnie McLean, open data here.)
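
The 0-1 transformation is a simple rescaling from the endpoints of the rating scale. A minimal sketch, assuming a 1 to 7 scale purely for illustration (substitute the endpoints of the scale actually used):

```python
import numpy as np

def rescale(ratings, lo=1, hi=7):
    """Map ratings from a [lo, hi] scale onto [0, 1] for comparison with accuracies."""
    ratings = np.asarray(ratings, dtype=float)
    return (ratings - lo) / (hi - lo)

print(rescale([1, 4, 7]))  # -> [0.  0.5 1. ]
```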

McLean, B., Dunn, M., & Dingemanse, M. (2023). Two measures are better than one: combining iconicity ratings and guessing experiments for a more nuanced picture of iconicity in the lexicon. Language and Cognition, 15(4), 719–739. doi: 10.1017/langcog.2023.9 PDF

Simulating phonetic evolution

Plots of where in a phonetic possibility space different words end up after 10,000 rounds of interaction, across 20 independent simulation runs (each cloud of 100 exemplar dots/triangles represents a single word at round 10,000 of a single simulation run). Blue, yellow, green and orange are regular words; purple is the continuer word. On each independent simulation run, all words are initialised at randomly selected positions in the space. A shows a selection of 6 separate simulation runs chosen for illustrative purposes (showing how regular words end up in different positions); B shows the end-state of all 20 simulation runs overlaid. Parameter settings: (i) the minimal effort bias is 3 times as strong for the continuer word (G=1250) as for regular vocabulary words (G=5000), and (ii) the bias for reuse of features (i.e. segment-similarity bias) is not applied to the continuer category.
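
To make the setup concrete, here is the outer structure of such a simulation in schematic form: 20 independent runs, each word represented by a cloud of 100 exemplars initialised at a random position in a 2D phonetic space and updated over 10,000 rounds. The update step below is only a placeholder; it does not reproduce the paper's effort- and similarity-biased mechanics.

```python
import numpy as np

N_RUNS, N_ROUNDS, N_EXEMPLARS = 20, 10_000, 100
WORDS = ["w1", "w2", "w3", "w4", "continuer"]

rng = np.random.default_rng(0)
results = []
for run in range(N_RUNS):
    # each word: a cloud of exemplars scattered around a random starting position
    clouds = {w: rng.random(2) + 0.01 * rng.standard_normal((N_EXEMPLARS, 2))
              for w in WORDS}
    for _ in range(N_ROUNDS):
        w = rng.choice(WORDS)             # word used in this round of interaction
        i = rng.integers(N_EXEMPLARS)     # exemplar to replace
        # placeholder update: new token near the cloud mean (no biases applied)
        clouds[w][i] = clouds[w].mean(axis=0) + 0.01 * rng.standard_normal(2)
    results.append(clouds)                # end state of this run, ready to plot
```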

Dingemanse, M., Liesenfeld, A., & Woensdregt, M. (2022). Convergent cultural evolution of continuers (mmhm). The Evolution of Language: Proceedings of the Joint Conference on Language Evolution (JCoLE), 61–67. PDF

Sequential context of continuers

A Candidate continuer forms in 10 unrelated languages, B shown in their natural sequential ecology (annotations as in the original data), C with spectrograms and pitch traces of representative tokens made using the Parselmouth interface to Praat (Jadoul et al., 2018; Boersma & Weenink, 2013).
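
A minimal sketch of this kind of Parselmouth-based measurement for a single token (pitch track plus spectrogram); the filename is a placeholder and plotting is left out:

```python
import numpy as np
import parselmouth

snd = parselmouth.Sound("token.wav")     # placeholder path to one token

pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]   # Hz, 0 where unvoiced
times = pitch.xs()                       # time points of the pitch track

spectrogram = snd.to_spectrogram()
power_db = 10 * np.log10(spectrogram.values)  # time-frequency power in dB
```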

Dingemanse, M., Liesenfeld, A., & Woensdregt, M. (2022). Convergent cultural evolution of continuers (mmhm). The Evolution of Language: Proceedings of the Joint Conference on Language Evolution (JCoLE), 61–67. PDF

Sampling response tokens

A. Overview of included languages with dataset size in hours and top 3 sequentially identified response tokens as transcribed in the corpus. B. Location of largest speech community. C. Assessing the impact of sparse data on UMAP projections using three samples of Dutch response tokens. A look at the full dataset (a) and random-sampled subsets of decreasing size (b, c) suggests isomorphism across scales and interpretable clustering solutions for samples as small as 150 tokens.
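
A minimal sketch of the subsampling check in panel C, with a random matrix standing in for the real token-by-feature representations:

```python
import numpy as np
import umap

rng = np.random.default_rng(1)
features = rng.standard_normal((1500, 40))   # placeholder: tokens x acoustic features

for n in (len(features), 600, 150):          # full set and two smaller random samples
    idx = rng.choice(len(features), size=n, replace=False)
    embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(features[idx])
    print(n, embedding.shape)                # (n, 2) coordinates, one panel per sample
```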

Liesenfeld, A., & Dingemanse, M. (2022). Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages. Proceedings of Interspeech 2022, 1126–1130. doi: 10.21437/Interspeech.2022-11288 PDF

Cultural evolution of continuers

Continuers (frequent standalone utterances like mm-hm that people often use in succession) differ in interesting ways from other elements that are common, like top tokens (the most common words in a corpus) and discontinuers (frequent standalone utterances that people do not produce in successive streaks). A. Length of tokens for continuers, discontinuers and top tokens in 32 languages. B. Frequencies of major sound classes across types. Vowel nuclei occur across types, but continuers stand out for their preference for nasals. C. Random forest analysis of 118 continuer forms in 32 spoken languages showing the top 10 most predictive phonemes (out of 29 attested).
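
A minimal sketch of the kind of random forest analysis in panel C, assuming a hypothetical table with one row per form, binary phoneme-presence columns, and a continuer label:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("forms.csv")                # hypothetical file: one row per form
phonemes = [c for c in df.columns if c.startswith("ph_")]  # e.g. ph_m, ph_h, ...

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(df[phonemes], df["is_continuer"])     # predict continuer status from phonemes

importances = pd.Series(rf.feature_importances_, index=phonemes)
print(importances.sort_values(ascending=False).head(10))   # top 10 most predictive
```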

Dingemanse, M., Liesenfeld, A., & Woensdregt, M. (2022). Convergent cultural evolution of continuers (mmhm). The Evolution of Language: Proceedings of the Joint Conference on Language Evolution (JCoLE), 61–67. PDF

Continuers and repair initiators

Two-panel figure showing (A) Typical sequential structures for continuers versus repair initiators. Continuers are recurring items found in alternation with unique turns (a, c). Repair initiators are recurring items found between a unique turn a and its near-copy a’. (B) Prevalence of sequentially identified candidate continuers and repair initiators, demonstrating the potential of using sequential patterns to identify them in language-agnostic ways. Most frequent formats exemplified in 10 languages (9 phyla), from left to right: Akhoe Hai||om, Hausa, Tehuelche, Gutob, Kerinci, Siwu, Mandarin, German, Korean, Dutch.
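
A heuristic along these lines can be sketched in a few lines: scan triplets of turns for short items that occur either between two unrelated turns (continuer-like) or between a turn and its near-copy (repair-initiator-like). The similarity measure and thresholds below are illustrative choices, not the exact procedure used in the paper.

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two turns as near-copies if their character-level similarity is high."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def classify_short_items(turns: list[str]) -> list[tuple[str, str]]:
    hits = []
    for prev, item, nxt in zip(turns, turns[1:], turns[2:]):
        if len(item.split()) > 1:          # only consider short standalone items
            continue
        if similar(prev, nxt):             # pattern a, item, a'
            hits.append((item, "repair-initiator-like"))
        else:                              # pattern a, item, c
            hits.append((item, "continuer-like"))
    return hits

turns = ["we went to the market", "mm", "and then it started raining",
         "we went to the market", "huh?", "we went to the market yesterday"]
print(classify_short_items(turns))
```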

Another useful feature of this diagram is that it makes it possible to infer a minimum corpus size for spotting interactional resources of interest. For instance, the smallest corpora among the 10 languages for which tokens are exemplified in the figure are Akhoe Hai||om and Hausa, both of which amount to less than one hour in total. This appears to be a lower bound for identifying phenomena like repair, though continuers are about an order of magnitude more frequent and so can be reliably found even in smaller corpora.

Liesenfeld, A., & Dingemanse, M. (2022). Building and curating conversational corpora for diversity-aware language science and technology. Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), 1178–1192. https://aclanthology.org/2022.lrec-1.126 PDF

Quality control for conversational corpora

Conversational data can be transcribed in many ways. This panel provides a quick way to gauge the quality of transcriptions, here illustrated with data from Ambel (Arnold, 2017). A. Distribution of the timing of dyadic turn-transitions, with positive values representing gaps between turns and negative values representing overlaps. This kind of normal distribution centered around 0 ms is typical; when corpora starkly diverge from this it usually indicates noninteractive data, or segmentation methods that do not represent the actual timing of utterances. B. Distribution of transition time by duration, allowing the spotting of outliers and artefacts of automation (e.g. many turns of similar durations). C. A frequency/rank plot allows a quick sanity check of expected power law distributions and a look at the most frequent tokens in the corpus. D. Three randomly selected 10-second stretches of dyadic conversation give an impression of the timing and content of annotations in the corpus.
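
A minimal sketch of the floor transfer offsets behind panel A, assuming a turn-level table with hypothetical begin, end and speaker columns (times in seconds); overlaps come out negative, gaps positive:

```python
import pandas as pd

turns = pd.read_csv("turns.csv").sort_values("begin").reset_index(drop=True)

fto = []
for i in range(1, len(turns)):
    if turns.loc[i, "speaker"] != turns.loc[i - 1, "speaker"]:       # speaker transition
        fto.append(turns.loc[i, "begin"] - turns.loc[i - 1, "end"])  # offset in seconds

pd.Series(fto).plot(kind="hist", bins=50)   # for interactive data, centred near 0
```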

Liesenfeld, A., & Dingemanse, M. (2022). Building and curating conversational corpora for diversity-aware language science and technology. Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), 1178–1192. https://aclanthology.org/2022.lrec-1.126 PDF

How ASR training data differs from real conversation

L: Distributions of durations of utterances and sentences (in ms) in corpora of informal conversation (blue) and CommonVoice ASR training sets (red) in Hungarian, Dutch, and Catalan. Modal duration and annotation content differ dramatically by data type: 496 ms (6 words, 27 characters) for conversational turns and 4642 ms (10 words, 58 characters) for ASR training items. R: Visualization of tokens that feature more prominently in conversational data (blue) and ASR training data (red) in Dutch. Source data: 80k random-sampled items from the Corpus of Spoken Dutch (Taalunie, 2014) and the Common Voice corpus for automatic speech recognition in Dutch (Ardila et al., 2020), based on the Scaled F-Score metric, plotted using scattertext (Kessler, 2017).
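
A minimal sketch of this kind of scattertext comparison, assuming a hypothetical data frame with a text column and a source column distinguishing conversational from ASR training items:

```python
import pandas as pd
import scattertext as st

df = pd.read_csv("dutch_items.csv")   # hypothetical: columns 'text' and 'source'

corpus = st.CorpusFromPandas(
    df, category_col="source", text_col="text",
    nlp=st.whitespace_nlp_with_sentences,   # simple tokenizer; a spaCy model also works
).build()

html = st.produce_scattertext_explorer(
    corpus,
    category="conversation",
    category_name="Conversational",
    not_category_name="ASR training",
    minimum_term_frequency=5,
)
open("conversation_vs_asr.html", "w", encoding="utf-8").write(html)
```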

Liesenfeld, A., & Dingemanse, M. (2022). Building and curating conversational corpora for diversity-aware language science and technology. Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), 1178–1192. https://aclanthology.org/2022.lrec-1.126 PDF

/r/ for rough in Indo-European

A Across the Indo-European language family, the proportion of rough words with /r/ is much higher than the proportion of smooth words with /r/; B Each dot represents a language (size of the circle = number of words); whiskers show 95% Bayesian credible intervals corresponding to the mixed-effects Bayesian logistic regression analysis indicating that rough words have a much higher proportion of /r/ (posterior mean = 63%) than smooth words (posterior mean = 35%).
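
For readers who want to run a comparable analysis, here is a minimal sketch of a Bayesian mixed-effects logistic regression using the Python package bambi (not necessarily the software used for the paper); the data file and column names are hypothetical:

```python
import bambi as bmb
import pandas as pd

# hypothetical table: one row per word, with a 0/1 'has_r' column, a 'condition'
# column (rough vs smooth), and the language each word belongs to
df = pd.read_csv("roughness_words.csv")

model = bmb.Model("has_r ~ condition + (condition | language)",
                  data=df, family="bernoulli")
fitted = model.fit(draws=2000, chains=4)      # posterior samples via MCMC

print(fitted.posterior["condition"].mean())   # effect of rough vs smooth on /r/
```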

Winter, B., Sóskuthy, M., Perlman, M., & Dingemanse, M. (2022). Trilled /r/ is associated with roughness, linking sound and touch across spoken languages. Scientific Reports, 12(1), 1035. doi: 10.1038/s41598-021-04311-7 PDF

Iconicity and funniness ratings

The intersection of iconicity and funniness ratings for 1419 words. A: Scatterplot of iconicity and funniness ratings in which each dot corresponds to a word. A loess function generates the smoothed conditional mean with a 95% confidence interval. Panels B and C show the distribution of iconicity and funniness ratings in this dataset.
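
A minimal sketch of a loess-style smoother with a bootstrapped 95% confidence band, in the spirit of panel A; the data file is a placeholder and the smoothing and bootstrap settings are illustrative:

```python
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

df = pd.read_csv("ratings.csv")     # hypothetical: 'iconicity' and 'funniness' columns
x, y = df["iconicity"].to_numpy(), df["funniness"].to_numpy()
grid = np.linspace(x.min(), x.max(), 100)

def loess_on_grid(xs, ys):
    sm = lowess(ys, xs, frac=0.75, return_sorted=True)   # smoothed conditional mean
    return np.interp(grid, sm[:, 0], sm[:, 1])

rng = np.random.default_rng(0)
boot = []
for _ in range(500):
    idx = rng.integers(0, len(x), len(x))    # resample words with replacement
    boot.append(loess_on_grid(x[idx], y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)   # pointwise 95% band
fit = loess_on_grid(x, y)
```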

Dingemanse, M., & Thompson, B. (2020). Playful iconicity: structural markedness underlies the relation between funniness and iconicity. Language and Cognition, 12(1), 203–224. doi: 10.1017/langcog.2019.49 PDF

Vowel-colour associations

Vowel-colour associations for 1164 participants (central panel), showing, clockwise from bottom left, (a) a participant with very low structure yet high consistency across trials, probably a false positive synaesthete; (b) a typical nonsynaesthete with mappings that are both inconsistent and unstructured; (c) a middling participant with significant structure but inconsistent choices across trials; (d) a highly structured but inconsistent participant; and (e) a typical vowel-colour synaesthete, with highly structured, consistent and categorical mappings.

Cuskley, C., Dingemanse, M., Kirby, S., & van Leeuwen, T. M. (2019). Cross-modal associations and synesthesia: Categorical perception and structure in vowel–color mappings in a large online sample. Behavior Research Methods, 51(4), 1651–1675. doi: 10.3758/s13428-019-01203-7 PDF

Arbitrariness, iconicity and systematicity

(A, B) Words show arbitrariness when there are conventional associations between forms and meanings. Words show iconicity when there are perceptuomotor analogies between forms and meanings, here indicated by shape, size and proximity (inset). (B, C) Words show systematicity when statistical regularities in phonological form, here indicated by color, serve as cues to abstract categories such as word classes. (D) The cues involved in systematicity differ across languages and may be arbitrary. (E) The perceptual analogies involved in iconicity transcend languages and may be universal.

Dingemanse, M., Blasi, D. E., Lupyan, G., Christiansen, M. H., & Monaghan, P. (2015). Arbitrariness, iconicity and systematicity in language. Trends in Cognitive Sciences, 19(10), 603–615. doi: 10.1016/j.tics.2015.07.013 PDF

Repair interjections in vowel space

Panel showing average positions of repair interjections in vowel space. Left: The vowel inventories of the world’s languages tend to make maximal use of vowel space. In contrast to this, the vowels of the repair interjections all cluster in the same low-front region. Abbreviations: Cha’palaa (Cha), Dutch (Dut), Icelandic (Ice), Italian (Ita), Lao (Lao), Mandarin (Man), Murrinh-Patha (Mur), Russian (Rus), Siwu (Siw), Spanish (Spa). Right: An instrumental analysis of interjection tokens from Spanish and Cha’palaa shows that the interjections have distinct, language-specific vowel targets, with Spanish closer to /e?/ and Cha’palaa closer to /a?/.

This is an annotated version of Figure 2 and Figure 3 from our original paper, showing more clearly that the right panel zooms in on the small part of the vowel space populated by all of the languages.

The overall point is two-fold: First, there is strong similarity in the overall region of vowel space languages end up in for their repair interjections. Second, there is nonetheless also room for a small degree of language-specificity in precisely how the interjection is realised in a language. This demonstrates the two parts of our argument in the paper: that repair interjections have universal properties, and that they are (language-specific) words.
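
For the instrumental analysis in the right panel, the basic measurement is a pair of formant values per interjection token. A minimal sketch with Parselmouth (placeholder filename):

```python
import parselmouth

snd = parselmouth.Sound("token.wav")       # placeholder path to one interjection token
formants = snd.to_formant_burg()

t_mid = snd.duration / 2                   # measure at the token midpoint
f1 = formants.get_value_at_time(1, t_mid)  # first formant (Hz), vowel height
f2 = formants.get_value_at_time(2, t_mid)  # second formant (Hz), vowel frontness
print(f1, f2)
```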

Dingemanse, M., Torreira, F., & Enfield, N. J. (2013). Is “Huh?” a universal word? Conversational infrastructure and the convergent evolution of linguistic items. PLOS ONE, 8(11), e78273. doi: 10.1371/journal.pone.0078273