Annotating Speech Turn

Goals: We seek to identify, for each speaker, the periods during which they have the floor, or are "on turn". This is not a well-defined condition, so some heuristics are in order. I provide guidelines and specific examples below.

Usually, in a dyadic conversation, one person is speaking and the other listening. However, real conversation is much messier than this. There will be periods, sometimes protracted, in which two subjects vie for sole command of the floor, overlapping in their speech. There may also be periods in which nobody in particular has the floor. Thus we annotate Turn on a by-speaker basis, rather than by-pair, and we acknowledge that zero, one, or two people may have "Turn" at any given moment.

The best way to annotate turn is to listen attentively to the conversation. This is best done in Praat, where it is possible to follow the time course of the speech with some precision. Examples given here will be somewhat harder to follow, due to the separation of sound and annotation.

First, consider the following extract:

We might annotate turn thus:

Notation convention: An interval labeled "s" is one in which the speaker has "Turn". A "b" label is used for back channels, which are described in more detail below.
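For anyone post-processing the annotations, the by-speaker scheme can be sketched in code as one list of labeled intervals per speaker. This is only an illustrative sketch: the names `Interval` and `speakers_with_turn`, and all the times in the example, are assumptions made here and are not taken from the recordings or from any Praat export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    label: str    # "s" = speaker has Turn, "b" = back channel


def speakers_with_turn(tiers, t):
    """Return the speakers whose tier has an "s" interval covering time t."""
    return [
        name
        for name, intervals in tiers.items()
        if any(iv.label == "s" and iv.start <= t < iv.end for iv in intervals)
    ]


# Hypothetical annotation: one tier per speaker, times invented for illustration.
tiers = {
    "A": [Interval(0.0, 2.0, "s"), Interval(6.5, 9.0, "s")],
    "B": [Interval(1.0, 7.0, "s"), Interval(8.0, 8.3, "b")],
}

print(speakers_with_turn(tiers, 1.5))  # both speakers overlap
print(speakers_with_turn(tiers, 3.0))  # only B has Turn
print(speakers_with_turn(tiers, 9.5))  # nobody has Turn
```

Because each speaker has an independent tier, the result can contain zero, one, or two names, and a "b" interval never counts as holding Turn.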

Note that at the start, the two speakers overlap. The top speaker (A) eventually gives up and relinquishes the floor to Speaker B. Speaker B pauses for a while, but there is a very clear intention to continue. This continuation is again overlapped by Speaker A, who eventually wrests "Turn" from B. In annotating "Turn", we are not distinguishing between speech and non-speech. Rather, we are distinguishing those times when one participant is, or acts as if they were, in control. Thus, when a speaker stops speaking, but clearly intends to pick up and continue, we record "Turn" as continuous throughout.

Some cues to indicate that a speaker is retaining their turn are unfinished sentences, the use of filled pauses ("ummmmm", "ehhhhh"), a non-final intonation pattern, or the like. We might contrast these with indications that a turn has ended, such as asking a direct question, relinquishing turn after a protracted period of negotiation (as here, where Speaker A relinquishes turn after the first few seconds), or switching from clear "Turn" to clear back channeling (see below).

Consider the example below.

And its annotation:

Here, Speaker A stops for a considerable time, but there is clearly continuity from one utterance to the next (rough translation: "Gauguin is buried on that island ... and Jacques Brel"), and so we do not regard this as a relinquishing of "Turn". Speaker B provides back channels, which are treated differently:

Back channels

A back channel is a short utterance produced by one speaker which provides feedback to the speaker who has the floor, but which does not seek to take the floor itself. Typically, these are utterances like "yeah", "uh-huh", "ok", "exactly!", and such. They do not have complex linguistic structure, and they do not bring the other speaker to a stop.

Unlike "Turn", we annotate back channels only for the period during which the speaker actually talks. In the above example, two back channels are produced, one after the other. We need a rule to help us decide whether these should be noted as two separate back channels, or as a single composite one.

RULE: If two back channels are separated by a silence of more than 200 ms, each is marked separately. If separated by less, they are regarded as one unit.
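The rule can be applied mechanically once back channel boundaries have been marked. The following is a minimal sketch, assuming each back channel is given as a (start, end) pair in integer milliseconds; the function name `merge_back_channels` is hypothetical. The rule does not say what to do when the silence is exactly 200 ms, so the sketch assumes that such a pair is merged.

```python
def merge_back_channels(intervals, max_gap_ms=200):
    """Merge back channels separated by a silence of at most max_gap_ms.

    intervals: iterable of (start_ms, end_ms) pairs for one speaker.
    Assumption: a gap of exactly max_gap_ms is merged, since the rule
    leaves that boundary case open.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap_ms:
            # Short silence: extend the previous back channel.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Long silence (or first item): start a new back channel.
            merged.append((start, end))
    return merged


# Hypothetical times: gaps of 150 ms and then 400 ms.
marked = [(1000, 1300), (1450, 1700), (2100, 2400)]
print(merge_back_channels(marked))
```

In this invented example, the first two back channels are 150 ms apart and become one unit, while the third follows a 400 ms silence and stays separate.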

Potential Problems

The annotation of turn (and back channels) will necessarily involve some subjective judgement on the part of the annotator. Although for most stretches of a conversation, it will be clear who has the floor, there will be some occasions, usually of short duration, where a judgement call needs to be made. Here are a few examples, showing the judgement call I would make in that instance, along with discussion.

Example 1:

And its annotation:

Comments: Here, the top speaker (K) retains turn throughout. Although there is a long gap, and she is clearly thinking, there is a continuity to her speech (witness the filled pause in the gap), and there is no evidence of any negotiation. In general, once someone clearly has turn, you should assume that they retain it until there is clear evidence of surrendering it, such as switching to back channeling or asking a direct question. Judgement calls will need to be made during less animated stretches, where neither participant is assertive in taking the floor, but these are relatively rare.

Example 2:

And its annotation:

Comments: Two things are noteworthy here. First, there is evidence of laughter. I have chosen not to annotate laughter as part of any turn, and so turn comes to an end when laughter breaks out. Second, note that I have taken the second speaker's first contribution ("ne") to be part of his "Turn", and not a back channel. This is because he repeats the word as he picks up his clear contribution. That is, the single interjection looks like part of turn negotiation, and hence counts as "Turn", and not as a back channel.