CHAT Transcription Format
Adapted from:
Tools for Analyzing Talk Part 1: The CHAT Transcription Format
Brian MacWhinney
(Carnegie Mellon University)
and from slides by:
Sonja Eisenbeiss
(University of Essex)
Transcription in CHAT / CLAN
- Written in a text editor
- Saved with UTF-8 encoding
- Use the .cha file extension
- CLAN will take care of these steps for you.
- Every transcript must:
  - begin with the line: @Begin
  - end with the line: @End
Between @Begin and @End:
- Headers with information about the transcript
- Main tiers [basic transcription]
- Dependent tiers [further annotations]
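As a rough illustration of this layout, the Python sketch below classifies each line of a transcript by its first character, assuming headers start with @, main tiers with *, and dependent tiers with %. The helper names `line_kind` and `has_begin_and_end` are hypothetical and not part of CLAN; this is no substitute for running CHECK.

```python
def line_kind(line: str) -> str:
    """Classify a CHAT line by its first character (illustrative sketch)."""
    if line.startswith("@"):
        return "header"
    if line.startswith("*"):
        return "main tier"
    if line.startswith("%"):
        return "dependent tier"
    return "other"


def has_begin_and_end(lines: list[str]) -> bool:
    """True if the first line is @Begin and the last line is @End."""
    return bool(lines) and lines[0] == "@Begin" and lines[-1] == "@End"


# An abbreviated transcript held in memory (the participant name is made up).
# A real .cha file would be read as UTF-8, e.g.
# open(path, encoding="utf-8").read().splitlines()
transcript = [
    "@Begin",
    "@Languages:\teng",
    "@Participants:\tMOT Mary Mother",
    "*MOT:\tlet me put them together .",
    "%com:\tmother picks up two blocks",
    "@End",
]

for line in transcript:
    print(line_kind(line), "|", line)
print(has_begin_and_end(transcript))
```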
CHAT-Format: Main Tiers
- what was actually said
- one utterance per tier
- introduced by *, the three-letter code for the participant (e.g. MOT or CHI), a colon, and a tab:
*MOT: let me put them together .
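To make the main-tier shape concrete, here is a minimal Python sketch that splits a main line into speaker code and utterance. It assumes, as the slides state, that speaker codes are exactly three upper-case letters and that the separator is a tab; the names `MAIN_TIER` and `parse_main_tier` are hypothetical.

```python
import re

# Assumption: "*", three upper-case letters, a colon, a tab, then the utterance.
MAIN_TIER = re.compile(r"^\*([A-Z]{3}):\t(.+)$")


def parse_main_tier(line: str) -> tuple[str, str]:
    """Return (speaker code, utterance) or raise ValueError."""
    match = MAIN_TIER.match(line)
    if match is None:
        raise ValueError(f"not a main tier: {line!r}")
    return match.group(1), match.group(2)


speaker, utterance = parse_main_tier("*MOT:\tlet me put them together .")
print(speaker)    # MOT
print(utterance)  # let me put them together .
```

A line that uses a space instead of a tab after the colon is rejected, which mirrors the "and a tab" requirement.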
CHAT-Format: Headers
- Every line must end with a carriage return.
- The first line in the file must be an @Begin header line.
- The second line in the file must be an @Languages header line. Languages are entered as three-letter ISO 639-3 codes, such as “eng” for English.
- The third line must be an @Participants header line listing, for each participant, the three-letter code, the name, and the role.
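The opening header lines can be sketched in Python as follows. The helper `opening_headers` is hypothetical, and two details are assumptions for illustration: that multiple languages and multiple participants are separated by ", ", and that each participant entry is written as code, name, role separated by spaces. The names Ross and Mary are made up.

```python
def opening_headers(languages, participants):
    """Return the first three CHAT lines (sketch).

    languages    -- ISO three-letter codes, e.g. ["eng"]
    participants -- (code, name, role) tuples, e.g. ("CHI", "Ross", "Target_Child")
    """
    participant_list = ", ".join(
        f"{code} {name} {role}" for code, name, role in participants
    )
    return [
        "@Begin",
        "@Languages:\t" + ", ".join(languages),
        "@Participants:\t" + participant_list,
    ]


for line in opening_headers(
    ["eng"],
    [("CHI", "Ross", "Target_Child"), ("MOT", "Mary", "Mother")],
):
    print(line)
```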
- After the @Participants header comes a set of @ID headers providing further details for each speaker. These will be inserted automatically for you when you run CHECK using escape-L (in CLAN).
- The last line in the file must be an @End header line.
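The placement rules for @Begin, @Languages, @Participants, and @End can be checked with a short Python sketch. The function `check_structure` is hypothetical; it deliberately ignores the @ID headers, which CLAN's CHECK fills in and validates.

```python
def check_structure(lines: list[str]) -> list[str]:
    """List violations of the header placement rules (sketch)."""
    problems = []
    expected = ["@Begin", "@Languages:", "@Participants:"]
    for index, prefix in enumerate(expected):
        if len(lines) <= index or not lines[index].startswith(prefix):
            problems.append(f"line {index + 1} should start with {prefix}")
    if not lines or lines[-1] != "@End":
        problems.append("last line should be @End")
    return problems


good = [
    "@Begin",
    "@Languages:\teng",
    "@Participants:\tCHI Ross Target_Child",
    "*CHI:\tmore juice !",
    "@End",
]
print(check_structure(good))       # []
print(check_structure(good[:-1]))  # ['last line should be @End']
```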
CHAT-Format: Main Tiers
- Lines beginning with * indicate what was actually said. These are called “main lines.” Each main line should code one and only one utterance. When a speaker produces several utterances in a row, code each with a new main line.
- After the asterisk on the main line comes a three-letter code in upper case letters for the participant who was the speaker of the utterance being coded. After the three-letter code comes a colon and then a tab.
- What was actually said is entered starting in the ninth column.
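The Python sketch below builds main lines from a speaker code and utterances, one main line per utterance. It also shows why the text lands in the ninth column, under the assumption that tab stops are every 8 columns: "*MOT:" takes five characters and the tab pads to column 9. The helpers `format_main_line` and `format_turn` are hypothetical.

```python
def format_main_line(code: str, utterance: str) -> str:
    """Build '*', the three-letter upper-case code, a colon, a tab, the utterance."""
    if len(code) != 3 or not code.isalpha() or not code.isupper():
        raise ValueError(f"speaker code must be three upper-case letters: {code!r}")
    return f"*{code}:\t{utterance}"


def format_turn(code: str, utterances: list[str]) -> list[str]:
    """Several utterances in a row by one speaker: one main line each."""
    return [format_main_line(code, u) for u in utterances]


line = format_main_line("MOT", "let me put them together .")
# Assuming 8-column tab stops, the utterance starts in column 9.
print(line.expandtabs(8).index("let") + 1)  # 9
for main_line in format_turn("CHI", ["more juice !", "now ?"]):
    print(main_line)
```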
CHAT-Format: Main Tiers: Words and Utterances
- Utterances must end with an utterance terminator. The basic utterance terminators are the period, the exclamation mark, and the question mark. These can be preceded by a space, but the space is not required.
- Commas can be used as needed to mark phrasal junctions, but they are not used by the programs and have no sharp prosodic definition.
- Use upper case letters only for proper nouns and the word “I.” Do not use uppercase letters for the first words of sentences. This will facilitate the identification of proper nouns.
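A minimal Python sketch of these two utterance-level rules follows. It only knows the three basic terminators, and since a program cannot tell which words are proper nouns, it assumes the transcriber supplies them in a set. The function `check_utterance` and its `proper_nouns` parameter are hypothetical.

```python
TERMINATORS = (".", "!", "?")  # the basic terminators only


def check_utterance(utterance: str, proper_nouns: frozenset = frozenset()) -> list[str]:
    """Flag a missing terminator or a capitalized first word (sketch)."""
    problems = []
    text = utterance.rstrip()
    # A space before the terminator is allowed but not required.
    if not text.endswith(TERMINATORS):
        problems.append("missing utterance terminator")
    words = text.split()
    if words:
        first = words[0].rstrip(".!?")
        # Only "I" and known proper nouns may start with a capital.
        if first[:1].isupper() and first != "I" and first not in proper_nouns:
            problems.append(f"capitalized first word: {first}")
    return problems


print(check_utterance("let me put them together ."))  # []
print(check_utterance("Let me see!"))  # ['capitalized first word: Let']
```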
- To facilitate recognition of proper nouns and avoid misspellings, words should not contain capital letters except at their beginning. Words should not contain numbers, unless these mark tones.
- Unintelligible words with an unclear phonetic shape should be transcribed as xxx.
- If you wish to note the phonological form of an incomplete or unintelligible phonological string, write it out with an ampersand, as in &guga.
- Incomplete words can be written with the omitted material in parentheses, as in (be)cause and (a)bout.
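The word-level conventions on this slide can be sketched in Python as below. `word_problems` checks the capital-letter and digit rules; its `allow_tone_digits` flag is a hypothetical stand-in for the tone exception. `analyze_token` recognizes xxx, ampersand forms, and parenthesized omissions, recovering both the intended word and what was actually said. Both names are illustrative, not CLAN functions.

```python
import re


def word_problems(word: str, allow_tone_digits: bool = False) -> list[str]:
    """Check the capital-letter and digit rules for one word (sketch)."""
    problems = []
    if any(ch.isupper() for ch in word[1:]):
        problems.append("capital letter after the first character")
    if not allow_tone_digits and any(ch.isdigit() for ch in word):
        problems.append("contains a digit")
    return problems


def analyze_token(token: str) -> dict:
    """Describe the special word forms xxx, &form, and (omitted)material."""
    if token == "xxx":
        return {"kind": "unintelligible"}
    if token.startswith("&-"):
        # &- marks a filled pause (see the disfluency markers), not a phonological string
        return {"kind": "filled pause", "form": token[2:]}
    if token.startswith("&"):
        return {"kind": "phonological string", "form": token[1:]}
    if "(" in token:
        return {
            "kind": "incomplete word",
            "target": re.sub(r"[()]", "", token),       # (be)cause -> because
            "spoken": re.sub(r"\([^)]*\)", "", token),  # (be)cause -> cause
        }
    return {"kind": "word"}


for token in ["xxx", "&guga", "(be)cause", "(a)bout", "dog"]:
    print(token, analyze_token(token))
```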
CHAT-Format: Main Tiers: Disfluency Markers
| Type | Code | Example | Notes |
| phrase repetition | <> [/] | [/] that is a dog. | <> is used to mark repeated material |
| word revision | [//] | a dog [//] beast | Revision counts once |
| phrase revision | <> [//] | [//] how can you see it ? | |
| filled pause | &- | &-um &-you_know | Fillers with underscore count as one word |
CHAT manual pg. 92
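To show how these markers can be read by a program, here is a Python sketch that counts them in one utterance and, as one assumed use, strips the repeated or revised material so that only what follows it remains. The function names and the example utterances are made up for illustration; CLAN's own programs handle these codes in their own ways.

```python
import re


def count_disfluencies(utterance: str) -> dict:
    """Count repetition, revision, and filled-pause markers (sketch)."""
    return {
        "repetitions": len(re.findall(r"\[/\]", utterance)),
        "revisions": len(re.findall(r"\[//\]", utterance)),
        # \S+ keeps &-you_know together, so an underscored filler counts once
        "filled pauses": len(re.findall(r"&-\S+", utterance)),
    }


def remove_retraced(utterance: str) -> str:
    """Drop material marked with [/] or [//], keeping what follows it."""
    # <phrase> [/] or <phrase> [//]
    text = re.sub(r"<[^>]*>\s*\[//?\]\s*", "", utterance)
    # single word [/] or word [//]
    return re.sub(r"\S+\s*\[//?\]\s*", "", text)


print(count_disfluencies("&-um <that is> [/] that is a dog ."))
print(remove_retraced("a dog [//] beast ."))  # a beast .
```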
CHAT: Dependent Tiers
Further annotations:
- %mor [morphosyntactic coding]
- %pho [phonological coding]
- %syn [syntactic coding]
- %err [errors]
- %com [comments]
- %spa [speech acts]
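Because each dependent tier annotates the main line above it, a reader program typically groups them together. The Python sketch below assumes a dependent tier is written as %, a three-letter lower-case code, a colon, and a tab (mirroring the main-tier layout); `group_tiers` and `DEPENDENT_TIER` are hypothetical names, and the tier contents are not parsed further.

```python
import re

# Assumption: "%", three lower-case letters, a colon, a tab, then the annotation.
DEPENDENT_TIER = re.compile(r"^%([a-z]{3}):\t(.*)$")


def group_tiers(lines: list[str]) -> list[dict]:
    """Attach each dependent tier to the main tier directly above it (sketch)."""
    utterances = []
    for line in lines:
        if line.startswith("*"):
            utterances.append({"main": line, "dependent": {}})
            continue
        match = DEPENDENT_TIER.match(line)
        if match and utterances:
            utterances[-1]["dependent"][match.group(1)] = match.group(2)
    return utterances


# Abbreviated transcript: the required @Languages and @Participants are omitted.
lines = [
    "@Begin",
    "*MOT:\tlet me put them together .",
    "%com:\tmother holds up two blocks",
    "*CHI:\tmore !",
    "@End",
]
for entry in group_tiers(lines):
    print(entry)
```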