
Activity 4: Complexity in oral vs. written language

Heritage learners are generally known to have a higher oral proficiency than their foreign-language-learning peers. But what is the relationship between their oral skills and their written skills? Is this the same for heritage and foreign language learners?  In this activity we examine complexity in the oral and written language of our learners.

Would you expect written language or oral language to have more lexical richness? Why or why not?

Learner Language staff response:

In general, we expect a written version of a story to have more lexical and syntactic complexity than the oral version of the same story. This is because a writer has more time to think while creating the storyline, and so has more time to elaborate on motivation, explain the reasons for story developments, or add more detailed descriptions.

Part One: Type Token Ratio (TTR)

One useful measure of complexity, the type-token ratio (TTR), documents lexical richness, or variety in vocabulary. Does the learner use the same words over and over, or does s/he use a variety of different words to communicate? A type-token ratio (TTR) is the total number of UNIQUE words (types) divided by the total number of words (tokens) in a given segment of language. For example, that last sentence contains 26 words (tokens), but several of those words (like ‘a’, ‘the’, ‘words’) occur more than once, so there are only 19 UNIQUE words, or types. The TTR of that sentence is 19/26, or .73. The closer the TTR is to 1, the greater the lexical richness of the segment.
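The count in this example can be checked with a short script. What follows is a minimal sketch in Python, not a tool used in this activity: the function names are hypothetical, and it assumes that capitalization is ignored and that a hyphenated form such as "type-token" counts as one word.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: hyphenated forms such as "type-token" stay together as
    one token. Accented letters (as in Spanish samples) count as letters.
    """
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())

def type_token_ratio(text):
    """Return (number of types, number of tokens, TTR) for a text."""
    tokens = tokenize(text)
    types = set(tokens)  # each distinct word is counted once
    ttr = len(types) / len(tokens) if tokens else 0.0
    return len(types), len(tokens), ttr

sentence = ("A type-token ratio (TTR) is the total number of UNIQUE words "
            "(types) divided by the total number of words (tokens) in a "
            "given segment of language.")

n_types, n_tokens, ttr = type_token_ratio(sentence)
print(f"{n_types} types / {n_tokens} tokens = {ttr:.2f}")
# prints: 19 types / 26 tokens = 0.73
```

Different decisions about what counts as a word (for example, splitting hyphenated forms) would change both counts, so the same rules should be applied to every sample being compared.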

Analysis 1:

Try performing a TTR analysis on some of the language produced by Henry in his oral and written versions of the narration task. Use the guidelines below and start with the written sample of his 'grocery store' narrative task:

Written Sample

Henry Written Sample Image

  1. Tokens: Count how many total words are used. Enter this number in Table 1 below.
  2. Types: How much variety in word choice is there in the written sample? Count the number of UNIQUE words, counting each different word only once no matter how many times it is repeated. For example, “el” might be used 5 times, but you should only count it 1 time to get this number. Enter this into Table 1 below.
  3. The Type Token Ratio (TTR). The TTR is the # of Types divided by the # of Tokens. The closer the TTR is to 1 the more lexical variety there is. Enter Henry's TTR for his written sample in Table 1 below.

Before performing a TTR analysis on Henry's spoken version of the task, NOTE: for purposes of comparison, this text should contain the same number of total words as the written version. Your analysis should therefore stop with the word una, and you should not count "OK," "uh," partial words, or false starts.
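The preparation described in this note can be sketched in code. The example below is only an illustration under stated assumptions: the filler list is entered by hand, partial words and false starts are assumed to be marked with a trailing hyphen (the transcripts may use a different convention), and the sample utterance is invented, not taken from Henry's speech.

```python
# Assumed conventions, not taken from the transcripts: fillers are listed
# by hand, and partial words or false starts end in a hyphen (e.g. "ti-").
FILLERS = {"ok", "uh"}

def comparable_tokens(spoken_text, limit):
    """Return the first `limit` countable words of a spoken sample."""
    kept = []
    for word in spoken_text.lower().split():
        stripped = word.strip(".,;:!?¡¿\"'()")
        if not stripped:
            continue                    # token was punctuation only
        if stripped.endswith("-"):
            continue                    # partial word or false start
        if stripped in FILLERS:
            continue                    # "OK", "uh"
        kept.append(stripped)
    return kept[:limit]                 # same token count as the written sample

# Hypothetical utterance for illustration only.
sample = "OK, uh, la mujer va a la ti- tienda y compra una manzana"
tokens = comparable_tokens(sample, 9)
print(tokens)
print(len(set(tokens)), "types /", len(tokens), "tokens")
```

Cutting both samples to the same number of tokens matters because TTR tends to fall as a text gets longer, so samples of unequal length are not directly comparable.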

Spoken Sample

Henry Spoken Sample Image

See the full transcript (PDF), lines 10-13.

Figure out the Type Token Ratio for the spoken sample above, using the same procedure. Then enter Henry's TTR for his spoken sample in Table 1 below.

Table 1: Henry TTR Analysis

Henry Tokens Types TTR (= types ÷ tokens)

Analysis 2:

Now perform the same TTR analysis on the written (first) and spoken samples for Raúl provided below. Then enter each TTR value in Table 2.


Written Sample

Spoken Sample

Raul Spoken TTR image

See the full transcript (PDF), lines 6-12.

Table 2: Raúl TTR Analysis

Raúl Tokens Types TTR (= types ÷ tokens)


Table 3: TTR comparison between Henry and Raúl's oral and written samples

  Henry Raúl
TTR Written %
Spoken %


Comparing lexical complexity in written vs. spoken language for each learner, what patterns do you see for Henry and Raúl? Are they the same or different? Discuss possible reasons for the patterns you observe in these two learners’ vocabulary usage.

Learner Language staff response:

               Henry            Raúl
TTR  Written   34 / 53 = .64    29 / 55 = .52
     Spoken    34 / 53 = .64    35 / 55 = .64

Neither Henry nor Raúl seems to follow our prediction of greater lexical richness in writing. Henry has an equal amount of lexical variety in his oral and written versions. Raúl, however, actually has LESS lexical variety in his written version than in his oral version: his TTR decreased from .64 in his oral version to .52 in his written version.

What could explain the results of Raúl’s complexity analysis? It is in fact fairly consistent with what is reported for heritage language learners: their oral proficiency in the heritage language typically outpaces their written proficiency. Heritage learners typically use the heritage language primarily, if not solely, for social purposes, which tend to be oral. And typically they have not used their heritage language for academic purposes in school, where they would have developed proficiency in written skills. Raúl's written version is very simple and even repeats the same word order: SVO. Raúl explained in his interview that he felt his vocabulary was limited and was something he could potentially improve upon in a formal Spanish class.

Henry, on the other hand, likely learned all of his L2 vocabulary in both oral and written modalities. As a traditional foreign language learner, he likely has been assessed on his ability to produce new vocabulary words in both written and oral tasks throughout his language learning experience.


Part Two: Syntactic complexity

Now that we've gotten an idea of our learners' lexical complexity, we can take a look at their syntactic complexity. One basic method for doing this is to calculate the percentage of complex sentences in a given sample of language. A simple sentence has a subject and one predicate verb. Two simple sentences may be joined by "and"; these are still counted as two simple sentences. For our purposes, a complex sentence is two simple sentences combined with a subordinating conjunction (since, because, after, although, if, until, etc.) OR a relative pronoun (who, which, whose, whom, that, how, what, etc.).

For example:


Complex sentence (two clauses joined by the relative pronoun who / que):
He remembered the girl who gave him the book in the station.
Él recordó a la chica que le dio el libro en la estación.

Simple sentence:
He remembered the girl.
Él recordó a la chica.



Using the written and spoken samples from Part One above:
For each learner, use the tables below to find the % of complex sentences:

  1. Count the total number of sentences (T).
  2. Count the number of complex sentences used (C).
  3. Divide C by T. What percentage of the total sentences are complex?
Henry C=Complex T=Sentences %

Raúl C=Complex T=Sentences %
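
The three steps above can also be approximated in code. This is a rough sketch rather than a reliable classifier: it treats any sentence containing one of the marker words as complex, so it will miscount cases such as "that" used as a demonstrative. The English markers come from the definition given earlier; the short Spanish list is an assumption added for illustration, and the function names are hypothetical.

```python
import re

# English markers from the definition of a complex sentence, plus a few
# Spanish equivalents (an assumption of this sketch, not an official list).
MARKERS = {"since", "because", "after", "although", "if", "until",
           "who", "which", "whose", "whom", "that", "how", "what",
           "que", "porque", "aunque", "si", "quien"}

def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def is_complex(sentence):
    """Heuristic: a sentence is complex if it contains a marker word."""
    words = re.findall(r"[^\W\d_]+", sentence.lower())
    return any(w in MARKERS for w in words)

def percent_complex(text):
    """Return (C, T, percentage of complex sentences)."""
    sentences = split_sentences(text)
    t = len(sentences)                              # step 1: total sentences
    c = sum(is_complex(s) for s in sentences)       # step 2: complex sentences
    pct = 100 * c / t if t else 0.0                 # step 3: C divided by T
    return c, t, pct

sample = ("He remembered the girl who gave him the book in the station. "
          "He remembered the girl.")
c, t, pct = percent_complex(sample)
print(f"C = {c}, T = {t}, {pct:.0f}% complex")
# prints: C = 1, T = 2, 50% complex
```

Because the check is based on word lists alone, the output is best treated as a first pass to be verified by reading each sentence.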


Now that you've seen syntactic complexity alongside the lexical complexity of our learners, what further commentary can you provide about the complexity in these learners' language?

Learner Language staff response:

                               Henry            Raúl
TTR                  Oral      34 / 53 = .64    35 / 55 = .64
                     Written   34 / 53 = .64    29 / 55 = .52
% Complex Sentences  Oral      2 / 5 = 40%      1 / 5 = 20%
                     Written   2 / 4 = 50%      0 / 8 = 0%

While their TTR analyses were somewhat comparable, the percentages of complex sentences produced by Henry and Raúl are strikingly different. First, Henry did follow what we originally expected by producing (slightly) more complex language in his written sample (50%) than in his spoken sample (40%). Meanwhile, Raúl showed much lower syntactic complexity in both modalities, with 0% complex sentences in his written sample.

Henry is likely to have been explicitly taught how to write more complex sentences in his formal Spanish classes, and encouraged to do so. Raúl, however, likely never had this level of academic support in Spanish, as he began school in English at such an early age. Therefore, the difference in complexity between speaking and writing, and between our FL and heritage learner, largely has to do with the purpose and assessment of their Spanish. At some point in Henry's language learning, the purpose probably focused on his ability to construct more complex sentences, and this was probably something assessed in his Spanish classes. His purposes for learning Spanish were therefore vastly different from those of Raúl, whose Spanish has always been used, and judged, purely for intelligibility and communication.


Center for Advanced Research on Language Acquisition (CARLA) • 140 University International Center • 331 - 17th Ave SE • Minneapolis, MN 55414