Rethinking Tests of Vocabulary Size

Vocabulary size tests assume that people learn the most frequent vocabulary words first—but is frequency the only factor that matters?

How do you measure how many words a person knows? This can be a challenge at language learning programs (like the Missionary Training Center for The Church of Jesus Christ of Latter-day Saints), in research studies, and even in K-12 education systems. This might seem like a simple task at first glance. Yet it’s impossible to test every single word—by one measure, the average adult native English speaker knows about 20,000 word families, let alone individual words.1 What words should be used to measure the extent of someone’s vocabulary and accurately portray their knowledge?

Designers of vocabulary size tests generally assume that learners are more likely to know words that appear most frequently in the language. This makes logical sense, but is the correlation strong enough for tests to extrapolate the size of a learner’s vocabulary from a small sample of words chosen by frequency alone? This is the question that BYU linguist Dr. Brett Hashimoto asks in his article, “Is Frequency Enough?: The Frequency Model in Vocabulary Size Testing.”

The “Frequency Model”

Leading vocabulary tests, such as the Vocabulary Size Test (VST), work by dividing words into 1,000-word “bands”—one for the first 1,000 words, another for the next 1,000, and so on—based on which words appear most frequently in a corpus of language data. Test developers then hand-pick a fixed number of words from each band to include on the test. The number of words a learner knows on the test is multiplied by a constant to arrive at the total number of words the learner knows (i.e., the size of the learner’s vocabulary). This is called the Frequency Model.
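The arithmetic behind this kind of estimate can be sketched in a few lines of Python. This is only an illustration of the scaling step described above, not the scoring procedure of the VST or any other published test; the function name, the ten items per band, and the learner's scores are all hypothetical.

```python
def estimate_vocabulary_size(correct_per_band, items_per_band, band_size=1000):
    """Scale the number of correct test items up to an estimated vocabulary size.

    Assumes every band contributes the same number of test items, so each
    item stands in for band_size / items_per_band words (the "constant").
    """
    if items_per_band <= 0:
        raise ValueError("items_per_band must be positive")
    multiplier = band_size / items_per_band
    return sum(correct_per_band) * multiplier


# Hypothetical learner: five frequency bands, ten sampled words per band.
correct = [10, 9, 7, 4, 2]  # items answered correctly in bands 1 through 5
print(estimate_vocabulary_size(correct, items_per_band=10))  # 3200.0
```

The estimate is only as good as the assumption that the sampled words represent their whole band, which is the assumption the study goes on to question.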

It is intuitive that how often a word is used relates to whether a learner is likely to know it—people encounter words like “boring” and “house” more often than “quotidian” and “abode”—but is it the only important factor? If it is, then there should be a very strong correlation between the frequency of a word and how many people get it right on a vocabulary test. If the correlation is not strong, then there must be other factors besides frequency involved.
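One simple way to check such a relationship is a Pearson correlation between each word's frequency rank and the share of test takers who answered it correctly. The sketch below is illustrative only: the ranks and proportions are invented, and the article does not say that this particular statistic is the one used in the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented data: frequency rank of each tested word (1 = most frequent)
# and the proportion of learners who recognized it.
ranks = [10, 500, 1200, 2500, 3300, 4800]
proportion_correct = [0.98, 0.91, 0.95, 0.60, 0.82, 0.55]

r = pearson_r(ranks, proportion_correct)
print(f"r = {r:.2f}")  # closer to -1 means rarer words are reliably harder
```

A value near -1 would support the idea that frequency alone explains difficulty; a weaker value leaves room for other factors.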

The Study

To test this, Dr. Hashimoto gave a test of vocabulary to 403 learners of English with 35 different native languages. Rather than hand-picking the test items from frequency bands, he systematically included every tenth word from the first 5,000 most common words in English. He mixed these words with fake words, and participants had to identify whether a word was real or fake.
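The difference between hand-picking and systematic sampling is easy to see in code. The sketch below takes every tenth entry from a frequency-ranked list; the placeholder word list and the choice to start at the tenth word (rather than the first) are assumptions made for illustration, since the article does not give the starting point.

```python
def systematic_sample(ranked_words, step=10, limit=5000):
    """Return every `step`-th word from the first `limit` ranked words.

    `ranked_words` is assumed to be sorted from most to least frequent.
    Sampling starts at position `step` (the 10th word by default).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return ranked_words[:limit][step - 1::step]


# Placeholder list standing in for a real frequency-ranked word list.
ranked = [f"word{rank}" for rank in range(1, 6001)]

sample = systematic_sample(ranked)
print(len(sample))            # 500
print(sample[0], sample[-1])  # word10 word5000
```

Because the selection rule is fixed in advance, no one's judgment about which words are "representative" enters the sample.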

Dr. Hashimoto found in his analysis of the test results that a word’s frequency only moderately predicts its difficulty. The difficulty of individual test items had significant variation that was not explained by frequency alone.

He also identified other problems with the Frequency Model. First, sampling of words from each of the bands is arbitrary, and test developers bring their own subjective biases in deciding which words are somehow most representative of words at that frequency level. Second, separating vocabulary into 1,000-word bands in the first place is just as arbitrary—by freely selecting a set number of words from each band, developers assume that all words within that band are at the same difficulty level.2 The test data revealed significant variation in word difficulty within bands.

The Implications

Dr. Hashimoto’s aim with this study is not to construct an entirely new test that abandons frequency altogether, but to point to the need for test designers to consider variables other than frequency when making vocabulary size tests. The list of variables that could affect vocabulary knowledge is lengthy, and a test that ignores them may misrepresent a language learner’s knowledge.

For example, words that are long or difficult to pronounce may be learned later, even if they are more frequent; in contrast, words that are similar to those in a learner’s first language and words that refer to concrete things that the learner interacts with may be learned earlier, even if they are less frequent overall. Learners with different demographic backgrounds may learn different sets of vocabulary; some learners may know a large number of infrequent words in a specific field. (Many missionaries who served for The Church of Jesus Christ of Latter-day Saints may have developed massive vocabularies of abstract religious terms but still struggle to give directions to the bakery. Frequency-based tests of vocabulary size make little sense when missionaries learning a language for a specific purpose are taught the words repentance and atonement before grocery store and purple.)

As vocabulary testing becomes more accurate, it will more adequately show language learners’ true vocabulary proficiency. “In short,” Dr. Hashimoto concludes, “modern vocabulary size tests have some potentially major issues with their current design, and efforts should be made in the future to improve them.” And the first step of these efforts, he argues, would be to involve variables other than frequency. Only then will researchers be able to accurately estimate vocabulary size—unless, of course, you have a dictionary, a highlighter, and a lot of time on your hands. 20,000 word families’ worth.∎

Dr. Brett Hashimoto, Professor of Linguistics at Brigham Young University

Read the original article

Hashimoto, B. J. (2021). Is frequency enough?: The frequency model in vocabulary size testing. Language Assessment Quarterly, 18(2), 171–187. https://doi.org/10.1080/15434303.2020.1860058

1 Goulden, R., Nation, P., & Read, J. (1990). How large can a receptive vocabulary be? Applied Linguistics, 11(4), 341–363. https://doi.org/10.1093/applin/11.4.341

2 “Assuming that word 1 in a frequency list is as difficult as word 999 seems logically problematic. In the same vein, there is nothing intrinsic about word 999 in a word list that makes it automatically, necessarily, and significantly easier than word 1001.” (Hashimoto, 2021, p. 182).