Recently, I created a set of flashcards of single Chinese characters, to practice writing. The front of my Anki cards contained the pinyin, definition, and clozes for the most common words containing the character, while the back of the card was simply the character. I tagged the cards in groups of 200 by frequency rank, using tags of “1-200”, “201-400”, etc. I already know a number of characters, so I decided to start practicing with the more infrequent characters, the “1601-1800” tag.
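The tagging scheme above can be sketched in a few lines of Python. This is only an illustration of the grouping arithmetic, assuming 1-based frequency ranks; the function name `frequency_tag` and the `group_size` parameter are made up for this example and are not part of Anki.

```python
def frequency_tag(rank: int, group_size: int = 200) -> str:
    """Return a tag such as "1601-1800" for a 1-based frequency rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # Ranks 1..200 fall in the first group, 201..400 in the second, and so on.
    start = (rank - 1) // group_size * group_size + 1
    end = start + group_size - 1
    return f"{start}-{end}"


if __name__ == "__main__":
    for rank in (1, 200, 201, 1750):
        print(rank, frequency_tag(rank))
```

A character ranked 1750th, for example, lands in the “1601-1800” group.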
There were some characters I knew well. Others took more time to remember how to write, but weren’t too difficult, as I knew them on sight from extensive reading. But every once in a while I would be shown a card for a character I had never seen before in 6 years! Some, like 贼 (zéi, thief) or 鹏 (péng, a mythical bird), were surprising to see in the 1601-1800 range for frequency ranks, ranked more frequent in the Lancaster Corpus than 垂 (chuí, to hang down) and 夹 (jiā, to squeeze). But however unusual they were, I still recall encountering them at some point (金色飞贼 is the Golden Snitch in Quidditch from Harry Potter, and 鹏 was from reading on Chinese mythical animals). However, 琉 (liú, glazed tile) and 鲍 (bào, abalone) don’t look familiar at all, and I am fairly certain I have never seen the characters 萼 (è, calyx of a plant) and 懋 (mào, diligent) in over 6 years of study. Is it just a strange chance that I haven’t encountered them, is it failing memory, or are they rarer than their frequency would suggest?
Tags: corpus, LCMC, Linguistics, word frequency, word lists
The HSK is a well-known skill level test used by the PRC to assess language proficiency in Chinese. Even for those who have no interest in taking the HSK test, the lists of Chinese words associated with the test are a convenient source of material for learners to study vocabulary. I have used these word lists myself with great success; it was a quick and effective way to gain a huge amount of usable vocabulary.
In 2010, the HSK exam underwent a major reworking, changing the structure of its skill ranks, increasing emphasis on speaking and writing, and revising its vocabulary. Where the “old” pre-2010 word lists consisted of 8,000+ words across 4 levels, the “new” HSK has 5,000 words distributed into 6 levels. Below is a summary of the word counts in the old and new vocabulary lists, based on actual word lists obtained from various sources (see footnotes for details). Note that these include a small amount of double counting (less than 2%) due to words repeated at more than one level, because of either different pronunciation or meaning. Also note that these counts differ slightly from the official word counts reported by Hanban.
Word counts in the old and new HSK word lists
Level | old HSK | new HSK
1     | 1007    | 153
2     | 2001    | 150
3     | 2189    | 300
4     | 3587    | 600
5     | –       | 1300
6     | –       | 2513
Total | 8784    | 5016
Since I had invested so much time in studying the old lists (up to level 3), it was natural to wonder whether I should continue studying my existing flashcards or switch to the new HSK lists. How many of the words I have learned were dropped from the new HSK, and does that mean they are unimportant? If I did switch, what level should I pick to start studying?
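One way to approach these questions is to compare the two vocabularies as sets. The following is a minimal sketch, assuming each list is available as a plain sequence of words; the function name `compare_word_lists` and the sample words are invented for illustration and do not reflect actual HSK membership.

```python
def compare_word_lists(old_words, new_words):
    """Split two vocabularies into dropped, kept, and added words."""
    old_set, new_set = set(old_words), set(new_words)
    return {
        "dropped": old_set - new_set,  # only in the old list
        "kept": old_set & new_set,     # in both lists
        "added": new_set - old_set,    # only in the new list
    }


if __name__ == "__main__":
    # Hypothetical toy data, not real HSK lists.
    old_list = ["你好", "谢谢", "贼"]
    new_list = ["你好", "谢谢", "手机"]
    for label, words in compare_word_lists(old_list, new_list).items():
        print(label, len(words), sorted(words))
```

Note that converting to sets collapses words repeated at more than one level, so counts produced this way can come out slightly lower than the raw list totals.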
Tags: flashcards, studying, vocabulary, word frequency, word lists, words
The Brick Wall
With reading as my primary skill of focus in learning Chinese, a large part of my study is acquiring new words. Some vocabulary is from general word lists such as the HSK, while much of it is tied to a specific text I am reading, in order to increase my level of comprehension. While many approach the task of reading in a foreign language by looking up unknown words as they are encountered, I prefer to learn them ahead of time, to avoid the break in concentration while reading. With my bad habit of perfectionism, my main strategy in the past for learning these words has been the “Brick Wall Method”:
The Brick Wall Method – Learn every unknown word you encounter, no matter how difficult or rare it is
My theory has been — like being a brick wall against a tennis player — to not let any unknown word get past me, so that eventually I will run out of unknown words and thus will have learned the language. If a word is used in a text, it’s clearly important to some nominal degree, and if it’s used once, then it’s more likely to be seen again at some point, versus all the words that aren’t in the text.
Tags: corpus, Harry Potter, Linguistics, vocabulary, word frequency, word lists
I have started to analyze some of the results from the word test experiment. While I continue to crank through the data, here is an early result that I found interesting.
What this shows is the estimated word count, as extrapolated from the tested samples of each test, from over 200 trials, grouped by self-reported skill level and fit to a Gaussian distribution. The bar height is the mean value for the estimated known words, and the error bars are ±1 standard deviation for the Gaussian distribution. Yes, that’s quite a large deviation, and it’s not surprising given the many sources of variability in the data: sampling error in individual trials from using 165 samples out of 36,000; self-reporting skill level in one of 6 categories (beginning to advanced, plus native), with the meaning of the categories undefined; and individual choice of what it means to “know” a word. This is all in addition to the natural distribution of skill within each of the 6 defined categories. Also, keep in mind that ±1 standard deviation covers only about 68% of the data. Thus, while the graph makes a nice linear progression by advancing skill level, the results are probably fuzzier than they look.
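To get a feel for the sampling error in a single trial, here is a rough sketch of the extrapolation. It assumes the 165 test words are a simple random sample of the 36,000-word vocabulary and uses the standard binomial approximation for the error; the actual test may sample and extrapolate differently, and the function name `estimate_known_words` is made up for this example.

```python
import math


def estimate_known_words(known_in_sample, sample_size=165, vocabulary_size=36_000):
    """Extrapolate a known-word count and its approximate standard error."""
    if not 0 <= known_in_sample <= sample_size:
        raise ValueError("known_in_sample must be between 0 and sample_size")
    proportion = known_in_sample / sample_size
    estimate = proportion * vocabulary_size
    # Binomial standard error of the proportion, scaled up to the full
    # vocabulary. Assumes simple random sampling, which is an assumption here.
    std_error = vocabulary_size * math.sqrt(proportion * (1 - proportion) / sample_size)
    return estimate, std_error


if __name__ == "__main__":
    estimate, std_error = estimate_known_words(33)
    print(f"estimated known words: {estimate:.0f} (standard error about {std_error:.0f})")
```

Under these assumptions, knowing 33 of the 165 sampled words extrapolates to about 7,200 words with a standard error of roughly 1,100, so a single trial is already quite noisy before any of the other sources of variability come in.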
Despite the overlap in known words between skill categories, there are clearly distinct ranges for each level. If you score 7,000 words on the test, you would be similar in knowledge to others in either the lower intermediate or the intermediate level, but you definitely know more than most beginners, and less than most high intermediate or advanced learners. So, this chart may be useful if you’ve gotten a word score from the test, and want to find out if you’re at the skill level you thought you were. Of course, there is much more to language skill than the number of isolated words you know. Ultimately, the only person you have to satisfy is yourself.
Tags: studying, vocabulary, words