Hapax Legomena vs. the Brick Wall

The Brick Wall

With reading as my primary skill of focus in learning Chinese, a large part of my study is acquiring new words. Some vocabulary is from general word lists such as the HSK, while much of it is tied to a specific text I am reading, in order to increase my level of comprehension. While many approach the task of reading in a foreign language by looking up unknown words as they are encountered, I prefer to learn them ahead of time, to avoid the break in concentration while reading. With my bad habit of perfectionism, my main strategy in the past for learning these words has been the “Brick Wall Method”:

The Brick Wall Method – Learn every unknown word you encounter, no matter how difficult or rare it is

My theory has been — like being a brick wall against a tennis player — to not let any unknown word get past me, so that eventually I will run out of unknown words and thus will have learned the language. If a word is used in a text, it’s clearly important to some nominal degree, and if it’s used once, then it’s more likely to be seen again at some point, versus all the words that aren’t in the text.


New Software – Chinese Word Extractor

  • Post category: Software

I have had my online vocabulary extraction tool available on the web for a while now. I have gotten a lot of use out of it myself, since my primary interest has been building vocabulary to improve my reading ability. The application generally works OK, but it suffers from some technical issues. Because it loads the entire CC-CEDICT dictionary every time it runs, it puts a heavy load on the shared hosting provider, to the point where the script crashes unpredictably, especially on larger texts. It also requires manual intervention to keep the dictionary up to date, and adding more dictionaries takes a lot of additional effort.
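To make the dictionary-loading cost concrete, here is a minimal sketch of dictionary-based word extraction in Python. It assumes the standard CC-CEDICT line layout (`Traditional Simplified [pinyin] /gloss/gloss/`) and uses greedy longest-match segmentation, which is an assumption for illustration only; the post does not say how the actual tool segments text. The function names and the three sample entries are hypothetical.

```python
import re

# One CC-CEDICT entry per line: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict(lines):
    """Map each simplified headword to a list of (pinyin, glosses) tuples."""
    entries = {}
    for line in lines:
        if line.startswith("#"):  # comment and header lines
            continue
        match = ENTRY_RE.match(line)
        if match:
            _trad, simp, pinyin, glosses = match.groups()
            entries.setdefault(simp, []).append((pinyin, glosses.split("/")))
    return entries


def extract_words(text, entries):
    """Greedy longest-match scan; returns dictionary words in order of first appearance."""
    max_len = max(map(len, entries), default=1)
    found = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in entries:
                if candidate not in found:
                    found.append(candidate)
                i += size
                break
        else:
            i += 1  # character not in the dictionary (punctuation, Latin text, etc.)
    return found


# Hypothetical three-entry sample standing in for the full dictionary file.
sample = [
    "# CC-CEDICT style sample (three entries only)",
    "中國 中国 [Zhong1 guo2] /China/",
    "中 中 [zhong1] /within/among/in/middle/center/",
    "人 人 [ren2] /man/person/people/",
]
entries = parse_cedict(sample)
words = extract_words("中国人", entries)
```

With the full dictionary, `parse_cedict` is the expensive step, since every entry is parsed before any text is scanned. That is why re-running it on every web request is costly, while a long-running offline program can parse once and reuse the result.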

Meanwhile, for the past year I’ve been working on a similar program that can be used offline. It has been working well, is a little faster, and makes it easier to drop in newer versions of the CC-CEDICT dictionary. I have spent a few months adding a little more polish to it, and am now releasing it as open source software. At this point, it is available for Windows systems. The source code is also available, which would allow it to be used on nearly any system. More details are at the project page and the documentation page. Here are some screenshots to demonstrate its functionality:


Some Thoughts on Listening Skill

Since I started learning Chinese, my studies have focused mainly on developing reading ability. Because I am not in China and have no regular language partners, my main window into China has been through text, so it made sense to take that approach. But more and more, I feel I’m missing out on all the audio and video content out there, which is easily accessible these days over the internet. I do occasionally listen to spoken Chinese for study purposes, but I have made very little progress in that area. At the beginning of this year, I changed my study routine to focus much more on developing listening skills. In this post, I’ll share my experience thus far and various other thoughts. Please remember that this is just my own experience and, as with any advice on language learning, your mileage will definitely vary!


Creating Audio Flashcards with Transcriber, Audacity, and Anki

Transcriber, Audacity, and Anki are three programs, all free and open source, that are useful for language study. At some point in the future, I hope to write more on each of these. In the meantime, I wanted to announce two export plugins I created for Transcriber. One export creates a label file for Audacity, for splitting an audio file into individual clips, and the other creates an import file for Anki, associating the transcribed text with the audio segments. Below are step-by-step instructions for the 6 steps involved, starting from a raw audio file and finishing with a set of Anki flashcards.
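As a rough illustration of what the two export files contain, the sketch below builds both from a list of timed segments. It relies on two formats I am confident of: an Audacity label file is plain text with one `start<TAB>end<TAB>label` line per label (times in seconds), and Anki can import tab-separated text in which `[sound:filename]` in a field plays that media file. Everything else is an assumption for illustration: the `clip-001` naming scheme, the `mp3` extension, and the function names are hypothetical and are not taken from the actual Transcriber plugins.

```python
def clip_name(index, prefix="clip"):
    """Hypothetical naming scheme shared by the labels and the flashcards."""
    return f"{prefix}-{index:03d}"


def audacity_labels(segments, prefix="clip"):
    """Build label-file text: start<TAB>end<TAB>label, one segment per line."""
    lines = []
    for index, (start, end, _text) in enumerate(segments, start=1):
        lines.append(f"{start:.6f}\t{end:.6f}\t{clip_name(index, prefix)}")
    return "\n".join(lines) + "\n"


def anki_import(segments, prefix="clip", extension="mp3"):
    """Build import text: transcribed text<TAB>[sound:clip file], one note per line."""
    lines = []
    for index, (_start, _end, text) in enumerate(segments, start=1):
        lines.append(f"{text}\t[sound:{clip_name(index, prefix)}.{extension}]")
    return "\n".join(lines) + "\n"


# Each segment is (start seconds, end seconds, transcribed text); sample values only.
segments = [(0.0, 2.5, "你好"), (2.5, 5.25, "我是学生")]
labels_text = audacity_labels(segments)
anki_text = anki_import(segments)
```

Using the same names for the labels and the `[sound:...]` tags is the point of the design: if the audio is split into one file per label, each exported clip lines up with its flashcard. The clips still have to be copied into Anki's media folder, and any transcribed text containing a tab would need escaping first.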


A Mathematical Model for Chinese Word Knowledge

The Known Chinese Words Test has been running for a month now. During that time I’ve collected data from 170 trials, from learners with a wide range of levels. The results are encouraging enough that I can now give more details about what I have found. What I have been working on is a mathematical model for word knowledge, which can describe the probability that a particular person knows any given word, with just a few variables involved. The results from the collected trials validate that hypothetical model, and I’m elated.


Counting Known Chinese Words – Part II

  • Post category: Linguistics

By 2008, I had been studying Chinese off and on for around 3 years. As a self-learner, my study was rather eclectic: Pimsleur, Chinesepod, and random flash card lists were my main methods. I was far from fluent, still struggling to understand all but the simplest news articles, fiction, or blog posts. But I felt like I did know a lot of words; I just didn’t know how many. How much longer before this would start to get easy? So I undertook a self-examination to estimate how many Chinese words I actually knew.
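One generic way to make such an estimate, sketched here in Python, is to quiz yourself on a random sample from a word list and scale the known fraction up to the whole list. This is an assumption for illustration, not a description of the self-examination actually used in the post; the function names and the numbers in the example are hypothetical.

```python
import math
import random


def draw_sample(word_list, sample_size, seed=0):
    """Pick words to quiz yourself on, without replacement."""
    return random.Random(seed).sample(word_list, sample_size)


def estimate_known(total_words, sample_size, known_in_sample, z=1.96):
    """Scale the known fraction of the sample up to the whole list.

    Returns (estimate, low, high), where low and high come from a
    normal-approximation interval around the sample proportion.
    """
    p = known_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - margin)
    high = min(1.0, p + margin)
    return round(total_words * p), round(total_words * low), round(total_words * high)


# Hypothetical numbers: a 50,000-word list, 200 sampled words, 30 of them known.
estimate, low, high = estimate_known(50_000, 200, 30)
```

The interval is wide for small samples, so the result is best read as a rough range rather than a precise count. It also depends heavily on which word list is sampled, since rare words dominate any large dictionary.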


Counting Known Chinese Words – Part I

  • Post category: Linguistics

In forums for foreign language learners,[1][2] certain questions recur. Some are easy to answer (Do I need to learn tones? Yes). Others (Should I study words or characters?) may have more than one answer. Two common questions I am especially interested in are:

  • How many words do I know?
  • How many words do I need to know (to read a newspaper, book, etc.)?

The second question is not immediately answerable except under certain conditions: if you can’t yet read your target material, then the answer is: more! Of course, people ask the second question because they want to know how close they are to their target fluency level, and the answer to the first question can give a rough estimate of that.


How to Create an e-Book from an Online Reading Site

I recently bought an Amazon Kindle, primarily for reading more Chinese. It has turned out to be a great investment, since I am no longer tied to my computer screen for reading things I find online. I had been collecting bookmarks to online book sites for a long time without making much use of them. Now that I am a bigger consumer of reading material, I’m starting to put them to use. In particular, I need sites that allow downloading the raw text, so that I can convert it into a formatted book.
