I have started to analyze some of the results from the word test experiment. While I continue to crank through the data, here is an early result that I found interesting.
What this shows is the estimated word count from over 200 trials, each extrapolated from that trial's tested sample, grouped by self-reported skill level and fit to a Gaussian distribution. The bar height is the mean estimated number of known words, and the error bars are +/-1 standard deviation of the fitted Gaussian. Yes, that’s quite a large deviation, but it’s not surprising given the many sources of variability in the data: sampling error in individual trials, each of which uses 165 samples out of 36,000; self-reported skill level in one of 6 categories (beginning to advanced, plus native), with the meaning of the categories left undefined; and each individual’s choice of what it means to “know” a word. This is all in addition to the natural spread of skill within each of the 6 categories. Also, keep in mind that +/-1 standard deviation covers only about 68% of the data. Thus, while the graph shows a nice linear progression with advancing skill level, the results are probably fuzzier than they look.
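To give a feel for how much of that spread could come from sampling alone, here is a rough sketch in Python. It assumes the 165 samples are words drawn uniformly at random from a 36,000-word list and that the estimate is a simple proportion scaled up to the full list; the post does not spell out the actual extrapolation, so treat both as assumptions. The function name `estimate_known_words` and the example count of 32 known words are made up for illustration.

```python
import math
from statistics import NormalDist

TOTAL_WORDS = 36_000  # size of the full list (figure from the post)
SAMPLE_SIZE = 165     # samples per trial (figure from the post)


def estimate_known_words(known_in_sample, sample_size=SAMPLE_SIZE, total=TOTAL_WORDS):
    """Scale a sample proportion up to the full list.

    Assumption: simple random sampling, so the standard error follows
    the binomial formula sqrt(p * (1 - p) / n), scaled by the list size.
    Returns (estimate, standard_error), both in words.
    """
    if not 0 <= known_in_sample <= sample_size:
        raise ValueError("known_in_sample must be between 0 and sample_size")
    p = known_in_sample / sample_size
    estimate = p * total
    standard_error = total * math.sqrt(p * (1 - p) / sample_size)
    return estimate, standard_error


# Hypothetical trial: the test taker marks 32 of the 165 samples as known.
estimate, standard_error = estimate_known_words(32)
print(f"estimated known words: {estimate:.0f} +/- {standard_error:.0f}")

# Fraction of a Gaussian that lies within +/-1 standard deviation of the mean.
standard_normal = NormalDist()
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(f"share within +/-1 SD: {within_one_sd:.1%}")
```

Under these assumptions, a single trial that lands near 7,000 words carries a sampling uncertainty of roughly a thousand words by itself, before the fuzzy skill categories or differing ideas of “knowing” a word come into play.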
Despite the overlap in known words between skill categories, each level occupies a distinct range. If you score 7,000 words on the test, your knowledge is similar to that of others at either the lower intermediate or the intermediate level, but you definitely know more than most beginners, and less than most high intermediate or advanced learners. So, this chart may be useful if you’ve gotten a word score from the test and want to find out whether you’re at the skill level you thought you were. Of course, there is much more to language skill than the number of isolated words you know. Ultimately, the only person you have to satisfy is yourself.