Tuesday, 29th September 2009
I've started yet another potentially huge programming project, this one inspired by my return to learning Chinese (mainly on iKnow (aka Smart.fm) and Lingq). It builds on a program I started a while ago, which let me search an open-source Chinese-English dictionary (cedict) and select words to create a vocab list. What I want to create now is a program that stores all the Chinese words I "know" and quantifies how well I know each one (do I know the pinyin, the tone, one meaning, all the meanings in all contexts, etc.?). There are many online applications that do something similar (including iKnow and Lingq), but they never seem to work quite as I would like. Maybe that's because I'm not sure exactly what I want.
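As a rough sketch of the two pieces described above, the code below parses a single cedict line and attaches a "how well do I know it" record to the entry. It assumes the usual CC-CEDICT line layout (`Traditional Simplified [pin1 yin1] /meaning 1/meaning 2/`). The names `Entry`, `Knowledge` and `score`, and the scoring rule (one point each for pinyin and tone, plus one per dictionary meaning known), are hypothetical choices for illustration, not a worked-out design.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed CC-CEDICT layout:  傳統 传统 [chuan2 tong3] /tradition/traditional/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


@dataclass
class Entry:
    """One dictionary entry (hypothetical structure)."""
    traditional: str
    simplified: str
    pinyin: str
    meanings: list[str]


def parse_cedict_line(line: str) -> Entry | None:
    """Parse one cedict line; return None for comment lines or lines that don't match."""
    if line.startswith("#"):
        return None
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, definitions = match.groups()
    return Entry(traditional, simplified, pinyin, definitions.split("/"))


@dataclass
class Knowledge:
    """Hypothetical record of what I know about one word."""
    knows_pinyin: bool = False
    knows_tone: bool = False
    meanings_known: set[str] = field(default_factory=set)

    def score(self, entry: Entry) -> float:
        """Fraction of checks passed: pinyin, tone, and each meaning listed in the entry."""
        total = 2 + len(entry.meanings)
        known = (self.knows_pinyin + self.knows_tone
                 + len(self.meanings_known & set(entry.meanings)))
        return known / total
```

For example, knowing the pinyin of 传统 and the meaning "tradition", but not the tone or the meaning "traditional", gives a score of 2/4 = 0.5. A single number like this hides which part is missing, so it may be more useful to keep the individual fields and only compute a score for sorting.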
One thing I'd like is a collection of sentences that I can attempt to read, so that if I get stuck I can quickly find out what a word means. I would also like the reverse: if I look up a word, I want to see all the sentences in my collection that contain it, showing the word in context. One idea I've been considering is to build a network of hanzi linked by relationships such as tone, pinyin (perhaps broken down into initial and ending), radical, grammatical function (i.e. noun, verb, adjective, etc.), vocab type (e.g. animal, relative, food, etc.) and where I came across the word (e.g. iKnow, Harry Potter, a Go book, etc.). This should help solidify the network that is being built in my brain. The network might also be useful for identifying words that I'm likely to confuse.
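Both ideas can be sketched with plain dictionaries. This is a minimal sketch under a few assumptions: `build_sentence_index` uses a simple substring test (so a short word will also match inside longer words), and the attribute names in the example data (`initial`, `ending`, `tone`, `radical`, `source`) are hypothetical labels for the relationships listed above. Words are linked whenever they share a value for some relationship.

```python
from collections import defaultdict


def build_sentence_index(sentences, vocab):
    """Map each word to the sentences that contain it (plain substring test)."""
    index = defaultdict(list)
    for sentence in sentences:
        for word in vocab:
            if word in sentence:
                index[word].append(sentence)
    return index


def build_links(attributes):
    """attributes: word -> {relation: value}.
    Returns (relation, value) -> set of words sharing that value."""
    links = defaultdict(set)
    for word, attrs in attributes.items():
        for relation, value in attrs.items():
            links[(relation, value)].add(word)
    return links


def neighbours(word, attributes, links):
    """Return {other word: [relations shared with `word`]}."""
    shared = defaultdict(list)
    for relation, value in attributes[word].items():
        for other in links[(relation, value)]:
            if other != word:
                shared[other].append(relation)
    return dict(shared)


# Hypothetical example data.
attributes = {
    "妈": {"initial": "m", "ending": "a", "tone": 1, "radical": "女", "source": "iKnow"},
    "马": {"initial": "m", "ending": "a", "tone": 3, "radical": "马", "source": "iKnow"},
    "她": {"initial": "t", "ending": "a", "tone": 1, "radical": "女", "source": "Harry Potter"},
}
links = build_links(attributes)
```

Here `neighbours("妈", attributes, links)` links 马 through initial, ending and source, and 她 through ending, tone and radical. Ranking neighbours by the number of shared relations would be one crude way to flag words that are likely to be confused.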
Above is a screenshot of a Tkinter app that I've created. So far it just displays some text (in this case, the first three lines of a translation of Harry Potter) as an array of labels on a Tkinter grid. I'm not sure whether this is a particularly good idea, and I may switch to using Pygame to display the text. It should be relatively quick to add a display of word meanings (which are currently output to the command line) using cedict. I think one major challenge will be getting the program to identify names in the text, which will again require some understanding of Chinese grammar.
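Looking up meanings first requires splitting the text into words, because Chinese is written without spaces. One common baseline, which may not be what the app currently does, is greedy forward maximum matching. At each position it takes the longest string found in the dictionary and falls back to a single character. The function name `segment` and the `max_len=4` limit are assumptions (some cedict entries are longer than four characters).

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to max_len characters)
    that appears in `dictionary`; if none does, emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words
```

With a dictionary containing 我, 喜欢, 学习 and 中文, `segment("我喜欢学习中文", dictionary)` gives `["我", "喜欢", "学习", "中文"]`, and each word could then become one label in the grid with its cedict meanings attached. The method also illustrates the name problem: a transliterated name such as 哈利 that isn't in the dictionary simply falls apart into single characters.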