Friday, 2nd October 2009
Adding a dictionary
I have made some progress on my Chinese text reader program over the last couple of days. Now, when the mouse is moved over a character, that character's pinyin is displayed in the first box on the left and its meaning in the box below. The app also searches for compounds consisting of the selected character and the character following it. The screenshot shows the program identifying the word 先生 in the text. As a side note, I had to add an arrow to the screenshot of the application because the mouse pointer doesn't appear when you use Print Screen. I got the image of the arrow from here, which was very helpful.
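For anyone curious, the two-character lookup amounts to something like the sketch below. This is not the program's actual code: `lookup_at` and `SAMPLE` are names invented for illustration, and the sample entries are abbreviated stand-ins rather than real cedict data.

```python
# Hypothetical miniature dictionary: word -> (pinyin, meaning).
# The entries are shortened stand-ins, not copied from cedict.
SAMPLE = {
    "先": ("xian1", "first"),
    "生": ("sheng1", "to be born; life"),
    "先生": ("xian1 sheng5", "Mr.; teacher"),
}


def lookup_at(text, n, dictionary):
    """Look up the character at index n and the two-character
    compound that starts there. Either result may be None."""
    char_entry = dictionary.get(text[n])
    compound = text[n:n + 2]
    # At the end of the text the slice is only one character long,
    # so there is no compound to look up.
    compound_entry = dictionary.get(compound) if len(compound) == 2 else None
    return char_entry, compound_entry


if __name__ == "__main__":
    print(lookup_at("王先生好", 1, SAMPLE))
```

Mousing over 先 finds both the single character and 先生, while mousing over 生 finds only the single character, which is the limitation mentioned in the list at the end of this post.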
I realised that the previous idea of displaying characters as an array of labels was quite a stupid one, so I had to think of another method. I wanted to display the hanzi in such a way that I could show information about a character when it was moused over or clicked. Despite being more familiar with Pygame, I decided to stick with Tkinter, because I want to add the ability to input text and perhaps have lists, both of which are much simpler in Tkinter. I first changed the program (version 1.01, I guess) to use an array of buttons, which output information about the hanzi they displayed when pressed, but again, this seemed an overly complicated way of doing things.
Finally, I discovered that I could bind events to individual items on a Tkinter canvas, so now the hanzi are drawn one by one onto a canvas. In the hope that this blog might be useful to someone, I have put a section of code below. It shows two methods of my App object (which inherits from Tkinter's Frame). The first draws each of the characters belonging to the App's document object (originally called text, until I realised that this overwrote a Tkinter object) on a canvas and tags each one as 'hanzi'. When the mouse moves over any item tagged 'hanzi' (the "<Enter>" event), the second method looks that hanzi up in the App's dictionary.
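    def createCanvas(self):
        self.canvas = Canvas(self, width=self.width, height=self.height, bg='white')
        self.canvas.grid()
        x = y = 20
        for hanzi in self.document.characters:
            # Draw each character as its own canvas item, tagged 'hanzi'
            self.canvas.create_text((x, y), text=hanzi, font=("Arial", 12), tags='hanzi')
            x += self.char_width
            if x > self.width:
                # Wrap to the start of the next row
                x = 20
                y += self.char_height
                if y > self.height:
                    break
        # Call mouseoverHanzi whenever the mouse enters an item tagged 'hanzi'
        self.canvas.tag_bind('hanzi', "<Enter>", self.mouseoverHanzi)

    def mouseoverHanzi(self, event):
        # find_closest returns a tuple of item ids; ids are handed out from 1
        # in creation order, so id - 1 is the index into the document
        n = event.widget.find_closest(event.x, event.y)[0] - 1
        hanzi = self.dictionary.search(self.document.characters[n])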
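The row-wrapping part of createCanvas can be tried out without opening a window. Below is a minimal sketch that pulls the position calculation into a plain function; `layout` and its `margin` parameter are names I have made up for illustration and are not part of the program.

```python
def layout(characters, width, height, char_width, char_height, margin=20):
    """Return a list of (x, y, character) positions, wrapping to a new
    row when x passes the canvas width and stopping when y passes the
    canvas height. Mirrors the loop in createCanvas, minus the drawing."""
    positions = []
    x = y = margin
    for ch in characters:
        positions.append((x, y, ch))
        x += char_width
        if x > width:
            # Start a new row
            x = margin
            y += char_height
            if y > height:
                break
    return positions


if __name__ == "__main__":
    for position in layout("abcde", width=60, height=100,
                           char_width=20, char_height=20):
        print(position)
```

The i-th entry in the returned list corresponds to the i-th item drawn on the canvas. That correspondence is what mouseoverHanzi relies on when it turns a canvas item id into an index into the document, and it only holds as long as nothing else is drawn on the canvas before the hanzi.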
Things to improve
- I'd like to be able to look up compounds of more than two characters, which I think will require improvements in the way the dictionary stores information, so it can retrieve compounds efficiently. Maybe I’ll learn how to use SQL or something, which I’ve been planning for a while now.
- The program should also look up compounds in which the character under the mouse is not the first. For example, if a user mouses over (there must be a better verb) 生 in the image text, the program should offer 先生.
- I'd like to add a visual aspect to the display, for example by highlighting the character or compound under selection, or by adding an option to display the pinyin under each hanzi.
- A major feature that I intend to add is the ability to update the dictionary, so that, for example, I could add 女贞, which isn't in cedict. (It means Ligustrum lucidum, the Chinese privet; Harry Potter lives on Privet Drive.) This again may require changing the way I store information in the dictionary.
- In fact, I would like to have a separate list, which would contain words I know or would like to learn. The program could then highlight which words are not currently in my list.
- I also need to fix an issue with words like 号, which have two readings with different meanings. Currently, when 号 is selected, the program displays its pinyin as "hao2" and its meaning as "roar; cry", because this is the first entry in the dictionary. It comes before the other reading, "hao4", meaning "day of a month; (ordinal) number", which is the correct one in this context. This could, yet again, be overcome by altering the way the dictionary stores information.
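Several of the points above come down to how the dictionary stores its entries. Here is a rough sketch of one possible structure, not what the program currently does: each word maps to a list of (pinyin, meaning) pairs, so 号 can keep both readings, and a helper finds every compound that covers a given position, whatever its length and wherever the character falls within it. The class name `Dictionary` and the method `compounds_at` are my own inventions for this example, and the 先生 entry is an abbreviated stand-in.

```python
from collections import defaultdict


class Dictionary:
    """Hypothetical dictionary: word -> list of (pinyin, meaning) pairs."""

    def __init__(self):
        self.entries = defaultdict(list)
        self.max_len = 1  # length of the longest word stored so far

    def add(self, word, pinyin, meaning):
        # Appending rather than overwriting keeps every reading of a word
        self.entries[word].append((pinyin, meaning))
        self.max_len = max(self.max_len, len(word))

    def search(self, word):
        """Return all readings of word (an empty list if unknown)."""
        return list(self.entries.get(word, []))

    def compounds_at(self, text, n):
        """Return every stored word of two or more characters that
        covers text[n], in order of starting position."""
        found = []
        first_start = max(0, n - self.max_len + 1)
        for start in range(first_start, n + 1):
            # The word must reach past index n and be at least 2 long
            first_end = max(n + 1, start + 2)
            last_end = min(len(text), start + self.max_len)
            for end in range(first_end, last_end + 1):
                word = text[start:end]
                if word in self.entries:
                    found.append(word)
        return found


sample = Dictionary()
sample.add("号", "hao2", "roar; cry")
sample.add("号", "hao4", "day of a month; (ordinal) number")
sample.add("先生", "xian1 sheng5", "Mr.; teacher")

if __name__ == "__main__":
    print(sample.search("号"))
    print(sample.compounds_at("王先生好", 2))
```

With this layout the program could show both readings of 号 and let the user (or some later bit of cleverness) choose between them, and mousing over 生 would offer 先生. Scanning every candidate substring is fine for short words; an SQL table or a prefix tree might be the better choice once the dictionary gets large.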