Sunday, April 27, 2008

Project 3: Dictofunk


My newest project had the objective of loading a word list from a text file and converting the words into a tree.

In other words:

act

arc
art

would become

a->c->t
.->r->c
....->t

This is one of those sort of important, school-y types of things that people should learn, and I wanted to do it for the Boggle Solver program (given a Boggle board and a list of valid words, find all possible words within the rules of Boggle) I had talked with somebody about a couple of months ago.

The basic process involved is this:
1. Read a word from a file.
2. Starting out with an empty tree, search the top level of the tree for the first letter in the word.
3a. If you find it, make that spot the top of the new tree.
3b. If you don't, add the letter to the tree, make that letter the top of the new tree.
4. Chop off the first letter of the word and start over at 2 with your new subtree.

Finally, when the word is done, it sticks a "." at the end so it knows it's a complete word (so that you know, for instance, while "act" and "actual" are words, "actu" and "actua" aren't--they won't have a "." in their subtrees).

Here's the same algorithm in C# form:

public void insertWord(TreeNode thisNode, Char[] chars)
{
bool found = false;
String charString = new String(chars);

if (charString.Length > 0)
{
charString = charString.Substring(1);

foreach (TreeNode branch in thisNode.Nodes)
{
if (branch.Text == chars[0].ToString())
{
found = true;
insertWord(branch, charString.ToCharArray());
}
}
if (!found)
{
insertWord(thisNode.Nodes.Add(chars[0].ToString()), charString.ToCharArray());
}
}
}


And that's pretty much that. The actual algorithm is fast (as seen in the screen shot, with the test dictionary of over 70,000 words it was going through about 13,000 words per second). Loading the tree into the Windows Form, however, takes a good 60-90 seconds. That's not a major concern, however, since the graphical display (yet another great thing that C# does that makes life so easy) is essentially pointless except for visually verifying that the algorithm works. I was kind of shocked that this was even included as a basic display type. Oh, the future.

Oh, also, C# in general was a huge help in doing this in that it I didn't have to worry about constructing a tree class of my own. I was able to spend my time actually figuring out the algorithm I needed, which, while I've certainly read about tree insertions before, I did last night and this morning on my own. Once again, this is totally freshman programming class stuff, but it feels good to know I did it more or less on my own.

Source
Dictofunk.cs

No comments: