Here is an ongoing series of explanations of how the Graphmaster works, for those who want to build their own or simply want a deeper understanding of the approach.
The Graphmaster consists of a collection of nodes called Nodemappers. These Nodemappers map the branches from each node. The branches are either single words or wildcards.
The root of the Graphmaster is a Nodemapper with about 2000 branches, one for each of the first words of all the patterns (40,000 in the case of the A.L.I.C.E. brain). The number of leaf nodes in the graph is equal to the number of categories, and each leaf node contains the <template> tag.
There are really only three steps to matching an input to a pattern. If you are given (a) an input starting with word "X", and (b) a Nodemapper of the graph:
For completeness there should also be a terminal case: If the input is null (no more words) and the Nodemapper contains the <template> key, then a match was found. Halt the search and return the matching node.
If the root Nodemapper contains a key "*" and it points to a leaf node, then the algorithm is guaranteed to find a match.
A convenient metaphor for AIML patterns, and perhaps also an alternative to database storage of patterns and templates, is the file system. Hopefully by now almost everyone understands that their files and folders are organized hierarchically, in a tree. Whether you use Windows, Unix or Mac, the same principle holds true. The file system has a root, such as "c:\". The root has some branches that are files, and some that are folders. The folders, in turn, have branches that are both folders and files. The leaf nodes of the whole tree structure are files. (Some file systems have symbolic links or shortcuts that all you to place "virtual backward links" in the tree and turn it into a directed graph, but forget about that complexity for now). Every file has a "path name" that spells out its exact position within the tree.
"c:\my documents\my pictures\me.jpg"
denotes a file located down a specific set of branches from the root.
The Graphmaster is organized in exactly the same way. You can write a pattern like "I LIKE TO *" as "g:/I/LIKE/TO/star". All of the other patterns that begin with "I" also go into the "g:/I/" folder. All of the patterns that begin with "I LIKE" go in the "g:/I/LIKE/" subfolder. (Forgetting about <that> and <topic> for a minute) we can imagine that the folder "g:/I/LIKE/TO/star" has a single file called "template.txt" that contains the template.
If all the patterns and templates are placed into the file system in that way, we can easily rewrite the explanation of the matching algorithm: If you are given an input starting with word "X" and a folder of the filesystem:
[Editor's note: "underscore" and "star" as directory names above are meant to stand in for "_" and "*", which are not allowed as file or directory names in some operating systems. Since the literals "underscore" and "star" might be actual words in a pattern, perhaps a real implementation along these lines would use some other symbols to serve the same function.]
You can see that the matching algorithm specifies an effective procedure for searching the filesystem for a particular file called "template.txt". The path name distinguishes all the different "template.txt" files from each other.
What's more, you can visualize the "compression" of the Graphmaster in the file system hierarchy. All the patterns with common prefixes become "compressed" into single pathways from the root. Clearly this storage method scales better than a simple linear, array, or database storage of patterns, whether they are stored in RAM or on disk.
Here is a really simple explanation of Graphmaster pattern matching: It works just like a dictionary or encyclopedia. If you want to look up a word or phrase, you don't start at the beginning or the end and search through every entry until you find a match. No, you turn first to the section that matches the first letter or word. Then, you skip to another section that contains a set beginning with the next letter or word. You continue in this process until you find your word or phrase.
In the Graphmaster, the wildcards "*" and "_" act like two special letters, that come before "0" and after "Z" respectively. You should always check the "_" listings first, in case your input has an ending listed there. Check the alphabetical listings next. If you don't find any matches there, try the patterns beginning with "*".
The only final complication is that the "*" and "_" can appear in patterns that begin with "[0-9A-Z]". If your input matches some pattern prefix, then try the patterns with that prefix followed by "_" first, then the ones with that prefix followed by some word, finally try the ones with the prefix followed by "*".