AIML Pattern Matching Simplified

Dr. Richard S. Wallace
16 July 2001 (updated 20 October 2007)

Here is an ongoing series of explanations of how the Graphmaster works, for those who want to build their own or simply want a deeper understanding of the approach.

Rolling Your Own Graphmaster | The Filesystem Metaphor | Graphmaster For Dummies

Rolling Your Own Graphmaster

The Graphmaster consists of a collection of nodes called Nodemappers. These Nodemappers map the branches from each node. The branches are either single words or wildcards.

The root of the Graphmaster is a Nodemapper with about 2000 branches, one for each of the first words of all the patterns (40,000 in the case of the A.L.I.C.E. brain). The number of leaf nodes in the graph is equal to the number of categories, and each leaf node contains the <template> tag.

There are really only three steps to matching an input to a pattern. If you are given (a) an input starting with word "X", and (b) a Nodemapper of the graph:

  1. Does the Nodemapper contain the key "_"? If so, search the subgraph rooted at the child node linked by "_". Try all remaining suffixes of the input following "X" to see if one matches. If no match was found, try:
  2. Does the Nodemapper contain the key "X"? If so, search the subgraph rooted at the child node linked by "X", using the tail of the input (the suffix of the input with "X" removed). If no match was found, try:
  3. Does the Nodemapper contain the key "*"? If so, search the subgraph rooted at the child node linked by "*". Try all remaining suffixes of the input following "X" to see if one matches. If no match was found, go back up the graph to the parent of this node, and put "X" back on the head of the input.

For completeness there should also be a terminal case: If the input is null (no more words) and the Nodemapper contains the <template> key, then a match was found. Halt the search and return the matching node.

If the root Nodemapper contains a key "*" and it points to a leaf node, then the algorithm is guaranteed to find a match.

Note that:

  1. At every node, the "_" has first priority, an atomic word match second priority, and a "*" match lowest priority.
  2. The patterns need not be ordered alphabetically, only partially ordered so that "_" comes before any word and "*" after any word.
  3. The matching is word-by-word, not category-by-category.
  4. The algorithm combines the input pattern, the <that> pattern, and the <topic> pattern into a single "path" or sentence such as: "PATTERN <that> THAT <topic> TOPIC" and treats the tokens <that> and <topic> like ordinary words. The PATTERN, THAT and TOPIC patterns may contain multiple wildcards.
  5. The matching algorithm is a highly restricted version of depth-first search, also known as backtracking.
  6. You can simplify the algorithm by removing the "_" wildcard, and considering just the second two steps. Also try understanding the simple case of PATTERNs without <that> and <topic>.

The Filesystem Metaphor

A convenient metaphor for AIML patterns, and perhaps also an alternative to database storage of patterns and templates, is the file system. Hopefully by now almost everyone understands that their files and folders are organized hierarchically, in a tree. Whether you use Windows, Unix or Mac, the same principle holds true. The file system has a root, such as "c:\". The root has some branches that are files, and some that are folders. The folders, in turn, have branches that are both folders and files. The leaf nodes of the whole tree structure are files. (Some file systems have symbolic links or shortcuts that all you to place "virtual backward links" in the tree and turn it into a directed graph, but forget about that complexity for now). Every file has a "path name" that spells out its exact position within the tree.

"c:\my documents\my pictures\me.jpg"

denotes a file located down a specific set of branches from the root.

The Graphmaster is organized in exactly the same way. You can write a pattern like "I LIKE TO *" as "g:/I/LIKE/TO/star". All of the other patterns that begin with "I" also go into the "g:/I/" folder. All of the patterns that begin with "I LIKE" go in the "g:/I/LIKE/" subfolder. (Forgetting about <that> and <topic> for a minute) we can imagine that the folder "g:/I/LIKE/TO/star" has a single file called "template.txt" that contains the template.

If all the patterns and templates are placed into the file system in that way, we can easily rewrite the explanation of the matching algorithm: If you are given an input starting with word "X" and a folder of the filesystem:

  1. If the input is null, and the folder contains the file "template.txt", halt.
  2. Does the folder contain the subfolder "underscore/"? If so, change directory to the "underscore/" subfolder. Try all remaining suffixes of the input following "X" to see if one matches. If no match was found, try:
  3. Does the folder contain the subfolder "X/"? If so, change directory to the subfolder "X/", using the tail of the input (the suffix of the input with "X" removed). If no match was found, try:
  4. Does the folder contain the subfolder "star/"? If so, change directory to the "star/" subfolder. Try all remaining suffixes of the input following "X" to see if one matches. If no match was found, change directory back to the parent of this folder, and put "X" back on the head of the input.

[Editor's note: "underscore" and "star" as directory names above are meant to stand in for "_" and "*", which are not allowed as file or directory names in some operating systems. Since the literals "underscore" and "star" might be actual words in a pattern, perhaps a real implementation along these lines would use some other symbols to serve the same function.]

You can see that the matching algorithm specifies an effective procedure for searching the filesystem for a particular file called "template.txt". The path name distinguishes all the different "template.txt" files from each other.

What's more, you can visualize the "compression" of the Graphmaster in the file system hierarchy. All the patterns with common prefixes become "compressed" into single pathways from the root. Clearly this storage method scales better than a simple linear, array, or database storage of patterns, whether they are stored in RAM or on disk.

Graphmaster For Dummies

Here is a really simple explanation of Graphmaster pattern matching: It works just like a dictionary or encyclopedia. If you want to look up a word or phrase, you don't start at the beginning or the end and search through every entry until you find a match. No, you turn first to the section that matches the first letter or word. Then, you skip to another section that contains a set beginning with the next letter or word. You continue in this process until you find your word or phrase.

In the Graphmaster, the wildcards "*" and "_" act like two special letters, that come before "0" and after "Z" respectively. You should always check the "_" listings first, in case your input has an ending listed there. Check the alphabetical listings next. If you don't find any matches there, try the patterns beginning with "*".

The only final complication is that the "*" and "_" can appear in patterns that begin with "[0-9A-Z]". If your input matches some pattern prefix, then try the patterns with that prefix followed by "_" first, then the ones with that prefix followed by some word, finally try the ones with the prefix followed by "*".