This document is a "work in progress".
Last update: October 30, 2011
Original Document by: Thomas Ringate
Copyright © 2001
Contributing Authors: Dr. Richard S. Wallace; Anthony Taylor; Jon Baer; Dennis Daniels
Given only the <pattern> and <template> tags, there are three general types of categories.
Strictly speaking, the three types overlap, because "atomic" and "default" refer to the <pattern> and "recursive" refers to a property of the <template>.
"Atomic" categories are
those with atomic patterns, i.e., patterns that contain no wildcard
symbol ("*" or "_"). Atomic categories are the
simplest categories to add in AIML.
<category>
<pattern>WHAT IS A CIRCLE</pattern>
<template><set_it>A circle</set_it> is the set of points equidistant from a common point called the center.</template>
</category>
The above category does the following:
- Matches the client input of "What is a circle"
- Sets the "IT" variable to the value of "A circle"
- Sends the client the response: "A circle is the set of points equidistant from a common point called the center."
The name "default category"
derives from the fact that its pattern has a wildcard "*"
or "_". The ultimate default category is the one with
<pattern>*</pattern>, which matches any input. In the
ALICE distribution the ultimate default category resides in a file
called "std-pickup.aiml". These default responses are
often called "pickup lines" because they generally consist
of leading questions designed to focus the client on known topics.
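As a minimal sketch, an ultimate default category with a few pickup lines might look like the following. The replies are invented for illustration and are not taken from "std-pickup.aiml"; the <random> and <li> tags are assumed to be supported by the interpreter, as they are in AIML 1.0.1.

```xml
<category>
<pattern>*</pattern>
<template>
<!-- Hypothetical pickup lines; one is chosen at random for each unmatched input. -->
<random>
<li>What do you like to do in your spare time?</li>
<li>Tell me about your favorite movie.</li>
<li>Do you have any pets?</li>
</random>
</template>
</category>
```

Each reply is a leading question, so even an input the robot cannot match steers the client toward a topic it may know something about.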
The more common default categories have patterns combining a few words and a wildcard. For example, the category:
<category>
<pattern>I NEED HELP *</pattern>
<template>Can you ask for help in the form of a question?</template>
</category>
responds to a variety of inputs from "I need help debugging my program" to "I need help with my marriage." Putting aside the philosophical question of whether the robot really "understands" these inputs, this category elicits a coherent response from the client, who at least has the impression that the robot understands the client's intention.
Default categories show that writing AIML is both an art and a science. Writing good AIML responses is more like writing good literature, perhaps drama, than like writing computer programs.
"Recursive" categories are those that "map" inputs to other inputs, either to
simplify the language or to identify synonymous patterns.
Many synonymous inputs have the same response. This is accomplished with the recursive <srai> tag. Take for example the input "GOODBYE". This input has dozens of synonyms: "BYE", "BYE BYE", "CYA", "GOOD BYE", and so on. To map these inputs to the same output as GOODBYE, we give each synonym its own category whose template is <srai>GOODBYE</srai>.
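A sketch of such synonym categories, assuming a base category for GOODBYE exists, could read as follows. The reply text in the base category is hypothetical.

```xml
<!-- Each synonym redirects to the base form GOODBYE. -->
<category>
<pattern>BYE</pattern>
<template><srai>GOODBYE</srai></template>
</category>
<category>
<pattern>BYE BYE</pattern>
<template><srai>GOODBYE</srai></template>
</category>
<category>
<pattern>CYA</pattern>
<template><srai>GOODBYE</srai></template>
</category>
<category>
<pattern>GOOD BYE</pattern>
<template><srai>GOODBYE</srai></template>
</category>

<!-- Base category; the reply is invented for illustration. -->
<category>
<pattern>GOODBYE</pattern>
<template>Goodbye, and thanks for chatting.</template>
</category>
```

Only the base category carries an actual reply, so changing the farewell means editing one template instead of several.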
Simplification or reduction of complex input patterns is another common application for recursive categories. In English the question "What is X" could be asked many different ways: "Do you know what X is?", "Tell me about X", "Describe X", "What can you tell me about X?", and "X is what?" are just a few examples. Usually we try to store knowledge in the most concise, or common form. The <srai> function maps all these forms to the base form:
<category>
<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
The <star/> tag substitutes the value matched by "*", before the recursive call to <srai>. This category transforms "Do you know what a circle is?" to "WHAT IS A CIRCLE", and then finds the best match for the transformed input.
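The other phrasings of "What is X" mentioned earlier could be reduced in the same way. The categories below are a sketch written for this example, not a listing from the ALICE files.

```xml
<!-- Hypothetical reductions of alternative phrasings to the base form WHAT IS *. -->
<category>
<pattern>TELL ME ABOUT *</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
<category>
<pattern>DESCRIBE *</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
<category>
<pattern>WHAT CAN YOU TELL ME ABOUT *</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
<category>
<pattern>* IS WHAT</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
```

With these in place, the knowledge itself only needs to be stored once, under patterns of the form "WHAT IS ...".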
Another fairly common application of recursive categories is what might be called "parsing", except that AIML doesn't really parse natural language. A better term might be "partitioning" because these AIML categories break down an input into two (or more) parts, and then combine their responses back together.
If a sentence begins with "Hello..." it doesn't matter what comes after the first word, in the sense that the robot can respond to "Hello" and whatever is after "..." independently. "Hello my name is Carl" and "Hello how are you" are quite different, but they show how the input can be broken into two parts.
A category with the pattern "HELLO *" accomplishes the input partitioning by responding to "HELLO" with <srai>HELLO</srai> and to whatever matches "*" with <sr/>. The response is the result of the two partial responses appended together.
This assumes that there is an ATOMIC category with <pattern>HELLO</pattern>.
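The partitioning described here might be sketched as the following pair of categories. The reply in the atomic HELLO category is an assumption made for the example.

```xml
<!-- Splits the input: answers HELLO first, then whatever followed it. -->
<category>
<pattern>HELLO *</pattern>
<template><srai>HELLO</srai> <sr/></template>
</category>

<!-- Atomic category required by the partitioning; the reply is hypothetical. -->
<category>
<pattern>HELLO</pattern>
<template>Hi there!</template>
</category>
```

For the input "Hello how are you", the reply would be the HELLO response followed by whatever the robot answers to "HOW ARE YOU".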
ProgramD has a class called Substituter that performs a number of grammatical and syntactical substitutions on strings. One task involves preprocessing sentences to remove ambiguous punctuation to prepare the input for segmentation into individual sentence phrases. Another task expands all contractions and converts all letters to upper case; this process is called "normalization".
The Substituter class also performs some spelling correction.
(See also the question "What is <person/>?")
One justification for removing all punctuation from inputs is the need to make ALICE compatible with speech input systems, which of course do not detect punctuation (unless the speaker utters the actual word for the punctuation mark -- "period").
When a client enters an input, the program scans the categories to
find the best match. By comparing the input with the patterns in the
following order, the algorithm ensures that the most specific pattern
matches first. "Specific" in this case has a formal
definition, but basically it means that the program finds the
"longest" pattern matching an input.
ATOMIC with a THAT
ATOMIC
DEFAULT with a THAT
DEFAULT
For example, the input "What type of heaters do you have?"
will match the ATOMIC pattern "WHAT TYPE OF HEATERS DO YOU HAVE"
and not the REDUCTION pattern "WHAT TYPE OF *".
The ATOMIC category will always take precedence over any other type of category, other than another ATOMIC with a THAT.
If you have two identical patterns, but one has a THAT, then the THAT category will take precedence over the ATOMIC category if the THAT matches the bot's previous response.
If neither of the above is true, then a REDUCTION that matches part of the input will give its response, and finally, if none of the above matches, the catch-all or pickup category will take over.
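To illustrate the role of THAT, here is a sketch with two categories that share the pattern YES. All three categories and their replies are invented for this example.

```xml
<!-- Hypothetical category whose reply sets up the THAT context. -->
<category>
<pattern>I AM BORED</pattern>
<template>Do you like movies?</template>
</category>

<!-- Chosen when the bot's previous response was "Do you like movies?". -->
<category>
<pattern>YES</pattern>
<that>DO YOU LIKE MOVIES</that>
<template>What is your favorite movie?</template>
</category>

<!-- Chosen for YES in any other context. -->
<category>
<pattern>YES</pattern>
<template>I see.</template>
</category>
```

The <that> content is written in normalized form (upper case, no punctuation) because it is compared against the normalized previous response.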
Any categories that are contained within a TOPIC section will be searched first if the current setting of TOPIC matches a TOPIC section. This results in an extension of the search order to the following:
ATOMIC with a TOPIC and a THAT
ATOMIC with a TOPIC
DEFAULT with a TOPIC and a THAT
DEFAULT with a TOPIC
ATOMIC with a THAT
ATOMIC
DEFAULT with a THAT
DEFAULT
The TOPIC sections are always searched first if they match the current setting of TOPIC. This permits the botmaster to have identical category patterns within a TOPIC section and in the GENERAL section.
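A sketch of identical patterns inside and outside a TOPIC section follows. The topic name, patterns, and replies are assumptions made for illustration, and the topic is set with the AIML 1.0.1 form <set name="topic">; older interpreters may use a different tag.

```xml
<!-- Hypothetical category that switches the current topic to FISHING. -->
<category>
<pattern>I LIKE FISHING</pattern>
<template><think><set name="topic">FISHING</set></think>What do you like to catch?</template>
</category>

<!-- Searched first while the topic is FISHING. -->
<topic name="FISHING">
<category>
<pattern>WHAT DO YOU USE</pattern>
<template>I use a rod and a reel.</template>
</category>
</topic>

<!-- Same pattern in the general section, used when no topic section matches. -->
<category>
<pattern>WHAT DO YOU USE</pattern>
<template>Use for what?</template>
</category>
```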
The wild-card character "*" comes before "A" in alphabetical order. For example, the "WHAT *" pattern is more general than "WHAT IS *". The default pattern "*" is first in alphabetical order and the most general pattern. For convenience AIML also provides a variation on "*" denoted "_", which comes after "Z" in alphabetical order.
No, categories do not have to be written in alphabetical order. The order is maintained internally when the categories load, so you can write them in any order.
If your session with program B included a "Classify" routine, then the AIML script is stored in order of category activation rank. In other words, program B stores the most frequently accessed category (usually '*') first, the second most frequently next, and so on. If a number of categories have the same activation count, program B saves them in alphabetical order by pattern. Hence, if the session did not include a "classify" routine, the program stores all the categories in alphabetical order by pattern (because they all have an activation count of zero).
One reason to store the categories in order by activation is to make the Applet interface more natural. Because the Applet interface starts simultaneously with a thread to load the robot source file, the Applet client can talk with the robot before all the categories are fully loaded. Given that the interlocutor is more likely to say something that activates a more frequently activated category, it makes sense to transmit these categories first. Storing the *.aiml files in order of category activation achieves the desired effect. The Applet loads the most frequent categories first, and continues loading in the background while the conversation begins.
In general there are a lot of categories whose job is "symbolic reduction". For example:
<category>
<pattern>ARE YOU VERY *</pattern>
<template><srai>ARE YOU <star/></srai></template>
</category>
This category [in std-brain.aiml] will reduce "Are you very very smart" to "Are you smart".
AIML is extensible. You can create an infinite number of new tags for foreign language pronouns, predicates, or application-specific properties. "Predicate tags" mean tags that have a client-specific "set" and "get" method. Pronouns like "it" have predicate tags like <set name="it"></set>. AIML has a number of these built-in tags for common English pronouns.
The <set name="xxxx"> and <get name="xxxx"> tags make it possible to
support an endless variety of languages and possibilities.
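As a small sketch of a custom predicate, the two categories below store and recall a value. The predicate name "favoritecolor", the patterns, and the replies are all hypothetical.

```xml
<!-- Stores the matched text in a client-specific predicate. -->
<category>
<pattern>MY FAVORITE COLOR IS *</pattern>
<template>I will remember that you like <set name="favoritecolor"><star/></set>.</template>
</category>

<!-- Reads the predicate back for the same client. -->
<category>
<pattern>WHAT IS MY FAVORITE COLOR</pattern>
<template>You told me it was <get name="favoritecolor"/>.</template>
</category>
```

This assumes that <set> also echoes the stored value into the reply, as <set_it> does in the circle example; an interpreter configured differently may need the value repeated outside the tag.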
Understanding recursion is important to understanding AIML. "Recursion" means applying the same solution over and over again, to smaller and smaller problems, until you reduce the problem to its simplest form. AIML uses the tags <sr/> and <srai> to implement recursion. The botmaster uses these tags to tell the robot how to respond to a complex sentence by breaking it down into the responses to simpler ones.
Recursion can apply many times to a single input. Given the normalized input:
ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS RIGHT NOW
an AIML category with the pattern "_ RIGHT NOW" matches first,
reducing the input to:
ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS
Another pattern ("<bot name="name"/> *") reduces it to:
CAN YOU PLEASE TELL ME WHAT LINUX IS
Further reductions then yield, in turn:
PLEASE TELL ME WHAT LINUX IS
TELL ME WHAT LINUX IS
and finally:
WHAT IS LINUX
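One set of categories that would produce this chain of reductions is sketched below. Only the first two patterns are named in the text; the remaining three are assumptions chosen to reproduce the intermediate forms, and the real ALICE categories may differ.

```xml
<!-- Strips a trailing RIGHT NOW. -->
<category>
<pattern>_ RIGHT NOW</pattern>
<template><srai><star/></srai></template>
</category>

<!-- Strips the robot's own name from the front of the input. -->
<category>
<pattern><bot name="name"/> *</pattern>
<template><srai><star/></srai></template>
</category>

<!-- Hypothetical: strips the leading CAN YOU. -->
<category>
<pattern>CAN YOU *</pattern>
<template><srai><star/></srai></template>
</category>

<!-- Hypothetical: strips the leading PLEASE. -->
<category>
<pattern>PLEASE *</pattern>
<template><srai><star/></srai></template>
</category>

<!-- Hypothetical: rewrites the request into the base form WHAT IS *. -->
<category>
<pattern>TELL ME WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
```

Each category removes or rewrites one piece of the input and hands the rest back to the matcher, so the final reply comes from whichever category matches "WHAT IS LINUX".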
If your reply contains the markup
then the robot will insert the (virtual) client IP into the command line argument for "yourcommand". Then it is up to "yourcommand" to enforce access privileges.
If you are fortunate enough to be running lynx under Linux, the following markup is a simple way to "inline" the results of an HTTP request into the chat robot reply. Try asking ALICE: "What chatterbots do you know?" and she will reply with a page of links generated by the Google search engine.
Here is the information I found:
lynx -dump -source -image_links http://www.google.com/search?q=<personf/>
<pattern>WHAT IS YOUR WEBSITE</pattern>
It is at "http://www.mywebsite.org"
var winURL = "http://www.mywebsite.org";
A couple of things to note about this technique:
For the above reason, it is important to have some sort of explanatory statement before the scripting in case the scripting isn't supported. Besides, you want some response in your ALICE window, even if another window DOES come up.
If this is viewed in a browser that doesn't understand the
"// Go to <a href="http://www.geocities.com/krisdrent">The ALICE
semi-colon end-of-line delimiters!
AIML broadly breaks down into two parts: "Pattern-Side AIML expressions" (PSAE) that can appear in the <pattern>, <that>, and <topic>, and "Template-Side AIML expressions" (TSAE) that appear inside the <template>. The pattern-side AIML expressions are, at present, "_", "*", and <bot name="name"/>.
TSAEs are composed of ordinary text, optionally marked up with all the other tags. Generally speaking, it doesn't make sense to use PSAEs in the
template or TSAEs in the pattern, topic, or <that>...</that>. The sole exception at this point is <bot name="name"/>.