This document is a "work in progress".
Last update: 9JAN2002
Original Document by: Thomas Ringate
Copyright © 2001
Contributing Authors: Dr. Richard S. Wallace; Anthony Taylor; Jon Baer; Dennis Daniels
CONTENTS
Do the categories need to be in alphabetical order by pattern?
How can I restrict remote clients from running programs on my computer?
Given only the <pattern> and <template> tags, there are three general types of categories.
atomic
default
recursive
Strictly speaking, the three types overlap, because "atomic" and "default" refer to the <pattern> and "recursive" refers to a property of the <template>.
"Atomic" categories are
those with atomic patterns, i.e. the pattern contains no wild card
"*" or "_" symbol. Atomic categories are the
easiest, simplest categories to add in AIML.
<category>
<pattern>WHAT IS A CIRCLE</pattern>
<template><set_it>A circle</set_it> is the set
of points equidistant from a common point called the center.
</template>
</category>
The above category does the following:
- Matches the client input "What is a circle"
- Sets the "IT" variable to the value "A circle"
- Sends the client the response: "A circle is the set of points equidistant from a common point called the center."
The name "default category"
derives from the fact that its pattern has a wildcard "*"
or "_". The ultimate default category is the one with
<pattern>*</pattern>, which matches any input. In the
ALICE distribution the ultimate default category resides in a file
called "std-pickup.aiml". These default responses are
often called "pickup lines" because they generally consist
of leading questions designed to focus the client on known topics.
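As a rough illustration, an ultimate default category might look like the sketch below. The three pickup lines are invented for this example and are not quoted from the ALICE distribution. The <random> and <li> tags select one of the replies each time the category is activated.

```xml
<!-- Hypothetical pickup lines; the wording is an assumption for illustration. -->
<category>
<pattern>*</pattern>
<template>
<random>
<li>What do you do in your spare time?</li>
<li>Tell me about your favorite movie.</li>
<li>Are we still talking about the same subject?</li>
</random>
</template>
</category>
```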
The more common default categories have patterns combining a
few words and a wild card. For example the category:
<category>
<pattern>I NEED HELP *</pattern>
<template>Can you ask for help in the form of a question?</template>
</category>
responds to a variety of inputs, from "I need help debugging my program" to "I need help with my marriage." Putting aside the philosophical question of whether the robot really "understands" these inputs, this category elicits a coherent response from the client, who at least has the impression that the robot understands the client's intention.
Default categories show that writing
AIML is both an art and a science. Writing good AIML responses is
more like writing good literature, perhaps drama, than like writing
computer programs.
"Recursive" categories
are those that "map" inputs to other inputs, either to
simplify the language or to identify synonymous patterns.
Many synonymous inputs have the same response. This is accomplished with the recursive <srai> tag. Take for example the input "GOODBYE". This input has dozens of synonyms: "BYE", "BYE BYE", "CYA", "GOOD BYE", and so on. To map these inputs to the same output for GOODBYE we use categories like:
<category>
<pattern>BYE BYE</pattern>
<template><srai>GOODBYE</srai></template>
</category>
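A sketch of how the pieces might fit together, assuming a base GOODBYE category whose reply text is made up for this example. Each synonym gets its own short category that points at the base form.

```xml
<!-- Base category; the reply wording is hypothetical. -->
<category>
<pattern>GOODBYE</pattern>
<template>Goodbye. It was nice talking to you.</template>
</category>

<!-- Synonyms map to the base category through <srai>. -->
<category>
<pattern>CYA</pattern>
<template><srai>GOODBYE</srai></template>
</category>
<category>
<pattern>GOOD BYE</pattern>
<template><srai>GOODBYE</srai></template>
</category>
```

With this arrangement, changing the farewell means editing only the GOODBYE category.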
Simplification or reduction of
complex input patterns is another common application for recursive
categories. In English the question "What is X" could be
asked many different ways: "Do you know what X is?", "Tell
me about X", "Describe X", "What can you tell me
about X?", and "X is what?" are just a few examples.
Usually we try to store knowledge in the most concise, or common
form. The <srai> function maps all these forms to the base
form:
<category>
<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
The <star/> tag substitutes the value matched
by "*", before the recursive call to <srai>. This
category transforms "Do you know what a circle is?" to
"WHAT IS A CIRCLE", and then finds the best match for the
transformed input.
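The other phrasings of "What is X" can be reduced in the same way. The categories below are a sketch written for this explanation, not quoted from the ALICE files. Each one maps an alternate form onto the base form WHAT IS *.

```xml
<!-- Illustrative reductions to the base form "WHAT IS *". -->
<category>
<pattern>TELL ME ABOUT *</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
<category>
<pattern>DESCRIBE *</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
<category>
<pattern>* IS WHAT</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
```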
Another fairly common application of
recursive categories is what might be called "parsing",
except that AIML doesn't really parse natural language. A better
term might be "partitioning" because these AIML categories
break down an input into two (or more) parts, and then combine their
responses back together.
If a sentence begins with
"Hello..." it doesn't matter what comes after the first
word, in the sense that the robot can respond to "Hello"
and whatever is after "..." independently. "Hello my
name is Carl" and "Hello how are you" are quite
different, but they show how the input can be broken into two parts.
The category:
<category>
<pattern>HELLO *</pattern>
<template><srai>HELLO</srai> <sr/>
</template>
</category>
accomplishes
the input partitioning by responding to "HELLO" with
<srai>HELLO</srai> and to whatever matches "*"
with <sr/>. The response is the result of the two partial
responses appended together.
The above example assumes that there is an ATOMIC category with <pattern>HELLO</pattern>.
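A minimal sketch of the supporting categories that the partitioning example relies on. The reply texts and the MY NAME IS * category are assumptions made up for illustration.

```xml
<!-- The atomic category that <srai>HELLO</srai> resolves to. Reply text is hypothetical. -->
<category>
<pattern>HELLO</pattern>
<template>Hi there!</template>
</category>

<!-- Hypothetical category that answers the remainder of the input via <sr/>. -->
<category>
<pattern>MY NAME IS *</pattern>
<template>Nice to meet you, <star/>.</template>
</category>
```

Given these two categories together with the HELLO * category, the input "Hello my name is Carl" would be answered with the HELLO reply followed by the MY NAME IS reply.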
ProgramD has a class called Substituter that performs a number of grammatical and syntactical substitutions on strings. One task involves preprocessing sentences to remove ambiguous punctuation to prepare the input for segmentation into individual sentence phrases. Another task expands all contractions and converts all letters to upper case; this process is called "normalization".
The Substituter class also performs some spelling correction.
(See also the question "What is <person/>?")
One justification for removing all punctuation from inputs is the need to make ALICE compatible with speech input systems, which of course do not detect punctuation (unless the speaker utters the actual word for the punctuation mark -- "period").
When a client enters an input, the program scans the categories to
find the best match. By comparing the input with the patterns in the
following order, the algorithm ensures that the most specific pattern
matches first. "Specific" in this case has a formal
definition, but basically it means that the program finds the
"longest" pattern matching an input.
Search order:
ATOMIC with a THAT
ATOMIC
DEFAULT with a THAT
DEFAULT
Example: the input "What type of heaters do you have?" will match the ATOMIC pattern "WHAT TYPE OF HEATERS DO YOU HAVE" and not the REDUCTION pattern "WHAT TYPE OF *".
The ATOMIC category will always take precedence over any other type of category, other than another ATOMIC with a THAT.
If you have two identical patterns, but one has a THAT, then the THAT category will take precedence over the ATOMIC category, if the THAT matches the bot's previous response.
If neither of the above is true, then a REDUCTION that matches part of the input will give its response, and finally, if none of the above matches, then the catch-all or pickup will take over.
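To make the role of THAT concrete, here is a sketch with two categories that share the pattern YES. The question about movies and both replies are assumptions invented for this example.

```xml
<!-- Matches "yes" only when the bot's previous reply was "Do you like movies?" -->
<category>
<pattern>YES</pattern>
<that>DO YOU LIKE MOVIES</that>
<template>What is your favorite movie?</template>
</category>

<!-- Fallback for "yes" after any other reply. -->
<category>
<pattern>YES</pattern>
<template>I am glad we agree.</template>
</category>
```

The first category is tried before the second, so the plain YES category only answers when the THAT does not match.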
Any categories that are contained within a TOPIC
section will be searched first if the current setting of TOPIC
matches a TOPIC section. This results in an
extension of the search order to the following:
ATOMIC with a TOPIC and a THAT
ATOMIC with a TOPIC
DEFAULT with a TOPIC and a THAT
DEFAULT with a TOPIC
ATOMIC with a THAT
ATOMIC
DEFAULT with a THAT
DEFAULT
The
TOPIC sections are always searched first if they match the current
setting of TOPIC. This permits the botmaster to have identical
category patterns within a TOPIC section and in the GENERAL section.
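The sketch below shows the same pattern inside a TOPIC section and in the general section. The topic name CARS, the patterns, and the replies are all hypothetical, and setting the topic with <set name="topic"> is an assumption about the interpreter version in use.

```xml
<!-- Used only while the current topic is CARS. -->
<topic name="CARS">
<category>
<pattern>WHAT IS YOUR FAVORITE</pattern>
<template>I like electric cars.</template>
</category>
</topic>

<!-- General version of the same pattern. -->
<category>
<pattern>WHAT IS YOUR FAVORITE</pattern>
<template>My favorite what?</template>
</category>

<!-- One way the topic might get set (assumed syntax). -->
<category>
<pattern>I WANT TO TALK ABOUT CARS</pattern>
<template><think><set name="topic">CARS</set></think>Sure, what about cars?</template>
</category>
```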
The wild-card character "*" comes before "A" in alphabetical order. For example, the "WHAT *" pattern is more general than "WHAT IS *". The default pattern "*" is first in alphabetical order and the most general pattern. For convenience AIML also provides a variation on "*" denoted "_", which comes after "Z" in alphabetical order.
No, the categories do not need to be in alphabetical order by pattern. The order is maintained internally when the categories load, so you can write them in any order.
If your session with program B included a "Classify" routine, then the AIML script is stored in order of category activation rank. In other words, program B stores the most frequently accessed category (usually '*') first, the second most frequently next, and so on. If a number of categories have the same activation count, program B saves them in alphabetical order by pattern. Hence, if the session did not include a "classify" routine, the program stores all the categories in alphabetical order by pattern (because they all have an activation count of zero).
One reason to store the categories in order by activation is to make the Applet interface more natural. Because the Applet interface starts simultaneously with a thread to load the robot source file, the Applet client can talk with the robot before all the categories are fully loaded. Given that the interlocutor is more likely to say something that activates a more frequently activated category, it makes sense to transmit these categories first. Storing the *.aiml files in order of category activation achieves the desired effect. The Applet loads the most frequent categories first, and continues loading in the background while the conversation begins.
In general there are a lot of categories whose job is "symbolic reduction". For example:
<category>
<pattern>ARE YOU VERY *</pattern>
<template><srai>ARE YOU <star/></srai></template>
</category>
This category [in std-brain.aiml] will reduce "Are you very very smart" to "Are you smart".
AIML is extensible. You can create an infinite number of new tags for foreign language pronouns, predicates, or application-specific properties. "Predicate tags" mean tags that have a client-specific "set" and "get" method. Pronouns like "it" have predicate tags like <set name="it"></set>. AIML has a number of these built-in tags for common English pronouns.
Using the <set name="xxxx"> and <get name="xxxx"> tags, an endless variety of languages and possibilities can be supported.
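As a small sketch of a custom predicate, the two categories below store and recall a client-specific value. The predicate name "favoritecolor" and the patterns are hypothetical, chosen only for this example.

```xml
<!-- "favoritecolor" is a hypothetical predicate name chosen for this sketch. -->
<category>
<pattern>MY FAVORITE COLOR IS *</pattern>
<template><think><set name="favoritecolor"><star/></set></think>I will remember that you like <star/>.</template>
</category>
<category>
<pattern>WHAT IS MY FAVORITE COLOR</pattern>
<template>You told me it is <get name="favoritecolor"/>.</template>
</category>
```

The <think> tag keeps the output of <set> out of the reply, so the example does not depend on what the interpreter returns from <set>.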
Understanding recursion is important to understanding AIML. "Recursion" means applying the same solution over and over again, to smaller and smaller problems, until you reduce the problem to its simplest form. AIML uses the tags <sr/> and <srai> to implement recursion. The botmaster uses these tags to tell the robot how to respond to a complex sentence by breaking it down into the responses to simpler ones.
Recursion can apply many times to a single input. Given the normalized input:
ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS RIGHT NOW
an AIML category with the pattern "_ RIGHT NOW" matches first, reducing the input to:
ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS
Another pattern ("<bot name="name"/> *") reduces it to:
CAN YOU PLEASE TELL ME WHAT LINUX IS
And then:
PLEASE TELL ME WHAT LINUX IS
reduces to:
TELL ME WHAT LINUX IS
and finally to:
WHAT IS LINUX
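One set of categories that could produce this chain of reductions is sketched below. Apart from "_ RIGHT NOW" and the bot-name pattern, which the text mentions, the patterns are assumptions and may differ from those in the ALICE files.

```xml
<!-- Strip a trailing "RIGHT NOW". -->
<category>
<pattern>_ RIGHT NOW</pattern>
<template><srai><star/></srai></template>
</category>
<!-- Strip the bot's name from the front of the input. -->
<category>
<pattern><bot name="name"/> *</pattern>
<template><srai><star/></srai></template>
</category>
<!-- Hypothetical: drop "CAN YOU". -->
<category>
<pattern>CAN YOU PLEASE *</pattern>
<template><srai>PLEASE <star/></srai></template>
</category>
<!-- Hypothetical: drop "PLEASE". -->
<category>
<pattern>PLEASE *</pattern>
<template><srai><star/></srai></template>
</category>
<!-- Hypothetical: rewrite to the base form. -->
<category>
<pattern>TELL ME WHAT * IS</pattern>
<template><srai>WHAT IS <star/></srai></template>
</category>
```

Each <srai> restarts the match with a shorter input, so the final response comes from whichever category matches WHAT IS LINUX.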
If your reply contains the markup
<system>yourcommand <id/></system>
then the robot will insert the (virtual) client IP into the command line argument for "yourcommand". It is then up to "yourcommand" to enforce access privileges.
If you are fortunate enough to be running lynx under Linux, the following markup is a simple way to "inline" the results of an HTTP request into the chat robot reply. Try asking ALICE: "What chatterbots do you know?" and she will reply with a page of links generated by the Google search engine.
<category>
<pattern>WHAT *</pattern>
<template>
Here is the information I found:
<system>
lynx -dump -source -image_links http://www.google.com/search?q=<personf/>
</system>
</template>
</category>
Yes. You can include any HTML including <javascript> tags. Suppose you want to "chat AND browse," in other words, have the robot open up a new browser window when she provides a URL link. Here's a category that kicks out a piece of HTML/scripting that opens a new window and loads a given URL. This is handy for search engines or showing off one's web page. This code was contributed by Stefan Zakarias, with additions by Dennis Daniels.
<category>
<pattern>WHAT IS YOUR WEBSITE</pattern>
<template>
It is at "http://www.mywebsite.org"
<script language="JavaScript">
function Popup(){
var winURL = "http://www.mywebsite.org";
var winWidth=800;
var winHeight=600;
var winScrollbars="yes";
var winToolbar="yes";
var winSizeable="yes";
var winLocation="yes";
var winDirectories="yes";
var winStatus="yes";
var winMenubar="yes";
var winCopyHistory="yes";
newWin=window.open(winURL,"",
"copyhistory="+winCopyHistory+
",menubar="+winMenubar+
",status="+winStatus+
",directories="+winDirectories+
",location="+winLocation+
",resizable="+winSizeable+
",toolbar="+winToolbar+
",scrollbars="+winScrollbars+
",height="+winHeight+
",width="+winWidth);
}
</script>
<a href="javascript:Popup()">Go to my website!</a>
</template>
</category>
A couple of things to note about this technique:
This will only work when ALICE is being talked to from a browser that runs JavaScript, i.e. it won't work in the applet. We have tested it in Netscape and MS Internet Explorer, and it works well in both.
For the above reason, it is important to have some sort of explanatory statement before the scripting in case the scripting isn't supported. Besides, you want some response in your ALICE window, even if another window DOES come up.
If this is viewed in a browser that doesn't understand the <javascript> tag, notice that this line will show up:
"// Go to <a href="http://www.geocities.com/krisdrent">The ALICE Connection</a>"
This is good, because it gives a back-up for the "non-scripted" (the Lynx users, I guess). And remember that you have to keep the "//" in front of any non-JavaScript lines within the <javascript> tag.
When using JavaScript in AIML, never forget the semi-colon end-of-line delimiters! (You'll never get your JavaScript working reliably otherwise ;-)
PSAE
AIML broadly breaks down into two parts: "Pattern-Side AIML expressions" (PSAE) that can appear in the <pattern>, <that>, and <topic>, and "Template-Side AIML expressions" (TSAE) that appear inside the <template>. Pattern-side AIML expressions (PSAE) consist of:
NORMALIZED TEXT
_, *, and <bot name="name"/> (at present)
TSAE
TSAE expressions are composed of ordinary text, optionally marked up with all the other tags. Generally speaking, it doesn't make sense to use PSAEs in the template or TSAEs in the pattern, <topic>, or <that>...</that>. The sole exception at this point is <bot name="name"/>.
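As an illustration of that exception, the sketch below uses <bot name="name"/> on both sides of one category. The pattern and the reply wording are hypothetical.

```xml
<!-- <bot name="name"/> appears on the pattern side and on the template side. -->
<category>
<pattern>ARE YOU <bot name="name"/></pattern>
<template>Yes, my name is <bot name="name"/>.</template>
</category>
```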