You have reached the old ALICE documentation page.
Click here for the latest ALICE and
AIML Documentation. Changed 6/05/99.
Famous Original AIML and ALICE Documentation
Revised January 5, 1998
Dr. Richard S. Wallace
rsw@ime.net
Copyright © 1995, 1996, 1997, 1998 Dr. Richard S. Wallace. All rights reserved.
Download and Installation:
Heartfelt apologies that we offer no
Microsoft-compatible version at this time. Follow these steps to activate the
AIML interpreter and AITP and HTTP servers on Linux and Unix machines:
- Download and install SETL.
- Download the ALICE.tar file and expand it with tar xvf.
- Run setl new.setl to build lhs/ and rhs/
directories.
- Run setl index.setl to build index/ directory.
- (Optional: run setl alice.setl or judge.setl to test
AIML program)
- Edit go.setl to choose host machine(s) and port numbers.
- Run go on the AIML server host machine.
- Run go on the http server host machine.
Introduction:
What controls ALICE is not a single program but
a collection of autonomous clients and servers communicating via TCP/IP. This
document reports on the Artificial Intelligence Markup Language (AIML) and its
interpreter, along with the Artificial Intelligence Transfer Protocol (AITP) and
a client and server implementing it. The client in our case is itself a server,
specifically a limited HTTP server. In general there is no requirement that
these programs run on the same machine, and in particular we usually run the
HTTP server on one machine and the AITP server and AIML interpreter together on
another. (In the original example the HTTP host machine is alice.eecs.lehigh.edu
and the HTTP port number is 1991. The AITP server machine is
emma.eecs.lehigh.edu and its well-known port number is 1978.) Both machines
however share a common network file system, so the contents of our directory
appear on both hosts.
While the idea of distributed control is not new in AI or robotics, the
actual implementation of such a scheme is no longer very difficult, thanks to
the evolution of the Internet and some elegant work by David Bacon adapting an
old language, SETL, to today's web-dominated environment.
Both the AITP server and the AIML interpreter utilize co-processes Bacon calls pumps[4],
essentially command line data processing programs running under the
control of the parent process. Three examples of pumps here are
normalize, gawk and alice (an instance of the AIML
interpreter), each of which may run as a command line program on its own or as a
pump.
The figure shows a subset of the processes involved in a typical
transaction with ALICE. The client (presumably, a human) utilizes a browser to
connect to the server installed on the host Alice (port 1991) and transmits a
query. The reply contains HTML markup for the video/telerobotic server and for
the remote text-to-speech synthesis server. The browser interprets this HTML and
initiates connections with the audio and video hosts.
The process on Alice:1991 is a forking server, a protective shield against
misbehaved clients. Once the forked child receives the client's input query it
initiates an atomic transaction with the server on Emma (port 1978). This
transaction consists of two exchanges: first, the faux HTTP server reports the
IP address of its client and second, it retransmits the client query in
its original HTTP format (specifically, a GET method).
The program normalize runs as a command line pump. It processes one
line of text input and removes all non-meaningful (to AIML) punctuation,
converts to upper case, and replaces a number of contractions and acronyms with
their expanded forms. The result is one line of normalized text, suitable for
matching with AIML lhs patterns.
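The normalization step described above can be sketched in Python as follows. This is only an illustration, not the contents of normalize.setl: the CONTRACTIONS table is a hypothetical sample (the real data lives in contractions.setl), and the rule that only letters, digits and spaces survive is an assumption.

```python
import re

# Hypothetical sample of expansions; the real table lives in contractions.setl.
CONTRACTIONS = {
    "I'M": "I AM",
    "DON'T": "DO NOT",
    "WHAT'S": "WHAT IS",
}

def normalize(line):
    """Return one line of upper-case text with contractions expanded."""
    # Assumption: everything except letters, digits, apostrophes and blanks
    # is punctuation that carries no meaning for pattern matching.
    text = re.sub(r"[^A-Z0-9' ]", " ", line.upper())
    # Expand known contractions word by word.
    words = [CONTRACTIONS.get(word, word) for word in text.split()]
    # Drop any apostrophes left over and separate words by single spaces.
    return " ".join(words).replace("'", "")

print(normalize("What's your name?"))  # WHAT IS YOUR NAME
```

The result is a single line that can be compared directly with upper-case lhs pattern strings.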
Another line pump is gawk. This is a gigantic pattern-matching case
statement implemented as an awk script.
Files and directories:
The second figure illustrates the directory
structure of ALICE. Initially the author writes AIML files into the knowledge
directory. The program new.setl converts the AIML information into
lhs pattern files and rhs output templates. The key idea here
is that each AIML category has a unique name (either assigned by the author or
assigned automatically by new.setl). That name becomes the identifier
of the lhs pattern file and the rhs program filename. The
awk script compares an input pattern with all the pattern strings, and
prints out the filename of the rhs template.
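As a rough illustration of this classification step, the sketch below walks a table of (pattern, category name) pairs and returns the name of the first pattern that matches. It is written in Python rather than awk, the table is a hypothetical two-entry sample, and both the first-match rule and the NO-MATCH default are assumptions rather than the behavior of the generated awkfile.

```python
import re

# Hypothetical compiled table of (lhs pattern, category name) pairs.
PATTERNS = [
    (r"^CALL ME", "CALL-ME"),
    (r"A STORY", "A-STORY"),
]

def classify(normalized_input, default="NO-MATCH"):
    """Return the category name of the first pattern that matches."""
    for pattern, category in PATTERNS:
        if re.search(pattern, normalized_input):
            return category
    # Assumption: an unmatched input falls through to a default category.
    return default

print(classify("TELL ME A STORY"))        # A-STORY
print(classify("READ ME ANOTHER STORY"))  # NO-MATCH
```

The returned name would then select the file of the same name in the rhs directory.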
Directories:
- knowledge/ contains AIML knowledge files, relations, and other
state information.
- lhs/ and rhs/ contain "compiled" AIML information.
- index/ contains the associative memory database.
- dialogues/ contains log files of previous conversations.
- tmp/ contains temporary scratch files.
- bak/ is for editor backups.
Knowledge files:
We use the term "knowledge files" loosely to denote two
classes of data files. First, the knowledge includes the AIML source programs,
listed in the file all_knowledge. AIML files contain paragraphs of text
optionally marked up with AIML. The remainder of the knowledge files are data
files containing a variety of information including client names, IP addresses,
geographic locations, topic, pronouns, acronyms, and proper nouns.
- all_knowledge : The list of AIML files in the knowledge/
directory.
- PROPER-NOUNS : A list of nouns to capitalize.
- ACRONYMS : A list of acronyms to print in upper case.
- knowledge/KNOW.AIML : About 1200 AIML categories.
- knowledge/GOSSIP.AIML : An additional 1200 or so categories
derived from gossip.
- knowledge/PICKUP-LINES : Opening lines to start a dialog.
- knowledge/SMILEYS : A file of humorous smiley-faces.
The
remainder of the knowledge files are created automatically by alice.setl.
Initialization programs:
The programs new.setl and
index.setl may together be viewed as a "compiler" or preprocessor that
transforms the AIML knowledge files into data structures, stored in the file
system, later consulted by the interpreter.
- new.setl : Constructs the directories lhs/ and
rhs/ from the knowledge files. The lhs directory contains a
file for each left-hand side pattern. It also contains the awk program
awkfile which is the basis of AIML classification. The rhs
directory contains a file for each AIML response template.
- index.setl : Builds the directory index/ from the
words in the lhs and rhs files. The index contains one file
for each word in the vocabulary. Each word file contains the set of category names
(i.e. names of rhs templates) that contain the word.
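The index built by index.setl is essentially an inverted index from words to category names. A minimal Python sketch of that idea, using an in-memory dictionary instead of one file per word and a hypothetical two-category knowledge base, might look like this:

```python
from collections import defaultdict

def build_index(categories):
    """Map each word to the set of category names whose text contains it."""
    index = defaultdict(set)
    for name, text in categories.items():
        for word in text.upper().split():
            index[word].add(name)
    return index

# Hypothetical miniature knowledge base: category name -> lhs and rhs text.
categories = {
    "A-STORY": "A STORY once there was a man",
    "CALL-ME": "CALL ME pleased to meet you",
}
index = build_index(categories)
print(sorted(index["STORY"]))  # ['A-STORY']
```

In the real system each key would correspond to a file in index/ rather than a dictionary entry.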
AIML Interpreter programs:
The interpreter alice.setl utilizes
a number of subprograms. Some of them, such as search.setl, are
activated via ordinary filters and others work as pumps.
- alice.setl : This is the natural language interpreter. After
running the initialization functions new.setl and index.setl
the interpreter may be invoked by setl alice.setl. This program reads
one line of data and writes a single line response, then waits for the next
line or a null character that terminates the program.
- vars.setl : These are global variables used in alice.setl.
- rhsfuns.setl : This file contains the actual code for the AIML
functional expressions. Each AIML function (set_it, gossip, getname
etc.) has an associated SETL procedure with the same name.
- procmap.setl : Procmap is a data file used in conjunction with
oldalice.setl.
- oldalice.setl : This poorly named file contains much of the AIML
interpreter.
- nickname.setl : This program tries to figure out something
interesting to say based on the seeker's domain name.
- contractions.setl : A data file used by normalize to expand
contractions.
- http_syntax.setl : A data file used by normalize to expand http
syntax.
- person.setl : "Person" refers to transformations among first and
second person. This program is the utility that accomplishes these
translations.
- person3.setl : "Person" refers to transformations among first and
third person. This program is the utility that accomplishes these
translations.
- search.setl : The associative memory search program takes a
sentence as its input and returns a rhs template based on an
information-theoretic matching criterion.
AITP Server Programs:
The AITP server uses the pump
normalize.setl and also calls alice.setl as a pump. The name
"AITP" is perhaps too fancy for what it really denotes, because the "protocol" is
so simple. Like HTTP GET methods, the basic transaction in AITP is to accept a
line of text and to return a response, also in our case a line of text. What
distinguishes this protocol from others is that there are relatively few
restrictions on the content of the input strings (because they are assumed to
contain natural language) and the requirement that the client, i.e. our HTTP
server, report the IP address of its client.
The AITP server then makes two transactions with the alice pump. First, it
issues the string
MY IP IS SO.AND.SO
and then reads back an acknowledging response.
Second, it passes along the normalized natural language query and reads back a
response. The IP address ultimately becomes a key into all the knowledge files
that alice.setl uses to keep track of information about each client.
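The two exchanges can be sketched as below. The StubAlicePump class is a stand-in invented for illustration: the real pump is a running alice.setl process, and the "OK" acknowledgement and the reply text are assumptions, since the document does not show what the pump actually answers.

```python
class StubAlicePump:
    """Stand-in for the alice pump; the replies are invented for illustration."""

    def __init__(self):
        self.client_ip = None

    def exchange(self, line):
        """Accept one line of text and return one line of text."""
        prefix = "MY IP IS "
        if line.startswith(prefix):
            self.client_ip = line[len(prefix):]
            return "OK"  # hypothetical acknowledgement
        return f"REPLY TO {line} FOR {self.client_ip}"

def aitp_transaction(pump, client_ip, normalized_query):
    """Run the two exchanges: report the client IP, then send the query."""
    ack = pump.exchange(f"MY IP IS {client_ip}")  # first exchange
    reply = pump.exchange(normalized_query)       # second exchange
    return ack, reply

pump = StubAlicePump()
print(aitp_transaction(pump, "192.0.2.7", "TELL ME A STORY"))
```

Keeping both exchanges inside one function mirrors the requirement that the pair be treated as a single atomic transaction.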
- aitp_server.setl : The AITP server is a natural language server
wrapped around alice.setl. This program also inserts HTML associated with Bell
Labs text-to-speech synthesis [2].
- normalize.setl : Normalize (should be called "canonize")
transforms input sentences into a canonical style that AIML understands.
HTTP Server Program:
This program is not a true HTTP server but a
program that accepts some types of HTTP requests (specifically, GET methods
modified to accommodate natural language strings) and discards others.
The file TRAILER.HTML contains optional HTML markup appended to the bottom of
the server's output. In our case the TRAILER.HTML file contains an ad for
Eastport Internet Associates.
- http_server.setl : This server is designed to fork a new subprocess
for each client request. Each subprocess, if it makes a transaction with the
aitp server at all, is guaranteed to make an atomic transaction. This prevents
misbehaved clients from blocking the AIML server.
Utility programs:
The parent directory also contains a small number of
utility programs for initialization, termination and monitoring purposes.
- go : The system call to start the servers.
- stop : Stops the servers.
- killall : Kills all processes owned by this user.
- check.setl : A real-time monitoring program to keep track of
what's going on in the AIML server/interpreter.
- check : A utility to view the process tree and network port
status.
- go.setl : This program is called by "go".
- time.setl : This program is called by "check".
- review.setl : Review is an interesting utility that allows you to
view the transcripts of recent dialogues.
- ro.setl : The "reductionist onslaught" examines previous queries
saved in the "human.log" log file and classifies them with the most recent
version of alice.setl. This helps determine which categories are being used
and how often.
It ought to be remarked here that the last of these
utilities, what we call the Reductionist Onslaught, is perhaps the most
valuable tool in this entire package; however those readers interested in a more
philosophical discussion should read The Lying Game[1]
or Notes on
the Loebner Contest Grand Prize Rules.
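The bookkeeping behind the Reductionist Onslaught amounts to reclassifying logged queries and counting how often each category fires. A hedged Python sketch of that counting step follows; toy_classify is a hypothetical stand-in for the real classifier, and the query list stands in for the contents of human.log.

```python
from collections import Counter

def category_usage(logged_queries, classify):
    """Count how many logged queries fall into each category."""
    return Counter(classify(query) for query in logged_queries)

def toy_classify(query):
    """Hypothetical one-pattern classifier used only for this example."""
    return "A-STORY" if "A STORY" in query else "NO-MATCH"

# Stand-in for normalized queries read from human.log.
queries = ["TELL ME A STORY", "HELLO", "A STORY PLEASE"]
usage = category_usage(queries, toy_classify)
print(usage.most_common())  # [('A-STORY', 2), ('NO-MATCH', 1)]
```

Categories that never appear in the counts are candidates for revision or removal.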
Loebner Contest Interface
The program judge.setl provides a
user command-line interaction with ALICE. The input-output and command format is
compliant with the standard for the 1998 Loebner Contest [3].
This program works much like the AITP server, in the sense that it opens both
alice.setl and normalize.setl as pumps. But instead of
accepting text input from a client, judge.setl reads lines from the
standard input and prints to the standard output. And unlike the single-line
AITP transaction protocol, this text interface accepts multi-line queries.
(Terminate queries with double newlines '\n\n').
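A reader for this multi-line format could look like the Python sketch below. It treats a blank line as the end of a query. Joining the lines of one query with single spaces is an assumption, as the document does not say how judge.setl combines them.

```python
import io

def read_queries(stream):
    """Yield each query as one string; a blank line ends a query."""
    lines = []
    for raw in stream:
        line = raw.rstrip("\n")
        if line == "":
            if lines:
                yield " ".join(lines)
                lines = []
        else:
            lines.append(line)
    if lines:  # input ended without a final blank line
        yield " ".join(lines)

sample = io.StringIO("HELLO\nMY NAME IS JOHN\n\nTELL ME A STORY\n\n")
print(list(read_queries(sample)))
# ['HELLO MY NAME IS JOHN', 'TELL ME A STORY']
```

Passing sys.stdin instead of the StringIO object would read queries from the standard input.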
Note that the rhs function get_ip() is somewhat ill-defined
when running judge.setl. Care should be taken to see that (1) the
servers are not running, because simultaneous client queries can change
the value in knowledge/IP and thereby reset the client state, but (2)
some string be written in the file knowledge/IP.
Log files:
These files will be created as a side effect of running
ALICE. None of them is crucial and they may be deleted if taking up too much
space.
- aitp_server.log : rewritten with each execution of server.
- eq.log : appended when ifeq() is executed.
- new.log : includes the current size of ALICE
- search.log : appended by search.setl
- http.log : records real-time action for check.setl
- cat.log : records real-time action for check.setl
- tail.log : records real-time action for check.setl
- norm.log : appended by normalize.setl
- dialog.log : complete record of all exchanges
- human.log : complete record of normalized queries
- recall.log : appended by that(), recall() and
recallu()
- sr.log : records inputs and outputs of sr() recursively.
- unix.log : appended by unix()
AIML: Artificial Intelligence Markup Language
The basic unit of
knowledge in AIML is a simple paragraph of text, separated from other paragraphs
by a sequence of two or more newlines. At the root of the AIML interpreter is
the function search(), which utilizes the associative memory feature to
identify paragraphs containing selected keywords, according to a random,
information-theoretic matching criterion. All of the words found in an AIML
paragraph become potential keywords for search() to match.
The first step in AIML markup, beyond typing in paragraphs of ordinary
text(*), is adding the Left-Hand Side (lhs) pattern strings.
We use the arrow symbol on its own line to separate the lhs from the
original paragraph, which we now denote the Right-Hand Side
(rhs) template. We shall return to the rhs momentarily.
(*)Note: "Ordinary Text" may include other types of markup, in particular
large doses of HTML.
The pattern strings in the lhs are regular expressions. In all cases
they consist of normalized (see normalize.setl) text containing only
upper case letters and numbers plus a limited number of punctuation symbols. An
input query matching one of these pattern strings results in an activation of
the corresponding rhs template.
A-STORY.
A STORY
---->
Once there was a man who faced a great crisis. No matter what he
did, he could not seem to overcome his obstacles. After trying
everything and asking everyone for help, the man finally gives up.
But at the last moment, one good characteristic of the man is
amplified by circumstances and he manages to save himself.
In this first example we see a category called "A-STORY." The optional
name string appears as the first line and is terminated by a period (.). If no
name is supplied, the interpreter assigns one automatically. The lhs
contains the pattern strings A STORY. Thus this category would be activated by
inputs such as TELL ME A STORY but not READ ME ANOTHER STORY.
The program index.setl inserts all the words from both the lhs and
rhs into the associative memory database.
rhs markup includes a large set of string function expressions (see
The AIML String Function Language below). These functional expressions
return strings which are combined with ordinary (unmarked) text to produce the
final text output.
The primary element of an AIML program is a statement delimited by a plus (+)
sign. The effect of statement1+statement2 is to construct a string
appending the result of evaluating statement1 and statement2.
AIML also recognizes the structure of a "sentence", defined as a string
terminated by a period (.).
Functional string expressions begin with a tilde (~) character. When AIML
encounters a tilde-expression, the interpreter returns a string (sometimes null)
resulting from the evaluation of the function expression. Some functional
expressions including ifeq, narration, randomsent and endai
also act as control block delimiters. More will be said about these shortly.
The functional language permits only four sorts of constants:
input_string, pattern_string, integer constants and string constants. The
latter are simple strings delimited by single-quotes ('). The constant
input_string is bound to the value of the input query, and its
companion pattern_string is bound to the pattern string in the
lhs which matched this query.
CALL-ME.
^CALL ME
---->
+~nullify(setname(subsent(input_string, 3, 3))) +
+~randomsent() +
Hi there .
Your name is.
Pleased to meet you.
Hi I'm ALICE, .
You are called .
OK I will call you .
+~endai() +
+~getname() +.
In this second AIML paragraph CALL-ME. we see examples of statements,
tilde-expressions, constants and a control block. The lhs contains only
one pattern string, so pattern_string (not used in any case) would be
bound to ^CALL ME.
The first statement extracts the third word from the input string using
subsent(input_string, 3, 3) and stores it with the predicate assertion
setname(). The function nullify() simply returns a null string
but not before evaluating its argument for its side-effects.
The control block randomsent()...endai() sets all of the
intermediate sentences to null except one picked randomly. The functions
randomsent() and endai() themselves return null.
Finally the function getname() returns the value previously stored
by setname().
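The evaluation of CALL-ME can be mimicked in Python as follows. This is a sketch under assumptions, not the code in rhsfuns.setl: subsent(s, i, j) is taken to mean words i through j counted from 1, the names dictionary stands in for the knowledge file, and the client IP is passed explicitly instead of being read through get_ip().

```python
import random

names = {}  # hypothetical store: client IP -> name

def subsent(s, i, j):
    """Words i through j of s, counted from 1 (assumed semantics)."""
    return " ".join(s.split()[i - 1:j])

def setname(ip, name):
    names[ip] = name
    return name

def getname(ip):
    return names.get(ip, "")

def nullify(_value):
    """Discard an already evaluated argument and return the null string."""
    return ""

def call_me(ip, input_string, rng=random):
    greetings = ["Hi there", "Your name is", "Pleased to meet you",
                 "Hi I'm ALICE,", "You are called", "OK I will call you"]
    # Store the third word of the input as the name, contributing no text.
    out = nullify(setname(ip, subsent(input_string, 3, 3)))
    # randomsent() ... endai(): keep exactly one of the sentences.
    out += rng.choice(greetings)
    return out + " " + getname(ip) + "."

print(call_me("192.0.2.7", "CALL ME JOHN"))
```

Whatever greeting is picked, the reply ends with the stored name, for example "Pleased to meet you JOHN."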
YUH.
^YUH[ ]+$
---->
+~ifeq(subsent(that(), 1, 5),'DO YOU RUN WINDOWS 95 ') +
Do you like Microsoft products? I think they have a lot of problems.
+~elsneq() +
+~sr('YUP ') +
+~endai() +
This third example YUH introduces the control block ifeq() and
also the recursive AIML evaluation function sr(). The function
that() returns the program's previous utterance. In this case
ifeq() compares the first five words in that utterance with the string
constant 'DO YOU RUN WINDOWS 95 '. If they are the same, then the interpreter
returns the response beginning with the phrase 'Do you like Microsoft'.
Otherwise the response is obtained by the result of evaluating sr('YUP
').
The name sr(x) denotes "stimulus-response" and, as the name implies,
derives the response by applying the interpreter to x recursively. In
this case another category called YUP is activated by the new input string 'YUP
'.
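The control flow of YUH, including the recursive call, can be sketched in Python. The respond function below is a toy interpreter covering only two categories. The reply for YUP is a placeholder, because the actual YUP template is not shown in this document.

```python
def respond(input_string, that):
    """Toy interpreter; `that` is the program's previous utterance."""
    if input_string == "YUH":
        # ifeq(subsent(that(), 1, 5), 'DO YOU RUN WINDOWS 95 ')
        if that.split()[:5] == "DO YOU RUN WINDOWS 95".split():
            return ("Do you like Microsoft products? "
                    "I think they have a lot of problems.")
        # elsneq(): derive the response recursively, as sr('YUP ') does.
        return respond("YUP", that)
    if input_string == "YUP":
        return "Placeholder YUP reply."  # hypothetical template text
    return ""

print(respond("YUH", "DO YOU RUN WINDOWS 95 OR SOMETHING ELSE"))
print(respond("YUH", "HOW ARE YOU"))
```

The second call shows the recursion: YUH produces no text of its own and simply returns whatever YUP returns.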
The file KNOW.AIML contains about 1200 more examples of AIML paragraphs. Another 1200 or so
automatically generated paragraphs appear in GOSSIP.AIML.
Running the Interpreter
The term "interpreter" is confusing here
because we need to distinguish the AIML "interpreter" and the natural language
interpreter alice.setl. The former is really the combination of two
programs. AIML utilizes the unix awk or GNU gawk utility to speed classification
of input sentences. The program new.setl creates this awk script from the
lhs expressions in the AIML source. The output of the gawk script is a
single name, the name of the category picked as the best match to the input
string. That category name is also the name of an AIML program in the
rhs directory.
The AIML source files are listed in the file all_knowledge and the
program new.setl evaluates these and constructs the rhs and
lhs directories, as well as the awk script awkfile.
The earlier figure illustrating the file system provides an example. The
filenames in the lhs and rhs directories are the same, with
the exception of the awk script awkfile which appears only in the lhs
directory. The lhs files contain the pattern strings and the
rhs files contain the corresponding AIML program fragments. If the
seeker input MY NAME IS JOHN then the AIML interpreter evaluates the
rhs program MY-NAME-IS-.
If the rhs program contains a search() then the interpreter calls
search.setl which consults the index/ directory. The index is
literally a multi-mapping from key words to the rhs paragraphs
containing them.
The AIML interpreter evaluates the rhs program and returns a result
back to the natural language program alice.setl. Thus in effect the
AIML interpreter is "hidden" from the point of view of the natural language
program.
Perhaps it would be better to call new.setl the "compiler" and
alice.setl the "interpreter". In any case the NL program may be run
from the command line by setl alice.setl. The link to the web is
provided through the AITP server aitp_server.setl.
The AIML String Function Language
All AIML string functions return a
(possibly null) string. Some of them manipulate data files as a side effect.
Constants
- input_string : the input query string.
- pattern_string : the pattern string matching the query
- string constants : strings bounded by single-quotes (').
- small positive integers.
String Manipulation Functions
- encapsulate(s) : "encapsulation" in this case means insertion of
the string s into a new string that (a) begins with a non-blank
character and (b) ends with a blank character and (c) separates each word with
a single space.
- tail(s, i) : returns the tail of the sentence s beginning
at word number i (> 0).
- pfx1(s) : returns the head of the sentence s excluding
the last word.
- star2() : is the same as star(input_string,
pattern_string)
- star(s, p) : returns the tail of the string s beginning
where p matches s.
- nullify(s) : returns the null string but evaluates s,
usually intended to induce the side effects of function expression s
without including s as part of a reply.
- append(s1, s2) : produces the encapsulated string s1+'
'+s2
- whattime() : is a date and time string
- unix(s) : WARNING -- this is a potential security hole. Use with
extreme caution. Evaluates the OS expression s and returns a string
(including HTML markup) containing the result.
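A few of these string functions are simple enough to sketch in Python from the descriptions above. The sketch assumes that tail() and pfx1() return encapsulated strings and that encapsulating an empty string yields the null string. Neither point is stated in the list.

```python
def encapsulate(s):
    """Single spaces between words, no leading blank, one trailing blank."""
    words = s.split()
    return (" ".join(words) + " ") if words else ""

def tail(s, i):
    """Tail of the sentence s beginning at word number i (i > 0)."""
    return encapsulate(" ".join(s.split()[i - 1:]))

def pfx1(s):
    """Head of the sentence s excluding the last word."""
    return encapsulate(" ".join(s.split()[:-1]))

def append(s1, s2):
    """Encapsulated form of s1 + ' ' + s2."""
    return encapsulate(s1 + " " + s2)

print(repr(tail("TELL ME A STORY", 3)))  # 'A STORY '
print(repr(pfx1("TELL ME A STORY")))     # 'TELL ME A '
```

The trailing blank matches the form of string constants such as 'YUP ' used in the AIML examples.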
Robot Control Functions
Robot control functions facilitate interaction
with the Spherical Pointing Motor robot eye motor.
- random_demo() : Move the robot by a random amount. Returns null
string.
- robot_right() : Move the robot right. Returns null string.
- robot_left() : Move robot left. Returns null string.
- robot_down() : Move robot down. Returns null string.
- robot_up() : Move robot up.
Learning and Information Management:
A large number of AIML functions
are devoted to information storage and retrieval. Most of these are simply
special cases of the general functions setpred() and
getpred(), which save in a knowledge file a list of predicates and the
associated IP addresses of the clients having those attributes. For example, the
function setname(n) is really equivalent to
setpred('YNI',get_ip(),n).
- learn_column(l, r) : stores a new cortical activation column
L--->R in the knowledge file NEWCOLS.
- gossip(relation, belief) : stores a sentence of the form (PERSON
relation belief) in a gossip knowledge file.
- setpred(wordfile, s, predicate) : wordfile is the name of a
predicate relation (e.g. YNI or "your name is") file, s is the name of
the seeker holding the predicate, and predicate is the relation.
- getpred(wordfile, s, deflt) : returns predicate or default value.
(Note: getpred and setpred are usually hidden in specific
predicate functions like setname() and gettopic().)
- getloc() : returns client's geographic location.
- setloc(s) : set and return client location.
- set_it(s) : Set client's "it" pronoun. Returns "it".
- get_it() : returns the value of client's "it" pronoun.
- set_he(s) : Set client's "he" pronoun. Returns s.
- get_he() : returns the value of client's "he" pronoun.
- settopic() : sets client's topic string.
- gettopic() : gets client's topic.
- set_ip(s) : set ip address.
- get_ip() : get ip address.
- recallu() : Last utterance client made.
- recall() : Last utterance ALICE made.
- that() : Last utterance ALICE made.
- setname(s) : stores and returns client name.
- getname() : returns the client name.
- setlang(s) : stores and returns client's language.
- getlang() : returns client language.
- setjob(s) : sets the client's occupation.
- getjob() : gets the client's occupation.
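The relationship between the general predicate functions and the specific ones can be sketched in Python. Here a dictionary keyed by (relation, client) replaces the knowledge files, the fixed example address replaces the value managed by set_ip(), and the empty-string default in getname() is an assumption.

```python
predicates = {}            # hypothetical stand-in for the knowledge files
current_ip = "192.0.2.7"   # example address; set_ip() would normally set this

def get_ip():
    return current_ip

def setpred(wordfile, s, predicate):
    """Record that client s holds `predicate` under relation `wordfile`."""
    predicates[(wordfile, s)] = predicate
    return predicate

def getpred(wordfile, s, deflt):
    """Return the stored predicate, or deflt when none is recorded."""
    return predicates.get((wordfile, s), deflt)

def setname(n):
    # As stated above, setname(n) is equivalent to setpred('YNI', get_ip(), n).
    return setpred("YNI", get_ip(), n)

def getname():
    return getpred("YNI", get_ip(), "")

setname("JOHN")
print(getname())  # JOHN
```

Other pairs such as setjob()/getjob() would follow the same shape with a different relation name.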
Program Control Functions:
There are two types of program control
available in AIML: block structures and the recursive stimulus-response function
sr().
- sr(s) : apply the AIML interpreter recursively to find an AIML
response to the string s.
- randomsent()...endai() : picks a random sentence. (evaluates all
choices as a side effect).
- randomeval()...endai() : picks a random sentence and evaluates
it.
- ifeq(u, v)...elsneq()...endai() : is a conditional evaluation
block that compares the strings u and v for equality and
returns the block before or after elsneq accordingly.
- narration()...endai() : saves a block of text for narration and
returns the first few lines of that block.
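The difference between randomsent() and randomeval() is the order of evaluation and selection. The Python sketch below models each sentence as a zero-argument function so that side effects become visible. This representation is an illustration only and not how the interpreter stores sentences.

```python
import random

def randomsent(choices, rng=random):
    """Evaluate every choice (side effects included), then keep one result."""
    results = [choice() for choice in choices]
    return rng.choice(results)

def randomeval(choices, rng=random):
    """Pick one choice first and evaluate only that one."""
    return rng.choice(choices)()

evaluated = []  # records which sentences were actually evaluated

def sentence(text):
    """Build a hypothetical sentence whose evaluation leaves a trace."""
    def evaluate():
        evaluated.append(text)
        return text
    return evaluate

block = [sentence("Hi there."), sentence("Pleased to meet you.")]
randomsent(block)
print(len(evaluated))  # 2
```

Calling randomeval(block) instead would append only one entry to the trace.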
Artificial Intelligence Functions
A final set of functions operate on
natural language strings and produce natural language outputs.
- narrate() : continues to consume text from narration stack.
- randpar(f) : returns a random paragraph from the file f.
- person(x) : transform x to 2nd person or 1st person
- person3(x) : transform x to 3rd person
- search(s) : associative memory lookup for s.
- yesno(s) : Random answer to yes/no question s.
References
- 1. Richard S. Wallace. The Lying Game.
http://www.wired.com/wired/5.08/idees_fortes.html
- 2. John Holmgren and Michael Tanenblatt. Bell Labs Text to Speech Synthesis.
http://www.bell-labs.com/projects/tts/voices.html
- 3. Hugh Loebner. Home Page of The Loebner Prize--"The First Turing Test".
http://acm.org/~loebner/loebner-prize.htmlx
- 4. David Bacon. SETL for Data Processing on the Internet.
http://cs.nyu.edu/~bacon/survey-pap
Acknowledgements
Some of the people who made key contributions to the
development of ALICE deserve special recognition: David Bacon, Ian Barclay,
Ken Goldberg, Sage Greco, Tyra Baker, and Mark Cowperthwaite.
The pedigree of the Spherical Pointing Motor includes such giants as Fred
Hansen, Eric Schwartz, Ben Bederson, Sergey Sokolov and Jon Selig.
Eric Moore and Ed Mackavitch of Lehigh University wrote the fundamental
image-capturing server.
Terry Boult and Lehigh University are gratefully acknowledged for their
patience with and tolerance of this research.
No one could have found ALICE without the generous consideration of those
people who created links to the ALICE site. Some of the most significant
referers may be found here.
Thanks to David Powers of Flinders University, South Australia, and to
Kevin Sumption for work installing ALICE at the Powerhouse Museum, Sydney.
I would also like to give special thanks to the first 35,000 people who
chatted with ALICE and especially to those who contributed "clean" gossip.
Dr. Wallace was supported in part by a grant from the United States
National Science Foundation and a contract from the United States Department
of the Air Force.