You have reached the old ALICE documentation page.

Click here for the latest ALICE and AIML Documentation. Changed 6/05/99.



Famous Original AIML and ALICE Documentation

Revised January 5, 1998

Dr. Richard S. Wallace
rsw@ime.net
Copyright 1995,1996,1997,1998 Dr. Richard S. Wallace. All rights reserved.


Download and Installation:

Heartfelt apologies that we offer no Microsoft-compatible version at this time. Follow these steps to activate the AIML interpreter and the AITP and HTTP servers on Linux and Unix machines:
  1. Download and install SETL.
  2. Download the ALICE.tar file and expand it with tar xvf.
  3. Run setl new.setl to build lhs/ and rhs/ directories.
  4. Run setl index.setl to build index/ directory.
  5. (Optional: run setl alice.setl or judge.setl to test AIML program)
  6. Edit go.setl to choose host machine(s) and port numbers.
  7. Run go on the AIML server host machine.
  8. Run go on the http server host machine.

Introduction:

What controls ALICE is not a single program but a collection of autonomous clients and servers communicating via TCP/IP. This document reports on the Artificial Intelligence Markup Language (AIML) and its interpreter, along with the Artificial Intelligence Transfer Protocol (AITP) and a client and server implementing it. The client in our case is itself a server, specifically a limited HTTP server. In general there is no requirement that these programs run on the same machine, and in particular we usually run the HTTP server on one machine and the AITP server and AIML interpreter together on another. (In the original example the HTTP host machine is alice.eecs.lehigh.edu and the HTTP port number is 1991. The AITP server machine is emma.eecs.lehigh.edu and its well-known port number is 1978.) Both machines, however, share a common network file system, so the contents of our directory appear on both hosts.

While the idea of distributed control is not new in AI and robotics, the actual implementation of such a scheme is no longer very difficult, thanks to the evolution of the Internet and some elegant work by David Bacon adapting an old language, SETL, to today's web-dominated environment.

Both the AITP server and the AIML interpreter utilize co-processes Bacon calls pumps[4], essentially command-line data processing programs running under the control of the parent process. Three examples of pumps here are normalize, gawk and alice (an instance of the AIML interpreter), each of which may run as a command-line program on its own or as a pump.

The figure shows a subset of the processes involved in a typical transaction with ALICE. The client (presumably, a human) utilizes a browser to connect to the server installed on the host Alice (port 1991) and transmits a query. The reply contains HTML markup for the video/telerobotic server and for the remote text-to-speech synthesis server. The browser interprets this HTML and initiates connections with the audio and video hosts.

The process on Alice:1991 is a forking server, a protective shield against misbehaved clients. Once the forked child receives the client's input query it initiates an atomic transaction with the server on Emma (port 1978). This transaction consists of two exchanges: first, the faux HTTP server reports the IP address of its client and second, it retransmits the client query in its original HTTP format (specifically, a GET method).

The program normalize runs as a command line pump. It processes one line of text input and removes all non-meaningful (to AIML) punctuation, converts to upper case, and replaces a number of contractions and acronyms with their expanded forms. The result is one line of normalized text, suitable for matching with AIML lhs patterns.
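The behavior described above can be sketched in a few lines of Python. This is only an illustration, not the SETL source of normalize: the expansion table is hypothetical, and the sketch strips every character other than letters, digits and spaces, whereas the real program keeps whatever punctuation is meaningful to AIML.

```python
import re

# Hypothetical expansion table; the real list of contractions and
# acronyms used by normalize is not given in this document.
EXPANSIONS = {
    "I'M": "I AM",
    "DON'T": "DO NOT",
    "WHAT'S": "WHAT IS",
}

def normalize(line):
    """Return one line of normalized text for matching against lhs patterns."""
    text = line.upper()
    # Expand contractions before the apostrophes are stripped.
    for short, full in EXPANSIONS.items():
        pattern = r"(?<![A-Z'])" + re.escape(short) + r"(?![A-Z'])"
        text = re.sub(pattern, full, text)
    # Assumption: all punctuation is dropped in this sketch.
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    # Collapse runs of spaces left behind by the removals.
    return " ".join(text.split())
```

Calling normalize("What's your name?") in this sketch yields WHAT IS YOUR NAME, a single upper-case line with the contraction expanded.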

Another line pump is gawk. This is a gigantic pattern-matching case statement implemented as an awk script.


Files and directories:

The second figure illustrates the directory structure of ALICE. Initially the author writes AIML files into the knowledge directory. The program new.setl converts the AIML information into lhs pattern files and rhs output templates. The key idea here is that each AIML category has a unique name (either assigned by the author or assigned automatically by new.setl). That name becomes the identifier of the lhs pattern file and the rhs program filename. The awk script compares an input pattern with all the pattern strings, and prints out the filename of the rhs template.
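The role of the awk script can be sketched in Python as a table of (category name, pattern) pairs. The pattern strings are copied from the examples in this document; everything else is an assumption made for illustration: the table lives in memory rather than in lhs files, the first matching pattern wins (how the real script chooses among several matches is not described here), and normalized input is assumed to end with a space, as string constants such as 'YUP ' suggest.

```python
import re

# Category names and pattern strings from the examples in this document.
PATTERNS = [
    ("CALL-ME", r"^CALL ME "),
    ("YUH", r"^YUH[ ]+$"),
    ("A-STORY", r"A STORY "),
]

def classify(normalized_input):
    """Return the name of the first category whose pattern matches, or None.

    Assumption: first match wins. The generated awk script may rank
    candidate matches differently.
    """
    for name, pattern in PATTERNS:
        if re.search(pattern, normalized_input):
            return name
    return None
```

The returned name stands in for the filename of the rhs template that the interpreter would evaluate next.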

Directories:

Knowledge files:

We use the term "knowledge files" loosely to denote two classes of data files. First, the knowledge includes the AIML source programs, listed in the file all_knowledge. AIML files contain paragraphs of text optionally marked up with AIML. Second, the remaining knowledge files are data files containing a variety of information including client names, IP addresses, geographic locations, topics, pronouns, acronyms, and proper nouns. These data files are created automatically by alice.setl.

Initialization programs:

The programs new.setl and index.setl may together be viewed as a "compiler" or preprocessor that transforms the AIML knowledge files into data structures, stored in the file system, later consulted by the interpreter.

AIML Interpreter programs:

The interpreter alice.setl utilizes a number of subprograms. Some of them, such as search.setl, are activated via ordinary filters and others work as pumps.

AITP Server Programs:

The AITP server uses the pump normalize.setl and also calls alice.setl as a pump. The name "AITP" is perhaps too fancy for what it really denotes, because the "protocol" is so simple. Like HTTP GET methods, the basic transaction in AITP is to accept a line of text and to return a response, also in our case a line of text. What distinguishes this protocol from others is that there are relatively few restrictions on the content of the input strings (because they are assumed to contain natural language) and the requirement that the client, i.e. our HTTP server, report the IP address of its client.

The AITP server then makes two transactions with the alice pump. First, it issues the string

MY IP IS SO.AND.SO
and then reads back an acknowledging response. Second, it passes along the normalized natural language query and reads back a response. The IP address ultimately becomes a key into all the knowledge files that alice.setl uses to keep track of information about each client.
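The two exchanges with the alice pump can be sketched as follows. This Python fragment is a stand-in, not aitp_server.setl: the pump is modeled as a plain callable, and FakePump, its "OK" acknowledgment and its reply text are hypothetical placeholders for the real interpreter.

```python
def aitp_transaction(pump, client_ip, normalized_query):
    """Run the two exchanges described above and return the final reply."""
    # Exchange 1: report the client's IP address and read the acknowledgment.
    pump("MY IP IS " + client_ip)
    # Exchange 2: pass along the normalized query and read the response.
    return pump(normalized_query)

class FakePump:
    """Hypothetical stand-in for the alice pump, for demonstration only."""

    def __init__(self):
        self.ip = None
        self.seen = []

    def __call__(self, line):
        self.seen.append(line)
        prefix = "MY IP IS "
        if line.startswith(prefix):
            self.ip = line[len(prefix):]
            return "OK"  # assumed acknowledgment text
        return "reply for " + self.ip
```

Keeping the IP report as its own exchange means the pump always knows which client's knowledge to consult before it sees the query.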

HTTP Server Program:

This program is not a true HTTP server but a program that accepts some types of HTTP requests (specifically, GET methods modified to accommodate natural language strings) and discards others.

The file TRAILER.HTML contains optional HTML markup appended to the bottom of the server's output. In our case the TRAILER.HTML file contains an ad for Eastport Internet Associates.

Utility programs:

The parent directory also contains a small number of utility programs for initialization, termination and monitoring purposes. It ought to be remarked here that the last of these utilities, what we call the Reductionist Onslaught, is perhaps the most valuable tool in this entire package; however, those readers interested in a more philosophical discussion should read The Lying Game[1] or Notes on the Loebner Contest Grand Prize Rules.

Loebner Contest Interface

The program judge.setl provides command-line interaction with ALICE. The input-output and command format complies with the standard for the 1998 Loebner Contest [3].

This program works much like the AITP server, in the sense that it opens both alice.setl and normalize.setl as pumps. But instead of accepting text input from a client, judge.setl reads lines from the standard input and prints to the standard output. And unlike the single-line AITP transaction protocol, this text interface accepts multi-line queries. (Terminate queries with double newlines, '\n\n'.)
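The double-newline convention can be illustrated with a small Python reader. It is a sketch under assumptions, not the logic of judge.setl: the lines of one query are joined with single spaces, and a final query that lacks its terminating blank line is still returned.

```python
def read_queries(stream):
    """Yield one query per blank-line-terminated group of input lines."""
    lines = []
    for raw in stream:
        line = raw.rstrip("\n")
        if line == "":
            # An empty line means two newlines in a row: the query is complete.
            if lines:
                yield " ".join(lines)
                lines = []
        else:
            lines.append(line)
    # Assumption: an unterminated trailing query is still passed along.
    if lines:
        yield " ".join(lines)
```

Passing sys.stdin as the stream gives the interactive behavior; any iterable of lines, such as an open file, works the same way.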

Note that the rhs function get_ip() is somewhat ill-defined when running judge.setl. Care should be taken to see that (1) the servers are not running, because simultaneous client queries can change the value in knowledge/IP and thereby reset the client state, but (2) some string be written in the file knowledge/IP.

Log files:

These files will be created as a side effect of running ALICE. None of them is crucial and they may be deleted if taking up too much space.

AIML: Artificial Intelligence Markup Language

The basic unit of knowledge in AIML is a simple paragraph of text, separated from other paragraphs by a sequence of two or more newlines. At the root of the AIML interpreter is the function search(), which utilizes the associative memory feature to identify paragraphs containing selected keywords, according to a random, information-theoretic matching criterion. All of the words found in an AIML paragraph become potential keywords for search() to match.

The first step in AIML markup, beyond typing in paragraphs of ordinary text(*), is adding the Left-Hand Side (lhs) pattern strings. We use the arrow symbol on its own line to separate the lhs from the original paragraph, which we now denote the Right-Hand Side (rhs) template. We shall return to the rhs momentarily.


(*)Note: "Ordinary Text" may include other types of markup, in particular large doses of HTML.


The pattern strings in the lhs are regular expressions. In all cases they consist of normalized (see normalize.setl) text containing only upper case letters and numbers plus a limited number of punctuation symbols. An input query matching one of these pattern strings results in an activation of the corresponding rhs template.

A-STORY.
A STORY 
---->
Once there was a man who faced a great crisis.  No matter what he
did, he could not seem to overcome his obstacles.  After trying
everything and asking everyone for help, the man finally gives up.
But at the last moment, one good characteristic of the man is
amplified by circumstances and he manages to save himself.

In this first example we see a category called "A-STORY." The optional name string appears as the first line and is terminated by a period (.). If no name is supplied, the interpreter assigns one automatically. The lhs contains the single pattern string A STORY. Thus this category would be activated by inputs such as TELL ME A STORY but not READ ME ANOTHER STORY.
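The layout of a category (optional name line, pattern strings, arrow line, template) can be made concrete with a short Python parser. This is a sketch only and does not reproduce new.setl. In particular, treating the first line as a name whenever it ends with a period is an assumption that would misfire on a pattern string ending in a period, and automatic naming is left out (the name is returned as None).

```python
ARROW = "\n---->\n"

def parse_category(paragraph):
    """Split one AIML paragraph into its name, lhs patterns and rhs template."""
    head, arrow, rhs = paragraph.partition(ARROW)
    if not arrow:
        raise ValueError("no arrow line found; paragraph has no lhs")
    lines = head.split("\n")
    name = None
    # Assumption: a first line ending in a period is the category name.
    if lines[0].rstrip().endswith("."):
        name = lines[0].rstrip()[:-1]
        lines = lines[1:]
    # Pattern strings are kept verbatim, trailing spaces included.
    patterns = [line for line in lines if line.strip()]
    return {"name": name, "lhs": patterns, "rhs": rhs}
```

Applied to the A-STORY paragraph above, this returns the name A-STORY, one pattern string, and the story text as the template.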

The program index.setl inserts all the words from both the lhs and rhs into the associative memory database.

rhs markup includes a large set of string function expressions (see The AIML String Function Language below). These functional expressions return strings which are combined with ordinary (unmarked) text to produce the final text output.

The primary element of an AIML program is a statement delimited by a plus (+) sign. The effect of statement1+statement2 is to construct a string appending the result of evaluating statement1 and statement2. AIML also recognizes the structure of a "sentence", defined as a string terminated by a period (.).

Functional string expressions begin with a tilde (~) character. When AIML encounters a tilde-expression, the interpreter returns a string (sometimes null) resulting from the evaluation of the function expression. Some functional expressions including ifeq, narration, randomsent and endai also act as control block delimiters. More will be said about these shortly.

The functional language permits only four sorts of constants: input_string, pattern_string, integer constants and string constants. The latter are simple strings delimited by single-quotes ('). The constant input_string is bound to the value of the input query, and its companion pattern_string is bound to the pattern string in the lhs which matched this query.

CALL-ME.
^CALL ME 
---->
+~nullify(setname(subsent(input_string, 3, 3))) +
+~randomsent() +
Hi there .
Your name is.
Pleased to meet you.
Hi I'm ALICE, .
You are called .
OK I will call you .
+~endai() + 
+~getname() +.

In this second AIML paragraph, CALL-ME, we see examples of statements, tilde-expressions, constants and a control block. The lhs contains only one pattern string, so pattern_string (not used in any case) would be bound to ^CALL ME.

The first statement extracts the third word from the input string using subsent(input_string, 3, 3) and stores it with the predicate assertion setname(). The function nullify() simply returns a null string but not before evaluating its argument for its side-effects.

The control block randomsent()...endai() sets all of the intermediate sentences to null except one picked randomly. The functions randomsent() and endai() themselves return null.

Finally the function getname() returns the value previously stored by setname().
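The CALL-ME walk-through can be mirrored in Python. The fragment below is an analogy rather than the AIML interpreter: subsent() is written to agree with the two uses shown in this document (words numbered from 1, both ends inclusive), randomsent() is reduced to picking one sentence from a list, and the way the chosen sentence and the name are joined in call_me() is an assumption.

```python
import random

# Sentence openings taken from the CALL-ME template above.
SENTENCES = [
    "Hi there",
    "Your name is",
    "Pleased to meet you",
    "Hi I'm ALICE,",
    "You are called",
    "OK I will call you",
]

def subsent(text, first, last):
    """Return words first..last of text (1-based, inclusive)."""
    return " ".join(text.split()[first - 1:last])

def randomsent(sentences, rng=random):
    """Keep one sentence picked at random; the others are dropped."""
    return rng.choice(sentences)

def call_me(input_string, rng=random):
    """Extract the third word as the name and greet the client with it."""
    name = subsent(input_string, 3, 3)
    # Assumption: opening and name are joined with a space and a period.
    return randomsent(SENTENCES, rng) + " " + name + "."
```

For the input CALL ME ISHMAEL the reply is one of the six openings followed by ISHMAEL and a period.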

YUH.
^YUH[ ]+$
---->
+~ifeq(subsent(that(), 1, 5),'DO YOU RUN WINDOWS 95 ') +
Do you like Microsoft products?  I think they have a lot of problems.
+~elsneq() +
+~sr('YUP ') +
+~endai() +

This third example, YUH, introduces the control block ifeq() and also the recursive AIML evaluation function sr(). The function that() returns the program's previous utterance. In this case ifeq() compares the first five words in that utterance with the string constant 'DO YOU RUN WINDOWS 95 '. If they are the same, then the interpreter returns the response beginning with the phrase 'Do you like Microsoft'. Otherwise the response is obtained by the result of evaluating sr('YUP ').

The name sr(x) denotes "stimulus-response" and, as the name implies, derives the response by applying the interpreter to x recursively. In this case another category called YUP is activated by the new input string 'YUP '.
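The interplay of ifeq() and sr() in the YUH category can be sketched as a recursive Python function. This is an illustration under assumptions: only the two categories YUH and YUP exist, the previous utterance is passed in as an argument instead of being fetched by that(), it is assumed to be already normalized, and the reply of the YUP category is hypothetical because that category is not shown in this document.

```python
WINDOWS_QUESTION = "DO YOU RUN WINDOWS 95"

def first_words(text, count):
    """Return the first count words of text joined by single spaces."""
    return " ".join(text.split()[:count])

def respond(query, previous_utterance):
    """Two-category responder; previous_utterance plays the role of that()."""
    if query == "YUH":
        # ifeq branch: the previous utterance began with the Windows question.
        if first_words(previous_utterance, 5) == WINDOWS_QUESTION:
            return ("Do you like Microsoft products?  "
                    "I think they have a lot of problems.")
        # Otherwise, like sr('YUP '), re-enter the responder with a new input.
        return respond("YUP", previous_utterance)
    if query == "YUP":
        return "I am glad we agree."  # hypothetical reply for category YUP
    return ""
```

The recursion lets YUH reuse whatever YUP would say, so synonyms of an input need only one fully written category.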

The file KNOW.AIML contains about 1200 more examples of AIML paragraphs. Another 1200 or so automatically generated paragraphs appear in GOSSIP.AIML.


Running the Interpreter

The term "interpreter" is confusing here because we need to distinguish the AIML "interpreter" and the natural language interpreter alice.setl. The former is really the combination of two programs. AIML utilizes the Unix awk or GNU gawk utility to speed classification of input sentences. The program new.setl creates this awk script from the lhs expressions in the AIML source. The output of the gawk script is a single name, the name of the category picked as the best match to the input string. That category name is also the name of an AIML program in the rhs directory.

The AIML source files are listed in the file all_knowledge and the program new.setl evaluates these and constructs the rhs and lhs directories, as well as the awk script awkfile.

The earlier figure illustrating the file system provides an example. The filenames in the lhs and rhs directories are the same, with the exception of the awk script awkfile which appears only in the lhs directory. The lhs files contain the pattern strings and the rhs files contain the corresponding AIML program fragments. If the client inputs MY NAME IS JOHN then the AIML interpreter evaluates the rhs program MY-NAME-IS-.

If the rhs program contains a search() then the interpreter calls search.setl which consults the index/ directory. The index is literally a multi-mapping from key words to the rhs paragraphs containing them.
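A multi-mapping from keywords to paragraphs is what is usually called an inverted index, and a minimal in-memory version looks like this in Python. The sketch makes no claim about the on-disk layout of index/ or about how search.setl ranks candidates; build_index() and lookup() are hypothetical names, and words are taken to be runs of letters and digits.

```python
import re
from collections import defaultdict

def build_index(paragraphs):
    """Map each upper-cased word to the set of paragraph names containing it."""
    index = defaultdict(set)
    for name, text in paragraphs.items():
        for word in re.findall(r"[A-Z0-9]+", text.upper()):
            index[word].add(name)
    return index

def lookup(index, keyword):
    """Return the sorted names of all paragraphs that contain keyword."""
    return sorted(index.get(keyword.upper(), set()))
```

A search function would start from such lookups and then choose among the candidate paragraphs.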

The AIML interpreter evaluates the rhs program and returns a result back to the natural language program alice.setl. Thus in effect the AIML interpreter is "hidden" from the point of view of the natural language program.

Perhaps it would be better to call new.setl the "compiler" and alice.setl the "interpreter". In any case the NL program may be run from the command line by setl alice.setl. The link to the web is provided through the AITP server aitp_server.setl.


The AIML String Function Language

All AIML string functions return a (possibly null) string. Some of them manipulate data files as a side effect.

Constants

String Manipulation Functions

Robot Control Functions

Robot control functions facilitate interaction with the Spherical Pointing Motor robot eye motor.

Learning and Information Management:

A large number of AIML functions are devoted to information storage and retrieval. Most of these are simply special cases of the general functions setpred() and getpred(), which save in a knowledge file a list of predicates and the associated IP addresses of the clients having those attributes. For example, the function setname(n) is really equivalent to setpred('YNI',get_ip(),n).
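The relationship between the general and the special-case functions can be sketched in Python with a dictionary in place of the knowledge files. Several details are assumptions for the sake of the example: the store is in memory, set_ip() is a hypothetical helper standing in for the file knowledge/IP, setpred() returns the stored value, and getname() is taken to be getpred('YNI', get_ip()) by analogy with setname().

```python
_predicates = {}          # (predicate, ip) -> value; stands in for knowledge files
_current_ip = "0.0.0.0"   # stands in for the contents of knowledge/IP

def set_ip(ip):
    """Hypothetical helper: record which client the interpreter is serving."""
    global _current_ip
    _current_ip = ip

def get_ip():
    return _current_ip

def setpred(pred, ip, value):
    """Store value under (pred, ip); returning the value is an assumption."""
    _predicates[(pred, ip)] = value
    return value

def getpred(pred, ip):
    """Return the stored value, or a null string if nothing is stored."""
    return _predicates.get((pred, ip), "")

def setname(n):
    return setpred("YNI", get_ip(), n)

def getname():
    # Assumption: getname() mirrors setname() with the same predicate.
    return getpred("YNI", get_ip())
```

Because every predicate is keyed by IP address, two clients can each store a name without overwriting the other's.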

Program Control Functions:

There are two types of program control available in AIML: block structures and the recursive stimulus-response function sr().

Artificial Intelligence Functions

A final set of functions operates on natural language strings and produces natural language outputs.

References