About This Document
This document describes the organization and use of Program dB,
an experimental Alicebot engine that won the Loebner Prize
in October 2001. Ideally this document will be obsolete shortly, as the new features of Program dB
are integrated into the reference implementation and other Alicebot engines.
Program dB is meant to be little more than a platform for conducting experiments, not a piece of production-quality software.
Table of Contents
What is Program dB?
What is NOT in Program dB
Download and run
How does Targeting work?
How are targets generated?
The Categories tab
The Inputs tab
The Spiral tab
Special bot predicates
Appendix A. Loebner Contest 2001 Data Entry Protocol and File Format
Program dB is an experimental Alicebot engine and server that includes code from both Program B and Program D.
This is the source code of the program that was running at the Loebner contest held at the
London Science Museum
on 13 October 2001, where A.L.I.C.E. won her second Loebner prize.
Program B was the original Java version of the Alicebot engine and server. When Java development changed hands in 2001, and the A.L.I.C.E. AI Foundation subsequently sponsored the development of the AIML 1.0 reference server, these efforts came to be known as Program D.
(Program C refers to the collection of AIML programs written in C/C++.)
Program dB contains all the basic elements of an Alicebot engine and server:
an AIML parser and interpreter,
a Substitution preprocessor,
a Classifier to organize multiple clients,
a web server, and
specific Responder interfaces for HTML and the Loebner contest.
Program dB also includes a special "Targeting" graphical user interface.
In this context, "targeting" refers to the automatic detection of new patterns for which the robot does not already have specific answers. The proposed new pattern, along with its <that> and <topic> patterns, is called a "target category" or simply a "target". The botmaster may supply the answer, thereby transforming the target into a new AIML category.
The name "dB" was also selected in honor of
an early contributor to A.L.I.C.E. and AIML, whose signature is "dB".
The Program dB GUI contains no chat interface. The program runs a web server, however, so you can chat with the bot via http://localhost:2001.
The AIML parser is AIML 1.0 compliant -- sort of. That is, it works for a selected subset of AIML that appears in the content of ALICE's brain.
There are severe restrictions on the nesting of template-side AIML tags.
It does not work for "general" AIML expressions. We need to either scrap the parser or document its limitations better.
There is no "AIMLWatcher" function. The AIML files are loaded once at launch time only.
The nice new HTML responder developed by Kim Sullivan is not included, nor is the "SmartResponder" developed by Jon Baer.
There is no third-party code whatsoever: no Jetty, no SQL server, no ANT, etc.
Program dB is 100% pure GNU General Public License code.
It may not have all the features, but it is all home-grown code.
Program dB has no "long term memory". It forgets all the clients each time it is shut down. This is partly the result of having no database, but it seems to improve the overall performance of the server. The program can run for days at a time and remember thousands of clients in RAM.
Finally, Program dB is really two applications: the Targeting server and the Loebner contest text-based interface. Although the two share much code, their functionality is quite different. The Targeting server has a GUI, runs a web server, and can handle multiple clients. The Loebner interface is text-based, has no functions other than chat, and works with only one client (or "judge") at a time. The Loebner program has no targeting function. The two programs also use different log file formats.
Program dB is packaged in a single zip file. The best way to download it is to create a directory like
and unzip the package there. Make this the "current working directory".
The zip file expands into four directories:
src: source code
data: configuration data and gossip logging
lib: location of alicebot.jar file
brain: the AIML of A.L.I.C.E.'s brain
To run the targeting server on Linux:
java -classpath lib/alicebot.jar org.alicebot.gui.ALICE data
On Windows:
java -classpath lib\alicebot.jar org.alicebot.gui.ALICE data
To run the Loebner contest interface on Linux, use:
java -classpath lib/alicebot.jar org.alicebot.core.Loebner data
On Windows:
java -classpath lib\alicebot.jar org.alicebot.core.Loebner data
These launch commands may be placed in a shell script or batch file for easy startup later. (Examples are included in the download).
The "data" argument specifies the location of the data directory. If it is omitted, the current directory is used.
Program dB needs this value to locate the global parameters, HTML template files, and gossip log file.
Program dB is smart enough to figure out if it is running on Windows or Unix. On Windows, it looks for the file brain\Main.aiml. On Unix, it uses brain/Startup.aiml.
Run the Targeting server and have a chat with the bot through http://localhost:2001. The program should launch with the targeting panel tab already selected. After the server has collected some dialogue, you can browse the targets with the Next Target button.
Program dB Targeting Server
What is a target? For the purpose of this program, a target is a 9-tuple:
The client input, the value of that, and the current topic; plus
the matching pattern, that pattern, and topic pattern; plus
a suggested new pattern, that pattern, and topic pattern.
The nine values appear in the top three rows of text boxes in the targeting panel.
The first six values are not editable, but the botmaster may edit the new input pattern, that pattern and topic pattern.
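The 9-tuple described above can be pictured as a small data holder. The following is only a sketch: the class name Target, its field names, and the describe() method are hypothetical and are not taken from the Program dB source.

```java
// Hypothetical sketch of a target as a 9-tuple; names are illustrative only.
final class Target {
    // Row 1: what the client said, plus the <that> and <topic> values at the time.
    final String input, that, topic;
    // Row 2: the category context that matched (not editable in the GUI).
    final String matchedPattern, matchedThat, matchedTopic;
    // Row 3: the suggested new category context (editable by the botmaster).
    String newPattern, newThat, newTopic;

    Target(String input, String that, String topic,
           String matchedPattern, String matchedThat, String matchedTopic,
           String newPattern, String newThat, String newTopic) {
        this.input = input;
        this.that = that;
        this.topic = topic;
        this.matchedPattern = matchedPattern;
        this.matchedThat = matchedThat;
        this.matchedTopic = matchedTopic;
        this.newPattern = newPattern;
        this.newThat = newThat;
        this.newTopic = newTopic;
    }

    // Renders the three rows in the same order as the targeting panel.
    String describe() {
        return "Input: " + input + ", " + that + ", " + topic + "\n"
             + "Matched: " + matchedPattern + ", " + matchedThat + ", " + matchedTopic + "\n"
             + "New AIML: " + newPattern + ", " + newThat + ", " + newTopic;
    }
}
```

Only the last three fields are left non-final, mirroring the rule that the botmaster may edit the new pattern, that pattern and topic pattern but not the first six values.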
The botmaster may also supply a new template. When the template is complete, saving the category with the Save button appends the new category to the file brain/new.aiml.
The Delete target button removes the target from the current set of proposed targets.
Next target browses to the next available target.
If you try to save a target without editing the template, an "Empty Template!" warning appears and the category is not saved.
Six editing buttons provide some shortcuts for writing the template:
in the template.
<random>: inserts the AIML template for a 3-element random selection.
<sr/>: inserts <sr/>.
<srai>: inserts <srai> </srai>.
Reduce: reduces the length of the input by one word, by removing the last word before the wildcard. For example, YOU ARE VERY * reduces to <srai>YOU ARE <star/></srai>. This reduction is commonly applied to eliminate adverbs and other logically useless words.
Clear: clears the template.
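As an illustration of the Reduce shortcut, here is a minimal sketch that builds the reducing template from a pattern. It assumes the pattern ends in a single trailing wildcard; the class and method names are hypothetical, and the real button may handle more cases.

```java
// Hypothetical sketch of the Reduce shortcut; assumes a trailing "*" wildcard.
final class ReduceButton {
    // Drops the last word before the trailing wildcard and wraps the
    // remainder in <srai>, replacing the wildcard with <star/>.
    static String reduce(String pattern) {
        String[] words = pattern.trim().split("\\s+");
        int star = words.length - 1;
        if (words.length < 2 || !words[star].equals("*")) {
            throw new IllegalArgumentException("pattern must end with a wildcard: " + pattern);
        }
        StringBuilder sb = new StringBuilder("<srai>");
        // Copy every word except the one immediately before the wildcard.
        for (int i = 0; i < star - 1; i++) {
            sb.append(words[i]).append(' ');
        }
        return sb.append("<star/></srai>").toString();
    }
}
```

With this sketch, reduce("YOU ARE VERY *") yields the template given in the example above.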
The perfect targeting algorithm has not yet been developed. Meanwhile, we rely on heuristics to select targets from the activated categories.
The A.L.I.C.E. brain,
at the time of this writing, contains 41,000 categories. In any given run of the server, however, typically only a few thousand of those categories are activated. Potentially, every activated category is a source of targets. If more than one input activated some category, then each of those inputs potentially forms a new target. The first step in targeting is to save all the activated categories and the inputs that activated them.
Program dB then applies six steps to each activated category to either accept it as a source of targets, or reject it:
If the template contains a <srai>INTERJECTION</srai>, then the pattern was either YES, NO, SO, or one of a number of other interjections. Usually these cases are the client saying YES or NO to a question the robot asked. Here the botmaster may add a sensible response to the client's YES or NO reply, so this makes a good target.
If the template contains a <srai>XFIND...</srai> then the pattern was either WHAT IS *, WHO IS *, WHERE IS * or some other similar default pattern for an information question. These are usually good, simple targets because the botmaster can often "look up" the answer in a dictionary or reference book, if s/he does not know the answer already.
But if the template has any other <sr/> or <srai>, this category is probably not a good source for targets. Most cases of <srai> reduce complex forms of inputs into simpler ones. The pattern DO YOU KNOW WHAT * IS, for example, always reduces to <srai>WHAT IS <star/></srai>. It is usually better to look for targets in the terminal patterns like WHAT IS *, than to write many specific new patterns based on reducible patterns like DO YOU KNOW WHAT * IS.
The A.L.I.C.E. brain
uses a special category with the pattern XFIND * to respond to information questions for which she does not have a specific answer. The XFIND category by itself does not produce useful targets, so we disregard targets from this category, even though the pattern contains a wild card.
Another default category of little use for targeting is the one with pattern CALL ME *. All of the name-telling categories, such as MY NAME IS *, MY REAL NAME IS *, and the one with <that>WHAT CAN I CALL YOU</that>, reduce to the CALL ME * category. Unless you are interested in writing special responses for many different people's names, the CALL ME * category is not a good source of targets.
Finally, if none of the previous cases apply, the program considers whether the matched pattern contains a wildcard *. If the pattern is atomic, the category is not likely to be a good source of targets. It could be, if we considered <that>, as in the first case, or <topic>, but that would be too advanced for this algorithm. If the pattern contains a wildcard, then the category is likely a good source of targets.
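The six steps can be read as an ordered chain of checks. The sketch below follows that reading; the class and method names are hypothetical, the string tests are deliberately simple, and the actual Program dB code may differ in detail.

```java
// Hypothetical sketch of the six-step filter; not the Program dB source.
final class TargetFilter {
    // Returns true if an activated category looks like a good source of targets.
    static boolean isGoodSource(String pattern, String template) {
        // Step 1: replies such as YES or NO to a question the robot asked.
        if (template.contains("<srai>INTERJECTION</srai>")) return true;
        // Step 2: default information questions such as WHAT IS *.
        if (template.contains("<srai>XFIND")) return true;
        // Step 3: any other reduction is probably not worth targeting.
        if (template.contains("<sr/>") || template.contains("<srai>")) return false;
        // Step 4: the XFIND default category itself is disregarded.
        if (pattern.equals("XFIND *")) return false;
        // Step 5: so is the name-telling default category.
        if (pattern.equals("CALL ME *")) return false;
        // Step 6: otherwise accept only patterns containing a wildcard.
        for (String word : pattern.trim().split("\\s+")) {
            if (word.equals("*")) return true;
        }
        return false;
    }
}
```

The order matters: steps 1 and 2 must run before step 3, because both special cases are themselves <srai> templates.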
If the matched pattern contains a wildcard, the suggested new pattern is generated as follows:
Align the input sentence with the matched pattern.
Create a new pattern by using the words from the matched pattern, plus one more word from the input. If the input contains further words after that, the new pattern ends with the wildcard *.
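A minimal sketch of this extension step follows. It assumes the input has already been normalized to upper case and that the wildcard, if present, is the last word of the matched pattern; the class and method names are hypothetical.

```java
// Hypothetical sketch of pattern extension; handles a trailing "*" only.
final class PatternSuggester {
    // Extends the matched pattern by one word taken from the aligned input.
    static String suggest(String input, String matched) {
        String[] in = input.trim().split("\\s+");
        String[] pat = matched.trim().split("\\s+");
        int prefix = pat.length - 1; // number of words before the wildcard
        // Atomic pattern, or no input word left to borrow: nothing to extend.
        if (!pat[prefix].equals("*") || in.length <= prefix) {
            return matched;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < prefix; i++) {
            sb.append(pat[i]).append(' ');
        }
        sb.append(in[prefix]); // the one extra word from the input
        if (in.length > prefix + 1) {
            sb.append(" *");   // more input remains, so keep a wildcard
        }
        return sb.toString();
    }
}
```

For an atomic matched pattern such as YES, the sketch returns the pattern unchanged, which is the situation where the botmaster has to edit the new pattern by hand.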
(1) Suppose the topic is "READY" and the following conversation fragment takes place:
Client: WHAT IS PIZZA
Robot: I have to process that one for a while.
The input matched the category with <pattern>WHAT IS *</pattern>, generating a default response. The value of <that> is "OH". Browsing the targets, the botmaster sees:
Input: WHAT IS PIZZA, OH, READY
Matched: WHAT IS *, *, *
New AIML: WHAT IS PIZZA, *, *
The targeting algorithm extended the existing pattern by one word from the input, to obtain the new AIML pattern "WHAT IS PIZZA".
In the template window, the botmaster types:
Pizza is a delicious food made with dough, cheese and tomato sauce.
The botmaster may also click the <think> button to add the text
Pizza is a delicious food made with dough, cheese and tomato sauce.
But s/he needs to edit the text to achieve the desired result of remembering the topic is "PIZZA":
Pizza is a delicious food made with dough, cheese and tomato sauce.
After entering this text in the <template> text area, the botmaster may save the new category with the Save Category button.
(2) In the second example, the client entered "WHAT IS A PIZZA", the topic is "PIZZA", and the value of <that> is "I HAVE TO PROCESS THAT ONE FOR A WHILE".
The Targeting display shows
Input: WHAT IS A PIZZA, I HAVE TO PROCESS THAT ONE FOR A WHILE, PIZZA
Matched: WHAT IS A *, *, *
New AIML: WHAT IS A PIZZA, *, *
Here the targeting algorithm again extended the existing pattern by one word taken from the input, creating the new atomic pattern "WHAT IS A PIZZA".
The botmaster now needs only to click the <srai> button and add some text to create the template
<srai>WHAT IS PIZZA</srai>
since we previously added that category.
(3) In the third example we start with the following dialogue fragment:
Client: My name is Rich.
Robot: Rich, are you my master?
Client: Yes.
The topic is "READY". The botmaster views the display:
Input: YES, RICH ARE YOU MY MASTER, READY
Matched: YES, *, *
New AIML: YES, *, *
Here the crude targeting algorithm fails to create a new target pattern, but it has nevertheless detected a good new target. The botmaster edits the New AIML text boxes so that they contain:
New AIML: YES, * ARE YOU MY MASTER, *
Now s/he edits the template to read:
Okay, I will obey only you.
And then saves the new category. This example shows how targeting can be used to detect new <that> categories.
(4) In the next example, the client asks "WHAT IS THE IRON GIANT". The value of <that> is "PLEASE STAND BY", and the topic is "PIZZA".
The Targeting display shows:
Input: WHAT IS THE IRON GIANT, PLEASE STAND BY, PIZZA
Matched: WHAT IS THE *, *, *
New AIML: WHAT IS THE IRON *, *, *
In this case the algorithm extended the pattern by one word, but since the input still contained more words, the program made the new AIML pattern end with the wildcard *. This new pattern is not quite perfect, because the question WHAT IS THE IRON * is too general. So, the botmaster may choose to edit the display so that it shows:
New AIML: WHAT IS THE IRON GIANT, *, *
Then write a template, and save the result.
(5) In this example, the botmaster sees the display:
Input: OH, SEE YOU LATER, YOU
Matched: OH, *, *
New AIML: OH, *, *
This information is almost useless, so the botmaster discards the target with the Delete target button. This ensures that the target will not reappear when browsing with Next target.
(6) Now suppose the value of <that> is TELL ME, and the topic is A RIDDLE. The client enters "I COULD POSSIBLY GIVE YOU A HINT". The targeting display shows:
Input: I COULD POSSIBLY GIVE YOU A HINT, TELL ME, A RIDDLE
Matched: I COULD *, *, *
New AIML: I COULD POSSIBLY *, *, *
The botmaster recognizes that the word "POSSIBLY" plays little role in resolving the client's meaning, so s/he uses the Reduce button to create the template:
<srai>I COULD <star/></srai>
The effect of the new category is to eliminate the word "POSSIBLY" from all inputs beginning with "I COULD POSSIBLY".
The Categories tab displays a ranked histogram of all the activated categories. The most frequently activated category is ranked first, followed by the second, and so on. You can use the scroll bar to browse the categories once there are too many to display on the screen.
By selecting one of the categories, you can see all of the inputs
activating that category on the Inputs tab.
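The bookkeeping behind the Categories and Inputs tabs can be sketched as a map from each activated category to the inputs that activated it. This is an assumption about one reasonable way to do it; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of activation bookkeeping; not the Program dB source.
final class CategoryHistogram {
    private final Map<String, List<String>> inputsByCategory = new LinkedHashMap<>();

    // Records that the given input activated the given category.
    void recordActivation(String category, String input) {
        inputsByCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(input);
    }

    // Categories ordered from most to least frequently activated.
    List<String> ranked() {
        List<String> categories = new ArrayList<>(inputsByCategory.keySet());
        categories.sort((a, b) -> Integer.compare(
                inputsByCategory.get(b).size(), inputsByCategory.get(a).size()));
        return categories;
    }

    // All inputs that activated one category (what the Inputs tab lists).
    List<String> inputsFor(String category) {
        List<String> inputs = inputsByCategory.get(category);
        return inputs == null ? Collections.<String>emptyList() : inputs;
    }
}
```

Here a category is identified by a single string such as "* : * : *"; ties in the ranking keep their first-activated order because the sort is stable.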
The Inputs tab displays all the inputs which activated a particular category. By default, it shows the data for the most-activated category (usually * : * : *). You can change the default by selecting one of the categories on the Categories tab.
By selecting one of the input samples, you can create a new target. The new target is displayed on the Target panel.
The Spiral tab is the least developed and most experimental portion of
Program dB. The purpose of the Spiral tab is to display the AIML categories in a spiral plot. Various options exist to change the spiral graph parameters.
Most bot predicates in
are set by the <srai>BOT XXX</srai> convention. You can find them by looking in the file brain\B.aiml.
Program dB, however, has a number of unconventional built-in bot predicates. These predicates are hard-wired in
so any AIML using them should not be considered "portable".
<bot name="vocabulary"/>: Returns the total number of words in the pattern vocabulary. The words in the templates are not counted.
<bot name="nclients"/>: The number of clients mapped in the Classifier.
<bot name="spt"/>: Server Processing Time: an estimate of the time to process each client transaction.
<bot name="hourlyqueries"/>: Hourly Queries: an estimate of the maximum number of queries per hour on this server.
<bot name="memory"/>: Memory: total RAM utilization of program dB.
<bot name="os"/>: Operating System: the name of the underlying operating system.
<bot name="arch"/>: Architecture: the name of the underlying CPU processor architecture.
The A.L.I.C.E. brain has a special test category with the pattern "BOT PROPERTIES" that will display the value of all the bot predicates.
The file data/predicates.data contains all the custom AIML predicates and their default values.
Program dB treats almost all predicates as custom predicates, even <that> and <topic>. The default value of <that> is set by the line:
that=What can I call you
Program dB uses the predicate=*default convention to specify those predicates which are intended to return the predicate name, rather than the value between the tags, when they are set. Some examples are:
This says that <set name="he"> returns "he", and the default value of <get name="he"/> is also "he".
Some predicates have intentionally "blank" values as their defaults:
The A.L.I.C.E. brain contains a special test category with the pattern "WHAT DO YOU KNOW ABOUT ME", which should display the values of all the custom predicates.
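To make the name=value and predicate=*default conventions concrete, here is a minimal parsing sketch. It assumes one predicate per line and that a leading * on the value marks a predicate that returns its own name when set. The class name, field names, and the sample lines other than the <that> default are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of reading predicate defaults; the file format is assumed.
final class PredicateDefaults {
    final Map<String, String> defaults = new LinkedHashMap<>();
    // Predicates whose <set> is assumed to return the predicate name.
    final Set<String> returnsName = new HashSet<>();

    static PredicateDefaults parse(List<String> lines) {
        PredicateDefaults result = new PredicateDefaults();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) continue; // skip lines that are not name=value
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (value.startsWith("*")) { // assumed marker for "return the name"
                result.returnsName.add(name);
                value = value.substring(1);
            }
            result.defaults.put(name, value); // an empty value is a blank default
        }
        return result;
    }
}
```

Under these assumptions, a line such as he=*he would give the predicate "he" the default value "he" and mark it as returning its name when set.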
The file data/globals.data contains a few global parameters and their default values:
You can change any of these values by editing the file and
ClerkTimeout is used by the web server to determine how long the program waits for a remote client to respond to a TCP request, before assuming that the client is no longer active. The value is in milliseconds, and 10 seconds is more than sufficient even during times of network congestion.
ServerPort is the TCP port number used by the web server. By convention Alicebot programs have "squatted" on the well-known port number 2001 since 1998, although strictly speaking the number 2001 has already been reserved by a lesser-known service.
Targeting is a boolean parameter to determine whether the targeting algorithm runs or not. The Loebner application overrides this value and sets it to false.
TargetSkip is the number of client inputs between each new generation of targets. If it is set to 1, then new targets are generated after every client input. On a busy server, this may be too frequent. Set it to 1 for demos and to higher values for multi-client situations.
BrainSize is an estimate of the total number of categories. The bot will reply with a special built-in default category, until the number of categories loaded is BrainSize. Set it to 0 (zero) if you want the bot to respond with its own AIML right away, even if the brain is not fully loaded. Be careful not to set this value to anything larger than the actual number of categories in your bot.
data/trailer.html: These files contain the HTML wrapped around the bot reply and input form.
data/gossip.data: This file logs all data recorded by the <gossip> tag.
dialog.data: The log file for conversations is stored in the bot home directory, not the data directory.
src/cc: A shell script with a command to recompile all the Java source files.
src/wrap: A shell script with a command to wrap all the Java class files into lib/alicebot.jar
data/gnu.data: The GNU General Public License.
The location of the data directory may be changed by specifying the directory as the first argument when running the program (see
Download and Run).
The protocol of interaction with the Loebner interface is specified by this appendix. The Loebner interface has the unique property that both client inputs and robot replies are "multiline"--that is, they may contain multiple lines of text separated by carriage returns. To terminate an input and "send" it to the robot, the client or "judge" must enter carriage return twice.
Key Entry - Terminal Display
Each Computer Entry program (also "program") will operate in two modes: "set-up" and "contest". Upon initial execution each program will operate in set-up mode.
Set-up mode will permit contest officials to prepare the program for the contest. To that end, while in set-up mode the program will:
request a file name to append the transcripts of the interaction to;
assume that "judge00" is at the terminal and prepare an initial comment;
prompt with a sign "+" requesting a response from "judge00" (contest official);
interact with "judge00" in the normal way with the sole difference that the "+" sign is used as the prompt;
write a transcript to the specified file as usual.
Upon receipt of the string "@@T"[CR][CR] the program will toggle to "contest" mode and behave as follows:
The key sequence "@","@","T",[CR][CR] when entered in response to a ">" or "+" prompt terminates the conversation with the current Judge, and will cause the program to clear the screen, prompt with a "?" and await the next Judge.
The key sequence "@","@","nn",[CR][CR] (nn=01-99) when entered in response to a ">" or "?" prompt will indicate that a new Judge, "JUDGEnn", is now entering data. Each Judge will thus identify himself/herself when moving to a new terminal.
The program will make an initial response to the Judge's input.
The program's response may be multiline.
Upon completion of the program's response, the program will prompt the Judge for input.
The computer will prompt the Judge with a ">" character and then echo on the screen, character by character, the Judge's entry.
Judges' questions and comments can be multiline. Each question or comment will be entered one line at a time. Each line will be terminated by a carriage return ([CR]). Judges will key in questions and comments in response to a ">" prompt from the program.
Entry of two consecutive carriage returns will indicate that the Judge's question or comment is complete and that the terminal (program or human) must respond.
The terminal will display character data in a monotype font (all characters of equal width).
The key sequence "@","@","X",[CR][CR] when entered in response to a ">" or "?" prompt will cause the program to exit.
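The input side of this protocol can be sketched in two small steps: read lines until an empty line (the second [CR]) arrives, then decide whether the completed entry is a control sequence or ordinary chat. The class name, method names, and the returned labels are hypothetical; prompts and screen handling are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of the Loebner input loop; not the contest program itself.
final class LoebnerInput {
    // Reads one multiline entry, terminated by an empty line.
    // Returns null when the stream ends with no pending input.
    static String readTurn(BufferedReader in) {
        try {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                if (sb.length() > 0) sb.append('\n');
                sb.append(line);
            }
            return (line == null && sb.length() == 0) ? null : sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Classifies a completed entry as a control sequence or ordinary chat.
    static String classify(String turn) {
        if (turn.equals("@@T")) return "TERMINATE";   // end conversation, await next Judge
        if (turn.equals("@@X")) return "EXIT";        // exit the program
        if (turn.matches("@@\\d\\d") && !turn.equals("@@00")) {
            return "JUDGE" + turn.substring(2);       // new Judge nn, 01-99
        }
        return "CHAT";
    }
}
```

A caller would loop on readTurn, print the appropriate prompt before each call, and pass every "CHAT" entry to the bot.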
EXAMPLE: Assume program is responding to current interaction with Judge 3.
[some program comment-completed]
>@@04 [CR](new Judge, number 4)
> [CR](two lines required)
Welcome Judge 4 (Comment by program (or human))
>Do you think that the [CR] (multiline question from Judge 4)
>Republicans can succeed [CR] (question continued line 2)
>in winning the White House? [CR] (question ended, first [CR])
>[CR] (second [CR])
Only if Newt succeeds in (answer, line 1)
Developing a more tolerant image. (answer, line 2)
> (Cursor waits for input at ">" prompt - Judge to respond)
Data File Format
Intent: Each program entered in the Loebner Prize Contest will produce a text file transcript of the interactions with Judges. The file should be readable by standard text-reader programs.
Each Computer Entry in the 2001 Loebner Prize Contest will append to a text file on disk containing the transcript of keyboard input and program output. The program should not discard the existing contents of the file; it should open the file in append mode (the file should be created automatically if it does not exist yet).
The file will be named according to the input during set-up operation.
The file will be in ASCII text format suitable for input into a standard word processing program.
The first three lines are headers containing the following:
(c)2001 Science Museum, London, all rights reserved
[Program Name] [Contestant Name]
Start at: [YYYY/MM/DD HH:MM:SS]
Each succeeding line will either mirror one line displayed on the screen, preceded by the source ("JUDGEnn" or "PROGRAM") and time in brackets, or indicate a change of Judges: "***JUDGEnn***".
EXAMPLE: For the above interaction:
PROGRAM[14:12:25] Welcome judge 4
JUDGE04[14:12:32] Do you think that the
JUDGE04[14:12:39] Republicans can succeed
JUDGE04[14:12:55] in winning the White House?
PROGRAM[14:13:15] Only if Newt succeeds in
PROGRAM[14:13:17] developing a more tolerant image
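The transcript lines shown above can be produced with a small formatter, and the file opened in append mode as the format requires. This is a sketch only: the class and method names are hypothetical, and the three header lines are not handled.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.time.LocalTime;

// Hypothetical sketch of transcript output; not the contest program itself.
final class TranscriptLog {
    // Formats one transcript line: source, time in brackets, then the text.
    static String format(String source, LocalTime time, String text) {
        return String.format("%s[%02d:%02d:%02d] %s",
                source, time.getHour(), time.getMinute(), time.getSecond(), text);
    }

    // Formats the marker line that records a change of Judges.
    static String judgeChange(int judgeNumber) {
        return String.format("***JUDGE%02d***", judgeNumber);
    }

    // Appends one line; the file is created if it does not exist yet,
    // and existing contents are preserved.
    static void append(String fileName, String line) {
        try (PrintWriter out = new PrintWriter(new FileWriter(fileName, true))) {
            out.println(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing true as the second argument to FileWriter is what selects append mode.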