
Don't Read Me:
Program dB

Dr. Richard S. Wallace

Last revised 28 October 2001


About This Document

This document describes the organization and use of Program dB, an experimental Alicebot engine that won the Loebner Prize in October 2001. Ideally this document will be obsolete shortly, as the new features of Program dB are integrated into the reference implementation and other Alicebot engines. Program dB is meant to be little more than a platform for conducting experiments, not a piece of production quality software.

Table of Contents

  1. What is Program dB?

  2. What is NOT in Program dB

  3. Download and run

  4. How does Targeting work?

  5. How are targets generated?

  6. Some examples

  7. The Categories tab

  8. The Inputs tab

  9. The Spiral tab

  10. Special bot predicates

  11. Custom predicates

  12. Globals

  13. Other files

Appendix A. Loebner Contest 2001 Data Entry Protocol and File Format

1. What is Program dB?

Program dB is an experimental Alicebot engine and server that includes code from both Program D and Program B. This is the source code of the program that ran at the Loebner contest held at the London Science Museum on 13 October 2001, where A.L.I.C.E. won her second Loebner Prize. Program B was the original Java version of the Alicebot engine and server. When Jon Baer took over Java development in 2001, and the A.L.I.C.E. AI Foundation subsequently sponsored the development of the AIML 1.0 reference server, these efforts came to be known as "Program D". ("Program C" refers to the collection of AIML programs written in C/C++.)

Program dB contains all the basic elements of an Alicebot engine and server:

Program dB also includes a special "Targeting" graphical user interface.

In this context, "targeting" refers to the automatic detection of new patterns for which the robot does not already have specific answers. The proposed new pattern, together with its that and topic patterns, is called a "target category" or simply a "target". The botmaster may supply the answer, thereby transforming the target into a new AIML category.

The name "dB" was also selected in honor of David Bacon, an early contributor to A.L.I.C.E. and AIML, whose signature is "dB".

2. What is NOT in Program dB

The Program dB GUI contains no chat interface. The program does run a web server, however, so you can chat with the bot via http://localhost:2001.

The AIML parser is AIML 1.0 compliant -- sort of. That is, it works for a selected subset of AIML that appears in the content of ALICE's brain.

There are severe restrictions on the nesting of template-side AIML tags.

It does not work for "general" AIML expressions. We need to either scrap the parser or document its limitations better.

There is no "AIMLWatcher" function. The AIML files are loaded once at launch time only.

The nice new HTML responder developed by Kim Sullivan is not included, nor is the "SmartResponder" developed by Jon Baer.

There is no third-party code whatsoever: no Jetty, no SQL server, no ANT, etc. Program dB is 100% pure GNU General Public License free software. It may not have all the features, but it is all home grown code.

Program dB has no "long term memory". It forgets all the clients each time it is shut down. This is partly the result of having no database, but seems to improve the overall performance of the server. The program can run for days at a time and remember thousands of clients in RAM.

Finally, program dB is really two applications: the Targeting server and the Loebner contest text-based interface. Although the two share much code, their functionality is quite different. The Targeting server has a GUI, runs a web server, and can handle multiple clients. The Loebner interface is text-based, has no functions other than chat, and works with only one client (or "judge") at a time. The Loebner program does not have any targeting function. The Targeting program and the Loebner program have different log file formats too.

3. Download and run

Program dB is packaged in a single zip file. The best way to download it is to create a directory like

C:\Alicebot (Windows)

or

/home/alicebot (Linux)

and unzip the package there. Make this the "current working directory".

The zip file expands into four directories:

To run the targeting server on Linux:

java -classpath lib/alicebot.jar org.alicebot.gui.ALICE data

On Windows:

java -classpath lib\alicebot.jar org.alicebot.gui.ALICE data

To run the Loebner contest interface on Linux, use:

java -classpath lib/alicebot.jar org.alicebot.core.Loebner data

On Windows:

java -classpath lib\alicebot.jar org.alicebot.core.Loebner data

These launch commands may be placed in a shell script or batch file for easy startup later. (Examples are included in the download).

The "data" argument specifies the location of the data directory. If it is omitted, the current directory is used. Program dB needs this value to locate the global parameters, HTML template files, and gossip log file.

Program dB is smart enough to figure out if it is running on Windows or Unix. If Windows, it looks for the file brain\Main.aiml. If Unix, it uses brain\Startup.aiml.
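As a rough illustration of that platform check, here is a minimal sketch. It assumes the decision is based on the standard os.name system property, which this document does not confirm. The class and method names are hypothetical; only the two file names come from the text above.

```java
import java.io.File;
import java.util.Locale;

final class StartupFileChooser {
    // Hypothetical helper: maps an os.name value to the startup file name.
    // Assumption: anything that is not Windows is treated as Unix.
    static String startupFileName(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? "Main.aiml" : "Startup.aiml";
    }

    public static void main(String[] args) {
        String name = startupFileName(System.getProperty("os.name"));
        // File joins the directory and file name with the platform separator.
        System.out.println(new File("brain", name).getPath());
    }
}
```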

4. How does Targeting work?

Run the Targeting server and have a chat with the bot through http://localhost:2001. The program should launch with the targeting panel tab already selected. After the server has collected some dialogue, you can browse the targets with the Next Target button.

[Screenshot: Program dB Targeting Server]

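What is a target? For the purpose of this program, a target is a 9-tuple: the client input with its that and topic values; the matched pattern with its that and topic patterns; and the proposed new pattern with its that and topic patterns.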

The nine values appear in the top three rows of text boxes in the targeting panel.

The first six values are not editable, but the botmaster may edit the new input pattern, that pattern and topic pattern.

The botmaster may also supply a new template. When the template is complete, saving the category with the Save button appends the new category to the file brain/new.aiml.

The Delete target button removes the target from the current set of proposed targets.

Next target browses to the next available target.

If you try to save a target without editing the template, an "Empty Template!" warning appears and the category is not saved.
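The save step can be pictured with the sketch below. It is not Program dB's actual code: the class and method names are hypothetical, and the layout of the emitted category follows the general AIML 1.0 convention (a category with pattern, that, and template, wrapped in a topic element). It also ignores the enclosing aiml root element that a complete AIML file requires.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class CategorySaver {
    // Builds the text of one category. An empty template is rejected,
    // mirroring the "Empty Template!" warning described above.
    static String toAiml(String pattern, String that, String topic, String template) {
        if (template == null || template.trim().isEmpty()) {
            throw new IllegalArgumentException("Empty Template!");
        }
        return "<topic name=\"" + topic + "\">\n"
             + "<category>\n"
             + "<pattern>" + pattern + "</pattern>\n"
             + "<that>" + that + "</that>\n"
             + "<template>" + template.trim() + "</template>\n"
             + "</category>\n"
             + "</topic>\n";
    }

    // Appends the text to the given file, creating the file if needed.
    static void append(Path file, String aiml) throws IOException {
        Files.write(file, aiml.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) {
        System.out.print(toAiml("WHAT IS PIZZA", "*", "*",
                "Pizza is a delicious food made with dough, cheese and tomato sauce."));
    }
}
```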

Six editing buttons provide some shortcuts for writing the template:

  1. <think>: inserts

    <think>
      <set name="it">
        <set name="topic">
          <person/>
        </set>
      </set>
    </think>

    in the template.

  2. <random>: inserts the AIML template for a 3-element random selection.

  3. <sr/>: inserts <sr/>.

  4. <srai>: inserts <srai> </srai>.

  5. Reduce: reduces the length of the input by one word, by removing the last word before the wildcard. For example, YOU ARE VERY * reduces to <srai>YOU ARE <star/></srai>. This reduction is commonly applied to eliminate adverbs and other logically useless words.

  6. Clear: clears the template.
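The Reduce shortcut is simple enough to sketch in a few lines. This is an illustration only: the class name is hypothetical, and it assumes the pattern ends with a single trailing wildcard and has at least two words before it.

```java
final class ReduceButton {
    // Drops the last word before the trailing wildcard and wraps the
    // remainder in <srai>, with <star/> standing in for the wildcard.
    static String reduce(String pattern) {
        String[] words = pattern.trim().split("\\s+");
        int n = words.length;
        if (n < 3 || !words[n - 1].equals("*")) {
            throw new IllegalArgumentException(
                    "expected at least two words followed by a trailing *");
        }
        StringBuilder sb = new StringBuilder("<srai>");
        for (int i = 0; i < n - 2; i++) {
            sb.append(words[i]).append(' ');
        }
        return sb.append("<star/></srai>").toString();
    }

    public static void main(String[] args) {
        // prints <srai>YOU ARE <star/></srai>
        System.out.println(reduce("YOU ARE VERY *"));
    }
}
```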

5. How are targets generated?

The perfect targeting algorithm has not yet been developed. Meanwhile, we rely on heuristics to select targets from the activated categories.

The A.L.I.C.E. brain, at the time of this writing, contains 41,000 categories. In any given run of the server, however, typically only a few thousand of those categories are activated. Potentially, every activated category is a source of targets. If more than one input activated some category, then each of those inputs potentially forms a new target. The first step in targeting is to save all the activated categories and the inputs that activated them.

Program dB then applies six steps to each activated category to either accept it as a source of targets, or reject it:

  1. If the template contains a <srai>INTERJECTION</srai>, then the pattern was either YES, NO, SO, or one of a number of other interjections. Usually these cases are the client saying YES or NO to a question the robot asked. Here the botmaster may add a sensible response to the client's YES or NO reply, so this makes a good target.

  2. If the template contains a <srai>XFIND...</srai> then the pattern was either WHAT IS *, WHO IS *, WHERE IS * or some other similar default pattern for an information question. These are usually good, simple targets because the botmaster can often "look up" the answer in a dictionary or reference book, if s/he does not know the answer already.

  3. But if the template has any other <sr/> or <srai>, this category is probably not a good source for targets. Most cases of <srai> reduce complex forms of inputs into simpler ones. The pattern DO YOU KNOW WHAT * IS, for example, always reduces to <srai>WHAT IS <star/></srai>. It is usually better to look for targets in the terminal patterns like WHAT IS *, than to write many specific new patterns based on reducible patterns like DO YOU KNOW WHAT * IS.

  4. The A.L.I.C.E. brain uses a special category with the pattern XFIND * to respond to information questions for which she does not have a specific answer. The XFIND category by itself does not produce useful targets, so we disregard targets from this category, even though the pattern contains a wild card.

  5. Another default category of little use for targeting is the one with pattern CALL ME *. All of the name-telling categories, such as MY NAME IS *, MY REAL NAME IS *, and the one with <that>WHAT CAN I CALL YOU</that>, reduce to the CALL ME * category. Unless you are interested in writing special responses for many different people's names, the CALL ME * category is not a good source of targets.

  6. Finally, if none of the previous cases apply, the program considers whether the matched pattern contains a wildcard *. If the pattern is atomic, the category is not likely to be a good source of targets. It could be, if we considered <that>, as in the first case, or <topic>, but that would be too advanced for this algorithm. If the pattern contains a wildcard, then the category is likely a good source of targets.

    If the matched pattern contains a wildcard, the suggested new pattern is generated as follows:

    • Align the input sentence with the matched pattern.

    • Create a new pattern by using the words from the matched pattern, plus one more word from the input.
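The six acceptance steps above can be read as a small decision function. The sketch below is one possible reading, not the program's actual source: the class and method names are hypothetical, the checks are plain substring tests on the template text, and the steps are applied in the order listed.

```java
final class TargetSourceFilter {
    // Returns true if an activated category looks like a good source of targets.
    static boolean isTargetSource(String pattern, String template) {
        // Step 1: replies to YES, NO and other interjections are good targets.
        if (template.contains("<srai>INTERJECTION</srai>")) return true;
        // Step 2: default information questions (WHAT IS *, WHO IS *, ...).
        if (template.contains("<srai>XFIND")) return true;
        // Step 3: any other reduction is probably not worth targeting.
        if (template.contains("<sr/>") || template.contains("<srai>")) return false;
        // Steps 4 and 5: skip the XFIND * and CALL ME * default categories.
        if (pattern.equals("XFIND *") || pattern.equals("CALL ME *")) return false;
        // Step 6: otherwise accept only patterns that contain a wildcard.
        return pattern.contains("*");
    }

    public static void main(String[] args) {
        System.out.println(isTargetSource("WHAT IS *", "<srai>XFIND <star/></srai>"));
        System.out.println(isTargetSource("DO YOU KNOW WHAT * IS",
                "<srai>WHAT IS <star/></srai>"));
    }
}
```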

6. Some examples

(1) Suppose the topic is "READY" and the following conversation fragment takes place:

Robot: Oh
Client: WHAT IS PIZZA
Robot: I have to process that one for a while.

The input matched the category with <pattern>WHAT IS *</pattern>, generating a default response. The value of <that> is "OH". Browsing the targets, the botmaster sees:

Input: WHAT IS PIZZA, OH, READY
Matched: WHAT IS *, *, *
New AIML: WHAT IS PIZZA, *, *

The targeting algorithm extended the existing pattern by one word from the input, to obtain the new AIML pattern "WHAT IS PIZZA".
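The one-word extension can be sketched as follows. This is a simplified illustration with a hypothetical class name. It assumes normalized, whitespace-separated input and a matched pattern whose only wildcard is a trailing *. An atomic pattern is returned unchanged.

```java
final class PatternExtender {
    // Aligns the input with the matched pattern and borrows one more word
    // from the input. A trailing * is kept if the input has further words.
    static String extend(String input, String matched) {
        String[] in = input.trim().split("\\s+");
        String[] pat = matched.trim().split("\\s+");
        int prefix = pat.length - 1; // number of literal words before the *
        if (!pat[prefix].equals("*") || in.length <= prefix) {
            return matched; // atomic pattern, or no input word left to borrow
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < prefix; i++) {
            sb.append(pat[i]).append(' ');
        }
        sb.append(in[prefix]);
        if (in.length > prefix + 1) {
            sb.append(" *");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints WHAT IS PIZZA
        System.out.println(extend("WHAT IS PIZZA", "WHAT IS *"));
    }
}
```

With a longer input, such as an input of five words against a three-word prefix, the same function yields a new pattern that still ends in a wildcard.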

In the template window, the botmaster types:

Pizza is a delicious food made with dough, cheese and tomato sauce.

The botmaster may also click the <think> button to add the text

Pizza is a delicious food made with dough, cheese and tomato sauce.
<think>
  <set name="it">
    <set name="topic">
      <person/>
    </set>
  </set>
</think>

But s/he needs to edit the text to achieve the desired result of remembering the topic is "PIZZA":

Pizza is a delicious food made with dough, cheese and tomato sauce.
<think>
  <set name="it">
    <set name="topic">
      PIZZA
    </set>
  </set>
</think>

After entering this text in the <template> text area, the botmaster may save the new category with the Save Category button.

(2) In the second example, the client entered "WHAT IS A PIZZA", the topic is "PIZZA", and the value of <that> is "I HAVE TO PROCESS THAT ONE FOR A WHILE".

The Targeting display shows

Input: WHAT IS A PIZZA, I HAVE TO PROCESS THAT ONE FOR A WHILE, PIZZA
Matched: WHAT IS A *, *, *
New AIML: WHAT IS A PIZZA, *, *

Here the targeting algorithm again extended the existing pattern by one word taken from the input, creating the new atomic pattern "WHAT IS A PIZZA".

The botmaster now needs only to click the <srai> button and add some text to create the template

<srai>WHAT IS PIZZA</srai>

since we previously added that category.

(3) In the third example we start with the following dialogue fragment:

Client: My name is Rich.
Robot: Rich, are you my master?
Client: Yes.
Robot: Oh.

The topic is "READY". The botmaster views the display:

Input: YES, RICH ARE YOU MY MASTER, READY
Matched: YES, *, *
New AIML: YES, *, *

Here the crude targeting algorithm fails to create a new target pattern, but it has nevertheless detected a good new target. The botmaster edits the New AIML text boxes so that they contain:

New AIML: YES, * ARE YOU MY MASTER, *

Now s/he edits the template to read:

Okay, I will obey only you.

And then saves the new category. This example shows how targeting can be used to detect new <that> categories.

(4) In the next example, the client asks "WHAT IS THE IRON GIANT". The value of <that> is "PLEASE STAND BY", and the topic is "PIZZA".

The Targeting display shows:

Input: WHAT IS THE IRON GIANT, PLEASE STAND BY, PIZZA
Matched: WHAT IS THE *, *, *
New AIML: WHAT IS THE IRON *, *, *

In this case the algorithm extended the pattern by one word, but since the input still contained more words, the program made the new AIML pattern end with the wildcard *. This new pattern is not quite perfect, because the question WHAT IS THE IRON * is too general. So, the botmaster may choose to edit the display so that it shows:

New AIML: WHAT IS THE IRON GIANT, *, *

Then write a template, and save the result.

(5) In this example, the botmaster sees the display:

Input: OH, SEE YOU LATER, YOU
Matched: OH, *, *
New AIML: OH, *, *

This information is almost useless, so the botmaster discards the target with the Delete target button. This ensures that the target will not reappear when browsing with Next target.

(6) Now suppose the value of <that> is TELL ME, and the topic is A RIDDLE. The client enters "I COULD POSSIBLY GIVE YOU A HINT". The targeting display shows:

Input: I COULD POSSIBLY GIVE YOU A HINT, TELL ME, A RIDDLE
Matched: I COULD *, *, *
New AIML: I COULD POSSIBLY *, *, *

The botmaster recognizes that the word "POSSIBLY" plays little role in resolving the client's meaning, so s/he uses the Reduce button to create the template:

<srai>I COULD <star/></srai>

The effect of the new category is to eliminate the word "POSSIBLY" from all inputs beginning with "I COULD POSSIBLY".

7. The Categories tab

The Categories tab displays a ranked histogram of all the activated categories. The most frequently activated category is ranked first, followed by the second, and so on. You can use the scroll bar to browse the categories once there are too many to display on the screen.

By selecting one of the categories, you can see all of the inputs activating that category on the Inputs tab.
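A ranked histogram of this kind needs little more than a map from each category to the inputs that activated it. The sketch below is a guess at a workable data structure, not the program's own. The class name and the string key used to identify a category are hypothetical, and ties are broken alphabetically as an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ActivationHistogram {
    private final Map<String, List<String>> inputsByCategory = new HashMap<>();

    // Records that the given input activated the given category.
    void record(String category, String input) {
        inputsByCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(input);
    }

    // Categories ordered from most to least frequently activated.
    List<String> ranked() {
        List<String> keys = new ArrayList<>(inputsByCategory.keySet());
        keys.sort(Comparator
                .comparingInt((String k) -> inputsByCategory.get(k).size())
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return keys;
    }

    // All inputs that activated one category (what the Inputs tab would list).
    List<String> inputsFor(String category) {
        return inputsByCategory.getOrDefault(category, Collections.emptyList());
    }

    public static void main(String[] args) {
        ActivationHistogram h = new ActivationHistogram();
        h.record("*", "BLAH");
        h.record("*", "FOO BAR");
        h.record("WHAT IS *", "WHAT IS PIZZA");
        System.out.println(h.ranked());
    }
}
```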

8. The Inputs tab

The Inputs tab displays all the inputs which activated a particular category. By default, it shows the data for the most-activated category (usually * : * : *). You can change the default by selecting one of the categories on the Categories tab.

By selecting one of the input samples, you can create a new target. The new target is displayed on the Target panel.

9. The Spiral tab

The Spiral tab is the least developed and most experimental portion of Program dB. The purpose of the Spiral tab is to display the AIML categories in a spiral plot. Various options exist to change the spiral graph parameters.

10. Special bot predicates

Most bot predicates in Program dB are set by the <srai>BOT XXX</srai> convention. You can find them by looking in the file brain\B.aiml. Program dB however has a number of unconventional built-in bot predicates. These predicates are hard-wired in Program dB, so any AIML using them should not be considered "portable".

The A.L.I.C.E. brain has a special test category with the pattern "BOT PROPERTIES" that will display the value of all the bot predicates.

11. Custom predicates

The file data/predicates.data contains all the custom AIML predicates and their default values. Program dB treats almost all predicates as custom predicates, even <that> and <topic>. The default value of <that>, for example, is set by the line:

that=What can I call you

Program dB uses the predicate=*default convention to specify those predicates which are intended to return the predicate name, rather than the value between the tags, when they are set. Some examples are:

he=*he
her=*her
him=*him
ihr=*ihr
it=*it

This says that <set name="he"> returns "he", and the default value of <get name="he"/> is also "he".

Some predicates have intentionally "blank" values as their defaults:

password=
cat=
wife=
friend=

The A.L.I.C.E. brain contains a special test category with the pattern "WHAT DO YOU KNOW ABOUT ME", which should display the values of all the custom predicates.

12. Globals

The file data/globals.data contains a few global parameters and their default values:

ClerkTimeout=10000
ServerPort=2001
Targeting=true
TargetSkip=10
BrainSize=40000

You can change any of these values by editing the file and restarting Program dB.

ClerkTimeout is used by the web server to determine how long the program waits for a remote client to respond to a TCP request, before assuming that the client is no longer active. The value is in milliseconds, and 10 seconds is more than sufficient even during times of network congestion.

ServerPort is the TCP port number used by the web server. By convention Alicebot programs have "squatted" on the well-known port number 2001 since 1998, although strictly speaking the number 2001 has already been reserved by a lesser-known service.

Targeting is a boolean parameter to determine whether the targeting algorithm runs or not. The Loebner application overrides this value and sets it to false.

TargetSkip is the number of client inputs between each new generation of targets. If it is set to 1, then new targets are generated after every client input. On a busy server, this may be too frequent. Set it to 1 for demos and to higher values for multi-client situations.

BrainSize is an estimate of the total number of categories. The bot replies with a special built-in default category until the number of categories loaded reaches BrainSize. Set it to 0 (zero) if you want the bot to respond with its own AIML right away, even if the brain is not fully loaded. Be careful not to set this value to anything larger than the actual number of categories in your bot, or the bot will never leave the built-in default reply.
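Because the file uses plain key=value lines, it can be read with the standard java.util.Properties class. The sketch below shows one way to do that. It is not taken from the Program dB source: the class name, the fallback values (copied from the listing above), and the modulo rule used for TargetSkip are all assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class GlobalsSketch {
    final int clerkTimeout;   // milliseconds
    final int serverPort;
    final boolean targeting;
    final int targetSkip;
    final int brainSize;

    private GlobalsSketch(Properties p) {
        clerkTimeout = intOf(p, "ClerkTimeout", "10000");
        serverPort = intOf(p, "ServerPort", "2001");
        targeting = Boolean.parseBoolean(p.getProperty("Targeting", "true").trim());
        targetSkip = intOf(p, "TargetSkip", "10");
        brainSize = intOf(p, "BrainSize", "40000");
    }

    private static int intOf(Properties p, String key, String fallback) {
        return Integer.parseInt(p.getProperty(key, fallback).trim());
    }

    // Parses the text of a globals file; missing keys keep their fallback.
    static GlobalsSketch parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new GlobalsSketch(p);
    }

    // Assumed rule: regenerate targets on every targetSkip-th client input.
    boolean shouldGenerateTargets(long inputCount) {
        return targeting && targetSkip > 0 && inputCount % targetSkip == 0;
    }

    public static void main(String[] args) {
        GlobalsSketch g = parse("ServerPort=2001\nTargetSkip=1\n");
        System.out.println(g.serverPort + " " + g.shouldGenerateTargets(7));
    }
}
```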

13. Other files

The location of the data directory may be changed by specifying the directory as the first argument when running the program (see Download and Run).

Appendix A.
Loebner Contest 2001 Data Entry Protocol and File Format

The protocol of interaction with the Loebner interface is specified by this appendix. The Loebner interface has the unique property that both client inputs and robot replies are "multiline"--that is, they may contain multiple lines of text separated by carriage returns. To terminate an input and "send" it to the robot, the client or "judge" must enter carriage return twice.

Key Entry - Terminal Display

Each Computer Entry program (also "program") will operate in two modes: "set-up" and "contest". Upon initial execution each program will operate in set-up mode.

Set-up mode

Set-up mode will permit contest officials to prepare the program for the contest. To that end, while in set-up mode the program will:

  1. request a file name to append the transcripts of the interaction to;

  2. assume that "judge00" is at the terminal and prepare an initial comment;

  3. prompt with a sign "+" requesting a response from "judge00" (contest official);

  4. interact with "judge00" in the normal way with the sole difference that the "+" sign is used as the prompt;

  5. write a transcript to the specified file as usual.

Contest mode

Upon receipt of the string "@@T"[CR][CR] the program will toggle to "contest" mode and behave as follows:

  1. The key sequence "@","@","T",[CR][CR] when entered in response to a ">" or "+" prompt terminates the conversation with the current Judge, and will cause the program to clear the screen, prompt with a "?" and await the next Judge.

  2. The key sequence "@","@","nn",[CR][CR] (nn=01-99) when entered in response to a ">" or "?" prompt will indicate that a new Judge, "JUDGEnn", is now entering data. Each Judge will thus identify himself/herself when moving to a new terminal.

  3. The program will make an initial response to the Judge's input.

  4. The program's response may be multiline.

  5. Upon completion of the program's response, the program will prompt the Judge for input.

  6. The computer will prompt the Judge with a ">" character and then echo on the screen, character by character, the Judge's entry.

  7. Judges' questions and comments can be multiline. Each question or comment will be entered one line at a time. Each line will be terminated by a carriage return ([CR]). Judges will key in questions and comments in response to a ">" prompt from the program.

  8. Entry of two consecutive carriage returns will indicate that the Judge's question or comment is complete and that the terminal (program or human) must respond.

  9. The terminal will display character data in a monotype font (all characters of equal width).

  10. The key sequence "@","@","X",[CR][CR] when entered in response to a ">" or "?" prompt will cause the program to exit.

EXAMPLE: Assume program is responding to current interaction with Judge 3.

[some program comment-completed]
>@@04 [CR](new Judge, number 4)
> [CR](two lines required)
Welcome Judge 4 (Comment by program (or human))
>Do you think that the [CR] (multiline question from Judge 4)
>Republicans can succeed [CR] (question continued line 2)
>in winning the White House? [CR] (question ended, first [CR])
>[CR] (second [CR])
Only if Newt succeeds in (answer, line 1)
Developing a more tolerant image. (answer, line 2)
> (Cursor waits for input, ">" prompt - Judge to respond)
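The input side of this protocol can be sketched briefly: collect lines until an empty one arrives, then check whether the message is one of the control sequences. The code below is an illustration under stated assumptions, not the contest program itself. The class name and the returned labels are hypothetical, and each line is trimmed before use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class LoebnerInputSketch {
    // Reads one multiline message, ended by an empty line (the second [CR]).
    // Returns null when the stream ends with nothing left to deliver.
    static String readMessage(BufferedReader in) {
        StringBuilder message = new StringBuilder();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    return message.toString();
                }
                if (message.length() > 0) message.append('\n');
                message.append(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return message.length() > 0 ? message.toString() : null;
    }

    // Classifies a complete message as a control sequence or ordinary chat.
    static String classify(String message) {
        String m = message.trim();
        if (m.equals("@@T")) return "END_CONVERSATION";
        if (m.equals("@@X")) return "EXIT";
        if (m.matches("@@(0[1-9]|[1-9][0-9])")) return "JUDGE" + m.substring(2);
        return "CHAT";
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("@@04\n\nHello\n\n"));
        System.out.println(classify(readMessage(in))); // prints JUDGE04
        System.out.println(classify(readMessage(in))); // prints CHAT
    }
}
```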

Data File Format

Intent: Each program entered in the Loebner Prize Contest will produce a text file transcript of the interactions with Judges. The file should be readable by standard text-reader programs.

  1. Each Computer Entry in the 2001 Loebner Prize Contest will append to a text file on disk containing the transcript of keyboard input and program output. The program should not discard the existing contents of the file; it should open the file in append mode (the file should be created automatically if it does not exist yet).

  2. The file will be named according to the input during set-up operation.

  3. The file will be in ASCII text format suitable for input into a standard word processing program.

  4. The first three lines are headers containing the following:
    (c)2001 Science Museum, London, all rights reserved
    [Program Name] [Contestant Name]
    Start at: [YYYY/MM/DD HH:MM:SS]

  5. Each succeeding line will either mirror one line displayed on the screen, preceded by the source ("JUDGEnn" or "PROGRAM") and time in brackets, or indicate a change of Judges: "***JUDGEnn***".

    EXAMPLE: For the above interaction:

    ***JUDGE04***
    PROGRAM[14:12:25] Welcome judge 4
    JUDGE04[14:12:32] Do you think that the
    JUDGE04[14:12:39] Republicans can succeed
    JUDGE04[14:12:55] in winning the White House?
    PROGRAM[14:13:15] Only if Newt succeeds in
    PROGRAM[14:13:17] developing a more tolerant image
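The transcript lines above follow a fixed layout that is easy to generate with java.time. The sketch below is a hedged illustration: the class and method names are hypothetical, and it assumes the square brackets in the header description are placeholders rather than literal characters.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

final class TranscriptFormat {
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss");
    private static final DateTimeFormatter START =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

    // The three header lines written at the start of a transcript.
    static String header(String program, String contestant, LocalDateTime start) {
        return "(c)2001 Science Museum, London, all rights reserved\n"
             + program + " " + contestant + "\n"
             + "Start at: " + START.format(start) + "\n";
    }

    // Marker line written when a new Judge identifies himself or herself.
    static String judgeChange(int nn) {
        return String.format("***JUDGE%02d***", nn);
    }

    // One transcript line: source, time in brackets, then the screen text.
    static String line(String source, LocalTime time, String text) {
        return source + "[" + TIME.format(time) + "] " + text;
    }

    public static void main(String[] args) {
        System.out.println(judgeChange(4));
        // prints PROGRAM[14:12:25] Welcome judge 4
        System.out.println(line("PROGRAM", LocalTime.of(14, 12, 25), "Welcome judge 4"));
    }
}
```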
