A FORMAL SPECIFICATION OF AIML IN SETL (Working Draft 0.01 February 14, 2003) Authors: Dr. Richard S. Wallace Dr. Doubly Aimless First, Twenty-six SETL basics: A. SETL (Set Language) invented in 1969 by Jack Schwartz, a mathematician. B. SETL is based on well-known principles of set theory and math. logic. C. David Bacon is the caretaker of the language today. D. {} denotes a set. Sets contain only instance of each member item. E. [] is a tuple, the SETL equivalent of an array, queue, list or stack. F. # means "size of" or "length of". -- is for comments. G. om means "the undefined". From the Greek letter Omega. H. Strings can be either ' or " delimeted. I. Sets and tuples (and numbers, chars, strings and om) are the only data types or "objects" in SETL. J. val := T(3); means "assign the third element of tuple T to val." K. Tuple variabes may be assigned all at once, [x, y, z] := [A, B, C]; L. [x, y] := [y, x] means "swap the values of x and y". M. [1..n] means the n-tuple of digits from 1 to n. N. T(n..) means the sequence of elements in tuple T, beginning w/ nth. O. S := S with e; is the same as S := S + {e}; (+ is set union here). P. {x : x in S | P(x)} means "the set of all x, where x is a member of the set S such that predicate P(x) is true". The vbar "|" is "such that". The colon ":" also means "such that", but it actually constructs the set. Q. +/S means apply the operator + to the set of items in tuple or set S. R. A special case of a Set is the Map, or set of pairs (2-tuples). S. If M is a map then M(x) is the notation to "set" and "get" the value of M at the value x, as in z := M(x); or M(x) := z; T. domain M is the domain of M, and range M is the range (both are sets). The Map is from the domain set to the range set. U. if M is a multi-map (some x in domain M maps to multiple y's in range ) then M{x} refers to the set of all map values for x. V. getfile(f) returns the entire contents of f as one big string. W. unstr is a very intelligent mechanism for parsing strings in data. The unstr operator dynamically determines the data type from the string format. X. SETL uses regular expression matching unless magic = false. Y. Common typo errors: Using = instead of := and forgetting ;. Z. Documentation last seen at http://birch.eecs.lehigh.edu/~bacon WHY SETL? We would prefer to write an algorithm specification with pseudo-code, i.e. a combination of math, logic, English, and Algol syntax. But we can't easily transmit mathematical symbols through email, so a language like SETL closely approximates the result we desire. You could take the SETL program, replace the keyword "exists" with an upside-down "E", replace "+/" with the Greek letter capital Sigma, "in" with the set member symbol whatchamacallit, etc., and we would have pretty much the same thing. But SETL allows us to easily communicate concepts like "set membership" using only ASCII text charachters in email. I might as well write an automatic SETL-to-pseudocode translator for publication purposes, and never tell anyone I wrote the pseudo code in SETL! My exercise in writing SETL code is one person's attempt to write a formal specification for AIML. It is definitely not a standard approved or in any endorsed by the Foundation. I may propose something to the ArchComm later on, but for now it remains a research effort. There is no agreement among the experts about what constitutes a formal program specification. In academic journals, you often see algorithms expressed in pseudo-code, usually a mix of logic, set theory, English and Algol. SETL is the closest programming language (that I know of) to this idealized pseudo-code. The inventors and developers of SETL were highly influenced by the reductionist notion that all computations reduce to axiomatic set theory. The majority of the world of computer science went off and reinvented a number of mathematical wheels, and a few mathematically minded programmers said wait a minute, there is a simpler way. Take record fields for example. The record field is a special case of a Map. A "data type" is really just a predicate that is true if the data fits a particular format. Even humble concepts like list, stack, and queue reduce to formal definitions based on sets and maps. "Objects" and "Classes" are an almost embarassing redefinition of classic terms. Okay, I will get off my soapbox now. The point is, my SETL program is just one way to write a mathematical specification for AIML. At the very least, it will give us concrete code fragments to argue about. If someone wants to add feature F to AIML, we can nail down precisely, in a mathematical language, what F means, or argue that F means this, or that specific thing. David Bacon calls SETL the world's most luxurious programming language, and I have to agree. It's like riding to work in a limo, having a Sparcstation, and writing SETL code, compared to taking the bus, having a PC, and writing Visual C++ code. The amount of code required to implement the entire AIML standard is less than 1000 lines, much smaller than Java, and smaller even than Lisp. SETL hides all of the boring details of the implementation so that you are free to concentrate on the "high level" problem solving. Ok now I am really off the soapbox. The short answer to "why" is minimalism, reductionism, simplicity, elegance, beauty and luxury. It is also incredibly useful as a model for discussing the semantics of , / with defaults, whether "that" changes in an , how the star bindings change in an etc. If we can define what we mean using mathematical symbols, whether in SETL or pseudocode, then there will be no disagreement about what we mean. We can't deprecate the meaning of "Set" or "Map." A final idea about SETL: I see SETL, Lisp and Prolog (and maybe AIML) as three (or 4) points in a space of possible logic or specification languages. These are all cool languages in their own way. But no one, as far as I know, has yet found the perfect point in that space. Of SETL, Lisp and Prolog, SETL is the only one with a classic block structured Algol-style syntax familiar to programmers of Pascal and C. So, you should be grateful I am not pushing AIML specifications in Lisp! I have been working on a formal mathematical specification for AIML. I used the SETL language to write a program, M, that implements Graphmaster, matching, Template parsing, Substitutions, Autochat, Loebner contest interface, and Targeting, all in under 1000 lines of code. The source code is, as customary for us, free under the terms of the GPL. You can also try to find a SETL interpeter and actually run it. Beware, the only known implementation is Unix-only! See http://birch.eecs.lehigh.edu/~bacon. I actually considered releasing portions of program M and telling everyone it was "pseudo-code" rather than an actual program in a real language. The reason is, SETL is very close to a precise logical specification language. The point of this exercise is not to release a new, reliable, scalable, efficient AIML interpreter, but rather to try to pin down the precise meanings of each AIML tag. FORMAL SPECIFICATION OF AIML MultiLineResponse(client, input, Func): This is the top-level function for replying to a client input. The input is a string consisting of one or more sentences. The function Func is a routine for displaying, formatting or sending the replies. MultilineResponse splits the input into sentences, and obtains an answer for each one. Each answer may contain one or more sentences. The function Answer() processes input sentences one at a time. The replies depend on the state of the robot. The state is determined by the sequence of previous replies, but it does not depend on whether the replies resulted from one query or several, as long as the sequence of input sentences is the same. For example, the robot does not see any difference between these two sequences: (1) C: Hello. R: Hi there! How are you? C: My name is Rich. What are you? R: Rich, are you my master? I am the latest result in A.I. (2) C: Hello. My name is Rich. What are you? R: Hi there! How are you? Rich, are you my master? I am the latest result in A.I. In either case, the history of "input" and "that" is exactly the same sequences of values: that = [['I am the latest result in A.I'],['Rich, are you my master'], ['Hi there', 'How are you']] input = ['what are you', 'My name is Rich','Hello'] At present, AIML has no defined way of indexing the top-level multiline inputs. We may add this feature in the future. For now, the input sequence is stored in a one-dimensional array or table. The AIML code refers to the Nth element of the table. The output that values are stored in a two-dimensional array or table. The AIML code refers to the Nth element of the Mth term in the "that" table. Cross reference this code with the following AIML Parser code fragment dealing with and : ... inputs := Predicates(client, 'inputs'); thats := Predicates(client, 'thats'); ... elseif exists k in [1..DialogDepth] | match(remainder, '') /= '' then result := inputs(k); --- get answer from inputs table. elseif exists k in [1..DialogDepth] | exists m in [1..DialogDepth] | match(remainder, '') /= '' then result := thats(k)(m); --- get answer from thats table. ... The constant DialogDepth is the limit on the history of the remembered conversation. Note: match(X, Y) /= '' means Y starts with X. The term remainder stands for the remainder of the selected template. Here is the specification for MultiLineResponse. The arguments are the client id, his input, and a function to display or transmit the result. This code is equivalent to the Responders in programs B and D. proc MultiLineResponse(client, input, Func); multiline := Sentenceai(input); -- Split the input into sentences. for sentence in multiline loop; -- Reply to each sentence [reply, path] := Answer(sentence, 'localhost'); call (Func, sentence, reply); -- Do something with the response. end loop; end proc; Here are two sample display functions for MultiLineResponse: proc Lprint(x, y); print(prompt+' '+y); end proc; -- w/prompt proc Mprint(x, y); print(y); end proc; -- no promt. The Answer routine provides the robot reply to a client sentence or query. The input is a single sentence. The output reply may consist of one or more sentences. Answer() also takes care of updating the stored histories of "that" and "topic". Essentially the program stores a queue of the last DialogDepth inputs and replies. proc Answer(input, client); stars := []; depth := 0; --- Initialize environment environ := [client, stars, matchpath, depth]; -- Matching environment. [reply, path] := Srai(input, environ); -- Match and evaluate template. inputs := Predicates(client, 'inputs'); -- Get input history. inputs := [input]+inputs; -- Prepend new input to history. inputs := inputs(1..DialogDepth); -- Discard oldest input from history. thats := Predicates(client, 'thats'); -- Get that history. thats := [Sentenceai(reply)]+thats; -- Prepend the reply to that history. thats := thats(1..DialogDepth); -- Discard oldest that from history. Predicates(client, 'inputs') := inputs; -- Save input history. Predicates(client, 'thats') := thats; -- Save that history. return([reply, path]); end proc; Note: the function Srai(input, environ) is the function of the AIML tag. The environment variable environ stores information needed by the template parser to evaluate the AIML template. One way to test the functionality of the "that" and "input" history is to use them to display a short dialog same. The Dialogai function displays the last few lines of dialog. It uses the stored values of "input" and "that" proc Dialogai(client); thats := Predicates(client, 'thats'); inputs := Predicates(client, 'inputs'); for i in [#thats-i+1 : i in [1..#thats]] loop; print('C: '+inputs(i)+"."); --- The ith input goes with sentences := thats(i); --- the ith "that" response. for x in sentences loop; print('R: '+x+'.'); end loop; end loop; flush(1); end proc; Here is my proposed semantic definition of in pseudo-code (a.k.a. SETL) --- Srai:- "symbolic reduction" in AIML, also known as "syntactic rewrite", --- "simple recursion", "stimulus-response" or "synonym resolution." proc Srai(x, environ); [client, stars, matchpath, depth] := environ; topic := Normalize(Predicates(client, 'topic')); thats := Predicates(client, 'thats'); -- Thats may be stored as an array in the Predicates if thats(1) /= om and #thats(1) > 0 then that := Normalize(thats(1)(#thats(1))); else that := 'Undefined'; end if; x := Response(environ, x); -- First, parse and process any AIML tags inside x. x := Normalize(x); -- Then, normalize the result. if x = om or x = '' then return ['You said nothing','']; end if; input := x; path := split(x+" that "+that+" topic "+topic, ' '); -- construct new input path stars := []; matchpath := ''; result := [template, stars, matchpath]; Matchai(ROOT, path, 1, result); --- Find the new matching category [template, stars, matchpath] := result; environ := [client, stars, matchpath, depth+1]; reply := Response(environ, template); --- Evaluate the template of the matching category. if reply = '' then reply := 'I said nothing.'; end if; reply ?:= 'I said OM.'; reply := Formalize(reply); ---- Make a pretty sentence out of the reply. return [reply, matchpath]; ---- Return the reply and the matching graphmaster path. end proc; As you can see, if in x the text x contains no AIML markup, then the Srai pseudo-code will capitalize x. Therefore, there is no need for the botmaster to capitalize it. The A.L.I.C.E. brain has examples of both forms, but I prefer the capitalized style because I find it easier to read. GRAPHMASTER MATCHING I suppose I should call these "proposed specs". --- Matchai: Find the matching category for a client input --- by searching the Graphmaster graph. -- The argument binding is a vector of three elements: -- The template, an array of star bindings, and the maching path. -- For example: -- result := [template, stars, matchpath]; -- Matchai(root, path, 1, result); -- places the detected template, stars and binding in the -- 3-tuple called result. --- --- The simplest stripped-down version of AIML pattern matching --- returns only true or false, indicating whether a match exists. --- The map G stores the graph. See AddCategory --- --- proc Matchai(node, path, index) --- if index > #path then --- if G(node, 'template') /= om then return true; --- else return false; --- end if; --- end if; --- word := path(index); --- if G(node, 'under') /= om and --- exists i in [index+1..#path+1] | Matchai(G(node, 'under'), path, i) --- then return true; --- elseif G(node, 'star') /= om and --- exists i in [index+1..#path+1] | Matchai(G(node, star), path, i) --- then return true; --- else return false; --- end if; --- end proc; --- The next step is to add a binding environment for the --- template, wildcard values, and matching node. --- --- proc Matchai(node, path, index, rw binding); -- binding resides at a --- [template, stars, matchnode] := binding; -- specific memory address. --- if index > #path then --- if G(node, 'template') /= om then --- template := G(node, 'template'); --- matchnode := node; --- binding := [template, [], matchnode]; --- return(true); --- else return(false); --- end if; --- end if; --- word := path(index); --- if G(node, 'under') /= om and --- exists i in [index+1..#path+1] | Matchai(G(node, 'under'), path, i, binding) --- then --- phrase := flatten(path, index, i-1); --- stars := [phrase]+stars; --- binding := [binding(1), stars, binding(3)]; --- return true; --- elseif G(node, 'word') /= om and --- Matchai(G(node, 'word'), path, index+1, binding) --- then --- return true; --- elseif G(node, 'star') /= om and --- exists i in [index+1..#path+1] | --- Matchai(G(node, star), path, i, binding) --- then --- phrase := flatten(path, index, i-1); --- stars := [phrase]+binding(2); --- binding := [binding(1), stars, binding(3)]; --- return true; --- else return false; --- end if; --- --- --- The final implementation uses the filesystem to store --- the graph. proc Matchai(node, path, index, rw binding); [template, stars, matchpath] := binding; if index > #path then if fexists(node+'/template') then template := getfile(node+'/template'); matchpath := node; template := Trimai(template); binding := [template, [], matchpath]; return(true); else return(false); end if; end if; word := path(index); if fexists(node+'/under') and exists i in [index+1..#path+1] | Matchai(node+'/under', path, i, binding) then phrase := flatten(path, index, i-1); stars := [phrase]+stars; binding := [binding(1), stars, binding(3)]; return true; elseif fexists(node+'/'+word) and Matchai(node+'/'+word, path, index+1, binding) then return true; elseif fexists(node+'/star') and exists i in [index+1..#path+1] | Matchai(node+'/star', path, i, binding) then phrase := flatten(path, index, i-1); stars := [phrase]+binding(2); binding := [binding(1), stars, binding(3)]; return true; else return false; end if; end proc;