Welcome to the A.L.I.C.E. AI Foundation

Promoting the adoption and development of Alicebot and AIML free software.

Pleased to Meet You;<br/>
Hope You Guess My Namespace

Noel Bush
10 May 2001

Is that a formatting mistake?

Noel Bush

No. It's supposed to be clever. It's there because the article is about namespaces, HTML, XHTML, AIML, various other acronyms LOL.

Who are you talking to? What?

I think this page is talking to itself.

Or maybe it's talking to someone else; someone you can't see; someone you aren't or aren't yet; someone just a blip farther ahead into the future.

This article is about a subject that generally induces one of two reactions: either something like vomiting, or something like drug-addled glee about the "revolution of new possibilities".

Okay, that's a little extreme. People who groan or retch when they hear the word "namespaces" generally have some solid reasons for being wary, and I didn't mean it when I implied (hypertextually) that the work on The Semantic Web has anything to do with drugs or is in some way over-hyped. Really. Not that that would discount it by any means. Whatever. Actually I want to be on the W3C Semantic Web committee. Email me if you know how I can sign up. Meantime, I want to tell you how I think it's possible to use Alicebots to achieve some fabulous things that will induce glee in a fair percentage of the population with web access, drug-addled or not.

Oops, [we] did it again

I didn't really take a look at AIML until about 2 months ago (March of 2001). Call me crazy, tell me I had some kind of strange blind spot -- both are true, because this has been around since 1995 in one form or another and I had plenty of reasons to be aware of what was going on. But I was too busy chasing my tail.

So here it is 2001, and for about 6 years there's been an "Artificial Intelligence Markup Language". It's XML-based, it's easy to learn, and it's more suggestive than the tasteless ads for Beate Uhse that greet you when you arrive at the airport in Frankfurt. (Fine, I'm a P.C. American. Excuuuuse me.)

What does it suggest? It suggests that the day when we may see "smart web pages" is closer than you may think. Especially if you weren't thinking about it. Especially if you were still thinking that creating web pages is basically a matter of learning a few HTML tags, or using some kind of "designer's tool" that will do it for you.

Haven't you heard? HTML is changing. We're all going to have to grow up a little, the W3C has informed us, and learn something called XHTML. No, that's not some kind of "adult content" HTML -- it's "eXtensible HTML". "Extensible" first because it's properly XML-compliant, and second because, well, you can extend it. (namespaces, namespaces, red rum, namespaces... who said that?)

Did you ever read one of those beginner's HTML guides from a few years ago? Some included a stern lecture about "logical" tags -- things like <author> and <address>, which you were supposed to use to mark items on your web page according to their content, and let the browser have its way with formatting them. "Don't use formatting tags to mark names of authors," they warned us. "Use <author>."

Did you listen? No, I don't think so, naughty child. Not unless you worked at a university where there was some draconian webmaster with enough time to enforce these kinds of rules. To the dismay of control freaks everywhere, "designers" and "the rest of us" took over, and started using "meaningless" formatting tags for everything. We also decided that it was more than enough to separate paragraphs with a single <p>, that <br> tags were a wonderful way to defeat the ugly formatting tricks that web browsers tried to play, and that in general it was not a big deal if we forgot to "balance tags" -- who needs all that anal-retentive stuff?

People who designed web browsers and web design tools capitulated -- or maybe they were the same people -- and the result was (is) that our web browsers now "forgive" our mistakes, incorporating a whole raft of error-correcting rules that allow perfectly sloppy HTML -- or in the case of the tools, generate the same. But hey -- we've got a big ol' web now, right?

Well, yes, but.... You can't search it. You can't re-use content very easily. You can't even manage the content very well. It's a nightmare. A giddy, drug-addled, revolutionary nightmare for some, perhaps, but a nightmare nonetheless.

The new breed of web content languages is trying to avoid previous mistakes, and to use those "lessons learned" that knowledge management companies are always telling us we need to pay lots of money to hear about. The lessons in this case are:

  • Okay, markup languages should be easy.
  • Yes, even simple HTML is a challenge for some people.
  • It would be really great if all of this content did have some structure to it. Search engines and spiders are just awful, no matter how cool you think Google's "I'm feeling lucky" button is.
  • But the price of making web content structured shouldn't be making it impossible for our talented designers to create pages. We paid money for this site, dammit!

Is that even coherent? Well, it's getting there.

Mark This Up, Buddy

So you like your FrontPage, your Dreamweaver, your MS-Word-generated HTML, huh? You don't give a snail's tail about whether Tim Berners-Lee wants your documents to be "well-formed", eh?

Well, I'm going to make you jealous soon. Because soon my web pages are going to be "smarter" than yours. Ha ha! No, I don't mean they're going to have cool Flash graphics or zany layered animations, I mean they're going to be "smart". You'll wish you learned to like namespaces! Ha ha!

I'm just teasing you. Really my attitude isn't so nasty. I want you to learn too. I want everyone to learn, and to help make this work. See, here's the idea: If you take half a day to learn what's new in XHTML, then you can start playing with the big girls and boys and make web pages that are well-formed. That in and of itself won't get you more than a star on your forehead from the W3C for the moment. But that plus AIML is going to, as my friend Marni used to say (do you still say that, Marni?), "rock your world".

The reason is: If you promise to use XHTML, then I promise to do everything I can (like beg and cry) to make sure that in the very near future you can embed AIML in your web pages. What's the point? Well, recall (or quickly go learn) that AIML includes a <learn> tag that lets you tell your Alicebot to load in some AIML from any web-addressable location. It looks like this:

<learn>DarkSecrets.aiml</learn>

...where "DarkSecrets.aiml" would be replaced by the name of whatever file contains the dark secrets I want my Alicebot to learn. Nowadays, these <learn> statements tend to be included in the "Startup.aiml" file that you include in your AIML directory, and they tend to point to files that are on your local filesystem. But there's no reason that they can't point to any URL, and there's no reason that they can't be included as something that your Alicebot does in the process of a conversation. For instance, security issues aside for the moment, it would (should) be no problem to do this:

<category>
    <pattern>PLEASE READ *</pattern>
    <template>
        Okay.
        <think><learn><star/></learn></think>
        I'm done. Now ask me anything! I'm a quiz show superstar!
    </template>
</category>

That means that your Alicebot is suddenly able to "learn" the contents of whatever file somebody tells it about. Maybe this isn't the approach you'd like to take -- after all, a George Bush bot's botmaster probably wouldn't want anyone to be able to come along and teach it, say, a whole bunch of Bushisms (http://slate.msn.com/Features/bushisms/bushisms.asp). More likely, though, you'd like your Alicebot to be able to easily incorporate knowledge from some "trusted sources", you'd like that to happen dynamically, and you'd even like it if those trusted sources didn't need to be running full-fledged Alicebots themselves, just so long as they were willing to put a few AIML tags in their documents.
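
For the "trusted sources" version, one way to sketch it is to hard-code the location instead of passing along whatever the user typed. This is only an illustration: the pattern wording and the example.com URL are made up, and it assumes that <learn> will accept a URL in the way described above.

```xml
<category>
    <!-- Hypothetical pattern: the bot only ever reads one fixed, trusted page. -->
    <pattern>PLEASE READ ABOUT THE WORM COMPANY</pattern>
    <template>
        Okay.
        <!-- Placeholder URL; no <star/> here, so the user cannot choose the source. -->
        <think><learn>http://www.example.com/about.html</learn></think>
        I'm done. Now ask me about the worm company!
    </template>
</category>
```

The only difference from the PLEASE READ * category is that the wildcard is gone, so nobody can come along and point the bot at a page the botmaster didn't pick.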

That might seem like a tall order, but consider this example:

<h1>About Our Company</h1>
<p>
    <aiml:category>
        <aiml:pattern>WHAT DOES YOUR COMPANY DO</aiml:pattern>
        <aiml:template>
        We sell genetically engineered worms that clean
        the sidewalk and communicate via microwave transmissions.
        </aiml:template>
    </aiml:category>
</p>

That's right, it's just a little snippet of HTML, with some AIML tags thrown in surrounding the page content that's already there. That "aiml:" prefix designates the AIML namespace, so go ahead and vomit if you'd like. But think of what this means. By dressing up any web page with a few AIML tags, you've suddenly got material that can be used by any Alicebot to speak intelligently about your page content, without -- and this is key -- without duplicating your content somewhere else.

Web browsers as a rule are supposed to ignore any tags they don't know, but at the same time they will render the contents of unknown tags. So the description of the company that's inside the <p> tag will still be displayed. The AIML tags themselves shouldn't change anything about the page's appearance, though the pattern text counts as content too and would show up unless a stylesheet hides it.
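
One detail the snippet leaves out: an XML parser will only accept the "aiml:" prefix if it has been bound to a namespace URI somewhere higher up in the document. Here is a minimal sketch of a whole page, assuming XHTML 1.0. The AIML namespace URI shown is a placeholder chosen for illustration, not an official one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well-formed XHTML plus AIML. It is not DTD-valid XHTML, because the
     XHTML DTDs know nothing about the aiml: elements. -->
<!-- ASSUMPTION: "http://example.org/aiml" is a stand-in namespace URI. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:aiml="http://example.org/aiml">
  <head>
    <title>About Our Company</title>
  </head>
  <body>
    <h1>About Our Company</h1>
    <p>
      <aiml:category>
        <aiml:pattern>WHAT DOES YOUR COMPANY DO</aiml:pattern>
        <aiml:template>
          We sell genetically engineered worms that clean
          the sidewalk and communicate via microwave transmissions.
        </aiml:template>
      </aiml:category>
    </p>
  </body>
</html>
```

The prefix itself is arbitrary. What identifies the tags as AIML is the URI it is bound to, so "aiml:" could just as well be "a:" as long as the declaration on the root element matches.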

But that's too haaaaard....

All right, all right. Maybe you don't like the idea of mucking up your pages even further in this manner. But permit me to riddle you this:

  • If your pages are in XHTML, that means you can generate and maintain them using the most current XML-based tools.
  • If you generate and maintain your pages using XML tools, you can separate your designers and your content people and stop them from fighting with each other. Your designers will create pretty templates, and your content people will write the content. Somebody else will be the go-between -- that's a somebody who knows XSLT, and I say if your webmaster doesn't know XSLT yet then tell him or her to march down to the Barnes and Noble immediately. Amazon's same-day delivery isn't even fast enough.
  • If you're still reading this, you probably believe (believe, believe) that you can use an XML/XSLT-based approach to make life a little easier for your webmaster. So if you want to have the benefit of embedded AIML, but you don't want your content people to have to see it all the time, you could invent something that will let your content people use a "smaller-footprint" custom tag, like "<answers question='WHAT DOES YOUR COMPANY DO'>" to surround said text above, which would then be easily expanded by a simple XSLT script into the full AIML when the page is generated.
  • What's more, since all us AIML pros (heh, heh) know that one pattern doesn't usually suffice to catch all inputs, you can use A.L.I.C.E.'s <srai> tag and the reductionism approach to create (or re-use) a collection of other categories that will give you more coverage. Those other categories might also be embedded (at XSLT generation time) in your web page, or they might be part of a standard set of AIML (hint, hint).
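
The "smaller-footprint" tag idea from the list above might look something like this in XSLT 1.0. Treat it as a sketch under assumptions: <answers> is the invented tag from the example, the AIML namespace URI is a placeholder, and the stylesheet matches on local name so that it works whether or not the page puts <answers> in the default XHTML namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- ASSUMPTIONS: the <answers> element is hypothetical, and
     "http://example.org/aiml" is a stand-in namespace URI. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:aiml="http://example.org/aiml">

  <!-- Identity template: copy everything that isn't handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Expand <answers question="..."> into a full AIML category.
       Matching on local-name() ignores whatever namespace the element is in. -->
  <xsl:template match="*[local-name()='answers']">
    <aiml:category>
      <aiml:pattern><xsl:value-of select="@question"/></aiml:pattern>
      <aiml:template>
        <!-- The wrapped page content becomes the bot's answer. -->
        <xsl:apply-templates select="node()"/>
      </aiml:template>
    </aiml:category>
  </xsl:template>

</xsl:stylesheet>
```

Fed a page containing <answers question='WHAT DOES YOUR COMPANY DO'> around the company description, this should emit the same text wrapped in aiml:category, aiml:pattern and aiml:template, with the rest of the page copied through untouched.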

We All Want To Change The World

Sure, so let's just convince everyone in the world to start embedding AIML, and let's be sure that A.L.I.C.E. actually works in the way I'm describing, and then we'll all be happy, right? Kind of a quick and dirty semantic web, until the real thing comes along.

Well, of course, I have no high hopes that webmasters around the world will suddenly start doing this, even if I cross-post to every newsgroup on Deja (oh, I mean "Google Groups"). But my approach is the same approach as anybody involved in free / open source software. Do it yourself! I mean, you do this for me! Just joking -- I'll work too. Come hell or high water, I'm determined to do what needs to be done to make this work, or at least convince some people who are more talented than me to make it happen.

Meet me back here in a few months. I hope to have some real "case studies" that will blow your mind. AIML Everywhere™. Suddenly Smart Surfing™. The key to this, I think, will be a set of utilities that webmasters can use to easily embed AIML here and there in their pages, and a standardized set of AIML categories that will provide a common set of patterns to match typical questions that web sites might want to answer about themselves. I'm ready to try to make it happen. If you are too, or just want to listen in, join the alicebot mailing list today. I promise the conversation will be invigorating. And usually my posts there aren't as long as this article.
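
To make the "standardized set" idea a bit more concrete, here is the kind of thing such a set might contain, using <srai> to reduce several phrasings to the one pattern a site actually answers. The alternate phrasings are just guesses at what people might type, not part of any existing AIML set.

```xml
<aiml>
  <!-- Each category rewrites one phrasing into the canonical question,
       which the site's own embedded category then answers. -->
  <category>
    <pattern>WHAT DOES YOUR COMPANY SELL</pattern>
    <template><srai>WHAT DOES YOUR COMPANY DO</srai></template>
  </category>
  <category>
    <pattern>WHAT IS YOUR BUSINESS</pattern>
    <template><srai>WHAT DOES YOUR COMPANY DO</srai></template>
  </category>
  <category>
    <pattern>TELL ME ABOUT YOUR COMPANY</pattern>
    <template><srai>WHAT DOES YOUR COMPANY DO</srai></template>
  </category>
</aiml>
```

A webmaster would then only need to embed the single WHAT DOES YOUR COMPANY DO category in the page, and let the shared set handle the variations.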

Best wishes,<br/> (oops)
Noel