I guess it depends on the level at which the xml needs to be interpreted: is it the bot's internal file-processing mechanism that needs to process the xml and extract the text-only part, which is then sent to the pattern matcher for further processing, or does the pattern matcher itself need to process the xml?
-In the first case, you need a well-defined xml file format so that it can be properly queried (hardcoded stuff; perhaps you could use xslt to query the xml and make those xslt definitions dynamic?).
-For the second case: in the end, an xml file is also just a string, so if you have a pattern matcher that's powerful enough to handle the xml specification, you should be able to define a set of patterns that can read any type of xml, extract the text values, and use those for further processing.
A long time ago, I implemented a general-purpose xml parser (it was part of the library for a programming language that I developed), so I think I have a fairly good idea what it takes to make one. The first thing that came to mind was 'recursion' (pretty obvious): an xml element can have other xml elements as children, so the pattern matcher needs a way to declare a pattern that references itself. Handling all the file formats can also be a problem, but that should be done by the bot's internal file loading.
I'm not certain this can be done with AIML, since I don't know if it can handle recursion (can an AIML pattern reference another pattern or itself?). Furthermore, defining xml tags in xml files is tedious, at best.
From what I remember about the sourceforge version, it can't have patterns that reference other patterns, so that's out.
I'm not certain about chatscript, though I would be surprised if it couldn't do recursion, so I think it can parse xml. (Does anyone know this exactly? I'd really like to know.)
The pattern-matching language that I am using is based on compiler-generator techniques and should be able to handle the full xml specification, though it would definitely be slower than, say, a C# xml parser (I think). Code size is probably about the same as a compiler generator like coco/r.
Off the top of my head, it would look something like this:
TOPIC name: XMLElement
Rule name: Element
you say: <$FrontName {~XMLElement.attribs} >{~XMLElement.Element | $content} </$BackName>
<$FrontName/>
When: $BackName && ($FrontName != $BackName)
bot says: there was an error in the xml formatting
else
bot says: $content:Evaluate
Rule name: attrib
Inputs: $AttribName = \' $AttribValue \'
This is just a rough, untested sketch with big holes and errors; just a basic start for a simple xml element. The key here is the recursion: ~XMLElement.Element, which allows for nested xml elements.
Also: ':Evaluate' is actually called something different (I forgot the exact name; it's for the next release anyway). It sends the text part back to the pattern matcher for further evaluation.
This scheme probably also only works if the bot has the ability to turn certain patterns on/off. For instance, if you first need to extract the text out of the xml and process it separately, you need to make certain that the patterns which handle the content don't overrule the xml patterns. In my system, this can be done by turning an entire topic on and off.
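The recursion at the heart of the sketch is easier to see in ordinary code. Here is a minimal recursive-descent element reader in Python; a sketch only, with the names, the single-quoted-attribute convention, and the error message taken as illustrations of the pattern above rather than the actual engine:

```python
import re

# One token: an open/close/empty tag (single-quoted attributes only,
# matching the \' ... \' convention in the sketch) or a run of text.
TOKEN = re.compile(r"<(/?)([\w:.-]+)((?:\s+[\w:.-]+='[^']*')*)\s*(/?)>|([^<]+)")
ATTRIB = re.compile(r"([\w:.-]+)='([^']*)'")

def parse_element(s, pos=0):
    """Parse one element at pos; returns (node, new_pos) where a node is
    (name, attribs, children) and children mixes text and child nodes."""
    m = TOKEN.match(s, pos)
    if m is None or m.group(1) or not m.group(2):
        raise ValueError("expected an opening tag at position %d" % pos)
    name = m.group(2)
    attribs = dict(ATTRIB.findall(m.group(3) or ""))
    pos = m.end()
    children = []
    if m.group(4):                      # <name/> has no content or close tag
        return (name, attribs, children), pos
    while True:
        m = TOKEN.match(s, pos)
        if m is None:
            raise ValueError("unexpected end of input inside <%s>" % name)
        if m.group(5):                  # plain text: the $content part
            children.append(m.group(5))
            pos = m.end()
        elif m.group(1):                # close tag: front/back names must match
            if m.group(2) != name:
                raise ValueError("there was an error in the xml formatting")
            return (name, attribs, children), m.end()
        else:                           # nested element: recurse, as in
            child, pos = parse_element(s, pos)   # ~XMLElement.Element
            children.append(child)

def text_only(node):
    """Collect just the text parts, like sending $content back for evaluation."""
    return "".join(c if isinstance(c, str) else text_only(c)
                   for c in node[2])
```

The mismatch branch plays the same role as the "When: $BackName && ($FrontName != $BackName)" rule.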
Finally, if you use the neural network, anything is possible, you just need to code it out.
I did some thinking about this last night. It's actually an interesting concept: you could potentially 'feed' a bot with large amounts of data like this. One thing has been bothering me, though. As it is now, with my pattern matcher at least, it tries to use every word in the input as the start of a pattern, in parallel, which is great for regular input but just overkill for an xml-formatted file. It shouldn't produce different results, just too much processing. This can be solved, though, either by using a different input channel (there is text, int, image,... a new one would be xml), or by somehow adding a switch in the regular pattern-matching code to select between 2 modes: parallel or single shot.
You can capture the structure of the xml file as an asset or a thesaurus, and if you use a predefined xml file structure, you could even mix both. These assets/thesaurus structures can then be queried to retrieve information. I think that a lot depends on the structure of the xml file. You could make a topic that simply captures the 'raw' structure of the xml file, something like this:
//note: this is untested, so pseudo code
when #bot.xml
#bot.($FrontName).value = $content
#bot.($FrontName) = #bot.xml
#bot.xml = #bot.($FrontName)
#bot -= $FrontName
else
#bot.xml.($FrontName).value = $content
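To make the idea concrete, here is roughly what that raw-structure capture produces, with assets mimicked as nested Python dicts (an assumption for illustration only; duplicate tags would overwrite each other in this sketch, where a real implementation would keep lists):

```python
import xml.etree.ElementTree as ET

def capture(element):
    """Mimic the topic above: each element becomes a nested entry,
    with its text stored under a 'value' key (asset-like nested dicts)."""
    node = {}
    text = (element.text or "").strip()
    if text:
        node["value"] = text
    for child in element:
        node[child.tag] = capture(child)
    return node

bot = {"xml": capture(ET.fromstring(
    "<forecast><city>Brussels</city><temp>21</temp></forecast>"))}
# bot["xml"] is now {'city': {'value': 'Brussels'}, 'temp': {'value': '21'}}
```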
But you'll probably want to interpret the data a little more. With the google api, for instance, 'forecastInformation' can be used to look up the city in the database and, when it doesn't exist, create an asset for it. The second part can be used to store the weather information in the asset.
Do you perhaps have a specific xml file format in mind?
Could this "knowledge" then be retained within a file, or could other such files even be appended to create an ever larger knowledgebase?
Yes, once it's stored in the neural network, it remains there until a delete instruction is performed. So you can 'join' the data of multiple xml files.
Once the data is stored and additional data is likewise stored, would the two files be stored as individual files, or could they be merged together into one large (and potentially growing) file?
Sort of. The contents of the files will be merged into 1 dataset, but internally the xml files are no longer used; instead, the data is stored in a binary form, split across multiple database files (currently 8). So, once the data has been imported, it is stored internally and the xml files no longer have any purpose (other than retaining the data in text form).
From a usage point of view, it all depends on the structure used to store the content of the files: you can either mimic the file structure, or transform it into something different.
There would need to be certain parameters or even keywords for the bot to use so that, at a later date, it could go to a specific portion of said file to obtain the information and answer questions in that regard.
Probably. But if you write a parser for a specific xml file format so that it merges the data properly, it will simply become part of the general dataset. Take the google weather api (http://blog.programmableweb.com/2010/02/08/googles-secret-weather-api/), for instance: you could save the weather info into the asset that represents the city (and not just store the xml file 'as is'), something like so:
$city = $CityName:ResolvePerson //can also use $CityName:FindAssetFromValue(name) which looks for an asset where $cityName is the value, 'name' is the attrib
#city.weather = $condition //store 'cloudy' or some other value in the 'weather' field
#city.temp = $temp //store the actual temperature
//note: maybe also need to store some date info?
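As a hedged illustration of that import step, here is a Python sketch; the element names are what I recall of the (now retired) google weather response and should be checked against a real sample, and dicts stand in for assets:

```python
import xml.etree.ElementTree as ET

# Assumed shape of the google weather reply; illustrative only.
sample = """<xml_api_reply><weather>
  <forecast_information><city data="Antwerp"/></forecast_information>
  <current_conditions>
    <condition data="Cloudy"/>
    <temp_c data="21"/>
  </current_conditions>
</weather></xml_api_reply>"""

def import_weather(xml_text, cities):
    """Look the city up (create its asset if missing) and store the
    condition and temperature on it, like the pattern sketch above."""
    root = ET.fromstring(xml_text)
    name = root.find(".//forecast_information/city").get("data")
    city = cities.setdefault(name, {"name": name})  # ResolvePerson-ish lookup
    city["weather"] = root.find(".//current_conditions/condition").get("data")
    city["temp"] = root.find(".//current_conditions/temp_c").get("data")
    return city

cities = {}
import_weather(sample, cities)
```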
Then you can return this info if a user asks for the weather in that city, like so (again untested, so pseudo code):
input: what's the weather in $cityName
calculate:
$city = $cityName:ResolvePerson
output when: $city && #city.weather
It is #city.weather in $cityName. It's currently #city.temp
output when: $city
I know that place, but I don't know what the weather is like over there.
else: Never heard of $cityName
Note: in the code, I am switching between $city and #city. This has to do with how you want to approach/use the variable content. When you write $value, it's just a regular variable, so when you assign to $city, the assigned value is temporarily retained under that variable name. #city is asset specific, so it will do some transformations on the variable content (like making certain that there is always just 1 item stored in the asset variable; regular vars like $city can actually contain a list of values).
In short, when you want to store a temporary value for further calculation, use $xxx (like $city = xxxx), but if you need to do an asset operation, like storing something as an asset value or retrieving an asset value, #xxx needs to be used (it takes some getting used to).
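The three output branches can also be mirrored in plain code; a Python sketch over the same kind of city asset, with dicts standing in for assets and a plain lookup standing in for :ResolvePerson (all names illustrative):

```python
def answer_weather(city_name, cities):
    """Mirror the three output branches: known city with weather data,
    known city without weather data, unknown city."""
    city = cities.get(city_name)        # plays the role of :ResolvePerson
    if city is not None and "weather" in city:
        return "It is %s in %s. It's currently %s" % (
            city["weather"], city_name, city["temp"])
    if city is not None:
        return ("I know that place, but I don't know "
                "what the weather is like over there.")
    return "Never heard of %s" % city_name

# Sample assets: one city with weather data, one without.
cities = {"Antwerp": {"weather": "Cloudy", "temp": "21"}, "Ghent": {}}
```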
Here's something I could relatively easily do:
In the chatbot designer, add a new menu item to import any 'generic' file. When you do 'file/import/generic', you can first select 1 or more files (xml or another type of text file) that need to be imported. Next, you select the topics that should be used for 'reading' and importing the content. Only patterns from those topics would be used during the pattern-matching process. This could also be triggered from within the patterns: when sending some data back to the pattern matcher as input, you could specify the projects that are allowed to be used (this is relatively easy to add).
Also, as a side note: suppose you have 2 xml file formats which contain similar data, but formatted differently (with different labels and so on). If you use thesaurus variables for the xml element and attribute names, you can most likely (though probably not always) create 1 pattern definition able to handle both formats. This could be something like:
<^Front:noun.WeatherTag {~XMLElement.attribs} >{~XMLElement.Element | $content} </$BackName>
<^Back:noun.WeatherTag/>
Something I forgot to mention: the designer application already supports 'asset xml' files. These allow you to import/export asset data. So if you have an xml file, you can transform it (using xslt or something else) into the asset xml format and import that.
I've also fixed the asset editor, so starting from the next release, you can edit this data as shown in the attached image.
The xml file for the data in the image looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Asset ID="93b3585d-66b0-4214-ab81-817d735b0f5e">
<Name>user</Name>
<Items>
<Item>
<Attribute>
<Text Value="name" />
</Attribute>
<Data>
<DataItem>
<Meaning>Value</Meaning>
<Value>
<Text Value="jan" />
</Value>
</DataItem>
</Data>
</Item>
<Item>
<Attribute>
<Text Value="birthday" />
</Attribute>
<Data>
<DataItem>
<Meaning>Value</Meaning>
<Time>21/07/2011 0:00:00</Time>
</DataItem>
</Data>
</Item>
<Item>
<Attribute>
<Text Value="hand" />
</Attribute>
<Data>
<DataItem>
<Meaning>Value</Meaning>
<Children ID="d0761ff6-8cb3-425b-a111-e3591d40dc46" IsRoot="False">
<Item>
<Attribute>
<Text Value="location" />
</Attribute>
<Data>
<DataItem>
<Meaning>Value</Meaning>
<Value>
<Text Value="left" />
</Value>
</DataItem>
</Data>
</Item>
</Children>
</DataItem>
</Data>
</Item>
</Items>
</Asset>
It's a bit verbose, and this is just a subset of every possible element, but it describes the data in full detail.
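For illustration, a small Python sketch that emits this Asset layout from a plain dict could be one way to transform an arbitrary xml file into the format (only the Text/Value item kind from the sample is generated; Time and nested Children items are left out, and the GUID is freshly generated rather than meaningful):

```python
import uuid
import xml.etree.ElementTree as ET

def dict_to_asset(name, values):
    """Build an <Asset> tree in the layout shown above from a flat dict
    of attribute -> value pairs (Text values only, for brevity)."""
    asset = ET.Element("Asset", ID=str(uuid.uuid4()))
    ET.SubElement(asset, "Name").text = name
    items = ET.SubElement(asset, "Items")
    for attrib, value in values.items():
        item = ET.SubElement(items, "Item")
        a = ET.SubElement(item, "Attribute")
        ET.SubElement(a, "Text", Value=attrib)
        d = ET.SubElement(ET.SubElement(item, "Data"), "DataItem")
        ET.SubElement(d, "Meaning").text = "Value"
        v = ET.SubElement(d, "Value")
        ET.SubElement(v, "Text", Value=str(value))
    return asset

xml_text = ET.tostring(dict_to_asset("user", {"name": "jan"}), encoding="unicode")
```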
Yes, if you have the patterns and queries defined. For instance, suppose you have a bunch of records on the birthdays of famous people. You could transform each record into an asset like the one above (with the attributes 'name' and 'birthday', and a 'value' data field for both) and import all of them into the chatbot.
Next, you need some patterns and queries like so:
input: when was ^name:noun.name born?
calculate: $person = $name:resolvePerson
output when $person && #person.birthday
$name was born on #person.birthday:month / #person.birthday:day / #person.birthday:year
output when $person
I have heard of $name, but I don't know his birthday.
else
I don't know $name
//this is not tested, so pseudo again.
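The birthday patterns above can be sketched the same way in Python, with dicts standing in for the imported assets and a plain lookup for resolvePerson (the names and dates are just sample data):

```python
from datetime import date

# Hypothetical in-memory stand-in for the imported birthday assets.
people = {
    "Ada Lovelace": {"birthday": date(1815, 12, 10)},
    "Alan Turing": {},
}

def answer_birthday(name):
    """The three output branches of the pattern above."""
    person = people.get(name)           # stands in for :resolvePerson
    if person is not None and "birthday" in person:
        b = person["birthday"]
        return "%s was born on %d/%d/%d" % (name, b.month, b.day, b.year)
    if person is not None:
        return "I have heard of %s, but I don't know his birthday." % name
    return "I don't know %s" % name
```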