About the new AIML set

DaveMorton · « **on:** August 17, 2011, 01:13:53 am »

I'm not certain how many AIML botmasters there are here (aside from myself, that is), and I have no idea just how "new" this AIML set actually is (I only just discovered it, so it's "new" to me), but I've just given it a good look-see, and I'm quite pleased with it, and here's why:

1.) It seems that they've taken the time and effort to separate ALICE's code from everything else, and placed it in its own set of discrete files. I know that the earlier AAA set was supposed to do this (to an extent), but this more recent set has done a better job of "pulling ALICE out", saving the botmaster considerable time and effort.

2.) Much of the redundant code has been removed, leaving less bloat and waste. True, this AIML set is larger than the last one, but for a good reason, which brings me to number 3:

3.) One of the 'best' improvements to this new AIML set is the addition of an AIML translation of MindPixel. For those of you who don't know what MindPixel is, I strongly recommend that you read up on that link I just gave out. Basically, it's a huge list of questions with 'yes' or 'no' answers (and varying degrees of certainty in-between). For even the most dedicated botmaster, this 'database' represents probably a year or more of 'heavy coding' (at least 6 hours per day, 5 days per week) to create. And since I truly suck at coming up with spontaneous content, it would likely take me a lot longer. Granted, some of the questions are somewhat silly ("SHOULD SOFTWARE BE FREE"), and there are a few typos ("SHOULE YOU MEASURE TWICE AND CUT"), but all in all, I see this as a great benefit.

4.) And finally, this new set is a little more up-to-date than previous sets. One of the more tedious chores that a botmaster has is keeping the AIML current with regard to political leaders and current events. While this set isn't "up to the minute" by any stretch of the imagination, it's far better than the AAA set.

One of the plans I have for this new MindPixel data is to add a certain amount of randomness to the responses. Right now, if you were to ask a bot with this set installed "Should software be free?", all you would get, every time, is "I am certain". Personally, I think a more varied answer would seem a lot more "human". Right now I have a category for "yes" that chooses randomly between around 50 affirmative answers. My "no" category has around 30 or so, but is expected to grow (only a couple of the "no" answers are simple re-wordings of a "yes" answer). It's the categories for the varying degrees of certainty ("maybe", "it's likely", "I doubt it", etc.) that are going to give me the most grief, but I'm confident that the end result will be a clear improvement.
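The randomized replies described above can be tried out outside AIML as a lookup from a certainty category to a pool of phrasings, with one picked at random for each reply. This is only a minimal Python sketch: the category names and the phrasings in `RESPONSE_POOLS` are hypothetical placeholders, not the actual categories from the post or from the AIML set.

```python
import random
from typing import Dict, List, Optional

# Hypothetical phrase pools; the real "yes" and "no" pools described in the
# post are much larger (around 50 and 30 entries respectively).
RESPONSE_POOLS: Dict[str, List[str]] = {
    "yes": ["Yes.", "Absolutely.", "I believe so.", "Definitely."],
    "no": ["No.", "I don't think so.", "Definitely not."],
    "maybe": ["Maybe.", "It's possible.", "Hard to say."],
    "likely": ["It's likely.", "Probably."],
    "doubt": ["I doubt it.", "Probably not."],
}


def varied_answer(category: str, rng: Optional[random.Random] = None) -> str:
    """Return a randomly chosen phrasing for a certainty category.

    Unknown or empty categories fall back to the category label itself,
    so the bot still says something if a pool has not been written yet.
    """
    if rng is None:
        rng = random.Random()
    pool = RESPONSE_POOLS.get(category)
    if not pool:
        return category
    return rng.choice(pool)
```

Inside AIML itself, this is what a `<random>` element with several `<li>` choices in a category's template does; a script like this is just a quick way to audition phrase lists before turning them into categories.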

Anyway, I thought I would share this.

Bragi · « **Reply #1 on:** August 17, 2011, 11:03:58 am »

I've been taking a look at this MindPixel project. It appears that the raw data (the yes/no questions) can be extracted as a plain text file. Has anyone done this? I'd like to use this data to train/test my bot.

Bragi · « **Reply #2 on:** August 17, 2011, 11:09:50 am »

Ok, I found the 'mpexport-0.4-2.i386.rpm' file, which appears to contain a large txt file with lots of yes/no questions, but no answers. Are there any answers to go with the questions?

DaveMorton · « **Reply #3 on:** August 17, 2011, 11:11:24 am »

I haven't done so, no. From the looks of the files in the MindPixel downloads section, the files are pretty much geared toward *nix systems (.rpm and .deb files), so I haven't bothered with it much. I could, if need be, write a script that extracts the questions/answers from the AIML files and saves the data as a CSV text file.
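A script like the one mentioned above could look roughly like this minimal Python sketch. It assumes each MindPixel question is stored as an AIML `<category>` whose `<pattern>` is the question and whose `<template>` holds the answer as plain text; if the real templates nest other tags, only their text content is kept. The function names are made up for illustration. Note that Python's `csv` module writes a plain comma between fields and quotes any field that itself contains a comma.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://alicebot.org/...}pattern'."""
    return tag.rsplit("}", 1)[-1]


def text_of(elem: ET.Element) -> str:
    """All text inside an element, with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def extract_pairs(root: ET.Element) -> List[Tuple[str, str]]:
    """Collect (pattern, template) pairs from every <category> in the tree."""
    pairs = []
    for elem in root.iter():  # also reaches categories nested in <topic>
        if local_name(elem.tag) != "category":
            continue
        pattern = template = None
        for child in elem:
            name = local_name(child.tag)
            if name == "pattern":
                pattern = text_of(child)
            elif name == "template":
                template = text_of(child)
        if pattern and template is not None:
            pairs.append((pattern, template))
    return pairs


def pairs_to_csv(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize the pairs as CSV text, one question/answer per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(pairs)
    return buf.getvalue()
```

For a file on disk, `extract_pairs(ET.parse(path).getroot())` gives the pairs, and the CSV text can then be written out with an ordinary `open(..., "w", encoding="utf-8")`.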

Bragi · « **Reply #4 on:** August 17, 2011, 11:33:19 am »

Well, I just found an 18 MB text file with nothing but questions, so I suppose those are the questions, but what about the answers?
PS: the files look like Linux stuff indeed, but when opened, they appear to be Windows (or mixed): you've got 'exe', 'mdb' and 'png', all Windows stuff to me.

DaveMorton · « **Reply #5 on:** August 17, 2011, 11:38:13 am »

Well, here's a zip file that contains all of the questions and answers as text files, in CSV format.

http://www.geekcavecreations.com/Downloads/MindPixelQ_A.zip

Bragi · « **Reply #6 on:** August 17, 2011, 01:05:01 pm »

Thanks a bunch, GCC. Those appear to be the same questions, but with the answers.

DaveMorton · « **Reply #7 on:** August 17, 2011, 06:40:15 pm »

The answers are at the end of each line, separated by a comma and a space. Most answers are one or two word answers.
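Given that layout, each line can be split on its *last* comma-and-space, so any commas inside the question itself are left alone. This is a minimal Python sketch that assumes the answer never contains ", " (which fits the one- or two-word answers described above); the function names are made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split 'QUESTION, ANSWER' on the last ', '; return None for blank/bad lines."""
    line = line.strip()
    if ", " not in line:
        return None
    question, answer = line.rsplit(", ", 1)
    return question.strip(), answer.strip()


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) pairs, skipping lines that don't parse."""
    for line in lines:
        pair = parse_line(line)
        if pair is not None:
            yield pair
```

`parse_lines` accepts any iterable of strings, so an open text file from the zip can be passed to it directly.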
