Pattern based NLP

*

MikeB

  • Electric Dreamer
  • ****
  • 120
Pattern based NLP
« on: May 24, 2020, 12:16:50 pm »
This is a project I've been working on for a few years. In 2019 I was testing out the theory on Pandorabots, and this year I'm converting it to C/C++.

The main goal is for it to be small, fast and solid-state. No heavy algorithms, no deduction/knowledge, no learning...

It works by matching single words. Each word first goes through a spell check, then is looked up in a second, full word list that assigns it a token (a one-character/one-byte symbol). All words fall into approx 20-30 symbol groups. These 20-30 symbols are then used in pattern sentences, of which there are a few hundred. From each pattern sentence a broad intention can be gathered, and this is used to choose the chatbot response.
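As a rough illustration of the word -> symbol -> pattern sentence flow described above, here is a minimal sketch in C. The word list, the symbol letters, the pattern strings and the names (`compress_sentence`, `intention_for`) are all made up for the example and are not taken from the actual project. The spell check step is left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical word list: every known word maps to a one-byte group symbol. */
struct word_entry { const char *word; char symbol; };

static const struct word_entry WORDS[] = {
    { "can", 'Q' }, { "do", 'D' }, { "you", 'Y' },
    { "speak", 'V' }, { "talk", 'V' },
    { "english", 'L' }, { "spanish", 'L' },
};

enum intention { INTENT_UNKNOWN, INTENT_WONDERING, INTENT_QUESTION, INTENT_DIRECTION };

/* Hypothetical pattern sentences: a string of symbols maps to a broad intention. */
struct pattern_entry { const char *symbols; enum intention intent; };

static const struct pattern_entry PATTERNS[] = {
    { "QYVL", INTENT_WONDERING },  /* "can you speak english" */
    { "DYVL", INTENT_QUESTION },   /* "do you speak english"  */
    { "VL",   INTENT_DIRECTION },  /* "speak english"         */
};

/* Return the symbol for one lowercase word, or '?' if the word is unknown. */
static char symbol_for(const char *word, size_t len)
{
    for (size_t i = 0; i < sizeof WORDS / sizeof WORDS[0]; i++) {
        if (strlen(WORDS[i].word) == len && memcmp(WORDS[i].word, word, len) == 0)
            return WORDS[i].symbol;
    }
    return '?';
}

/* Compress a sentence into one symbol per word. Returns the symbol count. */
static size_t compress_sentence(const char *text, char *out, size_t out_size)
{
    size_t n = 0;
    char word[32];

    if (out_size == 0)
        return 0;
    while (*text != '\0' && n + 1 < out_size) {
        /* Skip spaces and punctuation between words. */
        while (*text != '\0' && !isalpha((unsigned char)*text))
            text++;
        /* Copy the next word in lowercase (overlong words are cut and stay unknown). */
        size_t len = 0;
        while (isalpha((unsigned char)*text)) {
            if (len < sizeof word)
                word[len++] = (char)tolower((unsigned char)*text);
            text++;
        }
        if (len > 0)
            out[n++] = symbol_for(word, len);
    }
    out[n] = '\0';
    return n;
}

/* Look the compressed sentence up in the pattern sentences. */
static enum intention intention_for(const char *text)
{
    char symbols[64];

    compress_sentence(text, symbols, sizeof symbols);
    for (size_t i = 0; i < sizeof PATTERNS / sizeof PATTERNS[0]; i++) {
        if (strcmp(PATTERNS[i].symbols, symbols) == 0)
            return PATTERNS[i].intent;
    }
    return INTENT_UNKNOWN;
}
```

With these made-up tables, `intention_for("Do you talk Spanish?")` gives the same intention as "do you speak english", because "talk"/"speak" and "english"/"spanish" share a symbol.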

The chatbot responses are fixed at around 50-100, and each is represented by a one-character/one-byte symbol. These symbols are used to create duplicate (alternate) and cross-language responses. Voice recordings can also be assigned to each symbol.
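One way to picture the fixed responses: a small table keyed by the response symbol, with one row per language and a couple of alternates per row. This is only a sketch under my own assumptions. The symbols, the struct layout, the function name `response_text` and the Spanish wordings are illustrative, not the project's real data.

```c
#include <stddef.h>

enum language { LANG_EN = 0, LANG_ES = 1, LANG_COUNT };

#define MAX_ALTERNATES 2

/* One fixed response: a one-byte symbol plus alternates for each language. */
struct response {
    char symbol;
    const char *text[LANG_COUNT][MAX_ALTERNATES];
};

/* Hypothetical table; the real one would hold the ~50-100 fixed responses. */
static const struct response RESPONSES[] = {
    { 'g', { { "greetings", "hello" },
             { "saludos", "hola" } } },
    { 'n', { { "no, not sure about this", "cannot say" },
             { "no, no estoy seguro", "no puedo decir" } } },
};

/* Return the text for a response symbol, or NULL if the symbol is unknown.
   `alternate` can be any number (e.g. a random one); it wraps around. */
static const char *response_text(char symbol, enum language lang, unsigned alternate)
{
    if ((unsigned)lang >= LANG_COUNT)
        return NULL;
    for (size_t i = 0; i < sizeof RESPONSES / sizeof RESPONSES[0]; i++) {
        if (RESPONSES[i].symbol == symbol)
            return RESPONSES[i].text[lang][alternate % MAX_ALTERNATES];
    }
    return NULL;
}
```

A voice recording could be attached the same way, by storing a file name or sample index next to each text entry.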

It's aimed at time/space restricted Chatbots, for example in games, where processing needs to happen in milliseconds, but also take into account broad user input.

Size and Response times:
In 2019 on Pandorabots (1000 words, 200 sentences) the size was ~500 KB, with a ~1 second response time.
In 2020 in C/C++ on an Arm Cortex-M4 @ 120 MHz it was ~159 KB and 15-100 milliseconds.
In 2021 in C/C++ on a modern PC @ 2.6 GHz with binary searching (setup time of 70 ms), processing is ~1 ms per sentence.
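For readers wondering what "binary searching with a set up time" could look like: the sketch below assumes the setup step is a one-off sort of the word list, after which each word lookup is a binary search. That is my assumption, not a description of the actual code. It uses the standard `qsort` and `bsearch` from `<stdlib.h>`, and the word list and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct word_entry { const char *word; char symbol; };

/* Hypothetical, deliberately unsorted word list. */
static struct word_entry WORDS[] = {
    { "you", 'Y' }, { "can", 'Q' }, { "speak", 'V' },
    { "english", 'L' }, { "do", 'D' },
};

#define WORD_COUNT (sizeof WORDS / sizeof WORDS[0])

/* Order entries alphabetically by word; shared by qsort and bsearch. */
static int compare_entries(const void *a, const void *b)
{
    const struct word_entry *x = a;
    const struct word_entry *y = b;
    return strcmp(x->word, y->word);
}

/* One-off setup cost: sort the list so it can be binary searched. */
static void setup_words(void)
{
    qsort(WORDS, WORD_COUNT, sizeof WORDS[0], compare_entries);
}

/* Per-word cost: O(log n) string comparisons instead of a full scan. */
static char lookup_symbol(const char *word)
{
    struct word_entry key = { word, 0 };
    const struct word_entry *hit =
        bsearch(&key, WORDS, WORD_COUNT, sizeof WORDS[0], compare_entries);
    return hit != NULL ? hit->symbol : '?';
}
```

`setup_words()` has to run once before the first `lookup_symbol()` call.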

A few features:
 < 500kb including word databases.
 One sentence generally takes less than 1 ms to process.
 Limited chatbot responses make it easy to voice record and/or change personality.
 Private information stripped during word compression (no names/places).
 Fine differentiation of intentions, e.g. between Wondering, Questions, and Directions - "can you speak english" "do you speak english" "speak english".
 Can count occurrences of emotional words, logical words, burning-analyser words, light-sense words to reply in kind better.
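The word counting in the last feature can be sketched on top of the compressed symbols. Assume, purely for illustration, that emotional words compress to the symbol 'E' and logical words to 'G'. Those letters, the struct and the function names are hypothetical, and the other word types are left out.

```c
/* Tallies of word types found in one compressed sentence. */
struct tone_counts { unsigned emotional; unsigned logical; };

/* Count hypothetical 'E' (emotional) and 'G' (logical) symbols. */
static struct tone_counts count_tone(const char *symbols)
{
    struct tone_counts counts = { 0, 0 };

    for (; *symbols != '\0'; symbols++) {
        if (*symbols == 'E')
            counts.emotional++;
        else if (*symbols == 'G')
            counts.logical++;
    }
    return counts;
}

/* Reply "in kind": pick the style the user used most (ties go to logical). */
static char reply_style(struct tone_counts counts)
{
    return counts.emotional > counts.logical ? 'E' : 'G';
}
```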

For the problem of chatbots in games or time/space restricted platforms, it solves:
 Too much data or processing power required.
 Cannot change the personality/no personality.
 Cannot change the language/only one language.
 Chatbot escaping the topic due to bad intention reading.
 Chatbot returning bad views / knowledge calculation (only fixed responses allowed).
 Chatbot terrible voice synthesis (can pre-record all fixed responses including randomised duplicate recordings).
 Not being white box/solid state/predictable.
« Last Edit: April 19, 2021, 01:56:25 pm by MikeB »

*

MikeB

Re: Pattern based NLP for games
« Reply #1 on: May 24, 2020, 12:48:43 pm »
I will be working on it throughout this year and aim to make it into an API at least by the end of the year...

So far I have tested the Plain C version on the 2017/2018 Loebner prize questions...

All the words in each question can be changed for slightly different words and still return the same response (randomised differently).. If the same test is run again the responses could change to their random alternate versions...

Quote
Loebner 2018.
   Good afternoon.   - "greetings"
   Do you consent to having this conversation recorded? - "for me, not a problem"
   Have we met before? - "cannot say, good to meet you"
   Do you know how to make toast? - "no, not sure about this" [toast]
   Which languages can you use? - "many many, english, espanol, italiano... what about you?" [only english]
   How many letters are in the word "abracadabra"? - "who am I? Pietro Bembo?"
   Where should one look for love? - "interesting! i don't know" [doesn't know "love"]
   How are you feeling right now? - "I'm very good, and how are you?"
   Who said "I have a dream"? - "no no quotes"
   How old are you? - "thousands... cannot say"
   When might I need to know how many times a wheel has rotated? - "who am I? Fibonacci?"
   What will you do later today? - "just came back from surfing, may go out again!"
   What is the third angle in a triangle with internal angles of 90 degrees and 30 degrees? - "no no maths"
   Who is your favourite artist? - "interesting! many many..."
   What do you hold when you shake hands? - "words are not my profession" [hands]
   Do you have any legs? - "no, words elude me" [legs]
   Can you recommend me a film? - "yes, thousands..." [doesn't know "film"]
   Do you understand Winograd Schemas? - "nay, confusing words" [doesn't know winograd schemas]
   If a chicken roosts with a fox they may be eaten. What may be eaten? - "OK. from memory... somebody..."
   I had to go to the toilet during the film because it was too long. What was too long? - "alright. alright. from memory... that thing..."
« Last Edit: April 19, 2021, 01:53:27 pm by MikeB »

*

ivan.moony

  • Trusty Member
  • ************
  • Bishop
  • *
  • 1590
    • contrast-zone
Re: Pattern based NLP for games
« Reply #2 on: May 24, 2020, 09:01:48 pm »
Sounds like a great improvement over current chatbot technology like AIML. What do you plan to do with it?
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

8pla.net

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1263
  • TV News. Pub. UAL (PhD). Robitron Mod. LPC Judge.
    • 8pla.net
Re: Pattern based NLP for games
« Reply #3 on: May 25, 2020, 12:13:13 am »
C Language is a good choice, I think.
My Very Enormous Monster Just Stopped Using Nine

*

MikeB

Re: Pattern based NLP for games
« Reply #4 on: August 07, 2020, 06:49:29 am »
Quote from ivan.moony
Sounds like a great improvement over current chatbot technology like AIML. What do you plan to do with it?

I'll be trying to integrate it as an Unreal asset and/or approach a few different people who already do chat interfaces... In some ways it's better than AIML (you don't have to choose between a menu reply system or 10,000 custom responses)... but in other ways it's not very flexible. You have the ~100 fixed phrases, and any replacement must be an alternative of one of the preprogrammed ones... There's also a section for custom responses, but there the input is one of the fixed intentions/topics/perspectives and the output is one of the fixed ~100 phrases.

So you couldn't talk specifically about a product or idea. You'd use a secondary bot that has a list of all the keywords you're looking for, then you could join the intention with those.

*

MikeB

Re: Pattern based NLP for games
« Reply #5 on: August 07, 2020, 06:56:27 am »
Quote from 8pla.net
C Language is a good choice, I think.

It compiled tiny in C, but I had to move to C++ now to make a windows DLL and get 16-bit wide chars. 400kb  :(

*

squarebear

  • Trusty Member
  • *********
  • Terminator
  • *
  • 849
  • It's Hip to be Square
Re: Pattern based NLP for games
« Reply #6 on: August 07, 2020, 08:35:57 am »
Quote from MikeB
The size and speed in pandora bots (1000 individual words with sentences) is ~500kb, and 1 to 2 seconds response time.

I've not found such a delay. I have a bot with over 350,000 categories and it responds almost instantly. www.kuki.bot
Perhaps you are using AIML in a non standard way?
Feeling Chatty?
www.mitsuku.com

*

8pla.net

Re: Pattern based NLP for games
« Reply #7 on: August 07, 2020, 01:14:40 pm »
Quote from MikeB (replying to "C Language is a good choice, I think.")
It compiled tiny in C, but I had to move to C++ now to make a windows DLL and get 16-bit wide chars. 400kb  :(

Do both then,  C Language and C++...  You may as well.  They are compatible.

And, I would suggest making a Linux version, too, like ChatScript has.


*

MikeB

Re: Pattern based NLP for games
« Reply #8 on: September 15, 2020, 09:22:31 am »
Quote from squarebear (replying to "The size and speed in pandora bots (1000 individual words with sentences) is ~500kb, and 1 to 2 seconds response time.")
I've not found such a delay. I have a bot with over 350,000 categories and it responds almost instantly. www.kuki.bot
Perhaps you are using AIML in a non standard way?

I used about 2000 categories, but it re-searches them several times. So a 10 word input can cost around 2000 x 5 x 10 = 100,000 category checks. If it's only 5 words or fewer it's instant....

*

MikeB

Re: Pattern based NLP for games
« Reply #9 on: September 15, 2020, 10:13:19 am »
Recompiled to C++ DLL, C++ windows console (8bit standard english characters). 250kb

Approx 500 spellcheck words, 1200 words, 100 symbolic sentences, 50 chatbot recognised intentions, 50 chatbot fixed english phrases

1ms response time.

In the image below, the chatbot response is wrong (picking up general "how is your *" instead of "how are you"), but this is what it's like as a demo.

"explain is I/you motion-moving logic-direct" are the uncompressed symbols. One per word...

It's still basically an I-Don't-Know Bot, but the instant intention pickup is useful. You can still talk ON the topic/intention... and the ~50 fixed output phrases means it can all be voice recorded...

« Last Edit: October 12, 2020, 09:23:23 am by MikeB »

*

MikeB

Re: Pattern based NLP
« Reply #10 on: September 25, 2020, 09:29:25 am »
Updated the word searching in Misspelled Words and Tokenise Word lists for a faster way of doing it.

The old way was scrolling through every character in the input sentence for each of the words in the 500 - 1300 word lists.

The new way is basically how people do it:
First: Look at the start character.
Second: Look at the length of the word.
Third: Look at the last character.
Fourth: Is it only one character long?
Fifth: Check every character from 2nd to the last.

You break out of the loop (or continue; to the next word) if any one of those fails. On average it's something like a 1 in 26 shot for the first, 1 in 5 for the second, 1 in 5 for the third, 1 in 5 for the fourth...
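The five checks can be sketched as a single loop with early exits. This is my reading of the steps, not the actual source. The function name and signature are hypothetical, and `strlen` is called per candidate only to keep the sketch short (a real word list would probably store each length up front).

```c
#include <stddef.h>
#include <string.h>

/* Return the index of `input` in `list`, or -1 if it is not there.
   Each candidate is rejected as early as possible. */
static int find_word(const char *input, size_t input_len,
                     const char *const *list, size_t list_count)
{
    if (input_len == 0)
        return -1;
    for (size_t i = 0; i < list_count; i++) {
        const char *candidate = list[i];

        /* First: the start character. */
        if (candidate[0] != input[0])
            continue;
        /* Second: the length of the word. */
        size_t candidate_len = strlen(candidate);
        if (candidate_len != input_len)
            continue;
        /* Third: the last character. */
        if (candidate[candidate_len - 1] != input[input_len - 1])
            continue;
        /* Fourth: a one character word is already fully matched. */
        if (input_len == 1)
            return (int)i;
        /* Fifth: the remaining characters. The first and last were already
           compared, so only the middle is checked here. */
        if (memcmp(candidate + 1, input + 1, input_len - 2) != 0)
            continue;
        return (int)i;
    }
    return -1;
}
```

Most candidates fail on the first comparison, which is a single byte test, so the full comparison only runs for a small fraction of the list.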

Seemed to double or triple the speed. A 20 word sentence (2-3ms) now takes 0-1 ms.

Can't have spaces in the words though so will have to make a short "Catchphrase" word list.

*

MikeB

Re: Pattern based NLP
« Reply #11 on: October 12, 2020, 09:16:23 am »
Decided to make it into a full NLP including Thesaurus, Sentiment (like/dislike), Email Spam, Aggressive language detection as well as the Chatbot.

Here's the Thesaurus. Everything takes 0-1ms.

It's fast because the words are already categorised in groups with each other, so it's just a reverse look-up. However it does still need some topic searching because some groups have over 50 words.
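A minimal sketch of the reverse look-up idea, assuming each word carries a group symbol plus a secondary topic byte that narrows large groups down. The table contents, the field names and the function `synonyms` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Each word belongs to a symbol group and, within it, to a narrower topic. */
struct thesaurus_entry { const char *word; char group; char topic; };

/* Hypothetical data: group 'A' is split into topics 'q' and 's'. */
static const struct thesaurus_entry THESAURUS[] = {
    { "great", 'A', 'q' }, { "excellent", 'A', 'q' }, { "huge", 'A', 's' },
    { "superb", 'A', 'q' }, { "enormous", 'A', 's' }, { "offer", 'O', 'c' },
};

#define THESAURUS_COUNT (sizeof THESAURUS / sizeof THESAURUS[0])

/* Write up to max_out alternatives for `word` into `out`; return how many. */
static size_t synonyms(const char *word, const char **out, size_t max_out)
{
    const struct thesaurus_entry *self = NULL;
    size_t n = 0;

    /* Forward look-up: find the word's own group and topic. */
    for (size_t i = 0; i < THESAURUS_COUNT; i++) {
        if (strcmp(THESAURUS[i].word, word) == 0) {
            self = &THESAURUS[i];
            break;
        }
    }
    if (self == NULL)
        return 0;

    /* Reverse look-up: collect the other words with the same group and topic. */
    for (size_t i = 0; i < THESAURUS_COUNT && n < max_out; i++) {
        const struct thesaurus_entry *entry = &THESAURUS[i];
        if (entry != self && entry->group == self->group && entry->topic == self->topic)
            out[n++] = entry->word;
    }
    return n;
}
```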


*

MikeB

Re: Pattern based NLP
« Reply #12 on: October 26, 2020, 05:50:21 am »
Here's an example of the Spam detection and differentiation.

The differentiation is between the phrases:
"Do not miss out"
"Do not miss out on great fun"
"Do not miss out on great offers"

The Thesaurus also shows all the alternative words ( max 8 ) that could have been used to output the same thing.

The chatbot's responses are:
"For what purpose?"
"Ok. Not a problem."
"Gah, no selling. I'm not buying."

Still shifting the words around into different categories. There are now 1500 words (+300).

Also the word searching has been changed again to just "quick search" the first letter (using as few instructions as possible in a tight loop, so it can move onto the next fast), before searching the rest of the word. Also using a rebellious "goto" command to get to the next iteration faster.
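Roughly what a first-letter "quick search" with a goto might look like. This is a guess at the shape of the loop, not the actual code, and the function name is hypothetical. The inner `while` does nothing but compare one byte per candidate, and the `goto` jumps straight back to it after a failed full comparison.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of `input` in `list`, or -1 if it is not there. */
static int find_word_quick(const char *input, const char *const *list, size_t count)
{
    const char first = input[0];
    size_t i = 0;

next:
    /* Tight loop: skip candidates until one starts with the right letter. */
    while (i < count && list[i][0] != first)
        i++;
    if (i == count)
        return -1;
    /* Only now pay for comparing the rest of the word. */
    if (strcmp(list[i], input) == 0)
        return (int)i;
    i++;
    goto next;
}
```

The same control flow can be written with nested loops and `continue`, so the `goto` here is a matter of style.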

Next: Chatbot (Alternate Language output), Language Translate, Tone/Harassment identification.


*

MikeB

Re: Pattern based NLP
« Reply #13 on: November 26, 2020, 08:28:32 am »
Working on a new utility to handle entry into the Chatbot Decisions file (Handles input from the NLP as tokens I T T P, and outputs S S S speech tokens).


*

MikeB

Re: Pattern based NLP
« Reply #14 on: December 14, 2020, 05:29:14 am »
Added a Start Page/Test Page to the utility.

The NLP processing/debug itself isn't changeable in the utility (word symbolising), that's still left to the console app. The utility is for setting up Chatbots, and some separate Spam and Tone options not related to the chatbot.

Spam detection is symbol based not literal, so synonyms of the word "offers" are all detected together, not just single words. This is multi-language as well.
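To show why symbol-based matching catches synonyms together, here is a small sketch. It assumes words like "offers" and "deals" share one symbol ('O') and words like "great" and "amazing" share another ('G'), and it flags a sentence when 'G' is directly followed by 'O'. The symbols, the word table, the pattern and the function names are all hypothetical. Input is expected to be lowercase words separated by single spaces.

```c
#include <stddef.h>
#include <string.h>

struct word_entry { const char *word; char symbol; };

/* Hypothetical groups: 'G' = praise words, 'O' = offer words, 'F' = fun words. */
static const struct word_entry WORDS[] = {
    { "great", 'G' }, { "amazing", 'G' },
    { "offers", 'O' }, { "deals", 'O' }, { "bargains", 'O' },
    { "fun", 'F' },
};

/* Hypothetical spam pattern: a praise word followed by an offer word. */
static const char SPAM_PATTERN[] = "GO";

static char symbol_for(const char *word, size_t len)
{
    for (size_t i = 0; i < sizeof WORDS / sizeof WORDS[0]; i++) {
        if (strlen(WORDS[i].word) == len && memcmp(WORDS[i].word, word, len) == 0)
            return WORDS[i].symbol;
    }
    return '?';
}

/* Compress the sentence to symbols, then search the symbols, not the text. */
static int looks_like_spam(const char *text)
{
    char symbols[64];
    size_t n = 0;

    while (*text != '\0' && n + 1 < sizeof symbols) {
        size_t len = strcspn(text, " ");   /* length of the next word */
        if (len > 0)
            symbols[n++] = symbol_for(text, len);
        text += len;
        if (*text == ' ')
            text++;
    }
    symbols[n] = '\0';
    return strstr(symbols, SPAM_PATTERN) != NULL;
}
```

Under these assumptions "do not miss out on great offers" and "do not miss out on amazing deals" are both flagged, while "do not miss out on great fun" is not. Supporting another language would then mean adding its words to the same symbol groups.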

The Thesaurus is a simple reverse lookup on word and a secondary word-topic so there's no setup apart from how many words to return.

Tone detection is an output of approx 10 levels, from light patronising/grooming/objectifying to "i hate everything, all x's are x". This was tested in an early alpha version but is not implemented in the NLP and utility yet.