This is a project I've been working on for a few years. In 2019 I was testing out the theory on Pandorabots, and this year I'm converting it to plain C.
The main goals are to be as small as possible, as fast as possible, multi-language, and solid-state (no algorithms, learning, or knowledge calculation).
It converts individual pattern-matched words into symbolic (tokenised) words, which are then matched again as symbolic sentences. Each matched sentence is assigned an intention, topic, and perspective. The chatbot code uses the intention/topic/perspective to pick from fixed responses (with randomisation).
Everything, from the spell check and the word-to-symbol tokeniser to the sentence pickups and chatbot responses, is staged pattern matching.
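As a rough illustration of the staging described above (word to symbol, symbolic sentence to intention, intention to fixed response), here is a minimal sketch in C. It is not the project's actual code: the symbol names, the tiny word and sentence tables, the response strings, and the choice to drop unknown words are all assumptions made for the example, and spell check, topic/perspective, and response randomisation are left out.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8

/* Hypothetical symbols and intentions; the real sets are not given in the post. */
typedef enum { SYM_UNKNOWN, SYM_GREET, SYM_YOU, SYM_LIKE, SYM_THING } Symbol;
typedef enum { INTENT_NONE, INTENT_GREETING, INTENT_OPINION } Intent;

typedef struct { const char *word; Symbol sym; } WordEntry;
typedef struct { Symbol pattern[MAX_SYMS]; size_t len; Intent intent; } SentenceEntry;

/* Stage 1 table: many words compress to a few symbols. */
static const WordEntry WORDS[] = {
    {"hello", SYM_GREET}, {"hi", SYM_GREET},
    {"you", SYM_YOU}, {"like", SYM_LIKE}, {"love", SYM_LIKE},
    {"music", SYM_THING}, {"games", SYM_THING},
};

/* Stage 2 table: symbolic sentences mapped to an intention. */
static const SentenceEntry SENTENCES[] = {
    {{SYM_GREET}, 1, INTENT_GREETING},
    {{SYM_YOU, SYM_LIKE, SYM_THING}, 3, INTENT_OPINION},
};

/* Stage 3 table: one fixed response per intention, indexed by Intent. */
static const char *const RESPONSES[] = {
    "Sorry, I did not catch that.",
    "Hello there.",
    "I have a few favourites.",
};

/* Stage 1: look up a single word; anything not in the table is unknown. */
static Symbol word_to_symbol(const char *word) {
    for (size_t i = 0; i < sizeof WORDS / sizeof WORDS[0]; i++)
        if (strcmp(word, WORDS[i].word) == 0) return WORDS[i].sym;
    return SYM_UNKNOWN;
}

/* Split on spaces and keep only known symbols (assumption: unknown words
   are dropped). Input longer than the buffer is truncated. */
static size_t tokenise(const char *text, Symbol *out, size_t max) {
    char buf[128];
    size_t n = 0;
    strncpy(buf, text, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *w = strtok(buf, " "); w != NULL && n < max; w = strtok(NULL, " ")) {
        Symbol s = word_to_symbol(w);
        if (s != SYM_UNKNOWN) out[n++] = s;
    }
    return n;
}

/* Stage 2: exact match of the symbol sequence against the sentence table. */
static Intent match_sentence(const Symbol *syms, size_t n) {
    for (size_t i = 0; i < sizeof SENTENCES / sizeof SENTENCES[0]; i++)
        if (SENTENCES[i].len == n &&
            memcmp(SENTENCES[i].pattern, syms, n * sizeof *syms) == 0)
            return SENTENCES[i].intent;
    return INTENT_NONE;
}

/* Stage 3: pick the fixed response for the matched intention. */
static const char *respond(const char *text) {
    Symbol syms[MAX_SYMS];
    size_t n = tokenise(text, syms, MAX_SYMS);
    return RESPONSES[match_sentence(syms, n)];
}
```

With these tables, calling `respond("do you love games")` drops "do", compresses the rest to YOU, LIKE, THING, and selects the opinion response. Everything is a table lookup, so the behaviour stays predictable and the memory cost is just the tables.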
On Pandorabots (1000 individual words with sentences) the size is ~500 KB and the response time is ~1 second. In plain C (on an Arm Cortex-M4 @ 120 MHz) it's ~159 KB and 15-100 milliseconds. After spell check and tokenisation, all actions are generally less than 1 ms.
Key features
-Less than 500 KB including all word databases.
-Millisecond fast.
-A maximum of 100 fixed chatbot responses makes it easy to voice record and/or change personality.
-Private information automatically stripped during word compression (names of places and things).
-Native differentiation of Wondering, Questions, and Directions: "can you speak English", "do you speak English", "speak English".
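The Wondering / Question / Direction split in the last item can be illustrated with a small first-word check. This is only a sketch under stated assumptions: the three word lists are made up for the example, input is assumed to be lower-case with no leading spaces, and the mapping of the three example phrases to the three categories is taken from the order they are listed in.

```c
#include <stddef.h>
#include <string.h>

typedef enum { ACT_STATEMENT, ACT_WONDERING, ACT_QUESTION, ACT_DIRECTION } SpeechAct;

/* Hypothetical word lists, far smaller than a real word database. */
static const char *const MODALS[] = {"can", "could", "would", "will"};
static const char *const ASKERS[] = {"do", "does", "did", "is", "are", "what", "where"};
static const char *const VERBS[]  = {"speak", "tell", "go", "stop"};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if the first space-delimited word of text is in the list. */
static int first_word_in(const char *text, const char *const *list, size_t n) {
    size_t len = strcspn(text, " ");
    for (size_t i = 0; i < n; i++)
        if (strlen(list[i]) == len && strncmp(text, list[i], len) == 0)
            return 1;
    return 0;
}

/* Classify by the opening word: modal = wondering, auxiliary or wh-word =
   question, bare verb = direction, anything else = statement. */
static SpeechAct classify(const char *text) {
    if (first_word_in(text, MODALS, COUNT(MODALS))) return ACT_WONDERING;
    if (first_word_in(text, ASKERS, COUNT(ASKERS))) return ACT_QUESTION;
    if (first_word_in(text, VERBS, COUNT(VERBS)))   return ACT_DIRECTION;
    return ACT_STATEMENT;
}
```

Here `classify("can you speak english")` gives `ACT_WONDERING`, `classify("do you speak english")` gives `ACT_QUESTION`, and `classify("speak english")` gives `ACT_DIRECTION`.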
The intention pickup lets you write a few varieties of general (non-specific) chatbot response with confidence, without having to look at the backend.
There is some short-term memory for handling puzzles such as "If I did this, then what is this?". There is no knowledge reflection, but it can still 1) tell it's a question, 2) scan for the topic, and 3) count logical words as opposed to emotional words, and stay relevant that way.
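Step 3, counting logical words against emotional words, could look something like the sketch below. The two word lists and the `Tone` struct are hypothetical and only there to show the idea; input is assumed to be lower-case, and question detection and topic scanning are not shown.

```c
#include <stddef.h>
#include <string.h>

typedef struct { int logical; int emotional; } Tone;

/* Hypothetical word lists for illustration only. */
static const char *const LOGICAL[]   = {"if", "then", "because", "so", "therefore"};
static const char *const EMOTIONAL[] = {"love", "hate", "sad", "happy", "afraid"};

static int in_list(const char *word, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(word, list[i]) == 0) return 1;
    return 0;
}

/* Count how many words fall in each list. Splits on spaces and basic
   punctuation; input longer than the buffer is truncated. */
static Tone count_tone(const char *text) {
    Tone t = {0, 0};
    char buf[128];
    strncpy(buf, text, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *w = strtok(buf, " ,.?!"); w != NULL; w = strtok(NULL, " ,.?!")) {
        if (in_list(w, LOGICAL, sizeof LOGICAL / sizeof LOGICAL[0]))
            t.logical++;
        else if (in_list(w, EMOTIONAL, sizeof EMOTIONAL / sizeof EMOTIONAL[0]))
            t.emotional++;
    }
    return t;
}
```

For "if i did this, then what is this?" the counts are 2 logical ("if", "then") and 0 emotional, which would be enough to steer towards a puzzle-style response rather than a sympathetic one.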
So this solves the usual problems with chatbots in games:
-Too much data or processing power required.
-Cannot change the personality/no personality.
-Cannot change the language/only one language.
-Cannot acknowledge the user/escapes the topic.
-Wrong views or bad knowledge calculation.
-Cannot record audio/terrible voice synthesis.
-Not being white box/solid state/predictable.