This is a project I've been working on for a few years. In 2019, I tested it on Pandora Bots. This year I'm converting it to C/C++.
The main goal is to be small, fast, and white-box, rather than relying on heavy algorithms, knowledge bases, or self-learning.
Broadly it works as a word and sentence compressor.
Words are matched against a pre-defined list of words in a lookup table. A match returns one 8-bit character symbol for sentence matching, and two other 8-bit symbols for context and uniqueness. [March 2023] There are 49 total word groups for the first symbol, and up to 256 groups each for the second and third symbols.
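As a sketch of how such a word lookup might look in C (the struct name, fields, and example symbols here are my own illustration, not the project's actual tables):

```c
#include <string.h>
#include <stdint.h>

/* Hypothetical entry: one dictionary word and its three 8-bit symbols. */
typedef struct {
    const char *word;
    uint8_t group;    /* 1 of 49 word groups, used for sentence matching */
    uint8_t context;  /* one of up to 256 context values */
    uint8_t unique;   /* one of up to 256 uniqueness values */
} WordEntry;

/* Small illustrative table; the real list holds on the order of 1000 words. */
static const WordEntry table[] = {
    { "can",     'Q', 1, 10 },
    { "english", 'L', 2, 11 },
    { "speak",   'V', 3, 12 },
    { "you",     'P', 4, 13 },
};

/* Returns the entry for a word, or NULL if it is not in the list
   (unknown words are simply dropped, which also strips private data). */
const WordEntry *lookup_word(const char *w)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].word, w) == 0)
            return &table[i];
    return NULL;
}
```

Dropping unmatched words rather than storing them is what makes the compression privacy-preserving, as noted below.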
Sentences are made of 4-10 one-character symbols, where each symbol is one of 49 options, each covering hundreds to thousands of words. This means each stored sentence pattern can detect hundreds of millions of sentences with similar meaning. These patterns are grouped and stored in a pre-defined list which compresses the sentence to an intention symbol.
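A minimal sketch of the sentence-to-intention step, assuming the compressed sentence is simply a string of group symbols (the patterns and intention characters are invented for illustration):

```c
#include <string.h>

/* Hypothetical pattern list: a compressed sentence (one group symbol per
   recognized word) maps to a one-character intention symbol. */
typedef struct {
    const char *pattern;   /* sequence of word-group symbols */
    char intention;
} SentenceEntry;

static const SentenceEntry sentences[] = {
    { "QPVL", '?' },  /* e.g. "can you speak english" -> question */
    { "VL",   '!' },  /* e.g. "speak english"         -> directive */
};

/* Returns the intention symbol for a compressed sentence, or 0 if the
   pattern is not in the pre-defined list. */
char compress_sentence(const char *symbols)
{
    for (size_t i = 0; i < sizeof sentences / sizeof sentences[0]; i++)
        if (strcmp(sentences[i].pattern, symbols) == 0)
            return sentences[i].intention;
    return 0;
}
```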
For a chatbot, the developer can combine the one-character intentions with multiple one-character word-context symbols in pre-defined lists to cover practically all possible spoken interactions, with good attentiveness to the original sentence and a deliberate white-box response. Output can be further varied using another lookup table that adds randomness to the text/audio response. A good number of literal responses to cover a broad range of sentence intentions is 50-100. This makes changing the chatbot's personality very easy.
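A sketch of the response stage under these assumptions: each intention symbol maps to a handful of literal alternates, and one is picked at random for variety (the table contents and function names are hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical response table: each intention symbol maps to a few
   literal alternates; one is picked at random to vary the output. */
typedef struct {
    char intention;
    const char *alt[3];
    int n_alt;
} Response;

static const Response responses[] = {
    { '?', { "Yes, I speak English.", "I do!", "Of course." }, 3 },
    { '!', { "Switching to English.", "Okay, English it is." }, 2 },
};

/* Returns one literal response for an intention, or a fallback line. */
const char *respond(char intention)
{
    for (size_t i = 0; i < sizeof responses / sizeof responses[0]; i++)
        if (responses[i].intention == intention)
            return responses[i].alt[rand() % responses[i].n_alt];
    return "Sorry, I didn't catch that.";
}
```

Because every possible line is literal and enumerable, a voice actor can pre-record all of them, including the random alternates.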
Size and response times:
In 2019, on Pandorabots (1000 words, 200 sentences): size ~500 KB, response time ~1 second.
In 2020, in C on an Arm Cortex-M4 @ 120 MHz: total size ~159 KB, speed 15-100 ms per sentence.
In 2021, in C on a PC @ 2.6 GHz with binary searching (setup time of 70 ms): speed ~1 ms per sentence.
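The 2021 speedup can be sketched with the standard library's qsort/bsearch pair: sort the word list once at startup (the one-time setup cost), then do an O(log n) lookup per word. Names and table contents here are illustrative, not the project's actual code:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *word;
    unsigned char group;
} Entry;

static int cmp_entry(const void *a, const void *b)
{
    return strcmp(((const Entry *)a)->word, ((const Entry *)b)->word);
}

static int cmp_key(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const Entry *)elem)->word);
}

static Entry words[] = {
    { "speak", 'V' }, { "can", 'Q' }, { "you", 'P' }, { "english", 'L' },
};

/* One-time cost, paid at startup (the ~70 ms setup on the real word list). */
void setup(void)
{
    qsort(words, sizeof words / sizeof words[0], sizeof words[0], cmp_entry);
}

/* O(log n) lookup per word; returns NULL if the word is not in the list. */
const Entry *find(const char *w)
{
    return bsearch(w, words, sizeof words / sizeof words[0],
                   sizeof words[0], cmp_key);
}
```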
Other chatbot features:
Total size is < 500 KB including word databases.
One sentence generally takes less than 1 ms to process.
Limited chatbot responses make it easy to record an actor's voice and change personality.
Private information is stripped during word compression (words that aren't in the pre-defined list are lost and non-recoverable).
Fine differentiation of intentions, e.g. between wondering, questions, and directions: "can you speak english", "do you speak english", "speak english".
Can count occurrences of emotional words, logical words, burning-analyser words, and light-sense words to better reply in kind.
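Counting word-family occurrences could be as simple as checking each group symbol against a family's symbol set (the symbols and families below are assumptions for illustration, not the project's real groups):

```c
#include <string.h>

/* Sketch: count how many of a compressed sentence's group symbols fall
   in a given word family (e.g. emotional vs. logical), so the reply can
   mirror the speaker's tone. 'family' is the set of symbols belonging
   to that family. */
int count_in(const char *symbols, const char *family)
{
    int n = 0;
    for (; *symbols; symbols++)
        if (strchr(family, *symbols))
            n++;
    return n;
}
```

For example, with a hypothetical pattern "QPVL" and a family "PL", two of the four symbols would be counted.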
For chatbots in experiences/games and/or CPU-restricted platforms, it solves the following problems:
Too much data or processing power required.
Cannot change the personality/no personality.
Cannot change the language/only one language.
Chatbot escaping the topic due to bad intention reading.
Chatbot returning bad views or knowledge calculations (only pre-determined responses are possible).
Chatbot with terrible voice synthesis (a voice actor can record all lines, including random alternates).