Pattern based NLP & ASR

*

8pla.net

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1302
  • TV News. Pub. UAL (PhD). Robitron Mod. LPC Judge.
    • 8pla.net
Re: Pattern based NLP
« Reply #45 on: August 15, 2022, 12:57:58 pm »
Quote
Current phoneme animations in games are either pre-calculated to pre-recorded audio, or random mouth movements.

What is a phoneme animation? Do you mean a mouth posture animation?
My guess is yes, since you mentioned these are game animations.

Phonemes and visemes are closely related. Visemes are the graphics and phonemes are the audio
of the same speech synthesis. However, one difference is that a single mouth posture
(a viseme) may look the same (be reused) for a few different phoneme sounds.
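A minimal sketch of that many-to-one relationship, in Python. The phoneme labels and viseme names are made up for illustration and are not taken from any particular animation standard.

```python
# Hypothetical phoneme -> viseme table: several phonemes share one mouth posture.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "k": "jaw_open_small", "g": "jaw_open_small",
    "aa": "jaw_open_wide",
}

def visemes_for(phonemes, rest="rest"):
    """Map a phoneme sequence to mouth postures, falling back to a rest pose."""
    return [PHONEME_TO_VISEME.get(p, rest) for p in phonemes]

def distinct_visemes():
    """Count how many mouth postures the table actually needs."""
    return len(set(PHONEME_TO_VISEME.values()))
```

Here eight phonemes need only four drawn mouth postures, which is the saving a reduced viseme set is after.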



My Very Enormous Monster Just Stopped Using Nine

*

MikeB

  • Autobot
  • ******
  • 219
Re: Pattern based NLP
« Reply #46 on: August 16, 2022, 08:55:10 am »
Yes I mean mouth pose animation.

I have a few written, but it's a little bit down the to-do list.

The tongue can technically be animated as well (to individualise animations to each of the 20-40 phonemes) but I'm not sure how visible that is for the effort spent....

*

8pla.net

Re: Pattern based NLP
« Reply #47 on: August 16, 2022, 03:22:19 pm »
Some are so similar that you can design a set with fewer visemes that still works well.
 
https://www.youtube.com/watch?v=6c1WsMuhpFo

Of course, there is nothing wrong with doing all of the visemes, if that is your goal.

*

MikeB

Re: Pattern based NLP
« Reply #48 on: August 17, 2022, 07:31:59 am »
I plan to do it minimalistically at first, but I'm also planning for max quality...

I actually have two K's and two T's recorded right now - one for highs, one for lows. They both make the same basic mouth shape, but the higher toned version also activates the cheek muscles.

E.g.
"K" as in "kay" = 3000 Hz.
"K" as in "kee" = 4000 Hz = cheek muscles.

I think vowels can be a little muddled, but plosive consonants (k, t, p, d...) and fricatives such as f can improve quality a lot if done well.

Great video
« Last Edit: August 17, 2022, 10:27:15 am by MikeB »

*

MikeB

Re: Pattern based NLP
« Reply #49 on: September 09, 2022, 10:02:23 am »
This month was Word Lists, Word Searching, converting to a library format, and running the main speech processor in a separate thread.

Word Lists are small. Since it doesn't use prediction, there's no way to tell homophones apart from single words. "Close" and "claws" may also be the same word in two different accents, so it's better if the user knows which word they want (deliberately picks one). If the list contains "close" but not "claws", then vowel swapping ("aw" to "oh") can correct it. Another issue is silent or quiet secondary syllables (e.g. the "s" on the end of "close"). In this case, as words are written as syllables, "clo" can be saved as "close" for a boost in detection. In future, user filtering should trim the list beforehand. I think that's better, as when you're facing a door, "open" and "close" are the two main words you're looking for, not "clone" and similar.
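A rough Python sketch of the vowel swapping and the "clo" saved as "close" idea. The phoneme spellings, the table layout, and the function name are assumptions for illustration only; the actual word list format isn't shown in this thread.

```python
# Hypothetical word list, with each word written as a tuple of phoneme-like units.
WORD_LIST = {
    ("k", "l", "oh", "s"): "close",
    ("k", "l", "oh"): "close",      # quiet final "s" dropped, still saved as "close"
    ("oh", "p", "eh", "n"): "open",
}

# Assumed vowel swaps to try when the heard word is not in the list.
VOWEL_SWAPS = {"aw": "oh"}

def lookup(units):
    """Return the listed word for the heard units, trying vowel swaps on a miss."""
    key = tuple(units)
    if key in WORD_LIST:
        return WORD_LIST[key]
    swapped = tuple(VOWEL_SWAPS.get(u, u) for u in key)
    return WORD_LIST.get(swapped)
```

With a list this small, a heard "aw" vowel can only land on "close", which is why trimming the list to the current situation helps.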

Word Searching uses binary search. Each word may take one to ten records (high/low versions of the same consonant, accented vowels), so a list of 10 words may need up to 100 records.
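One way this could look in Python, as a sketch only: each word is expanded into one record per combination of variants, the records are sorted, and lookup is a binary search. The variant names and record layout are assumptions, not the real format.

```python
import bisect
from itertools import product

# Hypothetical variants: each unit may be stored in more than one form.
VARIANTS = {"k": ["k_low", "k_high"], "ay": ["ay", "ay_acc"]}

def expand(units):
    """All record keys for one word, one per combination of unit variants."""
    choices = [VARIANTS.get(u, [u]) for u in units]
    return [" ".join(combo) for combo in product(*choices)]

def build_records(words):
    """Sorted (key, word) records, ready for binary search."""
    return sorted((key, word) for word, units in words.items() for key in expand(units))

def find(records, heard_units):
    """Binary-search the sorted records for an exact key match."""
    key = " ".join(heard_units)
    i = bisect.bisect_left(records, (key, ""))
    if i < len(records) and records[i][0] == key:
        return records[i][1]
    return None
```

In this toy case "kay" takes four records (two consonant forms times two vowel forms) and "no" takes one, so the record count grows much faster than the word count.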

Library format and threading: now using pthreads to run the main processing loop in a new thread. This is much more user friendly, as main() now only contains checks for new phonemes (for mouth animations), new words, and new phrases.
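The same shape sketched in Python, using the standard `threading` and `queue` modules rather than pthreads. The function names and the placeholder processing step are assumptions; the point is only that the caller feeds frames in and polls results out while the loop runs elsewhere.

```python
import queue
import threading

def processing_loop(frames, results):
    """Worker: consume audio frames until a None sentinel arrives."""
    while True:
        frame = frames.get()
        if frame is None:
            break
        # Placeholder for the real speech processing step.
        results.put(("phoneme", frame))

def run(frames_in):
    """Start the worker, feed it frames, then drain results the way main() would poll them."""
    frames, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=processing_loop, args=(frames, results))
    worker.start()
    for f in frames_in:
        frames.put(f)
    frames.put(None)   # ask the worker to stop
    worker.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

A real main() would poll the results queue every frame instead of waiting for the worker to finish; joining first just keeps the sketch deterministic.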

Next month I want to get a solid bunch of action words working, such as: left, right, up, down, start, stop, open, close, yes, no, hi, hello, bye, one, two, three, four...

So that will allow me to test/refine frequencies & word lists further.

*

MagnusWootton

  • Replicant
  • ********
  • 634
Re: Pattern based NLP
« Reply #50 on: September 09, 2022, 04:29:27 pm »
Quote
K-nearest neighbour is a statistical analysis/learning method, and this is the reason why speech recognition always takes at least 0.5 seconds to process, and why speech recognition has never progressed since the 1950s (except for different types of guessing algorithms)...

My approach is halfway between IBM Shoebox and modern speech recognition.

Frequencies above & below 1000 Hz are split, ultimately run through an FFT, then fast frequency analysis is performed. Nothing else. The NLP (for homophones, missing words...) uses Compression Pattern Matching.

Speech recognition that can handle all words in all languages is one thing, but an "as fast as possible" approach is still needed in society...

Sorry for being a bit late, but k-nearest is only slow if the database of IDs is too large. If you keep it under 100 entries it goes extremely quickly.

Just do a full linear test against every entry in the database. It's really easy to code as well, no skill job, that's for sure.

The FFT actually goes a little slow. Back in the olden days, too many FFT effects in Fruity used to slow you down. These days it's a lot different though. It goes a lot faster with the 4 cores on the CPU, I think that's why.
Doing it fast would mean staying in the time domain, or at least making it a sparse FFT call, not every frame.
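The full linear test can be sketched in a few lines of Python. This is the k = 1 case with squared Euclidean distance, which is an assumption since no distance measure is named above. The labels and feature values in the usage note are illustrative only.

```python
def nearest(database, features):
    """Linear scan: return the label whose stored vector is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label, stored in database:
        dist = sum((a - b) ** 2 for a, b in zip(stored, features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For example, with a made-up database `[("ah", (700, 1200)), ("ee", (300, 2300)), ("oo", (300, 800))]`, the input `(320, 2200)` comes back as "ee". With under 100 entries that is at most 100 distance computations per lookup.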

*

MikeB

Re: Pattern based NLP
« Reply #51 on: September 10, 2022, 08:47:51 am »
It may help. After I finish everything and need more accuracy...

All speech recognition I've seen looks for ultra-precise frequencies (large FFT data set) and uses large lookup tables. That is the slowest part, and I've never seen speech rec faster than ~0.5 sec.

I've already done it the hard way using low res FFT and watching transitions. I have a table of 23 vowels plus accented versions, so the only place I would add it is in helping "alias" the input frequency & transition data to the vowel/consonant tables better.

It's weak to nasally/vague and wavering voices... but I'd rather not guess at accents etc. I want to use a method for that.

*

MagnusWootton

Re: Pattern based NLP
« Reply #52 on: September 10, 2022, 06:00:00 pm »
Maybe you can sum up the adjacent samples, kind of like a low-pass filter, but don't divide it back down, and it will speed things up a huge amount. That hint was given to me on Goertzel's singularity channel by a person there.
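One reading of that hint, sketched in Python: add up each run of neighbouring samples and keep the sums, skipping the division that an averaging filter would do. The function name and the `width` parameter are assumptions.

```python
def sum_adjacent(samples, width=2):
    """Sum each run of `width` neighbouring samples without dividing by `width`."""
    usable = len(samples) - len(samples) % width
    return [sum(samples[i:i + width]) for i in range(0, usable, width)]
```

The output has `1/width` as many samples, so anything run afterwards (an FFT, for instance) has less data to chew through. The values are scaled up by roughly `width`, which only matters if absolute levels are compared later.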

Your project is cool. I like all recognition projects; really good and practical. It really works, and it's kind of magic watching the computer do it.

*

MikeB

Re: Pattern based NLP
« Reply #53 on: November 06, 2022, 07:00:55 am »
Taking a few months to learn Unity.

A few small updates:
  • Phoneme-Word lists are improved. Combined the 'high' and 'low' versions of consonants, and differently accented vowels, through list indexing. Phoneme-Word lists now contain one entry per word, unless split by syllables.
  • Added a 'sudden ear sensitivity' boost to voice in the first two consonant frames (16 ms and 32 ms, 1000-5500 Hz), 5x normal values.
  • Vol Peak Normalisation fixed.
  • Both the 5x boost to consonant frames and the Vol Peak Norm changes improved detection of "cake" to 1 in 2 (1 in 3 previously). Currently there are problems detecting "r". Vowels are much improved.
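The two level changes in that list can be sketched in Python as below. This is generic peak normalisation plus a gain on the first frames, not the actual implementation. The function names, and treating a frame as a plain list of band values, are assumptions.

```python
def peak_normalise(samples, target=1.0):
    """Scale samples so the largest absolute value equals `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s * target / peak for s in samples]

def boost_onset(frames, gain=5.0, count=2):
    """Multiply the values in the first `count` frames by `gain`, leave the rest alone."""
    return [[v * gain for v in f] if i < count else list(f) for i, f in enumerate(frames)]
```

With `gain=5.0` and `count=2` this matches the 5x boost on the first two consonant frames; restricting it to the 1000-5500 Hz bands would mean passing in only those band values.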

Future:
  • Fixing basic words in order to have phrase detection. A variety of single words.
  • Testing with recorded voice (streamers playing Lifeline, Bot Colony). Wakeword/Keyword spotting benchmark samples.

*

MikeB

Re: Pattern based NLP
« Reply #54 on: January 14, 2023, 08:22:30 am »
I returned to the NLP as it's a major part of the Speech Recognition, and the Chatbot component as a whole.

Previously in the NLP, there were ~3500 words and ~1000 pattern sentences. This only solved 550-1000 sentences in the WiC benchmark, as the words weren't set up correctly. Word grouping was only split in a very basic way.

Now the Grammar interpretation/word-compression groups are set up as originally intended. 39 total groups now for all words to compress into (3-4x).
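As a guess at the general idea only (the actual Compression Pattern Matching code isn't shown in this thread), compressing words into groups and then matching pattern sentences against the group sequence could look like this in Python. The group names, the patterns, and the labels are all hypothetical.

```python
# Hypothetical word groups; the real system is described as having 39 of them.
WORD_GROUPS = {
    "i": "PERSON", "you": "PERSON",
    "open": "ACTION", "close": "ACTION",
    "the": "DET", "a": "DET",
    "door": "THING", "window": "THING",
}

# Hypothetical pattern sentences written as group sequences.
PATTERNS = {
    ("ACTION", "DET", "THING"): "command",
    ("PERSON", "ACTION", "DET", "THING"): "statement",
}

def compress(sentence):
    """Replace each word with its group name; unknown words become UNKNOWN."""
    return tuple(WORD_GROUPS.get(w, "UNKNOWN") for w in sentence.lower().split())

def match(sentence):
    """Return the label of the pattern that matches the compressed sentence, if any."""
    return PATTERNS.get(compress(sentence))
```

One pattern then covers every sentence whose words fall into the same groups, which is how a small number of pattern sentences can solve many benchmark sentences.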

So currently, with ~2200 words and only 16 pattern sentences, 165(+) sentences are solved in the WiC benchmark. Already there are far fewer false positives, and the patterns better target the sentences they're intended for.

Max pattern sentences will be around 1000, and the goal is to solve around 70-75% of the 5500 sentence benchmark. So hopefully that's possible.

*

MikeB

Re: Pattern based NLP
« Reply #55 on: February 06, 2023, 08:48:17 am »
Up to 2900 words and 700 WiC sentences, and I ran into a problem.

The WiC test itself isn't specific about what "context" is. I.e. when the same word is selected in two sentences, do they have the same "context"?

Before, I matched on different Intentions: the words before and after the highlighted word. E.g. (1) "for one person to do". (2) "each person is unique". Different intentions, but the same word meaning.

Now it appears to mean Homophones, so in any case where the word is the same it's a match.

So I'm switching to two lists. One for Homophones. One for Intentions. The Intentions list is for chatbots / phrase meaning comparison.

*

MikeB

Re: Pattern based NLP
« Reply #56 on: March 07, 2023, 10:22:39 am »
NLP:
  • Fully converted all words into correct groups (~3000 words in 49 total word groups).
  • There are currently 11 Homophone/sentence groups for the WiC test (~250 sentences). Eventually will be around 50 groups, 1000 sentences.

Only testing for a good WiC score right now. It solves a lot for the small number of groups/sentences, and it's all "as intended" now, so I just need to go through the sentences (with homophones in mind) for a good score.

Speech rec:
  • Updated Hilbert & FIR filters.
  • Updated Fletcher-Munson Curve/Equal loudness calibration.
  • Added loading from a .wav file.
  • Refreshed grouped frequencies under 1000 Hz (407 Hz, 500 Hz, 594 Hz, 813 Hz, 1000 Hz...).
  • Refreshed all vowel frequencies (28), and added a few non-plosive consonants.
  • Removed Sudden Sensitivity Boost for the first plosive frames.

Loading from a .wav file substantially increased testing reliability, so I'm now refreshing everything.

Formant groups will be changed from three to seven: 0-375 Hz, 375-1000 Hz, 800-2500 Hz, 2500-3500 Hz, 3500-4500 Hz, 4500-6000 Hz, 6000-8000 Hz.

Most vowels only use up to 2500 Hz. This is perfect, except for the short vowels "a", "e", "i", which use up to 3500 Hz. The difference (when spoken) is whether your cheeks are activated or not (showing teeth). So by accounting for both (under and over 2500 Hz separately), "a", "e", "i" can now have low and high versions represented in viseme animations: one with cheeks activated, one without. There are also a few long vowels which can have the same low & high versions: "ee", "ay", "ew", "ow", "oi", "ier", "uah".

Definitely will also be using some kind of MFCC to un-fuzz low frequencies post FFT.

*

MikeB

Re: Pattern based NLP
« Reply #57 on: April 13, 2023, 02:55:51 pm »
Speech Rec:
  • Added Frequency Spectrogram to visually output frequency groups/blocks.
  • Removed Hilbert @ 2000 Hz filter. It was causing some frequencies to be inconsistent.
  • Removed the FIR LP @ 1000 Hz along with the split samples and split FFT. It did improve frequencies slightly, but not enough to warrant the extra processing time and code complexity.
  • Updated Fletcher-Munson Curve/Equal loudness.
  • Updated frequency group ranges.
  • Updated/fixed a bug with FFT output.
  • Updated Vowel recognition.
  • Now integrating into a phoneme extraction tool.
  • Still no MFC.
  • Faster (1-5ms per syllable).

Combined vowels "i" and "ee":
It's unable to tell the difference between the short vowel "i" and the long vowel "ee", as the FFT's frequency resolution is too coarse to reliably detect a 20 Hz frequency drop. An MFC setup only for low frequencies may help, but the real solution is an FFT with higher resolution in that area (and less resolution in upper frequency areas).
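The arithmetic behind this, as a small Python sketch. For an unpadded FFT the spacing between bins is 1 / frame duration, whatever the sample rate. The 16 ms figure used in the note is the frame length mentioned earlier in the thread; assuming it is also the FFT frame length is a guess.

```python
def bin_width_hz(frame_seconds):
    """Spacing between FFT bins for an unpadded frame: 1 / frame duration."""
    return 1.0 / frame_seconds

def frame_seconds_for(resolution_hz):
    """Frame duration needed for a given bin spacing."""
    return 1.0 / resolution_hz
```

A 16 ms frame gives bins about 62.5 Hz apart, so a 20 Hz shift falls inside a single bin. Getting 20 Hz spacing would need roughly a 50 ms frame, which costs time resolution and is presumably why more low-frequency resolution alone is the wish here.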

Other errors:
There is an error with the Hamming window, sine wave sync/frame, or FFT (vertical grey bars in the spectrogram). Otherwise it's looking good.

Testing sounds "kay", "key", "kai".


*

MikeB

Re: Pattern based NLP
« Reply #58 on: May 12, 2023, 04:16:07 pm »
Speech Rec / Phoneme Extraction Tool:
  • Conversion of main processing code from C to C# (50%).
  • Wav file loading (8/16/24/32-bit PCM, IEEE 32-bit float)
  • GUI: Better Audio wave sample display (dB x time)
  • GUI: Loudness Curve - movable points
  • GUI: other
  • Loading FFT with a Hann window - grey vertical bars gone in spectrogram.
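For reference, a plain Python sketch of a symmetric Hann window applied to a frame before the FFT. It uses the standard textbook formula and is not the tool's own code. The function names are assumptions.

```python
import math

def hann(n):
    """Symmetric Hann window of length n: 0.5 * (1 - cos(2*pi*k / (n - 1)))."""
    if n == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1))) for k in range(n)]

def apply_window(frame):
    """Multiply a frame by the Hann window before passing it to an FFT."""
    return [s * w for s, w in zip(frame, hann(len(frame)))]
```

The window tapers to zero at both ends of the frame, so each frame no longer starts and stops abruptly. That may be why the vertical bars went away, though the thread doesn't confirm the exact cause.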

*

MikeB

Re: Pattern based NLP
« Reply #59 on: June 20, 2023, 11:40:38 am »
Speech Recognition / Phoneme Extraction Tool:

  • Conversion of main processing code (including FFT) from C to C# (100%).
  • GUI/code: Loudness Curve
  • GUI/code: Phoneme output
  • GUI
  • Updated frequency group definitions (312, 468, 562, 656, 750, 875, 1000, 1333, 1666, 2000, 2500, 3000, 4000, 5313, 8000 Hz)
  • Updated equal-loudness
  • New equal-loudness calibration for the first 16 ms frame (+50%)
  • Removed last FIR filter @ 8000 Hz. It did improve frequencies (loudness) slightly, but not enough to justify the processing time.
  • Combined Find Zero-Crossing function with Noise Find. Speed & quality improvement.
  • Replaced Hann window with a custom wide lobe cosine window/Inverse Blackman window.
  • Improved syllable finding
  • Updated Wav file loading
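A sketch of how FFT bins could be collapsed into the frequency groups listed above, in Python. Treating the listed values as upper band edges is an assumption (they might equally be centre frequencies), and the function names are made up.

```python
import bisect

# Frequencies from the list above, treated here as upper band edges (an assumption).
GROUP_EDGES_HZ = [312, 468, 562, 656, 750, 875, 1000, 1333, 1666,
                  2000, 2500, 3000, 4000, 5313, 8000]

def group_index(freq_hz):
    """Index of the first group whose edge is >= freq_hz, or None above the last edge."""
    i = bisect.bisect_left(GROUP_EDGES_HZ, freq_hz)
    return i if i < len(GROUP_EDGES_HZ) else None

def group_energies(bin_freqs_hz, magnitudes):
    """Sum FFT bin magnitudes into the frequency groups."""
    totals = [0.0] * len(GROUP_EDGES_HZ)
    for f, m in zip(bin_freqs_hz, magnitudes):
        i = group_index(f)
        if i is not None:
            totals[i] += m
    return totals
```

The groups are narrow below 1000 Hz and wide above it, so most of the detail is kept where the vowels sit.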

Satisfied now with the "Kay, Kee, Kai" test.

Next target is finding "yes" & "no". This is part of the Google Speech Commands (Keyword Spotting) benchmark.


 

