Ai Dreams Forum

Member's Experiments & Projects => General Project Discussion => Topic started by: frankinstien on March 25, 2021, 04:30:05 pm

Title: Efficiency!
Post by: frankinstien on March 25, 2021, 04:30:05 pm
I'm working on sound processing and using FFT to break signals into their frequencies and energies. The computational horsepower to do this is best applied to the GPU, even though there are tons of CPU-based FFT solutions out there. But consider how biology does it: hairs in the cochlea vibrate, and because the cochlea is tuned to different frequencies along its length, identifying each frequency and its energy is effectively instant. The moment a sound is captured by the ear it is already decomposed and its energy levels assessed. With that said, perhaps there is a better approach for A.I. hearing, where some kind of tuned pipe or pipes identify each frequency and assess its energy as well. Miniaturizing this is also a challenge.

Does anyone have any ideas?
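
One possible direction, for illustration: the "tuned pipes" idea already has a software analogue, a bank of bandpass filters, each tuned to its own slice of the spectrum and reporting energy continuously, which is roughly how the cochlea is usually modelled. A minimal sketch in Python, assuming scipy is available; the channel count, spacing, and function name are arbitrary choices for the example, not anyone's actual implementation:

import numpy as np
from scipy.signal import butter, lfilter

def cochlea_like_filterbank(signal, sample_rate, n_channels=32,
                            f_lo=100.0, f_hi=6000.0):
    # Split the signal into log-spaced frequency channels and report
    # the energy in each, loosely like the tuned regions of the cochlea.
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    centres = np.sqrt(edges[:-1] * edges[1:])        # geometric centre of each band
    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo, hi], btype="bandpass", fs=sample_rate)
        band = lfilter(b, a, signal)
        energies.append(np.sum(band ** 2))           # energy in this channel
    return centres, np.array(energies)

# Toy input: a 440 Hz tone buried in a little noise.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
centres, e = cochlea_like_filterbank(x, fs)
print(centres[np.argmax(e)])   # the channel nearest 440 Hz dominates

Each channel runs independently, so this maps naturally onto parallel hardware (or a GPU), in the same spirit as the cochlea doing everything at once.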
Title: Re: Efficiency!
Post by: MagnusWootton on March 27, 2021, 04:02:11 pm
CPUs can run lots of fast Fourier transforms. Maybe back before we had 3-gigahertz quad-core CPUs you could only have a few running at once, but now you can pretty much spam them into your VST host when you're making music.

The mammalian ear is a mystery of nature. How can we perceive all sounds from every direction with just two diaphragm/microphone sources? To make a robot, my best bet would be to do it more like an eye and have a separate receptor for every direction.

Another thing is, god could give us "hear vision" if he wanted, and maybe all you need is an ear out front and it can actually see as well; only god truly knows, though.
Title: Re: Efficiency!
Post by: LOCKSUIT on March 27, 2021, 05:04:51 pm
.......What's wrong with using a microphone? You should be able to get both volume and pitch per "pixel" timestep.
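
For example, a plain frame-by-frame FFT gives exactly that: per-frame volume plus a crude pitch estimate from the strongest bin. A rough sketch (frame and hop sizes are arbitrary, and a real pitch tracker would do more than pick the loudest bin):

import numpy as np

def frame_volume_and_pitch(x, fs, frame=1024, hop=512):
    # For each short frame ("timestep"), return RMS volume and the
    # frequency of the strongest FFT bin as a crude pitch estimate.
    results = []
    for start in range(0, len(x) - frame, hop):
        chunk = x[start:start + frame]
        volume = np.sqrt(np.mean(chunk ** 2))              # RMS level
        spec = np.abs(np.fft.rfft(chunk * np.hanning(frame)))
        freqs = np.fft.rfftfreq(frame, 1.0 / fs)
        pitch = freqs[np.argmax(spec[1:]) + 1]             # skip the DC bin
        results.append((volume, pitch))
    return results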
Title: Re: Efficiency!
Post by: MagnusWootton on March 27, 2021, 08:46:04 pm
Yeah, but how do you get all the different sounds separated per direction with just two microphones?
Title: Re: Efficiency!
Post by: frankinstien on March 28, 2021, 05:48:27 pm
Yeah, but how do you get all the different sounds separated per direction with just two microphones?

There is an ability to remove voices from audio tracks with DFT/FFT:
2DFT (https://pseeth.github.io/public/papers/seetharaman_2dft_waspaa2017.pdf)
Extract Vocals (http://cs229.stanford.edu/proj2012/MendezPondicherryYoung-ExtractingVocalSourcesFromMasterAudioRecordings.pdf)

But here's a paper that uses a different approach to sound separation (https://www.researchgate.net/publication/317725445_A_Multi-resolution_approach_to_Common_Fate-based_audio_separation)

And here are some A.I. approaches (https://venturebeat.com/2020/11/11/googles-soundfilter-ai-separates-any-sound-or-voice-from-mixed-audio-recordings/).

Here are some localization approaches using A.I.:
Simulation of Human Ear Recognition Sound Direction (https://www.degruyter.com/document/doi/10.1515/jisys-2019-0250/htm)

Robotics (https://www.sciencedirect.com/science/article/pii/S0921889016304742)
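
To make the voice-removal point above concrete (this is not the method from the linked papers, just the plain time-frequency masking idea they build on): take the STFT, zero out the bins you want to suppress, and resynthesise. A toy sketch assuming scipy, with an arbitrary 300-3000 Hz band standing in for the voice region:

import numpy as np
from scipy.signal import stft, istft

def mask_out_band(x, fs, f_lo=300.0, f_hi=3000.0):
    # Crude illustration of STFT masking: zero the bins inside a band
    # (roughly the voice range here) and resynthesise the remainder.
    f, t, Z = stft(x, fs=fs, nperseg=1024)
    keep = (f < f_lo) | (f > f_hi)       # binary mask over frequency
    Z_masked = Z * keep[:, None]
    _, y = istft(Z_masked, fs=fs, nperseg=1024)
    return y

Real separation systems (like the 2DFT and Common Fate papers above) build much smarter masks, but they ride on the same DFT/FFT machinery.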
Title: Re: Efficiency!
Post by: LOCKSUIT on March 28, 2021, 06:55:58 pm
Yeah, but how do you get all the different sounds separated per direction with just two microphones?

Same way you separate things in an image... they are spaced by timesteps... look a bit to the left and you get another object... and you may even see two objects overlaid! :)
Title: Re: Efficiency!
Post by: MagnusWootton on March 28, 2021, 08:48:35 pm
Yeah, but how do you get all the different sounds separated per direction with just two microphones?

There is an ability to remove voices from audio tracks with DFT/FFT:
2DFT (https://pseeth.github.io/public/papers/seetharaman_2dft_waspaa2017.pdf)
Extract Vocals (http://cs229.stanford.edu/proj2012/MendezPondicherryYoung-ExtractingVocalSourcesFromMasterAudioRecordings.pdf)

But here's a paper that uses a different approach to sound separation (https://www.researchgate.net/publication/317725445_A_Multi-resolution_approach_to_Common_Fate-based_audio_separation)

And here are some A.I. approaches (https://venturebeat.com/2020/11/11/googles-soundfilter-ai-separates-any-sound-or-voice-from-mixed-audio-recordings/).

Here are some localization approaches using A.I.:
Simulation of Human Ear Recognition Sound Direction (https://www.degruyter.com/document/doi/10.1515/jisys-2019-0250/htm)

Robotics (https://www.sciencedirect.com/science/article/pii/S0921889016304742)

Yes, that's quite a fancy bit of audio engineering there. The magic filter that separates the instruments from each other is something audio engineers dream about, and it's happening now. Quite amazing.
Title: Re: Efficiency!
Post by: frankinstien on March 30, 2021, 08:10:24 pm
Yes, that's quite a fancy bit of audio engineering there. The magic filter that separates the instruments from each other is something audio engineers dream about, and it's happening now. Quite amazing.

I just finished a meeting on MS Teams with the TV on and noticed that I could differentiate the voices from the computer speakers and from the TV, and it was based on localization. Then I thought that even in a crowded room you can focus on voices or sounds by their direction, so it doesn't matter if someone is whispering while there are other sounds or noises around: because we can detect the location of a sound, we can focus on just that information. The FFT does give you phase detection, so if I drop everything other than the phase component of the sound that is of interest, it should allow for better speech recognition as well as sound recognition.
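
For illustration, this phase idea is essentially what classic two-microphone time-delay estimation (GCC-PHAT) does: keep only the phase of the cross-spectrum between the two channels, and the peak of the resulting correlation gives the inter-microphone delay, which maps to a direction. A rough sketch, not tied to any of the papers linked earlier:

import numpy as np

def gcc_phat_delay(sig_l, sig_r, fs):
    # Estimate the time delay between two microphone channels using
    # FFT phase only (GCC-PHAT); the delay corresponds to a direction.
    n = len(sig_l) + len(sig_r)
    L = np.fft.rfft(sig_l, n=n)
    R = np.fft.rfft(sig_r, n=n)
    cross = L * np.conj(R)
    cross /= np.abs(cross) + 1e-12        # keep phase, discard magnitude
    corr = np.fft.fftshift(np.fft.irfft(cross, n=n))
    delay = (np.argmax(np.abs(corr)) - n // 2) / fs
    return delay                           # seconds; the sign tells left vs. right

Once you have a delay (or direction) of interest, you can weight the spectrum toward sources that match it, which is the "focus on one location" effect described above.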

This is why I'm migrating a DFT/FFT library to OpenCL: I need to do all kinds of fancy stuff with sound. I have seen some sites that have OpenCL code but no real examples of it working, so if someone here has seen such material, please do not hesitate to post it here.
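
Not a working port of anything, but as a starting point, here is a minimal pyopencl sketch of a naive O(N^2) DFT kernel; the kernel and buffer names are made up for the example, and a real GPU FFT would use a radix-style kernel or an existing OpenCL FFT library such as clFFT:

import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void dft(__global const float *re_in, __global const float *im_in,
                  __global float *re_out, __global float *im_out, const int n)
{
    int k = get_global_id(0);
    if (k >= n) return;
    float sr = 0.0f, si = 0.0f;
    for (int t = 0; t < n; ++t) {
        float ang = -2.0f * M_PI_F * (float)k * (float)t / (float)n;
        float c = cos(ang), s = sin(ang);
        sr += re_in[t] * c - im_in[t] * s;
        si += re_in[t] * s + im_in[t] * c;
    }
    re_out[k] = sr;
    im_out[k] = si;
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, KERNEL_SRC).build()

n = 1024
x = np.random.rand(n).astype(np.float32)        # real test signal
zeros = np.zeros(n, dtype=np.float32)
mf = cl.mem_flags
re_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
im_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=zeros)
re_out = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)
im_out = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)

prg.dft(queue, (n,), None, re_in, im_in, re_out, im_out, np.int32(n))

spec_re = np.empty(n, dtype=np.float32)
spec_im = np.empty(n, dtype=np.float32)
cl.enqueue_copy(queue, spec_re, re_out)
cl.enqueue_copy(queue, spec_im, im_out)
# Sanity check against NumPy (loose tolerance for single precision).
print(np.allclose(spec_re + 1j * spec_im, np.fft.fft(x), atol=0.1))

The hand-rolled kernel is mainly useful for checking that the OpenCL plumbing works before moving to a proper FFT implementation.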
Title: Re: Efficiency!
Post by: MagnusWootton on March 31, 2021, 07:18:44 am
By the look of the deep learning going into it these days, maybe it's more in the mind than in the ear. It just shows that once you have a full intelligence, you only need a simple ear...
Title: Re: Efficiency!
Post by: infurl on March 31, 2021, 07:24:50 am
This is why I'm migrating a DFT/FFT library to OpenCL: I need to do all kinds of fancy stuff with sound. I have seen some sites that have OpenCL code but no real examples of it working, so if someone here has seen such material, please do not hesitate to post it here.

Maybe you will find someone knowledgeable on https://www.reddit.com/r/OpenCL/.
Title: Re: Efficiency!
Post by: MikeB on April 01, 2021, 07:37:15 am
Many microphones are already tuned for voice (300 Hz-3000 Hz), so the louder the pickup, the more likely it's voice...

A software equaliser can adjust the dB level for that range, but uses 1-5% CPU on a 3 GHz CPU.
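
As a quick illustration of that heuristic, you could measure what fraction of the spectral energy falls inside the 300-3000 Hz band; a high fraction suggests the pickup is dominated by speech. A rough sketch (the decision threshold you would apply is application-dependent):

import numpy as np

def voice_band_fraction(x, fs, f_lo=300.0, f_hi=3000.0):
    # Fraction of total spectral energy inside the nominal voice band.
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[band].sum() / (spec.sum() + 1e-12)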