Ai Dreams Forum
Artificial Intelligence => AI News => Topic started by: MikeB on February 09, 2022, 01:32:22 pm
-
“By combining visual cues, such as the movement of the lips and teeth during speaking, along with auditory information for representation learning, AV-HuBERT can capture nuanced associations between the two input streams efficiently even with much smaller amounts of untranscribed video data for pretraining,” Meta AI’s researchers explained.
https://siliconangle.com/2022/01/07/meta-ai-built-speech-recognition-platform-relies-visual-cues-filter-background-noise/
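The core idea in the quote above — combining frame-aligned visual cues (lip movement) with audio features so the model can lean on either stream — can be sketched in a few lines. This is a toy illustration only, not Meta's code: `fuse_av` is a hypothetical helper, and the feature dimensions are made up. It also demonstrates modality dropout, where one whole input stream is randomly zeroed during pretraining so the learned representation doesn't over-rely on audio alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_av(audio_feats, video_feats, p_drop=0.5, training=True):
    """Concatenate per-frame audio and visual (lip) features.

    Hypothetical helper for illustration. Modality dropout: during
    pretraining, randomly zero out one entire modality so the model
    learns associations that survive a missing or noisy stream.
    """
    assert audio_feats.shape[0] == video_feats.shape[0], "streams must be frame-aligned"
    a, v = audio_feats, video_feats
    if training:
        if rng.random() < p_drop:
            a = np.zeros_like(a)   # drop audio -> lip-reading only
        elif rng.random() < p_drop:
            v = np.zeros_like(v)   # drop video -> audio only
    return np.concatenate([a, v], axis=-1)

# 100 frames: 26-dim audio features alongside 512-dim lip-region embeddings
audio = rng.standard_normal((100, 26))
video = rng.standard_normal((100, 512))
fused = fuse_av(audio, video)
print(fused.shape)  # (100, 538)
```

At inference time (`training=False`) both streams pass through untouched, which is why a model trained this way can use lip movement to filter out background noise rather than depending on it.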