Evaluating open-source TTS engines

Ultron · « **on:** February 22, 2015, 06:21:18 pm »

I have recently been searching for open-source TTS engines that I can use with my A.I. I have found quite a few and tested them with suggested 'tricky' sentences. I guess the best candidates were:

http://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html
-Try voice 'Peter' from HTS(2011) - Combilex group

http://www.digitalfuturesoft.com/dfttssdk.php
-This one also seems interesting. Take a listen:
http://www.digitalfuturesoft.com/voicedemos/neosppaul.wav

Note that these are old projects that have not really been maintained, but judging from the demos I'd say they are still pretty solid engines.

My favorite remains IVONA. You can use it for free (at least its voices; I am not sure about commercial usage) with your C/C++ project via the Microsoft Speech API (MS SAPI). The reason I am not using it is that I do not like the API provided by Microsoft, maybe because of its seemingly complex structure and the amount of code it requires you to add to your files.

P.S. My personal favorite is IVONA's Brian (English - Male) voice. It is the closest to Jarvis I have ever heard.

ranch vermin · « **Reply #1 on:** February 23, 2015, 12:24:35 am »

Talk about bum libraries by Microsoft: try Windows Media Foundation. Before you learn that, you need a full understanding of DirectShow, and then make sure you're a COM and DirectX expert. Then you can go throw them both in the junk and just use AVIStream (made in ~1996) and be done with it.

I recommend putting your sound engineering hat on and starting from scratch with harmonics. I bet there are some really badass robotic-sounding voice patterns that come straight out of a raw implementation (what always happens) for some killer evil muther.

8pla.net · « **Reply #2 on:** February 23, 2015, 02:58:48 am »

Here, I am running an open source JavaScript port of eSpeak: http://www.chatbots.tk
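For anyone who wants to run the same kind of 'tricky sentence' comparison offline, here is a minimal sketch that drives the native eSpeak command-line program from Python. It assumes an `espeak` executable is installed and on the PATH; the sample sentences, the file prefix, and the helper names `espeak_command` and `render_all` are made up for illustration and are not the ones used in this thread.

```python
import shutil
import subprocess

# Placeholder test sentences (assumption: not the ones used in the thread).
TRICKY_SENTENCES = [
    "Dr. Smith lives on Elm Dr. near St. Paul St.",
    "I read the book yesterday, and I will read it again tomorrow.",
    "The bill came to $1,234.56 on 2/22/2015.",
]


def espeak_command(text, wav_path, voice="en", words_per_minute=150):
    """Build an espeak command line that writes the spoken text to a WAV file.

    -v selects the voice, -s sets the speed in words per minute,
    -w writes the audio to a file instead of playing it.
    """
    return ["espeak", "-v", voice, "-s", str(words_per_minute), "-w", wav_path, text]


def render_all(sentences, prefix="sample"):
    """Render each sentence to its own WAV file and return the paths written."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak executable not found on PATH")
    paths = []
    for index, sentence in enumerate(sentences):
        path = f"{prefix}_{index}.wav"
        subprocess.run(espeak_command(sentence, path), check=True)
        paths.append(path)
    return paths
```

Calling `render_all(TRICKY_SENTENCES)` should leave one WAV file per sentence in the current directory, which makes it easy to listen to several engines or voices side by side.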

Art · « **Reply #3 on:** February 23, 2015, 10:29:34 am »

Check out Charles who I've heard as a somewhat sarcastic butler on some websites.

Other good ones are here as well:
http://www.speaktext.com/downloadtts.htm

Ultron · « **Reply #4 on:** February 23, 2015, 08:36:45 pm »

Anyone else getting the idea to create an A.I. that develops and synthesizes its own human-like voice? Possibly based on and developed through listening to many different conversations between humans.

Well you must admit it's an interesting idea so I'm storing it in my locker. Maybe one day...

Art · « **Reply #5 on:** February 24, 2015, 01:16:24 am »

There is a company that will allow you to speak using your own voice then construct a TTS voice based on it (your voice).

How would a robot know what sounded suitable or not regarding a voice selection? Based on frequencies or just raw sampling?
What if it liked some parts of a female's speech and some from a male's? That could prove to be an interesting result unless some guidelines were in place. Then again, that would be exerting a degree of control over the bot, and that control is something that a lot of practitioners would like to avoid.

Then again, it was your dream.

Ultron · « **Reply #6 on:** February 24, 2015, 11:50:16 am »

Art - I don't know, and that's what makes it a good idea! The point is to observe and learn - this is why chemists carry out experiments.
We would learn a lot from such an experiment - sadly, this is somewhat complicated to do if you understand the idea to the depth that I do. Actually I might attempt this, but sadly I do not live in a country that speaks English and I would also have to make it a portable robot thingy if I want it to learn by itself. Or just make it listen to radio... yea that's easier...

Art · « **Reply #7 on:** February 25, 2015, 12:35:53 am »

Ahh...so if you allowed it to listen to say, a BBC radio station for a decent period of time, it might adapt or adopt (as the case may be), an English speaking (Queen's English, not American English) voice? Same for listening to an Australian station? A male voice talk show program or perhaps a female only talk show (if they exist in your country)?

That would certainly prove to be interesting.

I had a chatbot running some time back and while it was "listening for speech" from me, I decided to try something else. I picked up my guitar and played a short intro of notes. The chatbot suddenly spoke saying, "I don't think I've ever heard that sound before."

I was practically floored! It didn't say, "What was that?" or some such inquiry; what got me was the fact that it recognized the notes as music or musical sounds.

I no longer have that particular bot as it crashed during an online upgrade and I never got it to run after that but I was still impressed.

UltraHal listens for speech and basically ignores music, be it chords or notes. I would imagine it would depend on how the thresholds are established and recognized by the receiver. More stuff to ponder...

ranch vermin · « **Reply #8 on:** February 25, 2015, 05:22:52 am »

To hear music, it would help to hear things that happen at the same time and class them in parallel, like a drum kit and a guitar. Then you could go fill out your computer memory with a bit more discovered.

Ultron · « **Reply #9 on:** February 25, 2015, 10:30:34 pm »

Now Art got me thinking about the fundamentals of how we differentiate between music and talking. You might imagine it is simple, but think about singing, or better yet, rap. I can see many robots from the future getting confused there...

Art · « **Reply #10 on:** February 26, 2015, 12:39:25 am »

Yeah but if someone develops a rapping robot you can rest assured that I'll be changing the channel!!

Ultron · « **Reply #11 on:** February 26, 2015, 11:50:07 am »

You twisted my words but made a good point xD

Don Patrick · « **Reply #12 on:** February 26, 2015, 01:28:51 pm »

Patterns, I think. There are regular timing/frequency patterns to music, even in rap (which I wouldn't call music). The syllables and pronunciation in speech have no regular recurring patterns over a certain range. For a simpler distinction, music and singing tend to have long, stretched tones at regular intervals, while the tones of speech change at the rate of machine gun fire. Sooo... if I were to program it, I'd map out frequencies of frequencies and have the program comment on my terrible taste in music, just like KITT from Knight Rider.

Interesting tangent. Might use it to have the computer pick out certain mood music, but I don't have any coding libraries to pick sound files apart for frequency analysis.
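As a rough illustration of the 'regular patterns' idea, here is a minimal sketch that needs no external libraries. It is not a tested music/speech classifier and only looks at loudness, not pitch: it chops the samples into short frames, measures the energy of each frame, and then checks via autocorrelation whether that loudness pattern repeats at a fixed interval. All function names and the synthetic test signal are assumptions made up for the example; real audio would first have to be read into a list of samples (for instance with Python's standard `wave` module).

```python
import math


def energy_envelope(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def rhythm_regularity(envelope, min_lag, max_lag):
    """Largest normalized autocorrelation of the envelope over a lag range.

    Values near 1 mean the loudness pattern repeats at some regular interval
    (measured in frames); values near 0 mean no regular repetition was found.
    """
    mean = sum(envelope) / len(envelope)
    centered = [e - mean for e in envelope]
    power = sum(c * c for c in centered)
    if power == 0:
        return 0.0  # constant loudness carries no rhythm information
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(centered) - 1) + 1):
        corr = sum(centered[i] * centered[i + lag]
                   for i in range(len(centered) - lag))
        best = max(best, corr / power)
    return best


def gated_tone(onsets, burst_frames, total_frames, frame_size, freq, sample_rate):
    """Synthetic test signal: a sine tone that is audible only during bursts.

    `onsets` lists the frame index at which each burst starts.
    """
    audible = set()
    for start in onsets:
        audible.update(range(start, start + burst_frames))
    return [
        math.sin(2 * math.pi * freq * n / sample_rate)
        if (n // frame_size) in audible else 0.0
        for n in range(total_frames * frame_size)
    ]
```

Feeding it a tone that beeps every 25 frames should give a much higher score than the same number of beeps scattered at uneven intervals. A real 'frequencies of frequencies' analysis would go further and look at how the spectrum itself changes over time, but the envelope version shows the principle.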

Data · « **Reply #13 on:** February 26, 2015, 02:01:56 pm »

I will just jump in here and mention that there are now a few singing voices available or ways to make a computer sing.

http://www.virsyn.de/en/E_Home/e_home.html

http://www.virsyn.de/Demo/CANTOR2/BicycleForTwo.mp3

or

There are others too.

Freddy · « **Reply #14 on:** February 26, 2015, 02:12:33 pm »

Nice links Data, Daisy sounds really good. Cantor looks interesting too.

I'm tempted to have another play with FL Studio now.
