Ai Dreams Forum

Software & Hardware => Sound Software => Topic started by: Ultron on February 22, 2015, 06:21:18 pm

Title: Evaluating open-source TTS engines
Post by: Ultron on February 22, 2015, 06:21:18 pm: I have been recently searching for open-source TTS engines which I can use with my A.I. I have found quite a few and tested them with suggested 'tricky' sentences. I guess the best candidates were:

http://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html (http://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html)
-Try voice 'Peter' from HTS(2011) - Combilex group

http://www.digitalfuturesoft.com/dfttssdk.php (http://www.digitalfuturesoft.com/dfttssdk.php)
-This one also seems interesting. Take a listen:
http://www.digitalfuturesoft.com/voicedemos/neosppaul.wav (http://www.digitalfuturesoft.com/voicedemos/neosppaul.wav)

Note that these are old projects that have not really been maintained, but judging from my tests I'd say they are still pretty solid engines.

My favorite remains IVONA. You can use it for free (at least its voices, but not sure about commercial usage) with your C/C++ project via Microsoft Speech API (MS SAPI). The reason I am not using it is that I do not like the API provided by Microsoft, maybe because of its seemingly complex structure and the requirement of adding a lot of code to your files.

P.S. My personal favorite is IVONA's Brian (English - Male) voice. It is the closest to Jarvis I have ever heard.
Title: Re: Evaluating open-source TTS engines
Post by: ranch vermin on February 23, 2015, 12:24:35 am: Talk about bum libraries by Microsoft: try Windows Media Foundation. Before you learn that, you need a full understanding of DirectShow, and then make sure you're a COM and DirectX expert. Then you can go throw them both in the junk and just use AVIStream (made in ~1996) and be done with it.

I recommend putting your sound engineering hat on and starting from scratch with harmonics. I bet there are some really badass robotic-sounding voice patterns that come straight out of a raw implementation (what always happens) for some killer evil muther.
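As a rough illustration of "starting from scratch with harmonics", here is a minimal additive-synthesis sketch in Python using only the standard library. The function names, the 16 kHz sample rate and the harmonic amplitudes are all assumptions chosen for the example, not anything from a real TTS engine; it just stacks sine waves at multiples of a base pitch, which tends to give a buzzy, robotic tone.

```python
import math
import struct
import wave


def harmonic_tone(f0, amplitudes, duration, sample_rate=16000):
    """Sum sine waves at f0, 2*f0, 3*f0, ... weighted by `amplitudes`.

    The result is normalised by the sum of the amplitudes, so every
    sample stays inside [-1.0, 1.0].
    """
    n = int(duration * sample_rate)
    total = sum(abs(a) for a in amplitudes) or 1.0
    samples = []
    for t in range(n):
        value = sum(
            a * math.sin(2.0 * math.pi * f0 * (k + 1) * t / sample_rate)
            for k, a in enumerate(amplitudes)
        )
        samples.append(value / total)
    return samples


def write_wav(path, samples, sample_rate=16000):
    """Write float samples in [-1, 1] as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(
            b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        )
```

Something like write_wav("robot.wav", harmonic_tone(110, [1.0, 0.5, 0.25], 1.0)) should give a one-second low drone to start experimenting with. Changing the relative strength of the harmonics is what changes the "vowel" colour of the sound.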
Title: Re: Evaluating open-source TTS engines
Post by: 8pla.net on February 23, 2015, 02:58:48 am: Here, I am running an open source JavaScript port of eSpeak: http://www.chatbots.tk (http://www.chatbots.tk)
Title: Re: Evaluating open-source TTS engines
Post by: Art on February 23, 2015, 10:29:34 am: Check out Charles who I've heard as a somewhat sarcastic butler on some websites.

Other good ones are here as well:
http://www.speaktext.com/downloadtts.htm (http://www.speaktext.com/downloadtts.htm)
Title: Re: Evaluating open-source TTS engines
Post by: Ultron on February 23, 2015, 08:36:45 pm: Anyone else getting the idea to create an A.I. that develops and synthesizes its own human-like voice? Possibly based on and developed through listening to many different conversations between humans.

Well you must admit it's an interesting idea so I'm storing it in my locker. Maybe one day...
Title: Re: Evaluating open-source TTS engines
Post by: Art on February 24, 2015, 01:16:24 am: There is a company that will allow you to speak using your own voice then construct a TTS voice based on it (your voice).

How would a robot know what sounded suitable or not regarding a voice selection? Based on frequencies or just raw sampling?
What if it liked some parts of a female's speech and some from a male? That could prove to be an interesting result unless some guidelines were in place. Then again, that would be exerting a degree of control over the bot, and that control is something that a lot of practitioners would like to avoid.

Then again, it was your dream. O0
Title: Re: Evaluating open-source TTS engines
Post by: Ultron on February 24, 2015, 11:50:16 am: Art - I don't know, and that's what makes it a good idea! The point is to observe and learn - this is why chemists carry out experiments.
We would learn a lot from such an experiment - sadly, this is somewhat complicated to do if you understand the idea to the depth that I do. Actually I might attempt this, but sadly I do not live in a country that speaks English and I would also have to make it a portable robot thingy if I want it to learn by itself. Or just make it listen to radio... yea that's easier...
Title: Re: Evaluating open-source TTS engines
Post by: Art on February 25, 2015, 12:35:53 am: Ahh...so if you allowed it to listen to say, a BBC radio station for a decent period of time, it might adapt or adopt (as the case may be), an English speaking (Queen's English, not American English) voice? Same for listening to an Australian station? A male voice talk show program or perhaps a female only talk show (if they exist in your country)?

That would certainly prove to be interesting.

I had a chatbot running some time back and while it was "listening for speech" from me, I decided to try something else. I picked up my guitar and played a short intro of notes. The chatbot suddenly spoke saying, "I don't think I've ever heard that sound before."

I was practically floored! It wasn't just that it asked "What was that?" or some such inquiry; it was the fact that it seemed to recognize the notes as music or musical sounds.

I no longer have that particular bot as it crashed during an online upgrade and I never got it to run after that but I was still impressed.

UltraHal listens for speech and basically ignores music, be it chords or notes. I would imagine it would depend on how the thresholds are established and recognized by the receiver. More stuff to ponder...
Title: Re: Evaluating open-source TTS engines
Post by: ranch vermin on February 25, 2015, 05:22:52 am: To hear music it would help to hear things that happen at the same time, and parallel class. like a drum kit and a guitar. then you could go fill out your computer memory with a bit more discovered.
Title: Re: Evaluating open-source TTS engines
Post by: Ultron on February 25, 2015, 10:30:34 pm: Now Art got me thinking about the fundamentals of how we differentiate between music and talking. You might imagine it is simple, but think about singing, or better yet, rap. I can see many robots from the future getting confused there...
Title: Re: Evaluating open-source TTS engines
Post by: Art on February 26, 2015, 12:39:25 am: Yeah but if someone develops a rapping robot you can rest assured that I'll be changing the channel!! :knuppel2: :2funny:
Title: Re: Evaluating open-source TTS engines
Post by: Ultron on February 26, 2015, 11:50:07 am: You twisted my words but made a good point xD :2funny:
Title: Re: Evaluating open-source TTS engines
Post by: Don Patrick on February 26, 2015, 01:28:51 pm: Patterns, I think. There are regular timing/frequency patterns to music, even in rap (which I wouldn't call music). The syllables and pronunciation in speech have no regular recurring patterns over a certain range. For a simpler distinction, music and singing tend to have long, stretched tones at regular intervals, while tones of speech change at the rate of machine gun fire. Sooo... if I were to program it I'd map out frequencies of frequencies and have the program comment on my terrible taste in music just like KITT from Knight Rider.

Interesting tangent. Might use it to have the computer pick out certain mood music, but I don't have any coding libraries to pick sound files apart for frequency analysis.
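For what it's worth, the "long, stretched tones versus machine gun fire" distinction can be tried without any special library. Below is a minimal Python sketch using only the standard library. It is a toy heuristic, not a real music/speech classifier: the names goertzel_power, dominant_freq and tone_steadiness, the frame length and the list of candidate frequencies are all assumptions made up for this example. It uses the Goertzel algorithm to measure how strong a given frequency is in each short frame, then reports how often the strongest frequency stays the same from one frame to the next.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Power of the frequency bin nearest `freq` in a block (Goertzel)."""
    n = len(samples)
    k = round(n * freq / sample_rate)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def dominant_freq(frame, sample_rate, candidates):
    """Candidate frequency with the most power in this frame."""
    return max(candidates, key=lambda f: goertzel_power(frame, sample_rate, f))


def tone_steadiness(samples, sample_rate, frame_len, candidates):
    """Fraction of adjacent frame pairs that share a dominant frequency.

    Values near 1.0 mean held tones; values near 0.0 mean the pitch
    jumps around from frame to frame.
    """
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    peaks = [dominant_freq(f, sample_rate, candidates) for f in frames]
    if len(peaks) < 2:
        return 1.0
    same = sum(1 for a, b in zip(peaks, peaks[1:]) if a == b)
    return same / (len(peaks) - 1)


def tone(freq, n, sample_rate):
    """Helper for experiments: n samples of a pure sine wave."""
    return [math.sin(2.0 * math.pi * freq * t / sample_rate) for t in range(n)]
```

Feeding it a held note should score close to 1.0, while a signal whose pitch changes every 50 ms frame should score close to 0.0. Real recordings would need samples read from a file first (Python's wave module can do that for plain WAV files) and a much denser set of candidate frequencies.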
Title: Re: Evaluating open-source TTS engines
Post by: Data on February 26, 2015, 02:01:56 pm: I will just jump in here and mention that there are now a few singing voices available or ways to make a computer sing.

http://www.virsyn.de/en/E_Home/e_home.html (http://www.virsyn.de/en/E_Home/e_home.html)

http://www.virsyn.de/Demo/CANTOR2/BicycleForTwo.mp3 (http://www.virsyn.de/Demo/CANTOR2/BicycleForTwo.mp3)

or

How To Create Computerized Vocals Without Vocalist (http://www.youtube.com/watch?v=TBpiHtsqhHk#ws)

There are others too.
Title: Re: Evaluating open-source TTS engines
Post by: Freddy on February 26, 2015, 02:12:33 pm: Nice links Data, Daisy sounds really good. Cantor looks interesting too.

I'm tempted to have another play with FL studio now.
Title: Re: Evaluating open-source TTS engines
Post by: Art on February 26, 2015, 09:32:46 pm: Consider that a song is nothing more than a verse with added music and possibly percussion.

Most songs have certain patterns, just as different types of poetry have theirs (iambic pentameter, etc.).

Early on the ballads were the most popular form of musical poetry and the development went forth and multiplied (not all for the better I might add).

Today, practically every song, musically speaking, has remnants of some other song, perhaps from some other time, but those notes are inescapable to the trained ear picking them up and quickly sorting through the brain in an attempt to locate the original source. The eureka moment arrives as he says, "Yes! 'Country Roads' by John Denver uses a lot of the same chords as 'Let it Be', by the Beatles!!"
True, not quite the same tempo but yes, the same chords for the similar melody. Therein lies this "pattern" we speak of.

So now the TTS voices use pattern recognition for understanding and speaking and emoting and now singing.

Eventually, within the framework of our world, we will see someone singing and have to ask: I wonder if that is a real person and, if it is, is that person really singing, or is it a digital rendition?

Step right up ladies and gents, you can't tell the real ones from the copies, I tell you, it's the greatest show on Earth! :o ;)
Title: Re: Evaluating open-source TTS engines
Post by: Ultron on February 26, 2015, 11:37:59 pm: Interesting find Data. I too like FL studio and CANTOR seems cool.

Anyways, I suggested an idea in which a robot / A.I. finds a way to identify songs and differentiate them from regular speech, as well as synthesize its own speech and voice. But since you guys sub-consciously volunteered to be my test subjects and derive the algorithms for this on your own, fine. :P

I am not sure how an A.I. would distinguish between music (vocal) and speech, but if we can do it, so can robots. If we knew how to and already had algorithms and programs that do this (as accurately as we do) then this experiment wouldn't be as interesting. But an A.I. should be able to find a nearly identical algorithm to our natural 'algorithm', and it should be more accurate than anything we can think of.

And yes patterns are everything. Patterns patterns patterns...
Title: Re: Evaluating open-source TTS engines
Post by: paphus on February 27, 2015, 01:52:56 am: Some TTS links, and comments re: AI learning to speak,

If you are using Java, I have found the Mary TTS project to be the best.

http://mary.dfki.de/ (http://mary.dfki.de/)

It has several quality voices in several different languages.
It is what we currently use on BOT libre. We also offer free web based TTS from BOT libre web API,

see,
http://www.botlibre.com/forum-post?id=566608 (http://www.botlibre.com/forum-post?id=566608)
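For the MaryTTS option mentioned above, you do not have to use Java directly: MaryTTS can also run as a local HTTP server. The sketch below builds a request URL for it in Python. It assumes a MaryTTS 5.x server on the default port 59125, and the parameter names are written as I recall them from the MaryTTS HTTP interface, so treat them as assumptions and check them against your own install. The function name marytts_url is made up for this example.

```python
from urllib.parse import urlencode


def marytts_url(text, host="localhost", port=59125, locale="en_US", voice=None):
    """Build a URL asking a local MaryTTS server to speak `text` as WAV.

    Assumption: parameter names follow the MaryTTS 5.x /process endpoint.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    if voice:
        # Only sent when given; otherwise the server picks its default voice.
        params["VOICE"] = voice
    return "http://%s:%d/process?%s" % (host, port, urlencode(params))
```

With a server running, fetching that URL (for example with urllib.request.urlopen(url).read()) should return the WAV bytes, which you can save to a file or hand to an audio player.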

Android and iOS support voice APIs, and HTML5 has a voice API that Chrome now supports.

see,
http://updates.html5rocks.com/2014/01/Web-apps-that-talk---Introduction-to-the-Speech-Synthesis-API (http://updates.html5rocks.com/2014/01/Web-apps-that-talk---Introduction-to-the-Speech-Synthesis-API)

Someone asked about having an AI learn how to speak from listening to conversation. I have also considered that, and think it is a great idea for both TTS and voice recognition. Currently TTS and voice recognition are treated as different problems, but I would consider that they are the same problem, and TTS and voice rec built from first principles as learning software would be the best solution. It would produce something that could learn any accent, any language, perhaps understand sound in general... not sure exactly how to get started though...
Title: Re: Evaluating open-source TTS engines
Post by: 8pla.net on February 27, 2015, 01:56:25 am: Thanks for jumping in, Data.
Title: Re: Evaluating open-source TTS engines
Post by: Art on February 27, 2015, 10:14:47 am: Just for sake of mentioning, there IS a difference between Voice Recognition (VR) and Speech Recognition (SR).

VR is a Speaker Dependent system using various methods to "identify" the person whereas SR works for practically everyone. The average Smart Phone uses SR, allowing anyone to speak into its microphone and be "understood". A VR system would only recognize one user.

There were attempts years ago to let people use their voice as their password, but those systems proved to be a bit too finicky to be useful.
Title: Re: Evaluating open-source TTS engines
Post by: Ultron on February 28, 2015, 04:02:51 pm: Art I believe using speech (or voice, which one do I use now?) recognition for security purposes is absurd. Not even human beings can always accurately identify a voice.

And I would like to thank everyone for the links. This topic should be kept nice and tidy as it has grown into a quality discussion and one can find many useful links. Cheers!
Title: Re: Evaluating open-source TTS engines
Post by: Freddy on April 08, 2015, 10:24:55 pm: Quote from: paphus on February 27, 2015, 01:52:56 am

If you are using Java, I have found the Mary TTS project to be the best.

http://mary.dfki.de/ (http://mary.dfki.de/)

It has several quality voices in several different languages.
It is what we currently use on BOT libre. We also offer free web based TTS from BOT libre web API.

I've been using MaryTTS myself for a while now. I agree it is very good and a great price. I'd really like to get into making my own voices. Have you attempted this yet?

Another cool thing about MTTS is that there are settings whereby you can make one voice sound very different. It's like having 10 for the price of one.

Their phoneme timings are very useful for doing lip sync. Not as easy to implement as SAPI, but given some thought and experimenting you can get it to work nicely. I did post a video a while back of Jess chatting in Unity. I've improved it a little since then so it might be time for another video.
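To give an idea of how phoneme timings can drive lip sync, here is a minimal Python sketch. It assumes the timings have already been extracted as (phoneme, duration in seconds) pairs. The phoneme symbols, the VISEMES table and the function name viseme_keyframes are invented for illustration and do not reflect MaryTTS's actual output format or Unity's API.

```python
# Hypothetical phoneme-to-mouth-shape table, for illustration only.
VISEMES = {
    "p": "closed", "b": "closed", "m": "closed",
    "a": "open",
    "o": "round", "u": "round",
    "f": "teeth", "v": "teeth",
}


def viseme_keyframes(timings, default="rest"):
    """Turn (phoneme, duration) pairs into (start_time, viseme) keyframes.

    Consecutive phonemes that map to the same mouth shape are merged,
    so the animation only changes when the shape actually changes.
    """
    keyframes = []
    t = 0.0
    for phoneme, duration in timings:
        shape = VISEMES.get(phoneme, default)
        if not keyframes or keyframes[-1][1] != shape:
            keyframes.append((round(t, 3), shape))
        t += duration
    return keyframes
```

The resulting keyframes could then be fed to whatever drives the character's mouth, switching or blending shapes at each start time while the audio plays.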