Evaluating open-source TTS engines

Art · « **Reply #15 on:** February 26, 2015, 09:32:46 pm »

Consider that a song is nothing more than verse with added music and possibly percussion.

Most songs follow certain patterns, just as different types of poetry have theirs (iambic pentameter, etc.).

Early on, ballads were the most popular form of musical poetry, and from there the forms went forth and multiplied (not all for the better, I might add).

Today, practically every song, musically speaking, carries remnants of some other song, perhaps from another time. The trained ear cannot help picking up those notes, and the brain quickly sorts through its memory trying to locate the original source. The eureka moment arrives as the listener says, "Yes! 'Country Roads' by John Denver uses a lot of the same chords as 'Let It Be' by the Beatles!"
True, not quite the same tempo, but yes, the same chords under a similar melody. Therein lies the "pattern" we speak of.

So now the TTS voices use pattern recognition for understanding, speaking, emoting, and now singing.

Eventually, within the framework of our world, we will see someone singing and have to ask: is that a real person? And if it is, is that person really singing, or is it a digital rendition?

Step right up ladies and gents, you can't tell the real ones from the copies, I tell you, it's the greatest show on Earth!

Ultron · « **Reply #16 on:** February 26, 2015, 11:37:59 pm »

Interesting find, Data. I too like FL Studio, and CANTOR seems cool.

Anyway, I suggested an idea in which a robot / A.I. finds a way to identify songs and differentiate them from regular speech, as well as synthesize its own speech and voice. But since you guys subconsciously volunteered to be my test subjects and derive the algorithms for this on your own, fine.

I am not sure how an A.I. would distinguish between music (vocal) and speech, but if we can do it, so can robots. If we already had algorithms and programs that do this as accurately as we do, then this experiment wouldn't be as interesting. But an A.I. should be able to find an algorithm nearly identical to our natural 'algorithm', and it should be more accurate than anything we can think of.

And yes, patterns are everything. Patterns patterns patterns...

paphus · « **Reply #17 on:** February 27, 2015, 01:52:56 am »

Some TTS links, and comments re: AI learning to speak:

If you are using Java, I have found the Mary TTS project to be the best.

http://mary.dfki.de/

It has several quality voices in several different languages.
It is what we currently use on BOT libre. We also offer free web-based TTS through the BOT libre web API,

see,
http://www.botlibre.com/forum-post?id=566608
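Going back to MaryTTS for a moment: besides the Java library, it can also be run as a local HTTP server, which makes it usable from other languages. Below is a minimal sketch in Python, assuming a server listening on the default port 59125 and the `/process` endpoint with the `INPUT_TEXT`, `INPUT_TYPE`, `OUTPUT_TYPE`, `AUDIO`, `LOCALE` and `VOICE` parameters. The helper names are made up for illustration, and the parameter set should be checked against the documentation of the MaryTTS version you install.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a MaryTTS server is running locally on its default port.
MARY_SERVER = "http://localhost:59125"


def build_process_url(text, locale="en_US", voice=None, server=MARY_SERVER):
    """Build a /process URL that asks the server for a WAV rendering of text."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    if voice is not None:
        params["VOICE"] = voice  # must name a voice installed on the server
    return server + "/process?" + urlencode(params)


def synthesize_to_file(text, path, **kwargs):
    """Fetch the audio from the server and write it to path (needs a live server)."""
    with urlopen(build_process_url(text, **kwargs)) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

Calling `synthesize_to_file("Hello there", "hello.wav")` should leave a playable WAV file behind, provided the server is up and has a voice for the requested locale.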

Android and iOS both provide speech APIs, and HTML5 has a speech synthesis API that Chrome now supports.

see,
http://updates.html5rocks.com/2014/01/Web-apps-that-talk---Introduction-to-the-Speech-Synthesis-API

Someone asked about having an AI learn how to speak from listening to conversation. I have also considered that, and think it is a great idea for both TTS and voice recognition. Currently TTS and voice recognition are treated as different problems, but I would argue that they are the same problem, and that TTS and voice recognition built from first principles as learning software would be the best solution. It would produce something that could learn any accent, any language, perhaps understand sound in general... not sure exactly how to get started though...

8pla.net · « **Reply #18 on:** February 27, 2015, 01:56:25 am »

Thanks for jumping in, Data.

Art · « **Reply #19 on:** February 27, 2015, 10:14:47 am »

Just for the sake of mentioning it, there IS a difference between Voice Recognition (VR) and Speech Recognition (SR).

VR is a speaker-dependent system that uses various methods to "identify" the person, whereas SR works for practically everyone. The average smartphone uses SR, allowing anyone to speak into its microphone and be "understood". A VR system would only recognize one user.

There were attempts years ago to let people use their voice as their password, but those systems proved to be a bit too finicky to be useful.
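To make the voice-as-password idea concrete, here is a toy sketch in Python. It assumes each utterance has already been reduced to a numeric "voiceprint" vector (how that is done is outside this sketch) and simply compares a new attempt against the enrolled one using cosine similarity and a threshold. The vectors, the function names and the threshold values are all made up for illustration and are not taken from any real product.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_speaker(enrolled, attempt, threshold=0.95):
    """Accept the attempt only if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, attempt) >= threshold


# Made-up voiceprints: the enrolled user, the same user on a good day,
# the same user with a cold, and a different person.
enrolled = [0.9, 0.1, 0.4]
same_day = [0.88, 0.12, 0.41]
with_cold = [0.7, 0.4, 0.5]
impostor = [0.1, 0.9, 0.2]
```

With these numbers the strict default threshold accepts `same_day` but rejects the legitimate user `with_cold`, while loosening it to 0.9 lets that attempt through. That trade-off between locking out the real user and letting strangers in is one way to read "finicky".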

Ultron · « **Reply #20 on:** February 28, 2015, 04:02:51 pm »

Art, I believe using speech (or voice, which one do I use now?) recognition for security purposes is absurd. Not even human beings can always accurately identify a voice.

And I would like to thank everyone for the links. This topic should be kept nice and tidy, as it has grown into a quality discussion where one can find many useful links. Cheers!

Freddy · « **Reply #21 on:** April 08, 2015, 10:24:55 pm »

Quote from: paphus on February 27, 2015, 01:52:56 am

If you are using Java, I have found the Mary TTS project to be the best.

http://mary.dfki.de/

It has several quality voices in several different languages.
It is what we currently use on BOT libre. We also offer free web based TTS from BOT libre web API.

I've been using MaryTTS myself for a while now. I agree it is very good and a great price. I'd really like to get into making my own voices. Have you attempted this yet?

Another cool thing about MaryTTS is that there are settings whereby you can make one voice sound very different. It's like having 10 for the price of one.

Their phoneme timings are very useful for doing lip sync. Not as easy to implement as SAPI, but given some thought and experimenting you can get it to work nicely. I did post a video a while back of Jess chatting in Unity. I've improved it a little since then so it might be time for another video.
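A minimal Python sketch of the lip-sync step, for anyone wanting to try it. It assumes the phoneme timings arrive as label text with one `<end time in seconds> <number> <phoneme>` entry per line after a `#` header, which is how MaryTTS's realised-durations output is commonly described; compare it with what your own install produces before relying on it. The phoneme-to-mouth-shape table and the function names are hypothetical.

```python
# Hypothetical phoneme-to-mouth-shape table; a real rig would cover the
# full phoneme set of the voice being used.
VISEMES = {"_": "rest", "h": "open", "@": "open", "l": "tongue", "@U": "round"}


def parse_durations(label_text):
    """Turn label lines into (start, end, phoneme) tuples, times in seconds."""
    segments = []
    start = 0.0
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips the "#" header and blank lines
        end = float(parts[0])
        segments.append((start, end, parts[2]))
        start = end  # each segment begins where the previous one ended
    return segments


def to_keyframes(segments, default="rest"):
    """Map each segment to a (start_time, mouth_shape) keyframe."""
    return [(start, VISEMES.get(phone, default)) for start, _end, phone in segments]


# Made-up sample in the assumed format, roughly the word "hello".
sample = """#
0.10 125 _
0.18 125 h
0.30 125 @
0.41 125 l
0.62 125 @U
"""

keyframes = to_keyframes(parse_durations(sample))
```

Each keyframe says when to switch the mouth to a given shape, so the animation side (Unity or otherwise) only has to step through the list while the audio plays.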
