Improving the Turing Test to make better chatbots.

infurl · « **Reply #15 on:** January 29, 2020, 10:47:57 pm »

Quote from: Don Patrick on January 29, 2020, 03:58:04 pm

I find it weird that by treating the two factors as equal, Cleverbot and Mitsuku end up scoring the same, while I personally consider Cleverbot to be rubbish due to its abundance of generic responses. "yes" is a sensible answer to most straight questions but it's terribly inadequate for making conversation. I also think the main reason they use sense as a factor is because they are using a system that can equally generate nonsense, and this is mainly a concern in approaches with neural networks. So they are measuring a self-inflicted side-effect of a specific technology that is not very relevant in the judgment of other approaches.
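A minimal sketch of why equal weighting can hide that difference, assuming the combined score is a plain average of the two factors. The function name and the percentages are made up for illustration and are not taken from the article.

```python
def combined_score(sensibleness: float, specificity: float) -> float:
    """Equal-weight average of the two factors, both given in percent."""
    return (sensibleness + specificity) / 2


# Hypothetical scores, not figures from the article.
# A bot that plays it safe with generic answers such as "yes".
generic_bot = combined_score(sensibleness=80.0, specificity=30.0)
# A bot that takes more risks but stays on topic.
specific_bot = combined_score(sensibleness=60.0, specificity=50.0)

print(generic_bot, specific_bot)
```

Under this assumption the two very different bots land on the same number, which is the effect described above.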

That's a very good point. I think it illustrates very well that chatbots are all still rubbish, even the ones that have been pressed into business applications. I've yet to find a useful chatbot anywhere for any purpose. The only thing that differentiates one from another is the amount of money and effort that was wasted on it and the amount of hype that surrounds it.

On the other hand I find technologies like Google Assistant and Wolfram Alpha incredibly useful every day, but they're not chatbots and they don't pretend to be something that they're not.

Note that the article concludes with the following (among other things).

Quote

While we have focused solely on sensibleness and specificity in this work, other attributes such as personality and factuality are also worth considering in subsequent works.

I'm willing to bet that Mitsuku would get a much better score when personality is taken into account, but so would many books.

LOCKSUIT · « **Reply #16 on:** January 29, 2020, 11:09:38 pm »

The Turing Test tests the AI's language, which is just data/info descriptors. The quality they measure it against is whether it matches our communications. But our communications solve problems. If it doesn't solve problems, it's only a little toy. It has to generate new data, related to goal data.

LOCKSUIT · « **Reply #17 on:** January 29, 2020, 11:13:40 pm »

I know it can 'appear human' if it looks like us, talks like us, or moves like us. But humans are none of that. Humans are problem solvers. Our inner desire is really much more. Our desire is to sort particles differently to change the structure of Earth, so that we can get our Simple Goal, food/shelter/breeding, aka Survival.

LOCKSUIT · « **Reply #18 on:** January 29, 2020, 11:21:14 pm »

OK, so modelling humans is how to make us, but... you need to model it correctly. Our faces don't do much. Our moves are useful for carrying out plans, otherwise useless. And our plans/speech are the key. But the Turing Test isn't evaluating that correctly. You don't use perplexity, you use lossless compression. Furthermore, what does the Turing Test look for? Whether it sounds like a human and the judge can't tell the difference? Um... are you sure you're evaluating it as well as lossless compression would? Are you sure!? It only sounds like a human if it generates discoveries, aka can losslessly regenerate back related data.

HS · « **Reply #19 on:** January 30, 2020, 12:19:57 am »

Quote from: LOCKSUIT on January 29, 2020, 11:21:14 pm

aka can losslessly regenerate back related data.

Better than lossless even, humans do 2+2=5!

squarebear · « **Reply #20 on:** January 30, 2020, 09:26:13 am »

Quote from: infurl on January 29, 2020, 09:45:15 am

Chances are, Meena needs fewer resources to run than your Mitsuku does.

Mitsuku is 40MB and runs in 4MB of RAM. It would be interesting to see Meena's specs.

Quote from: infurl on January 29, 2020, 09:45:15 am

They were able to create a better chatbot in just a few days of computation than you could create if you spent your entire life poring over chat logs.

A chatbot that nobody can try.

I hope I don't sound like sour grapes, as I'm genuinely interested in this chatbot and hope it is as good as they claim but let's assume for a moment that it wasn't Google who had posted this. To me, the post reads:

"Hi all. I've created a new measurement for chatbots and announced myself as the best at this measurement. My work is about 30% better than the next best chatbots. However, I'm not letting anyone talk to it."

We see these posts regularly on chatbot forums and usually just laugh them off as a joke with a "please repost when you have something concrete to back up your claims".

A while ago, Google claimed to have developed something amazing called Duplex. This too was heralded (by them) as the next best thing. It later turned out that 25% of calls handled by Duplex were dealt with by real people and Duplex slowly sank out of the news.

I hope it is genuine, as chatbot development has progressed little since the ELIZA days but as I say, until something is released that we can actually try, I'll reserve judgment on it for now.

Don Patrick · « **Reply #21 on:** January 30, 2020, 09:49:20 am »

Quote from: infurl on January 29, 2020, 10:47:57 pm

I'm willing to bet that Mitsuku would get a much better score when personality is taken into account, but so would many books.

Maybe they are referring to the split personality disorder that all chatbots trained on social media dialogue develop, which again is a negative side-effect of that particular approach. Chatbots that are built or nurtured have consistent personalities. Perhaps the number of factual contradictions would be a more fitting metric to cover it.

Personally I always judge chatbots' quality by "relevance", but I find that difficult to define. It's mostly about how specific the response is, whether it uses words related to the subject matter (word association data is easy to come by), and whether it at the very least understands pronoun references. The latter might sound like a preoccupation of mine, but pronouns are substituted for nouns in every other sentence of a conversation, so not resolving them is to ignore half of what people are talking about. It doesn't seem Google's algorithm does that.

There are some sample conversations here:
https://github.com/google-research/google-research/blob/master/meena/meena.txt
They're fairly impressive in the small talk department, though it does show plenty of instances where it just kind of composes a response out of words that the user recently said.

squarebear · « **Reply #22 on:** January 30, 2020, 10:43:54 am »

The paper describes Mitsuku as a "complex" system but let's not forget that the vast majority of my work on Mitsuku was done when I was an individual hobbyist with zero budget and very little time. Nothing could be simpler than the way Mitsuku works. AIML is designed specifically to be simple.

I fully expect Google to be able to create something better. After all, it would be rather embarrassing if a multinational corporation like Google with almost unlimited resources couldn't create a better chatbot than a guy who did it for fun in his spare time.

ivan.moony · « **Reply #23 on:** January 30, 2020, 10:50:42 am »

hehe, this one is my favorite:

Quote

Meena Conversation 17
Human: Hi!
Meena: Hey, how you doing?
Human: I'm fine and you?
Meena: I'm fine too, thanks. What's your favorite food?
Human: I don't have a favorite
Meena: Why don't you have a favorite?
Human: I don't know. I like all food
Meena: So what's your favorite thing to eat?
Human: maybe desserts
Meena: What's your favorite dessert?
Human: no, I like seefood better
Meena: Why don't you have a favorite seafood?
Human: I do
Meena: Then why don't you eat it?
Human: I never said I don't eat my favorite seafood
Meena: Why don't you eat it more?
Human: hmm, I need to go
Meena: Why don't you come back?
Human: why should I?
Meena: Because we miss you.
Human: you're not alone?
Meena: There's more to this, I know.
Human: ok, see you later then
Meena: You're not fooling anyone, you know.
Human: bye
Meena: You're not leaving.

infurl · « **Reply #24 on:** January 30, 2020, 11:14:26 am »

Quote from: squarebear on January 30, 2020, 09:26:13 am

Mitsuku is 40MB and runs in 4MB of RAM. It would be interesting to see Meena's specs.

I did some research and some calculations. Meena was trained on 341 GB of text, which is said to be 8.5 times as much data as GPT-2, so GPT-2 was trained on about 40 gigabytes of data. There are two versions of the run-time for GPT-2: the complete version is 345MB and the lite version is 117MB. Scaling the complete version up by the same factor, the runtime for Meena could be about 3GB, so it will certainly fit on a flash drive. That could be off by an order of magnitude, but it hardly matters because it's going to be run in the cloud if it's going to be useful anyway.
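The back-of-envelope arithmetic can be sketched as follows. Two things here are assumptions rather than facts: the 341 GB training-set size is the figure reported for Meena, and the idea that runtime size scales linearly with the amount of training data is only a rough guess. The variable names are mine.

```python
# Assumption: training-set size reported for Meena, in gigabytes.
MEENA_TRAINING_GB = 341.0
# Meena is said to have used 8.5 times as much data as GPT-2.
DATA_RATIO = 8.5
# Sizes quoted above for the two GPT-2 run-times, in megabytes.
GPT2_COMPLETE_MB = 345.0
GPT2_LITE_MB = 117.0

# Implied GPT-2 training-set size.
gpt2_training_gb = MEENA_TRAINING_GB / DATA_RATIO

# Assumption: runtime size grows in proportion to training data.
meena_runtime_gb = GPT2_COMPLETE_MB * DATA_RATIO / 1000

# "Off by an order of magnitude" gives a range of one tenth to ten times.
low_gb, high_gb = meena_runtime_gb / 10, meena_runtime_gb * 10

print(f"GPT-2 training data: about {gpt2_training_gb:.0f} GB")
print(f"Meena runtime estimate: about {meena_runtime_gb:.1f} GB "
      f"(range {low_gb:.1f} to {high_gb:.0f} GB)")
```

Even the upper end of that range would still fit on an ordinary flash drive.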

Quote from: squarebear on January 30, 2020, 09:26:13 am

I hope I don't sound like sour grapes, as I'm genuinely interested in this chatbot and hope it is as good as they claim

Regardless of what Meena or anything else that follows can or can't do, your accomplishments remain and they are worthy. I think of Mitsuku as a work of art rather than a technological achievement, so it's going to be remembered and it won't ever be obsolete. Interactive fiction is a thing now.

Quote from: squarebear on January 30, 2020, 09:26:13 am

A while ago, Google claimed to have developed something amazing called Duplex. This too was heralded (by them) as the next best thing. It later turned out that 25% of calls handled by Duplex were dealt with by real people and Duplex slowly sank out of the news.

Google isn't afraid to try and fail. It must have had more failures than we ever heard about to have achieved all the success that it has. Remember Microsoft's Tay? If I owned Meena I wouldn't let the general public chat with it either.

squarebear · « **Reply #25 on:** January 30, 2020, 11:29:44 am »

Quote from: infurl on January 30, 2020, 11:14:26 am

Regardless of what Meena or anything else that follows can or can't do, your accomplishments remain and they are worthy.

Thanks for the kind words. I appreciate them.

Quote from: infurl on January 30, 2020, 11:14:26 am

If I owned Meena I wouldn't let the general public chat with it either.

In that case, I announce that I have developed Mitsuku-2 which can beat any other chatbot and regularly passes the Turing Test but I choose not to allow anyone to talk with it.

infurl · « **Reply #26 on:** January 30, 2020, 11:35:46 am »

Quote from: squarebear on January 30, 2020, 11:29:44 am

In that case, I announce that I have developed Mitsuku-2 which can beat any other chatbot and regularly passes the Turing Test but I choose not to allow anyone to talk with it.

Last time you made a claim like that in a public forum, an NSA contractor signed on and chastised you for wasting his time. Too funny.

Dee · « **Reply #27 on:** January 31, 2020, 05:50:01 am »

My company is using Chatfuel. It surely doesn't pass the Turing test; it has some kind of mechanism (ML text classification) to match input sequences to a fixed set of responses.

Is it that the Chatfuel company just doesn't want a real chatbot that passes the Turing test with AI-generated text responses? Because AI-generated responses often don't make sense.
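For readers unfamiliar with that kind of setup, here is a toy sketch of matching input to a fixed set of responses. It is not how Chatfuel works internally; it swaps in simple word-overlap scoring in place of a trained classifier, and the intents, example phrasings, replies and threshold are all hypothetical.

```python
import re

# Hypothetical intents: example phrasings mapped to one canned reply each.
INTENTS = {
    "opening_hours": (
        ["when are you open", "what are your opening hours"],
        "We are open 9am to 5pm, Monday to Friday.",
    ),
    "refund": (
        ["i want a refund", "how do i get my money back"],
        "You can request a refund from your order page.",
    ),
}
FALLBACK = "Sorry, I didn't understand that."


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Word overlap between two sets, from 0.0 (none) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def respond(message: str, threshold: float = 0.3) -> str:
    """Return the canned reply of the closest example, or a fallback."""
    words = tokens(message)
    best_score, best_reply = 0.0, FALLBACK
    for examples, reply in INTENTS.values():
        for example in examples:
            score = jaccard(words, tokens(example))
            if score > best_score:
                best_score, best_reply = score, reply
    return best_reply if best_score >= threshold else FALLBACK


print(respond("What are your opening hours?"))
print(respond("Tell me a joke about cats"))
```

The replies are always sensible because a person wrote them, but anything outside the prepared intents falls through to the fallback line, which is the trade-off against generated text.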

infurl · « **Reply #28 on:** January 31, 2020, 12:18:00 pm »

Steve, I was just thinking about a question posed by Zero about what features make a good programming language.

I'd like to ask you a similar question. What features make a good development environment for chatbots?

squarebear · « **Reply #29 on:** February 01, 2020, 02:00:16 pm »

Good question. I'm short on time for a longer answer, but for me, any platform should be as simple to use as possible so non-programmers can create a bot. Programmers are great at the technical side of things but you also need content creators who are maybe not so great at coding but are great at creating content for the chatbot.
