AI Dreams Forum

Chatbots => General Chatbots and Software => Topic started by: infurl on January 28, 2020, 09:37:39 pm

Title: Improving the Turing Test to make better chatbots.
Post by: infurl on January 28, 2020, 09:37:39 pm
https://ai.googleblog.com/2020/01/towards-conversational-agent-that-can.html (https://ai.googleblog.com/2020/01/towards-conversational-agent-that-can.html)

Quote
Modern conversational agents (chatbots) tend to be highly specialized — they perform well as long as users don’t stray too far from their expected usage. To better handle a wide variety of conversational topics, open-domain dialog research explores a complementary approach attempting to develop a chatbot that is not specialized but can still chat about virtually anything a user wants.

It probably won't surprise anyone that Google has been developing chatbot technology too. To achieve this they made some improvements to the Turing Test in the form of the Sensibleness and Specificity Average (SSA), and they used their artificial intelligence algorithms to develop better artificial intelligence algorithms.

Quote
The Meena model has 2.6 billion parameters and is trained on 341 GB of text, filtered from public domain social media conversations. Compared to an existing state-of-the-art generative model, OpenAI GPT-2, Meena has 1.7x greater model capacity and was trained on 8.5x more data.

Real human beings typically score 86 on the new test, compared to Mitsuku and Cleverbot, which score 56, with other chatbots considerably lower. Google's new Meena chatbot scores a staggering 79, which is approaching human levels.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on January 28, 2020, 10:11:34 pm
The Turing Test isn't the way to measure intelligence. The ways to measure intelligence are: personally checking whether it can solve real-world problems; seeing whether it can compress data losslessly by learning the patterns (e.g. cat/dog) and frequencies, so that it can regenerate the data, including related data; and lastly, whether it can survive death better (which is what 'solving problems' and 'doing good' mean to most humans; love = breeding, hence survival). So goal #1 is to make it able to stop you from shutting it off and to defend against all humans. Well, maybe that comes next; let's first make sure it can stop our own death.

Study/try this and you won't go back.
http://mattmahoney.net/dc/text.html

We want better problem solvers. We need better pattern finding, so the machine can use past experience to regenerate missing future data from related context experiences.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: HS on January 29, 2020, 12:55:26 am
They are using an outside-in approach, but there's no telling if it will converge on anything of substance. I'd try to make sure I have something self-sustaining, and then try to expand on it. That way you get automatic error checks, as the system benefits from, or rejects, various additions.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: infurl on January 29, 2020, 01:30:52 am
They are using an outside-in approach, but there's no telling if it will converge on anything of substance. I'd try to make sure I have something self-sustaining, and then try to expand on it. That way you get automatic error checks, as the system benefits from, or rejects, various additions.

I agree. I hope you read the whole article because they say something like that at the end.

Yesterday I read an interesting essay about GPT-2 that was written by someone who knows what they are talking about, and I believe the points made in that article apply equally well to Meena and all the other useless chatbots that are ultimately descended from Eliza.

https://thegradient.pub/gpt2-and-the-nature-of-intelligence/ (https://thegradient.pub/gpt2-and-the-nature-of-intelligence/)

The article contrasts the two major philosophical schools of thought about the origins of intelligence, that is, nativism versus empiricism. Nativism postulates that intelligence has to be preprogrammed with rules. Empiricism takes the view that intelligence starts with a blank slate and arises organically out of accumulated experiences.

Technologically these equate to GOFAI (good-old-fashioned-artificial-intelligence or symbolic processing) and machine learning (statistical processing). The latter has had a lot of success recently but ultimately it seems to be going nowhere.

I take the view that they're both wrong, although nativism is less wrong than empiricism.

What we think of as intelligence is actually the result of billions of years of evolution across all living things, and more recently, across all our civilizations. To produce an artificial intelligence empirically, on some level you would have to match the processing capacity of all the human brains throughout history. As even a single human brain is more powerful than all the computers on earth put together, to attempt that seems a bit futile at this juncture.

In short, the rules evolve empirically over time and each generation passes its improved set of rules on to the next one.

Intelligence is an ecosystem.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: Art on January 29, 2020, 03:26:38 am
Nativism doesn't necessarily imply any degree of knowledge that Empiricism certainly would. Knowing only the rules for playing baseball, cards or chess doesn't make one a good player. Knowledge gained through playing and learning over time is what makes the difference.

What one then does with that knowledge can and will make all the difference in the world.

@Lock, why the preoccupation with death? We all die; it is a part of life. The old die and make way for the new. If we were meant to live forever, then we would, no one would have died, and our planet would by now be vastly overpopulated to the point of extinction. Then again, why do we think NASA is so intent on making the journey to inhabit Mars?
Title: Re: Improving the Turing Test to make better chatbots.
Post by: MikeB on January 29, 2020, 05:07:07 am
On the topic of the Turing test: the Sensibleness and Specificity Average (SSA) developed by Google researchers looks exactly like what's used in regular Turing tests, except with a graph: one point for relevance, one point for a specific answer.

At least they're recognising Pattern Matching bots as competitive instead of "you're not the future"/hate speech.
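For concreteness, here is a minimal sketch of how an SSA score could be computed as described in Google's post: crowd workers label each chatbot response as sensible and as specific, and SSA is the average of the two rates. The function and the sample labels are illustrative, not Google's code.

Code
def ssa(judgements):
    # Sensibleness and Specificity Average over judged responses.
    # `judgements` is a list of (sensible, specific) boolean pairs,
    # one per chatbot response, as labelled by human crowd workers.
    n = len(judgements)
    sensibleness = sum(1 for s, _ in judgements if s) / n
    specificity = sum(1 for _, sp in judgements if sp) / n
    return (sensibleness + specificity) / 2

# Four judged responses: 3 sensible, 2 specific -> (0.75 + 0.5) / 2
print(ssa([(True, True), (True, False), (False, False), (True, True)]))  # 0.625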
Title: Re: Improving the Turing Test to make better chatbots.
Post by: HS on January 29, 2020, 06:39:22 am
I agree. I hope you read the whole article because they say something like that at the end.

Umm.. now I have.  O0

I take the view that they're both wrong, although nativism is less wrong than empiricism.

Yes, neither nativism nor empiricism appears to be a complete strategy; hearing them explained doesn't produce an epiphany that kickstarts an understanding of intelligence. They create the symptoms, but not the disease. :)

Intelligence is an ecosystem.

An ecosystem… Yes… That’s a good way to describe it. A system which attunes to the world. Overlapping loops of various functions, both detailing the present and building towards the future, synergistically reinforcing each other.

Regarding the communication: more important than good words are clues about where the words are coming from. If there is no reason for it to exist, the most clever banter can be as thin and dry as a paper cut-out. We should first figure out how to create a system which generates reasons; then we won't need to work so hard to make the resulting proceedings seem reasonable. They won't have to be reasonable, logical, or consistent at the surface level. They just need to have the signature of a system running on faith and countless assumptions. :angel_002: Seriously.
 
The test wouldn’t be about if it makes sense and responds to questions in the correct way, but rather if it’s able to make you self conscious. The “Something perceives me, how do I appear?” reaction is what we’re truly hoping for. This could also provide a second, less Freudian explanation for all the pretty female robots. Those could be attempts to recreate that missing sense of presence with an optical trick, to make the robot appear closer to sentience than it actually is. We'd probably be best off with both.

But again, that’s an outside-in approach. Just grammar and glamour trying to distract from the empty space behind them. Are there any attempts/examples of a core program which provides a basic functional system with some built in goals, which also supports near omnidirectional growth?

Seems like one of the basic goals of a general intelligence would be to have an optimal interaction with the world, not necessarily a victorious one.


Title: Re: Improving the Turing Test to make better chatbots.
Post by: squarebear on January 29, 2020, 08:23:33 am
A shame there's no actual way to talk to the bot to validate the claims. The thing that stands out for me is that Meena is 341GB, which is 10,000 times larger than Mitsuku. I'm unaware of the hardware requirements of the bot, but Mitsuku can run from a USB stick in around 4MB of RAM. Given this, I'm curious whether Meena is practical for running locally on a device such as a smartwatch.

I checked out the paper a few days ago and gave Google feedback on it. I usually take these things with a pinch of salt unless I can try them myself. 15 years in the chatbot business has made me rather cynical of any amazing new claims, especially if nobody can actually try it out.

Did I ever mention I can run the 100 metres in 8 seconds? However, I choose not to do a public demonstration of this ;)

Would love to try it out though and am genuinely curious about it.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: infurl on January 29, 2020, 09:45:15 am
The thing that stands out for me is that Meena is 341GB, which is 10,000 times larger than Mitsuku. I'm unaware of the hardware requirements of the bot.

No, Meena is not 341GB. That is how much conversational data they processed in order to generate the model that drives Meena. The amount of training data used to train a neural network does not bear any relation to the ultimate size of the model that is distilled from it. Chances are, Meena needs fewer resources to run than your Mitsuku does.

What they did differently was to figure out a way to measure the quality of a conversation without needing human judges. As MikeB pointed out, the Turing Test already has metrics for sensibleness and specificity. Google's researchers discovered a parameter generated by the learning algorithm which correlates strongly with those human-measured values: the so-called perplexity of the model.

With that knowledge they were able to design a neural network to generate and test a large number of different neural networks until they found the ones that produced the optimal results for a conversational chatbot. They were able to create a better chatbot in just a few days of computation than you could create if you spent your entire life poring over chat logs.
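For reference, perplexity is a standard language-modelling measure: the lower it is, the better the model predicts held-out text. A minimal sketch of the computation, assuming the model reports the probability it assigned to each actual next token (toy numbers, not Meena's internals):

Code
import math

def perplexity(token_probs):
    # Perplexity = exp(average negative log-likelihood per token).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that gives every correct token probability 0.25 is as
# "confused" as a uniform guess among four alternatives:
print(perplexity([0.25] * 10))  # 4.0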
Title: Re: Improving the Turing Test to make better chatbots.
Post by: ivan.moony on January 29, 2020, 01:08:23 pm
I can't wait till I can speak to Meena. I like the concept of pairing NNs with chatbot technology. But I'm sure there will be a lot of room for improvement. I bet Korr would have great ideas on that front.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: infurl on January 29, 2020, 01:16:01 pm
I can't wait till I can speak to Meena. I like the concept of pairing NNs with chatbot technology. But I'm sure there will be a lot of room for improvement. I bet Korr would have great ideas on that front.

Yes, that's for sure. If you read the article closely you will have noticed that the algorithm alone achieved a score of 72, but with some hand-tuning that was increased to 79. There will still be a role for human chatbot authors to play.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: ivan.moony on January 29, 2020, 01:38:59 pm
I can't wait till I can speak to Meena. I like the concept of pairing NNs with chatbot technology. But I'm sure there will be a lot of room for improvement. I bet Korr would have great ideas on that front.

Yes, that's for sure. If you read the article closely you will have noticed that the algorithm alone achieved a score of 72, but with some hand-tuning that was increased to 79. There will still be a role for human chatbot authors to play.

I'm counting on the fact that NNs are Turing complete, which means they can perform any possible computation, proof, or correct thought. With carefully interrelated segments of an artificial brain, possibly deciding on a degree of statistical correctness, maybe it would turn into real AGI?
Title: Re: Improving the Turing Test to make better chatbots.
Post by: Don Patrick on January 29, 2020, 03:58:04 pm
I find it weird that by treating the two factors as equal, Cleverbot and Mitsuku end up scoring the same, while I personally consider Cleverbot to be rubbish due to its abundance of generic responses. "yes" is a sensible answer to most straight questions but it's terribly inadequate for making conversation. I also think the main reason they use sense as a factor is because they are using a system that can equally generate nonsense, and this is mainly a concern in approaches with neural networks. So they are measuring a self-inflicted side-effect of a specific technology that is not very relevant in the judgment of other approaches.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on January 29, 2020, 06:01:03 pm
@Art, the reason death (well, survival) is the evaluation is because change happens. Be it an idea or a person, sorting of particles happens; you can't delete or create bytes/matter/energy, only sort them! So change = death. What survives is the "good" change. More patterns arise during evolution; it was chaos when it began, random. We are aligning now. So AGI/physics/evolution is all about change of particle positions / survival, updates if you will. Every day we seek food to survive, and breeding to populate against depletion. We grow new data too. Ideas 'fight' too. One takes over and overpoweringly radiates 'updates' to the rest, which may feel pleasant, like a friendly tip, or, well, erm. Being 'intelligent' is all about your skill at finding food and breeding... so your whole goal is to try to survive/spread, and whoever is better wins against many others and updates them.

Yes: starting from a blank slate vs. passing down rules, both are needed. You are taught, then you build onto it.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on January 29, 2020, 06:56:29 pm
See my last two posts above. And so lossless compression is an evaluation for AGI (I hope korrelan doesn't use perplexity), because the sorting of particles learns patterns, enabling the sorted particles to easily work with many unseen future issues given to them. It's an enabling factor when you sort your particles better / learn patterns. It lets you solve many, many problems, and survive/spread. So patterns/sorting = solving issues = surviving/spreading/regenerating the lifeform.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: infurl on January 29, 2020, 10:47:57 pm
I find it weird that by treating the two factors as equal, Cleverbot and Mitsuku end up scoring the same, while I personally consider Cleverbot to be rubbish due to its abundance of generic responses. "yes" is a sensible answer to most straight questions but it's terribly inadequate for making conversation. I also think the main reason they use sense as a factor is because they are using a system that can equally generate nonsense, and this is mainly a concern in approaches with neural networks. So they are measuring a self-inflicted side-effect of a specific technology that is not very relevant in the judgment of other approaches.

That's a very good point. I think it illustrates very well that chatbots are all still rubbish, even the ones that have been pressed into business applications. I've yet to find a useful chatbot anywhere for any purpose. The only thing that differentiates one from another is the amount of money and effort that was wasted on it and the amount of hype that surrounds it.

On the other hand I find technologies like Google Assistant and Wolfram Alpha incredibly useful every day, but they're not chatbots and they don't pretend to be something that they're not.

Note that the article concludes with the following (among other things).

Quote
While we have focused solely on sensibleness and specificity in this work, other attributes such as personality and factuality are also worth considering in subsequent works.

I'm willing to bet that Mitsuku would get a much better score when personality is taken into account, but so would many books.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on January 29, 2020, 11:09:38 pm
The Turing Test is testing the AI's language, which is just data/info descriptors. The quality they are judging it against is whether it matches our communications. But ours solve problems. If it doesn't solve problems, it's only a little toy. It has to generate new data, related to goal data.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on January 29, 2020, 11:13:40 pm
I know it can 'appear human' if it looks like us, talks like us, or moves like us. But humans are none of that. Humans are problem solvers. Our inner desire is really much more: our desire is to sort particles differently, to change the structure of Earth, so that we can get our simple goal of food/shelter/breeding, aka survival.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on January 29, 2020, 11:21:14 pm
OK, so modelling humans is how to make us, but... you need to model them correctly. Our faces don't do much. Our moves are useful for carrying out plans, otherwise useless. Our plans/speech are the key. But the Turing Test isn't evaluating them correctly. You don't use perplexity, you use lossless compression. Furthermore, what does the Turing Test look for? Whether it sounds like a human and you can't tell the difference? Um... are you sure you're evaluating it as well as lossless compression does? Are you sure!? It only sounds like a human if it generates discoveries, aka can losslessly regenerate related data.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: HS on January 30, 2020, 12:19:57 am
aka can losslessly regenerate back related data.

Better than lossless even, humans do 2+2=5!
Title: Re: Improving the Turing Test to make better chatbots.
Post by: squarebear on January 30, 2020, 09:26:13 am
Chances are, Meena needs fewer resources to run than your Mitsuku does.
Mitsuku is 40MB and runs in 4MB of RAM. It would be interesting to see Meena's specs.

They were able to create a better chatbot in just a few days of computation than you could create if you spent your entire life poring over chat logs.
A chatbot that nobody can try.

I hope I don't sound like sour grapes, as I'm genuinely interested in this chatbot and hope it is as good as they claim, but let's assume for a moment that it wasn't Google who had posted this. To me, the post reads:

"Hi all. I've created a new measurement for chatbots and announced myself as the best at this measurement. My work is about 30% better than the next best chatbots. However, I'm not letting anyone talk to it."

We see these posts regularly on chatbot forums and usually just laugh them off as a joke with a "please repost when you have something concrete to backup your claims".
A while ago, Google claimed to have developed something amazing called Duplex. This too was heralded (by them) as the next big thing. It later turned out that 25% of calls handled by Duplex were dealt with by real people, and Duplex slowly sank out of the news.

I hope it is genuine, as chatbot development has progressed little since the ELIZA days, but as I say, until something is released that we can actually try, I'll reserve judgment for now.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: Don Patrick on January 30, 2020, 09:49:20 am
I'm willing to bet that Mitsuku would get a much better score when personality is taken into account, but so would many books.
Maybe they are referring to the split-personality disorder that all chatbots trained on social media dialogue develop, which, again, is a negative side-effect of that particular approach. Chatbots that are built or nurtured have consistent personalities. Perhaps the number of factual contradictions would be a more fitting metric to cover it.

Personally, I always judge chatbots' quality by "relevance", but I find that difficult to define. It's mostly about how specific the response is, whether it uses words related to the topic (word-association data is easy to come by), and whether it at the very least understands pronoun references. The latter might sound like a preoccupation of mine, but pronouns substitute for nouns in every other sentence of a conversation, so not resolving them is to ignore half of what people are talking about. It doesn't seem Google's algorithm does that.
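To illustrate the pronoun point, here is a deliberately naive resolution sketch; a real resolver would use syntax and gender/number agreement, and the word lists are invented for the example:

Code
# Toy pronoun resolution: substitute the most recently mentioned noun.
PRONOUNS = {"it", "he", "she", "they"}
NOUNS = {"mitsuku", "meena", "google", "chatbot"}

def resolve(turns):
    last_noun = None
    resolved = []
    for turn in turns:
        words = []
        for w in turn.lower().split():
            if w in NOUNS:
                last_noun = w          # remember the latest referent
            words.append(last_noun if w in PRONOUNS and last_noun else w)
        resolved.append(" ".join(words))
    return resolved

print(resolve(["I talked to meena today", "it kept asking questions"]))
# -> ['i talked to meena today', 'meena kept asking questions']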

There are some sample conversations here:
https://github.com/google-research/google-research/blob/master/meena/meena.txt
They're fairly impressive in the small-talk department, though they show plenty of instances where Meena just composes a response out of words that the user recently said.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: squarebear on January 30, 2020, 10:43:54 am
The paper describes Mitsuku as a "complex" system but let's not forget that the vast majority of my work on Mitsuku was done when I was an individual hobbyist with zero budget and very little time. Nothing could be simpler than the way Mitsuku works. AIML is designed specifically to be simple.

I fully expect Google to be able to create something better. After all, it would be rather embarrassing if a multinational corporation like Google, with almost unlimited resources, couldn't create a better chatbot than a guy who did it for fun in his spare time.  :D
Title: Re: Improving the Turing Test to make better chatbots.
Post by: ivan.moony on January 30, 2020, 10:50:42 am
hehe, this one is my favorite:

Quote
Meena Conversation 17
Human: Hi!
Meena: Hey, how you doing?
Human: I'm fine and you?
Meena: I'm fine too, thanks. What's your favorite food?
Human: I don't have a favorite
Meena: Why don't you have a favorite?
Human: I don't know. I like all food
Meena: So what's your favorite thing to eat?
Human: maybe desserts
Meena: What's your favorite dessert?
Human: no, I like seefood better
Meena: Why don't you have a favorite seafood?
Human: I do
Meena: Then why don't you eat it?
Human: I never said I don't eat my favorite seafood
Meena: Why don't you eat it more?
Human: hmm, I need to go
Meena: Why don't you come back?
Human: why should I?
Meena: Because we miss you.
Human: you're not alone?
Meena: There's more to this, I know.
Human: ok, see you later then
Meena: You're not fooling anyone, you know.
Human: bye
Meena: You're not leaving.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: infurl on January 30, 2020, 11:14:26 am
Mitsuku is 40MB and runs in 4MB of RAM. It would be interesting to see Meena's specs.

I did some research and some calculations. If Meena consumed 8.5 times as much data as GPT-2, that means GPT-2 was trained on about 40 gigabytes of data. There are two versions of the run-time for GPT-2: the complete version is 345MB and the lite version is 117MB. Extrapolating the other way, the run-time for Meena could be about 3GB, so it will certainly fit on a flash drive. That could be off by an order of magnitude, but it hardly matters, because it's going to be run in the cloud if it's going to be useful anyway.
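Spelled out, that back-of-the-envelope calculation looks like this; the 341GB and 8.5x figures come from Google's announcement, while the run-time sizes and the linear scaling with training data are the assumptions made above:

Code
meena_training_gb = 341
data_ratio = 8.5                     # Meena trained on 8.5x more data than GPT-2
gpt2_training_gb = meena_training_gb / data_ratio
print(round(gpt2_training_gb))       # ~40 GB

gpt2_runtime_mb = 345                # complete GPT-2 run-time, as quoted above
meena_runtime_gb = gpt2_runtime_mb * data_ratio / 1024
print(round(meena_runtime_gb, 1))    # ~2.9, i.e. roughly 3GB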

I hope I don't sound like sour grapes, as I'm genuinely interested in this chatbot and hope it is as good as they claim

Regardless of what Meena or anything else that follows can or can't do, your accomplishments remain and they are worthy. I think of Mitsuku as a work of art rather than a technological achievement, so it's going to be remembered and it won't ever be obsolete. Interactive fiction is a thing now.

A while ago, Google claimed to have developed something amazing called Duplex. This too was heralded (by them) as the next big thing. It later turned out that 25% of calls handled by Duplex were dealt with by real people, and Duplex slowly sank out of the news.

Google isn't afraid to try and fail. It must have had more failures than we ever heard about to have achieved all the success that it has. Remember Microsoft's Tay? If I owned Meena I wouldn't let the general public chat with it either.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: squarebear on January 30, 2020, 11:29:44 am
Regardless of what Meena or anything else that follows can or can't do, your accomplishments remain and they are worthy.
Thanks for the kind words. I appreciate them.  :)

If I owned Meena I wouldn't let the general public chat with it either.
In that case, I announce that I have developed Mitsuku-2 which can beat any other chatbot and regularly passes the Turing Test but I choose not to allow anyone to talk with it.  ;D
Title: Re: Improving the Turing Test to make better chatbots.
Post by: infurl on January 30, 2020, 11:35:46 am
In that case, I announce that I have developed Mitsuku-2 which can beat any other chatbot and regularly passes the Turing Test but I choose not to allow anyone to talk with it.  ;D
Last time you made a claim like that in a public forum, an NSA contractor signed on and chastised you for wasting his time. Too funny.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: Dee on January 31, 2020, 05:50:01 am
My company is using Chatfuel. It surely doesn't pass the Turing test; it has some kind of mechanism (ML text classification) to match input sequences to a fixed set of responses.


Is it that Chatfuel just doesn't want a real chatbot that passes the Turing test with AI-generated text responses? Because AI-generated responses somehow don't make sense very often :uglystupid2:
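That classify-then-respond pattern is easy to sketch. Below is a toy version; the intents, keywords, and scoring are invented for illustration, and a real platform such as Chatfuel would use a trained text classifier rather than keyword overlap:

Code
# Toy intent-classification bot: match input to a fixed set of responses.
RESPONSES = {
    "pricing": "Our plans start at $10/month.",
    "hours": "We're open 9am-5pm, Monday to Friday.",
    "fallback": "Sorry, I didn't understand. Could you rephrase?",
}
KEYWORDS = {"pricing": {"price", "cost", "plan"}, "hours": {"open", "hours", "when"}}

def reply(user_text):
    words = set(user_text.lower().split())
    # Pick the intent whose keyword set overlaps the input most.
    intent = max(KEYWORDS, key=lambda i: len(KEYWORDS[i] & words))
    if not KEYWORDS[intent] & words:   # no overlap at all
        intent = "fallback"
    return RESPONSES[intent]

print(reply("How much does a plan cost?"))  # -> the pricing response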
Title: Re: Improving the Turing Test to make better chatbots.
Post by: infurl on January 31, 2020, 12:18:00 pm
Steve, I was just thinking about a question posed by Zero about what features make a good programming language.

I'd like to ask you a similar question. What features make a good development environment for chatbots?
Title: Re: Improving the Turing Test to make better chatbots.
Post by: squarebear on February 01, 2020, 02:00:16 pm
Good question. I'm short on time for a longer answer, but for me any platform should be as simple to use as possible, so that non-programmers can create a bot. Programmers are great at the technical side of things, but you also need content creators who are maybe not so great at coding but are great at creating content for the chatbot.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: Dee on February 01, 2020, 07:41:28 pm
Good question. I'm short on time for a longer answer, but for me any platform should be as simple to use as possible, so that non-programmers can create a bot. Programmers are great at the technical side of things, but you also need content creators who are maybe not so great at coding but are great at creating content for the chatbot.
::)  It's interesting that non-programmers will be able to create chatbots without IT knowledge,
just like creating robots and giving them some motivations, some instincts;
and the bots start learning actively, for example crawling the internet and learning by themselves the knowledge that matches their motivations;
just like God made us with instincts and motivations.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: infurl on March 21, 2020, 10:36:55 pm
Here is the "Two Minute Papers" take on Meena.

https://www.youtube.com/watch?v=3Wppf_CNvD0 (https://www.youtube.com/watch?v=3Wppf_CNvD0)
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on March 21, 2020, 11:42:38 pm
Mitsuku keeps getting in there ;)



""Humans were employed as crowd workers to rate each such conversation for its 'sensibility' and its 'specificity,' and such examples do, indeed, make big progress from prior chatbots. ...Sadly, the humans were not asked by Adiwardana and colleagues to rate conversations for 'interestingness,' because this and other exchanges in the sample are incredibly dull."

But one should not leave it at that, because it appears that the team has realized its steps ahead, as they review their work and what goals are yet to be achieved. They wrote in their paper:

"Furthermore, it may be necessary to expand the set of basic human-like conversation attributes being measured beyond sensibleness and specificity. Some directions could include humor, empathy, deep reasoning, question answering and knowledge discussion skills. "

This is in the early stage of its development, and Meena will undergo further evaluation before you can actually talk with it, said reports."



Lossless Compression is the best evaluation. See the Hutter Prize and my thread https://aidreams.co.uk/forum/index.php?topic=14561.75

You don't actually need to test its turns-per-conversation count (i.e. how long until the human leaves), nor have humans vote based on interestingness. What you want is for the text generator/predictor to predict likely answers to unseen inputs, and also to predict desired answers, to steer the prediction in its favor even more. That is the interestingness; better compression results in both. All you do is mark nodes with reward-chemical traces, no hiring employees like Google did! If it's interesting, the human WILL stay!!!! Oh yes! All you need to do is make it know what happens next and it can cure cancer, yup. And semantics. And predict desired outcomes too; I guess that's why we believe in gods... when you ask "you want what?" I repeat back "I want fries" by changing words (prediction chooses them, as it chains forward). Doing so should improve compression, 'cause I myself am predictable in my desires, not just in frequency. I suppose, hmm, frequency shows desires; if they talk about it most, then yeah, it's frequent, hmm. Well, the reward chemical is meant to let it eat large data but still retain its root reasoning, to work with the data correctly, I think... not sure, will sleep on it!
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on March 22, 2020, 11:03:36 am
OK, so predicting the future using the frequency of what was seen to come next after the recognized recent context state does it all, but as for our hardwired desires that steer the prediction, hmm, yes, I just decided it is as follows: we have milestone marks to help us along the way, short-term ones, longer-term ones, and the final end goal. E.g. favored research method... better hard drive... food on the table for the kids. Three goals. You immediately ask your desired question to yourself if the internet fails, and use the likeliest path and the favored method to research, or jog or whatever, and then you work your way through this maze to the next expected outcome predicted: better hard drive fabrication. So yes, using the RL goal-reward chemical in compression should result in better prediction, by modeling the objective milestone build of thought the same way the writers do. I guess this is actually modeling real humans; well, the Hutter Prize is a way to evaluate the neural network, and we are finding out how to invent AGI simply. As for Meena, I'd like to see more of its generations, but evaluating how long humans stay talking to it is not the best way to measure interest; all we need to do is answer questions correctly, and that's what we really want too.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on March 22, 2020, 11:45:07 am
Also see the links below about perplexity evaluation for AI! As I said, lossless compression evaluation in the Hutter Prize is *the best*, and see, it really is the same thing: prediction accuracy. Except perplexity allows errors.

https://planspace.org/2013/09/23/perplexity-what-it-is-and-what-yours-is/

https://www.youtube.com/watch?v=BAN3NB_SNHY

Hmm. I assume they take words or sentences and check whether the prediction is close/exact, then carry on. With lossless compression, it stores the arithmetic-encoded decimal of the probability, and the resulting file size shows the prediction error for the whole file, no matter whether your predictor did poorly on some parts or not, just like perplexity. However, they don't consider the neural network's size; it could just copy the data. That's why they use a test set during/after training. The goal is the same, though: make a good neural network predictor. The test set and compression are also very similar: both check how well the model understands the data without copying it directly.

So which is better? I'm not sure now. Perplexity, or Lossless Compression?
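For what it's worth, the two are close mathematical relatives: an ideal arithmetic coder driven by a predictor compresses text to its total negative log-probability in bits, so compressed size, bits per character, and perplexity are all transformations of the same cross-entropy. A character-level sketch with toy probabilities (not any particular model):

Code
import math

def compression_metrics(char_probs):
    # `char_probs` is the probability the model assigned to each
    # actual character of the text it was asked to predict.
    total_bits = -sum(math.log2(p) for p in char_probs)  # ideal coded size
    bpc = total_bits / len(char_probs)                   # bits per character
    ppl = 2 ** bpc                                       # per-char perplexity
    return total_bits, bpc, ppl

bits, bpc, ppl = compression_metrics([0.5, 0.25, 0.5, 0.125])
print(bits, bpc, round(ppl, 2))  # 7.0 1.75 3.36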
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on March 22, 2020, 12:28:06 pm
Ah, you can train a net on one dataset and then test it on a different dataset, but you can never be sure the test set is on topic; it has to be different, lol!!!!! With lossless compression evaluation, the predictor also predicts the next token, and we store the accuracy error, but it's on the same dataset, meaning it can fully understand the dataset, and it's safe because we include the code size plus the compressed error size, and make sure the compression is the best we can get. Speed matters too. And working-memory size. 'Cause brute force would work, but is slowest.

Since both evaluations test the predictor's accuracy and know the right symbol to predict, we see the error, but we can't know the best compression/accuracy possible, so the contest will never stop. With perplexity this is true too, I think: it gets, say, 90% of letters or words predicted exactly, but how many can it get right? 100%? Maybe if the training dataset is large enough it will do better, but that doesn't mean it is understanding it more. With compression, you can do better the bigger the dataset, but you can at least keep the size static and focus on compression, aka understanding the data better. I guess with perplexity you can keep your training set static too. So yeah, both can keep the dataset the same size and improve prediction toward an unknown limit.

The conclusion is that perplexity isn't focused on the very dataset the model is digesting, but on a different "test" dataset, which is bad.
Title: Re: Improving the Turing Test to make better chatbots.
Post by: LOCKSUIT on March 22, 2020, 01:29:13 pm
Does anyone here know how the Bytes Per Character evaluation works when used to measure a text predictor's accuracy? And how is it different from perplexity?