Improving the Turing Test to make better chatbots.


Dat D

Re: Improving the Turing Test to make better chatbots.
« Reply #30 on: February 01, 2020, 07:41:28 pm »
Good question. I'm short on time so I'll keep this brief, but for me any platform should be as simple to use as possible so that non-programmers can create a bot. Programmers are great at the technical side of things, but you also need content creators, who may not be so great at coding but are great at creating content for the chatbot.
::)  It's interesting that non-programmers will be able to create chatbots without IT knowledge,
just like creating robots and giving them some motivations, some instincts;
the bots then start learning actively, for example crawling the internet and teaching themselves the knowledge that matches their motivations,
just as God made us with instincts and motivations.


infurl

Re: Improving the Turing Test to make better chatbots.
« Reply #31 on: March 21, 2020, 10:36:55 pm »
Here is the "Two Minute Papers" take on Meena.

https://www.youtube.com/watch?v=3Wppf_CNvD0


LOCKSUIT

Re: Improving the Turing Test to make better chatbots.
« Reply #32 on: March 21, 2020, 11:42:38 pm »
Mitsuku keeps getting in there ;)



"Humans were employed as crowd workers to rate each such conversation for its 'sensibility' and its 'specificity,' and such examples do, indeed, make big progress from prior chatbots. ...Sadly, the humans were not asked by Adiwardana and colleagues to rate conversations for 'interestingness,' because this and other exchanges in the sample are incredibly dull."

But one should not leave it at that, because it appears that the team has realized its steps ahead, as they review their work and what goals are yet to be achieved. They wrote in their paper:

"Furthermore, it may be necessary to expand the set of basic human-like conversation attributes being measured beyond sensibleness and specificity. Some directions could include humor, empathy, deep reasoning, question answering and knowledge discussion skills. "

This is in the early stage of its development, and Meena will undergo further evaluation before you can actually talk with it, said reports.



Lossless Compression is the best evaluation. See the Hutter Prize and my thread https://aidreams.co.uk/forum/index.php?topic=14561.75

You don't actually need to measure turns per conversation, i.e. how long until the human leaves, nor have humans vote on interestingness. What you want is for the text generator/predictor to predict likely answers to unseen inputs, and also to predict desired answers so it can steer the prediction in its favor even more. That is the interestingness; better compression yields both. All you do is mark nodes with reward-chemical traces, no hiring crowd workers like Google did! If it's interesting, the human WILL stay!

All it needs to do is know what happens next and it could tackle problems like curing cancer. And semantics. And predict desired outcomes too; I guess that's why we believe in gods. When you ask "you want what?" I repeat back "I want fries" by changing words (prediction chooses them as it chains forward). Doing so should improve compression, because I myself am predictable in my desires, not just in frequency. Then again, frequency reveals desires: if people talk about something the most, it will be frequent. The reward chemical is meant to let the system digest large data while still retaining its root reasoning, so it works with the data correctly, I think... not sure, I'll sleep on it!
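The link between prediction and compression can be made concrete. A minimal sketch, not the Hutter Prize harness itself: under ideal arithmetic coding, each token costs -log2 p bits, where p is the probability the model assigned to the token that actually occurred, so a sharper predictor directly yields a smaller file. The helper name `compressed_size_bits` and the toy probability lists are illustrative assumptions.

```python
import math

def compressed_size_bits(probabilities):
    """Ideal arithmetic-coded size: each token costs -log2(p) bits,
    so better prediction means a smaller compressed file."""
    return sum(-math.log2(p) for p in probabilities)

# Probabilities a model assigned to the tokens that actually occurred.
dull_model  = [0.25, 0.25, 0.25, 0.25]   # uniform guessing
sharp_model = [0.9, 0.8, 0.7, 0.9]       # confident and mostly right

print(compressed_size_bits(dull_model))   # 8.0 bits
print(compressed_size_bits(sharp_model))  # fewer bits than the dull model
```

This is why the compressed file size serves as an evaluation: it is the model's total prediction error over the data, summed in bits.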
Emergent


LOCKSUIT

Re: Improving the Turing Test to make better chatbots.
« Reply #33 on: March 22, 2020, 11:03:36 am »
OK, so predicting the future using the frequency of what was seen to come next after the recognized recent context does most of the work. As for our hardwired desires steering that prediction, I've just decided it works as follows: we have milestone marks to help us along the way, short-term ones, longer-term ones, and a final end goal. For example: favored research method, then a better hard drive, then food on the table for the kids. Three goals. You immediately ask yourself your desired question if the internet fails, take the likeliest path using your favored method (research, jogging, whatever), and work your way through the maze to the next predicted outcome, better hard-drive fabrication.

So yes, using an RL-style goal-reward chemical in compression should result in better prediction, by modeling the objective, milestone-driven build of thought the same way the writers do. This is really modeling actual humans; the Hutter Prize is a way to evaluate the neural network while we find out how to invent AGI simply. As for Meena, I'd like to see more of its generations, but evaluating how long humans keep talking to it is not the best way to measure interest; all we need to do is answer questions correctly, and that's what we really want too.


LOCKSUIT

Re: Improving the Turing Test to make better chatbots.
« Reply #34 on: March 22, 2020, 11:45:07 am »
Also see the links below about perplexity evaluation for AI! As I said, lossless-compression evaluation in the Hutter Prize is *the best*, and you can see perplexity really is the same thing: prediction accuracy. Except that it tolerates errors.

https://planspace.org/2013/09/23/perplexity-what-it-is-and-what-yours-is/

https://www.youtube.com/watch?v=BAN3NB_SNHY

Hmm. I assume they take words or sentences and check whether the prediction is close or exact, then carry on. With lossless compression, the arithmetic coder stores a decimal encoding the model's probabilities, and the resulting file size reflects the probability error over the whole file, regardless of whether the predictor did poorly on some parts, just like perplexity. However, perplexity doesn't account for the neural network's size; the network could simply copy the data. That's why a test set is used after or during training. The goal is the same: build a good neural-network predictor. The test set and compression are also very similar; both measure how well the model understands the data without copying it directly.
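The relationship can be sketched directly. A minimal example, assuming per-token probabilities are available (the function name `perplexity` is mine, not from the linked pages): perplexity is the exponential of the average negative log-likelihood over the tokens.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-likelihood.
    Lower is better; perplexity k means the model is on average
    as uncertain as a uniform choice among k options."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

uniform = [0.25, 0.25, 0.25, 0.25]
print(perplexity(uniform))  # ≈ 4.0, as confused as a fair 4-way guess
```

Note that log2 of the perplexity is exactly the average bits per token an ideal arithmetic coder would pay, which is why perplexity and lossless compression measure the same underlying prediction accuracy.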

So which is better? I'm not sure now. Perplexity, or Lossless Compression?


LOCKSUIT

Re: Improving the Turing Test to make better chatbots.
« Reply #35 on: March 22, 2020, 12:28:06 pm »
Ah, you can train a net on one dataset and then test it on a different dataset, but you can never be sure the test set is on topic; it has to be different! With lossless-compression evaluation the predictor also predicts the next token and we store the accuracy error, but on the same dataset, so it can fully understand that dataset. It's safe because we count the code size plus the compressed-error size and check that the compression is the best we can get. Speed matters too, and working-memory size, because brute force would work but would be slowest.
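The "count the code size plus the compressed-error size" point is the two-part (MDL-style) score the Hutter Prize uses. A toy sketch with made-up numbers, purely for illustration: a model that memorizes the data just shifts its cost into the model-size term, so copying cannot cheat.

```python
def total_description_length(model_size_bits, compressed_data_bits):
    """Two-part score: decompressor/model size plus the compressed
    residual. Memorizing the data inflates the first term."""
    return model_size_bits + compressed_data_bits

# Hypothetical numbers for illustration only.
memorizer  = total_description_length(8_000_000, 1_000)      # huge model, tiny residual
compressor = total_description_length(50_000, 2_000_000)     # small model, honest residual

print(memorizer > compressor)  # True: the memorizer scores worse
```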

Since both evaluations test the predictor's accuracy and know the right symbol to predict, we can see the error, but we can't know the best compression/accuracy possible, so the contest will never stop. The same is true of perplexity, I think: a model gets, say, 90% of letters or words predicted exactly, but how many can it get right? 100%? A larger training dataset may help it do better, but that doesn't mean it understands the data more. With compression you can also do better with a bigger dataset, but you can at least keep the size fixed and focus on compression, i.e. understanding the data better. I suppose with perplexity you can keep your training set static too. So yes, both can keep the dataset the same size and improve prediction toward an unknown limit.

My conclusion is that perplexity isn't evaluated on the very dataset the model is digesting but on a different "test" dataset, which is a drawback.


LOCKSUIT

Re: Improving the Turing Test to make better chatbots.
« Reply #36 on: March 22, 2020, 01:29:13 pm »
Does anyone here know how bytes-per-character (BPC) evaluation works when applied to a text predictor's accuracy? And how is it different from perplexity?

 

