Ai Dreams Forum

Chatbots => General Chatbots and Software => Topic started by: Denis ROBERT on June 03, 2022, 12:40:36 pm

Title: What challenge to replace the Loebner Prize?
Post by: Denis ROBERT on June 03, 2022, 12:40:36 pm
It seems increasingly clear that the Loebner Prize is definitively gone. Already in 2019, the protocol in the form of the Turing test was replaced by a simple exhibition and a vote. In 2020 and 2021, no Loebner Prize was organized. It will probably be the same in 2022 (https://www.chatbots.org/ai_zone/viewreply/27729/).
However, it seems important to me to test chatbots in order to compare and evaluate the different technologies and, if strong artificial intelligence turns out to be a danger, to be able to tell how far we are from that danger.
That's why I created an Online Turing Test (https://vixia.fr/turing_test/index.php). The lack of success of this competition seems to indicate that the format is not the right one, so I am running this survey to find out how it should evolve. I would like to hear from all the famous botmasters who have participated in the Loebner Prize over the past 30 years, but also from all the others, those who have a chatbot in the works and those who are just interested in this technology.
I cannot make a multiple-choice survey. If several options seem acceptable to you, or you prefer one that is not on the list, do not hesitate to say so in your answer.
Title: Re: What challenge to replace the Loebner Prize?
Post by: ivan.moony on June 03, 2022, 01:48:28 pm
As AI technology advances and its abilities rise, passing different tests may seem to be the right measure of AI usability.

Yesterday it was the Turing test, today it may be the Winograd schema challenge or the Online Turing Test, whereas tomorrow it may even be getting a school diploma or successfully obtaining and keeping a paid job.

But those are tests that measure how close AI is to humans. More advanced tests in the future would even include some measure of surpassing humans. Probably some categorized list of successfully passed requirements (IQ test, psychological test, a level of ethics reached, maybe physical strength, ...)

We already have a measure of "X horsepower". Maybe we will also need a measure of "Y humanminds", if we can bring all the tests under the same umbrella.
Title: Re: What challenge to replace the Loebner Prize?
Post by: MagnusWootton on June 03, 2022, 05:19:31 pm
Chatbots are still getting better tho, it's not like chatbots are dead, that's fer sure.
Title: Re: What challenge to replace the Loebner Prize?
Post by: WriterOfMinds on June 03, 2022, 05:47:26 pm
I keep saying I'm not ready (and probably won't be for a few more years!), but I want the Loebner Prize or something in that style to exist. I thought it was both fun and useful.

And I think basing it around some kind of Turing Test featuring free-form communication is still a good idea. Testing on a specific task (like Winograd schema interpretation) is also useful but too specialized, in my opinion. So tests like this could be part of a competition but ideally not the whole.

If you wanted to introduce a game format, I would recommend a tabletop RPG or something else designed for open-ended storytelling - the rules may be known ahead of time, but the scenario and the objectives are not, and players can ad-lib almost anything. If you pick a game that has fixed objectives and optimal strategies, people will probably build specialist bots just for that game.

You have my detailed feedback on the Online Turing Test from when you held it. In my opinion it was a good effort, but by its nature it needed more participation to be successful. Why you didn't get that participation is unclear - it could be the format, but I wouldn't guarantee that. It may be that you just lacked the "critical mass" of prestige and community interest to bring everyone together. The existing Loebner Prize had tradition and name recognition behind it, and I'm sure that helped.

And then you have to throw in the fact that hobbyists working with some combination of scripting/lookup databases and GOFAI are the lowest bottom feeders in the AI community right now. Mainstream interest is on building a bigger transformer AI and getting incrementally better performance on benchmarks, such as GLUE.
Title: Re: What challenge to replace the Loebner Prize?
Post by: LOCKSUIT on June 04, 2022, 09:17:54 am
The common way to measure AI right now is its accuracy at prediction, or a Reinforcement Learning score, because this kind of measure is better for short-term guidance when taking steps towards AGI. The Turing Test, or answering secret questions and "looking right" to judges, per human votes, is much harder to measure because it takes humans a long time to evaluate thousands of completed tasks for the runner up AI compactor (you have to test lots of tasks, scale is where it matters...), and because it is long-term oriented. What that means is it is a very powerful measure, yes, very simple and powerful, but it's not used often, and only when we can muster up a long test run that uses up a few million dollars to organize.

The Turing Test asks whether it looks and feels like a human. It's simple and works. Right now GPT-3/ Jukebox/ DALL-E 2/ CogVideo/ Google's Flamingo/ GATO/ Mia feel a lot like human level, but clearly fall short. Such a system would pass the TT if it could learn online, had all the senses, had a bigger and faster brain, scaled resolution and frame rate, and used its video predictor (see CogVideo below) to perhaps control a robot body based on the predicted video (CogVideo seems to be like GPT-3, having learnt lots of tasks, expressions, scenes, etc., so it will just be, act, and look like a human when you talk to it by feeding in your face etc. as the video context). I would say we can all already kind of do the Turing Test in our heads, just by looking at the results of all these AIs, because while I have not set up a single large test, I have seen dozens of things each of these AIs has done and can tell that they, and the AI field, are faring pretty well.

The ultra long-term test for intelligence is which machine lives longer (score many...). This is rarely done by humans because it can take hundreds of years to see which person survives... a powerful test, but one you can rarely run. As with the TT above, it is probably faster to "predict" how long it will live instead of measuring it exactly, e.g. by seeing that, OK, it's filthy rich, has cells that can handle high heat and cancer, regenerates organs, and lives away from humans in a safe place...
Title: Re: What challenge to replace the Loebner Prize?
Post by: ivan.moony on June 04, 2022, 03:10:58 pm
Still looking at the future, as a measurement of AI abilities, how about "humanyears per hour"?
Title: Re: What challenge to replace the Loebner Prize?
Post by: LOCKSUIT on June 04, 2022, 04:22:34 pm
Oops I forgot the link, here, I fell asleep in my chair after writing my reply above lol.

https://github.com/THUDM/CogVideo
Title: Re: What challenge to replace the Loebner Prize?
Post by: MagnusWootton on June 04, 2022, 09:52:26 pm
The Turing test is really good, but I think it's a little misleading to developers to suggest that they need to make their "intelligent system" human for it to be better. If it's artificial, it could still be way more successful than some useless parrot faking device.
Title: Re: What challenge to replace the Loebner Prize?
Post by: LOCKSUIT on June 06, 2022, 06:46:32 pm
Oh also:

Measuring the lifespan of an organism takes the longest and can't be used by us to test for AGI. It's only useful for scoring very small machines and/or short lifespans.

The Turing Test is much faster than that. Testing against humans is really on target. I know they say a plane is faster than a bird, so why look at the bird, but really it's fine: it measures lots of tasks and is a reasonable evaluation. But it still takes too long, so testing humans against ant or fly is better.

A prediction score, like perplexity, is what gets used most of the way when testing for AGI. It's easy and fast.
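
For anyone who hasn't met the term, here is a minimal sketch of how a perplexity score is usually computed: the exponential of the average negative log-probability that a model assigned to each token of a held-out text. The function name and the two probability lists are made up for illustration, they don't come from any real model, and lower is better.

```python
import math

def perplexity(token_probs):
    """Return exp(mean negative log-probability) over the given tokens.

    token_probs: probabilities (each in (0, 1]) that a model assigned to the
    tokens that actually occurred in a held-out text.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical numbers: two models scored on the same four held-out tokens.
better_predictor = [0.5, 0.5, 0.5, 0.5]
weaker_predictor = [0.25, 0.25, 0.25, 0.25]

print(perplexity(better_predictor))  # lower score, better prediction
print(perplexity(weaker_predictor))  # higher score, worse prediction
```

A model that always gives the right token a probability of 1/k ends up with a perplexity of k, so the score can be read as "how many equally likely choices the model is effectively hesitating between". That is why it is quick to run: it needs no human judges, only a text and the model's own probabilities.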
Title: Re: What challenge to replace the Loebner Prize?
Post by: MagnusWootton on June 07, 2022, 02:25:58 pm
the lifespan of an organism? :)

It's your body/brain/form that kills you, a robot's body/brain/form could be made more invincible.

So maybe it's not its lifespan, it's everything else's.
Title: Re: What challenge to replace the Loebner Prize?
Post by: Don Patrick on June 07, 2022, 07:48:28 pm
The reasons I participated in the Loebner Prize were:
1: Prove that the Turing test is beside the point, by demonstrating a clearly intelligent but inhuman program.
2: Prize money to justify the time spent on the contest.
3: Modest public/academic exposure in a semi-official setting.

There have been other chatbot contests, like the Chatterbox Challenge, and I would have been interested enough to participate in those for the prize money ($500 to $1000 would suffice) and the challenge of competition. Turing tests would require a bigger incentive because I dislike the "act human" criterion and it goes against my AI's programming. I prefer criteria like "most natural conversation", or "most entertaining" or even just "best".

Your online Turing test was a reasonable format, but I had nothing to gain by it; my ego isn't big enough for victory to mean that much. If public exposure alone is to be the incentive, it would need to involve a large community of judges, like Reddit users.
Title: Re: What challenge to replace the Loebner Prize?
Post by: LOCKSUIT on June 08, 2022, 03:45:38 am
"Gold is clearly the most durable, but many objects fashioned from silver, copper, bronze, iron, lead, and tin have survived for several thousand years. Dry environments, such as tombs, appear to be optimum for metal preservation, but some metals have survived in shipwrecks for over a thousand years."

Yes, robots can live a lot longer due to just being metal and such. Tables, chairs, and metal cubes live very long too, thousands of years if you leave them be. Freezing them might give them the ultimata. Freezing humans kills them, but not anything else about their body (just you die, haha!!). The problem is they can easily get hit by a meteor or recycled by humans. So, humans die a lot faster than chairs, for now, so we can evolve and change for now, until we surpass their lifespans and can live trillions of years easily.

Ya, lifespan is the ultimate measure of intelligence. Humans strive for lifespan extension. Cloning is just as important though, that's why humans also strive for breeding, as cloning as much as you can will defer death just as surviving as long as you can does. That's intelligence at max. I mean, to survive or clone tons you need intelligence. It doesn't measure intelligence directly, but the overall effect, which is "No matter the system, it lives longer eventually / stops changing plus is the same everywhere.", to quote me lol. Things solidify later in an evolution, so that's why we may as well measure it and call it intelligence/ the end/ the coming result/ what is right, good, and our goal. It's happening and you can't change it. The future will be colder, harder, darker, and organized, to save on energy and make fixing systems easy - all you do is look at the surrounding systems to know one is broken plus how to fix it (all will be the same unit cloned around the planet/ homeworld, which also speeds up production lol).
Title: Re: What challenge to replace the Loebner Prize?
Post by: LOCKSUIT on June 08, 2022, 04:20:48 am
As I recall, the Turing Test is about having a conversation with the AI, but in my eyes the Turing Test (or maybe some other named or unnamed "test") is to test whether the AI is human by seeing if it can learn to ride a bike, get a job, research, invent, talk to you, look like us, etc. A good, hard measure. It takes much longer to run than the perplexity measure used by OpenAI.
Title: Re: What challenge to replace the Loebner Prize?
Post by: 8pla.net on June 08, 2022, 09:07:50 pm
I am proud to have been a Loebner Prize judge, so I have nothing but good things to say about the contest. Having served as a contest judge for Eugene Goostman, I am convinced that Eugene Goostman did pass the Turing test.

These suggestions are just for friendly discussion about considering new ideas for a new contest format. And, I am by no means against anyone else's suggestions that may differ from mine.  It is all good.

Also, I am still reading through the Online Turing Test website, which looks good, so apologies if some of this has already been done in some form.

1. A new, easier protocol to encourage more contestants.
2. Try offering much smaller prizes to discourage cheaters.
3. Develop sample contest chatbots for newbies to modify.
Title: Re: What challenge to replace the Loebner Prize?
Post by: MikeB on June 09, 2022, 08:22:36 am
I think people want to see the most advanced AI possible (such as this one in VRChat, see the video below). However, if only surface gimmicks are worked on and rewarded, then nothing is done structure-wise.

To be appealing to people who work on both Surface Level and Structure, a contest should encourage both, even if an entry is completely lacking in one of them. Rewards should be given for each, as well as for best overall/partnership, combining both qualities.

A contest that locks out one in an attempt to be modern or cheat proof doesn't help advance anything.

https://www.youtube.com/watch?v=w6qOnMk2lic

Title: Re: What challenge to replace the Loebner Prize?
Post by: MagnusWootton on June 09, 2022, 09:58:25 am
locking ppl out of competitions is a devolution.
Title: Re: What challenge to replace the Loebner Prize?
Post by: LOCKSUIT on June 09, 2022, 07:28:51 pm
Actually, I might be wrong about the kind of extreme Turing Test I mentioned taking longer than perplexity. I mean, if you watch some presented system learn to ride a bike, research, solve problems, and converse for a few hours, isn't that enough? Why must one really test it on all tasks, with 5,000 problems for each task? E.g. summarize this, and that, etc., translate this, and that, etc., and so on? Surely humans know an AGI after looking at it for only a while.

Maybe perplexity only takes a few minutes or hours to run, while the TT takes days/ weeks? And checking lifespan/ clone colony size takes too long for any of us to test? That last one takes little compute but a long time to run. The TT takes some compute/intelligence and some time. It's more of a middle ground, so is it the best option? Perplexity takes lots of compute but is fastest. However, perplexity doesn't seem to take so much compute that it is infeasible to run at all, so it seems like the best measure to use most of the time. Any time you add a new part to your AGI, e.g. embeddings or RL, you should be able to test it on thousands of tasks to see if it is scoring well.
Title: Re: What challenge to replace the Loebner Prize?
Post by: MagnusWootton on June 10, 2022, 07:59:04 am
Perplexity, I just looked up the word. Is it how complex the data stream can be before it becomes non-predictable?

And it's definitely semantically oriented, because u can't look at data redundancy in a general way. It's how you tackle the problem that makes it predictable, and that is data specific.

If u look at redundancy generally, u are guaranteed failure. That's what I reckon.