Forget the Turing Test: Here's How We Could Actually Measure AI

*

Tyler

12 June 2014, 12:00 am

A chatbot pretending to be a 13-year-old Ukrainian boy made waves last weekend when its programmers announced that it had passed the Turing test. But the judges of this test were apparently easily fooled, because any cursory exchange with 'Eugene Goostman' reveals the machine inside the ghost. Wired. Link.

Source: AI in the News

To visit any links mentioned, please view the original article; the link is at the top of this post.

*

Don Patrick

« Reply #1 on: June 13, 2014, 02:25:27 pm »
Strangely, I've heard academics suggest that the Turing Test should be made "more difficult" by having the computer imitate a human expert in a particular domain. I say strangely because, in practice, limiting the topic to one field would make the test infinitely easier.

Quote
“Paul tried to call George on the phone, but he was not [successful/available].” You, human reader, know that if he is not successful then “he” is Paul, and if he is not available then “he” is George. To figure that out, you needed to know something about the meaning of the verb “to call.” Machine learning researcher Hector Levesque of the University of Toronto proposes that resolving such ambiguous sentences, called Winograd schema, is a behavior worthy of the name intelligence.
I like the Winograd schemas. One small problem is that a computer could guess its way through them with a 50/50 chance on each question, like a student who didn't study for a multiple-choice test, so the benchmark should be set very high. But a high benchmark means an unsavory amount of common knowledge would be required, which has nothing to do with intelligence. To cover this, the computer should be allowed to pass on questions that it can't figure out due to a lack of knowledge, without penalty to its score. Apparently machines have already reached 73% accuracy: http://www.cs.nyu.edu/davise/papers/WS.html
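To put numbers on the guessing problem, here is a minimal Python sketch (an illustration only, not the official rules of any Winograd contest). The first function gives the chance that blind 50/50 guessing reaches a given score; the second shows one possible way to let a program pass on questions without penalty. The function names and the "A"/"B" answer labels are made up for the example.

```python
from math import comb

def p_guess_at_least(n: int, k: int) -> float:
    """Chance that pure 50/50 guessing gets at least k of n
    two-choice questions right (binomial upper tail)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def score_with_passes(answers, key):
    """Score answers where None means 'pass'.

    Passes are not counted as wrong: accuracy is measured over the
    attempted questions only, and coverage (share attempted) is
    reported alongside so a program can't pass on almost everything.
    """
    attempted = [(a, k) for a, k in zip(answers, key) if a is not None]
    correct = sum(a == k for a, k in attempted)
    accuracy = correct / len(attempted) if attempted else 0.0
    coverage = len(attempted) / len(key) if key else 0.0
    return accuracy, coverage

print(p_guess_at_least(10, 7))  # 0.171875
```

With only 10 questions, guessing alone gets 7 or more right about 17% of the time, which is why a small test needs a high bar; that chance drops sharply as the number of questions grows.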

The other problem is that I believe I could in fact solve most of these questions with a healthy dose of NLP and a check of two facts, no reasoning skills involved, so a program that passed that way would probably be dismissed as "not intelligent" and "not understanding" just as quickly. But if there is a Winograd contest anywhere, sign me up!  :)
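The "NLP plus two facts" shortcut described above could look something like the Python sketch below, assuming the parsing step (finding the subject, verb, object and the final word) is already done. The ROLE_FACTS table and resolve_pronoun are hypothetical; a real system would need a fact like this for every verb/word pair it meets, which is the common-knowledge burden mentioned earlier.

```python
# Hypothetical hand-entered facts: for a verb, which participant an outcome
# word normally describes. Illustrative entries only.
ROLE_FACTS = {
    ("call", "successful"): "subject",  # the caller succeeds or fails
    ("call", "available"): "object",    # the person being called is (un)available
}

def resolve_pronoun(subject, verb, obj, word):
    """Pick the referent of 'he' in '<subject> tried to <verb> <obj>, but he
    was not <word>' by looking up one fact. Returns None (a pass) when no
    fact is known, instead of guessing."""
    role = ROLE_FACTS.get((verb, word))
    if role is None:
        return None
    return subject if role == "subject" else obj

print(resolve_pronoun("Paul", "call", "George", "successful"))  # Paul
print(resolve_pronoun("Paul", "call", "George", "available"))   # George
```

No reasoning happens here at all: the answer is entirely in the lookup table, which is the point being made about such tests.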
CO2 retains heat. More CO2 in the air = hotter climate.

*

Freddy

« Reply #2 on: June 13, 2014, 05:32:38 pm »
I have to say that the 30% pass mark for the Turing Test, as recently passed, seems too low to me. A pass mark of 50% would make more sense to me. Passing 30% is pretty cool, but to be honest I am not that blown away.

I agree that if the bot were specialised, it would be easier for the botmaster, because knowledge of any one field is a bit more finite than knowledge of everything under the sun. Basically it's an expert system, isn't it?

Do you think knowledge is not related to intelligence? It seems to me that a lot of smart people also know a lot. If not intelligence, then Artificial Knowledge? ;)

Working with AIML as I have been lately, I find the work some people have done amazing. I scraped a set of 10,000 general knowledge questions and answers and inserted them into my AIML bot. Whether they will ever get asked is another matter, which brings us back to a specialised bot being a bit simpler. Maybe not much, but certainly to some degree.
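Bulk imports like that are usually scripted rather than typed by hand. Here is a minimal Python sketch, assuming the scraped data is a list of (question, answer) pairs; the function names and the version attribute are my own choices, and the normalization (uppercase, punctuation stripped) follows the common AIML pattern convention.

```python
import re
from xml.sax.saxutils import escape

def normalize_pattern(question: str) -> str:
    """Uppercase and strip punctuation, the usual convention for AIML patterns."""
    cleaned = re.sub(r"[^\w\s]", "", question)
    return " ".join(cleaned.upper().split())

def qa_to_aiml(pairs) -> str:
    """Turn (question, answer) pairs into one AIML document with one
    exact-match <category> per pair. XML special characters are escaped."""
    categories = [
        "<category>\n"
        f"  <pattern>{escape(normalize_pattern(q))}</pattern>\n"
        f"  <template>{escape(a)}</template>\n"
        "</category>"
        for q, a in pairs
    ]
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<aiml version="1.0.1">\n' + "\n".join(categories) + "\n</aiml>\n")

print(qa_to_aiml([("What is the capital of France?", "Paris.")]))
```

Because each pattern is an exact match, a category only fires when the user phrases the question almost word for word, which is one reason many of those answers may never be asked for.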

*

Art

« Reply #3 on: June 13, 2014, 10:32:24 pm »
Personally, I have grown tired of these BOTS being fed "Trick" questions or even attempting to field them.

Why not simply have a bot carry on a conversation with several judges, each judge typing several paragraphs in an attempt to converse in a friendly, natural manner?

Each judge would have different passages but each judge would repeat his / her passages with each subsequent bot (entrant).

Wrong / correct answers would then be assessed by all the judges collectively, plus one impartial person for each judge present.

The bot with the best cohesive, topical flow (sticking to or staying on subject) and friendly, reasonably sure conversation wins, lacking those "party tricks" we've all become numb to (the "I'm from Europa, so don't expect me to understand very much" bots).

A best-conversationalist bot would certainly prove a lot of things: English, grammar, semantics, topic flow, perhaps original idea gathering, recognition, humor (jokes, rhymes, puns, double entendre such as "Children make nutritious snacks"), etc.
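The scoring side of that proposal might be sketched in Python as below. The criteria names and the 1-10 scale are assumptions picked from the wish list above, not a fixed rule; the point is that every judge rates every bot on the same passages, so the scores are comparable.

```python
from statistics import mean

# Hypothetical criteria taken from the wish list; the 1-10 scale is an assumption.
CRITERIA = ("topical_flow", "friendliness", "grammar", "humor")

def rank_bots(ratings):
    """ratings[judge][bot] maps each criterion to a 1-10 score.

    Each judge's per-bot score is the mean over criteria; a bot's final
    score is the mean over judges. Returns (bot, score) pairs, best first.
    """
    per_bot = {}
    for judge_scores in ratings.values():
        for bot, scores in judge_scores.items():
            per_bot.setdefault(bot, []).append(mean(scores[c] for c in CRITERIA))
    averages = {bot: mean(vals) for bot, vals in per_bot.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Because each judge scores every bot, a harsh or generous judge shifts all bots' averages by the same amount, so the ranking depends less on who happened to judge whom.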

Since this is strictly my humble $.02, I shall end with this:
Thoughts, while not required, are certainly appreciated!!
In the world of AI, it's the thought that counts!

*

squarebear

« Reply #4 on: June 13, 2014, 10:39:52 pm »
Quote
The history of the Loebner prize, an annual Turing test competition, confirms this trend. Last year’s contest was won by a bot named Mitsuku also pretending to be young ESL speaker, a silly Japanese girl.

If he had actually talked to Mitsuku, he would have realised that it is NOT an ESL speaker and, although it appears Japanese, is in fact from England.
Feeling Chatty?
www.mitsuku.com

*

Don Patrick

« Reply #5 on: June 13, 2014, 10:55:01 pm »
Yes, that comment on Mitsuku was way off, and potentially a racist stereotype. Also, Japanese people don't typically have blonde hair.

50% is actually the logical maximum that the computer could reach: if the computer seems exactly as human as the real human, the judge still has to choose one of them, 50/50. So that 30% actually translates to 60% of the maximum, i.e. being 60% as convincing as a human.
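As a quick check of that arithmetic, a minimal Python sketch, assuming the paired setup described above where the judge must pick which of two partners is the human (the function name is my own):

```python
def relative_humanness(fooled_rate: float, ceiling: float = 0.5) -> float:
    """Share of judges fooled, expressed as a fraction of the best possible
    rate. When bot and human seem equally human, the judge picks at random,
    so the ceiling is 0.5 (50%)."""
    return fooled_rate / ceiling

print(relative_humanness(0.30))  # 0.6
```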

Making the computer pretend to be a human expert in one particular field would essentially make it an expert system, just a more talkative one, I guess. You could just program the AI with all the available research, which is easier to dig up than everything a human knows about everything. I may not be impressed by the intelligence of chatbots, but I can well admire the mountains of effort that have gone into making them.
Knowledge isn't entirely unrelated to intelligence, but I like to keep them apart. Knowledge is ammunition; intelligence is the machine that uses it. Given any of these tests, the most likely outcome is that a machine won't be able to answer because it lacks knowledge about mermaids and squirrels, even though it may well be capable of intelligently deducing the answer otherwise.

As for conversation vs intelligence, I prefer intelligence, but I equally value civilised behaviour. When intelligence is supposedly tested, it's usually an insult to intelligence (most often mine as a creator), so I end up preferring civilised conversation instead.

*

Freddy

« Reply #6 on: June 14, 2014, 12:27:01 am »
Quote
50% is actually the logical maximum that the computer could reach: If the computer seems as human as the real human, the judge still has to choose one of them 50/50. So that 30% actually translates to 60% similar to a human.

I was never very good at maths  :D

Good points too.

*

Art

« Reply #7 on: June 14, 2014, 01:01:53 pm »
My post was not actually meant to compare one method with the other, but rather to describe what I'd like to see done in a bot contest.

Chatbots are supposed to CHAT... not necessarily contain vast knowledge about everything in the universe. Although there are some people who act like they know everything, I've yet to meet anyone who actually DOES.

So, yes, IMHO, I'd like to see a conversational contest between judges and bots as mentioned above. If nothing more than to see how well others' bots handle language, spelling, grammar, usage, etc.

Watson is NOT a chatbot. Watson is an information gathering monster. If only it understood what it was!

*

Freddy

« Reply #8 on: June 14, 2014, 03:06:53 pm »
So are we thinking that conversation takes more intelligence than dispatching knowledge?

*

Don Patrick

« Reply #9 on: June 14, 2014, 03:35:11 pm »
I get what you mean, Art, but this topic was about measuring artificial intelligence, not chatbots, and I think Eliza proved long ago that good conversation doesn't necessarily take much intelligence. Put simply: conversation just isn't a clear measure of intelligence. It is often hard to tell whether a chatbot just responded to a single keyword or actually analysed grammar + semantics + knowledge + reasoning. However, since the former is easier to program than the latter, we may assume that any contest of conversation will mostly attract chatbots using the simpler methods. And from personal experience, they vastly outdo AIs that use more intelligent methods to arrive at their responses.

As far as I know, that's already what chatbot competitions are for.
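To show how little machinery a "single keyword" reply needs, here is a toy Python responder. The rules are invented for illustration and are much simpler than Weizenbaum's actual ELIZA script, which also ranked keywords and reassembled fragments of the user's sentence into its replies.

```python
import re

# Hypothetical keyword -> canned reply table. No grammar, semantics or
# reasoning is involved: the first matching keyword in list order wins.
RULES = [
    ("mother", "Tell me more about your family."),
    ("computer", "Do machines worry you?"),
    ("sad", "I am sorry to hear you are sad."),
]
DEFAULT = "Please go on."

def reply(user_text: str) -> str:
    words = set(re.findall(r"[a-z']+", user_text.lower()))
    for keyword, response in RULES:
        if keyword in words:
            return response
    return DEFAULT
```

Typed "My mother thinks I spend too long at the computer", it answers "Tell me more about your family.", which can read as attentive even though nothing was parsed; from the outside that is hard to distinguish from a reply that came out of real analysis.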

*

squarebear

« Reply #10 on: June 14, 2014, 05:40:29 pm »
...I'd like to see a conversational contest between judges and bots as mentioned above. If nothing more than to see how well others' bots handle language, spelling, grammar, usage, etc....
I tried this when I ran the Chatbot Battles contest a couple of years ago. It was a mixture of rounds, some conversational and some Q&A style. www.chatbotbattles.com

*

Art

« Reply #11 on: June 15, 2014, 04:00:01 am »
From the article itself:

Forget the Turing Test: Here’s How We Could Actually Measure AI
...<clip>...
If not the Turing test, is there an alternative measure of intelligence that would bring out the best in our machines? Experts have suggested an array of challenging tasks in the very human domains of language, perception, and interpretation. Perhaps a computer passing one of these tests would seem not just like a person, but like an intelligent person....<clip>....

####################

A chatbot is supposed to chat, not necessarily answer questions about weather forecasts, higher mathematics or quantum mechanics. It is / was designed to chat. There are high school grads who can't locate Africa on a map or do long division, yet they have a diploma, which indicates a level of intelligence according to their respective curriculum.

And no, Eliza only fooled the uninitiated people of the day into thinking that it exhibited even a remote degree of intelligence. It was actually quite limited and repetitive, as anyone could tell after a minute or two of chatting with it. Hardly intelligent.

Yes, good conversation DOES take a clear measure of intelligence. Forming a complete thought, understanding the topic and formulating a decent response are often difficult even for college grads (not counting spelling, grammar, pertinent details, etc.)!

I am merely stating MY THOUGHTS...NOT those of the masses.

I never saw Turing mention anywhere that fooling this so-called 30% of judges equals a win for AI / chatbots. If someone can find this, please be kind enough to share it with the rest of us.

Yes, carefully constructed conversational passages entered by a panel of judges would certainly help eliminate those nonsensical, unknowing answers from bots that seem intent on wowing the audience, or that use the excuse of being young, an alien, or something similar for not navigating responses in a responsible, seemingly intelligent manner.

Of course, each will vary, but since intelligence is largely subjective, the panel can certainly decide, as they do in all judging events.

*

Don Patrick

« Reply #12 on: June 15, 2014, 08:23:09 am »
The 30% mark was based on expectations that Turing expressed in his paper, but he did not say that this meant passing his test, nor did he say that the whole thing was, in fact, a test at all. He called it a game, or experiment.
Quote
I believe that in about fifty years' time it will be possible, to programme computers to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

I had a good 15-minute conversation with Eliza back in the day, despite knowing she was not intelligent. So in my experience you can have the one without the other. You have in mind that good conversation takes a lot of intelligence for a human, and that is true. So does chess. Programs, however, can take shortcuts to get to the same results. You speak of chatbots: do you know any chatbots that form thoughts and formulate their own responses, rather than recognise keywords and retrieve prewritten phrases they are instructed to say? How do you tell the one from the other?

(by the way, writing in caps has the same effect on me as poking me in the eye: Do it twice and I'll be blind)

In my opinion there are already plenty of contests for chatbots, but hardly any tests for intelligence. I say this as a hobbyist creator of an AI that recognises topics, considers all the facts and context given, compares, reasons and examines arguments, and formulates its answers word by word from the resulting thoughts, based on what knowledge it has learned. [/bragging mode off] But it can't hold a candle to an unintelligent chatbot's eloquent scriptwriting when it comes to making pleasant, flowing human conversation. That's a whole other skill altogether.

Like you, I'd want my creation to be tested on what it was designed for. And so I believe that AI tests should test intelligence, and chatbot tests should test conversation, but neither should be used in place of the other.
« Last Edit: June 15, 2014, 09:14:50 am by Don Patrick »

*

Art

« Reply #13 on: June 16, 2014, 08:34:55 am »
Well stated (written), but how do you propose to accomplish such a task?

Sorry about the CAPS...they were done for emphasis instead of italics or similar.
Hope your eye gets better! ;)

I've been experimenting with chatbots since 1980 and collecting them ever since. Most of them are sadly disappointing, with very limited abilities, or they simply parrot written scripts.

Some assigned "weighted values" to certain words (perhaps based on frequency or other methods). These words / phrases would often appear during conversation when triggered, which is similar to the pattern matching that so many bots employ today.
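The post doesn't say how those weights were computed, so the following Python sketch just picks one plausible scheme: inverse frequency, where words that are rare in some sample text get higher weights, and the highest-weighted trigger present in the input selects the reply. The sample corpus, trigger table and function names are all hypothetical.

```python
import math
from collections import Counter

def inverse_frequency_weights(corpus_words):
    """Weight each word by log(total / count): rarer words weigh more.
    One plausible weighting among many; not taken from any specific bot."""
    counts = Counter(corpus_words)
    total = sum(counts.values())
    return {word: math.log(total / n) for word, n in counts.items()}

def pick_trigger(user_words, triggers, weights):
    """Return the reply for the highest-weighted trigger word present, or None.
    Trigger words missing from the weights get 0.0 (an arbitrary choice)."""
    present = [w for w in user_words if w in triggers]
    if not present:
        return None
    best = max(present, key=lambda w: weights.get(w, 0.0))
    return triggers[best]
```

Favouring rare words means a specific term like "dog" beats a filler word like "the" when both are triggers, which is roughly the effect such weighting is meant to have.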

Somewhere within NLP, Markov models, Bayesian networks using weighted results, etc., there has to be a better solution for more believable, intelligent entities in the future.
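Of the techniques listed, a Markov chain is the easiest to show in a few lines. The Python sketch below is a generic word-bigram generator, not a description of any bot mentioned in this thread, and the training sentence is made up.

```python
import random
from collections import defaultdict

def build_bigrams(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        table[current].append(nxt)
    return table

def generate(table, start, max_words=10, seed=None):
    """Random walk over the bigram table. Choosing from a list that keeps
    repeats makes frequent successors proportionally more likely."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_words and table.get(out[-1]):
        out.append(rng.choice(table[out[-1]]))
    return out

table = build_bigrams("the bot talks and the bot listens and the judge listens")
print(" ".join(generate(table, "the", max_words=8, seed=1)))
```

Each step only looks at the previous word, so the output is locally plausible but has no idea what the conversation is about.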

Time and some dogged determination in the various communities will tell.

I still contend that it requires a degree of intelligence to produce and hold a decent conversation without having to "test" for how many apples weigh 6 pounds in 4 negative G's or some such drivel. To each their own.

*

Don Patrick

« Reply #14 on: June 16, 2014, 11:07:10 am »
To be fair, there are questions that do nothing to test either conversation or intelligence: calculate 2+2, name the 3rd letter of the alphabet, count syllables, tell the time, etc. I wonder if this means humans still perceive counting as an intelligent skill just because they find it difficult to do themselves.
Perhaps "intelligent" is synonymous with "difficult". That would explain why any computer that did math or chess or Jeopardy with ease was no longer considered intelligent. It explains the AI effect: when you build a computer that does something difficult with ease, it is no longer difficult, and if it's not difficult then surely it can't be intelligent. Fascinating.

 

