Forget the Turing Test: Here's How We Could Actually Measure AI

Tyler · « **on:** June 13, 2014, 12:00:08 pm »

12 June 2014, 12:00 am

A chatbot pretending to be a 13-year-old Ukrainian boy made waves last weekend when its programmers announced that it had passed the Turing test. But the judges of this test were apparently easily fooled, because any cursory exchange with 'Eugene Goostman' reveals the machine inside the ghost. Wired.

Source: AI in the News

To visit any links mentioned please view the original article, the link is at the top of this post.

Don Patrick · « **Reply #1 on:** June 13, 2014, 02:25:27 pm »

Strangely, I've heard academics suggest that the Turing Test should be made "more difficult" by having the computer imitate a human expert on a particular domain. I say strangely because limiting the topic to one field would make the test infinitely easier, in practice.

Quote

“Paul tried to call George on the phone, but he was not [successful/available].” You, human reader, know that if he is not successful then “he” is Paul, and if he is not available then “he” is George. To figure that out, you needed to know something about the meaning of the verb “to call.” Machine learning researcher Hector Levesque of the University of Toronto proposes that resolving such ambiguous sentences, called Winograd schema, is a behavior worthy of the name intelligence.

I like the Winograd schemas. One small problem is that a computer could guess its way through it with a 50/50 chance on each question like a student who didn't study for a multiple choice test, so the benchmark should be set very high. But a high benchmark means an unsavory amount of common knowledge would be required, which has nothing to do with intelligence. To cover this, the computer should be able to pass on questions that it can't figure out due to lack of knowledge, without penalty to its score. Apparently machines have already reached a 73% accuracy: http://www.cs.nyu.edu/davise/papers/WS.html
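To put rough numbers on the guessing problem and the "pass without penalty" idea, here is a minimal Python sketch. The function names, the 40-question test size and the 36-correct benchmark are all hypothetical, chosen only for illustration; they do not come from any actual Winograd contest.

```python
from math import comb


def prob_pass_by_guessing(n_questions: int, min_correct: int, p_guess: float = 0.5) -> float:
    """Chance that pure guessing gets at least min_correct of n_questions right.

    Assumes independent two-way questions (binomial distribution).
    """
    return sum(
        comb(n_questions, k) * p_guess**k * (1 - p_guess) ** (n_questions - k)
        for k in range(min_correct, n_questions + 1)
    )


def score_with_abstention(answers, key):
    """Accuracy over attempted questions only; None means the program passed."""
    attempted = [(a, k) for a, k in zip(answers, key) if a is not None]
    if not attempted:
        return 0.0  # nothing attempted, nothing earned
    return sum(a == k for a, k in attempted) / len(attempted)


# Hypothetical test: 40 questions, benchmark set at 36 correct.
print(prob_pass_by_guessing(40, 36))
# Hypothetical answer sheet: one question passed on, two of three attempts right.
print(score_with_abstention(["A", None, "B", "A"], ["A", "B", "B", "B"]))
```

With those made-up numbers, guessing clears the benchmark roughly once in ten million tries, which is why a high benchmark removes the multiple-choice problem. A scoring rule like this one would probably also need a minimum number of attempted questions, otherwise a program could pass on everything except the one question it is sure of.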

The other problem is that I believe that I can in fact solve most of these questions with a healthy dose of NLP and checking two facts, no reasoning skills involved, so it would probably be tossed aside as "not intelligent" and "not understanding" just as quickly. But if there is a Winograd contest anywhere, sign me up!

Freddy · « **Reply #2 on:** June 13, 2014, 05:32:38 pm »

I have to say that the 30% pass mark for the Turing Test as recently passed seems too low to me. It makes more sense to me to be 50% of the time. Passing the 30% is pretty cool, but if I am honest I am not that blown away.

I agree that if the bot were specialised I imagine it would be easier for the bot master because knowledge of whatever field will be a bit more finite than just to talk about everything under the sun. Basically it's an expert system isn't it ?

Do you think knowledge is not related to intelligence ? Seems to me like a lot of smart people also know a lot. If not intelligence then Artificial Knowledge ?

Just working with AIML as I have been lately I find the work some people have done is amazing. I scraped and inserted a 10,000 set of general knowledge questions and answers into my AIML bot. Whether they will ever get asked is another thing which brings us back to a specialised bot being a bit simpler. Maybe not much, but certainly by some degree.

Art · « **Reply #3 on:** June 13, 2014, 10:32:24 pm »

Personally, I have grown tired of these BOTS being fed "Trick" questions or even attempting to field them.

Why not simply have a bot carry on a conversation with several judges, each judge typing in several paragraphs in an attempt to converse in a friendly, natural manner?

Each judge would have different passages but each judge would repeat his / her passages with each subsequent bot (entrant).

Wrong / Correct answers would then be fielded by all judges collectively and one indifferent person for each judge present.

The Bot with the best, cohesive, topical flow ( sticking to or staying on subject), friendly and reasonably sure conversation (lacking those "party tricks" we've all become numb to - I'm from Europa so don't expect me to understand very much, bots) wins.

A Best conversationalist Bot would certainly prove a lot of things, English, grammar, semantics, topic flow, perhaps original idea gathering, recognition, humor (jokes, rhymes, puns, double entendre -[ Children make nutritious snacks ] etc.).

Since this is strictly my humble $.02 I shall end this with:
Thoughts while not required are certainly appreciated!!

squarebear · « **Reply #4 on:** June 13, 2014, 10:39:52 pm »

Quote

The history of the Loebner prize, an annual Turing test competition, confirms this trend. Last year’s contest was won by a bot named Mitsuku also pretending to be young ESL speaker, a silly Japanese girl.

If he had actually talked to Mitsuku, he would have realised that it is NOT an ESL speaker and, although it appears Japanese, is in fact from England.

Don Patrick · « **Reply #5 on:** June 13, 2014, 10:55:01 pm »

Yes, that comment on Mitsuku was way off, and potentially a racist stereotype. Also Japanese don't have blonde hair.

50% is actually the logical maximum that the computer could reach: If the computer seems as human as the real human, the judge still has to choose one of them 50/50. So that 30% actually translates to 60% similar to a human.
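The arithmetic behind that conversion, as a small Python sketch. It assumes the setup described above: a forced choice between one human and one machine, so a machine that is indistinguishable from the human would be picked as the human half the time. The function name is hypothetical.

```python
def humanlike_ratio(fooled_rate: float, ceiling: float = 0.5) -> float:
    """Rescale the share of fooled judges against the forced-choice ceiling.

    ceiling=0.5 is the rate expected when the judge cannot tell the two apart
    and has to pick one of them at random.
    """
    return fooled_rate / ceiling


# 30% of judges fooled, measured against a 50% ceiling.
print(humanlike_ratio(0.30))
```

This only holds under the forced-choice assumption; if judges rate each conversation partner separately, the ceiling is no longer 50% and the rescaling does not apply.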

Making the computer pretend to be a human expert in one particular field would essentially make it an expert system, just a more talkative one, I guess. You could just program the AI with all the available research, which is easier to dig up than everything a human knows about everything. I may not be impressed by the intelligence of chatbots, but I can well admire the mountains of effort gone into making them.
Knowledge isn't entirely unrelated to intelligence, but I like to keep them apart. Knowledge is ammunition, intelligence is the machine that uses it. Given any of these tests, it is most likely that a machine will not be able to answer because it lacks knowledge on mermaids and squirrels, even though it may well be capable of intelligently deducing the answer otherwise.

As for conversation vs intelligence, I prefer intelligence, but I equally value civilised behaviour. When intelligence is supposedly tested, it's usually an insult to intelligence (most often mine as a creator), so I end up preferring civilised conversation instead.

Freddy · « **Reply #6 on:** June 14, 2014, 12:27:01 am »

Quote

50% is actually the logical maximum that the computer could reach: If the computer seems as human as the real human, the judge still has to choose one of them 50/50. So that 30% actually translates to 60% similar to a human.

I was never very good at maths

Good points too.

Art · « **Reply #7 on:** June 14, 2014, 01:01:53 pm »

My post was not actually meant to Compare one method with the other but rather what I'd like to see done with a Bot Contest.

Chatbots are supposed to CHAT...not necessarily contain vast knowledge about everything in the universe. Although there are some people who act like they might know everything, I've yet to meet that person that actually DOES.

So, yes, IMHO, I'd like to see a conversational contest between judges and bots as mentioned above. If nothing more than to see how well others' bots handle language, spelling, grammar, usage, etc.

Watson is NOT a chatbot. Watson is an information gathering monster. If only it understood what it was!

Freddy · « **Reply #8 on:** June 14, 2014, 03:06:53 pm »

So are we thinking that conversation takes more intelligence than dispatching knowledge ?

Don Patrick · « **Reply #9 on:** June 14, 2014, 03:35:11 pm »

I get what you mean Art, but this topic was about measuring artificial intelligence, not chatbots, and I think Eliza long ago proved that a good conversation doesn't necessarily take much intelligence. Put simply: Conversation just isn't a clear measure of intelligence. It is often hard to tell whether a chatbot just responded to a single keyword or actually analysed grammar + semantics + knowledge + reasoning. However, since the former is easier to program than the latter, we may assume that any contest of conversation will mostly attract chatbots using simpler methods. And from personal experience, they vastly outdo AIs that use the more intelligent methods to arrive at responses.
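To show how little is needed for the single-keyword approach, here is a minimal Python sketch of a keyword responder in the spirit of Eliza. The rules and canned phrases are made up for illustration and are not taken from Eliza or from any particular chatbot.

```python
import re

# Hypothetical rules: a keyword pattern paired with a prewritten reply template.
RULES = [
    (re.compile(r"\b(?:mother|father)\b", re.I), "Tell me more about your family."),
    (re.compile(r"\bI feel ([^.!?]+)", re.I), "Why do you feel {0}?"),
]
DEFAULT = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose keyword appears in the input."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Echo back any captured words; no grammar or meaning is analysed.
            return template.format(*match.groups())
    return DEFAULT


print(respond("I feel tired."))
print(respond("My mother called."))
print(respond("What is 2+2?"))
```

The replies can read as attentive even though the program does nothing but pattern matching, which is the difficulty in telling the two kinds of program apart from conversation alone.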

As far as I know, that's already what chatbot competitions are for.

squarebear · « **Reply #10 on:** June 14, 2014, 05:40:29 pm »

Quote from: Art on June 14, 2014, 01:01:53 pm

...I'd like to see a conversational contest between judges and bots as mentioned above. If nothing more than to see how well others' bots handle language, spelling, grammar, usage, etc....

I tried this when I ran the Chatbot Battles contest a couple of years ago. It was a mixture of rounds, some conversational and some Q&A style. www.chatbotbattles.com

Art · « **Reply #11 on:** June 15, 2014, 04:00:01 am »

As taken from the very article:

Forget the Turing Test: Here’s How We Could Actually Measure AI
...<clip>...
If not the Turing test, is there an alternative measure of intelligence that would bring out the best in our machines? Experts have suggested an array of challenging tasks in the very human domains of language, perception, and interpretation. Perhaps a computer passing one of these tests would seem not just like a person, but like an intelligent person....<clip>....

####################

A chatbot is supposed to Chat, Not necessarily answer questions about weather forecasts, or higher mathematics or quantum mechanics. It is / was designed to chat. There are high school grads that can't locate Africa on a map or do long division, yet they have a diploma which indicates a level of intelligence according to their respective curriculum.

And No, Eliza only fooled the uninitiated people of the day into remotely thinking that it exhibited intelligence of any degree. It actually was quite limited and repetitive as anyone could tell after a minute or two of chatting with it. Hardly intelligent.

Yes, good conversation DOES take a clear measure of intelligence. The ability to form a complete thought, understanding the topic and formulating a decent response are often difficult even for college grads! (not counting spelling, grammar or pertinent details, etc.).

I am merely stating MY THOUGHTS...NOT those of the masses.

I never saw anywhere where Turing even mentioned this so-called 30% of judges that are fooled equals a win for AI / chatbots. If someone can find this, please be kind enough to share it with the rest of us.

Yes, carefully constructed conversational passages entered by a panel of judges would certainly help eliminate those nonsensical, unknowing answers by some bots that seem intent on wowing the audience or using excuses such as being young, an alien, or some other reasonable excuse for not navigating responses in a responsible, seemingly intelligent manner.

Of course each will vary but since intelligence is largely subjective, the panel can certainly decide as they do in all judging events.

Don Patrick · « **Reply #12 on:** June 15, 2014, 08:23:09 am »

The 30% mark was based on expectations that Turing expressed in his paper, but he did not say that this meant passing his test, nor did he say that the whole thing was, in fact, a test at all. He called it a game, or experiment.

Quote

I believe that in about fifty years' time it will be possible, to programme computers to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

I've had a good 15 minute conversation with Eliza back in the day, despite knowing she was not intelligent. So in my experience you can have the one without the other. You have in mind that good conversation takes a lot of intelligence for a human, and that is true. So does chess. Programs can however take shortcuts to get to the same results. You speak of chatbots; Do you know any chatbots that form thoughts and formulate their own responses, rather than recognise keywords and retrieve prewritten phrases they are instructed to say? How do you tell the one from the other?

(by the way, writing in caps has the same effect on me as poking me in the eye: Do it twice and I'll be blind)

In my opinion there are already plenty of contests for chatbots, but hardly any tests for intelligence. I say this as a hobbyist creator of an AI that recognises topics, considers all facts and context said, compares, reasons and examines arguments, and formulates its answers word by word from the resulting thoughts, based on what knowledge it has learned.[/bragging mode off] But it can't hold a candle to an unintelligent chatbot's eloquent scriptwritings when it comes to making pleasant flowing human conversation. That's a whole other skill altogether.

Like you, I'd want my creation to be tested on what it was designed for. And so I believe that AI tests should test intelligence, and chatbot tests should test conversation, but neither one should be used for the other.

Art · « **Reply #13 on:** June 16, 2014, 08:34:55 am »

Well stated (written), but how do you propose to accomplish such a task?

Sorry about the CAPS...they were done for emphasis instead of italics or similar.
Hope your eye gets better!

I've been experimenting with chatbots since 1980 and collecting them ever since.
Most of them are sadly disappointing, with very limited abilities, or simply parrot written scripts.

Some assigned "weighted values" to certain words (perhaps based on frequency or other methods).
These words / phrases would often appear during conversation if triggered which is similar to
pattern matching that so many today employ.

Within the NLP, Markov, Bayesian Networks using weighted results, etc. there has to be a better solution
for more believable, intelligent entities in the future.

Time and some dogged determination in the various communities will tell.

I still contend that it requires a degree of intelligence to produce and hold a decent conversation without having to "test" for how many apples weigh 6 pounds in 4 negative G's or some such drivel. To each their own.

Don Patrick · « **Reply #14 on:** June 16, 2014, 11:07:10 am »

To be fair, there are questions that make no sense towards testing either conversation or intelligence. Calculate 2+2, name the 3rd letter of the alphabet, count syllables, tell time, etc. I wonder if this means humans still perceive counting as an intelligent skill just because they find it difficult to do themselves.
Perhaps "intelligent" is synonymous with "difficult". That would explain why any computer that did math or chess or Jeopardy with ease was no longer considered intelligent. It explains the AI effect: When you build a computer that does something difficult with ease, it is no longer difficult, and if it's not difficult then surely it can't be intelligent. Fascinating.
