AI cheats in Turing Test! What does it mean? Are robots becoming more human?

# Tech Desk
Representational image. | Photo: Canva

Artificial intelligence has taken a significant leap forward: GPT-4.5, developed by OpenAI, has passed a variation of the Turing Test, and was judged to be human more often than the real people it was compared against. According to a preprint study by researchers at the University of California, San Diego, the AI mimicked human behaviour so effectively that participants picked it as the real person 73% of the time.

The Turing Test, developed in 1950 by British codebreaker and computer scientist Alan Turing, evaluates whether a machine can exhibit intelligent behaviour equivalent to or indistinguishable from that of a human. In this updated three-party version of the test, participants interacted in simultaneous five-minute online chats with both a human and an AI, and were then asked to identify the machine.

What is the Turing Test?

As noted above, the Turing Test is a concept in artificial intelligence proposed by British mathematician and computer scientist Alan Turing in 1950. It is designed to assess whether a machine can exhibit human-like intelligence.

In the classic setup:

  • A human judge has conversations (usually text-based) with both a human and a machine, without knowing which is which.
  • If the judge cannot reliably tell which is the machine, then the machine is said to have passed the Turing Test.

It doesn’t test intelligence in the traditional IQ sense, but rather the machine’s ability to mimic human conversational behaviour convincingly.
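The pass criterion above can be made concrete with a small sketch. It assumes a three-party setup in which each round records which witness the judge picked as the human, so the machine's "win rate" is simply the share of rounds in which the judge picked the AI. The function name, the verdict encoding and the sample verdicts are all hypothetical, chosen for illustration, and are not data from the study.

```python
def ai_win_rate(verdicts):
    """Fraction of rounds in which the judge picked the AI as the human.

    `verdicts` holds one string per round: "ai" if the judge decided the
    AI witness was the human, "human" if the judge picked the real person.
    (Hypothetical encoding, for illustration only.)
    """
    if not verdicts:
        raise ValueError("need at least one round")
    return sum(1 for v in verdicts if v == "ai") / len(verdicts)


# Made-up verdicts from ten rounds, NOT data from the study.
sample = ["ai", "human", "ai", "ai", "human", "ai", "human", "ai", "ai", "human"]
rate = ai_win_rate(sample)
print(f"AI judged human in {rate:.0%} of rounds")
```

A win rate near 50% would mean judges are effectively guessing, which is the sense in which a machine is said to pass.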

GPT-4.5 and LLaMa deceive the human eye

The study tested four language models—GPT-4o, GPT-4.5, Meta’s LLaMa-3.1, and the historic ELIZA chatbot. Two separate participant groups—126 UC San Diego undergraduates and 158 individuals from the online platform Prolific—were involved.

Participants were unaware of which interlocutor was human. When the AIs were prompted to adopt humanlike personas, GPT-4.5 was mistaken for a human 73% of the time, significantly surpassing the success rate of the actual human participants. Meta’s LLaMa-3.1 was judged human 56% of the time, slightly above the 50% chance level.

Conversely, without the persona prompt, GPT-4.5’s performance dropped to 36%. GPT-4o and ELIZA were far less effective, achieving just 21% and 23% respectively.
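Whether a win rate counts as "above chance" depends on how many rounds sit behind the percentage. The sketch below runs a one-sided exact binomial test against pure guessing (50%). This article does not give the number of rounds per model, so the figure of 100 rounds is an assumption made only to keep the arithmetic readable, and `p_at_least` is a hypothetical helper name rather than anything taken from the study.

```python
from math import comb


def p_at_least(successes, rounds, chance=0.5):
    """One-sided exact binomial p-value: P(X >= successes) under pure guessing."""
    return sum(
        comb(rounds, k) * chance**k * (1 - chance) ** (rounds - k)
        for k in range(successes, rounds + 1)
    )


# ASSUMPTION: 100 rounds per model, chosen only for illustration.
ROUNDS = 100
for label, wins in [("73% win rate", 73), ("56% win rate", 56)]:
    p = p_at_least(wins, ROUNDS)
    print(f"{label}: p = {p:.4g}")
```

Under these assumed counts, 73 wins out of 100 would be very unlikely if judges were guessing, while 56 out of 100 would not clear the conventional 0.05 threshold. With a different number of rounds, the same percentages could lead to different conclusions.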

Lead researcher Cameron Jones commented, “People were no better than chance at distinguishing humans from GPT-4.5 and LLaMa,” calling the findings a potential milestone in AI-human mimicry.

Experts call it emotional mimicry, not intelligence

The implications of the findings go beyond academic circles. John Nosta, founder of Nosta Lab, described the outcome as “a triumph of artificial empathy” rather than a genuine test of machine intelligence. According to Nosta, participants based their judgments on emotional resonance, tone, and familiarity, rather than logical indicators of human behaviour.

“This wasn’t a Turing Test,” he said. “It was a social chemistry test—Match.GPT.”

Broader implications and policy concerns

Carsten Jung of the Institute for Public Policy Research warned that AI is now convincingly crossing the “uncanny valley,” making it difficult for people to tell machines apart from humans. He emphasised the need for stronger policies to regulate AI’s growing societal role, particularly as such technology increasingly features in companionship, therapy, and online interactions.

Researcher Cameron Jones echoed this, noting the potential for misuse: “These results provide more evidence that LLMs could substitute for people in short interactions without anyone being able to tell. This could lead to job automation, sophisticated social engineering, and societal disruption.”

A turning point for AI regulation?

The study, which is still a preprint, is presented as the first known instance in which an AI has passed a standard three-party Turing Test, and it highlights the urgent need for global frameworks to govern how such powerful tools are used. As AI models become more adept at emotional and linguistic mimicry, questions around ethics, consent, and transparency are becoming ever more pressing.

With AI’s conversational fluency now on par with—or better than—humans, the challenge facing society is no longer whether machines can think, but whether we can still tell the difference.