Understanding the Turing Test
Cover image generated by Bing Image Creator

The Turing Test: An Overview

The Turing test has long acted as the ultimate assessor of whether artificial intelligence (AI) systems can display human intelligence as it is commonly defined (or, as our discussion will show, left undefined). To investigate why it became a widespread standard for evaluating AI and to understand how an AI system passes it, I studied the origin of the test: Computing Machinery and Intelligence, written by Alan Turing in 1950.

Turing starts by introducing the question “Can machines think?”. To him, however, this question cannot be answered without accurate definitions of the words machine and think, so he proposes to replace it with a more verifiable one. He defines the Imitation Game, in which person A’s goal is to act like person B, thereby trying to deceive person C, the interrogator, while B’s goal is to help C tell the two apart correctly. A and B are known to C only as X and Y, in an unknown order. C puts questions to X and Y and tries to determine which one is which. The new question then becomes “What will happen when a machine takes the part of A in this game?”. In the version known to us as the Turing test, person A is replaced by a machine, while B and C remain human. The interrogator’s new goal is to find out which of X and Y is the machine. A machine that is difficult to distinguish from the human player is deemed “intelligent”. Turing, seemingly on gut feeling, predicts that there will be machines able to fool the interrogator at least 30% of the time after 5 minutes of interrogation.
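The 30% figure can be read as a simple pass criterion over repeated sessions. The following is a minimal sketch of that reading, not a procedure taken from Turing's paper: the function names, the list of verdicts, and the idea of averaging over ten sessions are all assumptions made for illustration.

```python
def fooled_rate(verdicts):
    """Fraction of sessions in which the interrogator misidentified the machine.

    `verdicts` is a list of booleans (a hypothetical record of sessions):
    True means the interrogator correctly picked out the machine,
    False means the machine was taken for the human.
    """
    if not verdicts:
        raise ValueError("need at least one session")
    return sum(1 for correct in verdicts if not correct) / len(verdicts)


def meets_benchmark(verdicts, threshold=0.30):
    """True if the machine fooled the interrogator in at least `threshold` of sessions."""
    return fooled_rate(verdicts) >= threshold


# Ten hypothetical five-minute sessions; the machine is misidentified in four.
sessions = [True, True, False, True, False, True, False, True, True, False]
print(fooled_rate(sessions))      # 0.4
print(meets_benchmark(sessions))  # True
```

Note that the criterion only counts outcomes. It says nothing about why the interrogator was fooled, which is the point the later sections return to.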

In the original paper, Turing goes on to explain that this test is feasibly applicable only to digital computers rather than to other types of machines, e.g. a mechanical contraption. He justifies this by broadly explaining the theory behind digital computers. Put briefly, digital computers are a class of discrete state machines, and in theory one can build a computer that can “mimic any discrete state machine” for any given task, provided its memory is sufficiently large. This gives digital computers the property of universality. After revealing his own opinion about the thought experiment, he also attempts to refute likely objections from various perspectives. Turing believed that “intelligent” machines would be developed by the end of the century, and that it was only a matter of a substantial increase in memory capacity. The rise of such technology would render the original “Can machines think?” question meaningless.
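The universality idea can be made concrete with a small sketch: one general-purpose routine that simulates any discrete state machine, as long as the machine is described by lookup tables. The example machine below is loosely modelled on a three-position wheel with a lamp. The state names, input names, and the `run_machine` helper are illustrative assumptions, not notation from the paper.

```python
def run_machine(transitions, outputs, start, inputs):
    """Simulate a discrete state machine described purely by lookup tables.

    transitions: dict mapping (state, input) -> next state
    outputs:     dict mapping state -> output signal
    Returns the list of output signals observed after each input.
    """
    state = start
    emitted = []
    for signal in inputs:
        state = transitions[(state, signal)]
        emitted.append(outputs[state])
    return emitted


# Hypothetical three-position wheel: "step" advances it, "hold" keeps it in place.
wheel_transitions = {
    ("q1", "step"): "q2", ("q2", "step"): "q3", ("q3", "step"): "q1",
    ("q1", "hold"): "q1", ("q2", "hold"): "q2", ("q3", "hold"): "q3",
}
# A lamp is lit only in position q3.
wheel_outputs = {"q1": "off", "q2": "off", "q3": "on"}

print(run_machine(wheel_transitions, wheel_outputs, "q1",
                  ["step", "step", "hold", "step"]))
# ['off', 'on', 'on', 'off']
```

The same `run_machine` function would simulate a different machine if handed different tables, which is the sense in which a single digital computer with enough memory can stand in for many special-purpose machines.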

This thought experiment was so influential that researchers started to actually carry it out with AI systems developed solely for one purpose: to think. Other variations of the test were also designed, in which roles are reversed, question forms differ, perception is also considered, and so on. One can claim that the Turing test helped us discuss, define, and refine our understanding of intelligence, thinking, and being human.

The Test Is Not Impossible

The Turing test does not quantitatively measure intelligence. Rather, it is an attempt to infer the existence of intelligence through the examination of behaviour. The interrogator observes the players’ behaviour, compares the two, and tries to decide which one “feels” more human. This procedure is a workaround for the fact that the inner workings of intelligence are unknown to us. We do not know well enough how cognitive processes occur. Once we accept the assumption that humans possess intelligence, we can only study its outputs. So, to decide whether something else is intelligent or not, we compare its behaviour to that of an average human and draw conclusions. By doing so, we hope to understand intelligence better by reverse engineering ourselves and building similar systems.

This nature of the test is also addressed by Turing. At the end of the paper, he discusses what he calls learning machines. The concept was not a complete novelty back then, but it seemed more and more feasible to implement. He proposes a system that can simulate the behaviour of a child’s brain. Because a child’s brain becomes an adult’s brain as learning occurs, he expects the system to be able to imitate an adult human as well. The system need not copy a human brain; the only requirement is that it can produce output as if it were human. He also adds that the programmer does not need to know all of the inner workings of such a system. Turing’s proposal can be considered a foundation of modern machine learning, which is now an integral part of technology.

Over the years, there have been many instances of AI systems that have passed, or claimed to have passed, some version of the Turing test. ELIZA (1966), PARRY (1972), and Eugene Goostman (2008) are some notable AI systems that engaged in human-like conversations. Yet these tests are criticised for not evaluating intelligence accurately. Some machines evaded questions, but did so in a human-like way, while others excelled only in specific contexts. Even the rule that the judges have only 5 minutes for a conversation has been criticised. However, the main reason for the criticism is our inability to determine what human intelligence is. There are many different aspects of intelligence, and we do not know exactly how even a single one operates.

In addition, there are more recent AI systems that appear to be even more human-like. Google’s LaMDA convinced at least one Google engineer that it was sentient. OpenAI’s GPT models are extremely good at conversations. Beyond conversation, Midjourney creates artwork so convincing that it can be nearly impossible to determine whether the artist is a human or not. Another contributor to this situation is that we tend to anthropomorphise non-human entities easily if their behaviour is explained or perceived in a human context. State-of-the-art AI technology suggests that the Turing test might have become obsolete, as the interrogator’s judgement resembles a wild guess more and more. Nonetheless, the question “Can machines think?” is still valid and a subject of active debate, which proves Turing’s prediction about the original question false. The cause seems to be that the words “machine” and “think” are still ambiguous.

Discussion

The fact that the Turing test takes the form of a conversation leads us to think about the complexity of language. There are two main aspects of language: syntax and semantics. Unfortunately for researchers, the two are not independent of each other, so a machine that can use language cannot be achieved by implementing them in separate systems. The task, then, relies on analysing and understanding a language well enough to define it systematically and to communicate that system to an AI. This endeavour has taken around a century of research, possibly more, and there is still room to refine the details and to generalise everything to languages other than English.

Context is also integral to how a human perceives the world and engages in conversations. It consists of many elements, such as location, time, and social variables. Our brains are hard-wired to understand context and to act upon that understanding. This mostly happens without our awareness, and studying something that “just happens” has always been difficult. Consequently, it is also difficult to teach contextual information to AI. Most AI systems struggled to behave according to the context in which the conversations occurred, which was what gave their identities away to judges.

Attention has been another bottleneck for AI systems’ success in conversations: unlike an average human, they tended to forget earlier parts of the conversation or to disregard details. This caused AI systems to perform badly in Turing tests, since their output seemed to stray from the current conversation as it grew longer. The main difficulty was finding a digital or mathematical representation of attention. The recent advancements in Large Language Models might have solved this problem, but the time it took to get to this level is a sign of how challenging it was to understand the attention mechanism.
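One well-known mathematical representation used in transformer-based language models is scaled dot-product attention, in which each earlier item in a conversation receives a weight according to how closely it matches the current query. The sketch below is a minimal pure-Python illustration of that formulation. The function names and the toy vectors are assumptions, and this is not a description of how any particular model is implemented.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each key is scored by its dot product with the query, scaled by
    sqrt(dimension); the output is the weighted average of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context


# Toy example: three earlier "memories"; the first matches the query best.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]

weights, context = attend(query, keys, values)
print(weights)  # largest weight goes to the first key
print(context)
```

Because the weights are computed over everything in the input, nothing is discarded outright; less relevant items simply contribute less to the result.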

Many cognitive processes occur simultaneously during a conversation, and together they are what makes intelligence seem human. To make machines behave like humans, we first need to understand these processes so that analogous digital implementations can be developed. We seem to have developed some understanding of these cognitive processes, but it may not be enough to decide whether machines can think.


It has been difficult for artificial intelligence systems to pass the Turing test. To understand why, we investigate what the Turing test actually is, make some judgements about the AI systems that have passed the test (or some version of it), and discuss reasons pertaining to the nature of the test and what we call intelligence.