When asked whether it was possible to build a thinking, intelligent machine, the computer science pioneer Alan Turing gave a rather pragmatic answer: if a machine is able to fool a human into thinking it is human, then it does not really matter whether it is genuinely intelligent. Since then, this criterion – that a machine can pass as human in a dialogue with a person – has been referred to as the “Turing test”.
Turing himself describes the “imitation game” in his article as follows: a human questioner converses, via keyboard and screen only, with both a computer and a human. If the questioner cannot identify the machine in a significant proportion of the conversations, the machine has passed the test. The proposal, however, is quite vague – neither the length of the dialogue is specified, nor are there any requirements on the tester or on the questions the tester is allowed to ask. Moreover, the test rests on a single, specifically human skill: the correct use of language. Other capabilities such as visual perception or motion control, which also require higher cognitive abilities, are not taken into account.
Bad for the AI?
In a much-discussed 1995 article, the computer scientists Patrick Hayes and Kenneth Ford even argued that it was harmful for AI research to follow Turing’s proposal. “It is one of the basics that students learn from the start that you cannot design an experiment whose only possible outcome is a null result,” Hayes and Ford write in their paper. “But that is exactly what the Turing test does. You can never say for certain whether there really is no difference between human and machine, or whether the tester simply wasn’t clever enough.” Although the Loebner Prize offered $100,000 to the developers of a chatbot that passed the Turing test, hardly any academic groups took part in the competition. It was last held in 2019 – it is unclear whether it will be continued in the future.
In a Turing test conducted by the British Royal Society in 2014, however, a chatbot actually managed to convince the testers of its humanity. The program, though, relied on various tricks – among other things, it claimed to be a teenager who spoke only imperfect English.
At a workshop, the Association for the Advancement of Artificial Intelligence promptly took up the further development of the Turing test, which was to be expanded into a whole range of different tasks. Gary Marcus of New York University, for example, suggested that an AI connected to a robot should assemble Ikea furniture. This requires not only the ability to understand images and text but also physical dexterity, persistence – and a tolerance for frustration.
Ambiguities remain difficult
Due to practical difficulties, however, this proposal has not yet been implemented. In 2014, the computer scientist Hector Levesque proposed a test at which even powerful language models initially failed: the Winograd Schema Challenge. Each item consists of sentences that describe simple facts but contain an ambiguous reference. In the question that follows, the machine has to show that it grasps the relationships between subjects and objects – for example, that a hat fits in a suitcase, but a suitcase does not fit in a hat. This barrier, however, has since fallen as well: there are now language models that crack this test too.
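The structure of such a schema can be sketched in a few lines of Python. This is a minimal illustration using the classic trophy/suitcase example from Levesque’s work; the field names and scoring function are illustrative assumptions, not taken from any official dataset format.

```python
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    sentence: str          # contains an ambiguous pronoun
    pronoun: str           # the ambiguous word
    candidates: tuple      # the two possible referents
    answer: str            # the correct referent

schema = WinogradSchema(
    sentence="The trophy doesn't fit in the suitcase because it is too big.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answer="the trophy",
)

# Changing a single word ("big" -> "small") flips the correct answer,
# which is what makes the schema hard to solve by surface statistics.
variant = WinogradSchema(
    sentence="The trophy doesn't fit in the suitcase because it is too small.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answer="the suitcase",
)

def score(resolver, schemas):
    """Fraction of schemas for which a resolver picks the correct referent."""
    return sum(resolver(s) == s.answer for s in schemas) / len(schemas)

# A naive baseline that always picks the first candidate gets exactly
# one of the two variants right:
print(score(lambda s: s.candidates[0], [schema, variant]))  # 0.5
```

Because each schema comes with such a minimally different twin, a system cannot pass by memorizing word co-occurrences; it has to resolve what the pronoun actually refers to.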
An international working group led by Google therefore proposed a benchmark of 204 tasks that an AI has to work through. Other researchers have even suggested that AI and humans evaluate each other, or that an AI solve tasks that animals can perform. Yet even these extended tests cannot resolve a fundamental problem: even if the AI performs the tasks perfectly, one cannot tell whether it is behaving intelligently or merely imitating intelligent behavior perfectly. But on this point Turing was probably right: it really does not make any difference.