Rohit Prasad, head scientist for Alexa, recently published an article questioning whether the Turing Test is still a relevant benchmark for evaluating our progress in artificial intelligence. As most of us know by now, the Turing Test challenges a human judge to distinguish between a human and an AI by conversing with both via text. In designing the test, Turing foresaw that intelligent behavior could be programmed: in his day, chess was often held up as an example of a pursuit clearly requiring intelligence. But as computational power increased, simple tree search proved sufficient to play at a grandmaster level.
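The tree-search point is easy to see on a much smaller game than chess. The sketch below plays simplified Nim (take 1–3 stones; whoever takes the last stone wins) — a toy game chosen purely to keep the example tiny — but the exhaustive search is the same basic idea: try every move, assume the opponent replies perfectly, and propagate wins and losses back up the tree. No understanding of the game is involved anywhere.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_outcome(stones):
    """True if the player to move can force a win in simple Nim
    (remove 1-3 stones per turn; taking the last stone wins)."""
    if stones == 0:
        # The previous player took the last stone, so the player
        # to move has already lost.
        return False
    # Try every legal move; if any leaves the opponent in a losing
    # position, this position is a win.
    return any(not best_outcome(stones - take)
               for take in (1, 2, 3) if take <= stones)

# Brute force rediscovers the classic result: positions that are a
# multiple of 4 are losses for the player to move.
print([n for n in range(1, 13) if not best_outcome(n)])  # [4, 8, 12]
```

The search never "knows" the multiples-of-4 pattern; it simply enumerates outcomes — which is exactly why chess-playing strength turned out not to be the proof of intelligence Turing's contemporaries expected.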
What humans viewed as a pinnacle achievement proved quite tractable to simple computation. Rohit proposes, among other tests, rating a conversational bot for ‘naturalness and coherence’ and whether we’d like to converse with it again, even knowing it’s clearly artificial. Yet GPT-3 suggests that coherent, natural language generation is already within our grasp. A pre-scripted chatbot steering the user towards amusing anecdotes may create an enjoyable experience without itself having any innate intelligence.
In many ways, what Turing saw was our simultaneous limitation and ingenuity. When pushed to define intelligence and design a test that clearly distinguishes it, we have not historically done well at finding the crux of our conscious existence. On the other hand, we have proven ourselves immensely capable of producing software agents to solve the problems we set out. A clearly defined test is a requirements document, and humans quickly set about building a solution to exactly that test. So Turing avoided directly defining the facets of intelligence and said, “Look, just prod at that thing. Ask it for sonnets, ask it for opinions, for common sense facts, to prove a simple theorem.” For any given task, we can build a façade of competence. But if we poke at something faking intelligence long enough, we’ll find the seams. And if we can’t find those seams, who are we to call it unintelligent?
Rohit notes that the computer can find the square root of massive numbers instantly, or control the lights in a smart home without making small talk about the weather. But is the Turing Test then broken? Or is it, perhaps, that sending an electrical signal to a light socket isn’t actually true intelligence? It’s exciting when, in the course of my work, I can dig into some really cutting-edge piece of technology. But more often I find myself proposing that simple software or human approaches will solve a given problem more effectively. It’s okay if we solve problems with the right tools. We don’t need to call every tool we use AI.
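The square-root observation is trivially demonstrable: the exact integer square root of an enormous number is a single library call, effectively instantaneous, and plainly not a feat of intelligence. A minimal sketch (the specific number is arbitrary):

```python
import math

# An arbitrary ~96-digit number: raw size is no obstacle to computation.
n = 12345678901234567890 ** 5

# math.isqrt returns the floor of the true square root, exactly,
# essentially instantly even at this scale.
root = math.isqrt(n)

# Sanity check: root is the largest integer whose square fits under n.
assert root * root <= n < (root + 1) * (root + 1)
```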
To be fair, passing the Turing Test isn’t an active pursuit for most AI researchers, and much simpler, more directed benchmarks like SQuAD and SuperGLUE delineate the limitations of our current approaches and drive innovation. As a practical matter, progress comes in small steps: finding an area where current approaches fail, and then finding the building blocks to address that limitation.
But our conversational agents aren’t failing the Turing Test because they’re simply too smart, too clearly inhuman in their competence. They’re failing because we can produce some aspects of intelligence, but not all of them. My suspicion is that should the day come when the major barrier to passing the Turing Test is the AI dumbing itself down to human levels, it’ll accomplish that just fine. If prompted to act like my 5-year-old son, I can avoid injecting eight-syllable words into the conversation.
For a long period, ending about five years ago, researchers and businesses mostly avoided the term AI. We had been burned by the clear gap between human intelligence and what a computer could achieve. With Deep Learning unlocking new tasks, we took that mantle back up. I have no issue calling what our technology can achieve a form of intelligence. But as an industry, I think we do ourselves and our customers a disservice by trying to pare back the definition of intelligence to what we can accomplish. We can celebrate our technological achievements while acknowledging the work remaining. We can solve important problems in the world with the right tools, and not pretend there’s a silicon brain behind each solution.