Generative artificial intelligence (GenAI) fails quickly in situations that require accurate world modelling rather than just predictive capability, MIT and Harvard researchers have found, in a study with significant implications for AI models used in real-world situations such as AI-powered wayfinding.
Predictions accurate
The researchers focused on the “transformer”, the type of generative AI model behind GPT-4. Transformers trained on vast quantities of language data are known as large language models (LLMs); that training eventually enables them to predict the next outcome in a sequence.
In the tests, two classes of transformer were set to work on game playing, logic puzzles and navigation, such as solving a seating-plan problem for a performance venue. One class was trained on data generated from randomly produced sequences, the other on data generated by following strategies. Both could predict valid moves in games like Connect 4 and Othello, and supply step-by-step directions for navigating New York City, the scientists found.
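The article doesn’t detail the data pipelines, but the two training regimes are easy to picture. The sketch below is illustrative only, not the study’s code: it generates Connect 4 move sequences both ways, with one sampler picking uniformly among legal columns and the other following a simple strategy. The helper names and the centre-biased “strategy” are assumptions made for the example.

```python
import random

ROWS, COLS = 6, 7

def legal_moves(board):
    """Columns that are not yet full."""
    return [c for c in range(COLS) if board[0][c] == "."]

def drop(board, col, piece):
    """Drop a piece into a column; returns a new board."""
    board = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if board[r][col] == ".":
            board[r][col] = piece
            break
    return board

def random_game():
    """Regime 1: a move sequence sampled uniformly from legal moves."""
    board, seq = [["."] * COLS for _ in range(ROWS)], []
    for turn in range(ROWS * COLS):
        moves = legal_moves(board)
        if not moves:
            break
        col = random.choice(moves)
        seq.append(col)
        board = drop(board, col, "XO"[turn % 2])
    return seq

def strategic_game():
    """Regime 2: moves chosen by following a (toy) strategy --
    here, always play as close to the centre column as possible."""
    board, seq = [["."] * COLS for _ in range(ROWS)], []
    for turn in range(ROWS * COLS):
        moves = legal_moves(board)
        if not moves:
            break
        col = min(moves, key=lambda c: abs(c - COLS // 2))
        seq.append(col)
        board = drop(board, col, "XO"[turn % 2])
    return seq

# Each finished game becomes one training sequence of move tokens
# for next-token prediction.
print("random:   ", random_game()[:10])
print("strategic:", strategic_game()[:10])
```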
World models can’t cope with minor change
But in “sequence distinction” and “sequence compression” tests, which reveal whether the transformers had actually formed an accurate world model, their performance dropped notably. Only the transformer trained on strategy data generated a coherent world model for Othello moves. And, in worrying news for the mobility sector, neither class of transformer produced an accurate street map of New York. What’s more, both transformers’ navigation performance fell at minor hurdles.
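The article doesn’t define the two tests. Loosely, following the researchers’ framing: sequence compression asks whether two move histories that reach the same true game state elicit the same continuations from the model, while sequence distinction asks whether histories reaching different states elicit different ones. A simplified, one-step sketch in Python; the ToyModel interface, the probability threshold and all helper names are hypothetical:

```python
class ToyModel:
    """Stand-in for a trained transformer: exposes a vocabulary and a
    next-token probability. (Hypothetical interface, for illustration.)"""
    vocab = {"a", "b"}
    def prob(self, prefix, token):
        # Toy behaviour: after an even-length prefix anything goes,
        # after an odd-length prefix only "a" is rated plausible.
        if len(prefix) % 2 == 0:
            return 0.5
        return 0.9 if token == "a" else 0.0

def accepted_next_tokens(model, prefix, threshold=0.01):
    """Next tokens the model rates as valid continuations of `prefix`."""
    return {t for t in model.vocab if model.prob(prefix, t) > threshold}

def violates_compression(model, prefix_a, prefix_b):
    """Sequence compression: two prefixes known to reach the SAME true
    state should accept the same continuations."""
    return accepted_next_tokens(model, prefix_a) != accepted_next_tokens(model, prefix_b)

def violates_distinction(model, prefix_a, prefix_b):
    """Sequence distinction: two prefixes known to reach DIFFERENT true
    states should accept different continuations."""
    return accepted_next_tokens(model, prefix_a) == accepted_next_tokens(model, prefix_b)

m = ToyModel()
# If "ab" and "ba" reach the same true state, compression holds here:
print(violates_compression(m, "ab", "ba"))  # False
# If they reach different true states, the model fails to tell them apart:
print(violates_distinction(m, "ab", "ba"))  # True
```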
For example, when the researchers introduced roadblocks and detours into the New York navigation exercise, “its performance plummeted,” MIT News reported, even when the changes were minuscule. “I was surprised by how quickly the performance deteriorated as soon as we added a detour. If we close just 1 percent of the possible streets, accuracy immediately plummets from nearly 100 percent to just 67 percent,” lead author Keyon Vafa said.
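The detour evaluation can be pictured as closing a random sliver of edges in the true street graph, then checking whether each model-proposed route still uses only open streets. A toy sketch, with a hypothetical four-intersection map standing in for New York and hard-coded routes standing in for model output:

```python
import random

def close_streets(graph, fraction=0.01, seed=0):
    """Return a copy of the street graph with `fraction` of edges closed."""
    rng = random.Random(seed)
    edges = [(u, v) for u, nbrs in graph.items() for v in nbrs]
    closed = set(rng.sample(edges, max(1, int(len(edges) * fraction))))
    return {u: [v for v in nbrs if (u, v) not in closed] for u, nbrs in graph.items()}

def route_is_valid(graph, route):
    """A route survives the detour only if every hop is a street that is
    both real and still open."""
    return all(v in graph.get(u, ()) for u, v in zip(route, route[1:]))

def accuracy(graph, routes):
    """Share of proposed routes that remain drivable."""
    return sum(route_is_valid(graph, r) for r in routes) / len(routes)

# Toy map with three streets; close one and score two proposed routes.
city = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
detoured = close_streets(city, fraction=0.34)
print(accuracy(detoured, [["A", "B", "C"], ["B", "C", "D"]]))
```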
AI New York map covered in nonexistent roads
Looking into the anomaly, the team, made up of academics from MIT, Harvard and Cornell, discovered the AI had created an internal map of New York covered in a network of nonexistent roads and flyovers.
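Reconstructing that internal map amounts to bookkeeping: collect every hop the model asserts in its directions into an implied edge set, then compare it against the real one. A minimal, hypothetical sketch along those lines, reusing the toy map idea from above:

```python
def implied_map(model_routes):
    """Collect every (intersection -> next intersection) hop the model
    emits across its directions: the street map it implicitly believes in."""
    edges = set()
    for route in model_routes:
        edges.update(zip(route, route[1:]))
    return edges

def hallucinated_streets(implied, true_graph):
    """Edges the model drives along that don't exist in the real city."""
    real = {(u, v) for u, nbrs in true_graph.items() for v in nbrs}
    return implied - real

# A route through a made-up shortcut shows up as a hallucinated street:
city = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(hallucinated_streets(implied_map([["A", "C", "D"]]), city))
# -> {('A', 'C')}: a street that exists only in the model's head
```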
Speaking to the study’s implications, senior author Ashesh Rambachan of the MIT Laboratory for Information and Decision Systems (LIDS) said: “The question of whether LLMs are learning coherent world models is very important if we want to use these techniques to make new discoveries.”
The researchers’ next step is to set the transformers problems whose rules are only partially known.