Researchers from Arizona State University conclude that the reasoning abilities of large language models are a “brittle mirage.” According to an article by Kyle Orland in Ars Technica, these models struggle significantly with problems that deviate from their training data. In a controlled experiment, the researchers found that model performance collapsed when tasks were presented in unfamiliar formats or combinations. The study suggests that models using a “chain of thought” process are not truly reasoning; instead, they are performing a sophisticated form of pattern matching. The authors warn that this can create a “false aura of dependability,” and caution against relying on these models in high-stakes fields like medicine and law without thorough vetting.
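
To illustrate the kind of distribution-shift probe the experiment describes (this is a minimal sketch, not the authors' actual benchmark), the snippet below generates simple letter-shift tasks, holds out a composed two-step variant, and compares exact-match accuracy in and out of distribution. The helpers `make_task` and `accuracy`, and the toy stand-in model, are hypothetical; a real test would replace the stand-in with a call to the model under evaluation.

```python
# Hypothetical sketch (not the study's actual benchmark): probe how a model
# handles tasks outside the patterns it has "seen" by holding out a composed
# letter-shift operation and comparing exact-match accuracy.
import ast
import random
import string
from typing import Callable, List, Tuple


def shift(text: str, k: int) -> str:
    """Cyclically shift each lowercase letter by k positions."""
    return "".join(
        chr((ord(c) - ord("a") + k) % 26 + ord("a")) if c.islower() else c
        for c in text
    )


def make_task(word: str, ops: List[int]) -> Tuple[str, str]:
    """Build a prompt asking for a sequence of shifts, plus the gold answer."""
    prompt = f"Apply shifts {ops} in order to '{word}'. Answer with the result only."
    answer = word
    for k in ops:
        answer = shift(answer, k)
    return prompt, answer


def accuracy(tasks: List[Tuple[str, str]], query_model: Callable[[str], str]) -> float:
    """Exact-match accuracy of query_model over (prompt, gold answer) pairs."""
    return sum(query_model(p).strip() == a for p, a in tasks) / len(tasks)


if __name__ == "__main__":
    random.seed(0)
    words = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(50)]

    # "Familiar" single-step shifts vs. a held-out two-step composition.
    in_dist = [make_task(w, [random.choice([1, 2, 3])]) for w in words]
    out_dist = [make_task(w, [2, 3]) for w in words]

    def toy_model(prompt: str) -> str:
        # Stand-in that mimics pattern matching: it applies only the first
        # shift it recognizes and ignores any further composition steps.
        word = prompt.split("'")[1]
        ops = ast.literal_eval(prompt.split("shifts ")[1].split(" in order")[0])
        return shift(word, ops[0])

    print("in-distribution accuracy:    ", accuracy(in_dist, toy_model))
    print("out-of-distribution accuracy:", accuracy(out_dist, toy_model))
```

With the toy stand-in, accuracy is perfect on the single-step tasks and drops to zero on the held-out composition; running the same harness against an actual model would show whether its accuracy degrades in a similar way on unfamiliar combinations, mirroring the collapse the study reports.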