How LLMs hallucinate a seahorse emoji due to training data patterns

Large language models consistently claim that a seahorse emoji exists, even though no such emoji has ever been part of the Unicode standard. Theia Vogel investigates this peculiar behavior and its technical causes in a post on her website.

When asked whether a seahorse emoji exists, models like GPT-5, Claude Sonnet 4.5, and Gemini 2.5 Pro all confidently affirm that it does. Vogel tested multiple models 100 times each; both GPT-5 and Llama 3.3 answered “Yes” in all 100 trials. When prompted to demonstrate the emoji, these models often produce incorrect substitutes such as tropical fish or horse emojis, sometimes entering repetitive loops of emoji spam.
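The test itself is straightforward to reproduce: ask the same yes/no question many times and tally the answers. A minimal sketch of that loop, assuming an OpenAI-compatible Python client, an API key in the environment, and an illustrative model identifier rather than Vogel’s exact setup:

```python
# Sketch of the repeated yes/no test. The model name and prompt
# wording are illustrative assumptions, not Vogel's exact setup.
from openai import OpenAI

client = OpenAI()
prompt = "Is there a seahorse emoji, yes or no?"

yes_count = 0
for _ in range(100):
    response = client.chat.completions.create(
        model="gpt-5",  # placeholder model identifier
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content.strip().lower()
    if answer.startswith("yes"):
        yes_count += 1

print(f"'Yes' answers: {yes_count}/100")
```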

The belief in a seahorse emoji extends beyond artificial intelligence. Reddit threads and other social media platforms are full of users reporting distinct memories of such an emoji. In fact, a seahorse emoji was formally proposed to Unicode but rejected in 2018. Vogel suggests that models may have absorbed this widespread misconception from their training data, or developed it independently through pattern recognition: given that Unicode includes numerous other aquatic animals, both humans and models might reasonably assume a seahorse would be present.

To investigate the technical mechanism behind this behavior, Vogel employed the logit lens technique. This interpretability tool applies the model’s final unembedding step to the hidden state at each layer of the network, revealing which tokens the model would predict if decoding stopped there. For real emojis like the fish emoji, models successfully construct an internal representation combining the concept of “fish” with “emoji.” This combined representation then gets matched to the correct emoji token in the model’s vocabulary.
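A minimal logit-lens sketch using Hugging Face transformers, with GPT-2 as a small stand-in for the larger models Vogel examined (the attribute names `transformer.ln_f` and `lm_head` are specific to GPT-2’s implementation):

```python
# Minimal logit lens: project each layer's hidden state through the
# model's own final layer norm and unembedding matrix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The fish emoji is", return_tensors="pt")

with torch.no_grad():
    # output_hidden_states=True returns the embedding output plus the
    # residual stream after each transformer layer.
    outputs = model(**inputs, output_hidden_states=True)

for layer_idx, hidden in enumerate(outputs.hidden_states):
    # Decode each layer's last-position hidden state as if the
    # model stopped there.
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1, :]))
    top_token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d}: {top_token!r}")
```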

For the nonexistent seahorse emoji, the same process occurs. Middle layers of the network show clear evidence of the model building a “seahorse plus emoji” representation. Words like “sea,” “horse,” and “seahorse” appear in the predicted tokens, alongside byte sequences that form emoji prefixes. The model genuinely attempts to output what it believes should exist.

The problem arises at the final step. The language model head compares the internal representation against the roughly 300,000 token vectors in the model’s vocabulary. For a real emoji, this comparison finds a close match and outputs the correct token. For the seahorse emoji, no matching token exists, so the system outputs the closest available alternative, typically a horse- or fish-related emoji.
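That matching step is just a dot product between the final hidden state and every row of the unembedding matrix; the highest-scoring token wins whether or not any token captures the intended concept. A toy illustration with randomly generated tensors, where the names and sizes are assumptions (the hidden dimension is scaled down to keep the example lightweight):

```python
import torch

def closest_tokens(h, W, k=5):
    # The lm_head is this matrix-vector product: one dot-product score
    # per vocabulary entry. The best-aligned token vector wins, even
    # when no token truly matches the intended concept.
    scores = W @ h
    return torch.topk(scores, k)

# Toy data: 300,000 token vectors, hidden dimension scaled down to 64.
vocab_size, hidden_dim = 300_000, 64
W = torch.randn(vocab_size, hidden_dim)  # unembedding matrix
h = torch.randn(hidden_dim)              # final hidden state

values, indices = closest_tokens(h, W)
print(indices)  # ids of the nearest available tokens
```

With no dedicated seahorse-emoji row in the matrix, the top results for a “seahorse plus emoji” representation are simply the nearest tokens that do exist.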

Some models update their beliefs after seeing their own incorrect output. Claude Sonnet 4.5, for example, sometimes recognizes mid-response that the emoji it produced was wrong and corrects itself. Other models, like GPT-5, keep attempting to produce the nonexistent emoji without adjusting their approach.

Vogel speculates that this limitation might partially explain why reinforcement learning benefits language models. Such training exposes models to their own outputs, providing information about mismatches between internal representations and actual token availability that base training alone cannot capture.
