In a previous Smart Content Report, I featured a funny illustrated guide generated by ChatGPT’s DALL·E:
I find such “failures” interesting because they can reveal fundamental problems.
For example, we are still a long way from an AI that actually understands the world around it (“General World Model”). At the moment, these tools imitate their target medium as best they can, without even beginning to know what they are actually producing. This is true for all types of media: text, photo, video, audio.
This is why, for example, the numbers in the graphic above are so nonsensical: the image generator doesn’t know what numbers are, how they work, or what role they play in a guide like this. It has merely learned from examples that such elements tend to appear in those places.
And then, of course, there is the fundamental problem with the instructions themselves: apparently there are many illustrations showing how the brackets are attached to the back of the TV, and that step gets mixed up with the rest of the process.
Or ask ChatGPT how many times the letter R appears in the word strawberry. You will most likely get the wrong answer. Why? Because ChatGPT never sees the word strawberry or its individual letters. Internally it works with “tokens”, which can consist of one or more characters. ChatGPT has learned how these tokens connect and relate to each other, and it can do amazing things with that knowledge. But it does not see what we see. More about that in this article on TechCrunch.
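To make the difference concrete, here is a minimal sketch using OpenAI’s open-source tiktoken library (my choice for illustration; it is just a tokenizer, not ChatGPT itself). A plain character count sees all three Rs immediately, while the tokenizer splits the word into a few sub-word chunks in which the individual letters never appear as separate units.

```python
# Minimal sketch: character view vs. token view of the word "strawberry".
# Assumes the tiktoken library is installed (pip install tiktoken);
# the exact token split depends on the chosen encoding.
import tiktoken

word = "strawberry"

# A plain character count finds every R without any trouble.
print(word.count("r"))  # -> 3

# A tokenizer, by contrast, works on sub-word chunks, not letters.
enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models
token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]

# Prints a handful of multi-character chunks, not ten separate letters.
print(pieces)
```

The point of the sketch is not the exact split, which varies by model, but that the model’s input never contains the letters as individual units the way our eyes do.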
This creates a contrast that I personally always find amazing. For example, the illustrated instructions above look good at first glance. The style is familiar, the human figure looks reasonably correct, it’s visually appealing… But the longer you inspect it, the more problems you discover.
And that’s what it’s like to work with the current generation of AI tools. They can do a lot of things well. They can be helpful. But you have to stay vigilant. And you have to learn where and when they fail.
The line between what AI can and cannot do is not straight, though; it is more like a wild zigzag. Nevertheless, if you use the tools long enough, recurring weaknesses become apparent.
One thing in particular seems clear: for the vast majority of tasks, they are still far from being a direct replacement for a human.