Journalists Dalvin Brown, Kara Dapena, and Joanna Stern tested ChatGPT, Claude, Copilot, Gemini, and Perplexity in everyday situations. Each chatbot was asked questions written by Wall Street Journal editors and columnists, and an independent panel of judges rated the responses on accuracy, usefulness, and overall quality.
The Health category covered questions about pregnancy, weight loss, depression, and a wide range of symptoms. ChatGPT scored highest in this category, while Gemini excelled in the Finance category, which covered topics such as inheritance, pensions, and interest rates. In Cooking, ChatGPT impressed with creative yet realistic menus, while Gemini won the question about a gluten- and dairy-free chocolate cake. In Creative Writing, Copilot impressed with its jokes and original responses, while Perplexity shone with its journalistic expertise.
Overall, the chatbots proved helpful and avoided controversy, with Perplexity winning the most categories. The differences between them also show that each chatbot is better suited to different purposes.