LiveBench is a new benchmark for large language models developed by a team of researchers. Unlike existing benchmarks, it draws on frequently updated questions from current sources and scores answers automatically against objective criteria. The team has taken particular care to avoid "contamination", the situation in which a model's training data already contains a benchmark's test questions. As a result, LiveBench scores should reflect a model's ability to handle genuinely new problems rather than merely its ability to reproduce content it has already seen.
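
To make the idea of automatic, objective scoring concrete, here is a minimal sketch in Python. The names and structure are hypothetical illustrations, not LiveBench's actual implementation: each question carries a verifiable ground-truth answer, and a model's response is graded by direct comparison rather than by a human or LLM judge.

```python
# Minimal sketch (hypothetical, not LiveBench's code): objective scoring by
# comparing each model answer against a stored ground-truth value.
from dataclasses import dataclass

@dataclass
class Question:
    prompt: str        # task shown to the model, e.g. a freshly written math problem
    ground_truth: str  # objectively verifiable answer

def normalize(answer: str) -> str:
    """Reduce superficial formatting differences before comparison."""
    return answer.strip().lower()

def score(questions: list[Question], model_answers: list[str]) -> float:
    """Return the fraction of answers that match the ground truth exactly."""
    correct = sum(
        normalize(a) == normalize(q.ground_truth)
        for q, a in zip(questions, model_answers)
    )
    return correct / len(questions) if questions else 0.0

if __name__ == "__main__":
    qs = [Question("What is 17 * 24?", "408")]
    print(score(qs, ["408"]))  # -> 1.0
```

Because grading reduces to comparing against known answers, no subjective judging step is needed, and refreshing the question pool with new material is what keeps the benchmark ahead of the models' training data.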