Mistral launches new OCR API to convert complex documents for AI processing

Mistral AI has introduced Mistral OCR, a new optical character recognition API designed to transform complex PDF documents into AI-ready Markdown files. According to TechCrunch’s Romain Dillet, the French large language model developer launched this tool as a solution for organizations struggling to make their document repositories accessible to AI systems.

Unlike conventional OCR tools, Mistral OCR is multimodal, capable of detecting and preserving both text and visual elements in documents. The API draws bounding boxes around images and illustrations and includes them in the final output rather than extracting text alone. The output is formatted as Markdown, a syntax that large language models (LLMs) encounter heavily in training data and commonly use in their own responses.
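To make the shape of such output concrete, here is a minimal sketch of how per-page Markdown and image bounding boxes might be handled once received. The `Page` and `ImageRegion` types and all of their field names are assumptions made for illustration; they are not taken from Mistral's documented response schema.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRegion:
    # Hypothetical fields: an identifier plus bounding-box corners in pixels.
    image_id: str
    top_left: tuple[int, int]
    bottom_right: tuple[int, int]


@dataclass
class Page:
    # Hypothetical per-page result: Markdown text plus images found on the page.
    index: int
    markdown: str
    images: list[ImageRegion] = field(default_factory=list)


def assemble_document(pages: list[Page]) -> str:
    """Join per-page Markdown in page order, separated by blank lines."""
    ordered = sorted(pages, key=lambda p: p.index)
    return "\n\n".join(p.markdown.strip() for p in ordered)


def image_areas(pages: list[Page]) -> dict[str, int]:
    """Map each image id to the pixel area of its bounding box."""
    areas: dict[str, int] = {}
    for page in pages:
        for img in page.images:
            width = img.bottom_right[0] - img.top_left[0]
            height = img.bottom_right[1] - img.top_left[1]
            areas[img.image_id] = width * height
    return areas
```

Keeping the image regions alongside the text, rather than discarding them, is what would let a downstream system refer back to a figure when it appears in the Markdown.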

“Over the years, organizations have accumulated numerous documents, often in PDF or slide formats, which are inaccessible to LLMs, particularly RAG systems. With Mistral OCR, our customers can now convert rich and complex documents into readable content in all languages,” stated Guillaume Lample, Mistral’s co-founder and chief science officer.

The company claims its OCR solution outperforms similar offerings from tech giants like Google, Microsoft, and OpenAI, particularly when handling documents with complex elements such as mathematical expressions, advanced layouts, tables, and non-English content. Mistral also touts superior speed, attributing this advantage to the tool’s specialized focus compared to more general-purpose multimodal models like GPT-4o.

Mistral OCR is available through multiple channels, including the company’s own API platform and cloud partners such as AWS, Azure, and Google Cloud Vertex AI. For organizations handling sensitive or classified information, Mistral offers on-premises deployment options.

The API is designed to feed Retrieval-Augmented Generation (RAG) systems, enabling companies to use multimodal documents as input for large language models. This is particularly relevant for sectors that deal with large volumes of complex documentation, such as legal services.
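As an illustration of why Markdown output is convenient for RAG, the sketch below splits a Markdown document into heading-delimited chunks that could then be embedded and indexed. The function name `split_by_heading` and the chunk format are hypothetical, and the approach is one common chunking strategy rather than anything Mistral prescribes.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list[dict[str, str]]:
    """Split Markdown into chunks, one per heading, keeping the heading as title.

    Limitation: lines that look like headings inside fenced code blocks
    are also treated as chunk boundaries.
    """
    chunks: list[dict[str, str]] = []
    title = ""
    body: list[str] = []

    def flush() -> None:
        # Emit the current chunk unless it is completely empty.
        if title or any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk carries its heading, so a retriever can return a passage together with the section it came from.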

Mistral already uses the OCR technology in its AI assistant Le Chat, where it processes uploaded PDF files to extract and understand document content before further AI processing.
