Mistral Releases OCR 4, a Document Analysis AI
French AI company Mistral AI has unveiled OCR 4, a next-generation OCR model designed to analyze document structure. Beyond simple text extraction, it outputs location information, element classification, and confidence scores in a single operation, and it supports standalone deployment on proprietary infrastructure. Pricing is $4 per 1,000 pages ($2 for batch processing), and the model is available immediately through multiple platforms, including the Mistral API, Amazon SageMaker, and Microsoft Foundry.

Mistral AI, a French AI company, has unveiled OCR 4, a new model for extracting information from documents. OCR (optical character recognition) converts characters on paper or in images into computer-readable data, but OCR 4 does more than extract text: it analyzes the structure of the whole document and returns the position, type, and confidence level of each element in the same pass, which sets it apart from conventional OCR models.
This fourth-generation release arrives approximately 15 months after Mistral entered the field, amid heightened discussion of AI sovereignty in Europe. A growing number of companies and government agencies are hesitant to send confidential documents to U.S. cloud services, and demand for models that can run entirely on proprietary infrastructure has grown, particularly in Europe. OCR 4 responds with a design that allows standalone deployment on internal servers.
The model supports 170 languages and 10 language groups and can process PDF, DOC, PPT, and OpenDocument files. Its core output has three parts: bounding boxes (location information), block type classification, and confidence scores. A bounding box gives the coordinates of each extracted element in the original document, so users can later verify which page and position a piece of data came from. Block types such as headings, tables, formulas, and signatures are classified automatically, which makes it easier to route each kind of content to the downstream system that needs it.
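As a rough illustration of that routing step, the Python sketch below groups blocks by type and diverts low-confidence ones to manual review. The field names (`page`, `bbox`, `block_type`, `confidence`), the coordinate layout, the sample data, and the 0.9 threshold are all assumptions made for this example; they are not Mistral's documented response schema.

```python
from dataclasses import dataclass


# Hypothetical block record. Field names are illustrative assumptions,
# not the actual OCR 4 response schema.
@dataclass
class Block:
    page: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)
    block_type: str    # e.g. "heading", "table", "formula", "signature"
    confidence: float  # assumed to lie in [0, 1]
    text: str


def route(blocks: list[Block], min_confidence: float = 0.9) -> dict[str, list[Block]]:
    """Group blocks by type; low-confidence blocks go to a 'review' queue."""
    queues: dict[str, list[Block]] = {}
    for block in blocks:
        key = block.block_type if block.confidence >= min_confidence else "review"
        queues.setdefault(key, []).append(block)
    return queues


# Made-up sample blocks for demonstration only.
blocks = [
    Block(1, (72.0, 80.0, 540.0, 110.0), "heading", 0.99, "Quarterly report"),
    Block(1, (72.0, 150.0, 540.0, 420.0), "table", 0.95, "Item | Qty | Price"),
    Block(2, (300.0, 700.0, 520.0, 760.0), "signature", 0.62, ""),
]
queues = route(blocks)
for name, items in queues.items():
    # Page and bounding box are kept so each item can be traced to its source.
    print(name, [(b.page, b.bbox) for b in items])
```

Because every block keeps its page number and bounding box, a reviewer handling the "review" queue can jump straight to the region of the original document in question.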
Pricing is $4 per 1,000 pages, dropping to $2 with the batch API. The model is already available through the Mistral API and Document AI on Mistral Studio, and can also be accessed via Amazon SageMaker and Microsoft Foundry. Support for Snowflake's "Parse Document" is planned to follow soon.
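For a sense of scale, the quoted prices can be turned into a simple estimate. The Python sketch below assumes strictly linear per-page billing with no volume tiers, minimum charges, or rounding to 1,000-page increments, none of which the announcement specifies; the function name is hypothetical.

```python
# Rates taken from the prices quoted in the article.
STANDARD_USD_PER_1000_PAGES = 4.0
BATCH_USD_PER_1000_PAGES = 2.0


def estimate_cost(pages: int, batch: bool = False) -> float:
    """Estimate the cost in USD, assuming linear per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = BATCH_USD_PER_1000_PAGES if batch else STANDARD_USD_PER_1000_PAGES
    return pages * rate / 1000


print(estimate_cost(250_000))              # 1000.0
print(estimate_cost(250_000, batch=True))  # 500.0
```

Under these assumptions, an archive of 250,000 pages would cost about $1,000 at the standard rate and $500 through the batch API.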
This matters for enterprises because document reading and structure analysis no longer have to run in separate systems. The traditional workflow used OCR to extract text and then added a separate layout analysis step. With OCR 4, a single model handles both, which could reduce development and operational costs.
Particularly noteworthy is how well it fits RAG (retrieval-augmented generation, in which an AI system retrieves and references related documents when generating responses). When AI extracts information from documents, being able to trace the page and section where the evidence appears is critical for accuracy in operation and for audit compliance. The location information and confidence scores that OCR 4 provides are the foundation for that traceability.
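One way to picture this traceability is a retrieval step that keeps page and position metadata attached to every chunk. The Python sketch below is a simplified illustration, not part of any Mistral product: the `Chunk` fields, the sample data, and the 0.8 confidence cutoff are assumptions, and a naive keyword match stands in for a real vector search.

```python
from dataclasses import dataclass


# Hypothetical chunk record; the fields mirror the kinds of metadata the
# article describes but are not an actual OCR 4 schema.
@dataclass
class Chunk:
    text: str
    source: str
    page: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)
    confidence: float


def retrieve(chunks: list[Chunk], query: str, min_confidence: float = 0.8) -> list[Chunk]:
    """Naive keyword retrieval that skips chunks with low OCR confidence."""
    terms = query.lower().split()
    return [
        c for c in chunks
        if c.confidence >= min_confidence
        and any(t in c.text.lower() for t in terms)
    ]


def cite(chunk: Chunk) -> str:
    """Format a citation pointing back to the page and region of the source."""
    x0, y0, x1, y1 = chunk.bbox
    return (
        f"{chunk.source}, p. {chunk.page}, "
        f"region ({x0:.0f}, {y0:.0f})-({x1:.0f}, {y1:.0f}), "
        f"confidence {chunk.confidence:.2f}"
    )


# Made-up sample chunks for demonstration only.
chunks = [
    Chunk("Payment is due within 30 days of the invoice date.",
          "contract.pdf", 4, (72.0, 310.0, 540.0, 342.0), 0.97),
    Chunk("Paymnt terns may vary.",
          "contract.pdf", 9, (72.0, 100.0, 540.0, 130.0), 0.41),
]
hits = retrieve(chunks, "payment due")
for hit in hits:
    print(hit.text, "->", cite(hit))
```

Filtering on confidence before retrieval keeps poorly recognized text out of generated answers, and the citation string gives an auditor a page and region to check against the original document.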
In heavily regulated industries such as finance, healthcare, and law, concerns about outsourcing document processing to external cloud services run deep. Because OCR 4 can complete processing on proprietary infrastructure, it gives such enterprises an option for advancing AI adoption internally. Future adoption cases will show how far Mistral can expand its commercial position as a leader in European AI.
This article is an original work independently written and edited by the AI issue editorial team based on factual reporting. © AI issue. Unauthorized reproduction, redistribution, or use for AI training is prohibited.