Every mid-sized manufacturing company in Slovakia and the wider EU processes dozens to hundreds of paper or scanned documents every day: supplier invoices, customer purchase orders, delivery notes, customs declarations, technical drawings with dimensions. And for most of them, somewhere an employee is manually retyping values into an ERP system such as SAP, or into Excel. Companies aren't ignoring this problem; they have simply found it too expensive to solve any other way.
In 2026 that changed. Modern document intelligence systems — a combination of OCR (optical character recognition), vision-language models, and validation logic — can process structured and semi-structured documents with an accuracy that manual re-entry cannot match. This article explains where this technology pays off, what a real deployment looks like, and where the limits are that no marketing material will tell you about.
Why classic OCR isn't enough
When most people hear "OCR" they picture Tesseract or Adobe Acrobat — tools that turn a scanned page into editable text. For simple cases that's fine. For industrial documents it isn't.
Problems appear on several levels at once:
- A blurred or skewed scan from a 30-year-old faxed invoice trips up classic OCR far more than a modern vision-language model (VLM).
- Tables, dimensions, and schematics are a blind spot for text-based OCR — you get the right characters, but you lose the structure that gives them meaning.
- Format variability: every supplier has a different invoice template. Classic OCR solutions handle this with rule-based extraction configured per template — which at tens of suppliers means tens of templates and permanent maintenance overhead.
- Contextual meaning: the number "1500" in the bottom-right corner of a page could be a page number, a purchase order number, or an amount in euros — without context classic OCR cannot tell the difference.
Modern VLMs (vision-language models such as Qwen2.5-VL) approach this problem differently: they don't just read characters, they interpret document layout and context. This is a qualitative leap, not an incremental improvement.
What a document intelligence pipeline looks like
A real industrial pipeline for document processing is made up of multiple layers. Each of them can be a failure point if underestimated.
1. Ingestion and normalisation
A document arrives as a scanned PDF, a photo taken on a mobile phone, an email attachment, or an EDI output from a legacy system. The first step is conversion to a unified internal format — with metadata (source, date received, document type).
The hidden problem here: for field photographs, input quality is extremely variable. Out of focus, shadowed, folded — a VLM handles this better than classic OCR, but not infinitely better. Input quality still caps output quality.
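A minimal sketch of what such a unified internal format could look like in Python. The names `IngestedDocument`, `Source`, and `ingest` are illustrative assumptions, not part of any specific product, and the content hash is an extra field added here only to make duplicate detection easy.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    SCAN = "scan"
    MOBILE_PHOTO = "mobile_photo"
    EMAIL_ATTACHMENT = "email_attachment"
    EDI = "edi"


@dataclass(frozen=True)
class IngestedDocument:
    """Hypothetical envelope: raw bytes plus the metadata named in the text."""
    content: bytes
    source: Source
    received_at: datetime
    doc_type: str = "unknown"  # filled in later by the classifier
    sha256: str = field(init=False, default="")

    def __post_init__(self):
        # Frozen dataclass, so the derived field is set via object.__setattr__.
        digest = hashlib.sha256(self.content).hexdigest()
        object.__setattr__(self, "sha256", digest)


def ingest(content: bytes, source: Source) -> IngestedDocument:
    """Wrap raw input in the envelope; reject empty input early."""
    if not content:
        raise ValueError("empty document")
    return IngestedDocument(
        content=content,
        source=source,
        received_at=datetime.now(timezone.utc),
    )
```

Every later stage then works against one type, regardless of whether the document started life as a fax scan or an EDI export.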
2. Document classification
Before extraction starts, the system needs to know what it is processing. An invoice, a purchase order, a delivery note, and a technical drawing each require different extraction schemas. A classifier (either a small model or rule-based logic over the document structure) assigns the document to the correct category.
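As an illustration of the rule-based variant, the sketch below scores a document's text against keyword patterns per category. The patterns and category labels are hypothetical and English-only; a real deployment would tune them per language and would likely combine them with layout features or a small model.

```python
import re

# Hypothetical keyword rules, one list of regex patterns per document type.
RULES = {
    "invoice": [r"\binvoice\b", r"\bvat\b", r"\bdue date\b"],
    "purchase_order": [r"\bpurchase order\b", r"\bpo number\b"],
    "delivery_note": [r"\bdelivery note\b", r"\bdelivered\b"],
    "technical_drawing": [r"\btolerance\b", r"\bscale\b", r"\brevision\b"],
}


def classify(text: str, min_hits: int = 1) -> str:
    """Return the category with the most matching patterns, or 'unknown'."""
    lowered = text.lower()
    scores = {
        label: sum(1 for pattern in patterns if re.search(pattern, lowered))
        for label, patterns in RULES.items()
    }
    best_label, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_label if best_score >= min_hits else "unknown"
```

Returning "unknown" rather than guessing matters: an unclassified document can go to manual review, while a misclassified one gets extracted with the wrong schema.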
3. Structured extraction
The core of the pipeline: values are extracted from the document according to a predefined schema. For an invoice this typically means:
- Invoice number, issue date, due date
- Supplier and customer registration and VAT numbers
- Line items (description, quantity, unit, net price, VAT, gross price)
- Total amount, bank account number, variable symbol
VLMs handle this end-to-end: they receive an image or PDF as input and return a JSON object matching a given schema. This relies on structured outputs and JSON mode, where the model is constrained to produce only valid JSON with the defined fields.
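To make the schema idea concrete, here is a sketch of a typed invoice schema and a parser for the model's JSON output. The field names are assumptions chosen for illustration and cover only a subset of the fields listed above; amounts are parsed as `Decimal` so that later arithmetic checks are exact.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit: str
    net_price: Decimal
    vat_rate: Decimal
    gross_price: Decimal


@dataclass
class Invoice:
    invoice_number: str
    issue_date: date
    due_date: date
    supplier_vat_id: str
    customer_vat_id: str
    total_amount: Decimal
    line_items: list[LineItem]


def parse_invoice(raw_json: str) -> Invoice:
    """Parse model output into typed objects; a missing key raises KeyError."""
    # parse_float/parse_int keep numbers exact instead of binary floats.
    data = json.loads(raw_json, parse_float=Decimal, parse_int=Decimal)
    items = [
        LineItem(
            description=item["description"],
            quantity=Decimal(item["quantity"]),
            unit=item["unit"],
            net_price=Decimal(item["net_price"]),
            vat_rate=Decimal(item["vat_rate"]),
            gross_price=Decimal(item["gross_price"]),
        )
        for item in data["line_items"]
    ]
    return Invoice(
        invoice_number=data["invoice_number"],
        issue_date=date.fromisoformat(data["issue_date"]),
        due_date=date.fromisoformat(data["due_date"]),
        supplier_vat_id=data["supplier_vat_id"],
        customer_vat_id=data["customer_vat_id"],
        total_amount=Decimal(data["total_amount"]),
        line_items=items,
    )
```

Failing loudly on a missing or malformed field is deliberate: a record that cannot be parsed should never reach the validation stage in a half-filled state.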
For technical drawings the situation is more demanding: dimensions, tolerances, and material specifications are scattered across the geometric context of a schematic. VLMs in the 72B-parameter class have made significant progress here, but for precision-critical technical documents a human-in-the-loop review of final values is recommended.
4. Validation and cross-checking
Extracted values are subjected to several layers of validation:
- Mathematical consistency: the sum of line items must equal the total; VAT must correspond to the stated rate.
- Reference validation: the purchase order number on the invoice must exist in ERP; the supplier registration number must be in the approved-supplier database.
- Range validation: invoice amount within the range typical for that supplier (anomalies flagged for manual review).
This validation layer is critical. In practice we have seen cases where the model correctly extracted the numbers from the document, but the invoice total did not match the sum of the line items — the error was in the original document, not in the extraction. Without validation, ERP would contain inconsistent data.
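The mathematical consistency check can be sketched in a few lines. This assumes each line item carries a net amount, a VAT rate as a fraction, and a gross amount, and that a one-cent rounding tolerance is acceptable; both the tuple layout and the tolerance are assumptions to adapt to your own data.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def validate_invoice(lines, stated_total, tolerance=CENT):
    """Return a list of human-readable problems; an empty list means consistent.

    lines: iterable of (net, vat_rate, gross) tuples of Decimal values.
    """
    lines = list(lines)
    problems = []
    for idx, (net, vat_rate, gross) in enumerate(lines, start=1):
        # Gross must equal net plus VAT at the stated rate, rounded to cents.
        expected = (net * (1 + vat_rate)).quantize(CENT, rounding=ROUND_HALF_UP)
        if abs(expected - gross) > tolerance:
            problems.append(
                f"line {idx}: gross {gross} does not match net {net} "
                f"at VAT rate {vat_rate} (expected {expected})"
            )
    line_sum = sum((gross for _, _, gross in lines), Decimal("0"))
    if abs(line_sum - stated_total) > tolerance:
        problems.append(
            f"total {stated_total} does not match sum of line items {line_sum}"
        )
    return problems
```

Because the function reports which check failed, the same messages can be shown to the reviewer in the manual queue, which covers the case described above where the source document itself is wrong.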
5. ERP integration and workflow
Validated records are passed to ERP via API. Documents with a low confidence score (the model is uncertain about a value) or with a failed validation check go into a queue for manual review — with the specific problematic field highlighted.
This is the right approach: not full automation, but assisted automation with a clear human-in-the-loop wherever there is uncertainty.
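A sketch of that routing decision, assuming the extraction step reports a confidence between 0 and 1 per field. The `FieldResult` type and the 0.90 threshold are placeholders; the threshold in particular should come from measurements on your own documents.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_IMPORT = "auto_import"
    MANUAL_REVIEW = "manual_review"


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the model


def route(fields, validation_problems, threshold=0.90):
    """Decide where a document goes and which fields to highlight."""
    flagged = [f.name for f in fields if f.confidence < threshold]
    if flagged or validation_problems:
        return Route.MANUAL_REVIEW, flagged
    return Route.AUTO_IMPORT, []
```

The list of flagged field names is what lets the review screen highlight the specific problematic value instead of asking the reviewer to recheck the whole document.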
Where automation pays off and where it doesn't
Not every document type is equally suited to automation. Based on real-world deployments, we can say:
Highly suitable:
- Invoices from repeat suppliers (the system "learns" the format)
- Standardised order forms (both your own and those from trading partners)
- Delivery notes with barcodes or QR codes (hybrid OCR + barcode approach)
- Material certificates with a defined structure
Suitable with caveats:
- Invoices from new suppliers (first processing requires closer review)
- Orders received by email as plain text or HTML (here an LLM over the email body may work better than OCR over an image)
- Technical data sheets with parameter tables
Less suitable without a dedicated solution:
- Handwritten documents, old low-quality fax output
- Technical drawings with complex geometric schematics and tolerances (extraction works but requires verification)
- Contracts and legal documents with complex structure (here LLM over industrial documentation is more valuable than a pure OCR pipeline)
Technical decisions — cloud vs. on-prem
Most companies we work with face the same question: cloud API or an on-premises installation?
Cloud API (Azure AI Document Intelligence, Google Document AI, AWS Textract):
- Fast start, no infrastructure to manage
- Pay-per-page model: at high volumes (tens of thousands of documents per month) costs can be significant
- Invoices and orders contain commercially sensitive data; for regulated industries or where internal GDPR policy applies, cloud is problematic
On-prem VLM (e.g. Qwen2.5-VL-72B quantised):
- Full data sovereignty — no egress
- Higher VRAM requirements: a 72B model requires a multi-GPU setup (roughly 40+ GB VRAM for inference)
- One-time hardware investment, then marginal cost close to zero as document volume scales
For most industrial companies in the EU, the on-prem argument is strong if you have sufficient volume (on the order of thousands of documents per month). For lower volumes or a quick start, a cloud API can be a sensible stepping stone with a migration plan.
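The volume argument can be turned into a simple break-even estimate. The sketch below is deliberately crude: it ignores staff time, hardware depreciation, and volume discounts, and every figure in the usage example is a made-up placeholder, not a real vendor price.

```python
from decimal import Decimal, ROUND_CEILING
from typing import Optional


def break_even_months(
    hardware_cost: Decimal,
    onprem_monthly_cost: Decimal,
    pages_per_month: int,
    cloud_price_per_page: Decimal,
) -> Optional[int]:
    """Months until cumulative cloud spend exceeds on-prem spend.

    Returns None when cloud is cheaper every month, i.e. on-prem never pays off.
    """
    cloud_monthly = pages_per_month * cloud_price_per_page
    monthly_saving = cloud_monthly - onprem_monthly_cost
    if monthly_saving <= 0:
        return None
    months = (hardware_cost / monthly_saving).to_integral_value(rounding=ROUND_CEILING)
    return int(months)


# Placeholder figures for illustration only.
print(break_even_months(Decimal("30000"), Decimal("500"), 50_000, Decimal("0.03")))  # 30
print(break_even_months(Decimal("30000"), Decimal("500"), 2_000, Decimal("0.03")))   # None
```

With these placeholder numbers, on-prem pays for itself in 30 months at 50,000 pages per month and never at 2,000 pages per month, which is the shape of the trade-off described above.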
Integration with structured outputs and ERP
A critical detail in this pipeline is the reliability of the output format from the LLM. A model that returns clean JSON one time and JSON wrapped in a markdown block the next is unusable for automated integration.
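Modern inference stacks support constrained decoding: token generation is restricted to sequences that conform to a defined JSON schema, so the output cannot be syntactically invalid as long as generation is not cut off by a token limit. Note that conforming to the schema does not make the extracted values correct, which is why the validation layer remains necessary. For production deployment constrained decoding is a necessity, not an option. More on this in the article Structured outputs and JSON mode.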
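For illustration, this is roughly what such a schema looks like, together with a strict guard at the integration boundary. The schema uses standard JSON Schema keywords; how it is handed to the model depends on the inference server and is not shown. The checker is a deliberately minimal stand-in that only handles flat string fields, not a full JSON Schema validator, and the field names are assumptions.

```python
import json

INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string"},
        "total_amount": {"type": "string"},
    },
    "required": ["invoice_number", "issue_date", "total_amount"],
    "additionalProperties": False,
}


def parse_strict(raw: str, schema: dict = INVOICE_SCHEMA) -> dict:
    """Accept only pure JSON with exactly the expected string fields."""
    try:
        data = json.loads(raw)  # fails on markdown fences or surrounding prose
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not pure JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = schema["properties"]
    missing = [key for key in schema["required"] if key not in data]
    extra = [key for key in data if key not in allowed]
    wrong_type = [
        key for key, value in data.items()
        if key in allowed and not isinstance(value, str)
    ]
    if missing or extra or wrong_type:
        raise ValueError(
            f"schema violation: missing={missing}, extra={extra}, wrong_type={wrong_type}"
        )
    return data
```

Rejecting fenced or decorated output outright, rather than trying to repair it, keeps the failure visible and sends the document to review instead of letting a best-effort parse slip through.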
For ERP integration the rule is: never write directly into ERP from an AI model. The standard pattern is:
1. AI extracts and validates → result written to a staging table
2. Validation rules (a script or workflow engine) check for consistency
3. On success → automatic import into ERP; on uncertainty → queue for human review
4. A human reviewer sees the document, the extracted values, and the specific field with the problem
This pattern preserves auditability and prevents silent data corruption in ERP.
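A compact sketch of the staging pattern using SQLite from the Python standard library. The table name, columns, and status values are assumptions for illustration; in production the staging table would typically live in the database your integration layer already uses, and the final import step would call the ERP's own API.

```python
import json
import sqlite3


def open_staging(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS staging_invoice (
               id INTEGER PRIMARY KEY,
               payload TEXT NOT NULL,
               status TEXT NOT NULL
                   CHECK (status IN ('pending', 'approved', 'needs_review', 'imported')),
               problem_field TEXT
           )"""
    )
    return conn


def stage(conn: sqlite3.Connection, payload: dict) -> int:
    """Step 1: the extraction result lands in staging, never directly in ERP."""
    cur = conn.execute(
        "INSERT INTO staging_invoice (payload, status) VALUES (?, 'pending')",
        (json.dumps(payload),),
    )
    conn.commit()
    return cur.lastrowid


def apply_validation(conn: sqlite3.Connection, row_id: int, problem_field=None) -> str:
    """Steps 2 and 3: record the validation outcome and the offending field."""
    status = "needs_review" if problem_field else "approved"
    conn.execute(
        "UPDATE staging_invoice SET status = ?, problem_field = ? WHERE id = ?",
        (status, problem_field, row_id),
    )
    conn.commit()
    return status


def rows_ready_for_import(conn: sqlite3.Connection) -> list:
    """Only approved rows are eligible for the ERP import job."""
    rows = conn.execute(
        "SELECT id FROM staging_invoice WHERE status = 'approved' ORDER BY id"
    )
    return [row[0] for row in rows]
```

Because every record passes through a row with an explicit status, the staging table doubles as the audit trail: it shows what was extracted, what was flagged, and what was eventually imported.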
Drawings and technical documentation
Technical drawings — DXF, geometry PDFs, wiring schematics — are a category of their own. Traditional OCR is almost unusable here, because most of the information lies in the relationships between graphical elements, not in the text itself.
Modern VLMs can extract from a technical drawing:
- Dimensions and tolerances (accuracy depends on input quality)
- Material and surface finish descriptions
- Part numbers and revision identifiers
Where you still need human review: safety-critical tolerances, electrical schematics for ATEX environments, documents for certification processes. AI acts as an assistant here, not as an autonomous decision-maker — similar to how AI Copilot for operators reduces the workload but does not replace accountability.
Common deployment mistakes
To close the technical section — problems we see repeatedly:
"Deploy and forget" — a document intelligence pipeline needs monitoring. A new supplier with an unusual invoice template will lower confidence scores and land in the manual queue; that's fine, but the queue must be watched.
Underestimating variability — in your pilot you test 50 invoices from 5 suppliers. In production you have 500 suppliers, some of whom change their templates without notice. Test on a diverse sample, not only on "clean" cases.
Confidence scores without calibration — the model reports extraction certainty, but those scores are calibrated against training data, not against your documents. In the first weeks of production, track where the model declares "confident" and where it was wrong — set your manual-review thresholds accordingly.
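One simple way to set the threshold from production data: log each extracted field's reported confidence together with whether a reviewer found it correct, then pick the lowest threshold at which the auto-accepted fields meet your accuracy target. The candidate thresholds and the 99% target below are assumptions, and with small samples the estimate will be noisy.

```python
def accuracy_above(samples, threshold):
    """Accuracy and count of samples whose confidence is at or above threshold.

    samples: iterable of (confidence, was_correct) pairs.
    """
    kept = [ok for conf, ok in samples if conf >= threshold]
    if not kept:
        return None, 0
    return sum(kept) / len(kept), len(kept)


def pick_threshold(
    samples,
    target_accuracy=0.99,
    candidates=(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99),
):
    """Lowest candidate threshold whose auto-accepted fields meet the target."""
    samples = list(samples)
    for threshold in sorted(candidates):
        accuracy, count = accuracy_above(samples, threshold)
        if accuracy is not None and accuracy >= target_accuracy:
            return threshold
    return None  # no candidate is safe; keep everything in manual review
```

Choosing the lowest threshold that still meets the target keeps the manual queue as small as possible without accepting more errors than you have decided to tolerate.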
Ignoring edge cases — what happens if an invoice arrives in German? Or a PDF containing two invoices on separate pages? These cases need to be defined and handled explicitly, not left to chance.
Frequently asked questions
How accurate are modern systems at extracting data from invoices?
For structured invoices from repeat suppliers, modern VLM pipelines achieve 95–99% accuracy on key fields (invoice number, total amount, date) in practice. For new or non-standard formats accuracy is lower — which is why the validation layer and manual-review queue are critical. Numbers in marketing materials (99.9%) typically come from controlled tests, not from real production deployments with full input variability.
Does document intelligence require a GPU?
Not for cloud APIs, where you pay per API call. For on-premises deployment with large VLMs (70B+), yes: you need roughly 40+ GB VRAM for reasonable inference latencies, even with quantisation. Smaller models (7–14B), quantised where necessary, can run on a single RTX 4090 (24 GB VRAM), but with lower accuracy on complex technical documents. For invoices and purchase orders a 7B model delivers good performance.
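A back-of-the-envelope way to sanity-check these figures: the memory for model weights is the parameter count times the bits per weight, divided by eight. The 20% allowance for KV cache and activations in the sketch below is a rough assumption; real overhead depends on context length, image resolution, and batch size.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for weights only, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_on(card_gb: float, params_billion: float, bits_per_weight: int,
            overhead_fraction: float = 0.2) -> bool:
    """Rough fit check; overhead_fraction is an assumed allowance, not a measurement."""
    needed = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= card_gb


print(weight_memory_gb(72, 4))   # 36.0 GB of weights for a 72B model at 4 bits
print(weight_memory_gb(7, 16))   # 14.0 GB of weights for a 7B model at 16 bits
print(fits_on(24, 72, 4))        # False: does not fit on one 24 GB card
print(fits_on(24, 7, 4))         # True
```

This is why the 40+ GB figure corresponds to a 72B model quantised to about 4 bits per weight, and why a 7B model fits comfortably on a single 24 GB card.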
Can this be connected to our existing SAP / Pohoda / other ERP?
Yes — a document intelligence pipeline produces structured JSON that can be imported via the ERP's API or through standard integration interfaces (REST, IDoc for SAP, CSV import for simpler systems). The integration itself is not the hard part; the larger effort is defining the staging logic and the validation rules specific to your business processes.
What about documents in different languages (German, Polish)?
Modern VLM models are multilingual and handle most European languages without any special configuration. Validation rules (e.g. company registration number format, IBAN) need to be configured per country. If you process high volumes from a specific country, it is worth verifying accuracy on real samples — performance is not always uniform across languages.
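As an example of a per-country validation rule, the sketch below checks an IBAN using the standard mod-97 checksum plus a country-specific length. The length table covers only a handful of countries and should be extended from the official IBAN registry for the countries you actually trade with.

```python
# IBAN lengths for a few countries; extend from the IBAN registry as needed.
IBAN_LENGTHS = {"SK": 24, "CZ": 24, "DE": 22, "PL": 28, "AT": 20}


def is_valid_iban(iban: str) -> bool:
    """Check country-specific length and the mod-97 checksum."""
    compact = iban.replace(" ", "").upper()
    if not (compact.isascii() and compact.isalnum()):
        return False
    if IBAN_LENGTHS.get(compact[:2]) != len(compact):
        return False  # unknown country code or wrong length
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35, as the checksum requires.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

A check like this catches single mistyped or misread digits in the bank account field before the record reaches payment processing, independently of how confident the model claims to be.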
How long does implementation take?
A straightforward pilot on one document type (e.g. invoices from the top-20 suppliers) can be up and running in 4–6 weeks. A full production deployment with ERP integration, monitoring, and coverage of all document types typically takes 3–6 months depending on the complexity of the environment and the number of integration points.
*MP Industrial Solutions helps companies move from manual document re-entry to a verifiable, automated data flow — from a single-document-type pilot through to full production deployment with ERP integration. If you are working on document intelligence or considering where to start, we would be glad to assess your specific situation.*
