The line is running, sensors are measuring, operators are checking. Defects still escape — not because people aren't doing their jobs, but because human visual attention has an objective ceiling when it comes to repetitive tasks. After two hours of monotonous inspection, detection reliability drops significantly. AI vision systems have no such ceiling: they process every part identically, and when deployed correctly they achieve accuracy above 99% where a human inspector typically reaches around 85%.
The problem is that "deployed correctly" is not trivial. In practice we see projects where an AI quality-inspection system delivered transformative results, and equally projects that ended with an installed camera that was switched off after three months. The difference is not the technology — it is what the manufacturer did before buying the first piece of hardware. This article is a decision-making guide: when to go for it, what to resolve first, and where AI alone is not enough.
What AI visual inspection actually does
The core task is simple: a camera captures a part, the model decides OK or NOK, the system flags or rejects it. Behind that simplicity lie three distinct technical tasks that each require a different approach.
Defect detection: localising a specific flaw on a surface or component, such as a scratch, crack, porosity, or faulty weld. The model must not only say "a defect exists" but also show where. Detectors such as YOLOv10 and YOLO11 (both usable through the Ultralytics package) are the 2026 standard for real-time detection: on a suitable GPU they process a frame in tens of milliseconds, well within the cycle time of the vast majority of production lines.
Anomaly detection: spotting something "different from normal" without knowing in advance exactly what you are looking for. Models trained exclusively on "good" parts learn what normality looks like, and any deviation is flagged as an anomaly. This approach suits serial production where defects are rare and you do not have enough faulty samples for classical supervised training.
Segmentation and measurement: precisely outlining the area of a defect, measuring dimensions, counting objects. SAM 2 (Meta) enables promptable zero-shot segmentation: given a point or a box on the image, it can outline a defect type it was never trained on, which dramatically accelerates annotation of new defect categories.
Modern systems combine all three: YOLO detects, SAM segments, downstream measurement logic decides on severity.
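The last step, turning a segmentation mask into a severity decision, is plain geometry. The sketch below assumes a boolean mask from whatever segmentation model you use (for example SAM 2) and a known calibration in millimetres per pixel; the function names and the severity limits are hypothetical placeholders for values that would come from your own quality specification.

```python
import numpy as np

# Hypothetical acceptance limits; real values come from the quality specification.
MAX_MINOR_LENGTH_MM = 2.0
MAX_MINOR_AREA_MM2 = 1.5


def measure_defect(mask: np.ndarray, mm_per_pixel: float) -> dict:
    """Measure a boolean defect mask produced by a segmentation model."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return {"area_mm2": 0.0, "length_mm": 0.0}
    area_mm2 = xs.size * mm_per_pixel ** 2
    # Approximate the defect length by the longer side of its bounding box.
    height_px = ys.max() - ys.min() + 1
    width_px = xs.max() - xs.min() + 1
    length_mm = max(height_px, width_px) * mm_per_pixel
    return {"area_mm2": float(area_mm2), "length_mm": float(length_mm)}


def classify_severity(measurement: dict) -> str:
    """Map measurements to 'none', 'minor' or 'critical'."""
    if measurement["area_mm2"] == 0.0:
        return "none"
    if (measurement["length_mm"] <= MAX_MINOR_LENGTH_MM
            and measurement["area_mm2"] <= MAX_MINOR_AREA_MM2):
        return "minor"
    return "critical"


# Example: a 30-pixel horizontal scratch imaged at 0.1 mm per pixel (3 mm long).
mask = np.zeros((100, 100), dtype=bool)
mask[50, 10:40] = True
print(classify_severity(measure_defect(mask, mm_per_pixel=0.1)))  # prints "critical"
```

The bounding-box length underestimates diagonal or curved defects; a skeleton-based length measure would be more faithful but needs an image-processing library.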
When AI pays off — and when it falls short
Not every inspection problem warrants a vision model. Before investing in hardware and integration, answer five questions.
1. Is the defect visually consistent? Scratches on an aluminium profile always look similar. Corrosion on welds can take dozens of forms. The more consistent the visual appearance of the defect, the easier it is to train the model and the higher the accuracy you will achieve. Defects with extremely variable appearances (dependent on material temperature, angle of impact, raw-material batch) are far harder to handle.
2. Do you have enough defect images? This is the most common obstacle. Classical supervised training (YOLO-style detectors) requires on the order of hundreds to thousands of annotated examples *per* defect category. In practice most manufacturers have archives full of OK parts and only dozens of defect images. Solutions exist (synthetic augmentation, GAN-generated defects, or anomaly detection without defect labels), but they add complexity.
3. Is the inspection reproducible? Lighting, part position, movement speed, reflections — everything must be stable. A model trained under one lighting type will fail under another. That is not a bad model; it is physics. If the production line does not have stable imaging conditions, those must be solved mechanically or structurally first — AI will not fix them.
4. Can an existing solution solve the problem more cheaply? Simple binary checks (component present/absent, correct orientation) are handled reliably by classical rule-based machine vision with profile sensors at a fraction of the cost and without training data. AI pays off where rule-based systems fail — irregular surfaces, shape variability, subjective quality criteria.
5. What is the cost of an escaped defect versus a false alarm? In automotive, a single escaped defect can mean a recall of hundreds of vehicles. In consumer goods the tolerance is higher. The NOK/OK ratio and the cost of both error types determine what threshold you set — and that directly affects whether the AI system will be economically viable.
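Question 5 can be made concrete on a validation set. The following sketch assumes you have model scores and true labels for a set of already inspected parts and picks the NOK threshold that minimises total cost; the cost per escaped defect and per false alarm are inputs you estimate for your own business, and the function name is hypothetical.

```python
import numpy as np


def pick_threshold(scores, labels, cost_escape, cost_false_alarm):
    """Choose the NOK threshold that minimises total cost on a validation set.

    scores: model NOK scores (higher = more likely defective)
    labels: 1 for a truly defective part, 0 for a good part
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_cost = None, float("inf")
    # Every observed score is a candidate; a part is rejected when score >= t.
    for t in np.unique(scores):
        rejected = scores >= t
        escapes = np.sum(~rejected & (labels == 1))      # defects passed as OK
        false_alarms = np.sum(rejected & (labels == 0))  # good parts rejected
        cost = escapes * cost_escape + false_alarms * cost_false_alarm
        if cost < best_cost:
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost


scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.90]
labels = [0, 0, 1, 0, 1, 1]
print(pick_threshold(scores, labels, cost_escape=100.0, cost_false_alarm=1.0))  # (0.35, 1.0)
print(pick_threshold(scores, labels, cost_escape=1.0, cost_false_alarm=100.0))  # (0.6, 1.0)
```

When escapes are expensive (the automotive case) the chosen threshold moves down and the line rejects more parts; when false alarms are expensive it moves up.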
The reality of accuracy: what 99% means
The figure "99% accuracy" appears in every presentation. It is important to understand what it means — and what it does not.
If a production line produces 10,000 parts per day and the defect rate is 1%, you have 100 defective parts. A system that simply passes every part as OK is also 99% accurate on this distribution, yet it catches none of them. An AI system with 99% overall accuracy can therefore have very different real-world performance depending on how its 100 errors are split between false alarms (good parts rejected) and escaped defects (bad parts passed). For critical applications, overall accuracy matters less than per-defect-class recall: the fraction of each defect type you need to catch that the system actually flags.
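The arithmetic behind the 10,000-part example is easy to check. This sketch (hypothetical helper name, counts taken from the example) computes accuracy, recall, and precision for three systems that each make exactly 100 errors and therefore all score 99% accuracy.

```python
def inspection_metrics(tp, fn, fp, tn):
    """tp: defects caught, fn: defects escaped, fp: false alarms, tn: good parts passed."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of defects caught
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of rejects that are really defective
    return accuracy, recall, precision


# 10,000 parts per day, 100 of them defective; each system makes 100 errors.
scenarios = {
    "passes everything": (0, 100, 0, 9900),
    "all errors are false alarms": (100, 0, 100, 9800),
    "errors split evenly": (50, 50, 50, 9850),
}
for name, counts in scenarios.items():
    acc, rec, prec = inspection_metrics(*counts)
    print(f"{name:28s} accuracy={acc:.1%} recall={rec:.0%} precision={prec:.0%}")
```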
In practice we see this pattern: models well trained on the main defect types achieve excellent results on exactly those types, but degrade on rare or novel variants not present in the training data. This is why continuously adding new production examples to the training set is critical — the model is not a one-time investment; it is a living system.
The comparison with human inspection is also contextual. AI systems are consistently better at repetitive, well-defined inspection types. An experienced inspector with deep process knowledge, however, can catch anomalies the model has never seen. That is why a hybrid setup (AI flags, a human operator validates edge cases) is standard production practice in 2026.
Low data: how to tackle the chronic problem
A shortage of defect samples is the reality for most manufacturers. Several proven approaches:
Anomaly detection instead of classification. Models built on a pretrained backbone (a CNN or a Vision Transformer, ViT) and fitted exclusively on "good" parts learn the distribution of normality. During production, every part that deviates from that distribution receives a high anomaly score. Downside: the model cannot name the defect type, only report "something is wrong". For a first deployment phase that is often sufficient.
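One simple way to "learn the distribution of normality" is to fit a Gaussian to feature vectors of good parts and score new parts by their Mahalanobis distance. The sketch below is a simplified, image-level version of Gaussian-fit approaches such as PaDiM (which applies the idea per image patch). It assumes feature vectors have already been extracted by a pretrained backbone; the random stand-in features, the class name, and the 99th-percentile flagging rule are hypothetical.

```python
import numpy as np


class GaussianAnomalyScorer:
    """Model good-part features as a Gaussian; score new parts by Mahalanobis distance."""

    def __init__(self, eps: float = 1e-3):
        self.eps = eps  # small ridge so the covariance matrix stays invertible

    def fit(self, good_features: np.ndarray) -> "GaussianAnomalyScorer":
        self.mean_ = good_features.mean(axis=0)
        cov = np.cov(good_features, rowvar=False)
        self.inv_cov_ = np.linalg.inv(cov + self.eps * np.eye(cov.shape[0]))
        return self

    def score(self, features: np.ndarray) -> np.ndarray:
        diff = features - self.mean_
        # Row-wise Mahalanobis distance: sqrt(diff_i^T * inv_cov * diff_i).
        return np.sqrt(np.einsum("ij,jk,ik->i", diff, self.inv_cov_, diff))


# Stand-in features; in practice these would come from a pretrained backbone.
rng = np.random.default_rng(0)
good_train = rng.normal(0.0, 1.0, size=(500, 16))
scorer = GaussianAnomalyScorer().fit(good_train)
# Hypothetical policy: flag anything above the 99th percentile of good-part scores.
threshold = np.quantile(scorer.score(good_train), 0.99)
new_parts = np.vstack([rng.normal(0.0, 1.0, (3, 16)), rng.normal(4.0, 1.0, (3, 16))])
print(scorer.score(new_parts) > threshold)
```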
Synthetic augmentation. Realistic synthetic defects (scratches, cracks, stains) can be generated directly on OK images using algorithmic methods or GAN/diffusion models. In industrial projects we have seen that high-quality synthetic augmentation can replace 30–50% of real defect examples without a significant impact on accuracy — but the result depends heavily on how closely the synthetics match reality.
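The simplest algorithmic variant draws a defect directly onto an OK image and keeps the drawn region as its label. A minimal sketch, assuming grayscale uint8 images and a dark, straight scratch; the function name and parameters are hypothetical, and realistic pipelines would add blur, texture, and lighting variation (GAN or diffusion generation is not shown here).

```python
import numpy as np


def add_synthetic_scratch(image, rng, length=40, darkening=60, width=1):
    """Draw a straight, darker 'scratch' at a random position and angle.

    image: 2-D uint8 grayscale image of an OK part (left unchanged).
    Returns the augmented image and a boolean mask that can serve as its label.
    """
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    y0, x0 = rng.integers(0, h), rng.integers(0, w)
    angle = rng.uniform(0.0, np.pi)
    half = width // 2
    # Walk along the segment in half-pixel steps and mark a small neighbourhood.
    for t in np.linspace(0.0, length, num=2 * length):
        y = int(round(y0 + t * np.sin(angle)))
        x = int(round(x0 + t * np.cos(angle)))
        if 0 <= y < h and 0 <= x < w:
            mask[max(y - half, 0):y + half + 1, max(x - half, 0):x + half + 1] = True
    out = image.astype(np.int16)  # widen so the subtraction cannot wrap around
    out[mask] -= darkening
    return np.clip(out, 0, 255).astype(np.uint8), mask


rng = np.random.default_rng(1)
ok_image = np.full((64, 64), 180, dtype=np.uint8)
augmented, scratch_mask = add_synthetic_scratch(ok_image, rng)
```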
SAM 2 for annotation. Manual image annotation is time-intensive. SAM 2 enables zero-shot segmentation: an expert points at a defect and the model immediately outlines it with a precise mask. In practice this cuts annotation time for new defect categories by 60–80% compared with a classical manual workflow.
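A mask produced this way still has to become a training label. For a detection model in the Ultralytics YOLO format, each label line is `class x_center y_center width height`, normalised to the image size. The sketch below assumes the mask is a boolean numpy array (for example what SAM 2 returned after an expert's click); the helper name is hypothetical.

```python
import numpy as np


def mask_to_yolo_label(mask: np.ndarray, class_id: int) -> str:
    """Turn a boolean defect mask into one YOLO detection label line.

    Format: "<class> <x_center> <y_center> <width> <height>", all normalised to 0..1.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask contains no defect pixels")
    h, w = mask.shape
    # Use pixel edges (max + 1) so a one-pixel defect has non-zero size.
    x_min, x_max = xs.min(), xs.max() + 1
    y_min, y_max = ys.min(), ys.max() + 1
    x_c = (x_min + x_max) / 2 / w
    y_c = (y_min + y_max) / 2 / h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {(x_max - x_min) / w:.6f} {(y_max - y_min) / h:.6f}"


mask = np.zeros((100, 200), dtype=bool)      # height 100, width 200
mask[20:40, 50:150] = True                   # e.g. the mask returned for one click
print(mask_to_yolo_label(mask, class_id=3))  # prints "3 0.500000 0.300000 0.500000 0.200000"
```

For segmentation models the same mask would instead be converted to a polygon, which requires contour extraction (for example with OpenCV).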
Transfer learning. Start with a model pre-trained on industrial data (public datasets exist for surface defect detection on metals, textiles, and wood) and fine-tune on your specific data. You need far fewer examples than training from scratch.
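The lightest form of transfer learning keeps the pretrained backbone frozen and trains only a small classification head on its embeddings (often called linear probing); full fine-tuning, for example with the Ultralytics trainer, updates more of the network but follows the same idea. The numpy sketch below assumes embeddings have already been extracted from a pretrained model; the random stand-in data, learning rate, and function names are hypothetical. It illustrates why few labelled examples can suffice: only a handful of parameters are learned.

```python
import numpy as np


def train_linear_head(features, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression head on frozen backbone features.

    features: (n_samples, n_dims) embeddings from a pretrained model (not updated)
    labels:   (n_samples,) with 1 = defect, 0 = OK
    """
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        err = probs - labels  # gradient of the log-loss w.r.t. the logits
        w -= lr * (features.T @ err / n + l2 * w)
        b -= lr * float(err.mean())
    return w, b


def defect_probability(features, w, b):
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))


# Stand-in embeddings: only 20 labelled examples per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 8)), rng.normal(1.5, 1.0, (20, 8))])
y = np.r_[np.zeros(20), np.ones(20)]
w, b = train_linear_head(X, y)
```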
For a deeper understanding of the approach to training data — including how many samples you realistically need and how to structure them — see Fine-tuning dataset — quantity and quality.
Edge deployment: where the processing happens
The key architecture question: where will inference run?
Cloud: images are sent to a remote server. Unsuitable for most production lines because of latency (network round-trip plus processing) and availability (an internet outage means an inspection outage). For offline batch processing of archived images, cloud is acceptable.
On-premise server: an inference server inside the plant network, typically with a GPU (NVIDIA RTX 4090 or A-series). Intranet latency is low (milliseconds), and availability depends only on internal infrastructure. This is the standard for most serial production environments.
Edge device: compute at the camera or on an industrial PC at the line. Lightweight vision models exported to ONNX can now run without a dedicated GPU, for example via OpenVINO on Intel hardware or ONNX Runtime on a standard industrial PC (TensorRT, the other common export target, requires an NVIDIA GPU). For simple binary OK/NOK decisions this is sufficient. For complex multi-class detection with high accuracy, edge hardware is still the limiting factor.
In practice we use a combination: the edge device delivers an immediate OK/NOK decision in under 100 ms; results are forwarded to a central server for analytics, retraining, and reporting. This architecture is resilient to network outages while still enabling centralised model management.
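One way to get that resilience is a store-and-forward outbox on the edge device: the OK/NOK decision is acted on locally, and only the record is queued for upload. The sketch below is a minimal in-memory illustration; `send` is a hypothetical callable standing in for whatever transport you use (HTTP, MQTT, OPC UA), and a production version would persist the queue to disk so it survives a restart.

```python
import collections
import json


class StoreAndForward:
    """Edge-side outbox: decisions are acted on locally, telemetry is uploaded when possible."""

    def __init__(self, send, max_buffer=10_000):
        self.send = send  # hypothetical transport; raises ConnectionError when offline
        # If an outage outlasts the buffer, the oldest records are dropped first.
        self.buffer = collections.deque(maxlen=max_buffer)

    def record(self, part_id, decision):
        self.buffer.append(json.dumps({"part_id": part_id, "decision": decision}))
        self.flush()

    def flush(self):
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return  # keep the record and retry on the next call
            self.buffer.popleft()


# Simulated link that is down for the first two parts.
sent, link_up = [], [False]


def fake_send(message):
    if not link_up[0]:
        raise ConnectionError("central server unreachable")
    sent.append(message)


outbox = StoreAndForward(fake_send)
outbox.record("P1", "OK")
outbox.record("P2", "NOK")
link_up[0] = True
outbox.record("P3", "OK")
print([json.loads(m)["part_id"] for m in sent])  # ['P1', 'P2', 'P3']
```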
When sizing hardware for an inference server, the article on GPU sizing for LLM inference is also useful: the principles of memory bandwidth and VRAM sizing carry over, although typical vision models are far smaller than LLMs, so the requirements are usually lower.
Integration into the manufacturing process
A vision system that only displays results on a monitor does not deliver full value. Real integration means:
Connection to the production system — MES, SCADA, or directly to a PLC. An NOK signal triggers an action: stopping the line, diverting the part, alerting the operator. Without this link, AI is merely a passive observer.
Traceability — every part has an image, a decision, a timestamp, and a batch ID. In the event of a complaint or customer audit you can instantly replay the inspection record. This is one of the most valuable secondary benefits of AI vision systems — not just automation, but documentation.
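A traceability record can be as simple as one JSON line per part. A minimal sketch: the image, decision, timestamp, and batch ID fields follow the paragraph above, while the part ID and model version are added assumptions that tend to help in audits; all names and paths are hypothetical, and in production the records would typically go to a database or the MES.

```python
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class InspectionRecord:
    part_id: str
    batch_id: str
    decision: str               # "OK" or "NOK"
    defect_type: Optional[str]  # None for OK parts
    image_path: str             # where the captured frame is archived
    model_version: str          # which model made the call
    timestamp: str              # UTC, ISO 8601


def log_inspection(log_file, part_id, batch_id, decision, defect_type, image_path, model_version):
    """Append one record as a JSON line to an open text file."""
    record = InspectionRecord(part_id, batch_id, decision, defect_type, image_path,
                              model_version, datetime.now(timezone.utc).isoformat())
    log_file.write(json.dumps(asdict(record)) + "\n")
    return record


def replay_batch(lines, batch_id):
    """Return every logged record for one batch, e.g. for a complaint or audit."""
    return [r for r in map(json.loads, lines) if r["batch_id"] == batch_id]


log = io.StringIO()  # stands in for a real log file
log_inspection(log, "P-001", "B-42", "NOK", "scratch", "/archive/P-001.png", "v3")
log_inspection(log, "P-002", "B-43", "OK", None, "/archive/P-002.png", "v3")
print(replay_batch(log.getvalue().splitlines(), "B-42"))
```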
Closed loop — more advanced deployments (and we are seeing them more and more) connect the vision system directly to a robotic arm or correction actuator. Detected defect → repair or sorting command → confirmation of the correction via the next image. This closed-loop pattern is the direction the industry is moving.
Dashboard and retraining pipeline — operators must see system performance in real time (OK/NOK counts, types of detected defects, false alarm rate). And when a new defect type appears, there must be a workflow: operator labels the image → annotation → added to the training set → retraining → deploy new model version. Without this pipeline the model ages.
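The dashboard figures and a retraining trigger can be derived from the same per-part events. A minimal sketch, assuming each event carries the model decision, the detected defect type, and the operator's verdict once a NOK call has been reviewed; the field names, the 5% false-alarm limit, and the 200-label batch size are hypothetical.

```python
from collections import Counter


def dashboard_stats(events):
    """Summarise inspection events for the operator dashboard.

    Each event is a dict with 'decision' ("OK"/"NOK"), 'defect_type', and
    'operator_verdict' ("confirmed", "false_alarm", or None if not yet reviewed).
    """
    nok = [e for e in events if e["decision"] == "NOK"]
    reviewed = [e for e in nok if e["operator_verdict"] is not None]
    false_alarms = sum(e["operator_verdict"] == "false_alarm" for e in reviewed)
    return {
        "ok": sum(e["decision"] == "OK" for e in events),
        "nok": len(nok),
        "defect_types": Counter(e["defect_type"] for e in nok),
        # False-alarm rate among NOK calls an operator has actually reviewed.
        "false_alarm_rate": false_alarms / len(reviewed) if reviewed else 0.0,
    }


def needs_retraining(stats, new_labelled_images, max_false_alarm_rate=0.05, min_new_labels=200):
    """Hypothetical trigger: retrain when false alarms drift up or enough new labels pile up."""
    return stats["false_alarm_rate"] > max_false_alarm_rate or new_labelled_images >= min_new_labels


events = [
    {"decision": "OK", "defect_type": None, "operator_verdict": None},
    {"decision": "OK", "defect_type": None, "operator_verdict": None},
    {"decision": "NOK", "defect_type": "scratch", "operator_verdict": "confirmed"},
    {"decision": "NOK", "defect_type": "scratch", "operator_verdict": "false_alarm"},
    {"decision": "NOK", "defect_type": "porosity", "operator_verdict": None},
]
stats = dashboard_stats(events)
print(stats["ok"], stats["nok"], stats["false_alarm_rate"])  # 2 3 0.5
print(needs_retraining(stats, new_labelled_images=12))       # True
```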
For teams considering how to set up similar agentic loops for long-term AI system management, we recommend reading AI agent architectures (ReAct, Plan-and-Execute).
When classical cameras are enough
It would be dishonest not to say it: there are cases where an AI vision model is not the right answer.
Component presence/absence — whether a screw is in place, a label is applied, a cap is closed. Classical rule-based machine vision handles this reliably at significantly lower cost and without training data.
Dimensional inspection with micrometric accuracy — contact measurement systems or laser profilometers are more precise and reliable than camera inspection for sub-millimetre dimensions.
Extremely simple binary checks — if a defect always looks the same and the contrast against the background is sufficient, a simple threshold algorithm is enough. AI is overkill.
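For comparison, the kind of rule meant here fits in a few lines. A minimal sketch, assuming a fixed camera position, stable lighting, and a dark defect on a bright surface; the function name, the region of interest, and both limits are hypothetical and would be tuned on real images.

```python
import numpy as np


def threshold_check(gray, roi, dark_level=60, max_dark_pixels=25):
    """Rule-based OK/NOK check: count dark pixels inside a fixed region of interest.

    gray: 2-D uint8 image; roi: (y0, y1, x0, x1) in pixels.
    Returns the decision and the dark-pixel count for logging.
    """
    y0, y1, x0, x1 = roi
    dark = int(np.count_nonzero(gray[y0:y1, x0:x1] < dark_level))
    return ("NOK" if dark > max_dark_pixels else "OK"), dark


image = np.full((100, 100), 200, dtype=np.uint8)     # bright, uniform surface
print(threshold_check(image, roi=(10, 90, 10, 90)))  # ('OK', 0)
image[40:46, 40:46] = 20                             # a 6x6 dark spot
print(threshold_check(image, roi=(10, 90, 10, 90)))  # ('NOK', 36)
```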
AI vision adds value where variability, complexity, or subjectivity exceeds the capability of rule-based systems. When you are on the boundary, run a pilot: test both approaches on real data and compare the cost of implementation against the improvement in detection.
Frequently asked questions
How many training images do I need for defect detection?
For a classical supervised model (YOLO type) you typically need hundreds of annotated examples per defect category — in practice that means 500 to 2,000 images for a first usable version. If you have fewer, consider anomaly detection (trained on OK parts only) or synthetic augmentation. Annotation quality matters more than raw count — poorly annotated data damages the model more than it helps.
How fast does the camera need to be, and how quickly does the system decide?
For most production lines (up to 60 parts per minute) a 60 fps camera and inference latency under 100 ms are sufficient. YOLO11 on an RTX 4090 processes a frame in 10–30 ms, which provides ample headroom even for faster lines. For extremely high-speed processes (hundreds of parts per minute) line-scan cameras and specialised hardware are required — that is a different class of solution.
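The sizing in this answer comes down to a time budget per part. A minimal sketch: the capture and I/O times in the example are hypothetical, and the 30 ms inference figure is the upper end of the range quoted above for YOLO11 on an RTX 4090.

```python
def time_budget_ms(parts_per_minute, inference_ms, capture_ms=0.0, io_ms=0.0):
    """Return (cycle time per part, headroom left after capture, inference and I/O) in ms."""
    cycle_ms = 60_000.0 / parts_per_minute
    headroom_ms = cycle_ms - (capture_ms + inference_ms + io_ms)
    return cycle_ms, headroom_ms


# Hypothetical capture and PLC/MES I/O times.
for rate in (60, 300, 600):
    cycle, headroom = time_budget_ms(rate, inference_ms=30, capture_ms=20, io_ms=10)
    print(f"{rate:4d} parts/min: cycle {cycle:.0f} ms, headroom {headroom:.0f} ms")
```

A negative headroom means the line outruns the inspection and calls for batching, parallel cameras, or the specialised hardware mentioned above.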
Does AI visual inspection work on reflective or transparent materials?
Reflective surfaces (polished metal, chrome) and transparent materials (glass, plastic) are technically more demanding — reflections and variable contrast mislead the model. The solution is a combination of specialised lighting (polarised, dome light, coaxial) and training on data captured under exactly those conditions. It is not an unsolvable problem, but it requires more preparation and testing.
Do I need a GPU directly on the production line?
Not necessarily. For simple models a powerful industrial PC without a dedicated GPU is sufficient (CPU inference is slower but adequate for less demanding tasks). For real-time inspection with complex models we recommend an edge server with a GPU in the control cabinet near the line — not directly in a dusty, vibrating environment. A GPU in the cloud or on a central server is an alternative, but it is dependent on network availability.
Will AI visual inspection ever fully replace human inspection?
In practice, very few operations get close to full replacement. The more common model is augmentation: AI handles 90–95% of inspection autonomously; the remainder — edge cases, new defect types, escalated complaints — goes to a human expert. This hybrid is rational: it saves most of the manual labour while preserving specialist know-how for situations where the model is not yet reliable.
*AI visual inspection is not a product you buy and plug in — it is a system that must be designed, trained, and maintained. At MP Industrial Solutions we help manufacturing companies assess whether and how AI inspection makes sense for their process: from analysing existing data and defect samples, through architecture selection, to integration with an existing MES or SCADA. If you are considering this step, we are happy to look at your specific case directly.*
