
A Statistical Learning Approach for Root Cause Analysis in Quality Control
In modern manufacturing, semifinished components traverse complex production paths across
multiple machines before becoming finished products. This complexity often obscures the root
causes of quality issues, which can stem from various sources, including defective raw materials or
equipment malfunctions. Tracing an issue back to its cause is traditionally time-consuming and
inefficient, because cause and effect may be separated by several stages of production.
Every finished product undergoes quantitative quality measurements to ensure safety. However,
linking these measurements to specific root causes remains a challenge, as deviations may fall
within tolerance thresholds, and similar measurements can arise from different underlying factors.
We present a novel statistical learning methodology designed to detect anomalies in quality
measurements and trace them back to their most probable root causes. The approach employs an
empirical prior probability distribution for the quality metrics and uses it to derive a posterior
conditional distribution of anomaly frequencies within subsets of production paths. To efficiently
navigate the vast space of possible subsets, we use a greedy algorithm that identifies a minimal set
of paths explaining the maximum number of anomalies.
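
The two steps described above (a posterior anomaly frequency per subset of paths, then a greedy
selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation.
It assumes that anomalies have already been flagged, that each production path is encoded as a set
of categorical labels such as "machine=M2" (a hypothetical encoding), and that the prior is a Beta
distribution centred on the overall anomaly rate. The names prior_strength and min_lift are
illustrative parameters, not taken from the source.

```python
from collections import defaultdict


def posterior_anomaly_rates(products, prior_strength=10.0):
    """Posterior mean anomaly rate per path label under a Beta prior.

    products: list of (labels, is_anomaly) pairs, where labels is a set of
    categorical path labels. The Beta prior is centred on the overall
    anomaly rate (an assumption made for this sketch).
    """
    total = len(products)
    base_rate = sum(1 for _, bad in products if bad) / total
    counts = defaultdict(lambda: [0, 0])  # label -> [anomalies, products]
    for labels, bad in products:
        for label in labels:
            counts[label][0] += int(bad)
            counts[label][1] += 1
    a0 = prior_strength * base_rate
    b0 = prior_strength * (1.0 - base_rate)
    rates = {lab: (a0 + k) / (a0 + b0 + n) for lab, (k, n) in counts.items()}
    return base_rate, rates


def greedy_root_causes(products, min_lift=1.5, prior_strength=10.0):
    """Greedily pick a small set of labels that explains the most anomalies."""
    if not products:
        return []
    base_rate, rates = posterior_anomaly_rates(products, prior_strength)
    # Keep only labels whose posterior rate is clearly above the baseline.
    candidates = {lab for lab, r in rates.items() if r >= min_lift * base_rate}
    # Index the anomalous products that each candidate label would explain.
    covers = {lab: set() for lab in candidates}
    for i, (labels, bad) in enumerate(products):
        if bad:
            for lab in labels & candidates:
                covers[lab].add(i)
    unexplained = set().union(*covers.values())
    chosen = []
    while unexplained:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda lab: len(covers[lab] & unexplained))
        gain = covers[best] & unexplained
        chosen.append((best, len(gain)))
        unexplained -= gain
    return chosen


# Toy data: every anomaly occurs on machine M2, spread over both suppliers.
example = (
    [({"machine=M1", "supplier=S1"}, False)] * 8
    + [({"machine=M1", "supplier=S2"}, False)] * 8
    + [({"machine=M2", "supplier=S1"}, True)] * 3
    + [({"machine=M2", "supplier=S1"}, False)]
    + [({"machine=M2", "supplier=S2"}, True)] * 3
    + [({"machine=M2", "supplier=S2"}, False)]
)
print(greedy_root_causes(example))
```

On this toy data the sketch returns the single label "machine=M2" as the explanation for all six
anomalies, because neither supplier shows an elevated posterior rate. The greedy loop is the
standard set-cover heuristic; a production system would likely score combinations of labels and
use the full posterior rather than its mean.
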
This method is process-agnostic and adaptable to any manufacturing setup with quantitative quality
metrics and a sparse categorical representation of production paths. A pilot implementation in a
manufacturing facility demonstrated tangible improvements in quality KPIs, such as waste
reduction, alongside a significant decrease in manual effort required for monitoring and issue
identification. A phased deployment is underway, with full-scale implementation
planned across multiple factories in the coming years. The initiative is estimated to deliver an
ROI exceeding 400%, excluding the cost of earlier enablers such as the data platform and
other AI projects.