Machine learning (ML) describes the ability of algorithms to structure and interpret data independently or to learn correlations. The use of ML is steadily increasing in companies of all sizes. However, the insufficient market readiness of many ML solutions inhibits their application, especially in production systems. Predictive models apply ML to capture the complex behavior of a system through regression on operational data, which makes it possible to determine the relationship between factors and target variables. For production systems, these models are only applicable if their predictions are accurate, because even minor variations can significantly affect the process. This accuracy depends on the data available to train the ML model. Production data usually exhibits high epistemic uncertainty, which leads to inaccurate predictions that are unfit for real-world applications. This paper presents an ML-driven, data-centric Design of Experiments (DoE) approach that creates a process-specific dataset with low epistemic uncertainty. Such a dataset improves the accuracy of the predictive models and ultimately makes them feasible for production systems. Our approach determines the epistemic uncertainty in the historical data of a production system in order to find the data points in the factor space that are of high value to the ML model. To identify an efficient set of experiments, we cluster these data points, weighted by feature importance. We evaluate the approach by running these experiments and using the collected data to further train a prediction model. Our approach achieves a significantly higher increase in accuracy than continuing to train the prediction model with the same amount of regular operating data.
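The selection step described above can be illustrated with a minimal, dependency-free sketch. It is not the implementation used in this paper: the bootstrap ensemble of k-nearest-neighbour regressors (as a proxy for epistemic uncertainty), the correlation-based feature importance, the weighted k-means routine, the synthetic data, and all names such as `propose_experiments` are assumptions chosen only to make the idea concrete.

```python
import random
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Mean target of the k training points nearest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def epistemic_uncertainty(X, y, candidates, n_models=20, k=3, seed=0):
    """Variance of predictions across a bootstrap ensemble (assumed proxy)."""
    rng = random.Random(seed)
    n = len(X)
    preds = []  # preds[m][c]: prediction of ensemble member m for candidate c
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        boot_X = [X[i] for i in idx]
        boot_y = [y[i] for i in idx]
        preds.append([knn_predict(boot_X, boot_y, c, k) for c in candidates])
    variances = []
    for c in range(len(candidates)):
        column = [member[c] for member in preds]
        mean = sum(column) / n_models
        variances.append(sum((v - mean) ** 2 for v in column) / n_models)
    return variances


def correlation_importance(X, y):
    """Absolute Pearson correlation per factor, normalised to sum to 1."""
    n = len(X)
    mean_y = sum(y) / n
    norm_y = sum((v - mean_y) ** 2 for v in y) ** 0.5
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        norm_x = sum((v - mean_x) ** 2 for v in col) ** 0.5
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        scores.append(abs(cov) / (norm_x * norm_y) if norm_x and norm_y else 0.0)
    total = sum(scores) or 1.0
    return [s / total for s in scores]


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's algorithm with a feature-weighted squared Euclidean distance."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    def wdist(a, b):
        return sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b))

    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: wdist(p, centers[c]))].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def propose_experiments(X, y, candidates, n_experiments=3, top_fraction=0.2):
    """Cluster the most uncertain candidates into a few experiment settings."""
    unc = epistemic_uncertainty(X, y, candidates)
    n_top = max(n_experiments, int(len(candidates) * top_fraction))
    ranked = sorted(range(len(candidates)), key=lambda i: unc[i], reverse=True)
    top = [candidates[i] for i in ranked[:n_top]]
    return weighted_kmeans(top, correlation_importance(X, y), n_experiments)


# Hypothetical historical data: two factors logged near one operating point.
rng = random.Random(42)
X_hist = [[rng.gauss(0.5, 0.05), rng.gauss(0.5, 0.05)] for _ in range(60)]
y_hist = [3.0 * a + 0.2 * b + rng.gauss(0, 0.01) for a, b in X_hist]
# Candidate settings: a regular grid over the (normalised) factor space.
grid = [[i / 9, j / 9] for i in range(10) for j in range(10)]
experiments = propose_experiments(X_hist, y_hist, grid, n_experiments=3)
```

Each entry of `experiments` is one proposed factor setting. Weighting the distance by feature importance means that candidates differing in an influential factor fall into separate clusters, whereas differences in a weak factor are largely ignored. In practice, the ensemble, the importance measure, and the clustering method would be replaced by whatever the deployed prediction model supports.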