1 Introduction

With the rise of Industry 4.0, machine data became available and could be used for optimization in multiple use cases. One of them is predictive maintenance (PdM), which can reduce maintenance costs by up to 60% [2]. PdM focuses on condition monitoring and relies on machine learning (ML) models to decrease system downtime. However, in an industrial environment, the performance of ML models mainly depends on the quantity and quality of the available data. Raw data often lacks integrity for various reasons, such as missing or corrupted values and noise. To overcome these challenges, domain knowledge is frequently integrated into ML models [3]. We refer to this as domain knowledge injection, since ML models are the basis and the knowledge enhances them.

Domain knowledge is mostly available from domain experts and comes in several forms. It can be injected in multiple ways, depending on its purpose, in different phases of model development. It is often not simple to decide which form of knowledge can be injected into which ML model in which phase of the modeling process. To address this problem, we analyzed 50 PdM use cases from the literature in which domain knowledge was injected into ML models. Von Rueden et al. [1] proposed the Informed Machine Learning framework, which provides a taxonomy and a survey of how prior knowledge can be integrated into learning systems. Based on their framework, we developed a decision guidance that assists data scientists by recommending suitable domain knowledge injection techniques for a given PdM use case and the related knowledge. We structured the use cases by the analytics types proposed by Steenstrup et al. [4], which are commonly used in the manufacturing domain. In addition, we recommend suitable ML algorithms that were applied successfully in the 50 analyzed use cases. These algorithms are ranked by how frequently they were applied. Trying the algorithms in the suggested order might lead to a good model faster.

In summary, our contributions are:

  1. A variation of the framework developed by Von Rueden et al. [1] that structures available forms of domain knowledge, knowledge conversion options and injection phases especially suitable for PdM use cases.

  2. A knowledge base containing 50 PdM use cases, characterized by analytics type, form of domain knowledge, conversion form and possible phase for the injection into an ML model.

  3. Resulting from these, a guidance with a recommendation for suitable injection phases and frequently used ML algorithms for PdM use cases.

2 Related Work

There are multiple works in which domain knowledge was injected into ML models [5, 6]. Focusing on PdM applications, Serradilla et al. [2] proposed a methodology to incorporate knowledge, but they focused on the process and did not address which form of knowledge should be used in which situation. In the context of process discovery, Schuster et al. [7] proposed to organize domain knowledge injection by the time at which the knowledge is provided. They clustered the knowledge by its use in the different phases: process discovery, development, and post-processing. Von Rueden et al. [1] proposed a framework to incorporate prior knowledge into ML models. Since it is generic, we decided to create a variation of it that is especially suitable for PdM use cases, according to what we learned from the literature study. Most existing works focus on how formalized knowledge can be injected to enhance the performance of the applied ML models. To our knowledge, no previous work suggests which ML algorithms might be suitable for the injection of a specific form of domain knowledge in different use case scenarios. This motivates the development of a guidance that advises on how to inject domain knowledge into suitable ML algorithms for PdM.

3 Guidance Development

The goal of this article is to provide a guidance that helps data scientists inject domain knowledge into ML models. It suggests suitable knowledge injection phases and possible ML algorithms based on a given analytics type and knowledge form. The guidance is developed from the results of a literature study in which 50 PdM use cases were analyzed. The use cases are structured according to a knowledge injection framework for PdM use cases, which is used to categorize domain knowledge, knowledge conversions and injection phases. First, the framework is introduced, followed by the literature study. The final result is the guidance in the form of a table.

3.1 Knowledge Injection Framework

Based on the informed machine learning framework by Von Rueden et al. [1] and inspired by the Data mining methodology for engineering applications (CRISP-DMME) by Huber et al. [8], the knowledge injection framework depicted in Fig. 1 was developed. It is structured horizontally into three main levels which are described in the following. It is used as a frame of reference when developing a PdM use case to aid the injection process.

Form of domain knowledge related to the use case

The top row of Fig. 1 represents the first level. It comprises the domain knowledge which is directly related to the use case before any conversion or formalization. Von Rueden et al. [1] clustered the knowledge into three types: scientific knowledge, world knowledge and expert knowledge. As already suggested by them, the knowledge in these categories usually is formal, semi-formal, and informal, respectively. We chose to use the degree of formalization as categories in the framework to provide a more technical view. Besides, in manufacturing environments, domain knowledge is often available in an informal or a semi-formal form from domain experts rather than being found in a formalized form.

Fig. 1 Knowledge injection framework for PdM use cases. The framework has three parts: the form of domain knowledge related to the use case, knowledge conversion for injection, and injection of knowledge into the ML model.

Informal Domain Knowledge has only an indirect impact on constructing the intelligent model, since it must first be formalized in some way. Usually, experts acquire this knowledge over their working life, for example as simple heuristics or intuitions. Data scientists benefit from this knowledge when deciding which standard techniques to apply. Examples from PdM use cases include the types of sensors suitable for collecting a specific type of data, their location, or the total number of sensors. Informal domain knowledge also helps in choosing pre-processing techniques that have proven useful in similar use cases, such as normalizing or filtering data, or applying physics-based techniques like the Fast Fourier Transform, Short-time Fourier Transform, or Wavelet Transform to time series data. This special form of informal domain knowledge can further be defined as domain-specific data science knowledge.
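As an illustration of such a frequency-domain pre-processing step, the following minimal sketch extracts the dominant frequency of a vibration-like signal. It is not taken from any of the analyzed use cases: the signal is synthetic, the function names are hypothetical, and a naive discrete Fourier transform is used instead of an FFT so that the example needs only the Python standard library.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Single-sided magnitude spectrum of a real signal (naive O(n^2) DFT)."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        magnitudes.append(abs(acc) / n)
    return magnitudes


def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the strongest spectral bin, ignoring the DC bin."""
    magnitudes = dft_magnitudes(signal)
    k_peak = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return k_peak * sample_rate_hz / len(signal)


# Synthetic example (assumption): a 50 Hz tone sampled at 400 Hz, 80 samples.
sample_rate_hz = 400
signal = [math.sin(2 * math.pi * 50 * t / sample_rate_hz) for t in range(80)]
peak_hz = dominant_frequency(signal, sample_rate_hz)
print(peak_hz)  # 50.0
```

In practice, an FFT routine from a numerical library would replace the naive transform. The resulting spectral feature could then be passed on to feature engineering.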

Semi-formal Domain Knowledge is more structured and explicitly available, but it is not entirely structured and needs to be formalized before injection. In the analyzed use cases, it was mainly injected by using logic rules to process data based on an expert opinion, or expert-defined thresholds to filter or label input data.
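A minimal sketch of how expert-defined thresholds might be turned into a labeling rule is given below. The sensor quantities, the threshold values and the label names are hypothetical and only serve to illustrate the idea. They do not stem from the analyzed use cases.

```python
def label_reading(temperature_c, vibration_mm_s,
                  temp_limit_c=80.0, vib_limit_mm_s=7.1):
    """Label one sensor reading with expert-style thresholds.

    Both limits are assumed values for illustration only.
    """
    too_hot = temperature_c > temp_limit_c
    too_shaky = vibration_mm_s > vib_limit_mm_s
    if too_hot and too_shaky:
        return "fault"
    if too_hot or too_shaky:
        return "warning"
    return "normal"


# Made-up readings: (temperature in deg C, vibration velocity in mm/s)
readings = [(65.0, 2.0), (85.0, 3.0), (90.0, 9.5)]
labels = [label_reading(t, v) for t, v in readings]
print(labels)  # ['normal', 'warning', 'fault']
```

Labels produced this way could serve as training targets or as a filter that removes implausible samples before model training.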

Formal Domain Knowledge refers to expert knowledge that can be injected in a standardized form. Examples include simulation-generated machine data which can be used for model training or equations defined by the domain expert to design the model architecture.

Knowledge conversion for injection

The knowledge conversion is carried out at the second level, depicted in the middle row of Fig. 1. It acts as a bridge between domain experts and data scientists. In the literature, the knowledge was converted into a variety of formats, which include the following:

  • Standard techniques which proved to have a good performance in similar use cases in the past, such as specific pre-processing techniques.

  • Different forms of equations, such as mass balances.

  • Human decisions, such as a rating if the machine sounds normal.

  • Logic rules, such as the ones contained in a programmable logic controller.

  • Simulation results, for example from a digital twin of a machine.

  • Probabilistic relations, such as the Weibull distribution [9] to fit the training model.
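To make the last conversion format more concrete, the following sketch evaluates the reliability and hazard functions of a two-parameter Weibull distribution. The parameter values are assumptions chosen for illustration, and the sketch does not reproduce how the distribution was used in [9].

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a component survives beyond time t."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at time t (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Assumed parameters: shape > 1 models wear-out, scale in operating hours.
shape, scale = 2.0, 1000.0
for hours in (250.0, 500.0, 1000.0):
    print(hours,
          round(weibull_reliability(hours, shape, scale), 4),
          weibull_hazard(hours, shape, scale))
```

With a shape parameter greater than one, the hazard grows over time, which is one way to encode the expert expectation that a component wears out.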

Injection of Knowledge into the ML Model

The bottom row of Fig. 1 represents the third level. It shows possible phases for the injection of domain knowledge during model development. This level is structured into three groups: data preparation, intelligent model design, and decision making. In data preparation, converted knowledge is used to pre-process the data and to generate relevant features through feature engineering. The next group aims to enhance the performance of an ML model by injecting knowledge into three different tasks of the intelligent model design phase: hypothesis space definition, constraint setting, and loss function regularization. The last group is decision making, in which a component is developed that makes automated decisions based on the prediction of the ML model. It is mainly used to perform prescriptive tasks, such as making appropriate maintenance decisions to mitigate the impact of unwanted incidents.
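Loss function regularization can be sketched as a data term plus a weighted penalty for predictions that contradict domain knowledge. The example below assumes, purely for illustration, that the expert knows a degradation indicator cannot decrease over time. The function name and the penalty form are hypothetical and are not taken from the analyzed use cases.

```python
def informed_loss(predictions, targets, penalty_weight=1.0):
    """Mean squared error plus a penalty for non-monotonic predictions.

    Assumed domain rule: the predicted degradation must not decrease
    from one time step to the next.
    """
    n = len(predictions)
    mse = sum((p - y) ** 2 for p, y in zip(predictions, targets)) / n
    # Sum of squared decreases between consecutive predictions.
    violation = sum(max(0.0, predictions[i] - predictions[i + 1]) ** 2
                    for i in range(n - 1))
    return mse + penalty_weight * violation


targets = [0.1, 0.2, 0.3]
consistent = [0.1, 0.2, 0.3]    # respects the monotonicity rule
inconsistent = [0.1, 0.3, 0.2]  # degradation drops at the last step

print(informed_loss(consistent, targets))
print(informed_loss(inconsistent, targets))
```

The penalty weight controls how strongly the knowledge is enforced relative to the fit to the data. Setting it to zero recovers the plain data loss.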

Due to the variety of knowledge, it is not possible to give standard paths through the framework. Also, not all of the levels apply in all use cases. Especially for formal knowledge, the conversion might not be needed, since the knowledge can often be used directly. Some of the phases in the third level can be used in combination with all conversions, while others were found to be applied only with specific conversions. All three levels are linked, so it is possible to switch between levels when further insights into the use case are gained. For example, a standard technique can be used to create results on which an expert decision can be made to apply further conversion techniques.

3.2 Literature Study and Construction of the Knowledge Base

We conducted a literature study that resulted in the analysis of 50 PdM use cases. To this end, we searched for PdM use cases with the terms ‘condition monitoring’, ‘fault detection’, ‘fault diagnosis’, ‘predictive maintenance’, and ‘prescriptive maintenance’ in combination with the terms ‘artificial intelligence’, ‘machine learning’, ‘domain knowledge’, and ‘domain knowledge integration’. We searched in Google Scholar, ScienceDirect, Springer, IEEE Xplore, ACM Digital Library, and Elsevier for works published mainly between 2017 and 2022.

We clustered the use cases according to their objective, using the following four analytics types [4]:

  • Descriptive analytics provides information on what has happened

  • Diagnosis explains why something has happened and the reasons behind it

  • Predictive analytics gives an estimate of what will happen in the future

  • Prescriptive analytics offers recommendations on how to influence the future

Each of these four clusters was further divided by the form of domain knowledge according to the framework shown in Fig. 1. Table 1 depicts the overall knowledge base, showing which conversion of knowledge was injected in which phase for each of the 50 use cases. The table shows that the main conversions for knowledge injection in the analyzed use cases were the application of standard techniques and human decisions, followed by equations. Logic rules, simulation results and probabilistic relations were applied in only a few use cases. Looking at the third level of the framework, the most relevant phase for injection is pre-processing, followed by feature engineering and hypothesis space definition. Constraint setting, regularization and decision making were applied in only a few of the analyzed use cases. The prescriptive analytics type has far fewer use cases than the others. Also, not all forms of knowledge are covered for all analytics types.

Table 1 Knowledge base containing 50 PdM use cases structured by analytics type and according to the knowledge injection framework. (ST = standard techniques, EQ = equations, LR = logic rules, SR = simulation results, PR = probabilistic relations, HD = human decisions; PP = pre-processing, FE = feature engineering, HS = hypothesis space, CS = constraint setting, RZ = loss function regularization, DM = decision making)
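The tallies reported for the knowledge base amount to counting occurrences per category. The following sketch shows one possible way to compute such a ranking. The records are toy entries invented for illustration and are not rows of Table 1. The tuple layout and the function name are likewise assumptions.

```python
from collections import Counter

# Toy records (NOT from Table 1):
# (analytics type, knowledge form, injection phase abbreviation)
records = [
    ("diagnosis", "informal", "FE"),
    ("diagnosis", "informal", "FE"),
    ("diagnosis", "informal", "PP"),
    ("predictive", "informal", "PP"),
]


def rank_phases(records, analytics_type, knowledge_form):
    """Injection phases for one category, most frequent first."""
    counts = Counter(
        phase
        for a_type, k_form, phase in records
        if a_type == analytics_type and k_form == knowledge_form
    )
    return [phase for phase, _ in counts.most_common()]


print(rank_phases(records, "diagnosis", "informal"))  # ['FE', 'PP']
```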

3.3 Guidance Creation

The guidance is shown in Table 2. It takes the analytics type and knowledge form as input and suggests a suitable injection phase and suitable ML algorithms. The injection phases and ML algorithms are ranked by their number of occurrences in the analyzed use cases. We recommend trying the suggestions in the order of this ranking. For example, if data scientists are searching for a suggestion for a diagnosis use case with informal domain knowledge, trying feature engineering in combination with a support vector machine is the most promising option. If this does not provide sufficient results, the other ML algorithms can be tried in the given order. Afterwards, the injection phase might be changed to pre-processing. It is also possible to apply different forms of knowledge in multiple modeling phases. Therefore, the guidance should be followed once for each form of available knowledge.
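The lookup described above can be sketched as a small table keyed by analytics type and knowledge form. The sketch contains only the single entry mentioned in the text. The remaining entries and the further ranked algorithms of Table 2 are deliberately left out, and the data layout and function name are assumptions made for illustration.

```python
# Only the example from the text is included; the other ranked
# algorithms and categories of Table 2 are intentionally omitted.
GUIDANCE = {
    ("diagnosis", "informal"): {
        "phases": ["feature engineering", "pre-processing"],
        "algorithms": ["support vector machine"],
    },
}


def trial_order(analytics_type, knowledge_form):
    """Return (phase, algorithm) pairs in the recommended order.

    All algorithms are tried for the top-ranked phase before moving
    on to the next phase.
    """
    entry = GUIDANCE.get((analytics_type, knowledge_form))
    if entry is None:
        return []
    return [(phase, algorithm)
            for phase in entry["phases"]
            for algorithm in entry["algorithms"]]


for phase, algorithm in trial_order("diagnosis", "informal"):
    print(phase, "+", algorithm)
```

An unknown combination returns an empty list, which mirrors the situation in which the knowledge base holds no use case for a category.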

We have used the guidance as a knowledge base for a prototype of a recommender system. The prototype is available on GitHub (see Footnote 1).

Table 2 Decision guidance suggesting suitable knowledge injection phases and ML algorithms based on a given analytics type and knowledge form. For each ML algorithm, one example use case in which the model was applied is provided as an orientation. The number of occurrences of each ML algorithm in the analyzed use cases is given in parentheses if it is greater than one. (PP = pre-processing, FE = feature engineering, HS = hypothesis space, CS = constraint setting, RZ = loss function regularization, DM = decision making)

4 Examples for the Application of the Guidance

In the following, we show how the guidance may be used by applying it to two real-world use cases. The first is a diagnostic use case that monitors the condition inside a disk stack separator [54]. Domain knowledge was available in a semi-formal form, which was converted into standard techniques and logic rules at the knowledge conversion level. Following the guidance suggestions, pre-processing, feature engineering, and hypothesis space definition were applied to develop the ML model. Standard techniques and logic rules were applied to pre-process the data, and then a human decision was made to select suitable features and set a hypothesis space. Among the models suggested by the guidance, random forest (RF) achieved the highest accuracy, at 91.27%.

The second use case is a predictive use case in which we used the NASA IMS bearing dataset [55] to predict outer race defects of bearing 1 and estimate its remaining useful life. Here, the domain knowledge was available in an informal form. Pre-processing, feature engineering, and hypothesis space definition were used as knowledge injection phases, as suggested by the guidance. At the knowledge conversion level, the Fast Fourier Transform (FFT) was used as a standard technique to transform informal knowledge and perform pre-processing. In addition, human decisions were made to conduct feature engineering and define the hypothesis space. Among the suggested models, support vector regression (SVR) performed best, with a root mean square error of 0.12.
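For reference, the root mean square error used as the evaluation metric can be computed as in the following sketch. The numbers are made up to demonstrate the calculation and are unrelated to the bearing experiment.

```python
import math


def rmse(predictions, targets):
    """Root mean square error between two equally long sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - y) ** 2 for p, y in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


# Made-up values for illustration only.
example_error = rmse([1.0, 2.0], [1.0, 4.0])
print(example_error)
```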

5 Discussion

The evaluation has shown that, by using the guidance, it is possible to quickly select suitable knowledge injection phases and ML algorithms for PdM use cases. It does not guarantee that the suggestions will be the best, as the performance of any ML model also depends on the underlying systems and data, including their quality, quantity and type. Which suggestion to choose depends mainly on the users and the selected use case. However, the users get ideas and a starting point for creating models for PdM use cases with the injection of domain knowledge. The guidance assists its users in both a direct and an indirect way. Directly, the users get suggestions for suitable phases and probably suitable models. Indirectly, the users can get additional ideas from the knowledge injection framework about further possibilities to inject domain knowledge. At this moment, the knowledge base lacks sufficient use case examples for some of the categories. For instance, prescriptive analytics contains fewer use cases because the current literature offers fewer examples in this area. This limits the guidance’s ability to give reliable suggestions for this type of use case. Also, the knowledge base does not cover examples for each possible form of domain knowledge and does not contain the same number of use cases for each conversion. To overcome these problems and to improve the guidance suggestions, more use cases need to be analyzed.

6 Conclusion

This paper presents a guidance that assists its users in injecting domain knowledge into ML models for PdM use cases. It suggests possible injection phases and suitable ML algorithms with the analytics type and the form of domain knowledge as input. It is based on a literature study in which 50 PdM use cases were analyzed. The recommendations of the guidance were applied to two PdM use cases and delivered good results in both. In its current form, the underlying knowledge base does not contain a balanced number of use cases in every category. For more reliable suggestions, more use cases from the less represented categories should be investigated in the future.