In today’s world of System Operations, there are two dominant approaches to maintenance: Reactive and Preventive. Reactive Maintenance waits for the system to fail and repairs it after the fact, as a means of minimizing unnecessary repairs.

The problems with the reactive approach are typically increased system downtime; expensive repairs, including travel for specialized repair personnel; and, in the worst scenarios, catastrophic failure. Preventive Maintenance is based on replacing parts according to manufacturers’ recommended schedules, with the intent of minimizing unforeseen downtime. This approach raises issues of unnecessary maintenance and of the opportunity cost of materials replaced before a failure. Alternatively, the goal of Predictive Maintenance is to predict failures accurately enough to find an optimized balance between reduced downtime and full utilization of replacement parts.

Solving the Predictive Maintenance problem requires careful coordination between the operational maintainers and the designers and analysts, who are likely at a remote site. As seen in Figure 1, real-time data and system operation are available to the maintenance operator, who has visual aids and dashboards for operational assessment. The maintainer also often has informal methods of checking in on the system: visual inspection, sound, smells, and general human pattern recognition of unmodeled effects that correlate with system performance. Automated methods of capturing the maintainer’s instinctive understanding should be one of the areas of investigation going forward. The data analysts may not have access to all of the data coming out of the sensors in real time due to Data Sovereignty,¹ communication throughput, or cost constraints at design time. Optimizing which data to pull back, and how often given project constraints, is difficult, and the choice is often hard to change after the fact. Finally, coordinating analysis between the maintainer in the field and the data analysts offsite is vital to updating models and data feeds.

Figure 1 : Raytheon Predictive Maintenance Architecture

Predictive Maintenance is as much a traditional engineering problem as a data science problem. First and foremost, the diagnostic data gathered is generally time series data from electronic and mechanical control systems. This means that physics of failure and theory of design are available as a starting point for understanding (or predicting) failures in simpler parts of the system. These are also the criteria for deciding, at design time, which diagnostic data will be recorded. Changing the data gathered or adding data taps after manufacture can be extremely expensive, and is typically avoided at all costs. In general, diagnostics based on Control System theory engender more trust, since they are based on theory that can be trusted beyond the regions of test data. Eventually, however, there remain unexpected system responses which lend themselves better to machine learning.

Once the interaction of system components becomes complex enough to warrant machine learning applications, those methods that allow visibility into the decision-making process are preferable, because they take advantage of available experience from system designers and RAM (Reliability, Availability and Maintainability) forensic analysts. In cases where the systems are fielded far from the data analysts, communication bandwidth and data sovereignty become important considerations, often imposing constraints on the ability to diagnose and/or characterize system performance. For example, if only a fraction of the sensor and failure data can be transmitted, the front end of the system must apply rules to compress, thin and/or summarize the data. These rules can include anomaly detection, designer-based rules, and information-theoretic methods. As much as possible, the system should be tested against full data and method availability to determine how much information is lost in the summarization process, and how much performance is lost, if any, by restricting the final solution to explainable methods. Finally, at each part of the architecture, the data analytics should have some way to update both training and decisions in real time, to avoid being overwhelmed by the sheer volume of new data coming in. This combined set of restrictions severely constrains the final predictive analytics solution space.
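One simple front-end thinning rule of the kind described above is a deadband (report-by-exception) filter: a sample is transmitted only when it moves far enough from the last transmitted value. The sketch below is illustrative only. The article does not say which rules RPM applied, and the function names, the synthetic signal, and the deadband value are all assumptions. It also measures the reconstruction error against the full data, which is one way to quantify how much information the summarization loses.

```python
def thin_by_deadband(samples, deadband):
    """Keep (index, value) pairs only when the value moves more than
    `deadband` away from the last transmitted value."""
    kept = [(0, samples[0])]
    for i, value in enumerate(samples[1:], start=1):
        if abs(value - kept[-1][1]) > deadband:
            kept.append((i, value))
    return kept


def reconstruct(kept, length):
    """Rebuild the series offsite by holding the last transmitted value."""
    rebuilt = []
    position = 0
    for i in range(length):
        # Advance to the next transmitted sample once its index is reached.
        if position + 1 < len(kept) and kept[position + 1][0] == i:
            position += 1
        rebuilt.append(kept[position][1])
    return rebuilt


# Hypothetical sensor trace: steady state, a ramp, then a new steady state.
signal = [20.0] * 50 + [20.0 + 0.5 * k for k in range(1, 11)] + [25.0] * 40

kept = thin_by_deadband(signal, deadband=1.0)
rebuilt = reconstruct(kept, len(signal))

# Compare against the full data to see what the thinning cost us.
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(f"transmitted {len(kept)} of {len(signal)} samples")
print(f"worst-case reconstruction error: {max_error}")
```

By construction the reconstruction error never exceeds the deadband, so the deadband acts as an explicit trade-off between bandwidth used and information lost.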

To illustrate the aforementioned engineering concerns with machine learning methods, a generic example is presented using a ‘counterfeit vs. real banknote’ dataset from the University of California at Irvine (UCI™) Data Repository.² The dataset of image information is separable in four dimensions (variance, skewness, kurtosis and entropy), but for the purpose of this example, we only allow ourselves two of the dimensions (skewness and entropy), in which the data has a lot of shape and overlap. One could just as easily label the data part failures vs. part non-failures. Building a classifier from this training dataset and then running the original data back through it produces the decision plot shown in Figure 2. On the left side, it can be seen that we can build classifiers that make decisions (“red” or “blue”) when close to training data, but make no decision when far away from the training data and instead label those points as anomalies. The x’s in this plot are incorrect classifications, which are expected in the areas of high conflict (overlapping data), shown on the contour plot on the right as areas with tight contour rings (or high elevations). The light green surrounding area at low elevation on the right marks the region where the classifier knows not to make any decision. Many classifiers will make a decision no matter where the new test data comes from, which can cause incorrect decisions with high reported probability, as Figure 3 illustrates.

Figure 2 : Classifier making decisions close to the training data
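The idea of a classifier that declines to decide far from its training data can be sketched with a density model per class and a minimum-density threshold. The code below is a simplified illustration, not the classifier behind Figure 2. It fits a single axis-aligned Gaussian per class rather than a mixture, it uses synthetic points rather than the banknote data, and the threshold value and all names are assumptions.

```python
import math
import random


def fit_gaussian(points):
    """Fit an axis-aligned 2-D Gaussian (per-dimension mean and variance)."""
    n = len(points)
    means = [sum(p[d] for p in points) / n for d in range(2)]
    variances = [
        sum((p[d] - means[d]) ** 2 for p in points) / n for d in range(2)
    ]
    return means, variances


def log_density(point, model):
    """Log of the fitted Gaussian density at `point`."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(point, means, variances)
    )


def classify(point, models, min_log_density):
    """Return the most likely class, or 'anomaly' when even the best
    class assigns the point a density below the threshold."""
    scores = {label: log_density(point, m) for label, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_log_density else "anomaly"


rng = random.Random(0)
# Synthetic stand-ins for the two classes (NOT the banknote dataset).
red = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
blue = [(rng.gauss(3.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(500)]
models = {"red": fit_gaussian(red), "blue": fit_gaussian(blue)}

MIN_LOG_DENSITY = -8.0  # hypothetical threshold, chosen for illustration

for point in [(0.0, 0.0), (3.0, 1.0), (10.0, 1.0)]:
    bounded = classify(point, models, MIN_LOG_DENSITY)
    unbounded = classify(point, models, float("-inf"))
    print(point, "with threshold:", bounded, "| without:", unbounded)
```

With the threshold in place, the far-away point is labeled an anomaly. With the threshold removed, the same model still picks a class for it, which is the behavior the text warns about.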

If we loosen the restriction on areas of confident decisions, and allow the classifier to make decisions progressively further away from the training data, we are subject to the dangerous effects outlined in Figure 3. This is known as the “Open Set” or “Strangeness” problem. Three cases are presented with the exact same classifier and x-y scale; only the allowed decision region and the scale of the log-likelihood ratios change. The black circles initially bound a region of no decision (a). As the restrictions are loosened from (a) to (b), and then even further from (b) to (c), these areas become strong decision regions where the classifier is allowed to make decisions where it arguably shouldn’t. This is because the relative likelihoods of the classes differ so greatly far out on the tails of these multimodal distributions, even though every class likelihood there is tiny. Also notice the lensing effect in Figure 3c as the red class starts pushing “south” into the bottom circle, far away from any data of either type. This is due to the likelihoods from multiple red Gaussian Distributions focusing relative to the blue in this region. One of the advantages of Gaussian Mixture Models and other models that estimate probability density functions is that they allow for a rational threshold beyond which new points can be called outliers, and no class decision is made. These are just a few of the considerations that Raytheon Predictive Maintenance (RPM) actively tracked while choosing machine learning methods.

Figure 3 : The “Open Set” or “Strangeness” problem. Making decisions far from training data is dangerous. (a) Initial non-decision areas bound by the black circles become decision regions as classifier restrictions are progressively loosened, (b) and (c).
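The tail effect behind the Open Set problem can be reproduced with two one-dimensional Gaussians. This is a minimal sketch with made-up class parameters, not the mixture model from Figure 3. Far from both classes, each density is vanishingly small, yet the log-likelihood ratio between them keeps growing, so an unrestricted classifier reports near-certainty.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


# Hypothetical class models: (mean, standard deviation).
RED = (0.0, 1.0)
BLUE = (2.0, 1.5)


def log_likelihood_ratio(x):
    """log p(x | blue) - log p(x | red); positive values favor blue."""
    return gaussian_log_pdf(x, *BLUE) - gaussian_log_pdf(x, *RED)


def posterior_blue(llr):
    """Posterior probability of blue for equal priors (stable sigmoid)."""
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)


for x in (1.0, 5.0, 20.0, 50.0):
    llr = log_likelihood_ratio(x)
    print(
        f"x={x:5.1f}  log p(x|blue)={gaussian_log_pdf(x, *BLUE):9.1f}  "
        f"LLR={llr:7.1f}  P(blue)={posterior_blue(llr):.6f}"
    )
```

At the largest x the blue log density is far below anything seen in training, but the reported probability of blue is effectively 1. A threshold on the density itself, rather than on the ratio, is what allows such points to be flagged as outliers.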

For Raytheon Predictive Maintenance (RPM), we assessed a bank of Machine Learning methods for suitability against multiple datasets across many programs. In general, datasets contained both continuous time series data and discrete state variables. Not surprisingly, discrete Machine Learning approaches tended to work better than continuous approaches as the number of important discrete states increased. Also, because these were control systems, it was common for there to be extremely repetitive data during normal operation, which skewed results for all of the machine learning methods that depend on density, including tree-based methods, Neural Networks, and unmodified Gaussian Mixture Models. This meant that the assumption that the training data came from a representative distribution was suspect at best. Where possible, data summarization concepts were used not only to model bandwidth restrictions, but also to mitigate varying data densities.
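One way to picture how summarization can mitigate uneven data density is grid quantization, in which repetitive samples collapse into a single occupied cell with a count. The article does not state which summarization method RPM used, so the approach, the function name, the cell size, and the synthetic samples below are all assumptions made for illustration.

```python
from collections import Counter


def summarize(samples, cell_size):
    """Collapse samples into occupied grid cells.

    Returns a dict mapping each occupied cell's center to the number of
    samples that fell into it.
    """
    counts = Counter(
        tuple(round(value / cell_size) for value in sample)
        for sample in samples
    )
    return {
        tuple(index * cell_size for index in cell): n
        for cell, n in counts.items()
    }


# Hypothetical control-system log: 10,000 near-identical steady-state
# samples plus three transient samples.
steady = [(1.00 + 0.001 * (i % 5), 0.50) for i in range(10_000)]
transients = [(4.2, 0.9), (6.8, 1.7), (9.1, 2.4)]

summary = summarize(steady + transients, cell_size=0.5)

print(f"{len(steady) + len(transients)} samples -> {len(summary)} cells")
for center, count in sorted(summary.items()):
    print(center, count)
```

Training on one representative per cell, optionally with a capped weight, keeps the steady-state region from dominating a density-based model, and the same summary is far cheaper to transmit than the raw samples.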

Data visualization and decision justification were given high priority as evaluation criteria in RPM to enable subject matter expert (SME) feedback. Because RPM is intended to analyze complex systems that are designed to work over long periods, there is a body of expertise built up during design and integration that is included in the error analysis. Along with this come engineering questions such as: Are the assumptions in the failure models correct? If not, how do we update them? Is there a variable that is showing correlation to failure that was considered unimportant? Can we find a causality linked to such a variable?

Providing answers to these questions enabled us to not just provide a predicted probability that something was about to go wrong, but also to provide the variables and data instances that influenced the decision. As systems are deployed over longer periods of time, operational costs can increase and automated operational assessment and predictive maintenance become increasingly important. The machine learning models used in RPM are intended to grow with experience and draw upon design expertise in a feedback loop — meeting tomorrow’s challenges while strengthening trust amongst the users.

– Dr. Michael Salpukas

1 Data which may be subject to restrictions or rules by the country of origin

2 Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/datasets/banknote+authentication]. Irvine, CA: University of California, School of Information and Computer Science.