This article covers almost six months of 2021 and describes how we tried to predict failures of submersible pumping equipment. It is unlikely to let you copy our experience directly, but it may point you in the right direction and spare you some of our mistakes.
At the beginning of the year, we were tasked with predicting equipment failures 7, or ideally 14, days before the moment of failure. We were very optimistic and expected to quickly build something brilliant and useful.
First of all, we turned to world experience and looked for anyone who had already done something similar. It turned out that they had, but there was no information about exactly how; some articles described only failed attempts. It became clear that we would have to reinvent the wheel. Still, we believed in success 🙂 and, after a short study of the problem, we agreed to try to build such a model.
We were given data for 2019 (and this was the first mistake, one that slowed the research down considerably). We cleaned the data and identified the key parameters, then built several test models using Random Forest and XGBoost. We were very pleased with the result: the first models showed 76-86% accuracy during training. We were about to open the champagne, but then harsh reality caught up with us: on previously unseen data, the models' results left much to be desired.
After a short discussion, we concluded that we had used too little data for training, so we obtained data for 2020. The next step was to train on 2019 and test the model on 2020. The result was sad: only 30% of failures were predicted, and only on the last day.
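The evaluation lesson here generalizes: with equipment telemetry, a random train/test split leaks future information into training and overstates accuracy, so the honest check is to train on an earlier period and test on a later one, as with 2019 and 2020 above. A minimal pure-Python sketch of such a chronological split (the record layout and field names are illustrative, not our actual schema):

```python
from datetime import datetime

def chronological_split(records, cutoff):
    """Split telemetry records into train (before cutoff) and test (after).

    Each record is assumed to be a dict with a 'timestamp' key. A random
    split would mix 2019 and 2020 readings and overstate accuracy.
    """
    train = [r for r in records if r["timestamp"] < cutoff]
    test = [r for r in records if r["timestamp"] >= cutoff]
    return train, test

records = [
    {"timestamp": datetime(2019, 3, 1), "pressure": 41.2},
    {"timestamp": datetime(2019, 11, 5), "pressure": 39.8},
    {"timestamp": datetime(2020, 2, 14), "pressure": 44.0},
]
train, test = chronological_split(records, datetime(2020, 1, 1))
print(len(train), len(test))  # 2 1
```

The split point is a date, not a row count, so readings from the same pump never straddle the boundary within a single day.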
I must say that even 30% is not a bad result: an experienced engineer looking at telemetry predicts only 10-15% of failures. But to meet the business need, two problems had to be solved:
- Predict more than 50% of failures, and the more, the better. Equipment downtime caused by failures is very expensive for the customer.
- And most importantly: extend the prediction horizon, since many preventive measures cannot be carried out within a single day.
In general, the XGBoost model worked, but it did not solve the business problem well. We put it into trial operation and went off to think some more.
And then came months of experiments with parameters and fine-tuning. We constructed aggregated parameters, added equipment types, created new event markup, removed and added data when training the model, and added a neural network based on Keras. Alas, we have to admit that accuracy on real data began to decline: we were predicting only 5-20% of failures, drifting toward merely human performance.
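For reference, the aggregated parameters mentioned above can be thought of as rolling-window statistics computed over raw telemetry. A stdlib-only sketch, with the window size and feature set chosen for illustration rather than taken from our pipeline:

```python
from statistics import mean, stdev

def rolling_features(values, window):
    """For each position, compute mean/min/max/std over the trailing window.

    Positions with fewer than `window` readings are skipped, mirroring the
    warm-up period a real feature pipeline would have.
    """
    feats = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        feats.append({
            "mean": mean(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "std": stdev(chunk) if window > 1 else 0.0,
        })
    return feats

current = [12.1, 12.3, 12.0, 14.8, 15.1, 12.2]  # e.g. motor current readings
feats = rolling_features(current, window=3)
print(len(feats))       # 4 windows out of 6 readings
print(feats[2]["max"])  # 15.1, the spike enters the third window
```

Such features summarize short-term behavior, but as we found, more of them does not automatically mean better predictions.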
Now I see several problems with this:
- We were initially given an incomplete and incorrect data set, which caused many problems later on. When we started comparing the equipment parameters year by year, it turned out that the data did not even match; the data for 2018, say, did not resemble anything at all. Ideally, the graphs should be similar and aligned along the x-axis, yet the equipment and operating modes had not officially changed.
- The location of the equipment has a very large impact on the results. While rolling the model out to all equipment, we ran into differences in operating modes depending on where the equipment was installed.
- We tried in vain to squeeze more out of XGBoost than it could give us.
Accepting the setback and starting from scratch
After several months of trying to tune the model, we realized we had gone down the wrong path, so we decided to return to research: filter the data and build a new model from scratch.
We analyzed everything that had been done and noticed that at some point we had begun aggregating time intervals of 3, 8, and 12 hours and making forecasts based on them. This gave good results for the last day, but as the horizon widened, accuracy dropped significantly. We therefore decided to move in two directions at once:
- XGBoost as a regression model.
- TimeSeriesForestClassifier with clustering of time segments.
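For the regression branch, the model needs a continuous target rather than a binary label. A common construction, consistent with the failure-trend curve we plotted, is time-to-failure: for each reading, the number of days remaining until the next recorded failure. A sketch under that assumption (data layout is hypothetical):

```python
import bisect
from datetime import datetime

def time_to_failure(timestamps, failure_times):
    """Map each telemetry timestamp to days until the next known failure.

    `failure_times` must be sorted ascending. Readings after the last
    failure get None, since no target can be assigned to them.
    """
    targets = []
    for ts in timestamps:
        i = bisect.bisect_left(failure_times, ts)
        if i == len(failure_times):
            targets.append(None)
        else:
            targets.append((failure_times[i] - ts).total_seconds() / 86400)
    return targets

failures = [datetime(2020, 3, 10)]
readings = [datetime(2020, 3, 1), datetime(2020, 3, 8), datetime(2020, 3, 12)]
print(time_to_failure(readings, failures))  # [9.0, 2.0, None]
```

A regressor trained on this target should, in theory, produce a smooth downward trend as a failure approaches; as noted next, in our experiments the decline was not smooth at all.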
Following the results of the experiments, we abandoned this idea. In general, it is viable, and if time allows we will definitely return to this combination, as it shows promising and interesting results. But it has a fatal drawback: there is no smooth downward trend; the decline is avalanche-like.
The graph shows the failure trend: the lower the value, the closer the failure event.
In the end, the "completely new approach" we chose for the fresh start was, in fact, a return to the trees we had rejected in the very first month of the project, but with a few nuances:
- A completely different quality of data: during the experiments, we learned to clean the data at a qualitatively new level.
- A more correct definition of the point of failure: what was initially presented to us as the point of failure turned out, once we studied the data, to be a very conditional date. In reality, the actual failure could differ from it by as much as a week.
- We now understood that the equipment has several radically different operating modes, and that the models have to be adjusted to those modes.
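To illustrate the relabeling around the failure point, here is a minimal sketch of horizon labeling: every reading within N days before a failure date is marked positive, and a tolerance widens the window to absorb the fact that our recorded failure dates could be off by up to a week. The exact scheme we used differed in its details; parameter names here are illustrative.

```python
from datetime import datetime, timedelta

def label_readings(timestamps, failure_time, horizon_days=7, tolerance_days=0):
    """Label 1 for readings inside the pre-failure horizon, else 0.

    `tolerance_days` widens the window backwards to absorb uncertainty in
    the recorded failure date.
    """
    window_start = failure_time - timedelta(days=horizon_days + tolerance_days)
    return [1 if window_start <= ts <= failure_time else 0 for ts in timestamps]

failure = datetime(2020, 6, 20)
readings = [datetime(2020, 6, 1), datetime(2020, 6, 15), datetime(2020, 6, 19)]
print(label_readings(readings, failure))        # [0, 1, 1]
print(label_readings(readings, failure, 7, 7))  # [0, 1, 1]  (June 1 is still outside)
```

Widening the window trades precision for recall: more pre-failure readings become positives, but more healthy readings get mislabeled too.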
As a result, the first builds and tests showed that we were now predicting almost 70% of failures on the last day and about 30% five days out.
The red dot marks an actual failure. Black squares mark equipment downtime that did not lead to failure.
We have since slightly improved the forecast quality, raising it to 78% of real failures on the last day. We are experimenting with constructing aggregated parameters, which in theory should work but in practice often worsen the results. Most importantly, work is underway on smoothing out peaks and false positives. I really hope we will be able to raise the share of predicted cases to 85% in the near future.
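One simple way to suppress single-point spikes, given here as an illustration rather than our actual postprocessing, is to debounce the alarm: only flag a failure when the model's score stays above a threshold for k consecutive readings.

```python
def debounce_alarms(scores, threshold=0.5, k=3):
    """Return 1 where `k` consecutive scores meet the threshold, else 0.

    A lone spike above the threshold is treated as noise; only a
    sustained run is treated as a real alarm.
    """
    alarms = [0] * len(scores)
    run = 0
    for i, s in enumerate(scores):
        run = run + 1 if s >= threshold else 0
        if run >= k:
            alarms[i] = 1
    return alarms

scores = [0.2, 0.9, 0.3, 0.6, 0.7, 0.8, 0.9]
print(debounce_alarms(scores))  # [0, 0, 0, 0, 0, 1, 1]
```

The price is latency: an alarm fires k readings later than the raw score would, which matters less when the goal is a multi-day horizon.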
If we had chosen this path from the start, would we have reached the result faster? Most likely, yes; we would have built a hundred fewer models. But the most important thing turned out to be not the number or variety of models, but the cleanliness of the data and the understanding of the processes that came only from the experiments with XGBoost.