1 Introduction

The rise of Industry 4.0 has brought about a major transformation in manufacturing [1]. Triggered by the rapid advancement of digital technologies and the falling cost of those technologies, Industry 4.0 promises to deliver more flexibility, higher productivity, and lower costs to the manufacturing industry. As a result, companies are devoting research effort and budget to overcoming the barriers between their current status and the envisioned future industry [1].

Current approaches to industrial maintenance range from post-failure maintenance to preventive and predictive maintenance. Post-failure maintenance is the costliest approach, since devices are often broken and/or cause service interruptions until they are repaired. Predictive maintenance is the cheapest, avoiding machinery damage and long service interruptions. However, it is also the most difficult approach to implement, since it requires extensive knowledge about the behavior of the machine at hand.

Predictive maintenance is one of the Industry 4.0 research lines that has attracted the most interest. It is the maintenance process in which faults and breakdowns are predicted before they occur, allowing maintenance to be performed just when it is needed. Thus, industries avoid unscheduled shutdowns and machinery damage, and optimize maintenance costs. Compared with traditional maintenance, which is scheduled at regular intervals, predictive maintenance is performed only when required, thus reducing downtime and the costs associated with unnecessary replacements. Furthermore, it constitutes a substantial improvement over post-failure maintenance, where unscheduled shutdowns cause major service disruptions and even machinery damage. The first step towards predictive maintenance is to know what is happening in the industrial machinery continuously, precisely, and accurately. Using this information, fault detection and diagnosis can be carried out [2].

Unfortunately, there is a general lack of empirical research within the field of maintenance. In [3], the authors report that, out of thousands of papers related to maintenance, fewer than 90 presented empirical real-world evidence. Therefore, empirical approaches to fault detection and maintenance applied to real industrial cases with real data are rather scarce.

In this paper, we propose a novel machine learning architecture that combines clustering and autoencoders to support early failure detection, the first step towards predictive maintenance. After a careful review of the current state of the art, we tested different combinations of the most appropriate machine learning algorithms and techniques to find the best solution, and evaluated it empirically. One of the main advantages of our approach is that, instead of relying on a single machine learning technique, it combines several techniques to achieve the highest performance. Our main contributions are:

  • The proposed solution is especially useful in scenarios where the different manufacturing stages cannot be clearly identified using time windows or when the stages can vary from one run to another.

  • The proposed solution can be easily tailored to industrial processes other than our application to gas turbines.

  • The proposed solution can detect faults that not only present abnormal values above or below safety margins, but also those that affect the normal relationship between magnitudes without exceeding the operation limits, allowing for timely corrective actions.

  • Because the architecture relies on automatic semi-supervised learning, it does not require manual labelling and, consequently, avoids human bias.

  • Finally, it does not require faulty data, so it can be applied to new machine models and processes. Moreover, it avoids having to generate faulty data, which can be very costly in an industrial process.

Therefore, we believe that this paper will be a useful reference for researchers and practitioners facing the problem of implementing machine learning solutions for preventive and predictive maintenance in manufacturing. Finally, it should be pointed out that all the data used throughout the solution were obtained from a real gas turbine running in a real environment. By using real data, we address the lack of empirical evidence reported in [3].

The remainder of the paper is structured as follows: Section 2 presents the related work in the area of predictive maintenance and AI applications. Section 3 introduces the running case study on gas turbines and the problems it poses. Section 4 presents our proposed solution for fault detection and predictive maintenance in industrial processes, including the clustering and machine learning techniques it relies on. Section 5 discusses the results and limitations of the approach. Finally, Section 6 describes the conclusions and sketches future work.

2 Related work

In this section, we review the state-of-the-art literature in the research field and highlight the main advantages that our proposal provides in comparison with existing works.

Artificial Intelligence (AI) has been widely used in industry in recent years, for example to transform standard manufacturing into intelligent manufacturing [4] and in smart manufacturing. It has also been used for denoising data [5, 6] and for capturing latent factors from the multidimensional space of a distribution [7]. In conjunction with other techniques, such as data analysis and Big Data, AI has contributed to improving data visualization, dashboard handling, and optimal control in decision-making [8].

As mentioned in the previous section, fault detection and diagnosis (FDD) is a crucial part of industrial processes and, therefore, different approaches have been presented in the literature. For instance, in [9] the authors present a survey of nature-inspired optimization algorithms (NIOAs), which we use as a basis for optimizing the algorithms in the first stage of the proposed architecture (the clustering stage). On the other hand, in [10] a method based on convolutional neural networks and discrete wavelets is presented. However, this approach focuses on capturing health conditions for specific turbine problems. Furthermore, current studies have difficulty dealing with industrial processes whose stages vary over time and do not fit specific time frames. To tackle this problem, we present a more generic solution that focuses on the health of the industrial process, rather than of the turbine, and handles stages of the industrial process that are not tied to fixed time frames.

Moreover, optimizing the industrial process also increases the life expectancy of the turbine, since collateral damage is avoided.

Other techniques used for fault detection in industrial processes are based on SVMs (Support Vector Machines) [11]. Nevertheless, this approach uses supervised learning, which requires all data to be labelled and faulty tuples to be available. A tuple contains the value of each sensor reading at a specific moment. A correct (normal) tuple belongs to a normal, correctly working industrial process, whereas an erroneous (faulty) tuple contains sensor values that correspond to abnormal or incorrect behaviour of the industrial process. Labelled faulty tuples can be very difficult to obtain if no such data are available, since obtaining them implies provoking an intentional malfunction of the industrial process, which could even destroy the industrial machinery and, consequently, be very expensive. In the presented architecture, we use learning that needs no labels and relies only on data from normal operation, which makes data capture easy and less costly.

Focusing on anomaly detection and predictive maintenance, the literature shows a close relationship with Internet of Things (IoT) techniques [12, 13]. More specifically, autoencoders have been used in industry for anomaly detection on railways [14]. However, in that approach the authors obtain the different stages of the problem by dividing the interval between the door opening and closing timestamps into 5 equal bins (using time as the dividing element). Therefore, although it achieves relevant results, this solution is too tailored to the railway problem to be applied directly to other cases such as ours: the duration of the different stages of the turbine's operating mode can vary, which makes any sort of temporal division difficult to apply. In addition, in [? ], the authors propose an autoencoder-based dynamic threshold to reduce the false alarm rate in anomaly detection for steam turbines. However, this approach

Autoencoders (described in more detail in Section 4.2) and semi-supervised learning are widely used in fault detection. In [15, 16, 17], the authors propose autoencoder-based FDD for the Tennessee Eastman Process. Their approach is based on supervised learning (the data from the process are labelled as “correct” or “faulty”), where the input data always correspond to normal operation [18]. Moreover, in [19] the authors propose a novel deep transfer learning approach based on a sparse autoencoder for fault diagnosis. In [20], a semi-supervised robust projective and discriminative dictionary learning method for industrial process monitoring is presented. [21] introduces zero-shot learning into the industrial field, tackling the zero-sample fault diagnosis task with a fault description based on the attribute transfer method. Furthermore, in [22], the authors propose a distributed sensor-fault detection and diagnosis system based on a machine learning block with an autoencoder. However, their approach is based on the injection of five specific errors (drift, bias, precision degradation, spike, and stuck faults), whilst our approach is more generic and can detect any deviation regardless of its cause. In [23], the authors propose a predictive railway maintenance strategy based on a hybrid neural architecture with a long short-term memory (LSTM) autoencoder and report results on a test dataset, whereas our approach reports results both on a test dataset and on a synthetic one. Finally, in [24] a transfer dictionary learning method for cross-domain multi-mode process monitoring and fault isolation is described.

It is worth highlighting that this approach is particularly interesting because it allows a binary classification problem to be solved correctly when only correct data are available. If supervised learning techniques are used instead, the input data must be labelled and malfunctioning data must be available. If malfunctioning data are not provided, they have to be created, and quite often faulty data cannot be generated, since doing so implies forcing a malfunctioning cycle in very costly processes, which can simply become unaffordable. Moreover, every stage can have different faulty data and, consequently, every stage must be able to discriminate its own correct and faulty tuples. For instance, in [25] the authors propose a model for FDD in hydraulic machinery that integrates LSTM autoencoder detectors and diagnostic classifiers, with promising results. However, it uses a specific dataset for failure classification, and the failures specified in that dataset are not related to the specific machinery. This can lead to faulty data being labelled as correct or vice versa, considering that some specific factors (for instance, ambient factors) cannot be extrapolated to a new distribution. Finally, in [26] the authors propose an FDD model based on convolutional neural networks. Nevertheless, they create specific failures through software and, consequently, use a classification approach. In our case, no such software is available and no domain knowledge is provided; thus, our model must extract its insights from the distribution of normal operation data. The comparison between the state of the art and our approach is discussed in Section 5 (results, discussion and limitations).

As has been shown, there is extensive literature on malfunctioning machinery, in which autoencoders are prime candidates for situations where malfunction data are not available. However, existing approaches have not dealt with stage-based processes in which each stage has its own characteristics and an undefined duration, combined with the unavailability of malfunction data, as in the case of gas turbines.

Compared with the state of the art, we present a comprehensive and novel architecture that tackles these challenges, based on different machine learning techniques and developed from scratch using data from a real industrial process. One of the advantages of our approach is that it also provides a guide to finding the different stages of the industrial process (from a mathematical rather than a functional point of view, since the latter could introduce human bias into the results) and a sliding-window system for reporting faults. Additionally, the model is based on semi-supervised learning, which allows it to be trained with only “correct” data. This avoids the need for “faulty” data, since forcing a malfunction on a gas turbine is prohibitively expensive.

To illustrate our approach, the next section introduces a running case study on gas turbines and describes its underlying problems, which require a more complex approach than a single machine learning solution.

3 Running case study: gas turbines

Industrial processes have grown dramatically in complexity over the last decades. Advances in automation and sensor technology have opened a vast field of capabilities for controlling industrial processes.

In this case study, we present a gas turbine used for electric energy co-generation. The gas turbine is the prime mover for the generator: its input is thermal energy from burning gas and its output is mechanical power that drives the generator. The generator then converts this mechanical energy into electrical energy, which is supplied as a product at its terminals.

The industrial process that governs the turbine is very complex. The turbine model used in the case study incorporates more than 100 sensors, which provide information about every part of the device. The system measures 31 physical values, with backup sensors and multiple measuring points distributed along the turbine and the generator, as shown in Table 1. Notably, temperature is measured at 16 different points in the exhaust. In addition, small variations in the relationship between the different magnitudes can lead to a deviation from optimal operation and, consequently, a loss of production.

The sensor network of the turbine is built on an IoT architecture, in which every sensor emits its value together with the associated timestamp. However, sensor emissions are not synchronized with one another. To solve this, every second the ML model receives as input a tuple containing the last value received from each sensor. In other words, one tuple of data is processed in real time every second.
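To make the resampling step concrete, the following minimal Python sketch builds one tuple per second by holding the last value received from each sensor. The reading format `(timestamp, sensor_id, value)`, the function name `build_snapshots`, and the sensor names are hypothetical; the text does not specify how sensors that have not reported yet are handled, so they are simply represented by `None` here.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A raw reading is (timestamp in seconds, sensor id, value); sensors emit asynchronously.
Reading = Tuple[float, str, float]


def build_snapshots(readings: Sequence[Reading], sensor_ids: Sequence[str],
                    start: int, end: int) -> List[Tuple[int, List[Optional[float]]]]:
    """Build one tuple per second in [start, end) with the last value seen per sensor.

    Sensors that have not reported anything yet are represented by None.
    """
    ordered = sorted(readings, key=lambda r: r[0])  # replay readings in time order
    last: Dict[str, float] = {}
    snapshots = []
    idx = 0
    for second in range(start, end):
        # Absorb every reading received up to (and including) this second.
        while idx < len(ordered) and ordered[idx][0] <= second:
            _, sensor, value = ordered[idx]
            last[sensor] = value
            idx += 1
        snapshots.append((second, [last.get(s) for s in sensor_ids]))
    return snapshots


# Hypothetical sensor names and readings, for illustration only.
readings = [(0.2, "active_load", 0.0), (0.7, "exhaust_temp_1", 15.0),
            (1.4, "active_load", 2.5), (2.9, "exhaust_temp_1", 40.0)]
snapshots = build_snapshots(readings, ["active_load", "exhaust_temp_1"], start=1, end=4)
```

Holding the last value keeps every tuple complete even when a sensor reports less often than once per second, at the cost of reusing slightly stale values.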

Table 1 Measured variables of the turbine-generator

Moreover, the progressive automation of industrial processes implies a decrease in human intervention. This is aggravated by the retirement of personnel with domain knowledge and the arrival of new personnel who lack the knowledge associated with the operation of the industrial process.

As a result, the information provided by the prevention system must be:

  1. As fast as possible: If a deviation from the normal process is detected, it must be communicated as soon as possible so that preventive or corrective actions are taken immediately, avoiding potential damage to the machinery and service disruptions.

  2. Provided with a suitable abstraction layer: If the information of the FDD system is delivered through an appropriate abstraction layer, users are more likely to act properly, because they are not given excessively low-level data that they may not understand.

  3. The running stages during execution are not always the same, which makes it very difficult to apply a classification technique directly: Because the length of every phase of the industrial process can vary, using supervised learning and labelling the tuples can be tricky. Consequently, an automatic labelling process must be created to tackle this issue.

Therefore, our goal is to characterize the normal operating state and to notify the user when the process is continuously working in an anomalous way.

However, to achieve these goals, a few problems posed by the gas turbine process must be properly tackled:

  1. The number of stages obtained from expert knowledge can introduce human bias: Since the authors do not have domain knowledge about the possible number of stages of the turbine, the experts were asked for a brief summary of how many stages we should be able to detect, and which main characteristics could help to discriminate between them. According to the experts, the industrial process consists of 4 stages, related to its main dimension (Active Load). However, this estimate is based on a physical point of view: the experts focus only on the main output, so only one dimension has been taken into account. This implies that the number of stages has not been obtained considering the whole multidimensional space (a mathematical point of view) and, therefore, human bias may have been introduced.

  2. Correct and faulty data are different for every stage: Each stage works differently from the others. Consequently, data that are correct in one stage can be faulty in another, and vice versa. Therefore, our architecture must be able to discriminate between faulty and correct data in every stage.

  3. The industrial process can show punctual (isolated) anomalies during a normal working cycle: Punctual anomalies are part of a normal working process and should not be communicated to the turbine’s user. Thus, our architecture must focus on detecting continuous anomalous operation: punctual anomalies must be dampened, and only continuous anomalous operation (in other words, a continuous sequence of punctual anomalies) must be communicated.

4 Proposed architecture

The proposed architecture is made up of 3 components; the overall architecture of the system is represented in Fig. 1.

The first component automatically determines the number and characteristics of the stages of the industrial process (and consequently solves problem 1), the second one tests the correctness of each received tuple (thus solving problem 2), and the third one dampens false positives (thus solving problem 3). In each subsection, we analyze one of the components in detail and explain how each parameter has been obtained. At the end of each subsection, a table summarizes the main parameters of that component of the proposed architecture.

Fig. 1 Proposed three-stage architecture
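The way the three components are chained for every incoming tuple can be sketched as follows. This is a hypothetical Python skeleton, not the authors' implementation: the clustering model, the per-stage reconstruction errors and the boundaries are passed in as plain callables and numbers, and component 3 is simplified to a count-based window of recent verdicts (with one tuple per second, a window of N tuples spans N seconds). The toy stand-ins at the end only exercise the control flow.

```python
from collections import deque
from typing import Callable, Dict, Sequence, Tuple


class ThreeStageDetector:
    """Hypothetical glue code chaining the three components of the architecture."""

    def __init__(self, assign_stage: Callable[[Sequence[float]], int],
                 reconstruction_error: Dict[int, Callable[[Sequence[float]], float]],
                 boundaries: Dict[int, float], window_size: int, min_faulty: int):
        self.assign_stage = assign_stage                  # component 1: clustering model
        self.reconstruction_error = reconstruction_error  # component 2: one autoencoder per stage
        self.boundaries = boundaries                      # per-stage acceptance boundary
        self.window = deque(maxlen=window_size)           # component 3: FIFO of recent verdicts
        self.min_faulty = min_faulty

    def process(self, x: Sequence[float]) -> Tuple[int, float, bool]:
        stage = self.assign_stage(x)
        error = self.reconstruction_error[stage](x)
        self.window.append(error > self.boundaries[stage])  # tuple-level verdict
        alarm = sum(self.window) >= self.min_faulty           # isolated faults are dampened
        return stage, error, alarm


# Toy stand-ins: two "stages" split on the first feature, and errors that grow
# when the second feature departs from the value expected in each stage.
detector = ThreeStageDetector(
    assign_stage=lambda x: 0 if x[0] < 0.5 else 1,
    reconstruction_error={0: lambda x: abs(x[1]), 1: lambda x: abs(x[1] - 1.0)},
    boundaries={0: 0.2, 1: 0.2},
    window_size=5,
    min_faulty=4,
)
stream = [[0.0, 0.0]] * 3 + [[0.0, 0.9]] + [[0.0, 0.0]] * 3 + [[1.0, 0.0]] * 5
alarms = [detector.process(x)[2] for x in stream]
```

In the toy stream, the isolated fault does not raise an alarm, whereas the sustained one does once enough faulty verdicts have accumulated in the window.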

4.1 Dealing with problem 1: detecting stages

To deal with problem 1, the number of clusters must be detected correctly. The information provided by the experts indicated that the turbine has 4 states, based on its most important parameter: the supply of electrical power to the grid (Active Load). According to this information, the stages are:

  • Stage 1. Idle: The turbine is at rest. It does not provide electrical power to the grid.

  • Stage 2. Loading: The turbine starts its loading process, trying to reach the nominal operating values. During this stage, the active power supply grows continuously.

  • Stage 3. Loaded: The turbine has finished loading and is operating at its nominal values. During this stage, the turbine provides the maximum active power to the grid.

  • Stage 4. Stopping: The turbine has received the stop command and begins the stopping process. During this time, it reduces the electrical power supplied until the turbine returns to stage 1, at which point it is ready to restart the cycle.

In a plot of active load versus time, the four expected stages should look approximately like Fig. 2.

Fig. 2 Ideal working process from an approach based on classification

To determine the most suitable option for detecting the stages that make up the operating process of the turbine, two approaches have been compared: a traditional approach focused on classification and knowledge provided by domain experts, and a novel, strictly data-driven approach focused on clustering. Due to space restrictions, only a summary of the results of the traditional approach is shown: the algorithm works properly for stable phases (phase 1, Idle, and phase 3, Full load, with 100% accuracy), but unfortunately it performs worse for phase 2, Increase of load, and phase 4, Decrease of load, with an accuracy of 60% and 50%, respectively, leaving room for improvement. Moreover, this approach presents some problems: every tuple must be labelled for correct training. If such labelling is not provided with the data, a labelling operation based on expert knowledge is necessary, which is not always easy to perform with sufficient precision. Furthermore, if the labelling is manual, it can have considerable costs depending on the number of tuples to be labelled.

Consequently, the authors focus on a novel clustering-based approach. Clustering locates neighbouring tuples in the multidimensional space and assigns them the same label, which greatly facilitates the labelling task.

Clustering is an unsupervised machine learning technique that consists of grouping similar entities together [27]. It makes use of statistical, distance-based, or density-based methods. The goal of this technique is to find similarities between data points and to group similar data points together. Grouping similar elements helps to discriminate between the different groups and to profile the attributes of each group. Since, unlike techniques such as classification, clustering does not require the groups to be labelled (it determines its own groups and labels), the process is easier to automate and no domain knowledge is required. Moreover, human bias is avoided, because clustering algorithms are strictly mathematical and the approach is purely data driven.

Finally, it should be pointed out that, whereas in classification it is very easy to establish the accuracy of a model (there is a predicted label and a real label, so different models can be compared with different metrics) [28], in clustering the assessment is fuzzier, because there is no “real label” to predict. To determine which clustering model is the best, we use the silhouette coefficient [29]. This coefficient guides the choice of model by focusing on:

  • The density of each of the clusters found (cohesion)

  • The distance between the different clusters (separation)
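Both criteria are combined in the usual definition of the silhouette coefficient, reproduced below for reference. The notation is ours: \(C_I\) is the cluster containing tuple \(x_i\), \(d(\cdot,\cdot)\) is the distance used by the clustering (typically Euclidean), and \(N\) is the number of tuples; \(a(i)\) measures cohesion and \(b(i)\) separation.

```latex
\begin{aligned}
a(i) &= \frac{1}{|C_I| - 1} \sum_{j \in C_I,\; j \neq i} d(x_i, x_j), \\
b(i) &= \min_{J \neq I} \; \frac{1}{|C_J|} \sum_{j \in C_J} d(x_i, x_j), \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
S = \frac{1}{N} \sum_{i=1}^{N} s(i) \in [-1, 1].
\end{aligned}
```

By convention, \(s(i) = 0\) when \(x_i\) is the only member of its cluster. Values of \(S\) close to 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned groups.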

To find a suitable clustering solution for the system, several clustering algorithms have been tested on our model, using the silhouette coefficient as the suitability metric.

The clustering algorithms tested were:

  • Kmeans [30]

  • DBSCAN (Density-based spatial clustering of applications with noise) [31]

  • HAC (Hierarchical Agglomerative Clustering) [32]

  • GMM (Gaussian Mixture Model) [33]
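A minimal comparison of the four candidates by silhouette coefficient could look like the following sketch, which uses scikit-learn. The synthetic 31-dimensional data, the fixed hyperparameters (for example `eps=8.0` for DBSCAN) and the helper name `compare_clusterings` are assumptions for illustration; in the proposed architecture, the hyperparameters are tuned with PSO instead of being fixed by hand.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture


def compare_clusterings(X: np.ndarray, n_clusters: int, eps: float) -> dict:
    """Fit the four candidate algorithms on X and return the silhouette of each partition."""
    labelings = {
        "KMeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X),
        "DBSCAN": DBSCAN(eps=eps, min_samples=5).fit_predict(X),
        "HAC": AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X),
        "GMM": GaussianMixture(n_components=n_clusters, n_init=3, random_state=0).fit(X).predict(X),
    }
    scores = {}
    for name, labels in labelings.items():
        n_labels = len(np.unique(labels))  # DBSCAN's noise label (-1) counts as one group here
        # The silhouette is only defined for 2 <= n_labels <= n_samples - 1.
        if 2 <= n_labels <= len(X) - 1:
            scores[name] = float(silhouette_score(X, labels))
        else:
            scores[name] = float("nan")
    return scores


# Synthetic stand-in for the turbine data: 6 groups of 31-dimensional tuples.
X, _ = make_blobs(n_samples=600, n_features=31, centers=6, random_state=0)
scores = compare_clusterings(X, n_clusters=6, eps=8.0)
```

The algorithm with the highest score is the best candidate under this metric; in practice, as the text notes, other properties (such as the ability to assign new tuples) also matter.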

Every algorithm considered has different parameters that must be tuned to find its most suitable configuration for stage detection. To simplify this task, the Particle Swarm Optimization (PSO) algorithm [34] has been used. The idea of PSO is to emulate the social behavior of flocks of birds and schools of fish by initializing a set of candidate solutions and searching for an optimal one. Particles are scattered around the search space and move through it to find the position of the optimal solution. Each particle represents a candidate solution, and its movement is affected in a two-fold manner: (1) its cognitive desire to search individually, and (2) the collective action of the group or its neighbors [35]. Thanks to PSO, we can obtain optimal or near-optimal hyperparameter values for each clustering algorithm and, therefore, focus on the results. The results of the clustering approach can be seen in Fig. 3.
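The velocity and position update behind this description can be sketched in a few lines of NumPy. This is the common global-best PSO variant with an inertia weight `w`; the coefficient values (`w=0.7`, `c1=c2=1.5`), the swarm size and the function name `pso_minimize` are assumptions, not values reported for this architecture. `c1` weights the cognitive pull towards each particle's own best position and `c2` the social pull towards the swarm's best.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with an inertia weight; returns (best_position, best_value)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # particle positions
    v = np.zeros_like(x)                                     # particle velocities
    pbest = x.copy()                                         # each particle's best position
    pbest_val = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()                   # swarm (global) best position
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + cognitive pull (own best) + social pull (swarm best).
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)                     # stay inside the search space
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())


# Usage on a toy objective with a known minimum at (1, -2).
target = np.array([1.0, -2.0])
best_pos, best_val = pso_minimize(lambda p: float(np.sum((p - target) ** 2)),
                                  lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

To tune a clustering algorithm, the objective would instead map a particle position to hyperparameters (rounding integer ones, such as the number of components) and return the negative silhouette coefficient, so that minimizing it maximizes the silhouette; this mapping is likewise an assumption.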

Fig. 3 Clustering results on stages detection. Timestamp shows the number of seconds from the beginning of the industrial process

After running the different clustering algorithms with PSO to maximize the silhouette coefficient, we reached the following conclusions:

  • Most of the algorithms (GMM, KMeans, HAC) partition the space in almost the same way, differing only in a few punctual tuples. These 3 clustering algorithms find 6 different stages in the industrial process (DBSCAN creates too many clusters).

  • HAC produces an initial partition of the multidimensional space but, due to the nature of the algorithm, additional tuples cannot be assigned to clusters once the space has been partitioned. KMeans finds the same number of clusters, but some of them contain very few points. Thus, both options are discarded and GMM is the chosen clustering algorithm.
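The practical difference between HAC and GMM for a streaming system can be illustrated with scikit-learn: a fitted `GaussianMixture` can assign each new tuple to a stage with `predict`, whereas `AgglomerativeClustering` only labels the data it was fitted on. The synthetic data and the helper names `assign_stage` and `stage_probabilities` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture

# Synthetic stand-in: 6 groups ("stages") of 31-dimensional tuples.
rng = np.random.default_rng(0)
centers = rng.uniform(-10.0, 10.0, size=(6, 31))
X_train = np.vstack([c + rng.normal(0.0, 1.0, size=(200, 31)) for c in centers])

gmm = GaussianMixture(n_components=6, n_init=3, random_state=0).fit(X_train)


def assign_stage(x: np.ndarray) -> int:
    """Stage (GMM component) of a single tuple received after training."""
    return int(gmm.predict(np.asarray(x, float).reshape(1, -1))[0])


def stage_probabilities(x: np.ndarray) -> np.ndarray:
    """Soft membership of the tuple in each stage."""
    return gmm.predict_proba(np.asarray(x, float).reshape(1, -1))[0]


# scikit-learn's AgglomerativeClustering only labels the samples it was fitted on
# (fit_predict); it offers no predict method for tuples that arrive later.
hac_can_label_new_tuples = hasattr(AgglomerativeClustering(n_clusters=6), "predict")
```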

From this space-partition we can extract more insights:

  • From a mathematical point of view, the turbine has 6 stages. These are similar to those a human would identify from a mechanical point of view, but an additional stage is detected during the start of the turbine (the stage in red), probably due to the preparation of the different elements of the turbine before it starts supplying active load.

  • The last stage (in black) is actually different from the first one. This is logical since, from a purely mathematical point of view, the sensors keep sending values that are completely different from those of the idle stage. Take the temperature sensors as an example: in the idle stage they emit values close to 0, but once the turbine has stopped supplying power to the grid, even though the active load is 0, the temperature values remain high because the turbine is cooling down. After sufficient time, the temperature values return to 0 and this new stage would then overlap with the initial start-up stage.

These conclusions support avoiding human bias by partitioning the stages from a mathematical point of view. In summary, GMM is the algorithm chosen to partition the space of the data provided.

To sum up, Table 2 shows the information related to the components and to the detection of the number of stages in the proposed architecture.

Table 2 Main elements of the clustering component

4.2 Dealing with problem 2: fault data detection

After solving the labelling problem, we still need to determine which parts of this multidimensional space correspond to normal operation and which are anomalies that occur when the gas turbine malfunctions. For this purpose, we use an approach based on autoencoders and semi-supervised learning. An autoencoder is a basic artificial neural network that consists of:

  • An input layer, with N inputs.

  • A hidden layer with Y neurons, where Y < N.

  • An output layer, with N outputs.

The structure of an autoencoder is shown in Fig. 4.

Fig. 4 Structure of an autoencoder

The autoencoder consists of two stages: an encoding stage (from the input to the hidden layer) and a decoding stage (from the hidden to the output layer). The function of the autoencoder is to compress the input into a latent-space representation and then to reconstruct the output from this representation, as shown in Fig. 5.

Fig. 5 Working process of an autoencoder

This means that, if the autoencoder has properly captured the latent space representation, the element reconstructed from that representation will be nearly identical to the input element. If the autoencoder is trained with a set of elements that share a common latent space representation (for instance, all the data tuples belonging to one stage of an industrial process), every such tuple will be reconstructed after passing through the autoencoder, and the output of the output layer will be very similar to the input tuple.

From a mathematical point of view, we can treat the autoencoder as:

  • Encoder: The part that compresses the input into the latent-space representation. It can be represented as \(h = f(x)\).

  • Decoder: The part that reconstructs the input from the latent-space representation. It can be represented as \(i = g(h)\).

To sum up, the general function of the autoencoder can be represented as \(i = g(f(x))\), where \(i\) should be very similar to \(x\). Given a dataset \(D = \{x_1, x_2, x_3, \ldots, x_N\}\) of \(N\) tuples, where each tuple lies in a multidimensional space of \(d\) dimensions (\(x_k \in \mathbb{R}^d\)), the autoencoder obtains for each \(x_k\) a reconstruction \(i_k\) such that \(x_k \simeq i_k\).

As previously mentioned, the autoencoder tries to reconstruct the input so that the output is as close to the input as possible. Therefore, if we compute the difference between input and output for every tuple, the result will be close to 0 for a correct tuple (Input − Output \(\simeq 0\)), whereas for a faulty tuple its absolute value will be much greater than 0 (|Input − Output| \(\gg 0\)), because the reconstruction rules that the autoencoder has learnt are only valid for correct tuples; when they are applied to a faulty tuple, the reconstruction error grows. In other words, the autoencoder learns “the correct working process” of the industrial process if it is trained with “correct tuples”. Therefore, a correct tuple will have a reconstruction error close to 0, and a faulty tuple will have a reconstruction error much greater than 0, regardless of the problem that is generating the erroneous tuple.
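The per-tuple reconstruction error implied by this reasoning can be written explicitly. The expression below assumes the error is measured as the root mean square of the input-output differences over the \(d = 31\) sensor dimensions, matching the RMSE criterion used for training; the symbols \(e(x)\) and \(d\) are introduced here for illustration.

```latex
e(x) = \sqrt{\frac{1}{d} \sum_{j=1}^{d} \bigl(x_j - i_j\bigr)^2},
\qquad i = g\bigl(f(x)\bigr),
\qquad
\begin{cases}
e(x) \simeq 0 & \text{for a correct tuple,} \\
e(x) \gg 0 & \text{for a faulty tuple.}
\end{cases}
```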

Moreover, training is based on the Root Mean Squared Error (RMSE) [36], since this metric helps to polarize the values of correct and faulty data. It must be pointed out that a single autoencoder is not sufficient: an autoencoder can adequately discern the ideal tuples that occupy one region of the space, but each turbine stage has a different region of the space with ideal operation. Therefore, the number of stages that make up the industrial process is first detected automatically by means of a clustering algorithm. Then, once the number of clusters is known, an autoencoder is trained for each cluster, focused on detecting which part of the multidimensional space is correct for that specific cluster. Consequently, there is one autoencoder for each stage of the industrial process; in our particular case of gas turbines, there are 6 autoencoders, as described in Section 4.1. Furthermore, the number of neurons in the input and output layers is determined by the number of dimensions of the data (31 neurons). For the hidden layer, we selected two thirds of the number of neurons in the input layer, a choice that has been used in other autoencoder structures in the literature, for example in [37].
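A minimal sketch of the per-stage autoencoders, assuming scikit-learn's `MLPRegressor` trained to reproduce its own input (31 inputs, one hidden layer with two thirds of that size, 31 outputs). The `tanh` activation, the rounding of two thirds of 31 to 21 hidden neurons, the training settings and the synthetic data are assumptions; `MLPRegressor` minimizes the squared error, which has the same minimizer as the RMSE mentioned above. The helper names are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

N_INPUTS = 31                       # one input per measured physical value
N_HIDDEN = round(2 * N_INPUTS / 3)  # two thirds of the input layer, rounded (assumed): 21


def train_stage_autoencoders(X: np.ndarray, stages: np.ndarray) -> dict:
    """Train one N_INPUTS -> N_HIDDEN -> N_INPUTS autoencoder per stage label."""
    models = {}
    for stage in np.unique(stages):
        X_stage = X[stages == stage]
        ae = MLPRegressor(hidden_layer_sizes=(N_HIDDEN,), activation="tanh",
                          max_iter=1000, tol=1e-5, random_state=0)
        # Target == input: the network learns to reconstruct correct tuples of this stage.
        models[int(stage)] = ae.fit(X_stage, X_stage)
    return models


def reconstruction_rmse(model: MLPRegressor, X: np.ndarray) -> np.ndarray:
    """Root mean squared reconstruction error of every tuple (row) in X."""
    return np.sqrt(np.mean((X - model.predict(X)) ** 2, axis=1))


# Synthetic stand-in: each stage lives on its own 5-dimensional subspace of R^31.
rng = np.random.default_rng(1)


def synthetic_stage(offset: float, n: int = 800) -> np.ndarray:
    mixing = rng.normal(size=(5, N_INPUTS)) / np.sqrt(5)
    noise = rng.normal(0.0, 0.01, size=(n, N_INPUTS))
    return rng.normal(size=(n, 5)) @ mixing + offset + noise


X0, X1 = synthetic_stage(0.0), synthetic_stage(2.0)
models = train_stage_autoencoders(np.vstack([X0[:600], X1[:600]]), np.repeat([0, 1], 600))
held_out = X0[600:]        # unseen correct tuples of stage 0
faulty = held_out.copy()
faulty[:, 0] += 4.0        # break the usual relation between sensor 0 and the rest
err_ok = reconstruction_rmse(models[0], held_out)
err_bad = reconstruction_rmse(models[0], faulty)
```

On this synthetic data, shifting a single sensor while leaving the others untouched (a change in the relationship between magnitudes rather than an out-of-range value) should raise the reconstruction error of the stage-0 autoencoder well above that of unseen correct tuples.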

Moreover, if all the correct tuples produce almost the same output from the autoencoder (a reconstruction error close to 0) and the faulty tuples produce different results, we can establish a boundary on this reconstruction error for each autoencoder, based on the histograms of the training and test data. This boundary discriminates correct tuples from faulty ones and helps to minimize false positives and false negatives, as shown in Fig. 6.
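The text derives each boundary from the histograms of the training and test reconstruction errors but does not state the exact rule, so the sketch below assumes a simple one: the boundary is placed at a high quantile (99% here) of the reconstruction errors of correct tuples, and any tuple above it is flagged as faulty. The function names and the quantile are hypothetical.

```python
import numpy as np


def fit_boundary(train_errors, test_errors, quantile=0.99):
    """Hypothetical rule: put the boundary at a high quantile of the (correct-data) errors."""
    errors = np.concatenate([np.asarray(train_errors, float), np.asarray(test_errors, float)])
    counts, edges = np.histogram(errors, bins=50)  # the histogram one would inspect visually
    boundary = float(np.quantile(errors, quantile))
    return boundary, counts, edges


def is_faulty(errors, boundary):
    """True marks a tuple whose reconstruction error exceeds the stage boundary."""
    return np.asarray(errors, float) > boundary


# Usage with made-up error values for one stage.
boundary, counts, edges = fit_boundary(np.linspace(0.0, 0.1, 101), np.linspace(0.0, 0.1, 101))
verdicts = is_faulty([0.05, 0.5], boundary)
```

A lower quantile catches more subtle faults but flags more correct tuples; the sliding window of the third component can absorb part of those isolated false positives.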

Fig. 6 Example of boundary for acceptance or rejection of a tuple in an autoencoder

One problem that autoencoders can suffer from is overfitting [38], i.e., the ANN “memorizes” the data received and loses its predictive capability (which can create a false sense of accuracy in the model). Consequently, the data have been split into training and test sets (80% and 20%, respectively), in order to estimate the accuracy of the model trained for each stage on completely unknown samples. Moreover, an excessive number of iterations can contribute to overfitting [39]. To cope with that, the train/test cost has been plotted for each iteration, which shows graphically the number of iterations that achieves the best performance on unknown tuples. Thus, the autoencoder for each stage has been trained with the optimal number of iterations, as shown in Fig. 7. This makes it possible to establish a boundary between correct and faulty tuples, as shown in Fig. 6.
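The train/test procedure can be sketched with scikit-learn as follows: each stage's tuples are split 80/20, the autoencoder is trained one epoch at a time with `partial_fit`, the train and test RMSE are recorded after every epoch, and the epoch with the lowest test cost is selected. The model settings, the synthetic data and the helper name `best_iteration` are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def best_iteration(X_stage, n_hidden, max_epochs=200, seed=0):
    """Train one epoch at a time on an 80/20 split; return the epoch with the lowest test RMSE."""
    X_tr, X_te = train_test_split(X_stage, test_size=0.2, random_state=seed)
    ae = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="tanh", random_state=seed)
    train_cost, test_cost = [], []
    for _ in range(max_epochs):
        ae.partial_fit(X_tr, X_tr)  # a single pass over the training split
        train_cost.append(float(np.sqrt(np.mean((X_tr - ae.predict(X_tr)) ** 2))))
        test_cost.append(float(np.sqrt(np.mean((X_te - ae.predict(X_te)) ** 2))))
    best_epoch = int(np.argmin(test_cost)) + 1  # epochs counted from 1
    return best_epoch, train_cost, test_cost


# Synthetic stand-in for one stage: 500 tuples of 31 correlated sensor values.
rng = np.random.default_rng(0)
X_stage = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 31)) / np.sqrt(5)
best_epoch, train_cost, test_cost = best_iteration(X_stage, n_hidden=21)
```

The recorded `train_cost` and `test_cost` lists correspond to the two curves of a plot like Fig. 7; the final autoencoder for the stage would then be retrained for `best_epoch` epochs.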

To sum up, in Table 3 we can see the information related to the structure and the topology of the autoencoders in this proposal.

Table 3 Main elements of the classification component
Fig. 7 Avoiding overfitting. Top: overall train/test cost. Bottom: detail of the overfitting point

4.3 Dealing with problem 3: detecting continuous malfunctioning

Once we have determined whether a tuple of data corresponds to “correct” or “faulty” functioning, the next step is to determine whether that tuple is an isolated faulty tuple or belongs to a set of faulty tuples received as a continuous stream. To do so, we use a sliding window method.

A window covers a predefined number of tuples and slides by default, so its extent naturally spans the most recent stream elements; the most recently received elements are therefore more relevant than the previous ones. A sliding window is usually implemented as a FIFO (First In, First Out) queue.

Although there is a great variety of windows (count-based, partitioned, landmark, fixed-band, etc.), we focus on a time-based sliding window [40]. This is probably the most common class of window over data streams, and it is suitable for the problem at hand.
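A time-based sliding window over the per-tuple verdicts can be sketched with a `deque` used as a FIFO queue. The class name, the window width and the alarm rule (a minimum number of tuples in the window and a minimum fraction of faulty ones) are assumptions made for illustration, since the text does not specify how the window turns verdicts into an alarm.

```python
from collections import deque


class TimeSlidingWindow:
    """Time-based sliding window over per-tuple verdicts, kept as a FIFO queue.

    Hypothetical alarm rule: report a continuous malfunction only when the window
    holds at least `min_count` tuples and the faulty fraction reaches `min_fraction`.
    """

    def __init__(self, width: float, min_fraction: float, min_count: int = 1):
        self.width = width
        self.min_fraction = min_fraction
        self.min_count = min_count
        self._items = deque()  # (timestamp, is_faulty), oldest on the left

    def push(self, timestamp: float, is_faulty: bool) -> bool:
        self._items.append((timestamp, is_faulty))
        # Evict the oldest tuples once they fall outside (timestamp - width, timestamp].
        while self._items and self._items[0][0] <= timestamp - self.width:
            self._items.popleft()
        if not self._items:
            return False
        n_faulty = sum(1 for _, faulty in self._items if faulty)
        return (len(self._items) >= self.min_count
                and n_faulty / len(self._items) >= self.min_fraction)


# One tuple per second: an isolated fault at t = 5, then a continuous fault from t = 20.
window = TimeSlidingWindow(width=10, min_fraction=0.8, min_count=10)
alarms = [window.push(t, t == 5 or t >= 20) for t in range(40)]
```

With these settings, the isolated fault at t = 5 never raises an alarm, while the continuous fault is reported at t = 27, once 8 of the 10 tuples in the window are faulty. A wider window or a higher fraction dampens more false positives at the cost of a longer reporting delay.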

Sliding-window methods have been used in industry for data-driven robots [41] and for FDD [42], blockchain architecture and IoT infrastructure [

Table 4 Main elements of the classification component