1 Introduction

According to the World Health Organization and other global health organizations, cardiovascular diseases are the leading cause of death worldwide: each year, more people die from heart disease than from any other disease. Eighty-five percent of these deaths are due to stroke and heart attack, and about seventy-five percent of cardiac deaths occur in low- and middle-income countries. Cardiac arrhythmias and their long-term effects are a main cause of cardiovascular disease that is often overlooked in fatal cases.

An irregular heart rate is characteristic of a medical condition known as arrhythmia. It is essentially a rhythm conduction disorder, and arrhythmias play a crucial role in ECG abnormalities [1, 3, 13].

Each record in the dataset contains uninterrupted ECG readings from an individual subject, except for records 201 and 202, which were both taken from the same male subject. Four records in the dataset (102, 104, 107 and 217) contain paced beats and are generally not used for classification. The dataset has two channels: one is the modified limb lead II (MLII), and the other is usually V1 (or V2, V4 or V5, depending on the subject).

In our experiments, only lead MLII is used because normal QRS complexes are usually more prominent in MLII than in V5. Further, the AAMI protocol is followed, and all ECG heartbeat labels are mapped to AAMI class names according to the mapping given in Table 1. The dataset used in this study contains all five types of beats: normal beats (N), supraventricular ectopic beats (SVEBs), ventricular ectopic beats (VEBs), fusions of ventricular and normal beats (F) and unknown beats (Q) [14]. The whole MIT-BIH AD is split into a training set (DS1) and a validation or test set (DS2) for patient-specific classification [15]. Each set contains 22 records [16].

Table 1 The relation between ECG heartbeat labels and ANSI-AAMI standards [29]

Train records (DS1) [30]: 101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230

Test records (DS2): 100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234

In this study, the entire MIT-BIH AD is divided into two sections according to the records listed above: a training dataset and a test dataset. The model is trained on the training dataset, and the performance of the RBM model is then checked on the test dataset.
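The record split described above can be written down as plain data and sanity-checked. The following is a minimal sketch: the record numbers are copied from the lists above, while the names `DS1_TRAIN`, `DS2_TEST`, `PACED_RECORDS`, `subset_of` and `check_split` are hypothetical and introduced only for illustration.

```python
# Record numbers copied from the train and test lists above.
DS1_TRAIN = [101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122,
             124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230]
DS2_TEST = [100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210,
            212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234]
# Records with paced beats, which the text says are generally left out.
PACED_RECORDS = [102, 104, 107, 217]


def subset_of(record):
    """Return "train", "test", or None for a record number (hypothetical helper)."""
    if record in DS1_TRAIN:
        return "train"
    if record in DS2_TEST:
        return "test"
    return None


def check_split():
    """Raise ValueError if the subsets overlap or contain a paced record."""
    overlap = set(DS1_TRAIN) & set(DS2_TEST)
    paced = (set(DS1_TRAIN) | set(DS2_TEST)) & set(PACED_RECORDS)
    if overlap or paced:
        raise ValueError(f"invalid split: overlap={overlap}, paced={paced}")


check_split()  # passes silently for the lists above
```

Keeping the split as explicit lists makes it easy to verify that records 201 and 202, which come from the same subject, fall into different subsets.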

2.2 Heartbeat normalization

Heartbeat normalization is an essential preprocessing step, as it helps remove unnecessary information in the form of noise while keeping the essential information in the signal. The z-score method is usually employed for this purpose. First, we compute μ, the mean of all amplitude values for each subject of the MIT-BIH AD, and then the difference between each amplitude value x(i) and the mean, x(i) − μ. Finally, this difference is divided by the standard deviation σ of the waveform. Equation 1 below is used to normalize all amplitude values.

$$Z = \left( {x\left( i \right) - \mu } \right) / \sigma$$
(1)
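As an illustration of Eq. 1, the sketch below normalizes the amplitude values of one record using only the Python standard library. The function name `zscore_normalize` and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import fmean, pstdev


def zscore_normalize(amplitudes):
    """Apply Eq. 1 to one record: Z = (x(i) - mu) / sigma."""
    mu = fmean(amplitudes)
    # Population standard deviation; this choice is an assumption.
    sigma = pstdev(amplitudes, mu)
    if sigma == 0:
        raise ValueError("constant signal: standard deviation is zero")
    return [(x - mu) / sigma for x in amplitudes]


# Made-up amplitude values in millivolts, for illustration only.
normalized = zscore_normalize([0.1, 0.4, 1.2, 0.3, -0.2])
```

After normalization the values have zero mean and unit standard deviation, so records with different baselines and gains become comparable.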

2.3 Heartbeat segmentation

To extract a full heartbeat from the ECG signal, we first need to define what one heartbeat cycle is and then segment the data accordingly. Our main aim is to obtain a complete heartbeat with all its waves so that the complete information can be extracted. In the MIT-BIH AD, the ECG signals come with annotation files prepared by cardiologists. Each heartbeat is segmented around its R peak, whose position is taken from the annotation files. The R peak is the center of the heartbeat and carries the most information. Here, we use three consecutive heartbeats and cut the signal in such a way that we obtain a full ECG beat comprising all essential segments, namely the P wave, the QRS complex and the T wave. Figure 3 shows how a full beat is obtained together with its preceding and succeeding waves. In Fig. 3, t represents the sample position in the ECG signal, v(t) is the voltage at time t (in millivolts), Rj is the jth R peak, and \(T_{{R_{j} }}\) is the time index of Rj [3].

Fig. 3 An illustration of the heartbeat segmentation process

After segmentation, we obtain the jth heartbeat Hj, which comprises the sample points between \(\left\lfloor {\frac{1}{2}\left( {T_{{R_{j - 1} }} + T_{{R_{j} }} } \right)} \right\rfloor\) and \(\left\lfloor {\frac{1}{2}\left( {T_{{R_{j} }} + T_{{R_{j + 1} }} } \right)} \right\rfloor\), where \(\left\lfloor a \right\rfloor\) denotes the floor function.

For each heartbeat, the number of samples (D) needs to be fixed. After measuring the durations of all the segmented beats, we chose a value that is greater than ninety-five percent of all durations. This value is then applied to all heartbeats for further processing.
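The segmentation rule and the choice of D can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the end midpoint is treated as exclusive, and truncating or zero-padding each beat to D samples is an assumed policy, since the text only says that the value is applied to all heartbeats.

```python
import math


def segment_beats(signal, r_peaks):
    """Cut beat j between the floor midpoints of its neighbouring R peaks."""
    beats = []
    for j in range(1, len(r_peaks) - 1):
        start = (r_peaks[j - 1] + r_peaks[j]) // 2  # floor of the midpoint
        end = (r_peaks[j] + r_peaks[j + 1]) // 2    # treated as exclusive
        beats.append(signal[start:end])
    return beats


def common_length(beats, fraction=0.95):
    """Smallest beat length that is >= `fraction` of all beat lengths."""
    lengths = sorted(len(beat) for beat in beats)
    index = max(0, math.ceil(fraction * len(lengths)) - 1)
    return lengths[index]


def fix_length(beat, d, pad_value=0.0):
    """Truncate or right-pad a beat to exactly d samples (assumed policy)."""
    return list(beat[:d]) + [pad_value] * max(0, d - len(beat))


# Toy signal and R-peak sample indices, made up for illustration.
signal = list(range(100))
r_peaks = [10, 30, 52, 70, 95]
beats = segment_beats(signal, r_peaks)
d = common_length(beats)
fixed = [fix_length(beat, d) for beat in beats]
```

The first and last R peaks have no neighbour on one side, so they only serve as boundaries and do not produce beats of their own.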

2.4 Restricted Boltzmann machine (RBM) model

An RBM is a special kind of Markov random field that has a layer of stochastic visible units and a layer of stochastic hidden units [17]. An RBM can be represented as a bipartite graph in which information flows in both directions during training and use of the network, with both directions sharing the same weights. In an RBM, the joint probability distribution p(v, h; θ) over the visible units v and the hidden units h, with model parameters θ, is defined through an energy function E(v, h; θ) as [17]

$$p\left( {v,h;\theta } \right) = { }\frac{{{\text{exp}}\left( { - E\left( {v,{ }h;\theta } \right)} \right)}}{Z}$$
(2)

where \(Z = \mathop \sum \limits_{v} \mathop \sum \limits_{h} {\text{exp}}\left( { - E\left( {v,h;\theta } \right)} \right)\) is the partition function. The marginal probability that the model assigns to a visible vector v is

$$p\left( {v;\theta } \right) = { }\frac{{\mathop \sum \nolimits_{h} {\text{exp}}\left( { - E\left( {v,h;\theta } \right)} \right)}}{Z}$$
(3)

For a Bernoulli (visible)-Bernoulli (hidden) RBM, the energy function is defined as

$$E\left( {v,h;\theta } \right) = - \mathop \sum \limits_{i = 1}^{I} \mathop \sum \limits_{j = 1}^{J} w_{ij} v_{i} h_{j} - \mathop \sum \limits_{i = 1}^{I} b_{i} v_{i} - \mathop \sum \limits_{j = 1}^{J} a_{j} h_{j}$$
(4)

The symmetric interaction term between visible unit \(v_{i}\) and hidden unit \(h_{j}\) is represented by \(w_{ij}\), while \(b_{i}\) and \(a_{j}\) represent the bias terms, and I and J are the numbers of visible and hidden units. Similarly, for a Gaussian (visible)-Bernoulli (hidden) RBM, the energy is

$$E\left( {v,h;\theta } \right) = - \mathop \sum \limits_{i = 1}^{I} \mathop \sum \limits_{j = 1}^{J} w_{ij} v_{i} h_{j} + \frac{1}{2}\mathop \sum \limits_{i = 1}^{I} (v_{i} - b_{i} )^{2} - \mathop \sum \limits_{j = 1}^{J} a_{j} h_{j}$$
(5)

The corresponding conditional probabilities become

$$p\left( {h_{j} = 1{|}v;\theta } \right) = \sigma \left( {\mathop \sum \limits_{i = 1}^{I} w_{{i{ }j}} v_{i} + a_{j} } \right)$$
(6)
$$p\left( {v_{i} {|}h;\theta } \right) = N\left( {\mathop \sum \limits_{j = 1}^{J} w_{{i{ }j}} h_{j} + b_{i} ,1} \right)$$
(7)

In Eq. 7, \(v_{i}\) takes real values and follows a Gaussian distribution with mean \(\mathop \sum \nolimits_{j = 1}^{J} w_{ij} h_{j} + b_{i}\) and variance 1 [17]. A Gaussian-Bernoulli RBM can therefore convert real-valued stochastic variables into binary stochastic variables, which can then be processed further by Bernoulli-Bernoulli RBMs. An update rule for the RBM weights can be obtained from the gradient of the log likelihood.

$$\Delta w_{ij} = E_{data} \left( {v_{i} h_{j} } \right) - E_{model} \left( {v_{i} h_{j} } \right)$$
(8)

where \(E_{data} \left( {v_{i} h_{j} } \right)\) is the expectation observed in the training set and \(E_{model} \left( {v_{i} h_{j} } \right)\) is the same expectation under the distribution defined by the model. Because the model expectation is intractable to compute exactly, the contrastive divergence (CD) learning algorithm [18] is generally used to optimize an approximation of the generative objective function, and it is used here to train the RBM model.

The RBM described above is a generative model: it captures the distribution of the input data through its hidden variables and can be trained on unlabeled data. When label information is available, it can be used together with the unlabeled data to form a joint data set. The CD learning algorithm, a simple technique for training an RBM developed by Hinton [19], is applied to optimize the approximate generative objective function associated with the data likelihood.
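To make the CD idea concrete, the following NumPy sketch performs a single CD-1 update for a Bernoulli-Bernoulli RBM, using Eq. 6 for the hidden units and Eq. 8 for the weight update. It is a minimal illustration under stated assumptions and not the training code used in this work: the function `cd1_step`, the learning rate and the toy data are hypothetical, and the visible conditional \(p(v_{i} = 1|h) = \sigma (\sum_{j} w_{ij} h_{j} + b_{i})\) is the standard Bernoulli counterpart of Eq. 7.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, a, b, rng, lr=0.1):
    """One CD-1 update for a Bernoulli-Bernoulli RBM.

    v0: (n, I) binary batch; W: (I, J) weights;
    a: (J,) hidden biases; b: (I,) visible biases (notation of Eq. 4).
    """
    # Positive phase: p(h_j = 1 | v) from Eq. 6, then sample h.
    ph0 = sigmoid(v0 @ W + a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and up again.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + a)
    # Eq. 8: data expectation minus the CD-1 estimate of the model expectation.
    dW = (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    W_new = W + lr * dW
    a_new = a + lr * (ph0 - ph1).mean(axis=0)
    b_new = b + lr * (v0 - v1).mean(axis=0)
    return W_new, a_new, b_new, dW


rng = np.random.default_rng(0)
v0 = (rng.random((8, 6)) < 0.5).astype(float)  # toy batch of 8 binary vectors
W = 0.01 * rng.standard_normal((6, 4))         # I = 6 visible, J = 4 hidden
a, b = np.zeros(4), np.zeros(6)
W_new, a_new, b_new, dW = cd1_step(v0, W, a, b, rng)
```

CD-1 replaces the intractable model expectation with statistics gathered after one Gibbs step, which is why it optimizes only an approximation of the likelihood gradient.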

One purpose of deep learning (DL) is to encode dependencies between variables. An energy-based model captures these dependencies by associating a scalar energy, which acts as a measure of compatibility, with each configuration of the variables. The main aim of any energy-based model is to minimize a predefined energy function. Training an RBM therefore revolves around finding the parameters with which the minimum energy state can be reached.
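For a very small RBM, the link between energy and probability can be checked by brute force. The sketch below enumerates every binary configuration of a toy Bernoulli-Bernoulli RBM, evaluates Eq. 4, normalizes with the partition function as in Eq. 2, and compares the conditional obtained from the joint distribution with Eq. 6. The parameter values and helper names are made up for illustration, and the enumeration is only feasible for a handful of units.

```python
import itertools
import math


def energy(v, h, W, b, a):
    """Eq. 4: energy of a Bernoulli-Bernoulli RBM configuration."""
    interaction = sum(W[i][j] * v[i] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible = sum(b[i] * v[i] for i in range(len(v)))
    hidden = sum(a[j] * h[j] for j in range(len(h)))
    return -interaction - visible - hidden


def joint_distribution(W, b, a):
    """Eq. 2 by enumeration: p(v, h) for every binary configuration."""
    states = [(v, h)
              for v in itertools.product((0, 1), repeat=len(b))
              for h in itertools.product((0, 1), repeat=len(a))]
    weights = [math.exp(-energy(v, h, W, b, a)) for v, h in states]
    Z = sum(weights)  # partition function
    return {state: w / Z for state, w in zip(states, weights)}


# Toy parameters, made up for illustration: 3 visible and 2 hidden units.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b = [0.0, 0.1, -0.1]
a = [0.2, -0.3]
p = joint_distribution(W, b, a)

# Conditional p(h_0 = 1 | v) computed from the joint (via Eq. 3) ...
v = (1, 0, 1)
p_v = sum(p[(v, h)] for h in itertools.product((0, 1), repeat=2))
p_h0_given_v = sum(p[(v, h)] for h in [(1, 0), (1, 1)]) / p_v
# ... and directly from Eq. 6.
activation = sum(W[i][0] * v[i] for i in range(3)) + a[0]
p_h0_eq6 = 1.0 / (1.0 + math.exp(-activation))
```

Configurations with lower energy receive higher probability, and the two ways of computing the conditional agree up to floating-point error.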

2.4.1 Working of the proposed RBM model

The stacked RBM network is built from three RBM layers, using the Bernoulli RBM implementation from the scikit-learn library for Python. The designed deep neural network has three RBM layers and a layer structure of 416-100-100-100-5, and its last two layers are dense layers. For each layer of the network, the number of iterations is set to 100. Once the network has been designed, the calculated weights and biases are stored. The two dense layers are then added in a sequential model to give the final result, and the stored weights are set to the layers. The output of the third layer of the stacked RBM network is fed to the first dense layer, which uses ReLU as its activation function. The last dense layer classifies the result into four classes, with SoftMax [20] as the activation function, the Adam optimizer and categorical cross entropy as the loss function.
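A rough scikit-learn sketch of such a stack is given below. It is not the authors' implementation: the Keras dense layers are replaced here by scikit-learn's `MLPClassifier` (ReLU hidden layer, softmax output, Adam), so the RBM weights are not copied into a sequential model or fine-tuned. The helper `build_stacked_rbm`, the size of the first dense layer, the `MinMaxScaler` step (added because `BernoulliRBM` expects inputs in [0, 1]) and the random toy data are all assumptions.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM, MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def build_stacked_rbm(n_hidden=100, n_iter=100, seed=0):
    """Three greedily trained RBM layers followed by a small dense head."""
    steps = [("scale", MinMaxScaler())]  # assumed: map inputs to [0, 1]
    for k in range(3):
        rbm = BernoulliRBM(n_components=n_hidden, n_iter=n_iter,
                           random_state=seed)
        steps.append((f"rbm{k + 1}", rbm))
    # Stand-in for the two dense layers; the hidden size is hypothetical.
    head = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation="relu",
                         solver="adam", max_iter=300, random_state=seed)
    steps.append(("head", head))
    return Pipeline(steps)


# Random toy data with D = 416 samples per beat and 5 class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 416))
y = np.arange(60) % 5
model = build_stacked_rbm(n_iter=5).fit(X, y)  # few iterations for speed
probs = model.predict_proba(X[:3])
```

Fitting the pipeline trains each RBM on the output of the previous one, which corresponds to greedy layer-wise pretraining; the toy data carries no signal, so the predictions themselves are meaningless.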

3 Experimental results

In this study, the experiments are performed on a dual Intel Xeon E5-2600 machine with 2.4 GHz processors and 64 GB RAM, using the Keras library for Python. The proposed stacked RBM classifier is used to classify beats into four types, one of which is the normal heart rhythm. The accuracy of the model is calculated and compared with state-of-the-art methods. The performance of the proposed method is evaluated for every category of ECG signal using specificity, sensitivity and overall accuracy, which are calculated from the true positive, false positive, true negative and false negative values of the confusion matrix [21, 22].

Sensitivity is calculated as

$$S_{E} = \frac{{T_{P} *100}}{{T_{P} + F_{N} }}$$
(9)

Specificity is calculated as

$$S_{p} = \frac{{T_{N} *100}}{{T_{N} + F_{P} }}$$
(10)

Accuracy is calculated as

$$Accuracy = \frac{{(T_{P} + T_{N} )*100}}{{T_{P} + F_{N} + T_{N} + F_{P} }}$$
(11)

The average accuracy is calculated by finding the mean of individual accuracies of each class.
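Equations 9 to 11 can be applied to a multiclass confusion matrix by treating each class in turn as the positive class. The sketch below assumes this one-vs-rest reading, which the text does not spell out; the function name and the 3-class confusion matrix are made up for illustration and are not results from this study.

```python
def per_class_metrics(cm):
    """One-vs-rest sensitivity, specificity and accuracy in percent.

    cm[t][p] counts beats of true class t that were predicted as class p.
    Every class is assumed to have at least one true and one negative beat.
    """
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(row[k] for row in cm) - tp
        tn = total - tp - fn - fp
        results.append({
            "sensitivity": 100.0 * tp / (tp + fn),   # Eq. 9
            "specificity": 100.0 * tn / (tn + fp),   # Eq. 10
            "accuracy": 100.0 * (tp + tn) / total,   # Eq. 11
        })
    return results


# Made-up confusion matrix for three classes.
cm = [[90, 5, 5],
      [10, 80, 10],
      [0, 20, 80]]
metrics = per_class_metrics(cm)
# Average accuracy: mean of the individual class accuracies.
average_accuracy = sum(m["accuracy"] for m in metrics) / len(metrics)
```

For the first class of this toy matrix, 90 of 100 true beats are recovered and 190 of 200 other beats are correctly rejected, giving a sensitivity of 90% and a specificity of 95%.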

3.1 Experiment 1 (Patient-independent multiclass classification for different train-test ratios)

The architecture used for classification in all the experiments is shown in Fig. 2 above. Table 2 presents the patient-independent ECG signal classification results of our model. Patient-independent data means that the training and test datasets contain heartbeats from the same subjects (patients), that is, the training dataset may include beats similar to those in the test dataset. The table reports the overall accuracy along with the sensitivity and specificity for classes N, S, V and F. Since the S and V classes account for the major portion of arrhythmias, their sensitivity and specificity are highlighted in the results.

Table 2 Performance of the Stacked RBM classifier for Patient Independent Data

Table 2 shows that the best average sensitivity and specificity over all classes are 89.07% and 99.04%, respectively. Overall, the classifier performs best for the 60–40 ratio, with an overall accuracy of 98.61%, a sensitivity of 89.07% and a specificity of 98.93%.

Table 3 compares the studies in [10, 23] with the proposed work on the basis of the overall accuracy achieved by the models.

Table 3 Performance of other classifiers and our classifier on Patient Independent data

The proposed model achieves an accuracy of 98.61% when beats are classified into five classes, namely N, S, V, F and Q.

3.2 Experiment 2 (Patient-independent binary classification)

In experiment 2, our stacked RBM model is applied to binary classification with two classes, N and V. Since class V is the most prominent cause of fatal arrhythmias, the binary classification is performed between normal beats and premature ventricular contraction (PVC) heartbeats. Table 4 gives the results in terms of overall accuracy, sensitivity and specificity for both types of heartbeat. The classifier achieves an overall accuracy of 99.61%.

Table 4 Performance of other classifiers and our classifier on Patient Independent data

Jun et al. [24] proposed a DNN with six hidden layers for detecting PVC beats on the popular MIT-BIH AD. Their paper addressed a two-class problem in which normal and PVC beats were extracted for evaluation. In contrast to raw-signal input, six features were used to represent each heartbeat: the amplitudes of the P, Q and R peaks, the RR interval, the QRS complex duration and the ventricular activation time.

Table 5 compares the paper by Jun et al. [24] with the proposed work on binary classification of patient-independent data in terms of overall accuracy, sensitivity and specificity. Compared with the five-class classification in the previous subsection, two-class classification is easier.

Table 5 Performance of other classifiers and our classifier on Patient Independent data for binary classes

3.3 Experiment 3 (Patient-specific multiclass classification)

In the third experiment, we use two datasets, one for training (DS1) and the other for testing (DS2), i.e., a patient-specific regime is followed. DS1 and the first 300 heartbeats of the DS2 records are used to train the model, while the remaining heartbeats are used to test the proposed model.
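The patient-specific split can be sketched as below. The sketch assumes that the first 300 heartbeats are taken from each DS2 record separately, which is one reading of the description above; the function name, the dictionary layout and the toy beats are hypothetical.

```python
def patient_specific_split(ds1_beats, ds2_beats_by_record, n_adapt=300):
    """Return (train, test) lists of beats for the patient-specific regime.

    ds1_beats: list of all beats from the DS1 records.
    ds2_beats_by_record: dict mapping a DS2 record number to its beats,
        in temporal order.
    """
    train = list(ds1_beats)
    test = []
    for record in sorted(ds2_beats_by_record):
        beats = ds2_beats_by_record[record]
        train.extend(beats[:n_adapt])  # patient-specific training beats
        test.extend(beats[n_adapt:])   # remaining beats are held out
    return train, test


# Toy data: strings and integers stand in for heartbeat vectors.
ds1 = ["ds1-beat"] * 10
ds2 = {100: list(range(500)), 103: list(range(350))}
train, test = patient_specific_split(ds1, ds2)
```

With these toy sizes the training set holds 10 + 300 + 300 beats and the test set holds the remaining 200 + 50 beats.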

Table 6 shows the accuracy, sensitivity and specificity for the best results of the proposed work. The proposed stacked RBM DNN achieves a good overall accuracy of 95.13%. We use D = 416 for each vector, which means that the network has 416 input nodes and 5 output nodes, where each output node maps to one specific class in Table 1. The table also shows the average specificity and sensitivity, which are 93.33% and 81.13%, respectively.

Table 6 Performance of our classifier for Patient Specific/Patient-oriented data

4 Discussion

Unlike many methods in the literature, the proposed stacked RBM classifier uses the raw signal for automated feature extraction [25, 26]. The classification performance of the proposed method was found to be very good and considerably better than that of existing systems that work on patient-independent data. Further, without expert intervention, the performance of the model is still comparable to that of many other systems that use patient-specific data. By taking aligned raw ECG waveforms as input, the proposed stacked RBM can overcome these limitations and learn better representations for classification.

In [23], Acharya et al. used a deep CNN model in a computer-aided diagnosis (CAD) system for accurate assessment of ECG signals. Our proposed method, on the other hand, uses a stacked RBM, which gives better results in terms of specificity and sensitivity. Sandeep et al. [27] used PSO and optimization techniques along with feature extraction for ECG signal classification; the accuracy and other metrics of our model are better than those achieved in that study. Shadmand et al. [28] used the particle swarm optimization algorithm to train a block-based neural network that serves as the classifier. The accuracy they report is around 97%, which is lower than the accuracy of our proposed model.

In [6], Zubair et al. applied a CNN to the raw signal, which removes the need for handcrafted features. Our model also operates on the raw ECG signal and gives better accuracy than [6]. In [8], Kachuee et al. used a deep CNN for arrhythmia classification and transferred the knowledge learned on that task to a myocardial infarction task. In [10], Saroj et al. proposed an 11-layer model. That CNN model uses an end-to-end structure for classification, whereas our proposed model performs heartbeat segmentation and uses a stacked RBM network to adjust the weights and biases for the best possible learning. In [11], Sannino et al. denoised the signal and performed heartbeat segmentation and temporal feature extraction, after which an artificial neural network was used for arrhythmia classification.

Overall, compared with all the studies listed in Table 7, the proposed stacked RBM gives a very good overall accuracy of 98.61% for patient-independent data. The results for binary classification are also good, with an overall accuracy of 99.61%, and for patient-specific data the accuracy is 95.13%.

Table 7 Performance Comparison table for Patient Independent Data of various classifiers with our classifier

5 Conclusion

The proposed RBM model supports a patient-independent scheme for multiclass and binary classification as well as a patient-dependent classifier. In this study, we used the MIT-BIH arrhythmia database, a popular patient-specific dataset, to classify heartbeats into four classes: one normal class and three abnormal classes, namely SVEB, VEB and F. We used the stacked RBM for patient-independent multiclass classification, patient-independent binary classification and patient-specific classification. The best multiclass result was obtained with patient-independent classification, with an overall accuracy of 98.61%. For patient-independent binary classification the accuracy was 99.61%, and for patient-specific data it was 95.13%. One end of the proposed system takes raw signals, heartbeat segmentation then takes place, and the other end of the system gives beat-by-beat classification results. The proposed algorithm works very well for patient-independent multiclass and binary classification and comparably well for patient-dependent data. The biggest drawback of this RBM model is that it requires specialized hardware (GPU) and a large dataset to train properly, so its computational time tends to be higher than that of conventional classifiers; in return, it offers better reliability in identifying different patterns.