Introduction

The human circulatory system consists of the heart, which contracts and expands to pump oxygenated blood into the vascular system, receives deoxygenated blood from the various parts of the body and routes it to the lungs [1], where it is oxygenated. During normal operation the sinus node acts as the source of the impulse signal in the heart. Under abnormal conditions the sinus node fails to function and other sources of impulse generation take over, a condition called arrhythmia. Arrhythmia, a change in the rhythm of the heart, is one of the most common cardiac abnormalities. If it is identified early, the disease can be controlled well and better healthcare can be provided to the patient; on the other hand, some arrhythmias are life-threatening and may lead to fatal abnormalities such as ventricular tachycardia followed by fibrillation, which, without timely therapeutic intervention, may take the life of the patient.

Due to its increased incidence, arrhythmia has drawn attention worldwide, including in the developing countries. Detecting arrhythmia at an early stage is therefore necessary to provide quality healthcare and prognostic diagnosis.

The electrocardiogram (ECG) is a popular noninvasive tool for the diagnosis of heart related diseases; it is the electrical activity captured at the body surface, representative of the functional dynamics of the heart [2]. This activity is nearly repetitive, with a normal pattern of amplitudes, intervals and morphology corresponding to the normal functioning of the heart. The ECG of a normal heart is called normal sinus rhythm. When the heart departs from its normal function, the characteristics of the ECG also change. The prognostic information carried by the ECG therefore has to be utilized properly for better diagnostics.

Computational tools and algorithms can also be used for automated diagnosis and early detection of heart related abnormalities. In this direction, an attempt is made here at automated classification of ECG into normal sinus rhythm and arrhythmia classes.

Different techniques have been used to extract the R point of the ECG. Recently the quadratic spline wavelet has been used for ECG delineation [3, 4], where individual waves and amplitudes are extracted; the traditional Pan and Tompkins algorithm [5] is also computationally effective. Other extracted features include time frequency features [6, 7] and higher order cumulants [8]. Among the classification methods used are discriminant analysis [9], error backpropagation neural networks [10], self organizing feature maps [11], probabilistic neural networks [12], support vector machines [13], and independent component analysis with Gaussian mixture models [14].

There are automated methods for classification of arrhythmic beats in the ECG using different techniques [15, 16], but most of them compress time domain features using Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), etc. [14]. In the proposed method, features are compressed by applying PCA to Discrete Wavelet Transform (DWT) sub bands. Since the DWT gives a sparse representation of the data, PCA (or any other feature compression technique) should condense the features better than its time domain counterpart.

In the proposed methodology the DWT sub band principal components are classified using three different algorithms: the Gaussian mixture model (GMM), the error back propagation neural network (EBPNN) and the support vector machine (SVM). The results are presented, compared and discussed.

Materials

In this work we have used the open source MIT BIH arrhythmia database and MIT BIH normal sinus rhythm database from www.physionet.org, described below.

MIT BIH arrhythmia database

It consists of 48 half hour excerpts of two channel ambulatory ECG obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty three recordings were taken at random from a set of 4,000 twenty four hour ambulatory ECG recordings collected from a mixed population of inpatients (approximately 60%) and outpatients (approximately 40%) at Boston’s Beth Israel Hospital. The remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias. The recordings are sampled at 360 Hz per channel with 11 bit resolution over a 10 mV range.

MIT BIH normal sinus rhythm database

It consists of 18 long term ECG recordings of subjects referred to the Arrhythmia Laboratory at Boston’s Beth Israel Hospital. Subjects included in this database were found to have had no significant arrhythmias; they include 5 men, aged 26 to 45, and 13 women, aged 20 to 50. The recordings are digitized at 128 Hz.

Methodology

The proposed methodology consists of four steps: preprocessing, registration, feature extraction and classification of the ECG signal. We have employed three different classifiers: the Gaussian mixture model (GMM), the error back propagation neural network (EBPNN) [17] and the support vector machine (SVM) [18]. Figure 1 shows the block diagram of the proposed system.

Fig. 1 Block diagram of the proposed scheme of classification

In the preprocessing stage re-sampling, noise filtering and baseline drift removal are performed. The preprocessed ECG signal is then subjected to R point detection. Around each detected R point a 200 sample window is segmented, with the R point at its 100th sample. On each 200 sample window the DWT is applied, and four sub bands are subjected to PCA. After PCA, the number of principal components (PCs) retained from each sub band is chosen such that they contain 98% of the data variability. These PCs are statistically validated against the PCs of the 200 sample time domain windowed signal. The significant features are then classified into normal sinus rhythm and arrhythmia using three different classifiers, viz. GMM, EBPNN and SVM, and the results are tabulated and compared.

Preprocessing

The two datasets are sampled at different rates, so a common sampling rate of 250 Hz is chosen and the signals are re-sampled using standard re-sampling techniques [19]. The ECG signal also suffers from baseline drift, which is corrected by standard filtering techniques [20].
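
The sketch below illustrates this stage; it is a minimal example, assuming a raw record x at its native rate, with rational-rate resampling and a median-filter baseline estimator (one common choice, not necessarily the filter of [19, 20]).

```python
# Minimal preprocessing sketch: resample to the common 250 Hz rate and
# remove baseline wander. The median-filter baseline estimator and the
# 0.6 s window are assumptions, not necessarily the filters of [19, 20].
import numpy as np
from fractions import Fraction
from scipy.signal import resample_poly, medfilt

def preprocess(x, fs_in, fs_out=250):
    frac = Fraction(fs_out, fs_in)            # e.g. 250/360 reduces to 25/36
    x = resample_poly(x, frac.numerator, frac.denominator)
    win = int(0.6 * fs_out) | 1               # odd-length ~0.6 s window
    baseline = medfilt(x, kernel_size=win)    # slow-varying drift estimate
    return x - baseline
```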

R point detection and segmentation

The R point is the characteristic point of the ECG signal that can be used for registration before feature extraction and subsequent classification. Many methods are available in the literature for registration of the ECG; the Pan and Tompkins approach [5] is the simplest, whereas the wavelet based technique using the quadratic spline wavelet [3] is computationally expensive. In our analysis the Pan Tompkins algorithm has been used for detecting the R point for ECG registration [14, 25]. It uses derivative, smoothing, summing and threshold operators in its architecture: the derivative gives slope information and the smoothing operation removes high frequency noise. Finally, the group delay incurred by all the filters involved is compensated by advancing in time, and the middle point of each rectangular pulse after the threshold operation is taken as the R point.

If n is the number of points in the ECG data, the first derivative is computed in O(n) operations, and rectification likewise takes O(n) operations. Similarly, smoothing, second derivative computation, rectification, smoothing, summing and thresholding each require O(n) operations. Hence the total complexity of the Pan Tompkins algorithm is O(dkn), where d is the number of blocks, k is the order of the individual filters (here 1) and n is the number of points in the ECG signal. When d and k are very small compared with n, the total complexity is O(n), i.e., linear.
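
A simplified sketch of this pipeline follows; the 150 ms integration window and the fixed threshold at half the maximum are assumptions of the sketch, not the exact parameters of [5].

```python
# Simplified Pan-Tompkins style R point detector following the block
# structure above: derivative, rectification, smoothing (moving-window
# integration) and thresholding; each block is a single O(n) pass.
import numpy as np

def detect_r_points(x, fs=250):
    dx = np.diff(x, prepend=x[0])              # derivative: slope information
    sq = dx ** 2                               # rectification
    win = int(0.15 * fs)                       # ~150 ms smoothing window
    energy = np.convolve(sq, np.ones(win) / win, mode="same")
    mask = (energy > 0.5 * energy.max()).astype(int)
    edges = np.diff(mask)                      # pulse boundaries
    starts = np.where(edges == 1)[0]
    ends = np.where(edges == -1)[0]
    # middle point of each rectangular pulse is taken as the R point
    return [(s + e) // 2 for s, e in zip(starts, ends)]
```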

Based on the detected R point, 100 points to the right of the R point and 99 points before it are taken together with the R point itself, giving a 200 point window used for the subsequent discrete wavelet transform (DWT) [21] computation and pattern classification.
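
In code this segmentation is a simple slice; the sketch below (with hypothetical helper names) keeps only beats whose window lies fully inside the record.

```python
# Segmentation sketch: 99 samples before the R point, the R sample itself
# and 100 samples after, i.e. a 200-sample beat with R at its 100th
# sample (index 99).
def segment_beats(x, r_points, before=99, after=100):
    beats = []
    for r in r_points:
        if r - before >= 0 and r + after + 1 <= len(x):
            beats.append(x[r - before : r + after + 1])   # length 200
    return beats
```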

Discrete Wavelet Transform (DWT) computation

The Fourier transform [20] is a very good tool for analyzing the global frequency content of a signal; it has good frequency resolution but no time resolution. Time resolution may be gained at the expense of reduced frequency resolution, and this intuition is incorporated in the wavelet transform. It decomposes a signal into time-frequency components and provides multi resolution analysis (MRA) [21], with a different resolution at each level of decomposition. It can therefore discriminate two signals having the same frequency components occurring at different times. The wavelet transform consists of translates and dilates of a basis function called the mother wavelet; such a basis function at scale a and translation b is given by the wavelet equation

$$ {\psi_{a,b}}(t) = \frac{1}{{\sqrt {a} }}\psi \left( {\frac{{t - b}}{a}} \right) $$
(1)

When the scale and translation variables are sampled on a dyadic grid, it results in discrete wavelet transform (DWT), whose wavelet equation is given by

$$ {\psi_{m,n}}(t) = {2^{ - m/2}}\psi \left( {{2^{ - m}}t - n} \right) $$
(2)

The wavelet function is associated with the higher frequency components of the signal, called the detail. Using the family of translates and dilates of the mother wavelet, the detail coefficients of the DWT of a signal x(t) can be expressed as

$$ {T_{m,n}} = \int\limits_{ - \infty }^\infty {x(t){\psi_{m,n}}} (t)dt $$
(3)

i.e., the detail coefficients of a signal are given by the inner product between the signal and the wavelet function at different m and n.

In a manner similar to the detail, the low frequency components of a signal are associated with another basis function, orthogonal to the corresponding wavelet function, called the scaling function. Using the scaling function, the approximation coefficients of the signal are given by

$$ {S_{m,n}} = \int\limits_{ - \infty }^{ + \infty } {x(t){\phi_{m,n}}(t)dt} $$
(4)

where

$$ {\phi_{m,n}}(t) = {2^{ - m/2}}\phi \left( {{2^{ - m}}t - n} \right)\quad {\hbox{and}}\quad \int\limits_{ - \infty }^{ + \infty } {{\phi_{0,0}}(t)dt = 1} $$
(5)

There are many wavelet families; among them, Daubechies 4 (db-4) is used in our study. The ECG signal is decomposed in time frequency using the db-4 wavelet to a depth of 4 levels. The frequency content of each sub band is shown in Fig. 2. At the first level of decomposition the approximation signal contains frequencies up to 62.5 Hz, whereas the detail signal contains frequencies from 62.5 Hz to 125 Hz, and so on for the other levels, as shown in Fig. 2.
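
These band edges follow directly from halving the 125 Hz Nyquist band at each level; the short computation below, a sketch assuming the common 250 Hz sampling rate, reproduces them.

```python
# Sub band frequency edges for a 4-level dyadic decomposition at fs = 250 Hz:
# d1: 62.5-125 Hz, d2: 31.25-62.5 Hz, d3: 15.625-31.25 Hz,
# d4: 7.8125-15.625 Hz and a4: 0-7.8125 Hz.
fs = 250.0
for level in range(1, 5):
    hi = fs / 2 ** level
    print(f"d{level}: {hi / 2:g}-{hi:g} Hz")
print(f"a4: 0-{fs / 2 ** 5:g} Hz")
```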

Fig. 2 Wavelet decomposition

The computational complexity of the DWT using Mallat’s algorithm is O(2dnk), where d is the depth of decomposition, n is the number of points in the ECG data and k is the number of coefficients of the approximation or detail filters. In our case n = 200, d = 4 and k = 8.
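
With PyWavelets the decomposition of one segmented beat is a single call, sketched below under the assumption that beat is a 200-sample window from the segmentation step.

```python
# 4-level db4 decomposition of one 200-sample beat using Mallat's
# algorithm; wavedec returns coefficients ordered coarse to fine.
import pywt

coeffs = pywt.wavedec(beat, "db4", level=4)   # [a4, d4, d3, d2, d1]
a4, d4, d3, d2, d1 = coeffs
subbands = [a4, d4, d3, d2]                   # the four bands kept for PCA
```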

To obtain a simple and effective feature vector, the reduced principal components of the wavelet sub bands a4, d4, d3 and d2 are used as features for the statistical test of significance and, subsequently, for classification by the three classifiers independently.

Principal Component Analysis (PCA) on wavelet sub band features

To compress the features of the ECG sub bands, PCA [22] is applied to each sub band of interest. We have considered four sub bands, a4, d4, d3 and d2, as most of the signal frequencies of interest lie in these bands. PCA is an orthogonal transformation that projects the sub band coefficients onto the directions of maximum variance. The procedure consists of computing the data covariance matrix after subtracting the mean of the features,

$$ \Sigma = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {{d_i} - \bar{d}} \right){{\left( {{d_i} - \bar{d}} \right)}^T}} $$
(6)

where \( \bar{d} \) is the feature mean vector and N is the number of samples (observations) considered, including both arrhythmia and normal sinus rhythm signals. In this study we have included 548 observations, 274 from each class.

The covariance matrix is positive definite. The eigenvalues and eigenvectors of Σ are computed, and the eigenvectors are sorted in descending order of their eigenvalues. Finally the data are projected onto the eigenvector directions to obtain the principal components (PCs). The first few PCs contain most of the energy of the signal and are chosen such that 99% of the data energy is contained in them.

The PCA routine uses the singular value decomposition (SVD) to find the eigenvalues and eigenvectors of the covariance matrix. The SVD can be performed in \( O(mn^2) \) floating point operations, where m is the total number of patterns and n is the dimension of each pattern.

We have chosen 13 PCs based on the 98% data variability condition, and these PCs of the ECG sub band coefficients are used for the statistical significance analysis and subsequent classification.
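
A minimal sketch of this step follows, assuming each sub band's coefficients for all 548 beats are stacked row-wise in a matrix; scikit-learn's PCA accepts the variance fraction directly and uses SVD internally, as described above.

```python
# Sub band PCA sketch: keep the leading principal components that
# explain 98% of the variance of one sub band's coefficient matrix X
# of shape (n_beats, n_coeffs).
from sklearn.decomposition import PCA

def subband_pcs(X, variability=0.98):
    pca = PCA(n_components=variability)   # fraction of variance to retain
    pcs = pca.fit_transform(X)            # projected principal components
    return pcs, pca.explained_variance_ratio_
```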

Statistical test

A feature extraction scheme in a compactly supported wavelet basis space combined with principal component analysis is proposed. Theoretically, the wavelet features after PCA should give a better and more compact representation than the time domain PCA features. Both the time domain and wavelet domain features are subjected to the F test [23] and the independent sample t-test [24] for the statistical significance study.

The independent sample t-test is performed to assess the significance of each PC in discriminating the two groups by their means. As another discriminating criterion, the ratio of between group variance to within group variance, leading to the F-test, is used to cross validate the PCs in the time domain and the DWT sub band domain.
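
Both tests are available in SciPy; the sketch below applies them per PC, assuming hypothetical matrices pcs_normal and pcs_arrhythmia holding the PC features of the two groups.

```python
# Per-PC significance sketch: independent-samples t-test on the group
# means and a one-way F-test on the between/within group variance ratio.
from scipy.stats import ttest_ind, f_oneway

for j in range(pcs_normal.shape[1]):
    t, p_t = ttest_ind(pcs_normal[:, j], pcs_arrhythmia[:, j])
    F, p_f = f_oneway(pcs_normal[:, j], pcs_arrhythmia[:, j])
    print(f"PC{j + 1}: t={t:.2f} (p={p_t:.3g}), F={F:.2f} (p={p_f:.3g})")
```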

Classification

Using the PCA features of the ECG sub bands, a two class pattern classification problem is formulated. In this paper we consider three pattern classifiers, viz., GMM, EBPNN and SVM.

GMM

Here we have a binary problem of classifying normal sinus rhythm and arrhythmia signals. The Gaussian mixture model assumes that the features are drawn from normal distributions, with two mixing components corresponding to the normal and arrhythmia classes respectively. We therefore have two class conditional densities \( p\left( {{x_n}|{\omega_k}} \right) \), \( 1 \leqslant k \leqslant 2 \), and corresponding class prior probabilities \( p({\omega_k}) \), \( 1 \leqslant k \leqslant 2 \). Each of the two mixing components has a mean vector and a covariance matrix. Since we have applied an orthogonal transformation in a compactly supported basis, the data are highly uncorrelated and the off diagonal elements of the covariance matrix are approximately zero. The probability density function of such a model is given by

$$ p\left( {{x_n}|{\omega_k}} \right) = \frac{1}{{{{\left( {2\pi } \right)}^{d/2}}{{\left| {{\Sigma_k}} \right|}^{1/2}}}}\exp \left\{ { - \frac{1}{2}{{\left( {{x_n} - {{\bar{x}}_k}} \right)}^T}\Sigma_k^{ - 1}\left( {{x_n} - {{\bar{x}}_k}} \right)} \right\} $$
(7)

where

$$ {\bar{x}_k} = \frac{1}{{\left| {{X_k}} \right|}}\sum\limits_{{x_n} \in {\omega_k}} {{x_n}} $$
(8)

and

$$ {\Sigma_k} = \frac{1}{{\left| {{X_k}} \right|}}\sum\limits_{{x_n} \in {\omega_k}} {\left( {{x_n} - {{\bar{x}}_k}} \right)} {\left( {{x_n} - {{\bar{x}}_k}} \right)^T} = diag\left( {{\sigma_i}^2} \right),1 \leqslant i \leqslant d $$
(9)

The corresponding posterior probabilities are given by Bayes’ rule as follows.

$$ P\left( {{\omega_k}|{x_i}} \right) = \frac{{p\left( {{x_i}|{\omega_k}} \right)p\left( {{\omega_k}} \right)}}{{\sum\limits_{k' = 1}^2 {p\left( {{\omega_{k'}}} \right)p\left( {{x_i}|{\omega_{k'}}} \right)} }} $$
(10)

Since our data contain missing observations and do not represent the whole sample space, the mean vectors and covariance matrices computed directly are not the true ones. The means and variances are therefore re-estimated with the Expectation Maximization (EM) algorithm using maximum likelihood estimation. The re-estimation formulae are as follows:

$$ {\hat{\mu }_j} = \frac{{\sum\limits_{i = 1}^N {{x_i}P\left( {{\omega_j}\left| {{x_i}} \right.} \right)} }}{{\sum\limits_{i = 1}^N {P\left( {{\omega_j}\left| {{x_i}} \right.} \right)} }} $$
(11)
$$ \hat{\sigma }_j^2 = \frac{{\sum\limits_{i = 1}^N {{{\left( {{x_i} - {{\hat{\mu }}_j}} \right)}^2}P\left( {{\omega_j}|{x_i}} \right)} }}{{\sum\limits_{i = 1}^N {P\left( {{\omega_j}|{x_i}} \right)} }} $$
(12)
$$ \hat{p}\left( {{\omega_j}} \right) = \frac{1}{N}\sum\limits_{i = 1}^N {P\left( {{\omega_j}|{x_i}} \right)} $$
(13)

The initial prior probability is taken to be 0.5 for each class, and an initial model is assumed from the data. The EM algorithm has two core steps, the E step and the M step. In the E step the class conditional density is computed according to Eq. 7, and from it the posterior density according to Eq. 10. In the M step the class model is re-estimated according to Eqs. 11 through 13. The process continues until the new estimate changes little from the previous one and the model stabilizes; the EM based GMM is then said to have converged. The logarithm of the class conditional density, called the log-likelihood, is computed at each iteration and stops increasing at convergence.

GMM training is an optimization problem which maximizes the following objective function.

$$ J = \prod\limits_n {\sum\limits_k {p\left( {{\omega_k}} \right)p\left( {{x_n}\left| {{\omega_k}} \right.} \right)} } $$
(14)

The converged centroids are such that the product over all observations of the class conditional densities, weighted by the respective prior probabilities, is maximized. Each new EM estimate moves toward the optimum of this objective function, so the algorithm converges.

GMM is an iterative algorithm that can be performed in O(ndkT) floating point operations, where n is the number of patterns, d is the number of features per pattern, k is the number of classes in the data, and T is the number of iterations required for convergence.
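
A compact sketch of this classifier using scikit-learn's EM implementation is given below; the diagonal covariance and equal initial priors follow the description above, while the majority-vote mapping of mixture components to class labels, and the names X_train, y_train, X_test, are assumptions of the sketch.

```python
# Two-component diagonal-covariance GMM trained by EM; components are
# mapped to class labels by majority vote on the training data.
import numpy as np
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      weights_init=[0.5, 0.5], max_iter=100)
gmm.fit(X_train)                              # E and M steps until converged
comp = gmm.predict(X_train)                   # mixture component per pattern
label_of = {c: np.bincount(y_train[comp == c]).argmax() for c in (0, 1)}
y_pred = np.array([label_of[c] for c in gmm.predict(X_test)])
```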

Error back propagation neural network

We use an error back propagation neural network [17] consisting of an input layer of 13 neurons, one hidden layer of 6 neurons and an output layer of 2 neurons. The network is trained to adapt its weights so as to classify the data into two classes. This is also an optimization problem, in which the following objective function is minimized.

$$ J(w) = \frac{1}{2}\sum\limits_{n = 1}^N {\sum\limits_{k = 1}^K {{{\left\{ {{y_k}\left( {{x_n};w} \right) - t_k^n} \right\}}^2}} } $$
(15)

where \( y_k(x_n; w) \) is the network response of the kth output neuron, \( t_k^n \) is the target for the kth class of the nth observation feature vector, and K is the number of output neurons (here two).

There is a bias term in the input layer. The thirteen PCs of the ECG sub band features, along with the bias term, are fed to the input layer for the training set, and the weights are adapted by gradient descent until the total mean square error falls below a threshold. The testing data are then fed through the network with the learned weights, and the output classifies each pattern into one of the two classes.

With \( n \) inputs at the input layer, \( n_2 \) neurons at the second layer, \( n_3 \) at the third, and so on down to \( k \) at the output layer, a network that takes T iterations to converge has total complexity \( O(n{n_2}{n_3} \cdots kT) \). In our case there are 13 neurons in the input layer, 6 in the hidden layer and 2 in the output layer, and the algorithm converges in 19 epochs; hence \( n = 13 \), \( n_2 = 6 \), \( k = 2 \) and \( T = 19 \).
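
A sketch of the 13-6-2 network with scikit-learn's MLPClassifier follows; sklearn sizes the output layer from the class labels, so only the hidden layer is specified, and the logistic activation, learning rate and plain SGD solver are assumptions approximating the gradient descent described above.

```python
# 13-input, 6-hidden, 2-output backpropagation network sketch; training
# stops once the loss improvement falls below sklearn's tolerance.
from sklearn.neural_network import MLPClassifier

net = MLPClassifier(hidden_layer_sizes=(6,), activation="logistic",
                    solver="sgd", learning_rate_init=0.1, max_iter=500)
net.fit(X_train, y_train)            # backpropagation weight adaptation
accuracy = net.score(X_test, y_test)
```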

Support vector machine classifier

The SVM is a single layer, highly non-linear network which optimizes the class separation boundary (the discriminant hyperplane) so that the distance from the features of each class to the hyperplane is maximized simultaneously for all classes. Being a supervised classifier it has generalization ability, by which it can classify unseen data. Suppose \( \left( {{x_i},{y_i}} \right),i = 1, \ldots, N \) is the data set, where \( x_i \) is the ith feature point and \( y_i \) its class label. For the binary classification problem, let \( c_{+} \) and \( c_{-} \) be the centroids of the two classes; the classifier response is

$$ {y_i} = {\rm sgn} \left( {\left( {x - c} \right) \cdot w} \right) = {\rm sgn} \left( {\left( {x \cdot {c_{+}}} \right) - \left( {x \cdot {c_{-}}} \right) + b} \right) $$
(16)

where

$$ b = \frac{1}{2}\left( {{{\left\| {{c_{-} }} \right\|}^2} - {{\left\| {{c_{+} }} \right\|}^2}} \right) $$
(17)

The optimal hyperplane separating the two classes is obtained by solving

$$ \mathop {\min }\limits_{w,b} \;\frac{1}{2}{\left\| w \right\|^2} $$
(18)

such that

$$ {y_i}\left( {\left( {w \cdot {x_i}} \right) + b} \right) \geqslant 1,\quad i = 1, \ldots, N $$
(19)

The Lagrangian dual of Eq. 18 is a quadratic programming problem whose solution gives the optimal hyperplane separating the two classes. The quadratic programming routine has polynomial complexity.
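
In practice the dual QP is solved by a library routine; the sketch below uses scikit-learn's SVC on the 13 sub band PCs, where the RBF kernel and C = 1 are assumptions, since the kernel is not specified at this point in the text.

```python
# SVM sketch: SVC solves the dual quadratic program internally and
# returns the maximum-margin separating hyperplane.
from sklearn.svm import SVC

svm = SVC(kernel="rbf", C=1.0)
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
```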

Results and discussion

The proposed methodology has been applied to ECG from the MIT BIH arrhythmia and normal sinus rhythm databases (described in “Materials”) to form a two class problem. The R point of the ECG is detected with the extended Pan Tompkins algorithm, which detects almost all peaks with a good detection rate; the detection for a normal sinus rhythm signal is shown in Fig. 3. This algorithm was chosen for its simplicity and real time implementation ability, even though more accurate delineation algorithms [3, 7] are available in the literature.

Fig. 3 Detection of the R point in a normal sinus rhythm signal; the detected R point is shown in red

After detecting the R point, 99 points to the left of the R point and 100 points to its right are taken along with the R point itself, giving a 200 sample window containing a single QRS complex. Each such window, belonging to either the arrhythmia or the normal sinus rhythm class, constitutes one pattern for the subsequent dimensionality reduction and classification.

The power spectra of a normal sinus rhythm signal and an arrhythmia signal are shown in Fig. 4. It is observed from Figs. 4 and 2 that most of the ECG signal frequency lies in four sub bands: approximation 4, detail 4, detail 3 and detail 2. Hence these four sub bands are used for the subsequent sub band PCA and classification.

Fig. 4 Power spectra of (a) a normal sinus rhythm signal and (b) an arrhythmia signal

The wavelet decomposition at the different levels is depicted in Fig. 5 for a normal sinus rhythm signal and in Fig. 6 for an arrhythmia signal; the four sub band signals are shown in each case.

Fig. 5 Normal sinus rhythm signal and its DWT: (a) ECG signal, (b) 2nd level detail, (c) 3rd level detail, (d) 4th level detail, (e) 4th level approximation

Fig. 6 Arrhythmia signal and its DWT: (a) ECG signal, (b) 2nd level detail, (c) 3rd level detail, (d) 4th level detail, (e) 4th level approximation

If the time domain segmented ECG signal after R point detection is used directly for PCA, the energy profile of the principal components is as shown in Fig. 7: the first 13 principal components contain 99.97% of the signal energy.

Fig. 7 Energy profile of the PCs of the time domain ECG signal

The ECG sub bands are subjected to PCA and the energy profiles of their PCs are depicted in Fig. 8. The number of PCs retained and the energy contained in the PC directions are given in Table 1.

Fig. 8 Energy profiles of the PCs of the ECG sub bands: (a) 2nd level detail, (b) 3rd level detail, (c) 4th level detail, (d) 4th level approximation

Table 1 Sub band PCA

The segmented ECG around the detected R point is decomposed using the DWT, and PCA is applied to the resulting time frequency sub bands. The convention in classification is to apply PCA to the time domain signal in order to compress it; here PCA is instead applied to the wavelet sub bands, which yields better discrimination between the normal sinus rhythm and arrhythmia class features. This is substantiated by the independent t test on these features given in Table 2, which lists the results for both the time domain PC features and the DWT sub band PC features. Table 2 shows that for the DWT based PC features the p-values are vanishingly small (reported as 0.000), whereas for the time domain PC features only the first three are significant, as seen from the corresponding p-values.

Table 2 Statistical significance test for time domain and DWT domain features

The Gaussian mixture model is trained and the log likelihood profile over the iterations is shown in Fig. 9; the GMM converges in 8 iterations.

Fig. 9 Log-likelihood of the GMM classifier increasing and stabilizing at convergence

The mean square error reduction during training of the error back propagation neural network is depicted in Fig. 10; the network converges in 19 epochs. The SVM classification is shown in Fig. 11.

Fig. 10 Training of the neural network: MSE reduction

Fig. 11 SVM classification

The sensitivity, specificity and average accuracy of the three classifiers, GMM, EBPNN and SVM, are shown in Table 3.

Table 3 Classification performance

Conclusion

In this study, a systematic approach has been developed to screen arrhythmia patients from normal subjects based on the ECG signal in the time frequency sub band domain. Fewer principal components are needed because of the sparser representation of the features in the time frequency domain. Because of the non-linearity of the ECG signal, we have employed classifiers with non-linear transformations: in GMM, EBPNN and SVM these are incorporated in the EM algorithm, the hidden layer and the kernel transformation respectively.

From the results it can be observed that the proposed approach provides overall accuracies of 87.36%, 93.41% and 95.60% for the GMM, EBPNN and SVM classifiers respectively. As future work, features extracted with other wavelet families can be used in our methodology for training and testing; the statistical validation is to be verified correspondingly, and the classifiers employed for final screening with higher accuracy.