Abstract
Brain-computer interfaces (BCI) bridge the communication between brains and devices. Channel selection, as a stage in developing BCI systems, reduces costs and improves the overall performance. This paper proposes a relevance analysis, termed rMMD, based on the maximum mean discrepancy as the distance function between a pair of single-channel trials. The proposed rMMD starts with a trial embedding that highlights temporal dynamics, and ends with a channel ranking according to a designed relevance function. The function relies on the within- and between-class distances to quantify the discrimination capability of each channel. We evaluate the rMMD on a bi-class motor-imagery (MI) dataset holding 64 channels and more than 40 subjects. In comparison with no channel selection and a heuristic approach, our proposed relevance analysis significantly improves the classification of MI tasks with a reduced set of channels.
1 Introduction
Brain-computer interface (BCI) systems provide a communication bridge between a brain and a computer, with applications ranging from gaming to clinical use. Typical BCI systems are trained using voluntary or evoked electroencephalographic (EEG) signals from subjects performing specific mental tasks or under particular conditions [8]. The preference for EEG signals stems from their low recording risk, low implementation cost, and high potential for practical applications such as device control [10]. Among the voluntary approaches, the motor-imagery (MI) paradigm relies on decoding the imagination, not the execution, of motor tasks, which produces event-related de/synchronization (ERDS) along the brain motor homunculus [5]. Fed by EEG, an MI-BCI framework usually holds four stages: signal pre-processing, feature extraction, channel selection, and classification.
This work is particularly interested in the channel selection stage, as removing noisy and redundant channels improves the overall system performance while reducing implementation costs and setup times [1]. The most explored channel selection approaches employ evolutionary and heuristic algorithms, from which the following are worth mentioning: the Glow Swarm Optimization algorithm followed by a naïve Bayes classifier [6], the Sequential Floating Forward Selection (SFFS) by locally grouping EEG channels [11], the Non-dominated Sorting Genetic Algorithm II for multi-objective optimization [9], and the backtracking search optimization algorithm with a binary encoding of the selected channels [4]. Such methods tend to outperform the accuracy rates of the full EEG montage. Nonetheless, the large set of hyperparameters to be tuned makes the heuristic and evolutionary algorithms heavily dependent on the initialization. In addition, these algorithms are well known for their high computational cost in the training stage, constraining their use to subject-dependent applications. Other channel selection approaches rely on information measures that are less costly and easier to optimize than the above methods. For instance, ranking channels according to the mutual information between the trial label and the Laplacian of the average channel power enhanced a BCI system that carried out the feature extraction by common spatial patterns (CSP) [12].
This paper introduces a distance-based channel relevance analysis, termed rMMD, that compares trials through the Maximum Mean Discrepancy. The proposed analysis first embeds each single channel to highlight the temporal dynamics of the trials. Then, we assume that each embedded trial follows its own distribution, and measure the pair-wise trial distance at the channel level from the distribution means on a Reproducing Kernel Hilbert Space. Thanks to such an assumption, we obtain a single distance value for each pair of time series. Finally, we design a relevance measure as a function of the within- and between-class distances. As a result, our measure allows ranking channels according to their discrimination capabilities. To evaluate the proposed relevance analysis, we include it as the channel selection stage of a typical BCI system, and compare it against no channel selection and the heuristic SFFS. Results on a dataset of more than 40 subjects evidence the benefit of the rMMD-based channel selection, with a significant difference with respect to the compared approaches.
2 Methods
2.1 Single-Channel Trial-Wise Distance
Let \(\{{\varvec{X}}_n,l_n\}_{n=1}^N\) be a set of N labeled multi-channel EEG trials, where \({\varvec{X}}_n=\{{\varvec{x}}_{n}^c\in \mathbb {R}^T : c\in [1,C]\}\) corresponds to the n-th trial with label \(l_n\in \{-,+\}\), holding C channels recorded for T time instants. BCI systems attempt to classify unlabeled EEG trials into \(-\) or \(+\) depending on features extracted from their multiple channels. Here, we measure the distance at the channel level between a pair of trials as the distance between the means of their approximate distributions mapped into a Reproducing Kernel Hilbert Space (RKHS), termed the Maximum Mean Discrepancy (MMD) [7]. To this end, a Hankel matrix embedding of window length L turns each channel of each trial into a time series with L time-lagged components, \(\mathbb {R}^T\rightarrow \mathbb {R}^{L\times (T-L)};{\varvec{x}}_{n}^c\mapsto \{{\varvec{y}}_{nt}^c\}_{t=1}^{(T-L)}\). Assuming that the samples of an embedded trial follow an unknown distribution \({\varvec{y}}_{n}^c\sim P_{n}^c({\varvec{y}})\in [0,1]\), the MMD statistic compares two trials at the c-th channel as:
\[d^2({\varvec{x}}_{n}^c,{\varvec{x}}_{m}^c)=\Vert \mu _n^c-\mu _m^c\Vert _{\mathcal {H}}^2=\mathbb {E}_{t,t'}\left\{ \Phi ({\varvec{y}}_{nt}^c)^\top \Phi ({\varvec{y}}_{nt'}^{c})\right\} -2\,\mathbb {E}_{t,t'}\left\{ \Phi ({\varvec{y}}_{nt}^c)^\top \Phi ({\varvec{y}}_{mt'}^{c})\right\} +\mathbb {E}_{t,t'}\left\{ \Phi ({\varvec{y}}_{mt}^c)^\top \Phi ({\varvec{y}}_{mt'}^{c})\right\} \qquad (1)\]
where \(\mu _n^c\in \mathcal {H}\) stands for the mean of the distribution \(P_n^c\) in the RKHS, \(\mathbb {E}_{t,t'}\left\{ \cdot \right\} \) denotes the averaging operator along the time instants of the two trials, and the function \(\Phi : \mathbb {R}^L\rightarrow \mathcal {H}\) maps from the time embedding into the RKHS. In practice, the kernel trick allows computing the inner products \(\Phi ({\varvec{y}}_{nt}^c)^\top \Phi ({\varvec{y}}_{mt'}^{c})\) through a kernel function evaluated directly on the embedded samples. Therefore, the MMD statistic results in an inherent comparison of the temporal dynamics that are encoded by the probability distribution of the embedded trials.
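The embedding and the MMD statistic described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `hankel_embed`, `rbf_kernel`, and `mmd2` are hypothetical, the biased empirical estimate of the squared MMD is assumed, the bandwidth `sigma` is fixed by hand rather than tuned, and embedded samples are stored as rows, so the array has shape (T - L, L).

```python
import numpy as np


def hankel_embed(x, L):
    """Stack L time-lagged samples of a 1-D signal; returns shape (T - L, L)."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    # Row t holds x[t], x[t + 1], ..., x[t + L - 1].
    idx = np.arange(T - L)[:, None] + np.arange(L)[None, :]
    return x[idx]


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2(x_n, x_m, L, sigma):
    """Biased empirical squared MMD between two single-channel trials."""
    Yn, Ym = hankel_embed(x_n, L), hankel_embed(x_m, L)
    return (rbf_kernel(Yn, Yn, sigma).mean()
            - 2.0 * rbf_kernel(Yn, Ym, sigma).mean()
            + rbf_kernel(Ym, Ym, sigma).mean())


# Synthetic single-channel "trials" (assumption: 2 s at 100 Hz, not real EEG).
rng = np.random.default_rng(0)
t = np.arange(200) / 100.0
a = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
b = np.sin(2 * np.pi * 10 * t + 1.0) + 0.1 * rng.standard_normal(t.size)
c = np.sin(2 * np.pi * 22 * t) + 0.1 * rng.standard_normal(t.size)

d_ab = mmd2(a, b, L=25, sigma=5.0)   # same rhythm, shifted phase
d_ac = mmd2(a, c, L=25, sigma=5.0)   # different rhythm
```

In this toy setting the two 10 Hz signals differ only in phase, so their embedded samples cover nearly the same region and `d_ab` stays small, whereas the 22 Hz signal yields a clearly larger `d_ac`. This is the sense in which the statistic compares temporal dynamics rather than sample-by-sample amplitudes.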
2.2 Distance-Based Supervised Relevance Analysis
The purpose of a supervised relevance analysis is to quantify the discrimination capability of features so that the noisy and redundant ones are removed to improve the overall system performance. In this work, we propose to assess the relevance of each EEG channel for discriminating two conditions \(\{-,+\}\), aiming to reduce the EEG montage size from the start, to ease the feature extraction stage, and to improve the classification performance of the whole BCI system. In this sense, we design a relevance measure that looks for the relation between trials and their labels. Besides, the measure must determine how discriminant a channel is according to its MMD statistics, so that distances between opposite classes are expected to be large and distances within a class are expected to be small. Taking the above hypothesis into account, we define the relevance measure as the following ratio:
![](http://media.springernature.com/lw294/springer-static/image/chp%3A10.1007%2F978-3-030-13469-3_95/MediaObjects/480595_1_En_95_Equ2_HTML.png)
with \(d_{nm}^c=d({\varvec{x}}_{n}^c,{\varvec{x}}_{m}^c)\) as the simplified distance notation. The numerator and denominator of Eq. (2) account for the between- and within-class distances, respectively. Therefore, the larger the numerator and the smaller the denominator, the larger the relevance measure. Particularly, large values correspond to a discriminant channel, while noisy channels attain small values. In this way, our relevance measure based on the MMD statistic, termed rMMD, ranks each EEG channel according to its discrimination capability, fed by the trial distances computed in an RKHS.
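As a rough illustration of the ranking step, the sketch below scores one channel by dividing the mean between-class distance by the mean within-class distance, and then sorts the channels by that score. The use of means for both terms is an assumption, since the exact normalization of Eq. (2) is not reproduced here; the names `relevance_ratio` and `rank_channels` are hypothetical, and the distance matrices are toy values.

```python
import numpy as np


def relevance_ratio(D, labels):
    """Mean between-class distance over mean within-class distance.

    D: (N, N) symmetric matrix of pairwise trial distances for one channel.
    labels: length-N array holding two distinct class labels.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)   # skip self-distances
    between = D[~same].mean()
    within = D[same & off_diag].mean()
    return between / within


def rank_channels(D_all, labels):
    """D_all: (C, N, N) distances per channel; most relevant channel first."""
    scores = np.array([relevance_ratio(D, labels) for D in D_all])
    return np.argsort(-scores), scores


# Toy example with four trials (two per class) and two channels.
labels = np.array([0, 0, 1, 1])
noisy = np.ones((4, 4)) - np.eye(4)               # all distances equal
discriminant = noisy.copy()
discriminant[:2, 2:] = 3.0                        # larger between-class distances
discriminant[2:, :2] = 3.0

order, scores = rank_channels(np.stack([noisy, discriminant]), labels)
print(order, scores)                              # [1 0] [1. 3.]
```

Under this assumed normalization, a channel whose between- and within-class distances are indistinguishable scores close to one, and the score grows as the classes separate.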
3 Experimental Setup
3.1 EEG Dataset
We evaluate the proposed channel selection approach on the subjects from the EEG dataset for motor imagery brain-computer interface [3]. EEG data were recorded using 64 Ag/AgCl electrodes located over the scalp following a 10-10 montage and sampled at 512 Hz. For each subject, the BCI2000 recording system registered the EEG data of five or six runs, each split into 40 trials (20 per class). In turn, each trial is split into ready, instruction, and resting periods. The first period presents a black screen with a fixation cross from the trial start (\(t=0\)) to \(t=2\) s. The second one randomly instructs one of two MI tasks (“left hand” or “right hand”) during \(t\in [2,5]\) s. The last one displays a blank screen from \(t=5\) s during a random break of 2.1–2.8 s. Trials are further labeled as bad_trial following criteria such as the voltage magnitude, correlation with electromyographic activity, and subject comments. Given that this work discards the bad trials, we validate our approach on the 45 subjects that retain most of their trials.
3.2 EEG Processing and Parameter Setup
To assess the performance of the proposed relevance analysis, we introduce the rMMD into a subject-dependent BCI framework with the following stages: (i) preprocessing, which band-pass filters each trial within \([8,30]\) Hz using a fifth-order Butterworth filter and downsamples it to 100 Hz; (ii) channel selection, relying on the proposed distance-based supervised relevance analysis; (iii) feature extraction, carried out by the Common Spatial Patterns as a popular algorithm for extracting discriminative patterns from MI; and (iv) classification, using the well-known Linear Discriminant Analysis. It is worth noting that the considered framework only processes the period of [2.5–4.5] s of each trial to focus on the learning part of the MI instruction.
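A minimal sketch of the preprocessing stage follows, assuming SciPy is available. Several choices here are assumptions rather than details given in the text: zero-phase filtering with `sosfiltfilt`, polyphase resampling by the factor 25/128 (which maps 512 Hz to 100 Hz), and the hypothetical function name `preprocess_trial`. Note also that a band-pass design of order 5 in SciPy yields a tenth-order filter, which may differ from the filter actually used.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly


def preprocess_trial(trial, fs_in=512, band=(8.0, 30.0), order=5,
                     window=(2.5, 4.5)):
    """trial: (C, T) array sampled at fs_in; returns (C, T_out) at 100 Hz."""
    # Butterworth band-pass in second-order sections for numerical stability.
    sos = butter(order, band, btype="bandpass", fs=fs_in, output="sos")
    filtered = sosfiltfilt(sos, trial, axis=-1)
    # 512 Hz * 25 / 128 = 100 Hz.
    resampled = resample_poly(filtered, up=25, down=128, axis=-1)
    fs_out = 100
    start, stop = int(window[0] * fs_out), int(window[1] * fs_out)
    return resampled[:, start:stop]


# Synthetic 4-channel, 7 s recording standing in for one EEG trial.
rng = np.random.default_rng(0)
raw = rng.standard_normal((4, 512 * 7))
clean = preprocess_trial(raw)
print(clean.shape)   # (4, 200)
```

Filtering before downsampling keeps the 8 to 30 Hz band well below the new Nyquist frequency of 50 Hz, and cropping last avoids filter edge effects inside the analysed 2 s window.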
Regarding the parameter setup, the rMMD relevance analysis depends on the embedding dimension L. Since L constrains the minimum frequency to be analyzed, we fixed \(L=0.25\) s aiming to account for frequencies as low as 8 Hz. In addition, the computation of the MMD statistic in Eq. (1) demands the selection of a kernel function. In this respect, we use the well-known RBF kernel with the bandwidth parameter tuned by maximization of the information potential variability [2]. The resulting rMMD setup allows computing the relevance function in Eq. (2) to rank the EEG channels. Figure 1 illustrates the attained accuracy along the number of the most relevant channels for each subject within the dataset. Note that subjects were sorted according to their performance at 64 channels in order to highlight the benefit of the channel selection.
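The sizes implied by this setup can be checked with a few lines of arithmetic. The variable names below are illustrative; the values come from the 100 Hz sampling rate, the 0.25 s window, the 8 Hz lower band edge, and the 2 s analysed segment.

```python
fs = 100            # Hz, sampling rate after downsampling
window_s = 0.25     # s, embedding window L
f_min = 8.0         # Hz, lowest frequency of the analysed band
trial_s = 2.0       # s, analysed segment [2.5, 4.5] s

L = round(window_s * fs)        # embedding length in samples
T = round(trial_s * fs)         # samples per analysed segment
cycles = window_s * f_min       # cycles of f_min spanned by one window
n_embedded = T - L              # embedded samples per trial and channel

print(L, T, cycles, n_embedded)   # 25 200 2.0 175
```

A 0.25 s window therefore spans two full cycles of an 8 Hz rhythm, and each single-channel trial contributes 175 embedded samples of dimension 25 to the MMD estimate.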
4 Results and Discussion
To compare the performance of the proposed rMMD-based relevance analysis, we also compute the classification rates of two baseline approaches, namely, the standard Common Spatial Patterns (CSP) and the sequential floating forward selection (SFFS). The former corresponds to the widely accepted feature extraction approach for motor imagery tasks, computed from the 64 channels. The latter consists of a sequential heuristic search for the highest training accuracy with respect to a subset of EEG channels [11]. Figure 2 presents the performance attained by the considered approaches for each subject. We ordered the subjects according to the CSP accuracy to highlight the accuracy gain. In general, selecting channels based on the proposed relevance analysis outperforms the classification rate of CSP and SFFS. Particularly, rMMD achieves the highest accuracy rate on 18 subjects, while SFFS does so on ten of them. Accuracy on the remaining subjects is similar for both channel selection approaches. Nonetheless, SFFS underperforms CSP on nine subjects, evidencing algorithmic issues in the iterative selection, whereas rMMD only matches CSP on the five subjects that attain their highest accuracy with the full channel set. Moreover, the introduced channel selection largely increases the performance of the subjects with the lowest accuracy rates, by up to 13 percentage points, as in the cases of \(\#17\), \(\#24\), and \(\#52\). Regarding the selection performance, SFFS usually selects fewer channels than rMMD. Particularly, for seven out of the ten subjects where SFFS reaches the highest accuracy, it requires fewer channels than rMMD. However, SFFS may result in a less accurate channel subset than the full EEG montage, as is the case for seven subjects whose performance drops by up to 7 percentage points. Such an issue is due to the suboptimal nature of SFFS. On the contrary, rMMD removes fewer channels without compromising the classification rate. For instance, rMMD holds the 64 channels of subjects \(\#1\) and \(\#43\) to reach the highest performance. Consequently, comparing temporal dynamics among trials by means of rMMD highlights the discrimination capabilities of each EEG channel, so that a reduced EEG montage provides an enhanced classification accuracy and benefits the setup time of the MI-BCI system.
We summarize the performance attained by each compared approach in Table 1. In general, rMMD evidences an accuracy increment of 5 and 2 percentage points over CSP and SFFS, respectively, with the benefit of a narrower confidence interval (CI). A further paired t-test over the folds, comparing the proposed relevance analysis against both baselines, indicates an overall significant accuracy increment, with p-values smaller than \(0.1\%\). Lastly, the median number of selected channels for rMMD corresponds to nearly two thirds of the full EEG montage, but doubles the SFFS subset. Therefore, the significant difference between the proposed rMMD-based relevance analysis and the baseline approaches indicates that accounting for the channel-wise discriminative capability enhances class separability and reduces the montage setup without compromising the overall performance.
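For readers who want to reproduce this kind of comparison, a paired t-test can be run with `scipy.stats.ttest_rel`. The sketch below uses synthetic accuracies generated only for illustration; they are not the results of this paper, and the pairing unit (folds or subjects) has to match the experimental protocol.

```python
import numpy as np
from scipy.stats import ttest_rel

# Synthetic paired accuracies (assumption: NOT the values reported in Table 1).
rng = np.random.default_rng(1)
acc_baseline = rng.uniform(0.55, 0.85, size=45)
acc_selected = acc_baseline + rng.normal(0.04, 0.03, size=45)

# Paired test: each entry of one array is matched with the same entry of the other.
result = ttest_rel(acc_selected, acc_baseline)
mean_gain = float(np.mean(acc_selected - acc_baseline))

print(f"mean gain = {mean_gain:.3f}, t = {result.statistic:.2f}, "
      f"p = {result.pvalue:.2e}")
```

The paired form is the appropriate choice here because both methods are evaluated on the same data partitions, so the test operates on the per-pair accuracy differences rather than on two independent samples.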
To illustrate the influence of the relevance analysis on the feature extraction stage, Fig. 3 depicts the spatial patterns computed from all and from the selected channels on subjects \(\#17\) and \(\#43\), where CSP achieves the worst and best accuracy, respectively. Note that on subject \(\#43\) both patterns are similar because achieving the highest accuracy demands the full channel set. On the contrary, the proposed relevance analysis requires only 19 out of 64 channels to increase the performance by 13 percentage points with respect to conventional CSP on subject \(\#17\), which yields a smoother spatial pattern. As a result, on the evaluated subjects the introduced relevance analysis does not underperform the standard pipeline that lacks a channel selection stage.
5 Concluding Remarks
This work proposes a relevance analysis based on the maximum mean discrepancy criterion to select the most discriminative channels on large EEG montages. Our approach takes advantage of the dynamics embedded into the MMD statistic, which allows comparing a pair of time series, in this case, two EEG trials on the same channel. Then, such a pair-wise similarity feeds a supervised clustering measure that allows ranking the channels according to their discrimination capabilities. According to the achieved accuracy rates on a dataset holding more than 40 subjects, the discriminative relevance criterion provides a channel subset with enhanced classification rates of MI tasks.
Since the studied task is devoted to training and developing BCI systems for mass consumption, we split our future work into three research directions. Firstly, we will extend rMMD to feature selection approaches relying on weighting coefficients (linear models) or on feature importance criteria (tree search), aiming to improve the channel selection rate. Secondly, we plan to develop a methodology for channel-wise relevance analysis on large cohorts, aiming at a single low-density EEG montage that suitably performs for a population. Lastly, we will study the subject characteristics that increase performance with a particular channel subset, so that adaptive montages and pre-trained processing stages spread the usage of BCI to real-world applications.
References
Alotaiby, T., El-Samie, F.E.A., Alshebeili, S.A., Ahmad, I.: A review of channel selection algorithms for EEG signal processing. EURASIP J. Adv. Signal Process. 2015(1), 66 (2015). https://doi.org/10.1186/s13634-015-0251-9
Álvarez-Meza, A.M., Cárdenas-Peña, D., Castellanos-Dominguez, G.: Unsupervised kernel function building using maximization of information potential variability. In: Bayro-Corrochano, E., Hancock, E. (eds.) CIARP 2014. LNCS, vol. 8827, pp. 335–342. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12568-8_41
Cho, H., Ahn, M., Ahn, S., Kwon, M., Jun, S.C.: EEG datasets for motor imagery brain-computer interface. GigaScience 6(7), 1–8 (2017). https://doi.org/10.1093/gigascience/gix034
Dai, S., Wei, Q.: Electrode channel selection based on backtracking search optimization in motor imagery brain-computer interfaces. J. Integr. Neurosci. 16(3), 241–254 (2017). https://doi.org/10.3233/JIN-170017
Edelman, B.J., Baxter, B., He, B.: EEG source imaging enhances the decoding of complex right-hand motor imagery tasks. IEEE Trans. Biomed. Eng. 63(1), 4–14 (2016). https://doi.org/10.1109/TBME.2015.2467312
Franklin Alex Joseph, A., Govindaraju, C.: Channel selection using glow swarm optimization and its application in line of sight secure communication. Clust. Comput., 1–8 (2017). https://doi.org/10.1007/s10586-017-1177-9
Gretton, A., Borgwardt, K., Rasch, M.J., Scholkopf, B., Smola, A.J.: A kernel method for the two-sample problem, May 2008
Handiru, V.S., Prasad, V.A.: Optimized bi-objective EEG channel selection and cross-subject generalization with brain-computer interfaces. IEEE Trans. Hum. Mach. Syst. 46(6), 777–786 (2016). https://doi.org/10.1109/THMS.2016.2573827
Kee, C.Y., Ponnambalam, S.G., Loo, C.K.: Multi-objective genetic algorithm as channel selection method for P300 and motor imagery data set. Neurocomputing 161, 120–131 (2015). https://doi.org/10.1016/j.neucom.2015.02.057
Meng, J., Zhang, S., Bekyo, A., Olsoe, J., Baxter, B., He, B.: Noninvasive electroencephalogram based control of a robotic arm for reach and grasp tasks. Sci. Rep. 6(1), 38565 (2016). https://doi.org/10.1038/srep38565
Qiu, Z., Jin, J., Lam, H.K., Zhang, Y., Wang, X., Cichocki, A.: Improved SFFS method for channel selection in motor imagery based BCI. Neurocomputing 207, 519–527 (2016). https://doi.org/10.1016/j.neucom.2016.05.035
Yang, H., Guan, C., Wang, C.C., Ang, K.K.: Maximum dependency and minimum redundancy-based channel selection for motor imagery of walking EEG signal detection. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 1187–1191. IEEE, May 2013. https://doi.org/10.1109/ICASSP.2013.6637838
Acknowledgment
This research was supported by the research project 36706 “BrainScore: Sistema compositivo, gráfico y sonoro creado a partir del comportamiento frecuencial de las señales cerebrales”, funded by Universidad de Caldas and Universidad Nacional de Colombia.
© 2019 Springer Nature Switzerland AG
Luna-Naranjo, D.F., Hurtado-Rincon, J.V., Cárdenas-Peña, D., Castro, V.H., Torres, H.F., Castellanos-Dominguez, G. (2019). EEG Channel Relevance Analysis Using Maximum Mean Discrepancy on BCI Systems. In: Vera-Rodriguez, R., Fierrez, J., Morales, A. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2018. Lecture Notes in Computer Science(), vol 11401. Springer, Cham. https://doi.org/10.1007/978-3-030-13469-3_95
DOI: https://doi.org/10.1007/978-3-030-13469-3_95
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13468-6
Online ISBN: 978-3-030-13469-3