1 Introduction

Brain-computer interface (BCI) systems provide a communication bridge between a brain and a computer, with applications ranging from gaming to clinical settings. Typical BCI systems are trained using voluntary or evoked electroencephalographic (EEG) signals from subjects performing specific mental tasks or under particular conditions [8]. EEG signals are preferred because of their low recording risk, low implementation cost, and high potential for practical applications such as device control [10]. Among the voluntary approaches, the motor-imagery (MI) paradigm relies on decoding the imagination, not the execution, of motor tasks, which produces event-related de/synchronization (ERDS) along the brain motor homunculus [5]. Fed by EEG, an MI-BCI framework usually holds four stages: signal pre-processing, feature extraction, channel selection, and classification.

This work is particularly interested in the third stage, since removing noisy and redundant channels improves the overall system performance while reducing implementation costs and setup times [1]. The most explored channel selection approaches employ evolutionary and heuristic algorithms, from which the following are worth mentioning: the Glow Swarm Optimization algorithm followed by a naïve Bayes classifier [6], the Sequential Floating Forward Selection (SFFS) by locally grouping EEG channels [11], the Non-dominated Sorting Genetic Algorithm II for multi-objective optimization [9], and the backtracking search optimization algorithm with a binary encoding of the selected channels [4]. Such methods tend to outperform the accuracy rates of the full EEG montage. Nonetheless, the large set of hyperparameters to be tuned makes the heuristic and evolutionary algorithms heavily dependent on the initialization. In addition, these algorithms are well known for their high computational cost in the training stage, which constrains their use in subject-dependent applications. Other channel selection approaches rely on information measures that are less costly and easier to optimize than the above methods. For instance, ranking channels according to the mutual information between the trial label and the Laplacian of the average channel power enhanced a BCI system that carried out the feature extraction by common spatial patterns (CSP) [12].

This paper introduces a distance-based channel relevance analysis that compares trials through the Maximum Mean Discrepancy, termed rMMD. The proposed analysis first embeds each single channel to highlight the temporal dynamics of the trials. Then, we assume that each embedded trial follows its own distribution and measure the pair-wise trial distance at the channel level from the distribution means on a Reproducing Kernel Hilbert Space. Thanks to this assumption, we obtain a single distance value for each pair of time series. Finally, we design a relevance measure as a function of the within-class and between-class distances. As a result, our measure allows ranking channels according to their discrimination capabilities. To evaluate the proposed relevance analysis, we include it as the channel selection stage of a typical BCI system and compare it against no channel selection and the heuristic SFFS. Results on a dataset of more than 40 subjects evidence the benefit of the rMMD-based channel selection, with a significant difference with respect to the compared approaches.

2 Methods

2.1 Single-Channel Trial-Wise Distance

Consider a set of N labeled multi-channel EEG trials \(\{{\varvec{X}}_n,l_n\}_{n{{\,\mathrm{\,=\,}\,}}1}^N\), where \({\varvec{X}}_n{{\,\mathrm{\,=\,}\,}}\{{\varvec{x}}_{n}^c{{\,\mathrm{\,\in \,}\,}}\mathbb {R}^T{{\,\mathrm{\,:\,}\,}}c{{\,\mathrm{\,\in \,}\,}}[1,C]\}\) corresponds to the n-th trial with label \(l_n{{\,\mathrm{\,\in \,}\,}}\{-,+\}\) that holds C channels recorded for T time instants. BCI systems attempt to classify unlabeled EEG trials into − or \(+\) depending on features extracted from their multiple channels. Here, we measure the distance at the channel level between a pair of trials as the distance between the means of their approximate distributions mapped into a Reproducing Kernel Hilbert Space (RKHS), termed the Maximum Mean Discrepancy (MMD) [7]. To this end, a Hankel (time-delay) embedding of window length L maps each channel of each trial into a series of vectors with L time-lagged components, \(\mathbb {R}^T\rightarrow \mathbb {R}^{L\times (T-L)};{\varvec{x}}_{n}^c\mapsto \{{\varvec{y}}_{nt}^c\}_{t{{\,\mathrm{\,=\,}\,}}1}^{(T-L)}\). Assuming that the samples from an embedded trial follow the unknown distribution \({\varvec{y}}_{n}^c\sim P_{n}^c({\varvec{y}}){{\,\mathrm{\,\in \,}\,}}[0,1]\), the MMD statistic compares two trials at the c-th channel as:

$$\begin{aligned} d({\varvec{x}}_{n}^c,{\varvec{x}}_{m}^c)&:= D(P_n^c,P_m^c,\Phi ) = \Vert \mu _n^c - \mu _m^c \Vert ^{2}_{\mathcal {H}}\nonumber \\ d({\varvec{x}}_{n}^c,{\varvec{x}}_{m}^c)&= \mathbb {E}_{t,t'}\left\{ \Phi ({\varvec{y}}_{nt}^c)^\top \Phi ({\varvec{y}}_{nt'}^{c})\right\} -2 \mathbb {E}_{t,t'}\left\{ \Phi ({\varvec{y}}_{nt}^c)^\top \Phi ({\varvec{y}}_{mt'}^{c})\right\} \\&+\,\mathbb {E}_{t,t'}\left\{ \Phi ({\varvec{y}}_{mt}^c)^\top \Phi ({\varvec{y}}_{mt'}^{c})\right\} \nonumber \end{aligned}$$
(1)

where \(\mu _n^c{{\,\mathrm{\,\in \,}\,}}\mathcal {H}\) stands for the mean of the distribution \(P_n^c\) in the RKHS, \(\mathbb {E}_{t,t'}\left\{ \cdot \right\} \) defines the averaging operator along the time instants of two trials, and the function \(\Phi {{\,\mathrm{\,:\,}\,}}\mathbb {R}^L\rightarrow \mathcal {H}\) maps from the time embedding into the RKHS. In practice, the kernel trick allows computing inner products such as \(\Phi ({\varvec{y}}_{nt}^c)^\top \Phi ({\varvec{y}}_{mt'}^{c})\) by evaluating a positive-definite kernel function on the pair of embedded samples, without computing the mapping explicitly. Therefore, the MMD statistic results in an inherent comparison of the temporal dynamics that are encoded by the probability distribution of the embedded trials.
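The distance of Eq. (1) can be sketched in a few lines of NumPy. The block below is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`hankel_embed`, `rbf_kernel`, `mmd2`) are hypothetical, the kernel is a Gaussian RBF with a user-supplied bandwidth `sigma`, and the expectations are replaced by plain averages over all pairs of time instants (the biased estimate).

```python
import numpy as np


def hankel_embed(x, L):
    """Time-delay (Hankel) embedding: length-T series -> (T - L, L) lagged vectors."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    # Row t holds the window x[t], ..., x[t + L - 1].
    idx = np.arange(T - L)[:, None] + np.arange(L)[None, :]
    return x[idx]


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2(x_n, x_m, L, sigma):
    """Biased estimate of the squared MMD between two single-channel trials."""
    Yn, Ym = hankel_embed(x_n, L), hankel_embed(x_m, L)
    return (rbf_kernel(Yn, Yn, sigma).mean()
            - 2.0 * rbf_kernel(Yn, Ym, sigma).mean()
            + rbf_kernel(Ym, Ym, sigma).mean())
```

With this estimate, the distance of a trial to itself is zero, and two trials with different oscillatory content (for instance, sinusoids at different frequencies) yield a strictly positive value.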

2.2 Distance-Based Supervised Relevance Analysis

The purpose of a supervised relevance analysis is to quantify the discrimination capability of features so that the noisy and redundant ones are removed to improve the overall system performance. In this work, we propose to assess the relevance of each EEG channel to discriminate two conditions \(\{-,+\}\), aiming to reduce the EEG montage size from the start, to ease the feature extraction stage, and to improve the classification performance of the whole BCI system. In this sense, we design a relevance measure that looks for the relation between trials and their labels. Besides, the measure must determine how discriminant a channel is according to its MMD statistics, so that distances between opposite classes are expected to be large and distances within a class are expected to be small. Taking the above hypothesis into account, we define the relevance measure as the following ratio:

(2)

where \(d_{nm}^c{{\,\mathrm{\,=\,}\,}}d({\varvec{x}}_{n}^c,{\varvec{x}}_{m}^c)\) is a shorthand for the distance. The numerator and denominator of Eq. (2) account for the between-class and within-class distances, respectively. Therefore, the larger the numerator and the smaller the denominator, the larger the relevance measure. In particular, a large ratio corresponds to a discriminant channel, while noisy channels attain low values. In this way, our relevance measure based on the MMD statistic, termed rMMD, ranks each EEG channel according to its discrimination capability, fed by the trial distances computed in an RKHS.
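A between/within ratio of this kind can be sketched as follows. This is an illustrative example under assumptions: it averages the distances over trial pairs, i.e. it computes \(\mathrm{mean}_{l_n\ne l_m} d_{nm}^c / \mathrm{mean}_{l_n = l_m,\, n\ne m} d_{nm}^c\), which is one plausible normalization and not necessarily the one used in Eq. (2). The function names `relevance_ratio` and `rank_channels` are hypothetical.

```python
import numpy as np


def relevance_ratio(D, labels):
    """Between/within ratio of pairwise trial distances for one channel.

    D is an (N, N) symmetric matrix of distances d_nm and labels holds the
    N class labels. Averaging over pairs is an assumption of this sketch.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    between = D[~same].mean()            # pairs from opposite classes
    within = D[same & off_diag].mean()   # same-class pairs, excluding n == m
    return between / within


def rank_channels(D_all, labels):
    """Sort channel indices from most to least relevant.

    D_all has shape (C, N, N): one distance matrix per channel.
    """
    scores = np.array([relevance_ratio(D, labels) for D in D_all])
    return np.argsort(scores)[::-1], scores
```

Under this normalization, a channel whose between-class distances match its within-class distances scores close to one, while a discriminant channel scores well above one. Keeping the first k indices returned by `rank_channels` gives the k most relevant channels.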

3 Experimental Setup

3.1 EEG Dataset

We evaluate the proposed channel selection approach on the subjects from the EEG dataset for motor imagery brain-computer interface [3]. EEG data were recorded using 64 Ag/AgCl electrodes located over the scalp following a 10-10 montage and sampled at 512 Hz. For each subject, the BCI2000 recording system registered the EEG data of five or six runs, each split into 40 trials (20 per class). In turn, each trial is split into ready, instruction, and resting periods. The first period presents a black screen with a fixation cross from the trial start (\(t{{\,\mathrm{\,=\,}\,}}0\)) to \(t{{\,\mathrm{\,=\,}\,}}2\) s. The second one randomly instructs one of two MI tasks (“left hand” or “right hand”) during \(t{{\,\mathrm{\,\in \,}\,}}[2,5]\) s. The last one displays a blank screen from \(t{{\,\mathrm{\,=\,}\,}}5\) s during a random break of 2.1–2.8 s. Trials are further labeled as bad_trial following criteria such as the voltage magnitude, the correlation with electromyographic activity, and subject comments. Since this work discards the bad trials, we validate our approach on the 45 subjects that retain most of their trials.

3.2 EEG Processing and Parameter Setup

To assess the performance of the proposed relevance analysis, we introduce the rMMD into a subject-dependent BCI framework with the following stages: (i) preprocessing, which band-pass filters each trial between \([8\negthickspace -\negthickspace 30]\) Hz using a fifth-order Butterworth filter and downsamples it to 100 Hz; (ii) channel selection, relying on the proposed distance-based supervised relevance analysis; (iii) feature extraction, carried out by the Common Spatial Patterns as a popular algorithm for extracting discriminative patterns from MI; and (iv) classification, using the well-known Linear Discriminant Analysis. It is worth noting that the considered framework only processes the period of [2.5–4.5] s of each trial to focus on the learning part of the MI instruction.
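The preprocessing stage (i) and the trial cropping can be sketched with SciPy as below. This is a minimal example under assumptions, not the pipeline used in the experiments: the function name `preprocess_trial` is hypothetical, the filter is applied forward and backward for zero phase (the text does not state how the filter is applied), and the resampling uses a polyphase filter.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt


def preprocess_trial(X, fs_in=512, fs_out=100, band=(8.0, 30.0),
                     order=5, window=(2.5, 4.5)):
    """Band-pass filter, resample, and crop one EEG trial of shape (C, T)."""
    # Fifth-order Butterworth band-pass, expressed as second-order sections.
    sos = butter(order, band, btype="bandpass", fs=fs_in, output="sos")
    # Zero-phase (forward-backward) filtering: an assumption of this sketch.
    X = sosfiltfilt(sos, np.asarray(X, dtype=float), axis=-1)
    # Rational resampling: 512 Hz * 25 / 128 = 100 Hz for the defaults.
    ratio = Fraction(fs_out, fs_in)
    X = resample_poly(X, ratio.numerator, ratio.denominator, axis=-1)
    # Keep only the MI window, given in seconds from the trial start.
    start, stop = (int(round(s * fs_out)) for s in window)
    return X[:, start:stop]
```

For a 64-channel trial sampled at 512 Hz, the defaults return an array with 64 rows and 200 columns, that is, two seconds at 100 Hz.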

Regarding the parameter setup, the rMMD relevance analysis depends on the embedding dimension. Since L constrains the minimum frequency to be analyzed, we fixed \(L{{\,\mathrm{\,=\,}\,}}0.25\) s aiming to account for frequencies as low as 8 Hz. In addition, the computation of the MMD statistic in Eq. (1) demands the selection of a kernel function. In this respect, we use the well-known RBF kernel with the bandwidth parameter tuned by maximizing the information potential variability [2]. The resulting rMMD setup allows computing the relevance function in Eq. (2) to rank EEG channels. Figure 1 illustrates the attained accuracy along the number of the most relevant channels for each subject within the dataset. Note that subjects were sorted according to their performance at 64 channels in order to highlight the benefit of the channel selection.

Fig. 1. Five-fold averaged accuracy along the number of selected channels for each subject using rMMD. Subjects are sorted according to their CSP performance. Color encodes the accuracy.

4 Results and Discussion

To benchmark the proposed rMMD-based relevance analysis, we also compute the classification rates of two baseline approaches, namely, the standard Common Spatial Patterns (CSP) and the sequential floating forward selection (SFFS). The former corresponds to the widely accepted feature extraction approach for motor imagery tasks, computed from the 64 channels. The latter consists of a sequential heuristic search for the highest training accuracy with respect to a subset of EEG channels [11]. Figure 2 presents the performance attained by the considered approaches for each subject. We ordered the subjects according to the CSP accuracy to highlight the accuracy gain. In general, selecting channels based on the proposed relevance analysis outperforms the classification rate of CSP and SFFS. Particularly, rMMD achieves the highest accuracy rate on 18 subjects, while SFFS does so on ten of them. Accuracy on the remaining subjects is similar for both channel selection approaches. Nonetheless, SFFS underperforms CSP on nine subjects, evidencing algorithmic issues in the iterative selection, whereas rMMD merely matches CSP on five subjects that attain the highest accuracy with the full channel set. Moreover, the introduced channel selection largely increases the performance of the subjects with the lowest accuracy rates, by up to 13 percentage points, as in the case of \(\#17\), \(\#24\), and \(\#52\). Regarding the selection performance, SFFS usually selects fewer channels than rMMD. Particularly, seven out of the ten subjects where SFFS reaches the highest accuracy require fewer channels than with rMMD. However, SFFS may result in a less accurate channel subset than the full EEG montage, as in the case of seven subjects whose performance drops by up to 7 percentage points. Such an issue is due to the suboptimal nature of SFFS. On the contrary, rMMD removes fewer channels without compromising the classification rate. For instance, rMMD holds the 64 channels of subjects \(\#1\) and \(\#43\) to reach the highest performance. Consequently, comparing temporal dynamics among trials by means of rMMD highlights the discrimination capabilities of each EEG channel, so that a reduced EEG montage provides an enhanced classification accuracy and benefits the setup time of the MI-BCI system.

Fig. 2. Subject-wise performance of the considered approaches. Top: Average classification accuracy of five folds. Bottom: Median number of selected channels.

We summarize the performance attained by each compared approach in Table 1. In general, rMMD yields an accuracy increment of 5 and 2 percentage points over CSP and SFFS, respectively, with the benefit of reducing the confidence interval (CI). A further paired t-test across folds of the proposed relevance analysis against both baselines indicates an overall significant accuracy increment, with p-values smaller than \(0.1\%\). Lastly, the median number of channels selected by rMMD is close to two thirds of the full EEG montage, about twice the size of the SFFS subset. Therefore, the significant difference between the proposed rMMD-based relevance analysis and the baseline approaches supports that accounting for the channel-wise discriminative capability enhances class separability and reduces the montage setup without compromising the overall performance.

Table 1. Overall performance of the considered approaches.

To illustrate the influence of the relevance analysis on the feature extraction stage, Fig. 3 depicts the spatial patterns computed from all and from the selected channels on subjects \(\#17\) and \(\#43\), where CSP achieves the worst and best accuracy, respectively. Note that on subject \(\#43\) both patterns are similar because achieving the highest accuracy demands the full channel set. On the contrary, the proposed relevance analysis requires only 19 out of 64 channels to increase the performance by 13 percentage points with respect to conventional CSP on subject \(\#17\), which yields a smoother spatial pattern. As a result, in these experiments the introduced relevance analysis does not underperform the standard pipeline that lacks a channel selection stage, since it can retain the full channel set when needed.

Fig. 3. Resulting spatial patterns for the best and worst performing subjects computed from all and selected channels.

5 Concluding Remarks

This work proposes a relevance analysis based on the maximum mean discrepancy criterion to select the most discriminative channels on large EEG montages. Our approach takes advantage of the dynamics embedded into the MMD statistic, which allows comparing a pair of time series, in this case, two EEG trials on the same channel. Then, such a pair-wise similarity feeds a supervised clustering measure that allows ranking the channels according to their discrimination capabilities. According to the achieved accuracy rates on a dataset holding more than 40 subjects, the discriminative relevance criterion provides a channel subset with enhanced classification rates of MI tasks.

Since the studied task is devoted to training and developing BCI systems for mass consumption, we split our future work into three research directions. Firstly, we will extend rMMD to feature selection approaches relying on weighting coefficients (linear models) or on feature importance criteria (tree search), aiming to improve the channel selection rate. Secondly, we plan to develop a methodology for channel-wise relevance analysis on large cohorts, aiming at a single low-density EEG montage that performs suitably for a population. Lastly, we will study the subject characteristics that increase performance with a particular channel subset, so that adaptive montages and pre-trained processing stages spread the usage of BCI in real-world applications.