1 Introduction

Researchers in the artificial intelligence community have struggled for decades to build machines capable of matching or exceeding the mental capabilities of humans. One capability that continues to challenge researchers is designing systems that can leverage experience from previous tasks to improve performance on a new task that has not been encountered before. When the new task is drawn from a different population than the old, this is considered to be transfer learning. The benefits of transfer learning are numerous: less time is spent learning new tasks, less information is required of experts (usually human), and more situations can be handled effectively. These potential benefits have led researchers to apply transfer-learning techniques to many domains with varying degrees of success.

One particularly interesting domain for transfer learning is human activity recognition. The goal of human activity recognition is to be able to correctly classify the current activity a human or group of humans is performing given some set of data. Activity recognition is important to a variety of applications including health monitoring, automatic security surveillance, and home automation. As research in this area has progressed, an increasing number of researchers have started looking at ways transfer learning can be applied to reduce the training time and effort required to initialize new activity recognition systems, to make the activity recognition systems more robust and versatile, and to effectively reuse the existing knowledge that has previously been generated.

With the recent explosion in the number of researchers and the amount of research being done on transfer learning, activity recognition, and transfer learning for activity recognition, it becomes increasingly important to critically analyze this body of work and discover areas which still require further investigation. Although recent progress in transfer learning has been analyzed in [50, 61, 70] and several surveys have been conducted on activity recognition [2, 4, 10, 24], no one has specifically looked into the intersection of these two areas. This survey, therefore, examines the field of transfer-based activity recognition and the unique challenges presented in this domain. For an overview of the survey, see Fig. 1 which illustrates the topics covered in this survey and how they relate to each other.

Fig. 1 Content map of the transfer learning for activity recognition domain covered in this survey

2 Background

Activity recognition aims to identify activities as they occur based on data collected by sensors. There exist a number of approaches to activity recognition [28] that vary depending on the underlying sensor technologies that are used to monitor activities, the alternative machine-learning algorithms that are used to model the activities and the realism of the testing environment.

Advances in pervasive computing and sensor networks have resulted in the development of a wide variety of sensor modalities that are useful for gathering information about human activities. Wearable sensors such as accelerometers are commonly used for recognizing ambulatory movements (e.g., walking, running, sitting, climbing, and falling) [31, 40]. More recently, researchers are exploring smart phones equipped with accelerometers and gyroscopes to recognize such movement and gesture patterns [33].

Environment sensors such as infrared motion detectors or magnetic door sensors have been used to gather information about more complex activities such as cooking, sleeping, and eating. These sensors are adept in performing location-based activity recognition in indoor environments [1, 27, 38] just as GPS is used for outdoor environments [36]. Some activities such as washing dishes, taking medicine, and using the phone are characterized by interacting with unique objects. In response, researchers have explored the usage of RFID tags and shimmer sensors for tagging these objects and using the data for activity recognition [45, 52]. Researchers have also used data from video cameras and microphones [1].

Many varied machine-learning models have been used for activity recognition. These can be broadly categorized into template matching/transductive techniques, generative approaches, and discriminative approaches. Template matching techniques employ a kNN classifier based on Euclidean distance or dynamic time warping. Generative approaches such as naïve Bayes classifiers, where activity samples are modeled using Gaussian mixtures, have yielded promising results for batch learning. Generative probabilistic graphical models such as hidden Markov models and dynamic Bayesian networks have been used to model activity sequences and to smooth recognition results of an ensemble classifier [35]. Decision trees as well as bagging and boosting methods have been tested [40]. Discriminative approaches, including support vector machines and conditional random fields, have also been effective [11, 27], and unsupervised discovery and recognition methods have also been introduced [22, 58]. The traditional approaches to activity recognition make the strong assumption that the training and test data are drawn from identical distributions. Many real-world applications cannot be represented in this setting, and thus, the baseline activity recognition approaches have to be modified to work in these realistic settings. Transfer-based activity recognition is one conduit for achieving this.
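To make the template-matching idea mentioned above concrete, the following is a minimal sketch of a 1-nearest-neighbor classifier that uses dynamic time warping as its distance. The function names, the one-dimensional signals, and the activity labels are assumptions chosen for illustration; they are not taken from any of the cited systems.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def nearest_template(query, templates):
    """1-NN: label of the stored template closest to the query under DTW."""
    return min(templates, key=lambda item: dtw_distance(query, item[0]))[1]


# Hypothetical labeled templates (signal, label), for illustration only.
TEMPLATES = [([0, 1, 2, 1, 0], "walk"), ([0, 0, 0, 0], "sit")]
label = nearest_template([0, 1, 1, 2, 1, 0], TEMPLATES)
```

Because the warping path may repeat samples, the query above aligns with the "walk" template even though the two sequences have different lengths.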

2.1 Transfer learning

The ability to identify deep, subtle connections, what we term transfer learning, is the hallmark of human intelligence. Byrnes [7] defines transfer learning as the ability to extend what has been learned in one context to new contexts. Thorndike and Woodworth [63] first coined this term as they explored how individuals transfer learned concepts between contexts that share common features. Barnett and Ceci provide a taxonomy of features that influence transfer learning in humans [5].

In the field of machine learning, transfer learning is studied under a variety of different names including learning to learn, life-long learning, knowledge transfer, inductive transfer, context-sensitive learning, and meta-learning [3, 19, 64, 65, 70]. It is also closely related to several other areas of machine learning such as self-taught learning, multi-task learning, domain adaptation, and covariate shift. Because of this broad variance in the terms used to describe transfer learning, it is helpful to provide a formal definition of transfer-learning terms and of transfer learning itself which will be used throughout the rest of this paper.

2.2 Definitions

This survey starts with a review of basic definitions needed for discussions of transfer learning as it can be applied to activity recognition. Definitions for domain and task have been provided by Pan and Yang [50]:

Definition 2.1

(Domain) A domain \(D\) is a two-tuple \((\chi , P(X))\). \(\chi \) is the feature space of \(D\) and \(P(X)\) is the marginal distribution where \(X = \{x_1,\ldots ,x_n\} \in \chi \).

Definition 2.2

(Task) A task \(T\) is a two-tuple \((Y, f( ))\) for some given domain \(D\). \(Y\) is the label space of \(D\) and \(f( )\) is an objective predictive function for \(D\). \(f( )\) is sometimes written as a conditional probability distribution \(P(y|x)\). \(f( )\) is not given, but can be learned from the training data.

To illustrate these definitions, consider the problem of activity recognition using motion sensors. The domain is defined by a feature space which may represent the \(n\)-dimensional space defined by \(n\) sensor firing counts within a given time window and a marginal probability distribution over all possible firing counts. The task is composed of a label space \(Y\) which consists of the set of labels for activities of interest, and a conditional probability distribution consisting of the probability of assigning a label \(y_i \in Y\) given the observed instance \(x \in \chi \).
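As a minimal sketch of the feature space in this illustration, the snippet below turns a stream of motion-sensor firings into an \(n\)-dimensional vector of firing counts for one time window. The event format, the sensor names, and the helper `firing_counts` are hypothetical and serve only to make the definition tangible.

```python
from collections import Counter


def firing_counts(events, sensors, start, end):
    """Return one instance x: firing counts per sensor in [start, end).

    events  -- iterable of (timestamp_seconds, sensor_id) pairs (assumed format)
    sensors -- ordered list of the n sensor ids that define the feature space
    """
    counts = Counter(sid for t, sid in events if start <= t < end)
    return [counts[sid] for sid in sensors]


# Hypothetical sensors and events, for illustration only.
SENSORS = ["kitchen", "bedroom", "hall"]
EVENTS = [(1, "kitchen"), (3, "kitchen"), (4, "hall"), (65, "bedroom")]

x = firing_counts(EVENTS, SENSORS, start=0, end=60)
```

Under these assumptions, a second home with a different set of sensors would yield vectors of a different dimension, which is one way the feature space \(\chi \) can differ between a source and a target domain.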

Using these terms, we can now define transfer learning. In this paper, we specify a definition of transfer learning that is similar to that presented by Pan and Yang [50], but we allow for transfer learning which uses multiple source domains.

Definition 2.3

(Transfer Learning) Given a set of source domains \(DS = \{D_{s_1},\ldots ,D_{s_n}\}\) where \(n > 0\), a target domain, \(D_t\), a set of source tasks \(TS = \{T_{s_1}, \ldots , T_{s_n}\}\) where \(T_{s_i} \in TS\) corresponds with \(D_{s_i} \in DS\), and a target task \(T_t\) which corresponds to \(D_t\), transfer learning helps improve the learning of the target predictive function \(f_t( )\) in \(D_t\) where \(D_t \not \in DS\) and \(T_t \not \in TS\).

This definition of transfer learning is broad and encompasses a large number of different transfer-learning scenarios. The source domains can differ from the target domain by having a different feature space, a different distribution of instances in the feature space, or both. The source tasks can differ from the target task by having a different label space, a different predictive function for labels in that label space, or both. The source data can differ from the target data by having a different domain, a different task, or both. However, all transfer-learning problems rely on the basic assumption that there exists some relationship between the source and target areas which allows for the successful transfer of knowledge from the source to the target.

2.3 Scenarios

To further illustrate the variety of problems which fall under the scope of transfer-based activity recognition, we provide illustrative scenarios. Not all of these scenarios can be addressed by current transfer-learning methods. The first scenario represents a typical transfer-learning problem solvable using recently developed techniques. The second scenario represents a more challenging situation that pushes the boundaries of current transfer-learning techniques. The third scenario requires a transfer of knowledge across such a large difference between source and target datasets that current techniques only scratch the surface of what is required to make such a knowledge transfer successful.

2.3.1 Scenario 1

In one home which has been equipped with multiple motion and temperature sensors, an activity recognition algorithm has been trained using months of annotated labels to provide the ground truth for activities which occur in that home. A transfer-learning algorithm should be able to reuse the labeled data to perform activity recognition in a new setting. Such transfer will save months of man-hours annotating data for the new home. However, the new home has a different layout as well as a different resident and different sensor locations than the first home.

2.3.2 Scenario 2

An individual with Parkinson’s disease visits his neurosurgeon twice a year to get an updated assessment of his gait, tremor, and cognitive health. The medical staff perform some gait measurements and simulated activities in their office space to determine the effectiveness of the prescribed medication, but want to determine if the observed improvement is reflected in the activities the patient performs in his own home. A learning algorithm will need to be able to transfer information between different physical settings, as well as time of day, sensors used, and scope of the activities.

2.3.3 Scenario 3

A researcher is interested in studying the cooking activity patterns of college students living in university dorms in the United States. The research study has to be conducted using the smart phone of the student as the sensing mechanism. The cooking activity of these students typically consists of heating up a frozen snack from the refrigerator in the microwave oven. In order to build the machine-learning models for recognizing these activity patterns, the researcher has access to cooking activities for a group of grandmothers living in India. This dataset was collected using smart home environmental sensors embedded in the kitchen, and the cooking activity itself was very elaborate. Thus, the learning algorithm is now faced with changes in the data at many layers: differences in the sensing mechanisms, cultural changes, age-related differences, different location settings, and finally, differences in the activity labels. Transfer learning between two such diverse settings is the most challenging of the three scenarios and requires significant progress in the transfer-learning domain before the problem can even be attempted.

These scenarios illustrate different types of transfer that should be possible using machine-learning methods for activity recognition. As is described by these situations, transfer may occur across several dimensions. We next take a closer look at these types of transfer and use these descriptors to characterize existing approaches to transfer learning for activity recognition.

2.4 Dimensions of analysis

Transfer learning can take many forms in the context of activity recognition. In this discussion, we consider four dimensions to characterize various approaches to transfer learning for activity recognition. First, we consider different sensor modalities on which transfer learning has been applied. Second, we consider differences between the source and target environments in which data are captured. The third dimension is the amount and type of data labeling that are available in source and target domains. Finally, we examine the representation of the knowledge that is transferred from source to target. The next sections discuss these dimensions in more detail and characterize existing work based on alternative approaches to handling such differences.

3 Modality

One natural method for the classification of transfer-learning techniques is the underlying sensing modalities used for activity recognition. Some techniques may be generalizable to different sensor modalities, but most techniques are too specific to be applicable to any sensor modality other than the one for which they were designed. This is usually because the types of differences that occur between source and target domains are different for each sensor modality. These differences and their effect on the transfer-learning technique are discussed in detail in Sect. 4. In this section, we consider only those techniques which have empirically demonstrated their ability to operate on a given sensor modality.

The classification of sensor modalities is itself a difficult problem, and creating a precise classification taxonomy is outside the scope of this paper. However, we roughly categorize sensor modalities into the following classifications: video cameras, wearable devices, and ambient sensors. For each sensor modality, we provide a brief description of the types of sensors which are included and a summary of the research works performing transfer learning in that domain. In this section, we do not describe the transfer-learning algorithms used in the papers as that will be discussed in the other dimensions of analysis.

3.1 Video sequences

Video cameras are one of the first sensor modalities in which transfer learning has been applied to the area of activity recognition [75]. Video cameras provide a dense feature space for activity recognition which potentially allows for extremely fine-grained recognition of activities. Spatio-temporal features are extracted from video sequences for characterizing the activities occurring in them. Activity models are then learned using these feature representations.

One drawback of video processing for activity recognition is that the use of video cameras raises more issues associated with user privacy. In addition, cameras need to be well positioned and track individuals in order to capture salient data for processing. Activity recognition via video cameras has received broad attention in transfer-learning research [18, 20, 34, 37, 44, 72–75, 77].

3.2 Wearable sensors

Body sensor networks are another commonly used sensing mechanism to capture activity-related information from individuals. These sensors are typically worn by the individuals. Strategic placement of the sensors helps in capturing important activity-related information such as movements of the upper and lower parts of the body that can then be used to learn activity models. Sensors in this category include inertial sensors such as accelerometers and gyroscopes, sensors embedded in smart phones, and radio frequency identification (RFID) sensors and tags. Researchers have applied transfer-learning techniques to both activity recognition using wearable accelerometers and activity recognition using smart phones, but we have not seen any transfer-learning approaches applied to activity recognition using RFID tags. This may be due in part to the relatively low use of RFID tags in activity recognition itself.

Within wearable sensors, two types of problems are generally considered. The first is the problem of activity recognition itself [6, 8, 13, 23, 30, 32, 59, 69, 78, 79], and the second is the problem of user localization, which can then be used to increase the accuracy of the activity recognition algorithm [47–49, 51, 81]. Both problems present interesting challenges for transfer learning.

3.3 Ambient sensors

Ambient sensors represent the broadest classification of sensor modalities which we define in this paper. We categorize any sensor that is neither wearable nor a video camera as an ambient sensor. These sensors are typically embedded in an individual’s environment. This category includes a wide variety of sensors such as motion detectors, door sensors, object sensors, pressure sensors, and temperature sensors. As the name indicates, these sensors collect a variety of activity-related information such as human movements in the environment induced by activities, interactions with objects during the performance of an activity, and changes to illumination, pressure and temperature in the environment due to activities. Researchers have only recently begun to look at transfer-learning applications for ambient sensors with the earliest work appearing around 2008 [66]. Since then the field of transfer learning for activity recognition using ambient sensors has progressed rapidly with many different research groups analyzing the problem from several different angles [14, 25, 54–57, 59, 67, 80].

3.4 Crossing the sensor boundaries

Clearly, transfer learning within individual sensor modalities is progressing. Researchers are actively developing and applying new techniques to solve a variety of problems within any given sensor modality domain. However, there has been little work done that tries to transfer knowledge between any two or more sensor modalities. Kurz et al. [32] and Roggen et al. [59] address this problem using a teacher/learner model which is discussed further in Sect. 5. Hu et al. [25] introduce a transfer-learning technique for successfully transferring some knowledge across sensor modalities, but greater transfer of knowledge between modalities has yet to be explored.

4 Physical setting differences

Another useful categorization of transfer-learning techniques is the types of physical differences between a source and target dataset across which the transfer-learning techniques can achieve a successful transfer of knowledge. In this section, we describe these differences in a formal setting and provide illustrative examples drawn from activity recognition.

We use the terminology for domain, task and transfer learning defined in Sect. 2 to describe the differences between source and target datasets. These differences can be in the form of the feature-space representation, the marginal probability distribution of the instances, the label space, and/or the objective predictive function. When describing transfer learning in general, using such broad terms allows one to encompass many different problems. However, when describing transfer learning for a specific application, such as activity recognition, it is convenient to use more application specific terms. For example, differences in the feature-space representation can be thought of in terms of the sensor modalities and sampling rates and differences in the marginal probability distribution can be thought of in terms of different people performing the same activity, or having the activity performed in different physical spaces.

Even when limiting the scope to activity recognition, it is still infeasible to enumerate every possible difference between source and target datasets. In this survey we consider some of the most common or important differences between the source and target datasets including time, people, devices, space, sensor types, and labels. Table 1 summarizes the relationship between each of these applied differences and the formal definitions of transfer-learning differences.

Table 1 Relationship between formally defined transfer learning differences and the applied meaning for activity recognition

Differences across time, people, devices, or sensor sampling rates result in differences in the underlying marginal probability distribution, the objective predictive function, or both. Several papers focus specifically on transferring across time differences [30, 47, 48, 69], differences between people [12, 23, 54, 79], and differences between devices [78, 81].

Differences created when comparing datasets from different spaces or spatial layouts are reflected by differences in the feature spaces, the marginal probability distributions, the objective predictive functions, or any combination of these. As the number of differences increases, the source and target datasets become less related, making transfer learning more difficult. Because of this, current research usually imposes limiting assumptions about what is different between the spaces. Several researchers, for example, assume that some meta-features are added which provide space-independent information [14, 55–57, 66, 67]. For WiFi localization, Pan et al. [49] assume that the source and target spaces are in the same building. Applying transfer learning to video clips from different spaces usually results in handling issues of background differences [9, 25, 34, 72, 77, 80].

One of the largest differences between datasets occurs when the source and target datasets have a different sensor modality. This makes the transfer-learning problem much more difficult and relatively little work has been done in this direction. Hu and Yang have started work in this direction in [25]. Additionally, Calatroni et al. [8], Kurz et al. [32] and Roggen et al. [59] take a different approach to transferring across sensor modality by assuming a classifier for the source modalities can act as an expert for training a classifier in the target sensor modality.

5 Data labeling

In this section we consider the problem of transfer learning from the perspective of the availability of labeled data. Traditional machine learning uses the terms supervised learning and unsupervised learning to distinguish learning techniques based on the availability and use of labeled data. To distinguish between source and target labeled data availability we introduce two new terms, informed and uninformed, which we apply to the availability of labeled data in the target area. Thus, informed supervised (IS) transfer learning implies that some labeled data is available in both the target and source domains. Uninformed supervised (US) transfer learning implies that labeled data is available only in the source domain. Informed unsupervised (IU) transfer learning implies that labeled data is only available in the target domain. Finally, uninformed unsupervised (UU) transfer learning implies that no labeled data is available for either the source or target domains. One final case to consider is teacher/learner (TL) transfer learning, where no training data is directly available. Instead a previously-trained classifier (the Teacher) is introduced which operates simultaneously with the new classifier to be trained (the Learner) and provides the labels for observed data instances.
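The four label-availability terms introduced above reduce to two yes/no questions, which the following sketch spells out. The function name `labeling_category` is an assumption made for illustration, and the teacher/learner case is deliberately left out because it does not depend on stored labeled data.

```python
def labeling_category(source_labeled, target_labeled):
    """Map label availability to the informed/uninformed, supervised/unsupervised terms.

    "supervised" refers to labeled data in the source domain;
    "informed" refers to labeled data in the target domain.
    """
    informed = "informed" if target_labeled else "uninformed"
    supervised = "supervised" if source_labeled else "unsupervised"
    return f"{informed} {supervised}"


category = labeling_category(source_labeled=True, target_labeled=False)
```

Under this reading, a system trained on an annotated home and deployed in a new home without annotation falls into the uninformed supervised (US) case.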

Two other terms that are often used in machine-learning literature and may be applicable here are inductive and transductive learning. Inductive learning refers to learning techniques which try to learn the objective predictive function. Transductive learning techniques, on the other hand, try to learn the relationship between instances. Pan and Yang [50] extend the definitions of inductive and transductive learning to transfer learning, but the definitions do not create a complete taxonomy for transfer-learning techniques. For this reason, we do not specifically classify recent works as being inductive or transductive in nature, but we note here how the inductive and transductive definitions fit into a classification based upon the availability of labeled data.

Inductive learning requires that labeled data be available in the target domain regardless of its availability in the source domain. Thus, most informed supervised and informed unsupervised transfer learning techniques are also inductive transfer-learning techniques. Transductive learning, however, does not require labeled data in the target domain. Therefore, most uninformed supervised techniques are also transductive transfer-learning techniques. Table 2 summarizes this general relationship.

Table 2 General relationship between inductive/transductive learning and the availability of labeled data

Several researchers have developed and applied informed, supervised transfer-learning techniques for activity recognition. These techniques have been applied to activity recognition using wearables [6, 29, 46, 48, 69, 81] and to activity recognition using cameras [18, 34, 44, 25, 26, 54, 66, 67, 69, 80], but a few algorithms are able to take advantage of the labeled target data if it is available [55–57]. This focus on uninformed supervised transfer learning is most likely due to the appeal of building an activity recognition framework that can be trained offline and later installed into any user’s space without requiring additional data labeling effort. Wearables have also been used for uninformed supervised transfer-learning research [23, 47, 49, 51, 76, 78, 79] as have cameras [9, 20, 37, 72, 73].

Despite the abundance of research using labeled source data, research into transfer-learning techniques for activity recognition in which no source labels are available is extremely sparse. Pan et al. [47] have applied an uninformed unsupervised technique, transfer component analysis (TCA), which reduces the distance between domains by learning a set of transfer components across domains in a reproducing kernel Hilbert space using maximum mean discrepancy. We are unaware of any other work for uninformed unsupervised transfer-based activity recognition. We are also unaware of any work on informed unsupervised transfer-based activity recognition. The lack of research into informed unsupervised transfer-based activity recognition is not surprising because the idea of having labeled target data available and not having labeled source data is counterintuitive to the general principle of transfer learning. However, informed unsupervised transfer learning may still provide significant benefits to activity recognition.
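The maximum mean discrepancy mentioned above can be estimated directly from two unlabeled samples. The sketch below computes only that statistic, using the standard biased estimator with an RBF kernel; it is not an implementation of TCA, which additionally learns a projection that makes this quantity small. The kernel choice, the `gamma` value, and the toy data are assumptions.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel between two feature vectors (gamma is an assumed bandwidth)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(xs, xt, kernel=rbf):
    """Biased empirical estimate of the squared MMD between samples xs and xt."""
    def mean_k(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(xt, xt) - 2.0 * mean_k(xs, xt)


# Toy unlabeled source and target samples, for illustration only.
source = [[0.0], [1.0]]
target_near = [[0.0], [1.0]]
target_far = [[5.0], [6.0]]

near = mmd_squared(source, target_near)
far = mmd_squared(source, target_far)
```

No labels appear anywhere in the computation, which is why a criterion of this kind is usable in the uninformed unsupervised setting.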

The teacher/learner model for activity recognition is considerably less studied than the previously discussed techniques. However, we feel that this area has significant promise for improving transfer learning for activity recognition and making activity recognition systems much more robust and versatile. Roggen et al. [59], Kurz et al. [32], and Calatroni et al. [8] apply the teacher/learner model to develop an opportunistic system which is capable of using whatever sensors are currently contained in the environment to perform activity recognition.

In order for the teacher/learner model to be applicable, two requirements must be met. First, an existing classifier (the teacher) must already be trained in the source domain. Second, the teacher must operate simultaneously with a new classifier in the target domain (the learner) to provide the training for the learner. For example, Roggen et al. [59] equip each drawer of a cabinet with an accelerometer and train a classifier to recognize which drawer is being opened or closed. This classifier becomes the teacher. Then several wearable accelerometers are attached to the person opening and closing the drawers, and a new classifier is trained using the wearable accelerometers. This classifier is the learner. When the individual opens or closes a drawer, the teacher labels the activity according to its classification model. This label is given to the learner, which can then use it as labeled training data in real time without the need for any manually labeled data.
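The data flow of the teacher/learner model can be sketched as follows. This is only an illustration of the idea under simplifying assumptions: the teacher is a hand-written threshold rule standing in for a previously trained classifier, the learner is a nearest-centroid classifier, and the readings and the "open"/"closed" labels are made up rather than taken from the cited experiments.

```python
from collections import defaultdict


class CentroidLearner:
    """Nearest-centroid classifier trained online from teacher-provided labels."""

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sum
        self.counts = defaultdict(int)  # label -> number of instances seen

    def update(self, x, label):
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict(self, x):
        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.sums, key=sq_dist)


def teacher(drawer_reading):
    """Stand-in for a classifier already trained in the source domain (assumed)."""
    return "open" if drawer_reading > 0.5 else "closed"


# Synchronized (source reading, target reading) pairs; values are made up.
stream = [(0.9, [1.0, 0.8]), (0.1, [0.1, 0.0]),
          (0.8, [0.9, 1.1]), (0.2, [0.0, 0.2])]

learner = CentroidLearner()
for source_x, target_x in stream:
    # The teacher observes the source sensor and labels the target instance.
    learner.update(target_x, teacher(source_x))

prediction = learner.predict([0.95, 0.9])
```

Note that the learner never sees a manually assigned label; every training label originates from the teacher, so any teacher error is passed on to the learner.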

The teacher/learner model presents a new perspective on transfer learning and introduces additional challenges. One major challenge of the teacher/learner model is that the accuracy of the learner is limited by the accuracy of the teacher. Additionally, the system’s only source of a ground truth comes from the teacher, and thus, the learner is completely reliant upon the teacher. It remains to be explored whether the learner can ever outperform the teacher and if it does so, whether it can convince itself and others of this superior performance. Finally, while the teacher/learner model provides a convenient way to transfer across different domains, an additional transfer mechanism would need to be employed to transfer across different label spaces.

6 Type of knowledge transferred

Pan and Yang [50] describe four general classifications for transfer learning in relation to what is transferred: instance transfer, feature-representation transfer, parameter transfer, and relational-knowledge transfer.

6.1 Instance transfer

Instance transfer reuses the source data to train the target classifier, usually by re-weighting the source instances based upon a given metric. Instance transfer techniques work well when \(\chi _s = \chi _t\), i.e., the feature spaces describing the source and target domains are the same. They may also be applied after the feature representation has first been transferred to a common representation between the source and target domains.

Several researchers have applied instance transfer techniques to activity recognition. Hachiya et al. [23] develop an importance weighted least-squares probabilistic classification approach to handle transfer learning when \(P(X_s) \ne P(X_t)\) (i.e., the covariate shift problem) and apply this approach to wearable accelerometers. Venkatesan et al. [29, 68, 69] extend the AdaBoost framework proposed by Freund and Schapire [21] to include cost-sensitive boosting which tries to weight samples from the source domain according to their relevance in the target domain. In their approach, samples from the source domain are first given a relevance cost. As the classifier is trained, those instances from the source domain with a high relevance must also be classified correctly. Xian-ming and Shao-zi apply TrAdaBoost (a different transfer-learning extension of AdaBoost) [15] to action recognition in video clips [74]. Lam et al. weight the source and target data differently when training an SVM to recognize target actions from video clips [34]. Training a typical SVM involves solving the following optimization problem:

$$\begin{aligned}&\min _{\vec {w},\xi } \left\{ \frac{1}{2}||\vec {w}||^2 + C \sum \limits _{i=1}^{n}\xi _i \right\} \nonumber \\&\text{ s.t. } y_i (\vec {x_i} \cdot \vec {w} + b) - 1 + \xi _i \ge 0,\quad \xi _i \ge 0 \end{aligned}$$
(1)

where \(\varvec{x_i}\) is the \(i\)th datapoint and \(y_i, \xi _i\) are the label and slack variable associated with \(\varvec{x_i}\). \(\varvec{w}\) is the normal to the hyperplane. \(C\) is the parameter that trades off between training accuracy and margin size. However, to allow for the different source and target weights, they solve the following optimization:

$$\begin{aligned}&\min _{\vec {w},\xi }\left\{ \frac{1}{2}||\vec {w}||^2 + C_s \sum \limits _{i=1}^{n}\xi _i+ C_t \sum \limits _{i=n+1}^{n+m}\xi _i\right\} \nonumber \\&\text{ s.t. } y_i (\vec {x_i} \cdot \vec {w} + b) - 1 + \xi _i \ge 0,\quad \xi _i \ge 0 \end{aligned}$$
(2)

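where the parameters are the same as before, except that the first \(n\) datapoints are from the source data, the last \(m\) datapoints are from the target data, and \(C_s\) and \(C_t\) are separate penalty parameters applied to the slack variables of the source and target datapoints, respectively.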
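The effect of the two penalty parameters in Eq. (2) can be illustrated with a minimal sketch. This is not the implementation of Lam et al.; it assumes a linear kernel, eliminates the slack variables by writing them as hinge losses, and minimizes the resulting objective with plain subgradient descent. The function names, learning rate, epoch count, and toy data are all assumptions made for illustration.

```python
def train_weighted_svm(source, target, c_s=0.1, c_t=10.0, epochs=500, lr=0.01):
    """Subgradient descent on the hinge-loss form of the weighted objective.

    source, target: lists of (x, y) pairs, with x a list of floats and
    y in {-1, +1}. c_s and c_t play the roles of C_s and C_t in Eq. (2).
    """
    # Attach the appropriate penalty to every datapoint (source first).
    data = [(x, y, c_s) for x, y in source] + [(x, y, c_t) for x, y in target]
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for x, y, c in data:
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # slack would be positive, so the penalty is active
                for j in range(dim):
                    grad_w[j] -= c * y * x[j]
                grad_b -= c * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted label (+1 or -1) for datapoint x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy 1-D data in which the source labels contradict the target labels.
source_data = [([-1.0], 1), ([1.0], -1)]
target_data = [([-1.0], -1), ([1.0], 1)]
w, b = train_weighted_svm(source_data, target_data, c_s=0.1, c_t=10.0)
```

With \(C_t\) much larger than \(C_s\), the learned hyperplane follows the target labels even though the source data disagree; swapping the two values makes the source data dominate instead.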

Unlike the previous instance-based approaches which weight the source instances based on similarity of features between the source and target data, Zheng et al. [80] use an instance-based approach to weight source instances based upon the similarity between the label information of the source and target data. This allows them to transfer the labels from instances in the source domain to instances in the target domain using web-knowledge to relate the two domains [25, 26]. Taking a different approach, several researchers [8, 32, 59] use the real-time teacher/learner model discussed in the previous section to transfer the label of the current instance in the source domain to the instance in the target domain.

6.2 Feature-representation transfer

Feature-representation transfer reduces the differences between the source and target feature spaces. This can be accomplished by mapping the source feature space to the target feature space with a function \(f:\chi _s\rightarrow \chi _t\), by mapping the target feature space to the source feature space with a function \(g:\chi _t\rightarrow \chi _s\), or by mapping both the source and target feature spaces to a common feature space with functions \(g:\chi _t\rightarrow \chi \) and \(f:\chi _s\rightarrow \chi \). This mapping can be computed manually [66] or learned as part of the transfer learning algorithm [18, 25, 37, 54, 81].

When the mapping is part of the transfer-learning algorithm, a common approach is to apply a dimensionality reduction technique to map both the source and target feature spaces into a common latent space [46–48, 51]. For example, Chattopadhyay et al. [12] use Isomap [62] to map both the source and target data into a common low-dimensional space, after which instance-based transfer techniques can be applied.

In some cases, meta-features are first manually introduced into the feature space and then the feature space is automatically mapped from the source domain to the target domain [6, 14, 67]. An example of this is the work of Rashidi and Cook [57]. They first assign a location label to each sensor indicating in which room or functional area the sensor is located. Activity templates are then constructed from both the source and target data. Finally, a mapping is learned between the source and target datasets based upon the similarity of activities and sensors [55, 56].
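The role of a manually introduced location meta-feature can be sketched as follows. This is only an illustration of mapping two sensor layouts into a common feature space \(\chi \), not the method of Rashidi and Cook; the sensor identifiers, room names, and the `to_location_features` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sensor-to-location assignments for two homes (illustrative only).
SOURCE_LOCATIONS = {"M01": "kitchen", "M02": "kitchen", "M03": "bedroom"}
TARGET_LOCATIONS = {"S7": "kitchen", "S8": "bedroom", "S9": "bathroom"}

# The common feature space has one dimension per location seen in either home.
LOCATIONS = sorted(set(SOURCE_LOCATIONS.values()) | set(TARGET_LOCATIONS.values()))


def to_location_features(events, sensor_locations, locations):
    """Map a window of sensor IDs to normalized per-location firing counts."""
    counts = Counter(sensor_locations[s] for s in events if s in sensor_locations)
    total = sum(counts.values())
    return [counts[loc] / total if total else 0.0 for loc in locations]


# Windows of sensor firings recorded in each home during a similar activity.
source_vec = to_location_features(["M01", "M02", "M01", "M03"],
                                  SOURCE_LOCATIONS, LOCATIONS)
target_vec = to_location_features(["S7", "S7", "S7", "S8"],
                                  TARGET_LOCATIONS, LOCATIONS)
```

Although the two homes share no sensor identifiers, both windows land on the same point in the location-based space, so a classifier trained on source vectors can be applied to target vectors directly.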

6.3 Parameter transfer

Parameter transfer learns parameters which are shared between the source and target tasks. One common use of parameter transfer is learning a prior distribution shared between the source and target datasets. For example, one technique [9] models the source and target tasks using Gaussian Mixture Models which share a prior distribution, and another algorithm [18] learns a target classifier using a set of pre-trained classifiers as a prior for the target classifier. Similarly, van Kasteren et al. [66] propose a method to learn the parameters of a Hidden Markov Model (HMM) using labeled data from the source domain and unlabeled data from the target domain. Later they extend this work to learn hyperparameter priors for the HMM instead of learning the parameters directly [67].

Another common example of parameter transfer assumes the SVM parameter \(w\) can be split into two terms: \(w_0\), which is the same for both the source and target tasks, and \(v\), which is specific to the particular task. Thus, \(w_s = w_0 + v_s\) and \(w_t = w_0 + v_t\). Several works adopt this approach [44, 75].
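One way such a decomposition can be turned into a training objective is to penalize the shared and task-specific terms separately. The following is a sketch only and is not necessarily the formulation used in [44, 75]. The regularization weights \(\lambda _0\) and \(\lambda _1\), the task index \(k \in \{s, t\}\), the per-task sample counts \(n_k\), and the per-task offsets \(b_k\) are notation assumed for this illustration:

$$\begin{aligned}&\min _{w_0, v_s, v_t, \xi } \left\{ \lambda _0 ||w_0||^2 + \lambda _1 \left( ||v_s||^2 + ||v_t||^2 \right) + \sum \limits _{k \in \{s,t\}} \sum \limits _{i=1}^{n_k}\xi _{ki} \right\} \\&\text{ s.t. } y_{ki} \left( x_{ki} \cdot (w_0 + v_k) + b_k \right) - 1 + \xi _{ki} \ge 0,\quad \xi _{ki} \ge 0 \end{aligned}$$

Under this sketch, a large \(\lambda _1\) relative to \(\lambda _0\) pushes \(v_s\) and \(v_t\) toward zero so that both tasks share nearly the same hyperplane, while a small \(\lambda _1\) lets the two tasks diverge.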

Using a different approach to parameter transfer, a transfer learning algorithm [49, 51] can extract knowledge from the source domain to impose additional constraints on a quadratically-constrained quadratic program optimization problem for the target domain. Along a similar line of thought, Zhao et al. [78, 79] use information extracted from the source domain to initialize cluster centers for a k-means algorithm in the target domain.
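The idea of initializing target-domain cluster centers from source-domain knowledge can be sketched in a few lines. This is a minimal illustration under assumed conditions (labeled source data, unlabeled target data, Euclidean distance), not the algorithm of Zhao et al.; the function names and toy data are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(centers, x):
    """Return the label of the center nearest to x."""
    return min(centers, key=lambda label: sq_dist(x, centers[label]))


def source_initialized_kmeans(source, target, iterations=10):
    """Run k-means on unlabeled target data, starting from source class means.

    source: list of (x, label) pairs; target: list of x.
    Returns a dict mapping each source label to its adapted center.
    """
    by_label = {}
    for x, label in source:
        by_label.setdefault(label, []).append(x)
    # Transferred knowledge: one initial center per source activity label.
    centers = {label: centroid(xs) for label, xs in by_label.items()}
    for _ in range(iterations):
        clusters = {label: [] for label in centers}
        for x in target:
            clusters[assign(centers, x)].append(x)
        # Keep the previous center if no target point was assigned to it.
        centers = {label: centroid(xs) if xs else centers[label]
                   for label, xs in clusters.items()}
    return centers


# Toy data: the target features are shifted relative to the source features.
source_data = [([1.0, 1.0], "walk"), ([1.2, 0.8], "walk"),
               ([5.0, 5.0], "run"), ([5.2, 4.8], "run")]
target_data = [[2.0, 2.0], [2.2, 1.8], [6.0, 6.0], [6.2, 5.8]]
centers = source_initialized_kmeans(source_data, target_data)
```

Because each center keeps the label of the source class it was initialized from, the adapted centers can label target instances without any labeled target data, provided the shift between domains is small enough that clusters do not swap.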

6.4 Relational-knowledge transfer

Relational-knowledge transfer applies to problems in which the data are not independent and identically distributed (i.i.d.), as is traditionally assumed, but can be represented through multiple relationships [50]. Such problems are usually represented with a network or graph. Relational-knowledge transfer tries to transfer the relationships in the source domain to the target domain. This type of transfer learning is not heavily explored, and as far as we are able to determine, no research is currently being pursued in transfer learning for activity recognition using relational-knowledge transfer.

7 Summary

The previous sections analyzed a large body of transfer-based activity recognition research along four different dimensions. Looking at each dimension separately provides an orderly way to analyze so many different papers. However, such separation may also make it difficult to see the bigger picture. Table 3, therefore, summarizes the classification of existing works along these four dimensions.

Table 3 Summarization of existing work based on the four dimensions of analysis

8 Grand challenges

Although transfer-based activity recognition has progressed significantly in the last few years, there are still many open challenges. In this section, we first consider challenges specific to a particular sensor modality and then we look at challenges which are generalizable to all transfer-based activity recognition.

As can be seen in Table 5, performing transfer-based activity recognition when the source data is not labeled has not received much attention in current research. Outside the domain of activity recognition, researchers have leveraged the unlabeled source data to improve transfer in the target domain [16, 53, 71], but such techniques have yet to be applied to activity recognition.

Another area needing more attention is relational-knowledge transfer for activity recognition as indicated in Table 6. Relational-knowledge transfer requires that there exist certain relationships in the data which can be learned and transferred across populations. Data for activity recognition have the potential to contain such transferable relationships indicating that this may be an important technique to pursue. See [17, 41–43] for examples of relational-knowledge transfer.

Tables 4, 5 and 6 also indicate several more niche areas which could be further investigated. For example, in the video camera domain, most of the work has focused on informed supervised parameter-based transfer learning, while the other techniques have not been heavily applied. Similarly, transferring across different label spaces is a much less studied problem in transfer-based activity recognition. Finally, we note that parameter-based transfer learning is also less studied for the ambient sensor modality.

Table 4 Existing work categorized by sensor modality and the differences between the source and target datasets
Table 5 Existing work categorized by sensor modality and data labeling
Table 6 Existing work categorized by sensor modality and the type of knowledge transferred

The current direction of most transfer-based activity recognition is to push the limits on how different the source and target domains and tasks can be. The scenarios discussed in Sect. 2 illustrate the importance of continuing in this direction. More work is needed to improve transfer across sensor modalities and to transfer knowledge across multiple differences. Instead of transferring learning from one smart home environment to another, can we transfer from a smart home to a smart workplace or smart hospital? We envision one day chaining multiple transfers together to achieve even greater diversity between the source and target populations.

As researchers continue to expand the applicability of transfer learning, two natural questions arise. First, can we define a generalizable distance metric for determining the difference between the source and target populations? Some domain-specific distances have been used in the past, but it would be useful to have a domain-independent distance measure. This measure could be used to facilitate comparisons between different transfer-learning approaches as well as provide an indication of whether transfer learning should even be applied in a given situation. Such a measure would need to indicate how the source and target data differ (feature space, marginal probabilities, label space, and objective predictive function) as well as quantify the magnitude of the differences. Second, can we detect and prevent the occurrence of negative transfer effects? Negative transfer effects occur when the use of transfer learning decreases performance instead of increasing it. These two questions are related, because an accurate distance metric may provide an indication of when negative transfer will occur for a given transfer-learning technique. Rosenstein et al. [60] looked at the question of when to use transfer learning. They empirically show that when two tasks are sufficiently dissimilar, negative transfer occurs. Mahmud and Ray define a distance metric for measuring the similarity of two tasks based on the conditional Kolmogorov complexity between the tasks and prove some theoretical bounds using this distance measure [39].

This survey has reviewed the current literature regarding transfer-based activity recognition. We discussed several promising techniques and considered the many open challenges that still need to be addressed in order to advance the field of transfer learning for activity recognition.