Abstract
When the sequences of a species must be identified from only a small amount of known genetic training data (the target domain), data from closely related species and unlabeled data of the species itself (the source domain) can be used for auxiliary training. However, the genetic data of different species follow different statistical distributions in the feature space. This paper therefore proposes a feature and sample joint transfer (FSJT) method for semi-supervised scenarios, consisting of two modules. The first module takes the distance between the sample probability distributions in the feature space as the optimization objective and constructs a hybrid balanced distribution adaptation method that transforms the feature spaces of the two domains to increase the similarity between them. The second module defines a confidence for the unlabeled data in the target domain and proposes a self-learning sample transfer method that reduces the impact of strongly divergent samples in the source-domain training data. In addition, to select suitable samples from the source and target domains when the sample sizes of the two domains differ greatly, a transferred Lasso and nearest-neighbor (TLR) feature selection method is proposed for use with FSJT. The overall framework and algorithm flow of the resulting TLR-FSJT model are then presented and verified on a standard transfer learning dataset and on ribonucleic acid data from the GenBank database, in comparison with three machine learning methods and the plain FSJT model. The results show that the TLR-FSJT model achieves the highest accuracy among the compared methods in semi-supervised scenarios.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07773-7/MediaObjects/500_2022_7773_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07773-7/MediaObjects/500_2022_7773_Fig2_HTML.png)
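To make the idea behind the first module concrete, the following is a minimal sketch of a balanced distribution distance between a source and a target domain. It is not the authors' hybrid method: it assumes a linear-kernel maximum mean discrepancy (the squared distance between sample means) and a simple weighted sum of marginal and per-class conditional terms in the spirit of balanced distribution adaptation. The function names, the `mu` weight, and the use of pseudo-labels for the target domain are all illustrative assumptions.

```python
import numpy as np


def linear_mmd(xs, xt):
    """Squared MMD with a linear kernel: squared distance between sample means."""
    diff = xs.mean(axis=0) - xt.mean(axis=0)
    return float(diff @ diff)


def balanced_distance(xs, ys, xt, yt_pseudo, mu=0.5):
    """Weighted sum of marginal and class-conditional distances.

    mu = 0 uses only the marginal term, mu = 1 only the conditional terms.
    yt_pseudo holds (assumed) pseudo-labels for the target samples.
    """
    marginal = linear_mmd(xs, xt)
    conditional = 0.0
    for c in np.unique(ys):
        xs_c = xs[ys == c]
        xt_c = xt[yt_pseudo == c]
        # Skip classes that are missing in one of the two domains.
        if len(xs_c) > 0 and len(xt_c) > 0:
            conditional += linear_mmd(xs_c, xt_c)
    return (1.0 - mu) * marginal + mu * conditional


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: two classes, target features shifted by 0.5.
    xs = rng.normal(size=(100, 4))
    ys = rng.integers(0, 2, size=100)
    xt = rng.normal(loc=0.5, size=(40, 4))
    yt_pseudo = rng.integers(0, 2, size=40)
    print(balanced_distance(xs, ys, xt, yt_pseudo, mu=0.5))
```

A feature transformation that lowers this value makes the two domains look more alike. The weight `mu` controls whether the overall or the per-class distributions are aligned more strongly.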
Data availability
Enquiries about data availability should be directed to the authors.
Funding
This work was jointly supported by the National Natural Science Foundation of China (52272354) and the Innovation Fund of the Wuhan Academy of Agricultural Sciences (XTCX202202; XKCX2022024).
Author information
Contributions
Jianghui Wen was involved in conceptualization, methodology, software, formal analysis, and writing (original draft). Haoran Huang was involved in methodology, software, formal analysis, and writing (original draft). Zhenyu Pu was involved in formal analysis and supervision. Bing Deng was involved in conceptualization, formal analysis, and writing (review and editing).
Ethics declarations
Conflict of interest
The authors have declared that there are no conflicts of interest.
About this article
Cite this article
Wen, J., Huang, H., Pu, Z. et al. A novel feature and sample joint transfer learning method with feature selection in semi-supervised scenarios for identifying the sequence of some species with less known genetic data. Soft Comput 27, 5411–5423 (2023). https://doi.org/10.1007/s00500-022-07773-7
DOI: https://doi.org/10.1007/s00500-022-07773-7