A novel feature and sample joint transfer learning method with feature selection in semi-supervised scenarios for identifying the sequence of some species with less known genetic data

  • Data Analytics and Machine Learning
Soft Computing

Abstract

When identifying the sequences of a species for which little labeled genetic training data is known (the target domain), data from closely related species and unlabeled data from the species itself (the source domain) can be used for auxiliary training. However, the feature spaces built from the genetic data of different species differ in statistical distribution. This paper therefore proposes a feature and sample joint transfer (FSJT) method for semi-supervised scenarios, consisting of two modules. In the first module, the distance between the sample probability distributions in the feature space is taken as the optimization objective, and a hybrid balanced distribution adaptation method is constructed to transform the feature spaces of the two domains and increase their similarity. In the second module, a confidence measure for the unlabeled data in the target domain is defined, and a self-learning sample transfer method is proposed to reduce the impact of strongly differing samples in the source-domain training data. In addition, to select suitable samples from the source and target domains when the sample sizes of the two domains differ greatly, a transferred Lasso and nearest-neighbor (TLR) feature selection method is proposed for use with FSJT. The overall framework and algorithm flow of the TLR-FSJT model are then presented, and the model is verified on a standard transfer learning dataset and on ribonucleic acid data from the GenBank database by comparison with three machine learning methods and the FSJT model. The results show that the TLR-FSJT model achieves the highest accuracy among the compared methods in semi-supervised scenarios.
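The two modules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the kernel, the exact form of the hybrid balanced objective, nor the confidence definition. The sketch assumes a linear-kernel maximum mean discrepancy (the squared distance between domain means), a balance factor `mu` that weights marginal against class-conditional distance in the style of balanced distribution adaptation, and a confidence equal to the largest predicted class probability. All function names and the threshold value are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(samples: Sequence[Vector]) -> List[float]:
    """Column-wise mean of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def linear_mmd_squared(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Squared MMD under a linear kernel (assumption): distance between domain means."""
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def balanced_distance(
    xs: Sequence[Vector],
    ys: Sequence[int],
    xt: Sequence[Vector],
    yt: Sequence[int],
    mu: float = 0.5,
) -> float:
    """Weighted sum of marginal and class-conditional distances.

    yt may hold pseudo-labels for the target domain. mu = 0 uses only the
    marginal term, mu = 1 only the conditional term.
    """
    marginal = linear_mmd_squared(xs, xt)
    conditional = 0.0
    for c in sorted(set(ys) & set(yt)):  # classes present in both domains
        xs_c = [x for x, y in zip(xs, ys) if y == c]
        xt_c = [x for x, y in zip(xt, yt) if y == c]
        conditional += linear_mmd_squared(xs_c, xt_c)
    return (1.0 - mu) * marginal + mu * conditional


def select_confident(
    probabilities: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Self-training step: keep unlabeled samples whose top class probability
    reaches the threshold, returned as (sample index, pseudo-label) pairs."""
    selected = []
    for i, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected


if __name__ == "__main__":
    xs, ys = [[0.0, 0.0], [2.0, 0.0]], [0, 1]
    xt, yt = [[1.0, 1.0], [3.0, 1.0]], [0, 1]
    print(linear_mmd_squared(xs, xt))
    print(balanced_distance(xs, ys, xt, yt, mu=0.5))
    print(select_confident([[0.95, 0.05], [0.6, 0.4], [0.1, 0.9]]))
```

In a full pipeline, a feature transformation would be chosen to reduce `balanced_distance`, and the pairs returned by `select_confident` would be added to the labeled training set before the classifier is retrained. Both of those steps are left out here because the abstract does not specify them.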

Data availability

Enquiries about data availability should be directed to the authors.

Funding

This work was jointly supported by the National Natural Science Foundation of China (52272354) and the Innovation Fund of Wuhan Academy of Agricultural Sciences (XTCX202202; XKCX2022024).

Author information

Contributions

Jianghui Wen was involved in conceptualization, methodology, software, formal analysis, and writing the original draft. Haoran Huang was involved in methodology, software, formal analysis, and writing the original draft. Zhenyu Pu was involved in formal analysis and supervision. Bing Deng was involved in conceptualization, formal analysis, and reviewing and editing the manuscript.

Corresponding author

Correspondence to Bing Deng.

Ethics declarations

Conflict of interest

The authors have declared that there are no conflicts of interest.

About this article

Cite this article

Wen, J., Huang, H., Pu, Z. et al. A novel feature and sample joint transfer learning method with feature selection in semi-supervised scenarios for identifying the sequence of some species with less known genetic data. Soft Comput 27, 5411–5423 (2023). https://doi.org/10.1007/s00500-022-07773-7
