A novel feature and sample joint transfer learning method with feature selection in semi-supervised scenarios for identifying the sequence of some species with less known genetic data

Wen, Jianghui; Huang, Haoran; Pu, Zhenyu; Deng, Bing

doi:10.1007/s00500-022-07773-7

Data Analytics and Machine Learning
Published: 11 January 2023

Volume 27, pages 5411–5423 (2023)
Soft Computing

Jianghui Wen¹, Haoran Huang¹, Zhenyu Pu² & Bing Deng²

Abstract

This paper addresses the problem of identifying the sequences of a species for which only limited known genetic training data are available (the target domain). In this setting, data from closely related species (the source domain) and unlabeled data of the target species can be used for auxiliary training. However, the feature spaces built from the genetic data of different species follow different statistical distributions. This paper therefore proposes a feature and sample joint transfer (FSJT) method for semi-supervised scenarios, consisting of two modules. In the first module, the distance between the sample probability distributions of the two domains in the feature space is taken as the optimization objective. A hybrid balanced distribution adaptation method is then constructed to transform the feature spaces of the two domains so as to increase their similarity. In the second module, a confidence measure is defined for the unlabeled target-domain data, and a self-learning sample transfer method is proposed to reduce the influence of source-domain training samples that differ greatly from the target domain. In addition, the sample sizes of the two domains may differ greatly. To make suitable selections for the source and target domains in that case, a transferred Lasso and nearest-neighbor (TLR) feature selection method is proposed on the basis of FSJT. The overall framework and algorithm flow of the resulting TLR-FSJT model are then presented. The model is validated on a standard transfer learning dataset and on ribonucleic acid (RNA) data from the GenBank database, in comparison with three machine learning methods and the FSJT model. The results show that the TLR-FSJT model achieves the highest accuracy in semi-supervised scenarios.
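The first module's objective is a distance between the source and target distributions that weighs marginal and class-conditional terms. It can be illustrated with the standard balanced distribution adaptation (BDA) distance, (1 - mu) * MMD(source, target) + mu * sum over classes c of MMD(source_c, target_c). The sketch below rests on two assumptions that are not taken from the paper. First, the MMD uses a linear kernel, which reduces to the squared distance between feature means. Second, the target pseudo-labels are supplied from outside. The sketch does not reproduce the paper's hybrid variant or its feature-space transformation. The names `linear_mmd`, `balanced_distance` and the balance factor `mu` are hypothetical.

```python
from collections.abc import Sequence

Vector = Sequence[float]


def mean_vector(rows: Sequence[Vector]) -> list[float]:
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd(xs: Sequence[Vector], xt: Sequence[Vector]) -> float:
    """Squared MMD with a linear kernel: ||mean(xs) - mean(xt)||^2."""
    ms, mt = mean_vector(xs), mean_vector(xt)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def balanced_distance(xs, ys, xt, yt_pseudo, mu: float = 0.5) -> float:
    """BDA-style distance: (1 - mu) * marginal MMD + mu * sum of per-class MMDs.

    Assumption: yt_pseudo holds pseudo-labels for the target samples
    (e.g. predicted by a classifier trained on labeled data).
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    marginal = linear_mmd(xs, xt)
    conditional = 0.0
    for c in sorted(set(ys)):
        s_c = [x for x, y in zip(xs, ys) if y == c]
        t_c = [x for x, y in zip(xt, yt_pseudo) if y == c]
        if s_c and t_c:  # skip classes missing from either domain
            conditional += linear_mmd(s_c, t_c)
    return (1.0 - mu) * marginal + mu * conditional


# Toy example: two 2-D samples per domain, one per class.
xs = [[0.0, 0.0], [1.0, 1.0]]
ys = [0, 1]
xt = [[1.0, 0.0], [2.0, 1.0]]
yt_pseudo = [0, 1]
print(balanced_distance(xs, ys, xt, yt_pseudo, mu=0.5))  # 1.5
```

A `mu` near 0 emphasizes the overall (marginal) shift between domains, while a `mu` near 1 emphasizes per-class shifts. In BDA-style methods, the feature-space transformation is chosen to minimize this quantity.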
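The second module's confidence-based self-learning can be sketched as a generic self-training loop. Unlabeled target samples whose predicted class is sufficiently confident receive pseudo-labels and join the training set, and the model is refit until no new sample qualifies. The abstract does not specify the classifier or the confidence definition, so this sketch makes three assumptions:

- the classifier is a nearest-centroid classifier;
- the confidence is a softmax over negative squared distances;
- the confidence threshold is fixed at 0.8.

All function names are hypothetical, and the sketch does not model how FSJT reduces the influence of dissimilar source samples.

```python
import math


def centroids(xs, ys):
    """Per-class mean vectors (nearest-centroid 'training')."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in groups.items()}


def predict_with_confidence(x, cents):
    """Return (label, confidence); confidence = softmax of negative squared distances."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in cents.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    label = max(exp, key=exp.get)
    return label, exp[label] / sum(exp.values())


def self_train(xl, yl, xu, threshold=0.8, max_iter=10):
    """Iteratively pseudo-label confident unlabeled samples and refit.

    Returns the enlarged labeled set and the samples left unlabeled.
    """
    xl, yl, pool = list(xl), list(yl), list(xu)
    for _ in range(max_iter):
        cents = centroids(xl, yl)
        accepted, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(x, cents)
            (accepted if conf >= threshold else remaining).append((x, label))
        if not accepted:  # nothing confident enough: stop early
            break
        for x, label in accepted:
            xl.append(x)
            yl.append(label)
        pool = [x for x, _ in remaining]
    return xl, yl, pool


xl, yl, pool = self_train([[0.0], [10.0]], [0, 1], [[1.0], [9.0], [5.0]])
print(yl, pool)  # [0, 1, 0, 1] [[5.0]]
```

In this example, the sample `[5.0]` is equidistant from both centroids and has confidence 0.5, so it stays unlabeled. This is the purpose of a confidence threshold: only pseudo-labels the current model is sure about are allowed to influence training.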

Similar content being viewed by others

Predicting Protein Localization Using a Domain Adaptation Approach

Active Sampling Based on MMD for Model Adaptation

An evaluation of approaches for using unlabeled data with domain adaptation (Article, 07 July 2016)

Data availability

Enquiries about data availability should be directed to the authors.

Funding

This work was jointly supported by the National Natural Science Foundation of China (52272354) and the Innovation Fund of the Wuhan Academy of Agricultural Sciences (XTCX202202; XKCX2022024).

Author information

Authors and Affiliations

School of Science, Wuhan University of Technology, Wuhan, 430070, People’s Republic of China
Jianghui Wen & Haoran Huang
Wuhan Academy of Agricultural Sciences, Wuhan, 430208, People’s Republic of China
Zhenyu Pu & Bing Deng

Contributions

Jianghui Wen was involved in conceptualization, methodology, software, formal analysis, and writing of the original draft. Haoran Huang was involved in methodology, software, formal analysis, and writing of the original draft. Zhenyu Pu was involved in formal analysis and supervision. Bing Deng was involved in conceptualization, formal analysis, and review and editing of the manuscript.

Corresponding author

Correspondence to Bing Deng.

Ethics declarations

Conflict of interest

The authors have declared that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wen, J., Huang, H., Pu, Z. et al. A novel feature and sample joint transfer learning method with feature selection in semi-supervised scenarios for identifying the sequence of some species with less known genetic data. Soft Comput 27, 5411–5423 (2023). https://doi.org/10.1007/s00500-022-07773-7

Accepted: 20 December 2022
Published: 11 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s00500-022-07773-7
