Abstract
Gene selection is a key issue in the analysis of microarray data, which typically combine small sample sizes with strongly correlated variables. The main objective of this paper is to select the most informative genes from thousands of strongly correlated genes. To this end, an efficient two-stage gene selection (TSGS) algorithm is proposed. In this algorithm, an L2-norm penalty is first introduced to achieve a grouping effect for highly correlated genes. To overcome the small-sample problem, the augmented data technique is then used to produce an augmented data set. Finally, the most informative genes are selected using the recently proposed two-stage algorithm. Simulation results confirm the effectiveness of the proposed approach in comparison with the popular Elastic Net method.
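The three ingredients named in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation and rests on several assumptions. First, it uses the unscaled augmentation X* = [X; sqrt(λ) I], y* = [y; 0], under which ordinary least squares on the augmented data equals ridge (L2-penalised) regression on the original data. The Elastic Net paper additionally rescales by (1 + λ)^(-1/2); whether TSGS does so is not stated here. Second, the names `augment`, `rss` and `two_stage_select` are hypothetical. Third, the two stages are approximated by greedy forward selection followed by swap-based refinement, with every candidate refitted by `np.linalg.lstsq` rather than through the fast recursive updates of the original two-stage algorithm. Because the augmented design has n + p rows and contains sqrt(λ) I, every subset fit is well-posed even when there are fewer samples than genes.

```python
import numpy as np


def augment(X, y, lam):
    """Stack sqrt(lam) * I under X and zeros under y (unscaled augmentation).

    Ordinary least squares on (X_aug, y_aug) minimises
    ||y - X b||^2 + lam * ||b||^2, i.e. ridge regression on (X, y).
    """
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug


def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit on the columns in `cols`."""
    if not cols:
        return float(y @ y)
    A = X[:, cols]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def two_stage_select(X, y, k, lam=1.0, tol=1e-12):
    """Hypothetical sketch: choose k columns (genes) of X.

    Stage 1: greedy forward selection on the augmented data.
    Stage 2: refinement that swaps a selected gene for an unselected one
    whenever the swap lowers the penalised RSS.
    """
    p = X.shape[1]
    if not 0 < k <= p:
        raise ValueError("k must satisfy 0 < k <= number of genes")
    Xa, ya = augment(X, y, lam)

    # Stage 1: add, one at a time, the gene giving the lowest penalised RSS.
    selected = []
    for _ in range(k):
        candidates = [j for j in range(p) if j not in selected]
        best = min(candidates, key=lambda j: rss(Xa, ya, selected + [j]))
        selected.append(best)

    # Stage 2: revisit each selected gene and replace it if some
    # unselected gene fits better together with the remaining ones.
    improved = True
    while improved:
        improved = False
        for i in range(k):
            candidates = [j for j in range(p) if j not in selected]
            if not candidates:  # k == p: nothing left to swap in
                return sorted(selected)
            rest = selected[:i] + selected[i + 1:]
            best = min(candidates, key=lambda j: rss(Xa, ya, rest + [j]))
            if rss(Xa, ya, rest + [best]) < rss(Xa, ya, selected) - tol:
                selected[i] = best
                improved = True
    return sorted(selected)
```

Refitting from scratch at every step is far more expensive than recursive updates, but it keeps the sketch short. Each accepted swap strictly lowers the penalised RSS, so the refinement loop terminates.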
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Du, D., Li, K., Deng, J. (2013). An Efficient Two-Stage Gene Selection Method for Microarray Data. In: Li, K., Li, S., Li, D., Niu, Q. (eds) Intelligent Computing for Sustainable Energy and Environment. ICSEE 2012. Communications in Computer and Information Science, vol 355. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37105-9_47
DOI: https://doi.org/10.1007/978-3-642-37105-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37104-2
Online ISBN: 978-3-642-37105-9
eBook Packages: Computer Science, Computer Science (R0)