Classification of RNA-seq Data

Statistical Analysis of Next Generation Sequencing Data

Abstract

Next-generation sequencing technologies have made it possible to obtain, at a relatively low cost, a detailed snapshot of the RNA transcripts present in a tissue sample. The resulting reads are usually binned by gene, exon, or other region of interest; thus the data typically amount to read counts for tens of thousands of features, on no more than dozens or hundreds of observations. It is often of interest to use these data to develop a classifier in order to assign an observation to one of several pre-defined classes. However, the high dimensionality of the data poses statistical challenges: because there are far more features than observations, many existing classification techniques cannot be directly applied. In recent years, a number of proposals have been made to extend existing classification approaches to the high-dimensional setting. In this chapter, we discuss the use of, and modifications to, logistic regression, linear discriminant analysis, principal components analysis, partial least squares, and the support vector machine in the high-dimensional setting. We illustrate these methods on two RNA-sequencing data sets.

indicates joint first authorship.

Notes

  1. The training set is the set of observations used to fit the classifier.

  2. We thank Liguo Wang for providing us the raw counts for the prostate cancer data set used in [49].

  3. In greater detail, for all methods except for Poisson LDA, we divided each observation by the scaling factors discussed in Sect. 11.7. In contrast, in applying Poisson LDA, observations were not divided by the scaling factor; instead, the scaling factor is directly incorporated into (11.14).

  4. Briefly, R-fold cross-validation involves splitting the observations in the training set into R sets. Then for r = 1, …, R, we build classifiers for a range of tuning parameters using all observations except those in the rth fold. We then calculate the error \(e_{r}\) of each of these classifiers on the observations in the rth fold. Finally, we calculate the cross-validation error as \(\frac{1}{R}\sum_{r=1}^{R}e_{r}\). The tuning parameter value corresponding to the minimum cross-validation error is selected.

  5. Proposals have been made for an \(\ell_{1}\)-penalized SVM that results in a sparse decision rule, but the standard SVM decision rule involves all of the features [100].
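The R-fold cross-validation procedure described in Note 4 can be sketched as follows. This is a minimal illustration and not the chapter's own code: the k-nearest-neighbours classifier, the function names `knn_predict`, `cv_error`, and `select_param`, and the assumption of 0/1 class labels are all hypothetical stand-ins chosen so that the example is self-contained. Any classifier with a tuning parameter could take the place of `knn_predict`.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k):
    """Hypothetical stand-in classifier: k-nearest neighbours (labels 0/1)."""
    # Squared Euclidean distances, shape (n_test, n_train).
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    # Majority vote among the k neighbours; ties (even k) go to class 0.
    return (votes.mean(axis=1) > 0.5).astype(int)


def cv_error(X, y, params, R=5, seed=0):
    """Return the cross-validation error for each tuning parameter value."""
    rng = np.random.default_rng(seed)
    # Split the training observations into R folds at random.
    folds = np.array_split(rng.permutation(len(y)), R)
    errors = np.zeros((R, len(params)))
    for r, test_idx in enumerate(folds):
        # Fit on all observations except those in the r-th fold.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != r])
        for j, k in enumerate(params):
            pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
            # e_r: misclassification rate on the held-out fold.
            errors[r, j] = np.mean(pred != y[test_idx])
    # Cross-validation error: (1/R) * sum over r of e_r.
    return errors.mean(axis=0)


def select_param(X, y, params, R=5, seed=0):
    """Pick the tuning parameter value with the minimum cross-validation error."""
    cv = cv_error(X, y, params, R=R, seed=seed)
    return params[int(np.argmin(cv))], cv
```

Calling `select_param(X, y, [1, 3, 5])` on a feature matrix `X` (observations in rows) and a 0/1 label vector `y` returns the selected value of k together with the vector of cross-validation errors. When several values tie for the minimum, `np.argmin` returns the first one listed.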

References

  1. Agresti, A.: Categorical Data Analysis. Wiley, New York (2002)

  2. Aguilera, A.M., Escabias, M., Valderrama, M.J.: Using principal components for estimating logistic regression with high-dimensional multicollinear data. Comput. Stat. Data Anal. 50(8), 1905–1924 (2006)

  3. Allen, D.M.: The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16(1), 125–127 (1974)

  4. Anders, S., Huber, W.: Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)

  5. Bair, E., Hastie, T., Paul, D., Tibshirani, R.: Prediction by supervised principal components. J. Am. Stat. Assoc. 101(473), 119–137 (2006)

  6. Bair, E., Tibshirani, R.: Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol. 2(4), e108 (2004)

  7. Barshan, E., Ghodsi, A., Azimifar, Z., Zolghadri Jahromi, M.: Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern Recogn. 44(7), 1357–1371 (2011)

  8. Bickel, P.J., Levina, E.: Some theory for Fisher’s linear discriminant function, naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10(6), 989–1010 (2004)

  9. Boulesteix, A.L.: PLS dimension reduction for classification with microarray data. Stat. Appl. Genet. Mol. Biol. 3(1), 1–33 (2004)

  10. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  11. Brown, M.P., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C.W., Furey, T.S., Ares, M., Haussler, D.: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. 97(1), 262–267 (2000)

  12. Bullard, J., Purdom, E., Hansen, K., Dudoit, S.: Evaluation of statistical methods for normalization and differential expression in mRNA-seq experiments. BMC Bioinform. 11, 94 (2010)

  13. Chun, H., Keleş, S.: Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. Roy. Stat. Soc. Ser. B (Stat. Meth.) 72(1), 3–25 (2010)

  14. Chung, D., Keles, S.: Sparse partial least squares classification for high dimensional data. Stat. Appl. Genet. Mol. Biol. 9(1), Article 17 (2010)

  15. Clemmensen, L., Hastie, T., Witten, D., Ersbøll, B.: Sparse discriminant analysis. Technometrics 53(4), 406–413 (2011)

  16. Collins, M., Dasgupta, S., Schapire, R.E.: A generalization of principal components analysis to the exponential family. In Advances in Neural Information Processing Systems, pp. 617–624 (2001)

  17. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  18. d’Aspremont, A., Bach, F., Ghaoui, L.E.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9, 1269–1294 (2008)

  19. d’Aspremont, A., El Ghaoui, L., Jordan, M.I., Lanckriet, G.R.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007)

  20. Datta, S., Pihur, V., Datta, S.: An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data. BMC Bioinformatics 11(1), 427 (2010)

  21. De Leeuw, J.: Principal component analysis of binary data by iterated singular value decomposition. Comput. Stat. Data Anal. 50(1), 21–39 (2006)

  22. Dietterich, T.G.: Ensemble methods in machine learning. In: Multiple Classifier Systems, pp. 1–15. Springer, Berlin (2000)

  23. Dillies, M.A., Rau, A., Aubert, J., Hennequet-Antier, C., Jeanmougin, M., Servant, N., Keime, C., Marot, G., Castel, D., Estelle, J., et al.: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief. Bioinform. 14(6), 671–683 (2013)

  24. Ding, B., Gentleman, R.: Classification using generalized partial least squares. J. Comput. Graph. Stat. 14(2), 280–298 (2005)

  25. Donoho, D.L., Johnstone, I.M.: Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 90(432), 1200–1224 (1995)

  26. Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 97(457), 77–87 (2002)

  27. Efron, B.: Estimating the error rate of a prediction rule: improvement on cross-validation. J. Am. Stat. Assoc. 78(382), 316–331 (1983)

  28. Fort, G., Lambert-Lacroix, S.: Classification using partial least squares with penalized logistic regression. Bioinformatics 21(7), 1104–1111 (2005)

  29. Frank, L.E., Friedman, J.H.: A statistical view of some chemometrics regression tools. Technometrics 35(2), 109–135 (1993)

  30. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Software 33(1), 1–22 (2010)

  31. Friedman, J.H.: Regularized discriminant analysis. J. Am. Stat. Assoc. 84(405), 165–175 (1989)

  32. Fu, X., Fu, N., Guo, S., Yan, Z., Xu, Y., Hu, H., Menzel, C., Chen, W., Li, Y., Zeng, R., et al.: Estimating accuracy of RNA-seq and microarrays with proteomics. BMC Genom. 10, 161 (2009)

  33. Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D.: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10), 906–914 (2000)

  34. Geisser, S.: The predictive sample reuse method with applications. J. Am. Stat. Assoc. 70(350), 320–328 (1975)

  35. Grosenick, L., Greer, S., Knutson, B.: Interpretable classifiers for FMRI improve prediction of purchases. IEEE Trans. Neural Syst. Rehabil. Eng. 16(6), 539–548 (2008)

  36. Guo, Y., Hastie, T., Tibshirani, R.: Regularized linear discriminant analysis and its application in microarrays. Biostatistics 8(1), 86–100 (2007)

  37. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1–3), 389–422 (2002)

  38. Haas, B.J., Zody, M.C., et al.: Advancing RNA-seq analysis. Nat. Biotech. 28(5), 421–423 (2010)

  39. Hastie, T., Buja, A., Tibshirani, R.: Penalized discriminant analysis. Ann. Stat. 23(1), 73–102 (1995)

  40. Hastie, T., Tibshirani, R.: Discriminant analysis by Gaussian mixtures. J. Roy. Stat. Soc. Ser. B (Methodological) 58(1), 155–176 (1996)

  41. Hastie, T., Tibshirani, R., Buja, A.: Flexible discriminant analysis by optimal scoring. J. Am. Stat. Assoc. 89, 1255–1270 (1994)

  42. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009)

  43. Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)

  44. Hsu, C.W., Lin, C.J.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)

  45. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning. Springer, New York (2013)

  46. Jolliffe, I.: Principal Component Analysis. Wiley, New York (2005)

  47. Jolliffe, I.T., Trendafilov, N.T., Uddin, M.: A modified principal component technique based on the LASSO. J. Comput. Graph. Stat. 12(3), 531–547 (2003)

  48. Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)

  49. Kannan, K., Wang, L., Wang, J., Ittmann, M.M., Li, W., Yen, L.: Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing. Proc. Natl. Acad. Sci. 108(22), 9172–9177 (2011)

  50. Lee, S., Huang, J.Z., Hu, J.: Sparse logistic principal components analysis for binary data. Ann. Appl. Stat. 4(3), 1579–1601 (2010)

  51. Lee, S.I., Lee, H., Abbeel, P., Ng, A.Y.: Efficient L1 regularized logistic regression. In: Proceedings of the National Conference on Artificial Intelligence, vol. 21, pp. 401–408. AAAI Press, Menlo Park (1999); MIT Press, Cambridge, London (2006)

  52. Lee, Y., Lin, Y., Wahba, G.: Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J. Am. Stat. Assoc. 99(465), 67–81 (2004)

  53. Leek, J.T., Scharpf, R.B., Bravo, H.C., Simcha, D., Langmead, B., Johnson, W.E., Geman, D., Baggerly, K., Irizarry, R.A.: Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11(10), 733–739 (2010)

  54. Leng, C.: Sparse optimal scoring for multiclass cancer diagnosis and biomarker detection using microarray data. Comput. Biol. Chem. 32(6), 417–425 (2008)

  55. Li, J., Witten, D.M., Johnstone, I.M., Tibshirani, R.: Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics 13(3), 523–538 (2012)

  56. Ma, Z.: Sparse principal component analysis and iterative thresholding. Ann. Stat. 41(2), 772–801 (2013)

  57. Mai, Q., Zou, H., Yuan, M.: A direct approach to sparse discriminant analysis in ultra-high dimensions. Biometrika 99(1), 29–42 (2012)

  58. Malone, J.H., Oliver, B.: Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol. 9, 34 (2011)

  59. Mardia, K.V., Kent, J.T., Bibby, J.M.: Multivariate Analysis. Academic, New York (1980)

  60. Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M., Gilad, Y.: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18(9), 1509–1517 (2008)

  61. Marx, B.D.: Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38(4), 374–381 (1996)

  62. Marx, B.D., Smith, E.P.: Principal component estimation for generalized linear regression. Biometrika 77(1), 23–31 (1990)

  63. McCarthy, D.J., Chen, Y., Smyth, G.K.: Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation. Nucleic Acids Res. 40(10), 4288–4297 (2012)

  64. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman and Hall, Boca Raton (1989)

  65. Meier, L., Van De Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. Roy. Stat. Soc. Ser. B (Stat. Meth.) 70(1), 53–71 (2008)

  66. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., Wold, B.: Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat. Meth. 5(7), 621–628 (2008)

  67. Nguyen, D.V., Rocke, D.M.: Multi-class cancer classification via partial least squares with gene expression profiles. Bioinformatics 18(9), 1216–1226 (2002)

  68. Nguyen, D.V., Rocke, D.M.: Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18(1), 39–50 (2002)

  69. Opitz, D., Maclin, R.: Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 11, 169–198 (1999)

  70. Oshlack, A., Wakefield, M.J.: Transcript length bias in RNA-seq data confounds systems biology. Biol. Direct 4(14) (2009)

  71. Ozsolak, F., Milos, P.M.: RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 12(2), 87–98 (2010)

  72. Park, M.Y., Hastie, T.: L1-regularization path algorithm for generalized linear models. J. Roy. Stat. Soc. Ser. B (Stat. Meth.) 69(4), 659–677 (2007)

  73. Park, P.J.: ChIP–seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669–680 (2009)

  74. Quackenbush, J.: Microarray data normalization and transformation. Nat. Genet. 32, 496–501 (2002)

  75. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2013). http://www.R-project.org/

  76. Robinson, M.D., McCarthy, D.J., Smyth, G.K.: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1), 139–140 (2010)

  77. Robinson, M.D., Oshlack, A.: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010)

  78. Schein, A.I., Saul, L.K., Ungar, L.H.: A generalized linear model for principal component analysis of binary data. In: Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics, pp. 14–21 (2003)

  79. Shao, J.: Linear model selection by cross-validation. J. Am. Stat. Assoc. 88(422), 486–494 (1993)

  80. Shen, H., Huang, J.Z.: Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99(6), 1015–1034 (2008)

  81. Shendure, J.: The beginning of the end for microarrays? Nat. Meth. 5(7), 585–587 (2008)

  82. Stone, M.: Cross-validatory choice and assessment of statistical predictions. J. Roy. Stat. Soc. Ser. B (Methodological) 36, 111–147 (1974)

  83. Tarazona, S., García-Alcalde, F., Dopazo, J., Ferrer, A., Conesa, A.: Differential expression in RNA-seq: a matter of depth. Genome Res. 21(12), 2213–2223 (2011)

  84. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B (Methodological) 58, 267–288 (1996)

  85. Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. 99(10), 6567–6572 (2002)

  86. Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Stat. Sci. 18(1), 104–117 (2003)

  87. Trendafilov, N.T., Jolliffe, I.T.: Projected gradient approach to the numerical solution of the SCoTLASS. Comput. Stat. Data Anal. 50(1), 242–253 (2006)

  88. Trendafilov, N.T., Jolliffe, I.T.: DALASS: variable selection in discriminant analysis via the LASSO. Comput. Stat. Data Anal. 51(8), 3718–3736 (2007)

  89. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (2000)

  90. Wang, Z., Gerstein, M., Snyder, M.: RNA-seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10(1), 57–63 (2009)

  91. Weston, J., Watkins, C.: Multi-class support vector machines. Technical report, Citeseer (1998)

  92. Witten, D., Tibshirani, R., Gu, S.G., Fire, A., Lui, W.O.: Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biol. 8(58) (2010)

  93. Witten, D.M.: Classification and clustering of sequencing data using a Poisson model. Ann. Appl. Stat. 5(4), 2493–2518 (2011)

  94. Witten, D.M., Tibshirani, R.: Penalized classification using Fisher’s linear discriminant. J. Roy. Stat. Soc. Ser. B (Stat. Meth.) 73(5), 753–772 (2011)

  95. Witten, D.M., Tibshirani, R., Hastie, T.: A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3), 515–534 (2009)

  96. Wold, H., et al.: Estimation of principal components and related models by iterative least squares. Multivariate Anal. 1, 391–420 (1966)

  97. Wold, S.: Cross-validatory estimation of the number of components in factor and principal components models. Technometrics 20(4), 397–405 (1978)

  98. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. Roy. Stat. Soc. Ser. B (Stat. Meth.) 68(1), 49–67 (2006)

  99. Zhu, J., Hastie, T.: Classification of gene microarrays by penalized logistic regression. Biostatistics 5(3), 427–443 (2004)

  100. Zhu, J., Rosset, S., Hastie, T., Tibshirani, R.: 1-norm support vector machines. Adv. Neural Inform. Process. Syst. 16(1), 49–56 (2004)

  101. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. Ser. B (Stat. Meth.) 67(2), 301–320 (2005)

  102. Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006)

Acknowledgements

D.W. received support for this work from NIH Grant DP5OD009145, NSF CAREER Award DMS-1252624, and a Sloan Foundation Research Fellowship.

Author information

Corresponding authors

Correspondence to Kean Ming Tan or Ashley Petersen.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Tan, K.M., Petersen, A., Witten, D. (2014). Classification of RNA-seq Data. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_11
