Abstract
Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analysing these data requires mapping sequence reads to a reference and counting the number of reads that map to each gene. However, for many transcriptome studies a suitable reference genome is unavailable, especially for meta-transcriptome studies, which assay gene expression from mixed populations of organisms. Where a reference is unavailable, one can be generated by de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and, more importantly, accurate assembly of such data is challenging, especially for meta-transcriptome data, and the resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional TopHat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome, for which no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method for studying differential gene expression in complex meta-transcriptomes without the requirement for a reference genome.
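The general idea of comparing k-mer abundance between two read sets without a reference can be illustrated with a small sketch. This is not the DiffKAP implementation: the function names, the use of a per-read median, the pseudocount and the fold-change threshold are all assumptions chosen for illustration, and a real analysis would use a dedicated k-mer counter and far larger k.

```python
from collections import Counter
from statistics import median


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def differential_reads(reads_a, reads_b, k=4, min_fold=2.0, pseudo=1.0):
    """Return {read: fold_change} for reads whose k-mers differ in abundance.

    Hypothetical scoring (an assumption, not the published method):
    each read is summarised by the median count of its k-mers in each
    dataset, counts are scaled by the total k-mers per dataset, and a
    pseudocount avoids division by zero.
    """
    counts_a, counts_b = count_kmers(reads_a, k), count_kmers(reads_b, k)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    flagged = {}
    for read in set(reads_a) | set(reads_b):
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:  # read shorter than k, nothing to compare
            continue
        med_a = median(counts_a[km] for km in kmers)  # Counter gives 0 if absent
        med_b = median(counts_b[km] for km in kmers)
        fold = ((med_a + pseudo) / total_a) / ((med_b + pseudo) / total_b)
        if fold >= min_fold or fold <= 1.0 / min_fold:
            flagged[read] = fold
    return flagged


# Toy data: one read enriched in A, one enriched in B, one unchanged.
reads_a = ["ACGTACGTAC"] * 8 + ["TTGCATTGCA"] * 2 + ["GGGGCCCCAA"] * 5
reads_b = ["ACGTACGTAC"] * 2 + ["TTGCATTGCA"] * 8 + ["GGGGCCCCAA"] * 5
result = differential_reads(reads_a, reads_b)
```

In this toy example the read present at equal depth in both datasets is not flagged, while the other two are reported with fold changes above and below 1 respectively.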
Availability of data and materials
All DiffKAP resources are available at: http://www.appliedbioinformatics.com.au/index.php/DiffKAP
Acknowledgements
Support from the Australian Genome Research Facility (AGRF), the Queensland Cyber Infrastructure Foundation (QCIF) and the Australian Partnership for Advanced Computing (APAC) is gratefully acknowledged.
Funding
The authors received financial support from the Australian Research Council (Projects LP0882095, LP0883462 and DP0985953).
Electronic supplementary material
ESM 1 (XLSX 5761 kb)
Cite this article
Chan, CK.K., Rosic, N., Lorenc, M.T. et al. A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference. Funct Integr Genomics 19, 363–371 (2019). https://doi.org/10.1007/s10142-018-0647-3
DOI: https://doi.org/10.1007/s10142-018-0647-3