Abstract
High-performance Next-Generation Sequencing (NGS) has become a widely used technology for characterizing RNA transcripts, such as mRNAs and small non-coding RNAs, in case-control comparison studies. The first step of the analysis is mapping NGS reads against a reference database, and a critical issue emerges in this phase: the problem of multireads, that is, reads that map to more than one location. In this paper we present a novel approach to represent and quantify read mapping ambiguities through the use of fuzzy sets and possibility theory. The aim of this work is to obtain a list of candidate differential expression events, together with a description of the uncertainty that multireads introduce into the results. In a preliminary experiment on HeLa cells, the method correctly detected the possibility of false positives. In a case-control study of human endobronchial biopsies, the method identified 11 genes with possibly different expression, four of them with an uncertain fold change. This last result was confirmed by an FDR-adjusted Fisher's test, whereas DESeq2 did not report significant differences between case and control.
Notes
1. \(A'=A-1\) if \(A>0\), otherwise \(A'=A=0\).
2. A standard application of the extension principle to the fold change yields a fuzzy set with a complex membership function, which is costly to compute and brings no real benefit.
3. Given two expressions \(e_1\) and \(e_2\) of a gene in two samples, the MA-plot places the gene on a plane \((M, A)\), where \(M=\log _2 ({e_1}/{e_2})\) (the log fold change) and \(A = (1/2)\log _2(e_1e_2)\) (the average intensity).
4. The centroid is computed with the constraint of falling inside the interval \([B, C]\).
5. The boundaries are estimated as hyperbolas, with their parameters fitted on the dataset; their horizontal asymptotes represent a limit fold change value below which differential expression loses significance.
6. The possibility measure between two fuzzy sets \(F_1\) and \(F_2\) is defined as \(\varPi (F_1, F_2)=\max _x \min \{F_1(x), F_2(x)\}\).
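The definitions in the notes on the MA-plot and on the possibility measure can be illustrated with a minimal sketch. It is not the authors' implementation: the trapezoidal membership shape, the function names (`ma_coordinates`, `trapezoid`, `possibility`) and the grid-based approximation of the maximum are assumptions made only for this example.

```python
import math


def ma_coordinates(e1, e2):
    """Return (M, A) for two strictly positive expression values.

    M = log2(e1 / e2) and A = (1/2) * log2(e1 * e2), as in the MA-plot note.
    """
    if e1 <= 0 or e2 <= 0:
        raise ValueError("expression values must be strictly positive")
    m = math.log2(e1 / e2)
    a = 0.5 * math.log2(e1 * e2)
    return m, a


def trapezoid(a, b, c, d):
    """Membership function of a trapezoidal fuzzy set (assumed shape).

    The support is [a, d] and the core is [b, c], with a <= b <= c <= d.
    """
    def mu(x):
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


def possibility(f1, f2, grid):
    """Approximate Pi(F1, F2) = max_x min{F1(x), F2(x)} over a finite grid."""
    return max(min(f1(x), f2(x)) for x in grid)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
m, a = ma_coordinates(8.0, 2.0)
grid = [i / 100 for i in range(0, 701)]
overlap = possibility(trapezoid(0, 1, 2, 3), trapezoid(2, 3, 4, 5), grid)
disjoint = possibility(trapezoid(0, 1, 2, 3), trapezoid(4, 5, 6, 7), grid)
```

In this sketch the two overlapping trapezoids cross at \(x=2.5\) with membership 0.5, so the approximated possibility is 0.5, while the disjoint pair gives 0. A finer grid, or an analytic intersection of the two membership functions, would be needed when the crossing point does not fall on a grid node.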
References
Faulkner, G.J., Forrest, A.R., Chalk, A.M., Schroder, K., Hayashizaki, Y., Carninci, P., Hume, D.A., Grimmond, S.M.: A rescue strategy for multimapping short sequence tags refines surveys of transcriptional activity by CAGE. Genomics 91(3), 281–288 (2008)
Jiang, H., Wong, W.H.: Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 25(8), 1026–1032 (2009)
Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A., Dewey, C.N.: RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26(4), 493–500 (2010)
Li, B., Dewey, C.N.: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 12(1), 323 (2011)
Love, M.I., Huber, W., Anders, S.: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol. 15(12), 550 (2014)
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., Pachter, L.: Differential gene and transcript expression analysis of RNA-Seq experiments with TopHat and Cufflinks. Nat. Protoc. 7(3), 562–578 (2012)
Glaus, P., Honkela, A., Rattray, M.: Identifying differentially expressed transcripts from RNA-Seq data with biological variation. Bioinformatics 28(13), 1721–1728 (2012)
Negoita, C., Zadeh, L.A., Zimmermann, H.J.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. 1, 3–28 (1978)
Pedrycz, W., Gomide, F.: An Introduction to Fuzzy Sets: Analysis and Design. MIT Press, Cambridge (1998)
Wilming, L.G., Gilbert, J.G.R., Howe, K., Trevanion, S., Hubbard, T., Harrow, J.L.: The vertebrate genome annotation (Vega) database. Nucleic Acids Res. 36(suppl 1), D753–D760 (2008)
Acknowledgments
We thank Dr. Flavio Licciulli, Dr. Mariano Caratozzolo and Dr. Flaviana Marzano for their suggestions and help with NGS data elaboration. A.C. is supported by Progetto MICROMAP PON01_02589.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Consiglio, A., Mencar, C., Grillo, G., Liuni, S. (2016). Managing NGS Differential Expression Uncertainty with Fuzzy Sets. In: Angelini, C., Rancoita, P., Rovetta, S. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2015. Lecture Notes in Computer Science, vol 9874. Springer, Cham. https://doi.org/10.1007/978-3-319-44332-4_4
DOI: https://doi.org/10.1007/978-3-319-44332-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44331-7
Online ISBN: 978-3-319-44332-4
eBook Packages: Computer Science, Computer Science (R0)