Abstract
Research based on single-cell low-coverage whole genome sequencing (SLWGS) has enabled better reproducibility and accuracy in the detection of copy number variations (CNVs). The whole genome amplification (WGA) method and the sequencing platform are critical factors for successful SLWGS (<0.1× coverage). In this study, we compared single-cell and multiple-cell sequencing data produced on the HiSeq2000 and Ion Proton platforms using two WGA kits, and then comprehensively evaluated GC-bias, reproducibility, uniformity and CNV detection among the different experimental combinations. Our analysis demonstrated that, on the HiSeq2000 platform, the PicoPLEX WGA Kit resulted in higher reproducibility and lower sequencing error frequency but more GC-bias than the GenomePlex Single Cell WGA Kit (WGA4 kit), independent of cell number. On the Ion Proton platform, the WGA4 kit (for both single cells and multiple cells) had higher uniformity and less GC-bias but lower reproducibility than the PicoPLEX WGA Kit. Moreover, on both sequencing platforms, the sensitivity and specificity of CNV detection differed between the two WGA kits depending on cell number. These results can help researchers who plan to use SLWGS on single or multiple cells to select appropriate experimental conditions for their applications.
Introduction
A strategy of single-cell low-coverage whole genome sequencing (SLWGS) is suited for the detection of chromosomal aberrations1. Typically, next-generation sequencing (NGS) requires nanogram amounts of DNA to construct a library for sequencing2, whereas a single cell only contains 6–7 pg of genomic DNA (gDNA). Therefore, a critical step for single-cell sequencing is whole-genome amplification (WGA) to generate sufficient DNA for library construction.
Three WGA methods are widely used for SLWGS, namely, degenerate-oligonucleotide-primed polymerase chain reaction (DOP-PCR) (marketed as WGA4 kit; Sigma-Aldrich, St. Louis, MO, US)2, multiple displacement amplification (MDA) (marketed as REPLI-g Single Cell Kit; QIAGEN, Germantown, MD, US)3, and a combination of displacement pre-amplification and PCR amplification (marketed as PicoPLEX WGA Kit; Rubicon Genomics, Ann Arbor, MI, US)4. Many comparisons have evaluated the efficiency among these WGA kits5, […]
[…] http://github.com/iontorrent/tmap) was employed to perform the alignment, and the results were output in BAM format. The mapping parameter (tmap mapall -v -Y -u -o 2 -a 0 -n 6 stage1 map4) and the alignment output mode parameter in the mapping methods were set as “map all” and “0”, respectively. “map all” indicates the multi-mapping procedure, whereas “0” indicates that only the unique best-hit reads are output. After removing duplicates based on POS, the unique non-duplication reads were used for further analysis.
GC-bias calculation
GC-content bias, as reported previously12, is the dependence of sequencing depth on the proportion of G and C bases in a specific genomic region. The bias leads to abnormal sequencing depth in a specific genomic region, which potentially influences the uniformity of read distribution. Moreover, NGS-based CNV detection methods fall into two primary categories: paired-end mapping (PEM) and depth of coverage (DOC)13,14. Most CNV detection tools are based on the DOC methods14. Because the depth of coverage depends on the GC content, GC-bias can reduce the accuracy of CNV detection. To describe the GC-bias in WGA, we followed the method of Rieber et al.15.
Let R1, R2, …, RW represent the unique non-duplication mapped reads that align to the W windows.
where M represents the average number of unique non-duplication mapped reads on each autosome window, Lw is obtained via a loess local regression fit of the unique non-duplication mapped reads against the G + C content, and ΔRGC is the quantitative value of GC-bias. Small values of ΔRGC indicate that the GC-bias is less serious. However, ΔRGC is a relative measure and can be influenced by WGA uniformity.
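The quantities defined above can be illustrated with a short sketch. It is not the published formula, which is not shown in this text: it assumes that ΔRGC is the mean absolute relative deviation of the fitted value Lw from M, and it replaces the loess fit with a simple per-GC-bin mean so that it runs with the standard library only. The function names and the `bin_width` parameter are hypothetical.

```python
from statistics import mean


def gc_bin(gc_fraction, bin_width=0.01):
    """Map a GC fraction (0..1) to an integer bin index."""
    return round(gc_fraction / bin_width)


def gc_bias_score(read_counts, gc_fractions, bin_width=0.01):
    """Assumed form of the GC-bias value: mean of |L_w - M| / M over windows.

    read_counts  -- unique non-duplication mapped reads per window (R_1..R_W)
    gc_fractions -- G + C fraction of each window
    The per-bin mean is a stand-in for the loess fit L_w (an assumption).
    """
    if not read_counts or len(read_counts) != len(gc_fractions):
        raise ValueError("need one GC fraction per window")
    m = mean(read_counts)  # M: average reads per window
    if m == 0:
        raise ValueError("no mapped reads")
    bins = {}
    for count, gc in zip(read_counts, gc_fractions):
        bins.setdefault(gc_bin(gc, bin_width), []).append(count)
    fitted = {key: mean(values) for key, values in bins.items()}
    return mean(abs(fitted[gc_bin(gc, bin_width)] - m) / m for gc in gc_fractions)


if __name__ == "__main__":
    counts = [10, 10, 30, 30]
    gc = [0.30, 0.30, 0.50, 0.50]
    print(gc_bias_score(counts, gc))
```

Under this assumed form, a sample whose read depth does not vary with GC content scores 0, and larger scores indicate a stronger dependence of depth on GC content.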
Data analyses
Window selection, GC-bias correction and copy number analysis were performed as previously reported12. In brief, the reference genome (GRCh37, UCSC release hg19) was divided into sliding SE50 simulated reads, which were mapped back to the original reference genome with a maximum of two mismatches. Each window contained 100 K simulated uniquely mapped reads, and consecutive windows were allowed to overlap by 20 K reads. The GC content of each window was calculated and used for the GC-bias correction. The normalized depth ratio (NDR), defined as the number of unique mapped non-duplication reads in each window divided by the average number of unique mapped non-duplication reads per window, was used to calculate the coverage and to evaluate reproducibility and uniformity. Additionally, we used the algorithm of Zhang et al.12 to detect CNVs. To remain as close to the characteristics of the human reference genome as possible, we used the optimized dynamic window size to call CNVs. After GC-bias correction and binary segmentation, we identified the CNV breakpoints. Sensitivity and specificity were calculated as follows:
where FNR is the false negative rate, equal to the number of false negative signals divided by the total number of true positive signals; FPR is the false positive rate, equal to the number of false positive signals divided by the total number of true negative signals; TNR is the true negative rate, equal to the number of true negative signals divided by the total number of true negative signals; and TPR is the true positive rate, equal to the number of true positive signals divided by the total number of true positive signals. Differences among groups were analysed by one-way ANOVA16. We also performed the Mann–Whitney–Wilcoxon test to assess the variation between two groups. Differences yielding P-values below or equal to 0.05 were considered significant. In the Results, numbers given before the ‘±’ symbol indicate the average value, and numbers given after the ‘±’ symbol indicate the standard deviation.
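The four rates can be computed from signal counts as in the sketch below. It assumes the conventional relations sensitivity = TPR = 1 − FNR and specificity = TNR = 1 − FPR, where "total true positive signals" is read as TP + FN and "total true negative signals" as TN + FP. The function name `cnv_detection_rates` is hypothetical.

```python
def cnv_detection_rates(tp, fn, tn, fp):
    """Return TPR, FNR, TNR, FPR, sensitivity and specificity from counts.

    tp, fn -- detected and missed signals among the true (expected) CNVs
    tn, fp -- correctly rejected and wrongly called signals among non-CNVs
    Assumes the conventional definitions (see lead-in); not taken verbatim
    from the original equations.
    """
    positives = tp + fn  # all signals that are truly positive
    negatives = tn + fp  # all signals that are truly negative
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative signal")
    tpr = tp / positives
    fnr = fn / positives
    tnr = tn / negatives
    fpr = fp / negatives
    return {
        "TPR": tpr,
        "FNR": fnr,
        "TNR": tnr,
        "FPR": fpr,
        "sensitivity": tpr,  # equals 1 - FNR
        "specificity": tnr,  # equals 1 - FPR
    }


if __name__ == "__main__":
    print(cnv_detection_rates(tp=9, fn=1, tn=95, fp=5))
```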
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Results
Comparison of amplification time and yield
The amplification yield of the two WGA kits was compared in a final volume of 75 μL of amplification product. The WGA4 kit yielded product at a concentration of 72.98 ± 17.81 ng/μL, whereas the PicoPLEX WGA Kit yielded product at a concentration of 37.56 ± 4.96 ng/μL. The yield did not differ between cell numbers with the same WGA kit, but a significant difference was detected between the two WGA kits. Additionally, the WGA procedure required approximately 4.5 h with the WGA4 kit and 2.5 h with the PicoPLEX WGA Kit. The PicoPLEX WGA Kit therefore required less time to obtain sufficient yield for library construction.
Data production
To reduce the effect of sequencing depth on the comparison of each combination, we randomly extracted 2 million clean reads from the total data of each sample (Supplementary Table II, HiSeq2000; Supplementary Table III, Proton). The extraction strategy and its rationale were described previously7. Table 1 shows the mean basic statistics for both platforms. On the HiSeq2000 platform, the mean unique mapping rate of the PicoPLEX WGA Kit (58.72%) was lower than that of the WGA4 kit (62.43%) (Supplementary Fig. S1). On the Proton platform, the average unique mapping rate was 91.23% for the WGA4 kit and 91.36% for the PicoPLEX WGA Kit (Supplementary Fig. S1). The mapping rate of the WGA4 kit was thus higher than that of the PicoPLEX WGA Kit on the HiSeq2000 platform.
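Random extraction of a fixed number of reads can be done in a single pass without knowing the total read count in advance. The sketch below uses reservoir sampling (Algorithm R) as one possible approach; the actual extraction strategy is the one described in ref. 7 and may differ. The function name and the `seed` parameter are hypothetical.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Uniformly sample k records from an iterable of unknown length.

    records -- any iterable, e.g. reads streamed from a FASTQ parser
    k       -- number of records to keep (2 million in the study)
    seed    -- optional seed for a reproducible draw
    """
    rng = random.Random(seed)
    sample = []
    for index, record in enumerate(records):
        if index < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (index + 1).
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                sample[slot] = record
    return sample


if __name__ == "__main__":
    reads = (f"read_{i}" for i in range(1000))
    subset = reservoir_sample(reads, k=10, seed=42)
    print(len(subset))
```

Fixing the number of reads per sample keeps depth-dependent statistics, such as per-window read counts, comparable across the experimental combinations.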
To gain further insight into data quality, we investigated the discordantly mapped reads derived from different libraries and sequencing processes. The mismatch rate, deletion rate and insertion rate are important parameters to consider when calling single-nucleotide variants (SNVs). Based on the alignment results and the Compact Idiosyncratic Gapped Alignment Report (CIGAR), we encoded matches and mismatches with an ‘M’, insertions with an ‘I’ and deletions with a ‘D’. Subsequently, we defined ErrorRate as the sum of the mismatch rate, deletion rate and insertion rate (Table 1). The results of variance analysis (Supplementary Fig. S2) suggested that the PicoPLEX WGA Kit had a lower ErrorRate (P < 0.01) than the WGA4 kit on the HiSeq2000 platform, independent of cell number. The results were reversed on the Ion Proton platform. Furthermore, with the same WGA kit, the ErrorRate of HiSeq2000 was lower than that of Ion Proton.
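A minimal sketch of how such rates could be derived from alignments is given below. Because ‘M’ covers both matches and mismatches, the sketch assumes that each alignment also carries the SAM NM tag (mismatches plus inserted and deleted bases), from which the mismatch count is recovered. The denominator (total ‘M’ bases) and all function names are assumptions, not details taken from the study.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_counts(cigar):
    """Sum the bases covered by M, I and D operations in a CIGAR string."""
    counts = {"M": 0, "I": 0, "D": 0}
    for length, op in CIGAR_OP.findall(cigar):
        if op in ("=", "X"):
            op = "M"  # extended CIGAR: treat match/mismatch as 'M'
        if op in counts:
            counts[op] += int(length)
    return counts


def error_rates(alignments):
    """Compute mismatch, insertion, deletion rates and their sum.

    alignments -- iterable of (cigar, nm) pairs, nm being the NM tag value.
    Rates are per aligned 'M' base, which is an assumed denominator.
    """
    aligned = mismatches = inserted = deleted = 0
    for cigar, nm in alignments:
        c = cigar_counts(cigar)
        aligned += c["M"]
        inserted += c["I"]
        deleted += c["D"]
        # NM counts mismatches plus inserted and deleted bases.
        mismatches += max(nm - c["I"] - c["D"], 0)
    if aligned == 0:
        raise ValueError("no aligned bases")
    rates = {
        "mismatch": mismatches / aligned,
        "insertion": inserted / aligned,
        "deletion": deleted / aligned,
    }
    rates["ErrorRate"] = sum(rates.values())
    return rates


if __name__ == "__main__":
    print(error_rates([("50M", 1), ("20M2I28M", 3), ("30M1D20M", 1)]))
```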
However, we could not determine whether the mapping rate of Ion Proton was truly higher than that of HiSeq2000, or whether the differences in mismatch rate, insertion rate and deletion rate between the platforms were significant, because the two sequencing platforms were not directly comparable owing to the different alignment methods used and the different sequencing principles17.
GC-bias of four combinations
Generally, GC-bias is considered an important factor that complicates data analysis. The plot of the NDR at various genomic regions versus the GC content showed that the average GC content was 39.70% on HiSeq2000 and 41.86% on Ion Proton using the WGA4 kit, which were values very close to those of the reference genome (41.9%). By contrast, the average GC content was 44.10% on HiSeq2000 and 45.22% on Ion Proton with the PicoPLEX WGA Kit (Fig. 2). These results demonstrated the amplification preference of the PicoPLEX WGA Kit on GC-rich regions.
Commonly, ΔRGC is used to quantify GC-bias, and a small ΔRGC value indicates reduced GC-bias. We analysed the ΔRGC value for the four combinations on the two platforms (Fig. 3). On the HiSeq2000 platform, the ΔRGC values of PicoPLEX WGA Kit amplified data were 0.25 ± 0.08 and 0.29 ± 0.05 for single cell and multiple cells, respectively, whereas those of WGA4 kit amplified data were 0.08 ± 0.04 and 0.14 ± 0.03 for single cell and multiple cells, respectively. SS had significantly less GC-bias than RS (P < 0.05), and SM had less GC-bias than RM (P < 0.05). Thus, data generated with the WGA4 kit had less GC-bias than data generated with the PicoPLEX WGA Kit on the HiSeq2000 platform. On the Ion Proton platform, the ΔRGC values of PicoPLEX WGA Kit amplified data were 0.13 ± 0.04 for RM and 0.15 ± 0.08 for RS, and those of WGA4 kit amplified data were 0.04 ± 0.01 for SM and 0.03 ± 0.01 for SS. To summarize, data generated with the WGA4 kit had less GC-bias than data generated with the PicoPLEX WGA Kit for both single cell (P < 0.05) and multiple cells (P < 0.05).
Based on this discovery, a weighted correction strategy could be used to remove the GC-bias (Fig. 4), which was reported to correct more than 99.9% of the GC-bias12.
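One common way to implement a weighted GC correction is to rescale each window by the ratio of the genome-wide mean depth to the mean depth of windows with similar GC content. The sketch below shows this bin-weighting scheme as an illustration only; it is an assumption and not necessarily the weighting used in ref. 12. The function names and the `bin_width` parameter are hypothetical.

```python
from statistics import mean


def gc_bin(gc_fraction, bin_width=0.01):
    """Map a GC fraction (0..1) to an integer bin index."""
    return round(gc_fraction / bin_width)


def gc_corrected_counts(read_counts, gc_fractions, bin_width=0.01):
    """Rescale per-window read counts so that every GC bin has mean M.

    Each count is multiplied by M / (mean count of its GC bin), where M is
    the mean count over all windows. Bins with zero mean stay at zero.
    """
    if not read_counts or len(read_counts) != len(gc_fractions):
        raise ValueError("need one GC fraction per window")
    m = mean(read_counts)
    bins = {}
    for count, gc in zip(read_counts, gc_fractions):
        bins.setdefault(gc_bin(gc, bin_width), []).append(count)
    bin_means = {key: mean(values) for key, values in bins.items()}
    corrected = []
    for count, gc in zip(read_counts, gc_fractions):
        bin_mean = bin_means[gc_bin(gc, bin_width)]
        corrected.append(count * m / bin_mean if bin_mean > 0 else 0.0)
    return corrected


if __name__ == "__main__":
    print(gc_corrected_counts([10, 10, 30, 30], [0.30, 0.30, 0.50, 0.50]))
```

After this rescaling, the mean depth no longer differs between GC bins, while differences between windows within the same bin (for example, those caused by a CNV) are preserved.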
Reproducibility Evaluation
Reproducibility is the ability to reproduce experimental results, either by the sample type or experimental combination, and is particularly important when the amount of DNA is typically at a picogram level. In this study, we used Pearson’s correlation coefficient of the NDR on a selected window along the autosome to quantify the reproducibility between two representative combinations. The correlation value matrix was calculated between any two cell lines among the 11 cell lines.
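The reproducibility measure can be sketched as follows: per-window read counts are converted to NDR values, and Pearson's correlation coefficient is then computed for every pair of samples. The function names and the sample labels in the usage example are hypothetical.

```python
from itertools import combinations
from math import sqrt


def normalized_depth_ratio(window_counts):
    """NDR: reads in each window divided by the average reads per window."""
    average = sum(window_counts) / len(window_counts)
    if average == 0:
        raise ValueError("no mapped reads")
    return [count / average for count in window_counts]


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return covariance / sqrt(var_x * var_y)


def pairwise_correlations(samples):
    """Correlate the NDR profiles of every pair of samples.

    samples -- dict mapping a sample name to its per-window read counts
    Returns a dict keyed by (name_a, name_b).
    """
    ndr = {name: normalized_depth_ratio(counts) for name, counts in samples.items()}
    return {
        (a, b): pearson(ndr[a], ndr[b])
        for a, b in combinations(ndr, 2)
    }


if __name__ == "__main__":
    demo = {"cell_1": [10, 20, 30, 25], "cell_2": [12, 18, 33, 24]}
    print(pairwise_correlations(demo))
```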
On the HiSeq2000 platform, the correlation values of PicoPLEX WGA Kit amplification data were 0.62 ± 0.18 and 0.79 ± 0.03 for single cell and multiple cells, respectively; whereas the values were 0.28 ± 0.08 and 0.57 ± 0.06 for single cell and multiple cells, respectively, when using the WGA4 kit. RS had significantly better reproducibility than that of SS (P < 0.05), and RM also had better reproducibility than that of SM (P < 0.05).
On the Proton platform, the correlation values of PicoPLEX WGA Kit amplification data were 0.76 ± 0.15 and 0.91 ± 0.02 for single cell and multiple cells, respectively, whereas the values were 0.69 ± 0.08 and 0.86 ± 0.03 for single cell and multiple cells, respectively, when using the WGA4 kit (Fig. 5). RS had significantly better reproducibility than SS (P < 0.05), and RM had significantly better reproducibility than SM (P < 0.05). These results demonstrated that the PicoPLEX WGA Kit outperformed the WGA4 kit in reproducibility for the corresponding cell number on both the HiSeq2000 and Ion Proton platforms.
Genome coverage uniformity
Coverage depth has been widely employed in different CNV calling algorithms, and the uniformity of the WGA product is important for coverage depth and CNV detection. Therefore, we characterized uniformity by comparing the read distributions of the extracted data mentioned above. We simulated the theoretical sequencing depth distribution, which followed the Poisson distribution (124,011 dots, λ = 30), and normalized it by dividing by 30. Previously, we found that the distribution of data from the WGA4 kit was close to the theoretical one on the two sequencing platforms, whereas bias was observed in the data from the PicoPLEX WGA Kit (Fig. 6). The CV value effectively described the relative variance of chromosomal depth, uniformity, and overall GC-bias in previous studies21.
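The theoretical reference distribution can be reproduced with a short simulation. The sketch below draws Poisson(λ = 30) depths using Knuth's multiplication method (chosen only to avoid external dependencies), normalizes them by λ, and reports the coefficient of variation, which for an ideal Poisson depth is 1/√30 ≈ 0.18. The function names and the `seed` parameter are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def simulate_normalized_depth(n_windows=124_011, lam=30, seed=0):
    """Simulate per-window depth and normalize it by dividing by lam."""
    rng = random.Random(seed)
    return [poisson_knuth(lam, rng) / lam for _ in range(n_windows)]


def coefficient_of_variation(values):
    """CV: population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


if __name__ == "__main__":
    depths = simulate_normalized_depth()
    print(round(mean(depths), 3), round(coefficient_of_variation(depths), 3))
```

An observed NDR distribution whose CV is clearly above this theoretical value indicates non-uniform amplification beyond what random sampling of reads alone would produce.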
In this study, we did not consider sample processing time, reagents consumption, labour costs or sample size. Those parameters might have an important role in technology selection, particularly in the scenario of clinical use. However, rapid advances in sequencing technology are likely to change those parameters in the future. Researchers within the expanding field of single cell research can obtain various experimental parameters from the cell lines before managing a multitude of clinical samples from large trials. In pre-implantation genetic screening (PGS) research, those advantages become more obvious because PGS involves a screening process before implantation for one or more nuclei from oocytes [a polar body or bodies (PBs)] or embryos (blastomere or trophectoderm cells) to detect the chromosomal CNVs23, and therefore, SLWGS for identifying CNVs has become common practice in PGS24.
References
Ning, L. et al. Quantitative assessment of single-cell whole genome amplification methods for detecting copy number variation using hippocampal neurons. Sci Rep 5, 11415, https://doi.org/10.1038/srep11415 (2015).
Sanchez-Cespedes, M., Cairns, P., Jen, J. & Sidransky, D. Degenerate oligonucleotide-primed PCR (DOP-PCR): evaluation of its reliability for screening of genetic alterations in neoplasia. Biotechniques 25, 1036–1038 (1998).
Paez, J. G. et al. Genome coverage and sequence fidelity of phi29 polymerase-based multiple strand displacement whole genome amplification. Nucleic Acids Res 32, e71, https://doi.org/10.1093/nar/gnh069 (2004).
Lu, S. et al. Probing meiotic recombination and aneuploidy of single sperm cells by whole-genome sequencing. Science 338, 1627–1630, https://doi.org/10.1126/science.1229112 (2012).
Hou, Y. et al. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing. Gigascience 4, 37, https://doi.org/10.1186/s13742-015-0068-3 (2015).
Huang, L., Ma, F., Chapman, A., Lu, S. & Xie, X. S. Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications. Annu Rev Genomics Hum Genet 16, 79–102, https://doi.org/10.1146/annurev-genom-090413-025352 (2015).
Chen, S. et al. Performance comparison between rapid sequencing platforms for ultra-low coverage sequencing strategy. PLoS One 9, e92192, https://doi.org/10.1371/journal.pone.0092192 (2014).
Merriman, B., Ion Torrent R&D Team & Rothberg, J. M. Progress in ion torrent semiconductor chip based sequencing. Electrophoresis 33, 3397–3417, https://doi.org/10.1002/elps.201200424 (2012).
Tang, Z. et al. A dynamic database of microarray-characterized cell lines with various cytogenetic and genomic backgrounds. G3 (Bethesda) 3, 1143–1149, https://doi.org/10.1534/g3.113.006577 (2013).
Aird, D. et al. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol 12, R18, https://doi.org/10.1186/gb-2011-12-2-r18 (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
Zhang, C. et al. A single cell level based method for copy number variation analysis by low coverage massively parallel sequencing. PLoS One 8, e54236, https://doi.org/10.1371/journal.pone.0054236 (2013).
Hormozdiari, F., Alkan, C., Eichler, E. E. & Sahinalp, S. C. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res 19, 1270–1278, https://doi.org/10.1101/gr.088633.108 (2009).
Duan, J., Zhang, J. G., Deng, H. W. & Wang, Y. P. Comparative studies of copy number variation detection methods for next-generation sequencing technologies. PLoS One 8, e59128, https://doi.org/10.1371/journal.pone.0059128 (2013).
Rieber, N. et al. Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies. PLoS One 8, e66621, https://doi.org/10.1371/journal.pone.0066621 (2013).
Bewick, V., Cheek, L. & Ball, J. Statistics review 9: one-way analysis of variance. Crit Care 8, 130–136, https://doi.org/10.1186/cc2836 (2004).
Liu, L. et al. Comparison of next-generation sequencing systems. J Biomed Biotechnol 2012, 251364, https://doi.org/10.1155/2012/251364 (2012).
Zhang, X. et al. The comparison of the performance of four whole genome amplification kits on ion proton platform in copy number variation detection. Biosci Rep 37, https://doi.org/10.1042/BSR20170252 (2017).
Ning, L. et al. Current challenges in the bioinformatics of single cell genomics. Front Oncol 4, 7, https://doi.org/10.3389/fonc.2014.00007 (2014).
Scheinin, I. et al. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res 24, 2022–2032, https://doi.org/10.1101/gr.175141.114 (2014).
Benjamini, Y. & Speed, T. P. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 40, e72, https://doi.org/10.1093/nar/gks001 (2012).
Wang, W., Wei, Z., Lam, T. W. & Wang, J. Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions. Sci Rep 1, 55, https://doi.org/10.1038/srep00055 (2011).
Fragouli, E. et al. The cytogenetics of polar bodies: insights into female meiosis and the diagnosis of aneuploidy. Mol Hum Reprod 17, 286–295, https://doi.org/10.1093/molehr/gar024 (2011).
Wells, D. et al. Clinical utilisation of a rapid low-pass whole genome sequencing technique for the diagnosis of aneuploidy in human embryos prior to implantation. J Med Genet 51, 553–562, https://doi.org/10.1136/jmedgenet-2014-102497 (2014).
Acknowledgements
The authors thank Xiuqing Zhang, Zhe Song, Hongtai Liu, Yifan Xie, Xiongbin Kang and Jiaming Xu for their valuable suggestions and critical discussion. This study was funded by the Guangdong Province science and technology project of China (NO. 2013B022000005) and supported by the Shenzhen Municipal Government of China (NO. CXZZ20130517144604091, NO. ZDSYS20140509153457495), Key Laboratory of Cooperation Project in Guangdong Province (2011A060906007), Shenzhen Engineering Laboratory for Birth Defects Screening (NO. DRC-SZ [2011]861) and the Shenzhen Development and Reform Commission (NO. CXZZ20140808170655268). Moreover, I thank my family, particularly my dear grandmother and sister, for all they did and are doing.
Contributions
Y.Q., H.F.Z. and D.Y.C. contributed equally to this study, with Y.Q. responsible for data acquisition and writing the manuscript, H.F.Z. responsible for the practical part and D.Y.C. responsible for the bio-informatics part of the project. P.Z. and G.D.H. participated in data analysis; whereas D.Y.C., Y.Q., L.X., Z.Z. and Z.H.F. developed the methodology for cell isolation and manipulation. Study design and conception came from D.Y.C., P.L., Y.Q. and F.C., and J. X., Q.Y.S., Z.Z., Y.G., J.W. and H.M.Y. provided critical revisions. All authors gave final approval.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, D., Zhen, H., Qiu, Y. et al. Comparison of single cell sequencing data between two whole genome amplification methods on two sequencing platforms. Sci Rep 8, 4963 (2018). https://doi.org/10.1038/s41598-018-23325-2