Data availability
The workflows, their descriptions and instructions on how to use them are available at https://galaxyproject.org/projects/vgp/workflows/. The requisite tools are installed on usegalaxy.org and usegalaxy.eu and are in the process of being installed on usegalaxy.org.au. These genomes were supported by collaborators of the VGP and ERGA. The QC analyses reported here, which were performed to test the VGP Galaxy pipeline, do not constitute a release of genomes that are under specific embargo policies for genome-wide analyses (e.g., https://genome10k.ucsc.edu/data-use-policies/). New genome assemblies are available in the GenomeArk repository: https://www.genomeark.org/. After manual curation, the assemblies are submitted to the US National Center for Biotechnology Information (NCBI) under the BioProject Vertebrate Genome Project: https://www.ncbi.nlm.nih.gov/bioproject/48924317.
References
Hotaling, S., Kelley, J. L. & Frandsen, P. B. Proc. Natl Acad. Sci. USA 118, e2109019118 (2021).
Formenti, G. et al. Trends Ecol. Evol. 37, 197–202 (2022).
Theissinger, K. et al. Trends Genet. 39, 545–559 (2023).
Lewin, H. A. et al. Proc. Natl Acad. Sci. USA 119, e2115635118 (2022).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Genome Biol. 21, 245 (2020).
Galaxy Community. Nucleic Acids Res. 50, W345–W351 (2022).
Lander, E. S. & Waterman, M. S. Genomics 2, 231–239 (1988).
Bray, S. & Maier, W. Automating Galaxy workflows using the command line. Galaxy Training Network (2023).
Galaxy Community. Galaxy Server administration. Galaxy Training Network (2019); https://github.com/galaxyproject/training-material
Formenti, G. et al. Genome Biol. 22, 120 (2021).
Uliano-Silva, M. et al. BMC Bioinform. 24, 288 (2023).
Wenger, A. M. et al. Nat. Biotechnol. 37, 1155–1162 (2019).
Batut, B. et al. Cell Syst. 6, 752–758.e1 (2018).
Lariviere, D., Ostrovsky, A., Gallardo, C., Pickett, B. & Abueg, L. VGP assembly pipeline - short version. Galaxy Training Network (2023); https://gxy.io/GTN:T00040
Rautiainen, M. et al. Nat. Biotechnol. 41, 1474–1482 (2023).
Cheng, H., Asri, M., Lucas, J., Koren, S. & Li, H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Preprint at ar…
Acknowledgements
… the pipeline tutorials and Andrea Guarracino for their useful comments on the manuscript. This work was supported in part by the Intramural Research Program of the US National Human Genome Research Institute (NHGRI), the US National Institutes of Health (NIH) and the Howard Hughes Medical Institute (HHMI). The authors are grateful to the broader Galaxy community for their support and software development efforts. This work was funded by NIH grants U41 HG006620, U24 HG010263, U24 CA231877 and U01 CA253481, along with US National Science Foundation grants 1661497, 1758800 and 2216612. The work was also supported in part by the Human Frontier Science Program (HFSP) RGP0025/2021, the Swiss National Science Foundation (SNSF) grants 202669 and 198691, the Swiss State Secretariat for Education, Research and Innovation (SERI) grant 22.00173 and Horizon Europe under the Biodiversity, Circular Economy and Environment program (REA.B.3, BGE 101059492). Usegalaxy.eu is supported by the German Federal Ministry of Education and Research grants 031L0101C and de.NBI-epi to B.G. Computational resources are provided by the Advanced Cyberinfrastructure Coordination Ecosystem (ACCESS-CI), the Texas Advanced Computing Center and the JetStream2 scientific cloud.
Author information
Contributions
D.L. built the assembly pipeline with support from G.F., L.A., C.G.-A., B.G., A.O., H.C., M.U.-S., B.D.P., A.R., M.v.d.B. and the VGP assembly working group. L.A., A.D., G.R.G., A.M.G., G.M.G., N.J., C.J., B.O., S.S., M.S. and T.T. generated one or several assemblies used in the analyses. B.J.K., K.R. and M.J.P.C. validated the zebra finch assemblies. J.C. performed the manual curation on the zebra finch assembly. L.A. assembled and evaluated the mitochondrial genomes. N.B. established the decontamination pipeline and performed the contamination analyses. N.B. and M.P.-F. compared the scaffolding strategies. A.N. performed the analyses on XBP1. C.G.-A. and B.D.P. developed the training material with support from the user community. K.H. and M.C. sourced and arranged for sample procurement for species in this study. J.R.B., N.J., T.T., B.O’T., O.F., C.L., H.K., T.M.-B. and R.M.W. generated the PacBio and Hi-C data. G.F., M.C.S., A.N., A.M.P. and E.D.J. conceived the study and drafted the manuscript. All authors, including A.A. and R.W.W., contributed to writing and editing the manuscript and approved it.
Ethics declarations
Competing interests
The authors declare no competing interests.
Supplementary information
Supplementary Information: Supplementary Notes and Supplementary Figs. 1–14.
Supplementary Table: Supplementary Tables 1–10.
About this article
Cite this article
Larivière, D., Abueg, L., Brajuka, N. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nat Biotechnol 42, 367–370 (2024). https://doi.org/10.1038/s41587-023-02100-3
DOI: https://doi.org/10.1038/s41587-023-02100-3
Springer Nature America, Inc.