Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy

  • Correspondence

From Nature Biotechnology

Fig. 1: VGP–Galaxy assembly pipeline (version 2.1) consists of 10 workflows that can be combined into 8 analysis trajectories depending on the combination of input data.
Fig. 2: Phylogenetic tree and assembly statistics of genomes assembled using the VGP–Galaxy assembly pipeline.

Data availability

The workflows, their descriptions and instructions on how to use them can be found at https://galaxyproject.org/projects/vgp/workflows/. The requisite tools are installed on usegalaxy.org and usegalaxy.eu, and are in the process of being installed on usegalaxy.org.au. These genomes were supported by collaborators of the VGP and ERGA; the QC analyses reported here to test the VGP–Galaxy pipeline do not release genomes that are under specific embargo policies for genome-wide analyses (e.g., https://genome10k.ucsc.edu/data-use-policies/). New genome assemblies are available in the GenomeArk repository: https://www.genomeark.org/. After manual curation, the assemblies are submitted to the US National Center for Biotechnology Information (NCBI) under the BioProject Vertebrate Genome Project: https://www.ncbi.nlm.nih.gov/bioproject/489243 (ref. 17).

References

  1. Hotaling, S., Kelley, J. L. & Frandsen, P. B. Proc. Natl Acad. Sci. USA 118, e2109019118 (2021).

  2. Formenti, G. et al. Trends Ecol. Evol. 37, 197–202 (2022).

  3. Theissinger, K. et al. Trends Genet. 39, 545–559 (2023).

  4. Lewin, H. A. et al. Proc. Natl Acad. Sci. USA 119, e2115635118 (2022).

  5. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Genome Biol. 21, 245 (2020).

  6. Galaxy Community. Nucleic Acids Res. 50, W345–W351 (2022).

  7. Lander, E. S. & Waterman, M. S. Genomics 2, 231–239 (1988).

  8. Bray, S. & Maier, W. Automating Galaxy workflows using the command line. Galaxy Training Network (2023).

  9. Galaxy Community. Galaxy Server administration. Galaxy Training Network https://github.com/galaxyproject/training-material (2019).

  10. Formenti, G. et al. Genome Biol. 22, 120 (2021).

  11. Uliano-Silva, M. et al. BMC Bioinform. 24, 288 (2023).

  12. Wenger, A. M. et al. Nat. Biotechnol. 37, 1155–1162 (2019).

  13. Batut, B. et al. Cell Syst. 6, 752–758.e1 (2018).

  14. Lariviere, D., Ostrovsky, A., Gallardo, C., Pickett, B. & Abueg, L. VGP assembly pipeline - short version. Galaxy Training Network (2023); https://gxy.io/GTN:T00040

  15. Rautiainen, M. et al. Nat. Biotechnol. 41, 1474–1482 (2023).

  16. Cheng, H., Asri, M., Lucas, J., Koren, S. & Li, H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Preprint at ar[…]

Acknowledgements

[…] the pipeline tutorials and Andrea Guarracino for their useful comments on the manuscript. This work was supported in part by the Intramural Research Program of the US National Human Genome Research Institute (NHGRI), the US National Institutes of Health (NIH) and the Howard Hughes Medical Institute (HHMI). The authors are grateful to the broader Galaxy community for their support and software development efforts. This work is funded by NIH grants U41 HG006620, U24 HG010263, U24 CA231877 and U01CA253481, along with US National Science Foundation grants 1661497, 1758800 and 2216612. The work was also supported in part by The Human Frontier Science Program (HFSP) RGP0025/2021, the Swiss National Science Foundation (SNSF) grants 202669 and 198691, the Swiss State Secretariat for Education, Research and Innovation (SERI) grant 22.00173 and Horizon Europe under the Biodiversity, Circular Economy and Environment program (REA.B.3, BGE 101059492). Usegalaxy.eu is supported by German Federal Ministry of Education and Research grants 031L0101C and de.NBI-epi to B.G. Computational resources are provided by the Advanced Cyberinfrastructure Coordination Ecosystem (ACCESS-CI), Texas Advanced Computing Center, and the JetStream2 scientific cloud.

Author information

Contributions

D.L. built the assembly pipeline with support from G.F., L.A., C.G.-A., B.G., A.O., H.C., M.U.-S., B.D.P., A.R., M.v.d.B. and the VGP assembly working group. L.A., A.D., G.R.G., A.M.G., G.M.G., N.J., C.J., B.O., S.S., M.S. and T.T. generated one or several assemblies used in the analyses. B.J.K., K.R. and M.J.P.C. validated the zebra finch assemblies. J.C. performed the manual curation on the zebra finch assembly. L.A. assembled and evaluated the mitochondrial genomes. N.B. established the decontamination pipeline and performed the contamination analyses. N.B. and M.P.-F. compared the scaffolding strategies. A.N. performed the analyses on XBP1. C.G.-A. and B.D.P. developed the training material with support from the user community. K.H. and M.C. sourced and arranged for sample procurement for species in this study. J.R.B., N.J., T.T., B.O’T., O.F., C.L., H.K., T.M.-B. and R.M.W. generated the PacBio and Hi-C data. G.F., M.C.S., A.N., A.M.P. and E.D.J. conceived the study and drafted the manuscript. All authors, including A.A. and R.W.W., contributed to writing and editing the manuscript and approved it.

Corresponding authors

Correspondence to Erich D. Jarvis, Michael C. Schatz, Anton Nekrutenko or Giulio Formenti.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Information

Supplementary Notes and Supplementary Figs. 1–14

Supplementary Table

Supplementary Tables 1–10

Cite this article

Larivière, D., Abueg, L., Brajuka, N. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nat Biotechnol 42, 367–370 (2024). https://doi.org/10.1038/s41587-023-02100-3

  • DOI: https://doi.org/10.1038/s41587-023-02100-3

  • Springer Nature America, Inc.