Abstract
The Carbohydrate Structure Database (CSDB) is a curated glycan data collection and a glycoinformatic platform. This report reviews the database, analytical, and other components that have appeared in recent years. The major improvements are close-to-full coverage of glycans from microorganisms; new modules for glycosyltransferases and saccharide conformations; an online glycan builder and 3D modeler; an NMR simulator; an NMR-based structure predictor; and other tools.
Abbreviations
- API: Application programming interface
- CASPER: Computer-assisted spectrum evaluation of regular polysaccharides
- CAZY: Carbohydrate-active enzyme
- COSY: Correlation spectroscopy
- CSDB: Carbohydrate Structure Database
- GLC: Gas-liquid chromatography
- GT: Glycosyltransferase
- GODDESS: Glycan-optimized database-driven empirical spectrum simulation
- GRASS: Generation, ranking, and assignment of saccharide structures
- HMBC: Heteronuclear multiple bond correlation
- HOSE: Hierarchical organization of spherical environment
- HSQC: Heteronuclear single-quantum coherence
- ICD: International Classification of Diseases
- IEDB: Immune Epitope Database
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- MSDB: Monosaccharide Database
- MESH: Medical subject headings
- NLM: National Library of Medicine
- PMID: PubMed identifier
- RDF: Resource description framework
- ROESY: Rotating frame Overhauser effect spectroscopy
- SMILES: Simplified molecular-input line-entry system
- SNFG: Symbolic nomenclature for glycans
- TOCSY: Total correlation spectroscopy
- PDB: Protein Data Bank
- WURCS: Web3 unique representation of carbohydrate structures
Acknowledgements
The author thanks all members of the CSDB team involved in the CSDB product life cycle. The participants are listed at http://csdb.glycoscience.ru/help/credits.html.
Funding
The author declares no funding in 2023–2024.
Ethics declarations
Ethics approval
The reported work did not include any chemical or biological experiments other than those performed in silico.
Conflict of interest
The author declares no competing interests.
Additional information
Published in the topical collection featuring Current Progress in Glycosciences and Glycobioinformatics with guest editors Joseph Zaia and Kiyoko F. Aoki-Kinoshita.
About this article
Cite this article
Toukach, P. Carbohydrate Structure Database: current state and recent developments. Anal Bioanal Chem (2024). https://doi.org/10.1007/s00216-024-05383-w
DOI: https://doi.org/10.1007/s00216-024-05383-w