Main Text

The Journal of Cheminformatics (hereafter JCheminform) contains chemical structures in nearly every published article. However, if readers want to find which articles contain a particular structure, or download the structures from a particular article, it is not possible unless the author makes them readily available. Even then, each author might do this differently, increasing the downstream effort by data users.

JCheminform is hel** lead the way for open and FAIR (Findable, Accessible, Interoperable, Reusable [1]) chemical informatics. For example, all Additional Files in JCheminform articles are uploaded into FigShare [2], thus hel** make them more accessible. However, JCheminform can further its FAIRness leadership by enabling two key initiatives for its chemical structure content: establishing a consistent approach to reporting chemical structures per article by introducing a chemical structure data template for Additional Files and enhancing the article DOI metadata to link back to this chemical structure data file DOI in FigShare.

While bioinformatics, crystallography and other scientific fields require data to be published in an open repository, there is no such requirement for chemical information; yet, there is movement in this direction. Some publishers (e.g., Springer Nature [3]) have established automated submission of chemical structure content to open archives (such as PubChem [4]) using name-entity approaches. Some primary chemistry journals have established standard templates for reported chemical structure and associated content (e.g., Journal of Medicinal Chemistry [5]) or by means of direct chemical structure submission into an open archive (e.g., Nature Chemical Biology [6]).

If chemical structures found in articles are provided in a machine-readable way, e.g., via a template file, then the chemical structures are more interoperable and can be readily reused by researchers and directly integrated by chemical-centric resources. If the journal supports findability of this chemical data, for example by indicating its availability in the article DOI metadata or by providing a map** file (containing the article DOI and the FigShare DOI of the chemical data file) on the journal website, then scientists and chemical resources can readily locate the chemical structure content. By improving the findability and interoperability, the barriers for reuse are lowered (see e.g., [7]). These author contributions, especially with valuable additional annotation information, are essential to fill gaps in the current chemical knowledge [8]. Thus, we believe it is time for JCheminform to take these logical next steps towards Open Science and help authors provide Open and FAIRer chemical data.

Chemical structure data template file

Authors should be encouraged to submit their chemical structure information using a standard template as “Additional Files” with their manuscript in either CSV (*.csv), TSV (*.tsv) or SDF (*.sdf) format.

For CSV/TSV, the header (first row) indicates the data content of each column; each subsequent row corresponds to a complete chemical record description: chemical structure, chemical names, identifiers, comments, and any other data the authors wish to provide (as additional columns). The case-insensitive template CSV/TSV column headers (or SDF SD fields) are: SMILES, InChI, and InChIKey for chemical structure; Name and Synonym for chemical names; and Comment for textual comments. Any additional columns headers, e.g., for data, further identifiers or metadata, are up to the author. The Synonym and Comment columns may be provided more than once per record.

The author-submitted template file should contain at least one of the following columns: SMILES, InChI, Name or InChIKey. The Name column corresponds to a single primary name for the chemical structure. Each Synonym column corresponds to an additional chemical name (one name entry per column). Each Comment column can be added to provide additional text that may be important to the downstream user. Authors can also provide additional CSV/TSV columns or SDF SD fields of information to describe their chemical substances (with unique, descriptive headers) for additional context. Chemical database identifiers or registry numbers could be included in this manner, or as a Synonym. See Fig. 1 and Additional file 1 [9] as an example.

Fig. 1
figure 1

Example chemical structure data template file for interoperable chemical information [9]

Note that chemical records indicating chemical structure with only InChIKey or Name will not contain sufficient information to describe a chemical structure, and can only be mapped to existing entries in destination resources. Batch services are available (e.g. from PubChem [4, 10] or CompTox [11, 12]) for authors to add e.g. SMILES and/or InChI to their records.

Closing

The era of Open and FAIR chemical science is upon us. Providing standard templates and guidance for authors to submit their chemical data in an interoperable manner that is tagged in a findable way by the journal will help close the gaps in databases, raise the visibility of their scientific contributions, improve the machine readability, help authors meet funding data sharing requirements and increase the utility of information in JCheminform. We strongly believe that authors and readers alike will greatly appreciate this FAIRifying value-add towards Open Science.