Background

Human Nutritional Science studies the effects of food components on the metabolism, health, performance, and disease resistance of humans, and also encompasses the study of human behavior related to food choices. Nutritional epidemiology, on the other hand, assesses the relations between diet, nutrients, and health and disease outcomes [1]. Yet, there is a major disconnect between the description of nutrition-based prevention of disease and the understanding of the complex network of interactions by which nutrition modulates health. To fill this gap, a set of nutrition-related sub-disciplines (e.g., nutritional biochemistry, clinical nutrition, nutritional epidemiology, nutrigenetics, and nutrimetabolomics) provide fundamental evidence at different levels and from different perspectives, contributing to the expansion of nutritional science as a more systematic and complex discipline [2, 3]. Because nutrition data are heterogeneous in both quality and nature, a comprehensive consideration of all aspects is challenging [4], even though substantial advances have been made in the reporting of findings and in the data quality [5] of nutrition research [6], which is one of the prerequisites for integrated analysis.

To integrate evidence, a systematic re-organization of concept definitions is needed. Currently, concept definitions are often derived from multiple sources, with the drawback that slight variations can lead to misleading interpretations [7]. In bioscience in general, and in nutritional science in particular, the same concept can be referred to by multiple synonymous terms, abbreviations, or acronyms [8], as well as by terms in different languages; term classifications such as the Medical Subject Headings (MeSH) [9] or the NCI Thesaurus [10] therefore provide fundamental resources. However, thesauri or controlled vocabularies for biomedical information do not specify relations between concepts. Although they can be used to standardize general study descriptions, considerable advances would arise from resources that, in addition to standardizing the vocabulary, also include relations between classes, that is, ontologies specifically tailored to the nutritional sciences.

Biomedical researchers often refer to ontologies using terms that more properly describe “controlled vocabularies,” “thesauri” (i.e., a list, often organized in a hierarchy or taxonomy, of concepts and their textual descriptions), or “taxonomies” (i.e., a hierarchy consisting of terms denoting classes linked by sub- and super-class relations). A proper ontology, however, is defined as a formal representation of knowledge in a certain reality (i.e., a certain domain of knowledge), built so that different people and, notably, computers can understand the concepts it contains and learn about the reality that is being represented [8, 11]. Ontologies consist of defined classes of entities, typically structured within a knowledge hierarchy in which concepts are connected by standardized [12] semantic relationships (e.g., “is-a,” “part-of”) that formally specify knowledge relations, such as generalizations or specializations, within the reality of interest [13].
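The difference between a flat vocabulary and an ontology with typed relations can be illustrated with a minimal Python sketch. The terms below (“apple,” “fruit,” “food,” “apple peel”) are hypothetical and are not taken from any released ontology; the sketch also assumes, for simplicity, that each class has at most one direct “is-a” parent.

```python
from collections import namedtuple

# A typed relation between two terms: (subject, predicate, object).
Relation = namedtuple("Relation", ["subject", "predicate", "object"])

# Hypothetical toy terms, invented for illustration only.
RELATIONS = [
    Relation("apple", "is-a", "fruit"),
    Relation("fruit", "is-a", "food"),
    Relation("apple peel", "part-of", "apple"),
]


def superclasses(term, relations=RELATIONS):
    """Follow 'is-a' links upward and return every superclass, nearest first.

    Assumes each term has at most one direct 'is-a' parent.
    """
    parents = {r.subject: r.object for r in relations if r.predicate == "is-a"}
    chain = []
    while term in parents:
        term = parents[term]
        chain.append(term)
    return chain
```

Because the relations are typed, a program can infer that an apple is a food without concluding the same for the peel, which is linked to the apple by “part-of” rather than “is-a.” A plain term list carries no such information.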

Open Biomedical Ontologies (OBO), established in 2001, is a platform for developing interoperable ontologies for biomedical research [14]. Efforts have been made in the agricultural field to develop nutrition-oriented ontologies focused on the description of food components, such as “the food classification and description system” [15] developed by the European Food Safety Authority (EFSA). Other notable efforts in developing food-focused ontologies were reviewed elsewhere [16]. Based on a literature search and queries of public ontology repositories (OBO Foundry, searched using ONTOBEE, and BioPortal), a single example of a nutritional ontology was found: the Bionutrition Ontology (BNO, http://purl.bioontology.org/ontology/BNO). The BNO is a controlled vocabulary of nutritional terms, without proper annotation of terms or definition of properties, and it lacks orthogonality (i.e., no terms are imported from or refer to external ontologies). To the authors’ knowledge, a proper ontology integrating the terms related to food description, medical science, genetics, genomics data, and nutritional science methods for diet and health research is not available to date. To fill this gap, we present the Ontology for Nutritional Studies (ONS) to facilitate the harmonization and integration of biological samples collected using different methodologies and referred to by differing terminologies across the fast-growing sub-disciplines of diet and health research.

The ONS was developed within the European Nutritional Phenotype Assessment and Data Sharing Initiative (ENPADASI) consortium [17], which brings together scientists from 51 research centers in nine European countries in a common effort to manage and make available big nutritional data through the open access nutritional database Data Sharing In Nutrition (DASH-IN) [17, 18]. DASH-IN is a distributed pan-European infrastructure that supports the storage of both interventional and observational studies and provides tools for the distributed management, search, and analysis of the data [19]. The development of this infrastructure required an ontology to harmonize the biochemical, genetic, clinical, and nutritional concepts typically found in intervention and observational studies and to provide a coherent means of data annotation and data querying over the distributed infrastructure. Further developments of the project led to a stronger need for a proper conceptual framework, such as the ONS, that the broader nutrition community could build upon for annotating general nutritional studies. The ENPADASI framework gathered researchers from different nutrition-related fields (health sciences, biology, genetics, microbiology, agricultural sciences, food technology, materials science, chemistry, metabolomics, genomics, bioinformatics, and metagenomics) and offered the ideal milieu for creating the first ontology in nutrition.

Methods

Terms to be included in the ONS were collected from partners of the ENPADASI consortium, as well as from templates for data and metadata upload into the DASH-IN databases. In compliance with the OBO Foundry principles [14], the ONS has been developed to be: (i) Interoperable with other ontologies, as it has been formalized using the OWL 2 Web Ontology Language [20] and RDF specifications [21] and edited using Protégé [22]; the HermiT reasoner (http://hermit-reasoner.com/) was used for consistency checking. (ii) Accessible, under the Creative Commons license (CC BY 4.0), published on GitHub (https://github.com/enpadasi/Ontology-for-Nutritional-Studies) and at NCBO BioPortal (http://bioportal.bioontology.org/ontologies/ONS). (iii) Orthogonal to other ontologies by reusing existing terms. Besides assuring compliance with the OBO Foundry principles, we also ensured that the ONS followed the increasingly established FAIR principles [23]. Accordingly, the ONS is also published in the FAIRsharing database (https://fairsharing.org/bsg-s001068).

To enhance interoperability with other ontologies, the ONS builds on a subset of the Ontology for Biomedical Investigations (OBI) [24]. The subset was created using the ONTODOG tool [31]. In fact, joint data analysis has already started helping to achieve new discoveries [32]. In the ONS, we have included the minimal required study information in the growing conceptual/ontological framework. Each minimal required study term was placed at the appropriate hierarchical level in the ontology. To make terms pertaining to the minimal study information easy to identify, an annotation property (“in_minimal_requirements_subset”) was created.
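As an illustration of how such an annotation property can be used, the following Python sketch filters a list of terms by the “in_minimal_requirements_subset” flag. The term records, the boolean encoding of the annotation value, and the choice of which terms carry the flag are assumptions made for this example; they do not reflect the actual content of the ONS file.

```python
PROPERTY = "in_minimal_requirements_subset"

# Hypothetical term records; which terms carry the flag is invented here.
TERMS = [
    {"id": "ONS:Diet", "annotations": {PROPERTY: True}},
    {"id": "ONS:Intervention diet", "annotations": {PROPERTY: True}},
    {"id": "ONS:Food diary", "annotations": {}},
]


def minimal_subset(terms, prop=PROPERTY):
    """Return the ids of terms annotated with `prop`, in their original order."""
    return [t["id"] for t in terms if t["annotations"].get(prop)]
```

Tagging terms with an annotation property, rather than moving them into a dedicated branch, leaves each term at its proper place in the hierarchy while still allowing the minimal set to be extracted in a single pass.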

Application scenarios

The ONS is designed to enable the description of both intervention and observational studies in human nutrition. Here, we present two application scenarios based on published nutritional studies, one for the observational study design and one for the interventional study design. Figures 2 and 3 illustrate how the ONS was built to support the standardized annotation of most descriptors of a nutritional study, from the initial phases of a study (i.e., formalizing the definition of the population stratum) through to the specific results and how they were obtained. Figures and descriptions should be read at the single-instance level (i.e., they are specific to the study being described). For this reason, we introduced the use of individuals (and their connections) for highly study-specific elements, alongside concepts represented as classes. In the text below, italic notation indicates properties, while the notation PREFIX:CLASS indicates classes in the ontology; for example, “ONS:Diet” indicates the class with label “Diet” in the ONS ontology. For the abbreviations of the ontologies, we refer the reader to the list of imported ontologies in the “Methods” section.

Fig. 2

Application scenario for the description of an observational study: modeling of the CHANCE study with the ONS. Terms in rhombi indicate instance-level terms specific to the CHANCE study (e.g., the specific conclusion of the CHANCE study), while terms in rectangular boxes represent general concepts in the ONS. The semantic representation shown should be read at the single-instance level, as it specifically describes the CHANCE study.

Fig. 3

Application scenario for the description of an intervention study: modeling of the FLAVURS study with the ONS. Terms in rhombi indicate instance-level terms specific to the FLAVURS study (e.g., the specific conclusion of the FLAVURS study), while terms in rectangular boxes represent general concepts in the ONS. The semantic representation shown should be read at the single-instance level, as it specifically describes the FLAVURS study.

Observational studies

The first application scenario is represented by the CHANCE study [33]. Figure 2 illustrates how the ONS can be used to formalize information on how the study was conducted. This observational study aims at developing novel and affordable nutritious foods to optimize the diet and reduce the risk of diet-related diseases among groups at risk of poverty (ROP). The CHANCE study uses two different approaches to draw its final conclusion. The first is a literature search process (EDAM:Literature search), performed with a specific textual literature database query (i.e., an instance of the class ONS:Literature database query). The output of the literature search process is a number of scientific publications (IAO:Scientific publication), which are analyzed and reviewed to extract data (OBCS:data collection from literature), a process that ultimately results in an organized data matrix (OBCS:Data matrix). CHANCE also included an observational study approach. In this case, a population was first divided into sub-populations based on income. This stratification (STATO:Population stratification prior to sampling) was carried out following a specific stratification rule (STATO:Stratification rule), based on the risk of poverty (ROP) of the subjects as assessed with a questionnaire (ONS:Income assessment). The stratified population was then challenged with (i.e., is specified input of) two nutritional questionnaires (ONS:Food frequency and ONS:Food diary) aimed at assessing the foods consumed by the subjects, producing results that were finally organized in a data matrix. In both cases, the data matrices (OBCS:Data matrix) specific to this study contain information about the nutrients and foods consumed by the population and represent the specified data object on which conclusions are drawn (OBI:drawing a conclusion based on data).
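The literature-search branch of the CHANCE description can be sketched as a small set of subject-predicate-object triples. The class names are those used in the text, and the relation names follow the OBI specified-input/specified-output pattern; the exact edges are one reading of the prose rather than a transcription of Figure 2, and the traversal helper `downstream` is a hypothetical name introduced for this example.

```python
# Triples paraphrasing the literature-search branch described in the text.
TRIPLES = [
    ("ONS:Literature database query", "is_specified_input_of",
     "EDAM:Literature search"),
    ("EDAM:Literature search", "has_specified_output",
     "IAO:Scientific publication"),
    ("IAO:Scientific publication", "is_specified_input_of",
     "OBCS:data collection from literature"),
    ("OBCS:data collection from literature", "has_specified_output",
     "OBCS:Data matrix"),
    ("OBCS:Data matrix", "is_specified_input_of",
     "OBI:drawing a conclusion based on data"),
]


def downstream(start, triples=TRIPLES):
    """Follow edges from `start` and return every node reached, in visiting order."""
    successors = {}
    for subj, _pred, obj in triples:
        successors.setdefault(subj, []).append(obj)
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                stack.append(nxt)
    return seen
```

Calling `downstream("ONS:Literature database query")` walks the chain from the query to the conclusion, which is the kind of provenance question the instance-level modeling is meant to answer.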

Intervention studies

The second application scenario is represented by the FLAVURS (impact of increasing doses of flavonoid-rich and flavonoid-poor fruit and vegetables on cardiovascular risk factors in an ‘at risk’ group) study [34]. Figure 3 illustrates how the ONS can be used to formalize the information on how the study was conducted. This interventional study aimed to investigate the effects of high and low flavonoid diets on vascular function and other cardiovascular disease risk factors. In this study, a population, selected on the basis of the stratification rule (STATO:Stratification rule) of having a relative risk of developing cardiovascular disease higher than 1.5, was randomly divided (OBI:Group randomization and OBI:Randomized group participant role) into three groups: control group (CT), high flavonoid group (HF), and low flavonoid group (LF). Each of the groups was challenged with a different diet (ONS:Diet): CT followed the usual diet (ONS:Usual Diet), which is defined to have exactly 0 interventions (ERO:Intervention); in the HF and the LF groups, individuals were challenged with two different types of intervention diet (ONS:Intervention diet) encompassing two different intervention (ERO:Intervention) protocols. In the HF diet, the intervention consisted of prescribing the consumption of fruit and vegetables with high flavonoid content, while in the LF diet it consisted of prescribing the consumption of fruit and vegetables with low flavonoid content.

Urine and blood (OBI:Urine specimen and OBI:Blood specimen) were collected from individuals (OBI:Collecting specimen from organism) and analyzed (i.e., they inherited the evaluant role OBI:Evaluant role) by an HPLC assay (HPLC class) including untargeted metabolomics [35]. The output of the analysis was a data item in the form of a matrix (OBCS:Transformed data item) that was used to draw the specific FLAVURS conclusions (OBI:Drawing a conclusion based on data and OBI:conclusion based on data).
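The cardinality constraint on ONS:Usual Diet (exactly 0 interventions) can be illustrated with a short Python sketch that assigns a diet class to each FLAVURS group. This is a closed-world simplification and not how an OWL reasoner operates: it assumes that the listed interventions are complete, and that any diet with at least one intervention counts as an ONS:Intervention diet, which the text does not state as a formal definition. The intervention descriptions are paraphrased from the study description.

```python
# Intervention lists per study group, paraphrased from the FLAVURS description.
GROUP_INTERVENTIONS = {
    "CT": [],
    "HF": ["prescription of high-flavonoid fruit and vegetables"],
    "LF": ["prescription of low-flavonoid fruit and vegetables"],
}


def classify_diet(interventions):
    """Closed-world shortcut: zero interventions means the usual diet."""
    if len(interventions) == 0:
        return "ONS:Usual Diet"
    # Assumption: any diet with at least one intervention is an intervention diet.
    return "ONS:Intervention diet"


# Diet class assigned to each group under the assumptions above.
DIET_CLASS = {
    group: classify_diet(items) for group, items in GROUP_INTERVENTIONS.items()
}
```

Under these assumptions the control group maps to ONS:Usual Diet and both flavonoid groups map to ONS:Intervention diet, mirroring the description above.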

Discussion and conclusions

The ONS is the first systematic effort to provide a formal ontology framework for the description of nutritional studies. In this context, the main aim of the ONS is to establish an ontological framework that assists nutrition researchers by selecting the appropriate terms from the wide range of existing ontologies and by creating the key concepts still missing for the field. Nutrition researchers, who might not necessarily be familiar with ontologies and concept standardization, can find in the ONS a single knowledge entry point for a unified and standardized terminology without having to resort to numerous ontology sources. In addition to standardizing concept descriptions and assisting in annotation, the ONS will structure the querying of nutritional studies stored in public databases (such as the resources developed in the ENPADASI project). Finding suitable studies (i.e., those most directly comparable in design, stratification criteria, or type of intervention diet) is the basis for integrated analysis. Such a query cannot be efficiently based on string matching; it requires more complex textual analysis and machine learning methodologies, for which an ontology is crucial. A well-established nutritional ontology would also enable more accurate searches for required data as well as the automated integration and analysis of data from multiple sources [36].
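The limitation of string matching can be made concrete with a small Python sketch comparing it with a query expanded through the class hierarchy. The study names and their annotations are invented, and the parent classes under the “EX:” prefix are hypothetical placeholders rather than ONS terms; only the three ONS questionnaire classes come from the text.

```python
# Child -> parent links. The "EX:" parents are hypothetical placeholders.
SUBCLASS_OF = {
    "ONS:Food frequency": "EX:Dietary questionnaire",
    "ONS:Food diary": "EX:Dietary questionnaire",
    "ONS:Income assessment": "EX:Socioeconomic questionnaire",
}

# Invented studies and the ontology terms used to annotate them.
STUDIES = {
    "study_A": {"ONS:Food diary"},
    "study_B": {"ONS:Income assessment"},
    "study_C": {"ONS:Food frequency", "ONS:Income assessment"},
}


def expand(term):
    """Return `term` plus every class declared directly or indirectly beneath it."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def find_studies(term):
    """Ontology-based query: match studies annotated with `term` or a subclass."""
    wanted = expand(term)
    return sorted(name for name, tags in STUDIES.items() if tags & wanted)


def find_by_string(text):
    """Naive baseline: case-insensitive substring match on the annotation labels."""
    needle = text.lower()
    return sorted(
        name for name, tags in STUDIES.items()
        if any(needle in tag.lower() for tag in tags)
    )
```

In this toy data set, `find_studies("EX:Dietary questionnaire")` retrieves the two studies that used a food diary or a food frequency questionnaire, whereas `find_by_string("questionnaire")` retrieves none, because the word does not appear in any annotation label.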

Diet, nutrient, and food are central concepts for the nutritional sciences, and they were included in the ONS and connected with higher-level concepts. Moreover, the ONS supports the research needs identified by other initiatives such as the Food Biomarkers Alliance (FoodBAll) by including, for the first time in a formal ontology, the concept of biomarker in nutrition and its sub-classes, as defined in [30].

Beyond achieving widespread use, an ontology can be considered successful only if (i) continuous development and (ii) constant contributions and updates from researchers with specific knowledge are ensured. We invite and encourage researchers in the nutritional field to contribute to the further development, adoption, and promotion of the ONS. Contributions are already possible using the GitHub issue tracking system (Additional file 1), and an online community platform to facilitate the curation and extension of the ONS will be developed for this purpose. As a next challenge, the ONS aims to integrate nutritional studies with non-life sciences such as economics, psychology, and sociology, which also influence the nutritional status of individuals [37,38,39].