
1 Introduction

It is well established that the diet of the world’s population has a massive impact on climate change [1, 2]. However, too little attention is still paid to the impact of climate change on the growing conditions of ingredients for foods and beverages and, similarly, to the emissions caused by, for example, production and logistics. The provenance and climate impact of various foods are often not clearly known or accessible, either to end consumers or to the other actors in the supply chain.

To give an example, many food choices are non-trivial and interdependent in terms of sustainability. It may be unclear to consumers, for instance, that the production of mineral water may (due to the packaging materials used) be more damaging to the climate than the production of rice, and further aspects (e.g. logistics and prices) also become relevant. As in all information-intensive environments, food producers and consumers continuously face complex decisions on which ingredients or products to choose, in which amounts, how to process them, and which alternatives to select for products that are consumed regularly. To make these decisions, food producers, providers and consumers need access to data about the food items, for example their nutritional value, taste and sustainability characteristics, as well as the needed nutrients and logistics information. This information is still scattered, and the quality of the data varies. Data indicating the climate change impact of different foods and beverages exist (or can be collected), as do data on the supply chains. However, these data are often not easily available or discoverable, and they lack the explicit representation and interconnections that can be achieved with semantic technology (ontologies and knowledge graphs). Generally, ontologies are data models representing a set of concepts and their relationships in a domain. Knowledge graphs include domain-specific data points (instances of these concepts) and their specific data and object properties. Knowledge graphs are highly scalable and flexible data structures, which allow us to develop a linked data model that makes explicit the linkages between crops, their nutritional contents and their growth temperature conditions.

In this work we aim to reach a clear understanding of diets and of how to make them equally nutritious, sustainable and close to zero CO2 emission, as well as how to adapt them to climate change by taking the growing conditions into account. The main questions to be answered with this research are: “How can we interlink datasets so that an alternative to the currently consumed products can be (automatically) found by taking into account the nutritional composition, the growing conditions that will be affected by climate change, and sustainability information?” and “To what extent can that process be automated?”.

The objective of the research is to identify the relevant data and make them more accessible for discovery and for supporting (automatic) decision making in the food supply chain and for end consumers. The goal is therefore to develop and generate knowledge graphs, benefiting from semantic technology, which helps to interlink scattered information using standardized concepts. By employing knowledge graphs for interlinking, together with the I-KNOW-FOO project approach presented in this work, one can create a web-like, large-scale data infrastructure and tools that let everyone (researchers, businesses, policy makers, manufacturers, consumers) easily explore the domain, and that assist in estimating the CO2 footprints of various foods and beverages and in adapting diets to climate change.

The remainder of this document is organized as follows. Section 2 explains the methodology of the work and the starting points with relation to the available data. In Sect. 3, the results are presented, Sect. 4 evaluates the approach, and Sect. 5 describes the conclusions and future work.

2 Data and Methods

For the purpose of the study, data on the import, growing conditions and nutritional value of crops were required. The datasets were searched for on the web, through data repositories, government websites, academic databases, and open data portals. Preferably, the data on growing conditions and nutritional values would be accessed through existing ontologies, as this saves steps in data management and implies that a shared vocabulary already exists. Where this was not possible, datasets and databases were converted to linked data and a knowledge graph was constructed manually. Tables 1 and 2 list the ontologies and databases that have been collected on crop growing conditions and food items, respectively. The search was conducted with the following keywords: Crop ontology, Plant ontology, Agriculture ontology, Growing conditions ontology, Nutritional profile ontology, Crop traits ontology, Crop characteristics ontology.

Table 1. Ontologies on crop growing conditions.
Table 2. Ontologies on food items and nutritional composition.

As the datasets are scattered, the research started by making an inventory of the available datasets, ontologies and knowledge graphs on food products and on the impact of climate change on the availability of food products. The datasets and databases screened were the SHARP Indicators Database, Food Consumption Impact datasets (Optimeal-Blonk Sustainability Datasets), RIVM Sustainability dataset, the Pizza dataset, the Eaternity Database, Data Explorer: Environmental Impacts of Food, the Dataset on potential environmental impacts of water deprivation and land use for food consumption in France and Tunisia, the Food environmental impact UK database (by Clark et al.), and the World Food LCA Database. Among those, a few are publicly available [16,17,18,19,20,21,22,23,24].

As the databases and datasets are from all over the world, the food products vary from one database to another and it is not straightforward to map or relate them. Additionally, the existing food databases provide information about impact of food consumption on sustainability and they do not have a direct link to changing climate conditions which is required to determine alternatives for the original products that are currently part of the diets.

We then defined a use case that focuses on the most-imported crops in the Netherlands, to connect consumption to the changing climate. For the most-imported crops, their important nutritional values are determined and alternative crops are found (e.g., for the case in which the Netherlands may run out of these crops as the climate changes over the years). To determine the most-imported crops, we use the FAOSTAT Database [25]. The crop information was manually interlinked to growing conditions. The most useful information was found in the ECOlogical CROP Database (ECOCROP) [8], as it contains information on the growing conditions of more than 3000 crops. However, it was not represented as linked data, so we had to uplift it to this format.

3 Results

The use case focuses on the most-imported crops in the Netherlands. Our aim was to determine the crops most imported into the Netherlands in order to evaluate their important nutritional characteristics and to find alternatives for the case in which these crops become unavailable (for instance due to climate change). The top 10 commodities imported into the Netherlands over five years (2016–2020) were screened using the TRADE Datasets for Crops and livestock products in the FAOSTAT database. Moreover, commodities supplied to the Netherlands were assessed using the Food Balance Datasets in terms of Domestic Food Supply Quantity (1000 t/yr) and Food Supply Quantity (kg/capita/year). These commodities are listed by import quantity, import value and supply quantity, in descending order (see Table 3).

Table 3. Comparison of imported food groups versus food supply data (from 2016–2020).

We focused on three main commodities that are imported in high quantities and supplied to the Dutch population, and identified soybean, wheat and potato as the most imported and consumed food products. The next step was to find nutritionally similar alternative crops using the NEVO Dutch Food Composition Database. We have also searched for the growing conditions of the original and alternative crops, and developed knowledge graphs that link these data and reuse parts of existing knowledge graphs. The alternatives are generated by intersecting (either manually or automatically) the result sets of different queries on the knowledge graphs: one for nutritionally equivalent (or nutritionally better) food items and one for crops that are more climate-resistant. In the remainder of this section, we describe the resulting ontologies and knowledge graphs and the querying in our approach.

3.1 I-KNOW-FOO Ontologies and Knowledge Graphs

To be able to answer basic queries for our problem setting, we have prepared the data as follows, applying manual and automated uplifting and extension to knowledge graphs and ontologies.

Manually Generated Alternatives.

To find alternatives to the three crops, we have focussed on two parameters: climate resilience and nutritional comparability. Nutrient-rich comparable crops and food products have been screened in the NEVO database using knowledge rules provided by a dietary expert. These possible alternative crops have then been evaluated in terms of their resistance to temperature increases in a changing climate, using crop growth temperatures from the ECOCROP database.

Generating an ECOCROP Ontology.

The ECOCROP database is transformed into a knowledge graph manually. First, the dataset has been cleaned. The measurementType ‘optimalGrowthTemperature’ has been subdivided into maxGrowthCelsiusTemperature and minGrowthCelsiusTemperature, to distinguish between the two bounds and to include the unit in the predicate. The triples consist of the occurenceID as subject, the measurementType as predicate and the measurementValue as object. These have been transformed using the OntoText Refine tool and loaded into an RDF repository in RDF4J. OntoText Refine is a software tool that supports the transformation of string data into knowledge graphs [26].
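The uplifting step described above can be pictured with a minimal Python sketch. It is not the OntoText Refine mapping used in the project. The namespace `http://example.org/ecocrop/`, the helper column `bound` (which says whether a temperature row is the lower or upper bound), the crop ID and the temperature values are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple

# Hypothetical namespace; the IRIs used in the project may differ.
BASE = "http://example.org/ecocrop/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


class Row(NamedTuple):
    occurenceID: str         # spelling follows the text above
    measurementType: str
    bound: str               # assumed helper column: "min" or "max"
    measurementValue: float


def predicate_for(row: Row) -> str:
    """Split the generic temperature type into two unit-carrying predicates."""
    if row.measurementType == "optimalGrowthTemperature":
        return {"min": "minGrowthCelsiusTemperature",
                "max": "maxGrowthCelsiusTemperature"}[row.bound]
    return row.measurementType


def to_ntriples(rows: Iterable[Row]) -> List[str]:
    """Emit one N-Triples line per row: occurenceID, predicate, value."""
    return [
        f'<{BASE}{r.occurenceID}> <{BASE}{predicate_for(r)}> '
        f'"{r.measurementValue}"^^<{XSD_DECIMAL}> .'
        for r in rows
    ]


# Made-up rows, not taken from ECOCROP.
rows = [
    Row("crop-0001", "optimalGrowthTemperature", "min", 15.0),
    Row("crop-0001", "optimalGrowthTemperature", "max", 23.0),
]
triples = to_ntriples(rows)
for line in triples:
    print(line)
```

Putting the unit into the predicate name, as the text describes, keeps each value a plain number that can be compared directly in a query filter.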

ECOCROP Extension and Interlinking to FIO and FoodOn.

We have extended ECOCROP manually by adding triples that link some of the occurenceIDs in ECOCROP to the IDs of crops in FoodOn (including NCBITaxon [27]) and of food items in FIO (Food Item Ontology [15]), based on the RIVM NEVO IDs. In FoodOn, we have chosen instances of the organism class, because these represent the plants rather than the different foods that may originate from these plants. After all, it is the plants, not the foods, that are grown under (changing) temperatures. The relations used for linking the concepts are skos:closeMatch and owl:sameAs.
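As a rough illustration of this interlinking, the sketch below generates link triples from a small mapping table. The two property IRIs are the standard SKOS and OWL ones. The ECOCROP namespace, the target IRIs and the rule for choosing between the two relations (the `exact` flag) are assumptions for the example, not the project's actual mapping.

```python
from typing import Iterable, List, Tuple

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical namespace for the uplifted ECOCROP records.
ECOCROP = "http://example.org/ecocrop/"

# (ECOCROP occurenceID, target IRI, exact?) with placeholder target IRIs.
Mapping = Tuple[str, str, bool]


def link_triples(mappings: Iterable[Mapping]) -> List[str]:
    """Emit owl:sameAs for exact matches and skos:closeMatch otherwise."""
    lines = []
    for occurence_id, target_iri, exact in mappings:
        relation = OWL_SAME_AS if exact else SKOS_CLOSE_MATCH
        lines.append(f"<{ECOCROP}{occurence_id}> <{relation}> <{target_iri}> .")
    return lines


mappings = [
    ("crop-0001", "http://example.org/foodon/organism-0001", True),
    ("crop-0002", "http://example.org/fio/item-0002", False),
]
links = link_triples(mappings)
for line in links:
    print(line)
```

The distinction matters for reasoning: owl:sameAs asserts that both IRIs denote the same individual, whereas skos:closeMatch only states that two concepts are similar enough to be used interchangeably in some applications.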

The open access ECOCROP ontology and knowledge graphs created in our project are available at: https://git.wur.nl/FoodInformatics/i-know-foo.git.

3.2 Querying the Knowledge Graphs

Subsequently, we have loaded the triples into the triple repository, where the information can be queried using SPARQL. In the future, this could be done by an automated tool. The query that we have formulated searches for crops that are more resilient to a warmer climate and are therefore candidates to replace the current crop. So far in this exercise, we have focused only on the maximum growing temperature, as it is one of the important factors through which climate change affects crop growth [28]. In our examples, the maximum optimal growing temperatures are 33 ℃ for soybean, 23 ℃ for wheat and 25 ℃ for potato. Combining this information with the nutritional values still leaves multiple, though often restricted, options for food alternatives with similar nutritional characteristics. For example, for potatoes, possible alternatives are beans white/brown dried, peas green dried, chestnuts raw, tapioca, cassava raw, taro raw, yam raw, tannia raw, beans black eyed dried, peas split yellow/green dried, tamarind, flour cassava. The alternatives are found by intersecting the different result sets, i.e., the climate-resilient crops from ECOCROP and the alternatives as defined by nutritionists.

To obtain the solutions, queries have been written to find alternatives based on growing temperature, and these alternatives have been superimposed onto the nutritional results from NEVO. This identifies the alternatives that are both more climate resilient and nutritionally equivalent. For each crop, a SPARQL query can be written to find alternatives with a higher maximum optimal growing temperature. For example, for wheat, the maximum optimal growing temperature is 23 ℃. The query will therefore be:

Figure a. SPARQL query for crops with a maximum optimal growth temperature above 23 ℃.
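For readers without access to the original listing, the idea of such a temperature query can be sketched as follows. This is not the authors' query. The prefix and property IRI in `SPARQL_SKETCH` are assumptions, and the Python function only mimics the same filter over an in-memory list of triples, using the three temperatures quoted above.

```python
# Hypothetical SPARQL sketch; prefix and property IRI are assumptions.
SPARQL_SKETCH = """
PREFIX ec: <http://example.org/ecocrop/>
SELECT ?crop ?tmax WHERE {
  ?crop ec:maxGrowthCelsiusTemperature ?tmax .
  FILTER(?tmax > 23)
}
"""

# (subject, predicate, object) triples with the temperatures from the text.
TRIPLES = [
    ("soybean", "maxGrowthCelsiusTemperature", 33.0),
    ("wheat", "maxGrowthCelsiusTemperature", 23.0),
    ("potato", "maxGrowthCelsiusTemperature", 25.0),
]


def crops_warmer_than(triples, threshold):
    """Return (crop, tmax) pairs whose maximum optimal temperature exceeds threshold."""
    matches = [
        (subject, value)
        for subject, predicate, value in triples
        if predicate == "maxGrowthCelsiusTemperature" and value > threshold
    ]
    return sorted(matches)


warmer = crops_warmer_than(TRIPLES, 23.0)
print(warmer)
```

The comparison is strict, so wheat itself (23 ℃) is excluded from its own list of alternatives.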

This query returned 1,790 crops with a maximum optimal growth temperature greater than 23 ℃. These alternatives were then superimposed onto the nutritional alternatives for wheat from NEVO, resulting in the four alternatives listed in Table 4.

We have attempted to automate the intersecting (superimposing) of the different result sets (temperature-resistant crops, and food items with equivalent or improved starch, pyridoxine, ascorbic acid and potassium levels), but this has not succeeded so far. The SPARQL query given below proved too heavy because of the five filters that were required. With four filters (one removed) it was still possible to obtain results on a regular desktop computer, but the processing time became unacceptably long as the query complexity (the number of filters included) increased:

Figure b. SPARQL query combining the temperature filter with the four nutritional filters.

We have also converted the query to a nested query with a subquery for each of the filters, with the aim of retrieving a relatively small part of the data per subquery and hence reducing the amount of data processed in the overarching main query, but that had no effect. Future research should focus on (further) query optimization.
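The logic of the five-filter intersection can be shown outside SPARQL with a small sketch in which each criterion yields its own result set and the sets are then intersected. All crop names and values below are made-up placeholders in arbitrary units, not NEVO or ECOCROP data, and the sketch makes no claim about how a triple store would plan or optimise the corresponding query.

```python
import operator

# Temperature must be strictly higher; nutrients equivalent or improved.
CRITERIA = {
    "tmax_celsius": operator.gt,
    "starch": operator.ge,
    "pyridoxine": operator.ge,
    "ascorbic_acid": operator.ge,
    "potassium": operator.ge,
}


def staged_alternatives(items, reference):
    """Build one result set per criterion, then intersect all of them."""
    stages = [
        {name for name, values in items.items()
         if compare(values[key], reference[key])}
        for key, compare in CRITERIA.items()
    ]
    return set.intersection(*stages)


# Placeholder reference crop and candidates (arbitrary units).
reference = {"tmax_celsius": 25, "starch": 10, "pyridoxine": 1,
             "ascorbic_acid": 5, "potassium": 100}
items = {
    "crop_a": {"tmax_celsius": 30, "starch": 12, "pyridoxine": 1,
               "ascorbic_acid": 6, "potassium": 120},
    "crop_b": {"tmax_celsius": 25, "starch": 15, "pyridoxine": 2,
               "ascorbic_acid": 9, "potassium": 200},  # not warmer
    "crop_c": {"tmax_celsius": 35, "starch": 8, "pyridoxine": 2,
               "ascorbic_acid": 9, "potassium": 200},  # less starch
}
alternatives = staged_alternatives(items, reference)
print(alternatives)
```

Here only `crop_a` satisfies all five criteria. Evaluating the filters as separate stages mirrors the intent of the nested query; whether doing the intersection client-side would avoid the reported bottleneck is an open question.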

What is more, the ECOCROP ontology should be extended with the candidate alternatives for crops provided by the dietary expert. Presently, all crops and food items in the ontology are considered as alternatives (i.e., based only on a higher growing temperature and equivalent or better nutritional values), rather than a specific set that is genuinely suitable to replace the food item in question in a meal or recipe at a specific moment of the day.

4 Evaluation

The following section describes the results of the evaluation of the approach. The interoperability was tested through a use-case scenario in which new data was linked to the knowledge graph and queried for results. The section contains a description of the use case scenario, the approach to linking new data, and the results from the query.

Suppose a certain area of cropland is being affected by an increase in average annual temperature, making it increasingly difficult to grow wheat, which requires that a certain maximum temperature is not exceeded throughout the year. A farmer may want to find other crops to cultivate on the farmland in order to increase efficiency and climate resilience. However, besides searching for alternative crops that can withstand higher temperatures, the farmer is also concerned about the change in profits when switching to alternatives. When changing to a different crop, the producer prices for the yearly yield will also change. Therefore, if the farmer wants to identify climate-resilient crops and prioritize these results based on producer prices per tonne, a new dataset should be added to the knowledge graph.

Table 4. Alternatives for wheat based on nutritional profile and growing temperatures.

In order to add pricing as a further prioritizing variable for the identified crops, a new dataset was added to the knowledge graph as part of the evaluation. Note that the data for this use case are synthetic, created for evaluation purposes, and do not represent actual market data. None of the results and figures from this validation should be interpreted as real crop pricing data. As data on producer prices of crops are difficult to find due to frequent changes and a lack of accessibility, a synthetic database provides a viable alternative for a validation use case.

Synthetic data was generated to create a simulation for producer prices on the crops that have been identified as alternatives for wheat (see Table 5). Subsequently, the synthetic data was added to the repository, and the original SPARQL query for wheat was extended and run in the repository as follows.

Figure c. Extended SPARQL query for wheat alternatives including producer prices.
Table 5. Synthetic data on producer prices for wheat alternatives (€/t).

After the query output was superimposed on the manually created NEVO alternatives, it resulted in the data as shown in Table 6.
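The price-based prioritisation in this use case can be sketched in a few lines. The crop names and prices below are placeholders and are neither the synthetic values of Table 5 nor real market data. Ranking from the highest to the lowest producer price is also an assumption about what the farmer would prefer.

```python
def rank_by_price(candidates, prices):
    """Join candidate crops with prices (in €/t) and sort by descending price.

    Candidates without a known price are left out of the ranking.
    """
    priced = [(crop, prices[crop]) for crop in candidates if crop in prices]
    return sorted(priced, key=lambda pair: pair[1], reverse=True)


# Placeholder inputs: climate-resilient, nutritionally suitable candidates
# and a price table covering only some of them.
candidates = {"alt_1", "alt_2", "alt_3"}
prices = {"alt_1": 210.0, "alt_2": 340.0, "alt_4": 500.0}

ranking = rank_by_price(candidates, prices)
print(ranking)
```

In this example `alt_3` is dropped because it has no price, and `alt_4` is ignored because it is not among the candidates, which corresponds to joining the price dataset onto the earlier result set rather than the other way round.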

Table 6. Query result for wheat alternatives including producer prices.

5 Conclusions and Future Work

More sustainable food production, distribution and consumption options can be discovered by all stakeholders, eventually leading to near-zero CO2 emission diets and to sustainable food production that has a positive impact on climate change and is also adaptive to it. Linking datasets and unlocking the information about crops and food products allows nutritionally similar alternatives to be found automatically in a changing climate. This research demonstrates that automation is possible. In this work, alternatives are generated manually for the three most-imported crops in the Netherlands to showcase the feasibility of automatic generation. The growing conditions of the crops are defined in the ECOCROP ontology, which we based on the open ECOCROP data, while the nutritional values are available from FIO and are based on NEVO. The NEVO database and the ECOCROP ontology are linked through the NEVO codes (inserted in the ECOCROP ontology).

The findings demonstrate the effectiveness of linking structured datasets and ontologies to facilitate automated decision-making. By querying the knowledge graph, nutritionally similar alternatives can be identified to adapt to changing climate conditions. Importantly, the use of the developed knowledge graph is not limited to this study alone. It serves as a foundation for further development, inviting a multitude of stakeholders to contribute and integrate additional data sources. Furthermore, future enhancements can involve the integration of an advanced ontology into multiple infrastructures and data platforms, for example, for ontology-enabled food ingredient substitution [14], thereby increasing its utility and impact in the field of sustainable food production. However, extensive querying may run into computational performance bottlenecks on the hardware typically available to regular users.

Furthermore, a user interface might increase the usability for other stakeholders in the future, besides researchers. It has been demonstrated before that visual elements, including graphs and images, are more easily understood than text and numbers [29]. Earlier research shows techniques for visualizing SPARQL query outputs from GraphDB, with the goal of increasing the understandability of vast knowledge graphs and complex queries. Besides ontology visualization tools such as Nitelight and FedViz, other studies have constructed visualization tools and frameworks to increase understandability with end-users such as in studies on raising awareness of data sharing consent [30, 31]. In these cases, a framework for an application is created where the user communicates with the front end that links to GraphDB through several APIs and visualizes the resulting data for increased understandability. Similar user interfaces could be developed for the current knowledge graph when implemented in non-academic situations.