1 Background

The quality of forest soils, an important factor affecting the productivity and stability of forest ecosystems, may be a limiting factor for the long-term sustainability of forest management (Jonard et al. 2015; Garrett et al. 2021). Forest soil condition is also one of the fundamental properties influencing an ecosystem’s potential adaptation to climate change (Babst et al. 2013; Charru et al. 2014).

Current data show serious nutrient deficiencies in a large part of the Czech Republic (Novotný et al. 2018; Borůvka et al. 2020b; Pecháček et al. 2023) and a discrepancy between measured soil properties and those expected according to the national forest typological system (Viewegh et al. 2003; Šrámek et al. 2013). Over the last 20 years, several soil surveys with slightly different sampling and analytical methods were carried out in the country (e.g., within the ICP Forests programme, the National Forest Inventory and soil monitoring in polluted areas), which complicated an overall evaluation. For this reason, data from the different soil surveys were validated (Šrámek et al. 2020), harmonized and aggregated into a single comprehensive database. Environmental and geographic parameters were then added for the sampling sites. The resulting dataset brings together information on the fundamental soil properties relevant to forest nutrition.

2 Methods

2.1 Soil data origin

The data entered into the database were provided by three institutions and come from four sub-databases, whose degree of homogeneity varies with the soil surveys they contain (Table 1). (1) The Central Institute for Supervising and Testing in Agriculture database (CISTA db) contains soil sampling data from forest nutrition surveys in formerly air-polluted areas and in genetically valuable forest stands (genetic conservation units) (Fiala et al. 2013; Reininger et al. 2011). Three soil layers were sampled during the survey: the upper organic layer (OFH), the organo-mineral horizon (A), usually 1–5 cm thick, and the deeper mineral soil down to a depth of ca. 30 cm. Samples were analyzed in the CISTA laboratory, which was accredited according to the EN ISO/IEC 17043:2010 standard.

Table 1 Main information on the original databases

(2) The National Forest Inventory database (FMI_NIL 2 db) was administered by the Forest Management Institute (FMI) (Kučera and Adolt 2019). Sampling was carried out by multiple sampling teams according to a standardized methodology, by genetic horizons rather than fixed depths. A constant set of chemical parameters was analyzed in the samples in the FMI laboratories, in some cases with a lower accuracy of determination (total element contents with an accuracy of 100 mg kg−1).

(3) The FMI typological database (FMI_typological db) was built up over many years from samples collected by many teams according to a standardized methodology by genetic horizons and analyzed by different methods. The number of parameters determined varies between individual samples.

(4) The database of the Forestry and Game Management Research Institute (FGMRI db) contains highly homogeneous sampling data from the ICP Forests programme and the BioSoil project (de Vos and Cools 2011; Lorenz and Becher 2012; Šrámek et al. 2013), as well as from systematic sampling for the preparation and control of liming and fertilization projects in forest stands and from other soil surveys carried out within research projects. Sampling was carried out by a closely cooperating research team, typically at fixed depths, following the ICP Forests methodology (Cools and De Vos 2016). The full range of soil chemistry parameters was analyzed by the FGMRI testing laboratory, which maintains a high standard of quality assurance and quality control, including regular participation in national and international ring tests.

2.2 Data processing

In the first step, the collected data were verified by numerical and graphical analyses. In all surveys, the upper organic layer (humus layer OFH) was sampled separately, but the depth of the sampled mineral soil differs. The verified data were therefore converted to three harmonized layers using a weighted average: (i) the upper organic layer (FH), (ii) the upper mineral soil at a depth of 0–30 cm (M03) and (iii) the deeper mineral layer at a depth of 30–80 cm (M38).
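A minimal sketch of the layer harmonization, assuming a simple thickness-weighted mean: each sampled horizon contributes to a target layer in proportion to the thickness of its overlap with that layer. The text does not state which weights were actually used (bulk density or fine-earth mass could also enter the weighting), and the profile below is hypothetical.

```python
from typing import List, Optional, Tuple

# A sampled horizon: (top depth in cm, bottom depth in cm, measured value)
Horizon = Tuple[float, float, float]


def layer_weighted_mean(horizons: List[Horizon], top: float, bottom: float) -> Optional[float]:
    """Thickness-weighted mean of horizon values within the target layer [top, bottom).

    Each horizon is weighted by the thickness of its overlap with the layer.
    Returns None if no horizon overlaps the layer.
    """
    weighted_sum = 0.0
    total_thickness = 0.0
    for h_top, h_bottom, value in horizons:
        overlap = min(h_bottom, bottom) - max(h_top, top)
        if overlap > 0:
            weighted_sum += value * overlap
            total_thickness += overlap
    if total_thickness == 0:
        return None
    return weighted_sum / total_thickness


# Hypothetical profile sampled by genetic horizons (depths in cm, values e.g. pH)
profile = [(0, 5, 3.8), (5, 20, 4.2), (20, 45, 4.6), (45, 90, 4.9)]
print(layer_weighted_mean(profile, 0, 30))   # M03 layer
print(layer_weighted_mean(profile, 30, 80))  # M38 layer
```

A horizon that straddles 30 cm (here 20–45 cm) is split between M03 and M38. A profile sampled only to 45 cm would yield an M38 value based on 15 cm of the 50 cm layer, a case a real workflow may want to flag.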

2.3 Comparison of analytical methods

Combining the results of different surveys was complicated because different methods of chemical analysis (extraction procedures) were used. To make the database more homogeneous, the original data were recalculated where a sound correlation between the two methods was available (e.g., Záhornadská 2002; Čechmánková et al. 2021). The BaCl2 extract is used as the database standard for exchangeable elements, and the aqua regia digest for extractable elements. The transfer functions used, with their coefficients of determination, are given in Table 2. The original analytical method is nevertheless recorded in the database so that users can convert the data back to their original values if needed. For elements for which the authors knew of no suitable transfer function, the data were not recalculated and are presented according to the individual analytical method. This layout is not ideal, but it allows users to work with the recalculated data or with the original data (e.g., only with the subset determined by identical analytical methods), or to apply their own homogenization procedure.
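Table 2 is not reproduced here, so the following sketch assumes the simplest form of transfer function, a linear relation y = a + b·x between a value x obtained by an alternative method and the value y by the standard method. It shows how a, b and the coefficient of determination R² could be estimated by ordinary least squares from paired analyses, and how the inverse function recovers the original value. The paired measurements are hypothetical, and the function names are illustrative.

```python
from typing import List, Tuple


def fit_linear(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def to_standard(value: float, a: float, b: float) -> float:
    """Convert a value from the alternative method to the standard method."""
    return a + b * value


def to_original(value: float, a: float, b: float) -> float:
    """Invert the transfer function to recover the alternative-method value."""
    return (value - a) / b


# Hypothetical paired analyses of the same samples by both methods
alt_method = [1.0, 2.0, 3.0, 4.0]
std_method = [2.1, 3.9, 6.2, 7.8]
a, b, r2 = fit_linear(alt_method, std_method)
print(f"y = {a:.2f} + {b:.2f} x, R^2 = {r2:.3f}")
converted = to_standard(3.0, a, b)
print(converted, to_original(converted, a, b))
```

Keeping the original method next to each converted value is what makes the inverse step possible, provided the coefficients of the function that was applied are known.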

Table 2 Conversion of elemental concentrations in forest soils

2.4 Environmental and geographical parameters

Environmental and geographical data were obtained from various digital maps and databases, which are described in Section 3 below (Borůvka et al. 2022).

3 Access to the data and metadata description

The dataset is available at https://doi.org/10.5281/zenodo.10608814. The associated metadata is available at https://metadata-afs.nancy.inra.fr/geonetwork/srv/fre/catalog.search#/metadata/38f24573-3c0d-469a-a66a-7060ce082155.

The data are contained in the Agregated_Soil_Database.xlsx file (Neudertová Hellebrandová et al. 2023). The first column (Sampling_Site_ID) contains the sampling site identifier taken from the source databases. The second column (Data_Source) identifies the original data provider (FGMRI db: data from the Forestry and Game Management Research Institute database; CISTA db: data from the Central Institute for Supervising and Testing in Agriculture database; FMI_typological db: data from the Forest Management Institute typological database; FMI_NIL 2 db: data from the Forest Management Institute National Forest Inventory database). The third column (Layer) identifies the soil layer (FH: upper organic soil horizon; M03: upper mineral soil layer 0–30 cm; M38: deeper mineral soil layer 30–80 cm). The Date and Year columns contain the sampling date (where available) and the sampling year.
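A minimal sketch of selecting rows by layer and data provider, assuming the sheet has been read into a list of dictionaries keyed by the column names above (for example from a CSV export via `csv.DictReader`; `pandas.read_excel` followed by boolean indexing would be an alternative). The function name and the site IDs and years in the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Layer codes with depth ranges in cm as described above; FH has no fixed depth
LAYER_DEPTHS = {"FH": None, "M03": (0, 30), "M38": (30, 80)}


def select_records(records: Iterable[Dict], layer: str,
                   source: Optional[str] = None) -> List[Dict]:
    """Return rows for one layer, optionally restricted to one Data_Source."""
    if layer not in LAYER_DEPTHS:
        raise ValueError(f"unknown layer code: {layer!r}")
    return [
        r for r in records
        if r["Layer"] == layer and (source is None or r["Data_Source"] == source)
    ]


# Hypothetical rows illustrating the column layout
rows = [
    {"Sampling_Site_ID": "S1", "Data_Source": "FGMRI db", "Layer": "FH", "Year": 2007},
    {"Sampling_Site_ID": "S1", "Data_Source": "FGMRI db", "Layer": "M03", "Year": 2007},
    {"Sampling_Site_ID": "S2", "Data_Source": "CISTA db", "Layer": "M03", "Year": 2011},
]
print([r["Sampling_Site_ID"] for r in select_records(rows, "M03")])
print([r["Sampling_Site_ID"] for r in select_records(rows, "M03", "CISTA db")])
```

Rejecting unknown layer codes early catches typos such as "M30" that would otherwise silently return an empty selection.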

Sampling site characteristics include information about terrain, climate, forest site and soil class.

Geographical coordinates (Lat, Long) were measured during sampling. The altitude (Altitude) was extracted from the DTM 4G raster (Digital Terrain Model of the Czech Republic, 4th generation) with a resolution of 5 m (Brázdil 2010), and the slope (Slope) and orientation to the cardinal directions (Aspect) were calculated using the Surface Toolbox in ArcMap 10.5.
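To illustrate what a slope and aspect tool computes from a DTM, the sketch below applies Horn's finite-difference weights to a 3 × 3 elevation window, the weighting that the ArcGIS documentation describes for its Slope tool. It is not the ArcMap implementation; the conventions (northern row first, aspect as the downslope compass bearing clockwise from north, None for a flat cell) and the function name are choices made for this sketch.

```python
import math
from typing import List, Optional, Tuple


def slope_aspect(window: List[List[float]], cell_size: float) -> Tuple[float, Optional[float]]:
    """Slope (degrees) and aspect (degrees clockwise from north) of the centre
    cell of a 3x3 elevation window, using Horn's finite differences.

    window[0] is the northern row and window[r][0] the western column.
    Aspect is None for a flat cell.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    # Rate of change towards the east and towards the south (row order)
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None
    # Downslope direction: east component is -dz_dx, north component is +dz_dy
    aspect = math.degrees(math.atan2(-dz_dx, dz_dy)) % 360
    return slope, aspect


# Plane falling 1 m per 5 m cell towards the east (DTM 4G resolution)
print(slope_aspect([[101, 100, 99]] * 3, 5.0))
```

The east-facing plane gives a slope of atan(0.2), about 11.3°, and an aspect of 90°.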

Climate data (Aver_Temp: mean annual temperature; Annual_Prec: average annual precipitation; both for 2000–2020) were obtained from the WorldClim database (WorldClim.org) at a resolution of 1 km (Fick & Hijmans 2017).

The forest sites were characterized by land cover categories (Deciduous, Mixed and Coniferous forest) obtained from the CORINE Land Cover 2018 database (EEA 2018) at a resolution of 100 m.

Other forest site characteristics (FVZ: forest vegetation zone; ES: ecological series; EC: edaphic category) were obtained from the forest typology map at a scale of 1:10,000 (ÚHÚL 2020a). This map is derived from the “Typological System of the Forest Management Institute”, which is based on ecological site factors (Viewegh et al. 2003).

Structured stand type (SST), which provides information about groups of tree species and the nature of their mixing in the stand, was obtained from the map of structured stand types at a scale of 1:10,000 (ÚHÚL 2020b). For example, the code D1P3P7 indicates that the stand consists of Norway spruce (Picea abies) (Group 1), Scots pine (Pinus sylvestris) (Group 3) and European ash (Fraxinus excelsior) and/or narrow-leaved ash (Fraxinus angustifolia) (Group 7). Norway spruce has a share of 70–89.9% (D), while Scots pine and ash each have a share of 10–29.9% (P). The construction of SST codes is described in Tables 3 and 4.
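A small sketch for splitting SST codes into their components, assuming every element is one upper-case mixing letter followed by a tree-group number, as in the D1P3P7 example. Only the D and P share ranges quoted above are encoded; the remaining letters are defined in Table 4, which is not reproduced here. The names `parse_sst` and `KNOWN_SHARES` are hypothetical.

```python
import re
from typing import List, Tuple

# Share ranges stated in the text; other mixing letters are listed in Table 4
KNOWN_SHARES = {"D": "70-89.9 %", "P": "10-29.9 %"}

# Assumed element pattern: one upper-case letter, then the tree-group number
TOKEN = re.compile(r"([A-Z])(\d+)")


def parse_sst(code: str) -> List[Tuple[str, int]]:
    """Split an SST code such as 'D1P3P7' into (mixing letter, tree group) pairs."""
    pairs = TOKEN.findall(code)
    # Reject codes containing anything the pattern did not consume
    if "".join(letter + group for letter, group in pairs) != code:
        raise ValueError(f"unrecognised SST code: {code!r}")
    return [(letter, int(group)) for letter, group in pairs]


for letter, group in parse_sst("D1P3P7"):
    print(f"group {group}: {KNOWN_SHARES.get(letter, 'see Table 4')}")
```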

Table 3 Structured stand type (SST)—tree groups
Table 4 Structured stand type (SST)—nature of mixing

The last forest site characteristic is grouped soil class (GSC), extracted from the Czech soil information system PUGIS at the scale 1:250,000 (Kozák et al. 1996).

The results of the chemical analyses were recorded using the following variables: soil pH (pH_H2O, pH_exch), exchangeable calcium (Ca_exch), magnesium (Mg_exch) and potassium (K_exch), available phosphorus (P_pa), total calcium (Ca_tot), magnesium (Mg_tot), potassium (K_tot) and phosphorus (P_tot), total carbon (C_tot) and nitrogen (N_tot), cation exchange capacity (CEC) and base saturation (BS). Each of these variables has an associated method variable (_met) that records the method of determination. The methods used in the different source databases are listed in Table 5 in the Appendix.
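When working only with data determined by identical methods, it can help to group each variable by its method variable. The sketch below assumes that the method column name is formed by appending `_met` to the variable name (e.g. `Ca_exch_met`); the method labels in the example rows are hypothetical placeholders, not codes from Table 5.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def values_by_method(records: Iterable[Dict], variable: str) -> Dict[str, List[float]]:
    """Group the values of one variable by its '<variable>_met' column,
    skipping records where the value is missing."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for r in records:
        value = r.get(variable)
        if value is None:
            continue
        groups[r.get(variable + "_met", "unknown")].append(value)
    return dict(groups)


# Hypothetical rows; 'method_A' and 'method_B' stand in for real method codes
rows = [
    {"Ca_exch": 1.2, "Ca_exch_met": "method_A"},
    {"Ca_exch": 0.8, "Ca_exch_met": "method_B"},
    {"Ca_exch": None, "Ca_exch_met": "method_A"},
    {"Ca_exch": 2.0, "Ca_exch_met": "method_A"},
]
print(values_by_method(rows, "Ca_exch"))
```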

4 Technical validation

Basic data analysis identified the following sources of error:

1) Errors in sampling:
- Incorrect determination of the boundary between the overlying organic horizon and the organo-mineral/mineral soil layer.
- Missing or overlapping sampling depths in the case of sampling by genetic horizons.
- Confusion of samples during collection or transport.

2) Errors in laboratory processing:
- Insufficient homogenization of the sample before analysis.
- Errors in compiling data into output reports (e.g., confusion of columns).
- Errors in the manual transcription of measurement results for older samples.

3) Errors in database compilation:
- Various combinations of formal errors, in particular copying data into the wrong fields or entering incorrect units in databases combining different data sources.

To verify the accuracy of the data, minimum and maximum thresholds for individual parameters were tested, as well as limits based on the relationships between selected parameters, e.g., pH(KCl):pH(H2O) (Fig. 1), C_tot:N_tot, pH(H2O):Ca_exch, Ca_exch:Ca_tot and P_pa:pH(H2O) (Fig. 2) (Šrámek et al. 2020). These relationships were derived from the results of the BioSoil project, in which particular attention was paid to quality assurance and quality control of sampling and analysis.

Fig. 1

Check of data quality: relation of active pH(H2O) and exchangeable pH(KCl) soil reaction in the upper mineral soil 0–30 cm. Wrong values (errors) originating most probably from incorrect pH calculation (pH(H2O) ≠ pH(KCl) + 0.5) are marked by a red ellipse (Šrámek et al. 2020)

Fig. 2

Check of data quality: relation between pH(H2O) and available phosphorus in the upper mineral soil layer 0–30 cm (Šrámek et al. 2020)

Most of the limit values were set at two levels. The first level, ‘warning’, flagged suspicious values, which the data provider was asked to check; depending on the result of the check, the value was kept in the database, corrected or removed. The second level, ‘error’, flagged strongly suspicious values, which were removed from the database unless the data provider corrected them.
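The two-level check can be sketched as follows. The limit values are hypothetical placeholders (the real control values are given per layer in Tables 6, 7 and 8), the C:N ratio stands in for the relationship-based limits, and C_tot and N_tot are assumed to be in the same units. The sketch only classifies values; the follow-up (query to the provider or removal) stays a manual step.

```python
from typing import Dict, Optional, Tuple

# HYPOTHETICAL limits for illustration only:
# variable -> ((warning_min, warning_max), (error_min, error_max))
LIMITS: Dict[str, Tuple[Tuple[float, float], Tuple[float, float]]] = {
    "pH_H2O": ((3.0, 7.5), (2.5, 8.5)),
    "C_N": ((10.0, 40.0), (5.0, 60.0)),
}


def classify(variable: str, value: float) -> str:
    """Return 'ok', 'warning' or 'error' for a value against its two-level limits."""
    (warn_lo, warn_hi), (err_lo, err_hi) = LIMITS[variable]
    if value < err_lo or value > err_hi:
        return "error"    # strongly suspicious: removed unless corrected
    if value < warn_lo or value > warn_hi:
        return "warning"  # suspicious: returned to the data provider to check
    return "ok"


def check_record(record: Dict[str, Optional[float]]) -> Dict[str, str]:
    """Check a single value (pH_H2O) and a ratio-based limit (C_tot:N_tot)."""
    result = {}
    if record.get("pH_H2O") is not None:
        result["pH_H2O"] = classify("pH_H2O", record["pH_H2O"])
    c, n = record.get("C_tot"), record.get("N_tot")
    if c is not None and n:  # skip missing or zero nitrogen
        result["C_N"] = classify("C_N", c / n)
    return result


print(check_record({"pH_H2O": 4.1, "C_tot": 250.0, "N_tot": 10.0}))
print(check_record({"pH_H2O": 9.0, "C_tot": 450.0, "N_tot": 10.0}))
```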

The limit (control) values for the upper organic soil horizon (FH), the upper mineral soil layer (M03) and the deeper mineral soil layer (M38) are given in Tables 6, 7 and 8 in the Appendix.

5 Reuse potential and limits

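The dataset is unique in bringing together validated and harmonized soil chemistry data from four source databases, complemented by environmental and geographical site characteristics, for the forests of the Czech Republic.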

The data have already been used in previous studies assessing forest soil quality with respect to nutrition and the adaptation of forest stands (Vašát et al. 2021a, b; Komprdová et al. 2021; Borůvka et al. 2020a, b; Novotný et al. 2020). More recently, the data have been used by different partners in four research projects focusing on soil carbon sequestration, the quality of NATURA 2000 sites, landscape quality and biodiversity. The dataset is also envisaged as a baseline for the next forest soil survey in the Czech Republic, particularly as a tool for evaluating changes and trends in soil chemistry. Publishing the data in open access will also increase their potential use in broader international studies.

The main limitation of the database is its restricted set of parameters, which covers only basic chemical properties. The next challenge for the authors will be to complete the dataset with reliable data on risk elements (e.g., cadmium, lead) and on physical soil properties, if such data become available in sufficient quality.