1 Introduction

A sound mechanical characterisation of both intact rock and discontinuities underpins any reliable model of rock mass behaviour and the rock engineering designs derived from it. Although the discontinuous nature of rock masses has clearly marked the field of rock engineering since the 1970s, the study of the strength, deformability and post-failure behaviour of intact rock has also been a main concern since the early days of rock mechanics.

The predominant presence of compressive stresses in many rock mechanics applications, the relative simplicity of this form of loading in laboratory testing, and the fact that rock compressive strength is one of the main inputs in widely used engineering rock mass classification systems, have probably tipped the balance towards the development of many studies focussed on compression over the 1960s (Cook 1963; Coates 1964; Cook and Hodgson 1965; Deere and Miller 1966).

Nevertheless, tensile strength (read uniaxial tensile strength) has been identified as a key property for a correct design of underground openings in rock masses, especially when determining their critical span (Diederichs and Kaiser 1999) and in the excavation-induced tensile-failure of surrounding rock in underground works subjected to high in situ stresses (Vazaios et al. 2019; Liang et al. 2020). It also controls the mechanism of flexural toppling in rock slopes (Adhikary et al. 1997; Alzo’ubi et al. 2010) and represents a relevant part of most strength envelopes (Hoek 2023). Tensile strength is also of great relevance in the design of foundations in rock masses, according to earthquake-resistance standards (Hashiba et al. 2017).

Despite this, industry still tends to estimate tensile strength (Cai 2009; Hoek and Martin 2014; Perras and Diederichs 2014) rather than measure it. This is probably due to the existence of relationships between important compressive-test-related parameters (e.g. crack initiation) and tensile strength. Perras and Diederichs (2014) stressed, in accordance with earlier authors (Hoek 1964; Hawkes et al. 1973; Ramana and Sarma 1987), that these relationships should be used only as first-pass estimates in preliminary design, and strongly recommended that tensile strength be determined from uniaxial tensile tests carried out in a laboratory.

However, a common practice in rock engineering projects all over the world is to indirectly estimate the uniaxial tensile strength by means of the Brazilian tests, since they are somewhat cheaper, simpler to conduct and easily accessible. The obtained results and failure patterns can notably differ from those of the direct tensile strength (DTS) tests as shown by multiple studies (Wijk et al. 1978; Gorski et al. 2007; Liu et al. 2014; Perras and Diederichs 2014).

The determination of rock tensile strength by direct tensile tests in the laboratory has been somewhat overlooked in the rock engineering field, and there still seems to be controversy in the scientific community about the most rigorous method for this purpose. The difficulty in ensuring purely axial loading (without applying moment or torsion) and certain complexities involving sample preparation could be behind this trend. A variety of alternative methods for rock tensile strength determination have been proposed in the literature (Wuerker 1955; Brace 1964; Hoek 1964; Hawkes and Mellor 1970; Schock and Louis 1982; Xu et al. 1988; Gorski and Yu 1996; Fuenkajorn and Klanphumeesri 2010; Liu et al. 2021). The deformability of rocks under tensile loads has received even less attention, with only a few studies conducted on this topic so far (Schock and Louis 1982; Liao et al. 1997; Gercek 2007). The deformability in tension is often assumed to be similar to that under compressive loads, despite studies indicating that Young’s modulus in tension is generally lower than in compression for a given rock (Schock and Louis 1982; Fuenkajorn and Klanphumeesri 2010; Muñiz-Menéndez and Pérez-Rey 2023).

The ASTM first published a standardised methodology for direct tensile strength tests on intact rock core specimens (ASTM 1975), which was recently updated (ASTM 2020). This was followed by the publication of the ISRM ‘Suggested methods for determining tensile strength of rock materials’ (ISRM 1978), which included a specific section on direct tensile strength tests on cylindrical rock cores alongside the Brazilian test, which yields an indirect tensile strength also called splitting strength. Both the ASTM and ISRM methodologies for the DTS test use cylindrical rock cores, butt-jointed to a pair of metal caps that are connected to the pulling system via a linkage system ensuring a pure tensile load without inducing bending or torsional stresses. This method requires neither geometric modification of the cylindrical rock core, other than adapting its length, nor the use of grips or holders to join the rock core to the pulling system. These features had also been noted by earlier researchers (Hawkes and Mellor 1970) and seem useful and appropriate from an engineering perspective. Neither of these two methods provides guidelines for determining the deformability of rock materials under tensile loads. ASTM (2020) also requires failure to occur away from the loading platens for the test to be valid.

Considering the studies and observations reviewed above, the overall objective of this work was to assess whether consistent and reproducible tensile strength and associated deformability results could be obtained by combining the current ISRM methodology for DTS tests (for the setup and execution of the test itself) with that for Uniaxial Compressive Strength (UCS) tests (for the deformation measurement) (Bieniawski and Bernede 1979a, b).

To achieve this goal, a benchmark experiment was designed, where four laboratories carried out a series of DTS tests on two rock types with different lithologies (granite and sandstone), including deformability measurements, under certain controlled conditions. One of the main objectives of this experiment was to identify features that affected the results and the potential issues to be addressed leading to a rigorous update of the current methodologies.

2 Materials and Methods

2.1 Rocks

Two rocks were selected for this experiment, namely a granite from the Northwest of Spain commercially named as Blanco Mera granite, and a sandstone known as Cotta sandstone, quarried near the city of Dresden (Saxony, Germany). Photos of these rocks, with associated photomicrographs of thin sections and tensile-failure surfaces are shown in Fig. 1.

Fig. 1 a Blanco Mera granite specimen; b corresponding photomicrograph of a thin section and c detailed tensile-failure surface photograph; d Cotta sandstone specimen; e corresponding photomicrograph of a thin section and f detailed tensile-failure surface photograph

The former rock corresponds to a white-coloured granite, with a medium-to-coarse-grained texture with grain sizes ranging from 1 to 6 mm, and a mean bulk density of 2.59 g/cm3. In terms of UCS and elasticity parameters, its average UCS is 110 MPa, Ec = 52.4 GPa, and νc = 0.28, measured on 54-mm diameter specimens (Alejano et al. 2021).

The second rock type is a Cretaceous fine-grained, siliceous quartz arenite (> 90% quartz) with K-feldspar, kaolinite, illite and glauconite as accessory constituents. Cotta sandstone is typically grey to yellowish-brown in colour, showing clay-bearing, organic and ferritic flakes parallel to bedding. It has a mean bulk density of 2.06 g/cm3 (Baumgarten 2015). Regarding the strength and elasticity parameters, when loaded in compression perpendicular to the bedding (transversely isotropic material), this rock presents UCS90 = 31.2 MPa, Ec90 = 8.82 GPa and νc90 = 0.15; when loaded parallel to the bedding, it shows UCS0 = 26.2 MPa, Ec0 = 10.8 GPa and νc0 = 0.18. For the current experimental programme, all the sandstone specimens were cored with bedding parallel to the loading axis (length), therefore β = 0°.

2.2 Experimental Programme

In this multi-laboratory experiment, the participating institutions—Laboratorio de Geotecnia–CEDEX in Spain (Laboratory A); TU Bergakademie Freiberg in Germany (Laboratory B); the Civil Engineering Department at the Lassonde School of Engineering, York University in Canada (Laboratory C) and RISE, Research Institutes of Sweden (Laboratory D)—were asked to perform the same series of direct tensile strength tests on cylindrical specimens of the two selected rocks. Two of the participating institutions (Laboratorio de Geotecnia–CEDEX in Spain, and TU Bergakademie Freiberg in Germany) prepared 4 sets of 7 cylindrical specimens of a given rock each. Each group of rock specimens to be sent was created by random selection from the total sample set and then delivered to the other laboratories involved in the study. Therefore, each institution had initially available 14 specimens (7 per rock type) to carry out, at least, 5 tests per material received. The length-to-diameter ratio of these specimens was about 2.7, in agreement with the current recommendations made by the ISRM (1978). The approximate specimen dimensions were d = 50 mm and L = 135 mm.
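The random allocation of specimens to laboratories described above can be illustrated with a minimal sketch. The specimen identifiers, the seed and the function name `allocate_specimens` are hypothetical and only serve the illustration; the source does not state how the random selection was actually carried out.

```python
import random


def allocate_specimens(specimen_ids, n_sets=4, set_size=7, seed=0):
    """Randomly split specimen IDs into n_sets groups of set_size each."""
    if len(specimen_ids) != n_sets * set_size:
        raise ValueError("number of specimens must equal n_sets * set_size")
    shuffled = list(specimen_ids)
    # The seed is fixed only so that the illustration is reproducible.
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i * set_size:(i + 1) * set_size] for i in range(n_sets)]


# Hypothetical IDs G01..G28 for the 28 specimens of one rock type.
granite_ids = [f"G{i:02d}" for i in range(1, 29)]
sets = allocate_specimens(granite_ids)
for lab, group in zip("ABCD", sets):
    print(f"Laboratory {lab}: {sorted(group)}")
```

Each laboratory receives one such set per rock type, i.e. 14 specimens in total.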

The following general procedure was adopted by all laboratories:

1. Visually inspect the specimens and, if needed, carry out additional preparatory steps to reshape or grind ends or surfaces damaged during shipping. Measure specimen dimensions and check tolerances as indicated by ISRM (1978),
2. Dry the specimens to constant mass in an oven at 105 °C prior to sample preparation, to avoid any possible influence of water content,
3. Glue end caps onto the ends of the specimens and (if selected as the deformation measurement device) four strain gauges (two axial and two diametral) at the mid-height of the rock specimen, ensuring they are equally spaced and alternate between axial and diametral if stacked gauges are not used,
4. Let the applied resin cure for at least the time recommended by the supplier so that it reaches sufficient strength,
5. Start the tensile test, limiting the time to failure to between 5 and 15 min by keeping the same loading or displacement rate for all tests within the same rock group.

The experiments were conducted in a manner allowing the recording of tensile force, axial and circumferential deformation/displacement throughout the entire duration of the test. All laboratories carried out, at least, 5 tests per rock (Labs A and B performed 7 for each rock). At the end of the experiment, a total of 51 test results were collected, with 49 being considered valid. A valid test was determined to be one where the failure crack developed within the rock material only (i.e. not intersecting the platen surface) and both (at least) load and strain/displacement data were recorded throughout the test (i.e. no failed strain gauges during testing).

The elasticity parameters (Et, νt) were determined from average axial and diametral stress–strain curves as derived from the tests. The applied procedure is outlined as follows:

1. Determine the direct tensile strength (DTS) for a given test by dividing the maximum tensile load by the initial cross-sectional area (A0),
2. Calculate 50% of the DTS,
3. Compute the average of the axial strain/displacement recordings from the two curves obtained, if multiple gauges have been used,
4. Compute the average of the diametral strain/displacement recordings from the two curves obtained, if multiple gauges have been used,
5. For each of the curves obtained in steps 3 and 4, select the point corresponding to 50% of the DTS, then select the group of points within the range of 50% DTS ± 5% and fit a regression line to them,
6. Compute the Young’s modulus (Et) for a given test as the slope of the line fitted in step 5 to the axial stress–strain curve. Note that, where unload–reload cycles were performed, the stress at which the measurement was taken was intentionally shifted slightly outside these cycles to mitigate their influence on the deformability,
7. Compute the Poisson’s ratio (νt) for a given test as the negative ratio of the slopes of the lines (radial to axial) fitted in step 5.
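The procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the evaluation code used by the laboratories: the window "50% DTS ± 5%" is read here as 45% to 55% of the DTS, strains are dimensionless with extension positive, only the loading branch up to the peak is considered, and νt is evaluated as the negative ratio of diametral to axial strain increments (equivalently, the negative ratio of the axial to the diametral stress–strain slope). All function and variable names are hypothetical.

```python
import math


def fit_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def tensile_parameters(load_kN, axial_gauges, diametral_gauges,
                       diameter_mm, window=0.05):
    """Return (DTS in MPa, Et in GPa, nu_t) from one DTS test record."""
    area_mm2 = math.pi * diameter_mm ** 2 / 4.0           # initial area A0
    stress = [1000.0 * f / area_mm2 for f in load_kN]     # N/mm2 = MPa
    dts = max(stress)                                     # step 1
    # Steps 3 and 4: average the gauges in each direction, sample by sample.
    eps_a = [sum(v) / len(v) for v in zip(*axial_gauges)]
    eps_d = [sum(v) / len(v) for v in zip(*diametral_gauges)]
    # Steps 2 and 5: points around 50% of the DTS (assumed 45%..55%),
    # restricted to the loading branch before the peak.
    lo, hi = (0.5 - window) * dts, (0.5 + window) * dts
    peak = stress.index(dts)
    idx = [i for i in range(peak + 1) if lo <= stress[i] <= hi]
    if len(idx) < 2:
        raise ValueError("not enough points in the fitting window")
    s = [stress[i] for i in idx]
    slope_axial = fit_slope([eps_a[i] for i in idx], s)   # MPa per unit strain
    slope_diam = fit_slope([eps_d[i] for i in idx], s)    # negative in tension
    e_t_GPa = slope_axial / 1000.0                        # step 6
    nu_t = -slope_axial / slope_diam                      # step 7
    return dts, e_t_GPa, nu_t


# Synthetic, perfectly linear record (E = 20 GPa, nu = 0.10), for checking only.
loads = [0.1 * k for k in range(101)]                     # 0 to 10 kN
area = math.pi * 50.0 ** 2 / 4.0
eps = [1000.0 * f / area / 20000.0 for f in loads]
axial = [[1.05 * e for e in eps], [0.95 * e for e in eps]]
diam = [[-0.10 * e for e in eps], [-0.10 * e for e in eps]]
dts, e_t, nu_t = tensile_parameters(loads, axial, diam, diameter_mm=50.0)
print(f"DTS = {dts:.2f} MPa, Et = {e_t:.2f} GPa, nu_t = {nu_t:.3f}")
```

With real records, the choice of regressing stress on strain (rather than strain on stress) slightly affects the fitted slopes when the data are noisy, so the convention should be kept the same for all tests being compared.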

2.3 DTS Test Machines

Four machines were employed in the experiment, as shown in Fig. 2. The one available at Lab A was a servo-hydraulic 500-kN testing apparatus designed and developed by Mecánica Científica (Spain). The apparatus allows the application of either compressive or tensile loads by means of a double-action loading piston. In tensile mode, two metal caps able to hold cylindrical specimens are connected to the system through two roller chains arranged perpendicular to one another (Fig. 2a). The loading rate was set to 0.02 kN/s for granite specimens and to 0.01 kN/s for sandstone specimens.
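As a rough plausibility check, the load-controlled rates quoted for Lab A can be converted into an expected time to failure. The sketch below assumes the nominal specimen diameter of 50 mm and uses the overall mean DTS values reported in Sect. 4.1 (5.64 MPa for granite and 1.70 MPa for sandstone) rather than Lab A's own results, so the outcome is indicative only. The function name is hypothetical.

```python
import math


def time_to_failure_min(strength_MPa, diameter_mm, load_rate_kN_per_s):
    """Expected test duration for a constant loading rate, in minutes."""
    area_mm2 = math.pi * diameter_mm ** 2 / 4.0
    failure_load_kN = strength_MPa * area_mm2 / 1000.0   # MPa * mm2 = N
    return failure_load_kN / load_rate_kN_per_s / 60.0


ttf_granite = time_to_failure_min(5.64, 50.0, 0.02)
ttf_sandstone = time_to_failure_min(1.70, 50.0, 0.01)
print(f"granite:   {ttf_granite:.1f} min")
print(f"sandstone: {ttf_sandstone:.1f} min")
```

Under these assumptions the estimates are about 9.2 min for granite and 5.6 min for sandstone, both inside the 5 to 15 min window of the general procedure.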

Fig. 2 Machines used for DTS tests in this experiment: a Lab A; b Lab B; c Lab C and d Lab D

Lab B had available a multipurpose testing machine TIRAtest28500 with a capacity of 500 kN for compressive loading, and 100 kN for tensile loading. The cylindrical metal caps where the cylindrical rock specimens are placed, can be connected to the machine’s crosshead via threaded rods with ball joints (Fig. 2b). Lab B performed the DTS tests with displacement control at a rate of 0.05 mm/min for both rocks.

In the case of Lab C, an MTS Criterion Model 64.605 Static-Hydraulic Universal Testing System with a 600 kN rated force capacity was used. The rods for holding the rock specimens were directly gripped by the testing machine (Fig. 2c) without any flexible connexion, with attention given to ensuring that the end faces of the platens and specimens were planar and square with the loading direction. Lab C also performed displacement-controlled tests, with the sandstone specimens being loaded at 0.02 mm/min and the granite specimens being loaded at 0.05 mm/min.

The test equipment used by Lab D was an electromechanical load frame, MTS Sintech 20/D, with a capacity of 100 kN. In this case, the load is also applied using two metallic roller chains connecting the hydraulic grips and the end caps attached to the specimen. The chains were rotated 90° to each other to exert a pure tensile force, reducing any undesired moment in the specimen. Figure 2d shows the load setup. The lower roller chain is concealed by the black rectangular part, which catches the lower part of the specimen when it breaks and prevents the specimen and equipment from being damaged. Displacement-controlled tests were also performed in Lab D, with an initial displacement rate allowing a time to failure (TTF) of 5–15 min.

2.4 Bonding Materials and Deformation Measuring Devices

The direct tensile strength tests performed in these experiments required bonding the upper and lower parts of each specimen onto metal caps, through which the uniaxial tensile load was transferred to the rock. Each laboratory selected their own glueing/cementing material. In a similar manner, the selection of axial and radial strain/displacement measuring devices (displacement transducers, strain gauges…) was done independently by each participating institution. Detailed information about these aspects is presented in Table 1.

Table 1 Information regarding the displacement/deformation measuring devices and bonding materials used in this benchmark experiment

3 Test Results

3.1 Raw Results

All the rock specimens tested in this experiment are presented, after failure, in Fig. 3. In general terms, tensile-failure occurred within the rock material in all samples except for two cases corresponding to granite specimens 42 and 62 (Fig. 3e). The latter were considered invalid tests since the tensile crack intersected the platen rather than passing entirely through the specimen. It has also been noticed that, in several cases, the fractures occurred closer to one of the caps and not at the middle point of the specimen. As long as the fracture location is not influenced by the caps, the test result is valid. It has also been observed that the failure planes in the valid tests were equally distributed along the specimen length.

Fig. 3 Rock specimens after failure (Blanco Mera granite on the left side, and Cotta sandstone on the right side): a, b specimens tested at Lab. A; c, d specimens tested at Lab. B; e, f specimens tested at Lab. C and g, h specimens tested at Lab. D

From the 49 remaining valid laboratory DTS tests carried out, three parameters were extracted, namely: the ultimate tensile strength (DTS), the Young’s modulus (Et) and the Poisson’s ratio (νt) in tension, in the way explained in Sect. 2.2. The single results for the two rocks studied are presented in Table 9 (Blanco Mera granite) and Table 10 (Cotta sandstone) in the Appendix.

Stress–strain curves from the experiments for all tested specimens and all four participant laboratories are presented in Fig. 4 (Blanco Mera granite) and in Fig. 5 (Cotta sandstone). The average strain of the two strain gauges in each direction (axial and circumferential) is shown in the Figures. The strains were zeroed after the specimens were mounted in the testing machine and with a small applied pre-stress. In each of the graphs, there is one curve plotted with a thicker line, considered representative of the tensile stress–strain behaviour of the studied rocks as observed in the laboratories.

Fig. 4 Stress–strain curves from DTS tests carried out in Blanco Mera granite: a Lab. A; b Lab. B; c Lab. C, and d Lab. D (the specimen corresponding to the highlighted curve is indicated in brackets in the legend, with ‘G’ denoting granite)

Fig. 5 Stress–strain curves from DTS tests carried out in Cotta sandstone: a Lab. A; b Lab. B; c Lab. C, and d Lab. D (the specimen corresponding to the highlighted curve is indicated in brackets in the legend, with ‘S’ denoting sandstone)

3.2 Preliminary Observations

The stress–strain results differ between the four laboratories. The results from Laboratory A (Figs. 4a and 5a) show a notable variation between the tested specimens. Some of its strain records also point to a measurement problem, since they do not follow a smooth curve in the elastic region but rather a physically implausible response. The results from Laboratory B (Figs. 4b and 5b) show a small variation between the tested specimens; the loading and unloading cycles at low stress induce some strain ratchetting. The results from Laboratory C (Figs. 4c and 5c) show a variation between the tested specimens, notably a large variation in the strength values. It should be kept in mind that significant deformations and strains may have been induced in the specimens when the grips were closed in the machine, owing to the specimen holder design with a rigid linkage system. The initial strain state in the specimens at the beginning of the test is therefore unknown, and the stress–strain curves in Figs. 4c and 5c only show the relative change of the strains during the tests. The results from Laboratory D (Figs. 4d and 5d) show a small variation between the tested specimens.

4 Statistical Analysis

4.1 General Results

The main datasets, including all DTS (Fig. 6a), Et (Fig. 6b) and νt (Fig. 6c) values for each studied rock, are represented as boxplots. These provide a summarised overview of the distribution of each dataset, including the median (horizontal black line), the range expressed through the whiskers (corresponding to 1.5 × IQR, the interquartile range) and the values lying outside this range (indicated by black crosses).

Fig. 6 Boxplot representations of the general datasets for DTS (a), Et (b) and νt (c). Blanco Mera granite is presented in blue colour, and Cotta sandstone in grey
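The quantities drawn in such a boxplot can be computed as in the following sketch. It assumes the common convention in which whiskers extend to the most extreme data points lying within 1.5 × IQR of the quartiles, and it uses the "inclusive" quartile definition of Python's `statistics.quantiles`; plotting packages differ in their quartile conventions, so the values may not match a given figure exactly. The demonstration data are arbitrary numbers, not results from this study.

```python
import statistics


def boxplot_summary(values, k=1.5):
    """Median, quartiles, whisker ends and outliers for one dataset."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest value within the fences
        "whisker_high": max(inside),   # highest value within the fences
        "outliers": outliers,          # plotted as individual markers
    }


demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # arbitrary illustration data
summary = boxplot_summary(demo)
print(summary)
```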

The mean DTS value for Blanco Mera granite is 5.64 MPa (Fig. 6a), slightly lower than the median value (5.99 MPa). The standard deviation of this dataset is 1.22 MPa, and the standard error (SE) is 0.25 MPa. For Cotta sandstone, a mean DTS value of 1.70 MPa and a median of 1.91 MPa were obtained. The standard deviation of this dataset is 0.52 MPa, and the SE is 0.10 MPa. In terms of variability, the Blanco Mera granite results are more widely spread and negatively skewed, as a consequence of some relatively low values collected from Lab C. The Cotta sandstone dataset is more or less symmetric, even though more outlying values are observed.

In the case of the Young’s moduli Et (Fig. 6b) for Blanco Mera granite, a mean value of 23.36 GPa was calculated, almost identical to the median value (23.43 GPa). The standard deviation of this dataset is 6.59 GPa, and the standard error equals 1.34 GPa. Regarding the Cotta sandstone dataset, an average Et value of 7.95 GPa and a slightly lower median value (7.72 GPa) were obtained. The standard deviation and standard error are lower than those of the Blanco Mera granite, being 2.55 GPa and 0.51 GPa, respectively. Both datasets are more or less symmetric, even though the results corresponding to the Blanco Mera granite are more dispersed.

The Poisson’s ratio results were also analysed. The mean value for the Blanco Mera granite dataset was 0.11 and the median value 0.10. The standard deviation of this dataset is 0.05, and the standard error equals 0.01. The Cotta sandstone dataset shows a mean value of 0.15 and a median of 0.13. The standard deviation is 0.08, and the standard error is 0.02. In this case, both datasets are somewhat right-skewed, with the Cotta sandstone results being more dispersed.
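The reported standard errors can be cross-checked against the standard deviations through SE = SD/√n. The sample sizes used below (24 valid granite tests and 25 valid sandstone tests) are an assumption inferred from the total of 49 valid tests and the two invalid granite tests; they are not stated explicitly per rock in the text. Only DTS and Et are checked, since the number of valid νt values is smaller.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean from a sample standard deviation."""
    return sd / math.sqrt(n)


# (reported SD, reported SE, assumed number of valid tests)
reported = {
    "granite DTS (MPa)":   (1.22, 0.25, 24),
    "granite Et (GPa)":    (6.59, 1.34, 24),
    "sandstone DTS (MPa)": (0.52, 0.10, 25),
    "sandstone Et (GPa)":  (2.55, 0.51, 25),
}

recomputed = {}
for name, (sd, se_reported, n) in reported.items():
    recomputed[name] = standard_error(sd, n)
    print(f"{name}: reported SE = {se_reported:.2f}, "
          f"SD/sqrt(n) = {recomputed[name]:.3f}")
```

Under the assumed sample sizes, the recomputed values agree with the reported ones to within rounding of the second decimal.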

Broadly speaking, the experimental results showed that Blanco Mera granite presents higher DTS values than the Cotta sandstone. More particularly, when mean DTS results are compared with their reported UCS values, the UCS/DTS and UCS0/DTS ratios for Blanco Mera granite and Cotta sandstone are 19.7 and 15.4, respectively.

Relevant differences between the elasticity parameters of the two rocks have been observed, with the Cotta sandstone being much more deformable than Blanco Mera granite under tensile loads. The deformability behaviour was shown to differ between compressive and tensile loading, as evidenced by Ec/Et and Ec0/Et ratios for the mean values of the Young’s moduli of 2.24 and 1.36 for Blanco Mera granite and Cotta sandstone, respectively.

The Poisson’s ratios in tension are approximately similar for the two rocks, even though particularly skewed datasets have been observed. The ratio νc/νt equals 2.55 for Blanco Mera granite, whereas for Cotta sandstone νc0/νt = 1.2.
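The compression-to-tension ratios quoted above follow directly from the mean values given in Sects. 2.1 and 4.1, as the short sketch below shows. Because the inputs are rounded means, the recomputed figures may differ slightly from the reported ones (for example, the granite UCS/DTS ratio evaluates to about 19.5 from these inputs rather than 19.7). The dictionary keys are labels chosen for this illustration.

```python
# Mean values as quoted in the text (compression: Sect. 2.1; tension: Sect. 4.1).
granite = {"UCS": 110.0, "Ec": 52.4, "nu_c": 0.28,
           "DTS": 5.64, "Et": 23.36, "nu_t": 0.11}
sandstone = {"UCS": 26.2, "Ec": 10.8, "nu_c": 0.18,   # loaded parallel to bedding
             "DTS": 1.70, "Et": 7.95, "nu_t": 0.15}


def compression_to_tension(rock):
    """Ratios of compressive to tensile strength, modulus and Poisson's ratio."""
    return {
        "strength": rock["UCS"] / rock["DTS"],
        "modulus": rock["Ec"] / rock["Et"],
        "poisson": rock["nu_c"] / rock["nu_t"],
    }


ratios_granite = compression_to_tension(granite)
ratios_sandstone = compression_to_tension(sandstone)
for name, r in (("granite", ratios_granite), ("sandstone", ratios_sandstone)):
    print(name, {k: round(v, 2) for k, v in r.items()})
```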

4.2 Comparison of Results per Laboratory

The mean values and the standard error of the mean values of DTS, Et and νt for each studied rock and participant laboratory are presented in Table 2. Overall, the mean DTS values for Blanco Mera granite retrieved from the Labs A, B and D are greater (and more similar to each other) than those obtained from Lab C. The same occurs in general terms for Cotta sandstone results.

Table 2 Mean values and standard error of the mean for the three determined parameters according to each participating laboratory and rock

For the Young’s modulus, Labs A, C, and D are similar and larger than Lab B for the Blanco Mera granite. However, for the Cotta sandstone, Labs A and B have similar values to each other and Labs C and D have similar values to each other, with both groups being outside the range of standard error of the others. With respect to the Poisson’s ratio, two lab groups of values are observed for both rock types, with Labs C and B having the closest values for the Blanco Mera granite and Labs B and D having the same values for the Cotta sandstone. Overall, the largest standard errors of the mean values were observed for results coming from Lab C, reaching values in the range of 2 to 5 times greater than the lowest ones, as derived from the rest of the laboratories.

The differences observed with the DTS values coming from different laboratories are also evident when representing the data with boxplots (Fig. 7).

Fig. 7 DTS results represented in terms of boxplots for Blanco Mera granite (a) and Cotta sandstone (b)

In this case, the average and median DTS values determined at Lab C are notably lower than those of the other laboratories for the Blanco Mera granite, although the variability of the Lab C values is similar to that of Lab A. For Cotta sandstone, Lab C shows the largest variability and, again, average and median values lower than those of the other laboratories.

Regarding the Young’s moduli in tension, represented as boxplots in Fig. 8, the results are generally more similar between laboratories than the DTS values. Wider variability was found for Labs A and C for the Young’s moduli in tension of the Blanco Mera granite, whilst all laboratories reported much narrower ranges of moduli for the Cotta sandstone. In this case, the lowest results were obtained by Lab B (Blanco Mera granite) and Labs A and B (Cotta sandstone).

Fig. 8 Young’s moduli in tension (Et) results, represented in terms of boxplots for Blanco Mera granite (a) and Cotta sandstone (b)

The calculated Poisson’s ratios in tension were, for Blanco Mera granite, similar for the case of Laboratories A, C and D, showing relatively low values with certain variability (Fig. 9).

Fig. 9 Poisson’s ratio in tension (νt) results represented in terms of boxplots for Blanco Mera granite (a) and Cotta sandstone (b)

It has to be noted that only one value could be registered from Lab B, which differs significantly. The results corresponding to Cotta sandstone were, in general, somewhat more variable than those obtained from Blanco Mera granite. Laboratories A and C showed higher, although more variable results, compared to those coming from Laboratories B and D.

4.3 Variability Analysis of the Results

As derived from the boxplot representations shown in Sect. 4.2 (Figs. 7, 8 and 9), significant differences between the datasets of the individual laboratories are visually evident, especially for DTS values. To improve the understanding of the results and their variability, some statistical comparisons were carried out.

Before selecting the most suitable statistical method for comparing the results, it is essential to assess the fundamental characteristics of the datasets, specifically normality and homoscedasticity (equality of variances), as these assessments are necessary for the correct selection of the statistical methods employed.

First, the normality of the datasets (gathering the DTS, Et and νt results according to each rock) was studied. The Shapiro–Wilk test (Shapiro and Wilk 1965) is a well-known, robust method to assess the ‘normality’ of a dataset, in a way that it allows rejecting a null hypothesis—stating the data to be normally distributed—when the p-value obtained is less than the significance level (α), typically 0.05. The test is useful for the selection of appropriate central-tendency estimators, like mean or median. The results obtained from this test when applied to the datasets (DTS, Et and νt) according to each rock type are presented in Table 3. In general terms, the datasets tend to be normal, with some exceptions mainly for Blanco Mera granite. These results are in line with the boxplots presented in Fig. 6.

Table 3 Performance of the Shapiro–Wilk test for normality assessment (p-value indicated in brackets, holding normality for p > 0.05) for each studied dataset, grouped according to the two rock types studied

After the normality analysis, the homoscedasticity amongst the groups was studied by means of Levene’s test (Levene 1960). The results of these tests are presented in Table 4.

Table 4 Performance of the Levene’s test for homoscedasticity (p-value indicated in brackets, holding homoscedasticity for p > 0.05) for each studied dataset, grouped according to the two rock types studied

The most common methods for comparing groups (usually by comparing mean values), such as the Analysis of Variance (ANOVA) (Fisher 1919, 2019), are only suitable for groups with a normal distribution and homoscedasticity. This limitation is more pronounced when, as is the case here, the number of data in each group is small. It was therefore decided to use the Kruskal–Wallis method (Kruskal and Wallis 1952), a robust, non-parametric method for median comparison that does not require the analysed population to be normally distributed and is robust to small sample sizes such as those discussed in this paper. The method was used in all cases, as it is equally valid regardless of whether the normality and homoscedasticity assumptions are satisfied. To strengthen the analysis, another robust method, Wilcox’s method (Wilcox 2013), was also used to make this comparison. This is a variation of Welch's t test (Welch 1947) in which the mean and variance are replaced by a trimmed mean (tr = 0.2 in this case) and a Winsorized variance.
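The two robust estimators mentioned above can be illustrated as follows. This sketch only shows how a 20% trimmed mean and the corresponding Winsorized variance are obtained for a single group; it does not implement the full comparison test, and the function names and demonstration data are hypothetical.

```python
import statistics


def trimmed_mean(values, tr=0.2):
    """Mean after discarding the g lowest and g highest values, g = int(tr * n)."""
    x = sorted(values)
    g = int(tr * len(x))
    return statistics.fmean(x[g:len(x) - g])


def winsorized_variance(values, tr=0.2):
    """Sample variance after replacing the g lowest and g highest values
    by the nearest retained value on each side."""
    x = sorted(values)
    n = len(x)
    g = int(tr * n)
    winsorized = [x[g]] * g + x[g:n - g] + [x[n - g - 1]] * g
    return statistics.variance(winsorized)


demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # arbitrary data with one extreme value
tm = trimmed_mean(demo)
wv = winsorized_variance(demo)
print(f"mean = {statistics.fmean(demo):.2f}, trimmed mean = {tm:.2f}")
print(f"variance = {statistics.variance(demo):.2f}, Winsorized variance = {wv:.2f}")
```

The extreme value dominates the ordinary mean and variance but has no influence on the trimmed and Winsorized estimates, which is the property that makes this family of methods attractive for small groups with occasional outlying results.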

With the above-mentioned statistical tests, the results obtained by the different laboratories can be formally compared with each other. The level of significance used in this analysis is, as above, α = 0.05. The results are presented in Table 5 for the DTS values determined for the two rocks.
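A minimal sketch of the Kruskal–Wallis comparison is given below. It computes the H statistic with the usual tie correction and compares it with the chi-square critical value at α = 0.05; this chi-square approximation is only approximate for groups as small as those considered here. The laboratory values in the example are invented for illustration and are not the results of this study.

```python
# Chi-square critical values at alpha = 0.05 for 1, 2 and 3 degrees of freedom.
CHI2_CRIT_005 = {1: 3.841, 2: 5.991, 3: 7.815}


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}      # value -> average rank among the pooled data
    tie_term = 0      # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    rank_sums_sq = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_sums_sq - 3.0 * (n_total + 1)
    return h / (1.0 - tie_term / (n_total ** 3 - n_total))


# Invented DTS values (MPa) for four laboratories, for illustration only.
labs = {
    "A": [6.1, 6.4, 5.9, 6.6, 6.2],
    "B": [6.0, 6.3, 6.5, 5.8, 6.7],
    "C": [3.9, 4.4, 5.0, 3.5, 4.8],
    "D": [6.2, 6.9, 6.0, 6.4, 6.8],
}
h_demo = kruskal_wallis_h(list(labs.values()))
critical = CHI2_CRIT_005[len(labs) - 1]
verdict = "reject" if h_demo > critical else "do not reject"
print(f"H = {h_demo:.2f}, critical value = {critical}: {verdict} the null hypothesis")
```

Rejecting the null hypothesis indicates that at least one laboratory differs from the others; it does not identify which one, for which pairwise comparisons would be needed.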

The values presented in Table 5 show that, for Blanco Mera granite, the null hypothesis is rejected, so the DTS test results from the four laboratories differ. For Cotta sandstone DTS values, by contrast, the comparison yielded a p-value greater than 0.05, meaning that the results can be considered similar in statistical terms.

Table 5 Performance of comparison tests for DTS results amongst laboratories

The same procedure as for the DTS analyses was adopted for the results corresponding to the Young’s moduli in tension. The assessment of the datasets in terms of similarity of the mean Et values is presented in Table 6. According to Table 6, the null hypothesis is rejected for both the Blanco Mera granite and the Cotta sandstone results, which means that the four laboratories cannot be considered to have determined similar values.

Table 6 Performance of comparison tests for Et results amongst laboratories

The Poisson’s ratios in tension were also assessed through the application of the same tests, and the results of this analysis are presented in Table 7. Note that the comparison involving Laboratory B for Blanco Mera granite has been omitted because only a single valid νt value exists in this dataset. According to the results presented in Table 7, the tests for comparing the Poisson’s ratios in tension show differences between the groups analysed, as in the majority of the cases discussed above.

Table 7 Performance of comparison tests for νt results amongst laboratories

5 Discussion

In the present study, 49 valid DTS tests with deformability measurements were carried out on two rocks (granite and sandstone) by four different laboratories from Europe (Germany, Spain and Sweden) and Canada. Each laboratory used a different testing apparatus and setup.

Tensile stress–strain behaviour and strength obtained from these tests have been investigated, as well as any potential effect of how the tests were conducted in the various laboratories, with different load transferring systems, machines and measuring devices. Considering these scenarios, one of the main sources of variability expected for this experiment is the heterogeneous nature of the rock material. This can lead to a bias in the statistical populations of the specimens sent to the different laboratories, also affecting the variance within the datasets. Other sources are the preparation and delivery of the rock specimens to the laboratories, the production of datasets from laboratory tests and the data evaluation after all the results are gathered. This work focussed on the last two sources of variability. Even though the number of participating laboratories may be limited, the number of specimens tested is statistically appropriate for the observations derived herein.

Two test results were invalid because the failure plane included the interface between the specimen and the end cap. The majority (49 tests) were valid, which supports the feasibility of the DTS test. Failures intersecting the interface between the specimen and the end cap are inevitable with the given design of end caps glued to the specimen.

The failure planes in the valid tests were equally distributed along the specimen length. Failure should be equally probable along the specimen length, since fractures start from intrinsically distributed heterogeneities in the internal structure, pre-existing weak planes and micro-fractures, and drilling-induced damage (microcracks) at the outer surface of the sample. This observation regarding the failure surface position is also supported by other results, as provided by Wijk (1980). It is noted that the ASTM (2020) standard stresses, without further justification, that a valid test in isotropic rock should develop the failure surface at or near the mid-point of the specimen.

The failure mechanism in tension is completely different from that in, e.g. uniaxial compression tests, where ideally axial cracks form due to the radial expansion caused by the Poisson’s effect during compression. Restriction of free expansion due to the interaction between the specimen and the loading platens (friction) is then significant and suppresses failure in the specimen near the loading platens. Invalid tests, as discussed above, could be avoided by a design in which the specimens have a waist (Gorski and Yu 1996), but at the cost of a more difficult specimen preparation step.

Concerning the tensile strength behaviour, a preliminary analysis of the data shows significant statistical differences amongst the DTS results retrieved from the different laboratories. In general terms, the DTS results can be considered acceptably similar for each studied lithology in view of the variance expected from other test results in the literature for the same test suite on the same rock type (see, e.g. the results in Perras and Diederichs (2014)). Nevertheless, the Blanco Mera granite mean results cannot be considered to belong to the same statistical population for the comparisons performed. When the complete dataset is analysed, DTS results tend to group at higher values, with the lower ones sometimes clustered below a statistically representative range of 1.5 × IQR. This reflects possible experimental issues that may prevent repeatable DTS results from being obtained.
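The 1.5 × IQR range mentioned above can be illustrated with a minimal sketch, assuming the usual boxplot convention in which values below Q1 − 1.5 × IQR are flagged as low outliers. The function names, the quartile interpolation method and the DTS values are hypothetical and serve only as an illustration; they are not data or procedures from this study.

```python
import statistics


def lower_fence(values, k=1.5):
    """Return Q1 - k * IQR, the lower boxplot fence (assumed convention)."""
    # 'inclusive' treats the data as the full population of interest;
    # other quartile definitions would shift the fence slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1 - k * (q3 - q1)


def low_outliers(values, k=1.5):
    """Return the values lying below the lower fence."""
    fence = lower_fence(values, k)
    return [v for v in values if v < fence]


# Hypothetical DTS results in MPa (illustrative only).
dts = [7.9, 8.1, 8.3, 8.4, 8.6, 8.8, 4.2]
print("lower fence:", lower_fence(dts))
print("flagged as abnormally low:", low_outliers(dts))
```

In this illustration only the single low value is flagged, which mirrors the situation described above, where a few low results sit apart from a main group of higher values.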

Particularly low DTS results have been found for the dataset retrieved from Laboratory C, especially for Blanco Mera granite. This result can be associated with the use of a rigid system for transmitting the tensile load to the specimen. The observation confirms the need for flexible systems for the rock specimen–machine linkage to obtain a purely axial load on the specimen, without undesired bending or torsional stresses, as outlined in the ASTM and ISRM methods. Even though the samples were prepared plane-parallel by Laboratory C, a number of factors that are difficult to avoid can still induce undesired bending or torsional stresses, e.g. imprecise alignment of the testing machine, misalignment of the rods on the metal caps attached to the specimen and disturbance when the grips are closed.

The variation in the DTS results on the granite specimens from Laboratory A is significantly larger than for Laboratories B and D, and the mean value is lower. Looking at the stress–strain results (Fig. 4a), the specimen with the lowest strength displays twice the axial strain of the selected representative result (G-93) and an even higher deviation for the circumferential strain. This indicates that the loading was not purely axial, which may be the reason for the reduced failure stress.

In the case of Cotta sandstone, the results were statistically more similar than those for Blanco Mera granite, even though they are still not comparable according to the general Welch’s test analyses.
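For readers unfamiliar with Welch’s test, the following minimal sketch shows how the test statistic and the Welch–Satterthwaite degrees of freedom are obtained for two groups with unequal variances. It assumes a pairwise two-sample comparison; the function name and the two datasets are hypothetical and do not reproduce the analyses or data of this study.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each group mean (sample variance / n).
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical DTS results in MPa from two laboratories (illustrative only).
lab_x = [8.0, 8.2, 8.4, 8.6, 8.8]
lab_y = [6.0, 6.5, 7.0, 7.5, 8.0]
t_stat, dof = welch_t(lab_x, lab_y)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

The p-value then follows from Student’s t distribution with the computed degrees of freedom; a statistics library can supply it, for example `scipy.stats.ttest_ind(lab_x, lab_y, equal_var=False)`.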

Considering the recommendation of testing at least five specimens (n ≥ 5) per rock sample in the current ISRM (1978) methodology, the use of the Median as an estimator seems appropriate, such that the DTS can be determined as the Median of these five tests. For test suites with a larger number of samples (n ≫ 5), a cut-off criterion to discard abnormally low values (mainly associated with experimental defects) can be applied. This would imply using a robust central-tendency estimator of the dataset (such as the Median) and the standard deviation (SD). Let X = {x1, x2, …, xn} be the dataset containing n DTS results. The proposed subset XT to be selected for calculations should fulfil Eq. 1.

$${X}_{T}=\left\{{x}_{i}\in X:{x}_{i}\ge {\text{Median}}\left(X\right)-{\text{SD}}(X)\right\}$$
(1)

where XT = subset after removing abnormally low values; X = original dataset; xi = i-th DTS result; SD = standard deviation of the dataset.

After this correction, the Median(XT) can be suggested as a good estimator of the DTS value for a given rock sample with several laboratory results.
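The cut-off criterion of Eq. 1 and the subsequent Median estimate can be sketched as follows. This is a minimal illustration under stated assumptions: the sample standard deviation (n − 1 denominator) is assumed, since the text does not specify which form of SD is used, and the function name and DTS values are hypothetical rather than taken from this study.

```python
import statistics


def trim_low(values):
    """Keep x_i >= Median(X) - SD(X), following Eq. 1."""
    # Assumption: SD is the sample standard deviation (n - 1 denominator).
    cutoff = statistics.median(values) - statistics.stdev(values)
    return [v for v in values if v >= cutoff]


# Hypothetical DTS results in MPa (illustrative only).
dts = [4.2, 7.9, 8.1, 8.3, 8.4, 8.6, 8.8]
trimmed = trim_low(dts)
dts_estimate = statistics.median(trimmed)
print("trimmed subset XT:", trimmed)
print("Median(XT):", dts_estimate)
```

Because the criterion is one-sided, only abnormally low values are removed; high values are always retained, consistent with the assumption that experimental defects reduce rather than increase the measured strength.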

If Eq. 1 is applied to the original results obtained in this study, the newly calculated DTS values are improved in statistical terms, as can be seen in Table 8.

Table 8 Descriptive statistical parameters for the original dataset and ‘trimmed’ dataset using Eq. 1

Equation 1 is intended to help in reducing the standard deviation of the dataset, thus reducing both the coefficient of variation and the differences between the average and the median values.

With reference to the Young’s moduli in tension, more similar results have been observed between the laboratory datasets, especially for Blanco Mera granite, even though they are not comparable in statistical terms according to Welch’s t test. The larger variability observed for the datasets from Laboratories A and C could be associated with some limitations of the equipment (a length of the roller chains in the linkage system close to the recommended limit (2 × D) for Laboratory A and a rigid system in Laboratory C), which could cause bending of the specimen. Because the elastic stress–strain response is non-linear, any bending is detrimental to the determination of the deformation modulus for two reasons. First, the tensile-failure value will be reduced, so the evaluation interval for the deformation modulus will differ from that of a specimen with a higher failure strength if the interval is defined as a percentage of the failure stress (50% of the DTS in this study). Second, the use of two strain gauges opposite each other would cancel out any effect of bending only if the response were linear elastic, which is not the case. Laboratories A, C and D used foil strain gauges. The variation of Et in the results from Laboratory D was small, so the strain measurements should not be the cause of the higher variation found in the results from Laboratories A and C. Laboratory B used clip-on extensometers with a gauge length of 100 mm for the axial strain determination. Its tensile stiffness results showed a small variability, but a lower mean value than those of the other laboratories, both for the granite and for the Cotta sandstone. This measurement setup does not comply with the ISRM method for deformability measurements in compression (Bieniawski and Bernede 1979a), as the gauges “should not encroach within D/2 of the specimen ends”, where D is the specimen diameter. However, as argued above regarding the effect of the metal caps on the failure position, violating the D/2 requirement by the amount in this case (0.35 × D from the specimen ends) should not affect the results. It is worth noting that the deformations in a DTS specimen are about 10–20 times smaller than in a uniaxial compression test. This makes the extensometer setup more sensitive to any disturbance, e.g. small micro-slip at the attachment points, which becomes more visible in the measurements.
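The dependence of the evaluated modulus on the failure stress can be illustrated with a minimal sketch. It assumes a secant modulus taken from the origin to 50% of the DTS and a purely hypothetical non-linear (softening) stress–strain law; the law, its parameters and the strength values are illustrative assumptions and are not fitted to the rocks of this study.

```python
def axial_strain(stress_mpa, e0_gpa=40.0, a=2.0e-6):
    """Hypothetical non-linear law: strain = stress / E0 + a * stress**2."""
    # e0_gpa is an assumed initial modulus; a is an assumed softening term.
    return stress_mpa / (e0_gpa * 1000.0) + a * stress_mpa ** 2


def secant_modulus_gpa(dts_mpa, fraction=0.5):
    """Secant modulus (GPa) from the origin to fraction * DTS."""
    stress = fraction * dts_mpa
    return stress / axial_strain(stress) / 1000.0


# Same hypothetical material, two different failure stresses.
e_full = secant_modulus_gpa(8.0)   # specimen failing at 8 MPa
e_early = secant_modulus_gpa(5.0)  # premature failure at 5 MPa
print(f"Et evaluated at 50% of 8 MPa: {e_full:.1f} GPa")
print(f"Et evaluated at 50% of 5 MPa: {e_early:.1f} GPa")
```

With a non-linear response, the two specimens yield different moduli although the underlying material law is identical, solely because the evaluation stress is tied to the failure stress.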

The effect of certain anisotropy shown by Cotta sandstone, as well as the presence of clay-bearing, organic and ferritic flakes could affect the deformational behaviour, an aspect that should also be accounted for.

A proper selection of measuring devices may be in order. Overall, both employed measuring techniques (strain gauges and extensometers mounted directly on the sample) to monitor the axial deformation were found to work well.

Regarding the Poisson’s ratios, they generally exhibit more variability than the other parameters. Specifically, higher values (even higher than those previously measured in compression) have been documented for the Cotta sandstone by Laboratories A and C. The strain measurements in both laboratories (A and C) were done using strain gauges with a different measuring position for each strain gauge. In case of bending, this configuration may amplify the variability of the Poisson’s ratios. Laboratory D used stacked gauges, for which this issue is eliminated. In this sense, it has been observed that for relatively low radial strain values (as in the case of DTS tests and especially for the studied granite), the resolution of circumferential chain-type transducers may not be sufficient to capture this behaviour. Moreover, the fact that the chain-type transducer measures by retracting could also introduce some resistance or hysteresis, which would also explain the low readings.

For transversely isotropic rocks, different strain behaviour could be expected if a strain gauge is placed across or along a bedding plane when β is not equal to 0°. For radial strain measurements, the chain-type transducer will yield an averaged strain value since it measures around the whole specimen contour.

With respect to the statistical analyses, the number of data points collected for each sample in each laboratory was somewhat small (n ≤ 7). Robust methods suitable for groups with a small number of samples have therefore been employed. In any case, the statistical analysis of small sample sizes could, in the worst case, increase the possibility of ‘type II’ errors (that is, not rejecting the null hypothesis when it is actually false, also called a ‘false negative’), which would lead to the conclusion that there is no significant difference between the results obtained by the different laboratories. In view of the results, this does not seem to be relevant for this analysis since, in most cases, a significant difference has been found.

According to the boxplot representations evidencing quite variable and slightly lower DTS values for one laboratory, the effect of a rigid (non-flexible) loading system, as well as potential natural or shipping-induced fractures or other defects, may have also affected the results, although less likely to be the main cause in this case.

Findings suggest that, in general, greater care in specimen handling and shipping is required when DTS testing is envisioned, as additional induced defects could have a significantly greater influence on tensile tests when compared to compressive strength tests. This is reinforced by findings by Gorski and Yu (1996), who concluded that stress relief from cored samples at the AECL URL – 420 level must have induced micro-fractures that directly reduced the DTS values from those expected. The negative influence on the expected DTS caused by pre-existing cracks, but also by the heterogeneity of rock microstructure, was also reported by Jiang et al. (2022) through numerical modelling of DTS and BTS tests. They reported that the tensile strength and crack orientations change due to the increased microstructure heterogeneity.

The duration of each test was set prior to the experiment to be in the range of 5–15 min, by balancing the suggested times by ISRM (1978) and ASTM (2020). An inconsistency in the ISRM (1978) current method was detected since the recommended loading rates were found to be too high even for the lowest suggested time to failure (5 min.).
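The inconsistency can be checked with simple arithmetic. The sketch below takes the loading-rate range of 0.5–1.0 MPa/s attributed to ISRM (1978) in the Conclusions and assumes a hypothetical DTS of 10 MPa and a constant stress rate up to failure; the function names and the strength value are illustrative assumptions only.

```python
def time_to_failure_s(dts_mpa, rate_mpa_per_s):
    """Time to failure (s) for a constant stress rate."""
    return dts_mpa / rate_mpa_per_s


def rate_for_duration(dts_mpa, duration_s):
    """Constant stress rate (MPa/s) needed to fail after duration_s."""
    return dts_mpa / duration_s


dts = 10.0  # hypothetical tensile strength in MPa (illustrative only)

# Failure times implied by the 0.5-1.0 MPa/s loading-rate range.
slow = time_to_failure_s(dts, 0.5)
fast = time_to_failure_s(dts, 1.0)
print(f"time to failure at 0.5-1.0 MPa/s: {fast:.0f}-{slow:.0f} s")

# Rate that would be needed to reach failure in 5 min (300 s).
needed = rate_for_duration(dts, 5 * 60)
print(f"rate for a 5 min test: {needed:.3f} MPa/s")
```

Under these assumptions, failure would occur within tens of seconds rather than minutes, so the rate compatible with a 5 min test is more than an order of magnitude lower than the 0.5–1.0 MPa/s range.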

The analysis of a possible influence of the loading/displacement rates adopted by each laboratory is beyond the scope of this work, and should not be critical for our results given the relatively low rates used. The authors, recognising the value of promoting more DTS testing, chose not to impose strict constraints on the participating laboratories to avoid discouraging future utilisation of the test. However, future research could investigate this influence in a more rigorous manner to remove the inconsistency in the current ISRM (1978) Suggested Method.

6 Conclusions

The present study intended to look at extending the current DTS methods by adding deformation measurements to determine elasticity parameters at tensile loading besides the DTS based on the current ISRM suggested method for DTS testing (ISRM 1978) and determination of elasticity parameters inspired by the ISRM (Bieniawski and Bernede 1979a, b). Sources for potential influence on results when carrying out DTS tests on two rocks (Blanco Mera granite and Cotta sandstone) through cylindrical cores in four different laboratories were assessed. Four testing machines, different deformation measuring devices and bonding materials were used for the tests. The current ISRM suggested method for DTS testing (ISRM 1978) inspired the main methodology herein presented, even though some aspects differ in some laboratories, which ultimately helped when confirming or detecting deficiencies.

According to the 51 (49 valid) tests carried out in this study, the DTS test with deformability measurements was found to be practically feasible, provided some considerations are taken into account. Considering the test setup (devices) and strain measurement method together, the four laboratories used entirely different procedures. As a consequence, the results highlighted various issues in an illustrative manner. The use of rigid linkage systems connecting the load frame with the rock specimen was found to be inappropriate for DTS tests and deviates from the procedure given in ISRM (1978) and ASTM (2020), as such systems prevent a reliable determination of this parameter by producing premature specimen failure. This could also affect the integrity of the bonding materials, which performed well in almost all the tests. One laboratory used a linkage system that conceptually appeared to comply with ISRM (1978) and ASTM (2020), but that, according to the results, could have induced some bending of the specimens. Besides a proper linkage system, it is stressed that the loading platens must be centred on the specimens when they are glued together, so that a concentric load can be applied.

Deformations and strains are in the order of 10–20 times smaller than in a uniaxial compression test and of opposite sign, with axial extension and circumferential contraction. Deformation measurement equipment that works for compression tests is not automatically fit for use in direct tensile tests. With appropriate measuring devices such as strain gauges or clip-on extensometers, the determination of Young’s moduli in tension through axial strain measurements proved to be effective. Comparing the results with those from compression tests highlights the bimodular behaviour (dependence on the loading mode: compressive or tensile) of the two rocks studied. For the radial strain measurements intended for determining Poisson’s ratios in tension, circumferential chain-type transducers proved inappropriate, and conventional strain gauges performed better. A possible way to ensure that the test setup and measurement work properly is to conduct tests on specimens with known material properties, e.g. aluminium, to check that no bending or torsional stresses are induced before rock specimens with unknown properties are tested.

This benchmark has also served to identify certain shortcomings in the existing ISRM Suggested Method for DTS tests (ISRM 1978), specifically regarding inadequate (abnormally high, in the range of 0.5–1.0 MPa/s) loading rates, inappropriate statistical analyses of results, the absence of recommendations for deformability measurements (both concerning the equipment and the determination of deformability parameters in tension: Et and νt), and the lack of guidance on the mechanical features of bonding materials. Having considered these aspects, the present work aims at helping in potential improvements of such methodologies.

From the aforementioned analyses, it is clear that further study of the DTS test and its implications for determining the uniaxial tensile strength and deformability of rocks in tension is needed. Provided that updated methodologies and new suggestions are developed, this test is the most suitable and feasible way to capture the true tensile behaviour of rock materials in common rock mechanics laboratories, and our study shows that this can be achieved.