1 Introduction

Concrete, known for its strength, durability, and versatility, is extensively used in civil construction [1] and plays a critical role in structures of all sizes, from small buildings to large infrastructure projects. Many parameters must therefore be assessed to ensure the safety and stability of concrete structures. Among them, the compressive strength of concrete is a fundamental parameter in civil engineering, with direct implications for safety and durability, and it is widely applied as a key safety indicator [1,2,3]. Compressive strength must therefore be assessed to verify that a given concrete meets its safety and resistance requirements. However, the traditional methods used to test compressive strength are destructive, expensive, and limited, especially considering the complexity of concrete composition, which is influenced by ingredient proportions and curing time [4].

Because concrete properties are influenced by a wide range of variables, such as the quantities of cement, water, aggregates, and additives [5,6,7], it can be difficult to find a formulation with the desired strength and resistance for a specific application. In the search for high-performance concrete, many novel formulations have been developed in recent years, including fiber-reinforced concrete [6, 8, 9], nanoparticle-enriched concrete [10], self-healing concrete [11,12,13], and structural high-performance concrete enriched with fly ash and other “low-value” additives such as slag powder and silica fume [14, 15]. It is known that the proportions and interactions of such additives and aggregates with cement and water affect the compressive strength of concrete [2]. New formulations must therefore be subjected to assays and tests to verify how the proportions and interactions of these components affect the mechanical properties of concrete, which can be costly and time-consuming.

In this context, Machine Learning (ML) techniques offer an innovative approach that can account for a wide range of complex variables and interactions, and they have been increasingly applied in civil engineering [16,17,18,19,20,21,22,23]. ML enables concrete strength to be modeled and predicted with greater accuracy and efficiency, which can significantly reduce costs and contribute to the development of more robust and durable concrete [16, 24, 25]. Although ML has been successfully applied to predict structural features of concrete [26,27,28], predicting compressive strength, which is pivotal for ensuring the ability of concrete to withstand applied loads, remains challenging and has led to the development of various ML models over the years [2, 28,29,30,31]. The advantage of ML models is that they consider multiple variables and can identify complex patterns in the data [32, 33]; ML can therefore be a useful tool to aid the design and development of highly resistant concrete. Despite the increasing development of ML models, it is difficult to predict compressive strength accurately because of the non-linear relationships between concrete components, and several works report distinct performances in predicting the compressive strength of concrete [16, 24, 25]. Given this context, this work sought to scrutinize the application and performance of ML techniques, including multiple linear regression (MLR), support vector machines for regression (SVR), gradient boosting (GB), random forest (RF), and artificial neural networks (ANN), to predict compressive strength and to explore the advantages and challenges associated with these techniques. Moreover, this work conducted a comprehensive analysis of the interactions between predictive variables in concrete composition and their impact on compressive strength to gain in-depth insights into the relationships between these variables.

2 Materials and methods

2.1 Dataset

An initial dataset [34] was supplemented with information extracted from the literature to enhance the diversity and robustness of the learning process [16]. The final dataset consists of 1234 records of concrete compressive strength, which were used to train the ML algorithms. This dataset provided comprehensive and representative information on concrete properties and their corresponding compressive strength values. An existing dataset was chosen to ensure that the research was based on a representative and diverse sample, thereby making the results more robust and generalizable. Another advantage of this approach is that it allows external validation of the model and results, which serves as an independent check that strengthens the reliability of the study.

Furthermore, utilizing a pre-existing dataset was crucial for time and resource efficiency, as collecting data can be time-consuming and costly. By using this dataset to train the ML algorithms, it was possible to explore the patterns and relationships between the eight input variables (water, cement, fly ash, blast furnace slag, superplasticizer, coarse aggregate, fine aggregate, and curing time) and the output variable (compressive strength). The roles of these concrete variables, which have interdependent relationships, are described in Table 1.

Table 1 Meaning of the variables that make up concrete

2.2 Attribute selection

To identify the best attributes from the dataset for training the models and predicting compressive strength, we employed the “SelectKBest” feature selection method [35]. The selection criterion was the F-regression score, which measures the linear association between each attribute and the target variable. Based on this analysis, ML models were trained in two different scenarios by selecting k independent (predictor) variables: one with eight predictor variables (k = 8), including water, cement, fly ash, blast furnace slag, superplasticizer, coarse aggregate, fine aggregate, and curing time; and another with six predictor variables (k = 6), in which fly ash and fine aggregate were removed. This comparison allowed us to assess how the concrete components influence the compressive strength of concrete.
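A minimal sketch of this attribute selection step is given below, assuming the dataset is loaded into a pandas DataFrame; the file name and column names are illustrative and may differ from those in the actual dataset.

```python
# Hypothetical sketch of attribute selection with SelectKBest and f_regression.
# File name and column names are assumed for illustration only.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

data = pd.read_csv("concrete.csv")                # assumed file name
X = data.drop(columns=["compressive_strength"])   # eight input variables
y = data["compressive_strength"]                  # target variable

# k = 6 keeps the six attributes with the highest F-regression scores
selector = SelectKBest(score_func=f_regression, k=6)
X_selected = selector.fit_transform(X, y)

# Inspect the score of each attribute and which ones were kept
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(scores)
print("Selected attributes:", list(X.columns[selector.get_support()]))
```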

2.3 Machine learning models

Data were divided into training and test sets in an 8:2 ratio. An initial exploratory analysis was then conducted to evaluate data quality using descriptive statistics, which is crucial for understanding the nature of the data and identifying potential obstacles that may impact the effectiveness of model training. Five methods were selected to develop the prediction models: multiple linear regression (MLR), support vector regression (SVR), random forest (RF), artificial neural networks (ANN), and gradient boosting (GB), and their performances were compared with each other. The quality of each model’s fit was assessed using two metrics: the coefficient of determination (R2), which measures how much of the variability in the actual values is explained by the model’s predictions, and the root mean square error (RMSE), which quantifies the average deviation between the model’s predictions and the actual values. The training results were recorded for evaluation purposes.
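The shared evaluation protocol (8:2 split, R2, and RMSE) can be outlined as follows; this is an illustrative sketch, reusing X and y from the sketch above, with an assumed random seed.

```python
# Illustrative 8:2 train/test split and the two evaluation metrics used for all models.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # seed assumed, for reproducibility
)

def evaluate(model, X_test, y_test):
    """Return R2 and RMSE of a fitted model on the held-out test set."""
    y_pred = model.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    return r2, rmse
```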

2.3.1 Exploratory analysis

An initial exploratory analysis was conducted to obtain descriptive statistics and insights into data quality and variable correlations, including the number of completed fields in the database, the standard deviation, and the maximum (max) and minimum (min) values of each variable.
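Such descriptive statistics can be obtained, for example, with pandas; the sketch below assumes the same hypothetical file and column names used earlier.

```python
# Descriptive statistics of the dataset: completed fields, mean, std, min, quartiles, max.
import pandas as pd

data = pd.read_csv("concrete.csv")  # assumed file name

print(data.describe())    # count, mean, std, min, 25%, 50%, 75%, max for each variable
print(data.isna().sum())  # number of missing entries per variable
```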

2.3.2 Multiple linear regression

Multiple linear regression (MLR) is employed to model the relationship between a set of independent variables and a dependent variable. In this study, the MLR model was used to obtain baseline predictions that served as a reference for assessing the performance of the other ML models.
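A minimal sketch of this MLR baseline, reusing the split and the evaluate helper from the evaluation sketch in Sect. 2.3:

```python
# Baseline multiple linear regression (sketch).
# Assumes X_train, X_test, y_train, y_test and evaluate() from the evaluation sketch.
from sklearn.linear_model import LinearRegression

mlr = LinearRegression()
mlr.fit(X_train, y_train)

r2, rmse = evaluate(mlr, X_test, y_test)
print(f"MLR: R2 = {r2:.3f}, RMSE = {rmse:.3f} MPa")
```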

2.3.3 Support vector machine for regression

This technique is an extension of support vector machine (SVM) algorithms [36]. Unlike SVM, which provides a binary output, SVR estimates a real-valued function and is better suited for solving regression problems [37, 38].

To train SVR models, several hyperparameters must be set. The following parameters and their functions, as described in the literature, were considered: the kernel, a mathematical function responsible for transforming the data; C, a regularization parameter that controls the trade-off between maximizing the hyperplane margin and minimizing the training error; epsilon (regression precision), which establishes the maximum acceptable error within which no penalty is applied; gamma, the kernel coefficient, which controls how far the influence of a single training example reaches; and degree, which controls the complexity of the transformed space for polynomial kernels [37, 39, 40]. The parameter values were selected using the GridSearchCV method, which operates through an iterative process in which each iteration evaluates a combination of parameter values and computes a score; the combination with the highest score is identified as the best one.
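A hedged sketch of this grid search is shown below; the parameter grid values are illustrative and do not necessarily match the exact grid searched in this work.

```python
# Grid search over SVR hyperparameters (illustrative grid; actual ranges may differ).
# Assumes X_train and y_train from the evaluation sketch.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

param_grid = {
    "kernel": ["linear", "poly", "rbf"],
    "C": [1, 10, 100],
    "gamma": [0.01, 0.1, 1],
    "epsilon": [0.1, 0.5, 1.0],
    "degree": [1, 2, 3],
}

grid = GridSearchCV(SVR(max_iter=1000), param_grid, scoring="r2", cv=5, n_jobs=-1)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validated R2:", grid.best_score_)
```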

2.3.4 Gradient boosting

Another method used in this study was Gradient Boosting (GB), an ML technique designed for regression and classification problems. This approach builds a prediction model as an ensemble of weak prediction models, typically decision trees [19, 41]. The parameters employed and their respective functions are as follows (see the code sketch after this list):

  • n_estimators: This parameter represents the number of boosting stages. Since GB is fairly robust to overfitting, a larger number generally leads to better performance. In this work, this parameter was set to 1000;

  • max_depth: This parameter determines the maximum depth of the individual regression estimators. The maximum depth restricts the number of nodes in the tree, and adjusting this parameter aims to optimize performance, with the ideal value depending on the interaction of the input variables. In this work, this parameter was set to 10;

  • min_samples_split: This parameter specifies the minimum number of samples required to split an internal node. In this work, this parameter was set to 20;

  • learning_rate: This parameter is the learning rate, which shrinks the contribution of each tree by the given factor. There is a trade-off between learning_rate and n_estimators. In this work, this parameter was set to 0.01;

  • loss: This parameter denotes the loss function to be optimized, where ‘squared_error’ refers to the squared-error (mean squared error) loss.
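The parameter list above maps directly onto scikit-learn's GradientBoostingRegressor; a minimal sketch is given below (the random seed is an assumption, not a value reported in this work).

```python
# Gradient boosting model with the parameter values listed above (sketch).
# Assumes X_train, X_test, y_train, y_test and evaluate() from the evaluation sketch.
from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor(
    n_estimators=1000,
    max_depth=10,
    min_samples_split=20,
    learning_rate=0.01,
    loss="squared_error",
    random_state=42,  # assumed seed for reproducibility
)
gb.fit(X_train, y_train)

r2, rmse = evaluate(gb, X_test, y_test)
print(f"GB: R2 = {r2:.3f}, RMSE = {rmse:.3f} MPa")
```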

2.3.5 Artificial neural networks

To apply Artificial Neural Networks (ANN), the architecture of the neural network was tuned for optimization by considering the following parameters (a code sketch is given after this list):

  • solver = ‘adam’: This refers to the weight optimization solver. ‘adam’ is a gradient-based stochastic optimization algorithm particularly well-suited for large data sets.

  • hidden_layer_sizes = (32, 64, 32): This represents the number of neurons in the hidden layers. The model will have three hidden layers, with the first layer containing 32 neurons, the second layer containing 64 neurons, and the third layer containing 32 neurons.

  • n_iter_no_change = 200: This indicates the maximum number of epochs to iterate without observing an improvement in the training process.

  • random_state = 1: This is the seed of the random number generator, which is used to initialize the weights randomly.

  • max_iter = 5000: This represents the maximum number of iterations for the solver.

  • learning_rate_init = 0.0001: This signifies the initial learning rate for the ‘adam’ solver.

  • verbose = True: This setting enables the printing of the training progress.

Once the training is complete, the ANN model is ready to make predictions.
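The parameter names above match scikit-learn's MLPRegressor, so a minimal sketch assuming that implementation is:

```python
# ANN (multilayer perceptron) with the architecture and settings listed above (sketch).
# Assumes X_train, X_test, y_train, y_test and evaluate() from the evaluation sketch.
from sklearn.neural_network import MLPRegressor

ann = MLPRegressor(
    solver="adam",
    hidden_layer_sizes=(32, 64, 32),
    n_iter_no_change=200,
    random_state=1,
    max_iter=5000,
    learning_rate_init=0.0001,
    verbose=True,
)
ann.fit(X_train, y_train)

r2, rmse = evaluate(ann, X_test, y_test)
print(f"ANN: R2 = {r2:.3f}, RMSE = {rmse:.3f} MPa")
```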

2.3.6 Random forest regressor

Random forest regression is an ensemble machine learning technique that combines decision trees to predict the value of a target response [42]. Random forest algorithms use bootstrap sampling to generate random subsets of samples for training each base tree, meaning that, instead of being trained on all observations, each tree of the RF is trained on a subset of the observations. The predictive ability of random forest models can be tuned by adjusting a few parameters, which are discussed below. The first parameter is “n_estimators”, which represents the number of trees in the forest; in this work, it was set to 1000. The second parameter is “random_state”, which controls the bootstrap randomness; to ensure reproducibility, a fixed integer value of 42 was adopted. The third parameter is “criterion”, which controls how the algorithm measures the quality of a split; in this work, it was set to the mean absolute error to allow comparison with the other machine learning models developed in this work.
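A minimal sketch of this configuration is given below; note that recent scikit-learn versions name the mean-absolute-error criterion "absolute_error".

```python
# Random forest regressor with the settings described above (sketch).
# Assumes X_train, X_test, y_train, y_test and evaluate() from the evaluation sketch.
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=1000,
    random_state=42,
    criterion="absolute_error",  # mean absolute error split criterion
)
rf.fit(X_train, y_train)

r2, rmse = evaluate(rf, X_test, y_test)
print(f"RF: R2 = {r2:.3f}, RMSE = {rmse:.3f} MPa")
```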

2.4 Influence of independent variable sensitivity

ML algorithms are often referred to as black boxes, meaning that the relationship between the input (independent) and output (dependent) variables cannot be directly inferred, unlike in statistical modeling. However, it is a myth that the relationships between variables and the workings of ML models cannot be explained [43, 44]. Hence, to determine the influence of each concrete component (variable) on the compressive strength, we used the permutation_importance function from the sklearn.inspection package. Details of the methodology are described elsewhere [22].
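A minimal sketch of this step, assuming a fitted model (here the gradient boosting model from the earlier sketch) and the held-out test set:

```python
# Permutation importance of each concrete component for a fitted model (sketch).
# Assumes gb, X, X_test and y_test from the earlier sketches.
from sklearn.inspection import permutation_importance

result = permutation_importance(
    gb, X_test, y_test, n_repeats=10, random_state=42, scoring="r2"
)

# Rank the variables by their mean importance
for name, mean_imp, std_imp in sorted(
    zip(X.columns, result.importances_mean, result.importances_std),
    key=lambda item: item[1],
    reverse=True,
):
    print(f"{name}: {mean_imp:.3f} +/- {std_imp:.3f}")
```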

3 Results and discussion

3.1 Exploratory analysis

Table 2 presents the findings of the preliminary exploratory analysis conducted on the database under investigation. This analysis yielded pertinent information, such as the number of fields completed in the database, the standard deviation, and the maximum (max) and minimum (min) values for each variable.

Table 2 Input variables in the compression analysis process

The mean compressive strength of the concrete was 35.1 MPa, indicating a relatively high strength suitable for applications requiring this property. However, there was considerable variability in the compressive strength, as evidenced by the standard deviation of 16.2 MPa. This variability may be attributed to factors such as the quality of the materials, mixing and curing procedures, and testing methods.

In contrast, the minimum compressive strength of 2.33 MPa is very low, making the concrete associated with this value unsuitable for most applications; this low strength may have been due to issues with the concrete mix or the testing procedures. Additionally, the compressive strength at the 25th percentile was 23.1 MPa, meaning that 25% of the concrete samples had a compressive strength below 23.1 MPa. At the 50th percentile, the compressive strength was 33.7 MPa, and at the 75th percentile it was 44.4 MPa. These findings indicate that the median compressive strength of the concrete was 33.7 MPa and that 75% of the samples had a strength below 44.4 MPa. Furthermore, the maximum compressive strength was 82.6 MPa, a very high value suitable for demanding applications.

Figure 1 shows the frequency distribution of the dependent variable (compressive strength) and the independent variables. The figure shows that the compressive strength values are diverse and sufficiently well distributed for training ML models, reducing the risk of under- or overfitting.

Fig. 1 Frequency distribution of variables in the dataset for training ML models

The data suggest that the concrete generally has a high resistance to compression, although there is significant variation in this property. ML techniques, known for their flexibility and adaptability, can effectively capture both linear and non-linear variations. By adjusting models to identify complex patterns and relationships between variables, these algorithms can predict and explain substantial variations in the data, including anomalies and outliers. Training on diverse datasets and optimizing hyperparameters allow these techniques to create models that can identify anomalous behavior or outliers, thereby enhancing the robustness and accuracy of the analysis [17, 33].

3.2 Correlation analysis

A correlation analysis was conducted to identify the concrete components with the strongest relationship with compressive strength. The results showed that concrete compressive strength has a moderate positive correlation with cement content (0.676) and a moderate positive correlation with the curing time (test age) of the concrete, while the correlations with the other materials are weaker. Figure 2 shows a heatmap of all variable correlations.
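A sketch of how such a correlation matrix and heatmap can be produced is shown below; seaborn is assumed here purely for plotting, and the column name is the hypothetical one used in the earlier sketches.

```python
# Pearson correlation matrix and heatmap of all variables (sketch).
# Assumes the DataFrame 'data' from the exploratory-analysis sketch.
import matplotlib.pyplot as plt
import seaborn as sns

corr = data.corr()
print(corr["compressive_strength"].sort_values(ascending=False))

sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.tight_layout()
plt.show()
```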

Fig. 2 Heatmap of feature (variable) correlations

From Fig. 2, we can conclude that most independent variables have a weak correlation with the compressive strength of concrete. These correlations provide valuable insights for developing predictive models of concrete strength, particularly when using non-linear ML techniques such as support vector regression, boosting, and ANN [16, 25, 45].

3.3 Machine learning models

3.3.1 Multiple linear regression

Figure 3 shows the results of the linear regression model. With eight predictor variables (k = 8), the model showed moderate performance (R2 = 0.594 and RMSE = 10.319 MPa). However, when only the six best predictor variables (k = 6) selected by the SelectKBest method were used, the coefficient of determination decreased (R2 = 0.450) and the RMSE increased (RMSE = 18.097 MPa).

Fig. 3 Multiple Linear Regression results for eight predictive variables (k = 8) and six predictive variables (k = 6)

The MLR model falls short in terms of R2, which is too low in both scenarios. With eight variables, the MLR analysis displayed only moderate performance, indicating its limitation in handling the complexity of the underlying relationships. Reducing the number of predictor variables to six led to a further decline in performance, emphasizing the model's sensitivity to the selection of these variables.

This suggests that the regression line does not adequately fit the data (Fig. 3) and that linear statistical regression models are not suitable for accurately predicting the compressive strength of concrete due to their low precision. Despite the low precision of MLR in predicting compressive strength, the attribute selection results (Fig. 4) make it evident that the superplasticizer and blast furnace slag contents significantly affect the resulting compressive strength. This observation corroborates previous findings [18, 34].

Fig. 4 Importance of predictor variables by score

3.3.2 Support vector regression

The best SVR estimator found via grid search had C = 100, degree = 1, gamma = 0.1, and max_iter = 1000. A comparative analysis of the different k settings provides relevant information about the SVR model’s performance (Fig. 5). When k was set to 8, including all predictive variables, R2 reached 0.836, indicating the model’s ability to explain the variability in concrete compressive strength, and the RMSE was 6.535 MPa, indicating high accuracy in the estimates. Interestingly, when k was reduced to 6, there was a slight increase in the performance indicators: R2 reached 0.840, suggesting that the model retained a robust ability to capture the underlying relationships in the data, and the corresponding RMSE was 6.442 MPa, indicating an acceptable level of accuracy in the predictions. This suggests that fine aggregate and fly ash do not strongly influence concrete compressive strength, which corroborates the attribute selection scores (Fig. 4) and previous studies [18, 24, 34].

Fig. 5 Support vector regression for six predictive variables (k = 6) and eight predictive variables (k = 8)

3.3.3 Gradient boosting

Gradient boosting was used to predict concrete strength in the models with six and eight predictive variables (Fig. 6). The models showed an excellent fit to the data, achieving R2 = 0.886 and RMSE = 5.434 MPa for k = 8 and R2 = 0.867 and RMSE = 5.842 MPa for k = 6. The analysis indicated that, while both configurations produced good results, the configuration with k = 8 exhibited slightly higher accuracy, with a lower RMSE and a higher R2. Moreover, gradient boosting outperforms MLR and is slightly better than SVR, indicating better reliability than these models.

Fig. 6 Gradient boosting for six predictive variables (k = 6) and eight predictive variables (k = 8)

3.3.4 Artificial neural networks

A comparison of the results obtained from the different ANN configurations (Fig. 7) reveals noticeable variations in the predictive performance for concrete compressive strength. In the scenario with k = 8, the neural model incorporated water, cement, fly ash, blast furnace slag, superplasticizer, coarse aggregate, fine aggregate, and curing time. The results revealed a strong coefficient of determination (R2 = 0.895), indicating that the model can explain most of the variability in compressive strength, and the RMSE (5.192 MPa) indicated a relatively high level of accuracy in the predictions.

Fig. 7 Artificial Neural Networks for six predictive variables (k = 6) and eight predictive variables (k = 8)

With k = 6, the model’s performance decreased slightly (R2 = 0.855), implying that the ANN can predict compressive strength with either k = 8 or k = 6 with only a small loss in performance. Furthermore, with k = 6 the RMSE increased to 6.098 MPa, indicating a greater spread of the predictions around the actual values.

3.3.5 Random forest regression

The Random Forest Regressor was the last model used to predict concrete strength with six and eight predictive variables (Fig. 8). The models showed an excellent fit to the data, achieving R2 = 0.868 and RMSE = 5.859 MPa for k = 8 and R2 = 0.855 and RMSE = 6.145 MPa for k = 6. The analysis indicated that, while both configurations produced good results, the configuration with k = 8 exhibited slightly higher accuracy, with a lower RMSE and a higher R2. Moreover, the RF performance is comparable to that of the other ML models, such as gradient boosting and support vector regression.

Fig. 8 Random Forest Regressor for six predictive variables (k = 6) and eight predictive variables (k = 8)

3.4 Overall performance of machine learning models

Our findings showed that, for most of the ML algorithms used, including more independent variables improved the results (Table 3). This improvement is evident in both the coefficient of determination, which measures the model’s explanatory capacity, and the RMSE, which reflects the accuracy of the predictions. This finding emphasizes the importance of comprehensive and relevant predictor characteristics in predictive analysis.

Table 3 Performance comparison of ML models

Several studies demonstrate the potential of ML models to predict the compressive strength of materials with considerable accuracy [18, 19, 23,24,25, 31, 34, 46, 47]. Through training on comprehensive datasets, these models learn complex patterns and relationships between material characteristics and their respective strengths [25]. The scientific literature documents cases where ML models achieve coefficients of determination (R2) exceeding 0.90 [23, 30, 31], mostly using neural networks, which represents a level of accuracy suitable for practical applications. In this work, the ANN also outperformed the other ML techniques, especially with eight predictive variables.

When six predictive variables were used instead of eight, we observed numerically lower performance values, highlighting the significance of the physical characteristics of the fly ash and fine aggregate excluded from the analysis. Fly ash plays a crucial role in concrete by enhancing cohesion, reducing exudation and segregation, and extending the setting time of fresh concrete [14]. In the hardened state, fly ash contributes to reducing the temperature rise from hydration reactions, resulting in more resilient concrete. Fine aggregates (e.g., sand) are incorporated into concrete to fill voids and improve the workability of the fresh material, playing a decisive role in concrete strength.

The comparison between the models with six and eight variables demonstrates that including more variables positively impacts the accuracy of concrete strength prediction (Table 3). This is because the additional variables provide more information about the behavior of the concrete, allowing the models to make more precise estimates.

However, the number of variables included in a model should be chosen carefully: an excessive number of variables can lead to overfitting, which impairs generalization, whereas an insufficient number can lead to underfitting, which reduces accuracy.

A comparative analysis of the models, including SVR, GB, RF, and ANN, revealed that GB exhibited the highest coefficient of determination (R2) with six predictive variables, while the ANN exhibited the highest performance when trained and tested with eight predictive variables. Interestingly, the differences between the ANN and GB are too small to be significant from a practical point of view; both models demonstrated a superior ability to explain the variation in the data compared to the other models (MLR, RF, and SVR). Moreover, the ANN excelled in minimizing the RMSE, indicating more accurate predictions and less scattering relative to the real values, leading to a more precise estimate of the concrete’s compressive strength.

3.5 Influence of independent variable sensitivity

A plot of the importance of each variable (Fig. 9) shows that, besides the components present in every concrete (cement, water, and aggregates), additives such as blast furnace slag, fly ash, and superplasticizers have a direct influence on compressive strength.

Fig. 9 Relative importance of the independent variables in the final ML models

Thus, this analysis corroborates the previous observation that such additives affect the compressive strength of concrete [14, 21, 47]. Moreover, previous studies on the compressive strength of concrete, involving both machine learning and experimental assays, found the same influence of the concentration and type of additives on compressive strength [16, 21, 23, 47].

4 Conclusions

This research utilized several modeling approaches to predict the compressive strength of concrete by considering relevant predictor variables. Our findings led us to conclude that the influence of multiple factors on the compressive strength of concrete makes simple modelling techniques, such as multiple linear regression, unsuitable for the regression problem addressed herein. More sophisticated techniques, such as artificial neural networks and ensemble methods (gradient boosting and random forest), are more appropriate for this problem. Moreover, these techniques are particularly good at mitigating problems that could arise during model training due to limited data availability, as ensemble techniques are known for remaining accurate when data availability is limited. Thus, the choice between these approaches depends on the project’s specific requirements, the data size, and the emphasis placed on interpretability versus accuracy.