Introduction

Gestational diabetes mellitus (GDM) is characterised by any degree of glucose intolerance that either develops or is first identified during pregnancy [1]. It encompasses previously undiagnosed glucose intolerance that may have existed before pregnancy or emerged during it, regardless of the subsequent management approach (such as dietary modification or insulin therapy) and of whether the condition persists after pregnancy [2]. Regional disparities in GDM prevalence are evident: the highest rates are found in the Middle East and North Africa (12.9%), followed by Southeast Asia (11.7%), the Western Pacific (11.7%), and South and Central America (11.2%), with the lowest rates in North America and the Caribbean (7.0%) and Europe (5.8%) [3]. GDM is a widespread pregnancy complication, affecting 1–14% of pregnancies worldwide, with variation driven by patient ethnicity and diagnostic criteria [4, 5]. Its impact on maternal and fetal health is significant, often leading to preterm delivery, cesarean section, excessive fetal growth, and hyperinsulinemia, hypoglycemia, and hyperbilirubinemia in newborns [6,7,8]. Additionally, GDM can progress to type 2 diabetes mellitus (T2DM) and is associated with birth-related complications, visceromegaly, fetal macrosomia, and an increased risk of metabolic disorders for both mother and child, including hypertension, obesity, and metabolic syndrome [9, 10].

The precise pathophysiological mechanisms of GDM remain incompletely understood, but hormonal imbalances, impaired insulin sensitivity, and pancreatic β-cell dysfunction are suggested contributors [11]. About 16% of pregnancies globally are affected by hyperglycemia, of which 84% are classified as GDM [12]. GDM contributes significantly to the onset of T2DM in both mothers and offspring, which makes effective management of blood glucose levels during pregnancy important for reducing the prevalence of T2DM in future generations [13]. Historically, screening for GDM relied on medical history, previous obstetric outcomes, and family history of T2DM. However, this approach failed to detect GDM in approximately 50% of affected pregnant women. In 1973, a pivotal study recommended adopting the 50 g 1-h oral glucose tolerance test as a screening tool, which is now used by approximately 95% of obstetricians in the United States for GDM screening. In 2014, the U.S. Preventive Services Task Force (USPSTF) recommended GDM screening for all pregnant women at 24 weeks [12, 14, 15].

Early screening and diagnosis of GDM are crucial for reducing the risks of pregnancy-related complications, such as macrosomia, preterm birth, pre-eclampsia, and neonatal intensive care admissions [14, 16]. Existing diagnostic tools have limitations in this regard. To improve the prediction of GDM, traditional regression-based clinical risk prediction models have drawn on clinical, sociodemographic, and anthropometric data. Recent advances in machine learning promise to increase the accuracy of disease prediction, diagnosis, and management. For instance, Belsti et al. [17] applied predictive modelling to antenatal care records. Their model achieved 85% accuracy, 90% precision, 78% recall, 84% F1-score, 81% sensitivity, 90% specificity, 92% positive predictive value, 78% negative predictive value, and a Brier Score of 0.39, surpassing the performance of traditional statistical methods. Most outcome prediction models enable early intervention in high-risk women and cost-effective screening by identifying low-risk individuals, potentially eliminating the need for glucose tolerance tests [18]. This review explores the effectiveness of machine learning algorithms in detecting GDM, drawing on relevant studies and data on their application.
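Because the performance figures quoted for such models all derive from a confusion matrix (plus predicted probabilities for the Brier score), a short worked example may help in reading them. The sketch below uses hypothetical counts and probabilities that do not come from Belsti et al. or any other reviewed study; the function names are illustrative. Under the standard definitions, recall is the same quantity as sensitivity, and precision is the same quantity as positive predictive value.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common screening metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # positive predictive value, also called precision
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical counts for 200 pregnancies (illustrative only).
metrics = classification_metrics(tp=40, fp=10, tn=140, fn=10)

# Hypothetical predicted GDM probabilities and observed outcomes (1 = GDM).
brier = brier_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0])

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"brier: {brier:.4f}")
```

A lower Brier score indicates better-calibrated probabilities, with 0 being perfect; the remaining metrics range from 0 to 1, with higher values being better.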

Methodology

Literature search strategy

A literature search was carried out to review the role of machine learning algorithms in the early detection of GDM and their impact on fetomaternal outcomes. The following databases were searched: PubMed, Scopus, Web of Science, and Google Scholar. The search covered studies published between 2000 and September 2023. The following search string was used: (“machine learning”[MeSH Terms] OR (“machine”[All Fields] AND “learning”[All Fields]) OR “machine learning”[All Fields]) AND (“algorithms”[MeSH Terms] OR “algorithms”[All Fields]) AND (“diabetes, gestational”[MeSH Terms] OR (“diabetes”[All Fields] AND “gestational”[All Fields]) OR “gestational diabetes”[All Fields] OR (“gestational”[All Fields] AND “diabetes”[All Fields] AND “mellitus”[All Fields]) OR “gestational diabetes mellitus”[All Fields]).
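The search string follows a regular pattern: each concept is an OR-group of a MeSH term, the individual words joined by AND, and the full phrase, and the concept groups are then joined by AND. The sketch below shows one way such a string could be assembled programmatically. The helper names (`tagged`, `concept`) are hypothetical, straight quotes are used, and the gestational diabetes group is a simplified version of the one in the text rather than an exact reproduction.

```python
def tagged(term, field):
    """Wrap a term in quotes and attach a PubMed-style field tag."""
    return f'"{term}"[{field}]'


def concept(phrase, mesh):
    """Build one OR-group: MeSH term, word-by-word AND, and the exact phrase."""
    words = phrase.split()
    parts = [tagged(mesh, "MeSH Terms")]
    if len(words) > 1:
        word_terms = " AND ".join(tagged(w, "All Fields") for w in words)
        parts.append(f"({word_terms})")
    parts.append(tagged(phrase, "All Fields"))
    return "(" + " OR ".join(parts) + ")"


# Concept groups joined by AND (third group simplified for illustration).
query = " AND ".join([
    concept("machine learning", mesh="machine learning"),
    concept("algorithms", mesh="algorithms"),
    concept("gestational diabetes", mesh="diabetes, gestational"),
])

print(query)
```

Generating the string this way makes it easier to keep the same structure when a search is adapted to additional synonyms or databases.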

Inclusion and exclusion criteria

Articles were included if they met the following criteria:

  • Published in English.

  • Peer-reviewed original studies.

  • Focused on applying machine learning algorithms in the context of GDM.

  • Included information on using machine learning in detecting or predicting GDM.

The exclusion criteria were:

  • Systematic analyses, meta-analyses, reviews, conference abstracts, case reports, editorials, and letters.

  • Studies that did not provide relevant information or data on the topic.

Study selection

Two independent reviewers (NA & EK) initially screened titles and abstracts to identify potentially relevant articles. Full-text articles were then retrieved for further evaluation. Discrepancies were resolved through discussion, and a third reviewer (GO) was consulted when necessary.

Data extraction

Data were extracted from the selected articles, including study design, sample size, characteristics of the study population, machine learning algorithms employed, predictive variables used, outcomes measured, and reported results.

Data synthesis

The findings from the selected studies were synthesised to provide an overview of the current evidence regarding the role of machine learning algorithms in the early detection of GDM and their impact on fetomaternal outcomes. Common themes, trends, and methodological differences were identified. Results were analysed and presented in a clear and organised manner.

Results

The studies in this review focused on predicting and detecting GDM through machine learning algorithms (see Table 1). Most were retrospective studies; others were cohort studies, and two were randomised clinical trials. The populations studied varied in size, from smaller cohorts of a few thousand individuals to larger populations exceeding 30,000. The studies used diverse machine learning algorithms, including Naïve Bayes, Decision Trees, Support Vector Machines, Neural Networks, Logistic Regression, Lasso-Logistics, Gradient Boosting Decision Tree (GBDT), Deep Neural Network (DNN), Gaussian Naïve Bayes (GNB), Bernoulli Naïve Bayes (BNB), and various ensemble methods such as Light Gradient Boosting Machine (LGBM) and Extreme Gradient Boosting (XGBoost). Data sources included pregnancy registries, perinatal databases, clinical records, and data from health institutions or hospitals.

Table 1 Characteristics of reviewed studies

Model performance and comparison

The studies by Kang et al. (2023) and Yunzhen et al. (2020) reported notable results in terms of model performance and comparison [21, 31]. Another reviewed study aimed to better characterise GDM in pregnant women using Attenuated Total Reflection Fourier-transform infrared (ATR-FTIR) spectroscopy. It employed chemometric approaches, combining feature selection algorithms with discriminant analysis methods such as Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Support Vector Machines (SVM). The results obtained with Genetic Algorithm Linear Discriminant Analysis (GA-LDA) were reported as the most satisfactory, achieving accuracy, sensitivity, and specificity of 100%.

Results in diverse populations

Mukkesh Kumar et al. [26] conducted a cohort study to evaluate the predictive ability of the existing UK National Institute for Health and Care Excellence (NICE) guidelines for assessing GDM using machine learning. The study employed the CatBoost gradient boosting algorithm and the Shapley feature attribution framework for predictive modelling. Its findings revealed that the existing UK NICE guidelines were insufficient to assess GDM risk in Asian women. Furthermore, the non-invasive predictive model developed in the study outperformed the current state-of-the-art machine learning models in predicting GDM. Similarly, Mukkesh Kumar et al. [27] built a preconception-based GDM predictor to enable early intervention and assessed the associations of the top predictors with GDM and adverse birth outcomes. Participants were recruited from multi-ethnic groups (Chinese, Malay, Indian, or any combination of these three ethnicities). The study employed an evolutionary algorithm-based automated machine learning (AutoML) approach, incorporating the SHAP (SHapley Additive exPlanations) framework and TPOT (Tree-based Pipeline Optimization Tool). It devised a population-based predictive care solution, using an AutoML approach, to assess the risk of developing GDM among Asian women in the preconception period. While effective in some contexts, these algorithms proved insufficient for accurately assessing GDM risk in some ethnic groups of women. This study highlights the need for population-specific considerations when addressing GDM.
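To illustrate what Shapley feature attribution computes, the sketch below evaluates exact Shapley values for a small toy risk score by averaging each feature's marginal contribution over every ordering of the features. This is an assumption-laden illustration: it does not use the SHAP library or CatBoost, the risk function, feature names, and values are hypothetical, and exact enumeration is only practical for a handful of features (libraries such as SHAP rely on faster algorithms or approximations).

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the reference individual
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to the real value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(features):
    """Hypothetical risk score with an interaction term (not from any reviewed study)."""
    return (0.02 * features["bmi"]
            + 0.01 * features["age"]
            + 0.3 * features["family_history"]
            + 0.005 * features["bmi"] * features["family_history"])


baseline = {"bmi": 23.0, "age": 28.0, "family_history": 0}  # hypothetical reference
woman = {"bmi": 31.0, "age": 35.0, "family_history": 1}     # hypothetical individual

phi = shapley_values(toy_risk, woman, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the individual's score and the baseline score, which is the property that makes Shapley values attractive for explaining individual risk predictions.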

Predictive models for specific cohorts

Yuhan et al. [28] conducted a randomised clinical trial applying machine learning techniques to develop a Clinical Decision Support System (CDSS). The objective was to predict the risk of GDM specifically in a high-risk group of women with overweight and obesity. The study employed both Random Forest and Logistic Regression models for prediction and produced a simple yet effective model for predicting the risk of GDM in the first trimester. Notably, the model achieved this without relying on blood examination indexes. Li-Li et al. [29] conducted a retrospective study to investigate the application of a machine learning algorithm for predicting GDM in early pregnancy, using the Random Forest regression algorithm. The model identified body weight at birth and the mother’s weight as strongly predictive variables for GDM. Other variables, such as colpomycosis, kidney disease, the number of births by the mother, regular menstruation, blood type, and hepatitis, consistently ranked among the top 20 most influential factors and were found to be linked to GDM in the study.
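As a minimal sketch of how a first-trimester risk model based only on non-blood features might be fitted, the example below trains a plain logistic regression by gradient descent. It is not the model of Yuhan et al. or Li-Li et al.: the two standardised features (BMI and age), the eight data rows, and all function names are hypothetical, and a Random Forest would normally be fitted with a dedicated library rather than by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, x):
    """Predicted probability of GDM for one feature vector."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_risk(weights, bias, x) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Hypothetical standardised features: (BMI z-score, age z-score); label 1 = GDM.
rows = [(-1.0, -0.8), (-0.6, 0.2), (-0.2, -0.4), (0.0, 0.6),
        (0.4, -0.2), (0.8, 0.4), (1.2, 0.0), (1.6, 1.0)]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

weights, bias = fit_logistic(rows, labels)
high_risk = predict_risk(weights, bias, (1.5, 0.5))
low_risk = predict_risk(weights, bias, (-1.0, -0.5))
print(f"weights={weights}, bias={bias:.3f}")
print(f"high-risk profile: {high_risk:.2f}, low-risk profile: {low_risk:.2f}")
```

The fitted coefficients play the role that feature importances play in a Random Forest: they indicate which inputs drive the predicted risk, although only for linear effects.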

Clinical data and treatment modality

Lauren et al. [25] conducted a population-based cohort study to investigate whether clinical data at different stages of pregnancy could predict the treatment modality for GDM. The study focused on predicting the risk of requiring pharmacologic treatment beyond medical nutrition therapy (MNT) in pregnant women diagnosed with GDM. It employed transparent and ensemble machine learning methods for predictive modelling, incorporating LASSO regression and a super learner. The super learner included classification and regression tree, LASSO regression, random forest, and extreme gradient boosting algorithms. The findings demonstrated reasonably high predictability for GDM treatment modality at GDM diagnosis, which was maintained at 1 week post-diagnosis. In parallel, Jenny et al. [23] demonstrated the development of an innovative method for implementing proportionate care delivery based on existing features within GDM clinics. For predictive modelling, the study employed linear and non-linear tree-based regression models, including XGBoost, evaluated with metrics such as MSE (Mean Squared Error), R2 (R-squared), and MAE (Mean Absolute Error). The findings suggest that such a machine learning-based stratification system could provide an effective and practical approach for tailoring care interventions based on existing features within GDM clinics, potentially improving patient outcomes and resource allocation.
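For readers less familiar with regression metrics, the sketch below computes MSE, MAE, and R2 for a small set of observed and predicted values. The numbers are hypothetical (they could stand for glucose readings in mmol/L) and are not taken from Jenny et al. or any other reviewed study.

```python
def regression_metrics(actual, predicted):
    """Return (MSE, MAE, R-squared) for paired observed and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)  # variance around the mean
    ss_residual = sum(e * e for e in errors)                # variance left unexplained
    r2 = 1 - ss_residual / ss_total
    return mse, mae, r2


# Hypothetical observed and predicted values (illustrative only).
actual = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]

mse, mae, r2 = regression_metrics(actual, predicted)
print(f"MSE={mse:.3f}, MAE={mae:.3f}, R2={r2:.3f}")
```

MSE penalises large errors more heavily than MAE, while R2 expresses the share of variance in the observed values that the model explains, so the three are usually reported together.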

Discussion

The studies reviewed here encompass various methodologies, underlining the multifaceted nature of GDM prediction. One striking trend within this collection of studies is the detailed comparison of machine learning algorithms. Algorithms like XGBoost and Logistic Regression have demonstrated their effectiveness in GDM prediction [29]. However, it is essential to recognise that there is no one-size-fits-all solution. While XGBoost displayed superiority in several studies, comprehending the strengths and weaknesses of different algorithms becomes crucial for optimising predictive models within various contexts.

The importance of early prediction for effective GDM management cannot be overstated, and it is evident in the emphasis placed on this aspect in the reviewed studies [25, 34] (Fig. 1). The rationale behind early prediction lies in the potential to initiate timely interventions and provide personalised care to pregnant women at risk of developing GDM. The complications associated with GDM can have profound and long-lasting effects on both the mother and child, making early detection a critical component of effective healthcare [35]. This emphasis is reflected in the proliferation of diverse models designed to forecast GDM risk during the early stages of pregnancy. The variety of models, exemplified by the comprehensive work of Gabriel Cubillos et al. [19], underscores the collective ambition within the scientific community to enhance the accuracy and reliability of GDM predictions. The study by Gabriel Cubillos and their team is particularly noteworthy as it prioritised early prediction and explored the potential of different machine learning models [19]. They expanded the toolkit for healthcare providers and researchers by developing and optimising twelve distinct models, each fine-tuned to deliver high prediction performance during the early stages of pregnancy. This multi-pronged approach allows for more comprehensive risk assessment, increasing the chances of timely interventions. The focus on early prediction is not only about identifying cases but also about developing a deeper understanding of the factors and variables that contribute to the development of GDM [36]. By emphasising the importance of early detection, these studies pave the way for tailoring interventions that can prevent or mitigate the impact of GDM. The ultimate goal is to improve maternal and fetal health outcomes by making proactive, personalised care a standard practice in obstetrics.

Fig. 1 Translating machine learning predictions into clinical interventions for gestational diabetes

Studies within this review underscore the importance of tailoring predictive models to specific populations and demographic groups when addressing the prediction and early detection of GDM [19, 23, 30]. These studies highlight that a one-size-fits-all approach is insufficient, and demographic-specific considerations are essential for constructing accurate predictive models. Mukkesh Kumar et al. [26] have made a particularly striking contribution by shedding light on the limitations of employing uniform guidelines for diverse populations, specifically emphasising the challenges faced by Asian women. Their findings reveal that traditional, broadly applicable guidelines may not adequately capture the unique risk factors and nuances associated with GDM in Asian populations. This study emphasises the necessity of considering ethnicity, genetics, and other demographic-specific factors when constructing predictive models for GDM. By doing so, healthcare providers can better identify at-risk individuals within these populations and tailor interventions and care strategies to their specific needs. Similarly, the research conducted by Yuhan Du et al. (2022) provides a compelling illustration of the potential for improving prediction accuracy by focusing on high-risk groups [23]. In this case, the study zeroes in on women who are overweight or obese, a demographic with a higher susceptibility to GDM. By developing a specialised clinical decision support system for this specific cohort, the study recognises the unique risk profile of these individuals. This targeted approach can enhance prediction accuracy, ensuring women at the highest risk receive the necessary attention, interventions, and care. These findings indicate the importance of healthcare equity, emphasising that predictive models must be sensitive to the diversity of the populations they serve. The one-size-fits-all approach is no longer adequate, as demographic factors significantly determine GDM risk. Future research and healthcare initiatives should account for these demographic-specific factors when designing predictive models, ultimately leading to more accurate risk assessment and better-tailored interventions.

Lauren et al. (2022) and Jenny et al. (2022) made substantial contributions to the field by emphasising the importance of integrating clinical data into the predictive models for GDM [23, 25]. These studies provide valuable insights into how leveraging clinical data can enhance the treatment and care delivery for individuals diagnosed with GDM, ultimately improving patient outcomes. The integration of clinical data into predictive models offers several crucial advantages. First and foremost, it enables healthcare providers to personalise and optimise the treatment and care for pregnant individuals diagnosed with GDM. By considering clinical data such as responsiveness to medical nutrition therapy, they can tailor interventions to each patient’s specific needs. This individualised approach is essential, as GDM management can vary significantly from one person to another [37]. Furthermore, incorporating clinical data fosters a more patient-centred approach to care. It ensures that the treatment plan aligns with the patient’s specific health profile, preferences, and response to interventions. This patient-centred approach can improve patient satisfaction, compliance, and overall well-being. Jenny et al. [23] introduced the concept of proportionate care delivery based on available clinical data. This innovative approach streamlines care and ensures that resources are allocated efficiently, addressing patients’ needs more effectively [30]. By leveraging existing clinical data, healthcare providers can identify individuals at risk of high blood glucose levels, enabling proactive intervention and reducing the likelihood of complications associated with uncontrolled GDM.

Nonetheless, it is essential to acknowledge that challenges persist within GDM prediction. A common challenge encountered is the extensive array of variables associated with GDM [

Limitations and strengths of review

This review explores various studies on predicting and detecting GDM through machine learning methods. It encompasses a wide range of study designs, population groups, and machine learning algorithms, providing an inclusive overview of this field’s current state of research. However, the studies included in this review span across different geographical regions and demographic profiles. While this diversity enriches the scope of the review, it can simultaneously limit the generalizability of findings. GDM risk factors and predictive models may exhibit variations among populations, and the review would benefit from a more thorough discussion of the implications arising from this variability. Additionally, this review primarily relies on studies published in English, which might introduce publication bias, potentially overlooking negative or inconclusive results less readily available in English literature.

Conclusion

The prediction and early detection of GDM through machine learning techniques is a dynamic and evolving field. This review presents significant findings and trends across diverse studies, shedding light on the potential and challenges within this domain. The significance of early prediction in facilitating effective GDM management is striking, with numerous studies committed to crafting models capable of identifying GDM risk in the early stages of pregnancy. XGBoost emerged prominently as a consistent performer, showcasing superior predictive capabilities across various cohorts and time points. These models create opportunities for timely interventions and personalised care, ultimately improving outcomes for both mothers and infants. Nevertheless, the challenges at hand are notable. The vast array of variables associated with GDM poses a substantial hurdle in the quest for accurate prediction models. The selection and weighting of these variables remain intricate tasks, necessitating ongoing research and innovation in feature engineering. Furthermore, the emphasis on tailoring predictive models to specific populations, evident in studies focusing on Asian women or high-risk groups, underscores the importance of demographic-specific considerations. Predictive models must adapt to these groups’ unique characteristics and risk factors. The practicality of implementing proportionate care delivery based on readily available clinical data underscores the value of leveraging existing resources effectively. As technology and healthcare data continue to advance, there is an opportunity for future research to harness real-time data from wearable devices and genetic information to enhance predictive models further. These emerging data sources could revolutionise GDM prediction and early intervention. Focusing on patient-centred outcomes and exploring the role of social determinants in GDM prediction can deepen our understanding of this condition and pave the way for more comprehensive and effective management strategies that consider both medical variables and the broader contexts in which GDM occurs. This review offers valuable insights and directions for future studies in GDM prediction through machine learning techniques.