Introduction

In vitro fertilization and embryo transfer (IVF-ET), the primary method for treating infertility, offers the majority of infertile couples hope of conceiving their own children. With advances in assisted reproductive technology, the success rate of IVF-ET has improved significantly. Research indicates that transferring high-quality embryos can enhance clinical pregnancy and live birth rates [1,2,3,4,5], whereas low-quality embryos may increase the risk of miscarriage [6]. Embryo quality also has a significant impact on implantation success in natural cycle in vitro fertilization (NC-IVF) [5]. Furthermore, the transfer of non-optimal embryos has been associated with a higher incidence of ectopic pregnancy [7]. Therefore, transferring high-quality embryos can improve clinical pregnancy outcomes.

During ovarian stimulation in IVF-ET, poor ovarian response (POR) is a primary cause of adverse pregnancy outcomes. POR is characterized by high cycle cancellation rates, increased gonadotropin (Gn) dosage, fewer oocytes retrieved, suboptimal oocyte quality, and lower clinical pregnancy rates. According to the POSEIDON criteria, which are based on age, antral follicle count (AFC), and anti-Müllerian hormone (AMH), patients are divided into two main categories: ‘unexpected’ poor responders (groups 1 and 2) and ‘expected’ poor responders (groups 3 and 4). POR limits the success of treatment with assisted reproductive technologies (ART) [8], especially for patients in POSEIDON groups 3 and 4. These patients face a high risk of not producing high-quality embryos suitable for transfer, which often leads to multiple ovarian stimulation attempts. This not only increases the physical and emotional strain but also escalates the financial burden [9]. The Progestin Primed Ovarian Stimulation (PPOS) regimen has been clinically confirmed to effectively suppress the luteinizing hormone (LH) peak, with no adverse effect on the quality of oocytes and embryos, making it a safe and effective protocol [10,11,12]. Thus, as controlled ovarian hyperstimulation (COH) trends towards simplification and individualization, the PPOS protocol has gained widespread clinical application.

Machine learning (ML), a subset of artificial intelligence technologies, employs algorithms that adapt and improve by continuously processing tasks and accumulating experience, thereby adjusting parameters automatically without explicit programming [13]. ML encompasses various methodologies, including Logistic Regression (LR), Support Vector Machines (SVM), Decision Trees, Random Forests (RF), Neural Networks (NN), and Naive Bayes (NB). Some studies have confirmed that machine learning approaches can achieve better predictive performance than traditional statistical methods [14, 15]. As ML technology evolves, it promises to enhance IVF success rates by aiding clinical decision-making and predicting reproductive outcomes [16, 17].

However, research remains limited on the factors influencing the formation of high-quality embryos in POSEIDON ‘expected’ patients undergoing the PPOS protocol. This study aims to explore these factors and utilize machine learning techniques to establish a predictive model for ovulation induction outcomes based on individual patient characteristics.

Materials and methods

Patients

This was a retrospective cohort study conducted at Sichuan **xin **nan Women and Children’s Hospital (China). All fresh IVF cycles performed in infertile couples from January 2015 to December 2021 were reviewed for possible inclusion. Inclusion criteria: (1) meeting the POSEIDON criteria for expected low prognosis, defined as AFC < 5 and AMH < 1.2 ng/mL; (2) ovarian stimulation with the PPOS regimen. Exclusion criteria: (1) the presence of reproductive or endocrine system disorders such as endometriosis, uterine fibroids, adenomyosis, polycystic ovary syndrome, and thyroid function abnormalities; (2) use of donor oocytes or no oocytes retrieved; (3) chromosomal abnormalities in one or both partners; (4) missing data. The analysis of high-quality cleavage embryos covers 4,216 cycles. The analysis of high-quality blastocysts is restricted to the 1,924 cycles in which all embryos were cultured to the blastocyst stage (Fig. 1).
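The inclusion rule can be illustrated with a short Python sketch. It is not the authors' code: the `Cycle` record and its field names are hypothetical, and only the AFC and AMH cut-offs, the PPOS requirement, and exclusion criterion (2) are encoded. The remaining exclusion criteria (comorbidities, chromosomal abnormalities, missing data) are omitted for brevity.

```python
from dataclasses import dataclass

# Thresholds taken from the inclusion criteria stated above.
AFC_CUTOFF = 5            # antral follicle count, strict "<"
AMH_CUTOFF_NG_ML = 1.2    # anti-Müllerian hormone in ng/mL, strict "<"


@dataclass
class Cycle:
    """One IVF cycle; field names are hypothetical, not from the study database."""
    afc: int
    amh_ng_ml: float
    protocol: str
    oocytes_retrieved: int
    donor_oocytes: bool = False


def is_eligible(cycle: Cycle) -> bool:
    """Return True when a cycle passes the subset of rules sketched here."""
    expected_por = cycle.afc < AFC_CUTOFF and cycle.amh_ng_ml < AMH_CUTOFF_NG_ML
    uses_ppos = cycle.protocol == "PPOS"
    # Exclusion criterion (2): donor oocytes used or no oocytes retrieved.
    excluded = cycle.donor_oocytes or cycle.oocytes_retrieved == 0
    return expected_por and uses_ppos and not excluded


cycles = [
    Cycle(afc=3, amh_ng_ml=0.8, protocol="PPOS", oocytes_retrieved=2),
    Cycle(afc=5, amh_ng_ml=0.8, protocol="PPOS", oocytes_retrieved=4),  # AFC is not < 5
    Cycle(afc=2, amh_ng_ml=0.5, protocol="PPOS", oocytes_retrieved=0),  # no oocytes retrieved
]
eligible = [c for c in cycles if is_eligible(c)]
print(len(eligible))
```

Both thresholds are strict inequalities, so a cycle with AFC of exactly 5 or AMH of exactly 1.2 ng/mL is not classified as expected POR in this sketch.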

Fig. 1 Patient inclusion flowchart

Variables

We collected the following clinical information from patients: duration of infertility, type of cycle (initial or repeated), nature of infertility (primary or secondary), age, body mass index (BMI), basal follicle-stimulating hormone (FSH), basal estradiol (E2), basal progesterone (P), basal LH, FSH/LH, AFC, AMH, FSH on human chorionic gonadotropin (HCG) day, LH on HCG day, P on HCG day, E2 on HCG day, number of follicles ≥ 14 mm on HCG day, total dose of Gn, dosing days of Gn, number of oocytes retrieved, MII oocytes, and 2PN fertilized oocytes. In addition to these variables, we incorporated delta FSH, delta LH, delta P, and delta E2, which represent the changes in hormone levels from the initiation day to the HCG day. In the analysis of high-quality blastocysts, we also collected data on the number of cleavage embryos and high-quality cleavage embryos.
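As an illustration of the delta variables, the Python sketch below computes each change as the HCG-day value minus the initiation-day value. The sign convention is an assumption (the text only says "changes from the initiation day to the HCG day"), and the hormone values are made up for the example.

```python
def hormone_delta(initiation_value: float, hcg_day_value: float) -> float:
    """Change from the initiation day to the HCG day.

    Assumed convention: HCG-day value minus initiation-day value,
    so a rising hormone gives a positive delta.
    """
    return hcg_day_value - initiation_value


# Hypothetical measurements for one cycle: (initiation day, HCG day).
measurements = {
    "FSH": (9.5, 14.0),
    "LH": (5.2, 3.1),
    "P": (0.4, 0.9),
    "E2": (45.0, 620.0),
}

deltas = {
    f"delta_{name}": round(hormone_delta(start, end), 2)
    for name, (start, end) in measurements.items()
}
print(deltas)
```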

Clinical protocol

Following previously described protocols [18, 19], patients received oral medroxyprogesterone acetate (MPA, **anju Pharmaceutical, Zhejiang, China) at a dosage of 6–10 mg/day from the 2nd to the 5th days of the menstrual cycle. On the same days, FSH (Urofollitropin, Lizhu Pharmaceutical Group, Shanghai, China) was injected at a dosage of 150–300 IU per day. During the ovarian stimulation period, follicle development and serum levels of LH and E2 were monitored to adjust the dose of Gn. When at least one follicle reached a diameter of ≥ 17 mm, an injection of HCG (Merck Serono, Switzerland or Lizhu Pharmaceutical, China) and/or a GnRH agonist (Ferring Pharmaceutical, Switzerland) was used as the trigger. Oocyte retrieval was then performed 34–36 h later.

Embryo quality assessment

Cleavage embryos were assigned a subjective score based on the regularity or symmetry of blastomere size, the quality of the cytoplasm, and the degree of embryonic fragmentation, in accordance with the specifications of Cummins et al. [20]. Thus, a badly fragmented, irregularly cleaved embryo with a patchy or grainy cytoplasm would be assessed as grade IV, whereas an embryo of the highest quality would be assessed as grade I. In the present study, grade I and II embryos were defined as high quality; grade III and IV embryos were defined as low quality. For blastocyst evaluation, we used the Gardner [21] criteria to grade blastocysts into six categories based on the size of the blastocoel, the development of the inner cell mass, and the trophectoderm. Grades 1–3 represent lower quality, while grades 4–6 indicate higher quality.
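The grading rules above translate into a binary outcome roughly as follows. This is a minimal Python sketch with hypothetical function names. It assumes that a cycle counts as positive when at least one of its embryos is high quality, which the text implies but does not state explicitly.

```python
# Grades defined as high quality in the text above.
HIGH_QUALITY_CLEAVAGE_GRADES = {"I", "II"}
VALID_CLEAVAGE_GRADES = {"I", "II", "III", "IV"}


def is_high_quality_cleavage(grade: str) -> bool:
    """Cleavage embryos: grades I and II are high quality, III and IV are not."""
    if grade not in VALID_CLEAVAGE_GRADES:
        raise ValueError(f"unknown cleavage grade: {grade!r}")
    return grade in HIGH_QUALITY_CLEAVAGE_GRADES


def is_high_quality_blastocyst(grade: int) -> bool:
    """Blastocysts: grades 4-6 are high quality, grades 1-3 are not."""
    if not 1 <= grade <= 6:
        raise ValueError(f"blastocyst grade must be 1-6, got {grade}")
    return grade >= 4


def cycle_outcome(embryo_flags: list[bool]) -> int:
    """Assumed cycle-level label: 1 if any embryo is high quality, else 0."""
    return int(any(embryo_flags))


cleavage_grades = ["III", "II", "IV"]
label = cycle_outcome([is_high_quality_cleavage(g) for g in cleavage_grades])
print(label)
```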

Statistical analysis

All statistical analyses and model building were conducted using R software (version 4.3.1) with the packages randomForest, nnet, xgboost, e1071, rpart, caret, pROC, and ggplot2. The data set was randomly divided into a training set and a validation set (4:1). Quantitative data are presented as mean (SD) and qualitative data as numbers (percentages). The t-test was used for normally distributed continuous variables, the Mann–Whitney U test for non-normally distributed continuous variables, and the Chi-squared test for categorical variables. Univariate and multivariate logistic regression analyses were conducted to identify factors influencing the formation of high-quality embryos. Statistical significance was set at P < 0.05. Subsequently, predictive models were constructed using LR, RF, NN, XGBoost, SVM, NB, Decision Trees, and K-Nearest Neighbors (KNN). Model training used ten-fold cross-validation and grid search to determine the optimal parameters for each algorithm. The performance of each model was evaluated using the area under the receiver operating characteristic curve (AUC).
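The study's pipeline was built in R and is not reproduced here. The following standard-library Python sketch only illustrates two of the steps named above: a random split into training and validation sets, and the AUC computed as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count one half). The function names, the seed, and the validation fraction argument are illustrative assumptions.

```python
import random


def split_train_validation(n_samples: int, validation_fraction: float, seed: int = 0):
    """Randomly partition sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_validation = round(n_samples * validation_fraction)
    validation = sorted(indices[:n_validation])
    training = sorted(indices[n_validation:])
    return training, validation


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


training, validation = split_train_validation(n_samples=10, validation_fraction=0.2)
print(len(training), len(validation))

# Toy predictions: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of cases, which is fine for illustration but slower than the rank-based implementations used in practice.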

Results

High-quality cleavage embryos

Analysis of baseline information

From January 2015 to December 2021, 4,216 cycles that used the PPOS protocol and met the POSEIDON criteria for expected POR were included in this study. These cycles were allocated to either the training set (N = 3372) or the validation set (N = 844) for model establishment and validation. The baseline characteristics are shown in Table 1. In the training set, 1555 cycles (46.1%) achieved high-quality cleavage embryos. There were no statistically significant differences in cycle type, infertility type, duration of infertility, age, BMI, basal FSH, basal LH, FSH/LH, basal P, basal E2, AFC, AMH, or the formation of high-quality cleavage embryos between the training set and the validation set (P > 0.05).

Table 1 Baseline characteristics of study population

Differences in fourteen factors between high-quality and non-high-quality cleavage embryo groups

The cycles were divided into two groups based on whether high-quality cleavage embryos were obtained: the high-quality cleavage embryo group (N = 1950) and the non-high-quality cleavage embryo group (N = 2266). In the high-quality cleavage embryo group, AFC, AMH, P on HCG day, E2 on HCG day, delta E2, and the number of follicles ≥ 14 mm on HCG day were significantly higher than in the non-high-quality cleavage embryo group (P < 0.05). Women who obtained high-quality cleavage embryos used more Gn for longer durations and had more retrieved oocytes, MII oocytes, and 2PN fertilized oocytes (P < 0.05). These findings suggest that patients in the high-quality cleavage embryo group exhibit higher ovarian responsiveness and better ovarian reserve (Table 2).

Table 2 Comparison between the high-quality cleavage embryo group and the non-high-quality cleavage embryo group

Fourteen factors are associated with the formation of high-quality cleavage embryos

The overall rate of obtaining high-quality cleavage embryos was 46.3%. In the univariate logistic regression analysis, fourteen factors were associated with the formation of high-quality cleavage embryos: basal FSH, basal LH, AFC, AMH, LH on HCG day, E2 on HCG day, delta LH, delta E2, number of follicles ≥ 14 mm on HCG day, total dose of Gn, dosing days of Gn, number of retrieved oocytes, MII oocytes, and 2PN fertilized oocytes (P < 0.05). After adjusting for confounding factors, basal LH, delta LH, number of retrieved oocytes, and 2PN fertilized oocytes were identified as independent predictors of obtaining high-quality cleavage embryos (P < 0.05) (Table 3).

Table 3 Univariate and multivariate logistic regression analysis

Construction and evaluation of the prediction model

The factors significantly associated with the formation of high-quality cleavage embryos were selected for the construction of predictive models (Table 7). The first model (M1) did not include the number of retrieved oocytes, MII oocytes, and 2PN fertilized oocytes; the second model (M2) added these three variables. The performance evaluation and ROC curves of the different algorithms are shown in Table 8 and Fig. 2. In M1, the AUC values of all algorithms were unsatisfactory, with XGBoost performing slightly better than the others (AUC = 0.672, 95% CI = 0.636–0.708). In M2, performance was comparable across algorithms, with RF achieving the highest AUC (AUC = 0.788, 95% CI = 0.759–0.818). In both M1 and M2, the KNN model performed markedly worse than the other algorithms.

Fig. 2 (A) ROC curve of M1; (B) ROC curve of M2; (C) ROC curve of M3

High-quality blastocysts

Analysis of baseline information

Cycles in which all embryos were cultured to the blastocyst stage were included in this analysis (N = 1924). These cycles were divided into a training set (N = 1539) and a validation set (N = 385) for the development and validation of the model. Baseline characteristics are shown in Table 4. In the training set, 97 cycles (6.3%) obtained high-quality blastocysts. There were no statistically significant differences between the training and validation sets in terms of cycle type, infertility type, duration of infertility, age, BMI, basal FSH, basal LH, FSH/LH, basal P, basal E2, AFC, AMH, or the formation of high-quality blastocysts (P > 0.05).

Table 4 Baseline characteristics of study population

Differences in seventeen factors between high-quality and non-high-quality blastocyst groups

The cycles were divided into two groups based on whether high-quality blastocysts were obtained: the high-quality blastocyst group (N = 124) and the non-high-quality blastocyst group (N = 1800). In the high-quality blastocyst group, basal P, AFC, AMH, FSH on HCG day, P on HCG day, E2 on HCG day, delta FSH, delta E2, number of follicles ≥ 14 mm on HCG day, retrieved oocytes, MII oocytes, 2PN fertilized oocytes, cleavage embryos, and high-quality cleavage embryos were all significantly higher than in the non-high-quality blastocyst group. Conversely, age, BMI, and LH on HCG day were significantly lower in the high-quality blastocyst group (P < 0.05) (Table 5).

Table 5 Comparison between the high-quality blastocyst group and the non-high-quality blastocyst group

Thirteen factors are associated with the formation of high-quality blastocysts

The overall rate of obtaining high-quality blastocysts was 6.40%. AFC, E2 on HCG day, delta E2, number of follicles ≥ 14 mm on HCG day, retrieved oocytes, MII oocytes, and 2PN fertilized oocytes are not only influencing factors for high-quality cleavage embryos but also for high-quality blastocysts (P < 0.05). In addition, age, BMI, basal P, delta FSH, number of cleavage embryos, and high-quality cleavage embryos are also influencing factors for high-quality blastocysts (P < 0.05). After adjusting for confounding factors, age, AFC, number of 2PN fertilized oocytes, cleavage embryos, and high-quality cleavage embryos were identified as independent predictors for the formation of high-quality blastocysts (P < 0.05) (Table 6).

Table 6 Univariate and multivariate logistic regression analysis

Construction and evaluation of the prediction model

Factors significantly associated with the formation of high-quality blastocysts were used to construct the third model, M3 (Table 7). The performance of each algorithm and the ROC curves are shown in Table 8 and Fig. 2. As with the models predicting high-quality cleavage embryos, the KNN model performed far below the acceptable range. Among the other algorithms, XGBoost achieved the best performance (AUC = 0.813, 95% CI = 0.741–0.884).

Table 7 Factors included in model construction
Table 8 Performance of three predictive models

Discussion

Selecting high-quality embryos is crucial for successful pregnancy. In our study, we established three predictive models: M1 and M2 predict the formation of high-quality cleavage embryos, and M3 predicts the formation of high-quality blastocysts. The performance of these models demonstrates that they have predictive value for the formation of high-quality cleavage embryos or blastocysts. However, all algorithms in M1 performed poorly, with AUC values below acceptable levels. This could be attributed to the composition of our cohort or to the factors included. Incorporating three additional variables (retrieved oocytes, MII oocytes, and 2PN fertilized oocytes) into M2 improved performance substantially, with a notable increase in AUC. This suggests that these variables play a crucial predictive role in the formation of high-quality cleavage embryos.

Our research found that indicators such as age, AMH, AFC, FSH, and LH are associated with the formation of high-quality embryos. Previous research [22] has shown that as women age, the incidence of aneuploidy in oocytes and embryos increases and embryo quality declines. These changes result in fewer viable embryos and an increased risk of miscarriage. Several studies [22,23,24,25,26,27,28,29,30,31,32,33] indicate that prognosis worsens with advancing age and with changes in baseline ovarian markers, such as lower AMH and AFC and higher basal FSH. High E2 levels are common during COH, and elevated E2 may affect embryo quality and, in turn, pregnancy outcomes in IVF [34,35,36,37]. One study [34] showed that a decline of more than 30% in donor serum E2 levels during ovarian stimulation adversely affected the quality of recipient embryos. A decrease in E2 levels adversely impacts embryo quality, leading to reduced clinical pregnancy rates, reduced ongoing pregnancy rates, and an increased rate of early miscarriage [38]. Our study aligns with previous findings: for both high-quality cleavage embryos and blastocysts, AFC, AMH, E2 on HCG day, and delta E2 were significantly higher in the high-quality groups than in the non-high-quality groups. While age showed no significant difference between the high-quality and non-high-quality cleavage embryo groups, it differed significantly between the blastocyst groups. This suggests that for older patients with expected POR, the decision to culture all embryos to the blastocyst stage should be made with caution.

The two-cell theory suggests that normal follicular growth and maturation require both LH and FSH, and that the levels and ratios of these hormones are critical at different points in the menstrual cycle [39]. Fluctuations in LH levels during the follicular phase significantly impact the morphological and functional changes of the oocytes, thereby affecting their meiotic state and the fertilization capability of the zygote [40]. A prospective study [41] indicated that a decrease in LH levels during controlled ovarian stimulation was associated with a decline in oocyte and embryo quality. Previous studies on long and antagonist protocols [42, 43] indicated that LH levels below 0.5 IU/L or 1.0 IU/L on the day of triggering are associated with reduced oocyte retrieval rates and fewer high-quality embryos. In our study, by contrast, the average LH levels in all four groups of patients were higher than these thresholds, and stimulation followed the PPOS protocol. This might explain the different outcomes observed in our study. Additionally, research has demonstrated that basal FSH levels are correlated with overall ovarian responsiveness [44]. Although basal FSH levels did not differ significantly between the high-quality cleavage/blastocyst groups and the non-high-quality groups, basal FSH was significantly associated with the formation of high-quality cleavage embryos in the univariate logistic regression analysis. This discrepancy may be due to our target population consisting of expected rather than unexpected POR patients.

Ovarian stimulation that induces multi-follicular growth can lead to the collection of multiple oocytes. Several studies have suggested that a higher number of retrieved oocytes is associated with improved outcomes [45,46,47], whereas other research has posited that an increase in the number of retrieved oocytes is correlated with a decline in oocyte quality, subsequently leading to embryos with reduced developmental potential [48]. Our results indicate that the number of oocytes retrieved in the high-quality cleavage/blastocyst groups was significantly higher than in the non-high-quality groups, suggesting that a greater number of retrieved oocytes increases the likelihood of obtaining high-quality embryos. However, in the multivariate logistic regression, a higher number of retrieved oocytes was independently associated with lower odds of obtaining high-quality cleavage embryos (OR = 0.805, 95% CI = 0.701–0.925). Taken together, these findings suggest that there may be an optimal range of oocyte numbers that enhances embryo quality and optimizes live birth rates. The number of 2PN fertilized oocytes reflects, to a certain extent, the quality of oocytes, laboratory culture conditions, and operational techniques. High-quality embryos originate from a good ovarian response, and the number of 2PN fertilized oocytes can effectively reflect the quality of both sperm and oocytes, significantly influencing the formation of high-quality embryos [49]. Therefore, clinicians should thoroughly evaluate ovarian reserve before treatment, individualize ovarian stimulation, and strive to improve oocyte quality and increase the number of 2PN fertilized oocytes to enhance the rate of high-quality embryo formation.
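To make the reported odds ratio concrete, the Python sketch below converts OR = 0.805 (95% CI 0.701–0.925) back to the log-odds scale and shows how the odds scale across several units of the predictor. It rests on assumptions not stated in the text: that the OR is expressed per one additional retrieved oocyte, that the interval is a Wald-type interval symmetric on the log scale, and that the effect is log-linear, as a logistic model implies. The function names are illustrative.

```python
import math


def log_odds_summary(odds_ratio: float, ci_low: float, ci_high: float, z: float = 1.96):
    """Recover the coefficient and its standard error from an OR and its 95% CI.

    Assumes a Wald interval: exp(beta - z * se) to exp(beta + z * se).
    """
    beta = math.log(odds_ratio)
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, standard_error


def odds_multiplier(odds_ratio: float, units: int) -> float:
    """Multiplicative change in odds for a given number of additional units."""
    return odds_ratio ** units


beta, standard_error = log_odds_summary(0.805, 0.701, 0.925)
print(f"beta = {beta:.3f}, SE = {standard_error:.3f}")

# Under the per-oocyte assumption, three additional oocytes multiply the odds by OR**3.
print(f"odds multiplier for +3 units = {odds_multiplier(0.805, 3):.3f}")
```

Because the coefficient is negative after adjustment while the unadjusted group comparison points the other way, the sign likely depends on the other covariates in the model (for example, the number of 2PN fertilized oocytes), which is consistent with the cautious interpretation given above.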

ML, which enables the interpretation of data and the construction of prediction models, has been increasingly utilized in clinical settings, particularly within complex systems involving multiple variables [50, 51]. To our knowledge, our study is the first to employ various machine learning methods, utilizing patient clinical characteristics and laboratory data, to establish predictive models for high-quality embryo formation in expected POR patients undergoing the PPOS protocol. In M1, M2, and M3 alike, the models built using the KNN method consistently underperformed compared to the other machine learning techniques, suggesting that our data might not be suitable for the KNN method. In contrast, XGBoost performed well across M1, M2, and M3. XGBoost is a tree-based algorithm that predicts by constructing multiple decision trees. It is naturally robust to outliers in the predictors, meaning that outliers are less likely to substantially affect the choice of split points. This robustness could be an important factor in its strong performance across the models.
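The robustness argument can be illustrated with a single decision split. The Python sketch below is not XGBoost (which scores splits with a gradient-based gain) and does not reproduce the study's models. It uses a plain Gini-impurity stump on one made-up feature to show the general property of tree splits: the chosen partition depends only on the ordering of the values, so making an extreme value even more extreme leaves the partition unchanged.

```python
def gini(labels: list[int]) -> float:
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_partition(values: list[float], labels: list[int]):
    """Return the set of sample indices sent to the left by the best Gini split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_score, best_left = None, None
    for cut in range(1, len(order)):
        # A threshold cannot separate two samples with identical values.
        if values[order[cut - 1]] == values[order[cut]]:
            continue
        left, right = order[:cut], order[cut:]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(order)
        if best_score is None or score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


# Made-up feature values and outcomes for six cycles.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 0, 1, 1, 1]

# Same data, but the largest value is replaced by an extreme outlier.
values_with_outlier = values[:-1] + [600.0]

same_partition = (
    best_split_partition(values, labels)
    == best_split_partition(values_with_outlier, labels)
)
print(same_partition)
```

Distance-based methods such as KNN use the raw magnitudes of the features, so the same change would alter their neighbourhoods. Whether this explains the KNN results in the present data is not something the sketch can establish.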

Several limitations of this study warrant attention. First, it is a retrospective study based on data from a single center, so there is a certain risk of bias, and the collected data inevitably contain human errors. Second, the study included a limited set of clinical features and did not conduct stratified analyses of factors such as semen quality. Future efforts will aim to expand the range of predictive factors screened to further optimize the model. Additionally, this model was developed based on patients with POR and may not be applicable to other groups. Finally, although the formation of high-quality embryos can reflect embryo quality, it does not fully represent pregnancy outcomes. Future studies could expand the sample size and undertake prospective, multicenter research to provide references for the clinical treatment of infertility.

Conclusion

In summary, our study identified basal LH, delta LH, the number of retrieved oocytes, and the number of 2PN fertilized oocytes as independent factors influencing the formation of high-quality cleavage embryos, while age, AFC, and the numbers of 2PN fertilized oocytes, cleavage embryos, and high-quality cleavage embryos were independent factors for high-quality blastocysts. Additionally, we integrated readily available predictive variables such as E2 on HCG day, delta E2, delta LH, retrieved oocytes, MII oocytes, and 2PN fertilized oocytes to construct predictive models. These models can be used to forecast the formation of high-quality cleavage embryos and blastocysts in women with POR undergoing treatment with the PPOS protocol.