Introduction

Type 2 diabetes mellitus (T2DM), hypertension and hyperlipidemia are chronic conditions that can result in severe complications1,2, including cardiovascular disease (CVD), the leading cause of death worldwide3,4. Unfortunately, a large number of patients with these conditions remain undiagnosed. The most recent estimates indicate that in 2019, 50.1% (231.9 million) of people with diabetes worldwide were still undiagnosed5. A large proportion of people with hypertension6 and hyperlipidemia7 are also unaware of their condition, particularly in low- and middle-income countries8.

Screening of those at risk of chronic conditions is important for both individuals and wider society, yet there are gaps in current practice. Early identification is generally based on commonly collected risk factors, such as age, gender, smoking status, body mass index (BMI) and family history. A number of societies and task forces have recommended screening guidelines based on these risk factors9,10,11; however, there are growing concerns that such guidelines might be inadequate and inaccurate12,13,14,15. For example, the American Diabetes Association (ADA) and the US Preventive Services Task Force (USPSTF) guidelines have shown only fair performance when externally validated12,13. Furthermore, a trial exploring the effectiveness of a population-based screening programme in the United Kingdom found that screening was not associated with a reduction in all-cause mortality over a median period of 9.6 years15. A number of commonly used screening functions have also been shown to be ineffective in population screening16.

Including novel or additional factors might help to identify high-risk groups more accurately and has the potential to improve current screening guidelines for chronic conditions, in terms of both effectiveness and efficiency. Indeed, a growing body of studies has identified a strong relationship between women’s reproductive factors (RFs) and chronic conditions. For example, early menarche has been found to be associated with an increased risk of T2DM17,18, obesity and insulin resistance19. Moreover, a retrospective study conducted in Europe showed that, after adjustment for confounding, early menopause and a shorter reproductive life span were associated with T2DM20. A Japanese study found a similar relationship for hypercholesterolemia21. Furthermore, the China Kadoorie Biobank study reported that Chinese women with late menopause (≥53 years) were 1.21 (95% CI: 1.03–1.42) times more likely to have T2DM than women with menopause at 46–52 years old (p < 0.0001)22. Another Chinese study found that a higher number of live births was associated with hypertension and DM, and that this association was mediated by lifestyle and dyslipidemia55. RFs also have the notable advantages of high accessibility and low cost. Furthermore, studies have shown that self-reported RFs have good validity and reproducibility56. Therefore, RFs have potential as screening tools for chronic conditions and could improve current screening guidelines.

A number of important limitations need to be considered. First, this is a cross-sectional study and hence causality cannot be inferred. Second, this study only tested whether RFs could be incorporated into a screening tool and did not estimate the actual sensitivity and specificity of RFs in screening for chronic conditions. In further research, a structured screening tool should be developed and externally validated. Third, although not included in the current study, uric acid and HbA1c are also crucial biomarkers for chronic conditions, and it is important that future research takes them into account. Last, interpretability is always a key concern when applying machine learning to medical data analysis. Many advanced methods have been proposed to open the black box of neural networks. In future work, we hope to focus on this specific question and explore the GCC more comprehensively.
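To illustrate the two quantities that a future validation study would need to report, the following minimal Python sketch computes sensitivity and specificity for a binary screening rule. It is not part of the present analysis: the function name `sensitivity_specificity` and the toy data are hypothetical and serve only to show the definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    has_condition: Sequence[bool], screen_positive: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a binary screening rule."""
    if len(has_condition) != len(screen_positive):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(has_condition, screen_positive))
    # Count the four cells of the 2x2 screening table.
    tp = sum(1 for truth, flag in pairs if truth and flag)
    fn = sum(1 for truth, flag in pairs if truth and not flag)
    tn = sum(1 for truth, flag in pairs if not truth and not flag)
    fp = sum(1 for truth, flag in pairs if not truth and flag)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, NOT taken from the study:
# 4 women with the condition, 6 without.
has_condition = [True, True, True, True,
                 False, False, False, False, False, False]
# Result of an assumed RF-based screening rule for the same women.
screen_positive = [True, True, True, False,
                   True, False, False, False, False, False]

sens, spec = sensitivity_specificity(has_condition, screen_positive)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In an external validation, the same calculation would be applied to an independent cohort, ideally together with confidence intervals for both estimates.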

To conclude, the autoencoder performed well in the dimensionality reduction of clinical biomarkers, demonstrating its potential for further medical data processing. Women with an earlier age at menarche and a shorter reproductive life span are more likely to suffer from chronic conditions. Owing to their high accessibility and effectiveness, RFs show potential to be included in preliminary screening tools for general chronic conditions in clinical practice and could enhance current screening guidelines.