Introduction

Non-alcoholic fatty liver disease (NAFLD) shares a common pathophysiology with type 2 diabetes, obesity, dyslipidemia, and cardiovascular disease (CVD) [1]. Recently, a ‘multiple hit model’ has been accepted as a reasonable hypothesis for explaining the pathophysiology of NAFLD [2]. A sedentary lifestyle, poor eating habits, genetic factors, and epigenetic factors interact and synergistically modulate individual risk of NAFLD development.

Non-high-density lipoprotein (non-HDL) cholesterol, calculated by subtracting the high-density lipoprotein (HDL) cholesterol concentration from the total serum cholesterol concentration, is a strong predictor of CVD, which is the second most common cause of death in patients with NAFLD [3,4,5]. Although the influence of non-HDL cholesterol on CVD incidence has been established, data on the association between non-HDL cholesterol and NAFLD are lacking. A previous epidemiologic study showed that the non-HDL cholesterol level has higher predictive power for incident NAFLD than the levels of total cholesterol, low-density lipoprotein (LDL) cholesterol, triglycerides, and HDL cholesterol [6]. In that study, new-onset NAFLD developed in 20.8% of people with a non-HDL cholesterol level between 130 and 160 mg/dL and in 24.6% of those with a level > 160 mg/dL, whereas people with a level < 130 mg/dL did not develop NAFLD [6]. However, that study has a potential limitation: it used only a single spot measurement of non-HDL cholesterol, even though the non-HDL cholesterol level changes over time. Maintaining a lower non-HDL cholesterol level has been suggested as the best strategy for the management of CVD [7]; it is therefore important to determine whether changes in non-HDL cholesterol over time predict the incidence of NAFLD.

In a previous genome-wide association study (GWAS) of NAFLD, genetic variants related to pathogenesis and prognosis were discovered through various methods [8]. In particular, patatin-like phospholipase domain-containing 3 (PNPLA3) [

Fig. 1 Flow chart of the study population

Data collection

Each participant’s height (cm) and weight (kg) were measured to the nearest 0.1 cm and 0.1 kg, respectively. Body mass index (BMI, kg/m²) was calculated as weight divided by the square of height. Waist circumference (WC, cm) was measured to the nearest 0.1 cm in the horizontal plane midway between the lowest rib and the iliac crest. Systolic blood pressure (SBP) and diastolic blood pressure (DBP) were each defined as the average of the last two measured values; the mean blood pressure (MBP) was also calculated.
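The anthropometric and blood pressure calculations above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and the MBP formula (DBP plus one third of the pulse pressure) is a common convention assumed here because the text does not state which formula was used.

```python
from typing import Sequence


def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """Return BMI in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def mean_of_last_two(readings: Sequence[float]) -> float:
    """Return the average of the last two readings in a series."""
    if len(readings) < 2:
        raise ValueError("at least two readings are required")
    return (readings[-2] + readings[-1]) / 2.0


def mean_blood_pressure(sbp: float, dbp: float) -> float:
    """Return MBP using an ASSUMED formula: DBP + (SBP - DBP) / 3."""
    return dbp + (sbp - dbp) / 3.0
```

For example, `mean_of_last_two([130, 124, 120])` averages only the second and third readings.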

Each participant was asked to complete self-reported questionnaires regarding diet, smoking status, alcohol drinking status, and physical activity. For the assessment of diet, a validated, 103-item semi-quantitative food frequency questionnaire was used. Total energy intake (kcal/day) was calculated. For smoking status, participants were classified as never smokers, ex-smokers, intermittent smokers, or daily smokers. The amount of alcohol intake (g/day) was calculated by multiplying the average amount of pure alcohol per glass (10 g per glass), the number of glasses of alcoholic drinks consumed at a time (glasses/time), and the frequency of alcohol use (times/day). After excluding heavy drinkers, participants were classified as current drinkers or non-drinkers. The physical activity of each participant was evaluated using the International Physical Activity Questionnaire. The metabolic equivalent of task (MET)-hours per day (MET-hr/day) was estimated, and participants were classified into three categories according to their physical activity levels: low (< 7.5 MET-hr/day), moderate (7.5–30 MET-hr/day), or high (> 30 MET-hr/day).
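A brief sketch of the alcohol intake calculation and the physical activity categories described above. The function and constant names are hypothetical; the 10 g per glass factor and the MET-hr/day cut-offs are taken from the text.

```python
GRAMS_PER_GLASS = 10.0  # pure alcohol per glass, as stated in the text


def alcohol_g_per_day(glasses_per_time: float, times_per_day: float) -> float:
    """Return alcohol intake in g/day: grams per glass x glasses/time x times/day."""
    return GRAMS_PER_GLASS * glasses_per_time * times_per_day


def activity_category(met_hr_per_day: float) -> str:
    """Classify physical activity as low (< 7.5), moderate (7.5-30), or high (> 30)."""
    if met_hr_per_day < 7.5:
        return "low"
    if met_hr_per_day <= 30:
        return "moderate"
    return "high"
```

A participant who drinks three glasses twice a week would be entered as `alcohol_g_per_day(3, 2 / 7)`.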

After at least 8 h of fasting, blood samples of each participant were collected. Whole blood platelet count, fasting plasma glucose (FPG), concentrations of serum insulin, total cholesterol, triglyceride, HDL cholesterol, aspartate aminotransferase (AST), alanine aminotransferase (ALT), and C-reactive protein (CRP) were analyzed. Non-HDL cholesterol was calculated by subtracting serum HDL cholesterol level from serum total cholesterol level. In the case of serum triglyceride < 400 mg/dL, LDL cholesterol was calculated using the Friedewald formula.
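The two derived lipid measures can be illustrated as below. This sketch assumes all concentrations are in mg/dL and uses the standard Friedewald formula (LDL = total cholesterol − HDL − triglyceride/5); the function names are hypothetical, and returning `None` when triglyceride is ≥ 400 mg/dL is a choice made for this example.

```python
from typing import Optional


def non_hdl_cholesterol(total_chol: float, hdl: float) -> float:
    """Return non-HDL cholesterol (mg/dL): total cholesterol minus HDL cholesterol."""
    return total_chol - hdl


def friedewald_ldl(total_chol: float, hdl: float, triglyceride: float) -> Optional[float]:
    """Return LDL cholesterol (mg/dL) by the Friedewald formula.

    The formula is applied only when triglyceride < 400 mg/dL;
    otherwise None is returned (an assumption of this sketch).
    """
    if triglyceride >= 400:
        return None
    return total_chol - hdl - triglyceride / 5.0
```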

We defined hypertension (HTN) as (1) an SBP ≥ 140 mmHg, (2) a DBP ≥ 90 mmHg, or (3) treatment with anti-hypertensive medications [20]. Diabetes mellitus (DM) was defined as (1) an FPG ≥ 126 mg/dL, (2) a plasma glucose level ≥ 200 mg/dL at 2 h after a 75 g oral glucose tolerance test, (3) a glycosylated hemoglobin level ≥ 6.5%, (4) treatment with anti-diabetic medications, or (5) treatment with insulin therapy [21]. Dyslipidemia was defined as a serum total cholesterol concentration ≥ 240 mg/dL, LDL cholesterol concentration ≥ 160 mg/dL, HDL cholesterol concentration < 40 mg/dL, triglyceride concentration ≥ 200 mg/dL, or treatment with lipid-lowering medications [22].
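The three disease definitions above translate directly into boolean rules. The following is a minimal sketch with hypothetical function and parameter names; treating a missing laboratory value (`None`) as not meeting that criterion is an assumption of this example, not something stated in the text.

```python
from typing import Optional


def has_hypertension(sbp: float, dbp: float, on_antihypertensives: bool) -> bool:
    """HTN: SBP >= 140 mmHg, DBP >= 90 mmHg, or anti-hypertensive treatment."""
    return sbp >= 140 or dbp >= 90 or on_antihypertensives


def has_diabetes(fpg: Optional[float], ogtt_2h: Optional[float],
                 hba1c_pct: Optional[float], on_antidiabetics: bool,
                 on_insulin: bool) -> bool:
    """DM: any glycemic threshold met, or anti-diabetic or insulin treatment."""
    return (
        (fpg is not None and fpg >= 126)
        or (ogtt_2h is not None and ogtt_2h >= 200)
        or (hba1c_pct is not None and hba1c_pct >= 6.5)
        or on_antidiabetics
        or on_insulin
    )


def has_dyslipidemia(total_chol: float, ldl: Optional[float], hdl: float,
                     triglyceride: float, on_lipid_lowering: bool) -> bool:
    """Dyslipidemia: any lipid threshold met, or lipid-lowering treatment."""
    return (
        total_chol >= 240
        or (ldl is not None and ldl >= 160)  # LDL may be missing when TG >= 400
        or hdl < 40
        or triglyceride >= 200
        or on_lipid_lowering
    )
```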

Serum non-HDL cholesterol trajectories

Over the exposure period (mean, 5.76 years), temporal trends in serum non-HDL cholesterol were determined by trajectory modeling using the serum non-HDL cholesterol concentrations at the baseline survey and at the first, second, and third follow-ups. We used group-based trajectory modeling to classify the trend of serum non-HDL cholesterol over time. This approach assumes that the population comprises multiple trajectory groups and simultaneously estimates the probability of membership in each trajectory [23, 24]. Under these assumptions, the time-dependent covariates account for the variation in the mean trajectory within each group. The trajectories of serum non-HDL cholesterol were classified using the R package ‘traj’, and the optimal number of non-HDL cholesterol trajectories was evaluated using the R package ‘NbClust’. Based on the trajectory modeling results, we categorized participants into two groups: (1) an increasing non-HDL cholesterol trajectory group and (2) a stable trajectory group (Additional file 1: Fig. S1).
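To make the idea of grouping participants by their temporal trend concrete, the sketch below uses a deliberately simplified stand-in: a per-participant least-squares slope across the four visits, followed by one-dimensional two-means clustering of the slopes. This is not the procedure implemented in the ‘traj’ or ‘NbClust’ packages and is not the study's analysis; the visit times, example values, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def ols_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Return the least-squares slope of values regressed on times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def two_means_1d(xs: Sequence[float], max_iter: int = 100) -> List[int]:
    """Split values into two clusters; label 1 marks the higher-valued cluster."""
    lo, hi = min(xs), max(xs)  # initialise the two centres at the extremes
    if lo == hi:
        return [0] * len(xs)
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [1 if abs(x - hi) < abs(x - lo) else 0 for x in xs]
        low_group = [x for x, lab in zip(xs, labels) if lab == 0]
        high_group = [x for x, lab in zip(xs, labels) if lab == 1]
        new_lo = sum(low_group) / len(low_group)
        new_hi = sum(high_group) / len(high_group)
        if new_lo == lo and new_hi == hi:
            break  # centres stopped moving
        lo, hi = new_lo, new_hi
    return labels


def classify_trajectories(visit_years: Sequence[float],
                          series_by_id: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Label each participant 'increasing' or 'stable' from the slope clusters."""
    ids = list(series_by_id)
    slopes = [ols_slope(visit_years, series_by_id[i]) for i in ids]
    labels = two_means_1d(slopes)
    return {i: ("increasing" if lab == 1 else "stable") for i, lab in zip(ids, labels)}
```

A full group-based trajectory model estimates group membership probabilities rather than hard labels, so this sketch only conveys the intuition of separating rising from flat profiles.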

Assessment of NAFLD

To assess NAFLD status, we used a NAFLD-liver fat score. The formula for the NAFLD-liver fat score is as follows:

NAFLD-liver fat score = −2.89 + 1.18 \(\times\) metabolic syndrome (Yes: 1, No: 0) + 0.9 \(\times\) DM (Yes: 1, No: 0) + 0.15 \(\times\) insulin (µIU/mL) + 0.04 \(\times\) AST (U/L) − 0.94 \(\times\) AST/ALT.

The presence of NAFLD was defined as a NAFLD-liver fat score greater than − 0.640 [25].
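The score and its cut-off can be computed as in the following sketch. The coefficients and the − 0.640 threshold are those given above; the function names are hypothetical, and insulin is assumed to be in µIU/mL with AST and ALT in U/L.

```python
NAFLD_CUTOFF = -0.640  # NAFLD is present when the score exceeds this value


def nafld_liver_fat_score(metabolic_syndrome: bool, diabetes: bool,
                          insulin: float, ast: float, alt: float) -> float:
    """Return the NAFLD-liver fat score from the formula given in the text."""
    return (
        -2.89
        + 1.18 * int(metabolic_syndrome)
        + 0.9 * int(diabetes)
        + 0.15 * insulin
        + 0.04 * ast
        - 0.94 * (ast / alt)
    )


def has_nafld(score: float) -> bool:
    """Return True when the score is greater than the cut-off."""
    return score > NAFLD_CUTOFF
```

For instance, a participant without metabolic syndrome or DM, with insulin 5 µIU/mL and AST and ALT both 20 U/L, scores − 2.28 and is classified as not having NAFLD.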

Genotyping

Genomic DNA was extracted from the participants’ peripheral blood and genotyped using the Affymetrix Genome-Wide Human SNP Array 5.0 [26]. Single-nucleotide polymorphisms (SNPs) with minor allele frequencies (MAF) < 0.05, genotype calling rates < 95%, or deviation from Hardy–Weinberg equilibrium (p < \(1.0 \times 10^{-6}\)) were removed. Then, participants with inconsistent sex or calling rates at ~ 90% were excluded. PLINK (v1.90) was used for quality control [27]. To impute the missing genotype data, the Beagle 5.0 software program was used [28]. Further details regarding the protocol have been described by Chung W et al. [28].
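The SNP-level exclusion criteria can be expressed as a simple filter. The sketch below only illustrates the thresholds stated above; it does not call PLINK, and the `SnpStats` record, its field names, and the example identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAF_MIN = 0.05        # SNPs with MAF < 0.05 are removed
CALL_RATE_MIN = 0.95  # SNPs with genotype calling rate < 95% are removed
HWE_P_MIN = 1.0e-6    # SNPs with Hardy-Weinberg p < 1.0e-6 are removed


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical record for this sketch)."""
    snp_id: str
    maf: float
    call_rate: float
    hwe_p: float


def passes_qc(snp: SnpStats) -> bool:
    """Return True when the SNP meets all three quality-control thresholds."""
    return (
        snp.maf >= MAF_MIN
        and snp.call_rate >= CALL_RATE_MIN
        and snp.hwe_p >= HWE_P_MIN
    )


def filter_snps(snps: List[SnpStats]) -> List[SnpStats]:
    """Keep only the SNPs that pass quality control."""
    return [snp for snp in snps if passes_qc(snp)]
```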