Background

DNA methylation is a cellular process in which the hydrogen atom on carbon 5 of the cytosine in a CpG dinucleotide (also called a CpG marker) is replaced by a methyl group [1]. Through DNA methylation, gene activity can be silenced either by interfering with the binding of transcription factors or by interacting with histone protein modifications [2].

Previous studies have demonstrated that abnormal DNA methylation leads to gastric carcinogenesis by either hyper-methylating several tumor-suppressive miRNAs [3,4,5] or hypo-methylating an onco-miR [6]. DNA methylation also regulates the erythropoiesis of embryonic stem cells [7], the pathogenesis of idiopathic pulmonary fibrosis [8], the neurodevelopment of the human hippocampus [9], and other processes. Beyond disease pathogenesis, DNA methylation also performs long-term regulatory activities. Children who suffered early adversity, such as being raised in an orphanage, had higher global methylation levels, and their neural-related genes were silenced by hyper-methylation [10]. Moreover, DNA methylation is involved in the nutritional control of reproductive status in honeybees, thereby controlling the generation of workers or queens [11]. By regulating the expression of many critical genes, DNA methylation plays important roles not only in cellular activities but also in many human diseases. However, few DNA methylation-related studies have been conducted for Kawasaki disease.

Kawasaki disease (KD) is an acute systemic vasculitis that usually affects children less than 5 years of age. The most severe complication of KD is coronary artery aneurysm (CAA), which affects approximately 20–25% of KD patients who do not receive timely treatment with intravenous immunoglobulin (IVIG) [12]. Therefore, KD is the major cause of acquired heart disease in children in developed countries [13]. The etiopathogenesis of KD may be attributed to the combined effects of genetics, immunity, and infection [14]. Although the exact etiology of KD is still unknown, predicting KD is possible with molecular markers [15]. To date, only a few studies have focused on the regulation of DNA methylation in KD [16, 17]. However, these studies only profiled DNA methylation patterns, without further investigating whether the extent of DNA methylation affected the pathogenesis of KD. In addition, although DNA methylation is generally considered to be negatively correlated with gene expression, methylation of several CpG markers has been reported to promote gene expression [18, 19].

To answer these questions, we collected DNA and RNA samples from KD subjects and combined the DNA methylation profiling data with the gene expression information for a systems biology perspective. Previous studies determined the correlations between DNA methylation and gene expression with CpG beta values (methylation percentages) and gene expression intensities [19]. In this study, we focused on the variation ratios of CpG beta values and those of gene expression intensities among different sets of samples. First, we identified modestly negative correlations between DNA methylation and gene expression regardless of whether the CpG markers were located upstream or downstream of the TSS within the promoter regions. Second, we showed with an in vitro cell model that the S100A gene family enhanced leukocyte transendothelial migration in KD.

Results

Subject information

In this study, we enrolled 24 non-fever healthy control subjects (HC), 21 fever control subjects (FC, patients with fever but with no KD diagnosis or KD history), and 18 KD patients. Blood samples from the KD patients were drawn both at the acute phase 1 day before IVIG treatment (KD1) and at the convalescent phase 3 weeks after IVIG treatment (KD3). Blood samples from the remaining subjects were drawn once. As shown in Additional file 1, 8 out of the 21 FC subjects suffered from acute sinusitis, and 19.5 and 14.3% of the FC subject population had gastroenteritis and bronchopneumonia, respectively. No significant difference was observed in age (p = 0.0536, t test) or gender (p = 1, Fisher’s exact test) between the 12 HC and 12 KD subjects whose samples were used for the Illumina HumanMethylation 450 BeadChip (M450K) assays. In addition, no significant difference was observed in age (p = 0.1108, t test) or gender (p = 0.7, Fisher’s exact test) between the 18 HC and 18 KD subjects whose samples were used for the Affymetrix GeneChip® Human Transcriptome Array 2.0 (HTA 2.0) assays. All of the KD patients met the AHA 2004 diagnostic criteria [20].

DNA methylation variations among samples

From the total HC, KD1, and KD3 DNA samples, we selected 12 HC, 12 KD1, and 12 KD3 samples for bisulfite conversion, followed by M450K assays on the 36 bisulfite-converted DNA samples (Additional file 1). The generated raw data was analyzed with Partek. First, we examined the overall methylation patterns of the three sets using a PCA plot. As shown in Fig. 1a, the three sets can be clearly distinguished in terms of their methylation patterns. The KD3 set was clearly separated from the other two, whereas the HC and KD1 sets slightly overlapped with each other. When the FDR < 0.05 and variation ratio > 1.1 criteria were specified, there were 12,209, 13,936, and 14,643 significant CpG markers in the KD1 vs. HC, KD3 vs. HC, and KD3 vs. KD1 comparisons, respectively (Table 1). These significant CpG markers formed a union of 25,984 CpG markers, the heat map of which is shown in Fig. 1b. Table 1 and Fig. 1b show that most of the significant CpG markers in the KD1 vs. HC comparison were hypo-methylated in the KD1 samples, reflecting hypo-methylation of CpG markers with the onset of KD.
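The two selection criteria above (FDR < 0.05 and variation ratio > 1.1) can be illustrated with a minimal sketch. It is not the Partek workflow used in this study: the Benjamini-Hochberg adjustment, the symmetric definition of the variation ratio (larger mean beta value divided by the smaller one), the function names, and the marker IDs are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def variation_ratio(mean_case, mean_control):
    """Assumed definition: larger mean beta over smaller mean beta (always >= 1).

    Beta values are expected to be strictly positive.
    """
    return max(mean_case, mean_control) / min(mean_case, mean_control)


def significant_markers(markers, fdr_cutoff=0.05, ratio_cutoff=1.1):
    """markers: list of (cpg_id, mean_beta_case, mean_beta_control, p_value)."""
    q_values = benjamini_hochberg([m[3] for m in markers])
    hits = []
    for (cpg_id, case, control, _), q in zip(markers, q_values):
        if q < fdr_cutoff and variation_ratio(case, control) > ratio_cutoff:
            direction = "hyper" if case > control else "hypo"
            hits.append((cpg_id, direction))
    return hits


# Hypothetical markers for a KD1 vs. HC style comparison.
toy = [
    ("cg_a", 0.40, 0.60, 0.001),  # large change, small p value
    ("cg_b", 0.50, 0.52, 0.001),  # small p value but ratio below 1.1
    ("cg_c", 0.70, 0.50, 0.800),  # large change but not significant
]
print(significant_markers(toy))
```

Only `cg_a` passes both filters in this toy input, and it is labeled hypo-methylated because its mean beta value is lower in the case group.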

Fig. 1

DNA methylation profiles among the HC, KD1, and KD3 sets. We conducted methylation microarray (M450K) assays on 12 HC, 12 KD1, and 12 KD3 samples. The generated raw data was analyzed with Partek to produce (a) a PCA plot and (b) a heat map. The heat map was plotted with the methylation profiles of 25,984 CpG markers

Table 1 Summary of significant CpG markers among the comparisons

The Manhattan plots of the three comparisons are also provided (Additional files 2, 3, and 4). Although the numbers of significant CpG markers in the three comparisons were almost equivalent (Table 1), the Manhattan plots showed that the KD3 vs. HC and KD3 vs. KD1 comparisons, both of which involved the IVIG administration factor, had much lower p values and many more significant CpG markers. In our previous study using M27K assays, we observed that IVIG administration had a much stronger impact on methylation variation than disease onset did [16]. Our current data support this finding.

Methylation variations of CpG markers within the putative promoter regions

Next, we investigated the methylation variations of CpG markers based on their distance to the transcription start sites (TSSs) of genes. Since a promoter is a loosely defined region relative to the TSS of a gene, studies have defined their putative promoter regions with different distances to the TSS [21, 22]. In this study, we adopted the default parameter of Partek and defined a promoter as the region ranging from − 5000 to 3000 of a transcript’s TSS (RefSeq 41 annotation). Then, we mapped all significant CpG markers (P < 0.05) back to the promoters and marked their methylation variation ratios. According to Fig. 2, the density of the significant CpG markers appeared to be higher within the − 1500 to 500 region than outside of it. To examine the densities of CpG markers within the promoters, we also mapped all CpG markers (both significant and non-significant) back to the promoters and observed results similar to those shown in Fig. 2 (Additional file 5). Therefore, the higher density of CpG markers within the − 1500 to 500 region is an intrinsic characteristic of the M450K microarray chip.

Fig. 2

Methylation variations of significant CpG markers within the putative promoter regions. By referring to the RefSeq 41 annotation, we determined each CpG marker’s distance to the transcription start site (TSS) of a gene’s transcript, and thereby its relative location within the putative promoter region, which is the genomic region ranging from − 5000 bp to + 3000 bp of a transcript’s TSS. a, c, e For each CpG marker, the X and Y axes denote its distance to the TSS and its methylation variation, respectively. Using the two arrows, the promoter was split into three sub-regions: the left, the core, and the right sub-regions. The methylation variations (average ± S.D.) of the CpG markers located within each sub-region are labeled. The sample sizes for sub-figures a, c, e were 205,306, 393,023, and 385,840, respectively. b, d, f The box plots and t tests demonstrate that the CpG markers within the core sub-region varied more than those within the other two sub-regions (P < 2.2E−16 for the six comparisons)

Figure 2a, c, e also shows that CpG markers within the − 1500 to 500 region tended to vary more than the CpG markers outside of this region. To investigate this issue, we divided the putative promoter region (− 5000 to 3000 bp) into three sub-regions as follows: the left (− 5000 to − 1500 bp), core (− 1500 to 500 bp), and right (500 to 3000 bp) sub-regions. As shown in Fig. 2b, d, f, consistently among the three comparisons, the CpG markers within the core sub-region varied significantly more than those within the two adjacent sub-regions (P < 2.2E−16), implying that the CpG markers closer to the TSS of the transcript regulated gene expression more strongly.
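The sub-region split described above can be sketched as a simple classification by signed distance to the TSS. This is an illustrative sketch only: the function names are hypothetical, and the handling of the shared boundaries (− 1500 and 500 assigned to the core) is an assumption, since the text does not state which sub-region owns the boundary positions.

```python
from collections import defaultdict
from statistics import mean, stdev


def promoter_subregion(offset_bp):
    """Classify a CpG marker by its signed distance to the TSS.

    Negative offsets are upstream of the TSS. Boundary positions
    (-1500 and 500) are assigned to the core here by assumption.
    Returns None for positions outside the putative promoter.
    """
    if -5000 <= offset_bp < -1500:
        return "left"
    if -1500 <= offset_bp <= 500:
        return "core"
    if 500 < offset_bp <= 3000:
        return "right"
    return None


def summarize_by_subregion(records):
    """records: iterable of (offset_bp, methylation_variation).

    Returns {subregion: (count, average, standard deviation)}.
    """
    groups = defaultdict(list)
    for offset, variation in records:
        region = promoter_subregion(offset)
        if region is not None:
            groups[region].append(variation)
    return {
        region: (len(v), mean(v), stdev(v) if len(v) > 1 else 0.0)
        for region, v in groups.items()
    }


# Made-up offsets and variation values, for illustration only.
toy_records = [(-2000, 1.0), (-100, 1.2), (100, 1.4), (6000, 9.9)]
summary = summarize_by_subregion(toy_records)
```

The per-sub-region averages and standard deviations correspond to the kind of labels shown in Fig. 2a, c, e; the comparison between sub-regions would then be made with a t test, which is omitted here.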

Gene expression variations among samples

From the total HC, KD1, and KD3 RNA samples, we selected 18 HC, 18 KD1, and 18 KD3 samples to generate 3 HC, 3 KD1, and 3 KD3 evenly pooled samples. We then conducted the HTA 2.0 assays on the 9 pooled RNA samples (Additional file 1). The generated raw data was analyzed with Partek. As with the DNA methylation data, we examined the overall gene expression patterns of the three sets with a PCA plot. As shown in Fig. 3a, the three sets were less clearly distinguishable based on the gene expression data than based on the DNA methylation data, especially the HC and KD3 sets. Table 2 shows only 10 significant genes (P < 0.05 and expression ratio > 1.5) in the KD3 vs. HC comparison, and the union of all significant genes comprised 936 genes. Using the 936 union genes, we drew a heat map (Fig. 3b), which demonstrated that the KD3 samples were hardly distinguishable from the HC samples based on the gene expression profiles.

Fig. 3

Gene expression profiles among the HC, KD1, and KD3 sets. We conducted gene expression microarray (HTA2.0) assays on three pooled HC, three pooled KD1, and three pooled KD3 samples. The generated raw data was analyzed with Partek to produce (a) a PCA plot and (b) a heat map. The heat map was plotted with the gene expression profiles of 936 genes

Table 2 Summary of significant genes among the comparisons

Correlation between gene expression and DNA methylation

So far, we had obtained both DNA methylation and gene expression data from the HC, KD1, and KD3 samples. DNA methylation is usually thought to be negatively correlated with gene expression: the more highly a CpG marker is methylated, the less abundantly the gene is expressed. However, previous studies have also found positive correlations, globally or specifically [18, 19]. In addition, few studies have attempted to investigate to what extent DNA methylation at CpG markers alters gene expression. In other words, what is the global correlation pattern between DNA methylation and gene expression?

To globally and comprehensively address this question, we first constructed regulation pairs of CpG markers and genes (see the “Methods” section), followed by tabulating the variation ratios of CpG markers and genes in each comparison, e.g., KD1 vs. HC. With this approach, we could calculate the correlation coefficient between the variation ratios of gene expression and CpG marker methylation, investigating to what extent DNA methylation repressed or activated gene expression.
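The correlation step can be sketched as follows. This is a minimal pure-Python illustration of Spearman's rank correlation applied to paired variation ratios, not the software used in this study; the function names and the toy ratios are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# One entry per regulation pair (made-up ratios for a single comparison).
expression_ratio = [1.2, 1.5, 0.8, 2.0]
methylation_ratio = [0.95, 0.90, 1.10, 0.85]
rho = spearman_rho(expression_ratio, methylation_ratio)
```

In this toy input the gene with the largest expression increase has the largest methylation decrease, and so on down the list, so the ranks are exactly reversed and rho is − 1. Real data would give values much closer to zero.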

We first constructed random regulation pairs of CpG markers and genes by randomly assigning one CpG marker and one gene to each pair, without considering whether the marker was located within the putative promoter. As shown in Additional file 6, the Spearman’s rank correlation coefficients of the three comparisons (random column, sub-figures a, b, and c) were almost zero, reflecting very low correlations. Then, we considered all regulation pairs of CpG markers and genes (both significant and non-significant). We also divided the regulation pairs into two sets, based on whether the CpG marker was located upstream or downstream of the TSS. As shown in Additional file 6, the Spearman’s rho values in the upstream, downstream, and both (union of the upstream and downstream sets) columns were slightly lower than those of the random column, reflecting slightly stronger negative correlations.

Next, we considered only the significant CpG markers (P < 0.05) and the significant genes (P < 0.05). In other words, only significant CpG markers and genes were included to construct the regulation pairs. As shown in Fig. 4, the upstream, downstream, and both columns showed much lower Spearman’s rho values (P = 0.0246, paired t test) than the values in Additional file 6, reflecting stronger negative correlations in the three comparisons when only significant CpG markers and genes were considered.

Fig. 4

The scatter plots of gene expression variations and DNA methylation variations for CpG markers located within the putative promoters. The X axis presents the gene expression variation determined with the HTA2.0 assay. The Y axis presents the DNA methylation variation determined with the M450K assay. Each dot denotes the regulation pair of one significant gene and one significant CpG marker; only those with a p value < 0.05 were considered significant. For each comparison in each column, the Spearman’s rank correlation coefficient (denoted as rho) is labeled. The correlation coefficient was calculated with the data from the full-length promoter (the Both column), the − 5000 to − 1 bp region (the Upstream column), and the + 1 to + 3000 bp region (the Downstream column). The sample sizes for sub-figures a to i were, in order: 28,776, 3903, 61,055, 18,068, 2318, 36,770, 10,698, 1575, and 24,285

Figure 2 shows that the CpG markers located within the core sub-regions of the putative promoters varied the most, suggesting that they better regulated gene expression. We therefore performed similar analyses using only the CpG markers located within the core sub-regions (− 1500 to 500 bp). As shown in Fig. 5, although the difference was not significant (P = 0.0586, paired t test) owing to the small sample size, 7 out of 9 comparisons (all except sub-figures h and i) had stronger negative correlations than those shown in Fig. 4. This was consistent with the conclusion from Fig. 2 that the CpG markers closer to the TSSs of the transcripts better regulated gene expression.

Fig. 5

The scatter plots of gene expression variation and DNA methylation variation for CpG markers located within the core sub-regions of the putative promoters. Only the CpG markers within the core sub-region (Fig. 2) were included in this analysis. Therefore, the data presented in this figure is a subset of the data presented in Fig. 4. The Both, Upstream, and Downstream columns represent the − 1500 to + 500 bp, − 1500 to − 1 bp, and + 1 to + 500 bp regions, respectively. The sample sizes for sub-figures a to i were, in order: 17,891, 2735, 40,106, 13,298, 1868, 27,482, 4593, 867, and 12,624

In summary, regardless of whether the CpG marker was located upstream or downstream of the transcript’s TSS, DNA methylation and gene expression globally maintained a modestly negative correlation, at least in the KD cases in this study.

Perfect cases of negatively correlated genes and CpG markers

In this study, we collected samples from the healthy controls (HC), patients before disease treatment (KD1), and patients after disease treatment (KD3). Therefore, we were interested in the variation profiles from HC to KD1 and from KD1 to KD3. In other words, we were interested in the genes or CpG markers that were upregulated from HC to KD1 and then downregulated from KD1 to KD3 (i.e., up-then-down cases). In addition, the down-then-up cases were also our targets. Figure 6 illustrates the perfect cases of negatively correlated genes and CpG markers. These perfect cases were composed of the up-then-down genes and the down-then-up CpG markers as well as the down-then-up genes and the up-then-down CpG markers.

Fig. 6

The concept of perfect cases of negatively correlated genes and CpG markers. Among the three sample sets, we were especially interested in the variations of genes and CpG markers from HC to KD1 and from KD1 to KD3. mUD and gUD denote, respectively, the CpG markers and genes that were first upregulated from HC to KD1 and then downregulated from KD1 to KD3, indicating the up-then-down cases. mDU and gDU denote, respectively, the CpG markers and genes that were first downregulated from HC to KD1 and then upregulated from KD1 to KD3, forming the down-then-up cases. In this manner, we identified 83 genes and their promoter CpG markers that were the perfect cases of negatively correlated genes and CpG markers

Among the significant genes shown in Table 2, we identified 98 down-then-up and 440 up-then-down genes (Fig. 6). In addition, among the significant CpG markers in Table 1, we identified 3230 down-then-up and 818 up-then-down CpG markers, which were located at the promoters of 440 and 247 genes, respectively. Further intersection analyses generated 83 (80 + 3) perfect genes that were negatively correlated with their CpG markers both from HC to KD1 and from KD1 to KD3. Gene expression at the transcriptional level is regulated by many factors, yet these 83 genes were negatively correlated with DNA methylation on their promoter CpG markers not only in the HC to KD1 transition but also in the KD1 to KD3 transition. Therefore, they were ideal targets for further functional analysis.
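The intersection analysis can be sketched with set operations. This is only an illustration of the idea in Fig. 6: the function names, the use of a ratio of 1 as the up/down threshold, the CpG IDs, and the placeholder gene names `GENE_X` and `GENE_Y` are assumptions, and the significance filtering from Tables 1 and 2 is assumed to have been applied beforehand.

```python
def trend(ratio_kd1_vs_hc, ratio_kd3_vs_kd1):
    """Label a gene or CpG marker by its two-step trajectory.

    Ratios above 1 are treated as increases and below 1 as decreases
    (an assumed convention). Returns "UD" (up-then-down),
    "DU" (down-then-up), or None for any other pattern.
    """
    if ratio_kd1_vs_hc > 1 and ratio_kd3_vs_kd1 < 1:
        return "UD"
    if ratio_kd1_vs_hc < 1 and ratio_kd3_vs_kd1 > 1:
        return "DU"
    return None


def perfect_cases(gene_trend, cpg_trend, cpg_to_gene):
    """Find genes whose trajectory is opposite to that of a promoter CpG marker.

    gene_trend:  {gene: "UD" or "DU"}
    cpg_trend:   {cpg_id: "UD" or "DU"}
    cpg_to_gene: {cpg_id: gene whose promoter contains the marker}
    """
    opposite = {"UD": "DU", "DU": "UD"}
    genes_by_cpg_trend = {"UD": set(), "DU": set()}
    for cpg_id, label in cpg_trend.items():
        gene = cpg_to_gene.get(cpg_id)
        if gene is not None:
            genes_by_cpg_trend[label].add(gene)
    result = {}
    for label in ("UD", "DU"):
        genes = {g for g, t in gene_trend.items() if t == label}
        key = "g" + label + "_m" + opposite[label]
        result[key] = genes & genes_by_cpg_trend[opposite[label]]
    return result


# Toy input: S100A8 is up-then-down with a down-then-up promoter marker.
gene_trend = {"S100A8": "UD", "GENE_X": "DU", "GENE_Y": "UD"}
cpg_trend = {"cg1": "DU", "cg2": "UD", "cg3": "UD"}
cpg_to_gene = {"cg1": "S100A8", "cg2": "GENE_X", "cg3": "GENE_Y"}
cases = perfect_cases(gene_trend, cpg_trend, cpg_to_gene)
```

`GENE_Y` is excluded because its expression and its promoter marker move in the same direction; only genes with opposite trajectories in both transitions are kept.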

The regulatory roles of the S100A gene family

We further conducted GO analysis on the 80 genes, and the result is shown in Additional file 7. After careful inspection, we found that four of the 80 input genes, namely S100A8, S100A9, S100A12, and FCER1G, were repeatedly involved in the top five GO items in terms of p value. Therefore, we conducted qPCR assays on the four genes and succeeded in detecting the S100A gene family, namely S100A8, S100A9, and S100A12. Figure 7a illustrates five, four, and one CpG markers on the putative promoter regions of S100A8, S100A9, and S100A12, respectively. These CpG markers were all statistically significant and were all down-then-up cases. The qPCR assays also confirmed that the S100A genes were all up-then-down cases (Fig. 7b). In summary, in the transitions from HC to KD1 and from KD1 to KD3, the CpG markers were negatively correlated with S100A gene expression, suggesting epigenomic regulation.

Fig. 7

The expression variations of S100A family genes and methylation variations of the S100A-related CpG markers. a There were five, four, and one significant CpG markers (FDR < 0.05 and variation ratio > 1.1) located within the promoters of S100A8, S100A9, and S100A12, respectively. The locations of the ten CpG markers are indicated by the corresponding star signs. For each CpG marker, the Y axis of the box plot is the beta value (methylation percentage) determined with the M450K assays on 12 HC, 12 KD1, and 12 KD3 samples. b We used qPCR assays to detect gene expression of S100A family genes in 24 HC, 21 FC, 17 KD1, and 18 KD3 samples. One KD1 sample failed the qPCR assay. The Y axis denotes the 2^(−ΔΔCt) values. The data is presented as the average ± S.D. The values of the HC set were normalized to one. * and **** denote p values < 0.05 and < 0.0001, respectively

We have demonstrated a global, modestly negative correlation between DNA methylation and gene expression (Figs. 4 and 5). Here, we were also interested in the extent to which these 10 CpG markers regulated the S100A genes. Using the 2^(−ΔΔCt) values (Fig. 7) determined with qPCR in place of the intensity values determined with HTA2.0, we conducted similar analyses. We found that the rho value between S100A8 and its promoter CpG markers was − 0.4388, and the rho values for S100A9 and S100A12 were − 0.3972 and − 0.4543, respectively. Therefore, the S100A genes and their promoter CpG markers were moderately negatively correlated, indicating stronger correlations than the global profiles.

S100A8 and S100A9 are inflammatory markers that are usually highly expressed in acute and chronic inflammation. They are expressed and secreted into the plasma by neutrophils and/or monocytes, performing cytokine-like functions in inflammation [23, 24]. S100A8 and S100A9 are also involved in the pathogenesis of many diseases. They were reported to predict cardiovascular events in humans [25], to promote reticulated thrombocytosis and atherogenesis in diabetes patients [26] and to trigger inflammation, apoptosis, and tissue injury in the kidney [37].

For example, the ABCC10 gene is located on chromosome 6 and has two alternative splicing isoforms, NM_001198934 and NM_033450, whose TSSs are at 43,395,292 and 43,399,489 bp, respectively. Owing to the different TSSs and putative promoter regions, CpG markers located in the upstream promoter of NM_001198934 could fall outside the promoter of NM_033450. Meanwhile, CpG markers located in the downstream promoter of NM_001198934 could fall in the upstream promoter of NM_033450. Since we distinguished between the upstream and downstream promoter regions, we enumerated all regulation pairs of CpG marker and mRNA. In addition, since we measured gene expression levels with a microarray and/or qPCR in this study, the term “mRNA” in the regulation pairs was replaced with the term “gene” for simplicity.
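The ABCC10 example can be worked through with a short sketch that enumerates regulation pairs per transcript. The two TSS coordinates come from the text; everything else is an assumption for illustration: the function names, the CpG IDs and positions, and the forward-strand orientation (which the upstream/downstream description above implies). The position exactly at the TSS is left out because the upstream and downstream regions are described as − 5000 to − 1 bp and + 1 to + 3000 bp.

```python
PROMOTER_UPSTREAM_BP = 5000    # upstream extent of the putative promoter
PROMOTER_DOWNSTREAM_BP = 3000  # downstream extent of the putative promoter


def offset_to_tss(cpg_pos, tss, strand="+"):
    """Signed distance from the TSS; negative means upstream of the transcript."""
    delta = cpg_pos - tss
    return delta if strand == "+" else -delta


def regulation_pairs(cpg_positions, transcripts):
    """Enumerate (cpg_id, transcript_id, side, offset) for promoter CpG markers.

    cpg_positions: {cpg_id: genomic position on the same chromosome}
    transcripts:   {transcript_id: (tss_position, strand)}
    """
    pairs = []
    for cpg_id, pos in cpg_positions.items():
        for tx_id, (tss, strand) in transcripts.items():
            offset = offset_to_tss(pos, tss, strand)
            if -PROMOTER_UPSTREAM_BP <= offset < 0:
                pairs.append((cpg_id, tx_id, "upstream", offset))
            elif 0 < offset <= PROMOTER_DOWNSTREAM_BP:
                pairs.append((cpg_id, tx_id, "downstream", offset))
    return pairs


# TSS coordinates from the text; forward strand is assumed.
abcc10 = {
    "NM_001198934": (43_395_292, "+"),
    "NM_033450": (43_399_489, "+"),
}
# Hypothetical CpG positions chosen to show both situations.
cpgs = {"cpg_far_up": 43_391_000, "cpg_between": 43_396_292}

pairs = regulation_pairs(cpgs, abcc10)
```

Here `cpg_far_up` pairs only with NM_001198934 (4292 bp upstream) because it lies 8489 bp upstream of the NM_033450 TSS, outside that promoter. `cpg_between` yields two pairs: downstream (+ 1000 bp) for NM_001198934 and upstream (− 3197 bp) for NM_033450.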

Real-time quantitative polymerase chain reaction

For the real-time PCR, 0.5 μg of total RNA was reverse transcribed into cDNA using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, CA, USA). Next, we performed real-time quantitative PCR using the Fast SYBR® Green Master Mix system and the StepOnePlus™ System (Applied Biosystems). The sequences of the primers used were as follows:

18S: forward primer (5′-GTAACCCGTTGAACCCCATT-3′) and reverse primer (5′-CCATCCAATCGGTAGTAGCG-3′); S100A8: forward primer (5′-ACCGAGTGTCCTCAGTA-3′) and reverse primer (5′-TCTTTGTGGCTTTCTTCATGG-3′); S100A9: forward primer (5′-AACACCTTCCACCAATACT-3′) and reverse primer (5′-GCCATCAGCATGATGAACT-3′); and S100A12: forward primer (5′-CTTACAAAGGAGCTTGCAAAC-3′) and reverse primer (5′-GGTGTGGTAATGGGCAG-3′). The real-time PCR master mix was prepared as follows: 10 μl of 2X fast SYBR green master mix, 7 μl of nuclease-free water, 1 μl of cDNA, 1 μl of forward primer (10 μM), and 1 μl of reverse primer (10 μM). The default PCR thermal-cycling condition was as follows: 20 s at 95 °C and 40 cycles of 3 s at 95 °C and 30 s at 60 °C.
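The relative expression values reported as 2^(−ΔΔCt) can be computed as in the following sketch. It uses the standard comparative Ct calculation and assumes that 18S (whose primers are listed above) serves as the reference gene and that the HC group serves as the calibrator; the function name and the Ct values are made up for illustration.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Return 2^(-ddCt) for one target gene.

    dCt  = Ct(target) - Ct(reference gene), computed for sample and control
    ddCt = dCt(sample) - dCt(control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Made-up Ct values: the target crosses threshold two cycles earlier
# in the sample than in the control, with equal reference Ct values.
fold = relative_expression(22.0, 10.0, 24.0, 10.0)   # 4.0
same = relative_expression(24.0, 10.0, 24.0, 10.0)   # 1.0, control vs. itself
```

A difference of two cycles corresponds to a fourfold change, and comparing the control with itself gives one, which matches normalizing the HC set to one.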

Cell culture and the leukocyte transendothelial migration assay

As suggested in a previous study, we used neutrophil-like HL-60 cells to conduct the migration assay [38]. The HL-60 cells (BCRC No. 60027) were cultured in Iscove’s modified Dulbecco’s medium supplemented with 20% fetal bovine serum, 4 mM L-glutamine, and 1.5 g/L of sodium bicarbonate at 37 °C in a humidified 95% air/5% CO2 incubator. The cells were differentiated into neutrophil-like cells by stimulation with 1.3% DMSO (Sigma-Aldrich, MO, USA). Primary human coronary artery endothelial cells (HCAEC, CC-2585, Clonetics, Lonza) were cultured in EBM-2 medium (CC-3156, Clonetics, Lonza) supplemented with EGM-2 MV SingleQuots (CC-4147, Clonetics, Lonza), which contains 5% FBS.

For the transendothelial migration assay, 2 × 10^5 HCAECs were first seeded into gelatin-coated 24-well hanging inserts (also called the upper chamber, 3 μm, PET, Merck, NJ, USA) for 24 h. Then, the inserts were put into 24-well culture plates (also called the lower chamber). Neutrophil-like cells were first starved for 4 h and then cultured in serum-free culture medium with 10 μg/ml of S100A8/A9 (8226-S8-050, R&D), 8 μg/ml of S100A9 (9254-S9-050, R&D), or 4 μg/ml of S100A12 (1052-ER-050, R&D) recombinant proteins for 24 h.

On the day of the migration assay, the S100A-treated neutrophil-like cells were washed with serum-free culture medium. Then, 1 × 10^5 cells were placed in the inserts, which were then moved into 24-well culture plates containing 600 μl of medium with 200 nM fMLP (Sigma-Aldrich, MO, USA) as a chemo-attractant. After 2 h of migration, the neutrophil-like cells that had penetrated the endothelial layer and migrated into the lower chamber were collected. The cells were washed with PBS and stained with CD15-FITC (340703, BD), followed by analysis with the LSRII flow cytometer (BD Biosciences).