Background

Colorectal cancer (CRC) is the most common cancer in the Polish population and the leading cause of cancer-related morbidity and mortality [1]. Most CRCs are sporadic, and only a small proportion is associated with highly penetrant hereditary disorders, such as Lynch syndrome, familial adenomatous polyposis and other polyposis syndromes, caused by rare germline mutations in DNA mismatch-repair genes and in the adenomatous polyposis coli (APC) gene [2].

Cancer is a multi-step process involving successive clonal selection events. The growth advantage of dysplastic cells over their normal neighbours leads to progressive cytological and architectural derangement, and individual cancer phenotypes are the result of cell-specific, developmental stage-specific and metabolism-related changes in gene expression that occur selectively at specific times and are modified by epigenetic interactions [3]. Epigenetic changes such as DNA and histone modifications, chromatin remodelling and regulation by noncoding RNAs can result in massive deregulation of gene expression during the course of cancer development [4]. The global effects of altered epigenetic patterns in gene regulatory sequences have been characterised by the ENCODE project [5].

Histone post-translational modifications (PTMs) include lysine acetylation, arginine and lysine methylation, phosphorylation, proline isomerization, ubiquitination (Ub), ADP ribosylation, arginine citrullination, SUMOylation, carbonylation and biotinylation [6]. The most common PTMs are acetylation and methylation [7]. Within the five main histone proteins, PTMs can occur at multiple positions, although they are most frequent at histone N-terminal tails [8]. Despite the key role of epigenetic alterations in cancer development, little is known about the patterns of histone PTM alterations in human tumours [9].

Proteomic methods are largely based on mass spectrometry (MS), a highly specific, effective and versatile technique that does not require complicated multi-step sample preparation. One of the most prominent features of MS is its sensitivity, which enables the detection of analytes at attomolar concentrations, with a mass measurement error of about 0.01% of the analyte mass. Proteomics analyses the composition, amounts, isoforms and post-translational modifications of cellular proteins [10].

In the present study, we used MS-based analysis to quantify global alterations of histone PTMs in matched normal and colon cancer samples. Our results showed that histone H3 lysine 27 acetylation (H3K27Ac) is associated with colon cancer.

Results

Histones were isolated from 12 CRC tissues and corresponding normal mucosa, and equal protein amounts were separated by SDS-PAGE and subjected to qualitative LC-MS/MS and quantitative label-free LC-MS analyses (Figure 1).

Figure 1

Diagram of the workflow used to determine altered histone PTMs in colon carcinoma tissues. Histones were isolated from whole tissue sections by acidic extraction using the protocol of Shechter et al. [36], followed by separation by SDS-PAGE and silver staining. The gel region containing the core histone proteins was excised, and the proteins were digested with trypsin and subjected to MS analyses. MS/MS runs of pooled samples were performed to identify, in depth, the peptides present across the sample collection. The resulting custom peptide database was then overlaid on the individual 2-D maps acquired in LC-MS runs. These maps were used as the basis for quantifying peptides and pinpointing modified peptides with differential abundance between CRC and normal mucosa samples.

Qualitative histone protein analyses

For protein identification, 10 pooled samples were analysed by LC-MS/MS, resulting in the acquisition of 386,120 fragmentation spectra. A search against the SwissProt database using the Mascot engine confidently identified a set of 2,647 peptides with an estimated false discovery rate of 0.01 (Additional file 1: Table S1). In total, 522 proteins were identified, of which 357 were represented by at least two peptides (Additional file 1: Table S2). Among the detected peptides, 285 originated from core histone proteins, including H4, H3.1, H3.2, H3.3 and numerous variants of H2A and H2B. However, unambiguous identification of the members of the two latter families was difficult: high sequence homology within these families leads to the detection of multiple shared peptides that can be attributed to more than one protein, so it is not always possible to identify the particular proteins present in the samples. In the present study, the indistinguishable histone H2A and H2B subtypes were grouped into six and eight distinct clusters, respectively, each represented by variants accounting for all observed peptides. The final results of core histone protein identification are summarised in Table 1, along with the number of detected peptides and sequence coverage. A more detailed description of the peptide-protein dependencies is available in Additional file 1: Table S3.

Table 1 LC-MS/MS histone protein identification in gel slices

The identification of 96 modified peptides allowed for the characterisation of 41 distinct post-translational modification sites, of which 7, 13, 11, and 10 were located within the sequences of H2A, H2B, H3, and H4, respectively (Additional file 1: Table S4 and Figure 2). Multiple modification variants were detected on 6 sites (14.6%), with a maximum of four different modifications per site. As shown in Figure 2, the sites were distributed in the amino-terminal tails and the globular domain forming the nucleosomal core of each of the four histones. Despite the sequence divergence, the observed modification patterns were generally preserved within the H2A, H2B and H3 families, with the exception of H2A.V and H2A.Z variants.

Figure 2

Amino acid sequences of the core histones and their variants, indicating sites of post-translational modification. The grey block marks the N-terminal histone tail. The alignment of protein sequences and visualization of the detected modification sites were performed using CLC Sequence Viewer (CLC Bio).

Acetylation, observed on 36 lysine residues, was the most prominent of all the modifications studied. On 31 of these sites it was the only modification detected, whereas the remaining 5 sites also carried alternative modification variants, namely mono-, di- and trimethylation (three, four and four sites, respectively). Modifications of arginine residues were considerably less frequent: only a single arginine site, carrying methylation and deamidation variants, was detected, in the sequence of histone H2A. Two serine and two threonine phosphorylation sites were also identified.

A cross-check against the PhosphoSitePlus [11] and Histome [12] databases revealed that, although most of the sites had been reported previously, in several cases our survey provided a more comprehensive list of their possible modifications. For example, K43 of histone H2B, a known ubiquitination site, was shown to carry an acetylated variant. A detailed summary of the site-modification combinations not covered by the two databases is presented in Additional file 1: Table S4.
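The cross-check amounts to a set difference between the observed site-modification combinations and those catalogued in the databases. The Python sketch below assumes that each entry is a hypothetical (histone, site, modification) tuple; the toy inputs are illustrative only, and real PhosphoSitePlus or Histome exports would need their own parsing.

```python
def novel_combinations(observed, *databases):
    """Return observed (histone, site, modification) triples absent from every database."""
    known = set().union(*databases)  # everything reported by at least one database
    return sorted(set(observed) - known)


# Toy inputs (hypothetical): both databases list H2B K43 only as a ubiquitination site.
phosphositeplus = {("H2B", "K43", "Ubiquitination")}
histome = {("H2B", "K43", "Ubiquitination")}
observed = {("H2B", "K43", "Acetylation")}

print(novel_combinations(observed, phosphositeplus, histome))
# [('H2B', 'K43', 'Acetylation')]
```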

Quantitative analysis

Peptides identified by LC-MS/MS analyses were quantified in individual samples (12 CRC tissue-healthy mucosa pairs) using a label-free approach, and modification intensities were calculated for each of the detected sites (see Methods). We obtained 45 reliable quantitative estimates of modification intensities on 33 sites. Using a threshold of p ≤ 0.05, we detected 4 sites with significant differences in modification intensity between the two sample groups (Table 2), of which three were upregulated and one downregulated in cancerous samples. Among them, histone H3 lysine 27 acetylation (H3K27Ac) was upregulated in CRC; an example fragmentation spectrum for H3K27Ac is presented in Additional file 2: Figure S1.

Table 2 List of histone modifications showing differences in abundance (p-value ≤ 0.05) between cancerous (CRC) and normal (NC) colonic mucosa

The availability of antibodies allowed further evaluation of two selected PTMs of histone H3, namely H3K27Ac and K27 trimethylation (H3K27me3). Western blot analysis of histones isolated from 12 CRC tissue samples and paired healthy mucosa samples, followed by densitometry, confirmed increased acetylation of H3 at K27 (fold change (FC) = 1.31, p-value = 0.0093) (Figure 3), whereas no significant differences were observed for H3K27me3 (not shown).

Figure 3

H3K27Ac is upregulated in colon carcinoma tissues. (A) H3 and H3K27Ac immunostaining in 12 pairs of CRC tissues and normal mucosa. (B) Densitometric assessment of the differences. Histones were isolated from whole tissue sections by acidic extraction, and equal amounts of protein were separated by SDS-PAGE, electrotransferred to PVDF membranes and immunostained with H3 (ab1791) or H3K27Ac (ab4729) antibody. Densitometric measurements were performed using OptiQuant image analysis software. The H3K27Ac level was normalised to the signal from total H3. C, CRC: colon cancer; N, NC: normal colonic mucosa.
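The normalisation step in the caption (H3K27Ac signal divided by the total-H3 signal of the same lane) and a fold change between groups can be sketched in Python as follows. The text does not state how the fold change was defined, so the ratio of group means used here is an assumption, and the band intensities are hypothetical.

```python
from statistics import mean


def normalised_levels(ac_bands, h3_bands):
    """Divide each lane's H3K27Ac band intensity by the total-H3 band intensity of the same lane."""
    if len(ac_bands) != len(h3_bands):
        raise ValueError("every lane needs both an H3K27Ac and a total-H3 measurement")
    return [ac / h3 for ac, h3 in zip(ac_bands, h3_bands)]


def fold_change(crc_levels, normal_levels):
    """Assumed definition: ratio of the group means of the H3-normalised levels."""
    return mean(crc_levels) / mean(normal_levels)


# Hypothetical densitometry readings (arbitrary units) for two CRC/normal pairs.
crc = normalised_levels([2.0, 3.0], [1.0, 1.0])
normal = normalised_levels([2.0, 2.0], [1.0, 1.0])
print(fold_change(crc, normal))  # 1.25
```

A per-pair alternative, such as the geometric mean of the CRC/normal ratios, would be equally defensible for paired samples.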

The MS and western blot findings for H3K27Ac were further confirmed by immunohistochemical staining of 10 pairs of normal and CRC formalin-fixed paraffin-embedded tissue samples, five of which were also included in the MS and western blot analyses (Additional file 1: Table S5). Both normal and CRC tissues showed pronounced positive nuclear immunoreactivity for H3K27Ac (Figure 4 and Additional file 2: Figure S2). However, the percentage of positively stained nuclei (labelling index), calculated with automated image analysis software in representative images of matched normal/CRC sections, was higher in CRC than in the normal counterparts (69.01% vs. 52.66%, respectively; p = 0.0052; Table 3).
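The labelling index is the percentage of positively stained nuclei among all counted nuclei. A minimal Python sketch, assuming nucleus counts are already available from the image analysis software and that counts from several images of one section are pooled before the percentage is taken (the pooling rule is an assumption):

```python
def labelling_index(positive_nuclei, total_nuclei):
    """Percentage of nuclei scored as H3K27Ac-positive."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must lie between 0 and total_nuclei")
    return 100.0 * positive_nuclei / total_nuclei


def pooled_labelling_index(fields):
    """Pool (positive, total) nucleus counts from several images of one section."""
    fields = list(fields)  # accept any iterable, including generators
    positives = sum(p for p, _ in fields)
    totals = sum(t for _, t in fields)
    return labelling_index(positives, totals)


# Hypothetical counts from two images of one section.
print(pooled_labelling_index([(30, 50), (40, 50)]))  # 70.0
```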

Figure 4

An example of immunohistochemical analysis of representative matched normal colonic and CRC sections using an antibody against H3K27Ac.

Table 3 H3K27Ac labelling index in matched CRC and corresponding normal tissue sections

To determine whether the increase in the H3K27Ac mark is associated with cell proliferation, we used western blotting to measure its abundance in resting and dividing cells of four CRC cell lines, namely HCT-116, Colo205, HT29 and Caco2. While there were no differences in H3K27Ac levels between quiescent and proliferating cells, the levels of this mark varied between the cell lines (Additional file 2: Figure S3).

Next, we examined the expression of the enzymes that control the H3K27Ac mark. To this end, we used quantitative RT-PCR (qRT-PCR) to measure the mRNA levels of the enzymes controlling H3K27Ac abundance, namely the CBP/p300 acetyltransferases […]

Qualitative MS data processing and database search

The acquired MS/MS raw data files were preprocessed with Mascot Distiller (version 2.2.1, Matrix Science), and the resulting peak lists were searched against the Homo sapiens entries of the SwissProt database (version 05.10.2012, 20,306 sequences) using Mascot (version 2.2.03, Matrix Science). The search parameters were as follows: enzyme specificity: semitrypsin; maximum number of missed cleavages: 2; protein mass: unrestricted; precursor ion mass tolerance: 5 ppm; fragment ion mass tolerance: 0.02 Da; fixed modification: Carbamidomethylation (C); variable modifications: Acetyl (K) (42.010565 Da), Methyl (K) (14.015650 Da), Dimethyl (K) (28.031300 Da), Trimethyl (K) (42.046950 Da), Methyl (R) (14.015650 Da), Dimethyl (R) (28.031300 Da), Deamidated (R) (0.984016 Da), Phospho (ST) (79.966331 Da) and Oxidation (M) (15.994915 Da).
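The 5 ppm precursor tolerance matters for histone work because lysine acetylation (42.010565 Da) and trimethylation (42.046950 Da) differ by only 0.036385 Da. The Python sketch below uses the delta masses listed above and a hypothetical unmodified peptide mass to show how the relative (ppm) error separates the two candidates; it illustrates the arithmetic only and is not Mascot's internal matching logic.

```python
ACETYL = 42.010565     # Da, monoisotopic delta mass of Acetyl (K)
TRIMETHYL = 42.046950  # Da, monoisotopic delta mass of Trimethyl (K)


def ppm_error(observed_mass, theoretical_mass):
    """Relative mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tol_ppm=5.0):
    """True if the observed mass matches the theoretical one within tol_ppm."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def max_resolvable_mass(delta_da, tol_ppm=5.0):
    """Mass at which two candidates delta_da apart are exactly tol_ppm apart."""
    return delta_da / (tol_ppm * 1e-6)


base = 1500.0  # hypothetical unmodified peptide mass (Da)
acetylated = base + ACETYL
print(round(ppm_error(acetylated, base + TRIMETHYL), 1))  # -23.6
print(round(max_resolvable_mass(TRIMETHYL - ACETYL)))     # 7277
```

An acetylated precursor of this size lies about 24 ppm from the trimethylated candidate and is therefore outside the 5 ppm window; below roughly 7,277 Da the two modifications remain separable at this tolerance.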

The statistical significance of peptide identifications was determined using a target/decoy database search approach and a previously described procedure that provides q-value estimates for each peptide-spectrum match (PSM) in the data set [38, 39]. Only PSMs with q-values ≤ 0.01 were regarded as confidently identified.
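For orientation, a generic target/decoy q-value estimate can be sketched in Python. This is an assumption-laden illustration, not necessarily the procedure of [38, 39]: it assumes a concatenated search, estimates the FDR at each score threshold as decoys/targets, ignores score ties, and uses hypothetical scores.

```python
def target_decoy_qvalues(psms):
    """psms: list of (score, is_decoy) pairs, higher score = better match.

    Returns one q-value per PSM, in input order. The FDR at a score threshold is
    estimated as (#decoys) / (#targets) among PSMs scoring at or above it.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    # q-value: minimum FDR over all thresholds at which this PSM is still accepted,
    # i.e. a running minimum taken from the lowest score upwards.
    qvalues = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvalues[i] = running_min
    return qvalues


# Hypothetical scores; True marks a decoy hit.
psms = [(90.0, False), (80.0, False), (70.0, True), (60.0, False)]
qs = target_decoy_qvalues(psms)
accepted = [i for i, (q, (_, decoy)) in enumerate(zip(qs, psms)) if q <= 0.01 and not decoy]
print(accepted)  # [0, 1]
```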

Additional acceptance criteria were used to assess the confidence of modified peptides. First, the exact positions of the modifications within the sequence were established using an adapted version of the phosphoRS algorithm [40]. Next, the MS/MS spectra were inspected manually for accurate fragment ion assignment. Finally, selected types of sites were rejected as potential experimental artifacts. These included lysine methylations at the C-terminus of the sequence or detected in peptides with acidic residues (possible artifacts of methyl esterification of carboxyl groups), and peptides with deamidation on the C-terminal arginine (tryptic cleavage after a deamidated residue has recently been shown to be highly unlikely [41]).
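The artifact-rejection rules can be expressed as a small filter. The Python sketch below reflects one reading of those rules: "C-terminus" is taken to mean the last residue of the peptide, and "acidic residues" to mean any D or E. The peptide representation (a sequence string plus a dictionary of 0-based positions to modification names) is hypothetical.

```python
LYSINE_METHYL = {"Methyl", "Dimethyl", "Trimethyl"}
ACIDIC = {"D", "E"}


def is_likely_artifact(sequence, modifications):
    """Apply the artifact rules to one localised peptide.

    sequence: peptide in one-letter code; modifications: dict mapping a
    0-based residue position to a modification name (hypothetical format).
    """
    last = len(sequence) - 1
    has_acidic = any(residue in ACIDIC for residue in sequence)
    for pos, mod in modifications.items():
        residue = sequence[pos]
        # Lysine methylation at the C-terminus, or in a peptide with D/E residues,
        # may instead reflect methyl esterification of a carboxyl group.
        if residue == "K" and mod in LYSINE_METHYL and (pos == last or has_acidic):
            return True
        # Trypsin is unlikely to cleave after a deamidated arginine.
        if residue == "R" and mod == "Deamidated" and pos == last:
            return True
    return False


print(is_likely_artifact("GGKAPR", {2: "Methyl"}))      # False
print(is_likely_artifact("GGEKAPR", {3: "Methyl"}))     # True
print(is_likely_artifact("GGKAPR", {5: "Deamidated"}))  # True
```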

Proteins represented by fewer than two peptides, or identified by a subset of the peptides of another protein, were excluded from further analysis. Proteins matching the same set of peptides were grouped together into clusters. All steps of Mascot results processing were performed using MScan, a proprietary Java application available at http://proteom.ibb.waw.pl/mscan. Multiple alignment of protein sequences and visualization of the detected post-translational modification sites were performed using CLC Sequence Viewer (CLC Bio).
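The grouping and exclusion rules can be sketched in Python as follows. This is an illustration of the rules as stated, not MScan's implementation; the accessions and peptides are hypothetical. Proteins with identical peptide sets form one cluster, and clusters with fewer than two peptides or whose peptides are a strict subset of another cluster's are dropped.

```python
from collections import defaultdict


def group_proteins(protein_peptides, min_peptides=2):
    """Group proteins by identical peptide sets and apply the exclusion rules.

    protein_peptides: dict mapping a protein accession to its identified peptides.
    Returns (sorted member accessions, peptide set) pairs.
    """
    clusters = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        clusters[frozenset(peptides)].append(protein)  # identical sets are indistinguishable

    kept = []
    for peptides, members in clusters.items():
        if len(peptides) < min_peptides:
            continue  # fewer than two supporting peptides
        if any(peptides < other for other in clusters):
            continue  # strict subset of another protein's peptides
        kept.append((sorted(members), peptides))
    return sorted(kept, key=lambda item: item[0])


# Hypothetical accessions and peptides.
evidence = {
    "protA": ["pep1", "pep2", "pep3"],
    "protB": ["pep1", "pep2", "pep3"],  # same peptides as protA: same cluster
    "protC": ["pep1", "pep2"],          # subset of protA/protB: excluded
    "protD": ["pep4", "pep5"],
    "protE": ["pep6"],                  # single peptide: excluded
}
for members, peptides in group_proteins(evidence):
    print(members, sorted(peptides))
# ['protA', 'protB'] ['pep1', 'pep2', 'pep3']
# ['protD'] ['pep4', 'pep5']
```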

Quantitative MS data processing

Peptides identified in all LC-MS/MS runs were merged into a common list, which was then overlaid onto 2-D maps generated from the LC-MS profile data of individual samples. The feature extraction procedure was described in detail in a previous study [42]. Briefly, the list of identified peptides was used to tag the corresponding peptide-related ion spectra based on m/z differences, deviations from the predicted elution times, and the match between the theoretical and observed isotopic envelopes. The maximum accepted deviations in m/z and retention time were established separately for each processed LC-MS dataset to account for possible variations in mass measurement accuracy and chromatographic separation between runs. First, an initial search with wide tolerances and restrictive isotopic envelope fit parameters was performed. Next, nonlinear mass and time calibration functions were calculated using LOESS regression, and the search was repeated with narrowed tolerances and relaxed fit requirements. Finally, relative abundances of peptide ions were determined as the heights of 2-D fits to the most prominent peaks of the tagged isotopic envelopes. For normalisation purposes, the calculated abundance of each peptide was divided by the median abundance of all the peptides detected in the sample.
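Two parts of this procedure, tolerance-based tagging and median normalisation, can be sketched in Python. The sketch is deliberately simplified: isotopic-envelope fitting and LOESS recalibration are omitted, the feature tuple format is hypothetical, and choosing the most intense peak among candidates is an assumption standing in for the envelope fit.

```python
from statistics import median


def tag_peptide(mz, rt, features, mz_tol_ppm, rt_tol):
    """Return the most intense 2-D feature within both tolerances, or None.

    features: list of (mz, rt, height) peaks from one LC-MS map (hypothetical format).
    """
    candidates = [
        f for f in features
        if abs(f[0] - mz) / mz * 1e6 <= mz_tol_ppm and abs(f[1] - rt) <= rt_tol
    ]
    return max(candidates, key=lambda f: f[2], default=None)


def median_normalise(abundances):
    """Divide every peptide abundance in one run by the run's median abundance."""
    scale = median(abundances.values())
    return {peptide: value / scale for peptide, value in abundances.items()}


features = [(500.0, 30.0, 1.0e5), (500.001, 30.2, 3.0e5), (600.0, 30.0, 9.0e5)]
print(tag_peptide(500.0, 30.0, features, mz_tol_ppm=10.0, rt_tol=0.5))  # (500.001, 30.2, 300000.0)
print(median_normalise({"pepA": 2.0, "pepB": 4.0, "pepC": 8.0}))
# {'pepA': 0.5, 'pepB': 1.0, 'pepC': 2.0}
```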

Given the normalised peptide abundances, quantitative values (further referred to as “modification intensities”) were calculated for distinct post-translational modification types observed on each of the previously identified sites. These values were computed using a procedure that involved rescaling of the abundances of single-modified peptides covering the site of interest to a common level, followed by computing their median value.
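A possible reading of this computation in Python is shown below. The text does not define "rescaling to a common level", so dividing each peptide's profile across samples by its own median is an assumption, as is the absence of missing values; the input profiles are hypothetical.

```python
from statistics import median


def modification_intensities(peptide_profiles):
    """Combine single-modified peptides covering one site into per-sample intensities.

    peptide_profiles: list of equal-length lists; each inner list holds one peptide's
    normalised abundances across the same ordered samples (no missing values).
    Assumption: "rescaling to a common level" = dividing each profile by its own median.
    """
    if not peptide_profiles:
        raise ValueError("at least one peptide profile is required")
    n_samples = len(peptide_profiles[0])
    if any(len(profile) != n_samples for profile in peptide_profiles):
        raise ValueError("all profiles must cover the same samples")
    rescaled = []
    for profile in peptide_profiles:
        level = median(profile)
        rescaled.append([value / level for value in profile])
    # For each sample, take the median over the rescaled peptides.
    return [median([profile[j] for profile in rescaled]) for j in range(n_samples)]


# Two hypothetical peptides reporting the same site at different absolute levels.
print(modification_intensities([[1.0, 2.0, 4.0], [10.0, 20.0, 40.0]]))  # [0.5, 1.0, 2.0]
```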

Statistical analysis of quantitative MS measurements

A non-parametric, resampling-based test using the paired t statistic was applied to evaluate the differences in site-modification intensities between the two groups of samples. Modification sites with p-values ≤ 0.05 were considered significantly changed.
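One common resampling scheme for paired data is the sign-flip permutation test. The Python sketch below assumes that scheme, which the text does not specify: it enumerates every sign pattern of the CRC minus normal differences (4,096 patterns for 12 pairs) and compares the resulting paired t statistics with the observed one. For many more pairs, random sampling of sign patterns would replace exact enumeration.

```python
from itertools import product
from math import copysign, sqrt
from statistics import mean, stdev


def paired_t(diffs):
    """One-sample t statistic of the paired differences."""
    sd = stdev(diffs)
    if sd == 0:
        m = mean(diffs)
        return 0.0 if m == 0 else copysign(float("inf"), m)
    return mean(diffs) / (sd / sqrt(len(diffs)))


def sign_flip_test(crc, normal):
    """Two-sided p-value from exact sign-flip resampling of the paired differences."""
    if len(crc) != len(normal) or len(crc) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [c - n for c, n in zip(crc, normal)]
    observed = abs(paired_t(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(paired_t([s * d for s, d in zip(signs, diffs)])) >= observed - 1e-12:
            hits += 1  # the small tolerance keeps floating-point ties counted
    return hits / total


# Hypothetical intensities for four pairs.
print(sign_flip_test([2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0, 1.0]))  # 0.125
```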