Introduction

Proteomic technology allows for the global analysis of gene products in various cells, tissues and organisms. In parallel with other burgeoning technologies and coupled with the rapidly growing needs of life scientists, proteomics has become one of the most utilized approaches in all of life science research. As the Human Proteome Project is set in place1, there will be an urgent need to store vast amounts of information in proteomic databases; these databases will become a powerful tool to utilize in the research of human diseases. Heart disease is the leading cause of death in many countries including the United States, England and Canada. Proteomics approaches have clearly grown in potential to provide global and holistic analysis of changes in protein expression during heart disease. The earliest establishment of myocardial two-dimensional gel electrophoresis (2-DE) protein databases, beginning in the 1990s, focused on identifying proteins from human myocardium. In the database from the Jungblut laboratory, N-terminal sequencing, internal sequencing and amino acid analysis were used to identify twelve proteins of the human myocardial 2-DE pattern2. During the same year, a database containing approximately fifty proteins from 2-DE gels of human myocardial tissue was characterized by the Baker laboratory3. Subsequent studies have succeeded in establishing myocardial 2-DE protein databases in different species, such as rat, mouse, and dog, and thereby enriched myocardial 2-DE protein databases4, 5, 6, 7. These databases have allowed investigators to compare data and to establish reference standards.

Proteomic technology has been used extensively to study cardiovascular diseases and to identify candidate molecules for diagnosis and therapy in intact live animals. Compared with intact hearts, cell culture models are useful for distinguishing myocytes from fibroblasts, for eliminating hemodynamic and hormonal influences and for studying cell phenotypes and signaling in more homogeneous populations in a defined environment. However, proteomic investigations of cardiomyocytes are scarce, and a 2-DE protein database for the rat cardiomyocyte has not previously been developed. In this work, we launched a proteomic study of neonatal rat cardiomyocytes and compiled a profile of the proteins expressed in these cells using 2-DE and matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS). Using immobilized pH gradient isoelectric focusing in a linear gradient from pH 3 to 10, followed by 12% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), more than 1000 proteins from cultured cardiomyocytes were separated and displayed. Among those, 148 protein spots have so far been identified and used for the construction of an extensible markup language-based database. In addition to storing the cardiomyocyte proteins in this database, we characterized interaction-based biological networks of proteins, which allow us to model how proteins work together to mediate biological processes. Moreover, our network analysis revealed function maps and pathway maps potentially associated with the proteins in the neonatal rat cardiomyocyte database.

In summary, with the rapid development of modern proteomics, it is essential that annotated databases are constructed to store all of the vast data generated by these techniques. More important is that these databases can be interrogated effectively both within the laboratory and by other scientists worldwide through the use of the World Wide Web. The database presented in this work will serve as an international platform for the storage, analysis, and visualization of cardiomyocyte proteomic data that can contribute to a more holistic view of heart tissue. This database may contribute to understanding key cardiovascular functions and pathways and may offer the potential for new avenues of future therapeutic intervention.

Materials and methods

Cardiomyocyte culture

Neonatal rat cardiomyocytes were isolated and cultured as previously described8. Briefly, cardiomyocytes were dissociated from the ventricles of 1- to 2-day-old neonatal Sprague-Dawley rats using 0.1% trypsin (Hyclone) and 80 units/mL collagenase (Worthington Biochemical Corp) in Hank's balanced salt solution (calcium-free; Hyclone). To separate the cardiomyocytes from non-myocytes, isolated cells were pre-plated for 90 min. The enriched cardiomyocyte fractions were seeded into 150-cm culture dishes and cultured in DMEM (Sigma, 4500 mg/L D-glucose, and L-glutamine, sodium pyruvate) supplemented with 10% heat-inactivated fetal bovine serum for 24 h. The DNA synthesis inhibitor bromodeoxyuridine (100 μmol/L) was added during the first 48 h to prevent the proliferation of non-cardiomyocytes. The experiments were approved by the Institutional Animal Care and Use Committee of Peking University Health Science Center (LA2010-035) and adhered to the American Physiological Society's “Guiding Principles in the Care and Use of Vertebrate Animals in Research and Training.” All protocols were conducted in accordance with the Guidelines for Animal Experiments, Peking University Health Science Center.

Two-dimensional electrophoresis (2-DE)

Proteins were separated by two-dimensional electrophoresis. Samples containing 200 μg protein for analytical gels or up to 1.0 mg for micropreparative gels were diluted to 300 μL with rehydration solution (6 mol/L urea, 2 mol/L thiourea, 2% CHAPS, 65 mmol/L DTT, 0.5% v/v pH 3-10 Bio-lyte, trace bromophenol blue) and applied to IPG strips (Bio-Rad) for 12–14 h in a passive mode. The isoelectric focusing was performed at 20 °C over 24 h for a total of 70 000 V-h. After equilibration of the IPG strips, the second-dimensional SDS electrophoresis was performed on 12% gels at a current setting of 7.5 mA/gel for the initial 2 h and 15 mA/gel until the tracking dye reached the cathode.

Protein visualization and image analysis

After two-dimensional gel electrophoresis, proteins were stained with silver or with Coomassie Brilliant Blue G-250 for subsequent mass spectrometry. To increase the sensitivity of Coomassie staining, we modified the composition of the staining solution as follows: 20% methanol, 2% phosphoric acid, 10% ammonium sulfate and 0.1% Coomassie Brilliant Blue G-250. Gels were fixed in 12.5% TCA for 60 min and rinsed in Milli-Q water for 20 min. The gels were then stained in the improved Coomassie staining solution for 24 h. After the stained gels were scanned with a high-resolution scanner (Umax 1120), the gel images were analyzed using the PDQuest software (Version 7.1.1; Bio-Rad) according to the protocols provided by the manufacturer.

Identification of protein spots

Protein spots of interest visualized with colloidal Coomassie blue G-250 staining were excised and transferred to 1.5 mL siliconized Eppendorf tubes. The gel spots were washed and destained with 50% acetonitrile (ACN) and then dried in a vacuum centrifuge. The dried gel pieces were incubated in a digestion solution containing 50 mmol/L NH4HCO3 and 0.1 g/L TPCK-trypsin for 12 h at 37 °C. The resulting peptides were extracted three times with 50 μL aliquots of 5% trifluoroacetic acid in 60% acetonitrile. The combined extracts were concentrated in a Speed Vac to 3–5 μL.

The concentrated tryptic peptide mixture was mixed with a saturated CHCA matrix solution and vortexed gently. A 1 μL volume of the mixture containing CHCA matrix was loaded on a 96×2 well hydrophobic plastic surface sample plate (Applied Biosystems) and air-dried. The samples were analyzed with a Voyager DE STR MALDI-TOF MS (Applied Biosystems) fitted with a 337-nm nitrogen laser. Spectra were acquired with the instrument in reflectron mode and calibrated using a standard peptide mixture. Database searching with the monoisotopic peptide masses was performed against the NCBInr database using the peptide search engine ProFound (http://prowl.rockefeller.edu/cgi-bin/ProFound) under the following conditions: partial methionine oxidation, complete cysteine carbamidomethylation, mass tolerance of 0.2 Da, and one missed cleavage.
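The search conditions listed above can be illustrated with a minimal Python sketch of peptide mass matching. This is not ProFound's scoring algorithm, which the text does not describe; it only shows how theoretical monoisotopic [M+H]+ masses might be generated for tryptic peptides (cleavage after K or R but not before P, up to one missed cleavage, carbamidomethylated cysteine, optional methionine oxidation) and compared with observed peaks within 0.2 Da. The function names, the toy sequence and the peak list are hypothetical and are not data from this study.

```python
# Monoisotopic residue masses in Da. Cysteine is treated as fully
# carbamidomethylated (+57.02146), matching the stated search condition.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919 + 57.02146,
    "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
    "W": 186.07931,
}
WATER = 18.01056           # added once per peptide
PROTON = 1.00728           # singly protonated ion, [M+H]+
MET_OXIDATION = 15.99491   # optional (partial) modification


def tryptic_fragments(sequence):
    """Split after K or R unless the next residue is P (complete digestion)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def peptides_with_missed_cleavages(fragments, max_missed=1):
    """Join up to `max_missed` + 1 neighbouring fragments into peptides."""
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def mh_plus(peptide, oxidized_met=0):
    """Theoretical monoisotopic [M+H]+ mass of a peptide."""
    residues = sum(RESIDUE_MASS[r] for r in peptide)
    return residues + WATER + PROTON + oxidized_met * MET_OXIDATION


def match_peaks(observed_peaks, sequence, tolerance=0.2, max_missed=1):
    """Return (peak, peptide, n_oxidized_met) for peaks within tolerance."""
    matches = []
    fragments = tryptic_fragments(sequence)
    for peptide in peptides_with_missed_cleavages(fragments, max_missed):
        for n_ox in range(peptide.count("M") + 1):
            theoretical = mh_plus(peptide, n_ox)
            for peak in observed_peaks:
                if abs(peak - theoretical) <= tolerance:
                    matches.append((peak, peptide, n_ox))
    return matches


if __name__ == "__main__":
    toy_sequence = "AKRPGK"     # hypothetical sequence, not a real protein
    toy_peaks = [218.15, 500.0]  # hypothetical peak list (m/z)
    print(match_peaks(toy_peaks, toy_sequence))
```

A real search engine additionally scores how many peaks match each candidate protein across the whole database; the sketch covers only the mass-matching step.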

Biological networks and pathway analysis

The biological network data were generated through the use of Ingenuity Pathways Analysis (IPA) software (www.ingenuity.com), a web-delivered application that evaluates biological networks. A data set of proteins from cardiomyocytes was uploaded into IPA. Each protein identifier was mapped to its corresponding protein object in the Ingenuity Pathways Knowledge Base (IPKB). These proteins were then used as the starting point for generating biological networks based upon the identities of the focus proteins and interactions with genes/proteins that were reported in the literature. IPA calculated a significance score for each network. The score was generated using a P-value calculation and was displayed as the negative log of that P-value, indicating the likelihood that the assembly of a set of focus proteins in a network could be explained by random chance alone. A score of 2 indicates that there is a 1 in 100 chance that the focus proteins are together in a network due to random chance. Therefore, networks with scores of 2 or higher have at least a 99% confidence of not being generated by random chance alone. Biological functions or canonical pathways were then calculated and assigned to each network by using findings that had been extracted from scientific literature and stored in the IPKB. The biological functions assigned to each network were ranked according to the significance of that biological function to the network.
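The relationship between the network score and the underlying P-value can be made concrete with a short Python sketch. It assumes a base-10 logarithm, which is what the statement "a score of 2 indicates a 1 in 100 chance" implies; the function names are illustrative and are not part of the IPA software.

```python
import math


def score_from_p_value(p_value):
    """Score defined as the negative base-10 logarithm of the P-value."""
    return -math.log10(p_value)


def p_value_from_score(score):
    """Invert the score: P = 10 ** (-score)."""
    return 10.0 ** (-score)


if __name__ == "__main__":
    # A score of 2 corresponds to P = 0.01, i.e. a 1 in 100 chance.
    for score in (2, 3, 12):
        print(score, p_value_from_score(score))
```

Under this definition, each additional unit of score corresponds to a tenfold smaller P-value.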

Construction of an on-line database

One 2-DE gel of neonatal rat cardiomyocyte proteins was analyzed by PDQuest software (Version 7.1.1; Bio-Rad). The identified protein spots were stored in a flat file database that was made accessible on-line by the Make2ddb package on a web server with search functions, such as by accession number, description, or author or by clicking on a protein spot. The individual protein entries were hyperlinked to the relevant spots on an image map created from the reference gel.

Results

Data management, analysis, and presentation

A schematic representation of our proteomic informatics approach for the study of neonatal rat cardiomyocytes is shown in Figure 1. It consists of three main parts: data acquisition, data analysis, and web-accessible data presentation. A laboratory information management system (LIMS) was used in this study for the acquisition of proteomic data. Results of the gel image analysis performed by PDQuest can also be transferred into the LIMS. A Java graphical user interface (Java-GUI) enables the transfer of selected data from the LIMS into the database system, which consists of 2D-PAGE, network/function analysis and protein information. The database system is interconnected with public and other knowledge databases. Furthermore, as an additional exploratory data analysis, significant networks and canonical pathways relevant to the dataset were calculated and analyzed using IPA.

Figure 1

A schematic representation of our proteome informatics approach for the database research of neonatal rat cardiomyocytes.

2-DE pattern of cardiomyocytes

Proteins from cardiomyocytes were resolved by 2-DE. IEF was first conducted on a 17 cm, broad-range pH 3–10 linear immobilized pH gradient (IPG) strip. However, a high-resolution 2-DE gel was not obtained because of insufficient spatial resolution, the difficulty of revealing low-copy-number proteins in the presence of more abundant proteins, and the fact that most of the proteins were distributed in a pH range from 5 to 8. Therefore, in subsequent investigations, we adopted multiple overlapping narrow-range IPG strips of pH 3–6, pH 5–8, and pH 7–10 in the first dimension (zoom-in gel). In this way, the resolution was greatly improved, and spots with low intensity could be detected easily (Figure 2A). In addition, Coomassie blue staining was improved as described in Materials and methods. The results showed that the new method combined the advantages of both Coomassie blue and silver stains, with a sensitivity almost equivalent to that of silver stain (Figure 2B).

Figure 2

2-DE proteome map of the cardiomyocytes. (A) Shown are the gel of pI 3–10 (upper) and narrow pI range gels, 3–6, 5–8, and 7–10 (lower). Staining was by colloidal Coomassie blue G-250. (B) Representative 2-DE profiles of neonatal rat cardiomyocytes. Proteins on 2-DE gels were visualized by colloidal Coomassie blue G-250 (left) or silver staining (right).

2D-PAGE database

Proteins from cardiomyocytes were separated and stained with the above-mentioned methods. The 2-DE gel was used to create the gel image in the database, as shown in Figure 3. Protein spots were identified using the peptide mass fingerprinting (PMF) method with MALDI-TOF MS. The identified proteins are summarized in Table 1. Furthermore, the presence of some selected proteins from the database in cardiomyocytes was confirmed by Western blot (Figure 4). This 2-DE database is fully platform-independent and can be ported from one platform to another. The identified protein spots are hyperlinked to individual protein entries, and all of the identification information for each protein entry can be obtained through clickable images. This information includes protein names, database accession numbers, theoretical molecular mass and pI values. Furthermore, detailed information on particular protein spots can be obtained through several search tools provided by the database. In addition, detailed protocols for all of the experiments are provided to ensure the reproducibility of the data in the database. Useful links to other proteomic tools and databases can be found throughout the database. For further details of the identified proteins and the correspondence of identified proteins to spots in the 2-DE gel image, please refer to our database at http://2d.bjmu.edu.cn.

Figure 3

A proteome reference map of neonatal rat cardiomyocytes. Protein spot information (annotation) on prohibitin and generated mass spectrum data.

Table 1 Identified proteins of neonatal rat cardiomyocytes.

Figure 4

Confirmation by Western blot of selected proteins from the database in cardiomyocytes.

Data analysis and visualization

To further explore how the cardiomyocyte proteins in the database are related, we used the Ingenuity Pathways Analysis (IPA) program and the Ingenuity Pathways Knowledge Base (IPKB). The IPA program analyzes a large genomic or proteomic data set to find the most significant networks and canonical pathways relevant to the data set based on a calculated probability score. The specificity of connections for each protein was calculated, as defined by the percentage of its direct connections to other proteins showing significant changes. As described in Materials and methods, networks with scores of 2 or higher have at least a 99% confidence of not being generated by random chance alone. Based on the computed scores, four direct networks with scores of 24, 20, 16, and 12 were found to be significant in the database (Figure 5). In addition, high-level functions were calculated and assigned to each direct network if the significance of the association between the network and the biological function had a P<0.05 (Table 2). The functions displayed in the table represent the top 4 high-level functions from the local analysis, which provides an overview of the biological functions associated with a given network (A, network 1; B, network 2; C, network 3; and D, network 4). To obtain a more comprehensive regulation-relationship map of the cardiomyocyte proteins in the database, a merged network, which included the above-mentioned four networks, was created by IPA (Supplementary Figure 1). Not only does the IPA software analysis find novel interconnectivity, but the “biological functions and canonical pathways” feature also enables us to further understand the biological functions and signaling cascades engaged by each network. The functional analysis for the merged direct network revealed a function map (Supplementary Figure 2A). 
The top three functions in the network were related to cellular assembly and organization, cardiovascular disease and cell death. Moreover, pathway analysis produced a canonical pathways map (Supplementary Figure 2B). The top three canonical pathways in the network were citrate cycle, glyoxylate and dicarboxylate metabolism and valine, leucine and isoleucine degradation. In addition, four indirect networks and a merged indirect network, together with their functional and pathway analyses, were also found in the database (Supplementary Figures 3 and 4, and Supplementary Table 1). More detailed information can be found at http://2d.bjmu.edu.cn.

Figure 5

The direct interaction functional network maps of cardiomyocyte proteins in the database. The direct interaction network scores were 24 (A, network 1), 20 (B, network 2), 16 (C, network 3), and 12 (D, network 4), respectively. Nodes represent proteins, with their shape indicating the functional class of the protein, and edges indicate the biological relationships between the nodes. For the functional analyses of networks, see Table 2.

Table 2 The top functions table for direct interaction networks.

Discussion

Proteomic technology can provide new insights into cellular mechanisms involved in cardiac dysfunction9, 10, 11, 12, 13. The majority of proteomic investigations still use 2D gel electrophoresis (2-DE) with immobilized pH gradients to separate the proteins in a sample and combine this with mass spectrometry (MS) technologies to identify proteins. In spite of the development of novel gel-free technologies, 2-DE remains the only technique that can be routinely applied to parallel quantitative expression profiling of large sets of complex protein mixtures such as whole cell lysates. Current limitations of this technology include the resolution of 2-DE and the detection of low-abundance proteins. In the present study, two approaches were explored to address these drawbacks. The first was to apply multiple overlapping narrow-range IPG strips (to increase the resolution of 2-DE), and the second was to improve protein visualization (to achieve highly sensitive detection of 2-DE-separated proteins). With complex samples such as cardiomyocytes, 2-DE on a single wide-range pH gradient reveals only a small percentage of the whole proteome because of insufficient spatial resolution and the difficulty of revealing low-copy proteins in the presence of the most abundant proteins. Our results showed that the resolution of 2-DE with multiple overlapping narrow-range IPG strips (pH 3–6, pH 5–8, and pH 7–10) is about 2.5 times that of a wide-range strip (pH 3–10). Using these gel systems, more than 1000 proteins were resolved from our model. Silver staining and Coomassie blue staining are widely accepted as the most popular current methods of protein detection. Although silver staining, with a sensitivity approximately 10–50 times that of Coomassie blue, is the most sensitive staining method so far reported, it has a number of drawbacks and limitations. A high background resulting from a number of variables may cause poor resolution of protein spots. 
Quantifying protein abundance is also difficult with silver-stained gels because the staining intensity is poorly linear with protein concentration. In addition, silver staining interferes considerably with subsequent protein identification by mass spectrometry. Unlike silver staining, Coomassie blue staining is compatible with subsequent analytical techniques and yields a higher dynamic range for determining changes in protein expression levels. The major limitation of Coomassie staining is that its sensitivity is well below that of silver. In this study, we improved the colloidal Coomassie Brilliant Blue G-250 stain and increased its sensitivity to be almost equivalent to that of silver stain.

As mentioned earlier, protein databases will serve as an important basis for further studies of human disease. There are four 2-DE protein databases of cardiac proteins, established by three independent groups, which can be accessed via the World Wide Web7, 14, 15, 16. These databases contain information on several hundred cardiac proteins that have been identified by protein chemical methods, and they facilitate proteomic research into heart diseases. In addition, 2-DE protein databases and proteomic maps for other mammals are under construction to support work on animal models of heart disease17, 18, 19, 20.

Cultured cardiac myocytes are attractive models for detailed proteomic investigations of cardiovascular diseases. Unlike the intact heart, cell culture models are very useful for distinguishing effects on myocytes from effects on fibroblasts, for eliminating hemodynamic and hormonal influences and especially for analyzing and integrating the results of high-throughput omics approaches. However, to our knowledge, no proteome database for cardiomyocytes has been available so far. In this study, a two-dimensional electrophoresis database for rat cardiomyocyte proteins was constructed using our improved methods. Our data show that the largest class of proteins in cardiomyocytes is energy and mass metabolism proteins (20%). For instance, the glucose-regulated proteins play important roles in glucose metabolism. Pyruvate decarboxylase, one of the key enzymes in gluconeogenesis, is closely involved in the process of carbohydrate metabolism. In addition, adenylate kinase plays a crucial role in maintaining the energy balance in cells. The other two main groups of proteins are cytoskeletal proteins and oxidative stress proteins/heat shock proteins. The former, including actin, myosin, lamin, etc, play a central role in the creation and maintenance of cell shape in cardiomyocytes. Cytoskeletal proteins also play important roles in both intracellular transport (the movement of vesicles and organelles) and cellular division. In addition, well-known heat-shock proteins and oxidative-stress proteins play pivotal roles during cardiac ischemia, ischemic preconditioning and cardiac hypertrophy. The functions of all those proteins are consistent with the heart being a myogenic muscular organ.

Based on those identified proteins, this paper describes the creation of an on-line accessible 2-DE reference database for the cardiomyocyte proteome and provides protein identification data for a restricted number of reference spots. Such a 2-DE protein database reduces the need for routine protein identification, which is often difficult to achieve for an organism whose genome has not been sequenced. This Java-based database is fully platform-independent and can be ported from one platform to another. Users can view the information for any protein spot in the 2D images simply by clicking on it. That is, 2-DE images are provided on a World Wide Web server and, in response to a mouse click on any identified spot on the image, the user obtains the database entry for the corresponding protein. This method allows a user to easily identify proteins on a 2-DE image. Detailed information on particular protein spots can also be obtained through several search tools provided by the database, such as searching by accession number, by theoretical molecular weight (kDa) and by theoretical pI. In addition, detailed protocols for all of the experiments are provided to ensure the reproducibility of the data in the database and to enable other laboratories to repeat those experiments. The database is linked to other databases through active hypertext cross-references; that is, the user is automatically connected to the corresponding web site through a simple mouse click on a cross-reference.
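The three search modes named above (by accession number, by theoretical molecular weight and by theoretical pI) can be sketched in Python as simple filters over spot records. This is only an illustration of the idea and not the implementation behind the on-line database; the record fields, function names and example entries are hypothetical placeholders, not values from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotEntry:
    """One identified spot on the reference gel (hypothetical record layout)."""
    spot_id: str
    accession: str
    description: str
    theoretical_mw_kda: float
    theoretical_pi: float


def search_by_accession(entries, accession):
    """Exact match on the database accession number."""
    return [e for e in entries if e.accession == accession]


def search_by_mw(entries, low_kda, high_kda):
    """Entries whose theoretical molecular weight lies in [low, high] kDa."""
    return [e for e in entries if low_kda <= e.theoretical_mw_kda <= high_kda]


def search_by_pi(entries, low_pi, high_pi):
    """Entries whose theoretical pI lies in [low, high]."""
    return [e for e in entries if low_pi <= e.theoretical_pi <= high_pi]


# Placeholder entries for illustration only; not data from this study.
EXAMPLE_ENTRIES = [
    SpotEntry("spot-001", "ACC_A", "placeholder protein A", 30.0, 5.5),
    SpotEntry("spot-002", "ACC_B", "placeholder protein B", 42.0, 5.2),
    SpotEntry("spot-003", "ACC_C", "placeholder protein C", 70.0, 8.1),
]

if __name__ == "__main__":
    print(search_by_accession(EXAMPLE_ENTRIES, "ACC_B"))
    print(search_by_mw(EXAMPLE_ENTRIES, 25.0, 45.0))
    print(search_by_pi(EXAMPLE_ENTRIES, 5.0, 6.0))
```

Range queries are used for molecular weight and pI because theoretical and observed values on a gel rarely coincide exactly.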

Although it is important to identify the individual proteins expressed in cultured cardiac myocytes, there is an increasing need to move beyond this level of analysis. Cluster and principal component analyses describe overall changes in apparent protein expression but provide few insights into the biological processes and regulatory networks involved in cultured cardiac myocytes. In this study, we combined large-scale analysis of protein expression with knowledge-based functional network analyses. A complex network based on this database was mapped, and the biological functions and biologically meaningful pathways that yield novel, predictive insights were computed and analyzed by IPA. IPA is a software/database search tool for finding functions and pathways for specific biological states. It is a web-delivered application that makes use of the Ingenuity Pathways Knowledge Base, a curated database consisting of millions of individually modeled relationships between proteins, genes, complexes, cells, tissues, drugs, and diseases. Sets of interaction data can be viewed as graphs or maps in which each protein is a node and each interaction is a line connecting two nodes. The importance of this view has led to use of the term “interaction map” to refer generically to interaction datasets. The map view provides not only an intuitive interface for biologists to explore the data but also a formal mathematical framework for computational biologists to explore the properties of interaction networks. These networks include both direct and indirect interactions. Direct interactions are characterized by a well-defined information flow (eg, from a transcription factor to the gene it regulates). Indirect interactions do not have an assigned direction (eg, mutual binding relationships). Furthermore, cardiovascular diseases are complicated diseases that are controlled by complex regulatory networks. 
In these regulation networks, most proteins function in collaboration with other proteins. In this study, we established, for the first time, a proteome reference map and regulation network of the neonatal rat cardiomyocyte. This is especially useful in understanding dysfunction of complicated diseases. Furthermore, integrated analysis such as that provided by IPA enables analysis of all interactions and may offer additional insights into how such an extensive map contributes to cardiovascular diseases.
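The graph view of interaction data described above can be sketched in a few lines of Python: each protein is a node, each interaction is an edge, and simple graph properties such as the number of partners of a protein or the set of proteins reachable from it follow directly. The sketch treats every interaction as undirected for simplicity, and the protein labels are hypothetical placeholders rather than entries from the database or from IPA.

```python
from collections import defaultdict


def build_interaction_map(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree(graph, protein):
    """Number of distinct interaction partners of a protein."""
    return len(graph.get(protein, ()))


def connected_proteins(graph, start):
    """All proteins reachable from `start` through any chain of interactions."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


if __name__ == "__main__":
    # Placeholder interactions for illustration only.
    toy_interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
    toy_map = build_interaction_map(toy_interactions)
    print(degree(toy_map, "P2"))
    print(sorted(connected_proteins(toy_map, "P1")))
```

A directed variant would store each pair in one orientation only, which is one way to represent relationships with a defined information flow.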

In our cardiomyocyte protein database, protein regulatory networks and pathway analyses were created to show how protein–protein interactions may participate in the physiological processes of the heart. The functional and pathway analyses show that the majority of proteins from cardiomyocytes are involved in cellular assembly and organization, cardiovascular system development, cell morphology, cardiovascular disease and numerous metabolic and oxidative-stress pathways (such as citrate cycle, glyoxylate and dicarboxylate metabolism, valine and leucine degradation, fatty acid metabolism, oxidative phosphorylation and the endoplasmic reticulum stress pathway). These results from the network analysis are also consistent with the above-mentioned protein components of cardiomyocytes. More importantly, if a user finds proteins of interest in this database, the detailed regulatory relationships of these proteins can be explored further in the protein regulatory network, which will serve as a useful predictive tool for further research.

In conclusion, we have presented an improved methodology for cultured cardiomyocyte proteomics. The on-line database with its reference gel and identified proteins serves as a stable framework that facilitates the study of the heart proteome and the exchange of heart proteome data. In addition, biological regulatory networks were, for the first time, incorporated into a 2-DE protein database. Our network analysis also revealed function and pathway maps potentially associated with the neonatal rat cardiomyocyte protein database. Together, these results allow us to better understand the mechanisms, biological processes and specific regulatory networks of the heart. Our study also provides a basis for the development of new therapeutic strategies. Finally, the database we established can serve as an international platform for the storage, exchange and analysis of proteomic data. The information management of experimental data and the standardization, sharing and integration of this database will provide a basis for further cardiovascular study.

Author contribution

Zi-jian LI designed and performed the research, analyzed the data, and wrote the manuscript; Ning LIU performed the research and analyzed the data; Qi-de HAN designed the research; and You-yi ZHANG designed the research and analyzed the data.