Background

Proteins that contain one or more PDZ domains, often called PDZ proteins, play pivotal roles in dynamically organizing molecular architectures at specific intracellular regions in differentiating and differentiated cells [1, 2]. Membrane proteins such as cell adhesion molecules, receptors, and channels form functional clusters within selective subcellular regions by binding to PDZ domains [2-5]. Furthermore, some PDZ proteins anchor specific cytosolic proteins such as protein kinases, cytoskeleton-regulating enzymes and second-messenger-producing enzymes [2, 6], and hence contribute to precise signal transduction between extracellular and intracellular spaces at specific sites such as postsynaptic densities in neurons [2, 7], immunological synapses in T-lymphocytes [8, 9] and tight junctions in endothelial and epithelial cells [1, 10].

The PDZ domain, an evolutionarily conserved globular structure composed of 80-90 amino acids (AAs), recognizes particular regions of its interactors [6, 11, 12]. PDZ domains primarily bind to the C-terminal ends of proteins. Interactions between PDZ domains and internal regions of their binding partners have also been reported, though they are less common [2, 11, 12]. PDZ-binding motifs (hereafter 'PB motifs') have been proposed on the basis of sequence similarity among the C-terminal ends of proteins that bind to PDZ domains through those ends. PB motifs are currently categorized into at least three major types on the basis of the two AAs located at positions 0 and -2 (Figure 1, upper panel), both of which are essential for binding to PDZ domains [11, 13-15]. The type-I PB motif has the form S/T-x-I/L/V, in which serine (S) or threonine (T) is positioned at -2, any AA (x) at -1, and isoleucine (I), leucine (L) or valine (V) at 0. The type-II PB motif has the form Φ-x-Φ (where Φ denotes any hydrophobic AA). The type-III PB motif has the form D/E-x-Φ [11, 12, 14, 16]. Although most of the reported PDZ-type interactions are mediated via these canonical C-terminal motifs [17], non-canonical C-terminal motifs have also been reported [18-56].

mInsc possesses a refined type-I PB motif, x-E-S-x-V (Figure 5a), as identified in Figure 4e. However, it has not been shown whether this PB motif is functional and allows mInsc to bind to PDZ domains. This three-position-specified PB motif is also observed at the C-terminal end of the voltage-dependent sodium channel Nav1.4 (Figure 5a). Because Nav1.4 binds to the PDZ protein PSD-95 [52], we predicted that mInsc would also bind to PSD-95. To test this possibility, Flag-tagged mInsc and Myc-tagged PSD-95 were coexpressed in COS-7 cells and co-immunoprecipitation analyses were performed (Figure 5c). As expected, PSD-95 was co-immunoprecipitated with mInsc (lane 1 in Figure 5c). Furthermore, deletion of the four C-terminal AAs of mInsc disrupted this binding, indicating that the binding between PSD-95 and mInsc is mediated by the PB motif of mInsc. We also tried to identify functional type-II PB motifs. As shown in Figure 5b, DTWD2, a protein of unknown function, possesses three refined type-II PB motifs, N-x-V-x-I, x-S-V-x-I and x-x-V-K-I, all of which were identified in Figure 4. Interestingly, two of them, x-S-V-x-I and x-x-V-K-I, are also found at the C-terminal end of GluR2, a subunit of AMPA-type glutamate receptors. Because the PDZ protein GRIP1 binds to the type-II PB motif of GluR2 [37, 52, 57], we expected that DTWD2 would also bind to GRIP1. As shown in Figure 5d, an interaction between DTWD2 and GRIP1 was indeed observed, and the type-II PB motifs of DTWD2 were essential for it. Thus, we successfully identified functional PB motifs based on the three-position-specified PB motifs identified in our study (Figure 4).
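The canonical motif definitions given above can be written as patterns over the last three residues of a protein (positions -2, -1 and 0). The following is a minimal sketch rather than the scripts used in this study. The function name `classify_pb_motif` and the residue set used for Φ are assumptions made for illustration, since the text does not list which AAs count as hydrophobic.

```python
import re

# Assumption: the text does not enumerate the hydrophobic residues (Φ);
# this set is illustrative only and should be adjusted as needed.
HYDROPHOBIC = "AVILMFWYC"

# Each pattern covers positions -2, -1 and 0 (the last three residues).
PB_TYPES = {
    "type-I": re.compile(r"[ST].[ILV]$"),
    "type-II": re.compile(rf"[{HYDROPHOBIC}].[{HYDROPHOBIC}]$"),
    "type-III": re.compile(rf"[DE].[{HYDROPHOBIC}]$"),
}


def classify_pb_motif(sequence):
    """Return the canonical PB motif type of a C-terminus, or None.

    A trailing '*' (stop codon marker) is ignored.
    """
    seq = sequence.rstrip("*").upper()
    for name, pattern in PB_TYPES.items():
        if pattern.search(seq):
            return name
    return None


# C-terminal sequences taken from motifs quoted in the text.
print(classify_pb_motif("IKSEV"))  # type-I  (S at -2, V at 0)
print(classify_pb_motif("ESVKI"))  # type-II (V at -2, I at 0)
```

Because the residue classes allowed at position -2 are disjoint under this choice of Φ, a C-terminus can match at most one of the three types.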

Figure 5

Interactions between refined PB motifs and PDZ proteins. (a) Comparison of the C-terminal sequences of mInsc and the voltage-dependent sodium channel Nav1.4. (b) Comparison of the C-terminal sequences of DTWD2 and GluR2. AAs identified in Figure 4 at each position are indicated with different colors (-4: green, -3: red, -1: yellow). (c) Binding between mInsc and PSD-95 under physiological conditions, which depends on the PB motif of mInsc. (d) Binding between DTWD2 and GRIP1, mediated by the PB motifs of DTWD2. (e) PB motifs of the NS1 proteins of several influenza strains. The C-terminal thirty AAs of the NS1 proteins are shown. The upper strain (A/Brevig Mission/1/1918(H1N1)) caused the Spanish flu in 1918, and the middle strain (A/Hong Kong/213/2003(H5N1)) caused outbreaks of avian influenza among humans in 2003-2004. The lower strain (A/New York/1/2003(H3N2)) is shown as a representative example of a strain that causes seasonal flu and spreads widely and predominantly among humans. AAs identified in Figure 4 are indicated with different colors (positions -4, -3 and -1 are indicated in green, red and yellow, respectively).

Finally, we tested the hypothesis that the refined PB motifs correspond to evolutionarily selected sequences by examining the co-evolution of viral pathogens. Several types of viruses express viral proteins that possess type-I PB motifs at their C-terminal ends and bind to cellular PDZ proteins [58]. The PB motif sequences of NS1 proteins, viral proteins of influenza, vary among influenza isolates, and the pathogenicity of a strain can correlate with the binding activity of the PB motif of its NS1 toward cellular PDZ proteins [59-61]. These reports prompted us to test whether NS1 proteins derived from highly pathogenic strains possess the refined PB motifs shown in Figure 4. As shown in Figure 5e, the NS1 proteins of the highly pathogenic influenza viruses H1N1, which caused the "Spanish Flu" in 1918, and H5N1, which caused several outbreaks of avian flu in Asia in 2003-2004, possess I-K-S-E-V and I-E-S-E-V motifs at their C-terminal ends, respectively [62, 63]. These correspond to some of the refined PB motifs identified here: I-x-S-x-V, x-K-S-x-V and x-x-S-E-V in Spanish flu NS1 (Figure 5e, top row), and I-x-S-x-V, x-E-S-x-V and x-x-S-E-V in avian flu NS1 (Figure 5e, middle row). In contrast, the PB motif of the NS1 protein derived from a low-pathogenic strain producing seasonal flu (H3N2) is the non-refined A-R-S-K-V (Figure 5e, bottom row). Interestingly, two of the three-position-specified PB motifs found in the highly pathogenic strains, I-x-S-x-V and x-K-S-x-V, are found specifically in humans (Figure 4e, column 'h'), which may suggest that these highly pathogenic influenza strains have evolved to bind efficiently to human PDZ proteins. These results suggest that the three-position-specified PB motifs should be evaluated as potential indicators of viral pathogenicity.
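The comparison of NS1 C-termini with the refined motifs can be reproduced with a simple position-wise match. The sketch below is illustrative only: the helper names `matches_motif` and `refined_hits` are hypothetical, and the list `REFINED_MOTIFS` contains only the four refined type-I motifs named in this section, not the full set from Figure 4.

```python
# Only the refined type-I motifs quoted in this section; the complete
# list in Figure 4 is not reproduced here.
REFINED_MOTIFS = ["I-x-S-x-V", "x-K-S-x-V", "x-x-S-E-V", "x-E-S-x-V"]


def matches_motif(c_terminus, motif):
    """Check a C-terminus against a motif such as 'I-x-S-x-V'.

    The motif is aligned to the C-terminal end (last symbol = position 0).
    'x' accepts any residue; 'S/T' accepts either listed residue.
    """
    positions = motif.split("-")
    tail = c_terminus.rstrip("*").upper()[-len(positions):]
    if len(tail) < len(positions):
        return False
    return all(p == "x" or aa in p.split("/") for p, aa in zip(positions, tail))


def refined_hits(c_terminus, motifs=REFINED_MOTIFS):
    """Return the motifs (in list order) matched by the C-terminus."""
    return [m for m in motifs if matches_motif(c_terminus, m)]


# The three NS1 C-terminal pentapeptides quoted in the text.
print(refined_hits("IKSEV"))  # ['I-x-S-x-V', 'x-K-S-x-V', 'x-x-S-E-V']
print(refined_hits("IESEV"))  # ['I-x-S-x-V', 'x-x-S-E-V', 'x-E-S-x-V']
print(refined_hits("ARSKV"))  # []
```

The output agrees with the description above: each highly pathogenic NS1 matches three of the listed refined motifs, whereas the seasonal H3N2 sequence matches none of them.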

Conclusions

We performed a comprehensive genome-level study of the PB motif variants present in five phylogenetically distant species. We have shown that PB motifs are preferentially located at the C-terminal ends of proteins, in line with experimental results showing that PDZ interactions preferentially take place with C-terminal PB motifs. Our analysis identified specific AA usage biases at the -4, -3 and -1 positions surrounding the "classical" two-position-specified PB motifs, x-x-S/T-x-I/L/V and x-x-Φ-x-Φ. Ontological analysis of the proteins carrying these refined C-terminal PB motifs revealed a very specific bias toward signaling and transport proteins. PDZ-type interactions are known to play key roles in these cellular processes, suggesting that the protein subset with refined PB motifs is likely to be engaged in genuine PDZ-type interactions. By correlating motif position with sequence variation, the analysis method presented here makes it possible to detect fine variations in protein motifs, across variants and across species, without requiring any training set. Because the method is orthogonal to previously described strategies, it provides a complementary approach for refining in silico predictions. Because these in silico analyses are applicable to any species whose protein sequences are comprehensively registered in databases, the methodology shown here is generally applicable to discovering and evaluating any protein motif with an identified positional bias.

Methods

Bioinformatics

We downloaded protein sequences ('dataset_1' in Additional file 1) belonging to 'protein_coding' genes, together with their gene ID and protein ID numbers, from the Ensembl project http://www.ensembl.org/index.html [39] using BioMart (Additional file 1). The dataset version was Release 55. Because each dataset contains information for a single species, the following procedures were carried out separately for each of the five species. After extraneous characters were removed, each text line contained a single gene ID, a single protein ID and a single protein sequence. We then retained only protein sequences that contain an asterisk (*) denoting a stop codon and are more than fifty-four AAs long, so that the C0 to C50 searches could be performed ('dataset_2' in Additional file 1). Specifically for the human dataset, proteins encoded in the haplotypic chromosomal regions, denoted by the chromosome names HSCHR6_MHC_APD, HSCHR6_MHC_COX, HSCHR6_MHC_DBB, HSCHR6_MHC_MANN, HSCHR6_MHC_MCF, HSCHR6_MHC_QBL, HSCHR6_MHC_SSTO, HSCHR4_1 and HSCHR17_1, were removed to avoid multiple identifications of the same genes [64]. The numbers of proteins and genes in dataset_1 and dataset_2 are shown in Additional file 2. Fifty-one data subsets ('dataset_C0' to 'dataset_C50' in Additional file 1) were generated for each species, based on the position of the motif within C0-C50, and the C0-C50 searches were then performed. All the Perl and UNIX scripts corresponding to these steps are available from the author upon request.
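The filtering and subsetting steps described above can be sketched as follows. This is not the authors' Perl/UNIX pipeline, which is available only on request. It assumes that Cn denotes a five-residue window (positions -4 to 0) shifted n residues upstream of the C-terminus, which would explain the requirement of more than fifty-four AAs (50 + 5 = 55). The function names and parameters are hypothetical.

```python
def keep_for_search(sequence, max_offset=50, motif_len=5):
    """Return True if a sequence qualifies for the C0..C{max_offset} searches.

    The sequence must end with '*' (stop codon) and contain enough residues
    to place a motif_len-residue window at every offset (assumed reading of
    the 'more than fifty-four AAs' criterion).
    """
    if not sequence.endswith("*"):
        return False
    return len(sequence.rstrip("*")) >= max_offset + motif_len


def c_terminal_windows(sequence, max_offset=50, motif_len=5):
    """Map each offset n (0..max_offset) to the window ending n residues
    upstream of the C-terminus. Offset 0 is the true C-terminal window."""
    residues = sequence.rstrip("*")
    n_res = len(residues)
    windows = {}
    for offset in range(max_offset + 1):
        end = n_res - offset
        windows[offset] = residues[end - motif_len:end]
    return windows


# Toy 55-residue sequence (not a real protein) ending in a motif from the text.
toy = "M" + "A" * 49 + "IKSEV" + "*"
if keep_for_search(toy):
    windows = c_terminal_windows(toy)
    print(windows[0])    # IKSEV
    print(windows[50])   # MAAAA
```

Each of the fifty-one windows could then be tested against a motif pattern, so that the frequency of a motif at the true C-terminus (C0) can be compared with its frequency at the internal control positions (C1 to C50).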

Data analysis and statistics

All statistical tests were performed using KyPlot 5.0 software (KyenceLab Inc. Japan). The non-parametric Mann-Whitney test or Steel test was used to assess statistical differences. P-values are indicated in each figure.

Detection of over-represented GO molecular function term

The Ensembl gene IDs were converted to Entrez Gene IDs using the web-based tool Clone/Gene ID Converter, version 2.0 http://idconverter.bioinfo.cnio.es/ [65]. The over-represented ontological categories were then identified from the Entrez Gene IDs using PIPE2 http://pipe2.systemsbiology.net/PIPE2/ [66].

Co-immunoprecipitation assay

The cDNAs encoding full-length mInsc, DTWD2, PSD-95 and GRIP1 were amplified from mouse brain cDNA libraries by PCR and subcloned into pCMV-Tag2 or pCMV-Tag3 (Clontech) for the expression of Flag-tagged or Myc-tagged proteins, respectively. For mInsc and DTWD2, deletion mutants lacking the four C-terminal AAs were also constructed. These plasmids were transfected into COS-7 cells (RIKEN Cell Bank) using Lipofectamine 2000 (Invitrogen) according to the manufacturer's protocols. Transfected cells were lysed in Tris buffer (120 mM NaCl, 1 mM EDTA, 20 mM Tris-Cl pH 7.5, 0.5% (v/v) Triton X-100, protease inhibitor cocktail) and briefly sonicated. Lysates were centrifuged (10 min; 15,000 × g) to remove insoluble matter. Anti-Flag-M2 agarose (Sigma-Aldrich) was added to the supernatant fraction and incubated for 2 h at 4°C. After washing, all precipitated complexes were denatured in SDS sample buffer and subjected to SDS-PAGE followed by Western blot analysis using an anti-Myc antibody (Santa Cruz Biotechnology) or an anti-Flag antibody (Sigma-Aldrich) and the chemiluminescence-based detection system ECL plus (GE Healthcare).