Background

Gene duplication can be a major source of innovation in evolution [1], providing redundancy and additional genetic material to build upon and differentiate. In general, eukaryotic genomes contain a large fraction of gene duplicates, with paralogs stemming not only from single-gene or segmental duplications but, in the case of S. cerevisiae, also from a whole-genome duplication (WGD) event that occurred approximately 100 million years ago [2, 3]. Genomic instability and massive gene loss promptly followed the WGD and purged most of the newly formed gene copies from the yeast genome, retaining approximately 10% of them [3]. Today, using multiple genomes of related fungal species with conserved synteny, we can unambiguously identify hundreds of gene pairs as WGD paralogs [4], in addition to paralogs that arose from small-scale duplications (SSD).

The identification of paralogs of WGD origin, in conjunction with the wealth of data on physical protein interactions and derived maps of protein complexes, puts us in an unprecedented position to test the fate of nascent duplicated genes and to potentially identify cases of duplication of whole complexes. Recently, it has been shown that protein interactions can be conserved after gene duplication [5, 6]. The data suggested that there exists a stepwise pathway of evolution for such functional modules [6], with duplications of homomeric interactions having a significant influence on the evolution of genes [5]. Moreover, gene duplicates are found less often among the core components of protein complexes than in sparse regions of the protein interaction network [8, 9], and duplicates have further been characterized by their synthetic lethality rate [10], by the different phenotypic effects they display when deleted [11] and by their occurrence across functional classes (e.g., stress-responsive genes [8]). Musso and colleagues [9] show that nearly half of WGD paralogs co-cluster in the same protein complex. Amoutzias and colleagues [12] indicate that whole-genome duplication did not change the dimerization specificities of interacting homologs. Here, we show a much more detailed spectrum of evolutionary and functional fates of higher-order protein complex subunits. This integrated overview enables us to quantify the fates with respect to the duplication type and to address questions related to protein specialization (subfunctionalization), as well as the emergence of novel functions related to complexes (neofunctionalization).

Our hypotheses were tested on two types of manually curated data: complexes from the MIPS consortium [13] and complexes annotated by SGD [14]. To avoid a possible bias introduced by manual curation, we also used computationally derived maps of complexes [15, 16], the reconstruction of which was made possible by recent mass-spectrometry studies [17, 18]. Integration of these datasets allowed us to systematically study the fates of all gene duplicates that are involved in protein complexes.

Results

The fates of duplicate genes in complexes

We carried out a systematic analysis of the fate of paralogs in protein complexes. From our first observations it became clear that the cytosolic ribosomal complex dominates the whole spectrum of gene duplications. To prevent this single protein complex from dominating our results, we analyze it separately (see Methods). The fates of the other paralogs found within complexes fall into three categories (Figures 1 and 2). Intra-complex paralogs (I) are formed when both resulting genes remain within the same protein complex, whereas bi-complex paralogs (II) function within two separate complexes. The third class, which we define as overhangs (III), consists of subunits of complexes whose paralog has no association with any known protein complex. SSD and WGD paralogs are equally divided over the intra-complex and overhang classes, but differ with respect to the bi-complex class: many more SSD paralogs than WGD paralogs are present in two complexes (Figure 2b). We discuss this observation below.
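The three classes can be illustrated with a minimal Python sketch that assigns a paralog pair to a class from a gene-to-complex membership table. The function name classify_pair, the membership dictionary and the "unassigned" label are assumptions made for this illustration and are not taken from the original analysis; the genes and complexes are the examples shown in Figure 2. The sketch also assumes that sharing at least one complex takes precedence over membership in different complexes.

```python
from typing import Dict, Set


def classify_pair(gene_a: str, gene_b: str,
                  membership: Dict[str, Set[str]]) -> str:
    """Assign a paralog pair to one of the classes described in the text.

    membership maps a gene name to the set of complexes it belongs to;
    genes absent from the table are treated as belonging to no complex.
    """
    complexes_a = membership.get(gene_a, set())
    complexes_b = membership.get(gene_b, set())
    if complexes_a & complexes_b:
        return "intra-complex"   # (I) both paralogs share a complex
    if complexes_a and complexes_b:
        return "bi-complex"      # (II) paralogs sit in different complexes
    if complexes_a or complexes_b:
        return "overhang"        # (III) only one paralog is in a complex
    return "unassigned"          # neither paralog is in a known complex


# Toy membership table built from the examples in Figure 2 (illustrative only).
membership = {
    "FUN80": {"ARG transcription complex"},
    "ARGR1": {"ARG transcription complex"},
    "REV3": {"zeta DNA polymerase complex"},
    "CDC2": {"delta DNA polymerase complex"},
    "DID3": {"Vps4p ATPase transport complex"},
    # CHM2 is deliberately absent: it is not part of a known complex.
}

for pair in [("FUN80", "ARGR1"), ("REV3", "CDC2"), ("DID3", "CHM2")]:
    print(pair, classify_pair(pair[0], pair[1], membership))
```

Representing membership as sets allows a protein to belong to several complexes, which is a design choice of this sketch rather than a statement about the curated datasets.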

Figure 1

Complex fate of paralogs. a) Gene duplication and subsequent divergence; for cytosolic ribosomal proteins (cRP), divergence is followed by homogenizing gene conversion events. b) Impact of duplicated proteins on complexes. Intra-complex duplications include dosage increase, interacting homologs and module variants. Dosage increase requires many components of the complex to duplicate simultaneously (as in the case of cRP and the whole-genome duplication). For interacting homologs, both duplicated proteins become physical subunits of the complex (e.g., homomers turning into heterodimers after the duplication). In module variants, only one of the two paralogs is present in the protein complex at a given time. Bi-complex paralogs operate in different protein complexes; two possible evolutionary routes are shown. Overhangs do not aggregate with other proteins in a non-transient manner, while their paralogs do.

Figure 2

The roles of paralogs in protein complexes. a) Shaded areas mark a complex; dashed lines connect paralogs. I) Intra-complex paralogs: both proteins participate in the same complex; the ARG transcription complex includes an intra-complex duplication of the genes encoding the FUN80 and ARGR1 subunits. II) Bi-complex paralogs: the two proteins are involved in different protein complexes; two small complexes are shown: the zeta DNA polymerase complex (left) and the delta DNA polymerase complex (right). The REV3/CDC2 pair are bi-complex paralogs. III) Overhangs: only one of the paralogs is a subunit of a complex, while its homolog does not aggregate with other proteins in a non-transient manner; the Vps4p ATPase transport complex is shown. Here, the CHM2 protein (a paralog of DID3) represents an overhang. b) Types of duplication and their contribution to protein complexes: left, whole-genome duplication (cytoplasmic ribosomal proteins excluded); right, small-scale duplications. The pie charts show fractions of all paralog pairs. Protein complex annotations follow the SGD consortium.

Intra-complex paralogs: retention is an important fate of paralogs within complexes

We observe a very strong preference for both duplicated proteins to function in the same module. Compared to a null model in which proteins are stochastically reshuffled between complexes, intra-complex paralogs are ~40-fold overrepresented (SGD modules [14]). This preference is similar for both duplication types, with no statistically significant difference between them (P = 0.97, chi-square test), and it holds for other module definitions, including the computationally derived protein complexes from complex co-purification experiments (see additional file 1, Table S1). Paralog retention within the module is thus an important factor in shaping the map of protein complexes.
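A reshuffling null model of this kind can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the procedure used in the original analysis: it assumes each protein belongs to exactly one complex, it permutes complex labels among proteins so that complex sizes are preserved, and it uses invented toy data. The names count_intra and fold_enrichment, the gene labels g0 to g23 and the number of shuffles are all hypothetical.

```python
import random
from statistics import mean


def count_intra(pairs, complex_of):
    """Count paralog pairs whose two members carry the same complex label."""
    return sum(
        1 for a, b in pairs
        if a in complex_of and b in complex_of
        and complex_of[a] == complex_of[b]
    )


def fold_enrichment(pairs, complex_of, n_shuffles=1000, seed=0):
    """Compare the observed intra-complex count with a reshuffled null model.

    Complex labels are permuted among proteins, so complex sizes are kept
    while the assignment of proteins to complexes is randomized.
    """
    rng = random.Random(seed)
    genes = list(complex_of)
    labels = [complex_of[g] for g in genes]
    observed = count_intra(pairs, complex_of)
    null_counts = []
    for _ in range(n_shuffles):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null_counts.append(count_intra(pairs, dict(zip(genes, shuffled))))
    expected = mean(null_counts)
    fold = observed / expected if expected > 0 else float("inf")
    return observed, expected, fold


# Toy data (hypothetical): 24 proteins in 6 complexes of 4 proteins each.
complex_of = {f"g{i}": i // 4 for i in range(24)}
# Three pairs share a complex; the last pair spans two complexes.
pairs = [("g0", "g1"), ("g4", "g5"), ("g8", "g9"), ("g12", "g16")]

observed, expected, fold = fold_enrichment(pairs, complex_of)
print(f"observed={observed}, expected={expected:.2f}, fold={fold:.1f}")
```

The fold value is the ratio of the observed count to the mean count over shuffles; with real data, proteins that belong to several complexes would need a membership representation richer than the single label used here.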

We thus recover the previously made observation that WGD and SSD paralogs are known to act within the ancestral protein complex after the duplication [

References

  1. Ohno S: Evolution by Gene Duplication. 1970, London: Allen & Unwin

  2. Wolfe KH, Shields DC: Molecular evidence for an ancient duplication of the entire yeast genome. Nature. 1997, 387: 708-13. 10.1038/42711.

  3. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423: 241-54. 10.1038/nature01644.

  4. Byrne KP, Wolfe KH: The Yeast Gene Order Browser: Combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 2005, 15: 1456-1461. 10.1101/gr.3672305.

  5. Pereira-Leal JB, Levy ED, Kamp C, Teichmann SA: Evolution of protein complexes by duplication of homomeric interactions. Genome Biol. 2007, 8: R51. 10.1186/gb-2007-8-4-r51.

  6. Pereira-Leal JB, Teichmann SA: Novel specificities emerge by stepwise duplication of functional modules. Genome Res. 2005, 15: 552-9. 10.1101/gr.3102105.

  7. Li L, Huang Y, Xia X, Sun Z: Preferential duplication in the sparse part of yeast protein interaction network. Mol Biol Evol. 2006, 23: 2467-73. 10.1093/molbev/msl121.

  8. Wapinski I, Pfeffer A, Friedman N, Regev A: Natural history and evolutionary principles of gene duplication in fungi. Nature. 2007, 449: 54-61. 10.1038/nature06107.

  9. Musso G, Zhang Z, Emili A: Retention of protein complex membership by ancient duplicated gene products in budding yeast. Trends Genet. 2007, 23:

  10. Guan Y, Dunham MJ, Troyanskaya OG: Functional Analysis of Gene Duplications in Saccharomyces cerevisiae. Genetics. 2007, 175: 933-943. 10.1534/genetics.106.064329.

  11. Hakes L, Pinney J, Lovell S, Oliver S, Robertson D: All duplicates are not equal: the difference between small-scale and genome duplication. Genome Biol. 2007, 8: R209. 10.1186/gb-2007-8-10-r209.

  12. Amoutzias GD, Veron AS, Weiner J, Robinson-Rechavi M, Bornberg-Bauer E, Oliver SG, Robertson DL: One billion years of bZIP transcription factor evolution: conservation and change in dimerization and DNA-binding site specificity. Mol Biol Evol. 2007, 24: 827-35. 10.1093/molbev/msl211.

  13. Mewes HW, Amid C, Arnold R, Frishman D, Güldener U, Mannhaupt G, Münsterkötter M, Pagel P, Strack N, Stümpflen V, Warfsmann J, Ruepp A: MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res. 2004, 32: D41-4. 10.1093/nar/gkh092.

  14. Hong EL, Balakrishnan R, Dong Q, Christie KR, Park J, Binkley G, Costanzo MC, Dwight SS, Engel SR, Fisk DG, Hirschman JE, Hitz BC, Krieger CJ, Livstone MS, Miyasato SR, Nash RS, Oughtred R, Skrzypek MS, Weng S, Wong ED, Zhu KK, Dolinski K, Botstein D, Cherry JM: Gene Ontology annotations at SGD: new data sources and annotation methods. Nucleic Acids Res. 2008, 36: D577-81. 10.1093/nar/gkm909.

  15. Hart GT, Lee I, Marcotte E: A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinformatics. 2007, 8: 236. 10.1186/1471-2105-8-236.

  16. Collins SR, Kemmeren P, Zhao X, Greenblatt JF, Spencer F, Holstege FCP, Weissman JS, Krogan NJ: Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics. 2007, 6: 439-50.

  17. Gavin A, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dümpelfeld B, Edelmann A, Heurtier M, Hoffman V, Hoefert C, Klein K, Hudak M, Michon A, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, et al: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440: 631-6. 10.1038/nature04532.

  18. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T, Peregrín-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B, Richards DP, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete MM, Vlasblom J, Wu S, Orsi C, et al: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637-643. 10.1038/nature04670.

  19. Planta RJ, Mager WH: The list of cytoplasmic ribosomal proteins of Saccharomyces cerevisiae. Yeast. 1998, 14: 471-7. 10.1002/(SICI)1097-0061(19980330)14:5<471::AID-YEA241>3.0.CO;2-U.

  20. Kellis M, Birren BW, Lander ES: Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004, 428: 617-24. 10.1038/nature02424.

  21. Angus-Hill ML, Schlichter A, Roberts D, Erdjument-Bromage H, Tempst P, Cairns BR: A Rsc3/Rsc30 Zinc Cluster Dimer Reveals Novel Roles for the Chromatin Remodeler RSC in Gene Expression and Cell Cycle Control. Molecular Cell. 2001, 7: 741-751. 10.1016/S1097-2765(01)00219-2.

  22. Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T: Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci USA. 1989, 86: 9355-9. 10.1073/pnas.86.23.9355.

  23. de Lichtenberg U, Jensen LJ, Brunak S, Bork P: Dynamic Complex Formation During the Yeast Cell Cycle. Science. 2005, 307: 724-727. 10.1126/science.1105103.

  24. Mbonyi K, van Aelst L, Arguelles JC, Jans AW, Thevelein JM: Glucose-induced hyperaccumulation of cyclic AMP and defective glucose repression in yeast strains with reduced activity of cyclic AMP-dependent protein kinase. Mol Cell Biol. 1990, 10: 4518-4523.

  25. Hodge MR, Kim G, Singh K, Cumsky MG: Inverse regulation of the yeast COX5 genes by oxygen and heme. Mol Cell Biol. 1989, 9: 1958-1964.

  26. Burke PV, Raitt DC, Allen LA, Kellogg EA, Poyton RO: Effects of Oxygen Concentration on the Expression of Cytochrome c and Cytochrome c Oxidase Genes in Yeast. J Biol Chem. 1997, 272: 14705-14712. 10.1074/jbc.272.23.14705.

  27. Steinmetz LM, Scharfe C, Deutschbauer AM, Mokranjac D, Herman ZS, Jones T, Chu AM, Giaever G, Prokisch H, Oefner PJ, Davis RW: Systematic screen for human disease genes in yeast. Nat Genet. 2002, 31: 400-4.

  28. Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St Onge RP, Tyers M, Koller D, Altman RB, Davis RW, Nislow C, Giaever G: The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science. 2008, 320: 362-5. 10.1126/science.1150021.

  29. Romisch K: Surfing the Sec61 channel: bidirectional protein translocation across the ER membrane. J Cell Sci. 1999, 112: 4185-4191.

  30. Sommer T, Wolf DH: Endoplasmic reticulum degradation: reverse protein flow of no return. FASEB J. 1997, 11: 1227-33.

  31. Robb A, Brown JD: Protein transport: two translocons are better than one. Mol Cell. 2001, 8: 484-6. 10.1016/S1097-2765(01)00339-2.

  32. Kaeberlein M, Guarente L: Saccharomyces cerevisiae MPT5 and SSD1 function in parallel pathways to promote cell wall integrity. Genetics. 2002, 160: 83-95.

  33. Mitchell P, Petfalski E, Shevchenko A, Mann M, Tollervey D: The exosome: a conserved eukaryotic RNA processing complex containing multiple 3'-->5' exoribonucleases. Cell. 1997, 91: 457-66. 10.1016/S0092-8674(00)80432-8.

  34. Noguchi E, Hayashi N, Azuma Y, Seki T, Nakamura M, Nakashima N, Yanagida M, He X, Mueller U, Sazer S, Nishimoto T: Dis3, implicated in mitotic control, binds directly to Ran and enhances the GEF activity of RCC1. EMBO J. 1996, 15: 5595-605.

  35. van Noort V, Snel B, Huynen MA: Predicting gene function by conserved co-expression. Trends in Genetics. 2003, 19: 238-242. 10.1016/S0168-9525(03)00056-8.

  36. Conant GC, Wolfe KH: Functional partitioning of yeast co-expression networks after genome duplication. PLoS Biol. 2006, 4: e109. 10.1371/journal.pbio.0040109.

  37. Warner JR: The economics of ribosome biosynthesis in yeast. Trends Biochem Sci. 1999, 24: 437-40. 10.1016/S0968-0004(99)01460-7.

  38. Planta RJ: Regulation of ribosome synthesis in yeast. Yeast. 1997, 13: 1505-18. 10.1002/(SICI)1097-0061(199712)13:16<1505::AID-YEA229>3.0.CO;2-I.

  39. Li H, Pellegrini M, Eisenberg D: Detection of parallel functional modules by comparative analysis of genome sequences. Nat Biotechnol. 2005, 23: 253-60. 10.1038/nbt1065.

  40. Weiss H, Friedrich T, Hofhaus G, Preis D: The respiratory-chain NADH dehydrogenase (complex I) of mitochondria. Eur J Biochem. 1991, 197: 563-76. 10.1111/j.1432-1033.1991.tb15945.x.

  41. Finel M: Organization and evolution of structural elements within complex I. Biochim Biophys Acta. 1998, 1364: 112-21. 10.1016/S0005-2728(98)00022-X.

  42. Huynen MA, Gabaldón T, Snel B: Variation and evolution of biomolecular systems: searching for functional relevance. FEBS Lett. 2005, 579 (8): 1839-1845. 10.1016/j.febslet.2005.02.004.

  43. Cairns BR, Lorch Y, Li Y, Zhang M, Lacomis L, Erdjument-Bromage H, Tempst P, Du J, Laurent B, Kornberg RD: RSC, an essential, abundant chromatin-remodeling complex. Cell. 1996, 87: 1249-60. 10.1016/S0092-8674(00)81820-6.

  44. Scannell DR, Byrne KP, Gordon JL, Wong S, Wolfe KH: Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts. Nature. 2006, 440: 341-5. 10.1038/nature04562.

  45. Komili S, Farny NG, Roth FP, Silver PA: Functional Specificity among Ribosomal Proteins Regulates Gene Expression. Cell. 2007, 131: 557-571. 10.1016/j.cell.2007.08.037.

  46. Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30: 207-10. 10.1093/nar/30.1.207.

  47. Collins SR, Miller KM, Maas NL, Roguev A, Fillingham J, Chu CS, Schuldiner M, Gebbia M, Recht J, Shales M, Ding H, Xu H, Han J, Ingvarsdottir K, Cheng B, Andrews B, Boone C, Berger SL, Hieter P, Zhang Z, Brown GW, Ingles CJ, Emili A, Allis CD, Toczyski DP, Weissman JS, Greenblatt JF, Krogan NJ: Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map. Nature. 2007, 446: 806-10. 10.1038/nature05649.

  48. He B, Chen P, Chen SY, Vancura KL, Michaelis S, Powers S: RAM2, an essential gene of yeast, and RAM1 encode the two polypeptide components of the farnesyltransferase that prenylates a-factor and Ras proteins. Proc Natl Acad Sci USA. 1991, 88: 11373-7. 10.1073/pnas.88.24.11373.

  49. Witter DJ, Poulter CD: Yeast geranylgeranyltransferase type-II: steady state kinetic studies of the recombinant enzyme. Biochemistry. 1996, 35: 10454-63. 10.1021/bi960500y.

Acknowledgements

We would like to thank Ken Wolfe and Gavin Conant for insightful comments. Authors are grateful to Like Fokkens and Jos Boekhorst for discussions, Martin Oti for co-expression dataset, Joanna Parmley for carefully reading the manuscript and Patrick Kemmeren for sharing protein complex data. We would also like to thank anonymous reviewers for their valuable comments. This work was supported by the Netherlands Genomics Initiative (Horizon programme).

Author information

Corresponding author

Correspondence to Berend Snel.

Additional information

Authors' contributions

RS and BS designed the study. RS performed the analysis. MH contributed analytical methods. RS and BS wrote the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Szklarczyk, R., Huynen, M.A. & Snel, B. Complex fate of paralogs. BMC Evol Biol 8, 337 (2008). https://doi.org/10.1186/1471-2148-8-337
