Background

The so-called WD-repeat (WDR) proteins comprise an astonishingly diverse superfamily of regulatory proteins, representing the breadth of biochemical mechanisms and cellular processes. These proteins have been found to play key roles in such disparate mechanisms as signal transduction, cytoskeletal dynamics, protein trafficking, nuclear export, and RNA processing, and are especially prevalent in chromatin modification and transcriptional mechanisms. WDR proteins are intimately involved in a variety of cellular and organismal processes, including cell division and cytokinesis, apoptosis, light signaling and vision, cell motility, flowering, floral development, and meristem organization, to name a few. Within the cell, WDR proteins have been found to be components of the cytoplasm or nucleoplasm, linked to the cytoskeleton, or associated with membranes through binding to membrane proteins or through membrane-interacting, ancillary domains. Known WDR proteins range in size from small proteins such as the pleiotropic plant developmental regulator VIP3 to massive (>400-kDa) proteins such as the mammalian protein trafficking factor Lyst.

The common and defining feature of these proteins is the WD (also called Trp-Asp or WD-40) motif, a ~40-amino acid stretch typically ending in Trp-Asp, but exhibiting only limited amino acid sequence conservation [1]. When present in a protein, the WD motif is typically found as several (4–10) tandemly repeated units. In the WDR proteins for which structure has been determined, including a mammalian Gβ subunit of heterotrimeric GTPases, repeated WD units form a series of four-stranded, antiparallel beta sheets [[2]; D.K. Wilson, pers. commun.], which fold into a higher-order structure termed a β-propeller. This structure can be visualized as a short, open cylinder where the strands form the walls [2]. At least four repeats are believed to be required to form a β-propeller [3]. In Gβ, which contains seven WDRs, the first and last (i.e., amino- and carboxyl-terminal) WDRs participate in the same propeller blade, potentially reinforcing the structure (for an extensive discussion of WD motifs and WDR structure, the reader is referred to Smith et al., 1999 [2]).

It is now accepted that WDR domains within proteins act as sites for interaction with other proteins. This characteristic of WDRs allows for three general functional roles. First, WDRs within one protein can provide binding sites for two or more other proteins and foster transient interactions among these other proteins. This type of role is best illustrated by Gβ, which has been the most extensively studied of the WDR proteins. The heterotrimeric GTPases in which Gβs participate functionally associate with a variety of heptahelical membrane receptors (G protein-coupled receptors or GPCRs) to propagate cellular responses to a multitude of extracellular signals. Upon receptor activation by an extracellular ligand, Gβ, along with the tightly bound Gγ peptide, dissociates from the Gα subunit, and both Gβγ and Gα can then interact with a variety of effectors. Gβ associates reversibly with at least 14 other proteins, including phospholipases, adenylate cyclases, and ion channels [4]. Another example of this type of role is the yeast histone acetylase subunit Hat2, which is required for efficient interaction of the catalytic subunit Hat1 with the target histone [5]. In both Gβ and Hat2 (and in many other WDR proteins) nearly all of the protein is composed of WDRs.

A second potential role of WDR proteins is as an integral component of protein complexes. This functional mode is probably best illustrated by the snoRNP U3 particle, involved in processing of the small subunit ribosomal RNA. Of the 28 characterized subunits of U3, no fewer than seven are WDR proteins [6]. Another example is yeast Pfs2, a protein that is found associated with the poly(A) polymerase Pap1 and several multisubunit factors in a large protein complex required for pre-mRNA 3'-end processing and polyadenylation [7]. Within this large complex, Pfs2 interacts directly with specific subunits of two of the processing factors, suggesting that Pfs2 is important for integrity of the larger complex [8]. Many other WDR proteins have been found in relatively stable complexes, including the nuclear pore complex [9], the general transcription factor TFIID [10, 11], and the yeast SET1 histone methyltransferase complex [12].

A third recognized role of the WDR is to act as a modular interaction domain of larger proteins. The presumed role of the WDR in these cases is to bring the protein and associated ancillary domain(s) into proximity of its target(s). Two examples in plants are the light signaling proteins COP1 and SPA1, which juxtapose carboxyl-terminal WDRs with an amino-terminal ring-finger or kinase-like domain, respectively (below). Other common examples of ancillary domains seen in WDR proteins from yeast, animals or plants include the F-box, SET domain, and bromodomain (not shown).

Many WDR-containing proteins of unknown function have been designated as 'Gβ-like', even in the absence of any sequence-based or functional relationship with Gβ. These misleading annotations suggest that a phylogenetic analysis of this superfamily is needed. Here, we evaluated the extent of the predicted WDR protein superfamily in Arabidopsis, and the sequence-based and functional relationships between these proteins and known or hypothetical proteins from budding yeast (Saccharomyces cerevisiae), fruit fly (Drosophila melanogaster) and humans (Homo sapiens). Our results suggest that most Arabidopsis WDR proteins are strongly conserved across eukaryotes, including those that have been found to play key roles in plant-specific processes.

Results and Discussion

The Arabidopsis WDR protein family

This analysis identified 269 Arabidopsis proteins containing at least one copy of the WD motif. The vast majority of these (237) contained four or more recognizable copies of the motif. We classified these 237 proteins into 143 distinct families, 49 of which contained more than one Arabidopsis member. Approximately 113 of these families or individual proteins showed clear homology with WDR proteins from yeast, fly, and/or human (Table 1 [see Additional file 1]). Where conservation was found, it often extended across all of these organisms, suggesting that many of these proteins are components of basic cellular mechanisms.

The Arabidopsis proteome apparently lacks counterparts of several WDR proteins that have been extensively studied in other eukaryotes and might have been expected to be conserved. For example, we found no protein related to the cell death initiator Dark (fly)/Apaf-1 (human). This protein is the central scaffold of the apoptosome, a protein complex that activates specific cellular proteases in response to death signals [13]. Many parallels exist between animal and plant apoptotic pathways, and many other components of the animal pathways have been strongly conserved in plants [14]. Arabidopsis also appears to lack a protein closely related to the intermediate chain of the microtubule motor protein dynein, which transports cellular cargo along microtubules. In mammalian cytoplasmic dynein, the intermediate chain plays a crucial scaffolding role, mediating interactions among the heavy chain and other dynein subunits [15]. Arabidopsis was previously hypothesized to lack the dynein heavy chain, based on the nearly completed genomic sequence [16]. It was suggested that if Arabidopsis did lack functional dynein, this could be compensated for by the relative variety of carboxyl-terminal motor domain kinesins in this species [17].

In contrast to these apparently missing proteins, we found several proteins that were not expected. One example is a protein very closely related to Notchless (Nle), a fly protein that binds to the intracellular domain of the developmental signal receptor Notch and modulates its activity [18] (Fig. 1). Arabidopsis lacks a recognizable Notch, and other components of the associated signaling pathways appear to be absent [19]. In addition, we found two proteins strongly related to the transcription-coupled DNA repair (TCR) proteins Rad28 (yeast) and Csa (human), even though plants are not known to undergo TCR (Fig. 1).

Figure 1

Domain structure of selected Arabidopsis WDR proteins, and selected homologous proteins from yeast, fly and human. Domains were identified as described in Methods. Regions of homology among homologous proteins are indicated with a grey background.

Also notable were several cases where Arabidopsis apparently did not participate in the expansion of gene families seen in the other eukaryotes. One example is a conserved component of the transcription factor TFIID, represented by the human TAFII-100 protein. This protein interacts directly with at least three other components of TFIID, and thus probably serves as a scaffold for construction of the complex [10, 11]. Multiple paralogs of this protein exist in fly, worm, and human (Table 1 [see Additional file 1] and not shown); in flies a form designated Cannonball (Can) appears to operate outside of basal transcription as a key regulator of spermatogenesis [20]. One possibility is that the paralogs within each species act as interchangeable components of the general transcription machinery to mediate expression of developmentally regulated target genes [21]. The presence of only a single form of this protein in the Arabidopsis proteome suggests that this potential means of expanding the transcriptional repertoire has not evolved in plants. Another example of an evolutionarily stagnant family is Gβ, which exists as only a single form in Arabidopsis (Table 1 [see Additional file 1]; below). In mammals, each of the heterotrimeric G protein subunits, as well as the GPCRs, is encoded by a multigene family, and combinatorial interaction among these proteins is believed to underlie much of the diversity of response to extracellular signals. The restriction of this gene family in Arabidopsis suggests that, if a 'typical' heterotrimeric G protein does exist, it would likely lack the functional complexity seen in mammals. This scenario would be similar to that in yeast, where only a single heterotrimeric G protein, incorporating the Gβ protein Ste4, has a specialized role in transducing mating type signals from heptahelical mating-factor receptors [22].

In contrast, some other Arabidopsis WDR proteins show relatively expanded gene families compared with the other eukaryotes studied. One of the largest Arabidopsis WDR families, consisting of nine members, is orthologous to the conserved Cdc20/Fizzy class of cell cycle regulators including yeast Cdc20 and Cdh1 (Table 1 [see Additional file 1]). These proteins activate the anaphase promoting complex (APC) ubiquitin ligase, which targets downstream cell cycle regulators for proteolysis [23], potentially by mediating interaction of the APC complex with target proteins. Mutations in Cdc20 or Cdh1 affect distinct aspects of the cell cycle, and Cdc20 and Cdh1 coimmunoprecipitate with distinct APC target proteins [24, 25], indicating that these proteins have non-overlapping functions. One explanation for the expansion of this family in Arabidopsis is that the several distinct proteins each specify distinct targets for the APC. Another example of an expanded gene family is the MSI1/RbAp48 group of chromatin-related proteins, which includes five members in Arabidopsis (below), but is represented by only a single form in flies (Table 1 [see Additional file 1]).

Several examples were seen where Arabidopsis WDR proteins have used elements from the inceptive 'molecular toolbox' in original ways. One example is the pleiotropic developmental regulator LEUNIG (LUG) [26]. LUG contains seven carboxyl-terminal WD motif repeats, internal polyglutamine tracts, and an extended motif termed the single-stranded DNA-binding-protein (SSDP) motif ([27], Fig. 1 and not shown). The SSDP motif was defined in a small family of animal proteins including chicken SSDP, which binds to a single-stranded, polypyrimidine region of the α2(I) collagen promoter [28]. SSDP-like proteins function in transcriptional complexes with LIM homeodomain proteins and LIM-domain-binding proteins (Ldbs) to regulate specific embryonic developmental processes [29]. The arrangement of the SSDP motif with carboxyl-terminal WD repeats appears to be unique to LUG and its orthologs from other plants (not shown). However, the juxtaposition of polyglutamine tracts with carboxyl-terminal WD repeats, while diverging from the domain structure seen in the SSDPs, resembles that seen in the yeast transcriptional corepressor Tup1 and a related corepressor from fly, Groucho [27]. This, in conjunction with the observation that loss of LUG activity leads to ectopic expression of a floral regulatory gene [30], has led to the speculation that LUG acts as a Tup1/Groucho-like transcriptional corepressor [27]. Intriguingly, it was recently shown that LUG functions in floral development together with SEUSS (SEU), a protein related to the mammalian LIM-domain-binding protein Ldb1 [26], suggesting the existence of a LUG-SEU transcriptional complex analogous to that involving LIM proteins. Collectively, this information suggests that LUG participates in an evolutionarily distinct mechanism of gene regulation incorporating elements of both Tup1/Groucho and Ldbs.

Linking conserved mechanisms with plant-specific processes: Functional specificity through divergence in regulatory targets

With the exception of LUG, the Arabidopsis WDR proteins that have been functionally characterized are strongly conserved within the WDR regions among yeast, fly and/or human (Table 1 [see Additional file 1]). Most of these proteins have been identified as components of basic cellular machinery in these other eukaryotes, yet have been found to regulate plant-specific processes (Table 2). An interesting question for further consideration is how these proteins have become adapted to their plant roles.

Table 2 Conserved Arabidopsis WD repeat proteins of known function. Indicated are the plant-specific process(es) in which these proteins participate, and linkage to basic cellular mechanisms through homologous proteins from other eukaryotes (references within text).

In several cases, the homologous WDR proteins are highly conserved throughout the length of the proteins, and appear to operate in highly analogous mechanisms, with specificity in function conferred by changes in upstream signaling pathways and/or downstream effectors. One case is AGB1, the only clear Arabidopsis ortholog of Gβ [31] (Fig. 1). Loss of AGB1 function leads to developmental pleiotropy including shortened fruits [32] and changes in patterns of cell division in the hypocotyl and root [33]. These phenotypes are associated with the derepression of genes that are normally turned on by auxin, suggesting a role for AGB1 as a negative regulator of auxin signaling [33]. There appears to be one Gα-like protein (GPA1) and two Gγ-like proteins (AGG1 and AGG2) in the Arabidopsis proteome [31], and molecular modeling and yeast two-hybrid studies of potential interactions among AGB1, GPA1 and AGG1 are not inconsistent with the possibility that these could form a heterotrimeric protein [33, 34]. In addition, both AGG1 and AGG2 contain domains expected to recruit AGB1 to membranes [34], and GPA1 has been demonstrated to bind GTPγS [35]. These findings lead to the prediction that AGB1 participates in a prototypical heterotrimeric G protein. However, the Arabidopsis proteome does not contain obvious heptahelical receptors with which a heterotrimeric G protein might interact [31]. One possibility is that the AGB1-containing G protein might be unlinked from a receptor. In animals, several receptor-independent activators of heterotrimeric G proteins are known, including the Ras-related protein Ags1 [3). The only of these to be studied to date, the peroxisomal import receptor-associated protein PEX7, appears to be very closely related in function to its yeast and human counterparts [73] (Fig. 1), and this is a preliminary indication that such plant studies will be highly relevant.

Table 3 WDR genes associated with human disease and their Arabidopsis homologs.

Methods

Predicted Arabidopsis proteins containing at least one WD motif were identified using motif-search software maintained by The Arabidopsis Information Resource [74] and current InterPro signatures (Prosite PS50294, PS00678, or PS50082; Pfam PF00400; PRINTS PR00320; or SMART SM00320 [75]). The database used for this analysis, ATH1.pep, was provided by The Institute for Genomic Research (TIGR) and was released Apr 17, 2003. Proteins containing at least four WD motifs were assigned into families using Blastclust (unpublished, available from the National Center for Biotechnology Information [76]), a single-linkage clustering tool that uses the BLAST algorithm to determine distance. Blastclust uses the following default BLAST parameters: matrix BLOSUM62, gap opening cost 11, gap extension cost 1, no low-complexity filtering, and an Expectation (E)-value cutoff of 1E-6. It is configurable and accepts several different parameters that can be set to alter the distance calculations and the clustering threshold. Because there was no a priori evidence as to which parameters would yield biologically relevant clusters, we ran the Blastclust software over several iterations, varying two parameters. The L parameter (range: 0.3–0.8) represents the amount of overlap coverage between query and subject, expressed as a ratio. The S parameter (range: 0.7–1.5) is a measure of the information content density of the alignment. As L and S increase, so does the stringency of the match. The analysis presented here used L = 0.3 and S = 0.7. Other protein motifs in WDR-containing proteins were identified using the InterProScan.pl program (Release 3.1) [77] and the InterPro 5.3 database as maintained by the European Bioinformatics Institute, in combination with Pfam Release 7.8 [78] and the PRODOM database (2002.1).
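The family-assignment step can be illustrated with a minimal sketch of single-linkage clustering under a coverage threshold (L) and a score-density threshold (S). This is not Blastclust itself: it assumes all-against-all BLAST hits have already been computed and parsed into a hypothetical `Hit` record, it assumes the coverage threshold must be met on both sequences of a pair, and it assumes S is compared with bit score per aligned query residue. All names here are illustrative. Clustering is done with union-find, so two proteins end up in the same family whenever they are connected by any chain of qualifying hits.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Hit:
    """One pairwise BLAST alignment (hypothetical, pre-parsed record)."""
    query: str
    subject: str
    evalue: float
    bit_score: float
    aln_len_query: int    # residues of the query covered by the alignment
    aln_len_subject: int  # residues of the subject covered by the alignment


def single_linkage_clusters(lengths: Dict[str, int], hits: Iterable[Hit],
                            L: float = 0.3, S: float = 0.7,
                            max_evalue: float = 1e-6) -> List[Set[str]]:
    """Group proteins into families: any one qualifying hit links two proteins."""
    parent = {pid: pid for pid in lengths}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for h in hits:
        if h.evalue > max_evalue:
            continue  # not significant at the E-value cutoff
        cov_q = h.aln_len_query / lengths[h.query]
        cov_s = h.aln_len_subject / lengths[h.subject]
        # Assumption: S is compared with bits per aligned query residue.
        density = h.bit_score / h.aln_len_query
        if min(cov_q, cov_s) >= L and density >= S:
            union(h.query, h.subject)

    families: Dict[str, Set[str]] = {}
    for pid in lengths:
        families.setdefault(find(pid), set()).add(pid)
    # Largest families first; ties broken by member names for stable output.
    return sorted(families.values(), key=lambda f: (-len(f), sorted(f)))


# Toy example: A-B and B-C qualify, so single linkage chains A, B and C together.
lengths = {"A": 400, "B": 420, "C": 380, "D": 300}
hits = [
    Hit("A", "B", 1e-50, 600.0, 360, 360),
    Hit("B", "C", 1e-20, 150.0, 150, 150),
    Hit("C", "D", 1e-3, 80.0, 200, 200),  # fails the E-value cutoff
]
print(single_linkage_clusters(lengths, hits))          # L = 0.3, S = 0.7
print(single_linkage_clusters(lengths, hits, L=0.5))   # stricter coverage
```

Raising L to 0.5 breaks the B-C link (B is only about 36% covered), which mirrors how increasing L or S raises stringency. A parameter sweep like the one described would simply call the function over a grid of L and S values and compare the resulting families.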

To identify WD motif-containing proteins in S. cerevisiae, D. melanogaster, and H. sapiens, we analyzed previously compiled proteome datasets available from the Saccharomyces Genome Database [79], FlyBase [80], and Ensembl (v. 13.31.1, released Mar 31, 2003) [81] as described above. The sequences utilized can be obtained through HTML links in Table 1 [see Additional file 1]. These sequences were used to query the ATH1.pep dataset using Washington University BLAST (WUBLAST) version 2.0 as maintained by TAIR. An Arabidopsis protein or paralogous group was designated as orthologous if it met the following three criteria: 1) it was the most closely related protein(s); 2) the E value for the match was less than 10E-11; and 3) the protein or all members of the paralogous group were more closely related than the next most significant match by a factor equal to or greater than 10E15.
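The three ortholog criteria can be expressed as a small predicate applied to the WUBLAST hits of one yeast, fly, or human query. This is a sketch under stated assumptions: each query is assumed to have one E-value per Arabidopsis subject, "more closely related by a factor" is read as a ratio of E-values, and the text's "10E-11" and "10E15" are read as 10^-11 and 10^15 (the notation is ambiguous, so both thresholds are left as parameters). The function name `is_ortholog_group` is hypothetical.

```python
from typing import List, Set, Tuple


def is_ortholog_group(hits: List[Tuple[str, float]], group: Set[str],
                      max_evalue: float = 1e-11,
                      min_ratio: float = 1e15) -> bool:
    """Apply the three ortholog criteria to one yeast/fly/human query.

    hits  -- (Arabidopsis protein ID, E-value) pairs, one per subject, any order
    group -- candidate Arabidopsis protein or paralogous group
    """
    if not hits or not group:
        return False
    ranked = sorted(hits, key=lambda h: h[1])  # most significant first

    # Criterion 1: the candidate protein(s) must be the top-ranked hit(s).
    if {pid for pid, _ in ranked[:len(group)]} != group:
        return False

    # Criterion 2: every member must match with E below the cutoff.
    worst_in_group = max(e for pid, e in ranked if pid in group)
    if worst_in_group >= max_evalue:
        return False

    # Criterion 3: the next most significant non-member must be weaker
    # than the weakest member by at least min_ratio (as an E-value ratio).
    outside = [e for pid, e in ranked if pid not in group]
    if not outside:
        return True
    next_best = outside[0]
    if worst_in_group == 0.0:
        # BLAST reports E = 0 for very strong matches; any non-zero
        # next-best E-value counts as an unbounded separation.
        return next_best > 0.0
    return next_best / worst_in_group >= min_ratio


hits = [("AT1", 1e-80), ("AT2", 1e-75), ("AT3", 1e-30), ("AT4", 1e-5)]
print(is_ortholog_group(hits, {"AT1", "AT2"}))  # True
print(is_ortholog_group(hits, {"AT1"}))         # False
```

Treating AT1 alone as the ortholog fails criterion 3 because AT2 trails it by only five orders of magnitude, whereas the AT1/AT2 paralogous pair is separated from AT3 by about 45 orders of magnitude. Ties in E-value at the group boundary are resolved arbitrarily by the sort in this sketch.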