Gene organization and evolutionary history

Little is known at present about the organization of the genes encoding F-box proteins, as most studies have focused on the protein products. This article provides an overview of those studies, summarizing current knowledge about the structure, function and regulation of the F-box proteins.

Classification

The F-box was initially observed as a region of homology among the proteins Cdc4, β-TrCP, Met30, Scon2, and MD6, all of which contain WD (Trp-Asp) repeats, by Kumar and Paietta in 1995 [1]. The implications of the homology were not appreciated until Bai et al. [2] recognized in 1996 that the F-box was a widespread motif that was required for protein-protein interaction. The name F-box was given by Bai et al. on the basis of the presence of the motif in cyclin F.

The F-box motif itself is generally found in the amino-terminal half of proteins and is often coupled with other motifs in the carboxy-terminal part of the protein, the two most common of which in humans are leucine-rich repeats (LRRs) and WD repeats. The nomenclature for human F-box proteins proposed by the Human Genome Organization follows the pattern proposed by Cenciarelli et al. [3] and Winston et al. [4]: FBXL denotes a protein containing an F-box and LRRs; FBXW denotes a protein with an F-box and WD repeats; and FBXO denotes a protein with an F-box and either a different motif or no other recognizable motif. A similar nomenclature is followed in mice, but in other organisms, proteins are not at present named according to the presence of an F-box.
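The naming rule above can be written as a small decision function. The following is only an illustrative sketch: the function name and the domain labels ("F-box", "LRR", "WD") are hypothetical, and the precedence given to LRRs when a protein carries both LRRs and WD repeats is an assumption, since the nomenclature described here does not address that case.

```python
def classify_fbox_protein(domains):
    """Return the class prefix (FBXL, FBXW or FBXO) for a set of domain labels.

    `domains` is an iterable of hypothetical labels such as "F-box", "LRR", "WD".
    Returns None when no F-box is present, because the scheme does not apply.
    """
    labels = {d.upper() for d in domains}
    if "F-BOX" not in labels:
        return None
    if "LRR" in labels:
        # Assumption: LRRs take precedence if WD repeats are also present.
        return "FBXL"
    if "WD" in labels:
        return "FBXW"
    # F-box with some other motif, or with no other recognizable motif.
    return "FBXO"
```

For example, a protein annotated with an F-box and a cyclin domain, such as cyclin F, falls into the FBXO class under this rule.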

Evolutionary history

There are 11 F-box proteins in the completed Saccharomyces cerevisiae genome, 326 predicted in Caenorhabditis elegans, 22 in Drosophila, and at least 38 in humans (see Table 1 and Additional data file 1), but no known examples of F-box proteins in prokaryotes. F-box proteins contain a wide range of secondary motifs including zinc fingers, cyclin domains, leucine zippers, RING fingers, tetratricopeptide (TPR) repeats, and proline-rich regions. The diversity of associated protein domains suggests that F-box motifs have been transferred into existing proteins multiple times during eukaryotic evolution. Evolutionary constraints are higher for certain classes of F-box proteins: all of the human FBXW or FBXL proteins have counterparts in C. elegans, with most also conserved in yeast, but only about half of the human FBXO class of proteins is conserved in nematodes or yeast.

Table 1 F-box proteins in the yeast, nematode, and human genomes

An interesting observation is the huge number of F-box proteins in C. elegans. The F-box motif is the fourth most common protein domain in C. elegans, with the number of F-boxes dwarfing that found in other species by a factor of ten. Over half of the predicted C. elegans F-box proteins (135) are found with another motif known as DUF38 (domain of unknown function 38) or FTH (FOG-2 homology) [5]. The FTH/DUF38 domain is found mostly in nematodes, with none in humans or yeast. A second domain, PfamB-45, is found in another 56 C. elegans F-box proteins. Both of these cases suggest the expansion of single progenitor genes within nematodes.

Characteristic structural features

The F-box motif has approximately 50 residues. As can be seen from the consensus sequence (Figure 1), there are very few invariant positions; the least variable are positions 8 (92% of the 234 F-box proteins used for the consensus have leucine or methionine), 9 (92% proline), 16 (86% isoleucine or valine), 20 (81% leucine or methionine), and 32 (92% serine or cysteine). This lack of a strict consensus makes identification by eye difficult; it is therefore prudent to use search algorithms to detect F-boxes. Currently, the two best search algorithms are found in the Prosite and Pfam databases [6]. Occasionally, one database will give a significant score to an F-box in a given protein when the other does not detect it, so both databases should be searched.

Figure 1

The F-box consensus sequence. The consensus was derived from the alignment of 234 sequences used to create the Pfam F-box profile [30]; the single-letter amino-acid code is used. Bold and underlined capital letters signify residues found in over 40% of the F-box sequences; bold, non-underlined, capital letters signify residues found in 20-40% of the F-boxes; bold lower case letters indicate residues found in 15-19% of the F-boxes; and non-bold lower case letters indicate residues found in 10-14% of the F-boxes. A minority of F-boxes contain small insertions in the alignment after positions 11 or 24, or small gaps (1-3 residues) at various locations.
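The five least-variable positions listed in the text can be turned into a crude check on a candidate segment. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes an ungapped segment already aligned so that its first residue corresponds to consensus position 1, which will not hold for F-boxes with insertions or gaps. It is not a substitute for the Prosite and Pfam profile searches recommended above.

```python
# Least-variable consensus positions (1-based) and the residues favoured there,
# taken from the percentages quoted in the text.
CONSERVED_POSITIONS = {
    8: "LM",   # leucine or methionine
    9: "P",    # proline
    16: "IV",  # isoleucine or valine
    20: "LM",  # leucine or methionine
    32: "SC",  # serine or cysteine
}


def conserved_matches(segment):
    """Count how many of the five positions carry a favoured residue.

    `segment` is a single-letter amino-acid string; positions beyond its
    end are counted as non-matching.
    """
    segment = segment.upper()
    hits = 0
    for position, favoured in CONSERVED_POSITIONS.items():
        if position <= len(segment) and segment[position - 1] in favoured:
            hits += 1
    return hits
```

Because none of these positions is invariant, a low count does not exclude an F-box and a high count does not prove one; the count is best read as a quick sanity check alongside the database searches.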

Localization and function

Localization

There have been a limited number of studies analyzing the subcellular localization of F-box proteins, and in all but a couple of cases this analysis was performed with overexpressed tagged proteins (see for example the supplementary material in [3,4]). Some F-box proteins were found to be distributed both in the cytoplasm and in the nucleus. The identical localization of wild-type and mutant F-box proteins demonstrates that the presence of the F-box and the F-box-dependent binding to Skp1 do not determine the subcellular localization of these proteins. While mRNAs encoding some F-box proteins have been found in all tissues tested, others are clearly tissue-specific. Because of the large number of F-box proteins, this information is too complex to be summarized here.

Function

The F-box motif functions to mediate protein-protein interaction. F-box proteins were first described as components of SCF ubiquitin-ligase (E3) complexes [7,8]. SCF complexes contain four components: Skp1, a cullin, Rbx1/Roc1/Hrt1, and an F-box protein (Figure 2a) [9,10,11]. SCF complexes facilitate interaction between substrates and ubiquitin-conjugating enzymes, which then covalently transfer ubiquitin onto substrates. Poly-ubiquitinated substrates are subsequently degraded by the 26S proteasome [12]. The F-box protein is the subunit of the SCF complex that binds specific substrates, and it links to the complex by binding Skp1 through the F-box itself.

Figure 2

F-box protein functions. (a) The SCF complex. The F-box protein is linked to the SCF complex via interaction between the F-box and Skp1. A ubiquitin-conjugating enzyme (Ubc) binds to the SCF complex and transfers ubiquitin (Ub) onto substrates bound by the F-box protein. When the substrate becomes poly-ubiquitinated, it is degraded by the 26S proteasome. (b) Skp1 binds to the F-box of Ctf13, facilitating Ctf13 phosphorylation, which allows Ctf13 to form the structural core of the CBF3 centromere-binding complex. (c) The F-box of Elongin A binds Elongin C (El C). The association of Elongins B and C with A increases Elongin A transcriptional activity. (d) The FOG-2/GLD-1 complex binds the 3' UTR of tra-2 mRNA to translationally repress it. The function of the F-box of FOG-2 is currently unknown. (e) Cyclin F binds to cyclin B1-cdc2 through a direct association of the cyclin F 'cyclin box' with the CRS domain of cyclin B1, and may be required for cyclin B1 nuclear localization. The function of the F-box of cyclin F is unknown. NLS, nuclear localization signal.

In both yeast and human cells, multiple SCF complexes are present that differ only in the F-box protein component. In yeast, there are three characterized SCF complexes: SCFCdc4, SCFMet30, and SCFGrr1, designated according to their F-box-containing component. The ability of the SCF backbone to bind multiple F-box proteins, each with its own substrate specificity, substantially increases the substrate repertoire. The F-box proteins found to function in SCF complexes have so far been those that have WD repeats or LRRs in their carboxyl termini, with substrate binding occurring via those motifs. Interestingly, human FBXO4 and FBXO7 have been found to co-immunoprecipitate both with the cullin Cul1 and with Skp1, and the immunoprecipitates are associated with ubiquitin-ligase activity, suggesting that classes of F-box proteins other than the FBXW and FBXL classes can function in SCF complexes [3].

SCF complexes generally recognize substrates after they are phosphorylated on specific epitopes [10]. Phosphorylation is one of the major mechanisms used by cells to rapidly transduce signals. SCF complexes are therefore ideal for dynamic processes that require an abrupt change to be made irreversible (at least in the short term) via the degradation of key proteins. Examples of such processes are cell-cycle phase transitions - during which the cell-cycle regulators that were required for the previous phase are degraded as the cell enters the new phase - and shifts in transcription that are prolonged because a transcriptional inhibitor is degraded. SCF targets are varied: they include cell-cycle regulators (for example, G1-phase cyclins, cyclin-dependent kinase inhibitors, DNA replication factors, and transcription factors that promote cell-cycle progression) as well as proteins with non-cell-cycle functions, such as a cytoskeletal regulator, cell-surface receptors, transcription-factor inhibitors, and non-cell-cycle transcription factors (Table 2).

Table 2 F-box proteins that function in SCF complexes

F-box proteins have also been found to function in four other biochemical contexts. First, in yeast, the Ctf13 protein contains a diverged F-box motif that is not picked up by Prosite or Pfam search algorithms, but which has been demonstrated to be required for binding to Skp1 [13]. Ctf13 is an integral component of the CBF3 kinetochore complex, which binds microtubules to the condensed mitotic chromosomes (Figure 2b). Binding of Skp1 facilitates Ctf13 phosphorylation by an unknown kinase, which allows Ctf13 to assemble into the CBF3 complex [13,14]. Complete loss of CTF13 is lethal, while at permissive temperatures, Ctf13 temperature-sensitive mutants missegregate chromosomes [15].

Second, Elongin A, the transcriptionally active subunit of the Elongin (SIII) complex - which facilitates transcription elongation by RNA polymerase II [16] - is an F-box protein (Figure 2c). Elongin A was isolated by virtue of its ability to increase the catalytic rate of transcript elongation by RNA polymerase II in vitro [16]. Binding of the other components of the complex, Elongin B and C, increases the specific activity of Elongin A. The F-box motif of Elongin A is in the smallest region shown to be sufficient for Elongin A to bind Elongin C in both yeast and humans [17,18]. Elongin C has homology to Skp1; the F-box-Elongin C interaction may therefore be evolutionarily conserved.

The third additional biochemical context in which F-boxes are implicated is in C. elegans, where the F-box protein FOG-2, which also contains an FTH/DUF38 motif, forms a complex with the RNA-binding protein GLD-1 through an interaction with the FTH/DUF38 domain and/or sequences carboxy-terminal to it (Figure 2d) [5]. FOG-2 is required for spermatogenesis in C. elegans hermaphrodites. The complex binds the 3' untranslated region of tra-2 mRNA in the germline and inhibits its translation, thereby allowing spermatogenesis to occur [5,19]; the function of the F-box motif in FOG-2 has not been determined.

Finally, in both Xenopus and human, cyclin F has been found to bind cyclin B1 through a direct protein interaction between the cyclin box of cyclin F and the cytoplasmic retention signal (CRS) domain of cyclin B1 (Figure 2e) [20]. Subcellular mislocalization of either cyclin F or cyclin B1 causes co-mislocalization of the other cyclin, indicating a strong interaction. The distribution of cyclin B1 changes from cytoplasmic to nuclear during the transition from G2 to M phase, yet neither cyclin B1 nor the associated cdc2 kinase has a nuclear localization signal (NLS). The interaction of cyclin B1 with cyclin F, which has two NLSs, may be important in mediating the nuclear entry of cyclin B1. The function of the F-box motif of cyclin F is currently unknown.

Regulation

F-box proteins have been observed to be regulated by several mechanisms and at different levels: for example, synthesis, degradation, and association with SCF components. The three yeast F-box proteins Cdc4, Grr1, and Met30 are intrinsically unstable proteins whose levels do not oscillate during the cell cycle. It appears that they are subjected to ubiquitin-proteasome mediated degradation by an autocatalytic mechanism. Whereas the degradation of Cdc4 and Grr1 is dependent on their abilities to bind Skp1 through their F-boxes [21,22], Met30 seems to be ubiquitinated in a cullin-dependent manner but in an F-box-independent manner [23].

Mammalian Skp2 is degraded by the ubiquitin-proteasome pathway but its expression is mostly regulated at a transcriptional level (A.C. Carrano and M.P., unpublished observations; [24]). The expression of both Skp2 mRNA [24] and Skp2 protein [25] is cell-cycle-regulated, peaking in S phase and declining as cells progress through M phase. In contrast, the expression of the other subunits of the SCFSkp2 ligase complex (Cul1, Skp1, and Roc1), as well as that of its ubiquitin-conjugating enzyme (Ubc3), does not fluctuate through the cell cycle. Thus, although the ubiquitination of Skp2 substrates is regulated by their own phosphorylation, which allows their recognition by Skp2, a second level of control is ensured by the cell-cycle oscillations in Skp2 levels. The only characterized post-translational modification of an F-box protein is phosphorylation of Skp2 on Ser76 by the cyclin A-cdk2 complex [26], but the significance of this modification is currently unknown.

Enforced expression of β-catenin induces the expression of the F-box protein β-TrCP [27]. Although β-catenin can act as a transcriptional regulator, induction of β-TrCP by β-catenin is due to a stabilization of β-TrCP mRNA. As β-catenin is an SCFβ-TrCP substrate, stimulation of β-TrCP expression by β-catenin results in an accelerated degradation of β-catenin itself, suggesting that a negative feedback loop may control the β-catenin pathway. Finally, the association of Grr1 with Skp1 is regulated by glucose levels [28]. Grr1 is required to transduce the glucose signal to transcriptional regulatory proteins. When glucose levels are high, the post-translational association of Grr1 with Skp1 is markedly increased, and this effect is dependent on the carboxy-terminal region of Grr1.

Frontiers

Currently, the dominant paradigm for F-box proteins is the SCF complex, in which the F-box motif is required to tether the substrate-binding protein to the complex. Much current research is focused on identifying the F-box proteins that function in SCF complexes and the substrates that are bound by each F-box protein. The functions of the majority of F-box proteins are still unknown. Given the structural diversity of the family, it is likely that they will be involved in diverse cellular activities. Determining the enzymatic functions of these uncharacterized proteins will prove to be an important area of future research.

An open question is whether the F-box motif is specific for binding to Skp1 or Skp1-like proteins (for example, Elongin C). There are currently no examples of F-boxes binding other types of proteins. Interestingly, in C. elegans, where there is such a large number of F-box proteins, the ancestral Skp1 gene has also undergone amplification to produce 17 paralogs [29], potentially increasing the number of F-box-binding proteins.

In the four years since the discovery of the F-box, intensive research has illuminated the function of F-box proteins in several cellular settings. They are the critical determinants of SCF substrate selection and are positioned as key regulators in many pathways of cell signaling, transcription, and the cell cycle. It is likely that the currently discovered functions are just the tip of the iceberg and that the range of F-box-dependent processes will continue to expand.

Additional data

The following additional data are included with the online version of this article: a text file of the C. elegans F-box proteins with FTH/DUF38, or other motifs.