In April 1996, the completely annotated genome sequence of the yeast Saccharomyces cerevisiae was made publicly available [1, 2], the first eukaryotic genome sequence to be completed. Eight years later, thanks to the united efforts of the large yeast research community and to the unique genetic and physiological properties of yeast, this humble servant of mankind provides by far the best annotated eukaryotic genome [3]. The completeness of the yeast genome sequence has allowed the development of many novel tools for analyzing all molecular components of the cell and their interactions. These tools include three high-throughput collections of mutants that were first produced in 1999 and that have been analyzed in the five years since then. Here, we review the uses of these collections and their contribution to the identification of the components of basic physiological and developmental pathways of S. cerevisiae.

The yeast deletion mutant collection

A set of over 20,000 knockout strains was created by a consortium of European and North American laboratories [4, 5]. The collection currently contains homozygous and heterozygous diploid strains corresponding to deletions of each of 5,916 genes (including 1,159 essential genes) and one haploid strain of each mating type for every non-essential gene (4,757 genes). Each knockout strain is marked by two unique 20-nucleotide 'bar codes', allowing quantitative and qualitative identification by DNA microarray hybridization of each strain in the pools used to assess the strains under different growth conditions (see Figure 1). The original article [4] describing this collection has been cited more than 560 times in the five years since its publication, according to the ISI Web of Science [6]. The complete collection of strains can be obtained at low cost from Euroscarf [7], ATCC [8] and Invitrogen [9].
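The figure of over 20,000 strains can be checked against the gene counts quoted above. The short Python sketch below does this arithmetic under one assumption that is ours and is not stated in the text: homozygous diploid deletions exist only for the non-essential genes, because a homozygous deletion of an essential gene would not be viable. The function name is illustrative only.

```python
# Gene counts quoted in the text.
TOTAL_GENES = 5916
ESSENTIAL_GENES = 1159
NON_ESSENTIAL_GENES = 4757


def expected_strain_count(total_genes, non_essential_genes):
    """Estimate the number of strains in the deletion collection."""
    heterozygous_diploids = total_genes        # one per deleted gene
    # Assumption (ours): homozygous diploids only for non-essential genes.
    homozygous_diploids = non_essential_genes
    haploids = 2 * non_essential_genes         # one strain per mating type
    return heterozygous_diploids + homozygous_diploids + haploids


# The essential and non-essential counts add up to the total.
assert ESSENTIAL_GENES + NON_ESSENTIAL_GENES == TOTAL_GENES

print(expected_strain_count(TOTAL_GENES, NON_ESSENTIAL_GENES))  # 20187
```

Under this assumption the total is 20,187, consistent with "over 20,000".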

Figure 1

Construction and screening of the yeast deletion strain collection. (a) The cassette used consists of a kanamycin-resistance gene (KanMX4) flanked by two tags (also called barcodes), the UPTAG and the DOWNTAG, which are unique to each gene. The yeast DNA 5' and 3' to the barcodes is homologous to yeast DNA flanking the gene to be deleted. After homologous recombination, the gene is replaced by the cassette sequences, including the barcodes. (b) Screening the deletion strains for differences in fitness under selective conditions. Selection leads to an increase in the proportion of some strains in the culture and a decrease of others; these changes can be detected by probing a microarray containing the sequences complementary to the barcodes. A stronger signal, indicating a higher level of a barcode in the RNA extracted from the culture, shows strains that have increased in frequency after selection. Adapted with permission from [71].
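The comparison in panel (b) can be illustrated with a minimal Python sketch. It assumes that each strain has one barcode signal measured before and one after selection, and it reports the log2 change in each strain's share of the total signal. The strain names, the signal values and the pseudocount are hypothetical; real analyses also normalize between arrays and combine the UPTAG and DOWNTAG signals, which this sketch does not attempt.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Return {strain: log2 change in the strain's share of total signal}.

    `before` and `after` map strain names to barcode hybridization signals.
    Strains missing from `after` are treated as having zero signal.
    """
    strains = list(before)
    n = len(strains)
    total_before = sum(before[s] for s in strains) + pseudocount * n
    total_after = sum(after.get(s, 0.0) for s in strains) + pseudocount * n
    changes = {}
    for s in strains:
        share_before = (before[s] + pseudocount) / total_before
        share_after = (after.get(s, 0.0) + pseudocount) / total_after
        changes[s] = math.log2(share_after / share_before)
    return changes


# Hypothetical signals for three deletion strains.
before = {"strainA": 100.0, "strainB": 100.0, "strainC": 100.0}
after = {"strainA": 200.0, "strainB": 100.0, "strainC": 10.0}

changes = log2_fold_changes(before, after)
ranked = sorted(changes, key=changes.get, reverse=True)
print(ranked)  # strains ordered from most enriched to most depleted
```

A positive value marks a strain that increased in frequency under selection and a negative value marks one that decreased.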

The deletion collection has been used in dozens of novel exhaustive screens for phenotypes that occur under a variety of physiological conditions. These include growth in minimal medium, in high salt and low salt, in galactose or sorbitol, at pH 8, after heat or cold shock, and under stress by hydrogen peroxide (all in [10]); growth on non-fermentable carbon substrates [11], in saline conditions [12] or after treatment by ionizing radiation or DNA-damaging agents [13-17]. The collection has also been screened for defects in meiosis, sporulation and germination [18, 19]. This approach has uncovered numerous new putative components of well-known pathways; for instance, the number of genes known to have sporulation or germination phenotypes when deleted has been doubled by these analyses [18].

More sophisticated screens, for example for suppressors of the accumulation of mutations [20], have been developed more recently, as well as screens involving transformation of the deletion strains in order to identify genes needed for non-homologous DNA end-joining [21]. Novel protocols requiring individual transformations of each mutant have allowed the identification of host factors that influence the fate of the Ty family of long-terminal-repeat retrotransposable elements [22] and of genes involved in the unfolded protein response induced by heterologous introduction of mutant human Huntingtin protein or fragments of α-synuclein, both of which form disease-associated aggregates [23]. Similarly, proteins that interfere with the assembly of endoplasmic-reticulum structures termed karmellae, which are induced by elevated levels of HMG-CoA reductase under specific growth and genetic conditions, have been identified using the collection [24]. Several morphological screens have been developed, for example for defects in the selection of the bipolar bud site [25], in cell-size distribution [26, 27], in cell morphology [28] and in meiotic chromosomal segregation [29]. Another approach is the screening of individual colonies. For instance, mis-sorting and secretion of vacuolar carboxypeptidase Y were detected by colony immunoblotting [30]. A second example of colony screening is the transformation of each single-deletion strain to express viral replicase proteins and an RNA replication template in which the capsid gene was replaced by a luciferase reporter gene, which was used to monitor viral expression in yeast colonies [31]. Finally, to identify the genes affecting glycogen storage, the deletion mutant colonies were blotted and stained by iodine vapor; the intensity of coloration allowed assessment of glycogen accumulation [32].

Use of the yeast deletion collection in screens for synthetic lethal mutants and to study drug targets

Synthetic lethality occurs when two mutations that are each viable on their own are lethal in combination. A method called synthetic genetic array analysis (SGA) has been developed for the systematic construction of double mutants [33, 34]. Haploid strains with mutations in non-essential genes were crossed to an array of the whole haploid deletion collection; the resulting diploid cells were made to sporulate and the lethal combinations identified, indicating the existence of essential interactions between gene products. In a first SGA screen using eight query genes, 291 interactions among 204 genes were identified [33]. Three years later [34], the search was expanded to 132 query genes, and 4,000 interactions were identified among 1,000 genes with roles in cytoskeletal organization, cell-wall biosynthesis, microtubule-based chromosome segregation and DNA metabolism.
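The definition of synthetic lethality, and the way figures such as "291 interactions among 204 genes" are tallied, can be sketched in a few lines of Python. This is an illustration only, not the scoring procedure of the SGA studies: the gene names are hypothetical, and viability is reduced to a simple true or false value.

```python
def is_synthetic_lethal(a_viable, b_viable, double_viable):
    """True when both single mutants are viable but the double mutant is not."""
    return a_viable and b_viable and not double_viable


def summarize_interactions(lethal_pairs):
    """Count distinct unordered gene pairs and the distinct genes they involve."""
    interactions = {frozenset(pair) for pair in lethal_pairs if pair[0] != pair[1]}
    genes = set()
    for pair in interactions:
        genes.update(pair)
    return len(interactions), len(genes)


# Hypothetical screen results: (query gene, array gene) pairs scored as lethal.
# The pair (query1, array1) appears twice in opposite orientation and is
# counted once.
lethal_pairs = [
    ("query1", "array1"),
    ("query1", "array2"),
    ("array1", "query1"),
    ("query2", "array2"),
]

print(is_synthetic_lethal(True, True, False))  # True
print(summarize_interactions(lethal_pairs))    # (3, 4)
```

Treating each pair as unordered avoids double counting when the same two genes are recovered from both a query and an array position.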

A more recent development of this approach is the use of DNA-DNA hybridization protocols to assess lethality, termed 'synthetic lethality analysis by microarray' (SLAM). In this method, a pool of haploid deletion strains is transformed with a cassette that replaces the gene of interest with either a deletion construct or the wild-type form. Transformants are pooled and genomic DNA is isolated; the barcodes are amplified by PCR and labeled with either Cy3 (green) or Cy5 (red) fluorescent dyes; and hybridization to an array containing all the deletion tags allows identification of the synthetic-lethal combinations (which are missing). The SLAM method has been validated by identifying members of the DNA helicase interaction network [35]. Synthetic genetic arrays have also been used for high-resolution genetic mapping of suppressor mutations [36]; this method, termed SGA mapping (SGAM), is in principle also applicable to the analysis of multigenic traits.
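The final step of SLAM, calling the barcodes that are "missing", can be sketched as a comparison of two channels. The Python example below assumes one signal per strain from the pool that received the wild-type construct (control) and one from the pool that received the deletion construct (experimental). The function name, the thresholds and the signal values are hypothetical and are not taken from [35].

```python
def slam_candidates(control, experimental, min_control=50.0, max_ratio=0.2):
    """Return strains whose barcode is detected in the control pool but
    strongly depleted in the experimental (query-deletion) pool."""
    candidates = []
    for strain, control_signal in control.items():
        if control_signal < min_control:
            continue  # barcode not reliably detected, so no call is made
        ratio = experimental.get(strain, 0.0) / control_signal
        if ratio <= max_ratio:
            candidates.append(strain)
    return sorted(candidates)


# Hypothetical two-channel signals for three deletion strains.
control = {"strainA": 1000.0, "strainB": 800.0, "strainC": 20.0}
experimental = {"strainA": 950.0, "strainB": 40.0, "strainC": 0.0}

print(slam_candidates(control, experimental))  # ['strainB']
```

The detection floor on the control channel is a design choice: a barcode that is weak in both pools gives no evidence either way, so it is excluded rather than reported as a synthetic-lethal candidate.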

The yeast deletion collection has also been used to identify members of the pathways modified by more than 25 different chemical ligands. This approach has identified the L-carnitine transporter Agp2p as a novel transporter of bleomycin in yeast, implicating membrane transport as a key determinant of resistance to this widely used anticancer agent [37]. In another study, the transcription factor Rpn4p was shown to compensate for proteasome inhibition by PS-341, a drug that is being studied as a treatment for cancer [38]. Other ligands tested include the phosphatidylinositol kinase inhibitor wortmannin [62]. Thirdly, Blanc and Adams [63] used mutations resulting from insertion of the Ty1 transposon to identify yeast mutations that generate evolutionarily significant phenotypes by causing small but positive increments of fitness. Finally, an application of transposon mutagenesis called direct allele replacement technology (DART) allows rapid transfer of any insertion allele into any strain [64]. A transposon library consisting of a collection of plasmids containing yeast genomic DNA with transposon insertions is sequenced to identify the exact insertion point in the yeast genomic DNA [64]. After excision from the plasmid, the yeast genomic DNA containing the transposon is used to transform a yeast strain of choice by homologous recombination. The procedure was validated by identification of 29 insertions into 17 genes involved in apical growth [64].

Insertional mutagenesis using transposons has several potential advantages over targeted deletion: for instance, insertion occurs in the non-coding as well as in the coding segments, so regulatory regions and other non-genic regions can be disrupted. Moreover, depending on the site of insertion, transposon mutagenesis may lead to partial loss of function or gain of function and hence to the identification of novel functions that would not be found from studies of complete knockouts of genes [52]. Conditional alleles may be generated, as well as mutants in promoter or terminal regions. Also, apart from phenotype analysis of the mutation, the level of expression of the targeted gene can be measured in vivo and the subcellular localization of its product can be determined. A disadvantage, however, is that transposon insertions are not random, and this method may therefore never cover all the genes in the genome. Also, several of the problems that were mentioned above for the analysis of the yeast deletion collection apply equally to the transposon mutant collections, including the observation that phenotypes are often very strongly background-dependent.

Other genome-wide mutant collections and their uses

Several databases have been developed to catalog the subcellular localizations of yeast proteins as identified by fluorescence microscopy. The yeast protein localization database [65, 66] describes the results obtained using a library of yeast genes fused to a green fluorescent protein (GFP) reporter. The TRIPLES database [55, 56] includes the use of the transposon-insertion libraries to determine protein localization for 5,504 insertions. The yeast GFP fusion localization database [67, 68] presents the localization of 4,156 proteins into 22 distinct subcellular locations, as determined using a library of GFP-tagged proteins compared with reference strains expressing proteins of known localization tagged with red fluorescent protein (the strains used are available from Invitrogen [9]).

In a recent study [69], an exhaustive global analysis of protein expression in yeast was reported. Each open reading frame was marked by an insertion cassette consisting of a modified version of the tandem affinity purification (TAP) tag, a yeast selectable marker to drive homologous recombination, and regions homologous to yeast genes. The level of expression of each protein was determined by very sensitive western-blotting analysis that could detect fewer than 50 protein molecules per cell. The level of expression of 4,251 gene products was determined in exponential growth conditions [69].

Like the sequencing of the yeast genome, the construction of the single and double deletion libraries and that of the major transposon insertion library has required a considerable investment by the international yeast research community. The three original papers [4, 33, 52] presenting the libraries have a total of 83 authors, funded by public agencies from the USA, from the European Union and from Canada. Was this effort worthwhile?

Over the last five years, the three seminal papers have been cited more than 950 times [6]. We have found over 50 experimental papers reporting on different phenotypic screens carried out with these genome-wide libraries. More than 100 experimental conditions have been tested, representing nearly a million individual mutant screens. We estimate that more than 5,000 novel phenotypic traits have been assigned to yeast genes of known or unknown molecular functions. This undoubtedly represents a considerable amount of progress towards the ultimate goal of a full description of the functions and interactions of all the molecular components of a basic eukaryotic cell. Several recent improvements in the use of the original genome-wide mutant libraries have been reported, such as the DART [64] and SLAM [35] approaches described above. New genome-wide libraries using more sensitive protein tagging are being developed [69] for global measurements of protein levels or subcellular expression. In all this work, yeast continues to be a very convenient test-bed for the development of novel high-throughput tools that rely on the availability of the complete genome sequence.

But a word of caution is appropriate. As was the case for other high-throughput tools that were also pioneered using yeast, such as two-hybrid protein-interaction screening, DNA-hybridization microarrays and proteomic analyses, the information provided by large screens of genome-wide libraries contains an appreciable number of false-positive and false-negative data points. It is therefore essential to confirm each result by several independent approaches. Ultimately, we will have to return to 'reductionist' biochemical approaches to demonstrate fully the molecular function suggested by a large primary screen.

Finally, there is an urgent need to build databases to collect and organize all the data obtained from all the large screening approaches. The SGD [49] and the Munich information center for protein sequences (MIPS) database [70] are making good progress in this respect. The authors of these databases should, however, be encouraged to further develop procedures that take into account the difficult assessment of the uncertainties associated with much of the data. Here again, yeast is expected to become a pioneer.