
11.1 Introduction

Specific interactions between proteins form the basis of most biological processes, so knowledge of an organism’s protein interaction network provides insights into the function(s) of individual proteins, the structure of functional complexes, and, eventually, the organization of the entire cell. Protein-protein interactions (PPIs) can be identified by a multitude of experimental methods. However, the vast majority of the PPIs available today were generated by the yeast two-hybrid (Y2H) method and by affinity purification or co-fractionation coupled to mass spectrometry (AP-MS) (Kerrien et al. 2012). It is important to note that these methods yield different types of information. Y2H analyses reveal binary interactions, including transient ones, whereas AP-MS approaches report multiple interactions connecting all of the proteins in fairly stable complexes. Protein interactome analysis on a genome scale was first achieved with Y2H screens (Ito et al. 2001; Uetz et al. 2000) and next by large-scale mass spectrometric analysis of affinity-purified protein complexes (Gavin et al. 2002; Ho et al. 2002). Here we describe high-throughput Y2H screening strategies that have been applied to map high-quality proteome-scale interactome networks of model organisms and pathogenic infectious agents.

11.2 Yeast Two-Hybrid System

The Y2H system is a genetic screen extensively used to identify binary protein–protein interactions in vivo (in yeast cells). The system was originally developed by Stanley Fields in 1989 (Fields and Song 1989). The principle of the assay rests on earlier discoveries about transcription initiation (Brent and Ptashne 1985). In general, protein domains can often be separated and recombined while retaining their properties. In particular, many transcription factors can be split into a DNA-binding domain (DBD) and an activation domain (AD). In the two-hybrid system, a DNA-binding domain (in this case, from the yeast Gal4 protein) is fused to a protein, generally called the bait (“B”), for which one wants to find interacting partners. A transcriptional activation domain (also from the yeast Gal4 protein) is fused to one or more ORFs, the preys (Fig. 11.1). The bait and prey fusion proteins are then co-expressed in the same yeast cell. If bait and prey interact, a functional transcription factor is reconstituted, which in turn activates the reporter gene(s) (Fig. 11.1). Expression of the reporter gene allows the yeast cell to grow under certain conditions. For example, the HIS3 reporter encodes imidazoleglycerol-phosphate (IGP) dehydratase, a critical enzyme in histidine biosynthesis. In a screening yeast strain lacking an endogenous copy of HIS3, expression of a HIS3 reporter gene is driven by a promoter that contains a Gal4p-binding site, to which the bait fusion protein can bind. However, because the bait fusion lacks a transcriptional activation domain, it remains inactive. If a prey protein carrying an activation domain binds to the bait protein, this activation domain can recruit the basal transcription machinery, and expression of the reporter gene ensues. These cells can then grow in medium lacking histidine, because they can synthesize their own.
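The readout logic of the assay can be captured in a few lines. The following Python sketch is a deliberately simplified, hypothetical model (the class and function names are illustrative and not part of any Y2H software). It treats reporter expression as a yes/no event and ignores reporter leakiness, expression levels, and selection stringency. It also includes bait self-activation, a problem discussed under the analysis of Y2H data, to show why such a bait gives growth regardless of the prey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diploid:
    """Hypothetical, simplified state of one diploid cell in a Gal4-based Y2H assay."""
    bait_prey_interact: bool          # does the DBD-bait bind the AD-prey?
    bait_autoactivates: bool = False  # can the bait alone recruit transcription machinery?


def his3_expressed(cell: Diploid) -> bool:
    # The DBD-bait binds the Gal4p site in the HIS3 reporter promoter, but
    # transcription also needs an activation domain at that site: either the
    # prey's AD (true interaction) or the bait's own activity (auto-activation).
    return cell.bait_prey_interact or cell.bait_autoactivates


def grows_without_histidine(cell: Diploid) -> bool:
    # The screening strain lacks endogenous HIS3, so growth on -His medium
    # depends entirely on expression of the HIS3 reporter.
    return his3_expressed(cell)


if __name__ == "__main__":
    for interacts in (False, True):
        for auto in (False, True):
            cell = Diploid(bait_prey_interact=interacts, bait_autoactivates=auto)
            print(f"interact={interacts}, autoactivates={auto}: "
                  f"grows on -His = {grows_without_histidine(cell)}")
```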

Fig. 11.1

Yeast two-hybrid principle: a protein of interest, ‘B’, is expressed in yeast as a fusion to the Gal4p DNA-binding domain (DBD, “bait”; circles denote expression plasmids). Another protein, or a library of proteins of interest, ‘ORF’, is fused to the Gal4p transcriptional activation domain (AD, “prey”). The two yeast strains are mated to combine the two plasmids expressing the bait and prey fusion proteins in the same (diploid) cell. If proteins ‘B’ and ‘ORF’ interact in the resulting diploid cells, they reconstitute a transcription factor that activates a reporter gene (HIS3) and therefore allows the cells to grow on selective synthetic medium lacking histidine.

11.3 Y2H Applications

Initially, the two-hybrid system was invented to demonstrate the association of two proteins (Fields and Song 1989). Later, it was demonstrated that completely new protein interactions can be identified with this system. Over time, it has become clear that unbiased large-scale library screening is the most powerful application of the system. In recent years, the Y2H method has been extensively applied to map high-quality proteome-scale binary interactome networks of several model organisms, including the human proteome, and of many pathogenic infectious agents. In a recent study, Rolland et al. published a human interactome map based on a systematic Y2H screen of 13,000 human proteins that uncovered 14,000 PPIs (Rolland et al. 2014); proteome-scale Y2H maps have also been reported for Saccharomyces cerevisiae (Uetz et al. 2000; Yu et al. 2013). Y2H studies of pathogens have the potential both to fundamentally change our understanding of how pathogens (viruses and bacteria) modulate the host proteome and to aid the development of countermeasures to control infections and diseases. Likewise, two-hybrid screens can be adapted to a variety of related questions, such as the identification of mutants that prevent or promote interactions (Schwartz et al. 1998; Wang et al. 2012), screening for drugs that affect protein interactions (Vidal and Endoh 1999; Vidal and Legrain 1999), the identification of RNA-binding proteins (SenGupta et al. 1996), or the semiquantitative determination of binding affinities (Estojak et al. 1995). The system can also be exploited to map binding domains (Rain et al. 2001; Ester and Uetz 2008), to study protein folding (Raquet et al. 2001), or to map interactions within a protein complex, for example, the spliceosome (Hegele et al. 2012), the proteasome (Cagney et al. 2001), or the flagellum (Rajagopala et al. 2007).

11.4 High-Throughput Yeast Two-Hybrid Screens

11.4.1 Array-Based Screening

In an array screen, a number of predefined prey proteins are tested for interactions with a bait protein. Typically, the bait protein is expressed in one haploid yeast strain and the prey in another haploid strain of the opposite mating type (Fig. 11.1). The two strains are then mated so that both proteins are expressed in the resulting diploid cell (Fig. 11.2). Screens are performed side by side under identical conditions with many prey proteins and negative controls, so every result can be compared with control assays. In an array, each element (prey) is usually sequence-validated, so it is immediately clear which two proteins interact when positives are selected. Most importantly, all assays are done in an ordered array, so background signals can easily be distinguished from true signals (Fig. 11.2, step 3). However, to perform array screens, the prey library needs to be made up front. This can be done for a subset of genes or for a whole genome (i.e., all ORFs of a genome). Array-based Y2H screens are ideal for small genomes, for example, for mapping the interactions of phages (Rajagopala et al. 2011) and viruses (Uetz et al. 2006; Shapira et al. 2009; Khadka et al. 2011), and for mapping the interactions within a protein complex (Rajagopala et al. 2007, 2012). Hands-on time and the amount of resources used grow with the square of the number of tested proteins (n baits tested against n preys require n² tests), which is a disadvantage for large genomes. However, cloned ORFeome collections of whole genomes are becoming increasingly available for several organisms, and modern cloning systems also allow direct transfer of entry clones into many specialized vectors (Walhout et al. 2000). One of the first applications of such clone collections is often high-throughput protein interaction screening.
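The quadratic growth and the plate bookkeeping of an ordered array can be made concrete with some arithmetic. The sketch below assumes the 384-colony format of the pinning tool in Fig. 11.2 (16 rows A–P × 24 columns), a row-major layout, and an optional number of replicate colonies per prey placed next to each other. These layout choices and all function names are illustrative assumptions, not a prescribed plate map.

```python
import math
import string

ROWS, COLS = 16, 24            # assumed 384-colony format (rows A-P, columns 1-24)
WELLS_PER_PLATE = ROWS * COLS


def pairwise_tests(n_baits: int, n_preys: int) -> int:
    """Every bait is mated against every prey of the array."""
    return n_baits * n_preys


def plates_needed(n_preys: int, replicates: int = 1) -> int:
    """Plates needed to hold one copy of the prey array, replicate colonies included."""
    return math.ceil(n_preys * replicates / WELLS_PER_PLATE)


def position_of(prey_index: int, replicates: int = 1) -> tuple[int, str]:
    """Map a 0-based prey index to (plate number, position of its first colony).

    Assumes a row-major layout with the replicate colonies of each prey adjacent.
    """
    slot = prey_index * replicates
    plate, offset = divmod(slot, WELLS_PER_PLATE)
    row, col = divmod(offset, COLS)
    return plate + 1, f"{string.ascii_uppercase[row]}{col + 1}"


if __name__ == "__main__":
    n = 4000  # hypothetical ORFeome size
    print(pairwise_tests(n, n))            # all-against-all combinations
    print(plates_needed(n, replicates=2))  # prey array spotted in duplicate
    print(position_of(383), position_of(384))
```

For a hypothetical all-against-all screen of 4,000 ORFs, `pairwise_tests(4000, 4000)` gives 16,000,000 combinations, and doubling the number of ORFs quadruples that figure; a 4,000-prey array spotted in duplicate fills `plates_needed(4000, 2)` = 21 plates.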

Fig. 11.2

Array-based yeast two-hybrid screens. Step 1: Yeast mating combines the bait and prey plasmids. The bait (DNA-binding domain (DBD) fusion) liquid culture is pinned onto YEPDA agar plates using a 384-pin pinning tool, and the prey array (activation-domain (AD) fusion) is then pinned on top of the baits using the sterile pinning tool. The mating plates are incubated at 30 °C for 16 h. Step 2: The cells from the mating plates are transferred onto –Leu –Trp medium plates using a sterile 384-pin pinning tool. Only diploid cells grow on agar plates lacking leucine and tryptophan, which ensures that both the bait and prey plasmids are present in the diploid yeast cells. Step 3: The diploid cells are transferred onto –Leu –Trp –His medium plates for protein interaction detection. If the bait and prey proteins interact, an active transcription factor is reconstituted and transcription of the reporter gene is activated, so the cells can grow on the selective plates.

11.4.2 Pooled Array Screening

The pooling strategy has the potential to accelerate screening but requires sequencing capacity and/or extensive pairwise Y2H retesting. In pooled array screening, preys of known identity (systematically cloned or sequenced cDNA library clones) are combined and tested as pools against bait strains. Identifying the interacting protein pair commonly requires either sequencing the preys in the positive yeast colonies or retesting all members of the respective pool. Prof. Vidal’s laboratory at the Dana-Farber Cancer Institute (Boston, MA) has employed a pooling strategy for several large-scale interactome mapping projects (Rolland et al. 2007) and thus usually does not depend on sequencing or a second pairwise retest procedure. Preys rather than baits are preferably pooled, because preys, unlike baits, do not generally cause self-activation of transcription.

11.4.3 Pooled Library Screening

The pooled library screening strategy significantly accelerates screening but may also increase the number of false negatives, so multiple sampling is essential to achieve a reasonable sampling sensitivity (Rajagopala et al. 2014; Yu et al.).
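The value of repeated sampling can be estimated with a simple probability argument. If a single screen detects a given true interaction with probability p, and repeated screens are assumed to be independent, the probability of detecting it at least once in k screens is 1 − (1 − p)^k. The sketch below implements this formula. The independence assumption and the example value p = 0.3 are illustrative only; in practice some interactions are systematically missed by a given Y2H configuration, so real gains are smaller.

```python
import math


def detection_probability(p: float, k: int) -> float:
    """Chance that a true interaction is found at least once in k independent screens,
    each detecting it with probability p: 1 - (1 - p)**k."""
    return 1.0 - (1.0 - p) ** k


def screens_needed(p: float, target: float) -> int:
    """Smallest k such that 1 - (1 - p)**k >= target (requires 0 < p < 1, 0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.3  # hypothetical single-screen sampling sensitivity
    for k in range(1, 5):
        print(k, round(detection_probability(p, k), 3))
    print(screens_needed(p, 0.9))
```

Under these assumptions, three screens with p = 0.3 detect about 66 % of true interactions, and reaching at least 90 % would take seven screens.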

11.4.4 Random Library Screening

Random library screens do not require systematic cloning of all prey constructs, so the complete DNA sequence of the genome of interest is not a prerequisite; however, the prey library must still be created. Random prey libraries can be made from genomic DNA or cDNA. For genomic libraries, the genomic DNA of interest is randomly fragmented and size-selected, and the resulting fragments are ligated into one or more two-hybrid prey vectors. Previous yeast two-hybrid and bacterial two-hybrid screening projects used random genomic DNA libraries (Rain et al. 2001; Joung et al. 2000). A cDNA library is made by reverse transcription of mRNA collected from specific cell types or whole organisms. To simplify the task further, many cDNA libraries are commercially available; for example, Clontech has a collection of human and tissue-specific cDNA libraries. However, the bait clones to be screened against a random library must be made independently.
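How large a random genomic prey library must be can be estimated with the Clarke–Carbon formula, which is not part of the original text: N = ln(1 − P) / ln(1 − f), where f is the insert size divided by the genome size and P is the desired probability that any given sequence is represented. The optional `productive_fraction` parameter is an assumption. For random genomic fragments, only about one in six (two orientations × three reading frames) is expected to be fused in the correct orientation and frame, so dividing by 1/6 gives a rough estimate of the clones needed to represent each sequence as an in-frame prey. The genome and insert sizes in the example are hypothetical.

```python
import math


def clones_for_coverage(genome_bp: int, insert_bp: int, probability: float = 0.99,
                        productive_fraction: float = 1.0) -> int:
    """Clarke-Carbon estimate of library size: N = ln(1 - P) / ln(1 - f).

    f = insert_bp / genome_bp is the fraction of the genome carried by one clone.
    productive_fraction (assumption) rescales N for the share of clones that are
    usable preys, e.g. about 1/6 for correctly oriented, in-frame genomic fragments.
    """
    f = insert_bp / genome_bp
    if not 0.0 < f < 1.0:
        raise ValueError("insert must be smaller than the genome")
    n = math.log(1.0 - probability) / math.log(1.0 - f)
    return math.ceil(n / productive_fraction)


if __name__ == "__main__":
    genome, insert = 4_600_000, 1_000   # hypothetical genome and mean insert size
    print(clones_for_coverage(genome, insert))
    print(clones_for_coverage(genome, insert, productive_fraction=1 / 6))
```

For a 4.6-Mb genome and 1-kb inserts, the estimate is roughly 21,000 clones for 99 % representation, and about six times as many when only in-frame fusions are counted.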

As in pooled library screens, in a random library screen a library of prey proteins is tested for interactions with a bait protein: the bait protein is expressed in one yeast strain and the prey in another strain of the opposite mating type. The two strains are then mated so that both proteins are expressed in the resulting diploid cell. The diploids are plated on interaction-selective medium, on which only yeast cells carrying the bait and an interacting prey will grow. The prey is identified by isolating the prey plasmid, PCR-amplifying the insert, and sequencing it (Sect. 11.6.11). The major limitation of random library screening is that individual prey clones are not available for pairwise Y2H retests or other validation assays, for example, validating a subset of interactions with orthogonal assays. Thus, evaluating the quality of the PPIs relies on computational methods.

11.4.5 Adapting Next-generation Sequencing

The major shortfall of high-throughput protein-protein interactome datasets is low coverage (Rajagopala et al. 2014; Yu et al. 2005; Margulies et al. 2005). Next-generation DNA sequencing, as opposed to Sanger technology, would substantially increase throughput and decrease cost. However, next-generation DNA sequencing technologies are not readily applicable to the identification of interacting pairs, because bait and prey are encoded on separate plasmids and the pairing information is lost when positives are pooled for sequencing. Yu et al. described a massively parallel interactome-mapping strategy that incorporates next-generation DNA sequencing and tested it in a high-throughput Y2H system (Yu et al. 2011). In this method, termed Stitch-seq, PCR is used to amplify the bait and prey ORF or cDNA inserts and stitch them into a single amplicon. The PCR products are then pooled and sequenced by next-generation DNA sequencing to produce stitched interacting sequence tags. Sequencing produced an average read length of 207 bases, 125 bases longer than the 82-bp linker sequence between bait and prey. To identify the ORFs encoding pairs of interacting proteins, the authors selected reads that contained the linker sequence and also covered at least 15 bases of ORF-specific sequence on both ends of the linker. After these sequences were matched to the human ORFeome collection used in the study, such reads could unambiguously identify pairs of unique bait and prey ORFs. This general scheme can readily be extended to increase throughput and decrease cost for other interactome-mapping methods, particularly binary protein-protein interaction assays.
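The read-filtering step can be sketched as follows. This is a simplified, hypothetical reimplementation, not the published Stitch-seq pipeline. It assumes that the linker is matched exactly (real reads would need mismatch-tolerant matching), that all non-ORF sequence between the two inserts is contained in the linker string, that the ORF upstream of the linker in the read orientation is the bait, and that an ORF tag is accepted only if it matches exactly one ORF in the reference collection. Only the 15-base minimum comes from the text; the function names are illustrative.

```python
from typing import Optional

MIN_FLANK = 15  # minimum ORF-specific bases required on each side of the linker


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def unique_orf(tag: str, orfs: dict[str, str]) -> Optional[str]:
    """Return the single ORF containing tag (either strand), or None if 0 or >1 match."""
    hits = [name for name, seq in orfs.items() if tag in seq or revcomp(tag) in seq]
    return hits[0] if len(hits) == 1 else None


def call_pair(read: str, linker: str, orfs: dict[str, str],
              min_flank: int = MIN_FLANK) -> Optional[tuple[str, str]]:
    """Return (bait, prey) for a stitched read, or None if it fails the filters."""
    pos = read.find(linker)
    if pos == -1:                      # try the opposite strand
        read = revcomp(read)
        pos = read.find(linker)
        if pos == -1:
            return None                # no linker: not a stitched read
    left, right = read[:pos], read[pos + len(linker):]
    if len(left) < min_flank or len(right) < min_flank:
        return None                    # too little ORF sequence to identify a partner
    bait = unique_orf(left[-min_flank:], orfs)
    prey = unique_orf(right[:min_flank], orfs)
    if bait is None or prey is None:
        return None                    # tag absent from, or ambiguous in, the ORF collection
    return bait, prey
```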

11.4.6 Analysis of Y2H Data

Analysis of raw results significantly improves the quality of the protein interaction data set. At least the following three parameters should be considered to obtain high-quality Y2H data. Auto-activation: this is the background self-activation strength of the tested bait. The interaction signal of interacting pairs must be significantly higher than that of all other (background) pairs; ideally, no activation (i.e., no colony growth) should be observed for non-interacting pairs or the vector control. Reproducibility: protein interactions that are not reproduced in a pairwise retest experiment should be discarded. Sticky preys: for each prey, the number of different interacting baits (prey count) is determined; preys interacting with a large number of baits are nonspecific (“sticky” preys) and thus may have no biological relevance. The cut-off also depends on the nature and number of the baits screened: if a large family of related proteins is screened, many of them are expected to find the same prey. As a general guideline, in an unbiased set of baits or in genome-wide screens, the number of baits interacting with a given prey should not exceed 5–10 % of the number of baits. In addition, more sophisticated statistical evaluations of the raw data can be applied, e.g., a logistic regression approach that uses statistical and topological descriptors to predict the biological relevance of protein-protein interactions obtained from high-throughput screens, or the integration of known and predicted interactions from a variety of sources (Bader et al. 2004; von Mering et al. 2007).
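The three filters can be combined in a short routine. The sketch below assumes that auto-activating baits are simply excluded (rather than rescreened under more stringent conditions), that prey counts are computed from the raw primary-screen pairs, and that the sticky-prey cut-off is a fraction of the number of baits screened, defaulting to the upper end (10 %) of the 5–10 % guideline. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def filter_y2h(raw_pairs, autoactivating_baits, reproduced_pairs, n_baits,
               sticky_fraction=0.10):
    """Apply auto-activation, reproducibility, and sticky-prey filters.

    raw_pairs: iterable of (bait, prey) positives from the primary screen.
    autoactivating_baits: baits flagged in the auto-activation test (assumed excluded).
    reproduced_pairs: set of (bait, prey) pairs confirmed by pairwise retest.
    n_baits: number of baits screened; sticky_fraction: cut-off as a share of n_baits.
    Returns (kept_pairs, sticky_preys).
    """
    raw = set(raw_pairs)
    baits_per_prey = defaultdict(set)          # prey count = number of distinct baits
    for bait, prey in raw:
        baits_per_prey[prey].add(bait)
    cutoff = sticky_fraction * n_baits
    sticky = {prey for prey, baits in baits_per_prey.items() if len(baits) > cutoff}
    kept = {
        (bait, prey)
        for bait, prey in raw
        if bait not in autoactivating_baits
        and (bait, prey) in reproduced_pairs
        and prey not in sticky
    }
    return kept, sticky
```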

11.4.7 False Negatives and Multiple Variants of the Y2H System

Although Y2H screens have been among the most powerful methods to detect binary protein-protein interactions, a limitation of the technology is the high rate of false negatives (true interactions that are not detected), which is on the order of 70–90 % (Rajagopala et al. 2014; Yu et al. 2014). In addition, when subsets of the large-scale human Y2H interactomes were evaluated, about 65 % of the interactions could be verified by independent orthogonal methods (Rual et al. 2005; Stelzl et al. 2005).

11.4.9 Integration of AP-MS and Y2H Data

It is important to note that the information on protein complexes derived from affinity purification followed by mass spectrometry (AP-MS) does not reveal the internal topology of multiprotein assemblies. Protein complexes are often interpreted as if the co-purifying proteins interact in a particular manner, consistent with either a spoke or a matrix model (Goll and Uetz 2006). The yeast two-hybrid and other orthogonal assays, in contrast, detect direct binary interactions. Combining both methods gives a better picture of protein complex topology and an experimentally derived confidence score for each interaction. In a recent study, Rajagopala et al. compiled a list of 227 E. coli protein complexes with three or more components identified by AP-MS studies (Rajagopala et al. 2014). They identified binary interactions between subunits of these complexes using a proteome-scale Y2H data set and literature-curated binary interactions. By integrating these two data sets, they mapped 745 binary interactions in 203 complexes and deduced a putative complete internal topology for 15 multiprotein complexes. For another 45 complexes, they determined the putative internal topology of a subcomplex with at least three subunits. However, even the combination of both methods is usually not sufficient to establish an accurate topology, because some interactions may be too weak to be detected individually.
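The difference between the spoke and matrix interpretations, and a simple check of whether binary data cover a whole complex, can be sketched as follows. The spoke model pairs the tagged bait with each co-purifying protein, and the matrix model pairs every member with every other. Treating a complex as having a putative complete topology when the binary interactions connect all of its subunits into one network is an assumption made for this illustration, not necessarily the criterion used by Rajagopala et al. (2014); the function names are hypothetical.

```python
from itertools import combinations


def spoke_pairs(bait, members):
    """Spoke model: the tagged bait interacts with each co-purified protein."""
    return {frozenset((bait, m)) for m in set(members) if m != bait}


def matrix_pairs(members):
    """Matrix model: every co-purified protein interacts with every other."""
    return {frozenset(pair) for pair in combinations(sorted(set(members)), 2)}


def binary_data_connect(members, binary_pairs):
    """True if binary interactions among the subunits link all of them together.

    Pairs involving proteins outside the complex are ignored.
    """
    members = set(members)
    if not members:
        return False
    neighbours = {m: set() for m in members}
    for a, b in binary_pairs:
        if a in members and b in members and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:                       # depth-first traversal from one subunit
        for nb in neighbours[stack.pop()] - seen:
            seen.add(nb)
            stack.append(nb)
    return seen == members
```

For a four-subunit complex purified via subunit A, the spoke model predicts three pairs and the matrix model six; binary data then indicate which of these contacts are direct.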

11.4.10 Proteome-Scale Y2H Screening

Making an entire proteome library of an organism that can be screened in vivo under uniform conditions is a challenge. When proteins are screened on a genome scale, automated robotic procedures are necessary. The Y2H screening protocols described here have been extensively tested with human, yeast, bacterial, and viral proteins, but they can be applied to any other genome. The high-throughput methods used to generate Y2H clones, i.e., AD fusions (preys) and DBD fusions (baits), and to carry out proteome-scale Y2H screens are described below. Usually, the process starts with construction of the prey and bait libraries (Protocols 11.6.1–11.6.6) and bait auto-activation tests (Protocol 11.6.7), followed by high-throughput array-based Y2H screening (Protocol 11.6.8) or pooled library screening (Protocol 11.6.9), including the selection of positives and identification of the interacting proteins by sequencing (Protocol 11.6.11). Finally, pairwise Y2H retests (Protocol 11.6.10) are conducted to make sure that the interactions are reproducible.