The genetic architecture of protein interaction affinity and specificity | Nature Communications

News

HomeHome / News / The genetic architecture of protein interaction affinity and specificity | Nature Communications

Oct 15, 2024

The genetic architecture of protein interaction affinity and specificity | Nature Communications

Nature Communications volume 15, Article number: 8868 (2024) Cite this article Metrics details The encoding and evolution of specificity and affinity in protein-protein interactions is poorly

Nature Communications volume 15, Article number: 8868 (2024) Cite this article

Metrics details

The encoding and evolution of specificity and affinity in protein-protein interactions is poorly understood. Here, we address this question by quantifying how all mutations in one protein, JUN, alter binding to all other members of a protein family, the 54 human basic leucine zipper transcription factors. We fit a global thermodynamic model to the data to reveal that most affinity changing mutations equally affect JUN’s affinity to all its interaction partners. Mutations that alter binding specificity are relatively rare but distributed throughout the interaction interface. Specificity is determined both by features that promote on-target interactions and by those that prevent off-target interactions. Approximately half of the specificity-defining residues in JUN contribute both to promoting on-target binding and preventing off-target binding. Nearly all specificity-altering mutations in the interaction interface are pleiotropic, also altering affinity to all partners. In contrast, mutations outside the interface can tune global affinity without affecting specificity. Our results reveal the distributed encoding of specificity and affinity in an interaction interface and how coiled-coils provide an elegant solution to the challenge of optimizing both specificity and affinity in a large protein family.

Protein-protein interaction (PPI) networks form the backbone of a cell’s functional organization. Proteins function in crowded cellular environments in which they must bind to specific target proteins but also avoid binding to many other off-target proteins. In large protein families this task is particularly challenging because many off-target proteins have very similar structures. The specificity of PPIs determines the structure of an interaction network and the way it coordinates cellular processes and thus plays an important role in establishing phenotypes. Specificity is determined by protein sequence and mutations that modify specificity rewire cellular networks, perturb the coordination of biological processes and modify organismal phenotypes, which can lead to disease, functional innovation and be subjected to natural selection. Understanding the physicochemical principles that govern how the sequence of a protein determines the specificity of its interactions is thus central to the question of how genotype determines phenotype.

Conserved families of protein interaction domains serve as models to study PPI specificity. One such family is the basic leucine zipper domains (bZIPs). The human genome encodes 54 bZIPs that form homo- and heterodimers that act as important transcription factors, including AP-1 (FOS/JUN), C/EBPs and MAFs. They are amongst the smallest known PPI domains and consist of an N-terminal basic DNA-binding domain of 27 amino acids and a C-terminal zipper domain of 35 amino acids that mediates homo- or heterodimerization via formation of coiled-coils1,2,3 (Fig. 1a). bZIP helices display a periodic pattern of seven amino acids per two turns. The zipper domains are therefore subdivided into five heptads within which the positions are labeled from a to g. Per heptad, side chains adopt the same position relative to the opposite bZIP in the dimer4. Amino acids at positions a and d are typically hydrophobic and form the hydrophobic core of the coiled-coil interface by interacting with the facing amino acid at the same position in the partner, following a characteristic knobs-into-holes pattern that is essential for dimer stability and forms a zipper-like appearance from which its name is derived1,5. Positions e and g are often charged and can establish stabilizing salt bridges between the g residue of one helix and the subsequent e residue of the opposite helix6. Amino acids b, c, and f are solvent exposed in the coiled-coil and therefore often polar4.

a JUN-FOS dimer and heptad structure (pdb: 1fos). b Cloning strategy and BindingPCA. Two intermediate libraries were constructed. 640 variants of JUN were generated using overlap extension PCR and barcoded using random primer overhangs, 54 wild-type (wt) bZIPs were cloned and barcoded using random primer overhangs. The two intermediate libraries were cloned together to juxtapose the JUN variant barcodes with the wt bZIP barcodes. The full libraries were transformed into yeast cells, which were grown in selective medium containing Methotrexate (MTX). Enrichment of interacting pairs was quantified by sequencing barcode counts in INPUT and OUTPUT cultures by Next Generation Sequencing. Binding scores were calculated using DiMSum36 to assess the strength of interaction. c Screening strategy. To identify context-dependent effects of mutations on the interaction between JUN and all 54 wt bZIPs, the effect of every possible single amino acid mutation in JUN was measured. This allowed the identification of determinants of specificity for individual interactions. d Correlation between replicate 1 and 2 binding scores. e Binding scores relative to wt JUN, measured by BindingPCA in yeast (x-axis) vs. measured by splitFAST in human HCT116 cells (y-axis) for 13 variant – partner pairs. Data are presented as means (splitFAST, n = 3 biological replicates) or error-weighted averages (deepPCA, n = 6 biological replicates) with error bars representing the 95% confidence interval. f Binding score distribution. g Binding scores between wild-type JUN and all 52 wild-type partners. Data are presented as error-weighted means (n = 6 biological replicates) with error bars representing the 95% confidence interval.

Mapping bZIP interaction networks has revealed a sparse, modular network, with subfamilies of bZIPs displaying high specificities for one another and low levels of cross-talk7. However, this high level of specificity is achieved despite the simplicity of the coiled-coil interaction interface and the relative sequence similarity across bZIPs. For this reason, studies on bZIP specificity have been conducted since the early 1990s. Initial studies which laid the basis for our current understanding of bZIP specificity focused on exemplary pairs of leucine zippers that served as models. Amongst them the AP-1 dimeric transcription factor and the yeast Gcn4 homodimer6,8,9,10. O’Shea et al. were the first to attempt to pin down effects on specificity on individual positions within FOS and JUN by subdividing them into inside (interface) and outside (solvent exposed) residues and stated that inside residues mainly contribute to specificity. Importantly, electrostatic effects were reported as more important for specificity than hydrophobic ones. These in turn had a stronger influence on stability of the coiled-coil10,11. When comparing the importance of d and a residues, it was shown that while the d position is indeed crucial for dimer stability12,13, the a residue shows more variability and thus contributes to specificity. One important role of the a positions is the prevention of formation of antiparallel coiled-coils14, as well as the mediation of preferential homodimerization13,15. Vinson et al. took a more detailed look at the e and g positions and tried to develop a classification system for preferential dimerization based on complementarity of charges of these positions in two pairs of bZIPs6. However, trying to apply these rules to the design of synthetic bZIPs only yielded limited success and it was shown that there is no general correlation between the electrostatic charge pattern found in a bZIP’s e and g positions and its dimerization preferences. Energetic couplings between interacting pairs16,17, Van-der-Waals interactions and hydrophobic interactions between a and d positions also play important roles13. Arndt et al. used a library-on-library screening approach to derive synthetic heterodimers and also observed that the strongest heterodimeric pairs not only exhibited complementary charges but also that the effect of charges also depends on sequence context18,19.

More recently, studies have also focused on the contributions of solvent-exposed positions to binding in bZIP dimers. The observation that the b, c and f positions can contribute to helical propensity of the dimers led to the hypothesis that it could be possible to independently modulate dimer stability and specificity20,21,22.

A lot of work has also been put into designing synthetic bZIPs displaying unique interaction patterns. This was first done by screening libraries in functional assays that select for specificity and also stability18,19,23,24. Computational models to predict interaction specificities7,25,26,27 and design synthetic bZIPs26,28 were also established. While these prediction models showed good performance when benchmarked against the available in vitro data7,29, applying them to design synthetic bZIPs resulted in substantial cross-talk with other bZIP family members when tested against human bZIPs, especially within the same subfamily. This indicates that even though the basic rules of specificity are well established, there is a deeper, sequence context-dependent encoding of specificity that previous experimental strategies could not capture.

Determining general, family-wide rules requires broader approaches where the effects of sequence variation are quantified not only on binding to one or a few known interaction partners but to all family members. We previously developed protein fragment complementation (PCA)-based deep mutational scanning assays to quantify the effects of thousands of variants of two interacting proteins (Fig. 1b). These assays, which we collectively term deepPCA, are based on a split DHFR system whereby the proteins of interest are fused to complementary fragments of a murine DHFR variant (respectively called DH and FR) that confers resistance to methotrexate30. When the two proteins interact, the two fragments complement each other and the reconstituted DHFR activity sustains yeast growth in the presence of methotrexate, an inhibitor of the endogenous yeast DHFR. These assays are highly quantitative and have a large dynamic range31,32,33,34. In one deepPCA implementation, BindingPCA (bPCA), two proteins are expressed as DH and FR fusions from the same plasmid, enabling library-on-library screening in a single pool format (Fig. 1b, left). Yeast cells expressing pairs of proteins (wild-type or mutant) that form stronger interactions are enriched in the population while those forming weaker interactions are depleted. Deep sequencing then allows variants to be identified and the frequency of each variant or variant pair before and after growth in methotrexate to be quantified. The resulting enrichment scores are proportional to dimer concentration within the dynamic range of the assay and can be used to infer the underlying causal changes in binding free energy by fitting thermodynamic models to this data. bPCA thus provides a very high-throughput method to perform biophysical measurements33,34,35.

We previously employed bPCA to measure the effects of single and double mutants on the formation of the dimer between JUN and FOS bZIPs33. We showed that fitting a two-state thermodynamic model to the data improved our ability to predict how pairs of mutations combine and provided mechanistic insight into how mutations alter binding affinity. Our data showed that most combinations of genetic variants have additive effects on the free energy of binding that result in non-additive changes in dimer concentration because of the sigmoidal relationship between binding free energy and complex concentration. Double mutants not well predicted by the model identify energetically-coupled mutations, which we found to be enriched in directly physically contacting residues.

Here, we use bPCA to comprehensively quantify how single amino acid substitutions in JUN affect its binding to all 54 human bZIPs (Fig. 1c). The resulting data provide a complete view of how changes in the sequence of a protein alter the specificity of its binding across an entire protein family.

To quantify the impact of substitutions on the binding specificity of JUN, we mutated each of 32 positions in the bZIP domain to every possible amino acid and quantified binding to all 54 human bZIPs and 19 negative controls (protein domains not expected to interact with JUN, such as BRCA1 or HRAS; Supplementary Data 1) using bPCA.

We improved bPCA by enabling the use of random DNA barcodes (Fig. 1b, left), which can be sequenced with shorter read lengths that are robust to sequencing errors, including those induced by template-switching (see methods). We constructed an intermediate library of JUN variants by overlap-extension PCR with NNS primers for each of the residues to be mutated and cloned them as FR fusions. Random barcodes were then added to each library, and the barcodes associated to each variant determined by deep sequencing (Supplementary Data 2). We successfully obtained 616 out of the 641 total possible amino acid variants (96%; 32 positions x 20 substitutions, including stops, plus the wild-type variant), with a median of 21 different barcodes per substitution. The 54 human bZIPs were cloned as DH fusions, barcoded and verified by sequencing (Supplementary Data 1). The final library was then assembled in a single pot reaction by combining the DH and FR intermediate libraries, juxtaposing the barcodes associated with the DH and FR fusions for efficient variant-bZIP identification by deep sequencing (Fig. 1b).

Binding of the JUN variants to all human bZIPs was assayed in six replicates and binding fitness scores and errors for each variant–partner pair calculated using DiMSum36 after collapsing read counts across barcodes associated to the same pair. Interaction scores were normalized to the wild-type binding of JUN (Supplementary Data 3). After quality control, we were able to quantify the interaction of 26,648 out of the 34,614 expected pairs (Supplementary Fig. 1), including 580 unique variants and 52 out of the 54 wild-type bZIPs (excluding CEBPG and CREB1), with an average coverage of 443 reads per pair across input replicates. Binding scores are highly reproducible, with an average Pearson correlation between replicates of 0.93 (Fig. 1d, Supplementary Fig. 2). The use of barcodes as a sequencing read-out does not impact quantification as shown by the good correlation between 562 binding scores with FOS measured here and those measured in a previous deepPCA study not based on barcodes sequencing (Pearson’s R = 0.93, p < 2.2e−16, Supplementary Fig. 2). Using the 19 negative controls as true negatives and the bZIP partners reported in Biogrid37 as true positives, we computed an area under the receiver operating characteristic curve of 0.89, demonstrating the sensitivity and specificity of our binding measurements (Supplementary Fig. 4). We also confirmed 21 measurements by an orthogonal split fluorescent protein assay, splitFAST38 performed in human colorectal cancer cells, HCT116 (Pearson’s R = 0.79, p = 1.72e−5 for absolute binding scores and R = 0.89, p = 5.3e−5 and n = 13 for binding scores relative to wild-type, Fig. 1e, Supplementary Fig. 5 and Supplementary Data 4 and 5).

The distribution of binding scores is bimodal (Fig. 1f), with a smaller number of variant-partner pairs in the high binding score mode, reflecting the relatively small fraction of high affinity interaction partners that wild-type JUN has7,29 (Fig. 1g). As previously reported, wild-type JUN forms strong interactions with many members of the AP-1 family including BATF1-37, the FOS subfamily8, the JUN-subfamily8, JDP239, ATF38, ATF48 and ATF77. We also detected previously reported interactions with CREB5 and DDIT37,29 as well as an interaction with XBP1, a regulator of the unfolded protein response40 (Supplementary Data 3 and 4).

In order to compare the effect of the same mutation with different partners, we normalized each measurement by the interaction between the corresponding human bZIP and the JUN wild-type allele. Figure 2a shows the relative effects of all JUN mutants on the interaction with all bZIPs. The partners are ordered according to their binding score with wild-type JUN, revealing a general pattern of mutational effects that is highly similar across all JUN interaction partners. This pattern identifies positions that are highly sensitive to mutation, such as the a and d positions of all heptads, consistent with them forming the hydrophobic core of the interaction interface3. These mutations affect binding to all of JUN’s partners in a similar way, i.e., they affect the intrinsic capacity of JUN to dimerize with other bZIPs. For example, replacing one of the core leucines that define bZIPs by a charged amino acid disrupts JUN’s ability to form bZIP dimers regardless of the partner (Fig. 2a, Supplementary Fig. 6).

a Heatmap of binding scores relative to the corresponding interaction between the wild-type (wt) partner and wt JUN. Rows are ordered by decreasing strength of the wt interaction. Substituent amino acids at each position are ordered alphabetically. b Binding score profile of four selected partners, showing a non-linear trend between mutations that correspond to general effects of mutation. All correlations are shown in Supplementary Fig. 6. c The non-linearity trend between mutational profiles is easily explained by the non-linearity between ΔG and binding scores if mutations have a general effect at the energetic level that is independent of the partner. The dark dot represents wt JUN, and dots of other colors represent different mutants. For the interaction with different partners, these mutants have the same ΔΔG relative to wt JUN, but different effects on binding. This is due to the non-linearity of the thermodynamic function and the starting position of the wt interaction with different partners on this function. When plotted against each other, the binding effects between different partners thus result in the non-linear trend observed in (b).

Although most of the mutational effects appear qualitatively as having a general effect on JUN’s ability to dimerize, some mutations appear to more strongly affect the interaction with some partners. These represent cases of specific effects. For instance, several substitutions at position 3a (position a in heptad 3) increase the interaction of JUN with members of the MAF subfamily (Fig. 2a, Supplementary Fig. 6).

Plotting the mutational effects profiles across partners also reveals a strong trend affecting all partners, showing that most mutations have general effects on JUN’s ability to form dimers with all its partners (Fig. 2b, Supplementary Fig. 7). The non-linearity of these relationships is reminiscent of global effects due to the thermodynamics of binding to partners with different wild-type affinities (Fig. 2c). Some mutations, however, change binding to particular interaction partners such that their binding score is located away from the general trend. These residuals represent mutational effects that are specific to these partners. To formally quantify these changes in binding specificity we therefore used modeling to decompose mutational effects into their general and specific components while accounting for the non-linearity imposed by the thermodynamics of binding.

The thermodynamics of leucine zippers binding involves triggering motifs that nucleate helix formation in monomers, which then completely fold upon binding41,42. However, it has been shown to be well approximated by a simple two-state model between soluble unfolded and unbound monomers folded and bound dimers5. Using MoCHI34,35, we therefore fitted a two-state thermodynamic model that infers a single additive binding Gibbs free energy change (ΔΔG) for each of the 590 available JUN mutations and 51 additive binding free energies for each partner bZIP (relative to FOS as an arbitrary reference, see methods, Fig. 3a). A single ΔΔG, which we refer to as the change in global binding energy, is fitted for each variant or partner across the entire dataset. MoCHI assumes additivity for all changes in free energy (ΔΔG) and a linear relationship between the fraction of bound molecules and the binding scores quantified by bPCA32,33,34. Therefore, the fitted variant ΔΔGs capture the general effects of mutations across all partners. The residuals between the experimentally measured effects of mutations and the model-predicted values then quantify the specific effects of mutations with each binding partner.

a MoCHI34,35 fits a simple two-state thermodynamic model between unbound, unfolded and bound, folded states. The total ΔG corresponds to the sum of the ΔG of the reference protein interaction, arbitrarily chosen as wild-type (wt) JUNxFOS, the ΔΔG of the mutation relative to wt JUN, and the ΔΔG of swapping FOS for another bZIP. MoCHI uses a one-hot encoded sequence representing which substitution at which position in JUN is present and a dummy sequence representing which bZIP partner is present. It then fits the reference ΔG, the mutant ΔΔG and the partner ΔΔG and sum them up, and in parallel a multiplicative parameter for each partner that corrects a small bias in the distribution of residuals due to different curvature of the sigmoid for different partners and that relates to a Hill coefficient (see methods). b Plot of observed binding scores versus the ones predicted by the thermodynamic model. c ΔΔGs inferred by MoCHI correlate well with those determined by in vitro FRET29 whether JUN was used as FRET donor or acceptor (n = 34).

The additive MoCHI energy model predicts binding scores with high accuracy (R2 = 0.94, Fig. 3b, Supplementary Data 3), and the partner ΔΔGs correlate well with independently-measured in vitro binding energies29 (Pearson’s R = 0.78 and 0.85 when JUN is labeled either as donor or acceptor in fluorescence resonance energy transfer (FRET) experiments, respectively, n = 34; Fig. 3c), confirming the validity of our approach and the use of a two-state model. This high correlation shows that our assay covers a similar dynamic range as the in vitro FRET assay, approximately from 1 nM to 5000 nM. However, some well-known JUN binding partners were not detected by the in vitro FRET assay, for instance JUNB43 or the JUN homodimer44, suggesting these correlations may underestimate the agreement of MoCHI’s ΔΔGs with the ground truth. The high correlation between the measured binding scores and those predicted using an additive energy model in which the effects of mutations are the same for all interaction partners shows that the vast majority of measured changes in binding are independent of the interaction partner’s identity and that context-dependent, specific effects are rare.

Global changes in binding could in part be caused by changes in the total concentration of JUN. We therefore employed a highly validated assay, AbundancePCA, which leverages the latent affinity between an over-expressed free DHFR fragment for the complementary fragment fused to JUN variants to read-out their abundance34. We measured the abundance of all JUN variants and found that changes in binding and global binding energy cannot be explained by changes in JUN concentration (Supplementary Fig. 8).

To better understand the underlying mechanistic causes of changes in global binding energy, we used linear regression to predict the fitted ΔΔGs from a compilation of ~500 physicochemical properties of the 20 natural amino acids45, which heptad is mutated and the position within the heptad. The five top features together explain 68% of the variance in non-specific effects (Fig. 4a, b, Supplementary Table 1), with the top feature related to α-helical stability46, the two features encoding heptad number and position within the heptad, and the last two features related to side chain hydrophobicity47 and standard entropy48 (related to number of possible conformations), respectively. Thus, mutations decreasing the stability of JUN’s helix and/or its hydrophobicity tend to reduce its intrinsic ability to interact with other bZIPs, regardless of the partner2,10,49.

a Cumulative proportion of variance in mutant ΔΔG explained by the five best features, including three physicochemical properties of amino acids related to alpha-helical stability, hydrophobicity and conformation. b Mutant ΔΔG predicted by amino acid features. X-axis corresponds to the ΔΔG predicted by the five top features represented in (a) and y-axis corresponds to the ΔΔG fitted by MoCHI. Mutant names are colored by their heptad position (a, black; b, red; c, green; d, blue; e, cyan; f, pink; g, yellow). c Proportion of variance in mutant ΔΔG explained by the top feature at each of the seven heptad positions. Proportion of variance in binding scores explained by global binding energy at each of the seven heptad positions (d) and each of the 32 mutated positions (e).

Because of the importance of position in the model, we next looked at the most predictive feature at each of the seven heptad positions independently (Fig. 4c, Supplementary Table 2). Far-side positions b, c and f were best predicted (best feature explaining 71, 66 and 72% of the variance, respectively)50,51, followed by salt-bridge positions e and g (52 and 45% of the variance, respectively)52,53, and hydrophobic core positions last (39 and 36% of the variance explained for a and d, respectively)46,54. The top features at all positions relate to the equilibrium between a disordered peptide and a stable α-helix in the context of leucine zippers.

A substantial part of the changes in global binding energy inferred by the model therefore reflects the general ‘zipper propensity’ of a sequence i.e., the baseline ability to form a coiled-coil and dimerize with other bZIPs. This goes along with previous knowledge stating that higher helical propensity results in higher stability of coiled-coils21.

We compared the variance in binding explained at each of the seven heptad positions by changes in the global binding energy. Positions on the far side of the helix (b, c and f) have the highest proportion of variance explained by global binding energy while hydrophobic core positions (a and d) have the lowest (Fig. 4d, e). This is in agreement with core positions making direct contacts with binding partners3.

Plotting the binding scores against the global binding energies for each bZIP partner (Fig. 5a, Supplementary Fig. 9) reveals that global binding energies predict the binding fitness estimates very well for all bZIPs except ATF4 (which is removed from subsequent analyses, see methods). However, for each bZIP, some changes in binding are not well predicted by change in global binding energy, with large residuals from the predicted binding. These residuals quantify changes in binding specificity, i.e., mutational effects that are specific to one or a subset of interaction partners. Inspection of these plots suggests these specificity-altering mutations are enriched in specific positions and for particular types of substitutions for each interaction partner (Fig. 5a).

a, b Specifcity plots representing the binding scores as a function of the global binding energy (ΔΔG) inferred by MoCHI. The sigmoid curve represents binding scores predicted by MoCHI, and the vertical residual from the curve thus represent the specific effect of this mutation on the interaction with this partner. The three characters in the mutant name correspond to heptad, position within the heptad and amino acid substitution, respectively. Mutant names are colored according to their heptad position, which further highlights that several mutations at the same few positions have specific effects (a, black; b, red; c, green; d, blue; e, cyan; f, pink; g, yellow). Examples with the four partners presented in Fig. 2B (a) and general principle of the decomposition of specific and global effects (i.e., effects due to change in global energy) (b) (WT = wilde-type). c distribution of mutational effects due to changes in global binding energy (top) and specificity (bottom). Plot of mutational effects as a function of their specificity (d) and global binding energy (e) components. Each point represents a different variant-partner pair.

The energy model therefore allows us to quantitatively decompose the measured change in binding caused by each mutation for each interaction partner into global and specific components. We define the global component of mutational effects as the difference between the mutant binding score predicted by the thermodynamic model and the wild-type binding score, and the specific component as the difference between the measured mutant binding score and the predicted one (Fig. 5b). The distributions of these effects are rather different, with global effects having a long negative tail and specificity changes having a more pronounced positive tail (Fig. 5c). Out of all variant-partner pairs tested, 13,331 interactions are altered by a change in global binding energy (global effect ≠ 0; FDR < 5%), while only 207 are altered by a change in specificity (specific effect ≠ 0; FDR < 5%). In summary, mutations are much more likely to affect JUN’s affinity globally than specifically.

Plotting changes in binding as a combination of changes in specificity and global energy changes reveals that decreased binding is mostly caused by changes in global energy whereas increased binding can be achieved by changes in both specificity and global energy (Fig. 5d, e).

Clustering the specificity profile of each bZIP reveals that members of the same subfamily tend to cluster together, especially for those that interact strongly with JUN, and identifies the wild-type positions important to establish specific interactions with a given subfamily (Fig. 6a, Supplementary Figs. 10 and 11). We next quantitatively characterize these determinants of specificity and how they are distributed throughout the bZIP interface.

a Heatmap of specific mutational effects revealing clusters of mutations at a few positions that affect some partners specifically. Columns are clustered according to each partners specificity profile. Substituent amino acids at each position are ordered alphabetically. b Summary of positions involved in determining JUN’s specificity. Positions were labeled as a positive or negative determinant of specificity if it had at least one significant specific mutational effect with one partner (FDR < 5%, specific effect < −0.5 or > 0.5, respectively). c Relationship between the specific and global binding energy components of mutational effects. Each point represents a different variant-partner pair and is colored according to its mutational effect relative to the binding score of the corresponding wild-type interaction (FDR < 5%). d Changes in global binding energy for different bins of specific effects, showing the pleiotropic nature of specificity-changing mutations (FDR < 5%).

Binding specificity can potentially be determined via three different strategies. Wild-type residues can be positive determinants of specificity that increase affinity to one or a subset of bZIPs, negative determinants that decrease affinity to one or a subset of bZIPs, or dual determinants that both promote on-target binding and prevent specific off-target binding55,56,57. Our comprehensive mutagenesis and mutational effect decomposition allows us to quantify the relative prevalence of each strategy in a large protein family.

When mutated, positive determinants of specificity result in reduced binding to only some interaction partners. Conversely, mutation of negative determinants leads to increases in binding to only some interaction partners. Mutating dual determinants of specificity leads to positive effects with some partners and negative effects with others.

Out of the 97 variant-bZIP combinations with strong changes in specificity (absolute specific effect > 0.5, FDR < 5%), 80 increase binding and 17 decrease binding (82.5% and 17.5%, respectively; Supplementary Figs. 12a and 13a). These represent a total of 56 unique mutations out of the 579 tested (9.7%), with 41 only increasing binding, 11 only decreasing binding and four both increasing and decreasing binding depending on the partner (Supplementary Figs. 12b and 13a). 16 partners show increased binding, with an average of 4.6 mutations per partner increasing binding. Only five partners show decreased binding, with four of them affected by a single mutation and eight mutations affecting JUN binding with itself. Dual mutations affect nine partners, with an average of 1.33 dual mutations per partner (Supplementary Figs. 12c and 13c).

However, some mutations strictly increasing or decreasing binding are found at the same position, such that the wild-type residue can be considered a dual determinant of specificity, with different mutations having different directions of effect. Negative determinants of specificity thus correspond to positions where mutations only increase binding specifically and positive determinants of specificity correspond to positions where mutations only decrease binding specifically. Out of the 32 tested positions in the zipper domain, seven are dual determinants of specificity, five are negative determinants and two are positive determinants (Fig. 6b).

The interaction specificity of JUN is thus defined by a mixture of negative and positive specificity determinants, with half of the specificity determining positions acting both negatively and positively to encode specificity.

We next examined where in the JUN sequence the determinants of specificity are located. Across the five heptads, residues in five out of seven positions (a, d, c, e, g) contribute to specificity. Position a in all five heptads is a dual determinant of specificity, as is position 1e and 4d (Fig. 6b). Negative determinants of specificity are found at position 1g, 2d, 4e, 4g and 5d, while positive determinants are found at position 1c and 1d. The strictly positive determinants of specificity are thus more biased towards the N-terminal region of the leucine zipper while the strictly negative determinants are more biased towards the C-terminal region.

Interestingly, three out of the four salt-bridge positions that contribute to specificity act negatively, with the other being a dual determinant. JUN therefore employs electrostatic interactions mostly in a repulsive rather than attractive manner to determine specificity. A role for electrostatic repulsion in preventing homodimerization has already been described for FOS10.

We next examined whether changes in specificity also affect the global binding energy. Mutations that alter specificity might have no effect on the global binding energy (if specificity and global binding energy are orthogonally encoded) or mutations causing specificity changes may have pleiotropic effects and synergistically or antagonistically alter affinity via changes in both specificity and global binding energy.

We find that in JUN 50 out of 56 (89.3%) unique mutations that change specificity to at least one partner also change the global binding energy (Fig. 6c, d). In nine cases, changes in specificity and global binding energy have synergistic effects on affinity (all negative) and in 37 cases changes in specificity and global binding energy have antagonistic effects on affinity. Out of these, a single mutation decreases binding specifically but increases global affinity, while 36 increase binding specifically but decrease global affinity. The last four mutations can both increase or decrease binding specifically, depending on the interaction partner and in all cases decrease gobal affinity (Fig. 6c, d).

We conclude that nearly all mutations affecting specificity are pleiotropic and also alter binding affinity to all other interaction partners. However, the reverse is not true and the vast majority of mutations that alter the global binding energy do so without changing the binding specificity.

Core positions a and d are important for establishing specificity, with strong specific effects for all a positions across the five heptads and for d positions in the extremity heptads 1 and 5 (Fig. 6a). This somewhat contradicts previous statements that d positions are more relevant for coiled-coil stability while a positions are the main determinants of specificity12,13,15. Mutations at position 1d have a relatively small absolute effect on interaction with wild-type partners compared to other d positions (Fig. 2a). For most of JUN’s wild-type partners, two of the surrounding positions in the hydrophobic core, 1a and 2a, are bulky or hydrophilic (Fig. 7a, left). The hydrophobic core in heptad 1 is thus already disrupted so that substituting the 1d leucine cannot strongly disrupt it further. However, JUN subfamily members have hydrophobic residues at 1a and 2a (Fig. 7a, right), making heptad 1 important for binding with these partners. Mutations in JUN positions 1a and 1d therefore disrupt the hydrophobic core and have specific effects on the interactions with the JUN subfamily.

a–c Structural insights into a few examples of determinants of specificity. All structures were predicted using AlphaFold264 since crystal structures are not available for many of the dimers. d Distribution of affinity-only and pleiotropic positions within the bZIP interface. Crystal structure of the JUN-FOS dimer68. e zig-zag model of bZIP evolution. Because of the pleiotropic antagonistic nature of nearly all mutations changing specificity, evolution of specificity requires compensatory mutations at the far side increasing affinity globally by changing global binding energy only.

Mutations at positions 5d generally weaken the interaction with all partners, as expected for a core leucine15, but they do so less than expected for the BATF subfamily as well as for DDIT3 and XBP1, even in some cases making the interaction stronger than with wild-type JUN (Fig. 6a). BATF and BATF2 have bulky aromatic residues at the facing 5d position (Fig. 7b, left), which could explain why mutating JUN towards alanine, valine and isoleucine, all less bulky than leucine (isoleucine has the same number of carbon atoms, but one is closer to the backbone than in leucine), are amongst the most positive residuals for these partners. For interaction with DDIT3 and XBP1, nearly all 5d substitutions have positive specific effects, with a large fraction increasing the interaction with XBP1 compared to wild-type JUN (Fig. 6a). The presence of a threonine at position 5d in DDIT3 (Fig. 7b, right) and at position 5a in XBP1 suggests a weakened hydrophobic core in this last heptad, such that mutating the leucine has a reduced impact on the contribution of this heptad's hydrophobic core to binding compared to other partners that have a 5d leucine.

Another interesting position is 3a. The wild-type asparagine at position 3a is involved in the specificity of the interaction with the FOS-subfamily since some mutations decrease binding specifically. However, asparagine is not the optimal residue for FOS subfamily binding with mutations to valine and cysteine increasing binding. However, although valine is the optimal amino acid at this site for FOS subfamily-binding it also increases binding with the MAF subfamily (Fig. 6a). MAF bZIPs have a complementary valine at the facing 3a position. Together with its function in preventing anti-parallel coiled-coils14, asparagine at position 3a may thus reflect an evolutionary trade-off between promoting the interaction with the FOS subfamily and avoiding strong interactions with the MAF subfamily.

We have presented here a comprehensive quantification of the effects of mutations in a protein on its binding to possible interaction partners from a protein family. Our data quantifies the binding of JUN mutants to 52 of the 54 members of the human bZIP family of proteins: a total of 26,648 measurements. Fitting a global thermodynamic model to this data allowed us to deconvolve the effects of mutations on binding specificity from changes in the propensity of JUN to bind to all its interaction partners (Figs. 2a, b and 5a, b).

Our data show that mutations overwhelmingly affect JUN’s binding to interaction partners non-specifically (Fig. 5c, e). Mutations in all 32 tested positions can alter this global binding energy, which, at least in part, relates to both the helical propensity of the JUN sequence and the hydrophobicity of the interface (Fig. 4).

Mutations that alter the specificity of binding are rarer (Fig. 5c), but they are also distributed throughout the protein interface (Fig. 6a, b and 7e). Indeed, mutations in nearly half of amino acid positions alter JUN’s binding specificity. The core interface residues a and d are particularly important determinants of specificity. The binding specificity of JUN is defined by a mixture of both positive determinants that increase the affinity for target proteins and negative determinants that reduce binding to off-target proteins. Whilst the positive determinants are enriched in the C-terminus, the negative determinants are more abundant in the N-terminus of the zipper. The negative specificity determinants are both larger in number and individually of larger effect (Fig. 5c, Supplementary Figs. 12 and 13). Preventing binding, for instance through repulsive electrostatic interactions, is therefore a more important strategy for defining specificity than promoting binding, which has also been suggested as a mechanism of preferential dimerization for FOS10. However, half of the specificity-defining residues serve as both positive and negative determinants of JUN’s interaction profile. Specificity is thus encoded in a delocalized manner and by a mixture of positive and negative interactions, with many residues acting as dual specificity determinants.

Specificity and affinity in JUN are, however, not encoded by orthogonal genetic variables. Our data show that it is difficult to change the specificity of JUN without also altering its affinity for all interaction partners (Fig. 6c, d). Put another way, specificity-altering mutations are nearly always pleiotropic and also change the affinity of the protein for all of its interaction partners. During evolution, therefore, changes in specificity will rarely occur without changes in affinity to all interaction partners. Specificity changes will therefore normally have to be accompanied by compensatory mutations that counteract the change in affinity without altering specificity.

However, although changes in specificity nearly always also involve changes in affinity to all binding partners, the reverse is not true and there are many mutations in many positions that can alter the affinity of JUN for all interaction partners without altering its specificity. The modular design of coiled-coils thus provides an elegant solution to the problem of independently tuning specificity and affinity in a large protein family, allowing both properties to be optimized. Whereas mutations in the interface affect both specificity and global affinity, mutations outside the interface alter the general zipper propensity and so affinity without affecting specificity (Fig. 7d). This is most likely thanks to the helical structure of the bZIPs in which surface residues can impact protein stability by increasing helix propensity21. This feature that has previously been used to optimize stability of synthetic coiled-coils20,22 but its role in determining specificity within the human bZIP interaction network has not been addressed systematically. We propose that specificity can be optimized during evolution using combinations of pleiotropic specificity and affinity altering mutations in the interface coupled to compensatory affinity tuning mutations elsewhere in the zipper (Fig. 7e). For instance, a mutation that increases binding specifically to one partner but decreases binding globally could be compensated for by a mutation that restores global binding to the previous level, with a net effect of only increasing binding to the specific partner. We refer to this model as the 'zig-zag model' of specificity and affinity optimization. The zig-zag model could be further tested by using phylogenetic methods to reconstruct ancestral leucine zipper sequences and experimentally quantifying the changes in binding caused by consecutive substitutions.

Our results concern a particular protein – JUN – from a particular family of proteins – basic leucine zippers – and a particular class of protein interactions – coiled-coil interactions. In future work it will be important to use similar experimental designs to better understand the encoding of affinity and specificity in other protein families. Pioneering studies have already begun to address these questions for other structural classes of protein interactions58,59,60 and we believe that a concerted effort to mutagenize the diversity of domain-peptide and domain-domain interactions that constitute the interactomes of every species will allow us to better understand, predict and engineer affinity and specificity from sequence.

All plasmids and strains are available upon request. All oligonucleotides, commercially available reagents and kit used in this study are listed in Supplementary Data 6.

All experiments were performed in BY4742 (MATα his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0).

LB: 25 g/L Luria-Broth-Base (Invitrogen™). Autoclaved 20 min at 121 °C.

LB-agar with 2X Ampicillin (100 μg/mL): 25 g/L Luria-Broth-Base (Invitrogen™), 7.5 g/L Agar, 1.2 g/L MgSO4 · H2O. Autoclaved 20 min at 121 °C. Cool-down to 45 °C. Addition of 100 mg/L Ampicillin.

YPAD: 20 g/L Bacto-Peptone, 20 g/L Dextrose, 10 g/L Yeast extract, 25 mg/L Adenine. Filter-sterilized (Millipore Express ®PLUS 0.22 μm PES, Merck, Darmstadt, Germany).

SC-ura: 6.7 g/L Yeast nitrogen base without amino acids, 20 g/L glucose, 0.77 g/L complete supplement mixture drop-out without uracil. Filter-sterilized (Millipore Express ®PLUS 0.22 μm PES, Merck, Darmstadt, Germany).

SC-ura/ade/met: 6.7 g/L Yeast nitrogen base without amino acids and folic acid, 20 g/L glucose, 0.74 g/L complete supplement mixture drop-out without uracil, adenine and methionine. Filter-sterilized (Millipore Express ®PLUS 0.22 μm PES, Merck, Darmstadt, Germany).

SORB: 1 M sorbitol, 100 mM LiOAc, 10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0. Filter-sterilized (Millipore Express ®PLUS 0.22 μm PES, Merck, Darmstadt, Germany).

Plate mixture: 40 % PEG3350, 100 mM LiOAc, 10 mM Tris-HCL pH 8.0, 1 mM EDTA pH 8.0. Filter-sterilized (Millipore Express ®PLUS 0.22 μm PES, Merck, Darmstadt, Germany).

Recovery medium: YPAD + 0.5 M sorbitol. Filter-sterilized (Millipore Express ®PLUS 0.22 μm PES, Merck, Darmstadt, Germany).

Competition medium: SC-ura/ade/met + 200 μg/mL methotrexate (BioShop Canada Inc., Canada), 2 % DMSO.

DTT buffer: 0.1 M EDTA-KOH pH7.5, 10 mM DTT.

Zymolyase buffer: 20 mM K-phoshpate pH 7.2, 1.2 M sorbitol, 0.4 mg/mL Zymolyase 20 T (amsbio, USbiological), 100 μg/mL RNAse A.

Two intermediate plasmids were constructed, one carrying the DHFR-bait cassette, one carrying the DHFR-prey cassette. For the bait intermediate plasmid, a gene fragment carrying the CYC promoter, a 3 x Myc-tag fused to a DHFR N-terminal half (from here on referred to as DH-tag), a Linker sequence (Linker L3, GSAGASAGGSGSAGSASGGS), a 34 bp random spacer sequence, a 20 bp placeholder barcode sequence (GGCACTGTAGTCGATAGCCT; bait barcode) and an SP1 Illumina primer binding site (Illumina, San Diego, CA) was cloned into pRS41661 digested with KpnI-HF (New England Biolabs, Ipswitch, MA, USA) and SacI-HF (New England Biolabs, Ipswitch, MA) using Gibson assembly. The Gibson assembly was performed using 34 fmol of backbone and 68 fmol of gene fragment in 20 μl reactions (New England Biolabs, Ipswitch, MA). The reactions were incubated for 15 min at 50 °C. 3 μl of reaction mix were transformed into Mix & Go! Competent Cells - DH5 Alpha (Zymo Research Corporation, Irvine, CA). Correct assembly was checked by colony PCR followed by Sanger sequencing of the gene fragment region and the Gibson junctions. The sequence-confirmed plasmid was named pDL00211. Afterwards, the plasmid was digested with HindIII-HF (New England Biolabs, Ipswitch, MA) and SacI-HF (New England Biolabs, Ipswitch, MA) and treated with Quick CIP alkaline phosphatase (New England Biolabs, Ipswitch, MA). A fragment containing the CYC terminator, originating from plasmid pGD00933 was ligated into the backbone using NEB T4 Ligase (New England Biolabs, Ipswitch, MA) according to the manufacturer's protocol using 29 fmol of backbone and 87 fmol of insert. Correct assembly was checked by colony PCR followed by Sanger sequencing of the CYC terminator and the backbone regions up- and downstream of the restriction sites. The sequence-confirmed plasmid was named pDL00212.

For the prey intermediate plasmid, a gene fragment was ordered from Twist carrying an SP1 Illumina primer binding site, a 24 bp placeholder barcode sequence (AAGTTCGTTGCATCACCTAGCCAA; prey barcode), a 188 bp random spacer sequence, an SP2 Illumina primer binding site, the CYC promoter, a 3 x Flag-tag fused to a DHFR C-terminal half (from here on referred to as FR-tag), a Linker sequence (Linker L4, GASGSAAGGSGSAGSGASAS), the JUN bZIP wt sequence (consisting of the wt DNA-binding domain (DBD) and the five heptads of the wt zipper domain), and Linker L3 was cloned into the pUC19 backbone which had been digested with HindIII-HF (New England Biolabs, Ipswitch, MA) and EcoRI (New England Biolabs, Ipswitch, MA) via Gibson assembly using 62.2 fmol of backbone and 121.4 fmol of gene fragment in 20 μl reactions. The reactions were incubated for 15 min at 50 °C and 3 μl were transformed into Mix & Go! Competent Cells - DH5 Alpha (Zymo Research Corporation, Irvine, CA). Correct assembly was checked by colony PCR followed by Sanger sequencing of the gene fragment region and the Gibson junctions. The sequence-confirmed plasmid was named pDL00210.

The JUN variant library was constructed by overlap-extension PCR as described in ref. 33. In contrast to the cited library, the JUN sequences were truncated by three amino acids located C-terminally of the fifth heptad. These amino acids are not part of the heptad repeats and were thus not mutagenized in the original design. Here, they were completely removed from the construct. The resulting leucine zipper domain is thus 62 amino acids long, with 27 amino acids for the DNA binding domain and 35 amino acids for the zipper domain, of which only the first 32 where mutated for technical reasons. In the final library, JUN variants were cloned into the prey site of the bindingPCA plasmid (fused to the FR-tag) and the 54 wt bZIP sequences were cloned into the bait site (fused to the DH-tag).

Every bait-prey pair was identifiable by a double barcode where the bait barcode had a length of 20 bp and the prey barcode had a length of 24 bp. The two barcodes were separated by a 6 bp restriction site so that the full double barcode could be sequenced using the Illumina HiSeq single-end 50 bp sequencing run type. The barcodes were randomly generated using primers with N20- and N24-overhangs, respectively. Different strategies were used during cloning of the bait and prey intermediate libraries which will be described in detail below.

The intermediate plasmids for the 54 baits, the 18 negative controls and the overexpressed DH fragment for abundancePCA were first cloned individually in pGD00933 from human cDNA or ORFeome clones and verified by Sanger sequencing. Restriction sites in the CREM and DDIT3 sequences were removed by targeted mutagenesis to a synonymous codon.

To clone them into the bait intermediate plasmid pDL00212 and barcode them, four building blocks needed to be prepared: (1) the backbone by digesting pDL00212 with BstEII-HF (New England Biolabs, Ipswitch, MA) and BamHI-HF (New England Biolabs, Ipswitch, MA) and de-phosphorylating it with Quick CIP alkaline phosphatase (New England Biolabs, Ipswitch, MA); (2) the CYC terminator by digesting pDL00212 with HindIII-HF (New England Biolabs, Ipswitch, MA) and SpeI-HF (New England Biolabs, Ipswitch, MA); (3) the wt bZIP sequences by digesting them out of them original plasmids with BamHI-HF (New England Biolabs, Ipswitch, MA) and SpeI-HF (New England Biolabs, Ipswitch, MA;, and (4) the barcode fragment digested with BstEII-HF (New England Biolabs, Ipswitch, MA) and HindIII-HF (New England Biolabs, Ipswitch, MA). The barcode fragment was created by PCR. The oligonucleotide oDL00549 containing a 20 bp stretch of randomly generated nucleotides (N20) was made double stranded using oligonucleotides oDL00122 and oDL00551. A 100 μl setup using 40 pmol of oDL00549 and 200 pmol each of oDL00122 and oDL00551 using the Q5 Master Mix (New England Biolabs, Ipswitch, MA) was run with an initial denaturation at 98 °C for 30 s, 4 cycles of 10 s denaturation at 98 °C, 30 s annealing at 55 °C, and 15 sec elongation at 72 °C, followed by a final extension for 1.5 min at 72 °C. The product was treated with ExoSAP-IT (Thermo Fisher Scientific, Waltham, MA) to remove residual single-stranded oligonucleotides and afterwards purified with the QIAGEN Mini Elute Reaction Cleanup Kit (QIAGEN, Hilden, Germany). The purified product was then digested with BstEII-HF (New England Biolabs, Ipswitch, MA) and HindIII-HF (New England Biolabs, Ipswitch, MA) and afterwards gel-purified using the QIAquick Gel Extraction Kit (QIAGEN, Hilden, Germany) to be ready for cloning. The final plasmids were assembled using NEB T4 Ligase (New England Biolabs, Ipswitch, MA) according to the manufacturer's protocol with ca. 14.5 fmol of backbone and 29 fmol of each of the other ingredients. The resulting clones were confirmed via Sanger-sequencing. For each ligation, one clone with a distinctive barcode was chosen (Supplementary Data 1)

The JUN intermediate library was constructed in two steps. The variant library was first cloned into pDL00212, and barcodes were then inserted. To create the variant library, mutagenic overlap-extension PCR was performed as previously described33. In short, mutagenic NNS primers were designed for each codon and used to amplify a 5′and a 3′amplicon with 30 nt overlaps. The two overlapping PCR products for each codon were then pooled and amplified with universal primers binding at opposite ends of the annealed amplicons (oligonucleotides are listed in Supplementary Data 6). Instead of cloning the library into the backbone via Gibson assembly as previously described, the fragments as well as pDL00212 were digested with NheI-HF (New England Biolabs, Ipswitch, MA) and HindIII-HF (New England Biolabs, Ipswitch, MA). The backbone was de-phosphorylated with Quick CIP alkaline phosphatase (New England Biolabs, Ipswitch, MA). All components were ligated together using NEB T4 Ligase (New England Biolabs, Ipswitch, MA) according to the manufacturer's protocol. The reactions were performed in a 20 μl reaction using 21.9 fmol of backbone and 43.6 fmol of insert. The reaction was incubated overnight in a PCR cycler using the temperature-cycle ligation (TCL) protocol. This protocol consists of 4 cycles of 10 min at 22 °C and 10 min at 16 °C, followed by 49 cycles of 30 sec each at 4 °C, 27 °C, and 13 °C, then 10 cycles of 30 sec at 4 °C, 1 h at 16 °C, 10 min at 22 °C, and 10 min at 16 °C, another 49 cycles of 30 sec each at 4 °C, 27 °C, and 13 °C, followed finally by 30 sec at 4 °C, 1 hour at 16 °C, 1 h at 22 °C and 3 h at 18 °C. Afterwards, the reaction was dialyzed against MilliQ H2O (Merck Millipore, Darmstadt, Germany) to reduce the salt concentration using nitrocellulose membranes with 0.025 μm pores (Merck Millipore, Darmstadt, Germany) and then concentrated down to 5–10 μl in a speedvac. The concentrated ligated products were then transformed into NEB 10-beta Electrocompetent E.coli (New England Biolabs, Ipswitch, MA). Three transformations were performed to increase the yield. For each transformation, 1.5 μl of concentrated ligation reaction was mixed with 25 μl of competent cells and pipetted into a pre-chilled 0.1 cm Gene Pulser Cuvette (Bio-Rad, Hercules, California). Each mixture was electroporated using a Gene Pulser Xcell (BioRad, Hercules, California) using the exponential protocol with 2.0 kV, 200 Ω and 25 μF. After electroporation, the cells were immediately re-suspended in 488 μl of SOC medium that had been pre-warmed to 37 °C. The cuvette was rinsed with another 488 μl of that same medium. All three transformations were pooled and incubated for 30 min at 37 °C. Afterwards, 1 μl of recovered cells was diluted 1/1000, and 10 μl and 100 μl of this dilution were plated on LB + 2X Amp petri dishes and incubated for 16 h at 37 °C. These plates were used to quantify the number of transformants per transformation. The remaining cells were poured in 150 mL of LB + 4X Amp and incubated at 37 °C and 180 rpm shaking for 16 h. In the morning, the colonies were counted and a total of 23 million transformants were obtained. 700 ng of the plasmid library was isolated from the liquid culture using the NucleoBond® PC 500 kit (Macherey-Nagel, Düren, Germany).

To barcode the library, a barcode fragment was generated by PCR-amplifying a 326 bp amplicon from the pDL00210 template and using the Q5 High-Fidelity 2X Master Mix (New England Biolabs, Ipswitch, MA) and primers oDL00747 and oDL00748. Primer oDL00747 anneals just downstream of the barcode placeholder sequence and contains a 24 bp overhang of random bases (N24) in place of the barcode sequence. The PCR was run for 30 cycles using an annealing temperature of 52 °C for 30 s and an extension temperature of 72 °C for 1 min. The resulting amplicon was cut from a 1% agarose gel and purified using the QIAGEN PCR purification kit (QIAGEN, Hilden, Germany). Afterwards, the purified amplicon was digested with AvrII (New England Biolabs, Ipswitch, MA) and XhoI (New England Biolabs, Ipswitch, MA) overnight with an excess of restriction enzymes and gel-purified once more. The resulting fragment was then cloned into the library that had been digested with AvrII (New England Biolabs, Ipswitch, MA) and XhoI (New England Biolabs, Ipswitch, MA) and de-phosphorylated with Quick CIP alkaline phosphatase (New England Biolabs, Ipswitch, MA). The ligation was performed using NEB T4 ligase (New England Biolabs, Ipswitch, MA) according to the manufacturer's protocol with 23 fmol of backbone and 46 fmol of barcode insert. The reaction mix was incubated overnight using the TCL protocol as described above, followed by dialysis and concentration also as described above. The transformation into NEB 10-beta electrocompetent E.coli (New England Biolabs, Ipswitch, MA) was performed slightly differently. Since the number of barcodes per variant was aimed to be restricted to an average of 30, only 30,000 transformants were supposed to be harvested. A single transformation was performed and then plated for colony counting as described above. The remaining cells were used to inoculate 12 times 10 mL of LB + 4X Amp using 2 × 5 μl, 2 × 10 μl, 2 × 15 μl, 2 × 20 μl, 2 × 50 μl, and 2 × 100 μl of recovery culture. In the morning, the amount of transformants was quantified by counting colonies on the plates and the number of liquid cultures containing a total of ca. 30,000 transformants was pooled and purified by splitting the entire pellet over 10 Miniprep columns (QIAGEN, Hilden, Germany) and eluting each in 30 μl of EB Buffer. This resulted in 145 μg of plasmid library. Successful barcoding was assessed via Sanger sequencing of eight clones. All but one clone showed a correct barcode insertion.

To put the variant and barcode in close proximity and ensure a sequencing amplicon size suitable for Illumina sequencing, the intermediate library was digested with I-SceI (New England Biolabs, Ipswitch, MA) and NheI-HF (New England Biolabs, Ipswitch, MA) to remove parts of the spacer sequence, the SP2 Illumina primer binding site, the CYC promoter, and the 3x Flag tag – FR fragment – L4 fusion. Afterwards, the ends were blunted with NEB DNA Polymerase I, Large (Klenow) Fragment (New England Biolabs, Ipswitch, MA) according to the manufacturer's protocol and then purified using the QIAGEN Mini Elute Reaction Cleanup Kit (QIAGEN, Hilden, Germany) and eluted with 16 μl of buffer EB. The blunt-ended fragments were then re-circularized using NEB T4 ligase (New England Biolabs, Ipswitch, MA). Five reactions of 50 μl with 19 fmol of fragment each were set up and incubated overnight at 15 °C. In the morning, 30 μl per reaction were dialyzed as described above. Afterwards, the five reactions were pooled and concentrated on the speedvac to ca. 20 μl. A single transformation was performed into NEB electrocompetent bacteria (New England Biolabs, Ipswitch, MA) as described above. The non-plated cells were poured in 50 mL of LB + 4x Amp and incubated at 37 °C for 16 h. In the morning, a total of ca. 70 million transformants were counted. The cells were harvested, and 230 μg of DNA was extracted using the NucleoBond® PC 500 kit (Macherey-Nagel, Düren, Germany). Efficiency of the re-circularization was checked by Sanger sequencing of eight clones. All showed the expected results, with the barcode and the JUN variant sequence separated by 52 bp.

To prepare the library for sequencing, a two-step PCR was performed to add the Illumina adapter sequences to the amplicons. Since only the SP1 primer binding site was still present in the plasmid, the SP2 primer binding site had to be added via the first round of PCR. For this, 2 ng of the re-cloned library were used as template and a 367 bp barcode-variant fragment was amplified using the KAPA HiFi HotStart DNA Polymerase Kit (Roche Sequencing Solutions Inc, Pleasanton, CA) and primers oDL00345 and oDL00518. The PCR was run for 10 cycles, the annealing was performed at 64 °C for 15 s and the extension at 72 °C for 15 s. Afterwards, the reaction was incubated with 20 μl of ExoSAP-IT (Thermo Fisher Scientific, Waltham, MA) at 37 °C for 15 min to remove residual oligonucleotides, followed by heat inactivation at 80 °C for 15 min. The reaction was then purified using the QIAGEN Mini Elute Reaction Cleanup Kit (QIAGEN, Hilden, Germany) and eluted with 12 μl of buffer EB. For the second round of PCR, the entire eluate was used as input for a 300 μl PCR reaction mastermix with the KAPA HiFi HotStart DNA Polymerase kit (Roche Sequencing Solutions Inc, Pleasanton, CA) using primers oDL00345 and oDL00346. The mastermix was split in 6 × 50 μl and the PCR was run for another 10 cycles with an annealing temperature of 72 °C. After PCR, the six reactions were pooled. After verifying the specificity of the amplification on gel, the PCR was gel-purified with the QIAquick Gel Extraction Kit (QIAGEN, Hilden, Germany). The eluate was once more quality-checked with a Bioanalyzer (Agilent Technologies, Santa Clara, CA) which showed chromatograms of single peaks at the correct size. For further quality control, a qPCR was performed using the KAPA Library Quantification Standards with Primer (Roche Sequencing Solutions Inc, Pleasanton, CA). The library was then sequenced on an Illumina NextSeq 500 Sequencing System (Illumina, San Diego, CA) with the Mid-output 300 cycle paired-end sequencing run type. To deal with the low complexity of the library, 30% of PhiX phage library were spiked in.

Adapters were first removed using cutadapt 1.1862. The 5' adapter was removed from the forward read upstream and including the AvrII site, while the 5' adapter was removed from the reverse read downstream and including the HindIII and BamHI sites present in tandem. This was done using the default error rate of 0.1. The last base, which seems to be systematically added by the sequencer, was removed from both forward and reverse read. The minimum overlap was set to 6 and max number of N per read set to 0. Forward and reverse reads were then merged using PEAR 0.9.1163 with a minimum overlap size of 12, min and max read length of 249 and 259, respecitvely (expected length is 254) and a p-value of 0.05. Merged reads where then pre-processed in R using the shortread package. First, the read was split between barcode and variant sequences. For the barcode, the first 24 bp of the read were used since the sequence upstream of the expected barcode positoon was discarded by cutadapt. Even if there is an indel, these are the 24 bp that will correspond to the barcode during the deepPCA. For the variant, a perfect and unique match of the 10 bp directly upstream of the first JUN codon was required and the sequence downstream is used as the JUN variant sequence since all sequences downstream of the stop codon were discarded by cutadapt.

Read counts were then aggregated to unique barcode-variant sequence pairs. To identify true unique barcode-variant pairs present in the sample from sequencing errors or PCR template switching, which should be at lower frequency, we also counted the total counts for individual barcode and variant sequences. Each unique barcode-variant pair was then also expressed as the percentage of the corresponding total barcode and variant counts. Barcode-variant pairs with less than 10 reads were filtered out since they are far from the 1000 read average expected for the 30,000 transformant obtained and thus likely to be sequencing errors. This reduced the number of unique barcode sequences from ~500k to 30,416, indeed close from our expectation. Next, read counts that likely correspond to sequencing errors of the same barcode were collapsed. Thanks to the barcode length, it is highly unlikely the barcodes close in sequence were cloned to the same variant. For each unique variant, read counts from all barcodes with a Hamming distance of 4 bp or less were collapsed to the barcode with the highest frequency. This was then repeated to the second highest frequency barcode, and iteratively until all remaining barcode sequences were processed and all true barcodes associated to this variant were identified. This step further reduced the number of unique barcode sequences to 28,969. Plotting read counts of barcode-variant pairs as a function of both total barcode and variant counts reveals a cluster of pairs where both the barcode and the variant have a high total read count but the pair has a low read count. These are likely chimeras that result from template switching during PCR, and were removed by filtering out pairs with less than 100 counts for which the total barcode and variant counts were higher than 300 and 3000, respectively. Finally, for each unique barcode, variant read counts that correspond to sequencing errors were collapsed to the highest frequency variant. A lower frequency variant was considered a sequencing error if it had a Hamming distance of not more than 1 compared to the highest frequency variant, if this mutation was in a different codon and if its total count is 100x lower than the total count of the highest frequency one. If the highest frequency variant was the wild-type JUN sequence, then the variants from which to collapse the count also could not be a variant expected from the mutagenesis design. After this process, unique barcodes for which all associated variants could not be collapsed to the highest frequency one were flagged as promiscuous. 167 barcodes were also flagged as having a Hamming distance <= 4 to at least one other barcode and for which sequencing error during deepPCA could falsely attribute the read between two closely related barcodes. Ultimately, we obtained a total of 28,336 high confidence unique barcodes out of which 23,385 were associated to expected JUN variant sequences for an average of 38 barcodes per amino acid variant.

To prepare the backbone, all intermediate plasmids (54 wild-type bZIPs, 18 negative controls and the DH overexpression for abundancePCA) were digested with AvrII (New England Biolabs, Ipswitch, MA) and HindIII-HF (New England Biolabs, Ipswitch, MA), dephosphorylated with Quick CIP alkaline phosphatase (New England Biolabs, Ipswitch, MA), gel-purified with the QIAquick Gel Extraction Kit (QIAGEN, Hilden, Germany) and pooled at equimolar ratios. To prepare the inserts consisting of the Jun variants fused to the FR fragment, the upstream CYC promoter and the variant barcode, the prey intermediate library was digested with AvrII (New England Biolabs, Ipswitch, MA) and HindIII-HF (New England Biolabs, Ipswitch, MA) and gel-purified with the QIAquick Gel Extraction Kit (QIAGEN, Hilden, Germany).

A ligation reaction containing 200 fmol of backbone pool, 600 fmol of insert pool, 10 μl of NEB T4 ligation buffer (New England Biolabs, Ipswitch, MA), 5 μl of NEB T4 ligase (New England Biolabs, Ipswitch, MA) filled up to a volume of 100 μl with MilliQ H2O (Merck Millipore, Darmstadt, Germany) was set up. The reaction was split in 2×50 μl and incubated in a PCR cycler with the TCL program as described above. Afterwards, the ligase was heat-inactivated at 65 °C for 15 min, the 2 × 50 μl were pooled and then split in 3 × 33 μl which were each dialyzed against 50 mL of MilliQ H2O (Merck Millipore, Darmstadt, Germany) as described above. The dialyzed samples were pooled, which resulted in a total volume of ca. 80 μl. This was concentrated to 25 μl with a speedvac. Two transformations into NEB 10 beta electrocompetent cells (New England Biolabs, Ipswitch, MA) were performed as described above. After recovery, the culture was spread on 35 individual 15 cm diameter LB + 2x Amp plates. The cells were grown to a lawn at 37 °C for approximately 16 h and then scraped off the plates with sterile H2O and a plastic spatula. The pellet was washed once with sterile H2O and then each library was purified over two columns of the NucleoBond® PC 500 kit (Macherey-Nagel, Düren, Germany). A total of 17 million transformants were harvested. The plasmid extractions yielded ca. 1245 μg of DNA. The quality of the libraries was assessed via test digests, which showed the expected band patterns.

The transformation was performed in six replicates following a slightly modified protocol from the one described in ref. 33. Yeast cells of the BY4742 strain were streaked from a glycerol stock on YPAD plates and incubated at 30 °C for two days prior to the experiment. For each replicate, a single colony was used to inoculate 15 mL of liquid YPAD and grown to saturation overnight at 30 °C, 200 rpm shaking. In the morning, OD600 was measured and for each replicate, a culture of 350 mL pre-warmed YPAD was inoculated at an OD600 of 0.3 and grown at 30 °C and 200 rpm shaking for about 4.5 h until an OD600 of 1.2 to 1.6 was reached. The cells were harvested for 10 min at 3000 g. Afterwards, the cells were washed with 50 mL H2O and centrifuged for 5 min at 3000 g, the supernatant was discarded, and a second washing step with 50 mL SORB was performed in the same way. Finally, the pellets were re-suspended in 14 mL SORB and incubated for 30 min at RT on a wheel. Next, 350 μl of 10 mg/mL pre-boiled salmon sperm DNA (Agilent Technologies, Santa Clara, CA) were added and mixed thoroughly with the cells before adding 7 μg of the final library to each replicate and mixing by inversion. Each replicate was then split in two times 7 ml and distributed over two 50 mL Falcon tubes. Afterwards, 35 mL of Plate Mixture were added to each tube and the tubes were then incubated for another 30 min at RT on a wheel. Then, 3.5 mL of DMSO (AppliChem, Darmstadt, Germany) were added to each tube, they were mixed well by inversion, and the ells were heat-shocked for 20 min in a 42 °C water bath. To homogenize the heat within the tubes, they were inverted 6 times after 1 min, 2 min 30 s, 5 min, 7 min, 10 min, and 15 min. Afterwards, the tubes were centrifuged for 5 min at 1771 g. The supernatant was removed by pouring first, followed by quick spin and aspiration of the leftover supernatant with a vacuum pump to get rid of any remnants. The pellets were then each re-suspended in 50 mL of pre-warmed recovery medium and incubated at 30 °C for 1 h without agitation. They were harvested by centrifuging for 5 min at 3000 g. The two pellets per sample were then pooled again by resuspending in 50 mL SC-ura and harvested once more for 5 min at 3000 g. After this, the supernatant was discarded, and each pellet was resuspended in 700 mL of SC-ura in 5 L Erlenmeyer flasks. From this culture, 10 μl and 50 μl of each sample were plated on SC-ura plates to assess the transformation efficiency. The plates were incubated at 30 °C for 48 h. The cultures were incubated at 30 °C and 200 rpm shaking for 48 h until they reached saturation. The numbers of transformants varied between 7 and 18 million, ensuring at least 10-fold library coverage in each replicate.

After the transformant selection culture had reached saturation after about 48 h of growth, a second selection cycle was performed by inoculating a culture of 1 L of SC-ura/ade/met at and OD600 of 0.1 and letting it grow until an OD600 of 1.2–1.4. From this culture, a competition culture of 1 L competition medium (SC-ura/ade/met + 200 μg/mL MTX in 2% DMSO) was inoculated at an OD600 of 0.05 and grown for 5 generations until an OD600 of 1.6. The INPUT samples were harvested from the remaining cells of the second selection culture by centrifugation for 5 min at 3000 g and two washes with 50 mL H2O and then frozen at −20 °C. The OUTPUT samples were harvested from the competition cultures in the same way.

DNA was extracted from the yeast pellets using a yeast Midiprep protocol to enrich plasmid over genomic DNA. The cells were first spheroblasted by thawing at RT, re-suspending in 20 mL DTT buffer, and incubating at 30 °C and 200 rpm shaking for 15 min. The cells were then harvested at 2500 g for 5 min, re-suspended in 20 mL Zymolyase buffer and incubated at 30 °C and 200 rpm shaking for 1.5 h. The spheroblasted cells were collected in 50 mL Falcon tubes and spun for 5 min at 2500 g and re-suspended in 7.5 mL of home-made buffer P1 (according to the manufacturers recipe, QIAGEN, Düren, Germany). 7.5 mL of home-made buffer P2 (according to the manufacturers recipe, QIAGEN, Düren, Germany) were added, the samples were mixed by inverting the tubes several times and incubated at RT for 10 min. 7.5 mL of pre-cooled buffer P3 (QIAGEN, Düren, Germany) were added and the samples were again homogenized by inverting the tubes several times. The samples were centrifuged for 15 min at 3000 g and 4 °C and the supernatant was then recovered in 50 mL centrifugation tubes (Nalgene, Rochester, NY, USA) and re-centrifuged for 15 min at 15,000 g and 4 °C. The supernatant was then filtered to remove remaining cell debris. The cleared supernatant was applied to columns provided in the Plasmid Midi Kit (QIAGEN, Düren, Germany) that had been equilibrated with 10 mL buffer QBT. Once all of the supernatant had been applied, the columns were washed twice with 10 mL buffer QC and finally, the DNA was eluted with 5 mL buffer QF and collected in 15 mL Falcon tubes. 3.5 mL of Isopropanol added to each sample and mixed. The samples were then distributed evenly over 2 × 5 mL reaction tubes (Eppendorf, Hamburg, Germany) and spun at 14,200 g for 15 min. The supernatant was removed and the two pellets per sample were pooled by re-suspending each in 500 mL freshly prepared 70% ethanol and pipetting them together in a normal 1.5 mL reaction tube. The pellets were spun down again at 15,870 g for 1 min and then left to dry at RT. Finally, the pellets were re-suspended in 100 μl of buffer EB.

To determine the molar concentration of the plasmid in each sample, as well as the enrichment over genomic DNA, a qPCR was performed using primers OGD241 and OGD242 that bind in the plasmid backbone. For this, the extracts were diluted 1/200 and a standard curve was made using clean plasmid originating from a Miniprep that had been made of one clone of the library and of which the concentration had been determined using the Qubit system (Invitrogen, Waltham, MA, USA). The concentrations used for the standard curve were 0.4 ng/μl, 0.08 ng/μl, 0.016 ng/μl, 0.0032 ng/μl, 0.00064 ng/μl, 0.000128 ng/μl, and 0.0000256 ng/μl. The qPCR was performed using the 2X SsoAdvanced Univeral SYBR Green Supermix (Bio-Rad Laboratories, Hercules, CA, USA) according to the manufacturer's protocol.

Both Illumina primer binding sites (SP1 and SP2) are already present in the plasmid, so only a single round of PCR had to be performed to prepare the final sequencing libraries. 400 million molecules of plasmid for each sample were used as template to ensure at least 10-fold coverage of input molecules over expected read counts. To enable multiplexed sequencing, primers of the NEBNext dual primer set 1 (New England Biolabs, Ipswitch, MA) were used. The INPUT samples used the NEBNext i501 primer as forward primer and the NEBNext i701 to i706 primers as reverse primers. The OUTPUT samples used the NEBNext i502 primer as forward primer and also the NEBNext i701 to i706 primers as reverse primer. The libraries were amplified with Q5 polymerase (New England Biolabs, Ipswitch, MA) in a 50 μl reaction using 400 million template molecules, 0.25 μM of each primer, 200 μM dNTPs (New England Biolabs, Ipswitch, MA), 1X concentrated Q5 reaction buffer (New England Biolabs, Ipswitch, MA) and 0.5 μl polymerase. The PCR was run for 18 cycles with 63 °C annealing for 30 s and 72 °C extension for 30 s. Successful PCR was confirmed by checking 5 μl of each sample on a 1% Agarose gel. The rest of each reaction was then loaded on a separate 1% Agarose gel, and the 375 bp amplicons were cut out and purified using the MinElute Gel Extraction kit (QIAGEN, Düren, Germany) and eluted in 20 μl buffer EB. The concentrations were measured using the Qubit system (Invitrogen, Waltham, MA, USA). The concentration of individual samples was confirmed using the KAPA Library Quantification Standards with Primer (Roche Sequencing Solutions Inc, Pleasanton, CA). All 6 INPUT and 6 OUTPUT samples per screen were pooled at equimolar ratios. The concentration of the final pool was confirmed once more first with Qubit, then with qPCR. A final quality control was performed by loading the library on a Bioanalyzer (Agilent Technologies, Santa Clara, CA). Finally, the library was submitted for sequencing on a single lane of the Illumina HiSeq 2500 System (Illumina, San Diego, CA) with the 50 bp single-end sequencing run type. To increase sequencing quality and clustering efficiency, 25 % of PhiX phage library were spiked into the library.

Sequencing data for binding to the 54 wild-type bZIP partners and the 18 negative controls and abundance measurements were processed and computed together in the exact same way.

Raw sequencing reads were filtered for an average Q score >= 20, split into the wild-type barcode (first 20 bp) and the variant barcode (last 24 bp), and collapsed to obtain read counts for each barcode pair. Wildtype barcode reads with a Hamming distance <= 2 to the expected true barcode sequence were kept. Jun variant barcode reads with a Hamming distance <2 to the expected true barcodes and a difference in Hamming > 2 between the two best matches were kept. Read counts of barcode pairs matching the same combination of JUN amino acid substitution and wild-type partner were collapsed. Output samples were sequenced twice to increase coverage, and the read counts for each sample were added. Read count tables were then processed with DiMSum v1.336 (https://github.com/lehner-lab/DiMSum) using STEAM stages 4–5 (“countPath” option) and setting the JUN wild-type x FOS pair as an arbitrary reference to obtain binding scores and associated errors.

For DMS experiments in general, the enrichment score (ES) for each variant i in replicate r is typically defined as the ratio between its frequency before and after selection:

Where Ninput and Noutput are the corresponding numbers of sequenced reads in the Input and Output sample respectively. DiMSum calculates fitness scores for each variant i in replicate r as the natural logarithm of the enrichment score normalized to the wild-type or reference variant (wt):

Measurement errors of fitness scores are estimated by fitting a modified Poissonian (count-based) model to the data—with replicate-specific multiplicative and additive terms that are common to all variants—that has been shown to outperform previous methods in benchmarking analyses36. Final DiMSum fitness and error estimates for all variants are obtained by merging between replicates using weighted averaging. These fitness scores correspond to binding and abundance scores reported in Supplementary Data 3.

The true positive set was determined by taking all bZIP partners for JUN reported in Biogrid 4.4.23437. The true negative set consisted in the 18 negative controls and the overexpressed free DH fragment used for abundancePCA that were present in the library. At different binding score thresholds, the true positive rate was calculated as the fraction of true positive partners with a binding score above the threshold. The true negative rate was calculated as the fraction of partners from the true negative set above the binding score threshold.

Variant - partner pairs tested with splitFAST38 were expressed from separate plasmids. One plasmid expresses from a CMV promoter a fusion consisting in BFP followed by P2A, the NFAST fragment, a 20 amino acid linker and the JUN variant. The other plasmid expresses from a CMV promoter a fusion consisting in RFP followed by P2A, the CFAST fragment, an 11 amino acid linker and the wild-type partner. The two plasmid backbones were purchased from Addgene, plasmid #130812 (NFAST) and #13814 (CFAST) (Addgene, Watertown, Massachusetts), and JUN variants and wild-type partners were cloned by restriction digestion and ligation following standard protocols.

Human colon cancer cells HCT-116 (ATCC ccl-247) were cultured in Dulbecco’s Modified Eagle’s medium (DMEM + GlutaMAX, gibco, Thermo Fisher Scientific, Waltham, MA) supplemented with phenol red, penicillin (0.04 mg/mL), streptomycin (0.06 mg/mL), and 10 % (vol/vol) fetal calf serum (FBS) at 37 °C at 5% CO2. Cells were transiently transfected in triplicate with pairs of NFAST and CFAST plasmids using the Lipofectamine 3000 Transfection Kit (Invitrogen, Waltham, MA, USA) according to the manufacturer’s protocol and 48 h prior to cytometry in 10 cm dishes.

Prior to cytometry, cells were washed with phosphate-buffered saline (PBS), trypsinized (0.05% Trypsin-EDTA, gibco, Thermo Fisher Scientific, Waltham, MA) and resuspended in DMEM without phenol red, supplemented with 5 μM hydroxybenzylidene rhodamine (HBR) analog (TFLime, The Twinkle Factory, Paris, France) to induce fluorescence when in complex with the reconstituted FAST protein in case of protein interaction between the two proteins of interest. Cytometry data was acquired on a BD LSRII SORP Analyser (Becton Dickinson, Franklin Lakes, NJ, USA) cytometer. Green fluorescence (HBR) was excited at 488 nm wavelentgh, BFP at 405 nm, and RFP at 561 nm. Gates were set as FSC.A > 42,000 and <260,000, SSC.A > 25,000 and <25,0000, SSC.H > 50,000 and <200,000, SSC.W > 20,000 and <85,000. Cells with log10(GFP) > log10(BFP)*1.6-1.4 and BFP < 600 represent non-transfected cells and were also filtered out. Gating plots for all samples are provided in Supplementary Data 5. RFP was highly correlated to BFP but more noisy, so all analyses were based on the BFP signal as a measure of protein abundance. The GFP/BFP ratio was averaged over all cells for each replicate, and the mean and standard error was calculated across the three replicates (Supplementary Data 4). Relative binding scores were then calculated by subtracting the corresponding JUN wild-type – partner binding score from the JUN variant - partner binding score.

We used MoCHI (https://github.com/lehner-lab/MoCHI)34,35 to fit a global mechanistic model to the bPCA data where JUN variants binding to all partner bZIPs are modeled as an equilibrium between two states: unfolded and unbound or folded and bound.

We assume that free energy changes of binding are additive and shared between all partner bZIPs, i.e., the total binding free energy changes of an arbitrary variant to an arbitrary bZIP relative to the wild-type JUN sequence is simply the sum over residue-specific energies corresponding to all constituent single amino acid substitutions.

In order to model bZIP-specific modifications in binding affinity we dummy encoded the presence/absence of each partner bZIP in a 51-mer amino acid sequence appended to the JUN variant amino acid sequence where a A > C substitution denotes the presence of the corresponding partner bZIP. The associated inferred free energy changes for these dummy substitutions correspond to additive bZIP-specific binding affinity effects.

However, for some partners this thermodynamic model did not provide a good fit to the bPCA data and suggested a partner-dependent non-linear trend corresponding to a multiplicative energy term (effectively controlling the steepness of the sigmoid gradient). We therefore modified the model to infer one such term for each bZIP partner in addition to the additive effects described above. This modification indeed reduced the bias observed in the residuals, although without significantly improving the overall goodness of the fit (R2 = 0.91 in both cases). Interestingly, these multiplicative terms are equivalent to Hill’s coefficient, which is determined by the cooperativity of the interaction between multiple subunits, although the underlying mechanisms in the context of this assay remains speculative pending further investigation.

We configured MoCHI parameters to specify a neural network architecture consisting of two additive trait (free energy) layers i.e., one corresponding to the additive JUN and dummy-encoded bZIP partner binding free energy changes (“Binding”) and another for the bZIP-specific multiplicative effects (“BindingMod”). A single linear transformation layer outputs the predicted bPCA binding score. The specified non-linear transformation “TwoStateFractionFoldedMod” derived from the Boltzmann distribution function relates binding energies to proportions of folded and bound molecules.

A random 30% of variant-partner combinations was held out during model training, with 20% representing the validation data and 10% representing the test data. Validation data was used to evaluate training progress and optimize hyperparameters (batch size). Optimal hyperparameters were defined as those resulting in the smallest validation loss after 100 training epochs. Test data was used to assess final model performance.

Models were trained with default settings, i.e., for a maximum of 1000 epochs using the Adam optimization algorithm with an initial learning rate of 0.05. MoCHI reduces the learning rate exponentially (γ = 0.98) if the validation loss has not improved in the most recent ten epochs compared to the preceding ten epochs. In addition, MoCHI stops model training early if the wild-type free energy terms over the most recent ten epochs have stabilized (standard deviation 10−3).

In vitro FRET measurements were obtained from Reinke et al.29. KD measured at 37 °C were converted to ΔG, normalized to the JUN-FOS ΔG to obtain ΔΔG and the Pearson correlation with the partner’s ΔΔG fitted by MoCHI was calculated.

To determine which amino acid feature relates to global energy, we fitted a linear model between a database of >500 amino acid features45 and the ΔΔG fitted by MoCHI. We retained only the 536 features for which values for all 20 amino acids were available. We performed a forward feature search by an iterative process where each feature was used one-by-one together with the heptad number and the position within the heptad of each mutant. The best predictive feature was retained, and the search was repeated including the previously identified predictive feature. We next determined the best feature at each heptad position. The data was split by heptad position, and at each position, each feature was tested alone and the best one was retained.

Despite the multiplicative term, the fit for ATF4 was biased and the fitted sigmoid does not fit the apparent trend. To not bias the analysis of specific effects, all measurements of binding to ATF4 were therefore filtered out. Additionally, all measurements with 10 or less counts in any input replicate or 0 count in any output replicate were filtered out because of the high error inherent to low read counts. Mutational effect was then calculated by subtracting the binding score of interaction between wild-type JUN and each corresponding partner. The measurement error of the mutational effect was calculated by propagating the error of the mutant and the corresponding wild-type pair, and a p-value was calculated by performing a one-sample t-test. P-values were adjusted by the FDR method.

Mutational effects due to global energy effects were calculated by subtracting the binding score predicted by MoCHI from the binding score of the corresponding wild-type JUN – partner combination. Corresponding errors were calculated by propagating the measurement and prediction errors, p-values were calculated by a one-sample t-test and corrected by the FDR method. Specific effects were calculated by subtracting the binding score predicted by MoCHI from the one measured, and error, p-values and FDR calculated as above.

Predictions of protein complexes between JUN and a few partners presented in Fig. 7 were generated using Alphafold2-multimer (v2.3.2) using the sequences in Supplementary Data 1, with multimer models v3 and subsequent model relaxation64,65. The pipeline was run through GUIFold66 (v0.4). For the generation of multiple sequence alignments, the “full_dbs” protocol was employed. In case of each prediction target, the best model was selected by the smallest inter-subunit average predicted aligned error (PAE). The range of inter-subunit average PAE for all predictions was between 5.5 and 10.0 Å.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

The deep sequencing data generated in this study has been deposited in the GEO database under the accession number GSE245326. The FACS data generated in this study are deposited on Zenodo at https://doi.org/10.5281/zenodo.13285315. The AlphaFold models generated in this study have been deposited at the model archive (modelarchive.org) under accession numbers ma-umsl9 (JUN/FOS), ma-pvzu3 (JUN/JUN), ma-dv38s (JUN/BATF2), ma-stsaw (JUN/MAF) and ma-fruyw (JUN/DDIT3).

All code used to analyze the data generated in this study is available on GitHub (https://github.com/gdiss/JUN_specificity)67.

Crick, F. H. C. The packing of α-helices: simple coiled-coils. Acta Crystallogr. 6, 689–697 (1953).

Article CAS Google Scholar

O’Neil, K. T. & DeGrado, W. F. A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 250, 646–651 (1990).

Article ADS PubMed Google Scholar

O’Shea, E. K., Rutkowski, R. & Kim, P. S. Evidence that the leucine zipper is a coiled coil. Science 243, 538–542 (1989).

Article ADS PubMed Google Scholar

Landschulz, W. H., Johnson, P. F. & McKnight, S. L. The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science 240, 1759–1764 (1988).

Article ADS CAS PubMed Google Scholar

Thompson, K. S., Freire, E. & Vinson, C. R. Thermodynamic characterization of the structural stability of the coiled-coil region of the bZIP transcription factor GCN4. Biochemistry 32, 5491–5496 (1993).

Article CAS PubMed Google Scholar

Vinson, C. R., Hai, T. & Boyd, S. M. Dimerization specificity of the leucine zipper-containing bZIP motif on DNA binding: prediction and rational design. Genes Dev. 7, 1047–1058 (1993).

Article CAS PubMed Google Scholar

Newman, J. R. S. & Keating, A. E. Comprehensive identification of human bZIP interactions with coiled-coil arrays. Science 300, 2097–2101 (2003).

Article ADS CAS PubMed Google Scholar

Hai, T. & Curran, T. Cross-family dimerization of transcription factors Fos/Jun and ATF/CREB alters DNA binding specificity. PNAS 88, 3720–3724 (1991).

Article ADS CAS PubMed PubMed Central Google Scholar

O’Shea, E. K., Klemm, J. D., Kim, P. S. & Alber, T. X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science 254, 539–544 (1991).

Article ADS PubMed Google Scholar

O’Shea, E. K., Rutkowski, R. & Kim, P. S. Mechanism of specificity in the Fos-Jun oncoprotein heterodimer. Cell 68, 699–708 (1992).

Article PubMed Google Scholar

O’Shea, E. K., Lumb, K. J. & Kim, P. S. Peptide ‘Velcro’: design of a heterodimeric coiled coil. Curr. Biol. 3, 658–667 (1993).

Article PubMed Google Scholar

Moitra, J., Szilák, L., Krylov, D. & Vinson, C. Leucine is the most stabilizing aliphatic amino acid in the d position of a dimeric leucine zipper coiled coil. Biochemistry 36, 12567–12573 (1997).

Article CAS PubMed Google Scholar

Vinson, C. et al. Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol. 22, 6321–6335 (2002).

Article CAS PubMed PubMed Central Google Scholar

Oakley, M. G. & Kim, P. S. A buried polar interaction can direct the relative orientation of helices in a coiled coil. Biochemistry 37, 12603–12610 (1998).

Article CAS PubMed Google Scholar

Acharya, A., Ruvinov, S. B., Gal, J., Moll, J. R. & Vinson, C. A heterodimerizing leucine zipper coiled coil system for examining the specificity of a position interactions: amino acids I, V, L, N, A, and K. Biochemistry 41, 14122–14131 (2002).

Article CAS PubMed Google Scholar

Horovitz, A., Serrano, L., Avron, B., Bycroft, M. & Fersht, A. R. Strength and co-operativity of contributions of surface salt bridges to protein stability. J. Mol. Biol. 216, 1031–1044 (1990).

Article CAS PubMed Google Scholar

Serrano, L., Horovitz, A., Avron, B., Bycroft, M. & Fersht, A. R. Estimating the contribution of engineered surface electrostatic interactions to protein stability by using double-mutant cycles. Biochemistry 29, 9343–9352 (1990).

Article CAS PubMed Google Scholar

Arndt, K. M. et al. A heterodimeric coiled-coil peptide pair selected in vivo from a designed library-versus-library ensemble. J. Mol. Biol. 295, 627–639 (2000).

Article CAS PubMed Google Scholar

Arndt, K. M., Pelletier, J. N., Müller, K. M., Plückthun, A. & Alber, T. Comparison of in vivo selection and rational design of heterodimeric coiled coils. Structure 10, 1235–1248 (2002).

Article CAS PubMed Google Scholar

Kaplan, J. B., Reinke, A. W. & Keating, A. E. Increasing the affinity of selective bZIPbinding peptides through surface residue redesign. Protein Sci. 23, 940–953 (2014).

Article CAS PubMed PubMed Central Google Scholar

Dahiyat, B. I., Gordon, D. B. & Mayo, S. L. Automated design of the surface positions of protein helices. Protein Sci. 6, 1333–1337 (1997).

Article CAS PubMed PubMed Central Google Scholar

Drobnak, I., Gradišar, H., Ljubetič, A., Merljak, E. & Jerala, R. Modulation of coiled-coil dimer stability through surface residues while preserving pairing specificity. J. Am. Chem. Soc. 139, 8229–8236 (2017).

Article CAS PubMed Google Scholar

Mason, J. M., Schmitz, M. A., Müller, K. M. & Arndt, K. M. Semirational design of Jun-Fos coiled coils with increased affinity: Universal implications for leucine zipper prediction and design. PNAS 103, 8989–8994 (2006).

Article ADS CAS PubMed PubMed Central Google Scholar

Mason, J. M., Müller, K. M. & Arndt, K. M. Positive aspects of negative design: simultaneous selection of specificity and interaction stability †. Biochemistry 46, 4804–4814 (2007).

Article CAS PubMed Google Scholar

Grigoryan, G. & Keating, A. E. Structure-based prediction of bZIP partnering specificity. J. Mol. Biol. 355, 1125–1142 (2006).

Article CAS PubMed Google Scholar

Potapov, V., Kaplan, J. B. & Keating, A. E. Data-driven prediction and design of bZIP coiled-coil interactions. PLoS Comput. Biol. 11, 1–28 (2015).

Article Google Scholar

Fong, J. H., Keating, A. E. & Singh, M. Predicting specificity in bZIP coiled-coil protein interactions. Genome Biol. 5, R11 (2004).

Article PubMed PubMed Central Google Scholar

Grigoryan, G., Reinke, A. W. & Keating, A. E. Design of protein-interaction specificity affords selective bZIP- binding peptides. Nature 458, 859–864 (2009).

Article ADS CAS PubMed PubMed Central Google Scholar

Reinke, A. W., Baek, J., Ashenberg, O. & Keating Networks of bZIP protein-protein interactions diversified over a billion years of evolution. Science 340, 730–735 (2013).

Article ADS CAS PubMed PubMed Central Google Scholar

Pelletier, J. N., Campbell-Valois, F.-X. & Michnick, S. W. Oligomerization domain-directed reassembly of active dihydrofolate reductase from rationally designed fragments. PNAS 95, 12141–12146 (1998).

Article ADS CAS PubMed PubMed Central Google Scholar

Freschi, L., Torres-Quiroz, F., Dubé, A. K. & Landry, C. R. qPCA: a scalable assay to measure the perturbation of protein-protein interactions in living cells. Mol. Biosyst. 9, 36–43 (2013).

Article CAS PubMed Google Scholar

Levy, E. D., Kowarzyk, J. & Michnick, S. W. High-resolution mapping of protein concentration reveals principles of proteome architecture and adaptation. Cell Rep. 7, 1333–1340 (2014).

Article CAS PubMed Google Scholar

Diss, G. & Lehner, B. The genetic landscape of a physical interaction. eLife 7, 1–31 (2018).

Article Google Scholar

Faure, A. J. et al. Mapping the energetic and allosteric landscapes of protein binding domains. Nature 604, 175–183 (2022).

Article ADS CAS PubMed Google Scholar

Weng, C., Faure, A. J. & Lehner, B. The energetic and allosteric landscape for KRAS inhibition. bioRxiv 2022.12.06.519122 https://doi.org/10.1101/2022.12.06.519122 (2022).

Faure, A. J., Schmiedel, J. M., Baeza-Centurion, P. & Lehner, B. DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol. 21, 1–23 (2020).

Article Google Scholar

Oughtred, R. et al. The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 30, 187–200 (2021).

Article CAS PubMed Google Scholar

Tebo, A. G. & Gautier, A. A split fluorescent reporter with rapid and reversible complementation. Nat. Commun. 10, 2822 (2019).

Article ADS PubMed PubMed Central Google Scholar

Aronheim, A., Zandi, E., Hennemann, H., Elledge, S. J. & Karin, M. Isolation of an AP-1 repressor by a novel method for detecting protein-protein interactions. Mol. Cell. Biol. 17, 3094–3102 (1997).

Article CAS PubMed PubMed Central Google Scholar

Chen, S. et al. The emerging role of XBP1 in cancer. Biomed. Pharmacother. Biomed. Pharmacother. 127, 110069 (2020).

Article CAS PubMed Google Scholar

Patel, L. R., Curran, T. & Kerppola, T. K. Energy transfer analysis of Fos-Jun dimerization and DNA binding. Proc. Natl. Acad. Sci. USA 91, 7360–7364 (1994).

Article ADS CAS PubMed PubMed Central Google Scholar

Bentley, E. P., Scholl, D., Wright, P. E. & Deniz, A. A. Coupling of binding and differential subdomain folding of the intrinsically disordered transcription factor CREB. FEBS Lett. 597, 917–932 (2023).

Article CAS PubMed Google Scholar

Deng, T. & Karin, M. JunB differs from c-Jun in its DNA-binding and dimerization domains, and represses c-Jun by formation of inactive heterodimers. Genes Dev. 7, 479–490 (1993).

Article CAS PubMed Google Scholar

Halazonetis, T. D., Georgopoulos, K., Greenberg, M. E. & Leder, P. c-Jun dimerizes with itself and with c-Fos, forming complexes of different DNA binding affinities. Cell 55, 917–924 (1988).

Article CAS PubMed Google Scholar

Kawashima, S. et al. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res. 36, D202–D205 (2008).

Article CAS PubMed Google Scholar

Qian, N. & Sejnowski, T. J. Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol. 202, 865–884 (1988).

Article CAS PubMed Google Scholar

Roseman, M. A. Hydrophobicity of the peptide C=O…H-N hydrogen-bonded group. J. Mol. Biol. 201, 621–623 (1988).

Article CAS PubMed Google Scholar

Hutchens, J. O. in Handbook of Biochemistry B60–B61 (Chemical Rubber Co., 1970).

DeGrado, W. F. & Lear, J. D. Induction of peptide conformation at apolar water interfaces. 1. A study with model peptides of defined hydrophobic periodicity. J. Am. Chem. Soc. 107, 7684–7689 (1985).

Article CAS Google Scholar

Bastolla, U., Porto, M., Roman, H. E. & Vendruscolo, M. Principal eigenvector of contact matrices and hydrophobicity profiles in proteins. Proteins 58, 22–30 (2005).

Article CAS PubMed Google Scholar

Bigelow, C. C. On the average hydrophobicity of proteins and the relation between it and protein structure. J. Theor. Biol. 16, 187–211 (1967).

Article ADS CAS PubMed Google Scholar

Nakashima, H. & Nishikawa, K. The amino acid composition is different between the cytoplasmic and extracellular sides in membrane proteins. FEBS Lett. 303, 141–146 (1992).

Article CAS PubMed Google Scholar

Meirovitch, H., Rackovsky, S. & Scheraga, H. A. Empirical studies of hydrophobicity. 1. effect of protein size on the hydrophobic behavior of amino acids. Macromolecules 13, 1398–1405 (1980).

Article ADS CAS Google Scholar

Ponnuswamy, P. K., Prabhakaran, M. & Manavalan, P. Hydrophobic packing and spatial arrangement of amino acid residues in globular proteins. Biochim. Biophys. Acta 623, 301–316 (1980).

Article CAS PubMed Google Scholar

Podgornaia, A. I. & Laub, M. T. Determinants of specificity in two-component signal transduction. Curr. Opin. Microbiol. 16, 156–162 (2013).

Article CAS PubMed Google Scholar

Schreiber, G. & Keating, A. E. Protein binding specificity versus promiscuity. Curr. Opin. Struct. Biol. 21, 50–61 (2011).

Article CAS PubMed Google Scholar

Lite, T. L. V. et al. Uncovering the basis of protein-protein interaction specificity with a combinatorially complete library. eLife 9, 1–57 (2020).

Article Google Scholar

McClune, C. J., Alvarez-Buylla, A., Voigt, C. A. & Laub, M. T. Engineering orthogonal signalling pathways reveals the sparse occupancy of sequence space. Nature 574, 702–706 (2019).

Article ADS CAS PubMed PubMed Central Google Scholar

Aakre, C. D. et al. Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 163, 594–606 (2015).

Article CAS PubMed PubMed Central Google Scholar

Ghosea, D. A., Przydziala, K. E., Mahoneya, E. M., Keating, A. E. & Laub, M. T. Marginal specificity in protein interactions constrains evolution of a paralogous family. PNAS 120, 1–10 (2023).

Google Scholar

Sikorski, R. S. & Hieter, P. A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122, 19–27 (1989).

Article CAS PubMed PubMed Central Google Scholar

Martin, M. CUTADAPT removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).

Article Google Scholar

Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).

Article CAS PubMed Google Scholar

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

Article ADS CAS PubMed PubMed Central Google Scholar

Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv, (2022).

Kempf, G. & Cavadini, S. GUIFold - a graphical user interface for local AlphaFold2. bioRxiv https://doi.org/10.1101/2023.01.19.521406 (2023).

Diss, G. Code used for the analysis of data presented in this manuscript. https://doi.org/10.5281/zenodo.13324992 (2024).

Glover, J. N. & Harrison, S. C. Crystal structure of the heterodimeric bZIP transcription factor c-Fos-c-Jun bound to DNA. Nature 373, 257–261 (1995).

Article ADS CAS PubMed Google Scholar

Download references

This work was supported by the Novartis Research Foundation (all authors except A.J.F. and B.L.) and SNF Project grant 197593 (G.D., A.M.B.). Work in the lab of B.L. is funded by the European Research Council (ERC) Advanced grant (883742), the Spanish Ministry of Science and Innovation (LCF/PR/HR21/52410004, EMBL Partnership, Severo Ochoa Center of Excellence), the Bettencourt Schueller Foundation, the AXA Research Fund, Agencia de Gestio d’Ajuts Universitaris i de Recerca (AGAUR, 2017 SGR 1322), and the CERCA Program/Generalitat de Catalunya. A.J.F. was funded by a Ramón y Cajal fellowship (RYC2021-033375-I) financed by the Spanish Ministry of Science and Innovation (MCIN/AEI/10.13039/501100011033) and the European Union (NextGenerationEU/PRTR).

Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland

Alexandra M. Bendel, Dominique Klein, Kenji Shimada, Romane Lyautey, Nicole Schiffelholz, Georg Kempf, Simone Cavadini & Guillaume Diss

University of Basel, Basel, Switzerland

Alexandra M. Bendel & Romane Lyautey

Swiss Institute for Experimental Cancer Research, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Alexandra M. Bendel

Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain

Andre J. Faure & Ben Lehner

ALLOX, C/ Dr. Aiguader, 88, PRBB Building, Barcelona, Spain

Andre J. Faure

Universitat Pompeu Fabra (UPF), Barcelona, Spain

Ben Lehner

Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

Ben Lehner

Wellcome Sanger Institute, Hinxton, UK

Ben Lehner

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

G.D. and B.L. conceptualized the project. G.D. designed the experiments. R.L. and N.S. performed the splitFAST experiments. A.M.B. performed all other experimental work with help from D.K. and K.S.; G.D. analysed the data. A.J.F. fitted the thermodynamic model. G.K. and S.C. performed the AlphaFold2 structure predictions. G.D., B.L., and A.M.B. interpreted the results and wrote the manuscript.

Correspondence to Ben Lehner or Guillaume Diss.

A.J.F. and B.L. are founders and shareholders of ALLOX. The remaining authors declare no competing interests.

Nature Communications thanks Ulrich Stelzl, and the other, anonymous, reviewers for their contribution to the peer review of this work. A peer review file is available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

Bendel, A.M., Faure, A.J., Klein, D. et al. The genetic architecture of protein interaction affinity and specificity. Nat Commun 15, 8868 (2024). https://doi.org/10.1038/s41467-024-53195-4

Download citation

Received: 17 October 2023

Accepted: 04 October 2024

Published: 14 October 2024

DOI: https://doi.org/10.1038/s41467-024-53195-4

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative