Structure-Guided Deimmunization CMPS 3210

1 Structure-Guided Deimmunization CMPS 3210

2 Why Deimmunization? Protein (biologic) therapies are proving to be useful, but they can be much more immunogenic than small molecules. Like a drug compound, a biologic therapy is a designed molecule. The therapy might be excellent in vitro, but in vivo the immune response to a protein therapy can be significant (e.g., anaphylactic shock). How can we design a protein to evade the immune response of the host?

3 CD4+ T-Cell Response

4 MHC Binding Grooves

5 Position-Specific Scoring Matrices NetMHC-II utilizes position-specific scoring matrices (PSSMs) computed from the training data for each allele, and scores peptides against these. The desired motif can be calculated by doing a multiple-sequence alignment (MSA) on all training data, but this is typically intractable. Instead, PSSM weights are estimated using Gibbs sampling, in which we sample MSA space and iteratively improve alignments according to an information gain criterion.

Figure 2 [Nielsen et al., 2007]: Kullback-Leibler logo visualizations of peptide binding motifs. The upper panel depicts the motif for the DRB1*0101 allele, and the lower panel the motif for the DRB1*1302 allele. From left, the columns show the motif estimated by the SMM (NetMHCII), Gibbs sampler, and TEPITOPE methods, respectively. The height of a column in the logo is proportional to the relative information content in the sequence motif, and the letter height is proportional to the amino acid frequency [23].
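The scoring step described on this slide can be illustrated with a minimal Python sketch. It assumes the binding cores are already aligned (so it skips the Gibbs sampling step entirely), uses a uniform background distribution, and works on made-up 3-residue cores for brevity; `build_pssm`, `score_core`, `best_core`, and `TOY_CORES` are hypothetical names and data, not part of NetMHC-II.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_cores, pseudocount=1.0):
    """Log-odds PSSM from equal-length cores that are assumed to be aligned."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for pos in range(len(aligned_cores[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for core in aligned_cores:
            counts[core[pos]] += 1.0
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score_core(pssm, core):
    """Sum the per-position log-odds weights for one candidate core."""
    return sum(column[aa] for column, aa in zip(pssm, core))

def best_core(pssm, peptide):
    """Slide the PSSM along the peptide; return (best score, start offset)."""
    width = len(pssm)
    return max((score_core(pssm, peptide[i:i + width]), i)
               for i in range(len(peptide) - width + 1))

# Made-up training cores (real MHC II binding cores are 9-mers).
TOY_CORES = ["FAK", "FAR", "YAK", "FGK"]
pssm = build_pssm(TOY_CORES)
score, offset = best_core(pssm, "GGFAKGG")
print(offset, round(score, 2))
```

In a real predictor the alignment itself is unknown, which is why the weights are estimated by sampling alignments rather than by direct counting as done here.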

6 Hidden Markov Models PSSMs require the computation (or estimation) of a multiple-sequence alignment over the entire training set. While PSSMs are the method of choice, hidden Markov models (HMMs), which capture local sequence relationships, have also been used for MHC class II predictions. HMMs model amino acid preferences for MHC binding by local preferences of amino acid type. Once an HMM is trained, the maximum-likelihood sequence is the analog of a PSSM motif. The predicted binding affinity of a sequence is taken to be proportional to its likelihood under the HMM.

FIG. 1. An example of an HMM model trained using five peptide sequences. Symbols: normal arrows, transition probability; open arrows, symbol generation probability. [Noguchi et al., 2002]
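To make "likelihood of a sequence with respect to the HMM" concrete, here is a minimal Python sketch of the standard forward algorithm on a generic discrete HMM. The two states and all probabilities are invented for illustration; this is not the model architecture of Noguchi et al.

```python
import math

def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM), summing over all state paths (forward algorithm)."""
    states = list(start)
    alpha = {s: start[s] * emit[s].get(seq[0], 0.0) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s].get(t, 0.0) for s in states)
               * emit[t].get(symbol, 0.0)
            for t in states
        }
    total = sum(alpha.values())
    return math.log(total) if total > 0 else float("-inf")

# Hypothetical two-state model: an "anchor" state that prefers aromatic
# residues and an "other" state for the remaining positions.
START = {"anchor": 0.5, "other": 0.5}
TRANS = {"anchor": {"anchor": 0.1, "other": 0.9},
         "other": {"anchor": 0.3, "other": 0.7}}
EMIT = {"anchor": {"F": 0.6, "Y": 0.3, "A": 0.1},
        "other": {"A": 0.4, "K": 0.3, "R": 0.2, "F": 0.1}}

for peptide in ("FAK", "KKF"):
    print(peptide, forward_log_likelihood(peptide, START, TRANS, EMIT))
```

Under this toy model "FAK" receives a higher likelihood than "KKF", so it would be ranked as the stronger predicted binder.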

7 Predicting Immunogenicity Immunodominance is generally equated with MHC class II binding affinity. A variety of supervised learning methods are used, and it has been shown that they can reasonably match experimental binding affinity assays. Side note: Strong MHC binding is necessary but not sufficient for a peptide to be immunodominant. Can we reduce the false positive rate associated with MHC binding-based prediction?

8 Problem 1 (Structurally Guided Deimmunization) We are given a protein sequence $A$ of $n$ amino acids, along with a 3D structure that includes a backbone as well as a rotamer sequence $R$ paralleling $A$. We are also given a set $M$ of mutable positions (amino acid types allowed to vary), and a set $F$ of flexible positions (side-chains allowed to vary) with $M \subseteq F$; by default $M = F = \{1, \ldots, n\}$. Finally, we are given a mutational load $m$. Let a design be an $m$-mutation sequence $A'$, with mutations only at positions in $M$, along with a set $R'$ of selected rotamers, differing from $R$ only at positions in $F$. Our goal is to determine all Pareto optimal designs, minimizing the two objectives

$$f_\epsilon(A') = \sum_{i=1}^{n-8} \epsilon(A'[i \ldots i+8]) \qquad (1)$$

$$f_\phi(R') = \sum_{i \in F} \phi_i(R'[i]) + \sum_{i \in F} \sum_{j \in F,\, j \neq i} \phi_{ij}(R'[i], R'[j]) \qquad (2)$$

based on the following contributions:

$\epsilon : \mathcal{A}^9 \to \mathbb{N}$ gives the epitope score for a peptide (we assume a 9-mer; see below).
$\phi_i : \mathcal{R} \to \mathbb{R}$ gives the singleton energy capturing the internal energy of a rotamer at position $i$ plus the energy between the rotamer and the backbone structure and side-chains at nonflexible positions.
$\phi_{ij} : \mathcal{R} \times \mathcal{R} \to \mathbb{R}$ gives the pairwise energy between a pair of rotamers at a pair of positions $i$, $j$.

Here $\mathcal{A}$ is the set of amino acid types and $\mathcal{R}$ is the set of rotamers.

What is Pareto optimality?
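As a rough illustration of how the two objectives in Problem 1 are evaluated for a single design, consider the Python sketch below. The epitope and energy functions are hypothetical stand-ins (a real system would use an MHC II binding predictor and a molecular mechanics energy function), and positions are 0-indexed here rather than 1-indexed as on the slide.

```python
def epitope_objective(sequence, epitope_score, width=9):
    """Equation (1): sum the epitope score over every window of `width` residues."""
    return sum(epitope_score(sequence[i:i + width])
               for i in range(len(sequence) - width + 1))

def energy_objective(rotamers, flexible, singleton, pairwise):
    """Equation (2): singleton terms plus pairwise terms over ordered pairs i != j."""
    single = sum(singleton(i, rotamers[i]) for i in flexible)
    pair = sum(pairwise(i, j, rotamers[i], rotamers[j])
               for i in flexible for j in flexible if j != i)
    return single + pair

# Hypothetical stand-ins, invented for this example only.
def toy_epitope_score(peptide):
    # Pretend a 9-mer is an epitope when its first residue is hydrophobic.
    return 1 if peptide[0] in "FILMVWY" else 0

def toy_singleton(i, rotamer):
    return 1.0

def toy_pairwise(i, j, rotamer_i, rotamer_j):
    # Only neighbouring positions interact in this toy model.
    return 0.5 if abs(i - j) == 1 else 0.0

sequence = "FAAAAAAAAAF"          # 11 residues, so three 9-mer windows
rotamers = ["r0", "r1", "r2"]     # placeholder rotamer labels
print(epitope_objective(sequence, toy_epitope_score))
print(energy_objective(rotamers, [0, 1, 2], toy_singleton, toy_pairwise))
```

Because the double sum in Equation (2) ranges over ordered pairs, each interacting pair contributes twice, which the sketch reproduces as written.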

9 Pareto Optimality The Pareto frontier consists of the solutions for which no optimization criterion can be improved without worsening another. We want a Pareto-optimal redesign that is not antigenic. [Wikipedia]
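A minimal Python sketch of Pareto filtering for two minimized objectives follows. The (epitope score, energy) pairs are invented numbers, and `dominates` and `pareto_front` are illustrative helper names.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Invented (epitope score, energy) pairs for five candidate designs.
designs = [(3, -10.0), (2, -8.0), (2, -6.0), (1, -5.0), (4, -9.0)]
print(pareto_front(designs))
```

Here (2, -6.0) is dropped because (2, -8.0) has the same epitope score and lower energy, and (4, -9.0) is dropped because (3, -10.0) is better on both criteria.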

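10 The energy objective is written over binary singleton variables $s_{i,r}$ (rotamer $r$ is selected at position $i$) and pairwise variables $p_{i,j,r,t}$ (rotamers $r$ and $t$ are selected at positions $i$ and $j$):

$$\Phi = \sum_{i,r} s_{i,r}\,\phi_i(r) + \sum_{i,j,r,t} p_{i,j,r,t}\,\phi_{ij}(r,t) \qquad (3)$$

We then constrain the epitope score according to the current sweep value $E$, using window variables $w_{i,X}$ (peptide $X$ starts at position $i$):

$$\sum_{i,X} w_{i,X}\,\epsilon(X) \le E - 1 \qquad (4)$$

To guarantee that the variable assignments yield a valid set of rotamers, we impose the following constraints, where $a(r)$ denotes the amino acid type of rotamer $r$:

$$\forall i : \sum_{r} s_{i,r} = 1 \qquad (5)$$

$$\forall i, r, j > i : \sum_{t} p_{i,j,r,t} = s_{i,r} \qquad (6)$$

$$\forall j, t, i < j : \sum_{r} p_{i,j,r,t} = s_{j,t} \qquad (7)$$

$$\forall i, r, \forall h \in 1..9 : \sum_{X \,:\, X[h] = a(r)} w_{i,X} = s_{i+h-1,\,r} \qquad (8)$$

Equation 5 ensures that only one rotamer is assigned to a given position. Equations 6 and 7 maintain consistency between singleton and pairwise variables, while Equation 8 maintains consistency between singleton and window variables. Finally, we enforce the desired mutational load:

$$\sum_{i,r \,:\, A[i] \neq a(r)} s_{i,r} = m \qquad (9)$$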
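The sweep idea behind the epitope constraint (Equation 4) can be illustrated on a toy problem. The Python sketch below replaces the integer program with brute-force enumeration, assumes one rotamer per amino acid type, and uses invented energy and epitope functions, so it shows only the control flow of the sweep: minimize energy, record the design, tighten the epitope bound to one less than its score, and repeat until nothing is feasible.

```python
from itertools import combinations, product

def enumerate_designs(wild_type, choices, m):
    """Yield every sequence with exactly m mutations (the mutational load)."""
    for positions in combinations(sorted(choices), m):
        for residues in product(*(choices[p] for p in positions)):
            seq = list(wild_type)
            for p, aa in zip(positions, residues):
                seq[p] = aa
            yield "".join(seq)

def sweep(wild_type, choices, m, energy, epitope):
    """Return (epitope, energy, sequence) tuples on the Pareto frontier."""
    designs = [(epitope(d), energy(d), d)
               for d in enumerate_designs(wild_type, choices, m)]
    frontier, bound = [], float("inf")
    while True:
        # Integer scores: "epitope < bound" is the same as "epitope <= E - 1".
        feasible = [d for d in designs if d[0] < bound]
        if not feasible:
            return frontier
        best = min(feasible, key=lambda d: (d[1], d[0]))  # lowest energy first
        frontier.append(best)
        bound = best[0]

# Hypothetical toy problem; every value below is invented for illustration.
WILD_TYPE = "FKLA"
CHOICES = {0: "AY", 2: "AD"}   # allowed substitutions (never the wild-type residue)
PENALTY = {(0, "A"): 3.0, (0, "Y"): 0.5, (2, "A"): 1.0, (2, "D"): 2.5}

def toy_energy(seq):
    return sum(PENALTY[(i, aa)] for i, aa in enumerate(seq) if aa != WILD_TYPE[i])

def toy_epitope(seq):
    return sum(aa in "FILMVWY" for aa in seq)

print(sweep(WILD_TYPE, CHOICES, 1, toy_energy, toy_epitope))
```

Each pass returns the lowest-energy design under the current bound, so successive passes trade energy for a lower epitope score.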

11 EpiSweep FIG. 1. EpiSweep. (A) An immune response to a therapeutic protein is initiated by MHC II (red/white) recognition of an immunogenic peptide epitope (blue) digested from the protein. Our goal is to mutate the protein so that no such recognition will occur. (B) MHC/T-cell epitopes are pervasive: exposed on the surface, buried in the core, and covering active sites. Shown is staphylokinase (PDB ID 2sak), with black backbone for no epitopes starting at the residue and highlighted sausage for the predicted binding by eight common MHC II alleles, ranging from thin pale yellow (bind just one) to thick bright red (bind all eight). Putative active sites are denoted with green arrows. (C) EpiSweep explores the Pareto frontier simultaneously optimizing structure (molecular mechanics energy function, y-axis) and immunogenicity (epitope score, x-axis). The left circled plan has fewer predicted epitopes than the right one, but the right one has better predicted energy. MHC II, type II major histocompatibility complex; PDB, Protein Data Bank.

12 FIG. 2. Pareto-optimal plans in the energy-epitope landscapes for two SakSTAR peptides. In addition to the wild-type (magenta star for Beta, not shown for C3), the plots show three different mutational loads: 1, solid blue and diamond; 2, dash red and circle; 3, dot black and square. The tables detail the mutations from the wild-type. The numbers above the wild-type sequence are the epitope scores for 9-mers starting at those positions.

13 FIG. 3. Pareto-optimal plans in the energy-epitope landscapes for two Epo peptides. See Figure 2 for description.