Worked Example of Humanized Fab D3h44 in Complex with Tissue Factor


Phillip O’Connor’

Here we provide an example worked in detail, from antibody sequence and unbound antigen structure to a docked model of the antibody-antigen interaction. We use the antibody-antigen complex from PDB ID 1JPS as an example. The structures of this antibody and antigen have also been solved in the unbound state (1JPT and 1TFH, respectively); we use 1TFH as the starting antigen structure for docking to the homology-modeled antibody.

Technical Specifications

The example below was run on a machine with the following specifications:

Operating System: OS X El Capitan (version )
Processor: 2 x 2.4 GHz Quad-Core Intel Xeon
Memory: 20 GB 1066 MHz DDR3 ECC
Hard Drive: 1 TB Solid State SATA Drive
Graphics: ATI Radeon HD MB

Antibody Homology Modeling

First, we export environment variables as necessary (in this example, the latest version of Rosetta has been downloaded to my $HOME, but in general Rosetta could be located anywhere in the file system):

$ export ROSETTA=~/Rosetta
$ export ROSETTA3_DB=$ROSETTA/main/database
$ export ROSETTA_BIN=$ROSETTA/main/source/bin
$ export PATH=$PATH:$ROSETTA_BIN

We begin with an antibody sequence, which will be extracted from the PDB file (in this case: ) and modified to include heavy and light in the respective FASTA header lines (these two header names are the only alterations). N.B. internal newline/carriage return characters have been removed from the sequence.
>heavy
EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTHT
>light
DIQMTQSPSSLSASVGDRVTITCRASRDIKSYLNWYQQKPGKAPKVLIYYATSLAEGVPSRFSGSGSGTDYTLTISSLQPEDFATYYCLQHGESPWTFGQGTKVEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC

In this example, we save the file as antibody.fasta in our working directory (for example, I set up a directory called antibody_example in my $HOME: mkdir ~/antibody_example; cd ~/antibody_example).

Now we can run the grafting protocol. Since 1JPT is included in our database, we add homolog exclusion options to avoid cheating in this test case. For actual applications, do not exclude any homologs, so that the protocol has the advantage of all known information on related sequences.

$ antibody.macosclangrelease -fasta antibody.fasta \
    -exclude_homologs \
    -exclude_homologs_cdr_cutoff 95 \
    -exclude_homologs_fr_cutoff 90 | tee grafting.log
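Since the example relies on the FASTA records being labeled heavy and light, and on the sequences containing no stray line-break characters, a quick check of antibody.fasta before launching the run may be worthwhile. The following is a minimal sketch and not part of Rosetta: the function name check_antibody_fasta is hypothetical, and it assumes the file should hold exactly two records named heavy and light whose sequence lines use only the 20 standard one-letter amino acid codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Rosetta tool): sanity-check an antibody FASTA file.
# Assumption: exactly two records, named "heavy" and "light", as in this example.
check_antibody_fasta() {
    local fasta="$1"
    local headers
    # Collect header names, sorted so that record order does not matter.
    headers=$(grep '^>' "$fasta" | sed 's/^>//' | sort | tr '\n' ' ')
    if [ "$headers" != "heavy light " ]; then
        echo "unexpected FASTA headers: $headers" >&2
        return 1
    fi
    # Sequence lines may contain only the 20 standard amino acid letters;
    # embedded spaces or carriage returns are reported as errors.
    if grep -v '^>' "$fasta" | grep -q '[^ACDEFGHIKLMNPQRSTVWY]'; then
        echo "non-standard characters found in sequence lines" >&2
        return 1
    fi
    echo "OK: $fasta"
}
```

After sourcing the function, it would be called as check_antibody_fasta antibody.fasta; a non-zero exit status signals that the file should be inspected before grafting.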


By default, this process will produce 10 relaxed grafted models (relaxing minimizes the energy and alleviates clashes that were introduced during grafting), each with a different VL-VH orientation but the same CDRs and frameworks. Figure S1 shows all the relaxed models, aligned to chain H of model-0.

Fig. S1: The ten grafted, relaxed antibody models with several VL-VH orientations. On the left, the models are aligned to the heavy chain (blue) of the first model. On the right, the models are aligned to the light chain (red) of the first model. There are slight variations in the individual framework regions and CDRs due to the relax protocol.

Next, we inspect the clusters of our CDR loops. We can classify our model clusters by running:

$ identify_cdr_clusters.macosclangrelease -s grafting/model-*.relaxed.pdb \
    -out:file:score_only north_clusters.log

The output is captured in Figure S2.

Fig. S2: Example output of the identify_cdr_clusters application. For each CDR, there are two columns: one that indicates the Dunbrack-North cluster and another that indicates the dihedral distance to the cluster median.

We now must confirm that there are no obvious outliers or incorrectly chosen CDRs. Our CDR sequences are reported in grafting/report.json (under the Chothia numbering scheme) and reproduced below in Table S1 (the Dunbrack-North numbering scheme differs slightly; see Box X in the main text).

CDR (length)   Sequence        Notes
H1 (13)        AASGFNIKEYYMH   ~87% of sequences of this length are in cluster H
H2 (10)        LIDPEQGNTI
L1 (11)        RASRDIKSYLN     The identity of residue 71 on the light chain can be used to specify the cluster: we have a tyrosine, which is indicative of cluster L *
L2 (8)         YYATSLAE        ~95% of sequences of this length are in cluster L *
L3 (9)         LQHGESPWT       Proline in position 7 indicates L3-9-cis7-1 is most likely, as 93% of sequences with a proline in position 7 belong to this cluster.

It appears that we can improve on our initial models by re-grafting two CDRs. BLAST does not pick out the correct L1 cluster because Rosetta aligns only individual segments, so the search has no information about the identity of residue 71. Also, the template CDR chosen for L2 occupies an extremely unlikely cluster. We therefore manually specify the L1 and L2 templates to be the median PDBs of clusters L and L2-8-1 (this will overwrite our previous grafting results):

$ antibody.macosclangrelease -fasta antibody.fasta \
    -exclude_homologs \
    -exclude_homologs_cdr_cutoff 95 \
    -exclude_homologs_fr_cutoff 90 \
    -antibody:l1_template 1zan \
    -antibody:l2_template 1cr9 | tee grafting.log

Finally, we proceed to H3 modeling. This can be an extremely time-consuming process, taking ~20 minutes to produce a single H3 model. In a full run, we would generate 2,800 models, taking about 933 CPU hours. Using 16 parallel processes on the machine described here, this would take ~2.5 days.
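The runtime figures above follow from simple arithmetic, which can be reproduced for other model counts or machines. The sketch below is illustrative only: the function name estimate_h3_runtime is hypothetical, and it assumes perfect scaling across parallel processes, which real runs only approximate.

```shell
# Hypothetical helper: estimate the cost of an H3 modeling run.
# Arguments: number of models, minutes per model, number of parallel processes.
# Assumption: the work divides evenly across processes (ideal scaling).
estimate_h3_runtime() {
    awk -v n="$1" -v m="$2" -v p="$3" 'BEGIN {
        cpu_hours = n * m / 60          # total CPU time in hours
        wall_days = cpu_hours / p / 24  # wall-clock days with p processes
        printf "%.0f CPU hours, %.1f days on %d processes\n", cpu_hours, wall_days, p
    }'
}

estimate_h3_runtime 2800 20 16
# prints: 933 CPU hours, 2.4 days on 16 processes
```

The ideal-scaling result of about 2.4 days is in line with the ~2.5 days quoted for the full run.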
For the sake of this example, we only generate 10 models for the most likely orientation and 2 models for each subsequent orientation (the first command is run thrice, to spawn three processes):

$ -s grafting/model-0.relaxed.pdb -nstruct 10 \
    -antibody:auto_generate_kink_constraint \
    -antibody:all_atom_mode_kink_constraint \
    -out:file:scorefile H3_modeling_scores.fasc \
    -multiple_processes_writing_to_one_directory \
    -out:path:pdb H3_modeling > h3_modeling-0.log 2>&1 &

$ for i in ; do \
    -s grafting/model-$i.relaxed.pdb \
    -nstruct 2 \
    -antibody:auto_generate_kink_constraint \
    -antibody:all_atom_mode_kink_constraint \
    -out:file:scorefile H3_modeling_scores.fasc \
    -multiple_processes_writing_to_one_directory \
    -out:path:pdb H3_modeling > h3_modeling-$i.log 2>&1 & done

At this point, in a full run, we would check the VL-VH orientations using the plot_lhoc.py script, but as our number of models is artificially low, this is unnecessary. Instead, for the ensemble of models for docking, we take the ten lowest-energy models. In a full run, the models can be selected more purposefully (see [Supplement] for details).

Antibody-Antigen Docking

Once our models are selected, we compile a list:

$ sort -nk2 H3_modeling_scores.fasc.bak | head -n10 | awk '{print $NF}' > antibody_ensemble.list

Note that each entry needs the directory and the file extension, so we manually edit antibody_ensemble.list in Vim using visual block mode (or in any preferred text editor) to look as in the example below.

H3_modeling/model-9.relaxed_0002.pdb
H3_modeling/model-0.relaxed_0003.pdb
H3_modeling/model-6.relaxed_0002.pdb
H3_modeling/model-0.relaxed_0006.pdb
H3_modeling/model-0.relaxed_0009.pdb
H3_modeling/model-9.relaxed_0001.pdb
H3_modeling/model-0.relaxed_0007.pdb
H3_modeling/model-5.relaxed_0002.pdb
H3_modeling/model-0.relaxed_0008.pdb
H3_modeling/model-8.relaxed_0001.pdb

Now that we have the antibody ensemble enumerated, we must prepare the antigen ensemble. In this example, we will use chain A of 1TFH:

$ wget
$ gunzip 1TFH.pdb.gz
$ $ROSETTA/tools/protein_tools/scripts/clean_pdb.py 1TFH.pdb A

This set of commands will produce 1TFH_A.pdb and 1TFH_A.fasta as output. Before proceeding, we will need to relax the antigen crystal structure to generate an ensemble for docking. In this example, we will relax without constraints for speed, but typically crystal structures are relaxed with constraints.

$ mkdir relax_antigen
$ relax.macosclangrelease -s 1TFH_A.pdb \
    -ex1 -ex2 -use_input_sc -flip_hnq \
    -no_opth false -nstruct 10 \
    -out:path:pdb relax_antigen \
    -multiple_processes_writing_to_one_directory > relax.log 2>&1 &

The second command (relax) can be launched several times to speed up the protocol. Once relax has finished running, we generate a list of antigen structures:

$ ls relax_antigen/*.pdb > antigen_ensemble.list

Now we must assemble a putative antibody-antigen complex. In this case, we have a homologous complex (1JPS). We load one of our antibody models, one of the relaxed antigen structures, and the homologous complex in a single PyMOL session. Then, in PyMOL, we align the antibody and antigen to the known positions and save the resultant PDB:

PyMOL>align model-0.relaxed, 1JPS
PyMOL>align 1TFH_A_0001, 1JPS
PyMOL>save ~/antibody_example/antibody_antigen_start.pdb, not 1JPS

Once the putative complex is assembled, we must ensure the chain order in the PDB is light, heavy, then antigen. In this example, the heavy chain comes before the light chain, so we edit the file in Vim ($ vim antibody_antigen_start.pdb).
Using visual mode, we select the heavy chain, cut it, and paste it after the light chain but before the antigen (v 1754j 27l d 1668j o Esc p; a space indicates separate commands). This process has to be repeated for every member of the ensemble, so that the chain order matches throughout.

Now, we prepack the complex and ensemble structures. Prepacking rearranges the side chains of the docking partners in their monomeric states. This ensures fair sampling, preventing the initial complex orientation from biasing the side chain positions.

$ docking_prepack_protocol.macosclangrelease -s antibody_antigen_start.pdb \
    -ex1 -ex2 -partners LH_A \
    -ensemble1 antibody_ensemble.list \
    -ensemble2 antigen_ensemble.list \
    -docking:dock_rtmin

Finally, we can dock via a shell script (save the text below as run_dock.sh and then run $ bash run_dock.sh):

for i in {1..10}; do
    snugdock.macosclangrelease -s antibody_antigen_start.prepack.pdb \
        -ensemble1 antibody_ensemble.list \
        -ensemble2 antigen_ensemble.list \
        -detect_disulf false \
        -antibody:auto_generate_kink_constraint \
        -antibody:all_atom_mode_kink_constraint \
        -nstruct 100 \
        -multiple_processes_writing_to_one_directory > snugdock.log 2>&1 &
done

Note: Errors at this point are most likely due to an incorrectly prepared ensemble. One common issue arises when some structures in an ensemble contain a disulfide bond but others do not; this gives rise to an error when swapping the two members during ensemble docking. A simple fix, which will result in slightly higher scores, is to use the -detect_disulf false flag. This flag treats all cysteines as non-bonded.

Now, we should have 100 models (1000 in a full run) of the antibody-antigen complex in our working directory. From these, we seek to choose a representative model. In this artificial case, since we know the native complex structure, we could compare all models to 1JPS and select the most similar. However, we also need a method that works in blind cases.

To compare to 1JPS, we can hijack the standard docking application's RMSD and energy calculations. One caveat is that all structures we compare must have the same sequence, and the same chains in the same order, so we modify 1JPS in PyMOL (using the remove atoms action to delete the extra light and heavy chain domains). Furthermore, we have to modify our models, since our antigen template, 1TFH, contains two more residues in a loop region (A80 and G81) than 1JPS. This modification can be done using a for loop and the grep utility.
We saved this script as remove_extra_atoms.sh and ran it as bash remove_extra_atoms.sh:

for pdbf in antibody_antigen_start.prepack_*.pdb; do
    grep -v "^ATOM.* ALA A  80.*\|^ATOM.* GLY A  81.*" $pdbf > antibody_antigen_renum_${pdbf:31:4}.pdb
done

We can alternatively compare all decoys to the lowest-energy model (for docking, we use interface energy rather than total energy). We can identify this model by using the sort command ($ sort -nk5 scoresnugdock.sf | head -n1 | awk '{print $NF}'). This manner of comparison is common when we do not know the complex structure. The command line is:

$ docking_protocol.macosclangrelease -s antibody_antigen_renum_0* \
    -native antibody_antigen_renum_0040.pdb \
    -docking_local_refine \
    -dock_min \
    -out:file:score_only recalc_decoy_rmsd.sc \
    -nstruct 1

The output is a score file (tab delimited) with alpha-carbon RMSDs calculated to the model given as -native. We can plot the interface RMSD (column Irms in the score file) versus the binding energy (column I_sc) using any plotting software (Fig. S3). We then analyze the plot for funnels, that is, clusters of structurally similar models (according to RMSD) that populate local or global energy minima, assuming that low energies distinguish native from non-native interactions.
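To feed a plotting program, the two columns of interest can be pulled out of the score file by name rather than by position. The sketch below is a hypothetical helper, not a Rosetta tool: it assumes that the header line and every data line begin with SCORE:, that fields are separated by whitespace (tabs or spaces), that the columns are named Irms and I_sc as described above, and that the model name is the last field.

```shell
# Hypothetical helper: print "Irms I_sc model_name" for every decoy in a score file.
# Assumptions: header and data lines start with "SCORE:", fields are
# whitespace-separated, and the model name is the last field on each line.
extract_funnel_columns() {
    awk '
        $1 == "SCORE:" && !have_header {
            # The first SCORE: line is the header; record each column position.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("Irms" in col) || !("I_sc" in col)) {
                print "score file lacks an Irms or I_sc column" > "/dev/stderr"
                exit 1
            }
            have_header = 1
            next
        }
        # Data lines; any repeated header lines are skipped.
        have_header && $1 == "SCORE:" && $(col["Irms"]) != "Irms" {
            print $(col["Irms"]), $(col["I_sc"]), $NF
        }
    ' "$1"
}
```

For example, extract_funnel_columns recalc_decoy_rmsd.sc > funnel.dat would write one line per decoy, and sort -k2,2n funnel.dat | head -n 5 would list the five decoys with the lowest interface energy.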

Fig. S3: Docking funnel plots of our SnugDock simulation. From left to right: the first plot is generated by calculating distances for all models to an arbitrary model, the second by calculating distances for all models to the lowest-interface-energy model, and the third by calculating distances for all models to the native structure. We can see two slight docking funnels in all three plots. One funnel contains native-like conformations whereas the other does not.

It is not always evident how to select the model most representative of the native complex (though the easiest approach is to select the lowest-energy model, which would be successful in this case). Also, depending on the downstream application, it may be incorrect to select a single model. Thus, several questions should be considered when selecting a model or models:

(1) Is the model physically valid? (Do not select models that are incongruent with biophysical reality.)
(2) Does the model agree with other results? (E.g., residues found to be important by mutagenesis are found at the interface, or the binding mode is similar to that of homologous complexes.)
(3) What will the model be used for? (If we seek one representative structure, we will choose a low-scoring structure that is most representative of the set of lowest-scoring structures; if we seek many representative structures, we will choose a set of structurally distinct models to cover a larger portion of conformational space and hedge our bets.)

Other analyses, such as structural clustering, which can be done with the clustering application ( may also be helpful in choosing models.