Lecture 10, 20/2/2002: The process of solution development - The CODEHOP strategy for automatic design of consensus-degenerate primers for PCR



2 The problem
We wish to clone an as yet unknown gene from a known gene family. This could be done by constructing a DNA library of the target genome and screening it by low-stringency hybridization with a labeled probe made from a known family member. That approach is laborious and time consuming, and it will not work for genes with low sequence similarity to the probe. PCR is a much faster method. However, it requires knowing the sequence of two nucleotide regions, each at least 20 bases long, that flank the region to be amplified. How can we know the sequence of such regions in a gene that has not yet been sequenced?

3 Degenerate PCR primers
The short oligonucleotides that anneal to the flanking regions in a PCR amplification are called primers. Primers can be designed from conserved sequence regions (motifs) of the gene family to which the gene we wish to clone belongs. Back-translating protein motifs gives the DNA sequences that can code for them, and these can be used as primers. However, because the genetic code is degenerate and most motif positions are only partially conserved, the resulting primers will also be degenerate.

4 Degenerate PCR primers
Amino acids and their codons:
F    TTT TTC
W    TGG
G    GGT GGC GGA GGG
M    ATG
S    TCT TCC TCA TCG AGT AGC
[Figure: degenerate primer, with the alternative bases stacked under each position]
Degeneracy = 2*3*4*2*2*4 = 384
Universal genetic code:
TTT F   TCT S   TAT Y   TGT C
TTC F   TCC S   TAC Y   TGC C
TTA L   TCA S   TAA *   TGA *
TTG L   TCG S   TAG *   TGG W
CTT L   CCT P   CAT H   CGT R
CTC L   CCC P   CAC H   CGC R
CTA L   CCA P   CAA Q   CGA R
CTG L   CCG P   CAG Q   CGG R
ATT I   ACT T   AAT N   AGT S
ATC I   ACC T   AAC N   AGC S
ATA I   ACA T   AAA K   AGA R
ATG M   ACG T   AAG K   AGG R
GTT V   GCT A   GAT D   GGT G
GTC V   GCC A   GAC D   GGC G
GTA V   GCA A   GAA E   GGA G
GTG V   GCG A   GAG E   GGG G
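The back-translation described on this slide can be sketched in a few lines of Python. This is only an illustration, not the CODEHOP program: the function name `degenerate_primer` and the motif used in the usage line (P, C, Q) are assumptions chosen for the example. Each motif position is given as a string of the amino acids allowed there, the bases seen at each codon position are collected independently, and the set of bases is written as an IUPAC ambiguity code.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Universal genetic code: codon -> amino acid ('*' marks a stop codon).
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the set of bases each code stands for.
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}


def degenerate_primer(motif):
    """Back-translate a motif into (degenerate primer, degeneracy).

    motif: list of strings; each string holds the amino acids allowed
    at one motif position, e.g. ["F", "WG", "M"].
    """
    primer, degeneracy = [], 1
    for allowed in motif:
        codons = [c for c, aa in GENETIC_CODE.items() if aa in allowed]
        for i in range(3):
            # Bases are pooled per codon position, independently of each other.
            bases = frozenset(c[i] for c in codons)
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


print(degenerate_primer(["P", "C", "Q"]))  # ('CCNTGYCAR', 16)
```

Because the codon positions are pooled independently, the degenerate primer can also cover codons for amino acids that are not in the motif; serine, for example, becomes WSN, which includes codons for other residues.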

5 Consensus PCR primers
Amino acids and their codons:
F    TTT TTC
W    TGG
G    GGT GGC GGA GGG
M    ATG
S    TCT TCC TCA TCG AGT AGC
Consensus primer: TTC GGA ATG AGT
Universal genetic code: same table as on slide 4.
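A consensus primer picks one codon per position instead of pooling all of them. The sketch below assumes the choice is simply the most frequent codon in a codon-usage table. The usage fractions are invented for illustration (they are not from any real organism) and were chosen so that the result reproduces the consensus primer shown on the slide; `CODON_USAGE` and `consensus_primer` are hypothetical names.

```python
# Hypothetical codon-usage fractions (illustrative numbers only).
CODON_USAGE = {
    "F": {"TTT": 0.45, "TTC": 0.55},
    "G": {"GGT": 0.20, "GGC": 0.25, "GGA": 0.35, "GGG": 0.20},
    "M": {"ATG": 1.00},
    "S": {"TCT": 0.15, "TCC": 0.15, "TCA": 0.15,
          "TCG": 0.05, "AGT": 0.30, "AGC": 0.20},
}


def consensus_primer(consensus_aa, usage=CODON_USAGE):
    """Back-translate each consensus residue into its most common codon."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in consensus_aa)


# Assumes the consensus residues F, G, M, S for the four motif positions.
print(consensus_primer("FGMS"))  # TTCGGAATGAGT
```

The resulting primer has a degeneracy of 1, at the price of mismatches wherever the target gene uses a different codon or residue.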

6 CODEHOP PCR primers
A very successful strategy for PCR amplification of unknown genes with low sequence similarity to known sequences uses hybrid primers. These consist of a relatively short 3' degenerate core and a 5' non-degenerate consensus clamp. Reducing the length of the 3' core to a minimum decreases the total number of individual primers in the degenerate primer pool. Hybridization of the 3' degenerate core with the target template is stabilized by the 5' non-degenerate consensus clamp, which allows higher annealing temperatures without increasing the degeneracy of the pool. These primers are termed COnsensus-DEgenerate Hybrid Oligonucleotide Primers, or CODEHOPs.
[Figure: CODEHOP primer, with the consensus clamp at the 5' end and the degenerate core at the 3' end]

7 CODEHOP PCR primers
A fraction of the primers in the pool (1/degeneracy) will fully, or almost fully, match the template in their 3' degenerate core. The 5' consensus clamp stabilizes the annealing through a partial match.
[Figure: CODEHOP primer (5' consensus clamp, 3' degenerate core) annealed to the template strand (3' to 5')]
During later rounds of the PCR, when the primers amplify the products of previous rounds, all the primers fully match the products in their 5' consensus clamp and may have some mismatches in their 3' degenerate core.
[Figure: CODEHOP primer annealed to a product strand]
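The "1/degeneracy" fraction can be made concrete by expanding a degenerate core into the individual sequences of the pool. The sketch below uses the 11-nucleotide core CCNTGYCARGG from the program example later in the lecture; the `target` sequence is a hypothetical template region (written in the primer's sense), and `expand_pool` is an illustrative helper name.

```python
from itertools import product

# Bases represented by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_pool(degenerate_seq):
    """List every individual sequence encoded by a degenerate sequence."""
    choices = [IUPAC_BASES[b] for b in degenerate_seq.upper()]
    return ["".join(p) for p in product(*choices)]


core = "CCNTGYCARGG"
pool = expand_pool(core)
target = "CCATGTCAAGG"  # hypothetical template region, primer sense

exact = [p for p in pool if p == target]
print(len(pool))               # 16
print(len(exact) / len(pool))  # 0.0625, i.e. 1/degeneracy
```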

8 The task
Find the CODEHOP primers that can be designed from the following input: a protein multiple-sequence alignment, a codon-usage table, a maximal primer degeneracy, and a primer melting temperature (Tm).

9 A set of blocks is given as input. A weight is provided for each sequence segment; it can be increased to favor the contribution of selected sequences to the primer design. A codon usage table is chosen for the target genome.
An amino acid position-specific scoring matrix (PSSM) is computed for each block. A consensus amino acid residue is selected for each position of the block as the highest-scoring amino acid in the matrix. The consensus residues are back-translated into their most common codons according to the user-selected codon usage table. The resulting DNA sequence is used to make the 5' consensus clamp.
A DNA PSSM is calculated from the amino acid matrix. The score for each amino acid is divided among its codons in proportion to their relative weights in the codon usage table, and the scores for each of the four nucleotides are combined in each DNA matrix position. Nucleotide positions are treated independently when the scores are combined.
The degeneracy at each position of the DNA matrix is determined from the number of bases found there, optionally using a threshold.
Possible degenerate core regions are identified by scanning the DNA matrix in the 3' to 5' direction. A core region must start on an invariant 3' nucleotide position, be 11 or 12 nucleotides long with its 5' end on a codon boundary, and have a degeneracy no greater than the requested maximum.
Candidate degenerate core regions are extended by adding a 5' consensus clamp. The length of the clamp is controlled by the requested Tm.
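The step that turns the amino acid PSSM into a DNA PSSM can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's code: `dna_columns` is a hypothetical name, one amino acid column is passed as a plain dictionary of scores, and when no codon weights are supplied all synonymous codons are weighted equally.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Universal genetic code: codon -> amino acid ('*' marks a stop codon).
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def dna_columns(aa_scores, codon_weight=None):
    """Convert one amino acid PSSM column into three DNA PSSM columns.

    aa_scores:    {amino acid: score} for one block position.
    codon_weight: {codon: relative usage}; uniform weights if None
                  (an assumption made for this sketch).
    """
    columns = [dict.fromkeys("ACGT", 0.0) for _ in range(3)]
    for aa, score in aa_scores.items():
        codons = [c for c, a in GENETIC_CODE.items() if a == aa]
        weights = [1.0 if codon_weight is None else codon_weight[c]
                   for c in codons]
        total = sum(weights)
        for codon, w in zip(codons, weights):
            # Each codon receives its share of the amino acid score, and
            # the share is added to the base found at each codon position.
            for i, base in enumerate(codon):
                columns[i][base] += score * w / total
    return columns


# Hypothetical column: Asp scores 0.25, Glu scores 0.75.
for col in dna_columns({"D": 0.25, "E": 0.75}):
    print(col)
```

For this column the first two DNA positions are invariant (G and A), while the third position carries scores for all four bases, with A and G scoring higher than T and C.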

10 CODEHOP program scheme
Input: protein sequence block (seq, seq, seq, seq, seq, etc.)
-> Transformation to aa PSSM: aa PSSM (Ala, Cys, Asp, Glu, Phe, etc.)
-> Back-translation to DNA PSSM: DNA PSSM (A, C, G, T)
-> Calculation of degeneracies: position degeneracy values
-> Identify degenerate regions ("===") and add consensus regions ("---") for the degenerate regions
Output: CODEHOP primers (forward and reverse)
In parallel, starting from the aa PSSM:
-> Calculation of aa consensus sequence: aa consensus sequence
-> Transformation to DNA consensus sequence: DNA consensus sequence, the source of the consensus regions
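The last step of the scheme, adding a consensus region 5' of a degenerate region, could be sketched as below. The slides do not say how the program computes Tm, so the Wallace rule (2 degrees per A/T, 4 degrees per G/C) is swapped in here as a deliberately crude stand-in. The consensus DNA string and the core start index (21) are taken from the program example in the lecture; the target of 58 degrees and the names `wallace_tm` and `extend_clamp` are assumptions made for illustration.

```python
def wallace_tm(seq):
    """Wallace rule estimate: 2 degrees per A/T plus 4 degrees per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def extend_clamp(consensus_dna, core_start, target_tm):
    """Grow the clamp in the 5' direction until its Tm reaches target_tm.

    core_start is the index of the first (5'-most) base of the core;
    the clamp is the consensus sequence immediately upstream of it.
    """
    start = core_start
    while start > 0 and wallace_tm(consensus_dna[start:core_start]) < target_tm:
        start -= 1
    return consensus_dna[start:core_start]


consensus = "TACATGGTTTGTGGAGGACCTCCTTGTCAAGGA"
print(extend_clamp(consensus, core_start=21, target_tm=58))
# CATGGTTTGTGGAGGACCT
```

A higher requested Tm gives a longer clamp; if the block is too short to reach the target, this sketch simply returns everything upstream of the core.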

11 CODEHOP program example
Block:
MTDM_CHICK   E M L C G G P P C Q G
MTDM_HUMAN   E M L C G G P P C Q G
MTDM_MOUSE   E M L C G G P P C Q G
MTDM_XENLA   E M L C G G P P C Q G
MTDM_PARLI   E L L C G G P P C Q G
MTDM_ARATH   D F I N G G P P C Q G
MTCH_CARAR   Y S V C G G P P C Q G
MTCH_ARATH   Y T V C G G P P C Q G
Consensus aa:      Y   M   V   C   G   G   P   P   C   Q   G
Consensus codons:  tAC ATG GTT TGT GGA GGA CCT CCT TGT CAA GGA
[Figure: DNA PSSM (rows A, C, G, T), degenerate primer and per-position degeneracy]
CODEHOP primer: 5' CATGGTTTGTGGAGGACCT CCNTGYCARGG 3'
5' consensus clamp: CATGGTTTGTGGAGGACCT
3' degenerate core: CCNTGYCARGG, 11 nucleotides, degeneracy of 16 = 4*2*2
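The search for a degenerate core in this example can be sketched as a scan over per-position degeneracies. The sketch assumes that index 0 is the first base of a codon and that the five conserved residues P P C Q G back-translate to CCN CCN TGY CAR GGN with every synonymous codon counted; `find_cores` and the maximum degeneracy of 16 are illustrative choices, not the program's actual interface or default.

```python
from math import prod


def find_cores(degeneracy, max_degeneracy, lengths=(11, 12)):
    """Return (start, end) index pairs, inclusive, of candidate cores.

    degeneracy: number of bases counted at each DNA position,
                with index 0 on a codon boundary.
    """
    cores = []
    for end in range(len(degeneracy) - 1, -1, -1):  # scan 3' -> 5'
        if degeneracy[end] != 1:                    # 3' end must be invariant
            continue
        for length in lengths:
            start = end - length + 1
            if start < 0 or start % 3 != 0:         # 5' end on a codon boundary
                continue
            if prod(degeneracy[start:end + 1]) <= max_degeneracy:
                cores.append((start, end))
    return cores


degenerate = "CCNCCNTGYCARGGN"
per_position = [1, 1, 4, 1, 1, 4, 1, 1, 2, 1, 1, 2, 1, 1, 4]

for start, end in find_cores(per_position, max_degeneracy=16):
    print(start, end, degenerate[start:end + 1])
# 3 13 CCNTGYCARGG
```

With these inputs the only candidate is the 11-nucleotide core shown on the slide; the 11-nucleotide window starting one codon earlier has a degeneracy of 32 and is rejected.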

12 CODEHOP program degeneracy strictness
Degeneracy strictness specifies how to count nucleotides with low occurrences. A nucleotide is counted if the ratio of its frequency value to the highest value in that position is greater than or equal to the strictness. Strictness can take values between 0 and 1. A strictness of 0 causes all the nucleotides that actually appear in the position to be counted. A strictness of 1 means that only the nucleotides with the highest value in the position are counted. Intermediate strictness values give behavior in between.
Examples: [Table: value and ratio for each base (A, C, G, T) in several example positions, with the resulting degeneracy at different strictness values]
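The counting rule can be written as a short function. This is a sketch of the rule as stated on the slide; the function name `position_degeneracy` and the column values are hypothetical.

```python
def position_degeneracy(column, strictness):
    """Count the bases whose value / highest value is >= strictness.

    Bases with a value of 0 do not appear in the position and are
    never counted, even at strictness 0.
    """
    top = max(column.values())
    return sum(1 for v in column.values() if v > 0 and v / top >= strictness)


# Hypothetical DNA PSSM column.
column = {"A": 10.0, "C": 5.0, "G": 1.0, "T": 0.0}

for s in (0.0, 0.2, 0.5, 1.0):
    print(s, position_degeneracy(column, s))
# 0.0 3
# 0.2 2
# 0.5 2
# 1.0 1
```

Here the ratios are 1.0 for A, 0.5 for C and 0.1 for G, so G drops out as soon as the strictness exceeds 0.1 and C drops out above 0.5.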

13 Rose et al Nucleic Acids Research, 26:

14 More details, sources and things to do for next class
Sources: Rose TM, Schultz ER, Henikoff JG, Pietrokovski S, McCallum CM & Henikoff S. "Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly-related sequences". Nucleic Acids Research, 26: (1998).