Learning Objectives:
Understand the basic differences between genomic and cDNA libraries.
Understand how genomic libraries are constructed.
Understand the purpose of having overlapping DNA fragments in genomic libraries and how they are generated.
Understand how cDNA libraries are constructed and the use of reverse transcriptase in their construction.
Understand the rationale for library screening.
Understand the method of plaque hybridization.
Understand the four methods for library screening and when each is used.
Molecular cloning in bacterial cells. This strategy can be applied to genomic DNA as well as cDNA.
Library construction
Two types of libraries:
  A genomic library contains fragments of genomic DNA (genes).
  A cDNA library contains DNA copies of cellular mRNAs.
Both types are usually cloned in bacteriophage vectors.
Construction of a genomic library
Vector DNA (bacteriophage lambda):
  Lambda has a linear double-stranded DNA genome.
  The left and right arms are essential for the phage replication cycle.
  The internal fragment, flanked by BamHI sites, is dispensable for phage growth.
[Figure: lambda genome showing the left arm, the internal fragment between two BamHI sites, and the right arm]
Vector: cut with BamHI (6-base cutter) and remove the internal fragment, leaving the left arm and right arm. BamHI-cut ends:
    5' NNG        GATCCNN 3'
    3' NNCCTAG        GNN 5'
Human genomic DNA (isolated from many cells): cut with Sau3A (4-base cutter), which has ends compatible with BamHI, and isolate ~20 kb fragments. Sau3A-cut ends:
    5' NNN        GATCNNN 3'
    3' NNNCTAG        NNN 5'
Combine the left arm, right arm and genomic fragments and treat with DNA ligase.
Package into bacteriophage and infect E. coli.
The result is a genomic library of human DNA fragments in which each phage contains a different human DNA sequence.
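The compatibility of Sau3A and BamHI ends can be checked with a small sketch. The recognition sites and cut positions (BamHI G^GATCC, Sau3A ^GATC) are standard; the helper names and the test sequence are made up for illustration.

```python
# Recognition sequences, written 5'->3' on the top strand.
BAMHI_SITE = "GGATCC"   # BamHI cuts G^GATCC (after the first G)
SAU3A_SITE = "GATC"     # Sau3A cuts ^GATC (before the G)


def find_sites(seq, site):
    """Return the 0-based start position of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def overhang(site, cut_offset, length=4):
    """Return the single-stranded 5' overhang left after cutting at cut_offset."""
    return site[cut_offset:cut_offset + length]


# Hypothetical sequence containing one BamHI site and two Sau3A sites.
seq = "AAGGATCCTTGATCAA"
bamhi_positions = find_sites(seq, BAMHI_SITE)
sau3a_positions = find_sites(seq, SAU3A_SITE)

bamhi_overhang = overhang(BAMHI_SITE, 1)
sau3a_overhang = overhang(SAU3A_SITE, 0)
compatible = bamhi_overhang == sau3a_overhang
```

Every BamHI site contains a Sau3A site, so Sau3A cuts far more often, but both enzymes leave the same four-base GATC overhang, which is why Sau3A fragments ligate into BamHI-cut lambda arms.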
Partial restriction enzyme digestion allows cloning of overlapping fragments
Isolating ~20 kb fragments provides optimally sized DNAs for cloning in bacteriophage.
Partial digestion with a frequent cutter (4-base cutter) produces overlapping fragments, since not every site is cut in every DNA molecule.
Overlapping fragments ensure that all sequences in the genome are cloned.
Overlapping fragments allow larger physical maps to be constructed, as contiguous chromosomal regions (contigs) are assembled from the sequence data.
Number of clones needed to fully represent the human genome (3 × 10^9 bp), assuming ~20 kb fragments:
  theoretical minimum = ~150,000
  99% probability that every sequence is represented = ~800,000
[Figure: a contig]
[Figure: all possible restriction sites compared with the results of a partial digestion; the legend distinguishes uncut from cut sites]
Making a genomic library
The partial digest is one of the most important steps. Why? Because it produces overlapping DNA fragments.
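A minimal sketch of why a partial digest gives overlapping fragments. It assumes a hypothetical 80 kb linear molecule with five restriction sites; two copies of the molecule are cut at different subsets of those sites.

```python
def fragments(length, cut_positions):
    """Return (start, end) intervals produced by cutting a linear molecule."""
    bounds = [0] + sorted(cut_positions) + [length]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def overlaps(frag_a, frag_b):
    """True if two (start, end) intervals share at least one position."""
    return frag_a[0] < frag_b[1] and frag_b[0] < frag_a[1]


# Hypothetical site positions (in kb) on an 80 kb molecule.
all_sites = [10, 25, 40, 55, 70]

# In a partial digest, each copy of the genome is cut at only some sites.
copy_1 = fragments(80, [25, 55])
copy_2 = fragments(80, [10, 40, 70])

# The 25-55 kb fragment from copy 1 overlaps two fragments from copy 2.
middle = copy_1[1]
overlapping = [f for f in copy_2 if overlaps(middle, f)]
```

Here the 25-55 kb fragment from one copy overlaps the 10-40 kb and 40-70 kb fragments from the other, so the clones can be ordered into a contig. A complete digest would cut every copy at all five sites and produce only identical, non-overlapping fragments.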
The production of a cDNA library
Construction of a cDNA library
Reverse transcriptase makes a DNA copy of an RNA.
The life cycle of a retrovirus depends on reverse transcriptase:
1. The virus enters the cell and loses its envelope.
2. The capsid is uncoated, releasing genomic RNA and reverse transcriptase.
3. Reverse transcriptase makes a DNA copy of the RNA.
4. It then copies the DNA strand to make double-stranded DNA, removing the RNA with RNase H.
5. The DNA is integrated into the host cell genome, where it is transcribed by host RNA polymerase II.
6. The RNA is translated into viral proteins, which are assembled into new virus particles.
cDNA library construction
1. Start with mRNA (all mRNAs in the cell), each ending in a 3' poly(A) tail:
    5' ----------AAAAA 3'
2. Anneal oligo(dT) primers 12-18 bases in length to the poly(A) tail:
    5' ----------AAAAA 3'
               3' TTTTT 5'
3. Add reverse transcriptase and dNTPs to synthesize the first cDNA strand:
    5' ----------AAAAA 3'   mRNA
    3' ----------TTTTT 5'   cDNA
4. Add RNase H (specific for the RNA strand of an RNA-DNA hybrid) and carry out a partial digestion.
5. The short RNA fragments that remain serve as primers for second-strand synthesis using DNA polymerase I.
6. DNA polymerase I removes the remaining RNA with its 5' to 3' exonuclease activity and continues synthesis.
7. DNA ligase seals the gaps, giving double-stranded cDNA:
    5' ----------AAAAA 3'
    3' ----------TTTTT 5'
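The strand relationships in these steps can be sketched as simple string operations. This models only base pairing, not the enzymology; the short mRNA sequence is made up for illustration.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """DNA complementary to the mRNA, written 5'->3' (reverse transcription)."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand(cdna):
    """DNA complementary to the first strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


# Hypothetical mRNA with a short poly(A) tail.
mrna = "AUGGAAUUUAAAAA"
first = first_strand_cdna(mrna)
second = second_strand(first)
```

The first strand begins with the oligo(dT) primer sequence (TTTTT), and the second strand has the same sequence as the mRNA with T in place of U.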
EcoRI linkers are ligated to both ends of the double-stranded cDNA using DNA ligase:
    5' AATTCNNNNNNNN----------AAAAANNNNNNNNG     3'
    3'     GNNNNNNNN----------TTTTTNNNNNNNNCTTAA 5'
Double-stranded cDNA copies of mRNA with EcoRI cohesive ends are now ready to ligate into a bacteriophage lambda vector cut with EcoRI.
Combine the cDNAs with lambda arms (left arm and right arm, cut at the EcoRI sites) and treat with DNA ligase.
Package into bacteriophage and infect E. coli.
The result is a cDNA library in which each phage contains a different human cDNA.
DNA libraries
Genomic DNA libraries contain introns and exons, promoters, and other non-coding sequences.
They are usually made with 4-base cutters, which cut frequently (about every 256 bases on average, since 4^4 = 256).
Overlapping sequences are produced by partial digestion.
Library complexity is important: it determines whether the sequence you are looking for is present in the DNA that has been sampled.
$N = \ln(1 - P) / \ln(1 - f)$
where $N$ = number of clones, $P$ = probability that the DNA fragment is found in your library, and $f$ = the frequency of the DNA in your library.
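Genomic DNA complexity
When screening for a clone in a library, you usually want a 99% probability that your clone is present.
The frequency $f$ is the size of the DNA fragment in the library divided by the size of the haploid genome.
For a lambda library, the average insert size is 17 kb ($1.7 \times 10^{4}$ bp). The size of the genome is $3 \times 10^{9}$ bp.
$f = \dfrac{1.7 \times 10^{4}}{3 \times 10^{9}} \approx 0.56 \times 10^{-5}$ (truncated)
$N = \dfrac{\ln(1 - 0.99)}{\ln\left(1 - \dfrac{1.7 \times 10^{4}}{3 \times 10^{9}}\right)} = \dfrac{\ln 0.01}{\ln(1 - 0.56 \times 10^{-5})} = \dfrac{-4.6051702}{-0.0000056} = 822{,}351$ clones
Without truncating $f$, $N$ comes to about 812,700; either way, roughly 800,000 clones are needed.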
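The calculation can be reproduced with a short sketch. The function name is an assumption for illustration. It uses the unrounded value of f, so the result comes out slightly lower than the hand calculation, which rounds f to 0.56 × 10^-5.

```python
import math


def clones_needed(p, insert_bp, genome_bp):
    """Number of clones N = ln(1 - P) / ln(1 - f), rounded up.

    p         -- desired probability that a given sequence is in the library
    insert_bp -- average insert size in base pairs
    genome_bp -- haploid genome size in base pairs
    """
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - p) / math.log(1 - f))


# 17 kb average insert, 3 x 10^9 bp genome, 99% probability.
n_99 = clones_needed(0.99, 1.7e4, 3e9)

# Lowering the required probability reduces the number of clones needed.
n_90 = clones_needed(0.90, 1.7e4, 3e9)
```

The two calls show how strongly the required library size depends on P: asking for 99% rather than 90% confidence doubles the number of clones, because ln(0.01) is twice ln(0.1).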
Genome equivalents
How many genome equivalents are there in this library? How do you calculate this?
Multiply the number of clones by the average insert size: $822{,}351 \times 1.7 \times 10^{4}$ bp $= 1.40 \times 10^{10}$ bp.
Divide by the genome size: $1.40 \times 10^{10}$ bp $/\ 3.0 \times 10^{9}$ bp $\approx 4.66$ genome equivalents.
How many positives will you get if you screen for a single-copy gene?
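The same arithmetic as a minimal sketch; the function name is an assumption for illustration.

```python
def genome_equivalents(n_clones, insert_bp, genome_bp):
    """Total cloned DNA divided by the haploid genome size."""
    return n_clones * insert_bp / genome_bp


# 822,351 clones with 17 kb average inserts, 3 x 10^9 bp genome.
coverage = genome_equivalents(822_351, 1.7e4, 3e9)
```

Because the library holds between four and five copies of the genome, a probe for a single-copy gene would be expected to pick up, on average, about that many positive clones.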
Insertional mutagenesis
All vectors in current use include a system that can either identify or select for vectors containing cloned inserts. This is the backbone of recombinant DNA technology. In early vectors, the insert was cloned into an antibiotic resistance gene, so that bacteria carrying a vector with a DNA insert became sensitive to that antibiotic. This is not the best situation. Why?
Insertional mutagenesis II
Using the beta-galactosidase gene as the insertional mutagenesis target allows all clones to be screened for inserts with a simple blue/white color assay. The enzyme encoded by this gene cleaves X-gal (a chromogen) to give a blue dye that colors the bacterial colony or phage plaque. Plasmids or phage particles whose insert disrupts the target gene give white colonies or plaques and can be picked out directly.
Insertional mutagenesis III
In addition, suppressor tRNA genes can be used to identify YACs that contain an insert. The suppressor tRNA suppresses the effects of an ade2 ochre mutation, which gives a white yeast colony. When the tRNA gene is disrupted by an insert, the colonies are pink because a precursor of adenine accumulates. Pink colonies are the ones desired. See Figure 4.16.
Clones are usually characterized first by restriction digestion. This DNA fragment was digested with various enzymes, giving rise to fragments of specific sizes. These can be used to generate a restriction map.
Vectors for library construction
Plasmid vectors
Small circles of DNA that contain a selection marker such as antibiotic resistance.
An insertional mutagenesis target with a multiple cloning site.
A variety of DNA replicons (bacterial, yeast).
Maximum insert size is about 10 kb.
Lambda and cosmid vectors
Bacteriophage lambda can be used as a cloning vector. It has a genome of about 50 kb of linear DNA. Its life cycle is conducive to use as a cloning vector: the lytic cycle can be supported by only a portion of the genes found in the lambda genome.
Lambda life cycle
The lytic life cycle produces phage particles immediately. The lysogenic life cycle requires genes in the middle of the genome, which can be replaced.
Lambda insertion and replacement vectors
Only DNA of 37 to 52 kb can be packaged into the lambda head. Packaging can be done in vitro. Because the middle portion of the lambda genome can be replaced when only the lytic cycle is used, up to 23 kb of DNA can be inserted into the lambda genome. These replacement vectors are used for genomic DNA libraries. Insertion vectors can hold up to 7 kb of cDNA.
Lambda genome
In vitro Packaging of ligated lambda DNA.
Cosmid vectors
A cosmid is a hybrid between a lambda vector and a plasmid. The cos sites are the only sequences required for lambda DNA packaging. Therefore, if cos sites are ligated about 50 kb apart, the ligation products can be packaged in vitro. Cosmid vectors can therefore carry inserts of 33 to 45 kb.
Cosmid vector ligation
Making a Genomic Library with Cosmids
[Figure labels: partial digest of genomic DNA; cosmid vector (21.5 kb) carrying TetR and cos; ligation into the EcoRI site]
Final Steps of a Genomic Library
Package into heads and plug with tails.
Transduce E. coli recipient cells.
Select white colonies with TetR.
Check for the plasmid.
Screen in your mutant for phenotype restoration.
Things You Should Remember
Some plasmids are used as vectors or cloning vehicles, but they are limited in the amount of DNA that can be cloned.
A cosmid is a plasmid that has at least one cos (cohesive end) site. The cos site comes from a bacteriophage.
A genomic library contains at least one copy of every gene in an organism.
Cloning large DNA fragments
Because the human genome is large, many genes are very large, and some DNA fragments cannot be replicated in lambda, other vector systems needed to be developed.
Bacterial artificial chromosome (BAC) vectors
These vectors are based on the E. coli F factor. They are maintained at 1-2 copies per cell and can hold > 300 kb of insert DNA. The main problem is low DNA yield from host cells, due to the low copy number compared with 300 copies per cell for a plasmid vector like pUC19.
Cloning large DNA fragments II
Bacteriophage P1
These vectors are like lambda and can hold 110 to 115 kb of DNA. This DNA can then be packaged by the P1 phage protein coat. T4 in vitro packaging systems can enable the recovery of 122 kb inserts. See Figure 4.15.
Bacteriophage P1 vector system.
Cloning large DNA fragments III
Yeast artificial chromosomes (YACs)
Many DNA fragments cannot be propagated in bacterial cells. Yeast artificial chromosomes can be built from a few specific components:
1. Centromere
2. Telomere
3. Autonomously replicating sequence (ARS)
Genomic DNA is ligated between two telomeres, and the ligation products are transformed into yeast cells using the spheroplast method.
YAC cloning system
Cloning systems Vector systems that can be used to clone DNA
Plaque hybridization
This is a general technique required for a number of specific approaches for isolating cDNA or genomic clones.
Generally, one starts by:
  1). Isolating a cDNA sequence from a cDNA library, then
  2). Isolating the gene from a genomic library using the cDNA as a probe.
Information gained from cDNA and genomic clones:
  1). cDNA clones provide the amino acid sequence of the full-length protein, unencumbered by intron sequences.
  2). Genomic clones provide the control regions and are required for searching for mutations.
Library screening: four experimental approaches
Starting with a protein:
  1). Synthetic oligonucleotide - plaque hybridization
  2). Antibody - variation of plaque hybridization
Starting with mRNA:
  1). Differential cDNA library screening - yet another variation
  2). Expression screening - does not utilize plaque hybridization
Library screening: plaque hybridization
Plate the phage library on a lawn of E. coli (bacteria >>> phage).
Plaques form as a consequence of a spreading lytic infection starting with a single phage-infected bacterial cell.
Each phage plaque is a clone of identical recombinant phage.
Prepare a replica of the phage plaques and hybridize the DNA with a probe.
The E. coli lawn is grown on an agar plate and then overlaid with the recombinant phage library. Wherever a single bacteriophage particle infects a bacterial cell, a plaque will form. This is a clear area caused by the lysis of bacteria on the lawn of E. coli.
A replica of the agar plate is made on a nitrocellulose sheet. The DNA is denatured and adheres to the nitrocellulose.
The nitrocellulose is hybridized with a labeled DNA probe (such as an oligonucleotide) and then exposed to X-ray film.
A spot on the film indicates a plaque containing the DNA of interest.
[Figure: a 32P-labeled probe in solution (e.g. an oligo probe) hybridizes to the immobilized DNA, forming a double-stranded region]
How does one isolate a gene for an inherited disorder? Start with a candidate protein
DNA -> protein
If a protein candidate has been identified for a genetic disease, it can be used to make a probe to screen for the gene.
1. Oligonucleotide probe
  Purify the protein of interest.
  Partially sequence the protein.
  Find a region having amino acids with the fewest possible codons.
  Predict a DNA sequence that could represent a gene region encoding a portion of the protein.
  Synthesize a set of degenerate oligonucleotides for that region.
  Hybridize the labeled oligonucleotide to the phage library.
Amino acids and all possible oligonucleotides:
    Met   Glu   Phe   Tyr   Ile   Cys   Gln   Lys
    AUG   GAA   UUU   UAU   AUU   UGU   CAA   AAA
            G     C     C     C     C     G     G
                                A
1 × 2 × 2 × 2 × 3 × 2 × 2 × 2 = 192-fold degenerate
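The 192-fold degeneracy can be checked by counting codons per amino acid. The sketch below lists the standard genetic code codons only for the eight residues in this peptide; the function and variable names are assumptions for illustration.

```python
from itertools import product
from math import prod

# Standard genetic code codons for the residues in Met-Glu-Phe-Tyr-Ile-Cys-Gln-Lys.
CODONS = {
    "Met": ["AUG"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["UUU", "UUC"],
    "Tyr": ["UAU", "UAC"],
    "Ile": ["AUU", "AUC", "AUA"],
    "Cys": ["UGU", "UGC"],
    "Gln": ["CAA", "CAG"],
    "Lys": ["AAA", "AAG"],
}


def degeneracy(peptide):
    """Number of distinct coding sequences for a list of amino acids."""
    return prod(len(CODONS[aa]) for aa in peptide)


def all_oligos(peptide):
    """Every possible coding sequence, built by combining one codon per residue."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


peptide = ["Met", "Glu", "Phe", "Tyr", "Ile", "Cys", "Gln", "Lys"]
fold = degeneracy(peptide)
oligos = all_oligos(peptide)
```

Choosing a region rich in Met and Trp (one codon each) and avoiding Leu, Ser and Arg (six codons each) keeps the oligonucleotide pool small, which is why the procedure looks for amino acids with the fewest possible codons.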
2. Antibody probe
  Purify the protein of interest.
  Make an antibody to that protein.
  Construct a cDNA library that expresses recombinant proteins in E. coli.
  Use the antibody to detect the protein made from the cloned cDNA carried by the recombinant phage in E. coli, using the plaque hybridization method modified for antibody probes.
[Figure: lambda vector with left arm, bacterial promoter and Shine-Dalgarno sequence, human cDNA insert, and right arm]
How does one isolate a gene for an inherited disorder? Start with a candidate mRNA
DNA -> mRNA
mRNA candidates can be identified by comparing mRNA populations between normal and abnormal tissues, or by looking for a specific function encoded by the mRNA.
1. Differential cDNA library screening
  Prepare duplicate plaque replica plates.
  Hybridize one with a labeled cDNA probe made to all the mRNAs in the normal cell, and hybridize the other (duplicate) with the corresponding probe from the abnormal cell.
  Differences in cell function should be reflected by differences in the mRNA populations.
  Any plaques showing differential hybridization are candidates.
[Figure: duplicate filters hybridized with cDNA probe from normal cells and with cDNA probe from abnormal cells; the differentially expressed clone shows no hybridization on the abnormal-cell filter]
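The comparison of the two replica filters amounts to a set difference. A minimal sketch, assuming each filter has been scored as a list of plaque numbers that gave a hybridization signal (the plaque numbers here are made up):

```python
def differential_candidates(normal_hits, abnormal_hits):
    """Plaques that hybridize with one probe but not the other."""
    normal, abnormal = set(normal_hits), set(abnormal_hits)
    return {
        "normal_only": sorted(normal - abnormal),
        "abnormal_only": sorted(abnormal - normal),
    }


# Hypothetical scoring: plaque 4 lights up only with the normal-cell probe.
normal_filter = [1, 2, 3, 4, 5, 6, 7]
abnormal_filter = [1, 2, 3, 5, 6, 7]
candidates = differential_candidates(normal_filter, abnormal_filter)
```

Plaques in either list are candidates: a clone seen only with the normal-cell probe may represent an mRNA missing from the abnormal cells, and the reverse may represent one that is abnormally expressed.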
2. Expression screening
  Develop a cell-based functional assay for the abnormality (e.g., a transport assay).
  Construct a cDNA library in a way that allows expression of protein in mammalian cells.
  Inject groups of cDNA clones into cells and assay for function (for example, test the cells for transport activity).
  If the function being assayed is observed, divide the group of clones into smaller groups and retest.
  Continue testing smaller groups until the function is obtained with a single cDNA clone.
[Figure: lambda vector with left arm, mammalian promoter, human cDNA insert, and right arm]
[Figure: pools of cDNA clones scored + or - in the functional assay, with positive pools subdivided from about 10^4 clones down to about 10^2 clones]
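The pool-and-subdivide strategy can be sketched as a simple search. This assumes exactly one clone in the library encodes the function and that the assay gives a clean positive for any pool containing it; the function names, the pool count of 10, and the library of 10,000 numbered clones are illustrative assumptions.

```python
def sib_select(clones, has_activity, n_pools=10):
    """Subdivide the positive pool until a single clone remains.

    clones       -- identifiers of all clones in the starting pool
    has_activity -- function that takes a pool and returns True if the
                    assayed function is observed (stands in for the assay)
    n_pools      -- number of sub-pools made at each round (assumed)
    Returns the positive clone and the number of assay rounds used.
    """
    pool = list(clones)
    rounds = 0
    while len(pool) > 1:
        size = -(-len(pool) // n_pools)  # ceiling division
        subpools = [pool[i:i + size] for i in range(0, len(pool), size)]
        # Keep the first sub-pool that scores positive in the assay.
        pool = next(sp for sp in subpools if has_activity(sp))
        rounds += 1
    return pool[0], rounds


# Hypothetical library of 10,000 clones in which clone 4242 is the active one.
library = range(10_000)
clone, rounds = sib_select(library, lambda pool: 4242 in pool)
```

With ten sub-pools per round, a pool of 10^4 clones shrinks to 10^3, 10^2, 10 and then 1, so four rounds of assays are enough to reach a single clone.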
Expression Cloning
Certain vector systems can be used to produce specific products.
The type of expression product:
  RNA (riboprobes)
  Protein product
The type of environment:
  In vitro (cell free)
  In vivo (mammalian or prokaryotic cells)
Purpose of the expression system:
  To produce large quantities of protein for protein studies or antibody production.
cDNA expression libraries
The gene for a specific protein can be cloned from an expression cDNA library if an antibody to the protein is available. A variety of vectors can be used to produce fusion proteins, which can be detected with the antibody in question. See Figure 4.18.
Expression for Ab detection
Expression in Eukaryotic cells
Many proteins need specific modifications to work properly, so expression in bacterial cells is not sufficient. Plasmid-based eukaryotic expression systems that work after transient transfection into mammalian cell lines have been produced. Viral-based systems are also popular.