Chapter 4. The Genomic Biologist's Toolkit

Contents
4. The Genomic Biologist's Toolkit
4.1. Restriction Endonucleases: making sticky ends
4.2. Cloning Vectors
  4.2.1. Simple Cloning Vectors
  4.2.2. Expression Vectors
  4.2.3. Shuttle Vectors
  4.2.4. Phage Vectors
  4.2.5. Artificial Chromosome Vectors
4.3. Methods for Sequence Amplification
  4.3.1. Polymerase Chain Reaction
  4.3.2. Cloning Recombinant DNA
  4.3.3. Cloning DNA in Expression Vectors
  4.3.4. Making Complementary DNA (cDNA)
  4.3.5. Cloning a cDNA Library
4.4. Genomic Libraries
  4.4.1. Cloning in YAC Vectors
  4.4.2. Cloning in BAC Vectors

CONCEPTS OF GENOMIC BIOLOGY

CHAPTER 4. THE GENOMIC BIOLOGIST'S TOOLKIT

Genomic biology has three important branches: structural genomics, comparative genomics, and functional genomics. The goals of these branches are, respectively, the sequencing of genes and genomes; the comparison of those sequenced genes and genomes; and an understanding of how genes and genomes work to produce the complex phenotypes of all organisms.

A set of molecular genetic technologies has been, and remains, critical to our ability to pursue these goals. The Genomic Biologist's Toolkit provides a brief introduction to these critical tools and to how they are used in the investigation of genomes. While the techniques are intrinsically laboratory tools, what they can do and how they work can be readily studied using bioinformatic resources.

4.1. RESTRICTION ENDONUCLEASES

Restriction endonucleases (restriction enzymes) each recognize a specific DNA sequence (a restriction site) and break a phosphodiester linkage between a 3′ carbon and a phosphate within that sequence. Restriction enzymes are used to create DNA fragments for cloning and to analyze the positions of restriction sites in cloned or genomic DNA.

A specific restriction enzyme cuts DNA at the same sites in every molecule if the digestion is allowed to go to completion. Thus, all copies of a genome, or of any other long sequence, can be reproducibly cut into identical fragments.

The first three letters of the name of a restriction enzyme are derived from the genus and species of the organism from which it was isolated. Additional letters often denote the bacterial strain from which the enzyme was isolated, and if multiple enzymes are isolated from the same strain, they are given Roman numerals. For example, the restriction enzyme EcoRI is the first enzyme isolated from the RY13 strain of Escherichia coli.
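The reproducibility of a complete digest can be illustrated with a minimal Python sketch. It assumes a linear molecule written as its top strand and uses the EcoRI site GAATTC, cut after the first G. The function names and the 26-bp test sequence are hypothetical and chosen only for illustration.

```python
def find_sites(seq, site):
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq, site, cut_offset):
    """Cut a linear top-strand sequence at every site and return the fragments.

    cut_offset is the number of bases into the site at which the top
    strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"

# Hypothetical 26-bp molecule carrying two EcoRI sites.
dna = "AAA" + ECORI_SITE + "TGTGTGT" + ECORI_SITE + "TTTT"

fragments = digest_linear(dna, ECORI_SITE, cut_offset=1)
print([len(f) for f in fragments])  # [4, 13, 9]
```

Because the cut positions depend only on the sequence, every copy of the same molecule yields the same three fragments, which is the property that makes restriction digests useful for cloning and mapping.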
Bacteria produce restriction endonucleases to defend against bacteriophages (viruses), and each restriction enzyme recognizes a specific DNA sequence at which it cuts the DNA strands (see Table 4.1 and Figure 4.1). The recognition sites for a given enzyme are often rare in the genome of the bacterium that produces it, but abundant in the genome of the bacteriophage. In addition, the host cell can modify its own DNA by methylation, which prevents the host's restriction enzymes from degrading host DNA, while invading bacteriophage DNA is unmethylated and readily degraded.

Table 4.1. Characteristics of Some Restriction Enzymes

Figure 4.1. Restriction site sequences and cut locations of: a) SmaI; b) BamHI; and c) PstI.

Many restriction sites are 4, 6, or 8 base pairs in length and have identical sequences when read from 5′ to 3′ on each strand. These are referred to as palindromic DNA sequences. Other restriction sites are not completely symmetrical and/or differ in length from 4, 6, or 8 nucleotide pairs (Table 4.1 and Figure 4.1).

As shown in Figure 4.1, the nature of the fragment ends produced by a restriction enzyme can vary. Some enzymes produce fragments in which the two strands are equal in length; these are referred to as blunt ends. Other enzymes produce fragments in which the two strands are unequal in length; these are referred to as either 5′ sticky ends or 3′ sticky ends. Overhanging sticky ends provide a basis for combining DNA fragments produced by the same restriction enzyme from different DNA sources. This process was the original method used to produce recombinant DNA molecules.

The application of restriction endonucleases to the cloning of DNA is further discussed in the DNA Cloning video. Part of this video will be discussed in detail in the next section of the Genomic Biologist's Toolkit, but its first part is a good demonstration of how restriction enzymes work and how they can be used to create recombinant DNA molecules for cloning DNA.

Figure 4.2. Using the restriction enzyme EcoRI to make recombinant DNA. The procedure relies on the 5′-overhanging sticky ends.

Note that we have previously discussed SNPs as a type of Sequence Tagged Site (STS). Because an SNP is a single-nucleotide change in the genome sequence, consider the effect of an SNP that happens to occur in a restriction endonuclease recognition site. The result would be the loss of the restriction site at that SNP. The site would no longer be cut by the enzyme, and thus fragments of different sizes would be produced. This is called a Restriction Fragment Length Polymorphism (RFLP). Thus, an RFLP is an SNP that happens to occur in a restriction site in the DNA. A famous RFLP is associated with Sickle Cell Disease and is further described in the accompanying video.

An additional application of restriction enzymes is the production of a restriction map. A restriction map shows the relative positions of restriction sites for multiple restriction enzymes in a piece of linear or circular DNA. Prior to the availability of genomic sequences, restriction mapping was an important tool used to characterize cloned DNA fragments. The production of a restriction map for a circular DNA is shown in the Restriction Mapping video.

4.2. CLONING VECTORS

DNA cloning is a set of experimental methods in molecular biology used to assemble recombinant DNA molecules and to direct their replication within host organisms. The word cloning refers to the fact that the method involves the replication of one molecule to produce a population of cells with identical DNA molecules.
Molecular cloning generally uses DNA sequences from two different organisms: 1) the organism that is the source of the DNA to be cloned, and 2) the organism that will serve as the living host for replication of the recombinant DNA. Molecular cloning methods are central to many areas of biology, biotechnology, and medicine, including DNA sequencing.

The DNA that carries the cloned sequence into the host organism, called a vector, typically has three features:

1) Sequences necessary to produce recombinant DNA and facilitate entry into the host organism. Typically, these are one or more unique restriction sites. Unique in this context means that the restriction site permits cutting the vector at only one location. Most vectors contain unique restriction sites for a number of different restriction enzymes, grouped in a region called a polylinker or multiple cloning site (MCS), which makes the vector much easier to use.

2) An origin of replication for the host organism, to facilitate replication of the recombinant DNA in the host cell. Typically this sequence controls the number of copies of the vector that can be made in one cell.

3) A selectable marker: a gene that can be expressed in the host and that makes it possible to identify cells containing the vector. Often the selectable marker is a gene that makes cells resistant to a specific antibiotic or that permits cells to make an amino acid required for growth.

All modern cloning vectors meet these basic requirements, but beyond them a number of additional features can make specific vectors useful for particular purposes. Thus, several types of cloning vectors have been constructed, each with different molecular properties and cloning capacities.

4.2.1. Simple Cloning Vectors

The most common vectors are used to clone recombinant DNA in bacterial cells, typically E. coli. Simple cloning vectors are constructed from plasmids, which are common in many bacterial cells.
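The requirement from Section 4.2 that a cloning site cut the vector at only one location can be checked computationally for a circular plasmid. The sketch below is illustrative only: the toy plasmid sequence and the function names are hypothetical, while the four recognition sequences are the standard ones for EcoRI, BamHI, PstI, and SmaI. Because these sites are palindromic, searching one strand is sufficient.

```python
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
}


def count_sites_circular(plasmid, site):
    """Count occurrences of `site` in a circular sequence.

    The first len(site) - 1 bases are appended to the end so that a site
    spanning the arbitrary start/end junction of the written sequence is
    also found.
    """
    wrapped = plasmid + plasmid[: len(site) - 1]
    count = 0
    start = wrapped.find(site)
    while start != -1:
        count += 1
        start = wrapped.find(site, start + 1)
    return count


def unique_cutters(plasmid, enzymes):
    """Return the names of enzymes that cut the circular plasmid exactly once."""
    return sorted(
        name for name, site in enzymes.items()
        if count_sites_circular(plasmid, site) == 1
    )


# Hypothetical toy plasmid: one BamHI site, two PstI sites, no SmaI site,
# and one EcoRI site that spans the junction (ends in GAA, starts with TTC).
toy_plasmid = (
    "TTC" + "ACGTACGT" + "GGATCC" + "ACACAC"
    + "CTGCAG" + "TTTT" + "CTGCAG" + "AAAA" + "GAA"
)

print(unique_cutters(toy_plasmid, ENZYMES))
```

In this toy example only EcoRI and BamHI would be suitable single-cut cloning sites; PstI would cut the vector into two pieces, and SmaI would not cut it at all.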
In fact, plasmids are circles of double-stranded DNA (dsDNA), much smaller than the bacterial chromosome, that naturally carry DNA between different bacteria. They include the replication origin (ori sequence) needed for replication in bacterial cells. An example of a typical E. coli cloning vector is pUC19 (2,686 bp). The more modern version of pUC19 is pBluescript II. The features of these plasmids are shown in Figure 4.3. More information about cloning DNA in plasmid vectors can be found in Molecular Cell Biology, 4th edition, Section 7.1, which is available from NCBI.

Figure 4.3. The features of pUC19 and pBluescript II include: 1) High copy number in E. coli, with nearly 100 copies per cell, which provides a good yield of cloned DNA. 2) The selectable marker is ampR (ampicillin resistance). 3) A cluster of unique restriction sites, called the polylinker (multiple cloning site). 4) The polylinker is part of the lacZ (β-galactosidase) gene. The plasmid will complement a lacZ− mutation, allowing the cell to become lacZ+. When DNA is cloned into the polylinker, lacZ is disrupted, preventing complementation of the lacZ− mutation. 5) X-gal is a chromogenic analog of lactose that turns blue when β-galactosidase is present and remains white in its absence, so blue-white screening can indicate which colonies contain recombinant plasmids (white colonies).

The use of simple cloning vectors to clone recombinant DNA made via DNA restriction and overhanging sticky ends can be seen in the Steps in DNA Cloning video. Using simple cloning to obtain a collection of clones that represents all the sequences that can be cut from a longer piece of DNA is called creating a clone library (see video). Libraries can be useful in several ways. One of these is to create an expression library, in which each clone makes a specific protein. This requires an expression vector.

4.2.2. Expression Vectors

Expression vectors contain all of the elements that simple cloning vectors contain, i.e. an ori, a selectable marker, and a multiple cloning site (MCS); but the MCS is flanked by a promoter sequence and a terminator sequence that work in the host organism. This permits the cloned sequence to be transcribed and, if the vector contains a Shine-Dalgarno sequence (not shown in Figure 4.4), to be translated into a protein, provided there are start and stop codons in the sequence.

Figure 4.4. An example of a simple expression vector.

Note that Figure 4.4 illustrates how the cloned sequence can insert randomly in either of two orientations. However, only one of the orientations will produce a translatable mRNA. The other orientation will produce an RNA that is the complementary strand of the mRNA (called an antisense RNA). Section 4.3.3 considers how to deal with this issue.

4.2.3. Shuttle Vectors

A cloning vector capable of replicating in two or more types of organism (e.g., E. coli and yeast) is called a shuttle vector. Shuttle vectors may replicate autonomously in both hosts, or integrate into the host genome.

Figure 4.5. Shuttle vectors like pRS426 can be used to move cloned DNA into two different organisms. In this case, the plasmid moves into E. coli and yeast. Note that the vector contains an origin of replication for yeast (yeast 2µ ARS) and for E. coli (ori), a selectable marker gene for E. coli (ampR) and for yeast (URA3, which allows growth without uracil, unlike the yeast strain used, which requires it), and a multiple cloning site with a yeast promoter and terminator on either side. Thus, this shuttle vector can work in both E. coli and yeast.

4.2.4. Phage Vectors

Besides plasmid-based simple cloning vectors, there are a number of other vectors that are not based on plasmids. These often have specific uses that take advantage of their unique properties. Among the non-plasmid vectors, bacteriophage λ vectors (shown in Figure 4.6) are among the most frequently used. Phage λ vectors can be used to make expression libraries and are convenient for selecting clones, because the bacteriophage lyses cells, releasing the contents of the cell into the medium. Thus RNAs and proteins derived from the inserted fragment can be investigated using these vectors.

Figure 4.6. Phage λ Vector.

4.2.5. Artificial Chromosome Vectors

The typical simple cloning vector will accommodate DNA fragments up to about 3,000 bp in length. However, there is often a need to clone significantly longer fragments of DNA for study. Typically, genomic DNA sequencing is easiest with the longest fragments possible. Two vector systems, BAC vectors (bacterial artificial chromosomes) and YAC vectors (yeast artificial chromosomes), are useful choices for cloning long DNA fragments. In BACs, fragments up to about 350 kbp (350,000 bp) can be cloned, while in YACs, fragments up to 1,000,000 bp have been reported. Both of these vector systems were used in the original human genome sequencing project. However, it was found that YACs are relatively unstable, meaning that they frequently self-modified, losing DNA in the process, and thus they did not have the stability shown by BACs. Consequently, BACs have emerged as the large cloning vector of choice.

Figure 4.7. Artificial chromosome vectors. a) A bacterial artificial chromosome (BAC) that has a selectable marker (chloramphenicol resistance) and an MCS. However, the ori sequence is replaced by a single-copy F factor origin of replication. b) A yeast artificial chromosome (YAC), including selectable markers (TRP1 and URA3), a yeast origin of replication (ARS), and centromere and telomere chromosome parts. This vector will replicate in yeast cells.

4.3. METHODS FOR SEQUENCE AMPLIFICATION

With our discussion of restriction endonucleases and cloning vectors completed, we are now ready to put these concepts together and show how specific DNA sequences can be amplified for use in genetic and genomic studies.

4.3.1. Polymerase Chain Reaction (PCR)

The Polymerase Chain Reaction (PCR) is a method by which DNA polymerase can be used to make many copies of a DNA sequence in a test tube. The technique is a valuable supplement to DNA cloning for generating specific DNA sequences for use as reagents. A description of the PCR process is given in the Polymerase Chain Reaction video.

Some additional things to note: the reaction temperature is changed using a device called a thermal cycler, which can rapidly change temperatures during each cycle. The reaction mixture must have all the components necessary for a PCR reaction, including a thermostable DNA polymerase such as the Taq DNA polymerase mentioned in the video. Such DNA polymerases are obtained from organisms called extremophiles that grow in very hot water, like that found in geysers (e.g. Old Faithful in Yellowstone National Park) or in thermal vents on the floor of the ocean. The reaction also contains the deoxynucleotide triphosphates (dNTPs: dATP, dGTP, dCTP, and dTTP) and the primers that define each end of the sequence to be amplified.

DNA sequences amplified via PCR typically carry an extra A on the 3′ end of the molecule, i.e. a single overhanging 3′-A, which makes ligation of the PCR-amplified fragment into a PCR cloning vector much easier (see Figure 4.8).

Figure 4.8. PCR cloning vectors. Note that the vector comes linearized with overhanging 3′-Ts. PCR products typically have single overhanging As at their 3′ ends. This provides a convenient way of making a circular plasmid with the inserted PCR product. [Panel labels: pGEM-T Easy PCR vector (3015 bp) + PCR-amplified DNA (1191 bp), joined by DNA ligase, gives pGEM-T Easy + PCR-amplified DNA (4206 bp).]

4.3.2. Cloning in a Simple Cloning Vector

DNA cloning is the basis for a number of genomic biology experiments. Large amounts of DNA are needed for analysis, sequencing, and numerous experimental approaches. As we saw above, multiple copies of a known DNA sequence can be made and cloned using PCR and a PCR vector. However, an alternative is necessary when the sequence to be cloned is unknown (i.e. PCR primers cannot be designed). To introduce this principle we will outline the steps to clone a DNA fragment of unknown sequence in a simple cloning vector.

To get multiple copies of a gene or other piece of DNA you must isolate, or cut, the DNA from its source using restriction enzymes, and then paste it into a simple cloning vector that can be amplified in a host cell, typically E. coli. The main steps in this kind of DNA cloning are:

Step 1. DNA is purified from the donor cells using a standard DNA purification technique.
Step 2.
A chosen fragment of DNA is cut from the purified genomic DNA of the source organism using a restriction enzyme.
Step 3. The piece of DNA is pasted into a vector, and the ends of the DNA are joined with the vector DNA by DNA ligase (the enzyme that joins Okazaki fragments, as described in the DNA replication section).
Step 4. The vector is introduced into a host cell, often a bacterium, by a process called bacterial transformation. The transformed host cells copy the vector DNA plus the recombinant DNA along with their own DNA, creating multiple copies of the inserted DNA.
Step 5. The vector DNA is isolated (or separated) from the host cell's DNA and purified.

Figure 4.9. Insertion of restricted DNA into a simple cloning vector.

DNA that has been cut from an organism and pasted into a vector is called recombinant DNA. Because of this, DNA cloning is also called recombinant DNA technology.

4.3.3. Cloning DNA in Expression Vectors

In Section 4.2.2, we discussed expression vectors and showed that when a restricted DNA sequence is cloned in an expression vector, it can be ligated into the vector in either a forward or a reverse orientation (Figure 4.4). In the forward orientation the fragment is positioned so that it makes an mRNA that codes for a protein, while in the reverse orientation the DNA fragment does not make an mRNA, but makes an RNA from the opposite strand called an antisense RNA.

Using a PCR strategy, it is possible to insert a DNA fragment into an expression vector such that it can only insert in the forward orientation. This strategy is shown in Figure 4.10.

Figure 4.10. Using PCR to obtain only the forward orientation of a sequence in an expression vector. Primers are designed with a restriction site added such that they anneal at each end of the fragment of interest. Following PCR, an amplified fragment is produced with a KpnI site at the 5′ end of the intended coding sequence and a SalI site at the 3′ end. The expression vector is then opened by cutting with both KpnI and SalI. Since the KpnI site is closer to the promoter in the expression vector's MCS, while the SalI site is closer to the terminator, this construct will go into the vector in the sense orientation, so that a message is produced that makes the protein of interest rather than its antisense equivalent.

4.3.4. Making Complementary DNA (cDNA)

A double-stranded DNA copy of an mRNA is called a cDNA. Making cDNA is a way to convert a relatively labile single-stranded RNA into a relatively stable double-stranded DNA. It is possible to make a DNA copy of an RNA by employing reverse transcriptase, an enzyme involved in the replication of certain viruses. The other aspect of Eukaryotic mRNAs that makes producing cDNAs relatively easy is the polyA tail, as we will see below. cDNAs can be made in several ways, but the method described here is a traditional one.

Step 1. Total RNA is extracted from cells using a standard technique for the organism in question.
Step 2. An oligo-dT primer is hybridized with the polyA tail of a Eukaryotic mRNA. Then reverse transcriptase (which makes a DNA strand from an RNA strand) is used to make a first-strand DNA copy of the mRNA strand.

Figure 4.11. The process for making cDNA in a simple cloning vector.

Step 3. The RNA is then partially degraded with RNase H, leaving RNA fragments randomly annealed to the newly made DNA strand. These RNA fragments act as primers for DNA polymerase I.
Step 4. DNA polymerase I is then used to make a complementary DNA strand and to replace the RNA primers with DNA nucleotides.
Step 5. All pieces are then ligated together using DNA ligase, completing the synthesis of a double-stranded DNA copy of the mRNA.

At completion of the procedure above you will have prepared a cDNA copy of each mRNA that was present in the cells from which you extracted the RNA. If there were 10,000,000 polyA tails on 10,000,000 mRNAs, you should make 10,000,000 cDNAs. In other words, if the preparation contained 10,000 mRNAs coding for a given protein like myosin, but only 500 mRNAs coding for hexokinase and 10 mRNAs for tyrosyl-tRNA synthetase, you might expect the cDNA library obtained from those cells to contain 10,000, 500, and 10 cDNAs for the three proteins, respectively. The frequency of occurrence of each mRNA is represented by the frequency of its cDNA in the cDNA library obtained from a given set of cells. Thus, information about the frequency of occurrence of mRNAs in cells can be obtained from analysis of such a cDNA library.

A similar cDNA library from different cells (e.g. different tissues, cells treated with a drug, or cells grown in a different environment) will show different levels of each cDNA, based on the mRNAs found in those cells. The frequency of mRNAs found in a tissue is considered information about the expression of a gene. Gene expression information relates directly to the function of the transcription machinery in cells and is critical functional genomic information, as we will see in a subsequent section of the book.

In order to store and subsequently use a cDNA library, it is useful to produce a clone of each sequence in the library.
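The abundance reasoning above, in which cDNA counts in a library mirror mRNA counts in the cells, can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the library is an idealized list that uses the myosin, hexokinase, and tyrosyl-tRNA synthetase counts from the example in the text.

```python
from collections import Counter


def relative_abundance(clone_labels):
    """Return the fraction of clones in a library that carry each cDNA."""
    counts = Counter(clone_labels)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Idealized library built from the counts used in the text.
library = (
    ["myosin"] * 10_000
    + ["hexokinase"] * 500
    + ["tyrosyl-tRNA synthetase"] * 10
)

freqs = relative_abundance(library)

# Ratio of myosin to hexokinase cDNAs (about 20 in this idealized library).
ratio = freqs["myosin"] / freqs["hexokinase"]
print(round(ratio, 2))
```

Running the same calculation on a library from a different tissue or treatment and comparing the fractions gene by gene is the simplest form of the expression comparison described above.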
Typically this involves putting the cDNAs into vectors and putting the vectors into host cells, typically E. coli, such that each cell gets a single cDNA, which is amplified in that cell and all its clones.

4.3.5. Cloning a cDNA Library

A cDNA clone library is a useful tool for identifying specific mRNAs found in a tissue and for obtaining the sequences of identified genes. To create one, all the cDNAs are cloned into a vector, and one vector containing an individual cDNA is put into each cell. These cells can then be screened to determine which clones express genes of interest.

Various types of vectors can be used to create a cDNA clone library. These include phage expression vectors, plasmid expression vectors, and shuttle vectors, depending on the intended use of the clone library. We will look at a protocol for incorporating cDNA into a plasmid expression vector using a simple strategy. Note that kits are now available that provide everything you require and outline specific strategies for most types of vectors, should you ever need to accomplish this task.

Step 1. Prepare a cDNA library as outlined in Section 4.3.4.
Step 2. Manipulate the cDNAs so that each one has a restriction site at both ends that is unique (i.e. not cut anywhere within the cDNA). To do this, the cDNAs are frequently methylated with a specific methyl transferase that adds a methyl group to particular restriction sites, protecting them from the restriction enzyme that will be used later.
Step 3. Ligate a synthetic double-stranded oligonucleotide linker to the ends of each cDNA. The linker should correspond to a restriction site in the MCS of the vector to be used. Blunt-end ligation is generally a low-efficiency process, but by using a high concentration of these synthetic oligonucleotides it is possible to drive the reaction to near completion.
Step 4. Digest the cDNAs, now with internal sites protected and linkers attached, with the restriction enzyme to generate the appropriate overhanging sticky ends.

Figure 4.12. Procedure for inserting a cDNA into a cloning vector, involving ligation of linkers onto the ends of the cDNA. [Panel labels: Step 3, Step 4, Step 5.]

Step 5. Mix the digested cDNAs with the predigested vector, and add DNA ligase to make recombinant cDNA vectors.
Step 6. Transform the recombinant vectors into host cells, and grow up clones.

Once the cDNA clone library has been constructed, a number of strategies can be used to select a specific clone that contains a gene of interest. Figure 4.13 demonstrates how this could be done if antibodies against the protein of interest are available. Figure 4.14 shows a strategy for identifying a specific clone by complementation of a yeast mutant. Note that for this technique the cDNA library was constructed in a yeast shuttle vector.

Figure 4.13. Finding a specific cDNA clone using an expression library. Following transformation of cells with the cDNA expression library, transformants with inserts (white colonies) are selected, replated, and screened with antibodies against the protein of interest. Colonies producing antigenic proteins are then tested for the presence of the protein of interest, and the cDNA insert in that clone is characterized.

Figure 4.14. Strategy for identifying cDNA clones for a gene of interest (ARG1) using cDNAs (high MW DNA) from an ARG1 yeast strain. Note that the cDNAs need to be inserted into a yeast shuttle vector such that the ARG1 gene will be properly expressed and complement the arg1 mutant in the yeast strain used.

Because cDNAs contain only the exons of a gene (the parts that code for proteins), with no introns, a cDNA clone library can be expressed in either Prokaryotic or Eukaryotic cells. However, there are sometimes (but relatively infrequently) complex issues that keep Eukaryotic cDNAs from expressing functional proteins in Prokaryotic cells. When this occurs, the shuttle vector approach is necessary to get a functional protein produced in the library.

cDNA libraries have many uses, but comparing cDNA sequences with the sequences of the corresponding genes is one way of demonstrating the positions of introns and exons in the genomic sequence (see Figure 4.15). By sequencing clones from a cDNA library, so-called expressed sequence tags (ESTs) are determined. The sequences of ESTs were critical to understanding the functional components of genomes as they were being sequenced.

Figure 4.15. Primary RNA Transcript. [Panel labels: DNA (gene); primary RNA transcript; mRNA (cDNA).]

4.4. GENOMIC LIBRARIES

A genomic clone library, or Genomic Library, is a set of cloned sequences made by cloning the entire genome of an organism or organelle. One of several ways this can be done is by cutting the genomic DNA with one or more restriction enzymes and ligating the pieces into a simple cloning vector, as shown in Figure 4.9.

A limitation of simple cloning vectors is the size of the DNA that can be introduced into the cell by transformation. This presents problems when you are trying to create a Genomic Library of a large genome, such as that of most Eukaryotes. Remember that a genomic library contains all of the DNA found in the cells of the organism. If you digest organismal DNA to completion with a restriction enzyme, ligate those fragments into a plasmid vector, and transform bacterial cells, only a portion of those fragments will be represented in the final transformation products. If a gene of interest is larger than the clonable fragment length, then you will not be able to isolate that gene intact from a plasmid library.

What can be done to increase the probability of obtaining a clone that contains the entire gene? First, you need to use a vector that can accept large fragments of DNA. Examples of these are bacteriophage and cosmid vectors, the relatively popular yeast artificial chromosome (YAC) vectors (see Figure 4.7b), and the bacterial artificial chromosome (BAC) vectors (see Figure 4.7a). While longer fragments of genomic DNA can be cloned in YAC vectors, these are less stable than BAC vectors, making BACs the vectors most frequently used for genomic cloning.

4.4.1. Cloning in YAC Vectors

A goal of genomic sequencing is to obtain physical data about the organization of DNA in a genome. Traditionally, this data has been obtained by a technique called chromosome walking. Walking can be performed by subcloning the ends of the DNA inserted in a phage λ vector or cosmid vector and screening a library for new clones that contain the end sequences previously obtained. If a new clone overlaps a portion of the original clone, then the length of the DNA of interest is extended by the length of DNA in the second clone that is not found in the original clone. By performing these steps successive times, a long-distance map can be obtained. To clarify this concept, please view the Chromosome Walking short video.

This technique, though, has difficulties. First, each step is technically slow. Second, if you use phage λ or cosmid clones, you might only extend the region of interest by 5-10 kb in each step of the walk.
Finally, if any of the clones that are obtained contain repeated sequences, the subclone could lead you to another region of the genome that is not contiguous with the region of interest. This is because Eukaryotic genomes have so-called repeated-sequence DNA interspersed throughout.

Yeast artificial chromosomes can alleviate some of these problems because of the large amount of DNA (100-1000 kb) that can be cloned. However, YACs cannot speed up each step of the walk, because the subcloning and screening steps cannot be accelerated. But YACs can easily extend the region of interest by 50-100 kb, and by as much as 500 kb, per walking cycle. Thus a long-distance map of the region can be obtained in several steps. Secondly, although repetitive regions may be 10-20 kb in length, they are rarely longer than 50 kb. Thus a YAC with a 100 kb insert will contain some single-copy region that can be used for further steps in the walk.

While YACs allow the cloning of the largest fragments possible, their relative instability has allowed the more stable BACs, which bear shorter recombinant fragments, to become the vector of choice for chromosome walking and subsequent sequencing.

4.4.2. Cloning in BAC Vectors

During the Human Genome Project, researchers had to find a way to reduce the entire human genome into chunks, as it was too large to be sequenced in one go. To do this they created a store of DNA fragments called a BAC library, specifically a human genome BAC library.

BAC stands for Bacterial Artificial Chromosome. BACs are small pieces of bacterial DNA that can be identified and copied within a bacterial cell and that act as a vector, artificially carrying recombinant DNA into the cell of a bacterium such as Escherichia coli. In general, BAC clones carry inserts of DNA up to 300,000 bp in length. The bacteria are then grown to produce colonies that contain the same fragment of DNA in each cell of the colony. This is a BAC clone library. Individual BAC clone colonies can be stored until needed.

Making a BAC library

To make a genomic Bacterial Artificial Chromosome (BAC) library:

Step 1. Isolate the cells containing the DNA you want to store. For animals, BAC libraries come from white blood cells.
Step 2. Mix the isolated cells with warm agarose, a jelly-like substance. The whole mixture is then poured into a mold and allowed to cool to produce a set of small blocks, each containing thousands of the isolated cells.
Step 3. The cells are then treated with enzymes to dissolve their cell membranes and release the DNA into the agarose gel.
A restriction endonuclease is then used to chop the DNA into pieces around 200,000 base pairs in length (a partial digestion is used, because a complete digestion would produce smaller fragments).
Step 4. These blocks of gel containing the chopped-up DNA are then inserted into holes in a slab of agarose gel. The DNA fragments are then separated according to size by electrophoresis.

Step 5. Fragments of a particular size class (200,000 to 300,000 bp) are selected, removed from the agarose gel, and inserted into a BAC vector using DNA ligase to join the two pieces of DNA together. This produces a set of BAC clones.
Step 6. The BAC clones are added to bacterial cells, usually E. coli, and the bacteria are then spread on nutrient-rich plates that allow only the bacteria that carry BAC clones to grow. The bacteria grow rapidly, resulting in lots of bacterial cells, each containing a copy of a separate BAC clone.
Step 7. After they have grown, the bacteria are picked into plates of 96 or 384 wells so that each well contains a single BAC clone. The bacteria can also be copied or frozen and kept until researchers are ready to use the DNA for sequencing. A BAC library has been created.

Figure 4.16. BAC Vector. Contains blue/white screening capability. Genomic DNA fragments up to 300,000 bp can be ligated into the MCS of the vector, which also contains a selectable marker and a single-copy F origin of replication.

4.5. DNA SEQUENCING