Using Phylogenetic Trees for Disease Diagnosis. submitted in partial fulfillment of the requirements for the degree of


Using Phylogenetic Trees for Disease Diagnosis

Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Technology, Computer Engineering

by Shamsudduha Tabish M Sabir Danish
Roll No:

under the guidance of Mr. Satish S Kumbhar, College of Engineering, Pune

DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATION TECHNOLOGY, COLLEGE OF ENGINEERING, PUNE
June, 2013

DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATION TECHNOLOGY, COLLEGE OF ENGINEERING, PUNE

CERTIFICATE

This is to certify that the dissertation titled Using Phylogenetic Trees for Disease Diagnosis has been successfully completed by Shamsudduha Tabish M Sabir Danish ( ) and is approved for the degree of Master of Technology in Computer Engineering.

Date: June
Place: Pune

Prof. Satish S. Kumbhar, Department of Computer Engg. and Information Technology, College of Engineering Pune, Shivajinagar, Pune

Dedicated to my mother, Smt. Mudassir Danish, and my father, Shri. M. Sabir Danish

Abstract

A phylogenetic tree is a tool for tracking the process of evolution by examining the changes in the genome sequences under study. The tree is a graphical representation of the evolutionary relationships among multiple genes or organisms. In this work we apply this principle of phylogeny to diagnose the disease from which an individual is suffering. In our method, multiple sequence alignment is applied to a set of omic (genomic or proteomic) sequences of the patient, a few family members of the patient, and the diseased or reference sequences. Once the result of the multiple sequence alignment is available, the similarity among the omic sequences of the patient's family members is found along with the locus of each common nucleotide or amino acid, and the dissimilar nucleotides or amino acids at the respective loci are also discarded from the patient's sequence and the diseased sequences. Finally, we create a phylogenetic tree from these sequences, which can then be used to visualize the distance between the patient's genome sequence and the diseased genome sequences. Applying this algorithm to data available from the 1000 Genomes Project and dbSNP gave the expected results, which supports the accuracy of the algorithm.

Keywords: Disease diagnosis, evolution, medical diagnosis, phylograms, cladograms, phylogenetic trees, Multiple Sequence Alignment.


Acknowledgments

I would like to take this opportunity to express my gratitude to my guide, Prof. Satish S Kumbhar, for his constant help, support, encouragement and inspiration throughout the project work. Without his invaluable guidance, this work would never have reached this level. I would also like to thank all the faculty members and staff of the Computer and IT department for providing us with ample facilities and flexibility and for making my journey of post-graduation successful. Last, but not least, I would like to thank my classmates for their valuable suggestions and helpful discussions. I am thankful to them for their unconditional support and help throughout the year.

Shamsudduha Tabish
College of Engineering, Pune.

Contents

Abstract
Acknowledgements
List of Figures

1 Introduction
    DNA (Deoxyribonucleic Acid)
    SNP (Single Nucleotide Polymorphism)
    Mutation
        Mutagens
        Chemical Mutagens
        Radiation
        Sunlight
        Spontaneous mutations
2 Literature Survey
    Problem statement
    Multiple Sequence Alignment (MSA)
    Phylogenetic Trees
    Constructing Phylogenetic Trees
        Distance Methods
        Character Based Methods
        Maximum Likelihood
3 Data Sets
    The HapMap Project
    dbSNP
    The 1000 Genomes Project
4 Technologies
    Tomcat Server
    Web Services
    JSP (Java Server Pages)
    HTML
    Java Script
    Eclipse
5 DiagnosTree - The Tool
    The Algorithm
    Required Inputs
    Example
6 System Architecture
7 Results
8 Conclusion
9 Future Work

List of Figures

1.1 The Eukaryotic Cell Structure
1.2 The DNA Composition and Structure
1.3 The Chemical Structures of Cytosine, Thymine, Adenine and Guanine
2.1 A Phylogeny of Six Species
2.2 Rooted and Unrooted Trees
2.3 Example: A distance Matrix M
2.4 Unrooted tree from the given matrix of M nodes
2.5 Comparison of two sequences with their ancestor shows several types of substitutions
2.6 Set of Input sequences for Maximum Parsimony Algorithm
2.7 Trees for first two sites of sequences A through E
Pictorial Example Employing Fitch's Algorithm for given site
Choosing the right algorithm that suits your needs
Set of Input Sequences
Aligned Sequences (Output of MSA)
Uncommon Nucleotides to be omitted out of the sequences
Set of Family Members Sequences to be removed from The Sequences
Final set of Sequences to be used for creating The Phylogenetic Tree
The resultant Tree depicting relationship among the patient's gene sequence and different diseased sequences
Layered System Architecture
Component Based System Architecture
Flowchart for the Algorithm

Chapter 1

Introduction

Our work is based entirely on the DNA, RNA and protein found in the cells of almost all living organisms. To understand these elements, let us look inside the cell and find out where they are created and what role they play. The basic building block of every living being on this planet is the biological cell. The cell is composed of the nucleus, mitochondria, cytoplasm, etc. There are two types of cells: prokaryotic and eukaryotic. Most single-celled organisms (e.g., bacteria) are made up of prokaryotic cells, whereas all multicellular organisms are made up of eukaryotic cells. In this work we focus on eukaryotic organisms.

Figure 1.1: The Eukaryotic Cell Structure

Figure 1.1 shows the structure of a cell in a eukaryotic organism. The DNA is found in

almost every living organism. The chromosomes are composed of DNA and are found in the cell. The nucleus in Figure 1.1 is the main part of the cell and contains most of the DNA; only a small portion of the DNA is found in the mitochondrion, as shown in the figure. This DNA is called mtDNA, or mitochondrial DNA. The DNA is the code that encodes everything about the organism, including its behavior, appearance, diseases, resistance to diseases and every other character the organism possesses.

1.1 DNA (Deoxyribonucleic Acid)

Deoxyribonucleic acid (DNA) is the hereditary material found in almost all living organisms. Nearly every cell in the human body has the same copy of DNA. Most of the DNA is located in the nucleus of the cell (called nuclear DNA), but a small amount of DNA is also found in the mitochondria (mitochondrial DNA or mtDNA). The DNA is composed of two strands, each with a backbone made up of phosphate groups and pentose sugars. These strands are connected to each other by the bases adenine (A), guanine (G), cytosine (C), and thymine (T), as shown in the figure. Human DNA has about 3 billion base pairs, and more than 99% of those bases are the same in all human beings. The sequence of these bases determines the information for building and maintaining an organism, in the same way that letters of the alphabet are arranged in a certain order to form words and sentences. The DNA bases pair up with each other, A with T and C with G, to form units called base pairs. Each base is also attached to a sugar molecule and a phosphate molecule, which together form the backbone of the DNA. A base, a sugar, and a phosphate together are called a nucleotide. These nucleotides are arranged in two long sequences called strands that together form a spiral called a double helix.
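The pairing rule described above (A with T, C with G) can be illustrated with a minimal Python sketch. The function names `complement` and `reverse_complement` are our own, chosen only for illustration, and the input string is an arbitrary example.

```python
# Watson-Crick pairing as described in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for every base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in the opposite direction."""
    return complement(strand)[::-1]


print(complement("AAGGCTAA"))          # TTCCGATT
print(reverse_complement("AAGGCTAA"))  # TTAGCCTT
```

Because the two strands of the double helix run in opposite directions, the partner strand is usually written as the reverse complement.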
The structure of the double helix looks like a ladder in which the base pairs form the rungs and the sugar and phosphate molecules form the vertical sidepieces, but twisted into a spiral. Figure 1.2 shows the chemical structure of DNA as described here, and Figure 1.3 shows the chemical structures of the different bases that play a vital role in the structure and composition of DNA. It is because of these chemical compounds that the two strands of the DNA are connected and take a spiral shape.

1.2 SNP (Single Nucleotide Polymorphism)

A Single Nucleotide Polymorphism, also known as a SNP (pronounced "snip"), is a change of a single nucleotide at a particular locus in the genome. Such a variation at a single locus is considered a SNP only if it is found in more than 1% of the population. Around 90% of the variation in the genome is due to SNPs. SNPs are scattered across the human genome at an approximate average of one SNP per thousand base pairs, and some of these SNPs directly affect the gene product, that is, the protein. Sequence variations in the genome exist at defined positions and are responsible for phenotypic characteristics, including a person's tendency towards complex diseases like heart disease and cancer. Single nucleotide polymorphisms, frequently called SNPs (pronounced snips), are the most common

type of genetic variation among people.

Figure 1.2: The DNA Composition and Structure

Each SNP represents a difference in a single DNA building block, called a nucleotide. For example, a SNP may replace the nucleotide cytosine (C) with the nucleotide thymine (T) in a certain stretch of DNA. SNPs occur normally throughout a person's DNA. More precisely, they occur once in every 300 nucleotides on average, which means there are roughly 10 million SNPs in the human genome. Most commonly, these variations are found in the DNA between genes. They can act as biological markers, helping scientists locate genes that are associated with disease. When SNPs occur within a gene or in a regulatory region near a gene, they may play a more direct role in disease by affecting the gene's function. Most SNPs have no effect on health or development. Some of these genetic differences, however, have proven to be very important in the study of human health. Researchers have found SNPs that may help predict an individual's response to certain drugs, susceptibility to environmental factors such as toxins, and risk of developing particular diseases. SNPs can also be used to track the inheritance of disease genes within families. There is scope for future studies to identify SNPs associated with complex diseases such as heart disease, diabetes, and cancer. At present a number of SNP analysis techniques are available; some of these methods are inefficient and others require manual intervention. Using a 5' nuclease assay chemistry protocol is a fast and simple way to obtain results. The experimental protocol involves combining purified genomic DNA,

master mix, and a 5' nuclease assay, then thermal cycling, reading, and analyzing the results.

Figure 1.3: The Chemical Structures of Cytosine, Thymine, Adenine and Guanine

For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA. For a variation to be considered a SNP, it must occur in at least 1% of the population. SNPs, which make up about 90% of all human genetic variation, occur every 100 to 300 bases along the 3-billion-base human genome. Two of every three SNPs involve the replacement of cytosine (C) with thymine (T). SNPs can occur in coding (gene) and non-coding regions of the genome. Many SNPs have no effect on cell function, but scientists believe others could predispose people to disease or influence their response to a drug. Although more than 99% of human DNA sequences are the same, variations in DNA sequence can have a major impact on how humans respond to disease, to environmental factors such as bacteria, viruses, toxins, and chemicals, and to drugs and other therapies. This makes SNPs valuable for biomedical research and for developing pharmaceutical products or medical diagnostics. SNPs are also evolutionarily stable, that is, they do not change much from generation to generation, which makes them easier to follow in population studies. Scientists believe SNP maps will help them identify the multiple genes associated with complex ailments such as cancer, diabetes, vascular disease, and some forms of mental illness. These associations are difficult to establish with conventional gene-hunting methods because a single altered gene may make only a small contribution to the disease. Several groups have worked to find SNPs and ultimately create SNP maps of the human genome. Among these were the U.S. Human Genome Project (HGP) and a large group of pharmaceutical companies called the SNP Consortium or TSC project.
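The AAGGCTAA to ATGGCTAA example and the 1% rule can be made concrete with a short Python sketch. It assumes two sequences of equal length that are already aligned; the function names and the carrier counts in the usage lines are hypothetical and serve only as an illustration.

```python
def single_base_differences(reference: str, sample: str):
    """List (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences are assumed to be aligned and of
    equal length; this is an illustrative simplification.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def is_common_variant(carriers: int, population: int, threshold: float = 0.01) -> bool:
    """Apply the 'at least 1% of the population' rule quoted in the text."""
    return carriers / population >= threshold


print(single_base_differences("AAGGCTAA", "ATGGCTAA"))  # [(2, 'A', 'T')]
print(is_common_variant(15, 1000))  # True  (1.5% of a hypothetical population)
print(is_common_variant(5, 1000))   # False (0.5%)
```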
The likelihood of duplication among the groups is small because of the estimated 3 million SNPs, and the potential payoff of a SNP map was high. In addition to their pharmacogenomic, diagnostic and biomedical research implications, SNP maps are being used to identify thousands of additional markers in the genome, thus simplifying navigation of the much larger genome map generated by HGP researchers.

SNPs as risk factors in disease development

SNPs do not cause disease, but they can help determine the likelihood that someone will develop a particular illness. One of the genes associated with Alzheimer's disease, apolipoprotein E or ApoE, is a good example of how SNPs affect disease development. ApoE contains two SNPs that result in three possible alleles for this gene: E2, E3, and E4. Each allele differs by one DNA base, and the protein product of each gene differs by one amino acid. Each individual inherits one maternal copy of ApoE and one paternal copy of ApoE. Research has

shown that a person who inherits at least one E4 allele will have a greater chance of developing Alzheimer's disease. Apparently, the change of one amino acid in the E4 protein alters its structure and function enough to make disease development more likely. Inheriting the E2 allele, on the other hand, seems to indicate that a person is less likely to develop Alzheimer's. Of course, SNPs are not absolute indicators of disease development. Someone who has inherited two E4 alleles may never develop Alzheimer's disease, while another who has inherited two E2 alleles may. ApoE is just one gene that has been linked to Alzheimer's. Like most common chronic disorders such as heart disease, diabetes, or cancer, Alzheimer's is a disease that can be caused by variations in several genes. The polygenic nature of these disorders is what makes genetic testing for them so complicated.

1.3 Mutation

A mutation occurs when a gene is damaged or changed in such a way as to alter the genetic message carried by that gene. A mutagen is an agent or substance that can bring about a permanent alteration to the physical composition of a gene such that the genetic message is changed. Once the gene has been damaged or changed, the mRNA transcribed from that gene will carry an altered message. The polypeptide made by translating the altered mRNA will contain a different sequence of amino acids. The function of the protein made by folding this polypeptide will probably be changed or lost. In this example, the enzyme that catalyzes the production of the flower color pigment has been altered in such a way that it no longer catalyzes the production of the red pigment. No product (red pigment) is produced by the altered protein. In subtle or very obvious ways, the phenotype of the organism carrying the mutation will be changed.
In this case the flower, without the pigment, is no longer red.

1.3.1 Mutagens

A mutagen is an agent or substance that is responsible for a permanent alteration to the physical composition of DNA such that the genetic message is changed. Such a change may affect the physical appearance of the organism, or it may act in other ways that are not directly visible.

1.3.2 Chemical Mutagens

Chemical mutagens change the sequence of bases in a gene in a number of ways. Some mimic the correct nucleotide bases in a DNA molecule but fail to base pair correctly during DNA replication. Some remove parts of the nucleotide (such as the amino group on adenine), again causing improper base pairing during DNA replication. Others add hydrocarbon groups to various nucleotides, also causing incorrect base pairing during DNA replication.

1.3.3 Radiation

High-energy radiation from a radioactive material or from X-rays is absorbed by the atoms in the water molecules surrounding the DNA. This energy is transferred to the electrons, which then fly away from the atom. Left behind is a free radical, a highly dangerous and highly reactive molecule that attacks the DNA molecule and alters it in many ways. Radiation can also cause double-strand breaks in the DNA molecule, which the cell's repair mechanisms cannot put right.

1.3.4 Sunlight

Sunlight contains ultraviolet radiation (the component that causes a suntan) which, when absorbed by the DNA, causes a cross link to form between certain adjacent bases. In most normal cases the cells can repair this damage, but unrepaired dimers of this sort cause the replicating system to skip over the mistake, leaving a gap which is supposed to be filled in later. Unprotected exposure of human skin to UV radiation can cause serious damage and may lead to skin cancer and extensive skin tumors.

1.3.5 Spontaneous mutations

Spontaneous mutations occur without exposure to any obvious mutagenic agent. Sometimes DNA nucleotides shift without warning to a different chemical form (known as an isomer), which in turn forms a different series of hydrogen bonds with its partner. This leads to mistakes at the time of DNA replication.

Chapter 2

Literature Survey

Current diagnostic methods are mostly based on non-genetic tests. Blood, urine, thyroid, stool and saliva tests examine the chemicals and microbes found in their respective samples. X-ray, MRI, CT scan and ultrasound examine the physical appearance and functioning of the organs, whereas electroencephalography (EEG), electrocardiography (ECG, also written EKG) and electromyography (EMG) examine how accurately the organs function. These tests may or may not succeed in diagnosing a disease, and a combination of such tests is required to reach the actual cause of the disease. A newer method that is on its way is the analysis of the human genome. For this method the patient's genome needs to be sequenced. It is then compared, using Multiple Sequence Alignment (MSA), with reference genomes of people known to be suffering from particular diseases. If similarity is found, the patient is diagnosed as suffering from the disease of the most similar sequence in the input set. This requires a long time; to shorten it, we propose our method for diagnosis.

2.1 Problem statement

Doctors often come across situations in which diagnosing the disease a patient is suffering from is quite difficult, and the diagnosis process may take months. During this time the patient is treated on the basis of assumptions. If the assumptions are wrong, the patient takes drugs targeted at a disease he or she is not suffering from, and such drugs may have heavy side effects. Hence the medical system needs to speed up the diagnosis process and increase its accuracy. To this end, a modern technique that employs genome sequencing has recently been developed for efficient diagnosis of diseases.
In this method the patient's genome is sequenced first and is then compared with the reference sequences. Existing methods offer good accuracy but are slow. This motivates the need for a faster yet accurate method to diagnose diseases.

2.2 Multiple Sequence Alignment (MSA)

Multiple Sequence Alignment (MSA) is the alignment of three or more biological sequences (protein or nucleic acid) of similar length. From the output of the multiple sequence alignment, homology is inferred and the evolutionary relationships between the sequences can be studied by creating phylogenetic trees. Usually protein sequences are aligned using multiple sequence alignment to find the relationships among them. Multiple sequence alignment tools compare the sequences and try to bring them into correspondence by introducing gaps so that matching positions line up. A multiple sequence alignment arranges protein or nucleotide sequences into a rectangular array with the goal that residues in a given column are homologous (that is, derived from a single position in an ancestral sequence), superposable (in a rigid local structural alignment), or play a common functional role. Although these criteria are essentially equivalent for closely related proteins (the most similar sequences of amino acids), sequence, structure and function diverge over evolutionary time, and different criteria may result in different alignments of these sequences. Most of the existing tools do not meet the efficiency and precision expectations because the sequences are very long and a complex algorithm is required to align them accurately; hence continuous efforts are being made to improve the methods. Such algorithms require a large amount of RAM and processing power because of the nature of the input and the complexity of the algorithms involved.
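The "rectangular array" view of an alignment can be sketched in a few lines of Python. The alignment below is hand-made for illustration and is not the output of any alignment tool; the names `conserved_columns` and `toy_alignment` are hypothetical. The sketch only reads an alignment column by column; it does not perform the alignment itself.

```python
GAP = "-"


def conserved_columns(alignment: dict) -> list:
    """Return 0-based indices of columns where every row has the same residue.

    `alignment` maps a sequence name to its aligned (gapped) string. Columns
    that contain a gap are never reported as conserved.
    """
    rows = list(alignment.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    conserved = []
    for index, column in enumerate(zip(*rows)):  # zip(*rows) walks the columns
        if GAP not in column and len(set(column)) == 1:
            conserved.append(index)
    return conserved


# Hand-made toy alignment (an assumption for illustration only).
toy_alignment = {
    "seq1": "ACG-TTA",
    "seq2": "ACGATTA",
    "seq3": "AC--TCA",
}
print(conserved_columns(toy_alignment))  # [0, 1, 4, 6]
```

Reading an alignment by columns rather than by rows is the same view that the character-based tree methods later in this chapter take, where each column is treated as one character.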
Homology is similarity that results from inheritance from a common ancestor, and the identification and analysis of homologies is central to phylogenetic systematics. An alignment is a hypothesis of positional homology between bases or amino acids. Many tools exist for computing the MSA of a given set of omic sequences, including ClustalW2 and Clustal Omega from EBI, UK; T-COFFEE from Lausanne, Switzerland; PRALINE from Vrije Universiteit; STRAP from bioinformatics.org; MAFFT from Tokyo, Japan; MUSCLE from EBI, UK; and many more. We have chosen the popular Clustal Omega from EMBL-EBI for multiple sequence alignment in our work. Almost all of these tools are based on dynamic programming.

2.3 Phylogenetic Trees

A phylogenetic tree is described as a branching diagram that shows, for each species, with which other species it shares its most recent common ancestor. Evolutionary trees, or cladograms, were traditionally used to draw the evolutionary relationships among organisms; a more modern version is the phylogenetic tree, which uses gene or protein sequences to draw the evolutionary relationships. These trees describe the relationships among the organisms based on the similarity and dissimilarity among their nucleotide or amino acid sequences. The tree can be constructed through a variety of tree-building methods, which include methods

based on distances, likelihood and characters. After a phylogenetic tree is constructed, it is important to test its accuracy, which refers to the degree to which the tree is close to the true tree. Phylogenetics is the study of evolutionary relationships among organisms or genes. Below, we refer to the objects whose phylogeny we are studying as organisms or species, but the discussion of methods is valid for the phylogeny of genes as well. We construct phylogenetic trees to illustrate the evolutionary relationships among a group of organisms. The purposes of phylogenetic studies are (1) to reconstruct evolutionary ties between organisms and (2) to estimate the time of divergence between organisms since they last shared a common ancestor. Several types of data can be used to build phylogenetic trees. Traditionally, phylogenetic trees were built from morphological features (e.g., beak shapes, presence of feathers, number of legs, etc.). Today, we mostly use molecular data such as DNA sequences and protein sequences. An example phylogeny showing the evolutionary history of six species (Fish, Deer, Cow, Human, Monkey and Chimpanzee) is shown in Figure 2.1.

Figure 2.1: A Phylogeny of Six Species

Discrete characters: Each organism has discrete characters, and each character has a finite number of states. For example, discrete characters include the number of legs of an organism, or a column in an alignment of DNA sequences. In the latter case, the number of states for the column character is 4 (A, C, T, G).

Comparative numerical data: These data encode the distances between objects and are usually derived from sequence data. For example, we could hypothetically say distance(man, mouse) = 500 and distance(man, chimp) = 100.

External nodes are the things under comparison, also called operational taxonomic units (OTUs). Internal nodes are hypothetical ancestral units. They are used to group current-day units. In rooted trees, the root is the common ancestor of all OTUs under study.
The path from the root to a node defines an evolutionary path. An unrooted tree specifies relationships among OTUs but does not specify evolutionary paths

(Figure 2.2).

Figure 2.2: Rooted and Unrooted Trees

We can root an unrooted tree by finding an outgroup (i.e., an OTU for which we have some external reason to believe that it branched off first). For example, in Figure 2.2, the unrooted tree can be transformed to the rooted tree by making E the outgroup. The topology of a tree is its branching pattern. All internal nodes of a bifurcating tree have 2 descendants if it is rooted or 3 neighbors if it is unrooted. It is sometimes useful to allow more than 2 descendants (or more than 3 neighbors in the unrooted case), but we will focus on bifurcating trees. The branch length can represent the number of changes that have occurred along that branch, the genetic distance between the nodes connected by that branch, or the amount of evolutionary time passed along the branch. In every phylogenetic tree, a time axis is implicit. In the example of Figure 2.1, the time at C is more recent than the time at B, which is in turn more recent than that at A. The phylogeny shows that monkey and chimpanzee had their most recent common ancestor at time C. Some time before this, at time B, the most recent common ancestor of human, monkey and chimpanzee is found. Finally, the most recent common ancestor of all six species is found at time A. Phylogeny inference can be used for the analysis of protein and DNA sequences. The concept of phylogeny is extended to haplotype sequences: the sequences of the individuals replace the species in the phylogenetic tree. In this case, the phylogeny shows the evolutionary history of the individuals. This concept also makes sense for sequences coming from the same individual, as in our case of using phylogeny for reconstructing the haplotype sequences from genotypes. This is because the two sequences of the individual actually come from his or her father and mother. The phylogeny shows the common ancestor of both father and mother of the individuals.
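A rooted bifurcating tree and the notion of a most recent common ancestor can be sketched with nested Python tuples. Figure 2.1 is not reproduced in this text, so the topology below is an assumption: it only respects what the text states (monkey and chimpanzee join first, then human, and all six species share a root). The placement of Fish, Deer and Cow and the names `TREE`, `leaves` and `mrca` are hypothetical.

```python
# A leaf is a string (an OTU); an internal node is a pair (left, right).
# Assumed topology for illustration; only the primate part follows the text.
TREE = ("Fish", (("Deer", "Cow"), ("Human", ("Monkey", "Chimpanzee"))))


def leaves(node) -> set:
    """Return the set of OTU names below a node."""
    if isinstance(node, str):
        return {node}
    left, right = node
    return leaves(left) | leaves(right)


def mrca(node, targets):
    """Return the smallest subtree of `node` that contains every target OTU."""
    targets = set(targets)
    if not targets <= leaves(node):
        raise ValueError("some targets are not in the tree")
    while not isinstance(node, str):
        for child in node:
            if targets <= leaves(child):
                node = child  # all targets sit below this child: descend
                break
        else:
            return node  # targets are split between the children
    return node


print(mrca(TREE, {"Monkey", "Chimpanzee"}))  # ('Monkey', 'Chimpanzee')
print(mrca(TREE, {"Human", "Monkey"}))       # ('Human', ('Monkey', 'Chimpanzee'))
```

In this representation the implicit time axis corresponds to nesting depth: a more deeply nested ancestor is a more recent one.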
In our algorithm, we further extend the concept of phylogeny and use it to represent only a column of the set of haplotype sequences.

2.4 Constructing Phylogenetic Trees

The three major methods for constructing phylogenetic trees are:

Distance methods: Evolutionary distances are computed for all OTUs and these are used to construct trees.
Maximum Parsimony: The tree is chosen to minimize the number of changes required to explain the data.
Maximum Likelihood: Under a model of sequence evolution, the tree which gives the highest likelihood of the given data is found.

2.4.1 Distance Methods

The problem can be described as follows:

Input: An $n \times n$ matrix $M$ where $M_{ij} \ge 0$ and $M_{ij}$ is the distance between objects $i$ and $j$.
Goal: Build an edge-weighted tree where each leaf corresponds to one object of $M$, and such that the distance measured on the tree between leaves $i$ and $j$ corresponds exactly to the value of $M_{ij}$.

When such a tree can be constructed, we say the distances in $M$ are additive. Example: Suppose we are given the distances shown in Figure 2.3.

Figure 2.3: Example: A distance Matrix M

Distance methods do not use the actual molecular sequence alignment during tree inference; instead, they first calculate a symmetric $n \times n$ matrix from the input alignment. The entries of this matrix are the pairwise distances of the $n$ sequences. The actual tree inference is then performed solely on the basis of this matrix. A distance function provides a measure of the genetic distance of each pair of the $n$ sequences in the input alignment. In the simplest case this function would only count the number of differing characters of the two sequences. More elaborate functions, however, utilize a sophisticated model of molecular

evolution.

Figure 2.4: Unrooted tree from the given matrix of M nodes

The most frequently used distance-based approaches are probably the LS (Least-Squares) method and the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor-Joining) heuristics.

Least-Squares

The Least-Squares method estimates the branch lengths of a tree topology by matching the distances described by them as closely as possible to the values of the pairwise distance matrix. This is achieved by minimizing the sum of squared differences between the given distances (from the distance matrix) and the predicted distances. The predicted distance between two sequences is calculated as the sum of the branch lengths along the path connecting them. The sum of all squared differences is a measure of the fit of the tree to the given sequence data: the tree with the minimal sum is the optimal tree. The complexity of LS is $O(n^3)$.

UPGMA

UPGMA is a clustering algorithm that builds a rooted tree topology by stepwise addition. A molecular clock is assumed for the evolutionary process, which means that all species contained in the phylogenetic tree are supposed to evolve at the same rate. This assumption means that trees obtained by UPGMA are ultrametric trees, that is, all end nodes (representing the species of interest) are equidistant from the root. The algorithm works as follows. In the beginning, each node represents a cluster. At each step, the two clusters whose associated sequences have minimal distance according to the distance matrix are joined. Their entries are removed from the matrix and an entry for the new cluster is added. The distance of the new cluster to the other clusters is computed as the mean distance of the sequences contained in each cluster. The algorithm terminates when all clusters have been joined into a single cluster. The complexity of UPGMA is $O(n^2)$.
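The UPGMA steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the distance matrix is a made-up ultrametric example (the matrix of Figure 2.3 is not reproduced in this text), clusters are represented as nested tuples, and the names `upgma` and `toy_distances` are hypothetical.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster `names` with UPGMA.

    `distances` maps frozenset({a, b}) to the distance between a and b.
    Returns (root, heights): the tree as nested tuples and a dict giving the
    height at which every cluster was formed.
    """
    d = dict(distances)
    sizes = {name: 1 for name in names}       # cluster -> number of leaves
    heights = {name: 0.0 for name in names}
    while len(sizes) > 1:
        # Join the two clusters with minimal distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        heights[merged] = d[frozenset((a, b))] / 2
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Mean distance over all leaf pairs = size-weighted average.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (root,) = sizes
    return root, heights


# Made-up ultrametric distances for four OTUs (illustration only).
toy_distances = {
    frozenset(pair): value
    for pair, value in {
        ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
        ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6,
    }.items()
}
root, heights = upgma(["A", "B", "C", "D"], toy_distances)
print(root)           # ('D', ('C', ('A', 'B')))
print(heights[root])  # 3.0
```

This sketch rescans all cluster pairs at every step, so it is slower than the complexity bound quoted above; it is meant to show the joining rule and the averaging step, not to be efficient.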
Neighbor-Joining

Neighbor-Joining is also a clustering algorithm and is based on the minimum-evolution criterion. The tree that explains the sequence data with the minimal amount of change, i.e., the tree which minimizes the sum of all branch lengths (the total tree length), is the optimal tree. The algorithm starts with a

star-tree. At each step, two nodes are removed from the tree and reconnected via a common, newly added internal node. The distance of both nodes to any other node of the tree (i.e., the sum of the branch lengths on the path connecting the nodes) stays constant. Yet the total tree length is reduced, as two rather long branches are replaced by three shorter branches. The nodes to be reorganized are selected such that the greatest reduction of the tree length is achieved. This procedure is repeated until the tree is fully resolved. The complexity of the original NJ implementation is $O(n^3)$, which can be reduced to $O(n^2)$ by using a more sophisticated algorithm for selecting the nodes to be joined.

Computing Distances

We have looked at a couple of distance-method heuristics for reconstructing trees, given distance data. One question we could ask at this point is: how do we obtain the distance data? One answer is that distance data can be obtained from sequence data. Let us compare the two sequences in Figure 2.5.

Figure 2.5: Comparison of two sequences with their ancestor shows several types of substitutions

There are only 3 observed differences between the 2 sequences; however, considering the ancestral sequence, we see that there are actually 12 substitutions in total. Thus, if multiple substitutions have occurred at any site (e.g., the convergent substitution at site 11), then the naive way of computing distance gives an underestimate. How can we correct for multiple substitutions? For DNA sequences, we can use models of nucleotide substitution. For protein sequences, we have already talked about models of amino acid substitution in our discussion of PAM matrices. (We will also use these models when we talk about maximum likelihood methods for phylogenetic reconstruction.)
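The difference between the naive distance and a model-corrected distance can be sketched in Python. The text does not name a particular substitution model; the Jukes-Cantor model is used here purely as an assumed, well-known example, and the two sequences are made up. The function names are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Naive distance: fraction of compared sites at which the sequences differ.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    differing = sum(1 for x, y in compared if x != y)
    return differing / len(compared)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction (an assumed model): d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("sequences are saturated; the correction is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 of 10 sites differ
print(p)                # 0.2
print(jukes_cantor(p))  # slightly larger than 0.2
```

The corrected value is always at least as large as the observed fraction, which reflects the point made above: counting visible differences underestimates the number of substitutions that actually occurred.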

2.4.2 Character Based Methods

Discrete characters include morphological data (such as the absence or presence of feathers), protein data (20 possible amino acids), and DNA data (four possible nucleotides). All character-based methods assume that different characters are independent of each other. Given character data, how does one find a tree from the data? What criteria are used to pick the best tree?

Maximum Parsimony

One method is to use maximum parsimony. In this instance, we want to find the tree that minimizes the number of changes needed to explain the data. For example, given the following DNA data, which tree is most parsimonious?

Figure 2.6: Set of input sequences for the Maximum Parsimony algorithm

Sites 1 and 2 each require one change for the given tree. It turns out that the entire data set can be explained with a minimum of 9 changes using the tree in the figure below. However, changing the tree will alter the minimum number of changes required. This example leads us to ask two important questions relating to parsimony: (1) Given a particular tree, how do you find the minimum number of changes needed to explain the data? (Easy.) (2) How do you find the most parsimonious tree? (NP-hard.)

To answer the easy first question, we use Fitch's algorithm. The idea is to construct a set of possible states (e.g., nucleotides) for internal nodes based on the states of the children. For each site, each leaf is labeled by a singleton set containing, for example, the nucleotide at that position. For each internal node i, with children j and k (labels S_j and S_k):

\[ S_i = \begin{cases} S_j \cup S_k, & \text{if } S_j \cap S_k = \emptyset \\ S_j \cap S_k, & \text{otherwise} \end{cases} \]

The total number of changes equals the total number of union operations. This is illustrated in Figure 2.7. We can see from Figure 2.7 that there are three unions in the tree; this implies that this site requires three changes. It is easy to implement this algorithm by a post-order traversal of the tree.

Figure 2.7: Trees for first two sites of sequences A through E

In contrast, the answer to the second question, finding the most parsimonious tree, is not easy. There are many heuristics for doing this. We will quickly talk about two techniques: 1) the branch-and-bound method (which prunes the search space and finds the most parsimonious tree) and 2) the nearest-neighbor interchange method (a fast heuristic, which may not find the most parsimonious tree).

Maximum Parsimony (MP) favors the tree topology which explains the given data (the multiple sequence alignment) with the least amount of change, i.e., the lowest number of nucleotide or amino acid substitutions. In this sense, it is similar to the minimum-evolution criterion of NJ. However, MP computes the distance between two sequences on a per-column (per-site) basis and considers only so-called informative sites. Those are the columns of the sequence alignment that contain at least two different kinds of characters, each of which is represented in at least two of the sequences. The distance between two sequences is the number of differing characters at informative sites and is attributed as weight to the branch connecting the two sequences. For the inner nodes of the tree, hypothetical sequences are calculated such that the distances between an inner node and its adjacent nodes are minimal. The Maximum Parsimony score of a tree can be calculated by summing up the weights of all branches. The tree with the minimal score is the most parsimonious tree and thus the optimal tree under the Maximum Parsimony optimality criterion. Since the Maximum Parsimony criterion is very similar to the minimum-evolution criterion, it also suffers from the same shortcomings. Additionally, the phenomenon of so-called long-branch attraction can be observed in MP-inferred phylogenies: sequences which are connected to the tree by very long branches might be grouped together even though they developed from very different lineages.
Long branches indicate a high rate of change, i.e., the sequence at the terminal node of the branch differs from the hypothetical sequence at the internal node at many sites. Maximum Parsimony only accounts for the fact that some substitution took place at a specific site, not for which substitution it was. Thus, it groups the two nodes with the long branches together solely because both differ highly from the other sequences. The fact that the two are also highly different from each other is neglected. Nevertheless, Maximum Parsimony is still frequently used for phylogenetic inference, for several reasons. Firstly, it is a character-based method and as such is considered superior to distance methods, as it uses all the information contained in the input alignment for the tree reconstruction. Secondly, it is fast and therefore an alternative to Maximum Likelihood for large-scale datasets if computational resources are restricted. Thirdly, the phenomenon of long-branch attraction is only an issue for small datasets. Fourthly, many biologists appreciate the fact that MP makes only a few assumptions about the evolutionary process besides evolutionary change being rare.

Branch and bound

The branch-and-bound method (as applied here) counts the number of changes for an initial tree (e.g., an initial tree may be obtained using the neighbor-joining method). Then, starting from scratch, we search our space by building partial trees (i.e., one branch is added at a time). That is, at the kth level of the search, we have nodes representing all possible phylogenetic trees with k leaves for the first k species (the order is fixed arbitrarily beforehand). If the cost of any partial tree we are building is greater than that of the initial tree, then the search along this line is abandoned. We can improve our search (potentially pruning more of the space) by computing an estimate of the minimum number of changes required to add the remaining species. Branch and bound gives no guarantee on how much of the search space is eliminated.

Figure 2.8: Pictorial example employing Fitch's algorithm for a given site

Nearest-neighbor interchange

The nearest-neighbor interchange method involves rearranging trees at the neighbor level and choosing the neighbor tree with the best score (i.e., the least number of changes). There are many possibilities for how you can define neighbors. Neighbors in this heuristic procedure are defined as follows. Considering any internal edge, we break up our tree into 4 sub-trees.
For example, in the tree in Figure 4, the subtrees would consist of the leaves A, B, C and D, although in general these subtrees consist of more than one leaf. This original tree (which has A and B branching separately from C and D) has two neighbors: one with the roles of B and D switched (i.e., with A and D branching separately from B and C) and one with the roles of B and C switched (i.e., with A and C branching separately from B and D). Starting with one tree, we repeatedly choose the neighboring tree with the best score, until there are no neighboring trees with better scores. This is a hill-climbing method, and there is no guarantee that we will find the most parsimonious tree.
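Fitch's counting rule and the nearest-neighbor interchange step described above can be sketched together in Python for the four-leaf case. This is a minimal illustration under stated assumptions, not the implementation used in this work: trees are encoded as nested tuples (a hypothetical encoding), each of the four subtrees is a single leaf, and the toy alignment is invented for the example.

```python
def fitch(tree, leaf_state):
    """Return (state_set, changes) for one site.

    A tree is either a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0          # leaf: singleton set, no change
    left_set, left_cost = fitch(tree[0], leaf_state)
    right_set, right_cost = fitch(tree[1], leaf_state)
    common = left_set & right_set
    if common:                                 # intersection is non-empty
        return common, left_cost + right_cost
    # empty intersection: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


def nni_neighbors(tree):
    """Two NNI neighbors of a quartet ((a, b), (c, d)) around its internal edge."""
    (a, b), (c, d) = tree
    return [((a, d), (c, b)),   # roles of b and d switched
            ((a, c), (b, d))]   # roles of b and c switched


def hill_climb(tree, alignment):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    score = parsimony_score(tree, alignment)
    while True:
        best = min(nni_neighbors(tree), key=lambda t: parsimony_score(t, alignment))
        best_score = parsimony_score(best, alignment)
        if best_score >= score:
            return tree, score
        tree, score = best, best_score


# Hypothetical toy alignment (four taxa, four sites).
alignment = {"A": "ACAT", "B": "ACGT", "C": "GTAT", "D": "GTGT"}
start = (("A", "D"), ("C", "B"))
best_tree, best_score = hill_climb(start, alignment)
```

On this toy alignment the starting tree needs 6 changes, and a single interchange reaches the tree grouping A with B and C with D, which needs 4. With more than four leaves there is one such pair of neighbors for every internal edge, and the climb can stop at a local optimum, which is the lack of guarantee noted above.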

While the parsimony method makes very few assumptions, it ignores branch lengths in building trees. If there are branches that diverge much more rapidly than others, it is easy to convince yourself that the parsimony method can lead to incorrect topologies.

Maximum Likelihood

Maximum Likelihood is a method for the inference of phylogeny. It evaluates a hypothesis about evolutionary history in terms of the probability that the proposed model and the hypothesized history would give rise to the observed data set. The supposition is that a history with a higher probability of reaching the observed state is preferred to a history with a lower probability. The method searches for the tree with the highest probability, or likelihood. In general, Maximum Likelihood is a parametric statistical method for fitting a mathematical model to some data. The principle of likelihood suggests that the explanation that makes the observed outcome the most likely occurrence is the one to be preferred. Formally, given some data D and a hypothesis \theta, the likelihood of the data is given by the probability of obtaining D given \theta:

\[ L(D \mid \theta) = f(D \mid \theta) \]

Though the two terms are colloquially used synonymously, it is important to distinguish between probability and likelihood here. Informally, probability allows one to predict an unknown outcome based on known parameters, whereas likelihood allows one to estimate unknown parameters based on a known outcome.

Figure 2.9: Choosing the right algorithm that suits your needs
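To make the distinction between probability and likelihood concrete, the following Python sketch treats the evolutionary distance d between two aligned sequences as the unknown parameter and the number of differing sites as the known outcome. It assumes the Jukes-Cantor model purely for illustration (the text does not commit to a specific model), and the function names and the grid search are hypothetical simplifications; real Maximum Likelihood programs optimize over whole trees, not a single pairwise distance.

```python
import math


def log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d given n_diff differing sites out of n_sites.

    Under Jukes-Cantor, a site differs with probability
    p(d) = 3/4 * (1 - exp(-4d/3)); sites are treated as independent.
    """
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def ml_distance(n_sites, n_diff, step=0.001, max_d=3.0):
    """Grid search for the distance that makes the observed data most likely."""
    grid = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: log_likelihood(d, n_sites, n_diff))


# Known outcome: 30 of 100 sites differ. Unknown parameter: the distance d.
estimate = ml_distance(100, 30)

# Closed-form Jukes-Cantor distance for the same data, for comparison.
closed_form = -0.75 * math.log(1.0 - 4.0 * (30 / 100) / 3.0)
```

The grid estimate agrees with the closed-form value to within the grid step, since under this model the closed-form Jukes-Cantor distance is itself the maximum likelihood estimate.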

Chapter 3 Data Sets

There are numerous open-source bioinformatics databanks available on the internet. Every country is in a race to develop a rich bioinformatics databank. In this work we select NCBI's dbSNP and EMBL-EBI's 1000 Genomes as data sources.

3.1 The HapMap Project

We have identified the International HapMap Project as one of the sources of data for inferring and analyzing phylogenetic trees. The International HapMap Project is an effort by multiple countries to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. The Project is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States. All of the information generated by the Project is publicly available. The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. By making this information freely available, the Project will help biomedical researchers find genes involved in disease and responses to therapeutic drugs. In the initial phase of the Project, genetic data are being gathered from four populations with African, Asian, and European ancestry. Ongoing interactions with members of these populations are addressing potential ethical issues and providing valuable experience in conducting research with identified populations. Public and private organizations in six countries are participating in the International HapMap Project. Data generated by the Project can be downloaded with minimal constraints.
This project is intended to use the data available from the International Haplotype Map (HapMap Phase II) to conduct a fine-scale, genome-wide scan of human genetic variation. Computationally phased HapMap data is used for this analysis. Although the algorithms we have developed infer maximum parsimony phylogenies directly from un-phased data, these algorithms are not efficient enough for use on a whole-genome scale. We restrict this project to the HapMap populations of a single subcontinent because these subpopulations were genotyped for parent-child trios and can thus be expected to have minimal phasing error. The other two HapMap data sets (Han Chinese in Beijing, China and Japanese in Tokyo, Japan) were genotyped only for unrelated individuals and were omitted here due to the higher likelihood of phasing errors. All HapMap data sets were downloaded in phased form from the HapMap web site, where the PHASE program had been used to identify the most likely phases from the trio data. This HapMap build was based on the NCBI human genome assembly build 35. SNP location assignments and genomic coordinates are therefore based on NCBI build 35. The resulting data contained 120 haplotypes from 60 unrelated individuals for each of the two populations, typed at approximately 3.7 million SNPs. Phylogeny inferences are proposed to run for window sizes of five, six, seven, eight, and nine consecutive SNPs, at each overlapping window of the given size across the 22 autosomal human chromosomes in each of the HapMap subcontinental populations.

3.2 dbSNP

The Single Nucleotide Polymorphism database (dbSNP) is a database which maintains the variation (occurring in more than 1 ... dbSNP is a database that contains entries submitted by public laboratories and private organizations for a large number of organisms across the globe. Each of these submissions includes information about the actual nucleotide variation and the 5' and 3' flanking sequences.

3.3 The 1000 Genomes Project

The 1000 Genomes Project is the first ever project to sequence the genomes of a large number of people in order to provide a comprehensive resource on human genetic variation. The goal of the 1000 Genomes Project is to locate most genetic variants that have frequencies of at least 1% in the populations under study. This goal is being attained by sequencing many individuals lightly. To sequence a person's genome, many copies of the person's DNA are broken into short pieces and each piece is sequenced individually.
The many copies of DNA mean that the DNA pieces are more or less randomly distributed across the genome. The pieces are then aligned with the reference sequence and merged together. To accurately determine the complete genomic sequence of one person with the existing sequencing platforms requires sequencing that person's DNA the equivalent of about 28 times. If the amount of sequencing done averages only once across the genome, then much of the sequence will be missed, since some genomic locations will be covered by several pieces while others will have none. The deeper the sequencing coverage, the more of the genome will be covered at least once. Also, people are diploid; the deeper the sequencing coverage, the more likely it is that both chromosomes at a locus will be included. In addition, deeper coverage is particularly useful for detecting structural variants, and it allows sequencing errors to be corrected. The 1000 Genomes Project offers genome sequences from various families across geographic locations. It also maintains relationship information about the individuals.
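The effect of sequencing depth described above can be estimated with a short Python sketch. It rests on a simplifying assumption that is not stated in the text: pieces land uniformly at random, so the number of pieces covering a given position is roughly Poisson distributed with mean equal to the average depth, and the pieces are split evenly between the two chromosome copies. The function names are hypothetical.

```python
import math


def fraction_covered(mean_depth):
    """Expected fraction of positions covered by at least one piece."""
    return 1.0 - math.exp(-mean_depth)


def fraction_both_copies(mean_depth):
    """Expected fraction of positions where both chromosome copies are covered.

    Assumes each copy independently receives half of the total depth.
    """
    per_copy = mean_depth / 2.0
    return (1.0 - math.exp(-per_copy)) ** 2


for depth in (1, 4, 28):
    print(depth,
          round(fraction_covered(depth), 4),
          round(fraction_both_copies(depth), 4))
```

Under these assumptions an average depth of one leaves roughly a third of the genome uncovered, which is consistent with the remark that much of the sequence would be missed, while at a depth of about 28 essentially every position is covered on both copies.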

Chapter 4 Technologies

The following are the technologies we have used in our research.

4.1 Tomcat Server

We use Tomcat Server to provide web-based access to our system. The Tomcat server is also used to deploy the web service clients for Multiple Sequence Alignment through Clustal Omega and for phylogenetic trees through Clustal Phylogeny from EMBL-EBI.

4.2 Web Services

Web services are application components that provide access to certain methods and objects over the internet. Web services communicate using open protocols such as TCP/IP and HTTP and make it easy to access the components across platforms. Web services are self-contained and self-describing. The description is offered through an XML file with the extension wsdl (which stands for Web Services Description Language). Web services are discovered using UDDI (Universal Description, Discovery and Integration), which allows the client to connect with a specific web service running on a server. Web services can also be used by other applications within the local area network of the server or over the internet. XML is the basis for web services, as it offers interoperability across platforms and simplifies communication over basic protocols. The basic web services platform is the combination of XML and the HTTP protocol. XML offers a language which can be used across different platforms and programming languages and still express complex messages and functions. HTTP is the core and most widely used Internet protocol. Web services platform elements include:

SOAP - (Simple Object Access Protocol)
UDDI - (Universal Description, Discovery and Integration)
WSDL - (Web Services Description Language)

Various web services are offered by the global bioinformatics community, and we have used a couple of those offered by EMBL-EBI. The web services that we have used are Clustal Omega for Multiple Sequence Alignment and ClustalW2 Phylogeny for retrieving phylogenetic tree data.

4.3 JSP (Java Server Pages)

Java Server Pages (JSP) is a technology for developing web pages that support dynamic content. It helps developers insert Java code in HTML pages by making use of special JSP tags. A JSP component is a type of Java servlet that is designed to interact with the client, offering real-time content through a Java web application. JSP files are written as text files that combine HTML or XHTML code, XML elements, and embedded JSP actions and commands in order to offer dynamic content. The user interface for JSP is offered through web browsers, as JSP is a web application development technology. JavaServer Pages often serve the same purpose as programs written using the Common Gateway Interface (CGI), but offer several additional benefits, both functional and non-functional. Performance is significantly improved because JSP allows dynamic elements to be embedded in the HTML pages themselves instead of requiring separate CGI files. JSP files are always compiled before they are processed by the server, as opposed to CGI/Perl, which requires the server to load an interpreter and the target script each time the page is requested. JavaServer Pages are built on top of the Java Servlets API, so, like servlets, JSP also has access to all the powerful Enterprise Java APIs, including JDBC, EJB, JNDI, JAXP, etc. JSP pages can also be used in combination with servlets that handle the business logic, the model that is supported by Java servlet template engines. JSP is an integral part of J2EE, a complete platform for enterprise-class applications.
This implies that JSP can be used to develop anything from the simplest applications to the most complex and demanding ones.

4.4 HTML 5

HTML5 is a cooperation between the World Wide Web Consortium (W3C) and the Web Hypertext Application Technology Working Group (WHATWG). HTML5 is the new standard for HTML. A lot of work on HTML5 is still in progress; however, many browsers have already incorporated support for it. It makes heavy use of JavaScript and CSS. Through these technologies it reduces the need for external plugins like Flash, reduces the need for scripting by incorporating new tags, and improves error handling. HTML5 also aims to be compatible with every device. In our research we have used the canvas tag in combination with JavaScript for rendering the results in the form of phylogenetic trees.

4.5 Java Script

A scripting language is a lightweight programming language used with web applications. JavaScript is a client-side scripting language mainly used for data validation, animations, and small calculations at the client end. It is programming code that can be inserted into HTML pages. JavaScript inserted into HTML pages is supported by all modern web browsers and hence can be executed with ease. It can detect the browser the client is using so that the respective code can be executed. JavaScript is an interpreted language; that is, you do not need to compile it before execution, as it is interpreted directly by the web browser.

4.6 Eclipse

Eclipse is an open-source IDE (Integrated Development Environment). It is created by an open-source community and is used in several different areas, e.g., as a development environment for Java or Android applications, Python, C, C++, Perl, etc. The Eclipse projects are governed by the Eclipse Foundation. The Eclipse Foundation is a member-supported, non-profit corporation that hosts the Eclipse open-source projects. It also helps to cultivate both an open-source community and an ecosystem of complementary products and services. The Eclipse IDE can be easily extended with additional software components or plugins. Several open-source projects and companies have extended the Eclipse IDE and customized it according to the requirements of their working environment. Eclipse is also used as a base for creating general-purpose applications. These applications are known as Eclipse Rich Client Platform (Eclipse RCP) applications. The Eclipse Foundation uses the Eclipse Public License (EPL), an open-source software license, for its software. The EPL is specially designed to be business-friendly. The EPL states that EPL-licensed programs can be used, modified, copied and distributed free of cost. The consumer of EPL-licensed software may use this software in closed-source programs. Any modifications to the original EPL code must also be released as EPL code, as stated by the EPL.
We have used the Eclipse IDE extensively to implement our algorithm, including the web service clients, our intermediate code, and the HTML 5 and JavaScript code.

Chapter 5 DiagnosTree - The Tool

We name our tool DiagnosTree since it facilitates the diagnosis of diseases through the use of phylogenetic trees. Although the diagnosis is possible with gene sequences, protein sequences, and RNA sequences, for this work we will stick to gene sequences. Our method is based on the similarity that human beings share in their gene sequences, and on the assumption that any change in the gene sequence at loci where the nitrogen bases are usually common to all human beings is responsible for the abnormality an individual has.

5.1 The Algorithm

Required Inputs

The patient's gene sequence
The gene sequences of a few of the patient's family members
Diseased gene sequences (which will be downloaded from the bioinformatics databases)

We consider the sequences of the patient's family members for analysis since their gene sequences are closest to the patient's gene sequence. With the help of these sequences we try to find out which mutation in the patient's sequence is responsible for the disorder. To diagnose the disease we need to compare the patient's sequence with the gene sequences of diseased genomes. To reduce the time required for diagnosis (through computer processing), we suggest first identifying, based on the symptoms, the probable diseases the patient might be suffering from. In our method we find the nucleotides that are common among the family members' gene sequences and discard the dissimilar nucleotides, retaining the common nucleotides with respect to their loci. The following are the steps that we suggest to diagnose the disease.

Step 1: Align the gene sequences of the patient, the family members of the patient, and the diseased sequences.
Step 2: Find the common nucleotides among the family members of the patient, and discard the dissimilar nucleotides from all the sequences (of the patient, the patient's family members, and the diseased sequences) at the respective loci after alignment.
Step 3: Discard the gene sequences of the patient's family members.
Step 4: Create a phylogenetic tree (we prefer a maximum parsimony based phylogenetic tree) from the sequences obtained in the previous step (the modified gene sequences of the patient and the diseased sequences).
Step 5: From this tree we can say that the patient is suffering from the disease whose sequence has the least distance from the patient's gene sequence.

Example

Let us consider the following hypothetical sequences (Figure 5.1: Set of Input Sequences), where P is the patient, F1, F2, F3 and F4 are close relatives of the patient, and D1, D2, D3 and D4 are people suffering from different diseases (reference sequences). We apply multiple sequence alignment to these sequences and obtain the output shown in Figure 5.2. From this result we discard the dissimilar nucleotides/characters from the family members' sequences and discard the nucleotides/characters at the respective loci from the other sequences. Further, we ignore the close relatives' sequences and construct a phylogenetic tree based on the rest of the sequences. The resulting tree depicts that the patient is suffering from the disease D2.
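Steps 2 to 5 above can be sketched in Python as follows. This is a minimal illustration, not the DiagnosTree implementation: it assumes the sequences have already been aligned (Step 1), and it replaces the maximum parsimony tree of Step 4 with a plain count of differing sites between the patient and each diseased sequence, which is a deliberate simplification. All function names and the toy sequences are hypothetical and are not the sequences of Figure 5.1.

```python
def family_common_columns(family):
    """Step 2: indices of alignment columns on which all family members agree."""
    length = len(family[0])
    return [i for i in range(length)
            if all(seq[i] == family[0][i] for seq in family)]


def keep_columns(seq, columns):
    """Discard every column of seq that is not in the retained set."""
    return "".join(seq[i] for i in columns)


def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def diagnose(patient, family, diseased):
    """Steps 2-5, with a difference count standing in for the tree distance."""
    columns = family_common_columns(family)
    reduced_patient = keep_columns(patient, columns)
    # Step 3: the family sequences are not used beyond this point.
    distances = {name: count_differences(reduced_patient, keep_columns(seq, columns))
                 for name, seq in diseased.items()}
    closest = min(distances, key=distances.get)
    return closest, distances


# Hypothetical, already-aligned toy sequences.
patient = "ACTTACGA"
family = ["ACGTACGT", "ACGTTCGT", "ACGAACGT"]
diseased = {"D1": "ACGTACGT", "D2": "ACTAACGA", "D3": "TCTTACGG"}

closest, distances = diagnose(patient, family, diseased)
```

In this toy input the family members disagree at two columns, so those columns are dropped from every sequence before the comparison; the reduced patient sequence then matches D2 exactly. A tree-building step would be needed to reproduce the visual output of the tool, and ties between diseases are not handled in this sketch.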

Figure 5.2: Aligned Sequences (Output of MSA)

Figure 5.3: Uncommon nucleotides to be omitted from the sequences

Figure 5.4: Set of family members' sequences to be removed from the sequences

Figure 5.5: Final set of sequences to be used for creating the phylogenetic tree

Figure 5.6: The resultant tree depicting the relationship among the patient's gene sequence and different diseased sequences

Chapter 6 System Architecture

Figure 6.1: Layered System Architecture

Figure 6.2: Component Based System Architecture

Genetics 101. Prepared by: James J. Messina, Ph.D., CCMHC, NCC, DCMHS Assistant Professor, Troy University, Tampa Bay Site

Genetics 101. Prepared by: James J. Messina, Ph.D., CCMHC, NCC, DCMHS Assistant Professor, Troy University, Tampa Bay Site Genetics 101 Prepared by: James J. Messina, Ph.D., CCMHC, NCC, DCMHS Assistant Professor, Troy University, Tampa Bay Site Before we get started! Genetics 101 Additional Resources http://www.genetichealth.com/

More information

Mutation. ! Mutation occurs when a DNA gene is damaged or changed in such a way as to alter the genetic message carried by that gene

Mutation. ! Mutation occurs when a DNA gene is damaged or changed in such a way as to alter the genetic message carried by that gene Mutations Mutation The term mutation is derived from Latin word meaning to change.! Mutation occurs when a DNA gene is damaged or changed in such a way as to alter the genetic message carried by that gene!

More information

Exome Sequencing Exome sequencing is a technique that is used to examine all of the protein-coding regions of the genome.

Exome Sequencing Exome sequencing is a technique that is used to examine all of the protein-coding regions of the genome. Glossary of Terms Genetics is a term that refers to the study of genes and their role in inheritance the way certain traits are passed down from one generation to another. Genomics is the study of all

More information

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes Coalescence Scribe: Alex Wells 2/18/16 Whenever you observe two sequences that are similar, there is actually a single individual

More information

Goal 3. Friday, May 10, 13

Goal 3. Friday, May 10, 13 Goal 3 Bio.3.1 Explain how traits are determined by the structure and function of DNA. Bio.3.2 Understand how the environment, and/or the interaction of alleles, influences the expression of genetic traits.

More information

12/8/09 Comp 590/Comp Fall

12/8/09 Comp 590/Comp Fall 12/8/09 Comp 590/Comp 790-90 Fall 2009 1 One of the first, and simplest models of population genealogies was introduced by Wright (1931) and Fisher (1930). Model emphasizes transmission of genes from one

More information

Resources. How to Use This Presentation. Chapter 10. Objectives. Table of Contents. Griffith s Discovery of Transformation. Griffith s Experiments

Resources. How to Use This Presentation. Chapter 10. Objectives. Table of Contents. Griffith s Discovery of Transformation. Griffith s Experiments How to Use This Presentation To View the presentation as a slideshow with effects select View on the menu bar and click on Slide Show. To advance through the presentation, click the right-arrow key or

More information

Protein Synthesis

Protein Synthesis HEBISD Student Expectations: Identify that RNA Is a nucleic acid with a single strand of nucleotides Contains the 5-carbon sugar ribose Contains the nitrogen bases A, G, C and U instead of T. The U is

More information

4.1. Genetics as a Tool in Anthropology

4.1. Genetics as a Tool in Anthropology 4.1. Genetics as a Tool in Anthropology Each biological system and every human being is defined by its genetic material. The genetic material is stored in the cells of the body, mainly in the nucleus of

More information

DNA DNA Profiling 18. Discuss the stages involved in DNA profiling 19. Define the process of DNA profiling 20. Give two uses of DNA profiling

DNA DNA Profiling 18. Discuss the stages involved in DNA profiling 19. Define the process of DNA profiling 20. Give two uses of DNA profiling Name: 2.5 Genetics Objectives At the end of this sub section students should be able to: 2.5.1 Heredity and Variation 1. Discuss the diversity of organisms 2. Define the term species 3. Distinguish between

More information

DNA. Grade Level: 5-6

DNA. Grade Level: 5-6 DNA Grade Level: 5-6 Teacher Guidelines pages 1 2 Instructional Pages pages 3 5 Activity Page pages 6-7 Practice Page page 8 Homework Page page 9 Answer Key page 10-13 Classroom Procedure: Approximate

More information

Physical Anthropology 1 Milner-Rose

Physical Anthropology 1 Milner-Rose Physical Anthropology 1 Milner-Rose Chapter 3 Genetics: Reproducing Life and Producing Variation Our Origins By Clark Spencer Larsen Natural Selection operates on the levels of the 1. living, behaving

More information

GENETICS 1 Classification, Heredity, DNA & RNA. Classification, Objectives At the end of this sub section you should be able to: Heredity, DNA and RNA

GENETICS 1 Classification, Heredity, DNA & RNA. Classification, Objectives At the end of this sub section you should be able to: Heredity, DNA and RNA Classification, Heredity, DNA and Objectives At the end of this sub section you should be able to: RNA Heredity and Variation Gene Expression DNA structure DNA Profiling Protein Synthesis 1. Discuss the

More information

Up Close and Personal

Up Close and Personal Special Report Up Close and With Personal advances in genetics, a new type of medicine has emerged: one that s tailored just for you and your DNA. Could it transform the paradigm of how we treat and prevent

More information

BIOLOGY 111. CHAPTER 6: DNA: The Molecule of Life

BIOLOGY 111. CHAPTER 6: DNA: The Molecule of Life BIOLOGY 111 CHAPTER 6: DNA: The Molecule of Life Chromosomes and Inheritance Learning Outcomes 6.1 Describe the structure of the DNA molecule and how this structure allows for the storage of information,

More information








