03-511/711 Computational Genomics and Molecular Biology, Fall


Problem Set 0
Due Tuesday, September 6th

This homework is intended to be a self-administered placement quiz, to help you (and me) determine if you have the background for the course. You may consult textbooks and published materials, but you may not discuss it with other human beings. Collaboration is not allowed on this homework. In order to get full credit, show your intermediate work.

1. Provide a short answer (1-3 sentences) to each of these questions:

(a) Consider translation as an algorithm. Describe initiation and elongation using pseudocode.
(b) The tertiary structure of a protein is largely determined by hydrophilic and hydrophobic residues. Explain how these residues influence the structure.
(c) Consider a cell where the end of a gene is still being transcribed to an RNA intermediate, while the other end of the same RNA molecule is already being translated into an amino acid sequence. What type of cell must this be? Explain.
(d) Is it always true that one gene will produce one protein? Explain.

To solve this problem, you will need a table of the genetic code and the following table giving the physico-chemical properties of the 20 amino acids:

    small hydrophobic polar basic acidic
    Gly Val Phe Asn Asp Lys
    Ala Cys Tyr Gln Glu Arg
    Ser Ile Met His
    Thr Leu Trp
    Pro

Consider the following sequence of genomic DNA, which is a template strand containing the beginning of the open reading frame for a gene:

    3' TAAACTGTACAGGGCCATAACT... 5'

(a) Find the open reading frame and identify the start codon by circling it.
(b) Transcribe the open reading frame into a strand of mRNA. Identify the start codon by circling it.
(c) What is the peptide encoded by this strand of mRNA?

(d) Now, consider the 2nd translated codon from the mRNA strand. What is the probability that a single base change in any position of this codon would change the identity of the second amino acid in the protein sequence?
(e) What is the probability that a single base change in any position in the 2nd codon would result in an amino acid in a different physico-chemical class? Use the table given above.
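The mechanics behind parts (b) through (d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the standard genetic code is used, translation starts at the first AUG, and each of the nine possible single-base substitutions in a codon is treated as equally likely. A change to a stop codon is counted as a change of residue.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered with U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe_template(template_3to5):
    """Return the mRNA (5'->3') complementary to a template strand written 3'->5'."""
    pair = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in template_3to5)


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        peptide.append(residue)
    return "".join(peptide)


def fraction_changing_residue(codon):
    """Fraction of the 9 single-base substitutions that alter the encoded residue."""
    changed = 0
    total = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            total += 1
            if CODON_TABLE[mutant] != CODON_TABLE[codon]:
                changed += 1
    return Fraction(changed, total)


mrna = transcribe_template("TAAACTGTACAGGGCCATAACT")
print(mrna)                               # mRNA read 5'->3'
print(translate(mrna))                    # one-letter peptide
print(fraction_changing_residue("AUG"))   # Met has a single codon
```

Part (e) would follow the same loop, comparing physico-chemical classes from the table above instead of residue identities.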

An X-linked recessive allele $X^c$ produces red-green colorblindness in humans. A normal woman whose father was colorblind marries a colorblind man. The allele for normal vision is denoted as $X^N$.

(a) What genotypes are possible for the mother of the colorblind man?
(b) What are the chances that the first child from this marriage will be a colorblind boy? (Assume a 1:1 sex ratio at birth.)
(c) Of the girls produced by these parents, what proportion can be expected to be colorblind?
(d) What proportion of the children (sex unspecified) of these parents can be expected to have normal color vision?
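A cross like this one can be enumerated as a Punnett square. The sketch below is an illustration, not a prescribed method: the helper names and the string labels `"XN"`, `"Xc"`, `"Y"` are hypothetical, and every gamete is assumed equally likely. The woman is taken to be a carrier because a daughter always inherits her father's X chromosome, and her father was colorblind.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(mother_gametes, father_gametes):
    """Return each (maternal, paternal) gamete pairing with its probability."""
    counts = Counter(product(mother_gametes, father_gametes))
    total = sum(counts.values())
    return {child: Fraction(n, total) for child, n in counts.items()}


def is_boy(child):
    return "Y" in child


def is_colorblind(child):
    # Recessive trait: expressed only when no X carries the normal allele.
    return "XN" not in child


# Carrier mother (XN Xc) crossed with a colorblind father (Xc Y).
offspring = cross(["XN", "Xc"], ["Xc", "Y"])

p_colorblind_boy = sum(p for c, p in offspring.items() if is_boy(c) and is_colorblind(c))
p_girl = sum(p for c, p in offspring.items() if not is_boy(c))
p_colorblind_given_girl = sum(
    p for c, p in offspring.items() if not is_boy(c) and is_colorblind(c)
) / p_girl
p_normal = sum(p for c, p in offspring.items() if not is_colorblind(c))

print(p_colorblind_boy, p_colorblind_given_girl, p_normal)
```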

You are an intern to a medical doctor in an area with a high incidence of a particular virus. One out of every 75 people has it. The doctor has developed a procedure to determine if someone has the virus. You are in charge of tabulating the data from the trials of this test. If the patient has the virus, the test will correctly give a positive result 98% of the time. However, in healthy people, the test incorrectly predicts that the patient has the virus 7% of the time.

The doctor has just performed the procedure on a new patient and the results show that the patient has the virus. Given what you know about the error rates of this test, determine the probability that the patient actually has the virus. (Hint: Use Bayes' Rule.)
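Bayes' Rule for a diagnostic test can be written as a short function. The sketch below is one possible formulation; the function and parameter names are hypothetical, and exact fractions are used so that no rounding enters the result.

```python
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """P(infected | positive test) by Bayes' Rule.

    prior               = P(infected)
    sensitivity         = P(positive | infected)
    false_positive_rate = P(positive | healthy)
    """
    true_positive = sensitivity * prior
    false_positive = false_positive_rate * (1 - prior)
    return true_positive / (true_positive + false_positive)


# Values taken from the problem statement.
p = posterior(Fraction(1, 75), Fraction(98, 100), Fraction(7, 100))
print(p, float(p))
```

The denominator is the total probability of a positive result, summed over infected and healthy patients.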

A clique is an undirected graph $G = (V, E)$ in which every pair of vertices is connected by an edge. Suppose $|V| = n$.

(a) Give the number of edges in $G$ in terms of $n$.
(b) What is the degree of each vertex, $v$?
(c) Let $T_b$ be the spanning tree obtained from a breadth first search of $G$. Give the number of edges in $T_b$ in terms of $n$.
(d) The diameter of a tree is defined to be $\max_{u,v \in V}$ {the shortest path from $u$ to $v$}. What is the diameter of $T_b$?
(e) Let $T_d$ be the spanning tree obtained from a depth first search of $G$. Give the number of edges in $T_d$ in terms of $n$.
(f) What is the diameter of $T_d$?
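One way to check a conjectured answer is to build a small clique and run both searches. The sketch below makes several assumptions that are not part of the problem: vertices are labeled 0 to n-1, neighbors are visited in increasing order, both searches start at vertex 0, and all function names are hypothetical. It tries a single value of n and is no substitute for a general argument.

```python
from collections import deque


def complete_graph(n):
    """Adjacency lists of a clique on vertices 0..n-1."""
    return {v: [u for u in range(n) if u != v] for v in range(n)}


def bfs_tree(adj, root=0):
    """Parent map of the breadth first search tree."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                queue.append(u)
    return parent


def dfs_tree(adj, root=0):
    """Parent map of the depth first search tree (recursive)."""
    parent = {root: None}

    def visit(v):
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                visit(u)

    visit(root)
    return parent


def tree_diameter(parent):
    """Longest shortest path in the tree, found with two BFS passes."""
    tree = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            tree[v].append(p)
            tree[p].append(v)

    def farthest(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in tree[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        far = max(dist, key=dist.get)
        return far, dist[far]

    end, _ = farthest(next(iter(parent)))
    return farthest(end)[1]


n = 6
G = complete_graph(n)
num_edges = sum(len(nbrs) for nbrs in G.values()) // 2
Tb, Td = bfs_tree(G), dfs_tree(G)
print(num_edges, len(Tb) - 1, tree_diameter(Tb), len(Td) - 1, tree_diameter(Td))
```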

Let $T$ be a rooted, 3-ary tree, in which every node has either zero or three children. Let $L$ be the number of leaves in $T$.

(a) Give an expression for $N$, the number of internal nodes and leaves in $T$, in terms of $L$.
(b) Give an expression for $E$, the number of edges in $T$, in terms of $L$.
(c) What is the minimum depth of $T$ in terms of $L$?
(d) What is the maximum depth of $T$ in terms of $L$?
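Any tree of this kind can be grown from a lone root by repeatedly replacing a leaf with an internal node that has three leaf children. The sketch below simulates that process so that candidate formulas can be tested on small cases. The function name `grow` and the two growth strategies (always expand the deepest leaf, or always the shallowest) are illustrative assumptions.

```python
def grow(expansions, pick):
    """Grow a full 3-ary tree; return (internal node count, list of leaf depths).

    pick chooses which leaf depth to expand next, e.g. max for a
    caterpillar-shaped tree or min for a tree filled level by level.
    """
    leaves = [0]   # a lone root is a leaf at depth 0
    internal = 0
    for _ in range(expansions):
        d = pick(leaves)
        leaves.remove(d)                # this leaf becomes an internal node
        leaves.extend([d + 1] * 3)      # and gains three leaf children
        internal += 1
    return internal, leaves


for name, pick in (("deepest-first", max), ("shallowest-first", min)):
    internal, leaves = grow(4, pick)
    L = len(leaves)
    N = internal + L
    E = N - 1   # every node except the root has exactly one parent edge
    print(name, "L =", L, "N =", N, "E =", E, "depth =", max(leaves))
```

Each expansion removes one leaf and adds three, so the leaf count rises by two per internal node; comparing the printed values across different numbers of expansions suggests the general expressions.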

(a) Algorithms X and Y have worst-case running times no greater than $64N \log_2 N$ and $N^3$, respectively. Which algorithm has better asymptotic running time? For which values of $N$ would you choose algorithm X over algorithm Y? (You may give an algebraic or graphical answer.)
(b) Suppose you have just come up with algorithm Z, which has a running time of $48N\sqrt{N}$. For what values of $N$ should you use algorithm Z over the other two?
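A crossover point between two running-time bounds can also be located numerically. The sketch below compares only the bounds stated for X and Y; the function names are hypothetical, and the search starts at $N = 2$ because $\log_2 1 = 0$ makes the bound for X trivially zero at $N = 1$.

```python
import math


def cost_x(n):
    """Upper bound on the running time of algorithm X."""
    return 64 * n * math.log2(n)


def cost_y(n):
    """Upper bound on the running time of algorithm Y."""
    return n ** 3


def first_n_where_x_is_cheaper(start=2, limit=10_000):
    """Smallest N in [start, limit) with cost_x(N) strictly below cost_y(N)."""
    for n in range(start, limit):
        if cost_x(n) < cost_y(n):
            return n
    return None


crossover = first_n_where_x_is_cheaper()
print(crossover, cost_x(crossover), cost_y(crossover))
```

A scan like this finds only the first crossing; an algebraic argument is still needed to show that the ordering does not flip again for larger $N$.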