Genomics and Gene Recognition Genes and Blue Genes November 1, 2004

Prokaryotic Gene Structure
- prokaryotes are the simplest free-living organisms
- studying prokaryotes gives a sense of the minimum number of genes needed for survival
- about 300 genes is the minimum (must include genes for replication and genes to obtain and store energy)
- genome sequencing offers great opportunities: currently, about 120 genomes have been finished
- central dogma of molecular biology: DNA is transcribed into RNA, which is translated into protein

Transcription and Regulation of Gene Expression
- transcription is highly regulated in all cells
- in prokaryotes, only about 3% of the genes are being transcribed at any given time
- in eukaryotes, only about 0.01% of the genes are transcribed at any given time (in differentiated cells, i.e. cells that perform a specific function)
- which genes are transcribed is determined by the growth status of the cell, its metabolic condition, etc.

Gene Expression
- the process by which a gene's information is converted into its product (say, a protein)
- consists of transcription and translation, followed by:
  - folding
  - post-translational modification
  - targeting
- the amount of protein expressed depends on the tissue, the developmental stage of the cell, and the cell's metabolic and physiological state

Transcription in Prokaryotes
- all RNA is synthesized by a single species of DNA-dependent RNA polymerase
- the RNA polymerase of E. coli, the so-called RNA polymerase holoenzyme, is a complex multimeric protein large enough to be visible in the electron microscope
- it consists of four kinds of subunits, α2ββ'σ, where β' is the largest subunit
- σ can be any one of several sigma factors; it recognizes the promoters that identify the location of the transcription start site
- β and β' contribute to the formation of the catalytic site
- the two α subunits are essential for assembly of the enzyme

The Steps in Prokaryotic Transcription
1. binding of RNA polymerase holoenzyme at promoter sites
2. initiation of polymerization
3. chain elongation
4. chain termination

Experimental Promoter Identification
- promoters are identified in vitro by a technique called DNA footprinting:
  - RNA polymerase holoenzyme is bound to a putative promoter sequence in a DNA duplex
  - the DNA:protein complex is treated with DNase I
  - DNase I cleaves the DNA at sites not protected by bound protein
  - the set of DNA fragments left after DNase I digestion reveals the promoter (by definition, the promoter is the RNA polymerase holoenzyme binding site)
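The footprinting logic can be sketched in a few lines: if DNase I cuts everywhere except where the holoenzyme sits, the protected site shows up as the longest stretch without any observed cut. This is only an illustrative toy, not a description of how footprinting gels are actually analyzed. The function name `protected_region` and the cut-site data are hypothetical.

```python
def protected_region(cut_sites, length):
    """Return (start, end), 1-based inclusive, of the longest uncut stretch.

    cut_sites: positions where DNase I cleavage was observed (hypothetical data)
    length:    length of the DNA fragment in bases
    Returns (1, 0), an empty interval, if every position was cut.
    """
    # Sentinels at 0 and length + 1 let uncut stretches at the ends count too.
    sites = [0] + sorted(set(cut_sites)) + [length + 1]
    best_start, best_end = 1, 0
    for left, right in zip(sites, sites[1:]):
        start, end = left + 1, right - 1
        if end - start > best_end - best_start:
            best_start, best_end = start, end
    return best_start, best_end


# Hypothetical 100-base fragment cut everywhere except positions 21..80.
cuts = list(range(1, 21)) + list(range(81, 101))
print(protected_region(cuts, 100))
```

In this made-up example the uncut window is 60 bases long, the same width as the typical polymerase footprint mentioned on the next slide.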

Prokaryotic Promoters
- the +1 site is defined as the transcription start site (that base is the first base in the RNA transcript)
- the +2 base is the second base in the RNA transcript
- bases upstream from the initiation site are numbered -1, -2, etc. There is no zero!
- RNA polymerase binding typically spans -40 to +20
- the start site on the template strand is almost always a pyrimidine, so the transcript almost always begins with a purine
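Because the numbering skips zero, converting between promoter coordinates and ordinary string indices is an easy place to make off-by-one mistakes. The sketch below shows one way to do the conversion. The function names and the `tss_index` parameter (the 0-based index of the +1 base in a sequence string) are assumptions made for illustration.

```python
def promoter_coordinate(index, tss_index):
    """Convert a 0-based string index to promoter numbering (..., -2, -1, +1, +2, ...)."""
    offset = index - tss_index
    # Downstream positions start at +1; upstream positions start at -1.
    return offset + 1 if offset >= 0 else offset


def index_from_coordinate(coord, tss_index):
    """Convert a promoter coordinate back to a 0-based string index."""
    if coord == 0:
        raise ValueError("there is no position 0 in promoter numbering")
    return tss_index + coord - 1 if coord > 0 else tss_index + coord


# With the +1 base at index 40, the region -40..+20 covers indices 0..59.
print(index_from_coordinate(-40, 40), index_from_coordinate(20, 40))
```

Note that a region running from -40 to +20 contains 60 bases, not 61, precisely because there is no position 0.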

Prokaryotic Promoters
- promoters vary in size from 20 to 200 bp
- they typically consist of a 40-bp region upstream from the transcription start
- within a promoter are two consensus sequence elements
- a consensus is defined as the bases that appear with the highest frequency at each position when a series of sequences believed to have a common function are compared
- the two elements are:
  - the Pribnow box, near -10, whose consensus sequence is TATAAT
  - the sequence in the -35 region, containing the consensus TTGACA
- the two elements are separated by about 17 bp of nonconserved sequence
- the more closely the -35 region resembles the consensus, the greater the efficiency of transcription
- rRNA promoters have a third upstream element at about -55 (recognized by the α subunit)
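The definition of a consensus (the most frequent base at each aligned position) translates directly into code. The following is a minimal sketch. The five hexamers are invented examples, not real E. coli promoter data, and the helper names `consensus` and `mismatches` are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent base at each column of equal-length, pre-aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def mismatches(site, reference):
    """Number of positions at which a site differs from a reference of equal length."""
    return sum(a != b for a, b in zip(site, reference))


# Hypothetical -10 hexamers, each differing from TATAAT at no more than one position.
sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
print(consensus(sites))
print([mismatches(s, "TATAAT") for s in sites])
```

Counting mismatches against TATAAT or TTGACA gives a crude measure of how closely a candidate element resembles the consensus. Real promoter finders usually weight positions rather than counting them equally.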

Prokaryotic Promoters
[Figure: nucleotide sequences of representative E. coli promoters]

Prokaryotic Gene Structure
- a gene consists of both promoters and coding parts!

Open Reading Frame (ORF)
- ribosomes translate triplets of nucleotides into amino acids
- the start codon is usually AUG (which encodes methionine)
- the stop codons are UAA, UAG, UGA
- it is important to determine from which nucleotide to start reading
- an open reading frame is the sequence from a start codon up to the first in-frame stop codon

Example: the three forward reading frames of one sequence (* marks a stop codon)

5' atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 3'

Frame 1   atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
          M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *
Frame 2   tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
          C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y
Frame 3   gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata
          A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I
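The three-frame translation of the example sequence can be reproduced with a short script. This is a sketch using the standard genetic code written out by hand. The function names `translate` and `first_orf` are illustrative, and the input is assumed to contain only A, C, G, T (or U).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate one forward reading frame (frame = 0, 1 or 2)."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def first_orf(dna, frame=0):
    """Return the first ATG-to-stop stretch in the given frame, or None."""
    dna = dna.upper().replace("U", "T")
    start = None
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and CODON_TABLE[codon] == "*":
            return dna[start:i + 3]
    return None


seq = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"
for frame in range(3):
    print(frame + 1, translate(seq, frame))
```

Only frame 1 yields an uninterrupted stretch from ATG to a stop codon here, which is why choosing the starting nucleotide matters. A complete ORF finder would also scan the three frames of the reverse complement.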

Conceptual Translation
- 1960s and 1970s: it was easier to determine the amino acid sequence of a protein than to sequence DNA
- 1980s: it becomes easier to sequence DNA
- many proteins are INFERRED from DNA sequences; this is conceptual translation
- after conceptual translation, the protein's structure and function are inferred (database search, structure prediction, function prediction, cell localization prediction, post-translational modification prediction and so on)

Genetic Code

Termination Sequences
- ~90% of prokaryotic operons contain intrinsic terminators
- properties of intrinsic terminators:
  - inverted repeat nucleotide sequences, e.g. (5') CGGATG CATCCG (3')
  - a run of ~6 uracils follows the repeat sequence
  - the RNA can adopt a stable secondary structure (stability is directly related to the length of the repeats)
  - the RNA secondary structure causes RNA polymerase to slow down transcription
  - the poly-U run causes termination
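The two sequence features listed above (an inverted repeat followed by a run of about six U's) are simple enough to search for directly. The sketch below scans the coding strand of DNA, where U appears as T. The stem length, maximum loop size, and T-run length are illustrative defaults rather than established thresholds, and the test sequences are made up.

```python
def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_intrinsic_terminators(dna, stem=6, max_loop=8, min_t=6):
    """Find stem + loop + reverse-complement(stem) followed directly by a run of T's.

    Returns a list of (start_index, stem_sequence, loop_length) tuples.
    All three thresholds are assumptions chosen for illustration.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 2 * stem + 1):
        left = dna[i:i + stem]
        right = reverse_complement(left)
        for loop in range(max_loop + 1):
            j = i + stem + loop  # where the second half of the repeat would begin
            if dna[j:j + stem] == right and dna[j + stem:j + stem + min_t] == "T" * min_t:
                hits.append((i, left, loop))
    return hits


# The repeat from the slide, with a hypothetical 4-base loop and flanking bases.
print(find_intrinsic_terminators("AAGCCGGATGTTCGCATCCGTTTTTTGA"))
```

This exact-match search ignores G:U pairs and imperfect stems, so it will miss many real terminators. Dedicated tools score hairpin stability instead of requiring a perfect repeat.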

Strength of Base Pairing
- G:C is stronger (three hydrogen bonds)
- A:T is weaker (two hydrogen bonds)

GC Content in Prokaryotes
- for every G in double-stranded DNA there must be a C
- it has been noticed that some bacteria have increased GC content: some bacteria have 75% GC content, some have 25%
- within a given bacterial genome, the ratio of G/C to A/T content is relatively uniform
- high GC or AT content is a consequence of the work of DNA polymerases and DNA repair mechanisms over time
- many bacteria have genes from other organisms, acquired through horizontal gene transfer
- new genes differ in nucleotide composition from old genes
- there is also codon usage bias
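GC content is a plain base count, and computing it in sliding windows is one simple way to look for regions whose composition differs from the rest of a genome, as recently transferred genes might. The sketch below assumes a sequence given as a string. The function names and the window and step parameters are illustrative choices.

```python
def gc_content(dna):
    """Fraction of G and C among the A, C, G, T bases of a sequence."""
    dna = dna.upper()
    gc = dna.count("G") + dna.count("C")
    at = dna.count("A") + dna.count("T")
    # Ambiguous symbols such as N are ignored; an empty sequence gives 0.0.
    return gc / (gc + at) if gc + at else 0.0


def windowed_gc(dna, window, step):
    """GC content of successive windows of a fixed size along the sequence."""
    return [gc_content(dna[i:i + window])
            for i in range(0, len(dna) - window + 1, step)]


print(gc_content("GGCCAATT"))
print(windowed_gc("GGGGAAAA", 4, 4))
```

A window whose GC content sits far from the genome-wide value is only a hint of horizontal transfer, not proof. Codon usage bias would be a natural second signal to check.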

Prokaryotic Gene Density
- very high gene density in bacteria and archaea: >85% of the DNA codes for genes
- example, E. coli:
  - 4,288 genes
  - average coding sequence: 950 bp
  - average separation between genes: 118 bp
- characteristics:
  - long open reading frames (60 or more codons)
  - simple promoter sequences, due to a small number of sigma factors
  - recognizable transcriptional termination signals
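The E. coli figures above can be checked against the ">85% coding" claim with a little arithmetic. This sketch treats the genome as an even alternation of average-length genes and average-length spacers, which is a simplifying assumption. The function name is hypothetical.

```python
def coding_fraction(avg_coding_bp, avg_spacer_bp):
    """Fraction of DNA that is coding if genes and spacers simply alternate."""
    return avg_coding_bp / (avg_coding_bp + avg_spacer_bp)


genes = 4288          # E. coli gene count from the slide
avg_coding_bp = 950   # average coding sequence length
avg_spacer_bp = 118   # average separation between genes

fraction = coding_fraction(avg_coding_bp, avg_spacer_bp)
total_coding_bp = genes * avg_coding_bp
implied_genome_bp = genes * (avg_coding_bp + avg_spacer_bp)

print(f"coding fraction: {fraction:.3f}")
print(f"total coding DNA: {total_coding_bp:,} bp")
print(f"implied genome size: {implied_genome_bp:,} bp")
```

The result is a coding fraction of roughly 0.89 and an implied genome of about 4.6 million bp, both consistent with the slide's figure of more than 85%.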

DNA Polymerase
- duplicates genetic information
- the most accurate enzyme: makes fewer than one error per billion nucleotides
- it also proofreads each copied base and, if the base is wrong, repairs it
- DNA polymerase is used in forensics to help produce DNA from poor-quality samples
- one cell has several DNA polymerases: some replicate DNA, others repair it