Why study sequence similarity?

1 Sequence Similarity

2 Why study sequence similarity? Similarity is a possible indication of common ancestry. Similarity of structure implies similar biological function, even among apparently distant organisms. Example context: establishing a possible causal relationship between the wide use of antibiotics in agriculture and the spread of antibiotic-resistant bacteria.

3 Antibiotic-resistant bacteria. They have evolved rapidly and can thrive when antibiotics kill nonresistant bugs. Horizontal gene transfer (HGT) can speed the development of antibiotic resistance. Source: world/bactresanti.html

4 Figure 3.2: Vertical and horizontal gene transfer

5 Figure 3.3: How exposure to antibiotics selects for the survival of resistant cells in a population of bacteria

6 Figure 3.4: A plasmid carrying an antibiotic-resistance gene can be transferred to a new cell by conjugation

7 Antibiotic-resistant bacteria. Widespread use of antibiotics means nonresistant strains die, leaving resistant strains to survive and multiply; this phenomenon is observed in hospitals, care centers, etc. Once some bacteria in an environment are resistant, HGT can spread resistance faster than it would otherwise spread through mutation alone.

8 Antibiotic-resistant bacteria. Use of antibiotics is common in agriculture. The presence in human pathogens of resistance genes that are highly similar to genes found in animals would provide evidence that HGT has occurred.

9 Gene similarity. Homologues: similar sequences (related terms: homology, homologous). Orthologs: a similar gene appears in two different organisms where several other such similarities occur; the organisms have common evolutionary ancestry. Xenologs: a similar gene is found in organisms that have little else in common; this is evidence of HGT.

10 Similarity: how close is close? Proteins are considered homologous if at least 25% of residues are identical. DNA is considered homologous at 70% identity or more. Threshold level for HGT: 95% identity.
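The DNA thresholds on this slide can be tried out with a minimal Python sketch. It assumes the two sequences are already aligned (equal length, with "-" marking gaps) and that identity is computed over the non-gap columns only; that convention, and the helper names `percent_identity` and `classify_dna`, are illustrative assumptions rather than anything defined in the slides.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of compared columns (gap columns skipped) where a and b match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def classify_dna(identity: float) -> str:
    """Apply the slide's DNA thresholds: 70% for homology, 95% for HGT."""
    if identity >= 95.0:
        return "possible HGT"
    if identity >= 70.0:
        return "homologous"
    return "below threshold"


identity = percent_identity("ACGT", "ACGA")  # three of four columns match
label = classify_dna(identity)
```

Real tools may count gap columns differently, so percentages from this sketch need not agree exactly with what BLAST reports.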

11 Establishing homology: alignment. Match the sequences in a meaningful way. Account for differences in sequence length due to indels (insertions and deletions). Use a scoring system based on closeness of match.
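To make the idea of a scoring system with indels concrete, here is a small sketch of a textbook global-alignment score (Needleman-Wunsch style dynamic programming). This is not how BLAST works internally; BLAST uses a faster local-alignment heuristic. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def global_alignment_score(s: str, t: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best global alignment score of s and t under a simple linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of s with t[:j]
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # aligning s[:i] against an empty prefix costs i gaps
        for j in range(1, len(t) + 1):
            diagonal = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            deletion = prev[j] + gap       # s[i-1] aligned to a gap
            insertion = curr[j - 1] + gap  # t[j-1] aligned to a gap
            curr.append(max(diagonal, deletion, insertion))
        prev = curr
    return prev[-1]


same = global_alignment_score("ACGT", "ACGT")     # four matches
one_indel = global_alignment_score("ACGT", "AGT")  # three matches, one gap
```

With these assumed values, identical four-base sequences score 4, and dropping one base costs a match plus a gap penalty, giving 1.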

12 BLAST: Basic Local Alignment Search Tool. Versions exist for different comparisons. Protein vs. protein (blastp): use when you want to learn about the function of a protein. Protein vs. nucleotide (tblastn): used to compare a protein with DNA to discover new genes encoding simple proteins. Nucleotide vs. nucleotide (blastn): we'll use this to look for HGT evidence.
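The three program choices on this slide can be summarized as a lookup table. The sketch below is only a memory aid: the function `choose_blast_program` is a hypothetical helper, not part of any BLAST toolkit, and it covers only the three combinations named on the slide.

```python
# (query type, database type) -> program name, as listed on the slide
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("protein", "nucleotide"): "tblastn",
    ("nucleotide", "nucleotide"): "blastn",
}


def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the program for this pairing, or raise if the slide does not cover it."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"combination not covered by this slide: {key}")
    return BLAST_PROGRAMS[key]


hgt_program = choose_blast_program("nucleotide", "nucleotide")
```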

13 BLAST servers. The home server is at NCBI; other servers are available worldwide. BLAST servers are very popular (and busy), so it can help to use one where it is night: Japan is sleeping when it's morning in the USA, and Europe is sleeping when it's afternoon in the USA.

14 Using blastn. Start with a query sequence: the nucleotide sequence you want to investigate. BLAST compares the query with every GenBank sequence, performs an alignment, and reports matches with a high degree of similarity.

15 Using blastn. Point your browser to the NCBI website, choose BLAST on the home page, then scroll down to Basic BLAST and choose nucleotide.

16 Using blastn Paste your query sequence in the window, as shown:

17 Using blastn Scroll down to the next box on the page, and select the database to be searched (Nucleotide, in this case)

18 Using blastn. Scroll down to the BLAST button and click it. Then wait. Eventually, you'll see a screen like this:

19 BLAST results. Graphical summary: the query sequence is at the top, and each bar represents a portion of another sequence that is similar to the query. Bar colors: red, most similar (homologous to query); pink, not as good; green, borderline; blue/black, twilight zone.

20 BLAST results: graphics section

21 BLAST results: description section

22 BLAST results: description section. Accession: the database entry's GenBank accession number. Description: usually identifies the organism and some characteristics of the sequence. Scores: based on the number of matches in the alignment. E-value: statistical significance of the score.

23 E-value. An estimate of the number of times a match could have been produced by chance. The lower the E-value, the greater the significance: greater similarity between query and target, and greater confidence of homology. Identical sequences have an E-value of 0; anything above 0.001 is considered insignificant. E-values are written in scientific notation.
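Because E-values appear in scientific notation (for example `3e-45`), it can help to see how the 0.001 cutoff from this slide is applied in practice. The sketch below parses E-value strings and keeps the significant ones, most significant first. The hit names and E-values are made-up placeholders for illustration, not real BLAST output, and `is_significant` is a hypothetical helper.

```python
SIGNIFICANCE_CUTOFF = 0.001  # from the slide: anything above this is insignificant


def is_significant(e_value_text: str, cutoff: float = SIGNIFICANCE_CUTOFF) -> bool:
    """Parse an E-value written in plain or scientific notation and test it."""
    return float(e_value_text) <= cutoff


# Hypothetical hits: (placeholder name, E-value as it might appear in a report)
hits = [("hit_A", "0.0"), ("hit_B", "3e-45"), ("hit_C", "2e-04"), ("hit_D", "0.05")]

significant = sorted(
    (hit for hit in hits if is_significant(hit[1])),
    key=lambda hit: float(hit[1]),  # lower E-value sorts first
)
significant_names = [name for name, _ in significant]
```

Here `hit_D` is dropped because 0.05 is above the cutoff, while the identical-sequence case (E-value 0) sorts to the top.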

24 Alignment section