Exercise I, Sequence Analysis

1 Exercise I, Sequence Analysis atgcacttgagcagggaagaaatccacaaggactcaccagtctcctggtctgcagagaagacagaatcaacatgagcacagcaggaaaa gtaatcaaatgcaaagcagctgtgctatgggagttaaagaaacccttttccattgaggaggtggaggttgcacctcctaaggcccatgaagt tcgtattaagatggtggctgtaggaatctgtggcacagatgaccacgtggttagtggtaccatggtgaccccacttcctgtgattttaggccat gaggcagccggcatcgtggagagtgttggagaaggggtgactacagtcaaaccaggtgataaagtcatcccactcgctattcctcagtgtgg aaaatgcagaatttgtaaaaacccggagagcaactactgcttgaaaaacgatgtaagcaatcctcaggggaccctgcaggatggcaccag caggttcacctgcaggaggaagcccatccaccacttccttggcatcagcaccttctcacagtacacagtggtggatgaaaatgcagtagcc aaaattgatgcagcctcgcctctagagaaagtctgtctcattggctgtggattttcaactggttatgggtctgcagtcaatgttgccaaggtca ccccaggctctacctgtgctgtgtttggcctgggaggggtcggcctatctgctattatgggctgtaaagcagctggggcagccagaatcattg cggtggacatcaacaaggacaaatttgcaaaggccaaagagttgggtgccactgaatgcatcaaccctcaagactacaagaaacccatcc aggaggtgctaaaggaaatgactgatggaggtgtggatttttcatttgaagtcatcggtcggcttgacaccatgatggcttccctgttatgttgt catgaggcatgtggcacaagtgtcatcgtaggggtacctcctgattcccaaaacctctcaatgaaccctatgctgctactgactggacgtac ctggaagggagctattcttggtggctttaaaagtaaagaatgtgtcccaaaacttgtggctgattttatggctaagaagttttcattggatgcat taataacccatgttttaccttttgaaaaaataaatgaaggatttgacctgcttcactctgggaaaagtatccgtaccattctgatgttttgaga caatacagatgttttcccttgtggcagtcttcagcctcctctaccctacatgatctggagcaacagctgggaaatatcattaattctgctcatc acagattttatcaataaattacatttgggggctttccaaagaaatggaaattgatgtaaaattatttttcaagcaaatgtttaaaatccaaatg agaactaaataaagtgttgaacatcagctggggaattgaagccaataaaccttccttcttaaccatt

2 TUTORIAL: Comparing a sequence against all the sequences in a database

3 Software: WU-Blast2 (Washington University Basic Local Alignment Search Tool, version 2.0)

4 Example sequence
Download the sequence from the following link:
We are looking for:
- Alignments with high scores and low P-values.

5 First step: Open the program and paste the sequence:
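The sequence on slide 1 is lower-case and broken by spaces where the slide wrapped it. If you prefer to paste a clean FASTA record, a minimal Python sketch like the one below can tidy it up first. The function names `clean_sequence` and `to_fasta`, the header text and the 60-residue line width are arbitrary choices for illustration, not WU-Blast2 requirements.

```python
import re


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines, then keep only letters, '*' and '-', upper-cased."""
    body = "".join(line for line in raw.splitlines() if not line.startswith(">"))
    return re.sub(r"[^A-Za-z*-]", "", body).upper()


def to_fasta(seq: str, header: str = "query sequence", width: int = 60) -> str:
    """Wrap a sequence into a FASTA record with lines of at most `width` residues."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines)


# A fragment of the slide 1 sequence, including one of its line-wrap spaces.
raw = "atgagcacagcaggaaaa gtaatcaaatgcaaagcag"
print(to_fasta(clean_sequence(raw)))
```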

6 WU-Blast2: Results
RESULTS: We can choose between receiving an email telling us when the results are ready or waiting for them on screen (although this second option is sometimes slow).

7 WU-Blast2: Program
PROGRAM:
- Blastp: compares a protein sequence with the selected database.
- Blastx: translates a nucleotide sequence into its six possible protein sequences (reading frames) and compares them with a protein database.
- Blastn: compares a nucleotide sequence with a nucleotide database.
- tblastn: compares a protein sequence with a nucleotide database (translated in all six reading frames).
- tblastx: compares the six-frame translations of a nucleotide sequence with the six-frame translations of a nucleotide database.
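Blastx, tblastn and tblastx all rely on translating nucleotide sequences in six reading frames: three on the given strand and three on its reverse complement. The sketch below shows only that translation step, using the standard genetic code; the function names are hypothetical, codons containing ambiguous bases become `X`, and real BLAST programs can also use alternative genetic codes.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; '*' = stop, 'X' = unknown codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Return translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# First 30 nucleotides of the slide 1 sequence, frame +1.
print(six_frames("atgcacttgagcagggaagaaatccacaag")["+1"])  # MHLSREEIHK
```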

8 WU-Blast2: Database
DATABASE: Selects the database against which we want to compare our sequence.

9 WU-Blast2: Matrix
MATRIX: Lets us select which substitution matrix to use.
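To illustrate what a substitution matrix does, the sketch below scores an ungapped alignment by summing the matrix entry for each aligned pair. To keep it short, only an excerpt of BLOSUM62 covering the residues I, L, M and V is included; the full matrix covers all 20 amino acids, and real BLAST scores also include gap penalties. The function names are hypothetical.

```python
# Excerpt of BLOSUM62 restricted to I, L, M and V (each unordered pair stored once).
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("M", "M"): 5, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "M"): 1, ("I", "V"): 3,
    ("L", "M"): 2, ("L", "V"): 1, ("M", "V"): 1,
}


def pair_score(a: str, b: str, matrix: dict) -> int:
    """Look up a symmetric matrix stored with one orientation per pair."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def ungapped_score(s1: str, s2: str, matrix: dict) -> int:
    """Sum the substitution scores of two equal-length, gap-free aligned segments."""
    if len(s1) != len(s2):
        raise ValueError("aligned segments must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(s1, s2))


print(ungapped_score("LIV", "LVV", BLOSUM62_EXCERPT))  # 11
```

Swapping in a different matrix changes every pair score, which is why the choice of matrix affects which alignments are reported.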

10 WU-Blast2: Exp. Thr.
EXP. THR.: Sets the expectation (E-value) threshold for the alignments. Only database sequences whose alignments fall within this threshold (an E-value at or below it) will appear in the result file.
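Conceptually, the threshold acts as a filter on the list of hits. The sketch below keeps only hits whose E-value is at or below the chosen threshold; the `Hit` record, the identifiers `SEQ_A` to `SEQ_C` and their values are made up purely for illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str   # hypothetical database identifier
    score: int     # raw alignment score
    evalue: float  # expected number of chance alignments scoring at least this well


def apply_expect_threshold(hits: List[Hit], expect: float) -> List[Hit]:
    """Keep only the hits whose E-value is at or below the threshold."""
    return [hit for hit in hits if hit.evalue <= expect]


hits = [Hit("SEQ_A", 480, 1e-50), Hit("SEQ_B", 95, 0.02), Hit("SEQ_C", 40, 3.5)]
print([hit.subject for hit in apply_expect_threshold(hits, expect=0.1)])  # ['SEQ_A', 'SEQ_B']
```

Raising the threshold lets weaker (less significant) hits into the result file; lowering it keeps only the strongest ones.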

11 WU-Blast2: Filter
FILTER: This option lets us filter out alignments that are statistically significant but not biologically important. For example, we may want to ignore a proline-rich segment, which could otherwise produce non-specific alignments with a large protein family. Example: XNU (Claverie and States, 1993), which masks regions containing periodic repeats.
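The idea behind filtering can be sketched with a simple technique: replace with `X` every residue covered by a window whose Shannon entropy (compositional diversity) falls below a threshold. This is deliberately not XNU, which detects periodic repeats; it is closer in spirit to low-complexity (compositional) filters. The window size of 12, the 2.2-bit threshold and the function names are arbitrary assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every residue covered by a low-entropy window with mask_char."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            masked[start:start + window] = mask_char * window
    return "".join(masked)


# A made-up protein fragment containing a proline-rich segment.
print(mask_low_complexity("MKTAYIAKQR" + "P" * 12 + "GSHMLEDNFW"))
```

In this example the mask spreads a few residues beyond the proline run, because windows that only partly overlap it are also low in entropy; real filters tune their windows and thresholds to limit this.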

12 WU-Blast2: View Filter
VIEW FILTER: Shows or hides the filtered (masked) regions of the sequence in the results.

13 WU-Blast2: Stats
STATS: The type of statistical function used to calculate the statistical significance of an aligned pair.
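BLAST-family programs commonly base this significance on Karlin-Altschul statistics: for a raw score S, a query of length m and a database of total length n, the expected number of chance alignments is E = K m n exp(-λS), and the corresponding P-value is P = 1 - exp(-E). The sketch below simply evaluates these formulas. The λ and K values are assumed, illustrative numbers (in practice they depend on the scoring matrix and gap penalties), real programs use effective lengths corrected for edge effects, and WU-Blast2's specific Stats options are not modelled.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score: float, m: float, n: float, lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def p_value(e: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E), computed stably."""
    return -math.expm1(-e)


# Assumed, illustrative parameters: not taken from WU-Blast2's output.
LAMBDA, K = 0.318, 0.13
QUERY_LEN, DB_LEN = 250, 100_000_000
e = expect_value(100, QUERY_LEN, DB_LEN, LAMBDA, K)
print(f"bits={bit_score(100, LAMBDA, K):.1f}  E={e:.2e}  P={p_value(e):.2e}")
```

For small E, P and E are almost equal, which is why low P-values and low E-values both mark significant alignments.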

14 WU-Blast2: Sort
SORT: Sorts the results according to the chosen option.

15 WU-Blast2: Alignments
ALIGNMENTS: Sets the maximum number of alignments that will appear in the final result.
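Together, the Sort and Alignments options decide the order and length of the hit list. A minimal sketch: order the hits by a chosen key, then keep at most a given number. The key names `"pvalue"` and `"score"`, the `Hit` record, the sample hits and the default of 50 are hypothetical choices, not WU-Blast2's actual option values.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str    # hypothetical database identifier
    score: int      # raw alignment score
    pvalue: float   # P-value reported for the hit


# Hypothetical sort keys; ties are broken by the other criterion.
SORT_KEYS = {
    "pvalue": lambda hit: (hit.pvalue, -hit.score),  # most significant first
    "score": lambda hit: (-hit.score, hit.pvalue),   # highest score first
}


def report(hits: List[Hit], sort_by: str = "pvalue", max_alignments: int = 50) -> List[Hit]:
    """Sort the hits by the chosen key and keep at most max_alignments of them."""
    return sorted(hits, key=SORT_KEYS[sort_by])[:max_alignments]


hits = [Hit("SEQ_A", 50, 1e-3), Hit("SEQ_B", 200, 1e-40),
        Hit("SEQ_C", 120, 1e-20), Hit("SEQ_D", 130, 1e-10)]
print([h.subject for h in report(hits, "pvalue", 3)])  # ['SEQ_B', 'SEQ_C', 'SEQ_D']
print([h.subject for h in report(hits, "score", 3)])   # ['SEQ_B', 'SEQ_D', 'SEQ_C']
```

The two orders differ because the highest-scoring hit is not necessarily the most significant one.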

16 Set all the options to their default values, then click Run Blast.

17 Waiting screen:

18 Result list

19 Explore all the chosen options.

20 Result Analysis (click on Blast Results)
The best alignment is obtained by clicking on Mouse Stratifin. The protein in our exercise has an identical sequence: in fact, this IS the exercise protein!

21 Result Analysis (click on Show Alignment)
- Query
- Information line
- Sequence found in the database

22 Analysis results
- The sequence from our exercise
- Information line
- Database sequence
The aligned sequences are connected by a line of symbols:
- A letter: the amino acids are identical, so they are linked by the letter of that amino acid.
- +: the amino acids are not identical but are chemically related (e.g. leucine and valine).
- -: represents a gap.
- A space: no match.
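The symbol line between the two sequences can be reproduced with a few rules: an identical pair shows the residue letter, a pair with a positive substitution-matrix score shows +, and anything else (including a gap) shows a space. In BLAST output, "chemically related" in practice means a positive score in the matrix. The sketch below assumes a small, hand-picked set of pairs that score positively in BLOSUM62 (L/V, I/V, I/L, L/M, D/E, K/R); a real implementation would look up every pair in the full matrix, and the function name is hypothetical.

```python
# Off-diagonal pairs that score > 0 in BLOSUM62 (a small subset chosen for illustration).
POSITIVE_PAIRS = {frozenset(pair) for pair in
                  [("L", "V"), ("I", "V"), ("I", "L"), ("L", "M"), ("D", "E"), ("K", "R")]}


def midline(query: str, subject: str) -> str:
    """Build the symbol line: identical letter, '+' for a positive pair, otherwise a space."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    symbols = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            symbols.append(" ")   # gap: no symbol
        elif q == s:
            symbols.append(q)     # identity: show the residue
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            symbols.append("+")   # similar (positive-scoring) pair
        else:
            symbols.append(" ")   # mismatch
    return "".join(symbols)


query, subject = "MKLVE-D", "MRVVEAW"
print(query)
print(midline(query, subject))
print(subject)
```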

23 This link will take us to UniProt.

25 EXERCISE PART I: Looking for sequences in databases

26 1st Exercise: Using homology alignment, look for the proteins with a similar sequence:
>query sequence
MLGIWTLLPL VLTSVARLSS KSVNAQVTDI NSKGLELRKT VTTVETQNLE GLHHDGQFCH KPCPPGERKA RDCTVNGDEP DCVPCQEGKE YTDKAHFSSK CRRCRLCDEG HGLEVEINCT RTQNTKCRCK PNFFCNSTVC EHCDPCTKCE HGIIKECTLT SNTKCKEEGS RSNVKRKEVQ KTCRKHRKEN QGSHESPTLN PETVAINLSD VDLSKYITTI AGVMTLSQVK GFVRKNGVNE AKIDEIKNDN VQDTAEQKVQ LLRNWHQLHG KKEAYDTLIK DLKKANLCTL AEKIQTIILK DITSDSENSN FRNEIQSLV
a) Which protein shows the greatest homology?
b) What is the most important difference in the alignment?
c) At which positions do these differences occur?
d) What is the biological significance of these differences (characteristics)?

27 2nd Exercise: Using homology alignment, look for the proteins with a similar sequence: >query sequence TTGTCCCCCATTCAACAGCAGGTAACACCATTCGTTATGGCAGGCAATAGACCTTTCAACAAACAACAGA CTGATAACCGCGAACGCGATCCACAAGTTGCCGGGCTAAAAGTGCCTCCGCACTCGATCGAAGCGGAGCA GTCGGTGTTGGGCGGTTTAATGCTGGATAACGAACGCTGGGATGATGTAGCCGAGCGTGTGGTGGCAGAC GATTTTTACACCCGCCCACACCGTCATATCTTTACTGAAATGGCGCGTTTGCAGGAAAGCGGTAGTCCTA TCGATCTGATTACCCTTGCGGAATCGCTGGAACGCCAGGGGCAACTCGATAGCGTCGGTGGTTTTGCTTA TCTGGCAGAGCTGTCAAAAAATACGCCAAGTGCGGCGAACATCAGTGCATATGCGGATATCGTGCGCGAA CGTGCCGTTGTTCGTGAGATGATCTCCGTTGCTAACGAGATTGCCGAAGCTGGTTTTGATCCACAAGGGC GTACCAGTGAAGATCTGCTCGACCTTGCTGAATCCCGCGTCTTTAAAATTGCCGAAAGTCGTGCAAACAA AGACGAAGGGCCGAAGAACATCGCCGATGTGCTCGACGCAACCGTGGCGCGTATTGAGCAGTTGTTTCAG CAGCCACACGATGGCGTTACCGGGGTAAATACCGGTTATGACGATCTCAACAAAAAAACCGCTGGCTTGC AGCCGTCGGATTTGATCATCGTCGCCGCGCGTCCGTCGATGGGTAAAACAACATTTGCGATGAATCTCGT CGAAAACGCAGCGATGTTGCAGGATAAACCGGTGCTTATCTTTTCGCTGGAGATGCCATCAGAACAGATT ATGATGCGTTCTCTGGCGTCGCTGTCGCGCGTTGACCAGACTAAAATCCGTACCGGGCAGCTCGATGATG AAGACTGGGCACGCATTTCCGGCACCATGGGTATTTTGCTCGAAAAACGCAATATCTATATCGATGATTC CTCCGGCTTGACGCCAACGGAGGTGCGTTCCCGCGCACGCCGTATTGCCCGTGAACACGGCGGCATCGGG CTTATCATGATCGACTACCTGCAACTGATGCGCGTACCGGCGCTTTCCGATAACCGTACGCTGGAAATTG CAGAAATCTCCCGCTCGCTGAAAGCACTGGCGAAAGAACTGAACGTGCCGGTGGTGGCGCTGTCCCAGTT GAACCGTTCTCTGGAACAACGTGCCGACAAACGCCCGGTCAACTCCGACCTGCGTGAATCTGGCTCTATC GAGCAGGATGCGGACTTGATCATGTTTATCTATCGTGATGAGGTGTATCACGAAAACAGTGATTTAAAAG GCATCGCGGAAATTATTATCGGTAAACAACGTAACGGCCCAATCGGGACGGTACGCCTGACCTTTAACGG

28 2nd Exercise:
a) With which other protein can you say it shows the greatest homology?
b) When was this entry created? When was it last updated?
c) What is the most recent bibliographic reference?
d) Which sequence domain is actually described for this protein?

29 The software required to solve the following problems is BLAST, from NCBI:
- Use Megablast when we don't know which organism the sequence belongs to. In that case, copy the BLAST result identifier and look for information about the gene or protein within NCBI.
- Use the genomes of individual organisms when you want to search for or align the sequence against a specific organism.

30 3rd Exercise: Look for the gene that is most similar to the following sequence:
a) What is the name of this gene?
b) Which organism does this gene belong to?
c) Which protein does it encode?
d) What is the sequence of this protein?

31 4th Exercise: Using homology alignment again, look for the proteins that are similar to the following human sequence:
a) What is the human protein?
b) Which organism does it correspond to?
c) What is the name of the gene that encodes it?
d) What is the sequence of this gene?

32 5th Exercise: Repeat the previous exercise using PSI-BLAST from NCBI. Do you see any differences compared with BLAST?