Bioinformatics Course AA 2017/2018 Tutorial 2


1 UNIVERSITÀ DEGLI STUDI DI PAVIA - FACOLTÀ DI SCIENZE MM.FF.NN. - LM MOLECULAR BIOLOGY AND GENETICS Bioinformatics Course AA 2017/2018 Tutorial 2 Anna Maria Floriano annamaria.floriano01@universitadipavia.it

2 Genome browsers are available for many organisms, BUT not for all of them! Most living organisms are still unknown or poorly characterized (think of bacterial species): they are yet to be identified, yet to be sequenced, or their genome annotation is not graphically available. For such organisms, you have to retrieve data in a different way...

3 Candidatus Midichloria mitochondrii

4 FTP page at NCBI: go to ftp://ftp.ncbi.nih.gov/genomes/archive/old_refseq/, open the folder Bacteria, then search for the folder of the organism of interest, Candidatus_Midichloria_mitochondrii_IricVA_uid68687, and open it (trick: press Ctrl+F on your keyboard, then type Midichloria).
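The Ctrl+F trick can also be scripted once the folder listing is available. This is a minimal sketch, assuming the listing has already been obtained as a list of folder names; the helper `find_folders` and the two placeholder entries are hypothetical, and only the Midichloria folder name comes from the slide.

```python
def find_folders(listing, keyword):
    """Return the folder names containing keyword, ignoring case (like Ctrl+F)."""
    needle = keyword.lower()
    return [name for name in listing if needle in name.lower()]


# Hypothetical excerpt of the Bacteria folder listing: only the Midichloria
# entry is taken from the tutorial, the other two names are placeholders.
listing = [
    "Example_organism_A_uid00001",
    "Candidatus_Midichloria_mitochondrii_IricVA_uid68687",
    "Example_organism_B_uid00002",
]

matches = find_folders(listing, "midichloria")
print(matches)
```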

5 FTP page at NCBI

6 File formats (some of them...)

Extension  Format             Description
.fna       FASTA              Nucleic acids (genome)
.ffn       FASTA              Nucleic acids (CDS)
.faa       FASTA              Amino acids
.frn       FASTA              Non-coding RNA regions for a genome, in DNA alphabet (e.g. tRNA, rRNA)
.gbk       GenBank flat file  Genome annotation
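To make the file formats concrete, here is a minimal sketch in Python that looks up a format from the file extension and reads FASTA text into a dictionary. The function names and the two example records are hypothetical; dedicated libraries handle many more edge cases than this toy parser.

```python
import os

# Extension -> (format, content), as listed on the slide above.
FORMATS = {
    ".fna": ("FASTA", "nucleic acids (genome)"),
    ".ffn": ("FASTA", "nucleic acids (CDS)"),
    ".faa": ("FASTA", "amino acids"),
    ".frn": ("FASTA", "non-coding RNA regions, in DNA alphabet"),
    ".gbk": ("GenBank flat file", "genome annotation"),
}


def describe(filename):
    """Look up the format of a file from its extension."""
    ext = os.path.splitext(filename)[1].lower()
    return FORMATS.get(ext, ("unknown", "unknown"))


def parse_fasta(text):
    """Parse FASTA text into a dict: header (without '>') -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record FASTA text, only for illustration.
example = ">gene1 hypothetical record\nATGGATGG\nATGGAACT\n>gene2\nTTGACA\n"
records = parse_fasta(example)
print(describe("genome.fna"), records)
```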

7 FTP page at NCBI

8 You can download many genomes and compare them in many ways, e.g. through a local BLAST (you will learn how during the second part of the Bioinformatics course, with Prof. Sassera and the other tutors)

9 Primer design

10 Primer design: PCRs have many different aims, e.g. investigating alternative splicing or finishing a genome.

11 Investigating alternative splicing: search NT5E at NCBI. How many isoforms are there? What is the difference among the isoforms?

13 Where do we locate our primer pair? A B C D

14 Search NT5E at UCSC and design primers: remember to select the correct exons (choose the correct isoform and include your target exon between square brackets); use the Primer3 web tool (have a look at the possible settings); check primer specificity.
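Placing the target exon between square brackets can be done by hand, or with a few lines of Python before pasting the sequence into the Primer3 web tool. This is a minimal sketch: the helper `mark_target` and the toy sequences are hypothetical and are not the real NT5E exons.

```python
def mark_target(sequence, start, end):
    """Wrap sequence[start:end] (0-based, end exclusive) in square brackets."""
    if not 0 <= start < end <= len(sequence):
        raise ValueError("target coordinates fall outside the sequence")
    return sequence[:start] + "[" + sequence[start:end] + "]" + sequence[end:]


# Hypothetical toy template: flanking exon, target exon, flanking exon.
upstream = "ATGGCTAGCTAGGATCC"
target = "GGGTTTCCCAAA"
downstream = "TTGACCATGGTACCTGA"
template = upstream + target + downstream

marked = mark_target(template, len(upstream), len(upstream) + len(target))
print(marked)
```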

15 Homo sapiens

16 Checking primer specificity: you can check the specificity of your primers by running a BLAST search (against what?). Run blastn using the primer sequences as queries and see whether they are specific: they should not align to other sequences (alignment = annealing of your primer to DNA). If they do align to sequences other than human NT5E, the primers should not align at their 3' end. If only one primer of the pair aligns, there is no amplification (but that primer is still captured by non-target DNA).
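The 3' end criterion can be checked on a reported off-target hit with a small helper. This is a rough sketch, not a replacement for inspecting the BLAST alignment: the sequences are hypothetical, and the threshold of 5 matching bases is an arbitrary assumption chosen only for illustration.

```python
def three_prime_matches(primer, site):
    """Count consecutive matching bases, starting from the 3' end.

    Both strings are written 5'->3' and must have the same length; `site` is
    the off-target region in the same orientation as the primer.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    count = 0
    for a, b in zip(reversed(primer.upper()), reversed(site.upper())):
        if a != b:
            break
        count += 1
    return count


def is_risky(primer, site, min_run=5):
    """Flag a hit whose 3' end matches for at least min_run bases (assumed cutoff)."""
    return three_prime_matches(primer, site) >= min_run


# Hypothetical primer and two hypothetical off-target sites.
primer = "ACGTTGCAAGCTTGACCATG"
site_3prime_match = "TTTTTGCAAGCTTGACCATG"   # mismatches only at the 5' end
site_3prime_mismatch = "ACGTTGCAAGCTTGACCTTT"  # mismatches at the 3' end

print(three_prime_matches(primer, site_3prime_match))
print(three_prime_matches(primer, site_3prime_mismatch))
```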

18 Primer design for genome finishing

19 De novo genome sequencing: organism -> genomic DNA extraction -> DNA fragmentation -> DNA sequencing -> reads -> assembly -> contigs -> organism genome sequence. A contig (from "contiguous") is a set of overlapping DNA segments that together represent a consensus region of DNA.
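The idea of building a contig from overlapping reads can be illustrated with a toy example. This is a deliberately naive sketch, assuming error-free reads given in the right order; real assemblers use far more sophisticated methods, and the three reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def merge(a, b, min_len=3):
    """Join a and b on their overlap, or return None if they do not overlap."""
    size = overlap(a, b, min_len)
    return a + b[size:] if size else None


# Hypothetical reads, each overlapping the next one by 5 bases.
reads = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCATGA"]

contig = reads[0]
for read in reads[1:]:
    merged = merge(contig, read)
    if merged is None:
        break  # no overlap: the contig cannot be extended further
    contig = merged

print(contig)
```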

20 The next slides focus on BACTERIA! De novo genome sequencing: bioinformatics allows the short sequenced fragments to be joined into longer fragments, but usually does not yield the whole chromosome. This is mainly due to the repeated sequences along the genome: assemblers (software) are not able to join contigs across them, so contig ends are rich in repeated sequences. ATGGATGGATGGAACTCA......CATTCATGGATGGATGG ATGGATGGATGGCATTCG...

22 Genome assembly: molecular biology can help resolve these ambiguous situations. HOW? PCRs! ATGGATGGATGGAACTCA......CATTCATGGATGGATGG ATGGATGGATGGCATTCG...
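The ambiguity can be seen directly on the three sequence fragments shown on the slide: the repeat at the end of one contig overlaps equally well with two different contig starts. This is a minimal sketch; the labels `start_1` and `start_2` are hypothetical names for the two fragments.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


# Fragments copied from the slide; the labels are hypothetical.
contig_end = "CATTCATGGATGGATGG"
candidate_starts = {
    "start_1": "ATGGATGGATGGAACTCA",
    "start_2": "ATGGATGGATGGCATTCG",
}

scores = {name: overlap(contig_end, seq) for name, seq in candidate_starts.items()}
best = max(scores.values())
ambiguous = [name for name, score in scores.items() if score == best]

# More than one equally good successor means the assembler cannot decide.
print(scores, ambiguous)
```

Sequence overlap alone cannot choose between the two candidates, which is why a PCR spanning the junction is needed.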

23 Genome finishing From this... To this!

24 Genome finishing: identify the repeated sequences at the ends of the contigs, then design primers in the regions just upstream of the repeated sequences. Example: Contig 5 and Contig 1.

25 Genome finishing: to identify the repeated sequences at the ends of the contigs, take the contig ends and align them to all the contigs of your genome of interest; then analyze the alignment and select the regions in which you can design primers (in regions rich in repeats, primers are not specific).
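As a rough stand-in for the alignment step, the repeated part of a contig end can be flagged by looking for exact k-mers shared with the other contigs. This is a simplified sketch, not the alignment-based procedure described above: it ignores the reverse strand and inexact matches, `k=8` is an arbitrary assumption, and the toy contig end is built artificially from a unique stretch followed by the repeat shown earlier.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def repeat_mask(end_seq, other_contigs, k=8):
    """True at every position covered by a k-mer also found in other contigs."""
    seen = set()
    for contig in other_contigs:
        seen |= kmers(contig, k)
    mask = [False] * len(end_seq)
    for i in range(len(end_seq) - k + 1):
        if end_seq[i:i + k] in seen:
            for j in range(i, i + k):
                mask[j] = True
    return mask


def longest_unique_region(mask):
    """(start, end) of the longest unmasked run, end exclusive."""
    best = (0, 0)
    start = None
    for i, flag in enumerate(mask + [True]):  # sentinel closes the last run
        if not flag and start is None:
            start = i
        elif flag and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


# Toy contig end: a unique stretch followed by a repeat (artificial example).
end_seq = "GCCATTCTCCTTACTTCTG" + "ATGGATGGATGG"
other_contigs = ["ATGGATGGATGGCATTCG"]

region = longest_unique_region(repeat_mask(end_seq, other_contigs))
print(region, end_seq[region[0]:region[1]])
```

Primers would then be designed inside the unmasked region only.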

26 Genome finishing Contig 1 >NODE_1_length_188135_cov_ _ID_1724_Left_end(1_to_3000)_cut TAAATCCAGGTGTCCAATAGCAAAATTTATCACTGCTCTTCTACCATCATCAGTACTCATCATGTCGTACTCTGTAGGAAGATCTTTAGATTCAACA CAAAAAGAACGTAGATTGAATGTAAGTTCTACAGCGCCTACTAACTGCACTGCATTATCATCAGCATGATATAACGTATACCCTGACGCATCTTTA GAAATTAAGAATAAATCATCAATTACATATAACTGAGTAGTAGTCTTCGAATGAGAAGCAGTATTCTTAATAAATACACATCGTAATTCTGCATTAAA AACTTCTATACCAGCCTTCTCACATAATTGACGATAGAAATCTCTATAACGCTCTTCATCACCTTCCTTCCTATTACTAACCTCTCTTTCATCTTCAG TAATATAATGAATAAGCTCTTCATCACTTTGCTTTTTACTCTCAACATCCTCTCTCCTACCAACATTCTCTCTTCTTTTATTATCAGCACTTTCATTAG TAATATCTAATTCAAATAAACTTGTAATAAAGCTCTCAGACATTAGCGATATTTCTTCTGTTTTCGCAGTAAAGAAAGGAATTTTTGAATGAGGCGAT TTAGAGGAATAACATCCATCACCACAGTGAGTACTAGCGCTCTTAAGCTCTATAACAGGCTGACCATTATTCTCTGTAACATAATACGCAGTATAAT GATCATCTTTAGTAAACATGATAATCAACTTGCTATCATTACCTACATTGATATTTAAAGTCCAAGCAATAGCGGTATGATCACTTTGCTCTACCACT TTAACAATATAGTAATCGCTTAATTTTACAGAACTATCCTTTTTAAAATCTTCAAGTGCACTTTTTATAAATTCTCGATATTCAATATCTTGAAAATATA ATTCAATTGAAGCACCTTTTCTCTTCAGCCGAGGTAATGACAATAATTGAGAAGGTTTAAAATCATGAGTAGTAAAATCTAATTTTTCATCAACAAAA TACACGTCGGACAAAGAGATATTATAGCCAAAGAATTCTTGATCCTTTCCTTTTTTTATCGCCTCCCAAAAAGCAAAATTCCACAAATTTTCTACATC ATCCTTGAAACTTCCTAAAGAATCATATTCTCGATCATATGCTGTTATAACTTCATCATCTTTTATTGAAGTTGCTACTATCGTAAACTTCTCTTTATC GTTTCGAGAAAAGATTAAATACAAATCCTTCGAAATTTCTGTGCTAAGTCTAACTGCTATAAAATTAGCTTCAATACTTGAAAATATATAATGCGATA CAGATTTGAGCGACTTCTTAATATCTAATTCTTTAGCAATTTCAGTAATAATATCATTTTTTGGCAGTGGCTCTTTCTGCATATTCCCTCCTTCTATC ATATTTACATCTAAAGCCTCTGCTTCTACATATTTAAGCGAAACACCTTCAAAAACCATCTTAGGAAGATAAGAAGGGATTCGGCCAATATCGCCTT TTCCTCTTACTTCTAAGAGCTCCTGCTGATTAAATTCTTTTATCAATTCCGCGTATGTAGCGTCAAAATCTTTTGCATATTTTAGCAAGAACGCCTGT TCGATCTTCTCTTCACTCGCATAGTACGGTGCAATATTCTTTTTTCCGATTTCTACTATAAAGTACCCATGCTCATCACGAGCAATCGCGTAGTACG ATTCTTTAAATCCGCTACTAGTCTTATTTTCATCTTTAGATTTGCTACTCTCATCTATATATTTCACAAACTTCACATCTTCTTCCGGAAGAGAATCCA TCGTAAAATTCACCCCTCTTTCTTCATATCGCTCTTTAAGTTTCTTCAAAAGCGAATCACTAAAAGGTAATTTTCTCAAATGTTCATCACCAAAAGAA 
GAGTCATTCTTCGTATGCTGTGTCATTGCTTTTTGCAAATGTGTATCAAAAGTATCAAGAGGATAGTACCATAATCACCGATCTAACACATTACATT TTTTCACATTTTTCTTTACTAAAGTGCAGGTAGTGTTAATTTTAGAATAAATCGCAACTATCTAATGCATTATTTCGCTATATAATTCGACTTCATACT TTATTCGAGCCATTAAGAAGTAGTATTTATATCACTCTCTTATCTTTCAAAAATCTCTCTATTTCAAACAAAGCAAGTTAAAAAGCATTCTGATTATGA ACATCTTTTCGCATCCATATTCTTAGTATATATCATTAATACGCAGCACTACCATATATGAAGAAGCGTTGCTCTAAGTCTCCAGTTTGCGGATTAA TAGGAATTGCATAATCAAAGGAGATTGTACCTAAAAACGGTAACTTTA

27 Genome finishing Contig 5 >NODE_5_length_66131_cov_ _ID_1732_Right_end(63132_to_66131)_cut TACAGAAAGTTTCTGCATCTGCAGCCCATTGTAATATCTCGCCAATTAGTGCTTTTCGCGGAAAAGTTGAAGATAACATCTCTTTTTTACCACTTCT CTCTACGATCACAATTCCATTTAATGCTGTTCTCGAGATAGTCTTCAATAACGTTTCTAAGTCAAGTATTTTTTGTTCAGATTGCTCCTCGAGTACGT TCACAAGATCTCTCAAAGCAACACTATTACGTTCATTCTCTATATTTGCACTACCTGTAAAATAACCATTTAGCGCAAACTCAAGATTTACCTTTAGA GAAGTAGATTGTACGAATGATGCATCTTTCCGATTCTTTCCAAAAAGTGCTATCAATTCTTTTAATGATTCTTCAGAAACATCATAGAGATTCACTGC ATCAGAAGAAACAAAAGATGAATCTTGCAGTGTTGAAAATTTCTTCTCTAAAGCTATACGCTTATTTAGATATTCATCATATTGTGAGAAGCTGGTA ATTTTATCGAATAATTTCGCAAATAATGCAGCATTTTCACGATATACGCTACCCGATAAAGTACCTTTAAAAGAAATTCCTTTGTTTAATGTGCTATA AAATTCACTAAATGAATCTGACACAGTCTGTATCATTGCAAGATTCGCCTTACTCGCTTTTTCTAATTCAGAAGAAGCTAATTTTACTTTATGATACA CGTACATCTCTATATCTAGAGCTGCTTTTAAAAATTGTAGTAATTGTGCCAAAAGTACAATTTCAAATTCTGATGCGCCTTCTTTCGCTTTTTCTTTT TTTATTAGATTTCCAAAAGTCACTTCGGCAGAACTTAACGCGTCGAATAAAGAAGCTCTTGCGCTTTTCATTTCAGCGGAAAATTCTTCACTTTCCA TAATATATTGCTCCAACTCAGAATTCATTACATCGCTTCCTCTTGATAGCTCTTTTCCTGTAGCGATTTCTATCAATTTACTCTTATAAAGATATCCA CTGCCATTATCATTAGGAAATTTCACCTTTCCAAAATATTTCTTCGTATCTTCGATCTCAATATTAGAAGGCACTACTGAAGTATTGTATCGTGTATA TAACTGCGTTACGACATCATCATCTACTGAGAACAGTTGTATATTTTTTTGCTCCATTACAAATAAGAGATGTTTCACTCTACCACTGTCTTTCAGAT GAGATACAAAAACATCATACTCGGCATCACCTATAACGAGTTTTACTCGCTTATACTGCAACAGTGCAGCACTAATATCTAATGTAGTACTTTCAGT AATAATACTGTCAAGTATTTTATAGAAATCATCATGCTGTACACTAAAATCTCTGCTGCTAAAACGACTATCACCAATAATTACTGTACCTCGCGTAA CATTACTCTCAAATAAATTAGACTTAACGTATTCTTCATCGCCTACAACGACAATAGCATCTTCGATATTGAAAAGACCGTCCTTTCTGTAATACGTA TTACGCGAAGAAATCTCTAATCCAAGAAATTTTTTCACACCACCTTTTACAATGCCGTTCAAATCGCCTAGCACTGCATAAGCACTGCTTGCTAAAG CCTTATCTGATACTTTTTTCACTTCTCTTCTATTGGCAACATCAGAAAATACTACATAAAAGTCCTCTTCATTATTTTCAAAAACCAAATACACATTTC TTTCGTCTTTTTCATATTCATTAAGCGAAAATTTCAATGCAAGTTGCTTATTAGAAACAGCATAACATACGACGTCTTTGATATTATGCAAAATAAAA GCACCTCCAATACCTTTTAGTATTTCTGCTATTTCTTTACTTAATACGTGAACATTTGGTTTTCTATCCATTAGCCACTCTAAATCCATAGCAGAAAG 
CTGTCCATCTGCGACTTTAGCAAAGCCCATAATAACATCATCACTAAAGTCGTCTACTTTACCAGATAATGCATTAACTTCATTGGTAAATGCTAAA TTTTCAGAATATTGAACTTTAGCTCCTTTCTCTACTTTACCATTATTAATGATATAACAATCTACATCCTGTGACTCTCTGAAAAGCATCACTGCATG CTCAACACTATCAGCATTGATAACAATTACATAAGAATCTTTGTTATCCACAGTATTAGTATATTTGAGCACTTTATAAGCTCCTACAGTAATTCCGA ATTTTTTACTGAAAACATCTACTATGAAATGATTCATATCTTCTAAAAATTTTTCCTTTTCTATCTCACTAATCTTATCATCAAGATCGCTTTGCCTAAA TTCTGAATATACCATACCGTTAATAAGGCTTATTATTGCATCATTTGTCGCAAAAGAACGCGGTATGTTTAGCACGTGATCTGGTTCTCCGATAAAT TGAACCATCTTGCTGGAAATAACATCATCTACATAATGCTGCACTTTTCCAGCGTCATCGATATAATATAAACCTCTACCACCTTCCTCATCTGATA TATGCATGATTGATTCATTTTGAGTCATTCCGCTCTCACTTCCAGAATGAGATATTACAATAATACGCACAGTACCGCTTTCGATAATAAGATGCGC AAAGCTTTCTCCCTCAAATTTTCCAAATTTGCCATTCTCCTTACTTCTGTCTACTATTTCTGATAAAATACGCTCATTGTCAGACTTT

28 Genome finishing: choose primer sequences and test them using the Operon tool (oligo analysis tool) and mfold (nucleic acid folding and hybridization prediction), then check primer specificity. Remember: the optimum primer length is [...] nt; the primers should not form dimers; if the contigs are coherently oriented, the sequence to use for the reverse primer is the reverse complement; the melting temperatures of the two primers should be compatible (usually, the optimum is [...] °C); the primers should not hybridize with each other (check with mfold).
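Some of these checks are easy to script before moving to the dedicated tools. This is a minimal sketch, assuming the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C) as a crude melting temperature estimate; it is not the method used by the Operon tool or mfold, and the function names are hypothetical. The two primers are the ones shown on the next slide.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq):
    """Rough Tm estimate in degrees C (Wallace rule, an assumption here)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


primers = {"5F": "GCCATTCTCCTTACTTCTG", "1R": "GATGATGGTAGAAGAGCAGTG"}

for name, seq in primers.items():
    print(name, len(seq), round(gc_content(seq), 2), wallace_tm(seq))

tm_difference = abs(wallace_tm(primers["5F"]) - wallace_tm(primers["1R"]))
print("Tm difference:", tm_difference)
```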

29 Genome finishing
>5F
GCCATTCTCCTTACTTCTG
>1R
GATGATGGTAGAAGAGCAGTG

30 Genome finishing
>5F
GCCATTCTCCTTACTTCTG

31 Genome finishing
>1R_designed_on_contig_sequence
GATGATGGTAGAAGAGCAGTG
>1R_final
GATGATGGTAGAAGAGCAGTG
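The orientation of the two primers can be verified against the contig ends. This is a minimal sketch: the two excerpts are short stretches copied from the Contig 5 right end and the Contig 1 left end shown earlier, and the helper `locate` is hypothetical. A "+" hit means the primer matches the contig as written; a "-" hit means its reverse complement does.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate(primer, template):
    """Return (strand, 0-based start) of an exact primer match, or None."""
    pos = template.find(primer)
    if pos != -1:
        return ("+", pos)
    pos = template.find(reverse_complement(primer))
    if pos != -1:
        return ("-", pos)
    return None


# Short excerpts copied from the contig ends in the slides above.
contig5_right_excerpt = "CAAATTTTCCAAATTTGCCATTCTCCTTACTTCTGTCTACTATTTCTG"
contig1_left_excerpt = "TAAATCCAGGTGTCCAATAGCAAAATTTATCACTGCTCTTCTACCATCATCAGTACTCATC"

hit_5f = locate("GCCATTCTCCTTACTTCTG", contig5_right_excerpt)
hit_1r = locate("GATGATGGTAGAAGAGCAGTG", contig1_left_excerpt)
print("5F:", hit_5f)
print("1R:", hit_1r)
```

With these excerpts, 5F matches the end of Contig 5 as written, while 1R matches Contig 1 only as a reverse complement, so the two primers point towards each other across the gap.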

32 See you next time!