Bioinformatics Course AA 2017/2018 Tutorial 2


UNIVERSITÀ DEGLI STUDI DI PAVIA - FACOLTÀ DI SCIENZE MM.FF.NN. - LM MOLECULAR BIOLOGY AND GENETICS Bioinformatics Course AA 2017/2018 Tutorial 2 Anna Maria Floriano annamaria.floriano01@universitadipavia.it

Genome browsers are available for many organisms, BUT not for all of them! Most living organisms are still unknown or poorly characterized (think of bacterial species!): they may be yet to be identified, yet to be sequenced, or their genome annotation may not be graphically available. For such organisms, you have to retrieve data in a different way...

Candidatus Midichloria mitochondrii

FTP page at NCBI: go to ftp://ftp.ncbi.nih.gov/genomes/archive/old_refseq/, open the folder Bacteria, search for the folder of the organism of interest, Candidatus_Midichloria_mitochondrii_IricVA_uid68687 (trick: press Ctrl+F on your keyboard, then type Midichloria), and open it.
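The Ctrl+F trick above can also be scripted. The sketch below assumes that the FTP host and path given on the slide are still reachable (the server layout may have changed since the course); `filter_folders` and `list_bacteria_folders` are illustrative names, not part of any NCBI tool.

```python
from ftplib import FTP

HOST = "ftp.ncbi.nih.gov"
BACTERIA_DIR = "/genomes/archive/old_refseq/Bacteria"  # path taken from the slide


def filter_folders(names, keyword):
    """Return the folder names containing keyword (case-insensitive), like Ctrl+F."""
    keyword = keyword.lower()
    return [name for name in names if keyword in name.lower()]


def list_bacteria_folders(host=HOST, directory=BACTERIA_DIR):
    """List the entries of the Bacteria folder through an anonymous FTP login."""
    with FTP(host) as ftp:
        ftp.login()         # anonymous login
        ftp.cwd(directory)  # move into the Bacteria folder
        return ftp.nlst()   # names of the entries in that folder


# Usage (needs network access, so it is left commented out):
# for folder in filter_folders(list_bacteria_folders(), "Midichloria"):
#     print(folder)
```

Keeping the filter separate from the network call makes it possible to try the search on any list of folder names without connecting to the server.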


File formats (some of them...)
Extension  Format             Description
.fna       FASTA              Nucleic acids (genome)
.ffn       FASTA              Nucleic acids (CDS)
.faa       FASTA              Amino acids
.frn       FASTA              Non-coding RNA regions for a genome, in DNA alphabet (e.g. tRNA, rRNA)
.gbk       GenBank flat file  Genome annotation
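Most of the files listed above are in FASTA format: a header line starting with ">" followed by one or more sequence lines. A minimal reader might look like the sketch below; `read_fasta` is an illustrative helper, and the two toy records are invented for the example.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict {header: sequence}."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip empty lines
        if line.startswith(">"):
            header = line[1:]             # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input: two made-up records, the first one wrapped over two lines.
toy_lines = [
    ">toy_record_1",
    "ATGGATGG",
    "ATGGAACTCA",
    ">toy_record_2",
    "ATGGATGGATGGCATTCG",
]

records = read_fasta(toy_lines)
for name, sequence in records.items():
    print(name, len(sequence))
```

An open file handle can be passed in place of the list, since the function only iterates over lines.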


You can download many genomes and compare them in many ways, e.g. through a local BLAST (you will learn it during the second part of the Bioinformatics course, with Prof. Sassera and the other tutors)

Primer design

Primer design: PCRs can have many different aims, e.g. investigating alternative splicing or finishing a genome.

Investigating alternative splicing: search NT5E at NCBI. How many isoforms are there? What is the difference among the isoforms?

Where do we locate our primer pair? Options: A, B, C, D (shown in the slide figure).

Search NT5E at UCSC and design primers: remember to select the correct exons (choose the correct isoform and enclose your target exon in square brackets); use the Primer3 web tool (have a look at the possible settings); check primer specificity.
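Marking the target exon with square brackets can be done by hand, or with a few lines of code when the exon coordinates are known. The sketch below uses made-up toy sequences, not the real NT5E exons, and `mark_target` is an illustrative helper name.

```python
def mark_target(sequence, start, end):
    """Enclose sequence[start:end] (0-based, end-exclusive) in square brackets."""
    if not 0 <= start < end <= len(sequence):
        raise ValueError("target coordinates fall outside the sequence")
    return sequence[:start] + "[" + sequence[start:end] + "]" + sequence[end:]


# Toy template: three invented "exons" joined together (NOT the real NT5E sequence).
exon_a = "ATGGCTAGCTAGGATCCA"
exon_b = "GGTTCCAAGGTT"          # hypothetical target exon
exon_c = "CCGATCGATTAGCTAGTAA"
template = exon_a + exon_b + exon_c

marked = mark_target(template, len(exon_a), len(exon_a) + len(exon_b))
print(marked)
```

The bracketed string is what gets pasted into the sequence box, so that the primer pair is picked on either side of the target.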

Homo sapiens

Checking primer specificity: you can check the specificity of your primers by running a BLAST search (against what?). Run blastn using the primer sequences as queries and see whether they are specific! They should not align to other sequences (alignment = annealing of your primer to DNA). If they do align to sequences other than human NT5E: the primers should not align at their 3' end; if only one primer of the pair aligns, there is no amplification (but that primer is still captured by non-target DNA).
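The point about the 3' end can be illustrated with a very crude stand-in for BLAST: an exact search for the last few bases of the primer on both strands of a candidate off-target sequence. This is only a sketch under that simplifying assumption (real alignments tolerate mismatches); the function names, the value of `k`, and the toy off-target sequence are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_matches(primer, sequence, k=8):
    """Report on which strand the last k bases of the primer match exactly."""
    tail = primer[-k:].upper()
    sequence = sequence.upper()
    return {
        "plus": tail in sequence,                       # same strand as given
        "minus": tail in reverse_complement(sequence),  # opposite strand
    }


# Toy off-target sequence (invented) sharing 8 bases with the primer 3' end.
primer = "GCCATTCTCCTTACTTCTG"
off_target = "AAAATACTTCTGAAAA"
print(three_prime_matches(primer, off_target))
```

A match on either strand means the 3' end of the primer could anneal there, which is the situation the slide warns against.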


Primer design for genome finishing

de novo genome sequencing: Organism -> Genomic DNA extraction -> DNA fragmentation -> DNA sequencing -> Reads -> Assembly -> Contigs -> Organism genome sequence. A contig (from contiguous) is a set of overlapping DNA segments that together represent a consensus region of DNA.

Next slides will be focused on BACTERIA! de novo genome sequencing: bioinformatics allows the short sequenced fragments to be joined into longer fragments, but usually does not allow the whole chromosome to be obtained. This is mainly due to the repeated sequences along the genome: assemblers (software) are not able to join contigs across them (contig ends are rich in repeated sequences). Example: ATGGATGGATGGAACTCA......CATTCATGGATGGATGG ATGGATGGATGGCATTCG...
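The ambiguity can be reproduced with the sequence fragments shown on the slide: the end of one contig overlaps equally well with the start of two different contigs, so the overlap alone cannot tell the assembler which join is correct. The sketch below uses only the visible ends (the elided middle is not filled in); `overlap_length` and the contig labels are illustrative.

```python
def overlap_length(left, right, min_len=4):
    """Length of the longest suffix of left that is a prefix of right (0 if shorter than min_len)."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


# End of one contig and start of two candidate neighbours, as shown on the slide.
contig_end = "CATTCATGGATGGATGG"
candidate_starts = {
    "candidate_1": "ATGGATGGATGGAACTCA",
    "candidate_2": "ATGGATGGATGGCATTCG",
}

overlaps = {name: overlap_length(contig_end, start)
            for name, start in candidate_starts.items()}
for name, size in overlaps.items():
    print(name, size)
```

Both candidates share the same repeat-derived overlap with the contig end, which is why an experimental check is needed to decide between them.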


Genome assembly: molecular biology can help solve these ambiguous situations. HOW? PCRs!

Genome finishing From this... To this!

Genome finishing: identify the repeated sequences at the ends of the contigs, then design primers in the regions just upstream of the repeated sequences (here, Contig 5 and Contig 1).

Genome finishing: to identify the repeated sequences at the ends of the contigs, take the contig ends and align them to all the contigs of your genome of interest; then analyze the alignment and select the regions in which you can design primers (in regions rich in repeats, primers are not specific).
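As a rough illustration of this step, the sketch below replaces the alignment with exact k-mer counting: a position in a contig end is considered usable for primer design only if its k-mer occurs once in the whole assembly. This is a simplification (forward strand only, no mismatches), and the toy contigs, the value of `k`, and the function names are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(contigs, k):
    """Count every k-mer over all contigs (forward strand only)."""
    counts = Counter()
    for seq in contigs.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def unique_positions(end_seq, counts, k):
    """Start positions in end_seq whose k-mer occurs exactly once in the assembly."""
    return [i for i in range(len(end_seq) - k + 1)
            if counts[end_seq[i:i + k]] == 1]


# Toy assembly: both contigs carry the same repeat at one end.
toy_contigs = {
    "c1": "ATGGATGGATGG" + "AACTCAGTTCGA",
    "c2": "CCGTAGCTTAGC" + "ATGGATGGATGG",
}
k = 8
counts = kmer_counts(toy_contigs, k)

left_end = toy_contigs["c1"][:16]  # the slides use 3000 nt ends on real contigs
usable = unique_positions(left_end, counts, k)
print(usable)
```

Positions inside the repeat are excluded, so the primer would be placed in the unique region next to it.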

Genome finishing Contig 1 >NODE_1_length_188135_cov_164.169_ID_1724_Left_end(1_to_3000)_cut TAAATCCAGGTGTCCAATAGCAAAATTTATCACTGCTCTTCTACCATCATCAGTACTCATCATGTCGTACTCTGTAGGAAGATCTTTAGATTCAACA CAAAAAGAACGTAGATTGAATGTAAGTTCTACAGCGCCTACTAACTGCACTGCATTATCATCAGCATGATATAACGTATACCCTGACGCATCTTTA GAAATTAAGAATAAATCATCAATTACATATAACTGAGTAGTAGTCTTCGAATGAGAAGCAGTATTCTTAATAAATACACATCGTAATTCTGCATTAAA AACTTCTATACCAGCCTTCTCACATAATTGACGATAGAAATCTCTATAACGCTCTTCATCACCTTCCTTCCTATTACTAACCTCTCTTTCATCTTCAG TAATATAATGAATAAGCTCTTCATCACTTTGCTTTTTACTCTCAACATCCTCTCTCCTACCAACATTCTCTCTTCTTTTATTATCAGCACTTTCATTAG TAATATCTAATTCAAATAAACTTGTAATAAAGCTCTCAGACATTAGCGATATTTCTTCTGTTTTCGCAGTAAAGAAAGGAATTTTTGAATGAGGCGAT TTAGAGGAATAACATCCATCACCACAGTGAGTACTAGCGCTCTTAAGCTCTATAACAGGCTGACCATTATTCTCTGTAACATAATACGCAGTATAAT GATCATCTTTAGTAAACATGATAATCAACTTGCTATCATTACCTACATTGATATTTAAAGTCCAAGCAATAGCGGTATGATCACTTTGCTCTACCACT TTAACAATATAGTAATCGCTTAATTTTACAGAACTATCCTTTTTAAAATCTTCAAGTGCACTTTTTATAAATTCTCGATATTCAATATCTTGAAAATATA ATTCAATTGAAGCACCTTTTCTCTTCAGCCGAGGTAATGACAATAATTGAGAAGGTTTAAAATCATGAGTAGTAAAATCTAATTTTTCATCAACAAAA TACACGTCGGACAAAGAGATATTATAGCCAAAGAATTCTTGATCCTTTCCTTTTTTTATCGCCTCCCAAAAAGCAAAATTCCACAAATTTTCTACATC ATCCTTGAAACTTCCTAAAGAATCATATTCTCGATCATATGCTGTTATAACTTCATCATCTTTTATTGAAGTTGCTACTATCGTAAACTTCTCTTTATC GTTTCGAGAAAAGATTAAATACAAATCCTTCGAAATTTCTGTGCTAAGTCTAACTGCTATAAAATTAGCTTCAATACTTGAAAATATATAATGCGATA CAGATTTGAGCGACTTCTTAATATCTAATTCTTTAGCAATTTCAGTAATAATATCATTTTTTGGCAGTGGCTCTTTCTGCATATTCCCTCCTTCTATC ATATTTACATCTAAAGCCTCTGCTTCTACATATTTAAGCGAAACACCTTCAAAAACCATCTTAGGAAGATAAGAAGGGATTCGGCCAATATCGCCTT TTCCTCTTACTTCTAAGAGCTCCTGCTGATTAAATTCTTTTATCAATTCCGCGTATGTAGCGTCAAAATCTTTTGCATATTTTAGCAAGAACGCCTGT TCGATCTTCTCTTCACTCGCATAGTACGGTGCAATATTCTTTTTTCCGATTTCTACTATAAAGTACCCATGCTCATCACGAGCAATCGCGTAGTACG ATTCTTTAAATCCGCTACTAGTCTTATTTTCATCTTTAGATTTGCTACTCTCATCTATATATTTCACAAACTTCACATCTTCTTCCGGAAGAGAATCCA TCGTAAAATTCACCCCTCTTTCTTCATATCGCTCTTTAAGTTTCTTCAAAAGCGAATCACTAAAAGGTAATTTTCTCAAATGTTCATCACCAAAAGAA 
GAGTCATTCTTCGTATGCTGTGTCATTGCTTTTTGCAAATGTGTATCAAAAGTATCAAGAGGATAGTACCATAATCACCGATCTAACACATTACATT TTTTCACATTTTTCTTTACTAAAGTGCAGGTAGTGTTAATTTTAGAATAAATCGCAACTATCTAATGCATTATTTCGCTATATAATTCGACTTCATACT TTATTCGAGCCATTAAGAAGTAGTATTTATATCACTCTCTTATCTTTCAAAAATCTCTCTATTTCAAACAAAGCAAGTTAAAAAGCATTCTGATTATGA ACATCTTTTCGCATCCATATTCTTAGTATATATCATTAATACGCAGCACTACCATATATGAAGAAGCGTTGCTCTAAGTCTCCAGTTTGCGGATTAA TAGGAATTGCATAATCAAAGGAGATTGTACCTAAAAACGGTAACTTTA

Genome finishing Contig 5 >NODE_5_length_66131_cov_121.866_ID_1732_Right_end(63132_to_66131)_cut TACAGAAAGTTTCTGCATCTGCAGCCCATTGTAATATCTCGCCAATTAGTGCTTTTCGCGGAAAAGTTGAAGATAACATCTCTTTTTTACCACTTCT CTCTACGATCACAATTCCATTTAATGCTGTTCTCGAGATAGTCTTCAATAACGTTTCTAAGTCAAGTATTTTTTGTTCAGATTGCTCCTCGAGTACGT TCACAAGATCTCTCAAAGCAACACTATTACGTTCATTCTCTATATTTGCACTACCTGTAAAATAACCATTTAGCGCAAACTCAAGATTTACCTTTAGA GAAGTAGATTGTACGAATGATGCATCTTTCCGATTCTTTCCAAAAAGTGCTATCAATTCTTTTAATGATTCTTCAGAAACATCATAGAGATTCACTGC ATCAGAAGAAACAAAAGATGAATCTTGCAGTGTTGAAAATTTCTTCTCTAAAGCTATACGCTTATTTAGATATTCATCATATTGTGAGAAGCTGGTA ATTTTATCGAATAATTTCGCAAATAATGCAGCATTTTCACGATATACGCTACCCGATAAAGTACCTTTAAAAGAAATTCCTTTGTTTAATGTGCTATA AAATTCACTAAATGAATCTGACACAGTCTGTATCATTGCAAGATTCGCCTTACTCGCTTTTTCTAATTCAGAAGAAGCTAATTTTACTTTATGATACA CGTACATCTCTATATCTAGAGCTGCTTTTAAAAATTGTAGTAATTGTGCCAAAAGTACAATTTCAAATTCTGATGCGCCTTCTTTCGCTTTTTCTTTT TTTATTAGATTTCCAAAAGTCACTTCGGCAGAACTTAACGCGTCGAATAAAGAAGCTCTTGCGCTTTTCATTTCAGCGGAAAATTCTTCACTTTCCA TAATATATTGCTCCAACTCAGAATTCATTACATCGCTTCCTCTTGATAGCTCTTTTCCTGTAGCGATTTCTATCAATTTACTCTTATAAAGATATCCA CTGCCATTATCATTAGGAAATTTCACCTTTCCAAAATATTTCTTCGTATCTTCGATCTCAATATTAGAAGGCACTACTGAAGTATTGTATCGTGTATA TAACTGCGTTACGACATCATCATCTACTGAGAACAGTTGTATATTTTTTTGCTCCATTACAAATAAGAGATGTTTCACTCTACCACTGTCTTTCAGAT GAGATACAAAAACATCATACTCGGCATCACCTATAACGAGTTTTACTCGCTTATACTGCAACAGTGCAGCACTAATATCTAATGTAGTACTTTCAGT AATAATACTGTCAAGTATTTTATAGAAATCATCATGCTGTACACTAAAATCTCTGCTGCTAAAACGACTATCACCAATAATTACTGTACCTCGCGTAA CATTACTCTCAAATAAATTAGACTTAACGTATTCTTCATCGCCTACAACGACAATAGCATCTTCGATATTGAAAAGACCGTCCTTTCTGTAATACGTA TTACGCGAAGAAATCTCTAATCCAAGAAATTTTTTCACACCACCTTTTACAATGCCGTTCAAATCGCCTAGCACTGCATAAGCACTGCTTGCTAAAG CCTTATCTGATACTTTTTTCACTTCTCTTCTATTGGCAACATCAGAAAATACTACATAAAAGTCCTCTTCATTATTTTCAAAAACCAAATACACATTTC TTTCGTCTTTTTCATATTCATTAAGCGAAAATTTCAATGCAAGTTGCTTATTAGAAACAGCATAACATACGACGTCTTTGATATTATGCAAAATAAAA GCACCTCCAATACCTTTTAGTATTTCTGCTATTTCTTTACTTAATACGTGAACATTTGGTTTTCTATCCATTAGCCACTCTAAATCCATAGCAGAAAG 
CTGTCCATCTGCGACTTTAGCAAAGCCCATAATAACATCATCACTAAAGTCGTCTACTTTACCAGATAATGCATTAACTTCATTGGTAAATGCTAAA TTTTCAGAATATTGAACTTTAGCTCCTTTCTCTACTTTACCATTATTAATGATATAACAATCTACATCCTGTGACTCTCTGAAAAGCATCACTGCATG CTCAACACTATCAGCATTGATAACAATTACATAAGAATCTTTGTTATCCACAGTATTAGTATATTTGAGCACTTTATAAGCTCCTACAGTAATTCCGA ATTTTTTACTGAAAACATCTACTATGAAATGATTCATATCTTCTAAAAATTTTTCCTTTTCTATCTCACTAATCTTATCATCAAGATCGCTTTGCCTAAA TTCTGAATATACCATACCGTTAATAAGGCTTATTATTGCATCATTTGTCGCAAAAGAACGCGGTATGTTTAGCACGTGATCTGGTTCTCCGATAAAT TGAACCATCTTGCTGGAAATAACATCATCTACATAATGCTGCACTTTTCCAGCGTCATCGATATAATATAAACCTCTACCACCTTCCTCATCTGATA TATGCATGATTGATTCATTTTGAGTCATTCCGCTCTCACTTCCAGAATGAGATATTACAATAATACGCACAGTACCGCTTTCGATAATAAGATGCGC AAAGCTTTCTCCCTCAAATTTTCCAAATTTGCCATTCTCCTTACTTCTGTCTACTATTTCTGATAAAATACGCTCATTGTCAGACTTT

Genome finishing: choose primer sequences and test them using the Operon oligo analysis tool (http://www.operon.com/tools/oligo-analysis-tool.aspx) and mfold, for nucleic acid folding and hybridization prediction (http://unafold.rna.albany.edu/?q=mfold/dna-folding-form); then check primer specificity. Remember: the optimum primer length is 20-22 nt; primers should not form dimers; if the contigs are coherently oriented, the reverse primer you will use is the reverse complement of the contig sequence; the melting temperatures of the two primers should be compatible (usually, the optimum is 58-60 °C); the primers should not hybridize with each other (check with mfold).
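Some of these checks can be approximated in a few lines before turning to the tools named above. The sketch below is not what those tools compute: it uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate, and a simple exact-match test for complementarity at the 3' end. All function names and the choice of `k` are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C (Wallace rule)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def three_prime_dimer(primer_a, primer_b, k=4):
    """True if the last k bases of primer_a can pair exactly with a stretch of primer_b."""
    return reverse_complement(primer_a[-k:]) in primer_b


primers = {"5F": "GCCATTCTCCTTACTTCTG", "1R": "GATGATGGTAGAAGAGCAGTG"}
for name, seq in primers.items():
    print(name, len(seq), round(gc_content(seq), 2), wallace_tm(seq))

dimer_found = any(three_prime_dimer(a, b)
                  for a in primers.values() for b in primers.values())
print("3' complementarity found:", dimer_found)
```

Values from dedicated tools will differ, because they use thermodynamic models and account for salt and primer concentration.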

Genome finishing
>5F
GCCATTCTCCTTACTTCTG
>1R
GATGATGGTAGAAGAGCAGTG


Genome finishing
>1R_designed_on_contig_sequence
GATGATGGTAGAAGAGCAGTG
>1R_final
GATGATGGTAGAAGAGCAGTG
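The orientation of the two primers can be checked against the contig ends shown earlier. The sketch below uses only short excerpts copied from the Contig 1 left end and the Contig 5 right end; `locate_primer` is an illustrative helper that reports whether a primer matches the contig as written ("+") or as its reverse complement ("-").

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_primer(primer, contig):
    """Return (strand, 0-based position on the contig as written), or None if absent."""
    position = contig.find(primer)
    if position != -1:
        return ("+", position)
    position = contig.find(reverse_complement(primer))
    if position != -1:
        return ("-", position)
    return None


# Excerpts copied from the contig ends above (start of Contig 1, end of Contig 5).
contig1_start = "TAAATCCAGGTGTCCAATAGCAAAATTTATCACTGCTCTTCTACCATCATCAGTACTCATC"
contig5_end = "CCAAATTTGCCATTCTCCTTACTTCTGTCTACTATTTCTGATAAAATACGCTCATTGTCAGACTTT"

hit_5f = locate_primer("GCCATTCTCCTTACTTCTG", contig5_end)
hit_1r = locate_primer("GATGATGGTAGAAGAGCAGTG", contig1_start)
print("5F on contig 5:", hit_5f)
print("1R on contig 1:", hit_1r)
```

In these excerpts 5F matches Contig 5 as written, while 1R matches Contig 1 only as a reverse complement, which is the arrangement described for coherently oriented contigs.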

See you next time!