Molecular Databases and Tools


NWeHealth, The University of Manchester
Afternoon Session: NCBI/EBI resources, pairwise alignment, BLAST, multiple sequence alignment and primer finding
Dr. Georgina Moulton, 21/04/2010

Exploring bioinformatics tools for pairwise alignment, multiple sequence alignment, primer design and functional analysis

Session Objectives
The aim of this session is to introduce you to the following areas:
- NCBI databases and tools (mostly DNA)
- Navigation between databases
- Sequence databases
- Data formats and conversions
- Searching sequence databases (e.g., BLAST)
- Bioinformatics tools that are available to design and choose primers
- Multiple sequence alignment programs and editors

Session Outcomes
At the end of today's course you will be able to:
- retrieve sequences from sequence data repositories
- browse the UCSC Genome Browser and navigate to other data resources
- understand which databases contain which information and how to access it
- know how to design primers using suitable bioinformatics tools (e.g., eprimer3 and Primer BLAST)
- understand and know how to create an MSA
- know which programs to use to create and visualise an MSA
- know the advantages and disadvantages of the MSA methods/programs
- know the uses of an MSA
- understand the difficulties involved in using bioinformatics tools for primer design
- appreciate the difficulties when navigating various data resources

Pairwise Alignment
Sequence comparisons are used to detect evolutionary relationships between organisms, proteins or gene sequences. They are also used to infer the function of a novel gene or the structure of an unknown protein by comparison with an already characterised gene or protein, since we assume that very similar sequences often have similar structure and function. If two sequences from different organisms are evolutionarily related, they share a common ancestor and are said to be homologous. By comparing (aligning) sequence 1 and sequence 2, we may infer the evolutionary process that started from the same ancestral sequence and proceeded through mutations. The snag, however, is deciding how similar is similar. A general rule of thumb for sequences longer than 100 amino acids or nucleotides is that proteins can be labelled as homologous if 25% of the amino acids are identical, and DNA sequences as similar if 70% of the nucleotides are identical. Identity below this threshold falls into what is referred to as the twilight zone.

Local and Global Alignments
The two classic dynamic programming algorithms, Needleman and Wunsch (1970) and Smith and Waterman (1981), produce global and local alignments respectively. A local alignment identifies regions of similarity within long sequences that are often widely divergent overall. Local alignments are often preferable, but can be more difficult to calculate because of the additional challenge of identifying the regions of similarity. A global alignment "forces" the alignment to span the entire length of all query sequences.

Searching sequence databases
The growing size and diversity of the public sequence databases make them invaluable resources for molecular biologists. When investigating a novel DNA or protein sequence, a fast, cheap and potentially very rewarding analysis is to scan EMBL/GenBank or UniProt/Swiss-Prot for sequences homologous to your own. Database searching is one of the first and most important steps in analysing a new sequence.
If your unknown sequence has a similar copy already in the databases, a search will quickly reveal this fact, and if the copy is well annotated you will have various clues to help you study your sequence further. Database searches usually provide the first clues as to whether the sequence belongs to an already studied and well-known protein family. If there is a similarity to a sequence from another species, then the two may be homologous (i.e., descended from a common ancestral sequence). Knowing the function of a homologous sequence will often give a good indication of the identity of the unknown sequence.

Many programs for database searching already exist, and many more are still being developed. They can be split into two types: dynamic programming and heuristic algorithms. Dynamic programming algorithms, including Needleman and Wunsch (1970) and Smith and Waterman (1981), can be used, but they take longer than is practical for searching large databases. To counteract this, heuristic search algorithms are routinely used to search large databases. The most commonly employed are FASTA and BLAST (Basic Local Alignment Search Tool). The following is a brief description of some programs:
- BLAST performs fast database searching combined with rigorous statistics for judging the significance of matches.
- FASTA can be used to compare either protein or DNA sequences, hence the name, which stands for "Fast All".
- BLITZ is an automatic electronic mail server for the MPsrch program. MPsrch allows you to perform sensitive and extremely fast comparisons of your protein sequences against the Swiss-Prot protein sequence database using the Smith and Waterman best local similarity algorithm.
All of these programs identify local regions of conserved residues between sequences, which allows them to find similarities between a query sequence and the sequences in a database in the shortest possible time. We'll talk about BLAST today, but you might want to look at the others if time allows.
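To make the contrast between dynamic programming and heuristic searching concrete, here is a minimal Python sketch. `align_score` fills the full dynamic programming matrix: with `local=False` it follows the Needleman and Wunsch (global) recurrence, and with `local=True` the Smith and Waterman (local) recurrence, in which no cell may fall below zero. The match, mismatch and gap values and the simple linear gap penalty are illustrative assumptions; real tools such as EMBOSS use substitution matrices and affine gap penalties. `shared_words` shows, in simplified form, the word (k-tuple) idea that heuristics such as FASTA and BLAST use to avoid filling the whole matrix: only positions where the query and a database sequence share an exact short word are examined further. It is not the actual FASTA or BLAST algorithm.

```python
from collections import defaultdict


def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score: global (Needleman-Wunsch) or local (Smith-Waterman)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the first row and column accumulate gap costs
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local: a score never drops below zero, so an alignment may start anywhere
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global score is in the bottom-right cell; local score is the best cell anywhere
    return best if local else score[rows - 1][cols - 1]


def shared_words(query: str, subject: str, k: int = 4) -> list[tuple[int, int]]:
    """(query_pos, subject_pos) pairs where both sequences contain the same k-letter word."""
    index = defaultdict(list)
    for s in range(len(subject) - k + 1):
        index[subject[s:s + k]].append(s)
    return [(q, s)
            for q in range(len(query) - k + 1)
            for s in index.get(query[q:q + k], [])]


if __name__ == "__main__":
    print(align_score("AAAGATTACAAAA", "GATTACA"))              # global: end gaps cost
    print(align_score("AAAGATTACAAAA", "GATTACA", local=True))  # local: just the shared region
    print(shared_words("GATTACA", "CCGATTACAGG"))
```

With these scores the global alignment is pulled down by the six end gaps it is forced to include (score -5), while the local alignment scores 7 for the shared GATTACA. All four word hits fall on the same diagonal (subject position minus query position = 2), which is the kind of signal a heuristic would go on to extend into an alignment.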

BLAST: the most popular and widely used data mining tool
The BLAST algorithm and family of programs rely on work on the statistics of local sequence alignments by Altschul et al. (reference i). The statistics allow us to estimate the probability of obtaining an alignment with a particular score. The BLAST algorithm permits nearly all sequence matches above a cutoff (usually generated by the BLAST program, based on the parameters you have selected) to be located efficiently in a database. Many flavours of BLAST exist, so you can search both protein and nucleotide sequence databases with either protein or nucleotide sequences. We deal with the different flavours today; which one to use depends on the type of query sequence and the type of biological question we hope to ask.

BLAST program   Database         Query
blastp          Protein          Protein
blastn          Nucleotide       Nucleotide
blastx          Protein          Translated DNA
tblastn         Translated DNA   Protein
tblastx         Translated DNA   Translated DNA

BLAST input parameters you can change
The default parameters that BLAST uses are close to optimal and well tested. However, here are some reasons you may wish to change them:
- The sequence you're interested in contains many identical residues, i.e., it has a biased composition (change the sequence filtering).
- BLAST doesn't report any results (change the substitution matrix or gap penalties).
- Your match has a borderline E value (change the substitution matrix or gap penalties).
- Too many matches are reported (change the database you are searching, OR filter reported entries by keyword, OR reduce the number of reported matches, OR lower the E value threshold).

BLAST output
BLAST reports a list of sequence matches to the query sequence, ordered by a score that reflects the significance of each match. Significance can be expressed as a p value, the probability of a random sequence matching a database sequence with the same or a better score than the query, or as an E value, the number of random matches with scores greater than or equal to that of the query that would be expected by chance in a database of the same size (current NCBI BLAST reports E values). For both values, the smaller the value, the more significant the match.

What are you looking for?
Several important features are worth noting in BLAST output:
- Look for high scores with low E (or p) values. This means the match is unlikely to be random.
- Look for clusters of high scores at the top of the hit list as a hint of a potential family.
- Look for trends in the type of sequences matched.

BLASTing with DNA sequences: which program for what problem?
- blastn compares a DNA sequence with a DNA database. You can use it for mapping oligonucleotides, cDNAs and PCR products to a genome; annotating genomic DNA; screening repetitive elements; and cross-species sequence exploration.
- blastx compares a translated DNA sequence with a protein database. Use it for finding protein-coding regions in genomic DNA or cDNA, and for determining whether a cDNA corresponds to a known protein.
- tblastx compares a DNA sequence translated into protein with a DNA database that is also translated into protein. This allows cross-species gene prediction at the genome or transcript (EST) level, and searching for genes that are not yet in protein databases.
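The relationship between score and significance can be sketched numerically. In the Karlin-Altschul statistics that underlie BLAST, the E value of an alignment with bit score S' is approximately E = m × n × 2^(-S'), where m is the query length and n the total database length, and the corresponding p value is P = 1 - e^(-E). This is a simplification: BLAST itself uses effective lengths corrected for edge effects, so its numbers will differ somewhat, and the lengths in the example are arbitrary.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value: expected number of chance hits scoring >= bit_score."""
    # E = m * n * 2^(-S'); BLAST itself uses edge-corrected effective lengths
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e_value: float) -> float:
    """Probability of at least one chance hit, treating chance hits as Poisson-distributed."""
    return 1.0 - math.exp(-e_value)


if __name__ == "__main__":
    for bits in (30, 40, 50):
        e = evalue_from_bits(bits, query_len=300, db_len=10**9)
        print(f"{bits} bits: E = {e:.3g}, P = {pvalue_from_evalue(e):.3g}")
```

Each extra bit of score halves the E value, and for small E values (below about 0.01) P and E are practically identical, which is why either can be used to rank hits.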

Dotplots: visualising a pairwise alignment
One of the earliest methods of comparing two protein or nucleotide sequences was to create a dot plot: a matrix with one sequence along each axis, in which a dot is placed wherever residues (or short windows of residues) of the two sequences match. Stretches of similarity appear as diagonal lines, and insertions and deletions show up because they shift the diagonal horizontally or vertically. There are many programs that produce dot plots, but you can also draw simple dot plots by hand (DIY dot plots). Plotting a sequence against itself can be useful, as internal repeats, tandem genes, repeated domains in proteins and regions of low complexity are highlighted. Note, however, that although useful, a dot plot cannot resolve similarity that is interrupted by regions of low similarity or insertions/deletions into a single alignment.
[Figure: a dot plot of two similar, but not identical, sequences]
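A DIY dot plot is easy to sketch in code. The function below marks a cell with `*` when a window of `window` residues starting at position i in the first sequence and position j in the second has at least `threshold` identical residues; using a window longer than one residue with a high threshold is a common way of filtering out the noise of isolated single-residue matches. The parameter names and the text output are illustrative choices, not those of any particular dot plot program.

```python
def dot_plot(seq1: str, seq2: str, window: int = 1, threshold: int = 1) -> list[str]:
    """Text dot plot: one row per window start in seq1, one column per window start in seq2."""
    rows = []
    for i in range(len(seq1) - window + 1):
        row = []
        for j in range(len(seq2) - window + 1):
            # Count identical residues between the two windows
            identical = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append("*" if identical >= threshold else ".")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    # Plotting a sequence against itself: the repeat shows up as off-diagonal lines
    for line in dot_plot("ACGTTACGTT", "ACGTTACGTT", window=3, threshold=3):
        print(line)
```

The main diagonal is always present in a self-plot; the parallel diagonals offset by five positions reveal the internal ACGTT repeat.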

Sequence Databases and Retrieval
There is a wealth of information that can be associated with a gene (see diagram above for a sample). Although these data are interlinked, each type of information is stored in a separate database. For example, Entrez Gene (hosted at the NCBI) focuses on gene information, whereas dbSNP holds SNP entries. You can link between the two data resources to find out more about the SNPs of a particular gene.

Sequence Retrieval System
As these databases contain hundreds of thousands of sequences, searching through them requires the processing power of a computer search engine. The Sequence Retrieval System (SRS) has been designed to do just that. SRS is available at many sites around the world; however, every site gives access to a different set of databases and, sometimes, search and analysis tools. Of course, sequences and their information can also be retrieved directly by searching primary sequence databases. For example, if you are doing more work with proteins, you might want to investigate the Expert Protein Analysis System (ExPASy) hosted by the Swiss Institute of Bioinformatics. This site not only holds the Swiss-Prot and TrEMBL databases but also offers many tools for analysing protein sequences.
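Entries can also be retrieved programmatically rather than through a web page. The sketch below builds a request for NCBI's E-utilities efetch service, which returns records as FASTA text when asked for `rettype=fasta`; it uses accession U14680 from Exercise 1 as an example. This route is an alternative not covered in the practical: `nuccore` is the database name for nucleotide records (`protein` for protein records), and NCBI asks scripted users to stay at or below about three requests per second without an API key.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore") -> str:
    """URL that asks NCBI efetch for one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession: str, db: str = "nuccore") -> str:
    """Download the record (requires network access)."""
    with urlopen(efetch_url(accession, db), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(efetch_url("U14680"))
    # print(fetch_fasta("U14680"))  # uncomment to download the sequence
```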

Exercise 1: Pairwise alignments using EMBOSS
1) Retrieve sequences from NCBI: U14680 and NM_ and save the sequences in FASTA format in Notepad. Give the files sensible names! (Note: files containing a FASTA sequence are usually given the extension .fas or .fasta.)
2) Go to the EMBOSS align website.
3) Paste one sequence into the top box and the other into the second box. Check the parameters: Molecule = DNA and Method = EMBOSS::needle (global), and run.
4) In the Needle results, click on the output file (.output) and save the page.
5) Run the program again, but this time choose the parameter Method = EMBOSS::water (local), EMBOSS's Smith-Waterman program.
6) Compare the results.
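Because several exercises involve saving, combining and checking FASTA files, here is a minimal reader and writer. It assumes that each record's identifier is the first word of its header line and that identifiers are unique (a repeated identifier would overwrite the earlier record); full parsers such as Biopython's `SeqIO` handle many more edge cases.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first word of the header."""
    records: dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)   # finish the previous record
            header = line[1:].strip()
            name, chunks = (header.split()[0] if header else ""), []
        else:
            chunks.append(line)                   # sequence may span several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def write_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format records as FASTA, wrapping sequence lines at `width` characters."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = ">seq1 first example\nACGTAC\nGT\n>seq2\nTTGA\n"
    records = read_fasta(example)
    print(records)
    print(write_fasta(records, width=4), end="")
```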

A use of BLAST: primer design

Software for primer design
eprimer3 (primer3) is a widely used program for designing primers. Its function: picks PCR primers and hybridisation oligos (EMBOSS). eprimer3 is an interface to the 'primer3' program from the Whitehead Institute. Primer3 picks primers for PCR reactions using the following criteria: oligonucleotide melting temperature, size, GC content and primer-dimer possibilities, PCR product size, positional constraints within the source sequence, and miscellaneous other constraints. All of these criteria are user-specifiable as constraints. eprimer3 can also pick hybridisation oligos that are internal to the product. BLAST can then be used to check the specificity of the primers, using blastn for short exact matches. More recently, however, a new BLAST method has become available: Primer BLAST. This combines the primer3 software with BLAST, allowing you to design primers and check their specificity in one search.

Primer Design Guidelines
1. Primers should be bases in length.
2. Base composition should be 50-60% (G+C).
3. Primers should end (3') in a G or C, or CG or GC: this prevents "breathing" of ends and increases the efficiency of priming.
4. Tms between °C are preferred.
5. The 3' ends of primers should not be complementary (i.e., base pair), as otherwise primer dimers will be synthesised preferentially to any other product.
6. Primer self-complementarity (the ability to form secondary structures such as hairpins) should be avoided.
7. Runs of three or more Cs or Gs at the 3' ends of primers may promote mispriming at G- or C-rich sequences (because of the stability of annealing) and should be avoided.
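Some of these guidelines are simple enough to check automatically. The sketch below computes GC content and a rough melting temperature using the Wallace rule (Tm = 2 × (A+T) + 4 × (G+C) °C, a common quick estimate for short oligos; it is not necessarily the formula used elsewhere in this practical), and flags violations of guidelines 2, 3, 5 and 7. The 50-60% GC range follows guideline 2; length and Tm limits are not checked because their values are not given above. Reading guideline 7 as "the last three bases are all G or C", and guideline 5 as "the last four bases of one primer are the reverse complement of the last four of the other", are assumptions. Self-complementarity (guideline 6) is not checked.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(primer: str) -> float:
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C: 2 per A/T plus 4 per G/C (short oligos only)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primer_warnings(primer: str, gc_range: tuple[float, float] = (50.0, 60.0)) -> list[str]:
    """List which of the simple guidelines (2, 3 and 7) a single primer breaks."""
    p = primer.upper()
    warnings = []
    gc = gc_content(p)
    if not gc_range[0] <= gc <= gc_range[1]:
        warnings.append(f"GC content {gc:.1f}% outside {gc_range[0]:.0f}-{gc_range[1]:.0f}%")
    if p[-1] not in "GC":
        warnings.append("3' end is not G or C")
    if all(base in "GC" for base in p[-3:]):
        warnings.append("run of three G/C bases at the 3' end")
    return warnings


def three_prime_dimer(forward: str, reverse: str, n: int = 4) -> bool:
    """True if the last n bases of the two primers can base-pair (guideline 5)."""
    return reverse_complement(forward[-n:]) == reverse.upper()[-n:]


if __name__ == "__main__":
    fwd, rev = "GATGAAGGCGTCTGAGGA", "GATCTTGTGCTTGGTGGC"  # primers from Exercise 4.1
    for label, p in (("forward", fwd), ("reverse", rev)):
        print(label, f"GC={gc_content(p):.1f}%", f"Tm~{wallace_tm(p)}C", primer_warnings(p))
    print("3' dimer:", three_prime_dimer(fwd, rev))
```

For the two myoglobin primers chosen in Exercise 4.1, this flags the forward primer's 3' A (guideline 3) and the reverse primer's 3' GGC run (guideline 7), a reminder that these guidelines are rules of thumb rather than hard constraints.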

Exercise 2: Designing a primer using Primer BLAST
Scenario
Based on your microarray results, a specific gene is upregulated under a cold stress condition. You decide to run a qPCR to confirm the microarray data, so you need good primers to amplify the gene. You may design the primers yourself, or you may use a program that will do it for you. Either way, we advise you to check the resulting primers, see where they are in the sequence, and choose them carefully! Your experiment depends on the quality of the primers.
To perform the following exercises, you will need the nucleotide sequence of the H. sapiens fau 1 gene and the pGEM-T vector. I have provided them at: The ID code for the gene sequence is P35544 (a UniProt identifier), and the vector sequence is at the specified place, in a file called pgem.fasta.
1) Go to the NCBI Entrez website and search for FAU1 human against the Nucleotide database. At the top of the results, on a light blue background, click on the FAU link to view the entry.
2) As we want to design a set of primers to amplify this gene, we are going to use this sequence. I have already downloaded it and stored it in the hsfau1_dna.fasta file. Open the file in Notepad.
3) Go to the Primer BLAST website. Copy the contents of the file into the text box under the PCR template heading.
4) Keep all the parameters the same and click Get primers.
5) What do your results suggest? Would you be okay with these? Looking at the options (mentioned above), you can specify the region of the gene in which the program should find a good primer.
There are different ways to calculate the melting temperature of primers. Using the first formula on the whiteboard, calculate the Tm for the first two primers resulting from eprimer3. This is a really simple formula; if your primer is very long (longer than a 25-mer), the length of the primer needs to be taken into account, as indicated in the second formula. Compare your result with the result obtained by eprimer3.

Exercise 3: primersearch, checking the vector sequence (optional)
There is another aspect that should be considered when you choose primers: do the primers align to your vector sequence?
PrimerSearch
Function: searches DNA sequences for matches with primer pairs.
Description: primersearch reads in primer pairs from an input file and searches them against sequence(s) specified by the user. Each of the primers in a pair is searched against the sequence and potential amplimers are reported. The user can specify a maximum percent mismatch level; for example, a 10% mismatch on a primer of length 20 bp means that the program will classify a primer as matching a sequence if 18 of the 20 bases match. It will only report matches if both primers in the pair match in opposite orientations.
At the following website, follow these steps:
1) Paste the contents of the pgem.fasta file into the top text box and upload the primer file PGEMxprimers for the Primer file option. Allow a 20 percent mismatch. Click Run.
2) Look at the output file in Notepad and analyse the result.
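The percent-mismatch rule described above can be sketched as follows. This illustrates the idea and is not primersearch's actual implementation: mismatches are counted position by position with no gaps, the allowed number is rounded down (so 10% of a 20-mer allows 2 mismatches, i.e., 18 of 20 bases must match, as in the example above), and a product is reported only when one primer matches the plus strand and the other matches the minus strand downstream of it. The toy template is made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def allowed_mismatches(primer_len: int, percent: float) -> int:
    # Round down: 10% of 20 bases -> 2 mismatches allowed
    return int(primer_len * percent / 100)


def match_sites(primer: str, target: str, percent: float) -> list[int]:
    """Start positions on the plus strand where primer matches within the mismatch limit."""
    p, t = primer.upper(), target.upper()
    limit = allowed_mismatches(len(p), percent)
    sites = []
    for i in range(len(t) - len(p) + 1):
        mismatches = sum(x != y for x, y in zip(p, t[i:i + len(p)]))
        if mismatches <= limit:
            sites.append(i)
    return sites


def amplimers(primer_a: str, primer_b: str, target: str,
              percent: float = 0.0) -> list[tuple[int, int]]:
    """(start, end) of predicted products; end is exclusive. Both pair orientations are tried."""
    products = []
    for fwd, rev in ((primer_a, primer_b), (primer_b, primer_a)):
        for start in match_sites(fwd, target, percent):
            # The reverse primer anneals to the plus strand as its reverse complement
            for site in match_sites(reverse_complement(rev), target, percent):
                if site >= start:
                    products.append((start, site + len(rev)))
    return sorted(products)


if __name__ == "__main__":
    template = "CC" + "GATTACA" + "TTTT" + "CGGTT" + "CC"   # toy template
    print(amplimers("GATTACA", "AACCG", template))
```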

Multiple Sequence Alignment

Background on Multiple Sequence Alignments
In the construction of a multiple sequence alignment (MSA), it is assumed that all sequences are biologically or evolutionarily related. An MSA allows the identification of highly conserved regions, corresponding to important functional or structural features within families of related proteins, and hence the study of evolutionary relationships between them. An MSA can be described as a tabular description of the relationships between proteins, where rows represent individual sequences and columns represent residue positions. Similar residues are brought into vertical register by introducing gaps, so that the relative position of residues within the alignment is preserved. The result is an expression of the similarities and dissimilarities between the sequences.

Why?
There are many reasons why you might want to construct a multiple sequence alignment. These include:
- To highlight regions of similarity, divergence and mutations.
- To provide more information than a single sequence (e.g., for an even more sensitive search to find other, more distant, family members).
- Creating a consensus will highlight functionally important domains or residues.
- It could reveal errors in protein sequence prediction (or even in sequencing).
- Secondary structure and other predictions improve with multiple alignments.
- Evolutionary analysis (phylogeny).
- To find novel motifs (e.g., using Hidden Markov Model techniques).
- To select appropriate primers for a gene family.
- To be used as input to identify changes in functionality due to missense mutations (Align-GVGD, SIFT).
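The consensus idea mentioned above can be sketched in a few lines. For each column of an alignment (given as equal-length strings with `-` for gaps), `consensus` reports the most common residue if it occurs in at least `threshold` of the sequences and `.` otherwise, and `identity_line` marks columns that are identical in every sequence with `*`, similar to the `*` in ClustalW's conservation line. The 50% threshold and the treatment of gaps are illustrative choices.

```python
from collections import Counter


def check_aligned(alignment: list[str]) -> None:
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")


def consensus(alignment: list[str], threshold: float = 0.5, gap: str = "-") -> str:
    """Majority residue per column, or '.' when no residue reaches the threshold."""
    check_aligned(alignment)
    out = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            if n / len(column) >= threshold:
                out.append(residue)
                continue
        out.append(".")
    return "".join(out)


def identity_line(alignment: list[str], gap: str = "-") -> str:
    """'*' for columns where every sequence has the same residue (no gaps), else a space."""
    check_aligned(alignment)
    return "".join("*" if column[0] != gap and len(set(column)) == 1 else " "
                   for column in zip(*alignment))


if __name__ == "__main__":
    aln = ["ACGT-A", "ACGTTA", "ACCTTA"]
    for row in aln:
        print(row)
    print(identity_line(aln))
    print(consensus(aln))
```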

MSA methods
The MSA process can be carried out either manually in an editor (e.g., JalView, GeneDoc or CINEMA; see Table 1 below for a detailed explanation of these) or using automatic alignment programs. The underlying process for constructing an MSA is common to both manual and most automatic methods: sequences that share a high percentage identity are grouped and aligned, and then these sequence groups are aligned with each other. When the protein family is highly conserved, both types of method are likely to produce exactly the same alignment. However, for more diverse families, automatic alignment methods tend to be error-prone and can produce biologically inaccurate alignments. In this case, it is better to align sequences by hand, although, depending on the size of the protein family, this may be a time-consuming process. In practice, almost everyone will want to start an MSA project using one of the automatic methods and then refine the result by eye. There are several alignment programs, separated into a number of categories depending on the strategy used to construct the alignment.
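The first step of the grouping process described above, finding the sequences that share the highest percentage identity, can be sketched as follows. Here percent identity is computed over the columns of pre-aligned (equal-length) sequences where neither sequence has a gap; this is only one of several definitions in use, and the sequence names in the example are hypothetical. Progressive aligners such as ClustalW derive their distances from pairwise alignments rather than from an existing alignment, so this only illustrates the grouping idea.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical positions as a percentage of columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def most_similar_pair(seqs: dict[str, str]) -> tuple[str, str, float]:
    """The two sequences with the highest percent identity: the first pair to group."""
    first, second = max(combinations(seqs, 2),
                        key=lambda pair: percent_identity(seqs[pair[0]], seqs[pair[1]]))
    return first, second, percent_identity(seqs[first], seqs[second])


if __name__ == "__main__":
    seqs = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "TCGAACTT"}  # hypothetical
    for x, y in combinations(seqs, 2):
        print(x, y, percent_identity(seqs[x], seqs[y]))
    print(most_similar_pair(seqs))
```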

Table 1. Alignment editors

CINEMA 5
CINEMA (Colour INteractive Editor for Multiple Alignments) is a tool for alignment construction, modification and visualisation. In addition to allowing interactive alignment over the Web, CINEMA provides links to the primary data sources, thereby giving access to up-to-date sequences and alignments. The program accepts any number of sequences of any length, which may be loaded in various ways. By default, alignments are coloured according to intuitive residue property groups. Nevertheless, menu options allow the user to specify residue colours (and hence residue groups) and to swap between different colouring alternatives. Flexible colouring facilitates the identification of core conserved regions of alignments, and especially of key motifs that may be associated with the structure or function of the protein. The program offers various "pluglets": e.g., dotplots, CLUSTALW, a 3D backbone viewer, BLAST, etc.

JalView
JalView has the advantage of being available both as a downloadable application and as an applet online. The application offers a CLUSTALW plug-in, performs Smith-Waterman pairwise alignment, and is able to calculate and draw UPGMA and NJ trees based on percent identity distances.

SeaView
Executable binaries (and source code) are available for many platforms. It also offers a CLUSTALW plug-in, calculates simple dotplots, and allows motifs to be saved.

BioEdit
Written for Windows 95/98/NT/2000/XP. Its intuitive multiple document interface with convenient features makes alignment and manipulation of sequences relatively easy on your desktop computer. Additional features allow connection to bioinformatics tools that are available on the internet.
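UPGMA trees like those JalView draws are built by repeatedly merging the two closest clusters and averaging their distances to everything else. Below is a minimal, topology-only UPGMA sketch (no branch lengths): it takes a symmetric distance matrix, for example 100 minus the percent identity of each pair (an assumption about how the distances are derived), and returns a tree in Newick format. It is not JalView's code, and the names and distances in the example are made up.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Newick topology built by UPGMA from a symmetric distance matrix."""
    # Active clusters: id -> (Newick text, number of sequences in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), _ = min(d.items(), key=lambda item: item[1])  # closest pair of clusters
        del d[(i, j)]
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            # New distance is the size-weighted average of the two old distances
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (d_ik * size_i + d_jk * size_j) / (size_i + size_j)
        clusters[next_id] = (f"({tree_i},{tree_j})", size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    dist = [[0, 2, 6, 10],
            [2, 0, 6, 10],
            [6, 6, 0, 10],
            [10, 10, 10, 0]]
    print(upgma(names, dist))
```

Because merged clusters always receive a larger id than any existing cluster, the `(smaller id, larger id)` key convention holds throughout without extra bookkeeping.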

MSA and Primer BLAST
For this example, we are going to use the human myoglobin gene (Gene ID 4151). There are three highly conserved variants of this gene: NM_ , NM_ and NM_. Our aim is to design a primer that will amplify this gene.

Exercise 4: Multiple Alignment of Variants (optional)
1) Look at the human myoglobin gene entry by searching at the NCBI with the search term 4151[uid]. You should be able to view all the information about the gene.
2) Retrieve the sequences at the NCBI by typing NM_ NM_ NM_ in the search box and searching against the nucleotide database. Tick all the boxes, go to the top of the page and change the Summary option in the Display drop-down menu to FASTA. Change the Send to drop-down menu to Text and save the page as myglobin.seq.
3) Go to the EBI ClustalW website and upload your myglobin.seq file in the interface. You don't need to change any other parameters. Click Run.
4) On the results page, click Start Jalview. You will be able to see the nucleotide alignment of the three variants and that there is a high level of conservation.

Exercise 4.1: Using the alignment to choose some primers (optional)
1) Using the alignment, can you pick a forward and a reverse primer that will amplify the myoglobin gene? Remember that you can use the primer design guidelines specified earlier. I have chosen the following ones: 5' GATGAAGGCGTCTGAGGA 3' and 5' GATCTTGTGCTTGGTGGC 3'. You can use either these or the ones you have chosen for the following exercise.
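Picking primers from an alignment of variants amounts to looking for stretches that are identical in every sequence. The sketch below lists gap-free, fully conserved column ranges of at least `min_len` columns. The default of 18 is an assumption that matches the length of the primers chosen above, not a value from the guidelines; a forward primer can be taken from a conserved range near the start, while the reverse primer is the reverse complement of a conserved range near the end.

```python
def conserved_windows(alignment: list[str], min_len: int = 18,
                      gap: str = "-") -> list[tuple[int, int]]:
    """(start, end) column ranges, end exclusive, that are gap-free and identical in all sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    ranges, start = [], None
    for i, column in enumerate(zip(*(s.upper() for s in alignment))):
        conserved = column[0] != gap and len(set(column)) == 1
        if conserved and start is None:
            start = i                      # a conserved run begins
        elif not conserved and start is not None:
            if i - start >= min_len:       # a run just ended; keep it if long enough
                ranges.append((start, i))
            start = None
    length = len(alignment[0])
    if start is not None and length - start >= min_len:
        ranges.append((start, length))     # run that reaches the end of the alignment
    return ranges


if __name__ == "__main__":
    aln = ["AC-GTTTA", "ACTGTTTC", "ACTGTTTG"]   # toy alignment
    print(conserved_windows(aln, min_len=2))
```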

Exercise 4.2: Using Primer BLAST (optional)
BLAST is used to compare a query sequence against a sequence database in a pairwise manner. This time we are going to use it to check the specificity of our primers against a DNA template. Before Primer BLAST, you could do the same by using blastn for short exact matches.
1) Go to Primer BLAST (blast/) and, under the Primer Parameters heading, put your forward primer in the Use my own forward primer (5'->3' on plus strand) box and the reverse primer in the Use my own reverse primer (5'->3' on minus strand) box. Leave all other parameters the same. Click Get Primers. Are your primers specific enough?
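Outside Primer BLAST, the same short-match check can be run with the standalone NCBI BLAST+ blastn program, whose `blastn-short` task is tuned for short queries such as primers. The sketch below assumes BLAST+ is installed and that the primers and the template are in FASTA files; the file names are hypothetical. `-subject` searches a FASTA file directly instead of a formatted database, and `-outfmt 6` produces tabular output.

```python
import shutil
import subprocess


def blastn_short_command(query_fasta: str, subject_fasta: str) -> list[str]:
    """Argument list for a BLAST+ blastn search tuned for short queries."""
    return ["blastn", "-task", "blastn-short",
            "-query", query_fasta,        # FASTA file with the primers
            "-subject", subject_fasta,    # FASTA file with the template (no database needed)
            "-outfmt", "6"]               # tab-separated hit table


def run_blastn_short(query_fasta: str, subject_fasta: str) -> str:
    if shutil.which("blastn") is None:
        raise RuntimeError("NCBI BLAST+ blastn was not found on the PATH")
    result = subprocess.run(blastn_short_command(query_fasta, subject_fasta),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(blastn_short_command("primers.fasta", "template.fasta")))
```

The tabular output includes percent identity, alignment length and mismatch count for each hit, so a primer with several full-length, low-mismatch hits on the template is not specific.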

Exploring Sequence Formats, Sequence Databases, Genome Browsers and Multiple Sequence Alignments
The basis of this exercise, identifying SNPs for BRCA1, has been taken from the paper: R. Rajasekaran, C. Sudandiradoss, C. George Priya Doss, Rao Sethumadhavan, Identification and in silico analysis of functional SNPs of the BRCA1 gene, Genomics, Volume 90, Issue 4, October 2007, Pages

Exercise 5: Investigating genes and SNPs using the UCSC Browser (with a quick look at UniProt for detailed protein function information)
In this exercise we want to position the gene on the genome and look at its SNPs.
1) Go to the UCSC Genome Browser Gateway and go to the Human (Hs) Genome Browser Gateway by clicking on Genomes.
2) In the position/search term text box, type the accession number NP_
3) Look at the results. What do you notice about the position of this gene on the genome?
4) Click on the 4th link for UCSC genes. We will now manipulate which tracks we can see using the selection boxes on the web page.
5) Hide anything that you feel is hindering your view of the gene and its SNPs. Hint: hide spliced ESTs and RepeatMasker, and set the conservation track to dense. What can you say about the SNPs in the BRCA1 gene?
6) Click on the track that is highlighted and you will see a Description and Page Index. This links out to other databases that contain important functional information.
7) Explore the UniProtKB entry. What can you tell me about the status of this entry?
8) Go back to the Description and Page Index page in the UCSC browser. Now go to the Entrez Gene entry at the NCBI. Look at the entry and all the possible links to other NCBI data resources. What are these data resources?

Exercise 6: Viewing SNPs from dbSNP on 3D structures using Cn3D (optional; this demo shows how tools don't always work due to inconsistencies in identifiers!)
In particular, you might want to look at the SNPs in dbSNP from the GeneView.
1) Link to the dbSNP database by clicking on the GeneView SNP Report link under the Genotype heading. Can you answer the following questions from the dbSNP entry? How many gene models are there? How many synonymous and missense mutations are there?
2) To view your non-synonymous (missense) SNPs on a 3D structure you can use the NCBI viewer Cn3D, but you will need to install it locally on your computer for it to work. In the table of all mutations, you will be able to see which SNPs have been validated, which are mapped to 3D structures, etc.
3) Choose a SNP that has a 3D structure (e.g., the one in exon 5). Click on the Yes link to go to the SNP3D entry. You will notice that there are several isoforms of this protein, each represented in this entry. We will concentrate on the first (isoform 1).
4) Select both SNPs to view in Cn3D, either by ticking the boxes under the Cn3D heading next to the SNP information and then clicking the Selected button, or by clicking the All button underneath.
5) From the structure summary page, you will be able to see how many mutations there are and where they are mapped. For the top structure, click on the pink bar to the right to see your query aligned to the structure's protein sequence, with an option to open an interactive view of the alignment and 3D structure in Cn3D.
6) View the alignment and 3D structure in Cn3D. The SNPs are marked in gold. They are easier to see in the wire style (change this using the Style > Rendering shortcuts menu).

Exercise 7: Retrieval of Sequences from NCBI
1) We want to download the protein sequence for this gene. Scroll down the entry until you reach the mRNA and Proteins section. Click on the NP_ link. This should be the first entry and should be on a line that looks something like: NM_ NP_ breast cancer type 1 susceptibility protein isoform 1
2) Now you should be looking at the NP_ entry. Download the FASTA sequence for this entry: at the top of the web page, underneath the tabs, click on the FASTA link. Your web browser should now display the FASTA sequence. I have already downloaded this sequence for you into a file called hs_brca1.fasta. You could have done this in two ways: (1) cut and paste the sequence into Notepad; or (2) use the download link on the right-hand side of the page.
3) Open the hs_brca1.fasta file in Notepad to check it.

Exercise 8: Sequence Format Conversion
Ultimately, the sequence downloaded in the previous exercise is to be added to other BRCA1 protein sequences to create a Multiple Sequence Alignment (MSA), which will then be used for further analysis in Align-GVGD.
1) On your desktop, there should be a file called allbrca1_unaligned.phy. Open this file in Notepad and look at the sequence format. Do you know or recognise this format? Google it to see if it is a regular format. This sequence format is often used with a suite of programs that concentrate on inferring phylogenies. It is not a popular format and is not often used as an input sequence file format. Also, remember that the human sequence you downloaded is in FASTA format. YOU CANNOT MIX SEQUENCE FORMATS IN THE SAME FILE. To convert the file to the correct format, we can use Seqret.
2) Go to Seqret. Upload the file allbrca1_unaligned.phy and leave all other options as they are. Run Seqret interactively.
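Seqret handles this conversion for you, but the PHYLIP layout is simple enough to convert in a few lines of code. The sketch below assumes "strict" PHYLIP: a first line giving the number of sequences and the alignment length, names padded to exactly 10 characters in the first block of lines, and any further (interleaved) blocks giving sequence only, in the same order. Other PHYLIP variants (relaxed names, or a sequential layout split over several lines per sequence) would need a different parser.

```python
def phylip_to_fasta(text: str, name_width: int = 10) -> str:
    """Convert strict (interleaved or one-line-per-sequence) PHYLIP to FASTA."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_seqs, n_sites = (int(x) for x in lines[0].split()[:2])
    names, seqs = [], []
    for k, line in enumerate(lines[1:]):
        if k < n_seqs:
            # First block: a fixed-width name field, then sequence (spaces ignored)
            names.append(line[:name_width].strip())
            seqs.append("".join(line[name_width:].split()))
        else:
            # Later blocks continue the sequences in the same order
            seqs[k % n_seqs] += "".join(line.split())
    if len(names) != n_seqs:
        raise ValueError(f"expected {n_seqs} sequences, found {len(names)}")
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
    return "".join(f">{name}\n{seq}\n" for name, seq in zip(names, seqs))


if __name__ == "__main__":
    phylip = "2 8\n" + "human".ljust(10) + "ACGT\n" + "mouse".ljust(10) + "ACGA\n\nACGT\nTTTT\n"
    print(phylip_to_fasta(phylip), end="")
```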

3) The output can be seen in the web browser, but you need to download it to the computer. Do this by right-clicking on the output link and choosing Save Target As. Call the file brca1_unaligned.fasta.
4) Open the file in Notepad and add your human sequence from your other file to the file you have just created (brca1_unaligned.fasta). You can do this using simple copy and paste. Save the file, which now includes the BRCA1 sequences AND the human sequence, as allbrca1_hs_unaligned.fasta.

Exercise 9: Creating an MSA using ClustalW
ClustalW is one of many MSA tools. It can be used via a web browser or installed locally on a server and used on the command line. Today we are going to use the web browser.
1) Go to the ClustalW web page. Upload your file allbrca1_hs_unaligned.fasta and run ClustalW.
2) There are four output files: you are interested in the Alignment file. Right-click on the link and save it to the computer.

Exercise 10: Viewing the alignment in BioEdit
BioEdit (and Jalview, which you may have heard of) are not only tools for visualising MSAs, but also allow you to edit them. In this exercise we will manipulate the alignment and the order of the sequences so that the MSA can be used in other programs. For example, Align-GVGD requires that the MSA is in FASTA format and that the human sequence is at the TOP of the alignment.
1) Start BioEdit by going to the Start Menu; All Programs; BioEdit. There may also be a shortcut on the desktop.
2) From the File menu, open your alignment file produced by ClustalW. Make sure you choose the file type All files and that you are looking in the directory where your file has been saved. Have a look at the alignment using the scroll bar at the bottom. Also explore other features that are available in this editor, for example shading the alignment according to conservation.

3) First, we are going to place the human sequence at the top of the file. The human sequence is represented by the accession number NP_. To do this: highlight the sequence, then left-click and hold down the mouse button whilst hovering over the selected sequence, and move the accession number (and thus the sequence) to the top of the alignment. Let go of the mouse button. Your human sequence should now be at the top of the alignment.
4) Next, using the alignment printed out from the library of Align-GVGD alignments, see if you can spot any differences between your alignment and the standard. Hint: look specifically at the pufferfish and sea urchin sequences, as these are more diverse than the other sequences and so more difficult to align. Do you think you need to change anything? There will be a demo on how to do this.
5) Once you are happy with your alignment, save it as a FASTA file. To do this, go to File; Save As. Save the file as type FASTA and call it allbrca1_clustalw.fasta.

Exercise 11: Using Align-GVGD to predict the effect of missense mutations
1) On the Align-GVGD web site, click on Use Align-GVGD in the menu on the left-hand side.
2) Upload your file allbrca1_clustalw.fasta as the MSA file and the file called brca1_mutations.txt for the substitutions list.
3) Run Align-GVGD. Are any of the mutations likely to interfere with function? As an alternative, try using the alignment supplied by Align-GVGD.

Exercise 12: Displaying your MSA in ESPript 2.2 (point of information/optional)
1) Go to:
2) Click on: execute
3) For Main alignment file, get the correct alignment of all the BRCA1 sequences. Then go to Output layout and set Font 7 and Col 65.
4) Press submit.
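The two Align-GVGD requirements mentioned above, an aligned FASTA file with the human sequence first, can also be checked and fixed in code. The sketch below works on a dictionary of identifier to aligned sequence (for example, as produced by a FASTA reader); the identifier `hs_brca1` and the sequences in the example are hypothetical stand-ins, not the real accession or data.

```python
def human_first(records: dict[str, str], human_id: str) -> dict[str, str]:
    """Return the records with human_id first, after checking they form an alignment."""
    if human_id not in records:
        raise KeyError(f"{human_id} not found in the alignment")
    if len({len(seq) for seq in records.values()}) != 1:
        raise ValueError("sequences differ in length, so they are not aligned")
    reordered = {human_id: records[human_id]}
    reordered.update((name, seq) for name, seq in records.items() if name != human_id)
    return reordered


def to_fasta(records: dict[str, str]) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


if __name__ == "__main__":
    aln = {"fugu": "MD-LSA", "urchin": "MEALSA", "hs_brca1": "MDLLSA"}  # hypothetical
    print(to_fasta(human_first(aln, "hs_brca1")), end="")
```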

23 URLS used (or mentioned in this practical Others are found in the Bioinformatics_links.txt file. Exploring function using NCBI resources UCSC Genome Browser NCBI Pairwise Alignment and Sequence Similarity NCBI BLAST submission page NCBI Sequence EXPASY translate tool SRS DNA Sequence Analysis (Primer searching) Uniprot eprimer3 docs online eprimer3 dna translator Primer BLAST blast/ Multiple Protein Sequence Alignment NCBI ClustalW Muscle TCoffee bin/muscle/input_muscle.py server.cnrs mrs.fr/tcoffee/tcoffee_cgi/index.cgi 23

24 Dialign (bielefeld.de/dialign/submission.html); Jalview; CDD; ESPript 2.2; Align-GVGD
i Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997) Nucleic Acids Res. 25:


More information

The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks.

The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks. Open Seqmonk Launch SeqMonk The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks. SeqMonk Analysis Page 1 Create

More information

From AP investigative Laboratory Manual 1

From AP investigative Laboratory Manual 1 Comparing DNA Sequences to Understand Evolutionary Relationships. How can bioinformatics be used as a tool to determine evolutionary relationships and to better understand genetic diseases? BACKGROUND

More information

Designing TaqMan MGB Probe and Primer Sets for Gene Expression Using Primer Express Software Version 2.0

Designing TaqMan MGB Probe and Primer Sets for Gene Expression Using Primer Express Software Version 2.0 Designing TaqMan MGB Probe and Primer Sets for Gene Expression Using Primer Express Software Version 2.0 Overview This tutorial details how a TaqMan MGB Probe can be designed over a specific region of

More information

SAMPLE LITERATURE Please refer to included weblink for correct version.

SAMPLE LITERATURE Please refer to included weblink for correct version. Edvo-Kit #340 DNA Informatics Experiment Objective: In this experiment, students will explore the popular bioninformatics tool BLAST. First they will read sequences from autoradiographs of automated gel

More information

Gene Identification in silico

Gene Identification in silico Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction

More information