Original summary generated in 2003 Updated in 2007


AFFYMETRIX ARABIDOPSIS GENOME ARRAY DESIGN, ANNOTATION, AND ANALYSIS

Original summary generated in 2003
Updated in 2007

Brandon Le, Javier Wagmaister, Anhthu Bui

Arabidopsis Array Annotation

TABLE OF CONTENTS

I. Array Information & Design
   A. Common Definition & Terminology
   B. Probe Set Nomenclature
II. Annotation of the ATGENOME1 Array (2001)
   A. Array Information
   B. Array Features
   C. Array Annotation Strategy
   D. Array Annotation Summary
III. Annotation of the ATH1 Array (2003)
   A. Array Information
   B. Array Features
   C. Array Annotation Strategy
   D. Array Annotation Summary
IV. Comparison of the ATGENOME1 and ATH1 Arrays
   A. Comparison of Array Features
   B. Mapping of Probe Sets Across the Two Arrays
V. Re-Annotation of the ATH1 Array (2007)
   A. Motivation for Re-Annotation Efforts
   B. Array Re-Annotation Strategy
   C. Array Re-Annotation Summary
   D. Comparison of Old and New Annotations

I. ARRAY INFORMATION & DESIGN

This section contains information about the design of the GeneChip array and the general interpretation of features on the array.

A. Common Definition & Terminology

Figure 1. Affymetrix GeneChip Probe Design. A. GeneChip terminology. B. Schematic of probe design.

Probe: A single-stranded DNA oligonucleotide designed to match a specific mRNA sequence. GeneChip probe arrays use oligonucleotide probes that are up to 25 bases long (Figure 1). The probes are synthesized directly on the surface of the array using photolithography and combinatorial chemistry.

Probe Cell: A single square-shaped feature on an array containing one type of probe. The size varies with the array type: the AtGenome1 (8K) and ATH1 (22K) arrays contain 24 µm and 18 µm probe cells, respectively. Each probe cell contains millions of probe molecules representing a unique gene-specific 25-mer oligo (Figure 1).

Perfect Match (PM): Probes that are designed to be exactly the same as the reference sequence (Figure 1).

Mismatch (MM): Probes that are designed to be exactly the same as the reference sequence except for a homomeric mismatch at the central position. Mismatch probes serve as a control for cross-hybridization (Figure 1).

Probe Pair: Consists of two probe cells, a PM cell and the corresponding MM cell. On the array, a probe pair is arranged with the PM cell directly above the MM cell (Figure 1).

Probe Set: A set of probes designed to detect one transcript. A probe set consists of multiple probe pairs. The AtGenome1 (8K) and ATH1 (22K) arrays contain probe sets with 16 and 11 probe pairs, respectively (Figure 1).

B. Probe Set Nomenclature

Each probe set is assigned an Affymetrix serial number ID (five digits for the AtGenome1 array and six digits for the ATH1 array) (see below). IDs containing AFFX represent control features on the array. Each ID is followed by a suffix that denotes the type of sequence the probe set represents. A detailed description of each suffix is presented in Table 1 and Figure 2.

Table 1. Description of probe set suffixes

Probe set ID format: AFFYMETRIX SERIAL NUMBER ID_SUFFIX

SUFFIX                             ARRAY*      DESCRIPTION
_at                                8K & 22K    A unique probe set designed to be complementary to the cRNA target. ALL probe sets on the array have an _at designation.
_st (obsolete in the 22K)          8K          Probe sets
_s_at                              8K          (Similarity) Probe sets that correspond to a small number of unique genes that share identical sequence. Probes were chosen from regions that are common to these genes. Group members can be a singleton or a group of sequences. The sequences that make up these probe sets are BAC sequences with corresponding mRNA sequences or overlapping BAC sequences. The sequences are IDENTICAL.
_g_at (obsolete in the 22K)        8K          (Group) Probes chosen in a region of overlap. Sequences are represented as singletons on the same probe array. These probe sets have sequences with overlapping consensus.
_f_at (obsolete in the 22K)        8K          (Family) Probe sets that correspond to sequences for which it was not possible to pick a full set of 16 unique and/or shared-similarity-constrained probes. Some probes are similar but not necessarily identical to other gene sequences.
_i_at (obsolete in the 22K)        8K          (Incomplete) Sequences for which there are fewer than 15 unique probes.
_r_at (obsolete in the 22K)        8K          (Rules Dropped) Sequences for which it was not possible to pick a full set of unique probes using Affymetrix probe rules. Probes were picked by dropping some of these rules.
_a_at                              22K         A probe set that recognizes alternative transcripts from the same gene (Figure 3).
_s_at                              22K         A probe set with all probes common among multiple transcripts within a gene family (these probe sets can detect members of a gene family - Figure 3).
_x_at                              22K         A probe set with some probes that are identical, or highly similar, to unrelated sequences. These probes may cross-hybridize in an unpredictable manner with sequences other than the main target. Data generated from these probe sets should be interpreted with caution, because some of the signal is likely to come from transcripts other than the one being intentionally measured.

* 8K refers to the first generation Arabidopsis array (AtGenome1), representing ~8,000 genes. 22K refers to the second generation Arabidopsis array (ATH1), representing ~25,000 genes (commonly referred to as the whole genome array).

Figure 2. Possible Relationships Between Sequences And Probe Sets on an Array. G1, G2, etc., represent families of related gene sequences; S1, S2, S3, etc., represent individual gene sequences; PS1, PS2, PS3, etc., represent probe sets; P1, P2, P3, etc., represent probes. Solid lines show how sequence sets or sequences are related to probe sets or individual probes. If a solid line is connected to a box, the relationship applies to all probes in that box. If it is connected to an individual circle, the relationship holds only for that probe. (Figure and figure legend taken from Redman et al., Plant J 38(3) (2004)).
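The perfect match / mismatch definitions in Section I.A can be illustrated with a short Python sketch. It assumes that the homomeric mismatch is made by replacing the central base with its complement (A<->T, C<->G), which for a 25-mer is the 13th base. The function name and the example sequence are made up for illustration and do not come from the array design files.

```python
# Complement table used for the homomeric mismatch (assumption: A<->T, C<->G).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm):
    """Return the MM probe for a PM probe by complementing its central base."""
    pm = pm.upper()
    center = len(pm) // 2  # index 12, i.e. the 13th base of a 25-mer
    return pm[:center] + COMPLEMENT[pm[center]] + pm[center + 1:]


# Made-up 25-mer, not a real probe sequence.
pm = "ACGTACGTACGTACGTACGTACGTA"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

The PM and MM sequences differ only at the central position, so a probe pair can be checked by comparing the two strings base by base.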

II. ANNOTATION OF THE ATGENOME1 ARRAY (2001)

A. Array Information

The Affymetrix Arabidopsis Genome GeneChip (AtGenome1) array is the first generation Arabidopsis array, designed by Affymetrix in collaboration with Novartis Agricultural Discovery Institute, Inc (NADII). Approximately 8,200 genes are represented on the Affymetrix Arabidopsis GeneChip. Eighty percent of the features are predicted coding sequences from genomic bacterial artificial chromosome (BAC) clones, while 20% of the features represent high-quality cDNA sequences. Furthermore, there are more than 100 EST clusters sharing homology with the predicted coding sequences from BAC clones. Each gene (probe set) is represented by sixteen probe pairs, with each probe consisting of a 25-mer oligo localized to a 24 µm feature.

B. Array Features

We characterized the array by first examining the total number of features and genes represented on the array. Then, we examined the types of probe sets represented on the array based on the suffixes used to distinguish different probe types.

B1. How many features are represented on the array?

We counted the total number of probe sets on the array. Control probe sets have the AFFX designation in the header.

Table 2. Total number of features represented on the array

Description             # Probe Sets
Arabidopsis Genes       8,247
Proprietary Features    578
Controls                50
Total                   8,875*

* Total number of probe sets based on the paper by Zhu and Wang, Plant Physiol. 124, (2000). The actual number of probe sets we have information for is 8,297.

B2. What is the representation of different suffixes on the array?

Probe IDs were extracted from the annotation file provided by Affymetrix. Using Microsoft Excel and the COUNTIF function, we determined the distribution of different suffixes on the array. This analysis only considers the 8,247 non-proprietary features on the array, excluding controls (Table 2).
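The suffix tally described in B2 was done with Excel's COUNTIF. As a rough scripted equivalent, the Python sketch below counts probe set IDs by suffix and skips AFFX controls. The probe set IDs in the example are made up, and the function names are illustrative only; the suffix list follows Table 1.

```python
from collections import Counter

# Longer suffixes are listed before "_at" so that "_s_at" is not counted as "_at".
SUFFIXES = ["_s_at", "_g_at", "_f_at", "_i_at", "_r_at", "_a_at", "_x_at", "_st", "_at"]


def probe_set_suffix(probe_id):
    """Return the first matching suffix from SUFFIXES, or 'unknown'."""
    for suffix in SUFFIXES:
        if probe_id.endswith(suffix):
            return suffix
    return "unknown"


def suffix_distribution(probe_ids):
    """Count probe sets per suffix, excluding control (AFFX) probe sets."""
    return Counter(probe_set_suffix(p) for p in probe_ids if "AFFX" not in p)


# Made-up probe set IDs, for illustration only.
example_ids = ["12345_at", "12346_s_at", "12347_at", "12348_g_at", "AFFX-ctrl_at"]
distribution = suffix_distribution(example_ids)
for suffix, count in sorted(distribution.items()):
    print(suffix, count)
```

Checking the longer suffixes first matters because every probe set ID ends in `_at`, so a plain "ends with _at" count would lump all suffix types together.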

Table 3. Distribution of suffixes on the array

Suffix    # Probe Sets
_at       6,242
_s_at     1,423
_i_at     208
_g_at     196
_f_at     101
_r_at     77
Total     8,247

C. Array Annotation Strategy

When this array was first released, Affymetrix and TMRI provided very limited information. We had to contact many people to obtain enough information to characterize and annotate the array (Figure 3).

1. We contacted Stacey Harmer, a post-doc in Steve Kay's laboratory, after reading her paper using this array (Harmer et al., Science 290, (2000)). We were able to obtain information regarding the sequences of the genes represented on the array.
2. We were directed to the Torrey Mesa Research Institute (TMRI) website ( gene_exp/index.html), a division of Novartis that supplied Affymetrix with the sequences used to design the oligos on the GeneChip.
3. Sequences were downloaded from the site and we manually annotated the genes, assigning functional categories based on BLAST results, published literature, internet searches, molecular biology and biochemistry books, etc. (similar to our approach with EST sequence analysis).
4. Ceres provided a small file containing annotations for some of the genes on the GeneChip.
5. Using all the resources we had, we manually annotated all 8,200 genes, assigning each gene to a functional category based on the EU Arabidopsis Sequencing Project (Figure 3).
6. Probe sets representing predicted or hypothetical proteins that could not be assigned to a category were placed in the Unknown Function category.

Figure 3. AtGenome1 Array Annotation Strategy

D. Array Annotation Summary

Using the strategy presented in Figure 3, we manually annotated sequences on the array and assigned a functional category to each sequence based on BLAST results, references, web searches, etc. For genes that encode transcription factors, we assigned the keyword "transcription factor" with a description of the transcription factor family, e.g. myb, homeobox, AP2 domain containing protein, etc. The results are summarized in Tables 4 & 5 and Figures 4 & 5.

Figure 4. Distribution of 8,247 Features on the AtGenome1 Array.

Table 4. Functional category distribution of all probe sets

Functional Category                    # Probe Sets    % Total
Cell Growth & Division
Cell Structure
Disease & Defense
Energy
Intracellular Traffic
Metabolism
Protein Destination & Storage
Protein Synthesis
Secondary Metabolism
Signal Transduction
Transcription & Post-Transcription
Transporter
Transposon
Unknown Function
Total                                  8,247

Figure 5. Distribution of Transcription Factor Families on the AtGenome1 Array.

Table 5. Distribution of transcription factor families on the array

Transcription Factor Families          # Probe Sets    % Total
AP2 Domain Containing Protein
Basic Leucine Zipper (bZIP)
Basic Helix-Loop-Helix (bHLH)
CCAAT-box
DNA-Binding Protein
General Transcription Factor
Heat Shock
Homeobox
MADS-box
Myb
No Apical Meristem (NAM)
Scarecrow
Squamosa
WRKY
Zinc Finger
TFs represented by ONE probe set
TFs represented by 2-10 probe sets
Total                                  704

III. ANNOTATION OF THE ATH1 ARRAY (2003)

A. Array Information

The Affymetrix Arabidopsis ATH1 Genome GeneChip (ATH1) array is the second generation Arabidopsis array, designed by Affymetrix in collaboration with The Institute for Genomic Research (TIGR). There are 22,746 probe sets representing 23,734 genes in the Arabidopsis genome (Redman et al., Plant J. 38(3) (2004)). 26,200 genes were available in the TIGR-ATH1 database as of December 15. To represent as many gene sequences on the array as possible, non-unique probe sets (_s_at) were used to represent two or more highly similar genes. Preference was given to genes with evidence of expression, supported by database matches and robust gene models. Each gene (probe set) is represented by eleven probe pairs, with each probe consisting of a 25-mer oligo localized to an 18 µm feature. For more information about the array design, please read Redman et al., Plant J. 38(3) (2004).

B. Array Features

As in the analysis carried out in Section II, we wanted to understand the nature of the ATH1 array by characterizing the features and the representation of probe sets on the array.

B1. How many features are represented on the array?

We counted the total number of probe sets on the array. Control probe sets have the AFFX designation in the header.

Table 6. Total number of features represented on the array

Description          # Probe Sets
Arabidopsis Genes    22,746
Controls             64
Total                22,810

B2. What is the representation of different suffixes on the array?

Probe IDs were extracted from the annotation file provided by Affymetrix. Using Microsoft Excel and the COUNTIF function, we determined the distribution of different suffixes on the array. This analysis only considers the 22,746 probe sets representing Arabidopsis genes and does not include control features.

Table 7. Distribution of suffixes on the array

Suffix    # Probe Sets
_at       21,685
_s_at     935
_x_at     126
Total     22,746

C. Array Annotation Strategy

Unlike the AtGenome1 array, sequences on the ATH1 array were freely available and accessible. We annotated the ATH1 array with a strategy similar to the one used for the AtGenome1 array. The steps taken to annotate the ATH1 array are summarized in Figure 6 and described in more detail below:

1. We first downloaded the target sequences of each probe set on the array from the Affymetrix site. These target sequences represent the last 600 bp of the coding sequence. Note: 3' untranslated regions (UTRs) were not included in the probe design.
2. Sequences were searched against the NCBI non-redundant database using the blastx program.
3. Using results from the BLAST run and the descriptions provided by Affymetrix for each probe set, we manually annotated each feature using a classification based on the EU Arabidopsis Sequencing Project. To expedite the annotation process, we used the functional category assignments of 6,714 probe sets from the AtGenome1 array that have a comparable match to probe sets on the ATH1 array.
4. We marked probe sets representing ~5,000 full-length cDNAs provided by Ceres to TIGR for gene annotation analysis.
5. We also included information on probe sets representing genes with known mutant phenotypes, extrapolated from David Meinke's 2003 Plant Physiology paper (Meinke et al., Plant Physiol. 131, 2003).
6. Genes encoding transcription factors and signal transduction proteins were further classified into transcription factor families and signal transduction categories (e.g. kinases, phosphatases, etc.) based on published information, molecular biology books, and the web.

7. There are three categories for unclassified proteins: unclassified - hypothetical proteins with no cDNA support, unclassified - hypothetical proteins with cDNA support, and unclassified - proteins with unknown function.
   a. The unclassified - hypothetical proteins with no cDNA support category includes sequences whose descriptions refer to hypothetical or predicted proteins without any EST evidence.
   b. The unclassified - hypothetical proteins with cDNA support category includes sequences whose descriptions indicate expressed proteins or predicted proteins supported by ESTs or Ceres's full-length cDNAs.
   c. The unclassified - proteins with unknown function category includes sequences representing proteins with known domains (e.g. WD40, DUF) that could not be assigned a category.

Figure 6. ATH1 Array Annotation Strategy

D. Array Annotation Summary

Summaries of the functional category distribution on the array were tallied using Microsoft Excel and are shown in Tables 8 & 9 and Figures 7 & 8.

Figure 7. Distribution of 22,746 Features on the ATH1 Array.

Table 8. Functional category distribution of all probe sets on the ATH1 array

Functional Category                                # Probe Sets    % Total
Cell Growth & Division
Cell Structure
Disease & Defense
Energy
Intracellular Traffic
Metabolism
Post-Transcription
Protein Destination & Storage
Protein Synthesis
Secondary Metabolism
Signal Transduction
Transcription
Transporter
Transposon
Unclassified - Proteins With cDNA Support
Unclassified - Proteins With NO cDNA Support
Unclassified - Proteins With Unknown Function
Total                                              22,746

Figure 8. Distribution of Transcription Factor Families Represented on the ATH1 Array.

Table 9. Distribution of transcription factor families on the ATH1 array

Transcription Factor Family        # Probe Sets    % Total
AP2/EREBP
ARF
AUX/IAA
B3 Domain
Basic Helix-Loop-Helix (bHLH)
Basic Leucine Zipper (bZIP)
CCAAT
CCR4-Associated                    8               0.5%
GATA
General
GRAS
Heat Shock
Homeodomain
LIM Domain                         4               0.3%
MADS-Box
MYB
NAC Domain
Polycomb group
Squamosa Promoter-Like
SWI2/SNF
Unclassified
WRKY
YABBY                              3               0.2%
Zinc Finger
Total                              1484

IV. COMPARISON OF THE ATGENOME1 AND ATH1 ARRAYS

A. Comparison of Array Features

We carried out RNA profiling experiments using both the first (AtGenome1) and second (ATH1) generation arrays. To correlate the results generated on the AtGenome1 and ATH1 arrays, we needed to find the homologous sequences or probe sets represented on both arrays. Fortunately, Affymetrix carried out an extensive same-species array comparison (see the comparison sheet manual for more details: manual/comparison_spreadsheets_manual.pdf). The similarities and differences between the AtGenome1 and ATH1 arrays are summarized in Table 10.

Table 10. AtGenome1 and ATH1 arrays comparison

Array Features             AtGenome1 (8K)    ATH1 (22K)
Number of Probe Sets       8,247             22,746
Oligonucleotide Length     25-mer            25-mer
Number of Probe Pairs      16                11
Probe Pair Distribution    Contiguous        Scattered

Note: Probe pairs for a given probe set were designed to be scattered across the ATH1 array so that information for a probe set is not lost if a GeneChip is damaged locally in one section of the array. On the AtGenome1 array, however, all probe pairs of a probe set are next to each other, so a probe set that falls in a damaged area can no longer be used.

B. Mapping of Probe Sets Across the Two Arrays

Because the public databases change over time, probe sets on the two Arabidopsis arrays are not identical. In some cases the same sequence is represented by completely different probe sets, which creates a challenge when comparing data sets generated on different array generations. Table 11 shows results from the comparison analysis by Affymetrix.

Table 11. Affymetrix Array Comparison

Match Description*     # Probe Sets
Best Probe Match       6,714
Good Probe Match       6,731
Complex Probe Match    8,312
No Probe Match         516

* A detailed description of each probe match criterion is available from Affymetrix.

Figure 9 illustrates the relationships between probe sets from two arrays and the designation for each probe set.

Figure 9. Possible Probe Set Relationships Between Arrays. In the case of the Arabidopsis arrays, Design A and Design B refer to the AtGenome1 and ATH1 arrays, respectively. The type of match (Best, Good, Complex) depends on the level of stringency applied to a probe set. This analysis was done at the probe level (25-mer) and not at the consensus or target sequence level.
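Section III.C describes reusing the functional categories of the 6,714 AtGenome1 probe sets that have a comparable match on the ATH1 array. A minimal Python sketch of that kind of transfer is shown below. It assumes the best-match table and the AtGenome1 annotations have already been loaded into dictionaries; the IDs, categories, and function name are made up, and this is not the procedure or file format actually used.

```python
def transfer_categories(best_match, atgenome1_categories):
    """Carry AtGenome1 functional categories over to matched ATH1 probe sets.

    best_match: dict mapping an ATH1 probe set ID to its best-match AtGenome1 ID.
    atgenome1_categories: dict mapping an AtGenome1 probe set ID to a category.
    Returns (transferred, pending): the categories carried over, and the ATH1
    IDs that still need manual annotation.
    """
    transferred = {}
    pending = []
    for ath1_id, old_id in best_match.items():
        category = atgenome1_categories.get(old_id)
        if category is None:
            pending.append(ath1_id)
        else:
            transferred[ath1_id] = category
    return transferred, pending


# Made-up IDs and categories, for illustration only.
best_match = {
    "250001_at": "12001_at",
    "250002_at": "12002_s_at",
    "250003_at": "12999_at",
}
atgenome1_categories = {"12001_at": "Metabolism", "12002_s_at": "Transporter"}

transferred, pending = transfer_categories(best_match, atgenome1_categories)
print(transferred)
print(pending)
```

Keeping a separate list of pending probe sets makes it clear which entries were inherited and which still require manual review.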

V. RE-ANNOTATION OF THE ATH1 ARRAY (2007)

A. Motivation for Re-Annotation Efforts

The Arabidopsis ATH1 array was annotated in 2003 using all the publicly available resources at the time (see Section III). The annotations were used to quickly annotate the Soybean Genome array (see the Soybean Genome Array Annotation summary for more details). To keep up with the information generated in the four years since the annotation of the ATH1 array, we decided to re-annotate the ATH1 array in parallel with the soybean genome array.

B. Array Re-Annotation Strategy

The strategy for the re-annotation of the ATH1 array is summarized in Figure 10 below. Our strategy is as follows:

1. We updated the descriptions for each probe set on the array using the TAIR Affy array descriptions (affy_ath1_array_elements txt). The description file was downloaded from the TAIR web site: ftp://ftp.arabidopsis.org/home/tair/microarrays. Descriptions were based on the latest release of the Arabidopsis genome, TAIR 7.

   Note from TAIR: The mapping to the TAIR7 Transcripts was performed using the BLASTN program with e-value cutoff < 9.9e-6. For the 25-mer oligo probes used on the Affy chips, the required match length to achieve this e-value is 23 or more identical nucleotides. To assign a probe set to a given locus, at least 9 of the probes included in the probe set were required to match a transcript at that locus. Otherwise, the probe set was not assigned a locus and was given the description "no match".

2. In addition to updating the descriptions for each probe set, we also updated the gene ontology (GO) information provided by Affymetrix.
3. We gathered information about putative transcription factors from several publicly available TF databases for Arabidopsis, including:
   a. AGRIS - Arabidopsis Gene Regulatory Information Server
   b. DATF - Database of Arabidopsis Transcription Factors
   c. RARTF - Riken Arabidopsis Transcription Factor Database
   d. ArabTFDB - Arabidopsis Transcription Factor Database

   Transcription factors and transcription factor families were associated with each probe set on the array. Information obtained from points 1-3 was compiled into an annotation file containing the 2003 ATH1 annotations. Transcription factors were automatically updated based on the information obtained from the databases in point 3.
4. We focused on probe sets that were previously assigned to the unclassified category. The rationale is that many of the sequences in the unclassified category might have updated information that can be used to re-assign them to a different category, whereas sequences previously assigned to categories such as protein synthesis or metabolism most likely will not change. Therefore, we first focused on re-assigning the 11,145 probe sets classified as unclassified in 2003.
5. After the unclassified category was re-examined, we decided to re-examine all 22,746 probe sets on the array for consistent assignment of functional categories. We sorted all the probe sets by their description and made sure that probe sets with similar descriptions were assigned the same functional category.
6. We further examined the unclassified category, which is divided into three groups as described in Section IIIC. To distinguish the different sequences within the unclassified category, we downloaded several files from the TAIR site, including:
   a. TAIR7_protein_coding_no_transcript_support_09_30_07
   b. TAIR7_protein_coding_with_transcript_support_09_30_07
   c. TAIR7_unknown_proteins_no_transcript_support_09_30_07
   d. TAIR7_proteins_of_undefined_function_03_07
   e. TAIR7_unknown_proteins_03_07
   f. TAIR7_locus_type
   These files were compiled into one main table listing all the transcripts detected and/or predicted in the Arabidopsis genome. This table helps distinguish whether a sequence has cDNA support, represents a pseudogene/transposon, or is unknown, and so helps re-assign the probe sets to the appropriate unclassified categories.
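The TAIR note in step 1 states that a probe set is assigned to a locus only when at least 9 of its probes match a transcript at that locus, and is otherwise described as "no match". The Python sketch below illustrates that threshold rule only. It assumes the per-probe BLASTN hits have already been reduced to sets of locus identifiers; the locus names are hypothetical examples, and how TAIR handled probe sets that qualify for more than one locus is not stated in the source, so the sketch simply returns every qualifying locus.

```python
from collections import Counter

MIN_MATCHING_PROBES = 9  # threshold quoted in the TAIR note


def assign_loci(probe_hits):
    """Apply the 'at least 9 matching probes' rule to one probe set.

    probe_hits: one collection of locus identifiers per probe in the probe set
    (empty if the probe matched no transcript).
    Returns the sorted loci that meet the threshold, or ["no match"].
    """
    counts = Counter()
    for loci in probe_hits:
        counts.update(set(loci))  # each probe counts once per locus
    qualifying = sorted(l for l, n in counts.items() if n >= MIN_MATCHING_PROBES)
    return qualifying if qualifying else ["no match"]


# Hypothetical probe set with 11 probes: 10 hit one locus, 1 hits another.
hits_ok = [{"AT1G01010"}] * 10 + [{"AT1G01020"}]
# Hypothetical probe set where only 8 of 11 probes hit the same locus.
hits_low = [{"AT1G01010"}] * 8 + [set()] * 3

print(assign_loci(hits_ok))
print(assign_loci(hits_low))
```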

Figure 10. ATH1 Array Re-annotation Strategy

C. Array Re-Annotation Summary

The final annotation file contains a list of all the probe sets with annotation information from 2003 and the latest information (2007). Arabidopsis ATH1 Array Annotation version 2.4 is the current version used for all summaries. Table 12 and Figure 11 summarize the functional category assignment of all probe sets on the array. Table 13 and Figure 12 summarize all identified transcription factor probe sets on the array.

Table 12. Functional category distribution of all probe sets on the ATH1 array

Functional Category                                # Probe Sets    % Total
Cell Growth & Division
Cell Structure
Disease & Defense
Energy
Intracellular Traffic
Metabolism
Post-Transcription
Protein Destination & Storage
Protein Synthesis
Pseudogene
Secondary Metabolism
Signal Transduction
Transcription
Transporter
Transposon
Unclassified - Proteins With cDNA Support
Unclassified - Proteins With NO cDNA Support
Unclassified - Proteins With Unknown Function
Total                                              22,746

Figure 11. Distribution of 22,746 Features on the Re-Annotated ATH1 Array (2007).

Table 13. Distribution of transcription factor probe sets on the array.

Transcription Factor Families    Pie Groups      # Probe Sets    % Total
ABI3-VP1                         ABI3-VP1
AP2-EREBP                        AP2-EREBP
ARF                              ARF
ARR-B                            ARR-B
AS2                              AS2
AUX-IAA                          AUX-IAA
bHLH                             bHLH
bZIP                             bZIP
CCAAT-Dr1                        CCAAT           2               0.1%
CCAAT-HAP2                       CCAAT
CCAAT-HAP3                       CCAAT
CCAAT-HAP5                       CCAAT
G2-like                          G2-like
General                          General
GRAS                             GRAS
HB                               HB
HSF                              HSF
MADS                             MADS
MYB-related                      MYB
MYB                              MYB
NAC                              NAC
LFY                              Others          1               0.1%
NZZ                              Others          1               0.1%
SAP                              Others          1               0.1%
ULT                              Others          1               0.1%
LUG                              Others          2               0.1%
GIF                              Others          3               0.2%
MBF1                             Others          3               0.2%
S1Fa-like                        Others          3               0.2%
Whirly                           Others          3               0.2%
CSD                              Others          4               0.2%
BZR-BES1                         Others          5               0.3%
Pseudo ARR-B                     Others          5               0.3%
BBR-BPC                          Others          6               0.3%
CPP                              Others          6               0.3%
EIL                              Others          6               0.3%
Sigma70-like                     Others          6               0.3%
CAMTA                            Others          7               0.4%
E2F-DP                           Others          7               0.4%
PLATZ                            Others          7               0.4%
GRF                              Others          8               0.4%
TAZ                              Others          8               0.4%
ARID                             Others
Nin-like                         Others
TUB                              Others
HMG                              Others
FHA                              Others
LIM                              Others
GeBP                             Others
SBP                              Others
TCP                              Others
JUMONJI                          Others
PcG                              PcG
Trihelix                         Trihelix
Unclassified                     Unclassified
WRKY                             WRKY
VOZ                              ZF              2               0.1%
HRT                              ZF              3               0.2%
C2C2-YABBY                       ZF              5               0.3%
SRS                              ZF              6               0.3%
Alfin                            ZF              7               0.4%
ZIM                              ZF
ZF-HD                            ZF
C2C2-GATA                        ZF
C2C2-Dof                         ZF
C2C2-CO-like                     ZF
PHD                              ZF
C3H                              ZF
C2H2                             ZF
Total                                            1954

Figure 12. Pie Distribution of Transcription Factors Represented on the Array. Others represents TF families with fewer than 20 members represented on the array (see Table 13).

D. Comparison of Old and New Annotations

A. We first examined the overall difference between the annotations carried out in 2003 and 2007. Notice that there is an 83.6% reduction in the Unclassified - Proteins With NO cDNA Support category. Interestingly, we see more than a 100% increase in probe sets classified in the Energy and Intracellular Traffic categories. Overall, a moderate increase occurred in each functional category.

Table 14. Comparison of old and new annotation of the ATH1 array

Functional Categories                              % Change
Cell Growth & Division
Cell Structure
Disease & Defense
Energy
Intracellular Traffic
Metabolism
Post-Transcription
Protein Destination & Storage
Protein Synthesis
Pseudogene                                         INF
Secondary Metabolism
Signal Transduction
Transcription
Transporter
Transposon
Unclassified - Proteins With cDNA Support
Unclassified - Proteins With NO cDNA Support
Unclassified - Proteins With Unknown Function
Total

B. We next examined what changed within each of the three unclassified categories shown earlier. Tables 15-17 show the new assignments of probe sets previously classified as unclassified.

Table 15. Changes within the unclassified - proteins with NO cDNA support category (2003).

Functional Category                                # Probe Sets    % Total
Cell Growth & Division
Cell Structure
Disease & Defense
Energy
Intracellular Traffic
Metabolism
Post-Transcription
Protein Destination & Storage
Protein Synthesis
Pseudogene
Secondary Metabolism
Signal Transduction
Transcription
Transporter
Transposon
Unclassified - Proteins With cDNA Support
Unclassified - Proteins With NO cDNA Support
Unclassified - Proteins With Unknown Function
Total                                              6381

Table 16. Changes within the unclassified - proteins with cDNA support category (2003).

Functional Category                                # Probe Sets    % Total
Cell Growth & Division
Cell Structure
Disease & Defense
Energy
Intracellular Traffic
Metabolism
Post-Transcription
Protein Destination & Storage
Protein Synthesis
Pseudogene                                         4               0.1%
Secondary Metabolism
Signal Transduction
Transcription
Transporter
Transposon                                         6               0.2%
Unclassified - Proteins With cDNA Support
Unclassified - Proteins With NO cDNA Support       0               0.0%
Unclassified - Proteins With Unknown Function
Total                                              3229

Table 17. Changes within the unclassified - protein of unknown function category (2003).

Functional Category                                # Probe Sets    % Total
Cell Growth & Division
Cell Structure
Disease & Defense
Energy
Intracellular Traffic
Metabolism
Post-Transcription
Protein Destination & Storage
Protein Synthesis                                  6               0.4%
Pseudogene                                         8               0.5%
Secondary Metabolism
Signal Transduction
Transcription
Transporter
Transposon                                         0               0.0%
Unclassified - Proteins With cDNA Support
Unclassified - Proteins With NO cDNA Support
Unclassified - Proteins With Unknown Function
Total                                              1535
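The % Change column in Table 14 compares the number of probe sets per category between the 2003 and 2007 annotations, and the Pseudogene row is reported as INF because that category did not exist in 2003. The Python sketch below shows one way such a column could be computed. It assumes the usual definition of percent change relative to the old count; the counts in the example are made up and are not the values from the annotation files.

```python
import math


def percent_change(old, new):
    """Percent change from old to new; infinite when a category is new."""
    if old == 0:
        return math.inf if new > 0 else 0.0
    return (new - old) / old * 100.0


# Made-up (2003, 2007) probe set counts, for illustration only.
example_counts = {
    "Category that shrank": (200, 50),
    "Category that more than doubled": (40, 90),
    "Category introduced in 2007": (0, 12),
}

changes = {name: percent_change(old, new) for name, (old, new) in example_counts.items()}
for name, change in changes.items():
    print(name, change)
```

With these made-up counts, the first category shows a 75% reduction, the second a 125% increase, and the third an infinite change, which corresponds to the INF entry style used for Pseudogene.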


More information

Introduction to Bioinformatics and Gene Expression Technology

Introduction to Bioinformatics and Gene Expression Technology Vocabulary Introduction to Bioinformatics and Gene Expression Technology Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 1.1 Gene: Genetics: Genome: Genomics: hereditary DNA

More information

Introduction to DNA microarrays. DTU - January Hanne Jarmer

Introduction to DNA microarrays. DTU - January Hanne Jarmer Introduction to DNA microarrays DTU - January 2007 - Hanne Jarmer Microarrays - The Concept Measure the level of transcript from a very large number of genes in one go Microarrays - The Concept Measure

More information

Introduction to Microarray Analysis

Introduction to Microarray Analysis Introduction to Microarray Analysis Methods Course: Gene Expression Data Analysis -Day One Rainer Spang Microarrays Highly parallel measurement devices for gene expression levels 1. How does the microarray

More information

DNA Microarray Data Oligonucleotide Arrays

DNA Microarray Data Oligonucleotide Arrays DNA Microarray Data Oligonucleotide Arrays Sandrine Dudoit, Robert Gentleman, Rafael Irizarry, and Yee Hwa Yang Bioconductor Short Course 2003 Copyright 2002, all rights reserved Biological question Experimental

More information

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Multiple choice questions (numbers in brackets indicate the number of correct answers) 1 February 15, 2013 Multiple choice questions (numbers in brackets indicate the number of correct answers) 1. Which of the following statements are not true Transcriptomes consist of mrnas Proteomes consist

More information

Lecture #1. Introduction to microarray technology

Lecture #1. Introduction to microarray technology Lecture #1 Introduction to microarray technology Outline General purpose Microarray assay concept Basic microarray experimental process cdna/two channel arrays Oligonucleotide arrays Exon arrays Comparing

More information

Clones. Glass Slide PCR. Purification. Array Printing. Post-process. Hybridize. Scan. Array Fabrication. Sample Preparation/Hybridization.

Clones. Glass Slide PCR. Purification. Array Printing. Post-process. Hybridize. Scan. Array Fabrication. Sample Preparation/Hybridization. Terminologies Reporter: the nucleotide sequence present in a particular location on the array (a.k.a. probe) Feature: the location of a reporter on the array Composite sequence: a set of reporters used

More information

BIOLOGY 205 Midterm II - 19 February Each of the following statements are correct regarding Eukaryotic genes and genomes EXCEPT?

BIOLOGY 205 Midterm II - 19 February Each of the following statements are correct regarding Eukaryotic genes and genomes EXCEPT? BIOLOGY 205 Midterm II - 19 February 1999 Name Multiple choice questions 4 points each (Best 12 out of 13). 1. Each of the following statements are correct regarding Eukaryotic genes and genomes EXCEPT?

More information

Motivation From Protein to Gene

Motivation From Protein to Gene MOLECULAR BIOLOGY 2003-4 Topic B Recombinant DNA -principles and tools Construct a library - what for, how Major techniques +principles Bioinformatics - in brief Chapter 7 (MCB) 1 Motivation From Protein

More information

Introduction to Plant Genomics and Online Resources. Manish Raizada University of Guelph

Introduction to Plant Genomics and Online Resources. Manish Raizada University of Guelph Introduction to Plant Genomics and Online Resources Manish Raizada University of Guelph Genomics Glossary http://www.genomenewsnetwork.org/articles/06_00/sequence_primer.shtml Annotation Adding pertinent

More information

Soybean Microarrays. Microarray construction. An Introduction. By Steve Clough

Soybean Microarrays. Microarray construction. An Introduction. By Steve Clough Soybean Microarrays Microarray construction An Introduction By Steve Clough Common Microarray platforms cdna: spotted collection of PCR products from different cdna clones, each representing a different

More information

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will

More information

The Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica

The Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database

More information

HC70AL SUMMER 2014 PROFESSOR BOB GOLDBERG Gene Annotation Worksheet

HC70AL SUMMER 2014 PROFESSOR BOB GOLDBERG Gene Annotation Worksheet HC70AL SUMMER 2014 PROFESSOR BOB GOLDBERG Gene Annotation Worksheet NAME: DATE: QUESTION ONE Using primers given to you by your TA, you carried out sequencing reactions to determine the identity of the

More information

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013)

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation AGACAAAGATCCGCTAAATTAAATCTGGACTTCACATATTGAAGTGATATCACACGTTTCTCTAAT AATCTCCTCACAATATTATGTTTGGGATGAACTTGTCGTGATTTGCCATTGTAGCAATCACTTGAA

More information

DNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center

DNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center DNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center Why DNA Chips? Functional genomics: get information about genes that is unavailable from sequence

More information

TIGR THE INSTITUTE FOR GENOMIC RESEARCH

TIGR THE INSTITUTE FOR GENOMIC RESEARCH Introduction to Genome Annotation: Overview of What You Will Learn This Week C. Robin Buell May 21, 2007 Types of Annotation Structural Annotation: Defining genes, boundaries, sequence motifs e.g. ORF,

More information

MICROARRAYS: CHIPPING AWAY AT THE MYSTERIES OF SCIENCE AND MEDICINE

MICROARRAYS: CHIPPING AWAY AT THE MYSTERIES OF SCIENCE AND MEDICINE MICROARRAYS: CHIPPING AWAY AT THE MYSTERIES OF SCIENCE AND MEDICINE National Center for Biotechnology Information With only a few exceptions, every

More information

Gene Identification in silico

Gene Identification in silico Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction

More information

Microarrays The technology

Microarrays The technology Microarrays The technology Goal Goal: To measure the amount of a specific (known) DNA molecule in parallel. In parallel : do this for thousands or millions of molecules simultaneously. Main components

More information

Testing the ABC floral-organ identity model: cloning the genes

Testing the ABC floral-organ identity model: cloning the genes Testing the ABC floral-organ identity model: cloning the genes Objectives: To test the validity of the ABC model for floral organ identity we will: 1. Use the model to make predictions concerning the phenotype

More information

Data Sheet. GeneChip Human Genome U133 Arrays

Data Sheet. GeneChip Human Genome U133 Arrays GeneChip Human Genome Arrays AFFYMETRIX PRODUCT FAMILY > ARRAYS > Data Sheet GeneChip Human Genome U133 Arrays The Most Comprehensive Coverage of the Human Genome in Two Flexible Formats: Single-array

More information

Important gene-information's

Important gene-information's Sequences, domains and databases. How to gather information on a gene. Jens Bohnekamp, Institute for Biochemistry Important gene-information's Protein sequence Nucleotide sequence Gene structure Protein

More information

7 Gene Isolation and Analysis of Multiple

7 Gene Isolation and Analysis of Multiple Genetic Techniques for Biological Research Corinne A. Michels Copyright q 2002 John Wiley & Sons, Ltd ISBNs: 0-471-89921-6 (Hardback); 0-470-84662-3 (Electronic) 7 Gene Isolation and Analysis of Multiple

More information

INTRODUCTION. The Technology of Microarrays January Hanne Jarmer

INTRODUCTION. The Technology of Microarrays January Hanne Jarmer INTRODUCTION The Technology of Microarrays January 2009 - Hanne Jarmer The Concept gene mrna gene specific DNA probes labeled target Spotted arrays High-density arrays 13-16 micron features ~60 micron

More information

Gene Signal Estimates from Exon Arrays

Gene Signal Estimates from Exon Arrays Gene Signal Estimates from Exon Arrays I. Introduction: With exon arrays like the GeneChip Human Exon 1.0 ST Array, researchers can examine the transcriptional profile of an entire gene (Figure 1). Being

More information

Applicazioni biotecnologiche

Applicazioni biotecnologiche Applicazioni biotecnologiche Analisi forense Sintesi di proteine ricombinanti Restriction Fragment Length Polymorphism (RFLP) Polymorphism (more fully genetic polymorphism) refers to the simultaneous occurrence

More information

Expression of the genome. Books: 1. Molecular biology of the gene: Watson et al 2. Genetics: Peter J. Russell

Expression of the genome. Books: 1. Molecular biology of the gene: Watson et al 2. Genetics: Peter J. Russell Expression of the genome Books: 1. Molecular biology of the gene: Watson et al 2. Genetics: Peter J. Russell 1 Transcription 1. Francis Crick (1956) named the flow of information from DNA RNA protein the

More information

Outline. Array platform considerations: Comparison between the technologies available in microarrays

Outline. Array platform considerations: Comparison between the technologies available in microarrays Microarray overview Outline Array platform considerations: Comparison between the technologies available in microarrays Differences in array fabrication Differences in array organization Applications of

More information

COMPUTER RESOURCES II:

COMPUTER RESOURCES II: COMPUTER RESOURCES II: Using the computer to analyze data, using the internet, and accessing online databases Bio 210, Fall 2006 Linda S. Huang, Ph.D. University of Massachusetts Boston In the first computer

More information

Prokaryotic Annotation Pipeline SOP HGSC, Baylor College of Medicine

Prokaryotic Annotation Pipeline SOP HGSC, Baylor College of Medicine 1 Abstract A prokaryotic annotation pipeline was developed to automatically annotate draft and complete bacterial genomes. The protein coding genes in the genomes are predicted by the combination of Glimmer

More information

Good Science. Benefit to Society

Good Science. Benefit to Society Good Science Benefit to Society Human genes claimed in granted U.S. patents Jensen and Murray, Science 310:239-240 (14 Oct. 2005) Specifically, this map is based on a BLAST homology search linking nucleotide

More information

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology. G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic

More information

PLNT2530 (2018) Unit 6b Sequence Libraries

PLNT2530 (2018) Unit 6b Sequence Libraries PLNT2530 (2018) Unit 6b Sequence Libraries Molecular Biotechnology (Ch 4) Analysis of Genes and Genomes (Ch 5) Unless otherwise cited or referenced, all content of this presenataion is licensed under the

More information

Background Analysis and Cross Hybridization. Application

Background Analysis and Cross Hybridization. Application Background Analysis and Cross Hybridization Application Pius Brzoska, Ph.D. Abstract Microarray technology provides a powerful tool with which to study the coordinate expression of thousands of genes in

More information

Testing the ABC floral-organ identity model: cloning the genes

Testing the ABC floral-organ identity model: cloning the genes Testing the ABC floral-organ identity model: cloning the genes Objectives: To test the validity of the ABC model for floral organ identity we will: 1. Use the model to make predictions concerning the phenotype

More information

Question 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences.

Question 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences. Bio4342 Exercise 1 Answers: Detecting and Interpreting Genetic Homology (Answers prepared by Wilson Leung) Question 1: Low complexity DNA can be described as sequences that consist primarily of one or

More information

Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods

Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods Companion lecture to the textbook: Fundamentals of BioMEMS and Medical Microdevices, by Prof., http://saliterman.umn.edu/

More information

High-throughput Transcriptome analysis

High-throughput Transcriptome analysis High-throughput Transcriptome analysis CAGE and beyond Dr. Rimantas Kodzius, Singapore, A*STAR, IMCB rkodzius@imcb.a-star.edu.sg for KAUST 2008 Agenda 1. Current research - PhD work on discovery of new

More information

Deoxyribonucleic Acid DNA

Deoxyribonucleic Acid DNA Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods Companion lecture to the textbook: Fundamentals of BioMEMS and Medical Microdevices, by Prof., http://saliterman.umn.edu/

More information

Activation of a Floral Homeotic Gene in Arabidopsis

Activation of a Floral Homeotic Gene in Arabidopsis Activation of a Floral Homeotic Gene in Arabidopsis By Maximiliam A. Busch, Kirsten Bomblies, and Detlef Weigel Presentation by Lis Garrett and Andrea Stevenson http://ucsdnews.ucsd.edu/archive/graphics/images/image5.jpg

More information

DNA Arrays Affymetrix GeneChip System

DNA Arrays Affymetrix GeneChip System DNA Arrays Affymetrix GeneChip System chip scanner Affymetrix Inc. hybridization Affymetrix Inc. data analysis Affymetrix Inc. mrna 5' 3' TGTGATGGTGGGAATTGGGTCAGAAGGACTGTGGGCGCTGCC... GGAATTGGGTCAGAAGGACTGTGGC

More information

Bi 8 Lecture 5. Ellen Rothenberg 19 January 2016

Bi 8 Lecture 5. Ellen Rothenberg 19 January 2016 Bi 8 Lecture 5 MORE ON HOW WE KNOW WHAT WE KNOW and intro to the protein code Ellen Rothenberg 19 January 2016 SIZE AND PURIFICATION BY SYNTHESIS: BASIS OF EARLY SEQUENCING complex mixture of aborted DNA

More information

Genome annotation & EST

Genome annotation & EST Genome annotation & EST What is genome annotation? The process of taking the raw DNA sequence produced by the genome sequence projects and adding the layers of analysis and interpretation necessary

More information

Supplemental Table S1. Gene co-expression analysis using HHT/ASFT (At5g41040) as the bait*

Supplemental Table S1. Gene co-expression analysis using HHT/ASFT (At5g41040) as the bait* 1 2 Supplemental Table S1. Gene co-expression analysis using HHT/ASFT (At5g41040) as the bait* 3 4 5 AGI ID R value GeneChip ID Annotation At5g41040 1.000 249289_at HXXXD-type acyl-transferase family protein

More information

Host : Dr. Nobuyuki Nukina Tutor : Dr. Fumitaka Oyama

Host : Dr. Nobuyuki Nukina Tutor : Dr. Fumitaka Oyama Method to assign the coding regions of ESTs Céline Becquet Summer Program 2002 Structural Neuropathology Lab Molecular Neuropathology Group RIKEN Brain Science Institute Host : Dr. Nobuyuki Nukina Tutor

More information

Gene regulation V Biochemistry 302. March 6, 2006

Gene regulation V Biochemistry 302. March 6, 2006 Gene regulation V Biochemistry 302 March 6, 2006 Common structural motifs associated with transcriptional regulatory proteins Helix-turn-helix Prokaryotic repressors and activators Eukaryotic homeodomain

More information

Table of Contents. 1. What is CREP and when to use it?

Table of Contents. 1. What is CREP and when to use it? Table of Contents 1. What is CREP and when to use it?... 1 2. How to login CREP?... 2 3. How to make use of CREP in general?... 3 4. How to query by identifiers?... 4 5. How to query by sequences?... 6

More information

Overview of the next two hours...

Overview of the next two hours... Overview of the next two hours... Before tea Session 1, Browser: Introduction Ensembl Plants and plant variation data Hands-on Variation in the Ensembl browser Displaying your data in Ensembl After tea

More information

Recombinant DNA Technology

Recombinant DNA Technology Recombinant DNA Technology Common General Cloning Strategy Target DNA from donor organism extracted, cut with restriction endonuclease and ligated into a cloning vector cut with compatible restriction

More information

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence Annotating 7G24-63 Justin Richner May 4, 2005 Zfh2 exons Thd1 exons Pur-alpha exons 0 40 kb 8 = 1 kb = LINE, Penelope = DNA/Transib, Transib1 = DINE = Novel Repeat = LTR/PAO, Diver2 I = LTR/Gypsy, Invader

More information

Chapter 8: Recombinant DNA. Ways this technology touches us. Overview. Genetic Engineering

Chapter 8: Recombinant DNA. Ways this technology touches us. Overview. Genetic Engineering Chapter 8 Recombinant DNA and Genetic Engineering Genetic manipulation Ways this technology touches us Criminal justice The Justice Project, started by law students to advocate for DNA testing of Death

More information

Lecture 7 Motif Databases and Gene Finding

Lecture 7 Motif Databases and Gene Finding Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 7 Motif Databases and Gene Finding Motif Databases & Gene Finding Motifs Recap Motif Databases TRANSFAC

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Goals of this course Learn about Software tools Databases Methods (Algorithms) in

More information

Chapter 25: Regulating Eukaryotic Transcription The Ligand Responsive Activators

Chapter 25: Regulating Eukaryotic Transcription The Ligand Responsive Activators Chapter 25: Regulating Eukaryotic Transcription The Ligand Responsive Activators At least 5 potential gene expression control points Superfamily of Gene Regulators Activation of gene structure Initiation

More information

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background COS 597c: Topics in Computational Molecular Biology Lecture 19a: December 1, 1999 Lecturer: Robert Phillips Scribe: Robert Osada DNA arrays Before exploring the details of DNA chips, let s take a step

More information

The University of California, Santa Cruz (UCSC) Genome Browser

The University of California, Santa Cruz (UCSC) Genome Browser The University of California, Santa Cruz (UCSC) Genome Browser There are hundreds of available userselected tracks in categories such as mapping and sequencing, phenotype and disease associations, genes,

More information

What is a microarray

What is a microarray DNA Microarrays What is a microarray A surface on which sequences from thousands of different genes are covalently attached to fixed locations (probes). Glass slides Silicon chips Utilize the selective

More information

Preprocessing Affymetrix GeneChip Data. Affymetrix GeneChip Design. Terminology TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT

Preprocessing Affymetrix GeneChip Data. Affymetrix GeneChip Design. Terminology TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT Preprocessing Affymetrix GeneChip Data Credit for some of today s materials: Ben Bolstad, Leslie Cope, Laurent Gautier, Terry Speed and Zhijin Wu Affymetrix GeneChip Design 5 3 Reference sequence TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT

More information

HC70AL Spring 2011! An Introduction to Bioinformatics! By!! Brandon Le! April 7, 2011!

HC70AL Spring 2011! An Introduction to Bioinformatics! By!! Brandon Le! April 7, 2011! HC70AL Spring 2011! An Introduction to Bioinformatics! By!! Brandon Le! April 7, 2011! Outline 1. Review of Dideoxy Sequencing 2. Obtaining and Processing DNA Sequences 3. What is a Gene? 4. Sequence Analysis

More information

Serial Analysis of Gene Expression

Serial Analysis of Gene Expression Serial Analysis of Gene Expression Cloning of Tissue-Specific Genes Using SAGE and a Novel Computational Substraction Approach. Genomic (2001) Hung-Jui Shih Outline of Presentation SAGE EST Article TPE

More information

Introduction to biology and measurement of gene expression

Introduction to biology and measurement of gene expression Introduction to biology and measurement of gene expression Statistical analysis of gene expression data with R and Bioconductor University of Copenhagen, 17-21 August, 2009 Margaret Taub University of

More information

Chapter 17. PCR the polymerase chain reaction and its many uses. Prepared by Woojoo Choi

Chapter 17. PCR the polymerase chain reaction and its many uses. Prepared by Woojoo Choi Chapter 17. PCR the polymerase chain reaction and its many uses Prepared by Woojoo Choi Polymerase chain reaction 1) Polymerase chain reaction (PCR): artificial amplification of a DNA sequence by repeated

More information

Microarray Informatics

Microarray Informatics Microarray Informatics Donald Dunbar MSc Seminar 31 st January 2007 Aims To give a biologist s view of microarray experiments To explain the technologies involved To describe typical microarray experiments

More information

Gene expression. What is gene expression?

Gene expression. What is gene expression? Gene expression What is gene expression? Methods for measuring a single gene. Northern Blots Reporter genes Quantitative RT-PCR Operons, regulons, and stimulons. DNA microarrays. Expression profiling Identifying

More information

Intro to Microarray Analysis. Courtesy of Professor Dan Nettleton Iowa State University (with some edits)

Intro to Microarray Analysis. Courtesy of Professor Dan Nettleton Iowa State University (with some edits) Intro to Microarray Analysis Courtesy of Professor Dan Nettleton Iowa State University (with some edits) Some Basic Biology Genes are DNA sequences that code for proteins. (e.g. gene lengths perhaps 1000

More information

Chapter 14 Regulation of Transcription

Chapter 14 Regulation of Transcription Chapter 14 Regulation of Transcription Cis-acting sequences Distance-independent cis-acting elements Dissecting regulatory elements Transcription factors Overview transcriptional regulation Transcription

More information

Gene Expression Analysis with Pathway-Centric DNA Microarrays

Gene Expression Analysis with Pathway-Centric DNA Microarrays Gene Expression Analysis with Pathway-Centric DNA Microarrays SuperArray Bioscience Corporation George J. Quellhorst, Jr. Ph.D. Manager, Customer Education Topics to be Covered Introduction to DNA Microarrays

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Vocabulary Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 Gene: Genetics: Genome: Genomics: hereditary

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a

More information

Microarrays: since we use probes we obviously must know the sequences we are looking at!

Microarrays: since we use probes we obviously must know the sequences we are looking at! These background are needed: 1. - Basic Molecular Biology & Genetics DNA replication Transcription Post-transcriptional RNA processing Translation Post-translational protein modification Gene expression

More information

Technical Note. GeneChip Mouse Expression Set 430. GeneChip Mouse Expression Set 430 Design

Technical Note. GeneChip Mouse Expression Set 430. GeneChip Mouse Expression Set 430 Design GeneChip Mouse Expression Set 43 AFFYMETRIX PRODUCT FAMILY > RNA ARRAYS AND REAGENTS > Technical Note Array Design and Performance of the GeneChip Mouse Expression Set 43 The mouse is a popular model system

More information

SPH 247 Statistical Analysis of Laboratory Data

SPH 247 Statistical Analysis of Laboratory Data SPH 247 Statistical Analysis of Laboratory Data April 14, 2015 SPH 247 Statistical Analysis of Laboratory Data 1 Basic Design of Expression Arrays For each gene that is a target for the array, we have

More information

SAMPLE LITERATURE Please refer to included weblink for correct version.

SAMPLE LITERATURE Please refer to included weblink for correct version. Edvo-Kit #340 DNA Informatics Experiment Objective: In this experiment, students will explore the popular bioninformatics tool BLAST. First they will read sequences from autoradiographs of automated gel

More information

DNA/RNA MICROARRAYS NOTE: USE THIS KIT WITHIN 6 MONTHS OF RECEIPT.

DNA/RNA MICROARRAYS NOTE: USE THIS KIT WITHIN 6 MONTHS OF RECEIPT. DNA/RNA MICROARRAYS This protocol is based on the EDVOTEK protocol DNA/RNA Microarrays. 10 groups of students NOTE: USE THIS KIT WITHIN 6 MONTHS OF RECEIPT. 1. EXPERIMENT OBJECTIVE The objective of this

More information

Normalization. Getting the numbers comparable. DNA Microarray Bioinformatics - #27612

Normalization. Getting the numbers comparable. DNA Microarray Bioinformatics - #27612 Normalization Getting the numbers comparable The DNA Array Analysis Pipeline Question Experimental Design Array design Probe design Sample Preparation Hybridization Buy Chip/Array Image analysis Expression

More information

Supplementary Figure mer depth distribution of the Illumina reads. Illumina reads from the 1- kb insert library were used in the analysis.

Supplementary Figure mer depth distribution of the Illumina reads. Illumina reads from the 1- kb insert library were used in the analysis. Supplementary Figure 1. 17-mer depth distribution of the Illumina reads. Illumina reads from the 1- kb insert library were used in the analysis. A total of 26,234,801,500 17-mers were obtained and the

More information

Whole Transcriptome Sequencing/RNA-Seq

Whole Transcriptome Sequencing/RNA-Seq Whole Transcriptome Sequencing/RNA-Seq RNA Seq refers to the use of high throughput next genera on sequencing technologies to sequence complementary DNA (cdna) sequences. Successful whole transcriptome

More information

6. GENE EXPRESSION ANALYSIS MICROARRAYS

6. GENE EXPRESSION ANALYSIS MICROARRAYS 6. GENE EXPRESSION ANALYSIS MICROARRAYS BIOINFORMATICS COURSE MTAT.03.239 16.10.2013 GENE EXPRESSION ANALYSIS MICROARRAYS Slides adapted from Konstantin Tretyakov s 2011/2012 and Priit Adlers 2010/2011

More information

KnetMiner USER TUTORIAL

KnetMiner USER TUTORIAL KnetMiner USER TUTORIAL Keywan Hassani-Pak ROTHAMSTED RESEARCH 10 NOVEMBER 2017 About KnetMiner KnetMiner, with a silent "K" and standing for Knowledge Network Miner, is a suite of open-source software

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Brad Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org 7

More information

Chapter 5. Structural Genomics

Chapter 5. Structural Genomics Chapter 5. Structural Genomics Contents 5. Structural Genomics 5.1. DNA Sequencing Strategies 5.1.1. Map-based Strategies 5.1.2. Whole Genome Shotgun Sequencing 5.2. Genome Annotation 5.2.1. Using Bioinformatic

More information

Regulation of gene expression. (Lehninger pg )

Regulation of gene expression. (Lehninger pg ) Regulation of gene expression (Lehninger pg. 1072-1085) Today s lecture Gene expression Constitutive, inducible, repressible genes Specificity factors, activators, repressors Negative and positive gene

More information

Chapter 8. Microbial Genetics. Lectures prepared by Christine L. Case. Copyright 2010 Pearson Education, Inc.

Chapter 8. Microbial Genetics. Lectures prepared by Christine L. Case. Copyright 2010 Pearson Education, Inc. Chapter 8 Microbial Genetics Lectures prepared by Christine L. Case Structure and Function of Genetic Material Learning Objectives 8-1 Define genetics, genome, chromosome, gene, genetic code, genotype,

More information

BACTERIAL GENETICS. How does the DNA in the bacterial cell replicate

BACTERIAL GENETICS. How does the DNA in the bacterial cell replicate BACTERIAL GENETICS Bacterial genetics is the study of gene structure and function in bacteria. Genetics itself is concerned with determining the number, location, and character of the genes of an organism.

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics If the 19 th century was the century of chemistry and 20 th century was the century of physic, the 21 st century promises to be the century of biology...professor Dr. Satoru

More information

Bioinformatics: Microarray Technology. Assc.Prof. Chuchart Areejitranusorn AMS. KKU.

Bioinformatics: Microarray Technology. Assc.Prof. Chuchart Areejitranusorn AMS. KKU. Introduction to Bioinformatics: Microarray Technology Assc.Prof. Chuchart Areejitranusorn AMS. KKU. ความจร งเก ยวก บ ความจรงเกยวกบ Cell and DNA Cell Nucleus Chromosome Protein Gene (mrna), single strand

More information

Non-Organic-Based Isolation of Mammalian microrna using Norgen s microrna Purification Kit

Non-Organic-Based Isolation of Mammalian microrna using Norgen s microrna Purification Kit Application Note 13 RNA Sample Preparation Non-Organic-Based Isolation of Mammalian microrna using Norgen s microrna Purification Kit B. Lam, PhD 1, P. Roberts, MSc 1 Y. Haj-Ahmad, M.Sc., Ph.D 1,2 1 Norgen

More information