Analysis pipe-line


1 Analysis pipe-line (Bioconductor and platform-specific devices): Sample Preparation, Array Fabrication, Hybridization, Scanning + Image Analysis, Normalization, Filtering, Statistical analysis, Annotation, Biological Knowledge extraction, Quality control.

2 oneChannelGUI. This is a graphical interface to Bioconductor libraries devoted to the analysis of data derived from single-channel platforms. affylmGUI is a graphical interface to the limma library, which allows detection of differential expression by means of linear model analysis. oneChannelGUI extends the capabilities of affylmGUI.

3 oneChannelGUI. 3' IVT / gene arrays: primary analysis (probe-level QC, probe set summary and normalization), secondary analysis (replicates QC, filtering, statistical analysis, classification) and data mining (GO enrichment). Exon arrays: secondary analysis (replicates QC, filtering, statistical analysis, classification, basic Splice Index inspection) using Expression Console as the source of primary data. Large data sets (i.e. probe set expression in tab-delimited format): secondary analysis (replicates QC, filtering, statistical analysis, classification) using Expression Console/GEO/ArrayExpress data as the source of primary data.

4 Starting R and oneChannelGUI. Setting the virtual RAM to 2 GB: C:\..\R\R-2.3.0\bin\Rgui.exe --max-mem-size=2048m

5 Double click on R to start.

6 Click on Package to load Bioconductor packages.

7 Click on Load package to select the oneChannelGUI package. Click on OK to load the oneChannelGUI package.

8 Click on Yes to start the affylmGUI interface. Wait a few seconds! Click on Yes to start the oneChannelGUI interface.

9 Overlaying oneChannelGUI on affylmGUI changes the default affylmGUI menu to the oneChannelGUI menu for 3' IVT Affymetrix arrays. [Panels: standard affylmGUI menu; oneChannelGUI menu for 3' IVT arrays]

10 Summary of loaded data: none is available since no CEL files have been loaded.

11 Click on File to start a new project. Click on New to start a new project. Select 3' IVT arrays. Select as working directory the folder containing the .CEL files.

12 Select the targets file, then press OK to continue. The targets file is a tab-delimited text file containing the description of the experiment. It is made of three columns:
Name: the name you want to assign to each array.
FileName: the name of the corresponding .CEL file.
Target: the experimental condition associated with the array (e.g. mock, treated, etc.). At least two conditions should be present.
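The three-column layout described above can be checked before loading it in the interface. The following is a minimal Python sketch, not part of oneChannelGUI: the array names, .CEL file names and conditions are made-up examples, and `read_targets` is a hypothetical helper.

```python
import csv
import io

# Hypothetical targets table: names, file names and conditions are invented.
TARGETS_TEXT = (
    "Name\tFileName\tTarget\n"
    "c1\tc1.CEL\tmock\n"
    "c2\tc2.CEL\tmock\n"
    "t1\tt1.CEL\ttreated\n"
    "t2\tt2.CEL\ttreated\n"
)


def read_targets(text):
    """Parse a tab-delimited targets table and check its three columns."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        for column in ("Name", "FileName", "Target"):
            if not row.get(column):
                raise ValueError("missing value in column " + column)
    # The slides require at least two experimental conditions.
    if len({row["Target"] for row in rows}) < 2:
        raise ValueError("at least two experimental conditions are required")
    return rows


targets = read_targets(TARGETS_TEXT)
conditions = sorted({row["Target"] for row in targets})
print(conditions)
```

A table with a single condition, or with an empty cell, raises a `ValueError` instead of being silently accepted.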

13 Widget to create a target for Affy arrays


16 Widget to create a target for Affy arrays. Skip it.

17 Define the name of your analysis. Press OK to continue... The arrays will now be loaded in a specific R object called an environment. Raw data are now loaded and ready for normalization.

18 Analysis pipe-line: Quality control, Normalization, Filtering, Biological Knowledge extraction, Statistical analysis, Annotation.

19 The next steps are a few simple basic quality controls. Click on the Quality Control menu.

20 You can now evaluate: intensity histogram for one array at a time.

21 You can now evaluate: intensity density plot for one array at a time.

22 You can now evaluate: all array intensities as box plots.

23 It is possible that the cRNA concentration in sample se2 was overestimated and a low amount of cRNA was loaded on the array. As a result, many signals are below the value log2(100) = 6.64.

24 Some other basic controls can be done after calculating the probe set intensity summary with a dedicated Bioconductor library, affyPLM. Fit the model (BE PATIENT!!!). The end of the fitting procedure is signalled by a message. Then the NUSE/RLE function is called automatically.

25 affyPLM QC library. affyPLM provides a number of useful tools based on probe-level modelling procedures. The affyPLM package allows quality control of arrays.

26 What is a Probe Level Model? A Probe Level Model (PLM) is a model that is fit to probe-intensity data. affyPLM fits a model with probe-level and chip-level parameters on a probe set by probe set basis. In quality control, chip-level parameters are a factor variable with a level for each array.

27 What is a PLMset? The main function for fitting a PLM is fitPLM. This function fits a linear model with an effect estimated for each chip and an effect for each probe. fitPLM implements iteratively re-weighted least squares M-estimation regression. The fitted model is stored in a PLMset object containing chip-level parameter estimates and the corresponding standard errors.

28 Default fitted model: $\log_2 PM_{kij} = \beta_{kj} + \alpha_{ki} + \varepsilon_{kij}$, where $\beta_{kj}$ is the log2 probe set expression value on array $j$ for probe set $k$ and $\alpha_{ki}$ are probe effects. To make the model identifiable, the constraint $\sum_{i=1}^{I} \alpha_{ki} = 0$ is used. For this default model, the parameter estimates given are probe set expression values.

29 Relative Log Expression (RLE). RLE values are computed for each probe set by comparing the expression value on each array against the median expression value for that probe set across all arrays. Assuming that most genes do not change in expression across arrays, most of these RLE values should ideally be near 0. Boxplots of these values, for each array, provide a quality assessment tool. RLE plots: estimate the expression $\theta_{gi}$ for each gene g on each array i, then compute the median value across arrays for each gene.
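The RLE computation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the affyPLM implementation; the probe set names and log2 expression values are invented, with the fourth array deliberately shifted upwards.

```python
from statistics import median

# Hypothetical log2 expression values: one row per probe set, one column per array.
expression = {
    "ps1": [8.0, 8.1, 7.9, 9.0],
    "ps2": [5.0, 5.2, 4.8, 6.1],
    "ps3": [10.0, 10.0, 10.1, 11.2],
}


def rle(expr):
    """Subtract from each value the median of its probe set across arrays."""
    return {ps: [v - median(vals) for v in vals] for ps, vals in expr.items()}


def array_medians(rle_values):
    """Median RLE of each array, i.e. the centre line of each boxplot."""
    n_arrays = len(next(iter(rle_values.values())))
    return [median(row[j] for row in rle_values.values()) for j in range(n_arrays)]


rle_values = rle(expression)
medians = array_medians(rle_values)
print([round(m, 2) for m in medians])
```

An array whose median RLE lies far from 0, like the fourth one here, would stand out in the boxplots.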

30 Relative Log Expression (RLE)

31 Normalized Unscaled Standard Errors (NUSE). The standard error measures the size of the error made when fitting y for each x value. Normalized Unscaled Standard Errors (NUSE) can also be used for assessing quality. The standard error estimates obtained for each gene on each array from fitPLM are taken and standardized across arrays so that the median standard error for each gene is 1 across all arrays. This process accounts for differences in variability between genes. An array where SEs are elevated relative to the other arrays is typically of lower quality. Boxplots of these values, separated by array, can be used to compare arrays.

32 $\mathrm{NUSE}(\hat{\theta}_{gi}) = \dfrac{SE(\hat{\theta}_{gi})}{\mathrm{med}_i\left(SE(\hat{\theta}_{gi})\right)}$
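As a small numerical illustration of the NUSE standardization, the sketch below divides each standard error by the median standard error of its probe set across arrays. The standard errors are invented and the function is a plain re-statement of the formula, not the affyPLM code.

```python
from statistics import median

# Hypothetical standard errors: one row per probe set, one column per array.
standard_errors = {
    "ps1": [0.10, 0.12, 0.08, 0.30],
    "ps2": [0.20, 0.20, 0.22, 0.50],
}


def nuse(se):
    """Divide each standard error by the median SE of its probe set across arrays."""
    return {ps: [v / median(vals) for v in vals] for ps, vals in se.items()}


nuse_values = nuse(standard_errors)
for ps, values in nuse_values.items():
    print(ps, [round(v, 2) for v in values])
```

After standardization the median of each row is 1, so an array with values well above 1 (the fourth one here) is the candidate of lower quality.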


35 Since the fitPLM object can be very big, it is a good idea to delete it after quality control. [Panels: before Delete PLM; after Delete PLM]

36 Analysis pipe-line: Quality control, Normalization, Filtering, Biological Knowledge extraction, Statistical analysis, Annotation.

37 Analysis steps: affylmGUI. Calculating probe set summaries: RMA, GCRMA, PLIER. Normalization: quantile method.

38 Brief summary about probe set intensity calculation. The RMA methodology (Irizarry et al., 2003) performs background correction, normalization, and summarization in a modular way. RMA does not take into account non-specific probe hybridization in the probe set background calculation. GCRMA is a version of RMA with a background correction component that makes use of probe sequence information (Wu et al., 2004). The PLIER (Probe Logarithmic Intensity Error) method produces an improved signal by accounting for experimentally observed patterns in probe behavior and by handling error appropriately at both low and high signal values. Methods such as PLIER+16 and GCRMA, which use model-based background correction, maintain relatively good accuracy without losing much precision.

39 Why Normalization? To remove systematic biases, which include: sample preparation, variability in hybridization, spatial effects, scanner settings, experimenter bias. Extracted from D. Hyle presentation.

40 What Normalization Is & What It Isn't. Methods and algorithms. Applied after some image analysis. Applied before subsequent data analysis. Allows comparison of experiments. Not a cure for poor data.

41 Quantile normalization Extracted from Irizarry presentation at Bioconductor Course (Brixen IT, 2005)
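For readers without the figures, quantile normalization can be sketched as follows: sort each array, average the values found at each rank across arrays, and give every array those rank averages back in its original order. This is a minimal Python sketch with invented intensities; ties are simply broken by position, which is an assumption and may differ from what the Bioconductor implementation does.

```python
def quantile_normalize(columns):
    """Force all arrays (lists of equal length) onto the same distribution."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean across arrays of the values sharing the same rank.
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for three arrays and four probes.
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(arrays)
for col in normalized:
    print([round(v, 2) for v in col])
```

After normalization every array contains exactly the same set of values, only assigned to different probes.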


43 The next step is normalization and calculation of the probe set summary. Click on the probe set menu and select the probe set summary and normalization option.

44 Normalization and intensity calculation come together. Three normalization/intensity calculation options are available: RMA + quantile normalization, GCRMA + quantile normalization, PLM + quantile normalization. At any time it is possible to check the structure of the normalized data set.

45 Replicates quality control. To evaluate the quality of sample replicates we will use a technique called principal component analysis (PCA).

46 Principal component analysis. Principal component analysis (PCA) involves a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible. Each succeeding component accounts for as much of the remaining variability as possible. The components can be thought of as axes in n-dimensional space, where n is the number of components. Each axis represents a different trend in the data.
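To make the idea concrete, the sketch below extracts the first principal component of a tiny data set by power iteration on the covariance matrix. It is a plain-Python illustration under stated assumptions: the sample names and values are invented, only the first component is computed, and the fixed start vector would fail in the unlikely case that it is orthogonal to that component. It is not the routine used by oneChannelGUI.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (direction, scores, explained fraction) for the first component."""
    n, p = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    centred = [[s[j] - means[j] for j in range(p)] for s in samples]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    vec = [1.0] * p  # assumption: not orthogonal to the leading eigenvector
    for _ in range(iterations):
        new = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centred]
    total_var = sum(cov[j][j] for j in range(p))
    explained = sum(s * s for s in scores) / (n - 1) / total_var
    return vec, scores, explained


# Hypothetical probe set summaries (log2) for four samples; t4 is shifted on purpose.
names = ["t1", "t2", "t3", "t4"]
data = [[8.0, 5.0, 10.0], [8.1, 5.1, 10.1], [7.9, 4.9, 9.9], [9.5, 6.5, 8.0]]
direction, scores, explained = first_principal_component(data)
outlier = max(zip(names, scores), key=lambda item: abs(item[1]))[0]
print(outlier, round(explained, 2))
```

The sample with the largest absolute score on the first component is the one that sits apart from the others on a PCA plot.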

47 To perform sample replicates QC we use principal component analysis (PCA). This check is performed on probe set summaries! t4 is clearly an outlier!

48 Analysis pipe-line: Quality control, Normalization, Filtering, Biological Knowledge extraction, Statistical analysis, Annotation.

49 Filtering. Filtering affects the false discovery rate. The researcher is interested in keeping the number of tests/genes as low as possible while keeping the interesting genes in the selected subset. If the truly differentially expressed genes are overrepresented among those selected in the filtering step, the FDR associated with a certain threshold of the test statistic will be lowered due to the filtering. Extracted from: Heydebreck et al. Bioconductor Project Working Papers 2004

50 Filtering can be performed at various levels. Annotation features: specific gene features (i.e. GO term, presence of transcriptional regulatory elements in promoters, etc.). Signal features: percentage of intensities greater than a user-defined value; interquartile range (IQR) greater than a defined value.

51 Specific gene feature. In transcriptional studies focusing on genes characterized by a specific feature (e.g. transcription factor elements in promoters), the best filtering approach is to select only those genes linked to that feature. For example: identification of genes modulated by estradiol:ER or IGF1 by direct binding to Estrogen-Responsive Elements (ERE). HGU133plus2: probe sets, Entrez Genes. HGU133plus2 with ERE in putative promoter regions: 6764 probe sets, 3058 Entrez Genes.

52 Specific gene feature. Data derived from a dedicated annotation data set can be used for functional filtering. The Ingenuity Pathways Knowledge Base is the world's largest curated database of biological networks created from millions of individually modeled relationships between proteins, genes, complexes, cells, tissues, drugs and diseases. The Ingenuity Pathways Analysis software (IPA) identifies relations between genes. The relations that can be captured are: Regulates, Regulated by, Binds.

53 Start an Ingenuity session at:

54 Specific classes of proteins can be searched and exported

55

56 A key word can also be used to perform a wide search

57 After selecting the Functions & Diseases of interest, genes should be visualized as gene details before being exported to a file to be used for filtering expression data. Export the results in a table as previously.

58 The Entrez Gene IDs present in this file can be used to extract a specific subset of genes. To filter using a list of EG you need to extract from the IPA table only the Entrez Genes of interest and save them in a text file without header.

59

60 Non-specific filtering. This technique has as its premise the removal of genes that are deemed to be not expressed or unchanged according to some specific criterion that is under the control of the user. The aim of non-specific filtering is to remove genes that, e.g. due to their low overall intensity or variability, are unlikely to carry information about the phenotypes under investigation. Extracted from: Heydebreck et al. Bioconductor Project Working Papers 2004
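The two signal-based filters (fraction of arrays above an intensity threshold, and IQR above a threshold) can be sketched as below. This is a Python illustration only: the gene names and values are invented, and the thresholds (log2(100) for intensity, 50% of arrays, 0.25 for IQR) are hypothetical choices, not defaults taken from oneChannelGUI.

```python
import math
from statistics import quantiles

# Hypothetical log2 intensities: one row per gene, one column per array.
expression = {
    "a": [7.0, 7.5, 6.0, 8.0],
    "b": [5.0, 6.0, 7.0, 6.2],
    "c": [7.0, 6.0, 6.9, 6.1],
    "d": [9.0, 9.0, 9.1, 9.0],
}


def passes_intensity_filter(values, threshold, min_fraction):
    """True if at least min_fraction of the arrays exceed the threshold."""
    above = sum(1 for v in values if v > threshold)
    return above / len(values) >= min_fraction


def iqr(values):
    """Interquartile range, using the 'inclusive' quartile definition."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


threshold = math.log2(100)  # hypothetical background level
by_intensity = [g for g, v in expression.items()
                if passes_intensity_filter(v, threshold, 0.5)]
kept = [g for g in by_intensity if iqr(expression[g]) > 0.25]
print(by_intensity, kept)
```

Gene "b" is dropped for low intensity and gene "d" for low variability, which are the two situations the non-specific filter targets.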

61 A B C D

62

63 In this example only those genes will be selected that have, in at least 50% of the arrays, an intensity

64

65 QC and filtering for exon data. At the time oneChannelGUI was set up, Bioconductor tools for handling raw data from Affymetrix exon arrays were not available. For this reason oneChannelGUI uses the libraries and primary analysis outputs from Affymetrix Expression Console. Exon raw data quality control is done using the Expression Console. Sample QC and filtering are performed in oneChannelGUI.

66 Exon arrays on oneChannelGUI. Gene-level and exon-level data from Expression Console are loaded in oneChannelGUI. The user needs to specify where the Expression Console library files are located each time a new exon data set is loaded.

67 When loading an exon array data set it is necessary to indicate the organism and which kind of exon data are going to be loaded (core, extended, full). It is also necessary to specify the location of the Expression Console libraries.

68 Subsequently three files have to be loaded: the target file, which has the same structure previously described, and the tab-delimited files containing GENE and EXON level data exported from the Expression Console.

69 A new Menu is then available for exon data

70 Exon arrays QC on onechannelgui

71 The brain (b) replicates are very poor. The quality is particularly bad for exon data. However, we have to consider that these data are derived from tissues coming from different post-mortem donors.

72 Exon arrays filtering. Since the knowledge on exon data is still relatively limited, we have little empirical information about the background threshold. The exon/intron housekeeping gene information available in exon data might be a possible way to define it. Lines of different colors indicate the possible thresholds to be selected. Intensity density plots are shown in black for introns and in red for exons.

73 The IQR filter works as described for 3' IVT arrays. However, any filter applied at gene level will also affect the corresponding exon data. [Panels: starting condition; after filtering]

74 The intensity filter is instead based on the threshold previously selected on the basis of exon/intron HK expression signals. In this example we are keeping only the genes where all samples have a signal greater than the pre-defined BG.

75 Splice Index. The Splicing Index captures the basic metric for the analysis of alternative splicing. It is a measure of how much exon-specific expression (with gene induction factored out) differs between two samples. Defining a function-oriented data set for splice index calculation.
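The slides do not spell out a formula, so the sketch below assumes the commonly used definition: each exon signal is first divided by the signal of its gene in the same sample, and the splice index is the log2 ratio of these normalized intensities between the two samples. The function name and the linear-scale intensities are hypothetical.

```python
import math


def splice_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of gene-normalized exon intensities (linear-scale inputs)."""
    normalized_a = exon_a / gene_a  # exon signal with gene level factored out
    normalized_b = exon_b / gene_b
    return math.log2(normalized_a / normalized_b)


# Gene induced two-fold and the exon follows the gene: no splicing signal.
print(splice_index(800.0, 800.0, 400.0, 400.0))
# Same gene level, but the exon is half as abundant in sample A.
print(splice_index(200.0, 400.0, 400.0, 400.0))
```

A value near 0 means the exon simply tracks its gene, while a value away from 0 suggests that the exon is included differently in the two samples.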

76 Use a set of function-oriented EGs to select probe set IDs.

77

78

79

80

81

82

83 Use the selected probe set IDs for filtering using a list of probe sets.

85 ATTENTION: this is only a very rough descriptive instrument! Much work needs to be done on exon analysis! Splice Index inspection is performed by modelling the splice index exon profiles for two experimental conditions. Results are saved in a PDF file in your working directory.

86 The subset of splice indexes to be inspected is defined using two filters:

87 Example of the output for one gene.

88 This plot gives some advice about the scattering levels of the Splice Indexes over the gene under analysis. Model of splice indexes over the two experimental conditions. Red dashed lines indicate the confidence interval of the model.

89 Plots of the significance p-value of the alternative splicing versus the average Splice Index values. In this example only one exon seems to be differentially spliced.

90 Significance p-value of the alternative splicing versus the average Splice Index values. In this example only one exon seems to be differentially spliced. Filtering conditions are shown over the plot of intensity values versus exon number.

91 Analysis pipe-line: Quality control, Normalization, Filtering, Biological Knowledge extraction, Statistical analysis, Annotation. This step is the same for 3' IVT arrays and exon arrays gene-level analyses.

92 Fold change filtering. The intensity change between experimental groups (i.e. control versus treated) is known as fold change. Frequently an arbitrary threshold, $\log_2(\mathrm{Trtd}/\mathrm{Ctrl}) = 1$, is used to define a significant differential expression.

93 Fold change filtering. There are no rules to define the correct fold change (fc) threshold for differential expression. fc > 1 is an arbitrary threshold. Fc threshold estimation depends on the percentage of fc fluctuations due to experimental reasons. Fc threshold estimation can be better appreciated in time/concentration course experiments. Biologically speaking, many small variations taken together can be functionally important (i.e. fc = 0.5 for all chr 21 genes induces Down syndrome).

94 Statistical analysis Intensity changes between experimental groups (i.e. control versus treated) are known as: Fold change. Ranking genes based on fold change alone implicitly assigns equal variance to every gene. Fold change alone is not sufficient to indicate the significance of the expression changes. Fold change has to be supported by statistical information.
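A small numerical example shows why fold change alone is not enough. The sketch below, in Python with invented log2 intensities, compares two genes with the same log2 fold change but very different replicate variability, using a pooled-variance two-sample t statistic as one possible choice of supporting statistic.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """Difference of group means; inputs are already log2 intensities."""
    return mean(treated) - mean(control)


def pooled_t(treated, control):
    """Two-sample t statistic with pooled variance."""
    n1, n2 = len(treated), len(control)
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


# Hypothetical genes: same fold change, different spread between replicates.
stable = {"treated": [9.0, 9.1, 8.9], "control": [8.0, 8.1, 7.9]}
noisy = {"treated": [7.0, 9.0, 11.0], "control": [6.0, 8.0, 10.0]}

for label, gene in (("stable", stable), ("noisy", noisy)):
    print(label,
          round(log2_fold_change(gene["treated"], gene["control"]), 2),
          round(pooled_t(gene["treated"], gene["control"]), 2))
```

Both genes would pass a log2 fold change threshold of 1, but only the stable one has a large t statistic.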

95 Multiple testing errors. When performing multiple statistical tests two types of errors can occur: Type I error (false positive) and Type II error (false negative). Reducing type I errors increases the number of type II errors. It is important to identify an approach that reduces false positives with the minimum loss of information (false negatives).

96 Statistical analysis. The sensitivity of statistical tests is affected by the number of available replicates. Replicates can be technical or biological. Biological replicates better summarize the variability of samples belonging to a common group. The minimum number of replicates is an important issue!

97 How important are replicates? Yang YH and Speed T, 2002

98 Sample size Microarray experiments are often performed with a small number of biological replicates, resulting in low statistical power for detecting differentially expressed genes and concomitant high false positive rates. The issue of how many replicates are required in a typical experimental system needs to be addressed. Of particular interest is the difference in required sample sizes for similar experiments in inbred vs. outbred populations (e.g. mouse and rat vs. human).

99 Assessing sample sizes in microarray experiments Assessment of sample sizes for microarray data is a tricky exercise. The reason why we are performing such analysis is to have a general feeling on the ability of our experimental data to robustly detect differential expression. The method implemented in onechannelgui is that proposed by Warnes & Liu and implemented in the Bioconductor library ssize.

100 Assessing sample sizes in microarray experiments. The key component of the Warnes method is the generation of a cumulative plot of the proportion of genes achieving a desired power as a function of sample size, based on simple gene-by-gene calculations. Its real utility is as a visual tool for helping users to understand the trade-off between sample size and statistical power.

101 Assumptions A microarray experiment is set up to compare gene expressions between one treatment group and one control group. Microarray data has been normalized and transformed so that the data for each gene is sufficiently close to a normal distribution that a standard 2-sample pooled-variance t-test will reliably detect differentially expressed genes.

102 The tested hypothesis for each gene is $H_0: \mu_t = \mu_c$ versus $H_1: \mu_t \neq \mu_c$, where μt and μc are the means of gene expression for the treatment and control group respectively. The analysis is done using the common variance described in: Wei et al. BMC Genomics. 2004, 5:87

103 Sample size estimation. The required sample size of an experiment depends on: the variance component (σ); the desired detectable fold change (δ); the power to detect this change (1-β, the likelihood of detecting the change, or true positive rate); a chosen type I error rate (α = false positive). β = type II error rate, i.e. false negative. IMPORTANT: This implementation of the ssize functions uses BH type I error correction instead of Bonferroni, which is the default in the ssize functions.
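How σ, δ, α and power interact can be seen with a rough per-gene calculation. The sketch below is not the ssize method: it swaps the t-test for the usual normal approximation, n = 2(z_{1-α/2} + z_{1-β})² σ² / δ² per group, and it applies no multiple testing correction. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-sample comparison."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # type I error, two-sided
    z_beta = standard_normal.inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Detecting a difference equal to one standard deviation on the log2 scale.
print(samples_per_group(sigma=1.0, delta=1.0))
# A difference twice as large needs far fewer replicates.
print(samples_per_group(sigma=1.0, delta=2.0))
```

Halving the detectable difference roughly quadruples the required number of replicates, which is the trade-off the cumulative power plots are meant to visualize.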

104 This is not log2(fc)

105 To detect 95% of the differentially expressed genes, characterized by a power of 0.8, a sample size, FOR EACH GROUP, greater than 20 is needed.

106 To detect 97% of the differentially expressed genes, characterized by a power of 0.8, a fold change greater than 10 (log 2 (10)=3.32) is needed.

107 Assessing sample sizes in microarray experiments The R package, sizepower, is used to calculate sample size and power in the planning stage of a microarray study. It helps the user to determine how many samples are needed to achieve a specified power for a test of whether a gene is differentially expressed or, in reverse, to determine the power of a given sample size.

108

109

110 Comments about experimental design. If the biological material is not a limiting factor, THINK WIDE: Experiments should be designed with many replicates (>3). Time course experiments should be designed with many time points (>4). Investigate part of the experiment by microarrays and use the rest for further validation.

111 Statistical validation. Statistical validation can be performed using parametric and non-parametric tests. Parametric tests: the populations under analysis are assumed to be normally distributed. Non-parametric tests: there is no assumption on the distribution of the samples. Non-parametric tests are less sensitive than parametric ones.

112 Selecting differentially expressed genes. [Diagram: statistical validation methods I, II and III; differential expression linked to a specific biological event]

113 Selecting differentially expressed genes. Each method captures some true signals but not all. Each method catches some false signals. The trick is to find the best condition to maximize true signals while minimizing false ones.

114 SAM Significance Analysis of Microarray

115 SAM analysis can be performed in Bioconductor using the siggenes library. Two-class or multi-class analysis is selected automatically on the basis of the structure of the Target information. The delta table shows the user the number of differentially expressed genes at a given FDR.

116 The user selects a delta value and checks the behaviour of the differentially expressed genes.

118 Subsequently the user applies a log2(fold change) filter to produce a table of differentially expressed genes.

120 The table can be saved in a tab-delimited file.

121 Table columns: relative difference in gene expression; raw p-value; fold change; standard deviation; significance measurement derived from the raw p-value.

122 Limma Linear model analysis of microarrays

123 BH correction. BH is the most widely used method for the correction of type I errors in microarray analysis. However, it has some limitations due to its initial hypotheses: the gene expressions are independent of each other, and the raw distribution of p-values should be uniform in the non-significant range.
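The BH adjustment itself is short: sort the p-values, multiply each by m/rank, and enforce monotonicity from the largest p-value downwards. The Python sketch below illustrates that step-up procedure with invented p-values; it is not the code used by limma or oneChannelGUI.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
adjusted = benjamini_hochberg(raw)
print([round(p, 3) for p in adjusted])
```

Adjusted values are never smaller than the raw ones, so a flat raw p-value distribution with few small values may leave nothing below the chosen cut-off.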

124

125 Applying the BH correction to these p-values will not produce any differentially expressed gene!

126 Let's identify differentially expressed probe sets by linear modelling. To use linear models, the targets description and raw data are reorganized on the basis of the number of factors under analysis by Compute Linear Model Fit.

127 The next step is the definition of the contrasts, which represent the differential expression pairs to be considered. If more than two conditions are available, more contrasts can be evaluated.

128 The contrast parameterization is saved with a specific name. REMEMBER: contrasts represent the different experimental groups (e.g. Treated, Control). Making Treated - Control means that the log(expression) of control samples is subtracted from that of treated samples. The result is the log2(fold change).

129 Before evaluating differential expression, the raw p-value distribution is checked.

130

131 If BH correction can be applied to correct type I errors, we can move to the selection of the subset of differentially expressed genes.

133 These results can be saved in a new topTable containing only the probe sets shown in red on the plots.

134 TopTable structure: AffyID, Gene Symbol, Gene Description, Log2 FC, Average intensity, T statistics, P-values, Log-odds statistics.

135 Differential expression probe set lists generated by affylmGUI or SAM can be compared using Venn diagrams. A maximum of three files can be compared. Attention: each file is made of a single column of probe set IDs without header. Comparison can be performed at probe set or EG level.

136 The various list subsets will be saved in your working directory.
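The Venn comparison boils down to grouping IDs by the combination of lists they appear in. Below is a minimal Python sketch of that bookkeeping; the list names and probe set IDs are invented examples and `venn_subsets` is a hypothetical helper, not a oneChannelGUI function.

```python
from collections import defaultdict


def venn_subsets(named_lists):
    """Map each combination of list names to the IDs found in exactly those lists."""
    membership = defaultdict(set)
    for name, ids in named_lists.items():
        for probe_id in ids:
            membership[probe_id].add(name)
    subsets = defaultdict(list)
    for probe_id, names in membership.items():
        subsets[tuple(sorted(names))].append(probe_id)
    return {key: sorted(ids) for key, ids in subsets.items()}


# Hypothetical single-column lists of probe set IDs, as read from files without header.
lists = {
    "limma": ["1007_s_at", "1053_at", "117_at"],
    "sam": ["1053_at", "117_at", "121_at"],
}
subsets = venn_subsets(lists)
for key in sorted(subsets):
    print(key, subsets[key])
```

The same function handles three lists, in which case up to seven non-empty regions are returned.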

137 Making a Template A for Ingenuity Pathways Analysis


140 To create a template A you can use a function implemented in affylmGUI.

141

142 The P value for subsetting is used to separate the differentially expressed probe sets from the other probe sets, which are used for Ingenuity functional class enrichment.

143 Time Course experiments. maSigPro is an R package for the analysis of single and multiseries time course microarray experiments. maSigPro follows a two-step regression strategy to find genes with significant temporal expression changes and significant differences between experimental groups.

144 Time course experimental design: We denote as experimental groups the experimental factor (dummy variables) for which temporal profiles are defined (e.g. Treatment A, Tissue1, etc.). Conditions are each experimental group vs. time combination (e.g. Treatment A at Time 0). Conditions may or may not have replicates. Variables are the regression variables defined by the maSigPro approach for the experiment regression model. maSigPro defines dummy variables to model differences between experimental groups. Dummy variables, Time and their interactions are the variables of the regression model.

145 Time Course design for maSigPro. IMPORTANT: each treatment at each time has its corresponding untreated control! All this information should be collapsed in the Target column of the targets file using _ to combine data. This can be done using the function JOIN in Excel.

146 Time Course design for masigpro

147 Time Course design for maSigPro. The targets file for maSigPro has a peculiar structure: each row of the column named Target describes the array on the basis of the experimental design. Each element describing the time course experiment is separated from the others by an underscore. The first three elements of the row are fixed and represent Time, Replicate and Control; all the other elements refer to the various experimental conditions. In this case we have a 8, h time course, in triplicate, with two different treatments: cond1 and cond2
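The underscore-separated Target entry can be unpacked as in the sketch below. This is a Python illustration under assumptions: the slides only fix the order Time, Replicate, Control followed by the conditions, so the example string "8_3_0_1_0" and the 0/1 coding of Control and conditions are hypothetical, as is the helper `parse_target`.

```python
def parse_target(target, condition_names):
    """Split a Target entry into Time, Replicate, Control and condition fields."""
    fields = target.split("_")
    if len(fields) != 3 + len(condition_names):
        raise ValueError("unexpected number of fields in " + repr(target))
    record = {
        "Time": float(fields[0]),
        "Replicate": int(fields[1]),
        "Control": int(fields[2]),
    }
    # Remaining fields follow the order of the experimental conditions.
    for name, value in zip(condition_names, fields[3:]):
        record[name] = int(value)
    return record


record = parse_target("8_3_0_1_0", ["cond1", "cond2"])
print(record)
```

Checking the number of fields first catches rows where an element was forgotten when the columns were joined.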

148 The Target column is reformatted to be used by masigpro using the command

149 Large data set. The oneChannelGUI interface has some limits (RAM) in loading/handling large sets of .CEL files. This is especially true for a large time course experiment like our example. To overcome this problem, probe set average expression intensities are calculated by Expression Console.

150

151

152 When loading a tab-delimited file, the Bioconductor annotation library is not automatically defined. Annotation library information can be attached using:

153 Do not forget! The multiple testing problem is also present in maSigPro analysis. Therefore, before running maSigPro, remember to apply some filter based on functional information or sample distribution.

154

155 Ones the experiment design for masigpro is ready it is possible to run the analysis Yes When masigpro is running, check what is going on in the main R window!

156 Some parameters need to be set Q: The first step is to compute a regression fit for each gene. The p-value associated to the F-Statistic of the model are computed and they are subsequently used to select significant genes. masigpro corrects this p-value for multiple comparisons by applying false discovery rate (FDR) procedures. The level of FDR control is given by the function parameter Q.

157 Some parameters need to be set Alpha: masigpro applies a variable selection procedure to find the significant variables for each gene. These will ultimately be used to find the profile differences between experimental groups. At each regression step the p-value of each variable is computed, and variables enter or leave the model when this p-value is respectively lower or higher than the given cut-off value alpha.

158 Some parameters need to be set R-squared: The following step is to generate lists of significant genes according to the way we want to see results. As a filter, masigpro uses the R-squared of the regression model: only genes whose model reaches the chosen R-squared are kept.
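The R-squared filter can be illustrated with a deliberately simplified sketch: a straight-line least-squares fit of expression against time, followed by a threshold on R-squared. The regression model used by masigpro is richer than a single straight line, so the model, the function names, the threshold and the toy profiles below are assumptions made only for illustration.

```python
def r_squared(x, y):
    """R-squared of a straight-line least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        return 0.0  # no variation to explain
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / ss_tot


def filter_by_r_squared(times, profiles, threshold):
    """Keep the gene names whose fit reaches the R-squared threshold."""
    return [gene for gene, values in profiles.items()
            if r_squared(times, values) >= threshold]


# Toy profiles for two hypothetical genes over four time points.
times = [0, 1, 2, 3]
profiles = {"geneA": [1, 3, 5, 7], "geneB": [1, 2, 1, 2]}
print(filter_by_r_squared(times, profiles, threshold=0.6))
```

Raising the threshold keeps only genes whose profile is well explained by the model.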

159 Computation information is available in the main R window. Step 1: The procedure first fits this global model by the least-squares technique to identify differentially expressed genes, and selects significant genes by applying false discovery rate control procedures. Step 2: Stepwise regression is then applied as a variable selection strategy to study differences between experimental groups and to find statistically significant different profiles.

160 When the computation is finished, a message pops up. The coefficients obtained in the second regression model will be useful to cluster together significant genes with similar expression patterns and to visualize the results.

161 Results can be visualized as Venn diagrams or by plotting the curves to a PDF file. K-means clustering is not yet implemented.

162 Results can be visualized by plotting the curves to a PDF file. The plots relate only to the subset of genes specific to each treatment condition.

163

164 Analysis pipe-line Quality control Normalization Filtering Biological Knowledge extraction Statistical analysis Annotation

165 Gene Ontology

166 Ontologies An ontology is a specification of a conceptualization: a hierarchical mapping of concepts within a given frame of reference. An ontology is a restricted structured vocabulary of terms that represent domain knowledge. An ontology specifies a vocabulary that can be used to exchange queries and assertions. A commitment to the use of the ontology is an agreement to use the shared vocabulary in a consistent way.

167 The Gene Ontology The goal of the Gene Ontology (GO) Consortium is to produce a controlled vocabulary that can be applied to all organisms, even as knowledge of gene and protein roles in cells is accumulating and changing. For genes and gene products, the Gene Ontology Consortium is an initiative designed to address the problem of defining a common set of terms and descriptions for basic biological functions. GO provides a restricted vocabulary as well as clear indications of the relationships between terms.

168 The Gene Ontology The Gene Ontology (GO) consortium produces three independent ontologies for gene products: molecular function (MF 7220), defined as the biochemical activity or action of the gene product; biological process (BP 9529), interpreted as a biological objective to which the gene product contributes; and cellular component (CC 1536), a component of a cell that is part of some larger object or structure.

169 The Graph Structure of GO The GO ontologies are structured as directed acyclic graphs (DAGs) that represent a network in which each term may be a child of one or more parents. The terms "GO node" and "GO term" are used interchangeably. Child terms are more specific than their parents: the term "transmembrane receptor protein tyrosine kinase" is a child of both "transmembrane receptor" and "protein tyrosine kinase".

170 The Graph Structure of GO The relationship between a child and a parent can be characterized by two relations: "is a" and "has a" (part of). "mitotic chromosome" is a child of "chromosome" and the relationship is an "is a" relation. "telomere" is a child of "chromosome" with the "has a" relation.
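The parent/child structure described on these two slides can be sketched as a small dictionary-based DAG. Only the terms named on the slides are included; the relation labels `is_a` and `part_of`, the "is a" label assumed for the two parents of the tyrosine kinase term, and the function name `ancestors` are illustrative assumptions, not GO's own data format.

```python
# Each child term maps to a list of (parent term, relation) pairs.
GO_PARENTS = {
    "transmembrane receptor protein tyrosine kinase": [
        ("transmembrane receptor", "is_a"),   # relation type assumed
        ("protein tyrosine kinase", "is_a"),  # relation type assumed
    ],
    "mitotic chromosome": [("chromosome", "is_a")],
    "telomere": [("chromosome", "part_of")],  # the slide's "has a" (part of)
}


def ancestors(term, parents=GO_PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent, _relation in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(ancestors("transmembrane receptor protein tyrosine kinase"))
print(ancestors("telomere"))
```

Because a term may have several parents, a gene annotated to a child term is implicitly annotated to every ancestor, which is why the traversal collects a set rather than a single path.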

171 GO structure Graph of GO relationships for the term: transcription factor (GO: ). Top node

172 Induced GO graph for a set of differentially expressed genes. Top node. The induced GO graph is colored according to unadjusted hypergeometric p-value 0.01. GO can be used to link differentially expressed genes to specific functional classes.

173 Hypergeometric Distribution

        a      c      a+c
        b      d      b+d
        a+b    c+d    n

The probability of any particular matrix occurring by random selection, given no association between the two variables, is given by the hypergeometric rule, where n = a + b + c + d:

$$p = \frac{\dfrac{(a+c)!}{a!\,c!}\cdot\dfrac{(b+d)!}{b!\,d!}}{\dfrac{n!}{(a+b)!\,(c+d)!}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$

174 Assigning Significance to the Findings The hypergeometric test permits us to determine whether there are non-random associations between the two variables: membership in the differentially expressed subset and membership in a particular Gene Ontology term.

2x2 contingency matrix:

                      in Subset    out of Subset
    in GO term            8              2
    out of GO term        4             26

p = 0.0002
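The p-value for the 2x2 matrix on this slide can be reproduced with a short Python sketch. It assumes a one-sided test for over-representation, that is, the probability of seeing at least 8 subset genes in the GO term; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(in_term_in_subset, in_term_out_subset,
                           out_term_in_subset, out_term_out_subset):
    """Upper-tail hypergeometric p-value for a 2x2 contingency matrix.

    Probability of observing at least `in_term_in_subset` subset genes
    inside the GO term when the subset is drawn at random.
    """
    term_size = in_term_in_subset + in_term_out_subset
    subset_size = in_term_in_subset + out_term_in_subset
    total = term_size + out_term_in_subset + out_term_out_subset
    upper = min(term_size, subset_size)
    ways_total = comb(total, subset_size)
    ways_tail = sum(
        comb(term_size, k) * comb(total - term_size, subset_size - k)
        for k in range(in_term_in_subset, upper + 1)
    )
    return ways_tail / ways_total


# Counts taken from the slide: 8 and 2 inside the GO term, 4 and 26 outside.
p = hypergeom_enrichment_p(8, 2, 4, 26)
print(round(p, 4))
```

With these counts the tail probability rounds to the 0.0002 reported on the slide.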

175 GOstats package To perform an analysis using the hypergeometric-based test, one needs to define a gene universe and a list of selected genes from the universe. To identify the set of expressed genes from a microarray experiment, R. Gentleman (GOstats developer) proposed that a non-specific filter be applied and that the genes that pass the filter be used to form the universe for any subsequent functional analyses.

176 Bioconductor provides a library called GOstats, which allows the calculation of enriched GO classes within a set of differentially expressed probe sets. Select the threshold of significance and the GO class of interest. Select the list of affy ids representing the differentially expressed probe sets. REMEMBER: the file should contain only the affy ids!

177 If the names of GO classes are too tiny in the plot, save it as PDF and view it with Acrobat Reader, zooming in on the figure.

178 The reason for this representation is the selection of the GO terms that contain smaller subsets.

179 GO identifier; significance; N. of genes in the differentially expressed set; N. of genes belonging to the GO term in the universe; description of the GO term.

180 To know more about the parents of a specific GO term you can use the plotgo function.

181 It is possible to identify the affy ids associated with a specific GO term.

182

183 Classification

184 Classification The task of diagnosing cancer on the basis of microarray data has been termed class prediction in the literature. The task is to classify and predict the diagnostic category of a sample on the basis of its gene expression profile.
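To make the idea of class prediction concrete, the sketch below uses a plain nearest-centroid classifier: each class is summarized by the mean expression profile of its training samples, and a new sample is assigned to the class with the closest centroid. This is a simplification chosen for illustration. The PAMR analysis run later in these slides is based on nearest shrunken centroids, and the shrinkage step is omitted here; the function names and toy values are assumptions.

```python
from math import dist


def class_centroids(profiles, labels):
    """Mean expression profile of the training samples in each class."""
    sums = {}
    counts = {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for j, value in enumerate(profile):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict(profile, centroids):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(profile, centroids[label]))


# Toy training set: four samples, two probe sets each, two hypothetical classes.
train_profiles = [[1.0, 5.0], [1.2, 4.8], [4.0, 1.0], [4.2, 0.8]]
train_labels = ["pos", "pos", "neg", "neg"]
centroids = class_centroids(train_profiles, train_labels)
print(predict([1.0, 4.5], centroids))
print(predict([3.9, 1.2], centroids))
```

In practice the classifier is built on a training set and its performance is checked on a separate test set, as the later slides do.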

185 Large data sets can be loaded as tab delimited files. To load them you need: 1) a tab delimited file with array names on the first row and probe set ids in the first column; 2) a target file containing the clinical information. The usual Target column of the target file should have these characteristics.

186 This file can be generated by joining the columns of the clinical parameters with an underscore (_), using the Join function in Excel.

187

188

189 Reorganize clinical information Load a large data set as a tab delimited file. Save to a file the description of the clinical parameters collapsed in the Target column of the targets file.

190 Reorganize clinical information

191 Run PAMR analysis

192

193

194 If the selected probe sets are fewer than 50

195

196 Yes

197 A good separation between ER-positive and ER-negative samples is also achieved on the test set.


More information

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers Web resources: NCBI database: http://www.ncbi.nlm.nih.gov/ Ensembl database: http://useast.ensembl.org/index.html UCSC

More information

Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers Web resources: NCBI database: http://www.ncbi.nlm.nih.gov/ Ensembl database: http://useast.ensembl.org/index.html

More information

Measuring gene expression (Microarrays) Ulf Leser

Measuring gene expression (Microarrays) Ulf Leser Measuring gene expression (Microarrays) Ulf Leser This Lecture Gene expression Microarrays Idea Technologies Problems Quality control Normalization Analysis next week! 2 http://learn.genetics.utah.edu/content/molecules/transcribe/

More information

ChIP-seq and RNA-seq. Farhat Habib

ChIP-seq and RNA-seq. Farhat Habib ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions

More information

Expression data analysis with Chipster. Eija Korpelainen, Massimiliano Gentile

Expression data analysis with Chipster. Eija Korpelainen, Massimiliano Gentile Expression data analysis with Chipster Eija Korpelainen, Massimiliano Gentile chipster@csc.fi Understanding data analysis - why? Bioinformaticians might not always be available when needed Biologists know

More information

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pages 37-64. The description of the problem can be found

More information

Introduction to microarrays

Introduction to microarrays Bayesian modelling of gene expression data Alex Lewin Sylvia Richardson (IC Epidemiology) Tim Aitman (IC Microarray Centre) Philippe Broët (INSERM, Paris) In collaboration with Anne-Mette Hein, Natalia

More information

Probe-Level Analysis of Affymetrix GeneChip Microarray Data

Probe-Level Analysis of Affymetrix GeneChip Microarray Data Probe-Level Analysis of Affymetrix GeneChip Microarray Data Ben Bolstad http://www.stat.berkeley.edu/~bolstad Michigan State University February 15, 2005 Outline for Today's Talk A brief introduction to

More information

Normalizing Affy microarray data

Normalizing Affy microarray data Normalizing Affy microarray data All product names are given as examples only and they are not endorsed by the USDA or the University of Illinois. INTRODUCTION The following is an interactive demo describing

More information

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison. CodeLink compatible

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison. CodeLink compatible Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison CodeLink compatible Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood

More information

The first and only fully-integrated microarray instrument for hands-free array processing

The first and only fully-integrated microarray instrument for hands-free array processing The first and only fully-integrated microarray instrument for hands-free array processing GeneTitan Instrument Transform your lab with a GeneTitan Instrument and experience the unparalleled power of streamlining

More information

A Distribution Free Summarization Method for Affymetrix GeneChip Arrays

A Distribution Free Summarization Method for Affymetrix GeneChip Arrays A Distribution Free Summarization Method for Affymetrix GeneChip Arrays Zhongxue Chen 1,2, Monnie McGee 1,*, Qingzhong Liu 3, and Richard Scheuermann 2 1 Department of Statistical Science, Southern Methodist

More information

PLM Extensions. B. M. Bolstad. October 30, 2013

PLM Extensions. B. M. Bolstad. October 30, 2013 PLM Extensions B. M. Bolstad October 30, 2013 1 Algorithms 1.1 Probe Level Model - robust (PLM-r) The goal is to dynamically select rows and columns for down-weighting. As with the standard PLM approach,

More information

Gene List Enrichment Analysis

Gene List Enrichment Analysis Outline Gene List Enrichment Analysis George Bell, Ph.D. BaRC Hot Topics March 16, 2010 Why do enrichment analysis? Main types Selecting or ranking genes Annotation sources Statistics Remaining issues

More information

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer T. M. Murali January 31, 2006 Innovative Application of Hierarchical Clustering A module map showing conditional

More information

Background and Normalization:

Background and Normalization: Background and Normalization: Investigating the effects of preprocessing on gene expression estimates Ben Bolstad Group in Biostatistics University of California, Berkeley bolstad@stat.berkeley.edu http://www.stat.berkeley.edu/~bolstad

More information

SAS Microarray Solution for the Analysis of Microarray Data. Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG

SAS Microarray Solution for the Analysis of Microarray Data. Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG for the Analysis of Microarray Data Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG Overview Challenges in Microarray Data Analysis Software for Microarray Data Analysis SAS Scientific Discovery

More information

How to view Results with. Proteomics Shared Resource

How to view Results with. Proteomics Shared Resource How to view Results with Scaffold 3.0 Proteomics Shared Resource An overview This document is intended to walk you through Scaffold version 3.0. This is an introductory guide that goes over the basics

More information

Bioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

Bioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Microarrays: designing chips, clustering methods Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Course Syllabus Jan 7 Jan 14 Jan 21 Jan 28 Feb 4 Feb 11 Feb 18 Feb 25 Sequence

More information

Bioinformatics : Gene Expression Data Analysis

Bioinformatics : Gene Expression Data Analysis 05.12.03 Bioinformatics : Gene Expression Data Analysis Aidong Zhang Professor Computer Science and Engineering What is Bioinformatics Broad Definition The study of how information technologies are used

More information

Measuring gene expression

Measuring gene expression Measuring gene expression Grundlagen der Bioinformatik SS2018 https://www.youtube.com/watch?v=v8gh404a3gg Agenda Organization Gene expression Background Technologies FISH Nanostring Microarrays RNA-seq

More information

RNA-Seq Analysis. August Strand Genomics, Inc All rights reserved.

RNA-Seq Analysis. August Strand Genomics, Inc All rights reserved. RNA-Seq Analysis August 2014 Strand Genomics, Inc. 2014. All rights reserved. Contents Introduction... 3 Sample import... 3 Quantification... 4 Novel exon... 5 Differential expression... 12 Differential

More information

RNA Degradation and NUSE Plots. Austin Bowles STAT 5570/6570 April 22, 2011

RNA Degradation and NUSE Plots. Austin Bowles STAT 5570/6570 April 22, 2011 RNA Degradation and NUSE Plots Austin Bowles STAT 5570/6570 April 22, 2011 References Sections 3.4 and 3.5.1 of Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Gentleman et

More information

Rafael A Irizarry, Department of Biostatistics JHU

Rafael A Irizarry, Department of Biostatistics JHU Getting Usable Data from Microarrays it s not as easy as you think Rafael A Irizarry, Department of Biostatistics JHU rafa@jhu.edu http://www.biostat.jhsph.edu/~ririzarr http://www.bioconductor.org Acknowledgements

More information

Introduction to Bioinformatics. Fabian Hoti 6.10.

Introduction to Bioinformatics. Fabian Hoti 6.10. Introduction to Bioinformatics Fabian Hoti 6.10. Analysis of Microarray Data Introduction Different types of microarrays Experiment Design Data Normalization Feature selection/extraction Clustering Introduction

More information

Exercises: Analysing ChIP-Seq data

Exercises: Analysing ChIP-Seq data Exercises: Analysing ChIP-Seq data Version 2018-03-2 Exercises: Analysing ChIP-Seq data 2 Licence This manual is 2018, Simon Andrews. This manual is distributed under the creative commons Attribution-Non-Commercial-Share

More information