Basic principles of NMR-based metabolomics

1 Basic principles of NMR-based metabolomics Professor Dan Stærk Bioanalytical Chemistry and Metabolomics research group Natural Products and Peptides research section Department of Drug Design and Pharmacology

2 Slide 2 Outline: definition of the metabolome/metabolomics; principle of NMR-based metabolomics; case story illustrating the process (starved ewes and fat lambs); step-by-step procedure in data handling (processing, export/import, calibration, baseline adjustment, projection of data to a common axis, integration and merging of buckets, PCA).

3 Slide 3 Definition of the metabolome. The entire assembly of low-molecular-weight molecules in an organism (cell, organ, tissue) is defined as the metabolome, i.e. the equivalent of the genome and the proteome. The metabolome can be considered the biological endpoint of the genome and the proteome, and metabolomics is therefore an important technique for assessing the state of an organism (cell, organ, tissue). [Figure: genome, proteome, metabolome; genomics, proteomics, metabolomics, systems biology]

4 Slide 4 Metabonomics vs. metabolomics. Definition of metabonomics (Nicholson et al.): metabonomics was originally defined as the quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification. Nicholson, J. K.; Lindon, J.; Holmes, E. 'Metabonomics': understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica, 1999, 29. Definition of metabolomics (Fiehn, O.): metabolomics is defined as a comprehensive analysis in which all the metabolites of a biological system are identified and quantified. Fiehn, O. Metabolomics - the link between genotypes and phenotypes. Plant Molecular Biology, 2002, 48.

5 Slide 5 Alternative definition: a more general description of metabolomics is the qualitative and/or quantitative assessment of low-molecular-weight molecules in a complex mixture. Complex metabolomic mixtures:
- Biofluids (urine, blood, serum, etc.)
- Plant material (raw material, plant extracts, single cell, etc.)
- Food and feed (vegetables, juices, beer, crop plants, etc.)
- Cell culture (primary cells, cell cultures, etc.)
- Environmental (drug metabolites, pollutants, etc.)

6 Slide 6 Metabolomics: a new bioanalytical technology. [Figure: number of metabolomics publications per year]

7 Slide 7 Principle of NMR-based metabolomics: global, non-targeted, data-driven and hypothesis-generating. Data acquisition: 1H NMR spectra of a large dataset. Preprocessing: reduce the dataset (bucketing, 0.04 ppm), normalize and scale. Multivariate data analysis: unsupervised mapping of data (PCA = Principal Component Analysis). Multivariate data analysis: supervised classification and calculation of confidence limits. Hypothesis. Nicholson et al. Nat. Rev. Drug. Discov. 2002, 1.

8 Slide 8 Principal component analysis: reducing a large data set with many variables to fewer variables without losing the information in the original data.
- Find a vector, the first principal component (PC1), that describes the largest variance within the data set
- Each new principal component (PC2, PC3, etc.) is orthogonal to the preceding ones and describes as much of the remaining variance as possible
- Score plots show the samples, grouped according to their different chemical fingerprints
- Loading plots show which variables, i.e. spectroscopic signals, are responsible for the groupings observed in the score plot
Davies A. M. C., Fearn T. Back to Basics: the principles of principal component analysis. Spectroscopy Europe. 2004;20-23
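The definition of the principal components can be illustrated with a minimal sketch that takes the eigenvectors of the covariance matrix as loadings. The function name `principal_components` and the synthetic data (20 samples, 50 buckets, one bucket differing between two groups) are assumptions made for illustration only, not part of the lecture.

```python
import numpy as np

def principal_components(X, n_components=2):
    """PCA of a samples-by-variables matrix via the covariance matrix."""
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    cov = np.cov(Xc, rowvar=False)               # variables-by-variables covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]                 # orthogonal directions of largest variance
    scores = Xc @ loadings                       # sample coordinates for the score plot
    explained = eigvals[order] / eigvals.sum()   # fraction of variance per component
    return scores, loadings, explained

# Hypothetical data: two groups of 10 samples that differ only in bucket 10
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(20, 50))
X[10:, 10] += 2.0
scores, loadings, explained = principal_components(X)
```

In this toy data set the two groups separate along PC1 in the scores, and the largest PC1 loading points at bucket 10, the variable responsible for the grouping.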

9 Slide 9 NMR - the universal detector. NMR is the most universal detector for small metabolites. Advantages: no physical separation of analytes; robust, hence reproducible results; directly quantitative; simple sample preparation; information rich. Drawbacks: not as sensitive as mass spectrometry; expensive. NMR is good for a top-down approach: study the whole system first, before breaking it into smaller pieces.

10 Slide 10 Case showing necessary steps: fetal programming. Undernutrition during fetal development is associated with increased risk of metabolic diseases later in life. Dutch winter famine 1944: obesity at the age of 50 y in men and women exposed to famine prenatally (Am J Clin Nutr 1999;70:); coronary heart disease, hypertension, and type 2 diabetes. These are consequences of programming, whereby a stimulus or insult at a critical, sensitive period of early life has permanent effects on structure, physiology, and metabolism. Metabolic programming: phenotypic alterations by fetal adaptation; higher risk of obesity and diabetes if the diet is mismatched (programmed to cope with famine, exposed to hypernutrition).

11 Slide 11 Starved ewes and fat lambs Hypothesis: Metabolic programming by feed restriction leads to changed metabolic pathways The changes can be studied by acquiring NMR spectra of urine Sheep as animal model system Before birth: Ewes well fed or starved (50% of energy) After birth: Normal diet or High fat, high carbohydrate diet

12 Slide 12 Data sampling and data acquisition 164 NMR spectra Repeats at 2 and 6 months

13 Slide 13 Data handling procedures and terms. The overall aim is to transform raw data (FIDs) from multiple samples into one single table for multivariate data analysis.

14 Slide 14 Data handling procedures and terms. Keep track of your samples and data: enter a title or label for each sample. FIDs to spectra: window function, Fourier transform, phase correction, baseline adjustment. Make spectra comparable: calibrate the ppm scale, project data on a common axis, normalize. Compress data/simplify spectra: integrate (binning, buckets). Simplify calculations/interpretation of models: mean center, scale.

15 Slide 15 FIDs to spectra (I). Use the same processing parameters for all spectra! Window function with parameters: exponential multiplication with a line broadening factor of 1 Hz. Number of data points in the final spectrum: data points/20 ppm/600 MHz = 2.7 data points/Hz. Make sure the peaks are properly defined.
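A minimal sketch of this processing step, assuming a complex FID stored as a NumPy array. The function names, the synthetic FID and the final size of 32768 points are illustrative assumptions; 32768 is used only because it reproduces the quoted 2.7 data points/Hz for a 20 ppm sweep width at 600 MHz.

```python
import numpy as np

def fid_to_spectrum(fid, dwell_time, lb_hz=1.0, n_points=32768):
    """Exponential multiplication, zero-filling to n_points and Fourier transform."""
    t = np.arange(fid.size) * dwell_time               # acquisition time of each point (s)
    window = np.exp(-np.pi * lb_hz * t)                # adds lb_hz of Lorentzian line broadening
    spectrum = np.fft.fftshift(np.fft.fft(fid * window, n=n_points))
    freqs = np.fft.fftshift(np.fft.fftfreq(n_points, d=dwell_time))
    return freqs, spectrum

def points_per_hz(n_points, sweep_width_ppm, field_mhz):
    """Digital resolution: data points per Hz of spectral width."""
    return n_points / (sweep_width_ppm * field_mhz)

# Hypothetical FID: one resonance at 100 Hz, 20 ppm sweep width at 600 MHz
sweep_width_hz = 20 * 600
dwell = 1.0 / sweep_width_hz
t = np.arange(16384) * dwell
fid = np.exp(2j * np.pi * 100.0 * t) * np.exp(-t / 0.5)
freqs, spectrum = fid_to_spectrum(fid, dwell)
resolution = points_per_hz(32768, 20, 600)             # about 2.7 points per Hz
```

Calling the same function with the same `lb_hz` and `n_points` for every FID keeps the processing parameters identical across the data set.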

16 Slide 16 FIDs to spectra (II). Adjust each spectrum individually. Phase correction: adjust only the zeroth-order phase constant if possible.
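Phase correction multiplies the complex spectrum by a phase factor. The sketch below applies a zeroth-order (and optionally first-order) correction and, as an illustrative assumption rather than the lecture's procedure, estimates the zeroth-order constant by a simple grid search that maximizes the integral of the real part. The function names `apply_phase` and `estimate_p0` are hypothetical.

```python
import numpy as np

def apply_phase(spectrum, p0_deg=0.0, p1_deg=0.0):
    """Apply zeroth-order (p0) and first-order (p1) phase correction, in degrees."""
    ramp = np.arange(spectrum.size) / spectrum.size    # runs from 0 to 1 across the spectrum
    phase = np.deg2rad(p0_deg + p1_deg * ramp)
    return spectrum * np.exp(1j * phase)

def estimate_p0(spectrum, step_deg=1.0):
    """Grid search for the zeroth-order phase that maximizes the real integral."""
    candidates = np.arange(-180.0, 180.0, step_deg)
    areas = [apply_phase(spectrum, p0).real.sum() for p0 in candidates]
    return candidates[int(np.argmax(areas))]

# Hypothetical test signal: a Lorentzian line with a 40 degree phase error
x = np.linspace(-50, 50, 2001)
lorentzian = 1.0 / (1.0 + 1j * x)
misphased = lorentzian * np.exp(1j * np.deg2rad(40.0))
p0 = estimate_p0(misphased)
corrected = apply_phase(misphased, p0_deg=p0)
```

Real spectra with baseline distortions or overlapping signals usually need manual inspection after such an automatic estimate.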

17 Slide 17 FIDs to spectra (III). Baseline adjustment: make sure the baseline is represented in the spectrum (large SW). Use a simple function (2nd or 3rd order polynomial).
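A low-order polynomial baseline can be sketched as a fit to signal-free regions followed by subtraction. The function name `correct_baseline`, the choice of regions and the synthetic spectrum are assumptions for illustration; in practice the regions must be chosen where the spectra really contain only baseline.

```python
import numpy as np

def correct_baseline(ppm, spectrum, baseline_regions, order=3):
    """Fit a polynomial to signal-free regions and subtract it from the spectrum."""
    mask = np.zeros(ppm.shape, dtype=bool)
    for low, high in baseline_regions:
        mask |= (ppm >= low) & (ppm <= high)           # points assumed to be pure baseline
    coeffs = np.polyfit(ppm[mask], spectrum[mask], order)
    return spectrum - np.polyval(coeffs, ppm)

# Hypothetical spectrum: one peak at 5 ppm on top of a curved baseline
ppm = np.linspace(-5, 15, 2001)
peak = 1.0 / (1.0 + ((ppm - 5.0) / 0.02) ** 2)
baseline = 0.02 * ppm ** 2 - 0.1 * ppm + 0.5
corrected = correct_baseline(ppm, peak + baseline, [(-5, -1), (11, 15)], order=2)
```

The wide sweep width matters here: without signal-free regions at both ends of the spectrum there is nothing to fit the polynomial to.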

18 Slide 18 Calibration of ppm-scale. Select a reference peak that is present in all spectra (TMS, DSS or residual solvent signal) and that is sharp and well resolved.
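Calibration can be sketched as shifting the ppm axis so that the reference peak lands on its nominal value. The function name `calibrate`, the search window and the synthetic spectrum are assumptions; the sketch presumes a reference such as TMS or DSS near 0 ppm that is the tallest signal inside the window.

```python
import numpy as np

def calibrate(ppm, spectrum, window=(-0.2, 0.2), reference_ppm=0.0):
    """Shift the ppm axis so the tallest peak inside `window` sits at `reference_ppm`."""
    idx = np.flatnonzero((ppm >= window[0]) & (ppm <= window[1]))
    peak_idx = idx[np.argmax(spectrum[idx])]           # position of the reference peak
    return ppm - (ppm[peak_idx] - reference_ppm)

# Hypothetical spectrum: reference signal found at 0.037 ppm, larger peak at 4.7 ppm
ppm = np.linspace(-1, 10, 11001)
spectrum = (1.0 / (1.0 + ((ppm - 0.037) / 0.002) ** 2)
            + 5.0 / (1.0 + ((ppm - 4.7) / 0.01) ** 2))
calibrated_ppm = calibrate(ppm, spectrum)
```

Restricting the search to a narrow window keeps larger signals elsewhere in the spectrum from being mistaken for the reference.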

19 Slide 19 Project data on a common axis (I) Discrete data points in different spectra are not necessarily aligned Normally a very small effect

20 Slide 20 Project data on a common axis (II) [Figure: serum, CPMG, β-glucose H-1 signal after calibration; x-axis δ 1H (ppm), around 4.62]

21 Slide 21 Project data on a common axis (III) [Figure: serum, CPMG, β-glucose H-1 signal after calibration; x-axis δ 1H (ppm), around 4.64]

22 Slide 22 Project data on a common axis (IV) [Figure: serum, CPMG, β-glucose H-1 signal after calibration; x-axis δ 1H (ppm), around 4.64]
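Projection onto a common axis can be sketched with linear interpolation. The function name `project_to_common_axis` and the two synthetic spectra, whose point grids are offset by 0.0004 ppm, are illustrative assumptions; other interpolation schemes would serve equally well.

```python
import numpy as np

def project_to_common_axis(ppm_axes, spectra, common_ppm):
    """Linearly interpolate each spectrum onto one shared, ascending ppm axis."""
    rows = []
    for ppm, spec in zip(ppm_axes, spectra):
        order = np.argsort(ppm)                        # np.interp needs ascending x values
        rows.append(np.interp(common_ppm, ppm[order], spec[order]))
    return np.vstack(rows)                             # one row per sample

def line(x):
    """Hypothetical Lorentzian line at 4.64 ppm."""
    return 1.0 / (1.0 + ((x - 4.64) / 0.01) ** 2)

# Two spectra of the same signal sampled on slightly shifted point grids
ppm_a = np.linspace(4.5, 4.8, 301)
ppm_b = ppm_a + 0.0004
spec_a, spec_b = line(ppm_a), line(ppm_b)
common = np.linspace(4.52, 4.78, 261)
table = project_to_common_axis([ppm_a, ppm_b], [spec_a, spec_b], common)
```

After projection, column k of `table` refers to the same chemical shift in every sample, which is what the later steps (binning, centering, PCA) rely on.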

23 Slide 23 Normalization. Make data directly comparable with each other by removing known variation and reducing unknown variation. Variation is caused by different amounts/concentrations/volumes and by instrument settings (tuning/matching, gain). Variation is expressed as additive effects (baseline) and multiplicative effects. Context-dependent processing! The choice depends on the type of samples (urine, serum, juice), sampling schemes and sample preprocessing.

24 Slide 24 Some normalization schemes. Normalize to: constant sum; constant squared sum; highest signal. Or find a common constant feature in the spectra: internal standard; invariant metabolite (e.g. urinary creatinine/body weight).
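The schemes listed above differ only in the per-sample factor each spectrum is divided by. A minimal sketch, assuming a samples-by-variables matrix; the function name `normalize`, the method labels and the `reference` argument (for example the integral of an internal standard per sample) are hypothetical.

```python
import numpy as np

def normalize(X, method="sum", reference=None):
    """Divide each row (spectrum) by a per-sample normalization factor."""
    X = np.asarray(X, dtype=float)
    if method == "sum":                                # constant sum
        factor = X.sum(axis=1)
    elif method == "squared_sum":                      # constant squared sum
        factor = np.sqrt((X ** 2).sum(axis=1))
    elif method == "max":                              # highest signal
        factor = X.max(axis=1)
    elif method == "reference":                        # internal standard or invariant metabolite
        factor = np.asarray(reference, dtype=float)
    else:
        raise ValueError(f"unknown method: {method}")
    return X / factor[:, None]

# Hypothetical example: the second sample is the first one at double concentration
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
by_sum = normalize(X, "sum")
by_reference = normalize(X, "reference", reference=[1.0, 2.0])
```

In this example every scheme maps the two rows onto each other, because they differ only by a multiplicative dilution factor.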

25 Slide 25 Normalization. Be pragmatic: if it works, it's probably OK! But make sure the sampling and analysis parameters are kept constant. Some normalization schemes will introduce new correlations: with normalization to constant sum, if one signal increases, the others are decreased. [Figure: spectra before and after normalization to constant sum]
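The correlation introduced by constant-sum normalization can be seen in a small worked example with made-up numbers: only the last signal changes between the two samples, yet after normalization all other signals appear to decrease.

```python
import numpy as np

# Hypothetical intensities of four signals in two samples;
# only the last signal increases in the second sample.
before = np.array([[1.0, 1.0, 1.0, 1.0],
                   [1.0, 1.0, 1.0, 3.0]])

# Normalize each sample (row) to a constant sum of 1
after = before / before.sum(axis=1, keepdims=True)

# The three unchanged signals drop from 1/4 to 1/6 in the second sample
unchanged_first = after[0, :3]
unchanged_second = after[1, :3]
```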

26 Slide 26 Binning. Binning = bucketing = integration of spectral ranges. Reduces the data set: typical spectra have 64k data points, binned data ~200 data points. Removes variability of chemical shifts caused by temperature, pH, concentration and overall composition of samples (salt, proteins, ...). Reduces effects of differences in shimming: the area of a peak is a more robust measure than the intensity value of each point.

27 Slide 27 Binning Integration into smaller ranges Bucketing or binning Start with equidistant ranges, ~ ppm Combine vicinal buckets with a high degree of co-variation
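Equidistant bucketing and the merging of neighbouring buckets can be sketched as follows. The function names, the 0.04 ppm default width (taken from the bucketing value quoted on Slide 7) and the correlation threshold of 0.9 are assumptions for illustration, not settings prescribed by the lecture.

```python
import numpy as np

def bin_spectrum(ppm, spectrum, start, stop, width=0.04):
    """Integrate a real spectrum into equidistant buckets between start and stop (ppm)."""
    n_bins = int(round((stop - start) / width))
    edges = start + width * np.arange(n_bins + 1)
    sums, _ = np.histogram(ppm, bins=edges, weights=spectrum)   # sum of points per bucket
    areas = sums * abs(ppm[1] - ppm[0])                          # approximate integral
    centers = edges[:-1] + width / 2
    return centers, areas

def merge_correlated(B, threshold=0.9):
    """Sum neighbouring buckets whose values co-vary strongly across samples."""
    merged = [B[:, 0].astype(float)]
    for j in range(1, B.shape[1]):
        r = np.corrcoef(merged[-1], B[:, j])[0, 1]
        if r > threshold:
            merged[-1] = merged[-1] + B[:, j]          # extend the current bucket
        else:
            merged.append(B[:, j].astype(float))       # start a new bucket
    return np.column_stack(merged)

# Hypothetical spectrum with one line at 5.01 ppm on a descending ppm axis
ppm = np.linspace(10, 0, 10001)
spectrum = 1.0 / (1.0 + ((ppm - 5.01) / 0.005) ** 2)
centers, areas = bin_spectrum(ppm, spectrum, start=0.5, stop=9.5)
```

Small shifts of the line within its bucket leave the bucket area almost unchanged, which is the robustness argument made on the previous slide.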

28 Slide 28 Mean/median centering. Removes (subtracts) the mean/median value of each variable. Operates on the columns of the data matrix (for each variable/bucket). Centering of the data gives more stable numerical solutions for the PCA (and other transformations); if it is not used, the first PC will be the mean spectrum. Use median centering for a more robust centering that is less sensitive to outliers.

29 Slide 29 Centering [Figure: raw data before centering; values per bucket, x-axis labels 1.12 to 0.94 ppm]

30 Slide 30 Centering: mean centering [Figure: values per bucket before and after mean centering; x-axis labels 1.12 to 0.94 ppm]

31 Slide 31 Centering: median centering [Figure: values per bucket before and after median centering; x-axis labels 1.12 to 0.94 ppm]
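Mean and median centering both subtract one value per column of the data matrix. A minimal sketch, with a hypothetical function name `center` and made-up numbers in which the first bucket contains one outlier:

```python
import numpy as np

def center(X, method="mean"):
    """Subtract the column-wise mean or median from a samples-by-variables matrix."""
    X = np.asarray(X, dtype=float)
    if method == "mean":
        offset = X.mean(axis=0)
    elif method == "median":
        offset = np.median(X, axis=0)                  # less sensitive to outliers
    else:
        raise ValueError(f"unknown method: {method}")
    return X - offset

# Hypothetical data: four samples, two buckets, one outlier (100) in the first bucket
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [100.0, 40.0]])
mean_centered = center(X, "mean")
median_centered = center(X, "median")
```

With mean centering the outlier drags the offset of the first bucket to 26.5, whereas the median offset stays at 2.5, close to the bulk of the samples.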

32 Slide 32 Scaling. Scaling sets the weighting (importance) of each variable in the models. For NMR-spectroscopic data, the largest signals have the highest variance, small signals have low variance, and noise has the lowest variance. [Figure: standard deviation versus δ 1H (ppm) for serum CPMG spectra (AFB)]

33 Slide 33 Auto scaling = univariate scaling. Auto scaling: variables are divided by their standard deviation, so the variance is set to 1. [Figure: values per bucket before and after auto scaling; x-axis labels 1.12 to 0.94 ppm]

34 Slide 34 Pareto scaling. Pareto scaling: variables are divided by the square root of their standard deviation. [Figure: values per bucket before and after Pareto scaling; x-axis labels 1.12 to 0.94 ppm]

35 Slide 35 Scaling: the math behind. Centering removes the offset in the data and highlights the differences within each variable: $\tilde{x}_{ij} = x_{ij} - \bar{x}_i$. Auto scaling/univariate scaling sets the variance of each variable to unity, inflates the noise and makes all signals equally important: $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}$. Pareto scaling reduces the relative importance of large values; its scaling effect lies between no scaling (only centering) and auto scaling: $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\sqrt{s_i}}$. Here $\bar{x}_i$ and $s_i$ are the mean and the standard deviation of variable $i$.
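The two scaling formulas translate directly into code. In this minimal sketch the variables are the columns of the data matrix, and the function names and the use of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np

def auto_scale(X):
    """Center each variable and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Center each variable and divide by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

# Hypothetical data: a small signal (first column) and a large signal (second column)
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
auto = auto_scale(X)        # both columns end up with standard deviation 1
pareto = pareto_scale(X)    # standard deviations become 1 and 10
```

The original standard deviations are 1 and 100, so Pareto scaling shrinks the ratio between the two variables from 100 to 10 instead of removing it entirely.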

36 Slide 36 PCA. Principal Component Analysis (PCA): calculate scores and loadings. Data reduction (from data points to two). Keep the variance, don't show the noise. Display the relationships between samples. [Figure: data matrix X (systematic + random variation) with scores (S) and loadings]

37 Slide 37 PCA [Figure: X split into systematic variation, described by scores (S) and loadings, and random variation (E)]

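The split of the data matrix into systematic and random variation can be sketched with a truncated singular value decomposition. The names `pca_decompose`, `T` (scores), `P` (loadings) and `E` (residual), as well as the synthetic data with two underlying profiles plus noise, are assumptions made for illustration.

```python
import numpy as np

def pca_decompose(X, n_components=2):
    """Split centered data into scores T, loadings P and residual E: Xc = T P^T + E."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]         # scores, one row per sample
    P = Vt[:n_components].T                            # loadings, one row per variable
    E = Xc - T @ P.T                                   # variation not captured by the model
    return Xc, T, P, E

# Hypothetical data: 30 samples, 200 buckets, two systematic profiles plus noise
rng = np.random.default_rng(1)
t1, t2 = 5.0 * rng.normal(size=30), 3.0 * rng.normal(size=30)
p1, p2 = rng.normal(size=200), rng.normal(size=200)
X = np.outer(t1, p1) + np.outer(t2, p2) + rng.normal(scale=0.1, size=(30, 200))
Xc, T, P, E = pca_decompose(X)
residual_fraction = np.sum(E ** 2) / np.sum(Xc ** 2)  # share of variance left in E
```

Each sample is reduced from 200 values to two score values, while almost all of the variance stays in the model and mostly noise ends up in `E`.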