Clinical and Translational Bioinformatics

1 Clinical and Translational Bioinformatics: An Overview. Jussi Paananen, Institute of Biomedicine. September 4th, 2015

2 Bioinformatics: Bioinformatics combines statistics, computer science and information technology to solve challenges in the biological domain

3 Sub-areas of Bioinformatics: Analysis, Data management, Visualization, Software engineering

4 WISDOM KNOWLEDGE INFORMATION DATA

5 WISDOM KNOWLEDGE INFORMATION DATA BIOINFORMATICS

6 Analysis: Process of inspecting and transforming data to discover useful information, and helping to turn that information into actionable knowledge

7 Data management: Process of obtaining, saving, organizing and archiving data

8 Visualization: Creating visual representations of data for the purpose of interpretation, exploration or communication

9 Software engineering: Process of applying engineering to the design, development and maintenance of software

10 Computational Biology / Systems Biology vs. Bioinformatics: Process of creating theoretical models and modeling biological phenomena and systems

11 Application fields of Bioinformatics: Anything concerned with life or living organisms

12 Biomedicine and Bioinformatics: Medicine, Genetics, Genomics, Metabolomics, Molecular biology, Biochemistry

13 Bioinformaticians vs. Bioinformatics Researchers: Bioinformaticians apply bioinformatics methods and tools; bioinformatics researchers develop new methods and tools

14 Case: Disease Genetics
1. Design a study to identify genetic risks (Analysis)
2. Collect DNA samples and perform DNA whole genome sequencing
3. Store and archive the sequence data (Data management)
4. Find genetic variations associated with the disease (Analysis)
5. Communicate the findings (Visualization)
6. Create a product to help in medical decision making (Software engineering)
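Step 4 of the case above can be illustrated with a very small sketch. It assumes a simple case/control design and tests one variant with a chi-square test on a 2x2 table of allele counts. The counts, the function name `allele_association` and the choice of test are illustrative assumptions, not something stated in the slides; a real study would also need quality control and correction for multiple testing.

```python
import math

def allele_association(case_risk, case_other, control_risk, control_other):
    """Chi-square test (1 degree of freedom) on a 2x2 allele count table.

    Returns (odds_ratio, chi_square, p_value).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table, no continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, chi2, p_value

# Hypothetical allele counts for a single variant (made up for illustration)
odds_ratio, chi2, p_value = allele_association(60, 40, 45, 55)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

In a whole genome study the same test would be repeated for millions of variants, which is where the time and storage considerations later in the deck come in.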

15 Basic Science: Basic research is performed without thought of practical ends. It results in general knowledge and an understanding of nature and its laws. This general knowledge provides the means of answering a large number of important practical problems, though it may not give a complete specific answer to any one of them. - National Science Foundation, 1945

16 Clinical Research
1. Patient-oriented research. Research conducted with human subjects (or on material of human origin such as tissues, specimens and cognitive phenomena) for which an investigator (or colleague) directly interacts with human subjects. Excluded from this definition are in vitro studies that utilize human tissues that cannot be linked to a living individual. Patient-oriented research includes: (a) mechanisms of human disease, (b) therapeutic interventions, (c) clinical trials, or (d) development of new technologies.
2. Epidemiologic and behavioral studies.
3. Outcomes research and health services research.
- National Institutes of Health, 1997

17 Translational Research: Translational research includes two areas of translation. One is the process of applying discoveries generated during research in the laboratory, and in preclinical studies, to the development of trials and studies in humans. The second area of translation concerns research aimed at enhancing the adoption of best practices in the community. Cost-effectiveness of prevention and treatment strategies is also an important part of translational science. - National Institutes of Health, 2007

18 Bioinformatics: Contains several sub-fields, including Basic Bioinformatics, Clinical Bioinformatics and Translational Bioinformatics

19 Basic Bioinformatics: Applying bioinformatics to generate general knowledge and understanding of biological phenomena

20 Clinical Bioinformatics: Applying bioinformatics to clinical applications

21 Translational Bioinformatics (TBI): Applying bioinformatics to translate research discoveries to practical applications with impact on health and well-being

22 BASIC BIOINFORMATICS CLINICAL BIOINFORMATICS HEALTH INFORMATICS TRANSLATIONAL BIOINFORMATICS

23 The Core of Bioinformatics: Solving challenges related to translating data to knowledge

24 DEPTH (NUMBER OF MEASUREMENTS) AMOUNT OF DATA WIDTH (NUMBER OF SUBJECTS)

25 MEASUREMENTS SUBJECTS AMOUNT OF DATA

26 High-throughput technologies: The depth of data has increased enormously during the last decade

27 TECHNOLOGY RAW DATA PROCESSED MEASUREMENTS Metabolomics Gene expression microarray SNP microarray RNA-sequencing (10x) (1.3 G) DNA-sequencing (10x) (68 G)

28 MEASUREMENTS SUBJECTS AMOUNT OF DATA 68 G 1 68 G 68 G T 68 G T * 68 G T 68 G P ** * Maximum Excel worksheet size is rows by columns = 17.2 T ** data points

29 Courier font (10 pt) A4 paper, no margins characters / side 6.8 P / = (1.8 T) double sided pages Average copy paper is about 0.1 mm thick

31 Double sided pile of A4 papers with 6.8 P of data printed, km Planet Earth, diameter km

32 Practical example: Data from a whole genome sequencing project of 30 individuals, around 5 TB of data

33 Data storage requirements Modern desktop computers have around 1-2 TB of storage space. All computers at UEF have probably around 1 PB of storage space, could fit around whole genome sequences.
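As a rough sanity check on the storage figures, the sketch below scales the practical example (about 5 TB for 30 individuals) up to a 1 PB pool. It assumes decimal units (1 PB = 1000 TB) and that per-genome storage stays constant; the variable names are illustrative only.

```python
# Figures taken from the slides; decimal units are an assumption
PROJECT_TB = 5          # whole genome sequencing project size
PROJECT_GENOMES = 30    # individuals in that project
POOL_TB = 1000          # roughly 1 PB of total storage

# Average storage per genome in gigabytes
per_genome_gb = PROJECT_TB * 1000 / PROJECT_GENOMES

# How many genomes of that size fit in the pool (integer arithmetic)
genomes_in_pool = POOL_TB * PROJECT_GENOMES // PROJECT_TB

print(f"about {per_genome_gb:.0f} GB per genome")
print(f"about {genomes_in_pool} genomes in {POOL_TB} TB")
```

The estimate ignores backups, intermediate files and everything else already stored on those machines, so the usable capacity would be lower in practice.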

34 Transferring 5 TB of data
NETWORK                    TRANSFER SPEED   TIME
UEF internal servers       1 Gb/s           11 h
UEF desktop computers      100 Mb/s         5 d
Internet                   10 Mb/s          46 d
Internet (international)   1 Mb/s           1.3 y
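The transfer times can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits per second) and a link that runs at its nominal speed the whole time; the helper names are made up for this example.

```python
DAY = 86_400            # seconds in a day
YEAR = 365 * DAY        # seconds in a (non-leap) year

def transfer_seconds(size_bytes, speed_bits_per_s):
    """Ideal transfer time: bits to move divided by link speed."""
    return size_bytes * 8 / speed_bits_per_s

def humanize(seconds):
    """Format a duration in hours, days or years."""
    if seconds < DAY:
        return f"{seconds / 3600:.0f} h"
    if seconds < YEAR:
        return f"{seconds / DAY:.0f} d"
    return f"{seconds / YEAR:.1f} y"

DATA_BYTES = 5e12  # 5 TB
links = {
    "UEF internal servers": 1e9,      # 1 Gb/s
    "UEF desktop computers": 100e6,   # 100 Mb/s
    "Internet": 10e6,                 # 10 Mb/s
    "Internet (international)": 1e6,  # 1 Mb/s
}
times = {name: humanize(transfer_seconds(DATA_BYTES, bps))
         for name, bps in links.items()}
for name, t in times.items():
    print(f"{name}: {t}")
```

Real transfers are usually slower than this because of protocol overhead and shared links, so the figures are best read as lower bounds.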

35 Speaking of time.. Analyzing association of a single phenotype to risk factors, assuming it takes seconds per factor. NUMBER OF RISK FACTORS TIME s s s s m 40 s m

36 Speaking of time.. Analyzing association of combinations of risk factors to a single phenotype, assuming it takes seconds per combination. FACTORS IN COMBINATION TIME 1 17 m 2 16 y y
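The jump from minutes to years comes from the number of combinations, which grows roughly as n^k. The number of risk factors and the time per test are not legible in this transcript, so the sketch below assumes 1,000,000 factors and 0.001 s per test; those assumed values happen to give about 17 minutes for single factors and about 16 years for pairs.

```python
import math

N_FACTORS = 1_000_000     # assumption, not taken from the slide
SECONDS_PER_TEST = 0.001  # assumption, not taken from the slide
YEAR = 365 * 24 * 3600

def total_seconds(factors_in_combination):
    """Time to test every unordered combination of the given size."""
    n_tests = math.comb(N_FACTORS, factors_in_combination)
    return n_tests * SECONDS_PER_TEST

single_minutes = total_seconds(1) / 60
pair_years = total_seconds(2) / YEAR
triple_years = total_seconds(3) / YEAR

print(f"1 factor : {single_minutes:.0f} m")
print(f"2 factors: {pair_years:.0f} y")
print(f"3 factors: {triple_years:.2e} y")
```

Each additional factor in the combination multiplies the work by roughly the number of factors, which is why exhaustive testing of higher-order combinations quickly becomes infeasible.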

37 Cloud computing to the rescue
Utilize remote servers by sharing their resources with clients on demand
Can be very cost-efficient, depending on your needs
Security, privacy and legislation challenges when moving data to remote cloud servers or commercial entities

38 Non-data related challenges: Understanding, Communication, Regulation/responsibilities

39 Statistical errors in manuscripts submitted to Biochemia Medica journal. Ana-Maria Šimundić, Nora Nikolac. Biochemia Medica 2009;19(3):
Materials and methods: All original scientific and professional manuscripts submitted to Biochemia Medica during the were eligible for the study, if they contained some kind of statistical analysis of the data. Manuscripts were reviewed manually by two reviewers. Following errors were included: 1) incorrect use or presentation of descriptive analysis; 2) incorrect choice of the statistical test; 3) incorrect use of statistical test for comparing three or more groups for differences; 4) incorrect presentation of P value; 5) incorrect interpretation of P value; 6) incorrect interpretation of correlation analysis; 7) power analysis not provided.
Results: A total of 55 eligible manuscripts were identified. None of these manuscripts reported power analysis. As of other 6 errors analyzed, at least one error was observed in 48/55 (0.87) manuscripts. Most common errors were incorrect use of statistical test for comparing three or more groups for differences (21/28 (0.75)) and incorrect presentation of P value (36/54 (0.66)).

40 Erroneous analyses of interactions in neuroscience: a problem of significance. Sander Nieuwenhuis, Birte U Forstmann & Eric-Jan Wagenmakers. Nature Neuroscience 14, (2011)
In theory, a comparison of two experimental effects requires a statistical test on their difference. In practice, this comparison is often based on an incorrect procedure involving two separate tests in which researchers conclude that effects differ when one effect is significant (P < 0.05) but the other is not (P > 0.05). We reviewed 513 behavioral, systems and cognitive neuroscience articles in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience) and found that 78 used the correct procedure and 79 used the incorrect procedure. An additional analysis suggests that incorrect analyses of interactions are even more common in cellular and molecular neuroscience. We discuss scenarios in which the erroneous procedure is particularly beguiling.
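The error described in this abstract is easy to reproduce with made-up numbers. The sketch below assumes two independent effect estimates with known standard errors and uses a simple two-sided z-test; the estimates, standard errors and helper name are hypothetical. One effect comes out significant and the other does not, yet the direct test on their difference is far from significant.

```python
import math

def two_sided_p(estimate, std_error):
    """Two-sided p-value of a z-test for estimate / std_error."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2))

# Hypothetical effect estimates with their standard errors
effect_a, se_a = 25.0, 10.0
effect_b, se_b = 10.0, 10.0

p_a = two_sided_p(effect_a, se_a)   # effect A against zero
p_b = two_sided_p(effect_b, se_b)   # effect B against zero

# Correct procedure: test the difference between the two effects.
# For independent estimates the variances add.
difference = effect_a - effect_b
se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
p_difference = two_sided_p(difference, se_difference)

print(f"A vs 0: p = {p_a:.3f}")
print(f"B vs 0: p = {p_b:.3f}")
print(f"A vs B: p = {p_difference:.3f}")
```

Concluding that A and B differ because only A crossed the 0.05 threshold would be the incorrect procedure; only the last test addresses that question.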

41 Why most published research findings are false. Ioannidis JP. PLoS Med Aug;2(8):e124
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
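The dependence on power, bias and the ratio of true to no relationships can be made concrete with a small calculation of the positive predictive value, that is, the share of claimed findings that are actually true. This is a sketch of that kind of calculation as I understand the framework; the function name, parameter names and example values are assumptions chosen for illustration.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.8, bias=0.0):
    """Share of 'significant' findings that reflect a true relationship.

    prior_odds: ratio of true to no relationships among those tested
    alpha:      type I error rate
    power:      probability of detecting a true relationship
    bias:       fraction of otherwise negative analyses reported as positive
    """
    beta = 1.0 - power
    # True relationships reported as findings (detected, or missed but biased)
    true_positives = (power + bias * beta) * prior_odds
    # Null relationships reported as findings (false alarms, or biased)
    false_positives = alpha + bias * (1.0 - alpha)
    return true_positives / (true_positives + false_positives)

# Hypothetical field where 1 in 11 tested relationships is true (odds 1:10)
well_powered = positive_predictive_value(0.1, power=0.8)
under_powered = positive_predictive_value(0.1, power=0.2)
biased = positive_predictive_value(0.1, power=0.8, bias=0.2)

print(f"power 0.8, no bias : {well_powered:.2f}")
print(f"power 0.2, no bias : {under_powered:.2f}")
print(f"power 0.8, bias 0.2: {biased:.2f}")
```

With these assumed inputs, lowering power or adding bias pushes the value below one half, which matches the abstract's point that a claim can be more likely false than true.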

42 xkcd.com

43 BASIC BIOINFORMATICS CLINICAL BIOINFORMATICS HEALTH INFORMATICS TRANSLATIONAL BIOINFORMATICS REGULATION/RESPONSIBILITIES

44 Translational example from real life..
