Bioinformatics and Computational Biology. KFC/BIN VIII. Systems Biology

1 Bioinformatics and Computational Biology, KFC/BIN VIII. Systems Biology. RNDr. Karel Berka, Ph.D., Univerzita Palackého v Olomouci

2 Syllabus
- Pathway analysis
- KEGG
- Whole-cell simulation

3 Pathway analysis

14 Caveats

22 A detailed look at the KEGG database

23 What Is KEGG? (Mike Sweredoski)
- KEGG stands for Kyoto Encyclopedia of Genes and Genomes.
- KEGG is a repository of pathways, genes, orthologs, drugs, functional hierarchies, ligands, and diseases.
- The KEGG databases allow the various biological components to be linked and displayed.

24 KEGG Pathway Maps

25 How to get started
- KEGG is located at:
- You can start by searching for a single protein ID at
- Note: you must enter each ID with its database prefix: IPI IDs like IPI:IPI , UniProt IDs like Uniprot:Q08170, SGD ORFs like sce:ygl111w.
- But you probably have a long list of proteins you would like to analyze.

26 Dealing with many proteins
- I developed a quick tool that retrieves all of the KEGG IDs associated with your MaxQuant analysis and identifies the pathways for which you have found most of the genes.
- This tool is currently available at
- Note: this script is a work in progress. There are several issues regarding protein groups and missing KEGG annotations that I'm still working on.

27 An Example: SILAC SGD
- Step One: Upload proteingroups.txt to KEGGER

28 An Example: SILAC SGD
- KEGG pathways are sorted by the percentage of their genes found in the sample.
- One link shows all KEGG gene IDs.
- The pathway links take you to the KEGG pathway.
- The Page links show all the KEGG IDs found for the individual

29 An Example: SILAC SGD
- Suppose we want to look further at the proteasome, for which we have found 34 of the 35 yeast genes.
- We can click the link that takes us to the KEGG page for the proteasome, or we can view the list of yeast genes we've identified in the proteasome.
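
The ranking described on slides 28 and 29 (pathways sorted by the percentage of their genes found in the sample) can be sketched in a few lines of Python. This is only an illustration of the idea, not the KEGGER tool itself: the function name `rank_pathways`, the gene labels, and the second toy pathway are all made up for the example. Only the 34-of-35 proteasome count comes from the slides.

```python
from typing import Dict, List, Set, Tuple


def rank_pathways(
    pathways: Dict[str, Set[str]], identified: Set[str]
) -> List[Tuple[str, int, int, float]]:
    """Return (pathway, genes found, genes total, percent found), best first."""
    rows = []
    for name, genes in pathways.items():
        if not genes:
            continue  # skip empty pathways to avoid dividing by zero
        found = len(genes & identified)
        rows.append((name, found, len(genes), 100.0 * found / len(genes)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Toy data: the gene labels are hypothetical placeholders, not real KEGG IDs.
pathways = {
    "Toy pathway B": {"gene0", "geneX", "geneY", "geneZ"},
    "Proteasome": {f"gene{i}" for i in range(35)},
}
identified = {f"gene{i}" for i in range(34)}  # 34 of the 35 proteasome genes

ranking = rank_pathways(pathways, identified)
for name, found, total, percent in ranking:
    print(f"{name}: {found}/{total} genes ({percent:.1f}%)")
```

In a real analysis the `pathways` mapping would come from KEGG annotations and `identified` from the protein groups in the MaxQuant output.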

30 An Example: SILAC SGD
- We can combine this list of KEGG genes with the KEGG maps to visualize which components we have identified.
- To do this, we copy the KEGG gene list and navigate to r_pathway.html
- Once there, we change Search Against to our species of interest, in this example Saccharomyces cerevisiae.
- We then paste the list into the text box.
- I like to change the default color to red so that the identified genes stand out.

31 An Example: SILAC SGD
- Several pathways may be identified; we select the pathway of interest.

32 An Example: SILAC SGD
- Identified genes are shown in our chosen default color (red).
- Known yeast genes that were not identified are shown in pale green.

33 An Example: SILAC SGD

34 and back to pathway analysis

35 Example of stages

36 Example: microRNA network in REH/MSC cells

51 A Whole-Cell Computational Model Predicts Phenotype from Genotype. Literature review, Roman Laskowski, 19th September 2011

52 At skripta/bin/cell-wholesimulation.pdf - the article itself; skripta/bin/wholecellsim.mp4 - video

53 Model organism: M. genitalium

54 Whole-cell simulation: M. genitalium, 525 genes

55 Whole-cell simulation: previous methods
- ODEs (ordinary differential equations): difficulty of obtaining model parameters
- Boolean network modelling: fewer parameters required
- Constraint-based modelling: not practical for whole-cell models
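
To make the "fewer parameters required" point concrete, here is a minimal Boolean network sketch. Each gene is simply on or off, and the only "parameters" are the logic rules; no rate constants are needed. The three genes and their rules are hypothetical, chosen only to show a synchronous update, and are not taken from the reviewed paper.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical regulatory logic, for illustration only.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B together activate C
}


def step(state: State) -> State:
    """Synchronous update: every rule reads the same current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def trajectory(state: State, n_steps: int) -> List[State]:
    """Return the initial state followed by n_steps successive states."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


start = {"A": True, "B": False, "C": False}
states = trajectory(start, 5)
for t, s in enumerate(states):
    print(t, "".join("1" if s[g] else "0" for g in "ABC"))
```

With these toy rules the network cycles back to its starting state after five updates, a simple example of the attractors that Boolean models are used to study.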

56 Cellular function models
- Each timestep is 1 s
- Modules use the current cell variables to calculate their effect on them
- Loop until the cell divides
- Poisson processes
- Flux-balance analysis
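
The loop on slide 56 can be sketched as follows: at each 1 s timestep every module reads the same current cell state, proposes a change, and the changes are applied before the next step. This is a toy under stated assumptions, not the published model. The two modules, the rate constants, the single `protein` variable, and the "divide when protein has doubled" criterion are all hypothetical stand-ins. Poisson sampling uses Knuth's method because the Python standard library has no Poisson sampler.

```python
import math
import random

SYNTHESIS_RATE = 2.0    # hypothetical mean proteins made per second
DEGRADATION_RATE = 1.0  # hypothetical mean proteins lost per second


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson-distributed count with mean lam (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def synthesis_module(cell: dict, rng: random.Random) -> dict:
    return {"protein": poisson(SYNTHESIS_RATE, rng)}


def degradation_module(cell: dict, rng: random.Random) -> dict:
    lost = min(cell["protein"], poisson(DEGRADATION_RATE, rng))
    return {"protein": -lost}


MODULES = [synthesis_module, degradation_module]


def simulate(seed: int = 0, max_steps: int = 100_000) -> dict:
    rng = random.Random(seed)
    cell = {"time_s": 0, "protein": 1000}
    division_threshold = 2 * cell["protein"]  # toy division criterion
    while cell["protein"] < division_threshold and cell["time_s"] < max_steps:
        # Every module sees the same state; changes are applied afterwards.
        deltas = [module(cell, rng) for module in MODULES]
        for delta in deltas:
            for name, change in delta.items():
                cell[name] += change
        cell["time_s"] += 1  # one timestep = 1 s
    return cell


final_cell = simulate()
print(f"divided after {final_cell['time_s']} s with {final_cell['protein']} proteins")
```

Computing all module outputs from the same snapshot before applying them keeps the result independent of the order in which modules are listed.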

57 End of simulation: when the pinched diameter reaches zero

58 Overview

59 Validation
- Simulated 128 cells in a typical Mycoplasma culture environment
- Predictions:
  - Cell properties: cell mass, growth rate
  - Molecular properties: count, localization, activity

60 Training
- Observed doubling time
- Cellular chemical composition
- Major cell mass fractions

61 Independent validation 1. Metabolic fluxes

62 Independent validation 2. Metabolite concentrations

63 Independent validation 3. Bursts of protein synthesis
- Caused by:
  - intermittent mRNA expression
  - availability of amino acids following protein degradation

64 Independent validation 4. Copy number distribution

65 Protein-DNA interactions
- The model has 30 DNA-binding proteins.
- The chromosome is explored very quickly: 50% of the chromosome is bound by 1 or more proteins within the first 6 min, and 90% within 20 min.
- RNA polymerase binds 90% of the chromosome within 49 min.
- 90% of genes are expressed within the first 143 min.

66 DNA replication

67 Protein-protein collisions on the chromosome
- Over 30,000 collisions occur per cell cycle.
- Nearly 1 protein is displaced from the chromosome per second.
- Most collisions are caused by RNA polymerase (84%) and DNA polymerase (8%).
- The most commonly displaced proteins are structural maintenance of chromosome (SMC) proteins (70%) and single-stranded binding proteins (6%).

68 Rate of DNA replication
- Initial rapid DNA replication
- Acts as a control on cell cycle duration
- Rate limited by available dNTP (deoxyribonucleotide triphosphate)

69 Synthesis of energy storers
- Mainly used in the production of protein and mRNA

70 Waste of energy
- 44% discrepancy between the synthesis and use of ATP and GTP

71 Knock-out simulations
- Knocked out each of the 525 genes in turn
- Found 284 genes to be essential for growth and division, and 117 nonessential
- Knock-outs of essential genes were either unable to produce one of the crucial biomass components or prevented division
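
The logic of a single-gene knock-out screen can be sketched with a toy viability rule that mirrors the two failure modes on the slide: a knock-out is lethal if a required biomass component can no longer be produced, or if division is prevented. The six genes, their roles, and the function names below are hypothetical. The published model decides viability by running the full simulation, not by a lookup like this.

```python
from typing import Dict, List, Set, Tuple

REQUIRED_COMPONENTS: Set[str] = {"protein", "lipid", "dna"}

# Hypothetical gene -> biomass components it helps produce.
GENE_PRODUCES: Dict[str, Set[str]] = {
    "g1": {"protein"},
    "g2": {"lipid"},
    "g3": {"lipid"},  # redundant with g2
    "g4": {"dna"},
    "g5": set(),      # no role in this toy model
}
DIVISION_GENES: Set[str] = {"g6"}  # hypothetical gene needed for division
ALL_GENES: Set[str] = set(GENE_PRODUCES) | DIVISION_GENES


def is_viable(active_genes: Set[str]) -> bool:
    """Viable if every biomass component is made and division is possible."""
    produced: Set[str] = set()
    for gene in active_genes:
        produced |= GENE_PRODUCES.get(gene, set())
    can_divide = DIVISION_GENES <= active_genes
    return REQUIRED_COMPONENTS <= produced and can_divide


def knockout_screen() -> Tuple[List[str], List[str]]:
    """Knock out each gene in turn; return (essential, nonessential)."""
    essential, nonessential = [], []
    for gene in sorted(ALL_GENES):
        if is_viable(ALL_GENES - {gene}):
            nonessential.append(gene)
        else:
            essential.append(gene)
    return essential, nonessential


essential, nonessential = knockout_screen()
print("essential:", essential)
print("nonessential:", nonessential)
```

Note how `g2` and `g3` each come out as nonessential because the other one covers for it. A single-gene screen cannot reveal that kind of redundancy.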

72 Knock-out studies

73 Use of the model for biological discovery
- Experimentally measured the growth rates of 12 single-gene-disruption strains
- 2/3 of the growth rates matched the predicted rates
- Investigation of the discrepancies led to new insights into the organism's biology
- However, the model should be considered just a first draft
- Plus, M. genitalium is not as experimentally tractable as E. coli