Advances in genomic technologies and understanding infection

1 Advances in genomic technologies and understanding infection. Mick Watson, Director of ARK-Genomics, The Roslin Institute

2 Structure: The Roslin Institute; challenges in food security; genetics and genomics; technologies (sequencing and arrays); applications; the future.

3 The Roslin Institute: livestock genetics, animal health, biotech and human health, with bioscience underpinning health and food security.

4 The Roslin Institute: Aims. Enhance animal health and welfare through knowledge of animal genetics. Enhance the sustainability and productivity of livestock systems. Enhance food safety by understanding host-pathogen interactions. Enhance human health through an understanding of disease. Identify new and emerging zoonoses. Enhance quality of life for farmed animals. Highlights: Wilmut et al (1997) Viable offspring derived from fetal and adult mammalian cells. Nature 385(6619):810-3. Zhao et al (2010) Somatic sex identity is cell autonomous in the chicken. Nature 464(7286). Lyall et al (2011) Suppression of avian influenza transmission in genetically modified chickens. Science 331(6014):223-6.

5 CHALLENGES IN FOOD SECURITY

7 GENETICS AND GENOMICS

8 Genetics and genomics in a nutshell: what and why? Identify a heritable trait of interest: disease resistance, strength of immune response, response to vaccine, production traits (meat quality, weight, % fat etc). Identify genomic variation. What causes it? Genetic markers, identified using SNP genotyping and GWAS. Why does it happen? Molecules and pathways, identified using functional genomics: RNA-Seq, ChIP-Seq, microarrays etc.

9 Timeline of genetics: (1) sequence the reference genome; (2) re-sequence interesting breeds and compare them to the reference; (3) design a whole-genome variation chip; (4) screen many individuals using the SNP chip (case/control); (5) associate genetic markers with the trait.

10 THE TECHNOLOGIES

11 Next-generation sequencing: ultra-high-throughput, characterised by many millions of short reads. Dominated by 3 companies: Roche (454 FLX, pyrosequencing), Illumina/Solexa (sequencing by synthesis, SBS) and ABI (SOLiD) / Ion Torrent.

12 SEQUENCING: THE BIG BOYS

13 Roche 454: complex emulsion-PCR library prep. Read lengths average bp. Typically 1 million reads per run; a run takes 10 hours; 21 days to create 23Gb of data. FLX+ out now: read length averages 700bp. Both have paired-end. Homopolymer problems.

14 ABI SOLiD: reads are in colour space; each colour indicates the incorporation of a dinucleotide. Current machine is the 5500xl. Read lengths are 35bp, 60bp or 75bp: 75bp (fragment), 75bp x 35bp (paired-end), up to 60bp x 60bp (mate-paired). 4.8 billion paired-end reads per run. A run takes 7 days; just over half a day to create 23Gb of sequence.
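Colour-space encoding can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard SOLiD 2-base encoding table, which can be computed as the XOR of 2-bit base codes (A=0, C=1, G=2, T=3), and assuming the first base of the read is known (on the instrument this comes from the primer). It shows why a single colour error changes every downstream base when a read is decoded naively, which is why colour-space reads are usually aligned in colour space.

```python
from typing import List

# Assumption: 2-bit base codes A=0, C=1, G=2, T=3; the colour of each
# adjacent base pair is the XOR of the two codes (the standard SOLiD table).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode(seq: str) -> List[int]:
    """Return one colour (0-3) per overlapping dinucleotide in seq."""
    codes = [BASE_TO_CODE[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode(first_base: str, colours: List[int]) -> str:
    """Rebuild bases from a known first base: next code = previous code XOR colour."""
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for colour in colours:
        code ^= colour
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


colours = encode("ATGGA")
print(colours)                    # [3, 1, 0, 2]
print(decode("A", colours))       # ATGGA
# A single wrong colour (1 -> 2) corrupts every base after it:
print(decode("A", [3, 2, 0, 2]))  # ATCCT
```

Note that a homodimer (AA, CC, GG, TT) always gives colour 0, so the colour alone never identifies a base; the known first base anchors the whole read.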

15 Illumina (Solexa): bridge PCR and sequencing by synthesis on dense arrays. Read lengths are 36, 57, 78, 101 and 150bp (GAIIx) and 35, 50, 75 and 100bp (HiSeq 2000). Up to 640 million paired-end reads (GAIIx); up to 4.8 billion paired-end reads (HiSeq 2000).

16 SEQUENCING: THE BENCHTOPS

17 Roche 454 Junior: same technology as the 454 FLX. Read length: 400 bases; 100,000 reads; 12 hours; output 35-70Mb.

18 LifeTech: Ion Torrent. Semiconductor-based sequencing: records the release of hydrogen ions (H+) when bases are incorporated. 2 hours of sequencing. Needs an extra kit for library prep. bp reads; 100 million reads. Homopolymer problems.
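The homopolymer problem shared by the flow-based platforms (454 and Ion Torrent) can be illustrated with a toy model. In this sketch, assuming a simple repeating flow order TACG (real instruments may use other orders) and integer signals in the noise-free case, each flow of one nucleotide incorporates the whole run of that base, so the signal scales with the run length. Calling bases by rounding the signal means that a modest error on a long run's signal becomes an insertion or deletion. The function names are illustrative only.

```python
from typing import List


def ideal_flowgram(seq: str, flow_order: str = "TACG") -> List[int]:
    """Noise-free signal per flow: the length of the homopolymer run
    incorporated when that nucleotide is flowed (0 if the next base differs)."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases not in the flow order")
    signals = []
    i, flow = 0, 0
    while i < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        signals.append(run)
        flow += 1
    return signals


def call_bases(signals: List[float], flow_order: str = "TACG") -> str:
    """Naive base calling: round each flow signal to a homopolymer length."""
    return "".join(
        flow_order[flow % len(flow_order)] * max(0, round(s))
        for flow, s in enumerate(signals)
    )


clean = ideal_flowgram("TTACCCG")
print(clean)                             # [2, 1, 3, 1]
print(call_bases(clean))                 # TTACCCG
# A slight overshoot on the 3-base C run reads as 4 Cs: an insertion error.
print(call_bases([2.1, 0.9, 3.6, 1.0]))  # TTACCCCG
```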

19 Illumina MiSeq: same technology as the HiSeq; read lengths include 100 or 150bp; paired-end. 1 x 35bp takes 4 hours (total run time 8 hours); 2 x 150bp takes 27 hours. 3.4M paired reads; 1-2Gb of data.
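The throughput figures on these slides follow from simple arithmetic: bases per run = reads (or read pairs) x bases per read, and the time to reach a target yield scales with run length. A minimal sketch; the helper names are illustrative, and reading the SOLiD "4.8 billion paired-end reads" as 2.4 billion pairs of 75bp + 35bp is an assumption.

```python
def run_yield_bases(n_pairs: float, read1_len: int, read2_len: int = 0) -> float:
    """Bases from one run: each fragment or pair yields read1_len + read2_len bases."""
    return n_pairs * (read1_len + read2_len)


def days_to_target(target_bases: float, bases_per_run: float, run_days: float) -> float:
    """Days of back-to-back runs needed to reach target_bases (fractional runs allowed)."""
    return target_bases / bases_per_run * run_days


# MiSeq, 2 x 150bp with 3.4 million pairs.
miseq = run_yield_bases(3.4e6, 150, 150)
print(f"MiSeq run: {miseq / 1e9:.2f} Gb")

# SOLiD 5500xl, assuming 2.4 billion 75bp x 35bp pairs in a 7-day run.
solid = run_yield_bases(2.4e9, 75, 35)
print(f"SOLiD days to 23Gb: {days_to_target(23e9, solid, 7):.2f}")
```

Under these assumptions the MiSeq run comes out at about 1.0Gb, within the quoted 1-2Gb, and SOLiD reaches 23Gb in roughly 0.6 days, in line with "just over half a day".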

20 SEQUENCING SINGLE MOLECULES

21 Helicos HeliScope: the first single-molecule sequencer (SMS) produced. 25-55bp reads (average 35bp); 600M to 1 billion reads per run; 21-35Gb per run. Raw error rates: substitution 0.2%, insertion 1.5%, deletion 3.0%.
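To get a feel for these rates, here is a back-of-envelope sketch. It assumes that errors strike each base independently and that the three per-base rates simply add; both are simplifications. Under those assumptions a 35bp read is expected to carry about 1.6 errors, and fewer than one read in five is error-free.

```python
# Per-base raw error rates quoted for the HeliScope.
RATES = {"substitution": 0.002, "insertion": 0.015, "deletion": 0.030}


def expected_errors(read_len: int, rates: dict = RATES) -> float:
    """Expected number of errors in a read: read length x summed per-base rate."""
    return read_len * sum(rates.values())


def p_error_free(read_len: int, rates: dict = RATES) -> float:
    """Probability that a read has no errors, assuming independent per-base errors."""
    return (1.0 - sum(rates.values())) ** read_len


print(f"{expected_errors(35):.3f}")  # 1.645
print(f"{p_error_free(35):.3f}")
```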

22 Pacific Biosciences RS: PacBio report mean read lengths of 2.3kb (maximum 17kb); others get 1.7kb (maximum 6kb). Raw reads are 85% accurate. Circularizing the template gives 95% accuracy, but shorter read length. Errors are again mainly indels.
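Why circularizing helps can be shown with a deliberately simplified model. Suppose the same base is read in n independent passes, each correct with probability p, and the consensus takes the majority call; consensus accuracy then rises quickly with n. This sketch ignores that PacBio errors are mostly indels (which need alignment before any voting) and treats every wrong call as disagreeing with the truth. It illustrates the trend, not PacBio's actual consensus algorithm.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """P(the majority of n independent reads of a base is correct), n odd."""
    if n % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1)
    )


for passes in (1, 3, 5, 7):
    print(passes, majority_accuracy(0.85, passes))
```

With p = 0.85, three passes already give about 94% in this model, which is the same order as the 95% quoted for circularized reads.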

23 ARRAYS: SNP AND EXPRESSION

24 High-density SNP arrays: far cheaper to assay via SNP array than to sequence. High-density SNP arrays are available for cattle (Illumina, 770,000 SNPs; Affymetrix, 648,000 SNPs) and chicken (release 2011/12, 700,000 SNPs). Smaller SNP chips are available for sheep, pig etc. Affymetrix GeneTitan: 192 samples per run; 700k x 192 = 134M assays per run.

25 Microarrays: still used! Cheap! Dominated by Affymetrix, Agilent, Nimblegen, Illumina and OGT. Gene expression, array CGH, microarrays etc.

26 ARK-GENOMICS

27 ARK-Genomics: a high-throughput facility focusing on the genetics and genomics of animals. Based at the Roslin Institute, University of Edinburgh. Offering research, collaborations and service provision. Investing in the latest genomics technologies: sequencing, genotyping, transcriptomics, comparative genomics, bioinformatics.

28 Technologies. DNA sequencing: Illumina sequencing, up to 150bp paired; novel genomes, resequencing, RNA-Seq, ChIP-Seq, epigenetics; Illumina HiSeq 2000; Sanger 3730. Genotyping: Illumina, from HD to custom chips (iScan, Infinium; BeadXpress, GoldenGate; BeadChip); Affymetrix GeneTitan, Axiom, processing 96 arrays per run. Microarrays: gene expression (Affymetrix, Agilent, Illumina): whole genome, exon-level, microRNA; CGH, ChIP-chip, MeIP (Nimblegen).

29 Bioinformatics: genome sequencing (align to a reference, de novo genome assembly, SNP discovery); RNA-Seq or microarray analysis; microRNA analysis; ChIP-Seq peak analysis; CNV detection; pathway/GO term enrichment; data integration; network analysis; systems biology.

30 APPLICATIONS

31 Applications of NGS: novel genome sequencing; re-sequencing to discover genome variation; targeted re-sequencing; exome sequencing; high-throughput pathogen sequencing, including metagenomics; RNA-Seq; ChIP-Seq; microRNA-seq; methylated DNA sequencing.

32 Genome-wide association (GWAS): assay SNPs across the genome (millions) in as many samples as possible (thousands), using a case/control study design, to discover genetic variants associated with traits, e.g. disease. Uses SNP arrays. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447(7145):661-78.
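A case/control scan ultimately runs one association test per SNP. A common choice is the allelic chi-square test on a 2x2 table of allele counts. The sketch below is a minimal version of that test (not the WTCCC pipeline), assuming allele counts have already been tallied from the genotypes. With 1 degree of freedom the p-value is erfc(sqrt(chi2 / 2)).

```python
import math


def allelic_chi2(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Pearson chi-square (1 df) on allele counts; returns (chi2, p_value).

    case_a/case_b: counts of alleles A and B among cases; ctrl_*: among controls.
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    total = case_a + case_b + ctrl_a + ctrl_b
    row = [sum(r) for r in table]
    col = [case_a + ctrl_a, case_b + ctrl_b]
    if 0 in row or 0 in col:
        # Monomorphic SNP or empty group: no information, so no association.
        return 0.0, 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy counts: allele A at 60% in cases vs 40% in controls (100 alleles each).
chi2, p = allelic_chi2(60, 40, 40, 60)
print(chi2, p)  # 8.0 and roughly 0.0047
```

Because millions of SNPs are tested, a single nominally small p-value like this one is not evidence on its own; genome-wide studies apply a far more stringent multiple-testing threshold.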

33 Genomic selection. Traditional breeding is based on physical/actual measures; simplistically, a big bull and a big cow give big calves. Breeders are interested in breeding value: how fit is any individual? Which should I breed from if I want to optimise particular traits? Genomic selection: genotype individuals at millions of markers; measure all traits quantitatively; create a predictive model that accurately maps SNPs to traits. In future, do not phenotype; simply use the genotype.
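The "predictive model which maps SNPs to traits" can take many forms. One widely used family is ridge-regression SNP-BLUP, which shrinks all marker effects towards zero so that more markers than animals can be fitted. The sketch below uses that technique as an assumed stand-in, written in plain Python on a tiny toy dataset with genotypes coded 0/1/2. The function names are hypothetical, and real analyses with hundreds of thousands of markers would use optimised linear algebra.

```python
def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_snp_effects(X, y, lam=1.0):
    """Ridge regression: solve (Xc^T Xc + lam*I) beta = Xc^T yc on centred data.

    X: genotypes coded 0/1/2 (rows = animals, columns = SNPs); y: phenotypes.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return y_mean, means, _solve(A, rhs)


def predict(model, genotype):
    """Predicted trait value (genomic breeding value) from genotype alone."""
    y_mean, means, beta = model
    return y_mean + sum((g - m) * b for g, m, b in zip(genotype, means, beta))


# Toy training set: 5 phenotyped animals, 2 SNPs.
X = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0]]
y = [3.5, 3.0, 5.5, 4.0, 1.0]
model = fit_snp_effects(X, y, lam=1.0)
print(predict(model, [2, 2]))  # an unphenotyped animal, scored from its genotype
```

Increasing lam shrinks the estimated SNP effects harder; in practice it is tuned, for example by cross-validation.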

34 EXAMPLES OF CURRENT RESEARCH

35 Genome sequencing. The sheep genome: ARK generated 140Gb of sequence from a single Texel ram (45X); the Sheep Genome Consortium is pairing this with 220Gb from BGI. 21 chicken genomes, for IAH and a variety of commercial breeders: SNP discovery leading to improved breeding; functional analysis.
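Sequencing depth, such as the 45X quoted for the sheep genome, is total sequence divided by genome size. Under the idealised Lander-Waterman model (reads placed uniformly at random), the expected fraction of the genome covered at least once at depth C is 1 - e^(-C). A minimal sketch of both calculations:

```python
import math


def depth(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth (X): total sequenced bases / genome size."""
    return total_bases / genome_size


def fraction_covered(c: float) -> float:
    """Lander-Waterman expectation of the fraction of bases covered at least once."""
    return 1.0 - math.exp(-c)


for c in (1, 5, 45):
    print(f"{c}X: {fraction_covered(c):.4f}")
# 1X: 0.6321, 5X: 0.9933, 45X: 1.0000
```

In practice repeats and sequence-dependent biases leave gaps even at high depth, so real coverage falls short of this ideal.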

36 Genome Sequencing: the IAH lines

37 Functional genomics: avian oncogenic viruses. Marek's disease: T-cell lymphoma in poultry (MDV-1). The MDV-1 genome encodes 14 microRNAs and also the Meq oncogene. Transformed cell lines show a predominance of virus-encoded microRNAs [1, 2] and down-regulation of host microRNAs. REV-T: avian reticuloendotheliosis virus, which causes acute leukaemia. Its genome encodes the v-rel oncogene but does not encode any microRNAs. 1. Yao Y, Zhao Y, Xu H, Smith LP, Lawrie CH, Watson M, Nair V: MicroRNA profile of Marek's disease virus-transformed T-cell line MSB-1: predominance of virus-encoded microRNAs. J Virol 2008, 82(8). 2. Yao Y, Zhao Y, Smith LP, Lawrie CH, Saunders NJ, Watson M, Nair VK: Differential expression of miRNAs in Marek's disease virus-transformed T-lymphoma cell lines. J Gen Virol 2009, 90(Pt 7). 3. Yao Y, Zhao Y, Smith LP, Watson M, Nair V: Novel microRNAs encoded by herpesvirus of turkeys (HVT): evidence of miRNA evolution by duplication. J Virol 2009, 83(13).

38 Bioinformatic analysis. Question: which microRNAs are implicated in REV-mediated transformation? Focus on genes differentially expressed between d14 and d0. Used Bioconductor/limma to perform the analysis: 2562 probes were down-regulated; 1613 of these matched an Ensembl gene model; 1215 were predicted to be a target of at least one microRNA. Used CORNA to predict microRNA associations. The mir cluster is also implicated: van Haaften and Agami (2010) Tumorigenicity of the mir cluster distilled. Genes Dev 24(1):1-4.
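The question "which microRNAs are implicated?" is typically framed as an over-representation test: are a microRNA's predicted targets more common among the down-regulated genes than expected by chance? The sketch below implements a one-sided hypergeometric test in plain Python. It is a generic illustration of that kind of test, not CORNA's code, and the counts in the example are made up.

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K are targets of a microRNA.

    N: genes in the background; K: predicted targets in the background;
    n: genes in the list of interest (e.g. down-regulated); k: targets in that list.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Hypothetical counts: 10,000 background genes, 200 targets of one microRNA,
# 1,000 down-regulated genes of which 40 are targets (20 expected by chance).
print(hypergeom_upper_tail(10_000, 200, 1_000, 40))
```

When every microRNA is tested in turn, the resulting p-values need a multiple-testing correction (for example Benjamini-Hochberg) before any microRNA is called significant.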

39 Metagenomics: metagenomics of the gut, the forgotten organ [1]. Aim: to increase yield per kg of food. Rodents lacking gut flora need 30% more calories [2]. Transplanted microbiota result in mice with increased weight despite decreased food consumption [3-7]. Pharmabiotics of farm animals: associate genetics with population structure; associate population structure with diet; associate genetics, diet and population structure with yield. Roslin/ARK-Genomics have several gut metagenome projects in ruminants and avian species.
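Relating gut population structure to diet and yield needs a way to summarise a community. One common summary, used here purely as an illustration (the slide does not say which measures the projects use), is the Shannon diversity index H = -sum(p_i ln p_i), computed from per-taxon read counts:

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


# Hypothetical read counts per taxon for two samples.
even = [250, 250, 250, 250]
skewed = [970, 10, 10, 10]
print(round(shannon_index(even), 4))  # 1.3863 (= ln 4)
print(round(shannon_index(skewed), 4))
```

An evenly spread community scores higher than one dominated by a single taxon, giving a single number per sample that can then be associated with diet, host genetics or yield.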

40 THE FUTURE

41 Genomics grand challenges. What are we missing? E.g. missing chicken chromosomes. RNA species: what do they do? Issues with (de novo) assembly: repeat sequences; CNV, genome structure and structural variants. How many genomes in an individual? Many genomes, many epigenomes! Metagenomes, the forgotten organ. Public storage.

42 Future technologies: third-generation sequencing (TGS), i.e. single-molecule sequencing. The promise: billions of reads; read lengths of 10,000 to 50,000; high accuracy. The current reality: ~80,000 reads; read lengths of 30bp, up to 1000bp; > 5% error rate. Challenges: each observation relates to zero, one or more realities; technology-aware software. Branton et al (2008) Nat. Biotech. 26(10):1146. Schadt et al (2010) Hum. Mol. Gen. 19(R2):R227.

43 Personal/individual genomics: 1000 Genomes data is at petabyte scale. TGS will enable entire scans of genomes, transcriptomes and epigenomes in minutes. Huge data potential: exabyte scale.

44 Thank you! ARK-Genomics: Richard Talbot, Sarah Smith, Alison Downing, David Morrice, Karen Troup, Mark Fell, Frances Turner, Emily Richardson, Caroline Gilhooly. Roslin: Alan Archibald, Pete Kaiser, Dave Burt. IAH: Venu Nair, Yongxiu Yao. Funders: BBSRC, MRC, TSB.