Microbiome Analysis in Kawasaki Disease Kristine M. Wylie, Susan C. Baker, George M. Weinstock, Stanford T. Shulman, and Anne H. Rowley
Disclosures Kristine M. Wylie Microbiome Analysis in Kawasaki Disease No relevant financial relationships exist
Evidence for a viral etiology for Kawasaki Disease Epidemiologic Epidemics with geographic spread Seasonal predominance in non-temperate climates Clinical History of preceding respiratory symptoms Failure of antibiotic therapy Ultrastructural Intracytoplasmic inclusion bodies and virus-like particles detected in the bronchial epithelium of KD tissues but not controls (oligoclonal IgA) Gene expression Up-regulated cytotoxic T-cell and type I IFN pathways in coronary arteries from KD patients compared with controls (Presented at this meeting by Anne Rowley)
High-throughput metagenomic shotgun sequencing (MSS) A powerful approach for identifying pathogens associated with diseases of unknown etiologies No a priori knowledge of the pathogen required No culturing needed Insensitive to minor sequence variation that can confound molecular tests
MSS overview Sample collection and processing Frozen and FFPE tissues used Nucleic acid extracted (can use DNA/RNA/TNA) We focused on RNA due to previous data that suggested the etiologic agent is an RNA virus Library construction and sequencing cdna made from RNA Sheared to uniform size Sequencing library adapters ligated to ends Data generated on Illumina HiSeq platform 2x100 paired-end sequencing: 100 bp sequenced from each end of the fragment Sequence analysis Described on next slide Interpretation and additional follow up
MSS analysis overview Initial processing 100 base sequences Trim low quality, mask low complexity, etc. Known sequences Related to known sequences Unclassified sequences Assembly of overlapping sequences Alignment against reference genomes Alignments o Nucleotide o Translated amino acid o HMM Short sequence reads Contig (consensus sequence) Classification: human, bacterial, fungal, viral, other Unclassified sequences
Identify unclassified sequences that are shared among samples
Application of MSS to identifying viral pathogens in diseases of unknown etiologies Subject was 20-month-old boy with fever and petechial rash, no other pathogenic agents detected Plasma sequencing yielded sequences with remote similarity to astrovirus MLB-1 While this was being investigated, we learned astrovirus MLB-2 had been discovered and was being characterized by Dave Wang s group at WU. We obtained the complete genome sequence of MLB-2 from them. 238 reads were identified with > 80% identity to MLB-2, confirmed by specific RT-PCR MLB-2 was not found in 188 other samples screened by consensus astrovirus PCR First detection of an astrovirus in blood unexpected pathogen Holtz, L. R. et al. Astrovirus MLB2 viremia in febrile child. Emerging Infect. Dis. 17, 2050 2052 (2011). Wylie, K. M., et al. Sequence analysis of the human virome in febrile and afebrile children. PLoS ONE 7, e27735 (2012).
Application to Kawasaki Disease samples sequenced Cases number of sample Controls number of sample Tissue type Coronary artery (CA) 7 7 FFPE Serum 11 10 Frozen Throat 10 4 Frozen Lung 8 0 FFPE Heart 3 1 FFPE Spleen 1 0 Frozen 4.9 billion sequence reads generated Minimum 32 million Maximum 500 million Non-human sequences ranged from <1% (serum) to 99% (throat)
Virus discovery in sequence data Color Key 0 40 80 Value Case Control Metastats does not find any differentially represented viral candidates Human viruses do not associate with KD Reagent or environmental contaminants Parvo-like hybrid (Naccache, et al, PNAS) Many plant viruses Also likely contaminants Human.rhinovirus Rotavirus.A Anellovirus Human.parainfluenza.virus.1 Human.parechovirus.6 Human.parvovirus.B19 Paramecium.bursaria.Chlorella Parvo.like.hybrid.virus Acute.bee.paralysis Aphid.lethal.paralysis.virus Circovirus.like.genome No.genus..species..Brome.mosaic.virus Cherry.leaf.roll Cucumber.green.mottle Cucumber.mosaic.virus Lettuce.big.vein.associated Mirafiori.lettuce.big.vein Mirafiori.lettuce.virus Papaya.ringspot.virus Peanut.stunt.virus Pepper.mild.mottle.virus Tobacco.mosaic.virus Tomato.chlorotic.spot Tomato.mosaic.virus Tomato.spotted.wilt.virus Zucchini.yellow.mosaic.virus 9374X1.CASE_HEART 9374X3.CASE_HEART 9374X8.CASE_HEART 9374X6.CTRL_HEART D 10229X4.CASE_LUNG 10535X1.CASE_LUNG 10535X4.CASE_LUNG 9374X2.CASE_LUNG 9374X4.CASE_LUNG 9374X5.CASE_LUNG 9374X7.CASE_LUNG KD1.CASE_LUNG KD3.CASE_LUNG C 10535X2.CASE_THROAT KD2.CASE_THROAT 10535X3.CTRL_THROAT B 8898X6.CASE_SPLEEN A 62UW4AAXX_8.CASE_SERUM 630WBAAXX_2.CASE_SERUM 70EY4AAXX_2.CASE_SERUM 70EY4AAXX_3.CASE_SERUM 70EY4AAXX_4.CASE_SERUM 70EY4AAXX_5.CASE_SERUM 70EY4AAXX_6.CASE_SERUM 70EY4AAXX_7.CASE_SERUM 9639X1.CASE_SERUM 9639X2.CASE_SERUM 9785X1.CASE_SERUM 62VDFAAXX_5.CTRL_SERUM 630WBAAXX_1.CTRL_SERUM 630WBAAXX_3.CTRL_SERUM 630WBAAXX_4.CTRL_SERUM 630WBAAXX_5.CTRL_SERUM 630WBAAXX_6.CTRL_SERUM 630WBAAXX_7.CTRL_SERUM 630WBAAXX_8.CTRL_SERUM 70EY4AAXX_1.CTRL_SERUM 70EY4AAXX_8.CTRL_SERUM E 10229X1.CASE_CA 10229X2.CASE_CA 10229X3.CASE_CA 8502X3.CASE_CA 8502X4.CASE_CA 8898X1.CASE_CA 8898X2.CASE_CA 8502X1.CTRL_CA 8502X2.CTRL_CA 8898X3.CTRL_CA 8898X4.CTRL_CA 8898X5.CTRL_CA 8898X7.CTRL_CA 8898X8.CTRL_CA Heart Lung Throat Spleen Serum Coronary artery
Bacterial/fungal discovery in sequence data Color Key Metastats does not find any differentially represented bacterial or fungal candidates Many low level background bacteria/fungi (a smaller representative set shown here) Some known pathogens, but not associated with KD Many organisms that look like environmental or kit contaminants, but not associated with KD 0 40 Value Stenotrophomos.maltophilia.K279a Methanococcus.sp. Ruminococcus.sp. Streptobacillus.moniliformis.DSM.12112 Mollicutes.bacterium.D7 Prochlorococcus.marinus Acinetobacter.sp. Candida.albicans Staphylococcus.sp. Lactobacillus.gasseri.224.1 Fusobacterium.sp. Mycoplasma.sp. Clostridium.sp. Propionibacterium.acnes Streptococcus.sp. Neisseria.sp. Heamophilus.sp. Lindgomyces.breviappendiculatus Lentithecium.aquaticum Boeremia.exigua Teratosphaeria.fibrillosa Leptosphaeruli.australis Endosporium.populi.tremuloidis Westerdykella.cylindrica Zygosaccharomyces.fermentati Dissoconium.commune Hysterobrevium.mori Mytilinidion.resinicola Phoma.betae Aliquandostipite.khaoyaiensis Mycosphaerella.graminicola Dothidea.insculpta Case Control 9374X1.CASE_HEART 9374X3.CASE_HEART 9374X8.CASE_HEART 9374X6.CTRL_HEART D 10229X4.CASE_LUNG 10535X1.CASE_LUNG 10535X4.CASE_LUNG 9374X2.CASE_LUNG 9374X4.CASE_LUNG 9374X5.CASE_LUNG 9374X7.CASE_LUNG KD1.CASE_LUNG KD3.CASE_LUNG C KD2.CASE_THROAT 10535X3.CTRL_THROAT B 8898X6.CASE_SPLEEN A 62UW4AAXX_8.CASE_SERUM 630WBAAXX_2.CASE_SERUM 70EY4AAXX_2.CASE_SERUM 70EY4AAXX_3.CASE_SERUM 70EY4AAXX_4.CASE_SERUM 70EY4AAXX_5.CASE_SERUM 70EY4AAXX_6.CASE_SERUM 70EY4AAXX_7.CASE_SERUM 9639X1.CASE_SERUM 9639X2.CASE_SERUM 9785X1.CASE_SERUM 62VDFAAXX_5.CTRL_SERUM 630WBAAXX_1.CTRL_SERUM 630WBAAXX_3.CTRL_SERUM 630WBAAXX_4.CTRL_SERUM 630WBAAXX_5.CTRL_SERUM 630WBAAXX_6.CTRL_SERUM 630WBAAXX_7.CTRL_SERUM 630WBAAXX_8.CTRL_SERUM 70EY4AAXX_1.CTRL_SERUM 70EY4AAXX_8.CTRL_SERUM E 10229X1.CASE_CA 10229X3.CASE_CA 8502X3.CASE_CA 8502X4.CASE_CA 8898X1.CASE_CA 8898X2.CASE_CA 8502X1.CTRL_CA 8502X2.CTRL_CA 8898X3.CTRL_CA 8898X4.CTRL_CA 8898X5.CTRL_CA 8898X7.CTRL_CA 8898X8.CTRL_CA F Heart Lung Throat Spleen Serum Coronary artery
Discovery in unclassified contig data Metastats used to identify contigs more common in KD than controls Subset only in KD and Prioritizing for follow up not in controls shown here Contigs Cases Controls 10535X1.CASE.L 9374X5.CASE_L 9374X5.CASE_L 9374X5.CASE_L 9374X5.CASE_L KD1.CASE_LUN KD1.CASE_LUN KD1.CASE_LUN KD1.CASE_LUN KD1.CASE_LUN KD1.CASE_LUN KD1.CASE_LUN KD1.CASE_LUN
Current candidates Best candidates are sequences predominantly in patient samples but not controls Could be highly divergent from known viruses Could be very low abundance in samples Current work with existing data: Continue to try to classify sequences with updated databases Validate candidates with real-time RT-PCR assays on a larger set of cases controls Proceed cautiously due to confounding factors associated with pathogen discovery in MSS data
Confounding factors associated with MSS analysis Can find microbes (known or novel) that are contaminants from reagents, water, enzymes, etc. Contaminants can sometimes associate with a clinical phenotype by chance Example: If control samples are typically higher biomass (signal), case samples will have a higher representation of contaminants (noise) While we aim for relatively unbiased representation of the metagenome, biases are introduced during sample preparation Classifying sequences based on similarity to reference genomes is limited by the quality of the databases Missing entries Mis-labeled entries
Summary and conclusions Most comprehensive ultra-deep sequencing study to identify a causative agent of KD to date No etiologic agent of KD identified yet, but we are pursuing candidates We suspect that sequences of the agent are presently unclassified Prioritizing ORFs associated with KD samples for further study Possible challenges the agent is highly divergent from sequenced reference genomes causes a very early viremia is highly cell-associated is present in very low quantity in patient tissues may also be present in some controls. Our work highlights the challenges of identifying a new viral agent
Support for this work National Institutes of Health HL63771, HL109955, AI106030 (to AHR), the Max Goldenberg Foundation, and the Center for Kawasaki Disease at the Ann & Robert H. Lurie Children's Hospital of Chicago.