Nature Biotechnology: doi: /nbt Supplementary Figure 1. Number and length distributions of the inferred fosmids.

Supplementary Figure 1 Number and length distributions of the inferred fosmids. Fosmids were inferred by mapping each pool's sequence reads to hg19. We retained only those reads that mapped to within a 3~50 kb region. (a) Fosmid number in each pool. On average, there were ~32 fosmids per pool. (b) Fosmid size. The average length was 36.8 kb.
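The caption for Supplementary Figure 1 does not say how mapped reads were grouped into fosmids. A minimal sketch, assuming reads from one pool on one chromosome are clustered by proximity and that clusters spanning 3~50 kb are kept, might look like this. The function name `infer_fosmids` and the `max_gap` clustering distance are hypothetical and do not come from the source.

```python
def infer_fosmids(read_starts, max_gap=1_000, min_len=3_000, max_len=50_000):
    """Group mapped read start positions into candidate fosmid regions.

    max_gap is an assumed clustering distance; min_len and max_len follow
    the 3~50 kb window quoted in the caption.
    """
    starts = sorted(read_starts)
    if not starts:
        return []
    clusters = []
    begin = prev = starts[0]
    for pos in starts[1:]:
        if pos - prev > max_gap:
            # A large gap closes the current cluster and opens a new one.
            clusters.append((begin, prev))
            begin = pos
        prev = pos
    clusters.append((begin, prev))
    # Keep only clusters whose span is plausible for a fosmid insert.
    return [(b, e) for b, e in clusters if min_len <= e - b <= max_len]


# Toy data: one dense ~35 kb cluster and one tiny cluster of stray reads.
reads = list(range(10_000, 46_000, 500)) + [100_000, 100_200, 100_400]
print(infer_fosmids(reads))
```

In this toy input the small cluster spans only 400 bp, so it is discarded by the length window.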

Supplementary Figure 2 Fosmid physical coverage distribution. The blue curve denotes the theoretical coverage distribution at an average coverage of 8x, and the red curve denotes the actual coverage. The average fosmid coverage was 8x, with a median of 7x. About 7% of YHref was not covered by fosmids, which may be due to a bias in the fosmid library construction and/or sequencing.
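The caption for Supplementary Figure 2 does not state which model produced the theoretical curve. A common choice for randomly placed clones is a Poisson distribution, and the sketch here assumes that model purely for illustration. Under that assumption, a mean coverage of 8x leaves far less than 1% of the genome uncovered, which is consistent with the caption attributing the observed ~7% to library or sequencing bias.

```python
import math


def poisson_pmf(k, mean):
    """Probability that a base is covered exactly k times under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


MEAN_COVERAGE = 8.0  # average fosmid coverage quoted in the caption

# Expected uncovered fraction under the assumed Poisson model.
uncovered = poisson_pmf(0, MEAN_COVERAGE)
print(f"expected uncovered fraction: {uncovered:.4%}")

# Theoretical distribution for coverage depths 0..20.
for depth in range(0, 21, 4):
    print(depth, round(poisson_pmf(depth, MEAN_COVERAGE), 4))
```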

Supplementary Figure 3 Completeness of assembled sequence in each fosmid pool. The horizontal axis represents the percentage of the fosmid sequence that was assembled in each pool. The vertical axis represents the proportion of fosmid pools at that given percentage. In total, 88.5% of the assembled pools contained at least 80% of the fosmid sequence, and 53.2% of the assembled pools contained at least 95% of the fosmid sequence.

Supplementary Figure 4 Contiguity of assembled sequence for individual fosmids. The horizontal axis represents the ratio of the longest assembled sequence to the inferred length of each defined fosmid. The vertical axis represents the proportion of fosmids at the given ratio. 54.7% of fosmids had a longest assembled sequence equal to, or longer than, half of the fosmid length. About 18% of the fosmids were completely assembled.

Supplementary Figure 5 Construction of the haplotype-resolved sequence. The top (orange) bar represents the non-phased YHref sequence and the bottom (multi-color) bar represents the haplotype-resolved output. The middle (blue) bars represent the fosmid assembled haploid (FAH) sequences belonging to the same haplotype.

Supplementary Figure 6 Theoretical N50 length of haplotype phasing and long homozygous regions. a. Long homozygous regions (>=20 kb) for different populations in the 1000 Genomes Project. Asians have more long homozygous regions than other populations. This might be why YH had a shorter haplotype N50 than other individuals sequenced at a comparable fosmid depth. b. The theoretical N50 length distribution of haplotype phasing using the method of the current study, in 4 different individuals. Heterozygous marker numbers are shown at the top-left. The haplotype N50 of YH is expected to be 510 kb with a fosmid coverage of 4x per haplotype (or 8x for a 3 Gb genome).
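For readers unfamiliar with the N50 statistic used in Supplementary Figure 6: N50 is the length L such that blocks of length L or longer together contain at least half of the total length. The sketch here computes it for a list of haplotype block lengths. The function name `n50` and the toy values are illustrative and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths (0 for an empty input)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first block that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    return 0


# Toy block lengths in kb (hypothetical values).
blocks_kb = [2, 3, 4, 5, 6]
print(n50(blocks_kb))  # 6 + 5 = 11 >= 10, so this prints 5
```

Long homozygous regions contain no heterozygous markers to link neighboring blocks, so they break phasing and lower this statistic.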

Supplementary Figure 7 HDG coverage on hg19 and RefSeq genes. Our HDG sequence was aligned to the hg19 genome using Lastz. Coverage of the chromosomes and gene regions was calculated. "Both" means covered by both assembled haplotypes (blue); "Single" means covered by just one assembled haplotype (red). a. Coverage information for each chromosome. b. Proportion of RefSeq genes at a given coverage.

Supplementary Figure 8 Length distributions of insertions and deletions. a. Length distribution of short indels (<10 bp). Peaks at multiples of 3 bp in the exon distribution are expected because such indels do not disturb the reading frame. b. Length distribution of long indels (100 bp~1 kb). The peak at ~300 bp is due to an enrichment for Alu element insertions and deletions. Note that there is no bias between insertions and deletions, which is an improvement over previous studies. c. Distribution of long indels (100 bp~1 kb) in unique versus repeat regions. As expected, there are more indels in the repeat regions and the peak at ~300 bp is more pronounced. d. Length distribution of homozygous and heterozygous long indels (100 bp~1 kb).

Supplementary Figure 9 SNP detection and intersection from different methods/platforms. A total of ~4.0 M SNPs were detected by three different methods/platforms. The majority (68.2%) of these were consistent between all three datasets. However, there were still tens of thousands of method/platform-specific calls.

Supplementary Figure 10 Indel detection and intersection from different platforms. We show the number of small indels detected by each method/platform and their intersection, at a flank size of 50 bp. For the ~1 M indels detected, there was only 27.6% concordance.
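The caption for Supplementary Figure 10 does not spell out how positional concordance at a given flank size was computed. One plausible reading, sketched here as an assumption, is that a call in one set counts as concordant when the other set contains a call on the same chromosome within the flank distance. The function name `concordant_fraction` and the toy call sets are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def concordant_fraction(calls_a, calls_b, flank=50):
    """Fraction of (chrom, pos) calls in calls_a with a calls_b call within `flank` bp."""
    by_chrom = defaultdict(list)
    for chrom, pos in calls_b:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    matched = 0
    for chrom, pos in calls_a:
        positions = by_chrom.get(chrom, [])
        # First calls_b position that is not left of the flank window.
        i = bisect_left(positions, pos - flank)
        if i < len(positions) and positions[i] <= pos + flank:
            matched += 1
    return matched / len(calls_a) if calls_a else 0.0


# Toy call sets (hypothetical positions).
asv_calls = [("chr1", 100), ("chr1", 500), ("chr2", 100)]
wgs_calls = [("chr1", 140), ("chr1", 600), ("chr3", 100)]
print(concordant_fraction(asv_calls, wgs_calls, flank=50))
print(concordant_fraction(asv_calls, wgs_calls, flank=100))
```

Widening the flank can only raise the matched fraction, so plotting it against flank size shows where the concordance levels off.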

Supplementary Figure 11 Example of a heterozygous deletion located inside a gene. This heterozygous deletion was detected by the ASV method but was difficult to find by either WGS resequencing method. The yellow block in the reference is the region that was missing from hap2. Below are the WGS reads aligned to this region. This 151 bp deletion covered the 5'-UTR and a part of exon 1 of the gene PSMD1.

Supplementary Figure 12 Example of a heterozygous insertion located inside a gene. This heterozygous insertion was detected by the ASV method but was difficult to find by either WGS resequencing method. The yellow block in hap1 is the region that was missing from the reference. Below are the WGS reads aligned to this region. Near the breakpoint there were very few reads, perhaps because the inserted sequence influenced the alignment. This 54 bp insertion covered exon 3 of the gene LATS2.

Supplementary Figure 13 Variation rate for YH vs hg19 and heterozygosity between the two haplotypes of YH. The curves at the top and the right summarize the distribution of heterozygosity rates for the two haplotypes of YH and the variation between YH and hg19, respectively. The black line indicates the 99% cutoff for each distribution.

Supplementary Figure 14 The classification of novel gene sequences. a. Classification of different types of novel and gap-covered sequences. i) novel insertion; ii-iv) novel haplotypes; v-vii) gap-covered sequences; viii) orphan scaffolds. b. Distribution of novel sequences based on their length and number, in 100 bp bins. Novel sequences of length >1000 bp accounted for 93% of the total length. The longest was 123 kb. c. Distribution of breakpoints for novel sequences. Most of the novel sequences were in non-coding (intron, repeat and intergenic) regions. Only 0.8% were in CDS regions. These distributions are subdivided by the length of the sequence, represented by the color bars. d. Repeat content based on RepeatMasker.

Supplementary Figure 15 Examples of cis- and trans-acting genes. a. Cis-acting gene DSPP on 4q22.1 encoding dentin sialophosphoprotein. Mutations in DSPP are associated with Dentinogenesis imperfecta, Shields type II, and deafness. b. Trans-acting gene CA9 on 9p13.3. Diseases associated with mutations in CA9 include horseshoe kidney and renal cell carcinoma. GO annotations include carbonate dehydratase activity.

Supplementary Figure 16 Allele-specific methylation and expression. Venn diagram showing the relationship between allele-specific methylation (ASM) and allele-specific expression (ASE). The numbers refer to the gene count. The red/brown circle inside the larger ASM circle represents genes where ASM was detected in the promoter region.

Supplementary Figure 17 Construction of the fosmid libraries. Approximately 30 fosmid clones were cultured together to form a single fosmid pool. Then, 3 g of DNA from each pool was digested, and fragments with insert sizes ranging from 180 to 800 bp were selected. Adapters containing the 11 bp barcode were ligated to these selected fragments to form a single pooled-fosmid library. Barcoded fragments from 60~320 single pooled-fosmid libraries were pooled again (evenly) to create a Stage I barcode library. DNA fragments of sizes between 180 bp and 650 bp (lengths exclude the barcode) from each Stage I barcode library were used to construct two independent libraries (one with small insert sizes and one with intermediate insert sizes). Each library was then PCR amplified with index primers, each of which contained an 8 bp barcode, to form a Stage II barcode library.

Supplementary Figure 18 Indel positional concordance as a function of flank size for the different methods of detection. To determine the best flank size for use in indel detection, we plotted the concordance between the ASV and resequencing-based analyses. The results stabilize above 50 bp.

Supplementary Figure 19 Length distributions for method-specific short indels. Short indels (1-50 bp) detected by only one method/platform were selected and plotted according to their length. The top-right inset shows indels with lengths between 10 and 50 bp.

Supplementary Figure 20 Example of an ASV-specific indel supported by fosmid aligned reads. This was a 3 bp heterozygous deletion in a region covered by fosmids from eight independent pools, two of which supported the deletion.

SUPPLEMENTARY TABLES

Supplementary Table 1. Summary of sequenced genome data
Columns: Sequencing type, Insert size (bp), Read length (bp), Number of reads (M), Raw data (Gbp)
Fosmid library WGS-seq Small (180~300 bp) 93 9, Intermediate (450~650 bp) IL 93 9, , ~2K ~5K ~10K ~20K CG , a
Total - 33,250 2,246
a: gross mapping yield

Supplementary Table 2. Genes in hypervariable regions.xlsx
Supplementary Table 3. Annotation of predicted novel genes.xlsx
Supplementary Table 4. Cis and Trans genes annotation.xlsx
Supplementary Table 5. ASE and ASM gene analysis.xlsx

Supplementary Table 6. Parameters and filter criteria used in Lastz alignment.
Variation types: SNP, Short indel, Inversion/Translocation, Long indel
Alignment parameters:
--strand=both --hspthresh= chain --ambiguous=iupac --gapped --identity=90 --step=8 --word=31 --seed=12of19
"--strand=both --hspthresh= chain --ambiguous=iupac --gapped --ydrop= gap=2000,1 --identity=90 --step=19 --word=31 --seed=12of19"
Filter criteria:
SNP: 1. In the 50 bp flanking region: disallow consecutive N. 2. The distance between any two SNPs must be over 5 bp.
Short indel: 1. In the 50 bp flanking region: disallow consecutive N; disallow any other indel; mismatches <3 bp. 2. It should not be located at the boundary of an alignment block.
Inversion/Translocation: The neighboring alignment blocks must be in a good linear relation.
Long indel: 1. In the 50 bp flanking region: disallow consecutive N; disallow any other indel over 10 bp. 2. It should not be located at the boundary of an alignment block.
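The two SNP filter criteria in Supplementary Table 6 can be sketched as a small post-alignment filter. This is an illustration under stated assumptions, not the authors' implementation: "consecutive N" is read as two or more adjacent N bases, positions are 0-based offsets into a single sequence, and the function name `filter_snps` is hypothetical.

```python
def filter_snps(sequence, snp_positions, flank=50, min_distance=5):
    """Keep SNPs that pass both criteria listed for SNPs in the table.

    1. No run of consecutive N within `flank` bp on either side (assumed: "NN").
    2. No other SNP within `min_distance` bp (the distance must exceed 5 bp).
    """
    positions = sorted(snp_positions)
    kept = []
    for i, pos in enumerate(positions):
        window = sequence[max(0, pos - flank): pos + flank + 1].upper()
        if "NN" in window:
            continue
        close_left = i > 0 and pos - positions[i - 1] <= min_distance
        close_right = i + 1 < len(positions) and positions[i + 1] - pos <= min_distance
        if not (close_left or close_right):
            kept.append(pos)
    return kept


# Toy 200 bp sequence with a two-base N run at offsets 150-151.
toy = "ACGT" * 50
toy = toy[:150] + "NN" + toy[152:]
print(filter_snps(toy, [10, 13, 60, 120, 160]))  # prints [60]
```

In the toy input, the SNPs at 10 and 13 are 3 bp apart and both fail the distance rule, while those at 120 and 160 fall within 50 bp of the N run.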

Supplementary Table 7. Summary of re-sequencing based variations.
Columns: CG, HS
SNP
All 3,411,305 3,365,182
Ti/Tv
hete/homo
dbSNP137 3,368,094 3,322,168
novel 43,211 43,014
coding 21,054 19,763
Nonsynonymous 9,583 9,184
Indel
All 510, ,679
hete/homo
dbSNP , ,733
novel 179, ,946
coding
frameshift

Supplementary Table 8. Summary of detected novel sequence
Columns: Hap1 (Logic) Number, Length (bp); Hap2 (Logic) Number, Length (bp); XY Number, Length (bp)
All (un-redundant) 1,367 3,934,838 1,335 3,344, ,484
Novel insertion , , ,702
Novel haplotype 706 2,913, ,287, ,563
Novel (nomadic) , , ,219
Cover reference 'N' 420 3,149, ,016, ,082
*1,183,474 bp of novel sequence is shared between the two haplotypes.

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Contents De novo assembly... 2 Assembly statistics for all 150 individuals... 2 HHV6b integration... 2 Comparison of assemblers... 4 Variant calling and genotyping... 4 Protein truncating variants (PTV)...

More information

The Diploid Genome Sequence of an Individual Human

The Diploid Genome Sequence of an Individual Human The Diploid Genome Sequence of an Individual Human Maido Remm Journal Club 12.02.2008 Outline Background (history, assembling strategies) Who was sequenced in previous projects Genome variations in J.

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Neighbor-joining tree of the 183 wild, cultivated, and weedy rice accessions.

Nature Genetics: doi: /ng Supplementary Figure 1. Neighbor-joining tree of the 183 wild, cultivated, and weedy rice accessions. Supplementary Figure 1 Neighbor-joining tree of the 183 wild, cultivated, and weedy rice accessions. Relationships of cultivated and wild rice correspond to previously observed relationships 40. Wild rice

More information

Supplementary Figures

Supplementary Figures Supplementary Figures A B Supplementary Figure 1. Examples of discrepancies in predicted and validated breakpoint coordinates. A) Most frequently, predicted breakpoints were shifted relative to those derived

More information

Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz

Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Table of Contents Supplementary Note 1: Unique Anchor Filtering Supplementary Figure

More information

Processing Ion AmpliSeq Data using NextGENe Software v2.3.0

Processing Ion AmpliSeq Data using NextGENe Software v2.3.0 Processing Ion AmpliSeq Data using NextGENe Software v2.3.0 July 2012 John McGuigan, Megan Manion, Kevin LeVan, CS Jonathan Liu Introduction The Ion AmpliSeq Panels use highly multiplexed PCR in order

More information

Parts of a standard FastQC report

Parts of a standard FastQC report FastQC FastQC, written by Simon Andrews of Babraham Bioinformatics, is a very popular tool used to provide an overview of basic quality control metrics for raw next generation sequencing data. There are

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity Supplementary Figure 1 Read Complexity A) Density plot showing the percentage of read length masked by the dust program, which identifies low-complexity sequence (simple repeats). Scrappie outputs a significantly

More information

Map-Based Cloning of Qualitative Plant Genes

Map-Based Cloning of Qualitative Plant Genes Map-Based Cloning of Qualitative Plant Genes Map-based cloning using the genetic relationship between a gene and a marker as the basis for beginning a search for a gene Chromosome walking moving toward

More information

Genomic resources. for non-model systems

Genomic resources. for non-model systems Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing

More information

Nature Methods: doi: /nmeth Supplementary Figure 1. Ideograms showing scaffold boundaries and segmental duplication locations.

Nature Methods: doi: /nmeth Supplementary Figure 1. Ideograms showing scaffold boundaries and segmental duplication locations. Supplementary Figure 1 Ideograms showing scaffold boundaries and segmental duplication locations. Blue lines mark the boundaries of scaffolds. Black marks show the locations of segmental duplications.

More information

Next-generation sequencing technologies

Next-generation sequencing technologies Next-generation sequencing technologies NGS applications Illumina sequencing workflow Overview Sequencing by ligation Short-read NGS Sequencing by synthesis Illumina NGS Single-molecule approach Long-read

More information

Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for

Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for you to discover methylation changes at specific genomic

More information

Release Notes for Genomes Processed Using Complete Genomics Software

Release Notes for Genomes Processed Using Complete Genomics Software Release Notes for Genomes Processed Using Complete Genomics Software Version 1.11.0 Related Documents... 1 Changes to Version 1.11.0... 2 Changes to Version 1.10.0... 6 Changes to Version 1.9.0... 10 Changes

More information

Nature Biotechnology: doi: /nbt.3943

Nature Biotechnology: doi: /nbt.3943 Supplementary Figure 1. Distribution of sequence depth across the bacterial artificial chromosomes (BACs). The x-axis denotes the sequencing depth (X) of each BAC and y-axis denotes the number of BACs

More information

Supplementary Table 1: Oligo designs. A list of ATAC-seq oligos used for PCR.

Supplementary Table 1: Oligo designs. A list of ATAC-seq oligos used for PCR. Ad1_noMX: Ad2.1_TAAGGCGA Ad2.2_CGTACTAG Ad2.3_AGGCAGAA Ad2.4_TCCTGAGC Ad2.5_GGACTCCT Ad2.6_TAGGCATG Ad2.7_CTCTCTAC Ad2.8_CAGAGAGG Ad2.9_GCTACGCT Ad2.10_CGAGGCTG Ad2.11_AAGAGGCA Ad2.12_GTAGAGGA Ad2.13_GTCGTGAT

More information

Next Generation Genetics: Using deep sequencing to connect phenotype to genotype

Next Generation Genetics: Using deep sequencing to connect phenotype to genotype Next Generation Genetics: Using deep sequencing to connect phenotype to genotype http://1001genomes.org Korbinian Schneeberger Connecting Genotype and Phenotype Genotyping SNPs small Resequencing SVs*

More information

T G T A. artificial chimera

T G T A. artificial chimera False mutation detection caused by WA artifacts original genome A C A artifacts in amplified DNA C C A A false detection of local variants true ssnv mistaken as error artificial chimera false LOH early

More information

Nature Methods: doi: /nmeth Supplementary Figure 1. Pilot CrY2H-seq experiments to confirm strain and plasmid functionality.

Nature Methods: doi: /nmeth Supplementary Figure 1. Pilot CrY2H-seq experiments to confirm strain and plasmid functionality. Supplementary Figure 1 Pilot CrY2H-seq experiments to confirm strain and plasmid functionality. (a) RT-PCR on HIS3 positive diploid cell lysate containing known interaction partners AT3G62420 (bzip53)

More information

02 Agenda Item 03 Agenda Item

02 Agenda Item 03 Agenda Item 01 Agenda Item 02 Agenda Item 03 Agenda Item SOLiD 3 System: Applications Overview April 12th, 2010 Jennifer Stover Field Application Specialist - SOLiD Applications Workflow for SOLiD Application Application

More information

Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John DiCarlo Yexun Wang

Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John DiCarlo Yexun Wang Supplementary Materials for: Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John

More information

Fast, Accurate and Sensitive DNA Variant Detection from Sanger Sequencing:

Fast, Accurate and Sensitive DNA Variant Detection from Sanger Sequencing: Fast, Accurate and Sensitive DNA Variant Detection from Sanger Sequencing: Patented, Anti-Correlation Technology Provides 99.5% Accuracy & Sensitivity to 5% Variant Knowledge Base and External Annotation

More information

Figure S1. Schematic representation of the winter VRN-H1 allele from cv. Strider (AY750993) with positions of markers genotyped in this study

Figure S1. Schematic representation of the winter VRN-H1 allele from cv. Strider (AY750993) with positions of markers genotyped in this study Figure S1. Schematic representation of the winter VRN-H1 allele from cv. Strider (AY750993) with positions of markers genotyped in this study indicated. Exons are denoted by black boxes. SNP1 (T-1,948/C),

More information

Supplementary Figure 1: sgrna library generation and the length of sgrnas for the functional screen. (a) A diagram of the retroviral vector for sgrna

Supplementary Figure 1: sgrna library generation and the length of sgrnas for the functional screen. (a) A diagram of the retroviral vector for sgrna Supplementary Figure 1: sgrna library generation and the length of sgrnas for the functional screen. (a) A diagram of the retroviral vector for sgrna expression. It contains a U6-promoter-driven sgrna

More information

Get to Know Your DNA. Every Single Fragment.

Get to Know Your DNA. Every Single Fragment. HaloPlex HS NGS Target Enrichment System Get to Know Your DNA. Every Single Fragment. High sensitivity detection of rare variants using molecular barcodes How Does Molecular Barcoding Work? HaloPlex HS

More information

Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer. Project XX1001. Customer Detail

Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer. Project XX1001. Customer Detail Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer Project XX Customer Detail Table of Contents. Bioinformatics analysis pipeline...3.. Read quality check. 3.2. Read alignment...3.3.

More information

Nature Methods: doi: /nmeth Supplementary Figure 1. Construction of a sensitive TetR mediated auxotrophic off-switch.

Nature Methods: doi: /nmeth Supplementary Figure 1. Construction of a sensitive TetR mediated auxotrophic off-switch. Supplementary Figure 1 Construction of a sensitive TetR mediated auxotrophic off-switch. A Production of the Tet repressor in yeast when conjugated to either the LexA4 or LexA8 promoter DNA binding sequences.

More information

SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen

SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen SeattleSNPs Interactive Tutorial: Database Inteface Entrez, dbsnp, HapMap, Perlegen The tutorial is designed to take you through the steps necessary to access SNP data from the primary database resources:

More information

SNP calling and VCF format

SNP calling and VCF format SNP calling and VCF format Laurent Falquet, Oct 12 SNP? What is this? A type of genetic variation, among others: Family of Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide

More information

Chromatin signature identifies monoallelic gene expression across mammalian cell types

Chromatin signature identifies monoallelic gene expression across mammalian cell types Chromatin signature identifies monoallelic gene expression across mammalian cell types Anwesha Nag* 1, Sébastien Vigneau* 1, Virginia Savova*, Lillian M. Zwemer*, Alexander A. Gimelbrant* 2 * Department

More information

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014 Single Nucleotide Variant Analysis H3ABioNet May 14, 2014 Outline What are SNPs and SNVs? How do we identify them? How do we call them? SAMTools GATK VCF File Format Let s call variants! Single Nucleotide

More information

Supplemental Figure 1.

Supplemental Figure 1. Supplemental Data. Charron et al. Dynamic landscapes of four histone modifications during de-etiolation in Arabidopsis. Plant Cell (2009). 10.1105/tpc.109.066845 Supplemental Figure 1. Immunodetection

More information

Supplementary Figure 1

Supplementary Figure 1 Nucleotide Content E. coli End. Neb. Tr. End. Neb. Tr. Supplementary Figure 1 Fragmentation Site Profiles CRW1 End. Son. Tr. End. Son. Tr. Human PA1 Position Fragmentation site profiles. Nucleotide content

More information

Mate-pair library data improves genome assembly

Mate-pair library data improves genome assembly De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. sndrop-seq overview.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. sndrop-seq overview. Supplementary Figure 1 sndrop-seq overview. A. sndrop-seq method showing modifications needed to process nuclei, including bovine serum albumin (BSA) coating and droplet heating to ensure complete nuclear

More information

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 WHITE PAPER Oncomine Comprehensive Assay Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 Contents Scope and purpose of document...2 Content...2 How Torrent

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:1.138/nature11233 Supplementary Figure S1 Sample Flowchart. The ENCODE transcriptome data are obtained from several cell lines which have been cultured in replicates. They were either left intact (whole

More information

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer. DNA Preparation and QC Extraction DNA was extracted from whole blood or flash frozen post-mortem tissue using a DNA mini kit (QIAmp #51104 and QIAmp#51404, respectively) following the manufacturer s recommendations.

More information

Supplementary Figures

Supplementary Figures Supplementary Figures 1 Supplementary Figure 1. Analyses of present-day population differentiation. (A, B) Enrichment of strongly differentiated genic alleles for all present-day population comparisons

More information

Biol 478/595 Intro to Bioinformatics

Biol 478/595 Intro to Bioinformatics Biol 478/595 Intro to Bioinformatics September M 1 Labor Day 4 W 3 MG Database Searching Ch. 6 5 F 5 MG Database Searching Hw1 6 M 8 MG Scoring Matrices Ch 3 and Ch 4 7 W 10 MG Pairwise Alignment 8 F 12

More information

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene

More information

Mutations during meiosis and germ line division lead to genetic variation between individuals

Mutations during meiosis and germ line division lead to genetic variation between individuals Mutations during meiosis and germ line division lead to genetic variation between individuals Types of mutations: point mutations indels (insertion/deletion) copy number variation structural rearrangements

More information

Genome Projects. Part III. Assembly and sequencing of human genomes

Genome Projects. Part III. Assembly and sequencing of human genomes Genome Projects Part III Assembly and sequencing of human genomes All current genome sequencing strategies are clone-based. 1. ordered clone sequencing e.g., C. elegans well suited for repetitive sequences

More information

Supplemental Data. Zhou et al. (2016). Plant Cell /tpc

Supplemental Data. Zhou et al. (2016). Plant Cell /tpc Supplemental Figure 1. Confirmation of mutant mapping results. (A) Complementation assay with stably transformed genomic fragments (ComN-N) (2 kb upstream of TSS and 1.5 kb downstream of TES) and CaMV

More information

Revolutionize Genomics with SMRT Sequencing. Single Molecule, Real-Time Technology

Revolutionize Genomics with SMRT Sequencing. Single Molecule, Real-Time Technology Revolutionize Genomics with SMRT Sequencing Single Molecule, Real-Time Technology Resolve to Master Complexity Despite large investments in population studies, the heritability of the majority of Mendelian

More information

Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C

Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C CORRECTION NOTICE Nat. Genet. 47, 598 606 (2015) Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C Borbala Mifsud, Filipe Tavares-Cadete, Alice N Young, Robert Sugar,

More information

Nature Genetics: doi: /ng Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1 Supplementary Figure 1 Processing of mutations and generation of simulated controls. On the left, a diagram illustrates the manner in which covariate-matched simulated mutations were obtained, filtered

More information

Figure S1. Unrearranged locus. Rearranged locus. Concordant read pairs. Region1. Region2. Cluster of discordant read pairs, bundle

Figure S1. Unrearranged locus. Rearranged locus. Concordant read pairs. Region1. Region2. Cluster of discordant read pairs, bundle Figure S1 a Unrearranged locus Rearranged locus Concordant read pairs Region1 Concordant read pairs Cluster of discordant read pairs, bundle Region2 Concordant read pairs b Physical coverage 5 4 3 2 1

More information

Towards Personal Genomics

Towards Personal Genomics Towards Personal Genomics Tools for Navigating the Genome of an Individual Saul A. Kravitz J. Craig Venter Institute Rockville, MD Bio-IT World 2008 Introduce yourself Relate our experience with individual

More information

Chapter 5. Structural Genomics

Chapter 5. Structural Genomics Chapter 5. Structural Genomics Contents 5. Structural Genomics 5.1. DNA Sequencing Strategies 5.1.1. Map-based Strategies 5.1.2. Whole Genome Shotgun Sequencing 5.2. Genome Annotation 5.2.1. Using Bioinformatic

More information

GATCGTGCACGATCTCGGCAATTCGGGATGCCGGCTCGTCACCGGTCGCT

GATCGTGCACGATCTCGGCAATTCGGGATGCCGGCTCGTCACCGGTCGCT Problem. (pts) A. (5pts) Your colleague professor Eugene Mathew Lateed generated a genome-wide DNA methylation map for normal colon cells using MRE-seq and MeDIP-seq. In an intergenic region, he found

More information

The Human Genome and its upcoming Dynamics

The Human Genome and its upcoming Dynamics The Human Genome and its upcoming Dynamics Matthias Platzer Genome Analysis Leibniz Institute for Age Research - Fritz-Lipmann Institute (FLI) Sequencing of the Human Genome Publications 2004 2001 2001

More information

Mammalian non-cg methylations are conserved and cell-type specific and may have been involved in the evolution of transposon elements

Mammalian non-cg methylations are conserved and cell-type specific and may have been involved in the evolution of transposon elements Mammalian non-cg methylations are conserved and cell-type specific and may have been involved in the evolution of transposon elements Weilong Guo, Michael Zhang, Hong Wu Supplementary Figures Fig. S1-S16

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Fig. S1 Diagram of Pst genome sequencing and assembly using a fosmid to fosmid strategy. Fosmid pooling and sequencing: Fosmid librarywas constructed according to Kim

More information

SNP calling. Jose Blanca COMAV institute bioinf.comav.upv.es

SNP calling. Jose Blanca COMAV institute bioinf.comav.upv.es SNP calling Jose Blanca COMAV institute bioinf.comav.upv.es SNP calling Genotype matrix Genotype matrix: Samples x SNPs SNPs and errors A change in a read may due to: Sample contamination Cloning or PCR

More information

Mapping and quantifying mammalian transcriptomes by RNA-Seq. Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer & Barbara Wold

Mapping and quantifying mammalian transcriptomes by RNA-Seq. Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer & Barbara Wold Mapping and quantifying mammalian transcriptomes by RNA-Seq Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer & Barbara Wold Supplementary figures and text: Supplementary Figure 1 RNA shatter

More information

SureSelect Target Enrichment for the Ion Proton TM Next Generation Sequencing System. Demonstrated performance you can count on. Christina Chiu, Product Manager, SureSelect. Kyeong Jeong, Ph.D., R&D Scientist

De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations. Wong et al. SUPPLEMENTARY INFORMATION. The Supplementary Information contains 4 Supplementary Figures, 3

Next-Generation Sequencing Technologies. Nicholas E. Navin, Ph.D. MD Anderson Cancer Center Dept. Genetics Dept. Bioinformatics Introduction to Bioinformatics GS011062

Transcriptomics analysis with RNA seq: an overview. Frederik Coppens. Platforms Applications Analysis Quantification RNA content Platforms Short (few hundred bases) Long reads (multiple kilobases)

Supplementary Figure 2. Quantile quantile plots (QQ) of the exome sequencing results. Chi square was used to test the association between genetic SUPPLEMENTARY INFORMATION Supplementary Figure 1. Description of the study design. The samples in the initial stage (China cohort, exome sequencing) including 216 AMD cases and 1,553 controls were from the

This is a closed book, closed note exam. No calculators, phones or any electronic device are allowed. MCB 104 MIDTERM #2 October 23, 2013 ***IMPORTANT REMINDERS*** Print your name and ID# on every page of the exam. You will lose 0.5 point/page if you forget to do this. Name KEY If you need more space than

The genome of Leishmania panamensis: insights into genomics of the L. (Viannia) subgenus. SUPPLEMENTARY INFORMATION. Alejandro Llanes, Carlos Mario Restrepo, Gina Del Vecchio, Franklin José Anguizola, Ricardo

EFI 2016 DEBATE: WHOLE GENE VERSUS EXONIC SEQUENCING. Dr Katy Latham Stance: Whole gene sequencing should be the norm for HLA typing. Why we should be utilising whole gene sequencing Ambiguity generated

Runs of Homozygosity Analysis Tutorial. Release 8.7.0 Golden Helix, Inc. March 22, 2017 Contents: 1. Overview of the Project; 2. Identify Runs of Homozygosity; Illustrative Example

Deep Sequencing technologies. Gabriela Salinas 30 October 2017 Transcriptome and Genome Analysis Laboratory http://www.uni-bc.gwdg.de/index.php?id=709 Microarray and Deep-Sequencing Core Facility University

14 March, 2016: Introduction to Genomics. Genome. Genome within Ensembl browser http://www.ensembl.org/homo_sapiens/location/view?db=core;g=ensg00000139618;r=13:3231547432400266

Supporting Information. Kilian et al. 10.1073/pnas.1105861108 SI Materials and Methods Determination of the Electric Field Strength Required for Successful Electroporation. The transformation construct

Supplemental Figure Legends. Fig. S1 Genetic linkage maps of T. gondii chromosomes using F1 progeny from the ME49 and VAND genetic cross. All the recombination points were identified by whole genome sequencing

Erhard et al. (2013). Plant Cell /tpc Supplemental Figure 1. c1-hbr allele structure. Diagram of the c1-hbr allele found in stocks segregating 1:1 for rpd1-1 and rpd1-2 homozygous mutants showing the presence of a 363 base pair (bp) Heartbreaker

Comparing a few SNP calling algorithms using low-coverage sequencing data. Yu and Sun, BMC Bioinformatics 2013, 14:274. RESEARCH ARTICLE, Open Access. Xiaoqing Yu and Shuying Sun. Abstract Background:

The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome. Of course, every person on the planet with the exception of identical twins has a unique

Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing (HaploSeq). Lyon Lab Journal Clubs, Han Fang, 01/28/2014. Existing genome

Strand NGS Variant Caller. STRAND LIFE SCIENCES WHITE PAPER. A Benchmarking Study. Rohit Gupta, Pallavi Gupta, Aishwarya Narayanan, Somak Aditya, Shanmukh Katragadda, Vamsi Veeramachaneni, and Ramesh Hariharan

Question 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences. Bio4342 Exercise 1 Answers: Detecting and Interpreting Genetic Homology (Answers prepared by Wilson Leung) Question 1: Low complexity DNA can be described as sequences that consist primarily of one or

SUPPLEMENTARY INFORMATION doi:.38/nature In vivo nucleosome mapping: CD4+ Lymphocytes; Gradient-based and I-bead cell sorting; CD8+ Lymphocytes; Granulocytes; Lyse the cells; Isolate and sequence mononucleosome cores

Introduction to RNA-Seq in GeneSpring NGS Software. Dipa Roy Choudhury, Ph.D. Strand Scientific Intelligence and Agilent Technologies Learn more at www.genespring.com Introduction to RNA-Seq In a few years,

Systematic evaluation of spliced alignment programs for RNA-seq data. Pär G. Engström, Tamara Steijger, Botond Sipos, Gregory R. Grant, André Kahles, RGASP Consortium, Gunnar Rätsch, Nick Goldman, Tim

Targeted Sequencing Using Droplet-Based Microfluidics. Keith Brown, Director, Sales brownk@raindancetech.com Who we are: is a Provider of Microdroplet-based Solutions The Company's RainStorm TM Technology

Annotating Fosmid 14p24 of D. Virilis chromosome 4. Lo, Louis April 20, 2006 Annotation Report Introduction In the first half of Research Explorations in Genomics I finished a 38kb fragment of chromosome

Supplementary Figure 1 Concept of barcoding to suppress error in sequencing. Each template DNA molecule is barcoded with a random and unique sequence (marked as red, turquoise and green). All PCR generated

Supplementary Figure 1 Strategy for parallel detection of DHSs and adjacent nucleosomes. [Figure panels: DNase I cleavage; DNase I digestion; sucrose gradient enrichment, small to large fractions F1 to F9]

HaloPlex HS. Get to Know Your DNA. Every Single Fragment. Kevin Poon, Ph.D. Sr. Global Product Manager Diagnostics & Genomics Group Agilent Technologies For Research Use Only. Not for Use in Diagnostic

SCIENCE CHINA Life Sciences. RESEARCH PAPERS October 2011 Vol.54 No.10: 945-952 doi: 10.1007/s11427-011-4232-4 High-performance single-chip exon capture allows accurate whole exome sequencing using the Illumina Genome Analyzer

Wu et al., Determination of genetic identity in therapeutic chimeric states. SUPPLEMENTARY METHODS AND DATA General strategy for identifying deletion loci We used two approaches for identifying potentially suitable deletion loci for PDP-FISH analysis. In the first approach, we

Supplemental Figure 1 A. [Figure: prebleach and postbleach images at successive time points; plot of Relative Intensity for mH2A-GFP variants, GFP-H2A and GFP]

Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line. Table of Contents: SUPPLEMENTARY TEXT; FILTERING OF RAW READS PRIOR TO ASSEMBLY; COMPARATIVE ANALYSIS; IMMUNOGENIC

SUPPLEMENTARY INFORMATION Gene replacements and insertions in rice by intron targeting using CRISPR Cas9 Table of Contents Supplementary Figure 1. sgRNA-induced targeted mutations in the OsEPSPS gene in rice protoplasts. Supplementary

Genome-wide genetic screening with chemically-mutagenized haploid embryonic stem cells. Supplementary Information. Josep V. Forment, Mareike Herzog

From Variants to Pathways: Agilent GeneSpring GX's Variant Analysis Workflow. Technical Overview. Import VCF. Introduction: Next-generation sequencing (NGS) studies have created unanticipated challenges with

Supplementary Tables. Supplementary Table 1. Summary of whole genome shotgun sequence used for genome assembly. Library Read length Raw data Filtered data insert size (bp) * Total Sequence depth Total Sequence

DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing. Plant and animal whole genome re-sequencing (WGRS) involves sequencing the entire genome of a plant or animal and comparing the sequence

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR). Anton Bankevich, Center for Algorithmic Biotechnology, SPbSU. Sequencing costs: 1. Sequencing costs do not follow Moore's law

SMRT Analysis Barcoding Overview (v6.0.0). Introduction: This document applies to PacBio RS II and Sequel Systems using SMRT Link v6.0.0. Note: For information on earlier versions of SMRT Link, see the document

Supplementary Materials. Sequence-based profiling of DNA methylation: comparisons of methods and catalogue of allelic epigenetic modifications. Supplementary Figure 1. Analysis of biological replicates (three

About Strand NGS. Strand Genomics, Inc All rights reserved. Strand NGS, formerly known as Avadis NGS, is an integrated platform that provides analysis, management and visualization tools for next-generation sequencing data. It supports extensive

Biology 105: Introduction to Genetics Name Student ID Before starting, write your name on the top of each page Make sure you have all pages You can use the back-side of the pages for scratch, but we will

Supplementary Information. Genome-wide profiling of DNA methylation provides insights into epigenetic regulation of fungal development in a plant pathogenic fungus, Magnaporthe oryzae. Junhyun Jeon, Jaeyoung

How to view Results with Scaffold. Proteomics Shared Resource. Starting out: Download Scaffold from http://www.proteomesoftware.com/proteome_software_prod_scaffold_download.html Follow installation instructions

Services Presentation Genomics Experts. Illumina Seminar Marriott May 11th IntegraGen at a glance Autism Oncology Genomics Services Serves the researcher's most complex needs in genomics The n 1 privately-owned