
The psoriasis-associated deletion of late cornified envelope genes LCE3B and LCE3C has been maintained under balancing selection since Human-Denisovan divergence

Petar Pajic 1,*, Yen Lung Lin 1,*, Duo Xu 1, Omer Gokcumen 1

1 Department of Biological Sciences, University at Buffalo, Buffalo, NY.
* The authors contributed equally to this work.

Corresponding Author: Omer Gokcumen, Department of Biological Sciences, University at Buffalo, Cooke 639, Buffalo, NY 14260, omergokc@buffalo.edu

KEYWORDS: Copy number variation, genomic structural variants, atopic dermatitis, HLA, defensins, Neanderthal, LCE3A, Human Evolution

SUPPLEMENTARY FILES

Figure S1. Integrative Genomics Viewer (IGV) screenshot showing the sequence alignments at the LCE3BC locus in the Neanderthal, Denisovan, and 45,000-year-old Siberian human genomes. The blue boxes at the bottom represent the genes in the LCE3 region. The green line marks the breakpoints of the LCE3BC deletion, which is variable among humans. The small grey lines are individual reads from the Altai Neanderthal (top), Denisovan (middle), and 45,000-year-old Siberian human (bottom) genomes mapped to the human reference genome (hg19). The histogram above each sample indicates the read depth at that location.

Figure S2. Linkage disequilibrium (LD) scores within and around the LCE3BC deletion. The x axis represents the chromosomal location and the y axis the R² score. The blue arrows mark the location of the LCE3BC deletion. Single nucleotide variants in high LD with the deletion (R² > 0.9) are shown as red dots and range from 5.5 kb upstream to 6.6 kb downstream of the LCE3BC deletion. The 5.6 kb target region indicated by the red bracket was used for the majority of the population genetic analyses. This target region also harbors rs6693105, the SNP that was used for interrogating ancient genomes.
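As a rough illustration of the R² score plotted in Figure S2, the following minimal sketch computes the squared correlation between two biallelic sites from phased haplotypes. It assumes alleles are coded 0/1 (for example, 1 for the deletion at one site and 1 for the alternate SNP allele at the other). The function name and input layout are hypothetical and are not taken from the original analysis pipeline.

```python
def r_squared(site_a, site_b):
    """Squared correlation (r^2) between two biallelic sites.

    site_a, site_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (hypothetical input layout, assumed for illustration).
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")

    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    # frequency of haplotypes carrying allele 1 at both sites
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when either site is monomorphic
    return d * d / denom
```

Under this definition, two sites whose alleles always co-occur give a value of 1, and statistically independent sites give 0; the R² > 0.9 threshold in the caption selects variants that nearly perfectly tag the deletion.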

Figure S3. (a) Maximum likelihood haplotype tree constructed using the 1000 Genomes Phase 3 dataset. The input file for this tree is the same as the one used to create Figure 2c, with the addition of the chimpanzee (Pan troglodytes). The chimpanzee grouped with the non-deleted haplotype group, represented by the blue circle. (b) Rooted maximum likelihood haplotype tree constructed using 24 selected haplotypes from the 1000 Genomes Phase 3 dataset, representing the major branches. Rhesus macaque (rheMac3) was used as an outgroup. The blue haplotypes, the Altai Neanderthal, and the chimpanzee grouped together as non-deleted. The red haplotypes and the Denisovan cluster with the deleted haplogroup. Parametric bootstrap branch support values are shown on the primary branches.

Figure S4. (a) Pairwise population differentiation (FST, y axis) for three population pairs (x axis): CEU vs. YRI, CHB vs. CEU, and YRI vs. CHB. The populations considered here are CEU: Central Europeans from Utah; CHB: Han Chinese from Beijing; and YRI: Yoruba from Ibadan, Nigeria. The red box represents the neutral regions on chromosome 1 and the blue box represents the previously described target region. This pattern is consistent with balancing selection acting on this locus. (b) Integrated haplotype homozygosity (ΔiHH, y axis) for the same three populations (x axis). Data were taken from the 1000 Genomes dataset.
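To make the FST axis in Figure S4a concrete, here is a minimal sketch of one textbook definition, FST = (HT - HS) / HT, for a single biallelic site and two equally weighted populations. This is an assumption chosen for illustration: the caption does not say which FST estimator was used, and the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Heterozygosity-based FST at one biallelic site.

    p1, p2: allele frequencies in two populations, weighted equally
    (an illustrative assumption, not the estimator from the paper).
    """
    # mean expected heterozygosity within populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # expected heterozygosity in the pooled population
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return float("nan")  # site is monomorphic in both populations
    return (h_t - h_s) / h_t
```

Identical allele frequencies give 0 and fixed differences give 1, so values near 0 for a pair such as CEU vs. YRI indicate that the two populations share similar allele frequencies at that site.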

Figure S5. Location of the L1 (LINE) element (chr1:152,587,904-152,590,051) within the target haplotype block (red), with respect to the LCE3BC deletion (blue). The original target region (152,587,904-152,593,549; green) was reconstructed for analysis as a shorter region that avoids the LINE element (152,590,052-152,593,549; purple).
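The coordinate arithmetic behind the reconstructed region in Figure S5 can be checked with a short sketch. It assumes the coordinates in the caption are 1-based and inclusive, which the caption does not state; the helper names are hypothetical.

```python
# Coordinates on chr1 as given in the Figure S5 caption.
# Assumption: 1-based, inclusive intervals (not stated in the source).
TARGET = (152_587_904, 152_593_549)  # original target region (green)
L1 = (152_587_904, 152_590_051)      # L1 (LINE) element

def length(interval):
    """Number of bases in a 1-based inclusive interval."""
    start, end = interval
    return end - start + 1

def remove_leading_element(region, element):
    """Trim an element that covers the left edge of a region."""
    if not (element[0] <= region[0] <= element[1] < region[1]):
        raise ValueError("element must overlap only the left edge of region")
    return (element[1] + 1, region[1])

reconstructed = remove_leading_element(TARGET, L1)
print(reconstructed, length(TARGET), length(L1), length(reconstructed))
```

Under the inclusive-coordinate assumption, the trimmed interval starts one base after the end of the L1 element, and the lengths of the L1 element and the reconstructed region sum to the length of the original target region.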

Figure S6. Maximum likelihood haplotype tree reconstructed from the 1000 Genomes Phase 3 dataset after excluding the L1 element. This tree is comparable to the one in Figure 2; the only differences are branch swaps. As expected, the haplotypes fall into the same haplogroups.

Figure S7. (a) Tajima's D score in three different populations (CEU: Central Europeans from Utah; CHB: Han Chinese from Beijing; and YRI: Yoruba from Ibadan, Nigeria). The red boxes represent all regions of the genome that are predicted to be evolving under neutrality, whereas the blue boxes represent the data points from the reconstructed LCE3BC haplotype block, omitting the region harboring the LINE element. (b) π score in the same three populations. The red boxes represent all regions of the genome that are predicted to be evolving under neutrality. The blue boxes represent the data points from the reconstructed LCE3BC haplotype block, omitting the region harboring the LINE element. The green boxes represent π and Tajima's D scores calculated for the size-matched downstream regions of non-exonic ancient deletions.

Tajima's D P values:
YRI: Target:Neutral = 1.6e-2; Target:NonExon = 3.8e-2; Neutral:NonExon = 1.032e-5
CEU: Target:Neutral = 7.5e-3; Target:NonExon = 8.9e-3; Neutral:NonExon = 1.6e-3
CHB: Target:Neutral = 7.8e-3; Target:NonExon = 9.3e-3; Neutral:NonExon = 0.5713

π P values:
YRI: Target:Neutral = 2.0e-3; Target:NonExon = 5.2e-3; Neutral:NonExon = 2.0e-2
CEU: Target:Neutral = 7.8e-3; Target:NonExon = 1.2e-2; Neutral:NonExon = 0.1958
CHB: Target:Neutral = 1.4e-3; Target:NonExon = 3.1e-3; Neutral:NonExon = 7.1e-2
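For readers unfamiliar with the two statistics in Figure S7, the sketch below computes π (the mean number of pairwise differences) and Tajima's D for a set of phased haplotypes, following the standard formulas from Tajima (1989). It assumes biallelic sites coded 0/1 with no missing data. The function names and input layout are hypothetical, and this is not the software used to produce the figure.

```python
import math

def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences (pi) across the whole region."""
    n = len(haplotypes)
    n_pairs = n * (n - 1) / 2
    total = 0
    for site in zip(*haplotypes):  # iterate over alignment columns
        k = sum(site)              # copies of allele 1 at this site
        total += k * (n - k)       # haplotype pairs that differ here
    return total / n_pairs

def segregating_sites(haplotypes):
    """Number of polymorphic sites (S)."""
    n = len(haplotypes)
    return sum(1 for site in zip(*haplotypes) if 0 < sum(site) < n)

def tajimas_d(haplotypes):
    """Tajima's D; NaN when the variance term is not positive."""
    n = len(haplotypes)
    s = segregating_sites(haplotypes)
    if n < 4 or s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return float("nan")
    # difference between pi and Watterson's estimate (S / a1), standardized
    return (nucleotide_diversity(haplotypes) - s / a1) / math.sqrt(variance)
```

Here π is summed over the region rather than reported per site; dividing by the region length gives a per-site value. An excess of intermediate-frequency variants, as expected under balancing selection, raises π relative to S / a1 and pushes Tajima's D above zero, whereas an excess of rare variants pushes it below zero.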

Figure S8. The regulatory potential of the LCE3BC haplotype block. The UCSC Genome Browser screenshot depicts the chromosomal location of the LCE3BC deletion (red bracket). The grey circle shows where the deletion overlaps multiple transcription factor binding sites. The blue arrows show peaks of conservation among vertebrates that coincide with LCE3C and LCE3B.

Figure S9. Gene expression levels (y axis) of LCE3A in sun-exposed skin of the lower leg for individuals homozygous for the non-deleted allele, heterozygous, and homozygous for the deletion. Data are displayed from the GTEx Portal. The SNP rs6693105, which is in high LD with the deletion, was used for this analysis. Effect size = 0.4, p = 6.4 × 10^-10.