De novo metatranscriptome assembly and coral gene expression profile of Montipora capitata with growth anomaly


Additional File 1

Monika Frazier, Martin Helmkampf, M. Renee Bellinger, Scott Geib, Misaki Takabayashi

[Figure S1: two MDS plots (axes MDS 1 and MDS 2); one panel colored by tissue type (A, H, U), the other by clade (C, CD, D)]

Figure S1. Holobiont gene expression profiles. Samples are colored according to tissue type (A = GA-affected, U = GA-unaffected, H = healthy tissue) and dominant Symbiodinium clade, and were mapped by metric multidimensional scaling (MDS). The distance between each pair of samples represents the typical log2 fold-change in gene expression between transcripts. Black lines connect GA-affected and GA-unaffected samples obtained from the same colony.

[Figure S2: MDS plot (axes MDS 1 and MDS 2); samples colored by clade (C, CD, D)]

Figure S2. Coral host gene expression profiles. Samples are colored according to dominant Symbiodinium clade, and were mapped by metric multidimensional scaling (MDS). The distance between each pair of samples represents the typical log2 fold-change in gene expression between coral host transcripts.

Methods S1. Library Preparation Protocol (Yale Center for Genomic Analysis)

RNA Seq Quality Control: Total RNA quality is determined by estimating the A260/A280 and A260/A230 ratios by NanoDrop. RNA integrity is determined by running an Agilent Bioanalyzer gel, which measures the ratio of the ribosomal peaks.

RNA Seq Library Prep: mRNA is purified from approximately 500 ng of total RNA with oligo-dT beads and sheared by incubation at 94 °C. Following first-strand synthesis with random primers, second-strand synthesis is performed with dUTP to generate strand-specific sequencing libraries. The cDNA library is then end-repaired and A-tailed, adapters are ligated, and second-strand digestion is performed with Uracil-DNA-Glycosylase. Indexed libraries that meet appropriate cut-offs for both are quantified by qRT-PCR using a commercially available kit (KAPA Biosystems), and the insert size distribution is determined with the LabChip GX. Samples with a yield of 0.5 ng/µl are used for sequencing.

Flow Cell Preparation and Sequencing: Sample concentrations are normalized to 2 nM and loaded onto Illumina version 3 flow cells at a concentration that yields 170-200 million passing filter clusters per lane. Samples are sequenced using 75 bp paired-end sequencing on an Illumina HiSeq 2000 according to Illumina protocols. The 6 bp index is read during an additional sequencing read that automatically follows the completion of read 1. Data generated during sequencing runs are simultaneously transferred to the YCGA high performance computing cluster. A positive control (prepared bacteriophage Phi X library) provided by Illumina is spiked into every lane at a concentration of 0.3% to monitor sequencing quality in real time.

Data Analysis and Storage: Signal intensities are converted to individual base calls during a run using the system's Real Time Analysis (RTA) software. Base calls are transferred from the machine's dedicated personal computer to the Yale High Performance Computing cluster via a 1 Gigabit network mount for downstream analysis. Primary analysis (sample de-multiplexing and alignment to the human genome) is performed using Illumina's CASAVA 1.8.2 software suite. The data are returned to the user if the sample error rate is less than 2% and the distribution of reads per sample in a lane is within reasonable tolerance. Data are retained on the cluster for at least 6 months, after which they are transferred to a tape backup system.
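The protocol above reports library yield in ng/µl but normalizes to a molar concentration (2 nM). As a rough illustration of how the two units relate, the sketch below applies the commonly used approximation of 660 g/mol per base pair of double-stranded DNA. The function name `ng_per_ul_to_nM` and the 300 bp mean fragment length are hypothetical and chosen only for the example; the protocol does not state the fragment length or the conversion used by the facility.

```shell
#!/usr/bin/env bash
# Sketch only: convert a dsDNA library concentration from ng/µl to nM.
# Approximation: nM = (ng/µl * 1e6) / (660 g/mol per bp * mean fragment length in bp)

ng_per_ul_to_nM() {
    local conc_ng_ul="$1"    # measured concentration in ng/µl
    local mean_len_bp="$2"   # mean library fragment length in bp (assumed, not from the protocol)
    awk -v c="$conc_ng_ul" -v len="$mean_len_bp" \
        'BEGIN { printf "%.2f\n", (c * 1e6) / (660 * len) }'
}

# Example: 0.5 ng/µl at a hypothetical 300 bp mean fragment length.
ng_per_ul_to_nM 0.5 300
```

Under these assumptions a 0.5 ng/µl library comes out at roughly 2.5 nM, so it would need only a small dilution to reach 2 nM; a longer mean fragment length would give a lower molar concentration for the same mass.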

Command line scripts for bioinformatic analyses

FastQC: Assess Raw Reads Quality

    for FILE in Sample1 Sample2 Sample3
    do
        ./fastqc/fastqc -t 32 $FILE.fastq
    done

Trimmomatic: Trim Raw Data

Parameters:

    IN1=A1.R1.fastq
    IN2=A1.R2.fastq
    WINSIZE=5
    WINCUTOFF=20
    LEADING=20
    TRAILING=20
    MINLEN=50

    java -jar trimmomatic-0.32.jar PE -phred33 $IN1 $IN2 \
        trimmed.$IN1.$WINSIZE.$WINCUTOFF.r1.fastq \
        single.$IN1.$WINSIZE.$WINCUTOFF.r1.fastq \
        trimmed.$IN2.$WINSIZE.$WINCUTOFF.r2.fastq \
        single.$IN2.$WINSIZE.$WINCUTOFF.r2.fastq \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
        LEADING:$LEADING TRAILING:$TRAILING \
        SLIDINGWINDOW:$WINSIZE:$WINCUTOFF MINLEN:$MINLEN

Trinity: Assemble Transcriptome

Required programs in path: bowtie samtools
Perl modules: PerlIO::gzip.pm

    /path/trinity --seqType fq --JM 200G \
        --left all.trimmed.r1.fastq --right all.trimmed.r2.fastq \
        --SS_lib_type RF --CPU 32 --normalize_reads

TransDecoder: Translate to Amino Acids

Required programs/files in path: PfamA.hmm cd-hit hmmer

    /path/trinity-plugins/transdecoder_r20131110/transdecoder -t Trinity.fasta \
        --reuse -S --search_pfam /path/pfam-a.hmm --MPI --CPU 100 \
        --cd_hit_est /path/cd-hit-est

CD-HIT: Cluster Similar Proteins

    /path/cd-hit -i Trinity.fasta.transdecoder.pep -M 0 -T 0 -g 1 -c 0.5 -n 2 -o Output_File

BLAST: Protein Annotation

Commands used:

    /path/blastp -query Input_file -db nr -out Output_file \
        -evalue 1e-4 -num_threads 50 -max_target_seqs 1 -outfmt 6

    /path/blastp -query Input_file -db uniprot_sprot.fasta -out Output_file \
        -evalue 1e-4 -num_threads 50 -max_target_seqs 1 -outfmt 6

RSEM: Align raw reads to assembly

Required programs in path: bowtie samtools RSEM

    /path/trinityrnaseq_r20140717/util/align_and_estimate_abundance.pl \
        --transcripts Trinity.fasta \
        --left all.trimmed.r1.fastq --right all.trimmed.r2.fastq \
        --seqType fq --SS_lib_type RF --thread_count 32 \
        --est_method RSEM --aln_method bowtie2 \
        --trinity_mode --prep_reference

Parse results file:

    FPKM=value

    cat RSEM.genes.results | sed '1,1d' | awk -v fpkm="$FPKM" '$7 >= fpkm' | wc -l

Inparanoid: Identify Acropora and Symbiodinium inparalogs

Required programs in path: blastall formatdb

    perl inparanoid.pl Assembled_transcriptome Acropora_reference Symbiodinium_reference

RSEM: Align raw reads and estimate gene expression

Required programs in path: bowtie samtools RSEM

    /path/trinityrnaseq_r20140717/util/align_and_estimate_abundance.pl \
        --transcripts Transcriptome_assembly \
        --left Sample_name.R1.fastq --right Sample_name.R2.fastq \
        --seqType fq --SS_lib_type RF --thread_count 32 \
        --est_method RSEM --aln_method bowtie2 \
        --output_prefix Sample_name --trinity_mode --prep_reference

EdgeR: Determine differentially expressed genes

Required programs in path: R

    /path/trinityrnaseq_r20140717/util/abundance_estimates_to_matrix.pl \
        --est_method RSEM Sample1.genes.results Sample2.genes.results \
        --out_prefix Output_file
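The step that parses RSEM.genes.results for genes at or above an FPKM cutoff can be tried out on a small table before running it on real output. The sketch below is a minimal illustration, not the authors' script: the function name `count_genes_above_fpkm` and the toy values are hypothetical, and it assumes the standard RSEM genes.results layout in which FPKM is the seventh tab-separated column. The cutoff is passed to awk with `-v`, because a shell variable written inside single quotes is not expanded by the shell.

```shell
#!/usr/bin/env bash
# Sketch only: count genes in an RSEM genes.results table whose FPKM meets a cutoff.
# Assumes FPKM is in column 7 of a tab-separated file with one header line.

count_genes_above_fpkm() {
    local results_file="$1"   # path to an RSEM *.genes.results file
    local cutoff="$2"         # minimum FPKM for a gene to be counted
    # NR > 1 skips the header; "+ 0" forces a numeric comparison.
    awk -F '\t' -v cutoff="$cutoff" \
        'NR > 1 && $7 + 0 >= cutoff + 0 { n++ } END { print n + 0 }' \
        "$results_file"
}

# Toy input with made-up values, used only to demonstrate the filter.
demo_file="$(mktemp)"
printf 'gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n' > "$demo_file"
printf 'g1\tt1\t1000\t900\t50\t12.0\t10.5\n' >> "$demo_file"
printf 'g2\tt2\t800\t700\t2\t0.4\t0.3\n' >> "$demo_file"
printf 'g3\tt3\t1500\t1400\t20\t1.2\t1.0\n' >> "$demo_file"

# Count genes with FPKM >= 1 in the toy table.
count_genes_above_fpkm "$demo_file" 1
```

With the toy table, two of the three genes pass a cutoff of 1. The same function can be pointed at a real results file, for example `count_genes_above_fpkm RSEM.genes.results "$FPKM"`.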