Split input data, map to hg19, split by chromosome, and variant call with GATK


This is a template script written by Scott Hunicke-Smith to illustrate how to run exome analysis much faster on Lonestar. It requires only two FASTQ files (a read pair) and two parameters. It is NOT optimized, not highly robust, etc. It relies on many sub-scripts, both within Scott's home directory and in the BioITeam Corral directories. This bash script needs to be run on a head node somewhere where it won't be killed:

fastexon.sh:

    #!/bin/bash
    # Copyright 2012 Scott Hunicke-Smith and the University of Texas at Austin
    # Usage: fastexon.sh <R1 fastq> <R2 fastq> <splitsize> <batchsize>
    module load python
    module load bwa
    module load samtools
    module load java64

    r1file=$1
    r2file=$2
    splitsize=$3
    batchsize=$4
    queue="normal"

    echo "Starting: `date`"

    # 1. Split input fastq's as one job; store job
    echo "split -d -l $splitsize $r1file r1." > split.script
    echo "split -d -l $splitsize $r2file r2." >> split.script
    /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j split.script -l split.sge -a DNAdenovo -n split -t 1:00:00
    qsub split.sge &> split.sge.sublog
    splitjid=`tail -1 split.sge.sublog | awk '{print $3}'`
    echo "Submitted $splitjid to split input files at `date`"
    echo "Waiting for split to finish"
    while qstat | grep $splitjid ; do
        echo `date`
        sleep 30
    done

    # 2. Move a set of splits into their own directory
    i=0
    filelist=""
    subdirlist=""
    for file in $( ls r1.* ) ; do
        fileext="${file##*.}"
        if [ `expr $i % $batchsize` -eq `expr $batchsize - 1` ]
        then
            mkdir b.$i
            subdirlist="$subdirlist b.$i"
            filelist="$filelist $fileext"
            for datafiles in $filelist ; do
                mv r2.$datafiles b.$i
                mv r1.$datafiles b.$i
            done
            filelist=""
        else
            filelist="$filelist $fileext"
        fi
        i=`expr $i + 1`
    done
    # And the residual set, if any (break: everything is moved in one pass)
    for file in $( ls r1.* 2>/dev/null ) ; do
        mkdir b.$i
        subdirlist="$subdirlist b.$i"
        mv r1.* b.$i
        mv r2.* b.$i
        break
    done

    # 3. Launch exome_step1.sh on each split within its own directory; store job numbers; launch exome_step2.sh to combine chr files
    mapjids=""
    for subdir in $subdirlist ; do
        cd $subdir
        echo "Creating launcher for all files in $subdir: `date`"
        rm -f map.sge
        for file in $( ls r1.* ) ; do
            fileext="${file##*.}"
            echo "Run exome_step1.sh on r1.$fileext and r2.$fileext"
            echo "/home1/01057/sphsmith/local/bin/exome_step1.bash r1.$fileext r2.$fileext mapped.$fileext >& mapped.$fileext.log" >> map.script
        done
        /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j map.script -l map.sge -a DNAdenovo -n map.$subdir -t 1:00:00 -w 2
        qsub map.sge &> map.sge.sublog
        mapjid=`tail -1 map.sge.sublog | awk '{print $3}'`
        # Comma-separated list of all mapping job IDs, without a leading comma
        mapjids="${mapjids:+$mapjids,}$mapjid"
        echo "Submitted $mapjid to map input files in $subdir at `date`"
        cd ..
    done
    echo "Waiting for mapping to finish"
    while qstat | grep $mapjid ; do
        echo `date`
        sleep 30
    done
    echo "Finished: `date`"

    # 4. Launch job to combine final chr files across all directories
    echo "Creating launcher for merging by reference sequence: `date`"
    subdir=`ls -d b.* | head -1`
    subdirext="${subdir##*.}"
    evalcmd="ls b.$subdirext/*.mapped.*.sorted.bam | awk -F [./] '{print \$3}' | sort | uniq"
    reflist=`eval $evalcmd`
    # Randomize this list so large and small reference sequences are mixed up
    reflist=`echo $reflist | awk 'BEGIN {srand() } {for (i=1;i<=NF;i++) {print rand() "\t" $i "\n"}}' | sort -n | cut -f 2`
    echo $reflist
    rm -f merge.script
    for refs in $reflist ; do
        echo "Merging $refs "
        echo "samtools merge -f $refs.sorted.bam b.*/$refs.mapped.*.sorted.bam; samtools index $refs.sorted.bam" >> merge.script
    done
    /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j merge.script -l merge.sge -a DNAdenovo -n merge -t 1:00:00 -w 4
    echo "Submitting job; queue start contingent on $mapjids completing first"
    # qsub options must come before the job script; anything after it is passed to the script
    qsub -hold_jid $mapjids merge.sge &> merge.sge.sublog
    mergejid=`tail -1 merge.sge.sublog | awk '{print $3}'`
    echo "Submitted $mergejid to merge output files at `date`"
    echo "Waiting for merging to finish"
    while qstat | grep $mergejid ; do
        echo `date`
        sleep 30
    done

    # 5. Launch GATK on each reference sequence's sorted bam file
    echo "Creating launcher for variant calling by reference sequence: `date`"
    subdir=`ls -d b.* | head -1`
    subdirext="${subdir##*.}"
    evalcmd="ls b.$subdirext/*.mapped.*.sorted.bam | awk -F [./] '{print \$3}' | sort | uniq"
    reflist=`eval $evalcmd`
    # Randomize this list so large and small reference sequences are mixed up
    reflist=`echo $reflist | awk 'BEGIN {srand() } {for (i=1;i<=NF;i++) {print rand() "\t" $i "\n"}}' | sort -n | cut -f 2`
    echo $reflist
    rm -f variants.script
    for refs in $reflist ; do
        echo "GATK via exome_step2.bash on $refs.sorted.bam"
        echo "/home1/01057/sphsmith/local/bin/exome_step2.bash $refs.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf $refs >& variants.$refs.log" >> variants.script
    done
    # Note that the -w 2 option here defines how many GATK's run per node - might need optimization
    /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j variants.script -l variants.sge -a DNAdenovo -n variants -t 4:00:00 -w 2
    sed -i s/'module load launcher'/'module load launcher\nmodule load java64\nmodule load samtools'/ variants.sge
    qsub -hold_jid $mergejid variants.sge &> variants.sge.sublog
    variantsjid=`tail -1 variants.sge.sublog | awk '{print $3}'`
    echo "Submitted $variantsjid to call variants at `date`"
    echo "Waiting for variant calling to finish"
    while qstat | grep $variantsjid ; do
        echo `date`
        sleep 30
    done

    # 6. Merge all bam files & vcf files
    echo "samtools merge -f $r1file.sorted.bam *.sorted.bam" > merge2.script
    /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j merge2.script -l merge2.sge -a DNAdenovo -n merge2 -t 1:00:00 -w 4
    qsub -hold_jid $variantsjid merge2.sge &> merge2.sge.sublog
    merge2jid=`tail -1 merge2.sge.sublog | awk '{print $3}'`
    echo "Submitted $merge2jid to merge output files at `date`"
    echo "Waiting for merging to finish"
    while qstat | grep $merge2jid ; do
        echo `date`
        sleep 30
    done
    # VCF header from one file, then the records from all files (-h: no file-name prefixes)
    grep '^#' chrX.sorted.bam.snps.vcf > $r1file.snps.vcf
    grep -hv '^#' chr*.sorted.bam.snps.vcf >> $r1file.snps.vcf
    echo "Fast exon analysis is complete at: `date`"

It uses the TACC "launcher" functionality to do the following:

1. Create one job on one node that splits the two input FASTQ files into files of $splitsize lines each, using split.sge and split.script. Wait for the job to finish.
2. Create as many subdirectories as needed for the split output files to be mapped $batchsize pairs per directory. Lonestar nodes have two sockets with 6 cores per socket (12 cores per node), so a good choice here is a $batchsize of two, which lets each mapping task use 6 threads.
3. In each subdirectory, write one line per pair of split files to map.script ($batchsize lines per subdirectory, and roughly the original file's line count / $splitsize lines in total), then submit map.sge to do the mapping. Note that the embedded mapping script exome_step1.sh splits the mapping output into chromosome-specific files during the bwa sampe step. This mapping script is also where multi-threading for bwa is set; it should be parameterized, of course.
4. Merge all the chromosome-specific files from these subdirectories back into the run directory using merge.sge and merge.script.
5. Run GATK on these chromosome-specific files, with two GATK processes per node (currently hardcoded in the script), using variants.sge and variants.script. Because chromosomes are numbered roughly in order of decreasing size (chr1 is the largest), randomize the list so that we don't wind up with all the big chromosomes on one node.
6. Merge the final chromosome-specific results: the BAM files with merge2.sge and merge2.script, and the GATK VCF files by concatenation at the end of fastexon.sh.

Examples of the various .sge and .script files are shown below. A benchmark on ~40 million read pairs from a single human exome experiment showed that this script takes about 2 hours, versus about 15 hours when all the same processes are run on only one node.

Example split.sge and split.script

split.sge:

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial jobs (e.g. parametric
    # studies) using a script wrapper to launch the jobs.
    #
    # To use, build the launcher executable and your serial application(s) and
    # place them in your WORKDIR directory. Then, edit the CONTROL_FILE to
    # specify each executable per process.
    #
    #   < Setup Parameters >
    #
    #$ -N split
    #$ -pe 12way 12
    #$ -q normal
    #$ -o split.o$JOB_ID
    #$ -l h_rt=1:00:00
    #$ -V
    #$ -cwd
    #   < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    setenv EXECUTABLE   $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE split.script
    setenv WORKDIR      .
    #
    # Variable description:
    #   EXECUTABLE   = full path to the job launcher executable
    #   CONTROL_FILE = text input file which specifies executable for each
    #                  process (should be located in WORKDIR)
    #   WORKDIR      = location of working directory
    #
    #   < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
        exit
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
        exit
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
        exit
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE

    echo " Parametric Job Complete"

split.script:

    split -d -l Sample_5_L003_R1.cat.fastq r1.
    split -d -l Sample_5_L003_R2.cat.fastq r2.

Example merge.sge and merge.script

merge.sge:

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial jobs (e.g. parametric
    # studies) using a script wrapper to launch the jobs.
    #
    # To use, build the launcher executable and your serial application(s) and
    # place them in your WORKDIR directory. Then, edit the CONTROL_FILE to
    # specify each executable per process.
    #
    #   < Setup Parameters >
    #
    #$ -N merge
    #$ -pe 4way 72
    #$ -q normal
    #$ -o merge.o$JOB_ID
    #$ -l h_rt=1:00:00
    #$ -V
    #$ -cwd
    #   < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    setenv EXECUTABLE   $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE merge.script
    setenv WORKDIR      .
    #
    # Variable description:
    #   EXECUTABLE   = full path to the job launcher executable
    #   CONTROL_FILE = text input file which specifies executable for each
    #                  process (should be located in WORKDIR)
    #   WORKDIR      = location of working directory
    #
    #   < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
        exit
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
        exit
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
        exit
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE

    echo " Parametric Job Complete"

merge.script:

    samtools merge -f chr6.sorted.bam b.*/chr6.mapped.*.sorted.bam; samtools index chr6.sorted.bam
    samtools merge -f chrX.sorted.bam b.*/chrX.mapped.*.sorted.bam; samtools index chrX.sorted.bam
    samtools merge -f chr17.sorted.bam b.*/chr17.mapped.*.sorted.bam; samtools index chr17.sorted.bam
    samtools merge -f chr21.sorted.bam b.*/chr21.mapped.*.sorted.bam; samtools index chr21.sorted.bam
    samtools merge -f chr5.sorted.bam b.*/chr5.mapped.*.sorted.bam; samtools index chr5.sorted.bam
    samtools merge -f chrY.sorted.bam b.*/chrY.mapped.*.sorted.bam; samtools index chrY.sorted.bam
    samtools merge -f chr4.sorted.bam b.*/chr4.mapped.*.sorted.bam; samtools index chr4.sorted.bam
    samtools merge -f chr19.sorted.bam b.*/chr19.mapped.*.sorted.bam; samtools index chr19.sorted.bam
    samtools merge -f chr13.sorted.bam b.*/chr13.mapped.*.sorted.bam; samtools index chr13.sorted.bam
    samtools merge -f chr16.sorted.bam b.*/chr16.mapped.*.sorted.bam; samtools index chr16.sorted.bam
    samtools merge -f chr7.sorted.bam b.*/chr7.mapped.*.sorted.bam; samtools index chr7.sorted.bam
    samtools merge -f chr9.sorted.bam b.*/chr9.mapped.*.sorted.bam; samtools index chr9.sorted.bam
    samtools merge -f chr14.sorted.bam b.*/chr14.mapped.*.sorted.bam; samtools index chr14.sorted.bam
    samtools merge -f chr11.sorted.bam b.*/chr11.mapped.*.sorted.bam; samtools index chr11.sorted.bam
    samtools merge -f chr22.sorted.bam b.*/chr22.mapped.*.sorted.bam; samtools index chr22.sorted.bam
    samtools merge -f chr1.sorted.bam b.*/chr1.mapped.*.sorted.bam; samtools index chr1.sorted.bam
    samtools merge -f chr10.sorted.bam b.*/chr10.mapped.*.sorted.bam; samtools index chr10.sorted.bam
    samtools merge -f chr15.sorted.bam b.*/chr15.mapped.*.sorted.bam; samtools index chr15.sorted.bam
    samtools merge -f chr18.sorted.bam b.*/chr18.mapped.*.sorted.bam; samtools index chr18.sorted.bam
    samtools merge -f chr3.sorted.bam b.*/chr3.mapped.*.sorted.bam; samtools index chr3.sorted.bam
    samtools merge -f chr20.sorted.bam b.*/chr20.mapped.*.sorted.bam; samtools index chr20.sorted.bam
    samtools merge -f chr8.sorted.bam b.*/chr8.mapped.*.sorted.bam; samtools index chr8.sorted.bam
    samtools merge -f chr2.sorted.bam b.*/chr2.mapped.*.sorted.bam; samtools index chr2.sorted.bam
    samtools merge -f chr12.sorted.bam b.*/chr12.mapped.*.sorted.bam; samtools index chr12.sorted.bam

Example map.sge and map.script; note that the fastexon.sh script creates these within subdirectories.

map.sge:

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial jobs (e.g. parametric
    # studies) using a script wrapper to launch the jobs.
    #
    # To use, build the launcher executable and your serial application(s) and
    # place them in your WORKDIR directory. Then, edit the CONTROL_FILE to
    # specify each executable per process.
    #
    #   < Setup Parameters >
    #
    #$ -N map.b.1
    #$ -pe 2way 12
    #$ -q normal
    #$ -o map.b.1.o$JOB_ID
    #$ -l h_rt=1:00:00
    #$ -V
    #$ -cwd
    #   < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    setenv EXECUTABLE   $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE map.script
    setenv WORKDIR      .
    #
    # Variable description:
    #   EXECUTABLE   = full path to the job launcher executable
    #   CONTROL_FILE = text input file which specifies executable for each
    #                  process (should be located in WORKDIR)
    #   WORKDIR      = location of working directory
    #
    #   < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
        exit
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
        exit
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
        exit
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE

    echo " Parametric Job Complete"

map.script:

    /home1/01057/sphsmith/local/bin/exome_step1.bash r1.00 r2.00 mapped.00 >& mapped.00.log
    /home1/01057/sphsmith/local/bin/exome_step1.bash r1.01 r2.01 mapped.01 >& mapped.01.log

Example variants.sge and variants.script

variants.sge:

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial jobs (e.g. parametric
    # studies) using a script wrapper to launch the jobs.
    #
    # To use, build the launcher executable and your serial application(s) and
    # place them in your WORKDIR directory. Then, edit the CONTROL_FILE to
    # specify each executable per process.
    #
    #   < Setup Parameters >
    #
    #$ -N variants
    #$ -pe 2way 144
    #$ -q normal
    #$ -o variants.o$JOB_ID
    #$ -l h_rt=4:00:00
    #$ -V
    #$ -cwd
    #   < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>

    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    module load java64
    module load samtools
    setenv EXECUTABLE   $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE variants.script
    setenv WORKDIR      .
    #
    # Variable description:
    #   EXECUTABLE   = full path to the job launcher executable
    #   CONTROL_FILE = text input file which specifies executable for each
    #                  process (should be located in WORKDIR)
    #   WORKDIR      = location of working directory
    #
    #   < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
        exit
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
        exit
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
        exit
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE

    echo " Parametric Job Complete"

variants.script:

    /home1/01057/sphsmith/local/bin/exome_step2.bash chrY.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chrY >& variants.chrY.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr22.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr22 >& variants.chr22.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr9.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr9 >& variants.chr9.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr3.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr3 >& variants.chr3.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr21.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr21 >& variants.chr21.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr5.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr5 >& variants.chr5.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr16.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr16 >& variants.chr16.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr19.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr19 >& variants.chr19.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr18.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr18 >& variants.chr18.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr4.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr4 >& variants.chr4.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr12.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr12 >& variants.chr12.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr15.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr15 >& variants.chr15.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr14.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr14 >& variants.chr14.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chrX.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chrX >& variants.chrX.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr6.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr6 >& variants.chr6.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr13.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr13 >& variants.chr13.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr8.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr8 >& variants.chr8.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr7.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr7 >& variants.chr7.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr11.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr11 >& variants.chr11.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr20.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr20 >& variants.chr20.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr10.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr10 >& variants.chr10.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr1.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr1 >& variants.chr1.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr2.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr2 >& variants.chr2.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr17.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr17 >& variants.chr17.log

Example merge2.sge and merge2.script

merge2.sge:

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial jobs (e.g. parametric
    # studies) using a script wrapper to launch the jobs.
    #
    # To use, build the launcher executable and your serial application(s) and
    # place them in your WORKDIR directory. Then, edit the CONTROL_FILE to
    # specify each executable per process.
    #
    #   < Setup Parameters >
    #
    #$ -N merge2
    #$ -pe 4way 12
    #$ -q normal
    #$ -o merge2.o$JOB_ID
    #$ -l h_rt=1:00:00
    #$ -V
    #$ -cwd
    #   < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    setenv EXECUTABLE   $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE merge2.script
    setenv WORKDIR      .
    #
    # Variable description:
    #   EXECUTABLE   = full path to the job launcher executable
    #   CONTROL_FILE = text input file which specifies executable for each
    #                  process (should be located in WORKDIR)
    #   WORKDIR      = location of working directory
    #
    #   < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
        exit
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
        exit
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
        exit
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE
    echo " Parametric Job Complete"

merge2.script:

    samtools merge -f Sample_5_L003_R1.cat.fastq.sorted.bam *.sorted.bam
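Steps 1 and 2 of fastexon.sh depend on two numbers: how many files split produces, and how many b.* subdirectories those files fill at $batchsize pairs per directory. The sketch below computes both using ceiling division. It is illustrative only. plan_splits is a hypothetical helper, and the multiple-of-4 check is an added assumption: because a FASTQ record is four lines, any other split size would cut reads across file boundaries.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of fastexon.sh): given the line count of one
# FASTQ file, the split size and the batch size, print
# "<number of split files> <number of b.* subdirectories>".
plan_splits() {
    local lines=$1 splitsize=$2 batchsize=$3
    # Assumption: each FASTQ record is exactly 4 lines, so a split size that
    # is not a multiple of 4 would cut reads across split files.
    if (( splitsize % 4 != 0 )); then
        echo "splitsize must be a multiple of 4" >&2
        return 1
    fi
    local nsplits=$(( (lines + splitsize - 1) / splitsize ))   # ceiling division
    local ndirs=$(( (nsplits + batchsize - 1) / batchsize ))
    echo "$nsplits $ndirs"
}

# ~40 million read pairs means 160,000,000 lines per FASTQ file; the
# 4,000,000-line split size is a hypothetical choice for illustration.
plan_splits 160000000 4000000 2    # prints "40 20"
```

In practice, the line count would come from something like `wc -l < Sample_5_L003_R1.cat.fastq`.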
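Step 2 of fastexon.sh groups the split files using an expr-based modulo test plus a separate residual loop. Below is a simpler alternative sketch, not the author's code. It assigns pair number i to directory b.$((i / batchsize)), so directories are named b.0, b.1, ... instead of after the last index in each batch as in the original (which is why the example map.sge is named map.b.1). batch_splits is a hypothetical name, and the sketch assumes split -d has already produced matching r1.NN and r2.NN files in the current directory.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to step 2 of fastexon.sh: move each pair
# r1.NN/r2.NN into b.K, where K = pair index / batchsize, and print the
# space-separated list of subdirectories that were created.
batch_splits() {
    local batchsize=$1 i=0 f ext dir dirs=""
    for f in r1.*; do
        [ -e "$f" ] || break              # glob matched nothing: no split files
        ext=${f##*.}                      # numeric suffix written by split -d
        dir=b.$(( i / batchsize ))
        if [ ! -d "$dir" ]; then
            mkdir "$dir"
            dirs="$dirs $dir"
        fi
        mv "r1.$ext" "r2.$ext" "$dir"/
        i=$(( i + 1 ))
    done
    echo $dirs                            # unquoted on purpose: drops the leading space
}

# Example in a scratch directory, using 20-line stand-in files:
#   seq 1 20 > R1.fq; seq 21 40 > R2.fq
#   split -d -l 4 R1.fq r1.; split -d -l 4 R2.fq r2.
#   subdirlist=$(batch_splits 2)        # b.0 b.1 b.2
```

Because the directory is derived from the index, the leftover pairs need no special case. They simply land in the last, partly filled directory.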
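Every step of fastexon.sh repeats the same submit-and-wait pattern: capture qsub's output, take the job ID from its last line, and poll qstat until that ID disappears. A sketch of the pattern factored into two functions follows. It assumes SGE-style qsub output whose last line reads `Your job <id> ("<name>") has been submitted`. parse_jid and wait_for_job are hypothetical names. The use of grep -w is a swapped-in refinement so that one job ID is not matched as part of a longer number.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the submit-and-wait pattern in fastexon.sh.
# Assumption: the last line of qsub output looks like
#   Your job 12345 ("split") has been submitted
# so the job ID is the third whitespace-separated field.
parse_jid() {
    tail -n 1 "$1" | awk '{print $3}'
}

# Poll qstat every $2 seconds (default 30) until the job ID is no longer
# listed; -w makes grep match the ID only as a whole word.
wait_for_job() {
    local jid=$1 interval=${2:-30}
    while qstat | grep -qw -- "$jid"; do
        echo "$(date): waiting for job $jid"
        sleep "$interval"
    done
}

# Intended use inside fastexon.sh (requires SGE, so not run here):
#   qsub split.sge &> split.sge.sublog
#   splitjid=$(parse_jid split.sge.sublog)
#   wait_for_job "$splitjid"
```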
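Steps 4 and 5 of fastexon.sh build the list of reference sequences from the mapped BAM names in the first b.* directory, then shuffle it so large and small chromosomes are interleaved across nodes. The sketch below isolates those two operations. It assumes the b.N/<ref>.mapped.<NN>.sorted.bam naming used by the scripts above; list_refs and shuffle_words are hypothetical names. One difference from the fastexon.sh one-liner: the random key is formatted with printf, so very small values are never printed in exponent notation, which sort -n would misorder.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring steps 4 and 5 of fastexon.sh.

# List the distinct reference names in one batch directory. Assumes files
# named <dir>/<ref>.mapped.<NN>.sorted.bam and a dir name like b.0 (no extra
# dots or slashes), so field 3 when splitting on '.' and '/' is <ref>.
list_refs() {
    ls "$1"/*.mapped.*.sorted.bam | awk -F '[./]' '{print $3}' | sort -u
}

# Print the arguments one per line in random order: tag each word with a
# random key, sort numerically on the key, then drop the key.
# NF (number of fields) must be uppercase in awk.
shuffle_words() {
    echo "$@" |
        awk 'BEGIN { srand() } { for (i = 1; i <= NF; i++) printf "%.8f\t%s\n", rand(), $i }' |
        sort -n | cut -f 2
}

# Example, run from the directory that contains b.0, b.1, ...:
#   reflist=$(shuffle_words $(list_refs b.0))
```

awk's srand() with no argument seeds from the time of day, so two calls within the same second give the same order. Where GNU coreutils is available, shuf does the same job in one command.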
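The -pe lines in the example .sge files appear to follow a simple rule. The node count is the task count divided by the tasks per node, rounded up; the tasks per node is the -w value given to launcher_creator.py, which shows up as Nway. The slot count is then that node count times 12 cores. For example, the 24 chromosome merges at 4 per node need 6 nodes, or 72 slots. This rule is inferred from the examples above, not taken from launcher_creator.py itself; pe_line is a hypothetical helper that reproduces it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reproduce the "-pe Nway SLOTS" request seen in the
# example .sge files, assuming whole 12-core nodes are requested.
#   $1 = number of tasks (lines in the .script file)
#   $2 = tasks per node (the -w value / "Nway")
#   $3 = cores per node (default 12, as on Lonestar)
pe_line() {
    local ntasks=$1 way=$2 cores=${3:-12}
    local nodes=$(( (ntasks + way - 1) / way ))   # round up to whole nodes
    echo "#\$ -pe ${way}way $(( nodes * cores ))"
}

pe_line 24 4   # merge.sge:    24 chromosome merges, 4 per node -> 4way 72
pe_line 24 2   # variants.sge: 24 GATK runs, 2 per node         -> 2way 144
```

Read the other way, the rule shows the trade-off behind the "-w 2 ... might need optimization" comment in fastexon.sh. Fewer GATK processes per node leave each one more memory and cores, but the job requests proportionally more nodes.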
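The last two grep commands in fastexon.sh build one VCF from the per-chromosome outputs. They take the header lines (those starting with #) from one file, then the non-header lines from all of them. A standalone sketch follows. concat_vcfs is a hypothetical name, and it assumes every input carries the same header. The -h flag matters because grep prefixes each line with its file name when given several files. Note that records come out in glob order (chr1, chr10, chr11, ...), not sorted by genomic position.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the final step of fastexon.sh.
#   $1 = output VCF, $2 = file whose header is copied, $3... = all inputs
concat_vcfs() {
    local out=$1 header_src=$2
    shift 2
    grep '^#' "$header_src" > "$out"     # header lines only
    grep -hv '^#' "$@" >> "$out"         # data lines, without file-name prefixes
}

# Equivalent of the end of fastexon.sh:
#   concat_vcfs "$r1file.snps.vcf" chrX.sorted.bam.snps.vcf chr*.sorted.bam.snps.vcf
```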


More information

Read Mapping and Variant Calling. Johannes Starlinger

Read Mapping and Variant Calling. Johannes Starlinger Read Mapping and Variant Calling Johannes Starlinger Application Scenario: Personalized Cancer Therapy Different mutations require different therapy Collins, Meredith A., and Marina Pasca di Magliano.

More information

Identifying copy number alterations and genotype with Control-FREEC

Identifying copy number alterations and genotype with Control-FREEC Identifying copy number alterations and genotype with Control-FREEC Valentina Boeva contact: freec@curie.fr Most approaches for predicting copy number alterations (CNAs) require you to have whole exomesequencing

More information

Novel Variant Discovery Tutorial

Novel Variant Discovery Tutorial Novel Variant Discovery Tutorial Release 8.4.0 Golden Helix, Inc. August 12, 2015 Contents Requirements 2 Download Annotation Data Sources...................................... 2 1. Overview...................................................

More information

SNP calling and VCF format

SNP calling and VCF format SNP calling and VCF format Laurent Falquet, Oct 12 SNP? What is this? A type of genetic variation, among others: Family of Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide

More information

Fast and Accurate Variant Calling in Strand NGS

Fast and Accurate Variant Calling in Strand NGS S T R A ND LIF E SCIENCE S WH ITE PAPE R Fast and Accurate Variant Calling in Strand NGS A benchmarking study Radhakrishna Bettadapura, Shanmukh Katragadda, Vamsi Veeramachaneni, Atanu Pal, Mahesh Nagarajan

More information

ISO/IEC JTC 1/SC 29/WG 11 N15527 Warsaw, CH June Introduction

ISO/IEC JTC 1/SC 29/WG 11 N15527 Warsaw, CH June Introduction INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC 1/SC 29/WG 11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC 1/SC 29/WG 11 N15527 Warsaw, CH June

More information

ArcGIS Workflow Manager Advanced Workflows and Concepts

ArcGIS Workflow Manager Advanced Workflows and Concepts 2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop ArcGIS Workflow Manager Advanced Workflows and Concepts Kevin Bedel Nishi Mishra Esri UC2013. Technical

More information

Variant Finding. UCD Genome Center Bioinformatics Core Wednesday 30 August 2016

Variant Finding. UCD Genome Center Bioinformatics Core Wednesday 30 August 2016 Variant Finding UCD Genome Center Bioinformatics Core Wednesday 30 August 2016 Types of Variants Adapted from Alkan et al, Nature Reviews Genetics 2011 Why Look For Variants? Genotyping Correlation with

More information

Graph Optimization Algorithms for Sun Grid Engine. Lev Markov

Graph Optimization Algorithms for Sun Grid Engine. Lev Markov Graph Optimization Algorithms for Sun Grid Engine Lev Markov Sun Grid Engine SGE management software that optimizes utilization of software and hardware resources in heterogeneous networked environment.

More information

After working through that presentation, you will be prepared to use Xcelsius dashboards accessing BI query data via SAP NetWeaver BW connection in

After working through that presentation, you will be prepared to use Xcelsius dashboards accessing BI query data via SAP NetWeaver BW connection in After working through that presentation, you will be prepared to use Xcelsius dashboards accessing BI query data via SAP NetWeaver BW connection in your company. 1 Topics Learn how to build Xcelsius dashboards

More information

Setting Standards and Raising Quality for Clinical Bioinformatics. Joo Wook Ahn, Guy s & St Thomas 04/07/ ACGS summer scientific meeting

Setting Standards and Raising Quality for Clinical Bioinformatics. Joo Wook Ahn, Guy s & St Thomas 04/07/ ACGS summer scientific meeting Setting Standards and Raising Quality for Clinical Bioinformatics Joo Wook Ahn, Guy s & St Thomas 04/07/2016 - ACGS summer scientific meeting 1. Best Practice Guidelines Draft guidelines circulated to

More information

Alignment & Variant Discovery. J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014

Alignment & Variant Discovery. J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014 Alignment & Variant Discovery J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

TMT Fleet Maintenance Windows. TruckMate Installation Guide

TMT Fleet Maintenance Windows. TruckMate Installation Guide TMW Asset Maintenance TMT Fleet Maintenance Windows TruckMate Installation Guide 1 Table of Contents TruckMate Interface... 3 TruckMate TMT Fleet Maintenance Interface... 4 TruckMate Installation from

More information

The Sentieon Genomic Tools Improved Best Practices Pipelines for Analysis of Germline and Tumor-Normal Samples

The Sentieon Genomic Tools Improved Best Practices Pipelines for Analysis of Germline and Tumor-Normal Samples The Sentieon Genomic Tools Improved Best Practices Pipelines for Analysis of Germline and Tumor-Normal Samples Andreas Scherer, Ph.D. President and CEO Dr. Donald Freed, Bioinformatics Scientist, Sentieon

More information

White Paper GENALICE MAP: Variant Calling in a Matter of Minutes. Bas Tolhuis, PhD - GENALICE B.V.

White Paper GENALICE MAP: Variant Calling in a Matter of Minutes. Bas Tolhuis, PhD - GENALICE B.V. White Paper GENALICE MAP: Variant Calling in a Matter of Minutes Bas Tolhuis, PhD - GENALICE B.V. White Paper GENALICE MAP Variant Calling GENALICE BV May 2014 White Paper GENALICE MAP Variant Calling

More information

NGS in Pathology Webinar

NGS in Pathology Webinar NGS in Pathology Webinar NGS Data Analysis March 10 2016 1 Topics for today s presentation 2 Introduction Next Generation Sequencing (NGS) is becoming a common and versatile tool for biological and medical

More information

MPG NGS workshop I: SNP calling

MPG NGS workshop I: SNP calling MPG NGS workshop I: SNP calling Mark DePristo Manager, Medical and Popula

More information

A Slurm Simulator: Implementation and Parametric Analysis

A Slurm Simulator: Implementation and Parametric Analysis A Slurm Simulator: Implementation and Parametric Analysis Nikolay A. Simakov, Martins D. Innus, Matthew D. Jones,Robert L. DeLeon, Joseph P. White, Steven M. Gallo, Abani K. Patra and Thomas R. Furlani

More information

Robert Edgar. Independent scientist

Robert Edgar. Independent scientist Robert Edgar Independent scientist robert@drive5.com www.drive5.com Reads FASTQ format Millions of reads Many Gb USEARCH commands "UPARSE pipeline" OTU sequences FASTA format >Otu1 GATTAGCTCATTCGTA >Otu2

More information

CS3211 Project 2 OthelloX

CS3211 Project 2 OthelloX CS3211 Project 2 OthelloX Contents SECTION I. TERMINOLOGY 2 SECTION II. EXPERIMENTAL METHODOLOGY 3 SECTION III. DISTRIBUTION METHOD 4 SECTION IV. GRANULARITY 6 SECTION V. JOB POOLING 8 SECTION VI. SPEEDUP

More information

SNP calling. Jose Blanca COMAV institute bioinf.comav.upv.es

SNP calling. Jose Blanca COMAV institute bioinf.comav.upv.es SNP calling Jose Blanca COMAV institute bioinf.comav.upv.es SNP calling Genotype matrix Genotype matrix: Samples x SNPs SNPs and errors A change in a read may due to: Sample contamination Cloning or PCR

More information

Package geno2proteo. December 12, 2017

Package geno2proteo. December 12, 2017 Type Package Package geno2proteo December 12, 2017 Title Finding the DNA and Protein Sequences of Any Genomic or Proteomic Loci Version 0.0.1 Date 2017-12-12 Author Maintainer biocviews

More information

Short Read Alignment to a Reference Genome

Short Read Alignment to a Reference Genome Short Read Alignment to a Reference Genome Shamith Samarajiwa CRUK Summer School in Bioinformatics Cambridge, September 2018 Aligning to a reference genome BWA Bowtie2 STAR GEM Pseudo Aligners for RNA-seq

More information

Gene Expression analysis with RNA-Seq data

Gene Expression analysis with RNA-Seq data Gene Expression analysis with RNA-Seq data C3BI Hands-on NGS course November 24th 2016 Frédéric Lemoine Plan 1. 2. Quality Control 3. Read Mapping 4. Gene Expression Analysis 5. Splicing/Transcript Analysis

More information

Quantifying gene expression

Quantifying gene expression Quantifying gene expression Genome GTF (annotation)? Sequence reads FASTQ FASTQ (+reference transcriptome index) Quality control FASTQ Alignment to Genome: HISAT2, STAR (+reference genome index) (known

More information

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 Alignment J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

Jaime E. Combariza, PhD Director. Edition 02/06/18

Jaime E. Combariza, PhD Director. Edition 02/06/18 Jaime E. Combariza, PhD Director 1 Edition 02/06/18 Slides available online www.marcc.jhu.edu/training marcc-help@jhu.edu 2 Model & Funding Grant from the State of Maryland to JHU to build an HPC/big Data

More information

SAS. Activity-Based Management Adapter 6.1 for SAP R/3 User s Guide

SAS. Activity-Based Management Adapter 6.1 for SAP R/3 User s Guide SAS Activity-Based Management Adapter 6.1 for SAP R/3 User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2004. SAS Activity-Based Management Adapter 6.1 for

More information

Analytics Behind Genomic Testing

Analytics Behind Genomic Testing A Quick Guide to the Analytics Behind Genomic Testing Elaine Gee, PhD Director, Bioinformatics ARUP Laboratories 1 Learning Objectives Catalogue various types of bioinformatics analyses that support clinical

More information

Supplementary Figures and Data

Supplementary Figures and Data Supplementary Figures and Data Whole Exome Screening Identifies Novel and Recurrent WISP3 Mutations Causing Progressive Pseudorheumatoid Dysplasia in Jammu and Kashmir India Ekta Rai 1, Ankit Mahajan 2,

More information

Cluster Workload Management

Cluster Workload Management Cluster Workload Management Goal: maximising the delivery of resources to jobs, given job requirements and local policy restrictions Three parties Users: supplying the job requirements Administrators:

More information

IMPACT User Manual. Version 1.0

IMPACT User Manual. Version 1.0 IMPACT User Manual Version 1.0 1 Table of index: Overview 3 Dependencies 4 Preparation 4 Download 4 Quick Start 5 Module 1: Somatic Variants Detection 6 Module 2: Copy Number Alteration Detection 8 Module

More information

PredictSNP 1.0: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations. User guide

PredictSNP 1.0: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations. User guide PredictSNP 1.0: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations User guide Contact: Loschmidt Laboratories, Department of Experimental Biology and Research Centre for

More information

Zika infected human samples

Zika infected human samples Lecture 16 RNA-seq Zika infected human samples Experimental design ZIKV-infected hnpcs 56 hours after ZIKA and mock infection in parallel cultures were used for global transcriptome analysis. RNA-seq libraries

More information

Invoice Manager Admin Guide Basware P2P 17.3

Invoice Manager Admin Guide Basware P2P 17.3 Invoice Manager Admin Guide Basware P2P 17.3 Copyright 1999-2017 Basware Corporation. All rights reserved.. 1 Invoice Management Overview The Invoicing tab is a centralized location to manage all types

More information

Big Data & Hadoop Advance

Big Data & Hadoop Advance Course Durations: 30 Hours About Company: Course Mode: Online/Offline EduNextgen extended arm of Product Innovation Academy is a growing entity in education and career transformation, specializing in today

More information

The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks.

The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks. Open Seqmonk Launch SeqMonk The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks. SeqMonk Analysis Page 1 Create

More information

Moreno Baricevic Stefano Cozzini. CNR-IOM DEMOCRITOS Trieste, ITALY. Resource Management

Moreno Baricevic Stefano Cozzini. CNR-IOM DEMOCRITOS Trieste, ITALY. Resource Management Moreno Baricevic Stefano Cozzini CNR-IOM DEMOCRITOS Trieste, ITALY Resource Management RESOURCE MANAGEMENT We have a pool of users and a pool of resources, then what? some software that controls available

More information

OHSU Digital Commons. Oregon Health & Science University. Benjamin Cordier. Scholar Archive

OHSU Digital Commons. Oregon Health & Science University. Benjamin Cordier. Scholar Archive Oregon Health & Science University OHSU Digital Commons Scholar Archive 5-19-2017 Evaluation Of Background Prediction For Variant Detection In A Clinical Context: Towards Improved Ngs Monitoring Of Minimal

More information

Variant Analysis. CB2-201 Computational Biology and Bioinformatics! February 27, Emidio Capriotti!

Variant Analysis. CB2-201 Computational Biology and Bioinformatics! February 27, Emidio Capriotti! Variant Analysis CB2-201 Computational Biology and Bioinformatics February 27, 2015 Emidio Capriotti http://biofold.org/emidio Division of Informatics Department of Pathology Variant Call Format The final

More information

Introduction to Copy Number Analysis

Introduction to Copy Number Analysis Introduction to Copy Number Analysis Document Number: 30210 Document Revision: B1 30210 Rev B, Introduction to Copy Number Analysis Page 1 of 12 Table of Contents Legal Notice... 3 Introduction... 4 Input...

More information

Sanger vs Next-Gen Sequencing

Sanger vs Next-Gen Sequencing Tools and Algorithms in Bioinformatics GCBA815/MCGB815/BMI815, Fall 2017 Week-8: Next-Gen Sequencing RNA-seq Data Analysis Babu Guda, Ph.D. Professor, Genetics, Cell Biology & Anatomy Director, Bioinformatics

More information

Fractal Exercise. Fractals, task farm and load imbalance

Fractal Exercise. Fractals, task farm and load imbalance Fractal Exercise Fractals, task farm and load imbalance 2 Contents 1 Introduction and Aims... 3 1.1 Mandelbrot Set... 3 2 Looking at the concepts... 4 2.1 What is a task farm?... 4 2.1.1 Using a task farm...

More information

An Introduction to the package geno2proteo

An Introduction to the package geno2proteo An Introduction to the package geno2proteo Yaoyong Li January 24, 2018 Contents 1 Introduction 1 2 The data files needed by the package geno2proteo 2 3 The main functions of the package 3 1 Introduction

More information

CHEM5302 Fall 2017: Docking and BEDAM Free Energy Calculations for LEDGF Inhibitors of HIV Integrase

CHEM5302 Fall 2017: Docking and BEDAM Free Energy Calculations for LEDGF Inhibitors of HIV Integrase CHEM5302 Fall 2017: Docking and BEDAM Free Energy Calculations for LEDGF Inhibitors of HIV Integrase Ronald Levy November 30, 2017 1 Introduction The human LEDGF protein links the HIV integrase (HIV-IN)

More information

Powering Statistical Genetics with the Grid: Using GridWay to Automate R Workflows

Powering Statistical Genetics with the Grid: Using GridWay to Automate R Workflows Powering Statistical Genetics with the Grid: Using GridWay to Automate R Workflows John-Paul Robinson Information Technology Purushotham Bangalore Department of Computer Science Jelai Wang, Tapan Mehta

More information

Accelerate Insights with Topology, High Throughput and Power Advancements

Accelerate Insights with Topology, High Throughput and Power Advancements Accelerate Insights with Topology, High Throughput and Power Advancements Michael A. Jackson, President Wil Wellington, EMEA Professional Services May 2014 1 Adaptive/Cray Example Joint Customers Cray

More information

Barcode Printing. SIMMS Inventory Management Software February 24, 2012

Barcode Printing. SIMMS Inventory Management Software February 24, 2012 Barcode Printing SIMMS Inventory Management Software 2012 February 24, 2012 Contents Barcode Printing.................. 1 Printing Barcodes.................. 1 Print Barcodes for Inventory Items..........

More information

How to Align a BNX to a Reference. Document Number: Document Revision: A

How to Align a BNX to a Reference. Document Number: Document Revision: A How to Align a BNX to a Reference Document Number: 30193 Document Revision: A Legal Notice For Research Use Only. Not for use in diagnostic procedures. This material is protected by United States Copyright

More information

CPU scheduling. CPU Scheduling

CPU scheduling. CPU Scheduling EECS 3221 Operating System Fundamentals No.4 CPU scheduling Prof. Hui Jiang Dept of Electrical Engineering and Computer Science, York University CPU Scheduling CPU scheduling is the basis of multiprogramming

More information

From raw reads to variants

From raw reads to variants From raw reads to variants Sebastian DiLorenzo Sebastian.DiLorenzo@NBIS.se Talk Overview Concepts Reference genome Variants Paired-end data NGS Workflow Quality control & Trimming Alignment Local realignment

More information

Genome STRiP ASHG Workshop demo materials. Bob Handsaker October 19, 2014

Genome STRiP ASHG Workshop demo materials. Bob Handsaker October 19, 2014 Genome STRiP ASHG Workshop demo materials Bob Handsaker October 19, 2014 Running Genome STRiP directly on AWS Genome STRiP Structure in Populations Popula'on)aware-discovery-andgenotyping-of-structural-varia'onfrom-whole)genome-sequencing-

More information

Overview of Scientific Workflows: Why Use Them?

Overview of Scientific Workflows: Why Use Them? Overview of Scientific Workflows: Why Use Them? Blue Waters Webinar Series March 8, 2017 Scott Callaghan Southern California Earthquake Center University of Southern California scottcal@usc.edu 1 Overview

More information

Introduction to the Neuroscience Gateway (NSG)

Introduction to the Neuroscience Gateway (NSG) Introduction to the Neuroscience Gateway (NSG) www.nsgportal.org Amit Majumdar, Subhashini Sivagnanam, Kenneth Yoshimoto San Diego Supercomputer Center Ted Carnevale Yale School of Medicine Vadim Astakhov,Maryann

More information

Integration of Titan supercomputer at OLCF with ATLAS Production System

Integration of Titan supercomputer at OLCF with ATLAS Production System Integration of Titan supercomputer at OLCF with ATLAS Production System F Barreiro Megino 1, K De 1, S Jha 2, A Klimentov 3, P Nilsson 3, D Oleynik 1, S Padolski 3, S Panitkin 3, J Wells 4 and T Wenaus

More information

Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017

Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017 Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017 Topics to cover today What is Next Generation Sequencing (NGS)? Why do we need NGS? Common approaches to NGS NGS

More information

Variant Detection in Next Generation Sequencing Data. John Osborne Sept 14, 2012

Variant Detection in Next Generation Sequencing Data. John Osborne Sept 14, 2012 + Variant Detection in Next Generation Sequencing Data John Osborne Sept 14, 2012 + Overview My Bias Talk slanted towards analyzing whole genomes using Illumina paired end reads with open source tools

More information

IBM i Version 7.2. Systems management Advanced job scheduler IBM

IBM i Version 7.2. Systems management Advanced job scheduler IBM IBM i Version 7.2 Systems management Advanced job scheduler IBM IBM i Version 7.2 Systems management Advanced job scheduler IBM Note Before using this information and the product it supports, read the

More information

Data Exchange Module. Vendor Invoice Import

Data Exchange Module. Vendor Invoice Import Data Exchange Module Vendor Invoice Import Information in this document is subject to change without notice and does not represent a commitment on the part of Dexter + Chaney. The software described in

More information

Dipping into Guacamole. Tim O Donnell & Ryan Williams NYC Big Data Genetics Meetup Aug 11, 2016

Dipping into Guacamole. Tim O Donnell & Ryan Williams NYC Big Data Genetics Meetup Aug 11, 2016 Dipping into uacamole Tim O Donnell & Ryan Williams NYC Big Data enetics Meetup ug 11, 2016 Who we are: Hammer Lab Computational lab in the department of enetics and enomic Sciences at Mount Sinai Principal

More information

iq 5 Calibration: Contents.

iq 5 Calibration: Contents. iq 5 Calibration: Contents. When do you calibrate Preparation. What you will need Assembly. What you will prepare Procedure. What you will do Summary Detailed Procedure Mask Background Well Factors Pure

More information

Ecole de Bioinforma(que AVIESAN Roscoff 2014 GALAXY INITIATION. A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech

Ecole de Bioinforma(que AVIESAN Roscoff 2014 GALAXY INITIATION. A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech GALAXY INITIATION A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech How does Next- Gen sequencing work? DNA fragmentation Size selection and clonal amplification Massive parallel sequencing ACCGTTTGCCG

More information

Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters

Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters 1.119/TCC.15.15, IEEE Transactions on Cloud Computing 1 Self-Adjusting Configurations for Homogeneous and Heterogeneous Hadoop Clusters Yi Yao 1 Jiayin Wang Bo Sheng Chiu C. Tan 3 Ningfang Mi 1 1.Northeastern

More information

Data Exchange Module. Vendor Invoice Import

Data Exchange Module. Vendor Invoice Import Data Exchange Module Vendor Invoice Import Information in this document is subject to change without notice and does not represent a commitment on the part of Dexter + Chaney. The software described in

More information

Variant Quality Score Recalibra2on

Variant Quality Score Recalibra2on talks Variant Quality Score Recalibra2on Assigning accurate confidence scores to each puta2ve muta2on call You are here in the GATK Best Prac2ces workflow for germline variant discovery Data Pre-processing

More information

alanarentsen.blogspot.com @alanarentsen Inspirations Developing Installation Upgrades Final thoughts Inspirations instead of repeating structure it while developing Inspirations Developing Installation

More information

How to Configure the Workflow Service and Design the Workflow Process Templates

How to Configure the Workflow Service and Design the Workflow Process Templates How - To Guide SAP Business One 9.0 Document Version: 1.1 2013-04-09 How to Configure the Workflow Service and Design the Workflow Process Templates Typographic Conventions Type Style Example Description

More information