Split input data, map to hg19, split by chromosome, and variant call with GATK
- Gyles Parrish
- 6 years ago
This is a template script written by Scott Hunicke-Smith to illustrate how to run exome analysis much faster on Lonestar. It requires only two fastq files (a read pair) and two parameters, $splitsize and $batchsize. It is NOT optimized and not highly robust. It relies on many sub-scripts, both within Scott's home directory and in the BioITeam corral directories.

This bash script needs to be run on a head node somewhere where it won't be killed:

fastexon.sh

    #!/bin/bash
    # Copyright 2012 Scott Hunicke-Smith and the University of Texas at Austin

    module load python
    module load bwa
    module load samtools
    module load java64

    r1file=$1
    r2file=$2
    splitsize=$3
    batchsize=$4
    queue="normal"

    echo "Starting: `date`"

    # 1. Split input fastq's as one job; store job number
    echo "split -d -l $splitsize $r1file r1." > split.script
    echo "split -d -l $splitsize $r2file r2." >> split.script
    /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j split.script -l split.sge -a DNAdenovo -n split -t 1:00:00
    qsub split.sge &> split.sge.sublog
    splitjid=`tail -1 split.sge.sublog | awk '{print $3}'`
    echo "Submitted $splitjid to split input files at `date`"

    echo "Waiting for split to finish"
    while qstat | grep $splitjid ; do
        echo `date`
        sleep 30
    done

    # 2. Move a set of splits into their own directory
    i=0
    filelist=""
    subdirlist=""
    for file in $( ls r1.* ) ; do
        fileext="${file#*.}"
        if [ `expr $i % $batchsize` -eq `expr $batchsize - 1` ]
        then
            mkdir b.$i
            subdirlist="$subdirlist b.$i"
            filelist="$filelist $fileext"
            for datafiles in $filelist ; do
                mv r2.$datafiles b.$i
                mv r1.$datafiles b.$i
            done
            filelist=""
        else
            filelist="$filelist $fileext"
        fi
        i=`expr $i + 1`
    done

    # And the residual set, if any:
    for file in $( ls r1.* ) ; do
        mkdir b.$i
        subdirlist="$subdirlist b.$i"
        mv r1.* b.$i
        mv r2.* b.$i
        # Everything left over was moved in one pass, so stop here
        break
    done

    # 3. Launch exome_step1.sh on each split within its own directory; store job numbers;
    #    launch exome_step2.sh to combine chr files
    mapjids=""
    for subdir in $subdirlist ; do
        cd $subdir
        echo "Creating launcher for all files in $subdir: `date`"
        rm -f map.sge
        for file in $( ls r1.* ) ; do
            fileext="${file#*.}"
            echo "Run exome_step1.sh on r1.$fileext and r2.$fileext"
            echo "/home1/01057/sphsmith/local/bin/exome_step1.bash r1.$fileext r2.$fileext mapped.$fileext >& mapped.$fileext.log" >> map.script
        done
        /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j map.script -l map.sge -a DNAdenovo -n map.$subdir -t 1:00:00 -w 2
        qsub map.sge &> map.sge.sublog
        mapjid=`tail -1 map.sge.sublog | awk '{print $3}'`
        mapjids="$mapjids,`tail -1 map.sge.sublog | awk '{print $3}'`"
        echo "Submitted $mapjid to split input files in $subdir at `date`"
        cd ..
    done

    echo "Waiting for mapping to finish"
    while qstat | grep $mapjid ; do
        echo `date`
        sleep 30
    done
    echo "Finished: `date`"

    # 4. Launch job to combine final chr files across all directories
    echo "Creating launcher for merging by reference sequence: `date`"
    subdir=`ls -d b.* | head -1`
    subdirext="${subdir#*.}"
    evalcmd="ls b.$subdirext/*.mapped.*.sorted.bam | awk -F [./] '{print \$3}' | sort | uniq"
    reflist=`eval $evalcmd`
    # Randomize this list so large and small reference sequences are mixed up
    reflist=`echo $reflist | awk 'BEGIN {srand() } {for (i=1;i<=NF;i++) {print rand() "\t" $i "\n"}}' | sort -n | cut -f 2`
    echo $reflist
    rm -f merge.script
    for refs in $reflist ; do
        echo "Merging $refs "
        echo "samtools merge -f $refs.sorted.bam b.*/$refs.mapped.*.sorted.bam; samtools index $refs.sorted.bam" >> merge.script
    done
    /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j merge.script -l merge.sge -a DNAdenovo -n merge -t 1:00:00 -w 4
    echo "Submitting job; queue start contingent on $mapjids completing first"
    qsub merge.sge -hold_jid $mapjids &> merge.sge.sublog
    mergejid=`tail -1 merge.sge.sublog | awk '{print $3}'`
    echo "Submitted $mergejid to merge output files at `date`"

    echo "Waiting for merging to finish"
    while qstat | grep $mergejid ; do
        echo `date`
        sleep 30
    done

    # 5. Launch GATK on each reference sequence's sorted bam file
    echo "Creating launcher for merging by reference sequence: `date`"
    subdir=`ls -d b.* | head -1`
    subdirext="${subdir#*.}"
    evalcmd="ls b.$subdirext/*.mapped.*.sorted.bam | awk -F [./] '{print \$3}' | sort | uniq"
    reflist=`eval $evalcmd`
    # Randomize this list so large and small reference sequences are mixed up
    reflist=`echo $reflist | awk 'BEGIN {srand() } {for (i=1;i<=NF;i++) {print rand() "\t" $i "\n"}}' | sort -n | cut -f 2`
    echo $reflist
    rm -f variants.script
    for refs in $reflist ; do
        echo "GATK via exome_step2.bash on $refs.sorted.bam"
        echo "/home1/01057/sphsmith/local/bin/exome_step2.bash $refs.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf $refs >& variants.$refs.log" >> variants.script
    done
    # Note that the -w 2 option here defines how many GATK's run per node - might need optimization
    /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j variants.script -l variants.sge -a DNAdenovo -n variants -t 4:00:00 -w 2
    sed -i s/'module load launcher'/'module load launcher\nmodule load java64\nmodule load samtools'/ variants.sge
    qsub variants.sge -hold_jid $mergejid &> variants.sge.sublog
    variantsjid=`tail -1 variants.sge.sublog | awk '{print $3}'`
    echo "Submitted $variantsjid to call variants at `date`"

    echo "Waiting for variant calling to finish"
    while qstat | grep $variantsjid ; do
        echo `date`
        sleep 30
    done

    # 6. Merge all bam files & vcf files
    echo "samtools merge -f $r1file.sorted.bam *.sorted.bam" > merge2.script
    /home1/01057/sphsmith/local/bin/launcher_creator.py -q $queue -j merge2.script -l merge2.sge -a DNAdenovo -n merge2 -t 1:00:00 -w 4
    qsub merge2.sge -hold_jid $variantsjid &> merge2.sge.sublog
    merge2jid=`tail -1 merge2.sge.sublog | awk '{print $3}'`
    echo "Submitted $merge2jid to merge output files at `date`"

    echo "Waiting for merging to finish"
    while qstat | grep $merge2jid ; do
        echo `date`
        sleep 30
    done

    # Header lines from one chromosome, then the records from all of them
    # (-h keeps grep from prefixing each record with its file name)
    grep '^#' chrX.sorted.bam.snps.vcf > $r1file.snps.vcf
    grep -h -v '^#' chr*.sorted.bam.snps.vcf >> $r1file.snps.vcf

    echo "Fast exon analysis is complete at: `date`"

It uses the TACC "launcher" functionality to do the following:

1. Create one job on one node which splits the two input fastq files into files with $splitsize lines each, using split.sge and split.script. Wait for the job to finish.
2. Create as many subdirectories as needed for the split output files to be mapped, $batchsize per directory. Lonestar nodes have two sockets with 6 cores per socket (12 cores per node), so a good choice here is to make $batchsize two so that the mapping step can use 6 threads.
3. In each subdirectory, write one line per split-file pair to map.script (up to $batchsize lines per directory; the total number of pairs is roughly the input line count divided by $splitsize) and submit map.sge to do the mapping. Note that the embedded mapping script exome_step1.sh splits the mapping output into chromosome-specific files during the bwa sampe step. This mapping script is also where multi-threading for bwa is set. It should be parameterized, of course.
4. Merge all the chromosome-specific files from these subdirectories back to the run directory using merge.sge and merge.script.
5. Run GATK on sets of these chromosome-specific files, with 2 GATK processes per node (hardcoded in the script right now), using variants.sge and variants.script. Since chromosomes are usually named based on their size (i.e. chr1 > chr2 > chr3 in length, etc.), randomize the list so that we don't wind up with all the big chromosomes on one node.
6. Merge the final GATK chromosome-specific results, both the BAM files and the VCF files, using merge2.sge and merge2.script.

Examples of the various .sge and .script files are shown below.

Benchmark analysis on ~40 million read pairs from a single human exome experiment shows that this script takes about 2 hours vs. about 15 hours if all these same processes are run on only 1 node.

Example split.sge and split.script

split.sge

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial
    # jobs (e.g. parametric studies) using a script wrapper
    # to launch the jobs.
    #
    # To use, build the launcher executable and your
    # serial application(s) and place them in your WORKDIR
    # directory. Then, edit the CONTROL_FILE to specify
    # each executable per process.
    #
    # < Setup Parameters >
    #
    #$ -N split
    #$ -pe 12way 12
    #$ -q normal
    #$ -o split.o$JOB_ID
    #$ -l h_rt=1:00:00
    #$ -V
    #$ -cwd
    # < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    setenv EXECUTABLE $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE split.script
    setenv WORKDIR .
    #
    # Variable description:
    #
    # EXECUTABLE   = full path to the job launcher executable
    # CONTROL_FILE = text input file which specifies
    #                executable for each process
    #                (should be located in WORKDIR)
    # WORKDIR      = location of working directory
    #
    # < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE
    echo " Parameteric Job Complete"

split.script (the numeric line count that follows -l, i.e. the value of $splitsize, does not appear in this listing)

    split -d -l Sample_5_L003_R1.cat.fastq r1.
    split -d -l Sample_5_L003_R2.cat.fastq r2.

Example merge.sge and merge.script

merge.sge

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial
    # jobs (e.g. parametric studies) using a script wrapper
    # to launch the jobs.
    #
    # To use, build the launcher executable and your
    # serial application(s) and place them in your WORKDIR
    # directory. Then, edit the CONTROL_FILE to specify
    # each executable per process.
    #
    # < Setup Parameters >
    #
    #$ -N merge
    #$ -pe 4way 72
    #$ -q normal
    #$ -o merge.o$JOB_ID
    #$ -l h_rt=1:00:00
    #$ -V
    #$ -cwd
    # < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    setenv EXECUTABLE $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE merge.script
    setenv WORKDIR .
    #
    # Variable description:
    #
    # EXECUTABLE   = full path to the job launcher executable
    # CONTROL_FILE = text input file which specifies
    #                executable for each process
    #                (should be located in WORKDIR)
    # WORKDIR      = location of working directory
    #
    # < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE
    echo " Parameteric Job Complete"

merge.script

    samtools merge -f chr6.sorted.bam b.*/chr6.mapped.*.sorted.bam; samtools index chr6.sorted.bam
    samtools merge -f chrX.sorted.bam b.*/chrX.mapped.*.sorted.bam; samtools index chrX.sorted.bam
    samtools merge -f chr17.sorted.bam b.*/chr17.mapped.*.sorted.bam; samtools index chr17.sorted.bam
    samtools merge -f chr21.sorted.bam b.*/chr21.mapped.*.sorted.bam; samtools index chr21.sorted.bam
    samtools merge -f chr5.sorted.bam b.*/chr5.mapped.*.sorted.bam; samtools index chr5.sorted.bam
    samtools merge -f chrY.sorted.bam b.*/chrY.mapped.*.sorted.bam; samtools index chrY.sorted.bam
    samtools merge -f chr4.sorted.bam b.*/chr4.mapped.*.sorted.bam; samtools index chr4.sorted.bam
    samtools merge -f chr19.sorted.bam b.*/chr19.mapped.*.sorted.bam; samtools index chr19.sorted.bam
    samtools merge -f chr13.sorted.bam b.*/chr13.mapped.*.sorted.bam; samtools index chr13.sorted.bam
    samtools merge -f chr16.sorted.bam b.*/chr16.mapped.*.sorted.bam; samtools index chr16.sorted.bam
    samtools merge -f chr7.sorted.bam b.*/chr7.mapped.*.sorted.bam; samtools index chr7.sorted.bam
    samtools merge -f chr9.sorted.bam b.*/chr9.mapped.*.sorted.bam; samtools index chr9.sorted.bam
    samtools merge -f chr14.sorted.bam b.*/chr14.mapped.*.sorted.bam; samtools index chr14.sorted.bam
    samtools merge -f chr11.sorted.bam b.*/chr11.mapped.*.sorted.bam; samtools index chr11.sorted.bam
    samtools merge -f chr22.sorted.bam b.*/chr22.mapped.*.sorted.bam; samtools index chr22.sorted.bam
    samtools merge -f chr1.sorted.bam b.*/chr1.mapped.*.sorted.bam; samtools index chr1.sorted.bam
    samtools merge -f chr10.sorted.bam b.*/chr10.mapped.*.sorted.bam; samtools index chr10.sorted.bam
    samtools merge -f chr15.sorted.bam b.*/chr15.mapped.*.sorted.bam; samtools index chr15.sorted.bam
    samtools merge -f chr18.sorted.bam b.*/chr18.mapped.*.sorted.bam; samtools index chr18.sorted.bam
    samtools merge -f chr3.sorted.bam b.*/chr3.mapped.*.sorted.bam; samtools index chr3.sorted.bam
    samtools merge -f chr20.sorted.bam b.*/chr20.mapped.*.sorted.bam; samtools index chr20.sorted.bam
    samtools merge -f chr8.sorted.bam b.*/chr8.mapped.*.sorted.bam; samtools index chr8.sorted.bam
    samtools merge -f chr2.sorted.bam b.*/chr2.mapped.*.sorted.bam; samtools index chr2.sorted.bam
    samtools merge -f chr12.sorted.bam b.*/chr12.mapped.*.sorted.bam; samtools index chr12.sorted.bam

Note that map.sge and map.script, unlike the files above, are created by the fastexon.sh script within the b.* subdirectories.
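The shuffled chromosome order in the merge.script listing comes from the randomization step in fastexon.sh, which mixes large and small reference sequences before the launcher lines are written. The following is a minimal sketch of that idea, not the author's code: the function names shuffle_refs and make_merge_lines are hypothetical, and the awk program formats the random key with a fixed number of decimals so that sort -n orders it reliably.

```bash
#!/bin/bash
# Hypothetical helper (not part of fastexon.sh):
# print the names given as arguments, one per line, in random order.
shuffle_refs() {
    printf '%s\n' "$@" |
        awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' |
        sort -n |
        cut -f 2
}

# Hypothetical helper (not part of fastexon.sh):
# emit one merge-and-index command line per reference name, in the same
# form as the lines of merge.script. The b.* glob stays unexpanded
# because it sits inside double quotes.
make_merge_lines() {
    local ref
    for ref in "$@"; do
        echo "samtools merge -f $ref.sorted.bam b.*/$ref.mapped.*.sorted.bam; samtools index $ref.sorted.bam"
    done
}

# Demo with a few made-up reference names; nothing is executed,
# the command lines are only printed.
refs=$(shuffle_refs chr1 chr2 chrX chrY)
make_merge_lines $refs
```

Redirecting the output of make_merge_lines to a file would give a control file of the same shape as merge.script. Because srand() with no argument seeds from the clock, two runs within the same second can produce the same order.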
map.sge

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial
    # jobs (e.g. parametric studies) using a script wrapper
    # to launch the jobs.
    #
    # To use, build the launcher executable and your
    # serial application(s) and place them in your WORKDIR
    # directory. Then, edit the CONTROL_FILE to specify
    # each executable per process.
    #
    # < Setup Parameters >
    #
    #$ -N map.b.1
    #$ -pe 2way 12
    #$ -q normal
    #$ -o map.b.1.o$JOB_ID
    #$ -l h_rt=1:00:00
    #$ -V
    #$ -cwd
    # < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    setenv EXECUTABLE $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE map.script
    setenv WORKDIR .
    #
    # Variable description:
    #
    # EXECUTABLE   = full path to the job launcher executable
    # CONTROL_FILE = text input file which specifies
    #                executable for each process
    #                (should be located in WORKDIR)
    # WORKDIR      = location of working directory
    #
    # < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE
    echo " Parameteric Job Complete"

map.script

    /home1/01057/sphsmith/local/bin/exome_step1.bash r1.00 r2.00 mapped.00 >& mapped.00.log
    /home1/01057/sphsmith/local/bin/exome_step1.bash r1.01 r2.01 mapped.01 >& mapped.01.log

Example variants.sge and variants.script

variants.sge

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial
    # jobs (e.g. parametric studies) using a script wrapper
    # to launch the jobs.
    #
    # To use, build the launcher executable and your
    # serial application(s) and place them in your WORKDIR
    # directory. Then, edit the CONTROL_FILE to specify
    # each executable per process.
    #
    # < Setup Parameters >
    #
    #$ -N variants
    #$ -pe 2way 144
    #$ -q normal
    #$ -o variants.o$JOB_ID
    #$ -l h_rt=4:00:00
    #$ -V
    #$ -cwd
    # < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    module load java64
    module load samtools
    setenv EXECUTABLE $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE variants.script
    setenv WORKDIR .
    #
    # Variable description:
    #
    # EXECUTABLE   = full path to the job launcher executable
    # CONTROL_FILE = text input file which specifies
    #                executable for each process
    #                (should be located in WORKDIR)
    # WORKDIR      = location of working directory
    #
    # < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE
    echo " Parameteric Job Complete"

variants.script

    /home1/01057/sphsmith/local/bin/exome_step2.bash chrY.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chrY >& variants.chrY.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr22.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr22 >& variants.chr22.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr9.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr9 >& variants.chr9.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr3.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr3 >& variants.chr3.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr21.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr21 >& variants.chr21.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr5.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr5 >& variants.chr5.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr16.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr16 >& variants.chr16.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr19.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr19 >& variants.chr19.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr18.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr18 >& variants.chr18.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr4.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr4 >& variants.chr4.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr12.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr12 >& variants.chr12.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr15.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr15 >& variants.chr15.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr14.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr14 >& variants.chr14.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chrX.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chrX >& variants.chrX.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr6.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr6 >& variants.chr6.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr13.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr13 >& variants.chr13.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr8.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr8 >& variants.chr8.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr7.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr7 >& variants.chr7.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr11.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr11 >& variants.chr11.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr20.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr20 >& variants.chr20.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr10.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr10 >& variants.chr10.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr1.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr1 >& variants.chr1.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr2.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr2 >& variants.chr2.log
    /home1/01057/sphsmith/local/bin/exome_step2.bash chr17.sorted.bam /work/01057/sphsmith/dbsnp/dbsnp_132.hg19.vcf chr17 >& variants.chr17.log

Example merge2.sge and merge2.script

merge2.sge

    #!/bin/csh
    #
    # Simple SGE script for submitting multiple serial
    # jobs (e.g. parametric studies) using a script wrapper
    # to launch the jobs.
    #
    # To use, build the launcher executable and your
    # serial application(s) and place them in your WORKDIR
    # directory. Then, edit the CONTROL_FILE to specify
    # each executable per process.
    #
    # < Setup Parameters >
    #
    #$ -N merge2
    #$ -pe 4way 12
    #$ -q normal
    #$ -o merge2.o$JOB_ID
    #$ -l h_rt=1:00:00
    #$ -V
    #$ -cwd
    # < You MUST Specify a Project String ----->
    #$ -A DNAdenovo
    #
    # Usage:
    #   #$ -pe <parallel environment> <number of slots>
    #   #$ -l h_rt=hours:minutes:seconds to specify run time limit
    #   #$ -N <job name>
    #   #$ -q <queue name>
    #   #$ -o <job output file>
    #   NOTE: The env variable $JOB_ID contains the job id.
    #
    module load launcher
    setenv EXECUTABLE $TACC_LAUNCHER_DIR/init_launcher
    setenv CONTROL_FILE merge2.script
    setenv WORKDIR .
    #
    # Variable description:
    #
    # EXECUTABLE   = full path to the job launcher executable
    # CONTROL_FILE = text input file which specifies
    #                executable for each process
    #                (should be located in WORKDIR)
    # WORKDIR      = location of working directory
    #
    # < End Setup Parameters >

    # Error Checking
    if ( ! -e $WORKDIR ) then
        echo "Error: unable to change to working directory."
        echo " $WORKDIR"
    endif
    if ( ! -f $EXECUTABLE ) then
        echo "Error: unable to find launcher executable $EXECUTABLE."
    endif
    if ( ! -f $WORKDIR/$CONTROL_FILE ) then
        echo "Error: unable to find input control file $CONTROL_FILE."
    endif

    # Job Submission
    cd $WORKDIR/
    echo " WORKING DIR: $WORKDIR/"
    $TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE
    echo " Parameteric Job Complete"

merge2.script

    samtools merge -f Sample_5_L003_R1.cat.fastq.sorted.bam *.sorted.bam
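The last step of fastexon.sh builds a single VCF by taking the header lines (those starting with #) from one per-chromosome file and appending the non-header lines from all of them. A minimal sketch of that step follows, assuming plain-text VCF files that all share the same header. The function name merge_vcfs is hypothetical, and grep's -h option is used so that file names are not prefixed to the records when several inputs are given.

```bash
#!/bin/bash
# Hypothetical helper (not part of fastexon.sh): concatenate per-chromosome VCFs.
# Usage: merge_vcfs OUTPUT.vcf INPUT1.vcf [INPUT2.vcf ...]
merge_vcfs() {
    out=$1
    shift
    # Header lines (starting with '#') are taken from the first input only.
    grep '^#' "$1" > "$out"
    # Variant records are taken from every input; -h suppresses the
    # "filename:" prefix grep would otherwise add for multiple files.
    grep -h -v '^#' "$@" >> "$out"
}
```

A call such as `merge_vcfs sample.snps.vcf chr*.sorted.bam.snps.vcf` would append records in the shell's glob order (chr10 before chr2), so a tool that expects coordinate-sorted input may need a sort afterwards. The output name should not match the input glob, otherwise the function would read the file it is writing.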
After working through that presentation, you will be prepared to use Xcelsius dashboards accessing BI query data via SAP NetWeaver BW connection in your company. 1 Topics Learn how to build Xcelsius dashboards
More informationSetting Standards and Raising Quality for Clinical Bioinformatics. Joo Wook Ahn, Guy s & St Thomas 04/07/ ACGS summer scientific meeting
Setting Standards and Raising Quality for Clinical Bioinformatics Joo Wook Ahn, Guy s & St Thomas 04/07/2016 - ACGS summer scientific meeting 1. Best Practice Guidelines Draft guidelines circulated to
More informationAlignment & Variant Discovery. J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014
Alignment & Variant Discovery J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG
More informationTMT Fleet Maintenance Windows. TruckMate Installation Guide
TMW Asset Maintenance TMT Fleet Maintenance Windows TruckMate Installation Guide 1 Table of Contents TruckMate Interface... 3 TruckMate TMT Fleet Maintenance Interface... 4 TruckMate Installation from
More informationThe Sentieon Genomic Tools Improved Best Practices Pipelines for Analysis of Germline and Tumor-Normal Samples
The Sentieon Genomic Tools Improved Best Practices Pipelines for Analysis of Germline and Tumor-Normal Samples Andreas Scherer, Ph.D. President and CEO Dr. Donald Freed, Bioinformatics Scientist, Sentieon
More informationWhite Paper GENALICE MAP: Variant Calling in a Matter of Minutes. Bas Tolhuis, PhD - GENALICE B.V.
White Paper GENALICE MAP: Variant Calling in a Matter of Minutes Bas Tolhuis, PhD - GENALICE B.V. White Paper GENALICE MAP Variant Calling GENALICE BV May 2014 White Paper GENALICE MAP Variant Calling
More informationNGS in Pathology Webinar
NGS in Pathology Webinar NGS Data Analysis March 10 2016 1 Topics for today s presentation 2 Introduction Next Generation Sequencing (NGS) is becoming a common and versatile tool for biological and medical
More informationMPG NGS workshop I: SNP calling
MPG NGS workshop I: SNP calling Mark DePristo Manager, Medical and Popula
More informationA Slurm Simulator: Implementation and Parametric Analysis
A Slurm Simulator: Implementation and Parametric Analysis Nikolay A. Simakov, Martins D. Innus, Matthew D. Jones,Robert L. DeLeon, Joseph P. White, Steven M. Gallo, Abani K. Patra and Thomas R. Furlani
More informationRobert Edgar. Independent scientist
Robert Edgar Independent scientist robert@drive5.com www.drive5.com Reads FASTQ format Millions of reads Many Gb USEARCH commands "UPARSE pipeline" OTU sequences FASTA format >Otu1 GATTAGCTCATTCGTA >Otu2
More informationCS3211 Project 2 OthelloX
CS3211 Project 2 OthelloX Contents SECTION I. TERMINOLOGY 2 SECTION II. EXPERIMENTAL METHODOLOGY 3 SECTION III. DISTRIBUTION METHOD 4 SECTION IV. GRANULARITY 6 SECTION V. JOB POOLING 8 SECTION VI. SPEEDUP
More informationSNP calling. Jose Blanca COMAV institute bioinf.comav.upv.es
SNP calling Jose Blanca COMAV institute bioinf.comav.upv.es SNP calling Genotype matrix Genotype matrix: Samples x SNPs SNPs and errors A change in a read may due to: Sample contamination Cloning or PCR
More informationPackage geno2proteo. December 12, 2017
Type Package Package geno2proteo December 12, 2017 Title Finding the DNA and Protein Sequences of Any Genomic or Proteomic Loci Version 0.0.1 Date 2017-12-12 Author Maintainer biocviews
More informationShort Read Alignment to a Reference Genome
Short Read Alignment to a Reference Genome Shamith Samarajiwa CRUK Summer School in Bioinformatics Cambridge, September 2018 Aligning to a reference genome BWA Bowtie2 STAR GEM Pseudo Aligners for RNA-seq
More informationGene Expression analysis with RNA-Seq data
Gene Expression analysis with RNA-Seq data C3BI Hands-on NGS course November 24th 2016 Frédéric Lemoine Plan 1. 2. Quality Control 3. Read Mapping 4. Gene Expression Analysis 5. Splicing/Transcript Analysis
More informationQuantifying gene expression
Quantifying gene expression Genome GTF (annotation)? Sequence reads FASTQ FASTQ (+reference transcriptome index) Quality control FASTQ Alignment to Genome: HISAT2, STAR (+reference genome index) (known
More informationAlignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014
Alignment J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG
More informationJaime E. Combariza, PhD Director. Edition 02/06/18
Jaime E. Combariza, PhD Director 1 Edition 02/06/18 Slides available online www.marcc.jhu.edu/training marcc-help@jhu.edu 2 Model & Funding Grant from the State of Maryland to JHU to build an HPC/big Data
More informationSAS. Activity-Based Management Adapter 6.1 for SAP R/3 User s Guide
SAS Activity-Based Management Adapter 6.1 for SAP R/3 User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2004. SAS Activity-Based Management Adapter 6.1 for
More informationAnalytics Behind Genomic Testing
A Quick Guide to the Analytics Behind Genomic Testing Elaine Gee, PhD Director, Bioinformatics ARUP Laboratories 1 Learning Objectives Catalogue various types of bioinformatics analyses that support clinical
More informationSupplementary Figures and Data
Supplementary Figures and Data Whole Exome Screening Identifies Novel and Recurrent WISP3 Mutations Causing Progressive Pseudorheumatoid Dysplasia in Jammu and Kashmir India Ekta Rai 1, Ankit Mahajan 2,
More informationCluster Workload Management
Cluster Workload Management Goal: maximising the delivery of resources to jobs, given job requirements and local policy restrictions Three parties Users: supplying the job requirements Administrators:
More informationIMPACT User Manual. Version 1.0
IMPACT User Manual Version 1.0 1 Table of index: Overview 3 Dependencies 4 Preparation 4 Download 4 Quick Start 5 Module 1: Somatic Variants Detection 6 Module 2: Copy Number Alteration Detection 8 Module
More informationPredictSNP 1.0: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations. User guide
PredictSNP 1.0: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations User guide Contact: Loschmidt Laboratories, Department of Experimental Biology and Research Centre for
More informationZika infected human samples
Lecture 16 RNA-seq Zika infected human samples Experimental design ZIKV-infected hnpcs 56 hours after ZIKA and mock infection in parallel cultures were used for global transcriptome analysis. RNA-seq libraries
More informationInvoice Manager Admin Guide Basware P2P 17.3
Invoice Manager Admin Guide Basware P2P 17.3 Copyright 1999-2017 Basware Corporation. All rights reserved.. 1 Invoice Management Overview The Invoicing tab is a centralized location to manage all types
More informationBig Data & Hadoop Advance
Course Durations: 30 Hours About Company: Course Mode: Online/Offline EduNextgen extended arm of Product Innovation Academy is a growing entity in education and career transformation, specializing in today
More informationThe first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks.
Open Seqmonk Launch SeqMonk The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks. SeqMonk Analysis Page 1 Create
More informationMoreno Baricevic Stefano Cozzini. CNR-IOM DEMOCRITOS Trieste, ITALY. Resource Management
Moreno Baricevic Stefano Cozzini CNR-IOM DEMOCRITOS Trieste, ITALY Resource Management RESOURCE MANAGEMENT We have a pool of users and a pool of resources, then what? some software that controls available
More informationOHSU Digital Commons. Oregon Health & Science University. Benjamin Cordier. Scholar Archive
Oregon Health & Science University OHSU Digital Commons Scholar Archive 5-19-2017 Evaluation Of Background Prediction For Variant Detection In A Clinical Context: Towards Improved Ngs Monitoring Of Minimal
More informationVariant Analysis. CB2-201 Computational Biology and Bioinformatics! February 27, Emidio Capriotti!
Variant Analysis CB2-201 Computational Biology and Bioinformatics February 27, 2015 Emidio Capriotti http://biofold.org/emidio Division of Informatics Department of Pathology Variant Call Format The final
More informationIntroduction to Copy Number Analysis
Introduction to Copy Number Analysis Document Number: 30210 Document Revision: B1 30210 Rev B, Introduction to Copy Number Analysis Page 1 of 12 Table of Contents Legal Notice... 3 Introduction... 4 Input...
More informationSanger vs Next-Gen Sequencing
Tools and Algorithms in Bioinformatics GCBA815/MCGB815/BMI815, Fall 2017 Week-8: Next-Gen Sequencing RNA-seq Data Analysis Babu Guda, Ph.D. Professor, Genetics, Cell Biology & Anatomy Director, Bioinformatics
More informationFractal Exercise. Fractals, task farm and load imbalance
Fractal Exercise Fractals, task farm and load imbalance 2 Contents 1 Introduction and Aims... 3 1.1 Mandelbrot Set... 3 2 Looking at the concepts... 4 2.1 What is a task farm?... 4 2.1.1 Using a task farm...
More informationAn Introduction to the package geno2proteo
An Introduction to the package geno2proteo Yaoyong Li January 24, 2018 Contents 1 Introduction 1 2 The data files needed by the package geno2proteo 2 3 The main functions of the package 3 1 Introduction
More informationCHEM5302 Fall 2017: Docking and BEDAM Free Energy Calculations for LEDGF Inhibitors of HIV Integrase
CHEM5302 Fall 2017: Docking and BEDAM Free Energy Calculations for LEDGF Inhibitors of HIV Integrase Ronald Levy November 30, 2017 1 Introduction The human LEDGF protein links the HIV integrase (HIV-IN)
More informationPowering Statistical Genetics with the Grid: Using GridWay to Automate R Workflows
Powering Statistical Genetics with the Grid: Using GridWay to Automate R Workflows John-Paul Robinson Information Technology Purushotham Bangalore Department of Computer Science Jelai Wang, Tapan Mehta
More informationAccelerate Insights with Topology, High Throughput and Power Advancements
Accelerate Insights with Topology, High Throughput and Power Advancements Michael A. Jackson, President Wil Wellington, EMEA Professional Services May 2014 1 Adaptive/Cray Example Joint Customers Cray
More informationBarcode Printing. SIMMS Inventory Management Software February 24, 2012
Barcode Printing SIMMS Inventory Management Software 2012 February 24, 2012 Contents Barcode Printing.................. 1 Printing Barcodes.................. 1 Print Barcodes for Inventory Items..........
More informationHow to Align a BNX to a Reference. Document Number: Document Revision: A
How to Align a BNX to a Reference Document Number: 30193 Document Revision: A Legal Notice For Research Use Only. Not for use in diagnostic procedures. This material is protected by United States Copyright
More informationCPU scheduling. CPU Scheduling
EECS 3221 Operating System Fundamentals No.4 CPU scheduling Prof. Hui Jiang Dept of Electrical Engineering and Computer Science, York University CPU Scheduling CPU scheduling is the basis of multiprogramming
More informationFrom raw reads to variants
From raw reads to variants Sebastian DiLorenzo Sebastian.DiLorenzo@NBIS.se Talk Overview Concepts Reference genome Variants Paired-end data NGS Workflow Quality control & Trimming Alignment Local realignment
More informationGenome STRiP ASHG Workshop demo materials. Bob Handsaker October 19, 2014
Genome STRiP ASHG Workshop demo materials Bob Handsaker October 19, 2014 Running Genome STRiP directly on AWS Genome STRiP Structure in Populations Popula'on)aware-discovery-andgenotyping-of-structural-varia'onfrom-whole)genome-sequencing-
More informationOverview of Scientific Workflows: Why Use Them?
Overview of Scientific Workflows: Why Use Them? Blue Waters Webinar Series March 8, 2017 Scott Callaghan Southern California Earthquake Center University of Southern California scottcal@usc.edu 1 Overview
More informationIntroduction to the Neuroscience Gateway (NSG)
Introduction to the Neuroscience Gateway (NSG) www.nsgportal.org Amit Majumdar, Subhashini Sivagnanam, Kenneth Yoshimoto San Diego Supercomputer Center Ted Carnevale Yale School of Medicine Vadim Astakhov,Maryann
More informationIntegration of Titan supercomputer at OLCF with ATLAS Production System
Integration of Titan supercomputer at OLCF with ATLAS Production System F Barreiro Megino 1, K De 1, S Jha 2, A Klimentov 3, P Nilsson 3, D Oleynik 1, S Padolski 3, S Panitkin 3, J Wells 4 and T Wenaus
More informationIntroduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017
Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017 Topics to cover today What is Next Generation Sequencing (NGS)? Why do we need NGS? Common approaches to NGS NGS
More informationVariant Detection in Next Generation Sequencing Data. John Osborne Sept 14, 2012
+ Variant Detection in Next Generation Sequencing Data John Osborne Sept 14, 2012 + Overview My Bias Talk slanted towards analyzing whole genomes using Illumina paired end reads with open source tools
More informationIBM i Version 7.2. Systems management Advanced job scheduler IBM
IBM i Version 7.2 Systems management Advanced job scheduler IBM IBM i Version 7.2 Systems management Advanced job scheduler IBM Note Before using this information and the product it supports, read the
More informationData Exchange Module. Vendor Invoice Import
Data Exchange Module Vendor Invoice Import Information in this document is subject to change without notice and does not represent a commitment on the part of Dexter + Chaney. The software described in
More informationDipping into Guacamole. Tim O Donnell & Ryan Williams NYC Big Data Genetics Meetup Aug 11, 2016
Dipping into uacamole Tim O Donnell & Ryan Williams NYC Big Data enetics Meetup ug 11, 2016 Who we are: Hammer Lab Computational lab in the department of enetics and enomic Sciences at Mount Sinai Principal
More informationiq 5 Calibration: Contents.
iq 5 Calibration: Contents. When do you calibrate Preparation. What you will need Assembly. What you will prepare Procedure. What you will do Summary Detailed Procedure Mask Background Well Factors Pure
More informationEcole de Bioinforma(que AVIESAN Roscoff 2014 GALAXY INITIATION. A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech
GALAXY INITIATION A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech How does Next- Gen sequencing work? DNA fragmentation Size selection and clonal amplification Massive parallel sequencing ACCGTTTGCCG
More informationSelf-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters
1.119/TCC.15.15, IEEE Transactions on Cloud Computing 1 Self-Adjusting Configurations for Homogeneous and Heterogeneous Hadoop Clusters Yi Yao 1 Jiayin Wang Bo Sheng Chiu C. Tan 3 Ningfang Mi 1 1.Northeastern
More informationData Exchange Module. Vendor Invoice Import
Data Exchange Module Vendor Invoice Import Information in this document is subject to change without notice and does not represent a commitment on the part of Dexter + Chaney. The software described in
More informationVariant Quality Score Recalibra2on
talks Variant Quality Score Recalibra2on Assigning accurate confidence scores to each puta2ve muta2on call You are here in the GATK Best Prac2ces workflow for germline variant discovery Data Pre-processing
More informationalanarentsen.blogspot.com @alanarentsen Inspirations Developing Installation Upgrades Final thoughts Inspirations instead of repeating structure it while developing Inspirations Developing Installation
More informationHow to Configure the Workflow Service and Design the Workflow Process Templates
How - To Guide SAP Business One 9.0 Document Version: 1.1 2013-04-09 How to Configure the Workflow Service and Design the Workflow Process Templates Typographic Conventions Type Style Example Description
More information