UHT Sequencing Course Large-scale genotyping. Christian Iseli January 2009
1 UHT Sequencing Course Large-scale genotyping Christian Iseli January 2009
2 Overview: Introduction; Examples; Base calling method and parameters; Reads filtering; Reads classification; Detailed alignment; Alignments analysis; Output generation
3 Introduction: the basic problem is to distinguish polymorphism from sequencing error; use quality measures; use redundancy; use knowledge about the data source
4 Examples: Retinitis pigmentosa; Hypertrophic cardiomyopathy; HSA 21q genotyping
5 Retinitis pigmentosa: inherited eye disease; linkage analysis; PRPF31 mutation; incomplete penetrance; attempt sequencing
6 PRPF31 example (figure; labels: c>g, 13, 14)
7 PRPF31 example
8 PRPF31 example, zoom
9 PRPF31 example, MFA
10 Examples: Retinitis pigmentosa; Hypertrophic cardiomyopathy; HSA 21q genotyping
11 Hypertrophic cardiomyopathy: small collection of known genes; PCR amplify gene pieces; sequence
12 Small deletion
13 Examples: Retinitis pigmentosa; Hypertrophic cardiomyopathy; HSA 21q genotyping
14 Exome sequencing: extract selected genomic parts; sequence collected pieces
15 Coverage on HSA 21q
16 Coverage detail HSA 21q
17 HSA 21q HapMap NA12782
18 Base calling Rolexa FastQ...
19 Reads filtering: entropy; quality values; (position)
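The slide names entropy and quality values as filtering criteria without giving formulas or thresholds. As a minimal sketch, assuming "entropy" means the Shannon entropy of a read's base composition and that quality is summarised by the mean quality value, a filter could look like this. The function names and both thresholds are hypothetical and are not taken from the course.

```python
import math
from collections import Counter


def shannon_entropy(read: str) -> float:
    """Shannon entropy (bits per base) of the base composition of a read."""
    total = len(read)
    if total == 0:
        return 0.0
    counts = Counter(read.upper())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def passes_filters(read, quals, min_entropy=1.0, min_mean_quality=20):
    """Keep a read only if it is complex enough and of sufficient mean quality.

    Both thresholds are illustrative assumptions.
    """
    if not quals:
        return False
    mean_quality = sum(quals) / len(quals)
    return shannon_entropy(read) >= min_entropy and mean_quality >= min_mean_quality
```

A homopolymer such as `AAAAAAAA` has entropy 0 and is rejected, while a read using all four bases equally reaches the maximum of 2 bits per base.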
20 Filtering example: Rolexa base calling; filter reads for length and ambiguity (ACGTU -> 1, KMRSWY -> 2, BDHV -> 3, N -> 4); minimum length 20; maximum ambiguity 81
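The per-base ambiguity codes above can be turned into a read filter. The slide does not say how the per-base values are combined into a read-level ambiguity. The sketch below assumes they are multiplied, so that the score equals the number of concrete sequences compatible with the read (81 = 3^4 would then allow, for example, four three-fold ambiguous bases). That combination rule and the function names are assumptions.

```python
# Per-base ambiguity weights as listed on the slide.
AMBIGUITY = {}
for bases, weight in (("ACGTU", 1), ("KMRSWY", 2), ("BDHV", 3), ("N", 4)):
    for base in bases:
        AMBIGUITY[base] = weight


def read_ambiguity(read: str) -> int:
    """Product of per-base weights (assumed combination rule)."""
    score = 1
    for base in read.upper():
        score *= AMBIGUITY[base]  # raises KeyError on a non-IUPAC symbol
    return score


def keep_read(read: str, min_length: int = 20, max_ambiguity: int = 81) -> bool:
    """Apply the length and ambiguity limits from the slide."""
    return len(read) >= min_length and read_ambiguity(read) <= max_ambiguity
```

Under this assumption a 20-base read containing `BDHV` scores exactly 81 and is kept, while one containing `NNNN` scores 256 and is dropped.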
21 Read classification: use fetchgwi against whole genome; single exact match -> U (unique); multiple exact matches -> R (repeat); no exact match -> M (missed)
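The U/R/M rule can be illustrated on a toy sequence. This sketch does not use fetchgwi. It substitutes a naive forward-strand string search over an in-memory string, which is only workable for tiny examples. The function names are hypothetical.

```python
def count_exact_matches(read: str, genome: str) -> int:
    """Count exact occurrences of read in genome (forward strand, overlaps allowed)."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def classify_read(read: str, genome: str) -> str:
    """Return 'U' (unique), 'R' (repeat) or 'M' (missed) as on the slide."""
    n = count_exact_matches(read, genome)
    if n == 1:
        return "U"
    if n > 1:
        return "R"
    return "M"
```

A real index-based search would also have to consider the reverse strand, which this sketch leaves out.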
22 Detailed alignment: use M reads; split region of interest in chunks (e.g. 300 bp + 40 bp overlap); find reads with identical 12-mer; global alignment of reads vs chunks; filter alignments, retain good set (e.g. maximum 3 mismatches)
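The chunking and 12-mer seeding steps can be sketched as follows. Two points are assumptions: "300 bp + 40 bp overlap" is read as chunks starting every 300 bp and extending 40 bp into the next one, and the global alignment step is replaced here by a simpler ungapped comparison (best Hamming distance over all windows), so indels are not handled. All function names are hypothetical.

```python
def split_into_chunks(region: str, size: int = 300, overlap: int = 40):
    """Return (offset, chunk) pairs; each chunk shares `overlap` bases with the next."""
    return [(start, region[start:start + size + overlap])
            for start in range(0, len(region), size)]


def kmers(seq: str, k: int = 12) -> set:
    """All k-mers of seq (empty set if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_ungapped_mismatches(read: str, chunk: str):
    """Smallest Hamming distance of read against any window of chunk.

    Simplified stand-in for a global alignment; returns None if the read
    is longer than the chunk.
    """
    if len(read) > len(chunk):
        return None
    return min(
        sum(a != b for a, b in zip(read, chunk[i:i + len(read)]))
        for i in range(len(chunk) - len(read) + 1)
    )


def candidate_reads(reads, chunk: str, k: int = 12, max_mismatches: int = 3):
    """Keep reads that share a k-mer with the chunk and align with few mismatches."""
    chunk_kmers = kmers(chunk, k)
    kept = []
    for read in reads:
        if kmers(read, k).isdisjoint(chunk_kmers):
            continue  # no identical 12-mer, skip the expensive comparison
        mismatches = best_ungapped_mismatches(read, chunk)
        if mismatches is not None and mismatches <= max_mismatches:
            kept.append((read, mismatches))
    return kept
```

The k-mer test acts as a cheap pre-filter, so the costlier comparison only runs on reads that have a chance of aligning to the chunk.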
23 Alignment analysis: map retained reads to full genome; remove set with better maps outside region of interest
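The removal step can be expressed as a simple comparison, assuming each retained read comes with its mismatch count inside the region of interest and the best mismatch count found anywhere else in the genome. The data layout and the function name are hypothetical, and ties are kept because the slide only mentions "better" maps.

```python
def retain_region_reads(hits: dict) -> list:
    """Keep reads whose best hit outside the region is not strictly better.

    hits maps read_id -> (mismatches_in_region, best_mismatches_outside),
    where best_mismatches_outside is None if the read maps nowhere else.
    """
    kept = []
    for read_id, (inside, outside) in hits.items():
        if outside is None or outside >= inside:
            kept.append(read_id)
    return kept
```

For example, a read with 2 mismatches in the region and an exact hit elsewhere would be dropped, since it more likely originates from the other locus.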
24 Practical alignment analysis 1 (figure; labels: 12-mers, U, R, M)
25 Practical alignment analysis 2 (figure; labels: 12-mers, U, R, M)
26 Output generation: create multiple sequence alignment; prepare text output in column format; call SNPs (alleles, coverage, etc.)
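A column-wise SNP call over a multiple sequence alignment might be sketched as below. The calling rule (minimum coverage and minimum allele fraction), the use of `.` for positions a read does not cover, and the function name are illustrative assumptions. The course's own calling criteria are not given on the slide.

```python
from collections import Counter


def call_snps(reference: str, aligned_reads, min_coverage: int = 4,
              min_fraction: float = 0.2):
    """Report positions whose observed alleles differ from the reference.

    aligned_reads are strings of the same length as reference, with '.'
    marking positions the read does not cover. Returns tuples of
    (1-based position, reference base, alleles, coverage).
    """
    calls = []
    for pos, ref_base in enumerate(reference):
        column = [read[pos] for read in aligned_reads if read[pos] != "."]
        coverage = len(column)
        if coverage < min_coverage:
            continue  # too few reads to say anything
        counts = Counter(column)
        alleles = sorted(b for b, n in counts.items() if n / coverage >= min_fraction)
        if alleles != [ref_base]:
            calls.append((pos + 1, ref_base, "/".join(alleles), coverage))
    return calls
```

With reference `ACGTAC` and five reads of which three carry a C at position 3, the sketch reports one heterozygous-looking call, `(3, "G", "C/G", 5)`.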
27 Results in CSV files
28 Detailed view in UCSC
29 Results in MFA
30 Script srmap: needs fetch.conf, input chunk and genomic coordinates; produces MFA and CSV output
31 Script preparejobs: needs genomic coordinates; prepares scripts to process each chunk using srmap
32 Script local2genomic: needs CSV file produced by srmap; adds genomic coordinates
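The idea behind adding genomic coordinates is an offset conversion from chunk-local to genome-wide positions. The sketch below is not the local2genomic script and does not reproduce its interface or file layout. The column names `position` and `genomic_position`, the 1-based convention and the function name are all assumptions made for illustration.

```python
import csv
import io


def add_genomic_coordinates(csv_text: str, chunk_start: int,
                            position_column: str = "position") -> str:
    """Append a genomic_position column to chunk-local CSV rows.

    Assumes 1-based local positions and a 1-based chunk_start, so that
    genomic = chunk_start + local - 1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    fieldnames = list(reader.fieldnames) + ["genomic_position"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["genomic_position"] = chunk_start + int(row[position_column]) - 1
        writer.writerow(row)
    return out.getvalue()
```

For a chunk starting at genomic position 1000, local position 3 becomes 1002.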
33 Script collatecsv: needs CSV file produced by local2genomic; merges chunks back together
34 Script matchgenotype: needs CSV file produced by srmap, local2genomic, or collatecsv; needs genotype file, e.g. genotypes_chrmt_yri_r24_nr.b36_fwd.txt.gz; compares detected SNPs with reference and produces CSV output
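The comparison of detected SNPs against reference genotypes can be sketched with two position-keyed dictionaries. This is not the matchgenotype script and it does not parse the genotype file named above. The input layout, the status labels and the function name are hypothetical.

```python
def compare_genotypes(detected: dict, reference: dict) -> list:
    """Compare detected genotypes with reference genotypes by position.

    Both inputs map a position to a genotype string such as 'C/G'.
    Allele order is ignored. Returns (position, detected, reference, status).
    """
    def normalise(genotype: str) -> str:
        return "/".join(sorted(genotype.split("/")))

    report = []
    for pos in sorted(set(detected) | set(reference)):
        found = detected.get(pos)
        expected = reference.get(pos)
        if found is None:
            status = "missed"      # in the reference set, not detected
        elif expected is None:
            status = "novel"       # detected, absent from the reference set
        elif normalise(found) == normalise(expected):
            status = "match"
        else:
            status = "mismatch"
        report.append((pos, found, expected, status))
    return report
```

Each tuple in the result could be written out as one CSV row, in line with the CSV output the slide mentions.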
35 Exercise data source: ftp://ftp.ncbi.nih.gov:21/pub/tracedb/shortread/sra000271/fastq; locally in the UHTS_SNP subdirectory of student accounts
36 Exercise 1: analyze Illumina reads from NA18507; confirm the HapMap genotype for the mitochondrial genome; choose subsets of the reads and see how coverage and SNPs are affected; (confirm other genomic regions of interest)
37 Exercise 2: analyze paired Illumina reads from NA18507; look at the mitochondrial DNA and explain the apparent gap near coordinates 1-120
38 Exercise 3: analyze paired Illumina reads from NA18507; can you confirm a homozygous 1 kb deletion on chromosome 20 at 61 Mb?
39 Exercise 4: analyze paired Illumina reads from NA18507; can you confirm a complex re-arrangement on chromosome 5? What do you expect to see in the pairs?