Back end. Sequencing of DNA clones. Lane finding and base calling - steps. Lane finding and base calling - software. Base calling - steps


Sequencing of DNA clones
  Front end - sample preparation
  Separation and detection - gel electrophoresis
  Back end - determine input DNA

Back end
  Lane finding
  Base calling
  Assembly
  Finishing

Lane finding and base calling - software
  Built into sequencer
  GelImager (Giddings, Madison, USA)
  BaseFinder (Giddings, Madison, USA)

Lane finding and base calling - steps
  Preprimer Data Removal
    The primer peak contains dye primer material not attached to newly synthesized DNA. Sequence data follows the primer peak, so it is the starting point for analysis.

Lane finding and base calling - steps
  Baseline Adjustment
    Junk in the sequencing reactions and on the glass plates causes background. This background has to be removed.
    Search for the minimal signal intensity in successive windows of size n (n=50-100). The resulting line connecting the adjacent minima is subtracted.
  Noise filtering
    In later portions of a data set, where the signal has substantially dropped off, noise is a problem.
    Gaussian convolution or fast Fourier transform
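The baseline adjustment step above can be sketched in a few lines. This is only an illustration of the idea stated on the slide (window minima joined by straight lines, then subtracted), not BaseFinder's actual code. The function name `baseline_adjust` and the choice to hold the baseline flat before the first and after the last minimum are assumptions.

```python
import bisect


def baseline_adjust(signal, n=50):
    """Subtract a piecewise-linear baseline drawn through window minima.

    signal: list of intensities for one channel.
    n: window size (the slide suggests n = 50-100).
    Assumption: outside the first and last minimum the baseline is flat.
    """
    if not signal:
        return []

    # One (index, value) minimum per successive window of size n.
    xs, ys = [], []
    for start in range(0, len(signal), n):
        window = signal[start:start + n]
        offset = min(range(len(window)), key=window.__getitem__)
        xs.append(start + offset)
        ys.append(window[offset])

    def baseline_at(i):
        # Flat extension at both ends, linear interpolation in between.
        if i <= xs[0]:
            return ys[0]
        if i >= xs[-1]:
            return ys[-1]
        k = bisect.bisect_right(xs, i) - 1
        t = (i - xs[k]) / (xs[k + 1] - xs[k])
        return ys[k] + t * (ys[k + 1] - ys[k])

    # Values can dip slightly below zero where the signal lies under the
    # line between two minima; no clamping is applied in this sketch.
    return [value - baseline_at(i) for i, value in enumerate(signal)]
```

With a slowly drifting background plus isolated peaks, the drift is removed while the peak heights are kept, which is the behaviour the slide describes.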

Multicomponent transformation
  Data collected: 4 channels representing spectral intensity, with 4 total wavelengths for each
  Data wanted: dye-linked DNA concentration versus time for each of the 4 dyes, normalized
  4x4 transformation matrix, either calculated or user defined by clicking on base labels

Deconvolution
  Removes the zone broadening effect of electrophoretic separation.
  Zone broadening is described by a Gaussian function depending on run time.
  Measurements found it to be linear ( bases), $s(i) = a + bi + ci^2$.
  With this formula the peak is corrected.

Effect of processing

Signal normalization
  The scaling of intensity values between different channels is normalized.
  Normalization of peaks according to average signal response, taking into account the number of base calls.

Availability
  Giddings et al., 1998, Genome Research 8,

Staden - PREGAP
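As an illustration of the multicomponent transformation above, the sketch below recovers four dye concentrations from four measured channel intensities by solving a 4x4 linear system for each time point. It assumes that the measured intensities are a linear mix of the dye signals, with `dye_matrix[channel][dye]` giving the response of each channel to each dye. The names `solve_linear` and `unmix` and the matrix values in the usage note are hypothetical, and nothing here is taken from BaseFinder itself.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("transformation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def unmix(samples, dye_matrix):
    """Convert per-time-point channel intensities into dye concentrations.

    samples: list of 4-element lists (one intensity per channel).
    dye_matrix: 4x4 list, dye_matrix[channel][dye] (assumed layout).
    """
    return [solve_linear(dye_matrix, sample) for sample in samples]
```

For example, with a hypothetical matrix whose off-diagonal entries model spectral overlap between neighbouring dyes, `unmix([[5.0, 1.1, 2.1, 1.6]], dye_matrix)` returns one list of four concentrations for that time point.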

PREGAP - modules

PREGAP methods
  Estimate base accuracies
    Accuracy = trace area called base / sum(trace area uncalled bases)
  Uncalled base clip
    Removes the part of the sequence which is too poor for a reliable assembly.
    5' and 3' clipping: start in the middle; where too many Ns occur, it stops.
    Parameters: start offset, window length, number of uncalled bases

PREGAP methods
  Cloning Vector Clip
    Search for sequencing vectors used in the shotgunning process, e.g. for cosmid or YAC. Vector file must be there.
  Tag Repeats
    Identify and mark repetitive sequences, so that they will not be used by the assembly algorithm.

PREGAP methods
  Vector clip
    Uses the information on vectors in the exp files (name, position of cloning, primer sites) to remove the parts belonging to the vector.

Staden file formats
  Exp: EMBL-like format, sequence
  Scf: data from sequencer: trace sample points, position of the bases relative to sample points, accuracy estimates, sequence, comments

PREGAP - modules

GCG Sequence Assembly
  Gelstart
  Gelenter
  Gelmerge
  Gelassemble, Gelview, Geldisassemble, ...
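The uncalled base clip described above can be illustrated with a small sliding-window sketch. It uses the three parameters named on the slide (start offset, window length, number of uncalled bases), but the exact stopping rule and the function name `clip_uncalled` are assumptions, not PREGAP's implementation.

```python
def clip_uncalled(seq, start_offset, window_len, max_uncalled):
    """Return (left, right) so that seq[left:right] is the retained region.

    Starting at start_offset, a window of window_len bases is slid towards
    the 3' end and then towards the 5' end. Scanning stops at the first
    window holding more than max_uncalled 'N' characters (assumed rule).
    """
    # Scan towards the 3' end.
    right = len(seq)
    pos = start_offset
    while pos + window_len <= len(seq):
        window = seq[pos:pos + window_len]
        if window.count("N") > max_uncalled:
            # Clip at the first uncalled base of the bad window.
            right = pos + window.index("N")
            break
        pos += 1

    # Scan towards the 5' end.
    left = 0
    pos = start_offset
    while pos - window_len >= 0:
        window = seq[pos - window_len:pos]
        if window.count("N") > max_uncalled:
            # Clip just after the last uncalled base of the bad window.
            left = pos - window_len + window.rindex("N") + 1
            break
        pos -= 1

    return left, right
```

A read with N-rich ends and a clean middle keeps only the middle part; a read without any bad window is returned unclipped.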

GCG Sequence Assembly - method
  Find the 2 contigs with the longest overlap
  Align them to assemble a single contig
  Repeat, until only 1 contig is left or the remaining contigs cannot be assembled
  Finding overlaps:
    Search for short identical blocks - wordsize (7)
    Gaps allowed between blocks
    Alignments must contain at least one long block - minidentity (14)
    Number of identities in an overlap more than 80%

GCG Sequence Assembly - result

Staden Assembly - GAP

Staden - GAP - Shotgun assembly
  Different assembly methods possible
    Shotgun Assembly (similar to GCG assembly)
    Cap (Xiaoqiu Huang, huang@mtu.edu)
    Phrap (Green, P. UWGC/analysistools/phrap.htm)

GAP - Assembly - Result 1

GAP - Assembly - Result 2
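The greedy loop of the GCG assembly method above (merge the two contigs with the longest overlap, repeat until nothing more can be joined) can be sketched as follows. This is a deliberately simplified illustration: it only accepts exact suffix-prefix overlaps, whereas the method on the slide uses word blocks, allows gaps and requires more than 80% identity. The function names and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(a, b, min_overlap):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no exact overlap of at least min_overlap bases exists.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            # The remaining contigs cannot be assembled.
            break
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs
```

Real reads contain sequencing errors, so an exact-match criterion like this one would miss most true overlaps; it is used here only to keep the control flow of the greedy strategy visible.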

GAP - Assembly - Contig Editor

GAP - Assembly - Trace Viewer 1

GAP - Assembly - Trace Differences

GAP - Assembly - Trace info

GAP - Assembly - Primer proposal

GAP - Assembly - Probe proposal

HTTP Addresses of Assembly packages
  Staden:
  GCG:

Mutation Detection - Staden
  Create a reference trace
  Trace_diff
    Show all differences above a user-defined level as possible mutations
    Search for differences which show one negative and one positive peak (exchange of a base)
  View differences manually

GAP - Assembly - Trace Differences

Trace_diff - example
  3 sets
  Table columns: number of readings, number of bases, average analysed length, base differences, real mutations, trace-diff false pos, trace-diff false neg

GAP Editor

GAP template display
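The Trace_diff idea from the mutation detection slide above (subtract a reference trace, then look for one negative and one positive difference above a user-defined level) can be illustrated with the sketch below. It is not the Staden Trace_diff code: it compares traces sample point by sample point rather than peak by peak, it assumes both traces are already aligned and scaled, and the function name `find_base_exchanges` and the data layout are hypothetical.

```python
BASES = "ACGT"


def find_base_exchanges(sample, reference, threshold):
    """Flag positions where one base signal drops and another one rises.

    sample, reference: dicts mapping each base in "ACGT" to a list of
    intensities of equal length (assumed to be aligned and normalized).
    threshold: user-defined level a difference must exceed.
    Returns a list of (position, lost_base, gained_base) tuples.
    """
    length = len(reference["A"])
    hits = []
    for i in range(length):
        diff = {b: sample[b][i] - reference[b][i] for b in BASES}
        gained = [b for b in BASES if diff[b] > threshold]
        lost = [b for b in BASES if diff[b] < -threshold]
        # Exactly one positive and one negative difference suggests
        # the exchange of a base at this position.
        if len(gained) == 1 and len(lost) == 1:
            hits.append((i, lost[0], gained[0]))
    return hits
```

Candidates found this way would still need to be viewed manually, as the slide notes, since the example table lists both false positives and false negatives for trace-diff.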

PolyPhred - Automatic search for SNPs
  Phred (base calling), Phrap (sequence assembly), Consed (assembly editor) used for automatic SNP search
  Nickerson et al., NAR, 1997
  Polyphred information on