Myzus persicae Clone G006 Assembly

R. Chikhi, T. Derrien, F. Legeai
October 8,

1 Reads correction

Although base calls from Illumina technologies are known to be accurate, errors do occur: they are typically substitutions and arise at a frequency %, especially at the 3' end of the reads [7]. Assembling reads that contain errors may create false-positive overlaps of k-mers ( reads) or introduce spurious gaps in the final assembly. We therefore used a state-of-the-art program, Quake [7], which is dedicated to identifying and correcting erroneous reads. Briefly, Quake weights the k-mer counts in the reads by the quality values assigned to each base, and uses these weighted counts to choose a coverage cutoff separating trusted k-mers (those that are truly part of the genome) from erroneous k-mers. The main statistics of this correction step are summarized in Table 1.

Library name | Number of fragments | Validated fragments | Corrected fragments | Trimmed fragments | Total cleaned
MPA | | | | |
MPB | | | | |
S(PE) | | | | |

Table 1: Summary of Quake corrections on the Myzus libraries (read numbers are in millions; each number corresponds to the 2 mate files per library).

2 Assembly

2.1 Minia assembly

The Minia assembler (v ) [4] was used to assemble the paired-end reads into contigs. It was executed in de Bruijn graph assembly mode, i.e. no scaffolds were created. The command-line parameters were k=71, a value recommended by a run of kmergenie [3], and min abundance=43.

The contigs were then scaffolded (s) and gap-filled (g), executing each operation twice in the order s+g+s+g. The scaffolding software was SuperScaffolder [5], a modified version of SSPACE [2] that is still in development. SuperScaffolder version was executed with no parameters other than the read files and the insert size of each library. The gap-filling software was GapCloser v1.12 from SOAPdenovo [11], executed with default parameters. Only the two mate-pair libraries were used for scaffolding and gap-filling. Minia and SuperScaffolder were run on the non-corrected reads.

2.2 Allpaths-LG2 assembly

The AllPaths-LG (r40324) [13] assembler was run with the default parameters, as recommended by the authors. We provided the following descriptor files:

in_groups.csv:
    group_name, library_name, file_name
    MPA, MPA, awilson_mpersicaeg006_ mpa_s_7_?.fastq
    MPB, MPB, awilson_mpersicaeg006_ mpb_s_8_?.fastq
    PE, PE, awilson_mpersicaeg006_ _s_6_?.fastq

in_libs.csv:
    library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
    MPA, Myzus, Myzus persicae, jumping, 1,,, 5000,500, outward,,
    MPB, Myzus, Myzus persicae, jumping, 1,,, 2000,200, outward,,
    PE, Myzus, Myzus persicae, fragment, 1, 200, 20,,, inward,,

2.3 Abyss assembly

The Abyss assembler v1.3.2 [15] was run on the corrected reads with several k-mer sizes (k=41, 64 and 91).

3 Metrics

We computed standard statistics on the AllPaths-LG2, Minia and Abyss assemblies, for scaffolds (Table 2) and contigs (Table 3). These metrics were calculated with the script assemblathon stats2.pl, used during the Assemblathon contest [6]. The expected size of the genome was set to 350 Mbp, and runs of 3 or more consecutive N were used to split scaffolds into contigs. All these metrics were calculated on scaffolds larger than 1000 bp.

3.1 Conclusion

Minia gives better metrics, in particular a better scaffold NG50 and fewer N in scaffolds. This better performance is not mainly due to the gap-closing step or to better scaffolding, because the contigs also show a better NG50. Compared with the other assemblers, the metrics given by Abyss are poor (number of scaffolds, NG50, ...). We therefore discarded Abyss from further analyses and compared only the AllPaths-LG and Minia assemblies.
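To make the contig and NG50 definitions used above concrete, here is a minimal Python sketch. It is not the Assemblathon script: the function names `split_contigs` and `nx50` and the toy sequences are hypothetical, and the 1000 bp minimum scaffold length is left out for brevity. It only illustrates splitting scaffolds at runs of 3 or more N, and computing N50 against the assembly size versus NG50 against an expected genome size.

```python
import re


def split_contigs(scaffold, min_n_run=3):
    """Split a scaffold into contigs at runs of at least min_n_run N characters."""
    pattern = re.compile(r"[Nn]{%d,}" % min_n_run)
    return [part for part in pattern.split(scaffold) if part]


def nx50(lengths, total=None):
    """Return N50 of lengths, or NG50 when total is the expected genome size.

    Returns None when the sequences never reach half of total.
    """
    target = (total if total is not None else sum(lengths)) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# Toy scaffolds (hypothetical data): only the first one has a run of >= 3 N.
scaffolds = ["ACGT" * 5 + "NNN" + "ACGT" * 2, "ACGTNNACGT", "ACG"]
scaffold_lengths = [len(s) for s in scaffolds]
contig_lengths = [len(c) for s in scaffolds for c in split_contigs(s)]

print("scaffold N50:", nx50(scaffold_lengths))
print("scaffold NG50 (expected size 60):", nx50(scaffold_lengths, total=60))
print("contig N50:", nx50(contig_lengths))
```

Because NG50 is measured against a fixed expected size (350 Mbp in this report) rather than against each assembly's own total, it stays comparable between assemblies of different total length.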

Metric | Allpaths | Minia | Abyss (k=41) | Abyss (k=64) | Abyss (k=91)
Number of scaffolds | | | | |
Total size of scaffolds | | | | |
Total scaffold length as percentage of known genome size | 99.5% | 99.3% | 96.1% | 103.8% | 99.7%
Longest scaffold | | | | |
Shortest scaffold | | | | |
Number of scaffolds > 10K nt | 1858 (43.2%) | (37.9%) | (39.6%) | 5128 (41.0%) | 8916 (32.1%)
Number of scaffolds > 100K nt | 788 (18.3%) | 673 (18.3%) | 976 (8.1%) | 1031 (8.2%) | 246 (0.9%)
Number of scaffolds > 1M nt | 38 (0.9%) | 67 (1.8%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Mean scaffold size | | | | |
Median scaffold size | | | | |
N50 scaffold length | | | | |
L50 scaffold count | | | | |
NG50 scaffold length | | | | |
LG50 scaffold count | | | | |
N50 scaffold - NG50 scaffold length difference | | | | |
scaffold %A | | | | |
scaffold %C | | | | |
scaffold %G | | | | |
scaffold %T | | | | |
scaffold %N | | | | |
scaffold N nt | | | | |

Table 2: Scaffolds metrics

Metric | Allpaths | Minia | Abyss (k=41) | Abyss (k=64) | Abyss (k=91)
Percentage of assembly in scaffolded contigs | 95.4% | 96.2% | 93.5% | 88.8% | 90.8%
Percentage of assembly in unscaffolded contigs | 4.6% | 3.8% | 6.5% | 11.2% | 9.2%
Average number of contigs per scaffold | | | | |
Average length of breaks (3 or more Ns) between contigs | | | | |
Number of contigs | | | | |
Number of contigs in scaffolds | | | | |
Number of contigs not in scaffolds | | | | |
Total size of contigs | | | | |
Longest contig | | | | |
Shortest contig | | | | |
Number of contigs > 500 nt | (98.8%) | (92.4%) | (87.0%) | (94.1%) | (91.8%)
Number of contigs > 1K nt | (96.1%) | 8655 (80.5%) | (79.2%) | (86.7%) | (79.3%)
Number of contigs > 10K nt | 5915 (46.2%) | 3983 (37.0%) | (22.8%) | (31.7%) | 4173 (3.9%)
Number of contigs > 100K nt | 870 (6.8%) | 1072 (10.0%) | 9 (0.0%) | 65 (0.2%) | 1 (0.0%)
Number of contigs > 1M nt | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Mean contig size | | | | |
Median contig size | | | | |
N50 contig length | | | | |
L50 contig count | | | | |
NG50 contig length | | | | |
LG50 contig count | | | | |
N50 contig - NG50 contig length difference | | | | |
contig %A | | | | |
contig %C | | | | |
contig %G | | | | |
contig %T | | | | |
contig %N | | | | |
contig N nt | | | | |

Table 3: Contigs metrics

Metric | Allpaths | Minia
number of reads | |
number of mapped reads | (86.67%) | (87.72%)
number of aligned reads >1 times | (7.80%) | (15.01%)
number of properly paired reads | (83.11%) | (79.79%)
number of reads with itself and mate mapped | () | (85.57%)
singletons | (1.89%) | (2.15%)
with mate mapped to a different chr | (1.35%) | (2.09%)
number of improperly paired reads on the same chr | (0.03%) | (3.70%)

Table 4: PE mapping statistics

Metric | Allpaths | Minia
number of reads | |
number of mapped reads | (63.96%) | (64.49%)
number of aligned reads >1 times | (1.42%) | (1.67%)
number of properly paired reads | (47.19%) | (45.69%)
number of reads with itself and mate mapped | (57.00%) | (57.06%)
singletons | (6.95%) | (7.44%)
with mate mapped to a different chr | (6.17%) | (6.94%)
number of improperly paired reads on the same chr | (3.64%) | (4.43%)

Table 5: MPA mapping statistics

4 Remapping Reads

4.1 Protocol

The corrected reads were remapped to the genome using bowtie2 [9]. We used the option -X 500 for the paired-end mapping. For the mate-pair reads, we used the option --rf and set the option -X to 8000 and 5000 for the MPA and MPB libraries, respectively. For the mapping statistics, we used the flagstat command of samtools [10]. For the coverage analysis, we used the genomecov tool of bedtools [14] and R scripts.

4.2 Mapping statistics

An overview of Tables 4, 5 and 6 shows that both assemblers give very similar results, although Allpaths-LG shows better properly-paired mapping statistics for each library. There also appear to be 3 times more reads with multiple putative locations in the Minia assembly, although only slightly more reads map to it overall.
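The categories reported in the mapping tables derive from the bitwise FLAG field of each SAM record. The Python sketch below is a simplified illustration, not a reimplementation of samtools flagstat: the function names and the example flag values are hypothetical, and only a subset of the categories is tallied. The bit values themselves come from the SAM specification.

```python
from collections import Counter

# FLAG bits defined by the SAM specification.
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def tally_flags(flags):
    """Count flagstat-like categories from an iterable of SAM FLAG integers."""
    counts = Counter()
    for flag in flags:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # simplification: count each read once, via its primary record
        counts["total"] += 1
        if flag & UNMAPPED:
            continue
        counts["mapped"] += 1
        if not flag & PAIRED:
            continue
        if flag & PROPER_PAIR:
            counts["properly_paired"] += 1
        if flag & MATE_UNMAPPED:
            counts["singletons"] += 1
        else:
            counts["with_mate_mapped"] += 1
    return counts


def percent(counts, key):
    """Express a category as a percentage of all primary records."""
    return 100.0 * counts[key] / counts["total"] if counts["total"] else 0.0


# Hypothetical flags: a proper pair (99, 147), a singleton (73),
# an unmapped read (133) and a secondary alignment (355).
counts = tally_flags([99, 147, 73, 133, 355])
print(dict(counts), percent(counts, "properly_paired"))
```

In practice the flags would be read from the second column of the SAM output of bowtie2; the real flagstat command also reports categories omitted here, such as mates mapped to a different reference sequence.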

Metric | Allpaths | Minia
number of reads | |
number of mapped reads | (75.23%) | (75.79%)
number of aligned reads >1 times | (2.51%) | (2.31%)
number of properly paired reads | (63.72%) | (61.66%)
number of reads with itself and mate mapped | (71.88%) | (71.84%)
singletons | (3.35%) | (3.95%)
with mate mapped to a different chr | (6.84%) | (7.55%)
number of improperly paired reads on the same chr | (1.32%) | (2.63%)

Table 6: MPB mapping statistics

Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max

Table 7: Allpaths depth coverage by scaffolds

4.3 Analysis of the coverage

Allpaths

From the read-mapping statistics, we expected a depth of coverage of 140x. However, the median coverage per scaffold given by genomecov is slightly lower (see Table 7). This might be explained by contamination from a bacterial population, or by recent transposable-element regions in the nuclear genome. We therefore plotted the GC percent against the coverage of each scaffold (see Figure 1) and observed a distinct cloud (in the orange circle) with a coverage higher than 1000x. We suspected these scaffolds of being bacterial contamination and discarded them from further analyses. This set includes 222 scaffolds covering bp. Among these 222 scaffolds, 173 return at least one hit when compared to the GenBank Bacterial division, and 171 of them match Buchnera aphidicola (see Table 8).

Minia

As before, we obtained a median coverage lower than expected, as described in Table 9. In Figure 2 we can again point out a distinct cloud (in orange) with a coverage higher than 1000x, and we likewise discarded these scaffolds from further analyses. This set includes 35 scaffolds covering bp. Of these 35 scaffolds, 23 return at least one hit when compared to the GenBank Bacterial division, and 22 of them match Buchnera aphidicola (see Table 10).
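The screening step above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the R scripts actually used: it assumes per-base depth lines of the form scaffold, position, depth separated by tabs (the layout produced by `bedtools genomecov -d`; the mode used in this analysis is not stated), and the function names and toy values are hypothetical. Only the 1000x threshold comes from the text.

```python
from collections import defaultdict
from statistics import median


def mean_depth_per_scaffold(lines):
    """Mean depth per scaffold from 'scaffold<TAB>position<TAB>depth' lines."""
    total = defaultdict(int)
    positions = defaultdict(int)
    for line in lines:
        name, _pos, depth = line.rstrip("\n").split("\t")
        total[name] += int(depth)
        positions[name] += 1
    return {name: total[name] / positions[name] for name in total}


def gc_percent(seq):
    """GC content as a percentage of the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def high_coverage(depths, threshold=1000.0):
    """Names of scaffolds whose mean depth exceeds the threshold."""
    return sorted(name for name, depth in depths.items() if depth > threshold)


# Toy per-base depth lines (hypothetical values).
lines = [
    "s1\t1\t100", "s1\t2\t140", "s1\t3\t180",
    "s2\t1\t1500", "s2\t2\t2500",
    "s3\t1\t120",
]
depths = mean_depth_per_scaffold(lines)
print("median depth:", median(depths.values()))
print("suspected contaminants:", high_coverage(depths))
print("GC%:", gc_percent("AATTGCNN"))
```

Pairing each scaffold's mean depth with its GC percent gives the points of a coverage versus GC scatter plot, in which a high-coverage cluster stands out from the nuclear scaffolds.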

Figure 1: Plot of the coverage x GC percent of the Allpaths scaffolds; the orange circle highlights the scaffolds with high coverage.

Bacterial hit | # scaffolds
Buchnera aphidicola str. Ak (Acyrthosiphon kondoi) | 112
Buchnera aphidicola str. JF98 (Acyrthosiphon pisum) | 24
Buchnera aphidicola str. Ua (Uroleucon ambrosiae) | 11
Buchnera aphidicola str. JF99 (Acyrthosiphon pisum) | 7
Buchnera aphidicola str. TLW03 (Acyrthosiphon pisum) | 4
Buchnera aphidicola str. 5A (Acyrthosiphon pisum) | 3
Buchnera aphidicola str. Sg (Schizaphis graminum) | 2
Buchnera aphidicola str. LL01 (Acyrthosiphon pisum) | 2
Buchnera aphidicola str. APS (Acyrthosiphon pisum) | 1
Buchnera aphidicola (Myzus persicae) hupa-rpoc intergenic spacer | 1
Buchnera aphidicola DNA polymerase III beta subunit (dnan) gene | 1
Buchnera aphidicola (Diuraphis noxia) plasmid pleu-dn(usa8) | 1
Buchnera aphidicola (Acyrthosiphon kondoi) 1-deoxy-D-xylulose 5-phosphate reductoisomerase (dxr) gene | 1
Bacterium IS422 gene for 16S rRNA | 1
Bacillus subtilis BSn5 | 1
Azotobacter vinelandii isolate DNA S ribosomal RNA gene | 1
No hit | 49

Table 8: Hits of the highly covered AllPaths scaffolds against the GenBank Bacterial division

Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max

Table 9: Minia depth coverage by scaffolds

Figure 2: Plot of the coverage x GC percent of the Minia scaffolds longer than 1000 bp; the orange circle highlights the scaffolds with high coverage.

Bacterial hit | # scaffolds
Buchnera aphidicola str. Ak (Acyrthosiphon kondoi) | 16
Buchnera aphidicola str. Ua (Uroleucon ambrosiae) | 2
Buchnera aphidicola str. JF98 (Acyrthosiphon pisum) | 2
Buchnera aphidicola str. JF99 (Acyrthosiphon pisum) | 1
Buchnera aphidicola anthranilate synthase component I (trpe) and anthranilate synthase component II (trpg) genes | 1
Bacillus subtilis BSn5 | 1
No hit | 12

Table 10: Hits of the highly covered Minia scaffolds against the GenBank Bacterial division

evalue | no hit | unique hit | multiple hits | Median | Mean | Max.

Table 11: Blast hits at different e-value thresholds when comparing the 3369 proteins of the BUSCO Drosophila melanogaster set to the Minia assembly

evalue | no hit | unique hit | multiple hits | Median | Mean | Max.

Table 12: Blast hits at different e-value thresholds when comparing the 3369 proteins of the BUSCO Drosophila melanogaster set to the Allpaths assembly

5 Mapping BUSCO proteins

5.1 Protocol

We extracted the Drosophila melanogaster proteins from the OrthoDB BUSCO Arthropoda set (ftp://cegg.unige.ch/orthodb6/busco/). We first aligned this protein set to the genome with blast and retrieved the hits with an e-value lower than 1e

5.2 Blast hits

We filtered the blast results at several thresholds and recorded, at each threshold, the number of missing proteins and the number of proteins with unique or multiple matches. The results are reported in Tables 11 and 12.

5.3 Completion of the proteins

For each of the 2440 and 2432 proteins having a hit in the Allpaths and Minia assemblies, respectively, we aligned the BUSCO protein to the corresponding scaffold using GeneWise [1]. To assess how complete the peptides predicted in the Myzus genome are relative to the BUSCO set, we simply compared the sizes of the proteins and plotted the empirical cumulative distribution functions (ecdf), as presented in Figures 3 and 4. The plots are very similar. For both assemblies, 50% of the proteins are at least 78% complete (red lines), and 30% of the proteins are at least 90% complete (orange lines).
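The threshold-based classification described in the Blast hits subsection can be sketched in Python as below. Several points are assumptions: the input is taken to be tabular BLAST output with the default 12 columns (query id first, e-value in the 11th column), every passing hit line counts as one match, and the helper names and toy hits are hypothetical. The exact counting rule used for the tables is not stated in the text.

```python
from collections import Counter


def classify_queries(blast_lines, all_queries, max_evalue):
    """Count queries with no hit, a unique hit, or multiple hits at a threshold."""
    hits = Counter()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 holds the e-value
            hits[fields[0]] += 1             # column 1 holds the query id
    summary = {"no hit": 0, "unique hit": 0, "multiple hits": 0}
    for query in all_queries:
        n = hits[query]
        if n == 0:
            summary["no hit"] += 1
        elif n == 1:
            summary["unique hit"] += 1
        else:
            summary["multiple hits"] += 1
    return summary


def make_line(query, subject, evalue):
    """Build a toy 12-column tabular line; only query, subject and e-value matter."""
    return "\t".join([query, subject, "90.0", "100", "5", "0",
                      "1", "100", "1", "300", evalue, "200"])


# Hypothetical hits: p1 matches twice, p2 once and weakly, p3 never.
lines = [
    make_line("p1", "scaf1", "1e-50"),
    make_line("p1", "scaf2", "1e-12"),
    make_line("p2", "scaf3", "1e-8"),
]
queries = ["p1", "p2", "p3"]
for threshold in (1e-5, 1e-10, 1e-20):
    print(threshold, classify_queries(lines, queries, threshold))
```

Tightening the threshold moves proteins from the multiple-hit and unique-hit columns towards the no-hit column, which is the trend such a table is designed to expose.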

Figure 3: Cumulative distribution of the completion of the BUSCO Drosophila melanogaster set on the Allpaths-LG genome

5.4 Conclusion

Interestingly, only about two thirds of the BUSCO proteins were successfully anchored on the Myzus persicae genome. Neither assembly gives better results than the other.

6 Mapping ACYPI proteins

6.1 Protocol

We used exactly the same protocol with the pea aphid proteins from AphidBase.

6.2 Blast hits

We filtered the blast results at several thresholds and recorded, at each threshold, the number of missing proteins and the number of proteins with unique or multiple matches. The results are reported in Tables 13 and 14.

Figure 4: Cumulative distribution of the completion of the BUSCO Drosophila melanogaster set on the Minia genome

evalue | no hit | unique hit | multiple hits | Median | Mean | 3rd qu. | Max.

Table 13: Blast hits at different e-value thresholds when comparing the pea aphid protein set to the Allpaths-LG assembly

evalue | no hit | unique hit | multiple hits | Median | Mean | 3rd qu. | Max.

Table 14: Blast hits at different e-value thresholds when comparing the pea aphid protein set to the Minia assembly

Figure 5: Cumulative distribution of the completion of the pea aphid protein set on the Allpaths-LG genome

6.3 Completion of the proteins

For each of the and proteins having a hit in the Allpaths and Minia assemblies, respectively, we aligned the pea aphid protein to the corresponding scaffold using GeneWise [1]. As before, we plotted the cumulative distribution of the protein size ratio in Figures 5 and 6. The plots are very similar. For both assemblies, 50% of the proteins are at least 96.5% complete (red lines), and 64% of the proteins are at least 90% complete (orange lines).

6.4 Conclusion

In both genomes, many pea aphid proteins could not be mapped on the Myzus genome, especially with a stringent threshold. Many proteins also did not map uniquely. For most of the proteins, anchoring the best hit gives a close-to-complete prediction.
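The completion measure used in both protein analyses is a simple size ratio, and statements such as "64% of the proteins are at least 90% complete" are read off its empirical cumulative distribution. A minimal Python sketch, assuming the ratio is the predicted peptide length divided by the reference protein length (the function names and toy lengths are hypothetical):

```python
def completion_ratios(predicted_len, reference_len):
    """Ratio of predicted to reference protein length, per protein id."""
    return {pid: predicted_len[pid] / reference_len[pid] for pid in predicted_len}


def ecdf(values):
    """Empirical cumulative distribution as (value, fraction <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def fraction_at_least(values, threshold):
    """Fraction of proteins whose completion ratio reaches the threshold."""
    values = list(values)
    return sum(value >= threshold for value in values) / len(values)


# Toy lengths in amino acids (hypothetical values).
reference = {"p1": 100, "p2": 200, "p3": 400, "p4": 50}
predicted = {"p1": 100, "p2": 150, "p3": 100, "p4": 48}

ratios = completion_ratios(predicted, reference)
print(ecdf(ratios.values()))
print("at least 90% complete:", fraction_at_least(ratios.values(), 0.9))
```

A length ratio is a coarse proxy: it does not check that the aligned residues are correct, only that the prediction spans a comparable length.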

Figure 6: Cumulative distribution of the completion of the pea aphid protein set on the Minia genome

Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Sum

Table 15: Statistics on the Allpaths-LG scaffolds including Buchnera genome parts

7 Buchnera genome

7.1 Analysis of the Buchnera sequences in the genomes

We compared the genome sequences to the Buchnera aphidicola str. LSR1 (Acyrthosiphon pisum) whole genome shotgun sequence (accession number: NZ ACFK ) using blastn (e-value threshold: 1e-20). In the Allpaths-LG genome, we found 168 scaffolds that include part of the Buchnera genome. All of them are very short (see Table 15), and the sum of their sizes is lower than the expected Buchnera genome size ( bp for the pea aphid LSR1 strain). In contrast, the 34 Minia scaffolds that include part of the Buchnera genome have a much larger size distribution (see Table 16). Surprisingly, some of these scaffolds are larger than the Buchnera genome itself (see Table 16).

Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Sum

Table 16: Statistics on the Minia scaffolds including Buchnera genome parts

Figure 7: Dot plot of the Buchnera scaffolds 1 aligned to the Buchnera pea aphid strain LSR1

Figure 8: Dot plot of the Buchnera scaffolds 1 aligned to the Buchnera pea aphid strain LSR1

7.2 New Buchnera genome assembly

As a result, neither assembly recovered the complete Buchnera genome. The reason is that this genome has a very high coverage compared to the nuclear genome: because there are many reads, there are also many errors, which create very complex regions in the de Bruijn graph and prevent the assemblers from completing their traversal. We therefore performed a specific assembly of the Buchnera genome using Minia, increasing the min abundance to 20 and the k-mer size to 91 in order to remove a large part of the errors. This stringency is too high to build the nuclear genome, whose coverage is too low, but it is efficient for assembling the Buchnera genome. Indeed, using blast we retrieved the scaffolds similar to the Buchnera genome and found only 2 scaffolds, with respective sizes of and . These 2 scaffolds correspond to 2 distinct parts of the Buchnera genome and cover them almost completely (see Figures 7 and 8).

8 Duplications

8.1 Protocol

We used the Nucmer algorithm from the MUMmer 3.22 package [8] to compare all the scaffolds against each other, in order to find similar regions that might be due to assembly artefacts. We considered regions as putatively artefactually duplicated regions (PADR) if they were larger than 1000 bp with a percentage of similarity higher than 90. On the Minia genome we found 4415 PADR, corresponding to bp, and in the Allpaths-LG genome 4662 PADR, corresponding to bp. We then removed the scaffolds covered at 70% by a PADR. As a result, we discarded 184 scaffolds from the AllPaths-LG assembly and 115 from the Minia assembly.

9 Gap Closing and final assembly

9.1 Protocol

In conclusion, we propose to use the AllPaths assembly without the scaffolds with a coverage higher than 1000x, without the scaffolds similar to the Buchnera aphidicola genome, and without the scaffolds covered by putative duplicated regions. We performed the gap closing (i.e. filling the N between the contigs in scaffolds using the reads) with GapCloser v1.12 from SOAPdenovo2 [12], using the 3 libraries.

9.2 Results

As a result, we obtained the clone G006 assembly, with the metrics summarized in Table 17.

Metric | G006 assembly
Number of scaffolds | 4022
Total size of scaffolds |
Longest scaffold |
Shortest scaffold | 959
Number of scaffolds > 500 nt |
Number of scaffolds > 1K nt |
Number of scaffolds > 10K nt |
Number of scaffolds > 100K nt |
Number of scaffolds > 1M nt |
Mean scaffold size |
Median scaffold size | 7170
N50 scaffold length |
L50 scaffold count | 224
scaffold %A |
scaffold %C |
scaffold %G |
scaffold %T |
scaffold %N | 0.53
scaffold N nt |

Table 17: Scaffolds metrics

10 Conclusion

We used various metrics to select the best assembly among the 5 assemblies computed from the Myzus G006 clone reads. The Abyss assemblies have poor general metrics, so we compared AllPaths-LG and Minia using additional metrics. Although they gave very similar results, in particular for the protein mapping statistics, the AllPaths assembly appears to have a lower level of redundancy (reflected by the PE remapping statistics) but a larger amount of putative artefactually duplicated regions. Moreover, Minia produces some large nuclear scaffolds that include parts of the Buchnera genome, while Allpaths-LG created separate scaffolds for the Buchnera genome. Neither assembly was able to assemble the Buchnera genome completely, so we performed a new specific assembly to achieve this goal. As a result, we retrieved almost the complete Buchnera sequence in 2 large scaffolds.

References

[1] E. Birney, M. Clamp, and R. Durbin. GeneWise and Genomewise. Genome Res., 14(5), May.
[2] M. Boetzer, C. V. Henkel, H. J. Jansen, D. Butler, and W. Pirovano. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics, 27(4), Feb.
[3] R. Chikhi and P. Medvedev. Informed and automated k-mer size selection for genome assembly. Bioinformatics, Jun.
[4] Rayan Chikhi and Dominique Lavenier. Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph. In Springer, editor, WABI 2011, Sarrebruck, Germany, August.
[5] Rayan Chikhi and Delphine Naquin. Graph-based scaffolding for next-generation sequencing. In JOBIM.
[6] D. Earl, K. Bradnam, J. St John, A. Darling, D. Lin, J. Fass, H. O. Yu, V. Buffalo, D. R. Zerbino, M. Diekhans, N. Nguyen, P. N. Ariyaratne, W. K. Sung, Z. Ning, M. Haimel, J. T. Simpson, N. A. Fonseca, I. Birol, T. R. Docking, I. Y. Ho, D. S. Rokhsar, R. Chikhi, D. Lavenier, G. Chapuis, D. Naquin, N. Maillet, M. C. Schatz, D. R. Kelley, A. M. Phillippy, S. Koren, S. P. Yang, W. Wu, W. C. Chou, A. Srivastava, T. I. Shaw, J. G. Ruby, P. Skewes-Cox, M. Betegon, M. T. Dimon, V. Solovyev, I. Seledtsov, P. Kosarev, D. Vorobyev, R. Ramirez-Gonzalez, R. Leggett, D. MacLean, F. Xia, R. Luo, Z. Li, Y. Xie, B. Liu, S. Gnerre, I. MacCallum, D. Przybylski, F. J. Ribeiro, S. Yin, T. Sharpe, G. Hall, P. J. Kersey, R. Durbin, S. D. Jackman, J. A. Chapman, X. Huang, J. L. DeRisi, M. Caccamo, Y. Li, D. B. Jaffe, R. E. Green, D. Haussler, I. Korf, and B. Paten. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res., 21(12), Dec.
[7] D. R. Kelley, M. C. Schatz, and S. L. Salzberg. Quake: quality-aware detection and correction of sequencing errors. Genome Biol., 11(11):R116.
[8] S. Kurtz, A. Phillippy, A. L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S. L. Salzberg. Versatile and open software for comparing large genomes. Genome Biol., 5(2):R12.
[9] B. Langmead and S. L. Salzberg. Fast gapped-read alignment with Bowtie 2. Nat. Methods, 9(4), Apr.
[10] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), Aug.
[11] R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, S. Li, H. Yang, J. Wang, and J. Wang. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res., 20(2), Feb.
[12] R. Luo, B. Liu, Y. Xie, Z. Li, W. Huang, J. Yuan, G. He, Y. Chen, Q. Pan, Y. Liu, J. Tang, G. Wu, H. Zhang, Y. Shi, Y. Liu, C. Yu, B. Wang, Y. Lu, C. Han, D. W. Cheung, S. M. Yiu, S. Peng, Z. Xiaoqian, G. Liu, X. Liao, Y. Li, H. Yang, J. Wang, T. W. Lam, and J. Wang. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience, 1(1):18.
[13] I. Maccallum, D. Przybylski, S. Gnerre, J. Burton, I. Shlyakhter, A. Gnirke, J. Malek, K. McKernan, S. Ranade, T. P. Shea, L. Williams, S. Young, C. Nusbaum, and D. B. Jaffe. ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol., 10(10):R103.
[14] A. R. Quinlan and I. M. Hall. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), Mar.
[15] J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. Jones, and I. Birol. ABySS: a parallel assembler for short read sequence data. Genome Res., 19(6), Jun.


More information

De novo assembly in RNA-seq analysis.

De novo assembly in RNA-seq analysis. De novo assembly in RNA-seq analysis. Joachim Bargsten Wageningen UR/PRI/Plant Breeding October 2012 Motivation Transcriptome sequencing (RNA-seq) Gene expression / differential expression Reconstruct

More information

Parts of a standard FastQC report

Parts of a standard FastQC report FastQC FastQC, written by Simon Andrews of Babraham Bioinformatics, is a very popular tool used to provide an overview of basic quality control metrics for raw next generation sequencing data. There are

More information
