Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long lifecycle, even more than 1000 productive

Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long lifecycle, even more than 1000 productive years. It is well adapted to drought and salinity.

Supplementary Figure 2. 17-mer frequency distribution of sequencing reads and heterozygosity simulation. A K-mer is an artificial subsequence of K nucleotides. A sequencing read of L bp contains (L-K+1) K-mers of length K bp. K was set to 17 for genome size estimation. The K-mer frequency in a given data set follows a Poisson distribution. The genome size is estimated as G = K_num/Peak_depth, where K_num is the total number of K-mers and Peak_depth is the expected K-mer depth. If the heterozygosity rate is high, a small peak appears at half of Peak_depth, so K-mer analysis can also be used to estimate the heterozygosity rate of a genome roughly. The x-axis is depth (X); the y-axis is the proportion, that is, the frequency at that depth divided by the total frequency over all depths. The high heterozygosity of jujube caused a sub-peak (depth 30X) at half the depth of the main peak (depth 59X). M stands for heterozygosity, and the two curves are derived from heterozygosity simulation. The simulation indicated a heterozygosity rate of about 1.9% (between 1.8% and 2.0%).
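The counting described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the jujube data: the function names, the toy reads and the choice of K = 5 are assumptions made for the example, and the peak is taken as the most frequent depth rather than fitted with a Poisson model.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every K-mer in the reads; a read of L bp yields L - k + 1 K-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map each depth to the number of distinct K-mers observed at that depth."""
    return Counter(kmer_counts.values())


def estimate_genome_size(kmer_counts, peak_depth):
    """G = K_num / Peak_depth, with K_num the total number of K-mers."""
    k_num = sum(kmer_counts.values())
    return k_num / peak_depth


# Toy data (hypothetical): a 12 bp "genome" sequenced three times end to end.
toy_reads = ["ACGTTGCAAGCT"] * 3
counts = count_kmers(toy_reads, k=5)
hist = depth_histogram(counts)
peak_depth = max(hist, key=hist.get)   # most frequent depth
genome_size = estimate_genome_size(counts, peak_depth)
```

On the toy reads the estimate equals the number of K-mer positions in the 12 bp sequence (12 - 5 + 1 = 8), which approaches the true length when the genome is much longer than K.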

Supplementary Figure 3. GC distribution in Ziziphus jujuba, Vitis vinifera and Prunus persica. Legend: Ziziphus jujuba (405 Mb), Vitis vinifera (486 Mb), Prunus persica (227 Mb). Axes: G+C content and percent of bins (bin = 500 bp).
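A GC distribution of this kind can be computed by cutting a sequence into 500 bp bins and tabulating the GC percentage of each bin. The sketch below is only an illustration under simple assumptions: the function names are hypothetical, only full non-overlapping bins are used, and ambiguous bases (N) are not treated specially.

```python
def gc_content_per_bin(sequence, bin_size=500):
    """Return the GC percentage of each full, non-overlapping bin."""
    percentages = []
    for start in range(0, len(sequence) - bin_size + 1, bin_size):
        window = sequence[start:start + bin_size].upper()
        gc = window.count("G") + window.count("C")
        percentages.append(100.0 * gc / bin_size)
    return percentages


def bin_distribution(percentages):
    """Percent of bins at each integer GC percentage (the plotted histogram)."""
    if not percentages:
        return {}
    hist = {}
    for p in percentages:
        key = int(p)
        hist[key] = hist.get(key, 0) + 1
    total = len(percentages)
    return {gc: 100.0 * n / total for gc, n in sorted(hist.items())}


# Toy sequence (hypothetical): three 500 bp bins with 0%, 100% and 50% GC.
toy = "AT" * 250 + "GC" * 250 + "ACGT" * 125
per_bin = gc_content_per_bin(toy)
distribution = bin_distribution(per_bin)
```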

Supplementary Figure 4. The pseudo-molecules of the 12 chromosomes ordered by genetic length. The genetic map was constructed from 55,743 segregating SNP sites. All genotypes were subjected to a χ2 test in JoinMap 4.0 (χ2 10, P value 0.01), after which 4,033 SNPs remained. The genetic map was constructed with these markers, using LOD values from 7.0 to 15 in steps of 1.0. The locations of the 4 randomly selected BACs analyzed in Supplementary Fig. 5 are shown by red arrows.

Supplementary Figure 5. Alignments for the 4 randomly picked BAC clones and scaffolds. To evaluate the completeness and accuracy of the jujube assembly, read depth on the BACs was calculated by mapping the short reads onto the BAC sequences. The predicted genes, SSRs and annotated transposable elements (TEs) are also shown.

Supplementary Figure 6. The tissue-specific genes of the jujube.

Supplementary Figure 7. Divergence rate of transposable elements in the jujube genome. The divergence rate was calculated based on the alignment between the RepeatMasker annotated repeat copies and the consensus sequence in the repeat library (RepBase).

Supplementary Figure 8. The synteny of some gene blocks in Chr1 ( Mb) with other chromosomes.

Supplementary Figure 9. Deciduous bearing shoot (D) and persistently lignified bearing shoot (P). a, fruiting status of D; b, fruiting status of P; c, the joint point of D (left) and of P (right); d, the stems of D (lower) and P (upper) after removing leaves.

Supplementary Table 1. The wide adaptation of Chinese jujube
Parameter | Value
Minimum annual average temperature (°C) | 5.5
Maximum annual average temperature (°C) | 22
The lowest temperature (°C) |
Minimum frost-free period (days) | 100
Annual rainfall (mm) |
Minimum annual sunshine (hr) | 1100
pH |
Maximum soil water NaCl concentration (%) | 0.15
Maximum soil water Na2CO3 concentration (%) | 0.3

Supplementary Table 2. Jujube nutritional facts in comparison to other (fruit) crops
Columns: Fruit species | Carbohydrate (%) | Vc (mg/100g) | Mineral element (mg/100g): Ca | K | Fe | Zn
Rows: Jujube, Apple, Peach, Persimmon, Grape, Orange, Pineapple, Litchi, Chestnut, Walnut, Sugarcane, Sugar beet, Kiwifruit
Data from China Food Composition (2009) (ref. 1). "-" means no data.

Supplementary Table 3. Estimation of jujube genome size based on K-mer statistics
K-mer | K-mer number | K-mer depth | Genome size | Used bases | Used reads | Coverage
17 | 26,191,979,760 | 59 | 443,931,860 | 31,430,375,712 | 327,399,747 |
From the depth-frequency distribution curve (Supplementary Fig. 2), the expected depth is 59. The total number of input reads is 327,399,747, the total number of bases is 31,430,375,712 and the total K-mer number is 26,191,979,760. The genome size was calculated according to the formula: Genome size = K-mer_num/Peak_depth (ref. 2).
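The figures quoted for Supplementary Table 3 can be checked against the two relations given in the Supplementary Figure 2 caption: each read of L bp contributes L - K + 1 K-mers, and the genome size is the K-mer number divided by the peak depth. The short calculation below uses only the numbers stated in the text; the variable names are chosen for this illustration.

```python
K = 17
total_reads = 327_399_747
total_bases = 31_430_375_712
kmer_number = 26_191_979_760
peak_depth = 59

# Summed over all reads, (L - K + 1) K-mers per read gives
# total_bases - total_reads * (K - 1) K-mers in the data set.
expected_kmers = total_bases - total_reads * (K - 1)

# Genome size = K-mer_num / Peak_depth (integer part).
genome_size = kmer_number // peak_depth
```

Both relations agree with the reported values: the expected K-mer count equals the stated K-mer number, and the quotient is about 444 Mb, the genome size assumed for the coverage calculations in Supplementary Table 7.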

Supplementary Table 4. Comparison of jujube genome features with those of 9 other species
Columns: Species | Family | Genome size (Mb) | Heterozygosity (%) | Repeat elements (%) | GC content (%)
Jujube (Ziziphus jujuba) | Rhamnaceae
Mulberry (Morus alba) | Moraceae
Kiwifruit (Actinidia chinensis) | Actinidiaceae
Peach (Prunus persica) | Rosaceae
Pear (Pyrus bretschneideri) | Rosaceae
Apple (Malus domestica) | Rosaceae
Strawberry (Fragaria vesca) | Rosaceae
Poplar (Populus trichocarpa) | Salicaceae
Sweet orange (Citrus sinensis) | Rutaceae
Prunus mume | Rosaceae
Data from our research and the papers on the genomes of the relevant species. "-" means no data.

Supplementary Table 5. SSR distribution of jujube and six closely related species
Species | Jujube | Apple (ref. 3) | Pear (ref. 4) | Peach (ref. 5) | Strawberry (ref. 6) | Prunus mume (ref. 7) | Mulberry (ref. 8)
Total size of examined sequences (Mb) |
Distribution of different repeat classes:
Number of di-nucleotide repeats | 106,537 | 84,914 | 50,455 | 32,038 | 23,054 | 33,763 | 58,651
Number of tri-nucleotide repeats | 38,168 | 25,955 | 11,829 | 8,086 | 7,910 | 8,698 | 20,072
Number of tetra-nucleotide repeats | 7,144 | 3,718 | 3,314 | 1,… | … | …,668 | 3,515
Number of penta-nucleotide repeats |
Number of hexa-nucleotide repeats |
Total | 153,… | …,870 | 66,449 | 42,367 | 32,102 | 44,719 | 83,544
SSR density (/Mb) |
Data from our research and the papers on the genomes of the relevant species. SSR sequences in the jujube genome were identified using SSRIT.

Supplementary Table 6. SSR comparison of jujube and grape or peach in collinear gene blocks (gene pairs 20)
Columns: Jujube vs Grape (Jujube | Grape | Jujube/Grape) and Jujube vs Peach (Jujube | Peach | Jujube/Peach)
Number of blocks |
Gene pairs | 2,911 (Jujube vs Grape) | 4,391 (Jujube vs Peach)
Block length (N counted, bp) | 44,869,077 | 91,407,764 | | 74,625,771 | 81,675,856 |
Block length (without N, bp) | 46,671,967 | 93,327,470 | | 77,585,925 | 82,140,745 |
GC content (%) |
Number of SSR | 7,963 | 6,286 | | 13,031 | 8,023 |
Total length of SSR (bp) | 230,… | … | | … | …,967 |
Percent of SSR (%) |
SSR sequences in the three genomes were identified using SSRIT.

Supplementary Table 7. Construction of libraries and the generation and filtering of the sequencing data used for the genome assembly
Columns: Insert size | Raw data: Read length (bp), Total data (Gb), Sequence coverage (X), Physical coverage (X) | Clean data: Read length (bp), Total data (Gb), Sequence coverage (X), Physical coverage (X)
Rows: 170 bp, 250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb, 40 kb, Total
WGS libraries with different insert sizes were constructed from a jujube DNA sample. According to the strategy, the library insert sizes were 170 bp, 250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb and 40 kb. After library construction, paired-end (PE) reads were sequenced for each library on the HiSeq 2000. Coverage assumes a genome size of 444 Mb. The quality requirement for de novo sequencing is far higher than that for re-sequencing. To facilitate assembly, a series of checking and filtering steps was applied to the raw data (generated by the Solexa pipeline) to obtain clean data. Raw reads were filtered by the following criteria: (1) reads with >2% Ns or with a poly-A structure were removed; (2) reads with 40% low-quality bases for short-insert libraries, and 60% for large-insert libraries, were removed; (3) reads containing adapters were removed; (4) paired reads with mutual overlaps were removed; (5) PCR duplicates were removed. After filtering, Gb of high-quality data were retained, representing genome coverage.
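Criteria (1) to (3) above can be sketched as a per-read filter. This is an illustration only, not the filtering software used for the jujube data: the adapter string, the Phred cutoff that defines a "low-quality" base, the reading of "poly-A structure" as a read made entirely of A, and the use of "at least" for the 40%/60% limits are all assumptions introduced for the example.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed adapter sequence, for illustration only


def keep_read(seq, quals, large_insert=False, low_quality_cutoff=7):
    """Return True if a read passes criteria (1) to (3).

    seq   -- read sequence
    quals -- per-base Phred scores, same length as seq
    low_quality_cutoff -- hypothetical threshold; the source does not state one
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return False
    # (1) more than 2% Ns, or a poly-A read (assumed here: the read is all A)
    if seq.count("N") / n > 0.02 or set(seq) == {"A"}:
        return False
    # (2) too many low-quality bases: 40% for short-insert libraries,
    #     60% for large-insert libraries (assumed to mean "at least")
    limit = 0.60 if large_insert else 0.40
    low = sum(1 for q in quals if q <= low_quality_cutoff)
    if low / n >= limit:
        return False
    # (3) adapter contamination
    if ADAPTER in seq:
        return False
    return True


good = keep_read("ACGT" * 25, [30] * 100)
poly_a = keep_read("A" * 100, [30] * 100)
```

Criteria (4) and (5), overlapping pairs and PCR duplicates, need both mates of a pair and the full read set, so they are left out of this per-read sketch.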

Supplementary Table 8. BAC data statistics
Insert size | Number of libraries | Number of lanes | Total raw data (G) | Total clean data (G)
500 bp | 21,… | …
Total | 21,… | …
21,504 BAC clones with an average length of 120 kb were randomly selected (about 5.8 times the jujube genome size) and one 500 bp WGS library was constructed for each clone. All libraries were sequenced on the Illumina HiSeq 2000 sequencing system.

Supplementary Table 9. Statistics of the final genome assembly
 | Contig size (bp) | Contig number | Scaffold size (bp) | Scaffold number
N90 | 7,347 | 13,344 | 73,568 | 1,497
N80 | 13,410 | 9,… | …,418 | 1,066
N70 | 19,527 | 6,… | … | …
N60 | 26,037 | 4,… | … | …
N50 | 33,948 | 3,… | … | …
Longest | 334,… | | …,141,… |
Total size | 417,332,… | | …,645,… |
Total number ( 100bp) | | … | | …,898
Total number ( 2kb) | | … | | …,139

Supplementary Table 10. Assessment of genome coverage for four randomly selected BAC clones
Columns: BAC ID | Length (bp) | Coverage ratio (%) | Number of gaps | Number of alignment blocks | Number of aligned scaffolds | Aligned scaffold length (bp) | Gap length (bp) | Gap ratio (%)
BAC1 | 107,… | …
BAC2 | 112,… | …,035,… | …
BAC3 | 128,… | …,723,… | …
BAC4 | 106,… | …
The jujube assembly (scaffolds) was then aligned to four Sanger-sequenced BACs (average length kb) via BLASTN. The coverage of the BAC sequences by our assembled scaffolds was calculated.
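The N50 to N90 rows of Supplementary Table 9 pair a length ("Size") with a count ("Number"). A common way to compute such values is shown below: sort the sequences from longest to shortest and report the length, and the number of sequences used, at the point where the running total first reaches x% of the assembly. The function name and the toy lengths are assumptions for illustration; the source does not state which tool produced the table.

```python
def nx(lengths, x):
    """Return (Nx length, number of sequences needed) for a percentage x."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical), total 300 bp.
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx(toy_lengths, 50)   # 100 + 80 = 180 >= 150
n90 = nx(toy_lengths, 90)   # 100 + 80 + 60 + 40 = 280 >= 270
```

As in the table, the Nx length decreases and the sequence count increases as x goes from 50 to 90.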
Supplementary Table 11. Assessment of the transcript coverage with data from the 1,942 published ESTs
Columns: Dataset | Number | Total length (bp) | Bases covered by assembly | Sequences covered by assembly | With > 90% sequence in one scaffold: Number, Percent | With > 50% sequence in one scaffold: Number, Percent
> 0 bp | 1,942 | 1,169,… | …
> 200 bp | 1,933 | 1,167,… | …
> 500 bp | 1,… | …
Data downloaded in July 2013.

Supplementary Table 12. Assessment of the transcript coverage with the transcriptome assembly contig (TAC) data
Columns: Dataset | Number | Total length (bp) | Bases covered by assembly | Sequences covered by assembly | With > 90% sequence in one scaffold: Number, Percent | With > 50% sequence in one scaffold: Number, Percent
> 0 bp | 51,514 | 46,167,… | …
> 200 bp | 51,514 | 46,167,… | …
> 500 bp | 26,805 | 38,578,… | …
> 1000 bp | 16,059 | 30,851,… | …

Supplementary Table 13. The alignment results of two parents and their 105 progeny
Mapping reads: Sample ID | Paired mapping (PE) (PE%) | Single mapping (SE) (SE%) | All mapping (All mapping%)
P1 | 6,655,… | …,383,… | …,038,…
P2 | 4,272,… | … | …,220,…
S100 | 5,787,… | …,258,… | …,046,…
S101 | 7,002,… | …,833,… | …,835,…
S102 | 3,031,… | … | …,718,…
S103 | 3,778,… | … | …,773,…
S104 | 4,637,… | …,247,… | …,884,…
S105 | 3,796,… | …,019,… | …,816,…
S106 | 4,244,… | …,049,… | …,293,…
S107 | 4,348,… | …,305,… | …,653,…
S108 | 3,215,… | … | …,935,…
S109 | 5,374,… | …,439,… | …,813,…
S10 | 10,434,… | …,260,… | …,694,…
S110 | 5,844,… | …,559,… | …,403,…
S111 | 3,756,… | …,113,… | …,869,…
S112 | 3,962,… | …,072,… | …,035,…
S113 | 4,091,… | … | …,041,…
S114 | 3,961,… | …,099,… | …,061,…
S115 | 4,849,… | …,179,… | …,029,…
S116 | 5,235,… | …,365,… | …,601,…
S117 | 5,622,… | …,551,… | …,173,…
S118 | 4,470,… | …,233,… | …,703,…
S119 | 4,232,… | …,148,… | …,380,…
S12 | 2,423,… | … | …,853,…
S13 | 7,729,… | …,464,… | …,194,…
S14 | 3,398,… | … | …,996,…
S15 | 2,799,… | … | …,287,…
S16 | 3,598,… | … | …,327,…
S17 | 4,521,… | … | …,330,…
S18 | 3,611,… | … | …,287,…
S19 | 17,532,… | …,639,… | …,171,…
S20 | 8,059,… | …,599,… | …,659,…
S21 | 7,130,… | …,398,… | …,528,…
S22 | 1,557,… | … | …,815,…
S24 | 4,729,… | … | …,655,…
S25 | 6,815,… | …,355,… | …,170,…
S27 | 9,072,… | …,913,… | …,986,…
S28 | 2,887,… | … | …,417,…
S2 | 7,815,… | …,604,… | …,419,…
S31 | 4,360,… | … | …,163,…
S32 | 4,000,… | … | …,778,…
S33 | 2,404,… | … | …,841,…
S36 | 4,771,… | … | …,724,…
S37 | 1,835,… | … | …,176,…
S38 | 11,287,… | …,389,… | …,677,…
S39 | 3,069,… | … | …,655,…
S3 | 2,460,… | … | …,937,…
S40 | 7,756,… | …,520,… | …,277,…
S41 | 12,025,… | …,325,… | …,350,…
S42 | 5,401,… | …,027,… | …,429,…
S43 | 3,094,… | … | …,685,…
S44 | 5,473,… | …,096,… | …,569,…
S45 | 2,952,… | … | …,499,…
S46 | 5,005,… | …,088,… | …,093,…
S47 | 12,866,… | …,630,… | …,497,…
S49 | 3,467,… | … | …,135,…
S50 | 5,366,… | …,060,… | …,426,…
S52 | 4,576,… | … | …,507,…
S53 | 3,779,… | … | …,432,…
S54 | 7,630,… | …,361,… | …,991,…
S55 | 12,735,… | …,375,… | …,110,…
S56 | 5,653,… | …,438,… | …,091,…
S57 | 4,833,… | …,127,… | …,960,…
S58 | 6,391,… | …,649,… | …,040,…
S59 | 6,043,… | …,516,… | …,560,…
S62 | 4,951,… | …,269,… | …,220,…
S63 | 5,584,… | …,477,… | …,061,…
S64 | 5,839,… | …,582,… | …,422,…
S65 | 4,329,… | …,058,… | …,387,…
S66 | 5,436,… | …,396,… | …,833,…
S67 | 6,241,… | …,613,… | …,854,…
S68 | 3,499,… | … | …,393,…
S69 | 4,754,… | …,222,… | …,976,…
S6 | 12,991,… | …,596,… | …,588,…
S70 | 9,145,… | …,980,… | …,125,…
S71 | 4,969,… | … | …,933,…
S72 | 7,649,… | …,872,… | …,521,…
S73 | 7,718,… | …,924,… | …,643,…
S74 | 5,927,… | …,551,… | …,479,…
S75 | 4,449,… | …,281,… | …,731,…
S76 | 4,335,… | …,192,… | …,528,…
S77 | 4,064,… | …,147,… | …,211,…
S78 | 4,642,… | …,249,… | …,891,…
S79 | 4,452,… | …,183,… | …,636,…
S7 | 5,772,… | …,123,… | …,896,…
S80 | 4,578,… | …,127,… | …,706,…
S81 | 5,425,… | …,559,… | …,985,…
S82 | 4,890,… | …,306,… | …,197,…
S83 | 5,403,… | …,475,… | …,878,…
S84 | 4,470,… | …,035,… | …,505,…
S85 | 3,051,… | … | …,707,…
S86 | 5,840,… | …,491,… | …,331,…
S87 | 2,365,… | … | …,846,…
S88 | 3,087,… | … | …,796,…
S89 | 4,329,… | …,167,… | …,497,…
S8 | 7,591,… | …,395,… | …,987,…
S90 | 5,211,… | …,325,… | …,536,…
S91 | 4,212,… | …,148,… | …,360,…
S92 | 3,341,… | … | …,251,…
S93 | 5,545,… | …,487,… | …,033,…
S94 | 2,605,… | … | …,347,…
S95 | 5,225,… | …,469,… | …,695,…
S96 | 5,541,… | …,361,… | …,903,…
S97 | 4,292,… | …,037,… | …,330,…
S98 | 6,647,… | …,776,… | …,424,…
S99 | 6,146,… | …,680,… | …,827,…
S9 | 15,026,… | …,632,… | …,659,…

Supplementary Table 14. General statistics of predicted protein-coding genes for jujube
Columns: Annotation method | Number | Average transcript length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp)
De novo: AUGUSTUS | 41,367 | 3,… | …
De novo: Genscan | 40,451 | 6,… | …
Homolog: C. sinensis | 22,510 | 3,… | …
Homolog: M. domestica | 22,400 | 3,… | …
Homolog: P. trichocarpa | 22,989 | 3,… | …
Homolog: G. max | 21,523 | 3,… | …
Homolog: P. persica | 23,322 | 2,… | …
Homolog: V. vinifera | 20,926 | 3,… | …
GLEAN | 28,585 | 4,… | …
RNASeq | 29,051 | 4,… | …
Final gene | 32,808 | 3,… | …

Supplementary Table 15. Functional annotation of predicted genes for jujube
 | Number | Percentage (%)
Total | 32,… |
Annotated: InterPro | 3,… |
Annotated: GO | 3,… |
Annotated: KEGG | 15,… |
Annotated: SwissProt | 21,… |
Annotated: TrEMBL | 26,… |
Unannotated | 5,… |

Supplementary Table 16. The numbers of genes with certain numbers of exons in the jujube genome
Total gene number | 32,808
Number of genes containing one exon | 7,128 (21.73%)
Number of genes containing two exons | 5,195 (15.83%)
Number of genes containing three exons | 5,579 (17.00%)
Number of genes containing more than three exons | 14,906 (45.43%)

Supplementary Table 17. The numbers of genes with certain numbers of exons in the jujube genome
Columns: Total length of annotated genes (nt) | Covered by transcript reads from different tissues (nt) | Coverage rate (%)

Supplementary Table 18. Identification of non-coding RNA genes in the jujube genome
Columns: Type | Copy | Average length (bp) | Total length (bp) | % of genome
miRNA | …
tRNA | 1,… | …
rRNA: Total | …
rRNA: …S | …
rRNA: 28S | …
rRNA: …S | …
rRNA: …S | …
snRNA: Total | …
snRNA: CD-box | …
snRNA: HACA-box | …
snRNA: Splicing | …
CD-box: C box (UGAUGA) and D box (CUGA); HACA-box: H/ACA-type snoRNAs.

Supplementary Table 19. General statistics of SNPs in the jujube genome
 | SNP number | Effective length (bp) | SNP density (/kb)
Chromosome level | 3,794,… | …,005,… |
Scaffold level | 4,769,… | …,332,… |
Rate of anchored (%) |

Supplementary Table 20. General statistics of jujube repetitive elements
Type | Repeat size (bp) | Rate of genome (%)
TRF | 23,048,… |
RepeatMasker | 47,307,… |
RepeatProteinMask | 52,059,… |
De novo | 199,215,… |
Total | 216,570,… |

Supplementary Table 21. Classification of jujube transposable elements
Types | Length (bp) | Rate of transposable elements (%) | Rate of genome (%)
Retrotransposons: Total | 166,416,… |
Retrotransposons: Gypsy | 75,815,… |
Retrotransposons: Copia | 55,318,… |
Retrotransposons: LINE | 7,741,… |
Retrotransposons: SINE | 856,… |
Retrotransposons: Other | 6,… |
Retrotransposons: Unclassified elements | 26,679,… |
DNA transposons | 38,501,… |
Total transposable elements | 204,918,… |

Supplementary Table 22. Summary of synteny blocks among Z. jujuba, F. vesca, V. vinifera and P. persica
Columns: Genome | Block size: Total | < 1 Mb | 1-3 Mb | > 3 Mb
Rows: Ziziphus jujuba vs Fragaria vesca; Fragaria vesca vs Ziziphus jujuba; Ziziphus jujuba vs Vitis vinifera; Vitis vinifera vs Ziziphus jujuba; Ziziphus jujuba vs Prunus persica; Prunus persica vs Ziziphus jujuba

Supplementary Table 23. Fructose, glucose, sucrose and total sugar content in different stages of fruit (mg/g DW)
Columns: Stage | Sucrose | Glucose | Fructose | Total sugar
Rows: Young fruit; White mature fruit; Half red fruit; Full red fruit

Supplementary Table 24. The log2 ratio of expression values (RPKM) of genes related to plant hormone signal transduction in deciduous and lignified bearing shoots of jujube
Columns: Gene ID | Predicted gene | Deciduous | Lignified | Log2 ratio (lignified/deciduous) | Up/Down regulation (lignified/deciduous) | P-value | FDR
CCG…  SAUR  Up  …E-07
CCG…  SAUR  Up  …E-07
CCG…  SAUR  Up  …E-35
CCG…  CYCD  Up  …E-93
CCG…  CYCD  Up  …E-129
CCG…  ARR-A  Up  …E-68
addgene1990  ARR-A  Up  …E-46
CCG…  PYL  Down  …E-05
CCG…  PYL  Down  …E-08
CCG…  SNRK  Down  …E-05
CCG…  ERF  Down  …E-22
CCG…  ERF  Down  …E-16
CCG…  JAR  Down  …E-21
The P-value corresponds to the differential gene expression test. Gene expression analysis poses a large multiple-testing problem, in which thousands of hypotheses (is gene x differentially expressed between the two groups?) are tested simultaneously, so corrections for false positive (type I) and false negative (type II) errors are performed. Assume that R differentially expressed genes have been picked out, of which S genes really show differential expression and the other V genes are false positives. If the error ratio Q = V / R must stay below a cutoff (e.g. 5%), the FDR should be preset to a number no larger than 0.05. We use the FDR together with an absolute log2 ratio of at least 1 as the thresholds for judging the significance of gene expression differences.
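The idea of keeping the expected share of false positives Q = V / R below a cutoff can be illustrated with the Benjamini-Hochberg procedure. The source does not say which FDR method was applied, so Benjamini-Hochberg is an assumption here, as are the function names and the default FDR cutoff of 0.05 (taken from the "e.g. 5%" in the text, not from the threshold actually used).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(fdr, log2_ratio, fdr_cutoff=0.05, min_abs_log2=1.0):
    """Combine an FDR cutoff (assumed value) with |log2 ratio| >= 1."""
    return fdr <= fdr_cutoff and abs(log2_ratio) >= min_abs_log2


# Toy p-values (hypothetical).
fdr_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A gene is then called differentially expressed only if both conditions hold, for example `is_significant(0.02, -1.5)` is true while `is_significant(0.02, 0.4)` is not.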

Supplementary Table 25. The expression of genes involved in the response to osmotic stress at different stages of fruit (RPKM)
Columns: Gene | Young fruit | White mature fruit | Half red fruit | Full red fruit
Rows: CCG…; CCG…; CCG…; addgene…; CCG…; CCG…

Supplementary Table 26. The high expression of chitinase genes in jujube (RPKM)
Columns: Gene | Primary shoot | Secondary shoot | Bearing shoot | Mother shoot | KEGG_Orthology
CCG… | K… gmx:… chitinase [EC:…]
CCG… | K… zma:… chitinase [EC:…]
CCG… | K… rcu:rcom_… chitinase [EC:…]
CCG… | K… osa:… chitinase [EC:…]
CCG… | K… gmx:… chitinase [EC:…]
CCG… | K… vvi:… chitinase [EC:…]
CCG… | K… pop:poptr_… chitinase [EC:…]
The gene expression level is calculated using the RPKM method (reads per kilobase transcriptome per million mapped reads) (ref. 9).

Supplementary Table 27. The number of genes encoding autophagy-related protein 9 in jujube and other species
Columns: IPR_ID | Z. jujuba | F. vesca | M. alba | M. domestica | P. bretschneideri | P. mume | P. persica | p_value
IPR… | …
IPR ID refers to the InterPro database.

Supplementary Table 28. R genes in jujube and 11 other species
Columns: Species | NBS | CC-NBS-LRR | TIR-CC-NBS-LRR | TIR-NBS-LRR | CC-NBS | LRR-RLK | NBS-LRR | TIR-NBS
Rows: Citrullus lanatus; Citrus sinensis; Fragaria vesca; Musa acuminata; Morus alba; Malus domestica; Pyrus bretschneideri; Phoenix dactylifera; Prunus mume; Prunus persica; Vitis vinifera; Ziziphus jujuba
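Supplementary Tables 24 to 26 report expression as RPKM, and Table 24 compares two tissues with a log2 ratio. Both quantities can be written as short functions. The function names and the toy counts are assumptions for illustration; the formula follows the definition of RPKM as reads per kilobase per million mapped reads.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_ratio(rpkm_a, rpkm_b):
    """log2(a / b), e.g. lignified over deciduous; both values must be > 0."""
    return math.log2(rpkm_a / rpkm_b)


# Toy numbers (hypothetical): 500 reads on a 2 kb gene, 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
ratio = log2_ratio(50.0, value)
```

With these toy numbers the gene has an RPKM of 25, and a second sample at 50 RPKM gives a log2 ratio of 1, which is the magnitude used as the threshold for differential expression.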

Supplementary Table 29. Unique genes and positively selected genes with NB-ARC domains in the jujube genome
Columns: IPR ID | IPR title | Unique genes and positively selected genes: Total number | Gene IDs
IPR… | NB-ARC | …
IPR ID refers to the InterPro database (ref. 10).
Gene IDs: addgene2520, addgene2764, addgene2765, addgene2783, addgene2784, addgene3224, addgene3626, addgene3669, followed by a long list of CCG-prefixed gene IDs (CCG…).

Supplementary References
1. Yang, Y. X., Wang, G. Y. & Pan, X. Ch. China Food Composition (Book 1, 2nd Edition) (Peking University Medical Press, 2009).
2. Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463 (2010).
3. Velasco, R. et al. The genome of the domesticated apple (Malus x domestica Borkh.). Nat Genet 42 (2010).
4. Wu, J. et al. The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res 23 (2013).
5. International Peach Genome Initiative et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet 45 (2013).
6. Shulaev, V. et al. The genome of woodland strawberry (Fragaria vesca). Nat Genet 43 (2011).
7. Zhang, Q. et al. The genome of Prunus mume. Nat Commun 3, 1318 (2012).
8. He, N. et al. Draft genome sequence of the mulberry tree Morus notabilis. Nat Commun 4, 2445 (2013).
9. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5 (2008).
10. Hunter, S. et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40 (2012).


More information

Contact us for more information and a quotation

Contact us for more information and a quotation GenePool Information Sheet #1 Installed Sequencing Technologies in the GenePool The GenePool offers sequencing service on three platforms: Sanger (dideoxy) sequencing on ABI 3730 instruments Illumina SOLEXA

More information



