Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long life cycle and can remain productive for more than 1,000 years. It is well adapted to drought and salinity.

Supplementary Figure 2. 17-mer frequency distribution of sequencing reads and heterozygosity simulation. A K-mer is a subsequence of K consecutive nucleotides obtained by artificially dividing a read. A sequencing read of L bp contains (L-K+1) K-mers of K bp each. K was set to 17 for genome size estimation. The K-mer frequency follows a Poisson distribution in a given data set. The genome size is estimated as G = K_num/Peak_depth, where K_num is the total number of K-mers and Peak_depth is the expected K-mer depth. If the heterozygosity rate is high, a small peak appears at 1/2 of Peak_depth, so K-mer analysis can also be used to roughly estimate the heterozygosity rate of a genome. The x-axis is depth (X); the y-axis is the proportion, that is, the frequency at a given depth divided by the total frequency over all depths. The high heterozygosity of jujube caused a sub-peak (depth 30X) at half the depth of the main peak (depth 59X). M stands for heterozygosity, and the two curves are derived from heterozygosity simulation. The simulation indicated a heterozygosity rate of about 1.9% (between 1.8% and 2.0%).
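As an illustration of the K-mer division described in this caption, the following minimal Python sketch splits a read into overlapping K-mers and tabulates a depth histogram. The function names and the toy read are hypothetical and are not part of the original analysis pipeline.

```python
from collections import Counter


def kmers(read, k=17):
    """Return every overlapping K-mer of a read; a read of L bp yields L - k + 1 K-mers."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def depth_histogram(reads, k=17):
    """Map each K-mer depth to the number of distinct K-mers observed at that depth."""
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    return Counter(counts.values())


toy_read = "ACGT" * 24  # hypothetical 96 bp read
print(len(kmers(toy_read)))  # 96 - 17 + 1 = 80
```

On real data the histogram would be plotted as depth against proportion, and the main peak and the half-depth sub-peak would be read from that curve.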

Supplementary Figure 3. GC distribution in Ziziphus jujuba, Vitis vinifera and Prunus persica. [Plot: x-axis, G+C content; y-axis, percent of bins (bin = 500 bp). Legend: Ziziphus jujuba, 405 Mb, avg_gc 0.334; Vitis vinifera, 486 Mb, avg_gc 0.345; Prunus persica, 227 Mb, avg_gc 0.375.]

Supplementary Figure 4. The pseudo-molecules of the 12 chromosomes ordered by genetic length. To construct the genetic map, we started from 55,743 segregating SNP sites and applied a χ² test to the genotypes at every site in JoinMap 4.0, with thresholds of 10 for χ² and 0.01 for the P value. After filtering, 4,033 SNPs remained, and the genetic map was constructed with these markers. The LOD parameter ran from 7.0 to 15 in steps of 1.0. The locations of the 4 randomly selected BACs analyzed in Supplementary Fig. 5 are shown by red arrows.

Supplementary Figure 5. Alignments for the 4 randomly picked BAC clones and scaffolds. To evaluate the completeness and accuracy of the jujube assembly, read depth on the BACs was calculated by mapping the short reads onto the BAC sequences. The predicted genes, SSRs and annotated transposable elements (TEs) are also shown.

Supplementary Figure 6. The tissue-specific genes of the jujube.

Supplementary Figure 7. Divergence rate of transposable elements in the jujube genome. The divergence rate was calculated based on the alignment between the RepeatMasker annotated repeat copies and the consensus sequence in the repeat library (RepBase).

Supplementary Figure 8. The synteny of some gene blocks in Chr1 (9.20-14.68 Mb) with other chromosomes.

Supplementary Figure 9. Deciduous bearing shoot (D) and persistently lignified bearing shoot (P). a, fruiting status of D; b, fruiting status of P; c, the joint point of D (left) and the joint point of P (right); d, the stems of D (lower) and P (upper) after removing leaves.

Supplementary Table 1 The wide adaptation of Chinese jujube

Parameter | Value
Minimum annual average temperature (°C) | 5.5
Maximum annual average temperature (°C) | 22
The lowest temperature (°C) | -38.2
Minimum frost-free period (days) | 100
Annual rainfall (mm) | 87-2000
Minimum annual sunshine (hr) | 1100
pH | 4.5-8.4
Maximum soil water NaCl concentration (%) | 0.15
Maximum soil water Na2CO3 concentration (%) | 0.3

Supplementary Table 2 Jujube nutritional facts in comparison to other (fruit) crops

Fruit species | Carbohydrate (%) | Vc (mg/100g) | Ca (mg/100g) | K (mg/100g) | Fe (mg/100g) | Zn (mg/100g)
Jujube | 30.5 | 243 | 22 | 375 | 1.2 | 1.52
Apple | 13.5 | 4 | 4 | 119 | 0.6 | 0.19
Peach | 12.2 | 7 | 6 | 166 | 0.8 | 0.34
Persimmon | 18.5 | 30 | 9 | 151 | 0.2 | 0.08
Grape | 10.3 | 25 | 5 | 104 | 0.4 | 0.18
Orange | 11.1 | 33 | 20 | 159 | 0.4 | 0.14
Pineapple | 10.8 | 18 | 12 | 113 | 0.6 | 0.14
Litchi | 16.6 | 41 | 2 | 151 | 0.4 | 0.17
Chestnut | 42.2 | 24 | 17 | 442 | 1.1 | 0.57
Walnut | 6.1 | 10 | - | - | - | -
Sugarcane | 16 | 2 | 14 | 95 | 0.4 | 1.00
Sugar beet | 23.5 | 8 | 56 | 254 | 0.9 | 0.31
Kiwifruit | 14.5 | 62 | 27 | 144 | 1.2 | 0.57

Ca, K, Fe and Zn are mineral elements. Data from China Food Composition (2009) (ref. 1). "-" means no data.

Supplementary Table 3 Estimation of jujube genome size based on K-mer statistics

K-mer | K-mer number | K-mer depth | Genome size | Used bases | Used reads | Coverage (X)
17 | 26,191,979,760 | 59 | 443,931,860 | 31,430,375,712 | 327,399,747 | 75

From the depth-frequency distribution curve (Supplementary Fig. 2), the expected depth is 59. The total number of input reads is 327,399,747, the total number of bases is 31,430,375,712 and the total K-mer number is 26,191,979,760. We calculated the genome size according to the formula Genome Size = K-mer_num/Peak_depth (ref. 2).
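The figures in Supplementary Table 3 can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the read length of 96 bp is an assumption inferred from the ratio of used bases to used reads (31,430,375,712 / 327,399,747), not a value stated with the table.

```python
def total_kmers(n_reads, read_length, k):
    """Total K-mer number when every read of read_length bp yields read_length - k + 1 K-mers."""
    return n_reads * (read_length - k + 1)


def genome_size(kmer_num, peak_depth):
    """Genome Size = K-mer_num / Peak_depth, truncated to whole bases."""
    return kmer_num // peak_depth


K = 17
READ_LENGTH = 96         # assumption: used bases divided by used reads
N_READS = 327_399_747    # used reads (Supplementary Table 3)
PEAK_DEPTH = 59          # expected K-mer depth (Supplementary Fig. 2)

kmer_num = total_kmers(N_READS, READ_LENGTH, K)
print(kmer_num)                           # 26191979760
print(genome_size(kmer_num, PEAK_DEPTH))  # 443931860
```

Under that read-length assumption, both the K-mer number and the genome size of the table are reproduced exactly.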

Supplementary Table 4 Comparison of jujube genome features with those of other 9 species

Species | Family | Genome size (Mb) | Heterozygosity (%) | Repeat elements (%) | GC content (%)
Jujube (Ziziphus jujuba) | Rhamnaceae | 444 | 1.90 | 49.5 | 33.4
Mulberry (Morus alba) | Moraceae | 357 | - | 47.0 | 35.0
Kiwifruit (Actinidia chinensis) | Actinidiaceae | 758 | 0.54 | 36.0 | 35.2
Peach (Prunus persica) | Rosaceae | 265 | - | 18.6 | 37.5
Pear (Pyrus bretschneideri) | Rosaceae | 527 | 1.02 | 53.1 | 37.2
Apple (Malus domestica) | Rosaceae | 742 | - | 67.0 | 37.9
Strawberry (Fragaria vesca) | Rosaceae | 240 | - | 22.0 | 38.3
Poplar (Populus trichocarpa) | Salicaceae | 485 | 0.26 | 40.8 | 33.7
Sweet orange (Citrus sinensis) | Rutaceae | 367 | - | 20.5 | 34.1
Prunus mume | Rosaceae | 280 | 0.03 | 45.0 | -

Data from our research and the papers on the genomes of the relevant species. "-" means no data.

Supplementary Table 5 SSR distribution of jujube and six closely-related species

Item | Jujube | Apple (ref. 3) | Pear (ref. 4) | Peach (ref. 5) | Strawberry (ref. 6) | Prunus mume (ref. 7) | Mulberry (ref. 8)
Total size of examined sequences (Mb) | 405.686802 | 871.560583 | 497.594737 | 224.608292 | 201.882818 | 217.021147 | 304.105745
Number of di-nucleotide repeats | 106,537 | 84,914 | 50,455 | 32,038 | 23,054 | 33,763 | 58,651
Number of tri-nucleotide repeats | 38,168 | 25,955 | 11,829 | 8,086 | 7,910 | 8,698 | 20,072
Number of tetra-nucleotide repeats | 7,144 | 3,718 | 3,314 | 1,574 | 747 | 1,668 | 3,515
Number of penta-nucleotide repeats | 935 | 929 | 600 | 413 | 215 | 360 | 940
Number of hexa-nucleotide repeats | 591 | 354 | 251 | 256 | 176 | 230 | 366
Total | 153,375 | 115,870 | 66,449 | 42,367 | 32,102 | 44,719 | 83,544
SSR density (/Mb) | 378.06 | 132.95 | 133.54 | 188.63 | 159.01 | 206.06 | 274.72

Data from our research and the papers on the genomes of the relevant species. SSR sequences in the jujube genome were identified using SSRIT (http://www.gramene.org/db/markers/ssrtool).

Supplementary Table 6 SSR comparison of jujube and grape or peach in collinear gene blocks (gene pairs 20)

Jujube vs Grape: 94 blocks, 2,911 gene pairs. Jujube vs Peach: 138 blocks, 4,391 gene pairs.

Item | Jujube (vs Grape) | Grape | Jujube/Grape | Jujube (vs Peach) | Peach | Jujube/Peach
Block length (N counted in bp) | 44,869,077 | 91,407,764 | | 74,625,771 | 81,675,856 |
Block length (without N, bp) | 46,671,967 | 93,327,470 | | 77,585,925 | 82,140,745 |
GC content (%) | 0.334844775 | 0.344801958 | | 0.333273166 | 0.37327228 |
Number of SSR | 7,963 | 6,286 | | 13,031 | 8,023 |
Total length of SSR (bp) | 230,778 | 207,226 | | 375,064 | 271,967 |
Percent of SSR (%) | 0.51433641 | 0.226705031 | 2.27 | 0.502593132 | 0.332983348 | 1.51

SSR sequences in the three genomes were identified using SSRIT (http://www.gramene.org/db/markers/ssrtool).
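The derived rows of Supplementary Tables 5 and 6 follow from simple ratios, which the sketch below reproduces for jujube and grape. The function names are hypothetical. The percentages in Supplementary Table 6 appear to be computed against the block length listed in the "N counted" row; that reading is inferred from the numbers rather than stated in the table.

```python
def ssr_density_per_mb(total_ssrs, examined_mb):
    """SSR density: number of SSRs per Mb of examined sequence."""
    return total_ssrs / examined_mb


def ssr_percent(ssr_bp, block_bp):
    """Percentage of the block length occupied by SSRs."""
    return 100 * ssr_bp / block_bp


# Supplementary Table 5, jujube column
print(round(ssr_density_per_mb(153_375, 405.686802), 2))  # 378.06

# Supplementary Table 6, jujube vs grape collinear blocks
jujube_pct = ssr_percent(230_778, 44_869_077)
grape_pct = ssr_percent(207_226, 91_407_764)
print(round(jujube_pct / grape_pct, 2))  # 2.27
```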

Supplementary Table 7 Construction of libraries and the generation and filtering of the sequencing data used for the genome assembly

Insert size | Raw read length (bp) | Raw total data (Gb) | Raw sequence coverage (X) | Raw physical coverage (X) | Clean read length (bp) | Clean total data (Gb) | Clean sequence coverage (X) | Clean physical coverage (X)
170bp | 100 | 38.07 | 86.52 | 61.54 | 96 | 30.58 | 69.50 | 61.54
250bp | 150 | 13.87 | 31.52 | 26.26 | 150 | 11.43 | 25.98 | 21.65
500bp | 100 | 41.90 | 95.23 | 191.13 | 96 | 32.29 | 73.39 | 191.13
800bp | 100 | 10.92 | 24.82 | 99.28 | 100 | 5.20 | 11.81 | 47.24
2kb | 49 | 21.71 | 49.35 | 1007.04 | 49 | 12.98 | 29.50 | 602.06
5kb | 49 | 20.02 | 45.51 | 2321.89 | 49 | 8.61 | 19.57 | 998.32
10kb | 49 | 22.00 | 49.99 | 5101.43 | 49 | 5.60 | 12.74 | 1299.59
20kb | 49 | 16.63 | 37.80 | 7713.88 | 49 | 2.16 | 4.90 | 999.39
40kb | 49 | 3.74 | 8.510 | 3472.65 | 49 | 1.03 | 2.33 | 952.65
Total | | 188.86 | 429.25 | 19995.10 | | 109.88 | 249.72 | 5173.57

We constructed WGS libraries with different insert sizes from a jujube DNA sample. The insert sizes were 170 bp, 250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb and 40 kb. After library construction, paired-end (PE) reads were sequenced for each library on the HiSeq 2000. The genome size was assumed to be 444 Mb. The quality requirement for de novo sequencing is far higher than for re-sequencing. To facilitate assembly, we applied a series of checking and filtering steps to the raw data (generated by the Solexa pipeline) to obtain clean data. Raw reads were filtered by the following criteria: (1) reads with >2% Ns or with poly-A structure were removed; (2) reads with 40% low-quality bases were removed for short insert size libraries, and reads with 60% low-quality bases for large insert size libraries; (3) reads containing adapters were removed; (4) paired reads with mutual overlaps were removed; (5) PCR duplicates were removed. After filtering, 109.88 Gb of high-quality data were retained, representing 249.72X genome coverage.
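Filtering criteria (1) and (2) can be sketched as a per-read check. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function name is hypothetical, the quality cutoff that defines a "low-quality" base (`low_q`) is not given in the text and is a placeholder, "poly-A structure" is simplified to a read consisting only of A, and the 40% and 60% limits are treated as inclusive.

```python
def keep_read(seq, quals, large_insert=False, low_q=5):
    """Return True if a read passes criteria (1) and (2).

    seq: read sequence; quals: per-base quality scores (same length as seq).
    low_q is a hypothetical cutoff for calling a base low quality.
    """
    n = len(seq)
    if n == 0:
        return False
    bases = seq.upper()
    # (1) more than 2% Ns
    if bases.count("N") / n > 0.02:
        return False
    # (1) poly-A structure, simplified here to an all-A read (assumption)
    if set(bases) == {"A"}:
        return False
    # (2) fraction of low-quality bases: 40% for short inserts, 60% for large inserts
    limit = 0.60 if large_insert else 0.40
    low_fraction = sum(q <= low_q for q in quals) / n
    return low_fraction < limit


print(keep_read("ACGT" * 25, [30] * 100))             # True
print(keep_read("ACGT" * 25, [2] * 40 + [30] * 60))   # False
```

Criteria (3) to (5), which concern adapters, overlapping pairs and PCR duplicates, need pair-level and library-level information and are not covered by this sketch.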

Supplementary Table 8 BAC data statistics

Insert size | Number of libraries | Number of lanes | Total raw data (G) | Total clean data (G)
500bp | 21,504 | 8 | 177.17 | 140.64
Total | 21,504 | 8 | 177.17 | 140.64

21,504 BAC clones with an average length of 120 kb were randomly selected (about 5.8 times the jujube genome size), and one 500 bp WGS library was constructed for each clone. All libraries were sequenced on the Illumina HiSeq 2000 sequencing system.

Supplementary Table 9 Statistics of the final genome assembly

Item | Contig size (bp) | Contig number | Scaffold size (bp) | Scaffold number
N90 | 7,347 | 13,344 | 73,568 | 1,497
N80 | 13,410 | 9,216 | 131,418 | 1,066
N70 | 19,527 | 6,653 | 190,940 | 788
N60 | 26,037 | 4,801 | 245,989 | 587
N50 | 33,948 | 3,392 | 301,045 | 426
Longest | 334,926 | --- | 3,141,199 | ---
Total size | 417,332,479 | --- | 437,645,007 | ---
Total number (100 bp cutoff) | --- | 28,930 | --- | 5,898
Total number (2 kb cutoff) | --- | 21,710 | --- | 5,139

Supplementary Table 10 Assessment of genome coverage for four randomly selected BAC clones

BAC ID | Length (bp) | Coverage ratio (%) | Number of alignment blocks | Number of aligned scaffolds | Aligned scaffold length (bp) | Number of gaps | Gap length (bp) | Gap ratio (%)
BAC1 | 107,191 | 99.98 | 23 | 1 | 314,261 | 2 | 155 | 0.0014
BAC2 | 112,608 | 100.00 | 243 | 34 | 14,035,191 | 32 | 12,353 | 0.1097
BAC3 | 128,002 | 98.11 | 159 | 17 | 5,723,084 | 24 | 15,953 | 0.1246
BAC4 | 106,863 | 95.90 | 25 | 3 | 132,904 | 2 | 278 | 0.0026

The jujube assembly (scaffolds) was aligned to four Sanger-sequenced BACs (average length 113.666 kb) with BLASTN, and the coverage of the BAC sequences by the assembled scaffolds was calculated.
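The N50 to N90 rows of Supplementary Table 9 use the standard assembly statistic: sequences are sorted from longest to shortest, and Nxx is the length of the sequence at which the running total first reaches xx% of the total assembly size. The "Number" column is the count of sequences needed to reach that point. A minimal sketch on hypothetical toy lengths (the function name and data are illustrative, not the authors' code):

```python
def nxx(lengths, fraction=0.5):
    """Return (Nxx length, number of sequences needed to reach that fraction)."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("lengths must be non-empty")


toy_contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths in bp
print(nxx(toy_contigs, 0.5))  # (80, 2)
print(nxx(toy_contigs, 0.9))  # (40, 4)
```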
Supplementary Table 11 Assessment of the transcript coverage with data from the 1942 published ESTs

Dataset | Number | Total length (bp) | Bases covered by assembly (%) | Sequences covered by assembly (%) | With > 90% sequence in one scaffold: Number | Percent | With > 50% sequence in one scaffold: Number | Percent
> 0 bp | 1,942 | 1,169,253 | 94.94 | 97.58 | 1,766 | 90.94 | 1,842 | 94.85
> 200 bp | 1,933 | 1,167,757 | 94.94 | 97.62 | 1,758 | 90.95 | 1,834 | 94.88
> 500 bp | 1,403 | 971,481 | 95.22 | 98.08 | 1,274 | 90.81 | 1,337 | 95.30

Data from http://www.ncbi.nlm.nih.gov/nucest/?term=jujube, downloaded in July 2013.

Supplementary Table 12 Assessment of the transcript coverage with the transcriptome assembly contig (TAC) data

Dataset | Number | Total length (bp) | Bases covered by assembly (%) | Sequences covered by assembly (%) | With > 90% sequence in one scaffold: Number | Percent | With > 50% sequence in one scaffold: Number | Percent
> 0 bp | 51,514 | 46,167,283 | 92.57 | 94.43 | 42,194 | 81.91 | 47,465 | 92.14
> 200 bp | 51,514 | 46,167,283 | 92.57 | 94.43 | 42,194 | 81.91 | 47,465 | 92.14
> 500 bp | 26,805 | 38,578,018 | 93.31 | 97.78 | 21,681 | 80.88 | 25,401 | 94.76
> 1000 bp | 16,059 | 30,851,763 | 93.50 | 99.12 | 12,689 | 79.01 | 15,355 | 95.62

Supplementary Table 13 The alignment results of two parents and their 105 progenies

Sample ID | Paired mapping (PE) | PE% | Single mapping (SE) | SE% | All mapping | All mapping%
P1 | 6,655,150 | 66.21 | 1,383,017 | 13.76 | 8,038,167 | 79.97
P2 | 4,272,154 | 63.63 | 948,401 | 14.13 | 5,220,555 | 77.76
S100 | 5,787,922 | 64.04 | 1,258,384 | 13.92 | 7,046,306 | 77.97
S101 | 7,002,300 | 61.34 | 1,833,027 | 16.06 | 8,835,327 | 77.39
S102 | 3,031,360 | 63.2 | 687,336 | 14.33 | 3,718,696 | 77.53
S103 | 3,778,010 | 61.06 | 995,907 | 16.1 | 4,773,917 | 77.16
S104 | 4,637,788 | 61.14 | 1,247,135 | 16.44 | 5,884,923 | 77.59
S105 | 3,796,280 | 60.88 | 1,019,879 | 16.36 | 4,816,159 | 77.24
S106 | 4,244,088 | 62.14 | 1,049,797 | 15.37 | 5,293,885 | 77.51
S107 | 4,348,862 | 58.93 | 1,305,000 | 17.68 | 5,653,862 | 76.62
S108 | 3,215,028 | 63.58 | 720,580 | 14.25 | 3,935,608 | 77.82
S109 | 5,374,538 | 61.47 | 1,439,240 | 16.46 | 6,813,778 | 77.93
S10 | 10,434,398 | 64.29 | 2,260,354 | 13.93 | 12,694,752 | 78.22
S110 | 5,844,074 | 61.16 | 1,559,176 | 16.32 | 7,403,250 | 77.47
S111 | 3,756,594 | 58.98 | 1,113,164 | 17.48 | 4,869,758 | 76.46
S112 | 3,962,984 | 61.37 | 1,072,075 | 16.6 | 5,035,059 | 77.97
S113 | 4,091,734 | 63.43 | 950,068 | 14.73 | 5,041,802 | 78.16
S114 | 3,961,560 | 60.41 | 1,099,955 | 16.77 | 5,061,515 | 77.19
S115 | 4,849,868 | 62.53 | 1,179,921 | 15.21 | 6,029,789 | 77.74
S116 | 5,235,194 | 61.39 | 1,365,948 | 16.02 | 6,601,142 | 77.41
S117 | 5,622,354 | 60.41 | 1,551,322 | 16.67 | 7,173,676 | 77.07
S118 | 4,470,012 | 60.33 | 1,233,605 | 16.65 | 5,703,617 | 76.97
S119 | 4,232,200 | 61.1 | 1,148,506 | 16.58 | 5,380,706 | 77.68
S12 | 2,423,356 | 66.97 | 429,711 | 11.88 | 2,853,067 | 78.85
S13 | 7,729,824 | 66.28 | 1,464,802 | 12.56 | 9,194,626 | 78.85
S14 | 3,398,824 | 66.91 | 598,144 | 11.78 | 3,996,968 | 78.69
S15 | 2,799,450 | 66.91 | 488,380 | 11.67 | 3,287,830 | 78.58
S16 | 3,598,774 | 64.99 | 728,530 | 13.16 | 4,327,304 | 78.15
S17 | 4,521,180 | 67.36 | 808,839 | 12.05 | 5,330,019 | 79.41
S18 | 3,611,304 | 66.91 | 676,451 | 12.53 | 4,287,755 | 79.44
S19 | 17,532,146 | 64.14 | 3,639,688 | 13.32 | 21,171,834 | 77.45
S20 | 8,059,870 | 64.77 | 1,599,860 | 12.86 | 9,659,730 | 77.63
S21 | 7,130,860 | 66.25 | 1,398,123 | 12.99 | 8,528,983 | 79.24
S22 | 1,557,504 | 67.32 | 258,458 | 11.17 | 1,815,962 | 78.49
S24 | 4,729,730 | 66.1 | 925,934 | 12.94 | 5,655,664 | 79.04
S25 | 6,815,392 | 65.24 | 1,355,235 | 12.97 | 8,170,627 | 78.21
S27 | 9,072,960 | 63.98 | 1,913,264 | 13.49 | 10,986,224 | 77.48
S28 | 2,887,050 | 65.84 | 530,074 | 12.09 | 3,417,124 | 77.93
S2 | 7,815,452 | 65.02 | 1,604,518 | 13.35 | 9,419,970 | 78.37
S31 | 4,360,822 | 67.18 | 802,958 | 12.37 | 5,163,780 | 79.55
S32 | 4,000,224 | 65.57 | 778,368 | 12.76 | 4,778,592 | 78.32
S33 | 2,404,690 | 66.32 | 437,043 | 12.05 | 2,841,733 | 78.38
S36 | 4,771,328 | 65.41 | 953,336 | 13.07 | 5,724,664 | 78.47
S37 | 1,835,190 | 66.95 | 341,772 | 12.47 | 2,176,962 | 79.42
S38 | 11,287,996 | 64.69 | 2,389,749 | 13.7 | 13,677,745 | 78.39
S39 | 3,069,680 | 65.84 | 585,702 | 12.56 | 3,655,382 | 78.4
S3 | 2,460,550 | 66.1 | 477,281 | 12.82 | 2,937,831 | 78.93
S40 | 7,756,358 | 66.28 | 1,520,981 | 13 | 9,277,339 | 79.28
S41 | 12,025,038 | 66 | 2,325,437 | 12.76 | 14,350,475 | 78.76
S42 | 5,401,856 | 66.92 | 1,027,677 | 12.73 | 6,429,533 | 79.65
S43 | 3,094,910 | 67.15 | 590,692 | 12.82 | 3,685,602 | 79.96
S44 | 5,473,278 | 65.77 | 1,096,244 | 13.17 | 6,569,522 | 78.94
S45 | 2,952,522 | 66 | 547,284 | 12.23 | 3,499,806 | 78.24
S46 | 5,005,520 | 65.55 | 1,088,059 | 14.25 | 6,093,579 | 79.79
S47 | 12,866,378 | 65.02 | 2,630,628 | 13.29 | 15,497,006 | 78.31
S49 | 3,467,130 | 67.02 | 668,023 | 12.91 | 4,135,153 | 79.93
S50 | 5,366,518 | 65.85 | 1,060,107 | 13.01 | 6,426,625 | 78.85
S52 | 4,576,160 | 65.43 | 931,518 | 13.32 | 5,507,678 | 78.75
S53 | 3,779,578 | 68.68 | 652,567 | 11.86 | 4,432,145 | 80.53
S54 | 7,630,762 | 67.35 | 1,361,194 | 12.01 | 8,991,956 | 79.37
S55 | 12,735,024 | 66.21 | 2,375,670 | 12.35 | 15,110,694 | 78.56
S56 | 5,653,278 | 61.84 | 1,438,536 | 15.74 | 7,091,814 | 77.58
S57 | 4,833,290 | 62.84 | 1,127,697 | 14.66 | 5,960,987 | 77.51
S58 | 6,391,228 | 60.98 | 1,649,037 | 15.73 | 8,040,265 | 76.72
S59 | 6,043,638 | 62.21 | 1,516,616 | 15.61 | 7,560,254 | 77.82
S62 | 4,951,394 | 61.73 | 1,269,500 | 15.83 | 6,220,894 | 77.56
S63 | 5,584,450 | 60.99 | 1,477,201 | 16.13 | 7,061,651 | 77.12
S64 | 5,839,430 | 60.71 | 1,582,656 | 16.45 | 7,422,086 | 77.16
S65 | 4,329,360 | 62.7 | 1,058,582 | 15.33 | 5,387,942 | 78.03
S66 | 5,436,936 | 61.86 | 1,396,961 | 15.89 | 6,833,897 | 77.75
S67 | 6,241,124 | 62.08 | 1,613,658 | 16.05 | 7,854,782 | 78.13
S68 | 3,499,786 | 61.89 | 893,422 | 15.8 | 4,393,208 | 77.68
S69 | 4,754,668 | 62.13 | 1,222,116 | 15.97 | 5,976,784 | 78.1
S6 | 12,991,180 | 65.2 | 2,596,876 | 13.03 | 15,588,056 | 78.23
S70 | 9,145,252 | 64.19 | 1,980,161 | 13.9 | 11,125,413 | 78.09
S71 | 4,969,314 | 65.84 | 964,244 | 12.78 | 5,933,558 | 78.62
S72 | 7,649,416 | 62.7 | 1,872,169 | 15.34 | 9,521,585 | 78.04
S73 | 7,718,732 | 62.26 | 1,924,455 | 15.52 | 9,643,187 | 77.79
S74 | 5,927,464 | 61.56 | 1,551,756 | 16.12 | 7,479,220 | 77.68
S75 | 4,449,726 | 60.28 | 1,281,417 | 17.36 | 5,731,143 | 77.64
S76 | 4,335,480 | 60.69 | 1,192,715 | 16.7 | 5,528,195 | 77.38
S77 | 4,064,094 | 60.51 | 1,147,610 | 17.09 | 5,211,704 | 77.6
S78 | 4,642,386 | 60.49 | 1,249,577 | 16.28 | 5,891,963 | 76.78
S79 | 4,452,786 | 60.72 | 1,183,620 | 16.14 | 5,636,406 | 76.86
S7 | 5,772,734 | 66.19 | 1,123,383 | 12.88 | 6,896,117 | 79.07
S80 | 4,578,736 | 61.89 | 1,127,551 | 15.24 | 5,706,287 | 77.13
S81 | 5,425,574 | 59.49 | 1,559,671 | 17.1 | 6,985,245 | 76.59
S82 | 4,890,628 | 61 | 1,306,758 | 16.3 | 6,197,386 | 77.3
S83 | 5,403,588 | 60.9 | 1,475,118 | 16.62 | 6,878,706 | 77.52
S84 | 4,470,524 | 63.19 | 1,035,473 | 14.64 | 5,505,997 | 77.82
S85 | 3,051,720 | 64.18 | 655,847 | 13.79 | 3,707,567 | 77.97
S86 | 5,840,188 | 62.02 | 1,491,429 | 15.84 | 7,331,617 | 77.86
S87 | 2,365,234 | 65.22 | 481,160 | 13.27 | 2,846,394 | 78.48
S88 | 3,087,544 | 63.63 | 708,967 | 14.61 | 3,796,511 | 78.24
S89 | 4,329,624 | 61.29 | 1,167,673 | 16.53 | 5,497,297 | 77.82
S8 | 7,591,906 | 66.48 | 1,395,249 | 12.22 | 8,987,155 | 78.7
S90 | 5,211,142 | 62.14 | 1,325,096 | 15.8 | 6,536,238 | 77.94
S91 | 4,212,026 | 60.91 | 1,148,902 | 16.62 | 5,360,928 | 77.53
S92 | 3,341,440 | 44.06 | 910,515 | 12.01 | 4,251,955 | 56.07
S93 | 5,545,910 | 61.04 | 1,487,359 | 16.37 | 7,033,269 | 77.41
S94 | 2,605,550 | 60.01 | 742,317 | 17.1 | 3,347,867 | 77.11
S95 | 5,225,852 | 60.11 | 1,469,892 | 16.91 | 6,695,744 | 77.02
S96 | 5,541,976 | 62.2 | 1,361,499 | 15.28 | 6,903,475 | 77.48
S97 | 4,292,348 | 62.9 | 1,037,763 | 15.21 | 5,330,111 | 78.11
S98 | 6,647,938 | 61.05 | 1,776,728 | 16.32 | 8,424,666 | 77.36
S99 | 6,146,526 | 60.61 | 1,680,825 | 16.58 | 7,827,351 | 77.19
S9 | 15,026,702 | 61.76 | 3,632,430 | 14.93 | 18,659,132 | 76.69

Supplementary Table 14 General statistics of predicted protein-coding genes for jujube

Annotation method | Gene set | Number | Average transcript length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp)
De novo | AUGUSTUS | 41,367 | 3,099.10 | 1,087.29 | 4.34 | 250.73 | 602.97
De novo | Genscan | 40,451 | 6,016.93 | 1,101.93 | 5.17 | 213.07 | 1,178.19
Homolog | C. sinensis | 22,510 | 3,305.07 | 1,146.13 | 4.73 | 242.27 | 578.70
Homolog | M. domestica | 22,400 | 3,665.82 | 1,124.09 | 4.64 | 242.25 | 698.24
Homolog | P. trichocarpa | 22,989 | 3,037.24 | 1,120.79 | 4.64 | 241.67 | 526.82
Homolog | G. max | 21,523 | 3,033.93 | 1,117.46 | 4.72 | 236.65 | 514.91
Homolog | P. persica | 23,322 | 2,926.60 | 1,126.36 | 4.66 | 241.90 | 492.36
Homolog | V. vinifera | 20,926 | 3,643.23 | 1,110.30 | 5.21 | 213.05 | 601.42
GLEAN | | 28,585 | 4,058.00 | 1,226.81 | 4.60 | 266.47 | 785.59
RNASeq | | 29,051 | 4,038.42 | 1,222.83 | 4.58 | 267.05 | 786.71
Final gene | | 32,808 | 3,799.94 | 1,190.50 | 4.50 | 264.40 | 702.43

Supplementary Table 15 Functional annotation of predicted genes for jujube

Category | Number | Percentage (%)
Total | 32,808 | --
Annotated: InterPro | 3,525 | 10.74
Annotated: GO | 3,024 | 9.22
Annotated: KEGG | 15,693 | 47.83
Annotated: SwissProt | 21,933 | 66.85
Annotated: TrEMBL | 26,202 | 79.86
Unannotated | 5,298 | 16.15

Supplementary Table 16 The numbers of genes with certain numbers of exons in the jujube genome

Total gene number | 32,808
Number of genes containing one exon | 7,128 (21.73%)
Number of genes containing two exons | 5,195 (15.83%)
Number of genes containing three exons | 5,579 (17.00%)
Number of genes containing more than three exons | 14,906 (45.43%)

Supplementary Table 17 The numbers of genes with certain numbers of exons in the jujube genome

Total length of annotated genes (nt) | Covered by transcript reads from different tissues (nt) | Coverage rate (%)
36996702 | 33224882 | 0.898049832

Supplementary Table 18 Identification of non-coding RNA genes in the jujube genome

Type | Copy | Average length (bp) | Total length (bp) | % of genome
miRNA | 272 | 119.39 | 32,475 | 0.0074
tRNA | 1,209 | 76.22 | 92,151 | 0.0211
rRNA: Total | 410 | 316.86 | 129,912 | 0.0297
rRNA: 18S | 94 | 957.30 | 89,986 | 0.0206
rRNA: 28S | 218 | 133.07 | 29,010 | 0.0066
rRNA: 5.8S | 42 | 154.38 | 6,484 | 0.0015
rRNA: 5S | 56 | 79.14 | 4,432 | 0.0010
snRNA: Total | 286 | 116.55 | 33,332 | 0.0076
snRNA: CD-box | 186 | 102.09 | 18,989 | 0.0043
snRNA: HACA-box | 23 | 124.00 | 2,852 | 0.0007
snRNA: Splicing | 77 | 149.23 | 11,491 | 0.0026

CD-box: C box (UGAUGA) and the D box (CUGA); HACA-box: H/ACA-type snoRNAs.

Supplementary Table 19 General statistics of SNPs in the jujube genome

Level | SNP number | Effective length (bp) | SNP density (/kb)
Chromosome level | 3,794,273 | 322,005,718 | 11.783247
Scaffold level | 4,769,958 | 417,332,479 | 11.429635
Rate of anchored (%) | 79.55 | 77.16 |

Supplementary Table 20 General statistics of jujube repetitive elements

Type | Repeat size (bp) | Rate of genome (%)
TRF | 23,048,852 | 5.27
RepeatMasker | 47,307,945 | 10.81
RepeatProteinMask | 52,059,471 | 11.90
De novo | 199,215,585 | 45.52
Total | 216,570,752 | 49.49
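The derived values in Supplementary Table 19 can be recomputed from the SNP counts and effective lengths. The sketch below is an illustrative check with hypothetical function names: the SNP density is the number of SNPs per kb of effective length, and the anchored rate is the chromosome-level value as a percentage of the scaffold-level value.

```python
def snp_density_per_kb(snp_count, effective_bp):
    """SNPs per kb of effective sequence length."""
    return snp_count / effective_bp * 1000


def anchored_rate(chromosome_value, scaffold_value):
    """Chromosome-level value as a percentage of the scaffold-level value."""
    return 100 * chromosome_value / scaffold_value


chrom_snps, chrom_bp = 3_794_273, 322_005_718
scaf_snps, scaf_bp = 4_769_958, 417_332_479

print(f"{snp_density_per_kb(chrom_snps, chrom_bp):.6f}")  # chromosome-level density
print(f"{snp_density_per_kb(scaf_snps, scaf_bp):.6f}")    # scaffold-level density
print(f"{anchored_rate(chrom_snps, scaf_snps):.2f}")      # anchored SNPs (%)
print(f"{anchored_rate(chrom_bp, scaf_bp):.2f}")          # anchored length (%)
```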

Supplementary Table 21 Classification of jujube transposable elements

Type | Length (bp) | Rate of transposable elements (%) | Rate of genome (%)
Total of retrotransposons | 166,416,957 | 81.21 | 38.03
Retrotransposons: Gypsy | 75,815,091 | 37 | 17.32
Retrotransposons: Copia | 55,318,493 | 27 | 12.64
Retrotransposons: LINE | 7,741,729 | 3.78 | 1.77
Retrotransposons: SINE | 856,025 | 0.42 | 0.2
Retrotransposons: Other | 6,609 | 0 | 0
Retrotransposons: Unclassified elements | 26,679,010 | 13.02 | 6.1
DNA transposons | 38,501,526 | 18.79 | 8.8
Total transposable elements | 204,918,483 | 100 | 46.82

Supplementary Table 22 Summary of synteny blocks among Z. jujuba, F. vesca, V. vinifera and P. persica

Genome | Block size < 1 Mb | Block size 1-3 Mb | Block size > 3 Mb | Total
Ziziphus jujuba vs Fragaria vesca | 1843 | 61 | 1 | 1905
Fragaria vesca vs Ziziphus jujuba | 1891 | 14 | 0 | 1905
Ziziphus jujuba vs Vitis vinifera | 2485 | 86 | 2 | 2573
Vitis vinifera vs Ziziphus jujuba | 1520 | 896 | 157 | 2573
Ziziphus jujuba vs Prunus persica | 2466 | 100 | 3 | 2569
Prunus persica vs Ziziphus jujuba | 2410 | 159 | 0 | 2569

Supplementary Table 23 Fructose, glucose, sucrose and total sugar content in different stages of fruit (mg/g DW)

Stage | Sucrose | Glucose | Fructose | Total sugar
Young fruit | 228.58 ± 11.03 | 133.496 ± 9.09 | 139.057 ± 5.87 | 668.57 ± 8.08
White mature fruit | 231.22 ± 9.93 | 112.569 ± 3.44 | 154.312 ± 6.75 | 724.29 ± 68.79
Half red fruit | 281.39 ± 14.68 | 59.554 ± 1.68 | 94.634 ± 8.39 | 736.86 ± 67.48
Full red fruit | 284.69 ± 15.94 | 85.923 ± 3.77 | 132.898 ± 11.29 | 784.57 ± 59.44

Supplementary Table 24 The log2 ratio of expression values (RPKM) of genes related to plant hormone signal transduction in deciduous and lignified bearing shoots of jujube

Gene ID | Predicted gene | Deciduous | Lignified | Log2 ratio (lignified/deciduous) | Up/Down regulation (lignified/deciduous) | P-value | FDR
CCG000613.1 | SAUR1 | 0.001 | 2.443 | 11.254 | Up | 2.02E-07 | 4.13E-07
CCG000989.1 | SAUR2 | 0.001 | 2.104 | 11.039 | Up | 3.95E-07 | 7.99E-07
CCG009175.1 | SAUR3 | 0.001 | 22.693 | 14.470 | Up | 3.13E-36 | 1.63E-35
CCG013912.1 | CYCD3-1 | 0.239 | 19.344 | 6.337 | Up | 5.74E-94 | 5.55E-93
CCG000859.1 | CYCD3-2 | 2.383 | 31.277 | 3.714 | Up | 1.11E-130 | 1.39E-129
CCG012193.1 | ARR-A1 | 5.380 | 45.331 | 3.075 | Up | 1.20E-68 | 9.30E-68
addgene1990 | ARR-A2 | 4.368 | 25.546 | 2.548 | Up | 9.28E-47 | 5.59E-46
CCG011967.1 | PYL1 | 1.600 | 0.001 | -10.644 | Down | 1.04E-05 | 1.97E-05
CCG011949.1 | PYL2 | 2.400 | 0.001 | -11.229 | Down | 3.40E-08 | 7.19E-08
CCG023478.1 | SNRK2 | 2.988 | 0.001 | -11.545 | Down | 1.03E-05 | 2.20E-05
CCG024904.1 | ERF1-1 | 6.644 | 0.153 | -5.438 | Down | 6.23E-23 | 2.49E-22
CCG007987.1 | ERF1-2 | 7.839 | 0.677 | -3.533 | Down | 1.21E-16 | 4.13E-16
CCG024930.1 | JAR1 | 8.336 | 2.693 | -1.630 | Down | 7.79E-22 | 3.04E-21

The P-value corresponds to the differential gene expression test. Because gene expression analysis creates a large multiple-testing problem, in which thousands of hypotheses (is gene x differentially expressed between the two groups?) are tested simultaneously, corrections for false positives (type I errors) and false negatives (type II errors) are performed. Assume that we have picked out R differentially expressed genes, of which S genes really show differential expression and the other V genes are false positives. If we decide that the error ratio Q = V / R must stay below a cutoff (e.g. 5%), we should preset the FDR to a number no larger than 0.05. We used an FDR threshold of 0.001 and an absolute Log2 Ratio threshold of 1 to judge the significance of gene expression differences.
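The Log2 ratio column of Supplementary Table 24 and the significance rule stated in the note can be illustrated with a short sketch. The function names are hypothetical, and treating both thresholds as inclusive is an assumption, since the text gives only the threshold values.

```python
import math


def log2_ratio(lignified_rpkm, deciduous_rpkm):
    """Log2 of the lignified/deciduous RPKM ratio."""
    return math.log2(lignified_rpkm / deciduous_rpkm)


def is_significant(fdr, ratio, fdr_cutoff=0.001, ratio_cutoff=1.0):
    """Apply the FDR and |log2 ratio| thresholds (inclusive bounds are an assumption)."""
    return fdr <= fdr_cutoff and abs(ratio) >= ratio_cutoff


# SAUR1 (CCG000613.1): deciduous 0.001, lignified 2.443, FDR 4.13E-07
saur1 = log2_ratio(2.443, 0.001)
print(round(saur1, 3), is_significant(4.13e-07, saur1))  # 11.254 True

# PYL1 (CCG011967.1): deciduous 1.600, lignified 0.001, FDR 1.97E-05
pyl1 = log2_ratio(0.001, 1.600)
print(round(pyl1, 3), is_significant(1.97e-05, pyl1))    # -10.644 True
```

A ratio cannot be formed when either expression value is zero, which may explain why 0.001 appears as the smallest RPKM in the table; that interpretation is a guess and is not stated in the source.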

Supplementary Table 25 The expression of genes involved in the response to osmotic stress at different stages of fruit (RPKM)

Gene | Young fruit | White mature fruit | Half red fruit | Full red fruit
CCG000538.1 | 770.147 | 692.075 | 795.314 | 761.125
CCG009211.1 | 330.709 | 122.295 | 375.324 | 524.289
CCG009990.1 | 595.140 | 1205.253 | 585.496 | 410.768
addgene964 | 203.681 | 396.362 | 351.036 | 402.276
CCG010121.1 | 164.179 | 307.521 | 260.622 | 333.607
CCG019519.1 | 1362.017 | 1421.147 | 499.776 | 321.566

Supplementary Table 26 The high expression of chitinase genes in jujube (RPKM)

Gene | Primary shoot | Secondary shoot | Bearing shoot | Mother shoot | KEGG_Orthology
CCG005296.1 | 944.854 | 117.202 | 0.056 | 86.649 | K01183 1 2e-88 323 gmx:100818728 chitinase [EC:3.2.1.14]
CCG005390.1 | 706.708 | 89.242 | 0.057 | 32.767 | K01183 1 2e-91 333 zma:100284575 chitinase [EC:3.2.1.14]
CCG011079.1 | 520.448 | 1536.693 | 93.778 | 30.526 | K01183 1 4e-136 482 rcu:rcom_0806420 chitinase [EC:3.2.1.14]
CCG011490.1 | 313.140 | 303.040 | 81.483 | 115.652 | K01183 1 3e-46 184 osa:4338718 chitinase [EC:3.2.1.14]
CCG017732.1 | 8978.994 | 1087.301 | 8.086 | 3746.561 | K01183 1 4e-91 332 gmx:100818728 chitinase [EC:3.2.1.14]
CCG023017.1 | 579.624 | 568.797 | 397.713 | 61.971 | K01183 1 4e-75 278 vvi:100250948 chitinase [EC:3.2.1.14]
CCG023019.1 | 271.027 | 383.458 | 87.016 | 385.801 | K01183 1 8e-127 451 pop:poptr_1116414 chitinase [EC:3.2.1.14]

The gene expression level is calculated using the RPKM method (reads per kilobase of transcriptome per million mapped reads) (ref. 9).

Supplementary Table 27 The number of genes encoding autophagy-related protein 9 in jujube and other species

IPR_ID | Z. jujuba | F. vesca | M. alba | M. domestica | P. bretschneideri | P. mume | P. persica | p_value
IPR007241 | 13 | 1 | 1 | 2 | 2 | 1 | 1 | 6.38E-07

IPR ID refers to the InterPro database.

Supplementary Table 28 R genes in jujube and 11 other species

Species CC-NBS TIR-CC- TIR-NBS- CC-NBS LRR-RLK NBS-LRR NBS -LRR NBS-LRR LRR TIR-NBS
Citrullus lanatus | 5 | 0 | 181 | 14 | 17 | 0 | 12 | 3
Citrus sinensis | 110 | 42 | 323 | 165 | 94 | 3 | 77 | 16
Fragaria vesca | 44 | 14 | 200 | 70 | 45 | 0 | 21 | 15
Musa acuminata | 24 | 7 | 301 | 63 | 27 | 0 | 0 | 0
Morus alba | 26 | 17 | 183 | 59 | 44 | 2 | 19 | 10
Malus domestica | 79 | 34 | 475 | 409 | 230 | 2 | 195 | 87
Pyrus bretschneideri | 54 | 16 | 411 | 173 | 63 | 1 | 39 | 9
Phoenix dactylifera | 15 | 2 | 90 | 42 | 26 | 0 | 0 | 0
Prunus mume | 40 | 8 | 261 | 140 | 74 | 2 | 144 | 35
Prunus persica | 42 | 6 | 267 | 176 | 57 | 0 | 128 | 15
Vitis vinifera | 39 | 12 | 232 | 176 | 89 | 0 | 18 | 3
Ziziphus jujuba | 115 | 32 | 294 | 231 | 136 | 0 | 26 | 15

Supplementary Table 29 Unique genes and positively selected genes with NB-ARC domains in the jujube genome

IPR ID | IPR title | Unique genes and positively selected genes | Total number
IPR002182 | NB-ARC | 95 | 555

IPR ID refers to the InterPro database (ref. 10).

Gene IDs: addgene2520 addgene2764 addgene2765 addgene2783 addgene2784 addgene3224 addgene3626 addgene3669 CCG000777.1 CCG000778.1 CCG001287.1 CCG001888.1 CCG002103.1 CCG002106.1 CCG003077.5 CCG004193.1 CCG004825.1 CCG005647.1 CCG005726.1 CCG005729.1 CCG005898.1 CCG006085.1 CCG006621.1 CCG006624.1 CCG006625.1 CCG007031.1 CCG007432.1 CCG007742.1 CCG007743.1 CCG007747.1 CCG007748.1 CCG008254.1 CCG008261.1 CCG008262.1 CCG008265.1 CCG008274.1 CCG008608.1 CCG008610.1 CCG008971.1 CCG009417.1 CCG009551.1 CCG009553.1 CCG009556.8 CCG010027.1 CCG010199.1 CCG010356.2 CCG010358.1 CCG010366.1 CCG010368.3 CCG010796.1 CCG011103.1 CCG011820.2 CCG011821.1 CCG011822.1 CCG011983.1 CCG012198.1 CCG013253.1 CCG013254.1 CCG013281.1 CCG013290.1 CCG013865.1 CCG014171.1 CCG014223.1 CCG014571.1 CCG015464.2 CCG018096.1 CCG018278.1 CCG018998.1 CCG018999.1 CCG019360.1 CCG019912.1 CCG022235.1 CCG022237.1 CCG022239.1 CCG022240.1 CCG022442.1 CCG023563.1 CCG023965.1 CCG025294.1 CCG025431.1 CCG027224.1 CCG027274.1 CCG027350.2 CCG027494.1 CCG027495.1 CCG027500.1 CCG027528.2 CCG027884.1 CCG027885.1 CCG028859.1 CCG028883.1 CCG028931.1 CCG028932.1 CCG028936.1 CCG028940.1

Supplementary References

1. Yang, Y. X., Wang, G. Y. & Pan, X. Ch. China Food Composition (Book 1, 2nd Edition) (Peking University Medical Press, 2009).
2. Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311-317 (2010).
3. Velasco, R. et al. The genome of the domesticated apple (Malus x domestica Borkh.). Nat Genet 42, 833-839 (2010).
4. Wu, J. et al. The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res 23, 396-408 (2013).
5. International Peach Genome Initiative et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet 45, 487-494 (2013).
6. Shulaev, V. et al. The genome of woodland strawberry (Fragaria vesca). Nat Genet 43, 109-116 (2011).
7. Zhang, Q. et al. The genome of Prunus mume. Nat Commun 3, 1318 (2012).
8. He, N. et al. Draft genome sequence of the mulberry tree Morus notabilis. Nat Commun 4, 2445 (2013).
9. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5, 621-628 (2008).
10. Hunter, S. et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40, 306-312 (2012).