Nature Genetics: doi: /ng Supplementary Figure 1. Neighbor-joining tree of the 183 wild, cultivated, and weedy rice accessions.

Supplementary Figure 1 Neighbor-joining tree of the 183 wild, cultivated, and weedy rice accessions. Relationships of cultivated and wild rice agree with previously reported relationships (ref. 40). Wild rice accessions (dark green) are divided into different groups. The japonica (orange) and aromatic (light green) rice varieties form a clade. The SH (purple) and Chinese (black) weedy rice strains cluster with indica (light blue), and the BHA (red) weedy rice strains cluster with aus (pink). Bootstrap values (>50) are shown on each branch.

Supplementary Figure 2 Ancestral population structures of 183 rice accessions. (a,b) Analyses of the full data set (a) and the data set with no missing data (b) were performed separately. Each vertical bar represents one accession, and different colors indicate distinct ancestry states. Cross-validation error was estimated for K values from two to five. K = 3 minimizes the cross-validation error.

Supplementary Figure 3 Distributions of wild- and crop-specific private SNPs in SH (orange), BHA (red), and Chinese (green) weedy rice at the whole-genome level. The box corresponds to the 95% confidence interval, and the black bar within each box is the mean value. The blue horizontal dashed line represents equal proportions of wild- and crop-specific private SNPs. Positive and negative values represent crop-like and wild-like SNPs, respectively.

Supplementary Figure 4 Distribution patterns of wild- and crop-specific private SNPs within each chromosome of SH (orange), BHA (red), and Chinese (green) weedy rice. The box corresponds to the 95% confidence interval, and the black bar within each box is the mean value for each chromosome. The blue horizontal dotted line represents zero. Positive and negative values represent the presence of crop-like and wild-like SNPs, respectively.

Supplementary Figure 5 Decrease in nucleotide diversity (π) in SH and BHA weedy rice as compared to their crop ancestors. The y axis represents the ratio of nucleotide diversity (π) between weedy rice and their crop ancestors. Each bar is a 100-kb window, and the red solid line corresponds to the threshold for the top 5%. The numbers on the red line are the exact thresholds for each comparison.
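The windowed ratio and its top-5% cutoff can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the per-window π values are synthetic, and the ratio is taken as crop over weed on the assumption that large values mark windows where the weed has lost the most diversity.

```python
import random

def diversity_ratios(pi_crop, pi_weed, min_pi=1e-6):
    """Per-window pi_crop / pi_weed; windows with near-zero weed diversity are skipped."""
    return [c / w for c, w in zip(pi_crop, pi_weed) if w > min_pi]

def top_fraction_threshold(values, fraction=0.05):
    """Smallest value that still belongs to the top `fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return ranked[n_top - 1]

# Synthetic per-window diversities (illustrative only, not data from the study).
rng = random.Random(0)
pi_crop = [rng.uniform(0.002, 0.008) for _ in range(200)]
pi_weed = [rng.uniform(0.001, 0.004) for _ in range(200)]

ratios = diversity_ratios(pi_crop, pi_weed)
threshold = top_fraction_threshold(ratios)
outliers = [r for r in ratios if r >= threshold]
print(f"threshold = {threshold:.2f}, windows at or above threshold = {len(outliers)}")
```

With real data, each list entry would be the π estimate for one 100-kb window, and the windows at or above the threshold would be the candidate regions.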

Supplementary Figure 6 Gene Ontology (GO) analysis of the 178 candidate genes in SH weedy rice.

Supplementary Figure 7 Gene Ontology (GO) analysis of the 307 candidate genes in BHA weedy rice.

Supplementary Figure 8 Ratio of nucleotide diversity (π) between indica and SH weedy rice across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 9 Genome-wide genetic differentiation (F_ST) between indica and SH weedy rice. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 10 Ratio of nucleotide diversity (π) between BHA and aus across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 11 Genome-wide genetic differentiation (F_ST) between BHA and aus. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 12 Ratio of nucleotide diversity (π) between wild and cultivated rice across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 13 Genome-wide genetic differentiation (F_ST) between wild and cultivated rice. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 14 Ratio of nucleotide diversity (π) between wild and cultivated rice. The identified domestication genes are shown for each chromosome.

Supplementary Figure 15 Ratio of nucleotide diversities ( and ) between wild, cultivated, and weedy rice. The y and x axes are the ratio of and, respectively.

Supplementary Figure 16 Neighbor-joining tree based on the 54 selected rice accessions. Relationships of cultivated and wild rice agree with previously reported relationships (ref. 40). Wild rice accessions (dark green) are divided into different groups. The japonica (orange) and aromatic (light green) rice varieties form a clade. The SH (purple) and Chinese (black) weedy rice strains cluster with indica (light blue), and the BHA (red) weedy rice strains cluster with aus (pink). Bootstrap values (>50) are shown on each branch.

Supplementary Table 2. Number of raw SNPs and their distributions in the wild, cultivated, and weedy rice genomes.

Group name    Exon     UTR     Intron   Intergenic  Nonsynonymous*  Synonymous*  Ka       Ks       Ka/Ks
aus           963745   154023  961081   3757112     512771          358401       1255074  876228   1.43
aromatic      409770   64131   418414   1546190     214040          151316       298347   209550   1.42
indica        1400094  209055  1349328  5209655     741728          526050       2797740  2008557  1.39
japonica      694387   109533  698002   2653409     370863          262536       1085172  753697   1.44
BHA           678053   90386   625493   2488462     354260          255452       1045895  755960   1.38
SH            505256   57853   431634   1709568     254137          189222       677946   504069   1.34
Chinese weed  190649   26469   181954   732023      92723           66595        92704    66594    1.39
Wild rice     4135675  803202  4350743  16390689    2433497         1378369      4336080  2607605  1.66
Total         4912847  881389  4941348  18673333    2854339         1664537      8476517  5442703  1.56

Note: *, stops and continuous mutations within the same codon were not included.
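The Ka/Ks column in Supplementary Table 2 matches the Ka column divided by the Ks column, rounded to two decimals. The short check below uses only the counts given in the table; the dictionary and function names are hypothetical, and this is a sketch for verifying the arithmetic rather than a description of how the values were originally computed.

```python
# Ka and Ks counts copied from Supplementary Table 2, as group: (Ka, Ks).
KA_KS_COUNTS = {
    "aus": (1255074, 876228),
    "aromatic": (298347, 209550),
    "indica": (2797740, 2008557),
    "japonica": (1085172, 753697),
    "BHA": (1045895, 755960),
    "SH": (677946, 504069),
    "Chinese weed": (92704, 66594),
    "Wild rice": (4336080, 2607605),
    "Total": (8476517, 5442703),
}

def ka_ks_ratio(ka, ks):
    """Ratio of the Ka count to the Ks count, rounded to two decimals as in the table."""
    return round(ka / ks, 2)

ratios = {group: ka_ks_ratio(ka, ks) for group, (ka, ks) in KA_KS_COUNTS.items()}
for group, ratio in ratios.items():
    print(f"{group:12s} {ratio:.2f}")
```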

Supplementary Table 4. Nucleotide diversity (π) and number of raw variants (SNPs and INDELs) for each chromosome of wild, cultivated, and weedy rice.

Chromosome  Wild rice           All cultivar        indica              aus                 SH                  BHA
            Variants  π         Variants  π         Variants  π         Variants  π         Variants  π         Variants  π
1           3147009   0.007817  1302635   0.00606   982296    0.00449   711959    0.00486   309448    0.00184   465714    0.00306
2           2696614   0.007856  1054893   0.00595   809467    0.00451   559672    0.00481   282171    0.00180   437247    0.00330
3           2685772   0.007806  982923    0.00571   759105    0.00409   527751    0.00463   275832    0.00187   337176    0.00264
4           2597369   0.007608  1203961   0.00587   956169    0.00512   582931    0.00491   272870    0.00188   358359    0.00242
5           2207720   0.007890  875849    0.00553   655711    0.00408   468923    0.00505   270879    0.00227   349483    0.00327
6           2379909   0.008252  1055649   0.00643   735854    0.00483   585881    0.00553   265573    0.00197   426224    0.00405
7           2261240   0.008255  990868    0.00624   721697    0.00457   538885    0.00544   275380    0.00249   338777    0.00310
8           2126571   0.007770  1048890   0.00645   775767    0.00533   510706    0.00537   266907    0.00252   379318    0.00305
9           1753967   0.008100  788027    0.00656   572112    0.00482   439386    0.00585   198416    0.00223   247657    0.00277
10          1808396   0.008488  832016    0.00693   597830    0.00530   399779    0.00491   224931    0.00257   291542    0.00248
11          2256643   0.008462  1214995   0.00764   933313    0.00681   633067    0.00636   319954    0.00323   418102    0.00375
12          2122349   0.008094  1020007   0.00661   802565    0.00581   551144    0.00586   216182    0.00186   439378    0.00451
Mean        2336963   0.008033  1030893   0.00633   775157    0.00498   542507    0.00530   264878    0.00221   374081    0.00320

Note: the number of variants was estimated using VCFtools.
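The Mean row of Supplementary Table 4 can be recomputed from the twelve per-chromosome values. The sketch below does this for the Wild rice and indica variant counts; the variable names are hypothetical. The twelve indica counts sum to 9,301,886, so their mean is about 775,157.

```python
from statistics import mean

# Per-chromosome variant counts (chromosomes 1-12) copied from Supplementary Table 4.
VARIANTS = {
    "Wild rice": [3147009, 2696614, 2685772, 2597369, 2207720, 2379909,
                  2261240, 2126571, 1753967, 1808396, 2256643, 2122349],
    "indica": [982296, 809467, 759105, 956169, 655711, 735854,
               721697, 775767, 572112, 597830, 933313, 802565],
}

# Mean number of variants per chromosome for each group.
mean_variants = {group: mean(counts) for group, counts in VARIANTS.items()}

for group, counts in VARIANTS.items():
    print(f"{group}: sum = {sum(counts)}, mean = {mean_variants[group]:.0f}")
```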

Supplementary Table 5. Number and proportion of wild- and crop-specific private SNPs in each of the three weedy rice strains.

Chromosome  SH weedy rice                   BHA weedy rice                  Chinese weedy rice
            Crop-specific*  Wild-specific   Crop-specific   Wild-specific   Crop-specific   Wild-specific
1           14153 (43.3%)   18507 (56.7%)   15128 (39.0%)   23624 (61.0%)   10501 (78.9%)   2814 (21.1%)
2           21293 (58.8%)   14905 (41.2%)   19260 (45.9%)   22665 (54.1%)   9749 (80.5%)    2357 (19.5%)
3           16616 (58.0%)   12025 (42.0%)   11858 (40.8%)   17240 (59.2%)   8561 (81.0%)    2010 (19.0%)
4           15118 (43.1%)   19960 (56.9%)   13868 (27.8%)   36069 (72.2%)   8439 (69.4%)    3720 (30.6%)
5           13300 (45.2%)   16131 (54.8%)   8133 (29.4%)    19536 (70.6%)   6987 (49.0%)    7285 (51.0%)
6           12523 (43.0%)   16632 (57.0%)   13894 (38.8%)   21919 (61.2%)   4232 (50.5%)    4155 (49.5%)
7           14215 (40.1%)   21261 (59.9%)   12523 (32.4%)   26073 (67.6%)   6368 (70.3%)    2696 (29.7%)
8           16587 (50.4%)   16353 (49.6%)   13927 (38.1%)   22670 (61.9%)   6732 (73.8%)    2396 (26.2%)
9           9806 (44.0%)    12466 (56.0%)   6855 (26.6%)    18914 (73.4%)   4876 (68.5%)    2245 (31.5%)
10          10092 (39.6%)   15381 (60.4%)   10946 (33.6%)   21621 (66.4%)   3064 (55.8%)    2429 (44.2%)
11          17925 (46.8%)   20400 (53.2%)   18039 (39.1%)   28134 (60.9%)   6250 (66.8%)    3105 (33.2%)
12          12342 (39.3%)   19091 (60.7%)   20386 (39.8%)   30796 (60.2%)   8203 (76.4%)    2527 (23.6%)
Total       173970 (46.1%)  203112 (53.9%)  164817 (36.3%)  289261 (63.7%)  83962 (69.0%)   37739 (31.0%)

Note: *, values are the total number and proportion of private SNPs on each chromosome for each of the three weedy rice strains.
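Each percentage in Supplementary Table 5 is a count divided by the sum of the crop-specific and wild-specific counts for the same strain. A minimal check on the Total row, with hypothetical names:

```python
# Genome-wide totals copied from Supplementary Table 5, as strain: (crop-specific, wild-specific).
PRIVATE_SNP_TOTALS = {
    "SH": (173970, 203112),
    "BHA": (164817, 289261),
    "Chinese": (83962, 37739),
}

def crop_specific_share(crop, wild):
    """Percentage of private SNPs that are crop-specific."""
    return 100.0 * crop / (crop + wild)

shares = {strain: crop_specific_share(crop, wild)
          for strain, (crop, wild) in PRIVATE_SNP_TOTALS.items()}
for strain, share in shares.items():
    print(f"{strain}: {share:.1f}% crop-specific, {100.0 - share:.1f}% wild-specific")
```

On these totals, the Chinese strain is the only one in which crop-specific private SNPs are the majority.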

Supplementary Table 6. Divergence time between cultivated (indica and aus) and weedy (BHA, SH, and Chinese weed) rice.

Summary statistic       Tmrca (BHA_aus)         Tmrca (SH_indica)       Tmrca (CN weed_indica)  Tmrca (aus_indica)      Tmrca (aus_wild)        Tmrca (indica_wild)
Mean                    8.99×10^-3              5.66×10^-3              5.02×10^-3              13.8×10^-3              87.6×10^-3              87.6×10^-3
Standard error of mean  6.60×10^-4              4.15×10^-4              3.68×10^-4              1.01×10^-3              6.42×10^-3              6.42×10^-3
Standard deviation      0.0161                  0.0101                  8.98×10^-3              0.0246                  0.1566                  0.1566
Variance                2.59×10^-4              1.02×10^-4              8.06×10^-5              6.06×10^-4              0.0245                  0.0245
Median                  5.55×10^-3              3.50×10^-3              3.10×10^-3              8.51×10^-3              0.0542                  0.0542
Geometric mean          6.32×10^-3              3.98×10^-3              3.53×10^-3              9.66×10^-3              0.0616                  0.0616
95% HPD interval        [1.771×10^-3, 0.0235]   [1.1059×10^-3, 0.0148]  [9.8262×10^-4, 0.0131]  [2.6902×10^-3, 0.0359]  [0.0165, 0.2287]        [0.0165, 0.2287]
Auto-correlation time   15135.6424              15129.1392              15123.2719              15122.6664              15132.0367              15132.0367
Effective sample size   594.689                 594.9446                595.1754                595.1993                594.8307                594.8307

Note: Tmrca, divergence time of most recent common ancestor; CN weed, Chinese weedy rice.
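The standard errors in Supplementary Table 6 are numerically consistent with the standard deviation divided by the square root of the effective sample size, which is the usual correction for autocorrelated MCMC samples. The sketch below checks this relationship against the tabulated values. It is an observation about the numbers, not a statement of how the authors' software computed them, and the names are hypothetical.

```python
import math

# Values copied from Supplementary Table 6, as
# comparison: (standard deviation, effective sample size, reported standard error of mean).
TMRCA_STATS = {
    "BHA_aus": (0.0161, 594.689, 6.60e-4),
    "SH_indica": (0.0101, 594.9446, 4.15e-4),
    "CN weed_indica": (8.98e-3, 595.1754, 3.68e-4),
    "aus_indica": (0.0246, 595.1993, 1.01e-3),
    "aus_wild": (0.1566, 594.8307, 6.42e-3),
    "indica_wild": (0.1566, 594.8307, 6.42e-3),
}

def mcmc_standard_error(sd, ess):
    """Standard error of the mean for correlated samples: SD / sqrt(ESS)."""
    return sd / math.sqrt(ess)

for name, (sd, ess, reported) in TMRCA_STATS.items():
    estimate = mcmc_standard_error(sd, ess)
    print(f"{name:15s} computed = {estimate:.3e}, reported = {reported:.3e}")
```

The small differences that remain come from the rounding of the tabulated standard deviations.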

Supplementary Table 14. Parameters generated from different K values of Admixture.

K      Cross-validation error  Log likelihood
K = 1  0.2324                  -1366434606.22
K = 2  0.1781                  -1150295302.11
K = 3  0.1390                  -1038218670.15
K = 4  0.3645                  -985846206.76
K = 5  0.2974                  -909274281.53
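Selecting K by cross-validation amounts to taking the K with the smallest error. A minimal sketch over the values in Supplementary Table 14, with hypothetical variable names:

```python
# Cross-validation errors copied from Supplementary Table 14, keyed by K.
CV_ERROR = {1: 0.2324, 2: 0.1781, 3: 0.1390, 4: 0.3645, 5: 0.2974}

# The preferred K is the one with the lowest cross-validation error.
best_k = min(CV_ERROR, key=CV_ERROR.get)
print(f"K = {best_k} has the lowest cross-validation error ({CV_ERROR[best_k]:.4f})")
```

The log likelihood increases monotonically with K in the table, so it cannot be used on its own to choose K; the cross-validation error penalizes the overfitting that appears at K = 4 and K = 5.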