Nature Genetics: doi: /ng Supplementary Figure 1. Neighbor-joining tree of the 183 wild, cultivated, and weedy rice accessions.

Supplementary Figure 1. Neighbor-joining tree of the 183 wild, cultivated, and weedy rice accessions. Relationships of cultivated and wild rice correspond to previously observed relationships (ref. 40). Wild rice accessions (dark green) are divided into different groups. The japonica (orange) and aromatic (light green) rice varieties form a clade. The SH (purple) and Chinese (black) weedy rice strains cluster with indica (light blue), and the BHA (red) weedy rice strain clusters with aus (pink). Bootstrap values (>50) are shown on each branch.

Supplementary Figure 2. Ancestral population structure of the 183 rice accessions. (a,b) Analyses of the full data set (a) and the no-missing data set (b) were performed separately. Each vertical bar represents one accession, and different colors indicate distinct ancestry states. Cross-validation error was estimated for K values from two to five. K = 3 minimizes the cross-validation error.
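The selection of K by cross-validation error can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the per-K log lines follow the form `CV error (K=3): 0.350`, the function name `best_k` is hypothetical, and the error values are made up because the values of Supplementary Table 14 did not survive in this transcription.

```python
import re

# Assumed log-line format: "CV error (K=3): 0.350"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def best_k(log_lines):
    """Return (K, cv_error) for the K with the lowest cross-validation error."""
    errors = {}
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    if not errors:
        raise ValueError("no CV error lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]

# Made-up values for illustration only; not taken from the paper.
example_log = [
    "CV error (K=2): 0.410",
    "CV error (K=3): 0.350",
    "CV error (K=4): 0.372",
    "CV error (K=5): 0.395",
]
chosen_k, chosen_error = best_k(example_log)
```

With these illustrative numbers the minimum falls at K = 3, mirroring the choice described in the caption.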

Supplementary Figure 3. Distributions of wild- and crop-specific private SNPs in SH (orange), BHA (red), and Chinese (green) weedy rice at the whole-genome level. The box corresponds to the 95% confidence interval, and the black bar within each box is the mean value. The blue horizontal dashed line represents an equal proportion of wild- and crop-specific private SNPs. Positive and negative values represent crop-like and wild-like SNPs, respectively.

Supplementary Figure 4. Distribution patterns of wild- and crop-specific private SNPs within each chromosome of SH (orange), BHA (red), and Chinese (green) weedy rice. The box corresponds to the 95% confidence interval, and the black bar within each box is the mean value for each chromosome. The blue horizontal dotted line represents zero. Positive and negative values represent the presence of crop-like and wild-like SNPs, respectively.

Supplementary Figure 5. Decrease in nucleotide diversity (π) in SH and BHA weedy rice as compared to their crop ancestors. The y axis represents the ratio of nucleotide diversity (π) between weedy rice and their crop ancestors. Each bar is a 100-kb window, and the red solid line corresponds to the threshold for the top 5%. The numbers on the red line are the exact thresholds for each comparison.
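A windowed ratio with a top-5% cutoff of this kind can be sketched as below. The caption does not say which group is the numerator or how the percentile was computed, so this sketch assumes a crop-over-weed ratio and a nearest-rank cutoff; the function names and the π values are hypothetical and serve only as illustration.

```python
def diversity_ratios(pi_numerator, pi_denominator):
    """Per-window ratio of two diversity tracks; zero-denominator windows are skipped."""
    return [a / b for a, b in zip(pi_numerator, pi_denominator) if b > 0]

def top_percent_threshold(ratios, percent=5):
    """Nearest-rank cutoff: windows with ratio >= the returned value form the top `percent`%."""
    if not ratios:
        raise ValueError("no windows to rank")
    ordered = sorted(ratios)
    # Integer arithmetic avoids floating-point surprises in the rank.
    n_top = max(1, len(ordered) * percent // 100)
    return ordered[-n_top]

# Hypothetical per-window values (one entry per 100-kb window), not from the paper.
pi_crop = [0.004, 0.002, 0.006, 0.003]
pi_weed = [0.002, 0.002, 0.001, 0.0]

ratios = diversity_ratios(pi_crop, pi_weed)   # the last window is skipped
cutoff = top_percent_threshold(ratios)
candidate_windows = [r for r in ratios if r >= cutoff]
```

With only a handful of windows the cutoff degenerates to the single largest ratio; on a genome-wide set of 100-kb windows it marks the top 5% as intended.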

Supplementary Figure 6. Gene Ontology (GO) analysis of the 178 candidate genes in SH weedy rice.

Supplementary Figure 7. Gene Ontology (GO) analysis of the 307 candidate genes in BHA weedy rice.

Supplementary Figure 8. Ratio of nucleotide diversity (π) between indica and SH weedy rice across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 9. Genetic differentiation (FST) between indica and SH weedy rice across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 10. Ratio of nucleotide diversity (π) between BHA and aus across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 11. Genetic differentiation (FST) between BHA and aus across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 12. Ratio of nucleotide diversity (π) between wild and cultivated rice across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 13. Genetic differentiation (FST) between wild and cultivated rice across the genome. (a) The criteria MQ 10 and DP 1 were used to filter the reported variants. (b) The criteria MQ 20 and DP 3 were used to filter the reported variants. (c) The criteria MQ 10 and DP 1 were used, excluding low-frequency (<2%) heterozygosity. Each bar represents a 100-kb window.

Supplementary Figure 14. Ratio of nucleotide diversity (π) between wild and cultivated rice. The identified domestication genes are shown for each chromosome.

Supplementary Figure 15. Ratio of nucleotide diversities ( and ) between wild, cultivated, and weedy rice. The y and x axes are the ratio of and, respectively.

Supplementary Figure 16. Neighbor-joining tree based on the 54 selected rice accessions. Relationships of cultivated and wild rice correspond to previously observed relationships (ref. 40). Wild rice accessions (dark green) are divided into different groups. The japonica (orange) and aromatic (light green) rice varieties form a clade. The SH (purple) and Chinese (black) weedy rice strains cluster with indica (light blue), and the BHA (red) weedy rice strain clusters with aus (pink). Bootstrap values (>50) are shown on each branch.

Supplementary Table 2. Number of raw SNPs and their distributions in the wild, cultivated, and weedy rice genomes.
Columns: Group name, Exon, UTR, Intron, Intergenic, Nonsynonymous*, Synonymous*, Ka, Ks, Ka/Ks
Rows: aus, aromatic, indica, japonica, BHA, SH, Chinese weed, Wild rice, Total
Note: *, stops and continuous mutations within the same codon were not included.

Supplementary Table 4. Nucleotide diversity (π) and number of raw variants (SNPs and INDELs) for each chromosome of wild, cultivated, and weedy rice.
Columns: Chromosome, followed by Variants and π for each of Wild rice, All cultivar, indica, aus, SH, BHA
Final row: Mean
Note: the number of variants was estimated using VCFtools.

Supplementary Table 5. Number and proportion of wild- and crop-specific private SNPs in each of the three weedy rice strains.

Chromosome  SH weedy rice                   BHA weedy rice                  Chinese weedy rice
            Crop-specific*  Wild-specific   Crop-specific   Wild-specific   Crop-specific   Wild-specific
1           (43.3%)         (56.7%)         (39.0%)         (61.0%)         (78.9%)         2814 (21.1%)
2           (58.8%)         (41.2%)         (45.9%)         (54.1%)         9749 (80.5%)    2357 (19.5%)
3           (58.0%)         (42.0%)         (40.8%)         (59.2%)         8561 (81.0%)    2010 (19.0%)
4           (43.1%)         (56.9%)         (27.8%)         (72.2%)         8439 (69.4%)    3720 (30.6%)
5           (45.2%)         (54.8%)         8133 (29.4%)    (70.6%)         6987 (49.0%)    7285 (51.0%)
6           (43.0%)         (57.0%)         (38.8%)         (61.2%)         4232 (50.5%)    4155 (49.5%)
7           (40.1%)         (59.9%)         (32.4%)         (67.6%)         6368 (70.3%)    2696 (29.7%)
8           (50.4%)         (49.6%)         (38.1%)         (61.9%)         6732 (70.3%)    2396 (26.2%)
9           (44.0%)         (56.0%)         6855 (26.6%)    (73.4%)         4876 (68.5%)    2245 (31.5%)
10          (39.6%)         (60.4%)         (33.6%)         (66.4%)         3064 (55.8%)    2429 (44.2%)
11          (46.8%)         (53.2%)         (39.1%)         (60.9%)         6250 (66.8%)    3105 (33.2%)
12          (39.3%)         (60.7%)         (39.8%)         (60.2%)         8203 (76.4%)    2527 (23.6%)
Total       (46.1%)         (53.9%)         (36.3%)         (63.7%)         (69.0%)         (31.0%)

Note: *, the total number and proportion of private SNPs in each chromosome of the three weedy rice strains.
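The percentages in Supplementary Table 5 are each count's share of the crop-specific plus wild-specific total for that strain and chromosome. A minimal sketch of that arithmetic, using two Chinese weedy rice rows whose counts both survive in this transcription (the function name `private_snp_shares` is hypothetical):

```python
def private_snp_shares(crop_specific, wild_specific):
    """Return (crop %, wild %) of private SNPs, each rounded to one decimal place."""
    total = crop_specific + wild_specific
    if total == 0:
        raise ValueError("no private SNPs to compare")
    crop_pct = round(100 * crop_specific / total, 1)
    wild_pct = round(100 * wild_specific / total, 1)
    return crop_pct, wild_pct

# Chinese weedy rice counts quoted in the table: 9749 vs 2357 and 8561 vs 2010.
row_a = private_snp_shares(9749, 2357)
row_b = private_snp_shares(8561, 2010)
```

Both rows reproduce the tabulated values (80.5% and 19.5%; 81.0% and 19.0%), so a crop-specific share above 50% corresponds to the crop-like side of the comparison.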

Supplementary Table 6. Divergence time between cultivated (indica and aus) and weedy (BHA, SH, and Chinese weed) rice.
Columns: Summary Statistic, Tmrca (BHA_aus), Tmrca (SH_indica), Tmrca (CN weed_indica), Tmrca (aus_indica), Tmrca (aus_wild), Tmrca (indica_wild)
Rows: Mean, Standard error of mean, Standard deviation, Variance, Median, Geometric mean, % HPD interval, Auto-correlation time, Effective sample size
% HPD interval: [ , ], [ , ], [ , ], [ , ], [0.0165, ], [0.0165, ]
Note: Tmrca, divergence time of most recent common ancestor; CN weed, Chinese weedy rice.

Supplementary Table 14. Parameters generated from different K values of Admixture.
Columns: Cross-validation error, Log likelihood
Rows: K = , K = , K = , K = , K =