Nature Genetics: doi: /ng Supplementary Figure 1. The pedigree information for American upland cotton breeding.

Supplementary Figure 1 The pedigree information for American upland cotton breeding. The integrated figure was modified from Figs. 1 to 10 in Calhoun, Bowman & May (1994). The accessions shown in blue were collected and analyzed in our study.

Supplementary Figure 2 Phylogenetic tree of all resequenced accessions. A neighbour-joining tree of all accessions was constructed using whole-genome SNPs. Outgroup accessions, landraces and improved modern cultivars are marked with light blue, red and dark blue lines, respectively. The cultivars could be classified into three clades: a special clade I, the Stoneville 2B clade (subclade 1) and the Deltapine 15 clade (subclade 2).

Supplementary Figure 3 Phylogenetic relationships of 318 cotton accessions. (a) Principal component analysis of all cotton accessions using whole-genome SNP data. The outgroups clustered together. Clade Ⅰ mainly contained recently domesticated founders and cultivars developed from them, such as Acala, Burling's Mexican and the Maryland Green Seed. Clade Ⅱ included most Upland cotton cultivars, which were mainly derived from landraces originating from the same ancestral resource. Subclade Ⅰ included the American landraces Stoneville 2B (STV2B), Auburn 56 and Paymaster 54, and the modern cultivars developed from STV2B, which were planted mainly in the Yellow River cotton-growing area in China. Subclade Ⅱ contained American landraces such as Deltapine 15 (DPL15), Lankart 57 and Dixie King, and the cultivars developed mainly from DPL15, which were planted in the Yellow River, Yangtze River and Northwest (Xinjiang) cotton-growing areas in China. (b) Population structure of cotton accessions determined using STRUCTURE. When K was set to 4, the outgroup component was present. The components of clade Ⅰ and clade Ⅱ, including landraces and modern cultivars, were difficult to distinguish from K=2 to K=4.

Supplementary Figure 4 Frequency distribution of phenotypic variation of ten agronomic traits in 258 accessions. The yield traits included lint percentage, seed index, boll weight and boll number. The fiber quality traits included fiber length, fiber elongation, fiber strength, micronaire and fiber uniformity. Verticillium wilt resistance represented biotic disease resistance.

Supplementary Figure 5 The overlapping regions between selective sweeps and GWAS-associated loci. Four selective sweeps were located around associated loci for fiber length (FL) (A11: ), fiber strength (FS) (A11: ) and lint percentage (LP) (D02: and D03: ). The horizontal dashed line indicated the genome-wide threshold (2.5) defining the top 1% of π landrace/π cultivar values. The horizontal line in the Manhattan plot indicated the GWAS threshold ( ).
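The 1% cutoff mentioned in the Supplementary Figure 5 caption can be illustrated with a minimal sketch. It assumes that per-window nucleotide diversity (π) values for landraces and cultivars are already available as two equal-length lists. The function name, the window layout and the toy numbers are hypothetical and are not taken from the study, which reports only the resulting threshold (2.5).

```python
import math


def sweep_candidates(pi_landrace, pi_cultivar, top_fraction=0.01):
    """Return (threshold, window_indices) for the top fraction of pi ratios.

    The ratio pi_landrace / pi_cultivar is computed per window. Windows whose
    cultivar diversity is zero are skipped because the ratio is undefined.
    """
    ratios = []
    for index, (pl, pc) in enumerate(zip(pi_landrace, pi_cultivar)):
        if pc > 0:
            ratios.append((pl / pc, index))
    if not ratios:
        return None, []

    ranked = sorted(ratios, reverse=True)  # largest ratios first
    n_top = max(1, math.floor(len(ranked) * top_fraction))
    top = ranked[:n_top]
    threshold = top[-1][0]  # smallest ratio that still falls in the top fraction
    return threshold, sorted(index for _, index in top)


# Toy data (hypothetical): 100 windows, one with strongly reduced cultivar diversity.
toy_landrace = [1.0] * 99 + [3.0]
toy_cultivar = [1.0] * 100
threshold, windows = sweep_candidates(toy_landrace, toy_cultivar)
print(threshold, windows)
```

Windows returned by such a filter would be candidate improvement sweeps only; how adjacent windows were merged into regions is not described in this caption.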

Supplementary Figure 6 The independent assortment between two elite alleles of GhLYI-A02 and GhLYI-D08. These two alleles segregated independently and united randomly at fertilization. Together with crossing over, independent assortment could bring them together and introduce them into some cultivars, leading to an increased number of bolls per plant and a higher lint percentage. The horizontal boxes indicated chromosomes A02 and D08. The vertical lines on the chromosomes indicated the two elite alleles.

Supplementary Figure 7 Manhattan plots for lint percentage in nine environments, determined with EMMAX. Negative log10(P-value) from a genome-wide scan was plotted against position on each of 26 chromosomes. The horizontal line indicated the threshold (10^-6).
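The y-axis transform and threshold used in these Manhattan plots can be sketched as follows. This is an illustration only: the SNP names and P-values are hypothetical, and treating a SNP as significant when its value lies strictly above the line is an assumption, since the captions give only the threshold itself (10^-6).

```python
import math

THRESHOLD_P = 1e-6  # genome-wide threshold stated in the captions


def neg_log10(p_value):
    """Transform a P-value to the Manhattan-plot y-axis scale."""
    return -math.log10(p_value)


def significant_snps(p_values, threshold_p=THRESHOLD_P):
    """Return (snp, -log10 P) pairs that lie above the threshold line."""
    cutoff = neg_log10(threshold_p)
    hits = []
    for snp, p in p_values.items():
        score = neg_log10(p)
        if score > cutoff:  # assumption: strictly above the line counts as a hit
            hits.append((snp, score))
    return hits


# Hypothetical association results for two SNPs.
example = {"snp_a": 5.29e-07, "snp_b": 3.74e-06}
for snp, score in significant_snps(example):
    print(snp, round(score, 2))
```

On this scale the threshold of 10^-6 corresponds to a horizontal line at 6, so smaller P-values plot higher.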

Supplementary Figure 8 Two significantly associated loci for lint percentage. (a) Manhattan plot for lint percentage. Negative log10(P-value) from a genome-wide scan was plotted against position on chromosomes D03 and D11. The horizontal line indicated the threshold (10^-6). The arrows indicated the associated signal peaks D03: and D11: (b) Local Manhattan plot and LD heatmap. The candidate region lay between the dashed lines. The arrow indicated the SNP in the candidate gene.

Supplementary Figure 9 Manhattan plots for seed index in nine environments, generated with EMMAX. Negative log10(P-value) from a genome-wide scan was plotted against position on each of 26 chromosomes. The horizontal line indicated the threshold (10^-6).

Supplementary Figure 10 The candidate gene AHP5 within the most strongly associated locus for seed index. (a) Negative log10(P-values) for association with seed index (SI) were plotted against SNP positions (x axis). The genome-wide significance threshold (10^-6) was indicated by a horizontal blue line. The arrow indicated the signal peak containing the candidate gene (AHP5). (b) Transcript level of AHP5 in different tissues, measured as FPKM from a single experiment. (c) Comparison of AHP5 expression levels during ovule development in two accessions, TM-1 and ZMS12. The asterisk indicated a significant difference at P < 0.01 (two-sided t-test, three independent biological replicates).

Supplementary Figure 11 GWAS of Verticillium wilt resistance in nine environments, performed with EMMAX. (a) Manhattan plot for Verticillium wilt resistance. Negative log10(P-value) from a genome-wide scan was plotted against position on chromosome D06. The horizontal line indicated the threshold (10^-6). The arrows indicated the associated signal peak D06: (b) Local Manhattan plot and LD heatmap. The candidate region lay between the dashed lines. The red arrow indicated the SNP in the candidate gene.

Supplementary Figure 12 The pedigrees of ten deep-sequenced cultivars and landraces extensively planted in China. The accessions for deep sequencing included three founder landraces, DPL15, STV2B and UGDmian (UGDM). The remaining seven cultivars for deep sequencing were SM2, SM3, 86-1, Shiyuan321, ZMS12, Junmian 1 and XLZ42.

Supplementary Figure 13 The detection of in-frame indels in the NAC gene in accessions. This gene was located in two GWAS-associated loci, D02: and D02: , which were associated with boll weight (BW) and number of bolls per plant (BN), respectively. A 7-bp insertion was detected in the founders compared to the reference genome (TM-1).

Supplementary Tables 1-32

Supplementary Table 1. Summary of 318 cotton samples and sequencing (Separate file).
Supplementary Table 2. Distribution of SNPs in 26 chromosomes of TM-1.
Supplementary Table 3. Summary of sequencing and variations of all accessions.
Supplementary Table 4. SNP accuracy verified using ten deeply sequenced accessions.
Supplementary Table 5. SNP accuracy verified using PCR-based sequencing.
Supplementary Table 6. The detection of SNP quality based on PCR sequencing (Separate file).
Supplementary Table 7. Levels of genetic differentiation in different chromosomes.
Supplementary Table 8. Genome-wide detection of selective sweep regions in Upland cotton improvement.
Supplementary Table 9. Cotton QTLs overlapping with improvement sweeps (Separate file).
Supplementary Table 10. Genome-wide association signals of yield, fiber quality and disease resistance (Separate file).
Supplementary Table 11. The associated loci overlapped with QTLs (Separate file).
Supplementary Table 12. Correlation of phenotypic variation among agronomic traits.
Supplementary Table 13. The SNP information in associated locus A02_ for lint yield.
Supplementary Table 14. Identification of candidate gene in associated locus A02_ for lint yield (Separate file).
Supplementary Table 15. The SNP information in associated locus D08_ for lint yield.
Supplementary Table 16. Identification of candidate gene in associated locus D08_ for lint yield (Separate file).
Supplementary Table 17. The average phenotype data of accessions based on independent assortment of two associated loci.
Supplementary Table 18. The SNP information in associated loci for fiber quality (Separate file).
Supplementary Table 19. Identification of candidate gene in associated loci for fiber quality (Separate file).
Supplementary Table 20. The SNP information in associated locus D03_ for LP.
Supplementary Table 21. Identification of candidate gene in associated locus D03_ for LP (Separate file).
Supplementary Table 22. The phenotype changes of accessions with different SNP type.
Supplementary Table 23. Correlation analysis of nonsynonymous SNP in four candidate genes associated with LP.
Supplementary Table 24. The summary of indels identified in ten cultivars based on deep-sequencing (Separate file).
Supplementary Table 25. The indels overlapped with GWAS loci (Separate file).
Supplementary Table 26. The genes involved in copy number variation (Separate file).
Supplementary Table 27. The genetic constitution of the deep-sequencing cultivars extensively planted in China.
Supplementary Table 28. The identity-by-descent (IBD) regions comparing the Chinese cultivars to the traditional landraces (Separate file).
Supplementary Table 29. The detection and transmission of indels in IBD regions between the founders and modern cultivars.
Supplementary Table 30. The transferring percentage of GWAS-related regions from DPL15, STV2B and UGDM.
Supplementary Table 31. The GWAS loci detected in seven modern cultivars.
Supplementary Table 32. The position of 53 SNPs located in the 24 regions for PCR amplification.

Supplementary Table 1. Summary of 318 cotton samples and sequencing (Separate file).

Supplementary Table 2. Distribution of SNPs in 26 chromosomes of TM-1.
Chromosome | Length | SNP number | SNP/Kb
A01 99,884, ,
A02 83,447, ,
A03 100,263, ,
A04 62,913, ,
A05 92,047, ,
A06 103,170, ,
A07 78,251, ,
A08 103,626, ,
A09 74,999, ,
A10 100,866, ,
A11 93,316, ,
A12 87,484, ,
A13 79,961, ,
D01 61,456, ,
D02 67,284, ,
D03 46,690, ,
D04 51,454, ,
D05 61,933, ,
D06 64,294, ,
D07 55,312, ,
D08 65,894, ,
D09 50,995, ,
D10 63,374, ,
D11 66,087, ,
D12 59,109, ,
D13 60,534, ,

Supplementary Table 3. Summary of sequencing and variations of all accessions.
Group | Accession number | Raw data depth (×) | Uniquely mapping rate to the A subgenome | Uniquely mapping rate to the D subgenome | Heterozygosity
Outgroup % 23.6% 17.9%
Landrace % 24.9% 14.3%
Cultivars % 19.8% 13.5%
Total % 20.5% 14.8%

Supplementary Table 4. SNP accuracy verified using ten deeply sequenced accessions
Accessions | Genome coverage | Accuracy
1 Zhongmiansuo 7# (ZMS 7) %
2 ZMS %
3 Simian 2# %
4 Xinluzao 42# %
%
6 Simian 3# %
7 Deltapine %
8 Stoneville 2B %
9 Shiyuan %
10 Junmian 1# %

Supplementary Table 5. SNP accuracy verified using PCR-based sequencing
Sample No | Accessions | Validated genotype | False genotype | Accuracy
8 Yuanmoulihemumian %
11 G. tomentosum %
13 Acala %
16 Coker %
76 XinXiang89S %
97 BaZhou %
210 XiaoXianDaLing %
235 ZhongMianSuo %
311 Miscot %

Supplementary Table 6. The detection of SNP quality based on PCR sequencing (Separate file).

Supplementary Table 7. Levels of genetic differentiation in different chromosomes
A subgenome | FST | D subgenome | FST
A D A D A D A D A D A D A D A D A D A D A D A D A D
Average Average

Supplementary Table 8. Genome-wide detection of selective sweep regions in Upland cotton improvement
Chr. | Peak locus (Mb) | π Landrace/π Cultivar | Selective sweeps regions (Mb) | XP-CLR (Mb)
A A A A A A A A A A A D D D D D D D D D D D D D D

Supplementary Table 9. Cotton QTLs overlapping with improvement sweeps (Separate file).
Supplementary Table 10. Genome-wide association signals of yield, fiber quality and disease resistance (Separate file).
Supplementary Table 11. The associated loci overlapped with QTLs (Separate file).

Supplementary Table 12. Correlation of phenotypic variation among agronomic traits.
   FS FL FM BN BW LP
FL 0.828**
FM ** **
BN ** * 0.388**
BW 0.115* **
LP ** ** 0.334** 0.521** 0.119*
SI 0.335** 0.244** ** ** 0.637** **
FS, fiber strength; FL, fiber length; FM, micronaire; BN, boll number; BW, boll weight; LP, lint percentage; SI, seed index.

Supplementary Table 13. The SNP information in associated locus A02_ for lint yield.
Gene ID | SNP location | Ref/Alt | Region | Variation Type | Associated signal (P-value) | two-tailed t-test
Gh_A02G1390 A02: T/C upstream
Gh_A02G1390 A02: A/G upstream
Gh_A02G1390 A02: A/G upstream
Gh_A02G1390 A02: A/C downstream
Gh_A02G1391 A02: T/C intron
Gh_A02G1391 A02: A/G intron
Gh_A02G1391 A02: G/T intron
Gh_A02G1391 A02: A/T intron
Gh_A02G1391 A02: G/A exonic nonsynonymous
Gh_A02G1391 A02: C/T intron
Gh_A02G1391 A02: C/A intron
Gh_A02G1391 A02: A/G intron
Gh_A02G1391 A02: C/T exonic nonsynonymous
Gh_A02G1391 A02: T/A exonic synonymous
Gh_A02G1392 A02: A/C upstream
Gh_A02G1392 A02: C/T upstream
Gh_A02G1392 A02: T/A exonic nonsynonymous
Gh_A02G1392 A02: A/G exonic nonsynonymous
Gh_A02G1392 A02: G/A exonic nonsynonymous E-07
Gh_A02G1392 A02: T/C exonic synonymous SNP
Gh_A02G1392 A02: C/G intron
Gh_A02G1392 A02: G/A intron
Gh_A02G1392 A02: A/C downstream
Gh_A02G1392 A02: A/T downstream
Gh_A02G1392 A02: G/A downstream
Gh_A02G1393 A02: T/G upstream
Gh_A02G1394 A02: A/C downstream
Gh_A02G1394 A02: C/T exonic synonymous
Gh_A02G1394 A02: A/C intron
Gh_A02G1394 A02: C/T intron
Gh_A02G1394 A02: G/A intron
Gh_A02G1395 A02: G/T upstream
Gh_A02G1397 A02: T/C downstream
Gh_A02G1397 A02: C/T exonic synonymous
Gh_A02G1397 A02: C/T exonic nonsynonymous
Gh_A02G1397 A02: C/G upstream
Gh_A02G1398 A02: G/A upstream E-07
Gh_A02G1398 A02: G/A upstream
Gh_A02G1398 A02: T/C upstream
Gh_A02G1398 A02: T/G intron
Gh_A02G1398 A02: A/T intron
Gh_A02G1398 A02: T/A intron
Gh_A02G1398 A02: A/C exonic nonsynonymous
Gh_A02G1399 A02: T/C exonic synonymous
Gh_A02G1399 A02: A/G exonic synonymous
Gh_A02G1399 A02: A/T intron
Gh_A02G1399 A02: A/C intron
Gh_A02G1399 A02: T/C upstream
Gh_A02G1399 A02: T/C upstream

Supplementary Table 14. Identification of candidate gene in associated locus A02_ for lint yield (Separate file).

Supplementary Table 15. The SNP information in associated locus D08_ for lint yield.
Gene ID | SNP location | Ref/Alt | Region | Variation Type | Associated signal in GWAS (P-value) | two-tailed t-test
Gh_D08G0288 D08: A/G intron
Gh_D08G0288 D08: T/G intron
Gh_D08G0288 D08: C/T intron
Gh_D08G0288 D08: A/T intron
Gh_D08G0288 D08: T/C intron
Gh_D08G0288 D08: A/C exonic nonsynonymous
Gh_D08G0288 D08: G/A intron
Gh_D08G0288 D08: G/T upstream
Gh_D08G0288 D08: T/G upstream
Gh_D08G0290 D08: G/C intron
Gh_D08G0290 D08: G/A intron
Gh_D08G0290 D08: C/T intron
Gh_D08G0290 D08: G/A intron
Gh_D08G0290 D08: A/G exonic synonymous
Gh_D08G0290 D08: G/A downstream
Gh_D08G0291 D08: G/A downstream
Gh_D08G0291 D08: T/C downstream
Gh_D08G0291 D08: A/G intron
Gh_D08G0291 D08: G/C intron
Gh_D08G0291 D08: C/A intron
Gh_D08G0291 D08: C/T intron E-07
Gh_D08G0291 D08: T/C intron
Gh_D08G0291 D08: G/A intron
Gh_D08G0291 D08: G/C exonic nonsynonymous
Gh_D08G0291 D08: C/T exonic nonsynonymous
Gh_D08G0291 D08: C/T upstream
Gh_D08G0291 D08: A/T upstream
Gh_D08G0291 D08: C/A upstream
Gh_D08G0291 D08: G/T upstream
Gh_D08G0291 D08: A/G upstream
Gh_D08G0291 D08: T/C upstream
Gh_D08G0293 D08: A/G upstream
Gh_D08G0293 D08: G/A upstream
Gh_D08G0293 D08: G/A exonic nonsynonymous
Gh_D08G0293 D08: T/C exonic nonsynonymous
Gh_D08G0293 D08: G/A exonic nonsynonymous
Gh_D08G0293 D08: G/A downstream
Gh_D08G0293 D08: A/G downstream
Gh_D08G0293 D08: G/A downstream
Gh_D08G0293 D08: A/G downstream
Gh_D08G0295 D08: G/A upstream
Gh_D08G0295 D08: T/A exonic nonsynonymous
Gh_D08G0295 D08: C/T intron
Gh_D08G0295 D08: G/A exonic synonymous
Gh_D08G0295 D08: T/A downstream
Gh_D08G0296 D08: G/A upstream
Gh_D08G0296 D08: C/T upstream
Gh_D08G0296 D08: T/G intron
Gh_D08G0296 D08: A/C intron
Gh_D08G0296 D08: A/T intron
Gh_D08G0296 D08: T/A intron
Gh_D08G0296 D08: G/C intron
Gh_D08G0296 D08: G/C intron
Gh_D08G0296 D08: C/G exonic nonsynonymous
Gh_D08G0296 D08: G/A exonic nonsynonymous
Gh_D08G0298 D08: G/A exonic nonsynonymous
Gh_D08G0298 D08: G/A exonic nonsynonymous
Gh_D08G0298 D08: A/C intron
Gh_D08G0298 D08: T/A exonic synonymous
Gh_D08G0298 D08: T/A intron
Gh_D08G0299 D08: A/G exonic synonymous
Gh_D08G0299 D08: C/T exonic synonymous
Gh_D08G0299 D08: G/A exonic synonymous
Gh_D08G0299 D08: C/A intron
Gh_D08G0299 D08: T/C intron
Gh_D08G0299 D08: G/A exonic synonymous
Gh_D08G0299 D08: A/T upstream
Gh_D08G0299 D08: G/A upstream
Gh_D08G0300 D08: G/A downstream E-07
Gh_D08G0300 D08: A/C downstream
Gh_D08G0300 D08: C/T downstream
Gh_D08G0300 D08: C/A downstream
Gh_D08G0300 D08: T/C exonic synonymous
Gh_D08G0300 D08: C/G intron
Gh_D08G0300 D08: T/G intron
Gh_D08G0300 D08: T/G upstream
Gh_D08G0301 D08: C/A upstream
Gh_D08G0301 D08: A/G exonic synonymous
Gh_D08G0301 D08: G/C exonic nonsynonymous
Gh_D08G0301 D08: A/G exonic synonymous
Gh_D08G0301 D08: T/C exonic nonsynonymous
Gh_D08G0301 D08: C/T exonic nonsynonymous
Gh_D08G0302 D08: A/G upstream
Gh_D08G0302 D08: A/G upstream
Gh_D08G0302 D08: T/C upstream
Gh_D08G0302 D08: A/C upstream
Gh_D08G0302 D08: C/T exonic synonymous
Gh_D08G0302 D08: A/G exonic synonymous
Gh_D08G0302 D08: A/T intron
Gh_D08G0303 D08: G/A upstream
Gh_D08G0303 D08: T/C intron
Gh_D08G0303 D08: T/C intron
Gh_D08G0303 D08: G/A intron
Gh_D08G0303 D08: A/C exonic synonymous
Gh_D08G0303 D08: C/T intron
Gh_D08G0303 D08: G/A intron
Gh_D08G0303 D08: A/T intron
Gh_D08G0303 D08: T/A downstream
Gh_D08G0303 D08: C/G downstream
Gh_D08G0304 D08: T/A exonic synonymous
Gh_D08G0304 D08: G/A exonic synonymous
Gh_D08G0304 D08: A/G exonic nonsynonymous
Gh_D08G0304 D08: C/G upstream
Gh_D08G0304 D08: C/T upstream
Gh_D08G0304 D08: T/C upstream
Gh_D08G0304 D08: A/G upstream
Gh_D08G0304 D08: T/C upstream
Gh_D08G0304 D08: A/G upstream
Gh_D08G0305 D08: G/A exonic synonymous
Gh_D08G0305 D08: T/G exonic synonymous
Gh_D08G0305 D08: A/G exonic nonsynonymous
Gh_D08G0305 D08: C/T exonic nonsynonymous
Gh_D08G0305 D08: A/G exonic synonymous
Gh_D08G0305 D08: T/C intron
Gh_D08G0305 D08: C/A upstream
Gh_D08G0305 D08: G/C upstream
Gh_D08G0305 D08: A/G upstream
Gh_D08G0305 D08: T/G upstream
Gh_D08G0305 D08: C/G upstream
Gh_D08G0305 D08: T/A upstream
Gh_D08G0306 D08: A/G downstream
Gh_D08G0306 D08: C/T exonic nonsynonymous
Gh_D08G0306 D08: A/C intron
Gh_D08G0308 D08: G/A exonic nonsynonymous
Gh_D08G0308 D08: T/C exonic synonymous
Gh_D08G0308 D08: T/C exonic synonymous
Gh_D08G0308 D08: G/T splicing
Gh_D08G0308 D08: T/C intron
Gh_D08G0308 D08: A/G intron
Gh_D08G0308 D08: T/C exonic synonymous
Gh_D08G0308 D08: T/C intron
Gh_D08G0308 D08: G/A exonic nonsynonymous
Gh_D08G0308 D08: C/G intron
Gh_D08G0308 D08: A/G intron
Gh_D08G0308 D08: G/A downstream
Gh_D08G0308 D08: C/T downstream
Gh_D08G0308 D08: C/T downstream
Gh_D08G0309 D08: G/C downstream
Gh_D08G0309 D08: C/T exonic nonsynonymous
Gh_D08G0309 D08: C/G intron
Gh_D08G0309 D08: C/T intron
Gh_D08G0309 D08: C/A intron
Gh_D08G0309 D08: G/T exonic synonymous
Gh_D08G0309 D08: G/T exonic synonymous
Gh_D08G0309 D08: A/G exonic synonymous
Gh_D08G0309 D08: T/A intron
Gh_D08G0309 D08: A/C intron
Gh_D08G0309 D08: T/C intron
Gh_D08G0309 D08: A/G intron
Gh_D08G0309 D08: T/G intron
Gh_D08G0309 D08: T/A intron
Gh_D08G0309 D08: C/T intron
Gh_D08G0309 D08: T/G intron
Gh_D08G0309 D08: G/A intron
Gh_D08G0309 D08: A/G intron
Gh_D08G0309 D08: T/C intron
Gh_D08G0309 D08: C/A intron
Gh_D08G0309 D08: A/G intron
Gh_D08G0309 D08: C/T intron
Gh_D08G0309 D08: T/C intron
Gh_D08G0309 D08: C/A intron
Gh_D08G0309 D08: T/C intron
Gh_D08G0309 D08: A/G intron
Gh_D08G0309 D08: T/A intron
Gh_D08G0309 D08: T/C intron
Gh_D08G0310 D08: C/T downstream
Gh_D08G0310 D08: A/C downstream
Gh_D08G0310 D08: C/T exonic synonymous
Gh_D08G0310 D08: C/T intron
Gh_D08G0310 D08: A/G intron
Gh_D08G0310 D08: T/C intron
Gh_D08G0310 D08: A/T intron
Gh_D08G0310 D08: G/T exonic nonsynonymous
Gh_D08G0310 D08: T/C upstream
Gh_D08G0310 D08: C/T upstream
Gh_D08G0311 D08: T/C downstream
Gh_D08G0311 D08: T/A downstream
Gh_D08G0311 D08: A/G downstream
Gh_D08G0311 D08: G/A downstream
Gh_D08G0311 D08: C/T exonic nonsynonymous
Gh_D08G0311 D08: T/C intron
Gh_D08G0311 D08: T/C exonic nonsynonymous
Gh_D08G0311 D08: A/G intron
Gh_D08G0311 D08: G/T exonic nonsynonymous E-07
Gh_D08G0311 D08: C/T upstream
Gh_D08G0312 D08: A/G upstream
Gh_D08G0312 D08: T/G upstream
Gh_D08G0312 D08: C/T upstream
Gh_D08G0312 D08: C/A upstream
Gh_D08G0312 D08: T/C upstream
Gh_D08G0312 D08: A/G upstream
Gh_D08G0312 D08: A/C exonic nonsynonymous E-06
Gh_D08G0312 D08: G/A downstream
Gh_D08G0312 D08: A/G downstream
Gh_D08G0312 D08: C/T downstream
Gh_D08G0313 D08: A/C intron E-07
Gh_D08G0313 D08: A/T exonic nonsynonymous
Gh_D08G0313 D08: G/T intron
Gh_D08G0313 D08: A/T intron
Gh_D08G0313 D08: C/T intron
Gh_D08G0313 D08: C/A exonic nonsynonymous E-07
Gh_D08G0313 D08: G/T exonic nonsynonymous
Gh_D08G0313 D08: T/G exonic nonsynonymous
Gh_D08G0313 D08: T/A upstream
Gh_D08G0313 D08: C/G upstream
Gh_D08G0313 D08: C/T upstream
Gh_D08G0314 D08: G/A upstream E-07
Gh_D08G0314 D08: C/T upstream E-07
Gh_D08G0314 D08: A/G upstream
Gh_D08G0314 D08: T/C upstream E-07
Gh_D08G0314 D08: C/T intron
Gh_D08G0314 D08: C/T intron
Gh_D08G0314 D08: T/G intron
Gh_D08G0314 D08: G/T intron
Gh_D08G0314 D08: A/G intron
Gh_D08G0314 D08: C/T exonic synonymous
Gh_D08G0314 D08: G/A exonic synonymous
Gh_D08G0314 D08: C/A intron
Gh_D08G0314 D08: A/G intron
Gh_D08G0314 D08: C/T intron
Gh_D08G0314 D08: A/C exonic synonymous
Gh_D08G0314 D08: T/C downstream
Gh_D08G0314 D08: G/A downstream
Gh_D08G0315 D08: A/T upstream
Gh_D08G0315 D08: T/A upstream
Gh_D08G0315 D08: A/C intron
Gh_D08G0315 D08: G/A exonic nonsynonymous
Gh_D08G0315 D08: C/T downstream
Gh_D08G0315 D08: C/A downstream
Gh_D08G0315 D08: C/G downstream E-08

Supplementary Table 16. Identification of candidate gene in associated locus D08_ for lint yield (Separate file).

Supplementary Table 17. The average phenotype data of accessions based on independent assortment of two associated loci
Recombination of two loci | No. accessions | LP | BN | SI
GhLYI-A02 LLB and GhLYI-D08 LLB
GhLYI-A02 LLB and GhLYI-D08 HLB
GhLYI-A02 HLB and GhLYI-D08 LLB
GhLYI-A02 HLB and GhLYI-D08 HLB

Supplementary Table 18. The SNP information in associated loci for fiber quality (Separate file).
Supplementary Table 19. Identification of candidate gene in associated loci for fiber quality (Separate file).

Supplementary Table 20. The SNP information in associated locus D03_ for LP.
Gene ID | SNP location | Ref/Alt | Region | Variation Type | Associated signal (P-value) | two-tailed t-test
Gh_D03G1063 D03: G/C intronic
Gh_D03G1063 D03: G/C intronic
Gh_D03G1063 D03: T/G intronic
Gh_D03G1063 D03: C/T upstream
Gh_D03G1064 D03: A/G upstream
Gh_D03G1064 D03: G/C upstream
Gh_D03G1064 D03: G/A exonic synonymous
Gh_D03G1064 D03: T/G intronic
Gh_D03G1064 D03: G/A exonic nonsynonymous
Gh_D03G1065 D03: T/G exonic nonsynonymous
Gh_D03G1065 D03: T/C exonic nonsynonymous
Gh_D03G1065 D03: T/C exonic nonsynonymous
Gh_D03G1065 D03: G/A downstream
Gh_D03G1065 D03: G/A downstream
Gh_D03G1066 D03: A/G downstream
Gh_D03G1066 D03: A/G downstream
Gh_D03G1066 D03: A/T downstream
Gh_D03G1066 D03: T/C downstream
Gh_D03G1066 D03: G/A downstream
Gh_D03G1066 D03: T/C downstream
Gh_D03G1066 D03: G/A downstream
Gh_D03G1066 D03: G/A downstream
Gh_D03G1066 D03: A/G downstream
Gh_D03G1066 D03: A/G downstream
Gh_D03G1066 D03: A/G downstream
Gh_D03G1066 D03: T/C downstream
Gh_D03G1066 D03: A/C exonic synonymous
Gh_D03G1066 D03: C/T exonic nonsynonymous
Gh_D03G1066 D03: C/T exonic nonsynonymous
Gh_D03G1066 D03: G/A upstream
Gh_D03G1067 D03: C/A upstream
Gh_D03G1067 D03: C/T exonic nonsynonymous
Gh_D03G1067 D03: T/A downstream
Gh_D03G1067 D03: T/C downstream
Gh_D03G1069 D03: T/C upstream E-07
Gh_D03G1069 D03: A/C upstream
Gh_D03G1069 D03: A/C exonic nonsynonymous E-08
Gh_D03G1069 D03: G/A exonic synonymous SNV
Gh_D03G1069 D03: C/T intronic
Gh_D03G1070 D03: C/A intronic E-07
Gh_D03G1071 D03: G/C downstream
Gh_D03G1071 D03: A/G downstream

Supplementary Table 21. Identification of candidate gene in associated locus D03_ for LP (Separate file).

Supplementary Table 22. The phenotype changes of accessions with different SNP type
SNP type (Accession number) | BN | LP | SI

Gh_D03G1069 (Protein kinase)
AA (58)
CC (126)
Difference 1.35 (14.32%) 3.01 (8.54%) (-7.68%)
two-tailed t-test (P value) E

Gh_D03G1067 (SAM domain-containing protein)
CC (126)
TT (69)
Difference 1.21 (12.42%) 2.14 (5.90%) (-3.71%)
two-tailed t-test (P value)

Gh_D03G1065 (unknown gene)
TT (117)
GG (65)
Difference 1.16 (11.89%) 2.36 (6.50%) (-5.38%)
two-tailed t-test (P value)

Gh_D03G1064 (FRIGIDA-like protein)
GG (118)
AA (65)
Difference 1.20 (12.24%) 2.41 (6.61%) (-5.94%)
two-tailed t-test (P value) E

Supplementary Table 23. Correlation analysis of nonsynonymous SNP in four candidate genes associated with LP
Gene | Gene description | SNP position in CDS | SNP position in protein | Correlation analysis of nonsynonymous SNP in exon (two-tailed t-test)
Gh_D03G1064 | FRIGIDA-like protein | 1733 bp; G to A | 578; Arginine (Arg) to Glutamine (Gln) | 5.29E-07
Gh_D03G1065 | unknown gene | 380 bp; T to G | 127; Valine (Val) to Glycine (Gly) | 3.74E-06
Gh_D03G1067 | SAM domain-containing protein | 623 bp; C to T | 208; Threonine (Thr) to Isoleucine (Ile) | 3.61E-06
Gh_D03G1069 | Protein kinase | 242 bp; A to C | 81; Glutamic acid (Glu) to Alanine (Ala) | 6.20E-10
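The relationship between the two position columns of Supplementary Table 23 can be checked with a few lines of code. The sketch below assumes that the CDS coordinates are 1-based and counted from the first base of the start codon; the function name is hypothetical. Under that assumption, each listed CDS position maps to the listed residue number, and all four SNPs fall on the second base of their codon, which agrees with the reported amino acid changes under the standard genetic code.

```python
def cds_to_codon(cds_pos):
    """Map a 1-based CDS nucleotide position to (codon number, base within codon).

    Both returned values are 1-based: codon 1 covers CDS positions 1 to 3.
    """
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    codon_number = (cds_pos - 1) // 3 + 1
    base_in_codon = (cds_pos - 1) % 3 + 1
    return codon_number, base_in_codon


# Values copied from Supplementary Table 23: gene -> (CDS position, residue number).
TABLE_23 = {
    "Gh_D03G1064": (1733, 578),
    "Gh_D03G1065": (380, 127),
    "Gh_D03G1067": (623, 208),
    "Gh_D03G1069": (242, 81),
}

for gene, (cds_pos, residue) in TABLE_23.items():
    codon_number, base_in_codon = cds_to_codon(cds_pos)
    print(gene, codon_number, base_in_codon, codon_number == residue)
```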

Supplementary Table 24. The summary of indels identified in ten cultivars based on deep-sequencing (Separate file).
Supplementary Table 25. The indels overlapped with GWAS loci (Separate file).
Supplementary Table 26. The genes involved in copy number variation (Separate file).

Supplementary Table 27. The genetic constitution of the deep-sequencing cultivars extensively planted in China.
ID | DPL15 (Mb) | STV2B (Mb) | UGDM (Mb) | DPL15/UGDM (Mb) | UGDM/STV2B (Mb) | DPL15/STV2B (Mb)
Simian
Junmian
Shiyuan
Simian
Zhongmiansuo
XLZ

Supplementary Table 28. The identity-by-descent (IBD) regions comparing the Chinese cultivars to the traditional landraces (Separate file).

Supplementary Table 29. The detection and transmission of indels in IBD regions between the founders and modern cultivars.
Founder landraces (F) | Modern cultivar (M) | Distinguished IBD (Mb) | Varieties in F and M: Total | Only detected in F | Only detected in M | Detected in M and F | Variation rate (per bp)
DPL E-06
STV2B E-06
UGDmian E-06
DPL15 Simian E-06
STV2B Simian E-06
UGDmian Simian E-06
DPL15 ZMS E-06
STV2B ZMS E-06
UGDmian ZMS E-06
DPL15 Simian E-06
STV2B Simian E-06
UGDmian Simian E-06
DPL15 Shiyuan E-06
STV2B Shiyuan E-06
UGDmian Shiyuan E-06
DPL15 Junmian E-06
STV2B Junmian E-06
UGDmian Junmian E-06
DPL15 XLZ E-06
STV2B XLZ E-06
UGDmian XLZ E-06

Supplementary Table 30. The transferring percentage of GWAS-related regions from DPL15, STV2B and UGDM.
Trait | DPL15 | STV2B | UGDM | DPL15/UGDM | STV2B/UGDM | DPL15/STV2B | Other
Fiber Quality
FE | 16.07% | 18.75% | 3.57% | 15.18% | 0.00% | 2.68% | 43.75%
FL | 2.86% | 5.71% | 0.00% | 42.86% | 0.00% | 0.00% | 48.57%
FM | 33.33% | 26.19% | 0.00% | 4.76% | 0.00% | 0.00% | 35.71%
FS | 19.05% | 14.29% | 1.59% | 11.11% | 0.00% | 0.00% | 53.97%
FU | 19.05% | 9.52% | 3.17% | 11.11% | 6.35% | 9.52% | 41.27%
Total | 18.07% | 14.89% | 1.67% | 17.00% | 1.27% | 2.44% | 44.65%
Yield
BN | 13.39% | 10.71% | 21.43% | 8.93% | 3.57% | 4.46% | 37.50%
BW | 30.95% | 15.48% | 13.10% | 1.19% | 0.00% | 1.19% | 38.10%
LP | 13.53% | 9.77% | 12.03% | 9.77% | 0.00% | 0.00% | 54.89%
SI | 26.19% | 10.71% | 11.90% | 5.95% | 1.19% | 0.00% | 44.05%
Total | 21.02% | 11.67% | 14.61% | 6.46% | 1.19% | 1.41% | 43.63%
VW-resistance | 28.57% | 33.33% | 9.52% | 4.76% | 0.00% | 4.76% | 19.05%
All traits | 20.30% | 15.45% | 7.63% | 11.56% | 1.11% | 2.26% | 41.69%
FE, Fiber elongation; FL, Fiber length; FM, Fiber micronaire; FS, Fiber strength; FU, Fiber uniformity; BN, Boll number; BW, Boll weight; LP, lint percentage; SI, seed index; VW, Verticillium wilt.
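A quick consistency check on Supplementary Table 30 is to confirm that the seven origin categories in each row add up to roughly 100%. The sketch below uses a subset of rows copied from the table; the function names and the 0.05 percentage-point tolerance for rounding are assumptions made for illustration, as is the reading that the seven columns are mutually exclusive categories.

```python
# Column order as in Supplementary Table 30.
COLUMNS = ["DPL15", "STV2B", "UGDM", "DPL15/UGDM", "STV2B/UGDM", "DPL15/STV2B", "Other"]

# Percentages copied from a subset of rows in Supplementary Table 30.
ROWS = {
    "FE": [16.07, 18.75, 3.57, 15.18, 0.00, 2.68, 43.75],
    "FL": [2.86, 5.71, 0.00, 42.86, 0.00, 0.00, 48.57],
    "BN": [13.39, 10.71, 21.43, 8.93, 3.57, 4.46, 37.50],
    "VW-resistance": [28.57, 33.33, 9.52, 4.76, 0.00, 4.76, 19.05],
    "All traits": [20.30, 15.45, 7.63, 11.56, 1.11, 2.26, 41.69],
}


def row_totals(rows):
    """Sum the seven percentages in each row, rounded to two decimals."""
    return {trait: round(sum(values), 2) for trait, values in rows.items()}


def rows_within_tolerance(rows, tolerance=0.05):
    """True if every row sums to 100% within the rounding tolerance."""
    return all(abs(total - 100.0) <= tolerance for total in row_totals(rows).values())


print(row_totals(ROWS))
print(rows_within_tolerance(ROWS))
```

Small deviations from 100% are expected because each cell is rounded to two decimals.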

Supplementary Table 31. The GWAS loci detected in seven modern cultivars.
GWAS position | Traits | 86-1 | Simian2 | ZMS12 | Simian3 | Shiyuan321 | Junmian1 | XLZ42
A02: BN
A02: BN U/D U/D U/D U/D U/D U/D U/D
A03: BN U U S U S S U
A05: BN U U -
A06: BN D D - S D D -
A06: BN S S S -
A07: BN - - U - U - -
A07: BN S U - D - U -
A08: BN S S S S S S -
A09: BN U/D - U/D U/D U/D U/D -
A09: BN - - D
A09: BN
A10: BN
A11: BN D U U U S U -
A11: BN U -
A11: BN
D01: BN D U D U U D -
D02: BN D/S D/S D/S D/S U U U
D02: BN D D D - D D -
D03: BN U/D U/D U/D - U U -
D04: BN
D05: BN D D D U D D -
D06: BN S S S U/D S - -
D07: BN - U S U S U -
D08: BN U D U - U U U
D08: BN - - U - S - -
D09: BN D U U -
D10: BN S S - U - S -
D10: BN U S D - U U S
D11: BN D D D - - D -
D12: BN
D13: BN S S D - D D S
D13: BN U/D S U/D U/D U/D S U/D
A02: BW D S D U D U S
A07: BW S U - D - U -
A08: BW D S - D
A08: BW - D U S
A10: BW
A10: BW
A11: BW D - S - S - -
A11: BW U -
A12: BW S
D01: BW U/D - S - U/D U/D -
D02: BW D S S D D D D
D05: BW D - D D
D05: BW D S D D - D -
D05: BW D D D U D D -
D07: BW U/D U -
D09: BW U/D
D10: BW D S U U U D S
D11: BW S D U
D11: BW S D U S S D -
D12: BW S U/D S S S S U/D
A02: LP U/D U/D U/D U/D U/D U/D U/D
A05: LP D U - D U U -
A06: LP
A06: LP
A06: LP D/S D/S D/S D/S D/S D/S U
A07: LP S U - D - U -
A08: LP
A09: LP U/D -
A11: LP D D S - S - -
A11: LP D S D D U/D D S
D02: LP U - S
D03: LP U U U D U D -
D03: LP U/D U/D -
D06: LP S S S S S U/D -
D06: LP - S
D11: LP - D - - D D -
A02: SI U/D - D D D D D
A04: SI - - U - U - -
A06: SI
A06: SI
A06: SI - - U D U D -
A06: SI D D - S D D -
A07: SI S U - D - U -
A09: SI - D D - - U D
A10: SI U/D U/D S S S U/D U/D
A11: SI
A13: SI D D D S D D U
D05: SI D U/D D U/D U/D U/D -
D05: SI D D D U D D -
D07: SI D D S D S D D
D07: SI U/D U/D U/D U/D U/D U/D S
D08: SI U D U - U U U
D09: SI D D D - S U S
D10: SI S S - U - S -
D11: SI D D U D - D -
D12: SI
D13: SI - - D D D - -
D13: SI S U D - D - -
D13: SI S - D D D - S
D13: SI S S U U U U S
D13: SI U/D S U/D U/D U/D S U/D
D13: SI U/D U/D U/D U/D U/D U/D -
A03: VW - S D - D S -
D06: VW S S U D/S U S -
D07: VW D D U/D D D D D
D08: VW S S S S
A01: FE S - S D
A02: FE D S D U D U S
A02: FE - - D D D D -
A07: FE S S S U/D S S -
A11: FE D -
A13: FE U/D U/D U/D U/D U/D U/D U/D
D03: FE D U/D U/D -
D05: FE - S S - U D -
D05: FE D - D - D D -
D05: FE D - D
D05: FE - - D
D08: FE S S S S
D10: FE S S - U - S -
D11: FE - - U
D12: FE - U/D S -
A11: FL
A11: FL
A13: FL S U/D U/D S U/D U/D S
D05: FL U/D S U/D - U/D U/D U/D
D13: FL - - D - D D -
A03: FM
A07: FM S S S U/D U/D S S
A12: FM S D - -
A12: FM S D - -
D02: FM S S D S U S D
D06: FM
D09: FM D D D S D D S
D09: FM D D D - D D -
D13: FM D - D D
A01: FS - S S -
A03: FS
A03: FS
A03: FS D D D - D D -
A04: FS - S S S S - -
A06: FS D S S D U S S
A10: FS D
A11: FS
A12: FS - - D D - D -
A13: FS
A13: FS U/D U/D U/D U/D U/D U/D U/D
D05: FS D D D S D D -
D12: FS D D D - D - -
D13: FS D D D D D D -
A06: FU D/S D/S D/S D/S D/S D/S U
A11: FU
A13: FU S U/D U/D S U/D U/D S
A13: FU U/D U/D U/D U/D U/D U/D U/D
D01: FU D - D -
D01: FU
D06: FU S S - - D S -
D10: FU D S U U U D -
D, DPL15; S, STV2B; U, UGDmian; U/D, UGDmian or DPL15; D/S, DPL15 or STV2B.

Supplementary Table 32. The position of 53 SNPs located in the 24 regions for PCR amplification.
PCR amplification region | Primer (F: Forward; R: Reverse)
A01: F: AATTTTGAGATAGTAAAACTTCCGTAAGAA R: AATATAACAACAACTTTACCATTCATCATT
A02: F: GCTAAACATTAACTTTCTATTGTGTCCTTAG R: GTAACGGGCTTATTCATGTCGTCAA
A03: F: CATATTCAGCCTTTGAATCTCAAGC R: ACTTCGTTGATGAGATTGCTTCAAA
A04: F: TAGCTTCCACACGCCACTTATAGCC R: AGCTGATTTGCCCTTTGAGGAGGAG
A05: F: TACACCTTCCACAAAACATCTTCCT R: TGGAACCCTGCTTATAGTTGCTTTA
A06: F: GGGTCTTAAGCATGAGATTACTGAA R: ATGCAGTACTGAAGTCCAACTGTGT
A07: F: TATGTGTTGTACTGTGCAATGCGGT R: TTAACCCTTGGCTAAACTTAAACAAGG
A08: F: GACACAAAATAATAAAGTCTAAGTCCTATACAT R: AAGATTCGTCTTAGGTTTCATTCCC
A09: F: CTACAAAATGTCATTTGTTTGGAGCTAA R: GCAAGCACCTCACTGCCATATGAAT
A10: F: CAGGCATCATCTTCAACTCTCTACA R: AAAAGAAACTACATTGGTACTTGCC
A11: F: GTGTTGGCAAAATTAAGATGTTCGG R: TCCCCTTTCTTATTGGTGCATTTTA
A12: F: AACTAATAACAAAGTTTTGGGGCAA R: GGCAAAATTGGGTTTAGAAACCCTA
A13: F: GATTGTCGTTACCACCTGAACTAGA R: TAGACCCTATTCAATAAATCGACATTC
D01: F: GAGATCATTACCTCCTATTCAGGCTAATTTCC R: CTGAATTTGTATGCAGGTGGGCCTT
D02: F: TCCACATTTTCAAGGATATAGTCGA R: TATTAGAGGATTGAGAAGGGGATCA
D03: F: CTATGTGAGCAAGCTACAATTCAACA R: ATCAAACTTTACATCCCTACTTGCC
D04: F: AAATGAATCTTCAGCAGAATACTAA R: AACCTAACCAATAAAATTACTTAAC
D05: F: TTTTAATGTCCAAGGTCTATCACTA R: TGTTGATAATCAGTAGAGGAAATCA
D06: F: TTTAGACTATGATCGCTCGGTTCCT R: AACTTTCCAAACACCATTTGGCTTC
D07: F: ATTCTTGGTTACCCTTTCGTGAAGA R: TTAGAGGAATCCGTCAAGGATGTCA
D10: F: TTGAGTCTTCAATCTCTCAATTTCA R: GAAGGATTTCGTGGTAAAAATAAAG
D11: F: CAGATACATTTTTTTGAGGTTTCCA R: TTGCGAGTTAAAGACTCTAATGTGC
D12: F: ATCTTTGCAGTAACGAAACAAA R: TTAGCAATTAGAATTTAAAAAAAACA
D13: F: ATGAAATAAACAATTAAGAAATAATTGGGTCTT R: CACATGGCTGTGTGTGATGCTTCAG