ADDITIONAL FILE 1 Segmental duplication, microinversion and gene loss associated with a complex inversion breakpoint region in Drosophila Oriol Calvete; Josefa Gonzalez; Esther Betran; Alfredo Ruiz. Molecular Biology and Evolution 2012; doi: 10.1093/molbev/mss067

[Figure: phylogenetic tree of Drosophila species from the subgenera Sophophora and Drosophila (D. simulans, D. sechellia, D. melanogaster, D. yakuba, D. erecta, D. ananassae, D. pseudoobscura, D. persimilis, D. willistoni, D. mojavensis, D. buzzatii, D. virilis, D. grimshawi), with a time axis from 50 to 0 mya. Species and branches are labelled with the arrangement of the breakpoint regions, including (AB)(CD)(EF), (A-B)(C-D)(E-F), (C-D)(E-F), (C-D) and (AC)(BE)(DF), and with the flanking genes [(C),CG6490], [(E),Syx18], [(D),CG1866], [(F),RhoL], [(C),CG6089] and [(D),CG6966].]

[Figure: gel panels for ΨCG4673 exon 2, ΨCG4673 exon 3, ΨCG4673 exon 7, ΨCG4673 3' UTR and CG4673 exons 5-6, with size markers at 1 kb, 0.5 kb and 0.1 kb. Lanes: L: ladder; 1: egg 0-1,30 h; 2: egg 0-2,15 h; 3: egg 0-2,30 h; 4: egg 0-22 h; 5: larvae; 6: pupae; 7: adult female; 8: adult male; 9: DNA.]

Figure S4. Analysis of CG4673 expression by Northern blot. The molecular weight of the hybridized bands is shown in the marker between membranes 1 and 2. The mRNA loaded in each lane is shown in the legend.
Lanes: L: RNA ladder (6,583; 4,981; 3,638; 2,604; 1,908; 1,383; 955; 623; 281 bp); 1: mRNA adult (female); 2: mRNA adult (male); 3: mRNA pupa; 4: mRNA larva; 5: mRNA egg; 6: mRNA (D. mojavensis).

[Figure: gel panels for ΨCG4673 exon 2, ΨCG4673 exon 3, ΨCG4673 exon 7 and Gapdh, with size markers at 1 kb, 0.5 kb and 0.1 kb. Lanes: 1: D+/RT+; 2: D+/RT-; 3: D+/RT+ (N); 4: D+/RT+ (T); 5: D-/RT+; 6: D-/RT-; 7: D-/RT+ (T); L: ladder.]


ADDITIONAL FILE 2 Segmental duplication, microinversion and gene loss associated with a complex inversion breakpoint region in Drosophila Oriol Calvete; Josefa Gonzalez; Esther Betran; Alfredo Ruiz. Molecular Biology and Evolution 2012; doi: 10.1093/molbev/mss067

Additional file 2. Cloning and sequencing the inversion breakpoints

To clone and sequence the breakpoints of inversions 2m and 2n we applied the following experimental approach:

(i) Identification of D. buzzatii BAC clones encompassing the breakpoints. Three D. buzzatii BAC clones, 14B19, 20O19 and 16H04, encompassing the AC, BE and DF breakpoint regions, respectively, were identified by González et al. (2007). For each inversion breakpoint, we used in situ hybridization to isolate from the BAC library two additional clones that overlap the initial one and also bear the breakpoint regions (Figure S1 in Additional file 1). The names of the identified clones are given in Table 1 in the main text.

(ii) BAC-end sequencing and localization of the sequences in the D. mojavensis genome. We successfully generated 14 BAC-end sequences (BES) out of the 18 possible, and mapped 10 of them by similarity searches onto unique sites within the D. mojavensis genome (Table S5 in Additional file 3). All sequences were anchored in scaffold 6540 (~34 Mb long), corresponding to D. mojavensis chromosome 2 (Muller's element E; Schaeffer et al. 2008). The arrangement of the BES on this scaffold allowed us to locate the AB, CD and EF breakpoint regions in three regions near coordinates 25.7 Mb, 16.8 Mb and 4.7 Mb, respectively (Table S5 in Additional file 3). In each of these regions, the breakpoint was narrowed down to a specific segment with the aid of at least two BES (Figure 2 in main text): the AB breakpoint region was located within the 152,933 bp-long region limited by BES 14B19-T7 and 22B03-T7; the CD breakpoint region was located inside the 91,453 bp-long region between BES 1N19-T7 and 14E21-SP6; and the EF breakpoint region was located within the 111,770 bp-long region limited by BES 20O19-SP6 and 16H04-T7.

(iii) Walking along the D. mojavensis genome and characterization of the breakpoint regions in the non-inverted genome. We carried out an in situ hybridization walk in order to locate the breakpoints precisely within the previously delimited regions of D. mojavensis scaffold 6540 (Figure 2 in main text). DNA probes were generated by PCR along each region (Table S6 in Additional file 3) and then hybridized to D. buzzatii chromosomes to determine their location (Table S7 in Additional file 3). In this way, the AB breakpoint region (inversion 2m distal breakpoint) was located within a 10,256 bp-long region delimited by probes A5 and B2, designed downstream of gene msi (A5) and within the Ssadh coding sequence (B2) (Figure 2 in main text). An attempt to further narrow down the breakpoint region using probes B3 and B4, both of them designed in the CG4673 gene, failed (Figure 2 in main text). The CD breakpoint (2m/2n shared breakpoint) was located in the 1,551 bp-long segment between probes C2 and D2, designed in the intergenic region upstream of the scrib gene (C2) and downstream of the Or98b gene (D2) (Figure 2 in main text). Finally, the EF breakpoint (2n proximal breakpoint) was located in the 2,138 bp-long segment between probes E3 and F4, downstream of genes Wsck (E3) and CG8147 (F4) (Figure 2 in main text).

(iv) Isolation of the breakpoint regions in D. buzzatii by PCR. Once the breakpoint regions were pinpointed in the D. mojavensis genome, we attempted to isolate the three breakpoint regions in D. buzzatii (AC, EB and DF) by PCR amplification using combinations of primers designed in the expected orientation according to Figure 1 (in main text). Primers were designed inside the probes used to narrow down the breakpoints in D. mojavensis (Figure 2 in main text and Table S8 in Additional file 3). Only the DF breakpoint could be successfully amplified. The DF PCR product was ~4.4 kb in size and corresponds to the expected intergenic region between genes Or98b and CG8147. Comparison of this 4.4 kb region with the CD and EF regions of D. mojavensis allowed us to further narrow down the breakpoint region to a ~3 kb segment, as follows. The first 1,050 bp of the 4.4 kb DF region show similarity with the CD region, and the last 165 bp of the DF sequence are homologous to the last exon and 3' UTR of gene CG8147 from the D. mojavensis EF region. Thus the DF breakpoint is located in a 3,185 bp-long segment between coordinates 1,051 bp and 4,235 bp of the D. buzzatii DF sequence (Figure 4 in main text). The AC and EB breakpoint regions could not be amplified, although several attempts using different primers were made. We also attempted to locate in D. buzzatii the D. mojavensis region between A and B containing CG4673 (Figure 2 in main text). We combined primers A and B with primers from the B3 and B4 probes, but this region could not be amplified.

(v) Sequencing and annotation of the BAC clones containing the AC and EB breakpoints. To isolate and characterize the AC and EB breakpoint regions we sequenced and annotated BAC clones 1N19 and 20O19, respectively (Table S9 in Additional file 3).

Clone 1N19 has a 138,724 bp-long insert and contains exons 1-5 of the gene scribbled (scrib, CG4562), genes CG5071, CG12250 and CG4582, and exons 2-5 of musashi (msi, CG5099) (Figure 3 in main text). In D. melanogaster, msi encodes two transcripts, RA and RB. D. buzzatii BAC 1N19 fully contains the three exons (3-5) encoding transcript RA and four of the five exons (1-5) encoding transcript RB (only exon 1 is lacking). The complex organization of the genes msi, CG12250 and CG4582 seems conserved between D. buzzatii and D. mojavensis, with CG4582 fully nested within intron 2 of msi and CG12250 exons interleaved with those of msi. The short (63 bp) exon 1 of CG12250 is not present in the current annotation of the D. mojavensis genome (CG12250 lacks the ATG start codon), giving the false impression that this gene is fully nested within msi intron 2. However, we have detected and annotated downstream of the msi stop codon the putative initial exon of CG12250, which is conserved between D. mojavensis and D. buzzatii. Therefore, in D. buzzatii CG12250 is closer to the breakpoint than msi.

Comparison of BAC clone 1N19, containing the AC breakpoint region, with the AB and CD regions of D. mojavensis allowed us to further narrow down the breakpoint to a ~700 bp segment, as follows. The similarity between the D. buzzatii AC region and the D. mojavensis AB region reaches 12,841 bp upstream of the CG12250 start codon. When the AC breakpoint region was compared to the D. mojavensis CD region, the similarity reached 1,808 bp upstream of the scrib start codon. These observations place the D. buzzatii AC breakpoint in a 693 bp-long segment between coordinates 113,151 and 113,843 of BAC 1N19 (Figure 4 in main text).

The insert of clone 20O19 is 143,293 bp long and contains the complete coding sequences of 22 genes: CG4774, CG31258, Rpl27 (CG4743), CG5828, CG4743, CG5039, CG4730, Lgr3 (CG31096), CG5053, tankyrase (CG4719), jigr1 (CG17383), Ssadh (CG4585), CG4673, Wsck (CG31127), at1 (CG6668), CG8721, CG5794, ash2 (CG6677), CG31125, CG6695, CG31126 and Ppox (CG5796) (see Figure 3 in main text). In addition, it contains the first 7 exons of the gene slowpoke (slo, CG10693), a 45,122 bp-long gene comprising 28 exons in D. mojavensis.

Sequence comparison between the D. buzzatii 20O19 BAC clone, containing the EB breakpoint, and the D. mojavensis AB and EF regions allowed us to narrow down the breakpoint to a ~1.1 kb segment. The similarity with the D. mojavensis AB region extends 5,108 bp downstream from the Ssadh stop codon. On the other hand, the similarity with the D. mojavensis EF region extends 2,019 bp downstream from the stop codon of Wsck. Thus the EB breakpoint is located in a 1,114 bp-long segment between coordinates 79,447 bp and 80,560 bp of BAC 20O19 (Figure 4 in main text).

References cited in this file
1. Gonzalez J, Casals F, Ruiz A: Testing chromosomal phylogenies and inversion breakpoint reuse in Drosophila. Genetics 2007, 175:167-177.
2. Schaeffer SW, Bhutkar A, McAllister BF, Matsuda M, Matzkin LM, O'Grady PM, Rohde C, Valente VL, Aguade M, Anderson WW, et al: Polytene chromosomal maps of 11 Drosophila species: the order of genomic scaffolds inferred from genetic and physical maps. Genetics 2008, 179:1601-1655.
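The breakpoint segment sizes reported in Additional file 2 follow from simple arithmetic on 1-based, inclusive coordinates. The sketch below is only an illustration of that arithmetic and is not the authors' procedure. The function names are hypothetical, and the total DF length of 4,400 bp is an assumption inferred from the reported end coordinate plus the 165 bp flank (the text gives only ~4.4 kb).

```python
def inclusive_length(start: int, end: int) -> int:
    """Number of bases in a 1-based, inclusive coordinate interval."""
    return abs(end - start) + 1


def unmatched_segment(total_len: int, left_match: int, right_match: int) -> tuple[int, int]:
    """Interval left over after removing a matching prefix and a matching suffix.

    left_match  -- bases at the start that align to one reference region
    right_match -- bases at the end that align to the other reference region
    """
    return left_match + 1, total_len - right_match


# DF region: first 1,050 bp match the CD region, last 165 bp match CG8147.
# ASSUMPTION: total length 4,400 bp (the text says ~4.4 kb).
df_interval = unmatched_segment(total_len=4400, left_match=1050, right_match=165)
df_len = inclusive_length(*df_interval)

# AC and EB segments, using the BAC coordinates given in the text.
ac_len = inclusive_length(113_151, 113_843)
eb_len = inclusive_length(79_447, 80_560)

print(df_interval, df_len, ac_len, eb_len)
```

Under these assumptions the computed sizes agree with the 3,185 bp, 693 bp and 1,114 bp segments stated in the text.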

ADDITIONAL FILE 3 Segmental duplication, microinversion and gene loss associated with a complex inversion breakpoint region in Drosophila Oriol Calvete; Josefa Gonzalez; Esther Betran; Alfredo Ruiz. Molecular Biology and Evolution 2012; doi: 10.1093/molbev/mss067

Additional file 3.

Table S1. Similarity blocks between the CG4673 gene region of D. mojavensis as query and the AC breakpoint region (BAC 1N19) of D. buzzatii as subject. The search was carried out using BLAST (Bl2seq). Only hits with a minimum size of 45 bp and E-value ≤ 1e-04 are included in this list.

CG4673 region in D. mojavensis | Coordinates in D. mojavensis scaffold 6540 | Coordinates in D. buzzatii BAC 1N19 | Identity (%) | E-value
Exon 0 + 5' UTR | 25,756,516-25,756,613 | 12,802-12,704 | 72/99 (72.7) | 2e-07
Exon 2-3 | 25,757,533-25,757,920 | 12,632-12,284 | 273/393 (69.5) | 2e-37
Intron 6 | 25,759,443-25,759,569 | 11,708-11,583 | 99/128 (77.3) | 5e-21
Intron 6 | 25,759,465-25,759,577 | 10,089-9,966 | 91/124 (73.4) | 7e-13
Intron 6 | 25,760,725-25,760,896 | 10,246-10,055 | 132/196 (67.3) | 3e-10
Intron 6 | 25,761,139-25,761,215 | 9,726-9,652 | 59/77 (76.6) | 4e-09
Intron 6 | 25,761,270-25,761,379 | 9,643-9,549 | 79/110 (71.8) | 3e-10
CG5071 | 25,761,635-25,762,277 | 8,571-7,930 | 520/643 (80.9) | 1e-173
CG5071 | 25,762,432-25,762,593 | 6,719-6,552 | 122/168 (72.6) | 2e-20
Intron 6 | 25,763,858-25,763,918 | 4,202-4,148 | 47/61 (77.0) | 6e-07
Intron 6 (a) | 25,766,399-25,766,446 | 3,679-3,726 | 39/48 (81.2) | 2e-06
CG5079 | 25,764,640-25,764,730 | 2,991-2,895 | 71/97 (73.2) | 3e-10
Intron 6 | 25,765,297-25,765,341 | 2,490-2,446 | 39/45 (86.7) | 4e-09
Intron 6 | 25,765,806-25,765,925 | 2,129-2,007 | 90/127 (70.9) | 8e-12
Intron 6 | 25,767,941-25,768,091 | 992-806 | 122/187 (65.2) | 3e-10
Intron 6 | 25,768,145-25,768,198 | 801-748 | 48/54 (88.9) | 5e-14
Intron 6 | 25,768,279-25,768,338 | 627-568 | 45/60 (75.0) | 9e-05
Exon 7 | 25,768,450-25,768,573 | 518-396 | 98/125 (78.4) | 4e-22
3' UTR | 25,769,034-25,769,094 | 278-218 | 53/61 (86.9) | 4e-15
(a) This fragment is inverted.
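The identity column and the inclusion criteria of Table S1 can be illustrated with a short sketch. This is not the BLAST (Bl2seq) pipeline itself, only the arithmetic applied to its output. The `Hit` type, the function names and the last (failing) hit are hypothetical. The direction of the E-value threshold (at most 1e-04) and the reading of "size" as alignment length are assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    region: str
    matches: int    # identical positions in the alignment
    aln_len: int    # alignment length in bp
    evalue: float


MIN_ALN_LEN = 45    # minimum hit size stated in the table caption
MAX_EVALUE = 1e-4   # ASSUMPTION: hits are kept when E-value <= 1e-04


def percent_identity(hit: Hit) -> float:
    """Identity as a percentage of the alignment length, to one decimal."""
    return round(100.0 * hit.matches / hit.aln_len, 1)


def passes_filter(hit: Hit) -> bool:
    """Apply the size and E-value criteria from the caption."""
    return hit.aln_len >= MIN_ALN_LEN and hit.evalue <= MAX_EVALUE


hits = [
    Hit("Exon 0 + 5' UTR", 72, 99, 2e-07),         # row from Table S1
    Hit("Exon 2-3", 273, 393, 2e-37),              # row from Table S1
    Hit("CG5071", 520, 643, 1e-173),               # row from Table S1
    Hit("hypothetical weak hit", 30, 40, 1e-03),   # invented, fails both criteria
]

kept = [h for h in hits if passes_filter(h)]
for h in kept:
    print(f"{h.region}: {h.matches}/{h.aln_len} ({percent_identity(h)})")
```

The three rows taken from the table reproduce the reported identities of 72.7, 69.5 and 80.9.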

Table S2. Structure and comparison of the CG4673 gene of D. buzzatii (EB breakpoint region) with D. mojavensis and D. melanogaster. Exons 1 to 8 encode transcript RA. Exons 0 and 2 to 8 encode transcripts RB and RD.

Region | Dbuz (bp) | Dmoj (bp) | Dmel (bp) (exon a) | Dbuz/Dmoj identity (%) | Dbuz/Dmel identity (%)
Exon 0 | 15 | 15 | 18 (11) | 86.7 | 66.7
Intron 0 | 846 | 853 | 1183 | |
Exon 1 | 102 | 102 | 102 (10) | 95.1 | 84.3
Intron 1 | 73 | 74 | 72 | |
Exon 2 | 234 | 234 | 234 (8) | 93.2 | 80.8
Intron 2 | 62 | 55 | 62 | |
Exon 3 | 170 | 167 | 158 (7) | 85 | 68.3
Intron 3 | 237 | 80 | 61 | |
Exon 4 | 529 | 529 | 529 (6) | 89.6 | 80.1
Intron 4 | 59 | 52 | 60 | |
Exon 5 | 199 | 199 | 199 (5) | 93.0 | 77.9
Intron 5 | 61 | 76 | 62 | |
Exon 6 | 161 | 161 | 161 (4) | 89.4 | 82.6
Intron 6 | 475 | 9355 | 9942 | |
Exon 7 | 416 | 413 | 416 (3) | 91.5 | 82.0
Intron 7 | 80 | 72 | 72 | |
Exon 8 | 157 | 157 | 160 (2/1) | 87.9 | 77.1
a Exon number according to FlyBase.

Table S3. PCR and sequences of primers designed for the analysis of CG4673 and ΨCG4673 of D. buzzatii.

PCR | Primer | Sequence (5'-3') | Annotation
2 | 2R | TTACAGCGTACAAAATGTGT | Exon 2 of ΨCG4673
2 | 2L | GAGTTCTGCCCACCAGCTCA | Exon 2 of ΨCG4673
3 | 3R | GCACGAGTACCACAGCCAAA | Exon 3 of ΨCG4673
3 | 3L | CCTGATTACTTCAATGTATT | Exon 3 of ΨCG4673
7 | 7R | GCCGCTGCAGCCGCACATTC | Exon 7 of ΨCG4673
7 | 7L | GATAGATGGCTGATCAACAC | Exon 7 of ΨCG4673
3' UTR | 3' UTR-R | TCCAACTGTCAGCTTGACAC | 3' UTR of ΨCG4673
3' UTR | 3' UTR-L | CATGCTTATTGAATCTTTAT | 3' UTR of ΨCG4673
56 | 5R | CTGACCGCTCAGGAGTGCAT | Exon 5 of CG4673
56 | 6L | GGTGCATCCTTGGTAGGTAT | Exon 6 of CG4673
3' RACE Outer | 8R | GACGACCAGCTATCCCAGTG | Exon 8 of CG4673
3' RACE Outer | 3' Outer Adapter | GCGAGCACAGAATTAATACGACT | *
3' RACE Inner | 8Rint | TGGACCTGCAACCACTGCAC | Exon 8 of CG4673
3' RACE Inner | 3' Inner Adapter | CGCGGATCCGAATTAATACGACTCACTATAGG | *
5' RACE Outer | 1L | GCAGACTGCACGCGTATTAG | Exon 1 of CG4673
5' RACE Outer | 5' Outer Adapter | GCTGATGGCGATGAATGAACACTG | *
5' RACE Inner | 1Lint | GAAGAGACGATGTCGCACAA | Exon 1 of CG4673
5' RACE Inner | 5' Inner Adapter | CGCGGATCCGAACACTGCGTTTGCTGGCTTTGATG | *
3' RACE Outer | 7R | GCCGCTGCAGCCGCACATTC | Exon 7 of ΨCG4673
3' RACE Outer | 3' Outer Adapter | GCGAGCACAGAATTAATACGACT | *
3' RACE Inner | 7Rint | ACGAAAGACGCCAGCCATTG | Exon 7 of ΨCG4673
3' RACE Inner | 3' Inner Adapter | CGCGGATCCGAATTAATACGACTCACTATAGG | *
5' RACE Outer | 2L | GAGTTCTGCCCACCAGCTCA | Exon 2 of ΨCG4673
5' RACE Outer | 5' Outer Adapter | GCTGATGGCGATGAATGAACACTG | *
5' RACE Inner | 2Lint | CTTTAGCTCCGTGGCCAAGC | Exon 2 of ΨCG4673
5' RACE Inner | 5' Inner Adapter | CGCGGATCCGAACACTGCGTTTGCTGGCTTTGATG | *
Gapdh | H1 | ATGTCGAAGATTGGTATTAATGG | Gapdh
Gapdh | H2 | GTTCGACACGACCTTCATGT | Gapdh
* Primers from the RLM-RACE kit of Ambion (AM1700).

Table S4. Functional information on the genes located near the 2mn inversion breakpoints (FlyBase r5.36).

Gene: CG12250 (A), Ssadh (B), CG4673 (AB), CG5079 (AB), CG5071 (AB), Scrib (C)

GO annotation, Molecular Function: - catalytic step 2 spliceosome - precatalytic spliceosome a; succinate-semialdehyde dehydrogenase activity b; - structural constituent of nuclear pore - zinc ion binding
GO annotation, Biological Process: nuclear mRNA splicing, via spliceosome; - gamma-aminobutyric acid catabolic process - oxidation-reduction process
Cellular component, InterPro domains, Evidence of expression: none; none; High expression level in 1-5 day-old adult males (not expressed in adult females); mitochondrion; - Aldehyde dehydrogenase domain; none; nuclear pore; - Zinc finger, RanBP2-type - NPL4, zinc-binding putative - Polyubiquitin-tagged protein recognition complex, Npl4 component; High expression in L2 and L3 larvae; Very highly expressed in adult male and female; none; none; none; none; Low expression in pupae; - peptidyl-prolyl cis-trans isomerase activity - zinc ion binding; Protein binding; protein folding; intracellular; - Zinc finger, B-box and RING-type - Peptidyl-prolyl cis-trans isomerase, cyclophilin-type; - cell cycle process - olfactory behaviour; - plasma membrane f; - PDZ/DHR/GLGF - Leucine-rich repeat; Moderately high expression in males (no expression in adult females); High expression in embryo
Other information: None; None; Nonsense-mediated mRNA decay (NMD) down-regulates the CG4673-RB transcript c; RNAi generated by PCR using primers directed to this gene causes a cell growth and viability phenotype when assayed in Kc167 and S2R+ cells d; None; Essential for odour-guided behaviour

Gene: Or98b (D), Wsck (E), CG8147 (F)

GO annotation, Molecular Function: - odorant binding - olfactory receptor activity; - ATP binding - protein tyrosine kinase activity; - alkaline phosphatase activity
GO annotation, Biological Process: - sensory perception of smell; - protein phosphorylation; - metabolic process
Cellular component: - integral to membrane - membrane - plasma membrane; none
InterPro domains: - olfactory receptor, drosophila; - Protein kinase, catalytic domain - Carbohydrate-binding WSC - Fibronectin, type III - Serine-threonine/tyrosine-protein kinase - Immunoglobulin-like fold; - Alkaline phosphatase
Evidence of expression: Extremely low expression in larvae and pupae; Moderately high in embryo; High in L2 larvae
Other information: none; none; Identified in S2/cycloheximide assay as a direct target of Clk (circadian regulation of gene expression) mediated transcriptional regulation

a Inferred from direct assay (Herold et al. 2009)
b Inferred from direct assay (Rothacker 2008)
c Hansen et al. 2009
d Boutros et al. 2004
e Inferred from mutant phenotype
f Inferred from direct assay
g Ganguly et al. 2003

Table S5. Mapping of the D. buzzatii BAC ends by similarity search to the D. mojavensis genome sequence. All D. mojavensis coordinates belong to scaffold 6540. Note that in D. mojavensis scaffold 6540 numbering increases from centromere to telomere (Schaeffer et al. 2008).

Breakpoint | BAC end | Read length (bp) | Hit on D. mojavensis genome (scaffold 6540) | Annotation | Score | E-value | Identity (%)
AC | 14B19 T7 | 800 | 25,824,685-25,824,451 | msi | 339 | 2e-89 | 226/240 (94.1)
AC | 1N19 T7 | 711 | 16,861,094-16,861,229 | scrib | 150 | 7e-33 | 116/135 (85.9)
AC | 14B19 SP6 | 745 | 16,810,607-16,811,089 | Intergenic (Twd1-CG5467) | 335 | 2e-88 | 408/500 (81.6)
AC | 15L19 T7 | 640 | 16,761,866-16,762,001 | Intergenic (beat-vii-twd1) | 329 | 9e-87 | 203/219 (92.7)
BE | 22B03 T7 | 569 | 25,671,558-25,671,752 | CG4774 | 196 | 7e-47 | 169/200 (84.5)
BE | 2N19 SP6 | 513 | 4,823,128-4,822,869 | slo | 183 | 7e-43 | 220/267 (82.4)
BE | 20O19 SP6 | 578 | 4,804,621-4,804,495 | slo | 220 | 4e-56 | 123/127 (96.8)
DF | 14E21 SP6 | 632 | 16,952,682-16,952,590 | Intergenic (Cyp6d5-beat-VI) | 90 | 1e-16 | 79/93 (84.9)
DF | 16H04 T7 | 774 | 4,692,135-4,692,725 | CG9363 | 758 | 0.0 | 535/592 (90.4)
DF | 14E21 T7 | 841 | 4,644,326-4,644,563 | Intergenic (alpha-manii-ps) | 361 | 3e-98 | 224/238 (94.1)

Table S6. Sequences of primers used to generate probes for the BAC walking (see Figure 2 in main text).

PCR | Forward (5'-3') | Reverse (5'-3')
A5 | GCCGTACGCGTTCTCTATAA | TGCCGAACTTGTTAGACGTA
A4 | CCGATTGACTAACGTTAAGA | GTCTCACGTTTCGCATACCA
A3 | TGGTATGCGAAACGTGAGAC | TTGACTGGAGCAGCATGTAC
A2 | GTTGTTGTCAGGCAGTTGCA | TGGTCCAAGTAAAACACAC
A1 | ATATTGCAGGTTTCAGACAC | TTGTCATTATCGCCGATTAC
B1 | GCGAGGATCCAGATTATGAA | GCCATGCAGACCATCATAAC
B2 | GCTCGAGCAATTCATCTACA | TTATCACGAACCAGAGCCAT
B3 | CCAGCACTTCGATACCATCA | GCGATACTCCACAGCTATTG
B4 | GTGGAGTATCTGCTAGTTGA | TATAGAGAACGCGTACGGCA
C2 | GCTTGCATGCTAACGAGTTG | AAAAAGAGTTCCTCGAGGGT
C1 | TATGACAACGCGGAAATTGT | CAGGTGGAATTCGTGGACAA
D1 | GGCATGGCCATCTACGATAT | GGATCCGGGAAGTATTCCTC
D2 | GACAAGGCCAGAAGCATAAT | GCTCTAATTAGGCGCACATA
E3 | TATTGTGCTAATCTGGCAAG | TTACGTTCATCGCTAACAGA
E2 | TACTATTACGTTGGCTGCTA | GAGGAGAGGTCATCAGCTGA
E1 | CTTACTCAATCTCATGTCCA | TTGAAGTAGGTGTGCTCGAA
F1 | TACGACACACATCGGAACTC | GCGCCAATACGAGTAGAGTA
F2 | GCTGATGAAGTGAAAGTCAA | GATAGACACGCCTTGTAAGT
F3 | AAGGATAAACGTTGCCGAAG | GACTTTTGGTTGGCTTGTCA
F4 | CGAATATGTCGTTCTTGCGA | TATGGAACCGTGCTCGACTA
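Adjacent probes in the BAC walk can share a primer site, which is visible directly in the primer sequences. As a minimal sketch (the helper names are hypothetical and this check is not described by the authors), the code below shows that the A3 forward primer is the reverse complement of the A4 reverse primer, which is consistent with the 20 bp overlap between the A4 and A3 primer coordinates listed in Table S7.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overlap_length(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Length of the overlap between two inclusive intervals (0 if disjoint)."""
    a_lo, a_hi = sorted(a)
    b_lo, b_hi = sorted(b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


# Primer sequences from Table S6.
a4_reverse = "GTCTCACGTTTCGCATACCA"
a3_forward = "TGGTATGCGAAACGTGAGAC"

# Primer coordinates in D. mojavensis scaffold 6540, from Table S7.
a4_coords = (25775114, 25776743)
a3_coords = (25776724, 25778892)

same_site = reverse_complement(a3_forward) == a4_reverse
shared_bp = overlap_length(a4_coords, a3_coords)
print(same_site, shared_bp, len(a3_forward))
```

The two primers anneal to the same 20 bp site on opposite strands, so probes A4 and A3 abut with a 20 bp overlap.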

Table S7. In situ hybridization of PCR probes for the BAC walking along the breakpoint regions from anchored BAC ends. PCR reactions were carried out using DNA from D. buzzatii BAC clones as template. See Table S6 in this file for the sequences of the PCR primers used to produce the probes. (-): No amplification product obtained. *In situ hybridization using D. mojavensis probe due to lack of amplification in D. buzzatii.

PCR | Coordinates of primers in D. mojavensis | Annotation | Expected length of PCR product in D. mojavensis (bp) | Length of PCR product in bp (BAC used as template) | Length of PCR product in bp (BAC used as template) | Location of the in situ hybridization signal in D. buzzatii
F1 | 4693132-4695292 | CG9363 | 2161 | ~2100 (16H04) | - (20O19) | F6h/F2a
F2 | 4722897-4724593 | Intergenic (CG16779-CG8147) | 1697 | ~1800 (16H04) | - (20O19) | F6h/F2a
F3 | 4730053-4730865 | Intergenic (CG16779-CG8147) | 813 | ~820 (16H04) | - (20O19) | F6h/F2a
F4 | 4735053-4736519 | CG8147 | 1467 | ~1500 (16H04) | - (20O19) | F6h/F2a
E3 | 4738657-4739571 | Intergenic (CG8147-Wsck) | 915 | - (16H04) | ~1200 (20O19) | D3e/F6h
E2 | 4741543-4743800 | Wsck | 2258 | - (16H04) | - (20O19) | D3e/F6h*
E1 | 4755646-4757683 | CG5794 | 2038 | - (16H04) | ~2000 (20O19) | D3e/F6h
D1 | 16963581-16962296 | Cyp6d5 | 1286 | - (1N19) | ~1500 (16H04) | No signal
D2 | 16895728-16893264 | Or98b | 2465 | - (1N19) | ~2400 (16H04) | F6h/F2a
C2 | 16891713-16889967 | scrib | 1747 | - (1N19) | - (16H04) | D3e/F2a*
C1 | 16890058-16887461 | scrib | 2598 | - (1N19) | ~2100 (16H04) | D3e/F2a
B1 | 25752005-25753543 | Intergenic (jigr1-Ssadh) | 1539 | ~1500 (20O19) | - (1N19) | D3e/F6h
B2 | 25757303-25758924 | Ssadh | 1622 | ~1850 (20O19) | - (1N19) | D3e/F6h
B3 | 25761704-25764947 | CG5071-CG5079 | 3244 | - (20O19) | - (1N19) | No signal*
B4 | 25768418-25769198 | CG4673 | 781 | ~750 (20O19) | ~750 (1N19) | No signal
A5 | 25769180-25773097 | CG12250 | 3918 | - (20O19) | - (1N19) | D3e/F2a*
A4 | 25775114-25776743 | CG12250 | 1630 | - (20O19) | ~3500 (1N19) | D3e/F2a
A3 | 25776724-25778892 | msi | 2169 | - (20O19) | ~2050 (1N19) | D3e/F2a
A2 | 25783641-25786366 | msi | 2726 | - (20O19) | - (1N19) | No signal*
A1 | 25822631-25824594 | msi | 1964 | - (20O19) | ~2000 (1N19) | D3e/F2a
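The expected product lengths in Table S7 can be recomputed from the primer coordinates, whichever strand the coordinates are listed on. The following sketch only illustrates that calculation and a simple comparison with the observed gel sizes. The function names are hypothetical and the comparison is not a criterion used by the authors.

```python
def expected_product_length(coords: str) -> int:
    """Expected amplicon length from a 'start-end' coordinate pair.

    Coordinates are treated as inclusive, and the order of the two values
    does not matter (some rows are listed in descending order).
    """
    start, end = (int(value) for value in coords.split("-"))
    return abs(end - start) + 1


def relative_deviation(expected: int, observed: int) -> float:
    """Signed deviation of the observed size, as a fraction of the expected one."""
    return (observed - expected) / expected


# Rows taken from Table S7: probe name, primer coordinates, approximate observed size.
rows = [
    ("F1", "4693132-4695292", 2100),
    ("D1", "16963581-16962296", 1500),   # coordinates listed in descending order
    ("B4", "25768418-25769198", 750),
    ("A4", "25775114-25776743", 3500),
]

for name, coords, observed in rows:
    expected = expected_product_length(coords)
    print(f"{name}: expected {expected} bp, observed ~{observed} bp, "
          f"deviation {relative_deviation(expected, observed):+.0%}")
```

For most probes the observed size is close to the expected one, while A4 stands out with a product roughly twice the expected length in D. buzzatii.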

Table S8. Primers used to attempt to amplify the breakpoints of inversions 2m and 2n (AC, EB and DF) in D. buzzatii. All primers were designed in D. mojavensis. Only the DF breakpoint was successfully amplified.

Breakpoint | Primer | Sequence (5'-3') | Annotation
AC | A | GCAACAACAAGTCACAATGA | Intergenic (msi-CG4673)
AC | C | TGAATTAGAAACCACCGTCA | scrib
EB | B | GGCATACGTCAGCTGATGAC | Ssadh
EB | E | ATCAATATGGCAACGAGGTG | CG31127
DF | D | TGTTCGAGCAGCACTACATA | Or98b conserved downstream region
DF | F | AATAAGCAGCAAGTGCACAG | CG8147

Table S9. Details of the sequencing results of the 1N19 and 20O19 BAC clones. Note that the actual size of the BAC clones is similar to the original estimates (given in brackets).

BAC clone | 1N19 | 20O19
Corresponding breakpoint of D. buzzatii | AC | EB
Size of BAC clone (bp) | 138,724 (138,197) | 143,293 (142,697)
Size of fragments selected from the partial digestion to construct the library (kb) | 2 to 5 | 2 to 5
Number of reads in shotgun sequencing | 1,536 | 1,344
Number of reads excluded | 186 | 148
Trimmed read length, mean ± std (bases) | 869±140 | 900±150
Coverage | 8.47x | 7.56x
Number of contigs | 2 | 3
Gaps | 1 | 2

References cited in this file
1. Schaeffer SW, Bhutkar A, McAllister BF, Matsuda M, Matzkin LM, O'Grady PM, Rohde C, Valente VL, Aguade M, Anderson WW, et al: Polytene chromosomal maps of 11 Drosophila species: the order of genomic scaffolds inferred from genetic and physical maps. Genetics 2008, 179:1601-1655.