Non-coding Function & Variation, MPRAs II. Mike White Bio /5/18

1 Non-coding Function & Variation, MPRAs II Mike White Bio /5/18

2 MPRA Review Problem 1: Where does your CRE DNA come from? DNA synthesis Genomic fragments Targeted regulome capture Problem 2: How do you read out the reporter? Synthetic barcodes Self-transcribing enhancers Sort-seq (fluorescence/flow cytometry + DNA sequencing)

3 Evaluating MPRAs DNA source/library construction. What are the controls? Reporter design/barcoding scheme Assay readout - does it measure what the paper claims? Parameters: Library size, cellular pool, sequencing coverage - how many input barcodes were recovered? Reproducibility

4 Reproducibility Varies with Cell Type and Assay Plasmid library Genome-integrated Kwasnieski, et al., Genome Res Oct; 24(10): Maricque, et al., Nucleic Acids Res (4): e16. PMC /

5 Library parameters How complex is the library? How large is the cellular pool? How transfectable are the cells? How much sequencing to cover the barcodes?

6 Library parameters Optimal library parameters are usually determined empirically. Here's how to judge whether experimenters were successful: How reproducible are replicates? What fraction of the input barcodes were recovered?
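The two checks on this slide can be sketched in a few lines of Python. This is a minimal illustration, not taken from any of the papers discussed: the function names, the toy barcodes, and the toy activity values are all hypothetical, and Pearson correlation is just one common choice of reproducibility statistic.

```python
from math import sqrt

def fraction_recovered(input_barcodes, observed_barcodes):
    """Fraction of designed (input) barcodes seen at least once in the data."""
    designed = set(input_barcodes)
    return len(designed & set(observed_barcodes)) / len(designed)

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of measurements."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy data (hypothetical): five designed barcodes, two replicate measurements.
designed = ["AAAC", "AAGT", "ACCA", "AGTT", "CATG"]
rep1 = {"AAAC": 1.2, "AAGT": 0.4, "ACCA": 2.5, "AGTT": 0.9}
rep2 = {"AAAC": 1.0, "AAGT": 0.5, "ACCA": 2.9, "CATG": 0.3}

# Check 1: what fraction of the input barcodes were recovered in replicate 1?
recovered = fraction_recovered(designed, rep1)

# Check 2: how reproducible are replicates, over barcodes seen in both?
shared = sorted(set(rep1) & set(rep2))
r = pearson_r([rep1[b] for b in shared], [rep2[b] for b in shared])
```

Note that the correlation is computed only over barcodes present in both replicates, so a high r alongside a low recovery fraction can still indicate a poorly sampled library.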

7 Genome-Integrated MPRA 1) Clone into lentiviral or TE vector 2) Make virus & infect/cotransfect with transposase Maricque, et al., A genome-integrated massively parallel reporter assay reveals DNA sequence determinants of cis-regulatory activity in neural cells, Nucleic Acids Res Feb 28; 45(4): e16. PMC /

8 Genome-Integrated MPRA TRIP: Thousands of Reporters Integrated in Parallel Akhtar, et al., Chromatin position effects assayed by thousands of reporters integrated in parallel, Cell Aug 15;154(4):914-27

9 What is the effect of genome position on enhancer activity? Step 1: Make library with >20,000 barcodes, single enhancer, in lentiviral or transposable element vector. Step 2: Integrate into mouse ES cells, grow, map integration locations for each barcode. (How do you map?) Step 3: Perform MPRA - barcode RNA/barcode DNA

10 Map Barcode Integrations with Inverse PCR

11 MPRA Quality Check DNA source: Only two promoters, many synthetic barcodes Library size: 10^5 unique barcodes for each promoter Library construction: Genome integration, PiggyBac transposon Reporter readout: 1) Map genome location 2) RNA-seq on barcodes Mapped 17k and 10k integrations for each promoter

12 Reproducibility

13 Key Results Domains of chromatin effects on the Mb scale (median 1.23 Mb) 1000-fold expression range from SAME promoter! Akhtar, et al., Chromatin position effects assayed by thousands of reporters integrated in parallel, Cell Aug 15;154(4):

14 Domains correlated with Lamina Associated Domains

15 Take aways Genome-integrated MPRAs with transposable element/lentiviral vectors Genome position effects are HUGE: 1000-fold expression range from the same promoter Some correlation with 3D genome structure What about many CREs at many genome positions?

16 MPRA for RNA Stability Maternal mRNAs in zebrafish are rapidly degraded 3 hpf. What 3′ UTR sequences control RNA stability? Rabani, et al., A Massively Parallel Reporter Assay of 3′ UTR Sequences Identifies In Vivo Rules for mRNA Degradation, Molecular Cell Volume 68, Issue 6, 21 December 2017, Pages e5

17 MPRA for RNA Stability [Figure: reporter diagram with labels GFP, 3′ UTR (constant sequence, variable sequence, constant sequence), AAA]

18 Assay Take in vitro transcribed reporters and inject into embryos: Rabani, et al., A Massively Parallel Reporter Assay of 3′ UTR Sequences Identifies In Vivo Rules for mRNA Degradation, Molecular Cell Volume 68, Issue 6, 21 December 2017, Pages e5

19 MPRA Quality Check DNA source: Array synthesized Library size: 90k unique 3′ UTR sequences Library construction: In vitro transcription, then inject mRNA Reporter readout: 3′ UTR RNA-seq across time points Library input: recovered 95% of barcodes. Output: 39% of barcodes

20 Reproducibility INPUT: Recovered 95% of 90k sequences OUTPUT (post-injection): Recovered 39%, 34,809

21 Results Model decay with onset and decay rate parameters Classify reporters with model
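One way to picture a decay model with "onset" and "decay rate" parameters is a piecewise curve: the reporter level stays flat until the onset time, then decays exponentially. The sketch below assumes that form for illustration only; the exact functional form, fitting procedure, and classification criteria used in the paper are not given on the slide. The `classify` function and its `early_cutoff` threshold are hypothetical.

```python
from math import exp, log

def rna_level(t, x0, onset, rate):
    """Assumed piecewise model: constant at x0 until `onset`, then exponential decay."""
    if t <= onset:
        return x0
    return x0 * exp(-rate * (t - onset))

def half_life(rate):
    """Half-life implied by a first-order decay rate (same time units as 1/rate)."""
    return log(2) / rate

def classify(onset, early_cutoff=2.0):
    """Hypothetical classifier: label a reporter by when its decay begins.
    The cutoff value is an illustrative assumption, not from the source."""
    return "early-onset" if onset < early_cutoff else "late-onset"

# Example: a reporter that starts decaying at 2 h with rate 0.5 per hour.
before = rna_level(1.0, 100.0, 2.0, 0.5)                     # still at the starting level
at_half = rna_level(2.0 + half_life(0.5), 100.0, 2.0, 0.5)   # one half-life after onset
label = classify(2.0)
```

Separating onset from rate matters because two reporters can reach the same level at a late time point through different routes: an early, slow decay or a late, fast one.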

22 Take aways Reporter design ensures that assay measured mRNA decay and not transcription Directly test hypotheses about 3′ UTR sequences by keeping all else constant Used synthetic model to predict endogenous maternal mRNA decay. Model performance: r = 0.27

23 MPRA of miRNA Binding Sites What sequence features make up a functional miRNA binding site? Flow-sorting + barcode sequencing Genome integrated assay in K562 cells (yet again!) Rosenberg, et al., Unraveling the determinants of microRNA mediated regulation using a massively parallel reporter assay, Nature Communications vol 9, Article number: 529 (2018)

24 MPRA of miRNA Sites

25 MPRA Quality Check DNA source: Synthesized DNA (designed sequences) Library size: 14,151 3′ UTR variants Library construction: Genome integration with ZFNs, 1 integrant per cell Reporter readout: Flow sorting based on NeonGreen, then sequencing Library versus single reporter correlation = 0.98 Technical variance = 10.5% Recovered 92.1% of barcodes
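A flow-sorting readout turns into a per-variant expression estimate roughly as follows: cells are sorted into fluorescence bins, each bin is sequenced, and a variant's expression is summarized from how its reads distribute across bins. The sketch below is a generic sort-seq calculation under stated assumptions, not the paper's actual pipeline: the function name, the normalization by per-bin read depth and per-bin cell fraction, and the use of a weighted mean are all illustrative choices.

```python
def sort_seq_mean(reads_per_bin, bin_totals, bin_cell_fraction, bin_fluorescence):
    """Estimate a variant's mean expression from its reads across sorted bins.

    reads_per_bin     -- reads for this variant in each bin
    bin_totals        -- total reads sequenced from each bin (depth correction)
    bin_cell_fraction -- fraction of sorted cells that fell into each bin
    bin_fluorescence  -- representative fluorescence of each bin (often a log value)
    """
    # Weight of each bin = share of that bin's reads belonging to the variant,
    # scaled by how many cells the bin actually contained.
    weights = [
        (reads / total) * cell_frac
        for reads, total, cell_frac in zip(reads_per_bin, bin_totals, bin_cell_fraction)
    ]
    norm = sum(weights)
    if norm == 0:
        return None  # variant not observed in any bin
    return sum(w * f for w, f in zip(weights, bin_fluorescence)) / norm

# Toy example (hypothetical numbers): reads split evenly between bins 2 and 3.
estimate = sort_seq_mean(
    reads_per_bin=[0, 50, 50, 0],
    bin_totals=[1000, 1000, 1000, 1000],
    bin_cell_fraction=[0.25, 0.25, 0.25, 0.25],
    bin_fluorescence=[1.0, 2.0, 3.0, 4.0],
)
```

With one integrant per cell, as in this library, each sorted cell reports on exactly one variant, which is what makes the bin distribution interpretable per variant.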

26 Experimental Design Here we designed synthetic MREs [miRNA regulatory elements] for ten miRNAs highly expressed in K562 This design allows us to test sequences that resemble native MREs as well as MRE types more relevant for synthetic biology, such as bulged and perfect match MREs, since they are essentially never observed in humans. We used these MREs, along with control sequences, in four major mutagenesis schemes

27 30 s Primer on miRNAs Rosenberg, et al., Unraveling the determinants of microRNA mediated regulation using a massively parallel reporter assay, Nature Communications vol 9, Article number: 529 (2018)

28 Key Results

29 Take aways Flow sorting/sequencing useful for measuring impact on protein expression Deep exploration of sequence space useful for learning the rules or sequence features

30 Screening the Human Genome for Regulatory Sequences What are the core functional elements of promoters? SuRE: Like STARR-seq but different (assay sheared genomic DNA) Episomal assay (useful for finding chromatin-independent function) van Arensbergen, et al., Genome-wide mapping of autonomous promoter activity in human cells, Nat Biotechnol Feb; 35(2): PMC /

31 SuRE Design

32 Let's Recap STARR-seq Arnold, et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq, Science Mar 1;339(6123):1074-7

33 SuRE Readout

34 Proximal sequences, distal enhancers, local chromatin context, and 3D conformation of the genome may all contribute to promoter activity. There is currently no estimate of the relative importance of these factors One important perturbation strategy is to take sequence elements out of their native context, to separate regulatory activities that are intrinsic to the underlying sequence from those that are extrinsic to it.

35 MPRA Quality Check DNA source: Kb genomic fragments. Library size: 150,000,000 - HUGE library! (Fortunately K562 cells are easily cultured and efficiently transfected.) Library construction: Classic episomal barcoded MPRA, except 5′ barcode. Need extra sequencing step to link barcode with genomic fragment. Reporter readout: barcode sequencing 96% of genome with minimal 15-fold coverage (55-fold mean)
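Coverage figures like "96% of genome with minimal 15-fold coverage (55-fold mean)" come from counting, at each genomic position, how many library fragments overlap it. A minimal sketch of that calculation on a toy sequence follows; the function name and the toy intervals are hypothetical, and a real genome-scale analysis would use dedicated interval tools rather than a per-base Python loop.

```python
def coverage_summary(fragments, genome_length, min_depth):
    """Return (mean depth, fraction of positions with depth >= min_depth).

    fragments -- list of (start, end) half-open intervals on one sequence
    """
    # Difference array: +1 where a fragment starts, -1 just past where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in fragments:
        delta[start] += 1
        delta[end] -= 1

    depth = 0
    total_depth = 0
    positions_covered = 0
    for pos in range(genome_length):
        depth += delta[pos]          # running sum gives depth at this position
        total_depth += depth
        if depth >= min_depth:
            positions_covered += 1
    return total_depth / genome_length, positions_covered / genome_length

# Toy example: three overlapping fragments on a 10 bp "genome".
mean_depth, frac_at_2x = coverage_summary([(0, 6), (2, 8), (4, 10)], 10, min_depth=2)
```

Reporting both numbers matters: mean depth can look healthy while a sizeable fraction of the genome sits below the depth needed for a reliable measurement.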

36 Reproducibility

37 Key Results Called 55,453 peaks across genome

38 Key Results Genomic Pol II Activity A substantial part of promoter activity is reproduced by sequence elements <2kb from the TSS, that is, in the absence of distal enhancers, chromatin context, and 3D organization. SuRE Expression (Same lab did the genomic position effect paper)

39 Take aways SuRE is a hybrid assay: STARR-seq + synthetic barcodes. Some advantages to this. (What are they?) Tries to solve problem of screening large genomes for activity. Did they have right level of coverage for such a large library? ID regulatory function that acts independently of chromatin. TSS activity shows strong autonomous function.

40 Why Don't Transcription Factors Get Lost in Large Genomes? Transcription factors recognize short, degenerate sequence motifs Large genomes are packed with millions of these motifs Only a small fraction of motifs are ever bound (< 1%)
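A back-of-the-envelope calculation shows why short motifs occur so often by chance. The sketch below assumes equal base frequencies, independent positions, and exact matching to a single sequence, all of which are simplifications; the 3 Gb genome size and 6 bp motif length are illustrative values, not figures from the slide. Degenerate motifs match more sequences than one exact k-mer, so this estimate is, if anything, conservative.

```python
def expected_motif_sites(genome_bp, motif_length, both_strands=True):
    """Expected chance occurrences of one exact k-mer in a random genome.

    Assumes uniform base composition (each base has probability 0.25) and
    independence between positions. Palindromic motifs are double-counted
    when both strands are scanned.
    """
    match_probability = 0.25 ** motif_length
    start_positions = genome_bp - motif_length + 1
    strands = 2 if both_strands else 1
    return start_positions * strands * match_probability

# Illustrative values: a 3 Gb genome and a 6 bp exact motif.
expected = expected_motif_sites(3_000_000_000, 6)
```

Under these assumptions a single 6 bp sequence is expected well over a million times in a 3 Gb genome, which is consistent with the slide's point that motifs alone cannot explain where a TF binds.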

41 Why are so few TF motifs bound? Is accessibility the answer? (What drives accessibility then?) [Figure labels: TF, repressive, permissive]

42 Why are so few TF motifs bound? Is accessibility the answer? (What drives accessibility then?) [Figure labels: TF, repressive, permissive] What happens if you take these sites out of the genome and test on plasmids?

43 ChIP Peaks vs Genomically Unbound Motif Sites 1. Clone genomically bound and unbound motif occurrences into plasmid MPRA library 2. Measure TF binding to sites on plasmids 3. Measure CRE expression Grossman, et al. Systematic dissection of genomic features determining transcription factor binding and enhancer function. Proc Natl Acad Sci U S A Feb 14;114(7):E1291-E

44 MPRA Quality Check DNA source: Synthesize barcoded oligos (145 bp CREs). Library size: 3000 (750 bound, 750 unbound PPARγ sites + matching motif mutants) Library construction: Classic episomal barcoded MPRA Reporter readout: barcode sequencing from adipocytes Reproducibility: Binding r = 0.93, MPRA r = 0.96

45 Prediction 1: Are genomically unbound PPARγ motif occurrences bound on plasmids?

46 Prediction 1: Are genomically unbound PPARγ motif occurrences bound on plasmids? YES!

47 Prediction 2: Are genomically unbound PPARγ motif occurrences active in the MPRA?

48 Prediction 2: Are genomically unbound PPARγ motif occurrences active in the MPRA? No!

49 Take aways A TF motif can drive binding in the absence of repressive chromatin A motif by itself is not sufficient for CRE activity Activity in episomal MPRA correlates with binding state in the genome!

50 Basic MPRA Data Analysis 1. Match sequencing reads to barcodes. (Discard reads that don't match.) 2. Normalize barcode RNA (cDNA) by barcode plasmid DNA 3. Filter barcodes (or perhaps CREs) by read count 4. Calculate CRE means over all barcodes 5. Compare CREs - for example, one allele vs other allele
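The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: reads are assumed to be already trimmed to the barcode sequence, matching is exact (no error correction), counts are normalized to library size before taking the RNA/DNA ratio, and the `min_dna_reads` threshold, function names, and toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

def count_barcodes(reads, barcode_to_cre):
    """Step 1: count reads per known barcode, discarding reads that do not match."""
    counts = Counter()
    for read in reads:
        if read in barcode_to_cre:
            counts[read] += 1
    return counts

def cre_activity(rna_counts, dna_counts, barcode_to_cre, min_dna_reads=10):
    """Steps 2-4: normalize RNA by DNA, filter by read count, average per CRE."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    ratios_by_cre = defaultdict(list)
    for barcode, cre in barcode_to_cre.items():
        dna = dna_counts.get(barcode, 0)
        if dna < min_dna_reads:      # Step 3: drop poorly sampled barcodes
            continue
        rna_fraction = rna_counts.get(barcode, 0) / rna_total
        dna_fraction = dna / dna_total
        ratios_by_cre[cre].append(rna_fraction / dna_fraction)  # Step 2
    # Step 4: mean over all barcodes that passed the filter for each CRE
    return {cre: mean(ratios) for cre, ratios in ratios_by_cre.items()}

# Toy data (hypothetical): two alleles of one CRE, two barcodes each.
barcode_to_cre = {"AAAA": "ref", "CCCC": "ref", "GGGG": "alt", "TTTT": "alt"}
dna_reads = ["AAAA"] * 20 + ["CCCC"] * 20 + ["GGGG"] * 20 + ["TTTT"] * 5 + ["NNNN"] * 3
rna_reads = ["AAAA"] * 40 + ["CCCC"] * 20 + ["GGGG"] * 10 + ["TTTT"] * 10

dna_counts = count_barcodes(dna_reads, barcode_to_cre)
rna_counts = count_barcodes(rna_reads, barcode_to_cre)
activity = cre_activity(rna_counts, dna_counts, barcode_to_cre)

# Step 5: compare CREs, here one allele versus the other.
allele_ratio = activity["alt"] / activity["ref"]
```

In this toy run the `TTTT` barcode is filtered out for low DNA coverage, so the `alt` estimate rests on a single barcode; tracking how many barcodes support each CRE mean is a useful companion to the mean itself.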

51 More MPRA Refs A review: White MA, Understanding how cis-regulatory function is encoded in DNA sequence using massively parallel reporter assays and designed sequences. Genomics Sep;106(3): Alternative splicing: Rosenberg AB, et al., Learning the sequence determinants of alternative splicing from millions of random sequences. Cell Oct 22;163(3): Human accelerated regions, behavior, genetic variation: Doan RN, et al., Mutations in Human Accelerated Regions Disrupt Cognition and Social Behavior. Cell Oct 6;167(2): e12 Exhaustive mutagenesis of cis-regulatory sequences: Chaudhari and Cohen, Local sequence features that influence AP-1 cis-regulatory activity. Genome Res Feb;28(2): More miRNAs and RNA binding proteins: Cottrell K, et al. PTRE-seq reveals mechanism and interactions of RNA binding proteins and miRNAs. Nat Commun Jan 19;9(1): 30,