Go to Bottom Left click WashU Epigenome Browser. Click

Now you are going to look at the Human Epigenome Browser. Its interface is more sophisticated, but also weirder, than that of the UCSC Genome Browser. All the data that you will view as tracks are, in reality, just files that look like this:

[Example: five lines, each beginning with chr21, followed by start, end, and value columns.]

The first column is the chromosome, the 2nd column is the start of the feature in question, the 3rd column is the end of the feature, and the 4th column is the value/magnitude for this feature. This example is tiny; the real files are big because they include every spot in the genome.

Go to the bottom left and click WashU Epigenome Browser. Click [screenshot]. This is human genome release 19 (hg19). This is the place where you can access other genomes if you wish (do it later). Then click [screenshot]. Now load some interesting tracks. Click Tracks and choose Public track hubs. You will see this. Please load the Roadmap and the Encyclopedia. These contain collections of tracks. Each track identifies features on the chromosome. Some tracks belong to multiple sets.
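The four-column layout described above (chromosome, start, end, value) is the same layout the UCSC Genome Browser calls bedGraph. Here is a minimal Python sketch of reading such lines and looking up the value at one position. It assumes BED-style coordinates (0-based start, end not included). The rows are made up for illustration and are not real data.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive (assumed BED-style convention)
    end: int      # exclusive
    value: float  # signal value for this stretch of the genome


def parse_track(text: str) -> List[Feature]:
    """Parse whitespace-separated chrom/start/end/value lines into Features."""
    features = []
    for line in text.splitlines():
        fields = line.split()
        # skip blank lines, comments, and browser/track header lines
        if len(fields) < 4 or fields[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = fields[:4]
        features.append(Feature(chrom, int(start), int(end), float(value)))
    return features


def value_at(features: List[Feature], chrom: str, pos: int) -> Optional[float]:
    """Return the value of the first feature covering pos, or None if none does."""
    for f in features:
        if f.chrom == chrom and f.start <= pos < f.end:
            return f.value
    return None


# Hypothetical rows for illustration only; these are not real measurements.
example = """chr21 100 110 0.5
chr21 110 120 0.8
chr21 120 130 0.0"""

track = parse_track(example)
print(value_at(track, "chr21", 115))  # 0.8
print(value_at(track, "chr21", 500))  # None
```

A linear scan is fine for a toy example. Genome-wide files are far too big for this, which is why browsers read indexed versions of them.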

Click See tracks. Now you should see this: Every plus expands the list. There are many different types of cancer cells and many different types of expression measures. Click a plus and wait for the list to expand to see what is available. Green numbers are the available choices. Red numbers are the ones that have been selected.

When you are ready, click Remove all to clear the choice list. You are going to select tracks that will allow us to compare an embryonic stem cell line to a differentiated tissue. We will use the publicly available data.

Click the + next to ES/iPS Cells. This reveals a choice called Embryonic Stem Cells. Click the + next to it. Now you see a list of different embryonic stem cell lines. In the horizontal row, click the + below Epigenetic Marks. Then click the + below DNA Methylation. Now find the column that says "Bisulfite-Seq" and the row that says H1. The cell at the intersection of these two says 0/9. These are 9 experiments in which the entire genome of the H1 embryonic stem cell line was treated with bisulfite and sequenced. The reads were then compared to the sequence of the genome (without bisulfite treatment) to determine the positions of the methylated C's. Click the green 9 and choose the top entry. Be sure to click the Add 1 track button. Click outside of this box to close it.

In row H1, now click the "5" in the DNase hypersensitivity column. Add the first entry for "DNA hypersensitivity of H1."

Now let's load some information for an adult tissue. Close the ES/iPS row by clicking the minus. Open the Adult Cells/Tissue row by clicking the + next to it. Find the entry called Brain and click the + next to it to open the row. Now add the Bisulfite Sequencing track for Brain Germinal Matrix (which brain tissue is not critical; I just picked one). Scroll to the top and click the X to close this window. It should look like this:
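The comparison step works because of how bisulfite acts on DNA. Sodium bisulfite converts unmethylated C to uracil, which is read as T after amplification. Methylated C is protected and still reads as C.

Here is a minimal sketch of that comparison for one read. It assumes the read comes from the top strand and aligns to the reference without gaps at a known offset. Real pipelines also handle the other strand, alignment, and sequencing errors. The sequences are invented.

```python
from typing import Dict


def call_methylation(reference: str, read: str, offset: int = 0) -> Dict[int, bool]:
    """Call each covered reference C as methylated (True) or unmethylated (False).

    Assumes the read comes from the top strand after bisulfite conversion and
    aligns to the reference without gaps, starting at 0-based position `offset`.
    """
    calls = {}
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if pos >= len(reference) or reference[pos].upper() != "C":
            continue  # only positions that are C in the reference are informative
        if base == "C":
            calls[pos] = True   # C survived bisulfite treatment: methylated
        elif base == "T":
            calls[pos] = False  # C was converted and read as T: unmethylated
        # any other base is a mismatch (e.g. a sequencing error) and is skipped
    return calls


# Invented sequences: both CpG C's stay C, the two non-CpG C's read as T.
ref = "ACGTTCGACCA"
read = "ACGTTCGATTA"
print(call_methylation(ref, read))  # {1: True, 5: True, 8: False, 9: False}
```

Because each red tick in a methylation track is one such call, the calls from many reads are combined per position.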

Change the order of the rows by dragging the NAME up or down. Make it look like this. All of those red ticks in the first two rows are methylated C's. Row 1 has the methylated C's in our stem cell line. Row 2 has the methylated C's in the brain tissue. Row 3 shows DNase hypersensitive sites in the stem cell chromatin (discussed below).

What is the RepeatMasker Ensemble line? From the RepeatMasker website: "RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns)."

The RefSeq Genes track shows transcription units. Things overlap because of alternative splicing, alternative promoters, and alternative poly(A) sites.

Right-click on the RepeatMasker Ensemble label and choose configure from the popup menu. Now you see what the colors mean. They represent different types of repetitive sequences. The first four entries are all transposon sequences or sequences derived from transposons. Some of the ones below are transposons as well.
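The "masked" output in the RepeatMasker description simply means that each annotated repeat is replaced by N's. Here is a minimal sketch of that masking step only. It assumes you already have the repeat annotations as 0-based, end-exclusive (start, end) pairs. Finding the repeats, which is RepeatMasker's real job, is not shown. The `mask_char` parameter is a hypothetical name for illustration.

```python
from typing import Iterable, Tuple


def mask_repeats(seq: str, repeats: Iterable[Tuple[int, int]],
                 mask_char: str = "N") -> str:
    """Return seq with every annotated repeat interval replaced by mask_char."""
    bases = list(seq)
    for start, end in repeats:
        # clamp to the sequence so out-of-range annotations do not raise
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = mask_char
    return "".join(bases)


print(mask_repeats("ACGTACGTACGT", [(2, 5), (9, 20)]))  # ACNNNCGTANNN
```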

Let's add another track. Click Tracks. Choose Public Track Hubs. Click "See tracks" in the first entry. Open ES/iPS Cells (click the +). Open Embryonic Stem Cells. Open Transcription Regulator. Click this plus. In the H1 row, choose SP1. Select the top experiment and click Add 1 Track. This is a ChIP (chromatin immunoprecipitation) experiment in which the antibody was specific for the transcription factor SP1. Click the little "i" to read about it.

Why did we choose SP1? Because it is one of the transcription factors that bind GC boxes, so it can bind within CpG islands. Close the window by clicking the X. The new line is called HudsonAlpha-blah, blah, blah, SP1. Close the window by clicking the X. Now let's add another track. Click Tracks, then click Annotation tracks. Then click G/C related on the left and CpG Island on the right.
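To see why an SP1 track tends to line up with GC-rich DNA, you can search a sequence for GC boxes directly. This sketch assumes the commonly cited GC-box core GGGCGG. It checks the other strand by also searching for the reverse complement, CCGCCC.

Real SP1 specificity is better described by a weighted motif model, so treat this exact-match search as an illustration only. The test sequence is invented.

```python
import re
from typing import List, Tuple

GC_BOX = "GGGCGG"  # assumed core consensus; a simplification of real SP1 binding


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_gc_boxes(seq: str, motif: str = GC_BOX) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every exact motif match on either strand."""
    seq = seq.upper()
    hits = []
    # a zero-width lookahead lets overlapping matches be reported too
    for m in re.finditer(f"(?={motif})", seq):
        hits.append((m.start(), "+"))
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting a palindromic motif
        for m in re.finditer(f"(?={rc})", seq):
            hits.append((m.start(), "-"))
    return sorted(hits)


print(find_gc_boxes("ATGGGCGGTACCGCCCAT"))  # [(2, '+'), (10, '-')]
```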

A track of regions "computationally" predicted to be CpG islands is shown. Right-click on the name of the track and change it to Density. Drag the CpG Island track up and place it just above the RepeatMasker Ensemble track.

How to save your work. Now would be a good time to save your work. Click Apps and then click Session. Click Save. Click Link.
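One classic way to predict CpG islands computationally is the Gardiner-Garden and Frommer rule. A CpG island is a stretch of at least 200 bp that meets two conditions:

- GC content above 50%.
- An observed/expected CpG ratio above 0.6, where expected = (number of C × number of G) / length.

The sketch below applies those thresholds to sliding windows. It is not necessarily the method behind this browser's CpG Island track, which may use a refined rule. The window size, step, and test sequences are assumptions.

```python
from typing import Iterator, Tuple


def cpg_stats(window: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a sequence window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = sum(1 for i in range(n - 1) if w[i:i + 2] == "CG")
    gc_fraction = (c + g) / n if n else 0.0
    # expected CpG count if C's and G's were placed independently: c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island_window(window: str, min_len: int = 200,
                         min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden & Frommer style thresholds to one window."""
    if len(window) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(window)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


def scan(seq: str, window: int = 200, step: int = 50) -> Iterator[int]:
    """Yield 0-based starts of windows that pass the thresholds (hypothetical defaults)."""
    for start in range(0, len(seq) - window + 1, step):
        if is_cpg_island_window(seq[start:start + window], min_len=window):
            yield start


island_like = "CG" * 100         # 200 bp, all G/C, many CpGs (invented)
background = "ATGCATTA" * 25     # 200 bp, AT-rich, no CpG (invented)
print(is_cpg_island_window(island_like))  # True
print(is_cpg_island_window(background))   # False
```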

Copy (don't click) the http address into a different browser window. Then bookmark it. The name "Rita the stealth" is just a funny default name that the programmers chose. You can also download a file that can be uploaded later to restore your session. The session ID can be used too. I just use the bookmark. Click the "X" to close the window.

Now let's look around a bit. Your coordinates should be: You can click the blue coordinate box and type them in if necessary. What do you see? Scroll around the genome, and use the +, -1/3, -1, -5 controls to zoom. Use the < or > to move, or just drag left or right. Zoom way out. Zoom way in.

What is the DNase hypersensitivity sites track? To determine this, chromatin was taken and exposed to DNase I, which cuts DNA anywhere it can reach. Some areas are cut really easily; these are the hypersensitive sites. They get a tick. Some are not. They don't get a tick. Think about it.

Answer these questions. While you are doing this, remember that the genome is a place of degrees, not absolutes. The Hox regions (chr7: neighborhood) do not have all of the answers. Look here and then scroll around and zoom a bit.
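The "tick or no tick" idea for DNase hypersensitivity can be sketched as a threshold on how often DNase I cut each stretch of DNA. Here, adjacent bins that pass the threshold are merged into one site. The 100-bp bins, the cut counts, and the threshold of 20 are all hypothetical. Real hypersensitive-site calls rely on more careful statistics than a fixed cutoff.

```python
from typing import List, Tuple


def hypersensitive_bins(cut_counts: List[int], bin_size: int,
                        threshold: int) -> List[Tuple[int, int]]:
    """Return (start, end) of bins whose DNase I cut count reaches threshold,
    merging adjacent passing bins into one site (0-based, end-exclusive)."""
    sites: List[Tuple[int, int]] = []
    for i, count in enumerate(cut_counts):
        if count < threshold:
            continue  # cut rarely: no tick
        start, end = i * bin_size, (i + 1) * bin_size
        if sites and sites[-1][1] == start:
            sites[-1] = (sites[-1][0], end)  # extend the previous site
        else:
            sites.append((start, end))
    return sites


# Hypothetical counts and threshold, for illustration only.
counts = [2, 3, 40, 55, 4, 1, 38, 2]
print(hypersensitive_bins(counts, bin_size=100, threshold=20))  # [(200, 400), (600, 700)]
```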

1. Do DNase hypersensitive sites tend to overlap with highly methylated DNA? Why do you think that this is so?
2. Do DNase hypersensitive sites tend to overlap with transcription start sites? Why do you think that this is so?
3. SP1 is one of the proteins that can bind GC-rich regions. SP1 then binds TFIID. What is the relationship between SP1 binding and the other tracks?
4. Now consider the CpG Island (computed) track. How does it relate to SP1, hypersensitivity, C methylation, AND the beginning of genes?
5. Is the methylation pattern of the stem cell line and the brain tissue similar? Why or why not?

Time to add more tracks. This time you will do it on your own. If you are starting from the bookmark, it sometimes helps to open a new window or restart the browser if something unexpected happens. Go to the Public track hubs. Load the Roadmap and Encyclopedia. Reloading them will add the 10 default tracks. From Public track hubs, click See tracks and turn off the 10 tracks found in ES/iPS Cells / Derived Cells (choose it and click Remove all, as shown below). For the tissue ES/iPS Cells / Embryonic Stem Cells / H1, I want you to add a track for the histone mark histone H3 lysine 4 trimethylation (H3K4me3). Just add one of them.
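For questions 1 to 4, you can put a number on "tend to overlap" rather than eyeballing the tracks. One option is to count what fraction of one track's intervals overlap at least one interval in another track.

This minimal sketch works on a single chromosome with 0-based, end-exclusive coordinates. It merges the second track into disjoint intervals and uses binary search. The intervals are made up. Tools such as bedtools intersect do this kind of comparison genome-wide.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end-exclusive


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping intervals so the result is disjoint."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps_any(query: Interval, merged: List[Interval], ends: List[int]) -> bool:
    """True if query overlaps any interval in the sorted, disjoint list merged."""
    # first merged interval whose end lies past the query start
    i = bisect_left(ends, query[0] + 1)
    return i < len(merged) and merged[i][0] < query[1]


def fraction_overlapping(a: List[Interval], b: List[Interval]) -> float:
    """Fraction of intervals in a that overlap at least one interval in b."""
    if not a:
        return 0.0
    merged = merge(b)
    ends = [end for _, end in merged]
    hits = sum(overlaps_any(q, merged, ends) for q in a)
    return hits / len(a)


# Made-up DNase sites and 1-bp transcription start sites on one chromosome.
dnase = [(100, 200), (500, 600), (900, 950), (1200, 1300)]
tss = [(150, 151), (580, 581), (1250, 1251), (2000, 2001)]
print(fraction_overlapping(dnase, tss))  # 0.75
```

A fraction on its own is hard to interpret. Compare it with the fraction you get for intervals of the same sizes placed elsewhere in the same region.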

Then add H3K4me1 for the H1 cell line. Now add H3K4me3 and H3K4me1 for the Brain tissue. Organize things so that it looks like this. Now we are going to look at a group of genes all at once. Click Apps.

Then click Gene & region set. Click add new set. Delete whatever is in the box on the left and copy and paste these in:

SP1 Frat2 GPD1L PRR13

Name it and submit it. Click it to open it again and choose view and edit. Click this button. Adjust everything so that you have 2000 nucleotides upstream and 2000 nucleotides downstream, CENTERED on the transcription start site. Later, try making this bigger or smaller. Click "done" in the lower right corner to finish. Click the name of the gene set again, but this time choose Gene set view.
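"Upstream" and "downstream" depend on which strand a gene is on. With equal 2000-bp flanks the window comes out the same either way, but once you make the flanks unequal, strand matters.

This minimal sketch assumes 0-based coordinates, an end-exclusive window, and a window that includes the TSS base itself. The browser's own convention is not stated here. The TSS positions are invented.

```python
from typing import Tuple


def tss_window(tss: int, strand: str, upstream: int = 2000,
               downstream: int = 2000) -> Tuple[int, int]:
    """Return a (start, end) window around a TSS, 0-based and end-exclusive.

    The TSS base itself is included, so the window is upstream + downstream + 1
    bases long (unless clipped at the chromosome start). On the + strand,
    upstream means lower coordinates; on the - strand the gene is read right
    to left, so upstream means higher coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream + 1
    elif strand == "-":
        start, end = tss - downstream, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end  # clip so the window never starts before base 0


print(tss_window(10_000, "+"))                                 # (8000, 12001)
print(tss_window(10_000, "-", upstream=500, downstream=1000))  # (9000, 10501)
```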

Notice that for the histone methylation marks, the stem cell line does not look like the brain tissue. Answer these.
6. For the brain tissue, is me1 or me3 associated with +1?
7. Notice that H3K4me3 often has a gap right at +1. Why?
8. How do you turn off the Gene Set View?

Make another gene set that is a superset of this one. Here is the list. Add some genes that interest you. Explore.

SP1 Frat2 GPD1L PRR13 slo CREB1 ACTB

Take a break, BUT FIRST SAVE YOUR SESSION: Apps / Session / click Save / Link button / copy out the web address.

ADD A TRACK ANOTHER WAY. Click the Tracks button. See the search box? Type CBP in it. CBP/p300 is a histone acetyltransferase that is involved in activating many genes. Choose Broad ChipSeq Osteobl P300. Broad is the name of a place, ChipSeq is the technique, K562 is a tissue culture line, and P300 is the enzyme. The data are chromatin immunoprecipitation data that show you where CBP binds, at least in the K562 line. The other entries are essentially CBPs that have different names. Look around. p300 binding is often associated with activating gene expression.
9. Tell me something else that you notice by snooping around.
Take a look at Gene plot and Scatter plot. What do they do? Don't answer this yet. Just take a look.