Important gene information


1 Sequences, domains and databases. How to gather information on a gene.
Jens Bohnekamp, Institute for Biochemistry
Important gene information: protein sequence, nucleotide sequence, gene structure, protein domains, expression, variations (SNPs), regulation, conservation, publications

2 Insulin is a hormone that is central to regulating energy and glucose metabolism in the body. Insulin causes cells in the liver, muscle, and fat tissue to take up glucose from the blood; liver and muscle store it as glycogen. Insulin also stops the use of fat as an energy source. When insulin is absent, body cells do not take up glucose and the body begins to use fat as an energy source, for example by transferring lipids from adipose tissue to the liver for mobilization.
General gene information
Source, GeneCards, NCBI, Ensembl

3 General information

4 General information: insulin

5 Official gene symbol
It is best to search with the official gene symbol.

6 Ontology, Annotation, Evidence, Source
Annotation evidence codes
Code: Meaning
IMP: inferred from mutant phenotype
IGI: inferred from genetic interaction
IPI: inferred from physical interaction
ISS: inferred from sequence or structural similarity
IDA: inferred from direct assay
IEP: inferred from expression pattern
IEA: inferred from electronic annotation
TAS: traceable author statement
NAS: non-traceable author statement
NR: not recorded
E: experimental evidence
P: predicted/computed
Nucleotide and protein sequence

7 The main features of the Reference Sequence (RefSeq) collection
non-redundancy
explicitly linked nucleotide and protein sequences
updates to reflect current knowledge of sequence data and biology
data validation and format consistency
distinct accession series (all accessions include an underscore '_' character)
ongoing curation by NCBI staff and collaborators, with reviewed records indicated
Nucleotide and protein sequence

8 RefSeq-Protein RefSeq-Nucleotide

9 How to find a protein sequence?
From a gene (e.g. insulin): Source, GeneCards, NCBI
From a nucleotide sequence (translation):
AGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCATCAAGCAGATCACTGTCCTTCTGCCATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCG
CTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGG
GGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGG
CAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGA
ACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCACACCCGCCGCCTCCTGCACCGAGAGAGATGGAATAAAGCCCTTGAACCAGCAAAA
Reading frames
1. TCTAAAATGGGTGAC: S K M G D
2. TCTAAAATGGGTGAC: L K W V T
3. TCTAAAATGGGTGAC: Stop N G Stop
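The three reading frames on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE`, `translate` and `three_frames` are illustrative and not part of any tool mentioned on the slides. Stop codons are printed as `*`, and only complete codons are translated, so frame 2 ends at `V` (the slide additionally shows `T` for the trailing partial codon `AC`).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna[offset:]) for offset in range(3)]


for number, peptide in enumerate(three_frames("TCTAAAATGGGTGAC"), start=1):
    print(number, peptide)
```

Only the forward strand is handled here; the three frames of the opposite strand would be obtained by translating the reverse complement in the same way.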

10 The first ATG in the nucleotide sequence is usually the start codon for the peptide!
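The first-ATG rule of thumb can be sketched as follows: find the first ATG, then translate codon by codon until a stop codon is reached. This is an illustrative sketch only; `first_orf` is a hypothetical helper name, the standard genetic code is assumed, and the short test sequence is taken from the region around the first ATG of the insulin sequence on the previous slide, with a stop codon appended for brevity.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def first_orf(mrna):
    """Return (0-based start index, peptide) for the first ATG, or None.

    Translation stops at the first in-frame stop codon or at the end
    of the sequence, whichever comes first.
    """
    mrna = mrna.upper()
    start = mrna.find("ATG")
    if start == -1:
        return None
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return start, "".join(peptide)


# Shortened example: 5' UTR fragment, start codon, five codons, stop codon.
print(first_orf("CTGCCATGGCCCTGTGGATGCGCTAGAAA"))
```

Because the rule holds only "usually", a predicted peptide should still be checked against the annotated protein record.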


13 Stop-codon Start-codon

14 Gene structure
On which chromosome is the gene located?
Are there exons and introns?
Where is the coding region?
Are there nearby genes?
Entrez Gene:


16 Gene structure
NCBI Entrez Gene: insulin (INS)
Protein domains
A protein domain is a part of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and can often be independently stable and folded. Domains range in length from about 25 to about 500 amino acids.
Pyruvate kinase, a protein with three domains
Examples: basic leucine zipper, zinc finger DNA-binding domain, EF hand, cadherin repeats, immunoglobulin-like domains

17 Protein domains NCBI: InterProScan: Pfam:

18 Protein domains


20 Protein domains
Expression
Which tissues or cells express the gene of interest?
Source: GeneCards: BioGPS:

21 SOURCE search: insulin expression

22 Expression

23 Expression

24 BLAST/BLAT
Questions:
1. What is my nucleotide or protein sequence?
2. Where exactly is it located in the genome?
Tools:
1. BLAST (NCBI):
2. BLAT (UCSC):
If you have no idea where your sequence comes from, use nucleotide BLAST.


31 Get 5' and 3' flanking sequences of the gene
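Once the genomic position of a gene is known, extracting the flanks is a matter of slicing. The sketch below is illustrative only: `flanking_sequences` is a hypothetical helper, coordinates are assumed to be 0-based and half-open, and the gene is assumed to lie on the plus strand of the given sequence. For a minus-strand gene the two flanks swap roles and must be reverse complemented.

```python
def flanking_sequences(genome, gene_start, gene_end, flank_length):
    """Return (5' flank, gene, 3' flank) for a plus-strand gene.

    gene_start and gene_end are 0-based, half-open coordinates.
    Flanks are truncated at the ends of the genome sequence.
    """
    if not 0 <= gene_start <= gene_end <= len(genome):
        raise ValueError("gene coordinates lie outside the sequence")
    five_prime = genome[max(0, gene_start - flank_length):gene_start]
    gene = genome[gene_start:gene_end]
    three_prime = genome[gene_end:gene_end + flank_length]
    return five_prime, gene, three_prime


# Toy sequence, not real genomic data.
print(flanking_sequences("AAAACCCCGGGGTTTT", 4, 12, 3))
```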


33 Restriction enzymes (restriction endonucleases)
enzymes that cut double-stranded or single-stranded DNA at specific recognition nucleotide sequences (restriction sites)
found in bacteria and archaea
thought to have evolved as a defense mechanism against invading viruses
inside a bacterial host, the restriction enzymes selectively cut up foreign DNA (restriction); host DNA is methylated by a modification enzyme (a methylase) to protect it from the restriction enzyme's activity
categorized into three general groups (Types I, II and III)
fundamental tools in genetic engineering
NEBcutter:
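What a restriction-site search does can be sketched in a few lines. The example below uses EcoRI (recognition site GAATTC, cut between G and A) because its site contains no ambiguous bases; it is chosen for illustration and is not the enzyme shown on the slides. The function names `find_sites` and `digest` are hypothetical, only the given strand of a linear molecule is scanned, and methylation is ignored.

```python
def find_sites(dna, site):
    """Return 0-based start positions of every occurrence of site in dna."""
    dna, site = dna.upper(), site.upper()
    positions = []
    index = dna.find(site)
    while index != -1:
        positions.append(index)
        index = dna.find(site, index + 1)  # allow overlapping matches
    return positions


def digest(dna, site, cut_offset):
    """Cut a linear molecule cut_offset bases into each site; return fragments."""
    dna = dna.upper()
    fragments = []
    previous = 0
    for position in find_sites(dna, site):
        cut = position + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


# Toy sequence with two EcoRI sites; EcoRI cuts after the first G (offset 1).
sequence = "AAGAATTCTTGAATTCGG"
print(find_sites(sequence, "GAATTC"))
print(digest(sequence, "GAATTC", 1))
```

Enzymes with degenerate recognition sites would need pattern matching (for example a regular expression per ambiguity code) instead of the plain string search used here.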


37 BanII cut
Primers for PCR (polymerase chain reaction)

38 PCR primer design
Primer3:

39 Use brackets to mark the sequence of interest: ...ag[tg.tt]cg


41 Potential antisense primer: reverse complement!
Potential sense primer

42 Potential antisense (AS) primer: reverse complement!
Potential sense (S) primer
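The reverse-complement step is easy to get wrong by hand, so a small sketch may help. The sense primer is read directly from the start of the target, while the antisense primer is the reverse complement of the end of the target. The function names and the fixed primer length are assumptions for illustration; real primer design (as done by Primer3) also weighs melting temperature, GC content and self-complementarity, none of which is modelled here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def naive_primer_pair(target, length):
    """Return (sense, antisense) primers taken from the two ends of target."""
    if length > len(target):
        raise ValueError("primer length exceeds target length")
    sense = target[:length]
    antisense = reverse_complement(target[-length:])
    return sense, antisense


# Short toy target; both primers are written 5' to 3'.
print(reverse_complement("ATGGCCCTGTGG"))
print(naive_primer_pair("ATGGCCCTGTGGATGCGC", 6))
```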

43 Single-nucleotide polymorphism (SNP)
A SNP is a substitution of a single nucleotide by another, where both versions are observed in the general population at a frequency greater than 1%. For example, two sequenced DNA fragments from different individuals, AAGCCTA and AAGCTTA, differ in a single nucleotide. In this case we say that there are two alleles: C and T. Almost all common SNPs have only two alleles.
Entrez Gene:
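The example from this slide can be checked with a position-by-position comparison. This is a minimal sketch that assumes the two fragments are already aligned and of equal length; `find_differences` is a hypothetical helper. It only reports mismatches between two sequences, and whether a mismatch counts as a SNP still depends on its population frequency.

```python
def find_differences(first, second):
    """Return (1-based position, base in first, base in second) per mismatch."""
    if len(first) != len(second):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(first.upper(), second.upper()), start=1)
        if a != b
    ]


# The two fragments from the slide differ at position 5 (C versus T).
print(find_differences("AAGCCTA", "AAGCTTA"))
```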
