Experiment Guide of Bioinformatics


1 Experiment Guide of Bioinformatics -- College of Agriculture, bioinformatics teaching lab course. Lingfeng Mao (毛凌峰). Advisor: Longjiang Fan (樊龙江). Date:

2 Introduction of IBI

3 Introduction of Bioinplant

4 Introduction of Bioinplant: courseware download

5

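6 Part 1
Case 1: How to build a phylogenetic tree
1. Batch-download the sequences
2. Multiple sequence alignment in MEGA
3. Tree construction in MEGA
4. Tree editing in MEGA, FigTree, and iTOL
Case 2: Given a batch of unknown sequences, how can their annotations be obtained in bulk?
1. Web-based functional annotation
2. Functional annotation with Windows software
3. Local functional annotation on Linux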

7 Part 1 Case 1, step 1: Batch-download the sequences. 16 sequence IDs: YP_ AAS ABP CAA CAA AAA YP_ YP_ NP_ AAZ ABI AFS NP_ YP_ YP_ YP_
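One way to script the batch download, rather than fetching each record by hand, is the NCBI E-utilities `efetch` endpoint. The sketch below is a minimal Python example, not the procedure shown on the slide. It assumes the 16 IDs are protein accessions (hence `db="protein"`). The accessions in the usage lines are hypothetical placeholders, because the real IDs are truncated in this transcript.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for fetching records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="protein"):
    """Build one efetch URL that requests all accessions as FASTA text."""
    params = {
        "db": db,                     # assumption: protein accessions
        "id": ",".join(accessions),   # comma-separated list of IDs
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accessions, db="protein"):
    """Download the records (needs network access) and return FASTA text."""
    with urlopen(build_efetch_url(accessions, db)) as response:
        return response.read().decode("utf-8")


# Hypothetical placeholder IDs; replace them with the 16 real accessions.
example_ids = ["ACC_0001", "ACC_0002"]
example_url = build_efetch_url(example_ids)
```

Calling `fetch_fasta(example_ids)` with real accessions would return one multi-sequence FASTA string that can be saved to a file and opened in MEGA.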

8 Part 1 Case 1, step 2: Multiple sequence alignment in MEGA. Kumar et al., Mol. Biol. Evol. (2018).

9 Part 1

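10 Part 1 Case 1, step 3: Tree construction in MEGA
Why build a phylogenetic tree:
1. For an unknown gene or protein sequence, identify its most closely related species
2. Predict the function of a newly discovered gene or protein
3. Help predict how a molecule's function is likely to evolve
4. Trace the origin of a gene
Methods for building a phylogenetic tree:
1. Distance-based methods: UPGMA (unweighted pair group method with arithmetic mean); neighbor joining (NJ)
2. Character-based methods: maximum parsimony (MP); maximum likelihood (ML)
3. Bayesian inference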
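To make the distance-based idea concrete, here is a small UPGMA sketch in Python. It is an illustration only, not MEGA's implementation. It repeatedly merges the two closest clusters, averages distances weighted by cluster size, and writes the result as a Newick string. The four taxa and their distances are made-up toy values.

```python
def upgma(names, matrix):
    """Cluster taxa with UPGMA and return the tree as a Newick string.

    names  : list of taxon labels
    matrix : symmetric distance matrix (list of lists) in the same order
    """
    # cluster id -> (newick text, number of leaves, height of the node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # pairwise distances, keyed as (smaller id, larger id)
    dist = {(j, i): float(matrix[i][j])
            for i in range(len(names)) for j in range(i)}
    next_id = len(names)

    def get(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)          # closest pair of clusters
        height = dist[(a, b)] / 2.0             # new node sits halfway
        (text_a, size_a, h_a) = clusters.pop(a)
        (text_b, size_b, h_b) = clusters.pop(b)
        merged = "(%s:%.2f,%s:%.2f)" % (text_a, height - h_a,
                                        text_b, height - h_b)
        # size-weighted average distance from the new cluster to the others
        new_dist = {(c, next_id): (size_a * get(a, c) + size_b * get(b, c))
                                  / (size_a + size_b)
                    for c in clusters}
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1

    (newick, _, _), = clusters.values()
    return newick + ";"


# Toy example with made-up distances.
toy_names = ["A", "B", "C", "D"]
toy_matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
toy_tree = upgma(toy_names, toy_matrix)
# toy_tree == "(D:5.00,(C:3.00,(A:1.00,B:1.00):2.00):2.00);"
```

UPGMA assumes a constant rate of evolution across lineages, which is one reason neighbor joining or the character-based methods are often preferred for real data.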

11 Part 1

12 Part1 Yang, Z. & Rannala, B. Molecular phylogenetics: Principles and practice. Nat. Rev. Genet. 13, (2012).

13 Part1

14 Part1

15 Part 1 Case 1, step 4: Tree editing in MEGA, FigTree, and iTOL. Tree file format: NWK (Newick)
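A Newick (NWK) file is plain text: nested parentheses give the branching order, `name:length` gives a leaf and its branch length, and a semicolon ends the tree. The Python sketch below pulls the leaf names out of such a string with a regular expression, which is a quick way to check a tree file before loading it into FigTree or iTOL. The tree string is a made-up example, and the pattern is a simplification that ignores quoted labels and single-leaf trees.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels follow ")".
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick):
    """Return the leaf labels of a Newick string, in order of appearance."""
    return LEAF_PATTERN.findall(newick)


# Made-up example tree with branch lengths.
example_newick = "(D:5.00,(C:3.00,(A:1.00,B:1.00):2.00):2.00);"
example_leaves = leaf_names(example_newick)   # ['D', 'C', 'A', 'B']
```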

16 Part 1 FigTree: Rambaut, A. FigTree. See http://tree.bio.ed.ac.uk/software/figtree (2007).

17 Part 1 iTOL: Letunic, I. & Bork, P. Nucleic Acids Res. 44, W242-W245 (2016); Cheng, F. et al. Nat. Genet. 48, (2016).

18 Part 1 Case 2: Given a batch of unknown sequences, how can their annotations be obtained in bulk?

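19 Part 1 Case 2, step 1: Web-based functional annotation. Plant sequence database: upload a multi-sequence FASTA file, choose the program and search type, choose the species database, and set the number of best hits to report. Goodstein, D. M. et al. Nucleic Acids Res. 40, D1178-D1186 (2011).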

20 Part1

21 Part 1 Online protein function analysis: HMMER

22 Part1

23 Part 1 Case 2, step 2: Functional annotation with Windows software. Conesa, A. et al. Bioinformatics 21, (2005).

24 Part 1 Blast2GO output

25 Part 1 Case 2, step 3: Local functional annotation on Linux. Jones, P. et al. Bioinformatics 30, (2014).

26 Part 2 #1 (1) How many entries from rice (Oryza sativa) are in the public DNA databases (such as GenBank)? (2) How many Waxy (granule-bound starch synthase) gene sequences from rice are in the database?
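Both counts can also be obtained programmatically. The Python sketch below uses the NCBI E-utilities `esearch` endpoint with `rettype=count`. It is a minimal example under stated assumptions: the nucleotide database is addressed as `nuccore`, and the query strings (`Oryza sativa[Organism]`, and the same term combined with `waxy[Gene Name]`) are one plausible way to phrase the two questions, not the only one. The XML in the last lines is a hand-written stand-in for a server reply, and its number is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for searching a database.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term, db="nuccore"):
    """Build an esearch URL that asks only for the number of matches."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term,
                                      "rettype": "count"})


def parse_count(xml_text):
    """Extract the integer inside <Count> from an esearch XML reply."""
    return int(ET.fromstring(xml_text).findtext("Count"))


def count_records(term, db="nuccore"):
    """Run the query (needs network access) and return the hit count."""
    with urlopen(build_count_url(term, db)) as response:
        return parse_count(response.read())


# Assumed query strings for the two questions.
rice_term = "Oryza sativa[Organism]"
waxy_term = "Oryza sativa[Organism] AND waxy[Gene Name]"

# Hand-written stand-in for a server reply; the number is made up.
fake_reply = "<eSearchResult><Count>42</Count></eSearchResult>"
fake_count = parse_count(fake_reply)
```

The numbers returned change over time as the database grows, so an answer should state the date of the query.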

27 Part 2

28 Part 2

29 Part 2

30 Part 2 #2 Please find the best hit(s) for an unknown transcript sequence in the public databases and predict its potential function.
>an unknown sequence
CCTCGGAGATCTTCATGGGGGGCAAGAGCACCATCGTGCTGCACAACACCTGCGAGGAC
TCGCTCCTCGCTGCACCCATCATTCTTGATCTGGTGCTCCTGGCGGAGCTCAGCACCAGG
ATTCAGCTGAAGGCCGAGGGAGAGGTAAGAGTCTGACGAGATATGTTGCTAGTCTACTCT
GTAGTCGAGATATACTTTGGGAGCCAAACTGAAGATTTCGCTGCTCCACTTGCATTTGTGC
AGGACAAGTTCCATTCCTTCCATCCGGTTGCCACCATCCTGAGCTACCTCACCAAGGCAC
CCCTGGTAAGAAACAATTCTCGACTGTTTGCTCTAAATAACCTATAGATAAATAAAGACGATT
AACTGACGTGCCACTGAATTCCTCTGTTAACAGGTTCCTCCTGGCACGCCGGTGGTGAAC
GCCCTGGCGAAGCAAAGGGCGATGCTGGAGAACATCATGAGGGCGTGTGTCGGCCTGG
CGCCCGAAAACAACATGATCCTGGAGTACAAGTGAGGAGCGTGGCCCAAGCTCGCGGAG
CCGAGAGCGACCGTACGTACGTAGCAAGTGGCGAGGGGCGACGGGAGGGCAGGACGAA
GAAGAAGGCGAGATCGGCTGTGGAATTATTTGGCGGCTTGTCTTTAGTTTCCTTTGCGAAT
CTTTCCCTGGTTAAGTTTACCCCAGTGAGTGTGTGTCCTTGCGAGAAAAG

31 #2 NCBI BLAST; EMBL BLAST:

32 #2 NCBI BLAST result

33 #2 EMBL BLAST result:
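When the search results are saved as a tab-separated hit table (the 12-column layout that BLAST+ writes with `-outfmt 6`), the best hit per query can be picked out with a few lines of Python. This is a sketch under that assumption about the file format. The two rows in the example are invented values, not the real result for the sequence above.

```python
def best_hits(tabular_text):
    """Return the highest-bitscore hit for each query in a BLAST hit table.

    Expects 12 tab-separated columns per line: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        hit = {
            "subject": subject,
            "identity": float(fields[2]),
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        }
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


# Invented rows for illustration only.
example_table = (
    "query1\tsubjB\t90.0\t180\t18\t0\t1\t180\t5\t184\t1e-60\t230\n"
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t365\n"
)
example_best = best_hits(example_table)   # best hit for query1 is subjA
```

The description line of the best-scoring subject is then the starting point for predicting the function of the unknown transcript.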

34 Part 2 #3 Use the dynamic programming method (the Needleman-Wunsch algorithm) to perform a global alignment of the sequences P1 = AGWGAHEA and P2 = PAWHEAEAG. Scoring system: BLOSUM50 scoring matrix with gap penalty 8.
BLOSUM50 (partial):
     A   E   G   H   P   W
A    5  -1   0  -2  -1  -3
E        6  -3   0  -1  -3
G            8  -2  -2  -3
H               10  -2  -3
P                   10  -4
W                       15

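35 Part 2 #3 (continued): BLOSUM50 (partial) as on slide 34. Dynamic programming matrix initialized with gap penalty 8 (columns: P1; rows: P2); the remaining cells are to be filled in:
          A    G    W    G    A    H    E    A
     0   -8  -16  -24  -32  -40  -48  -56  -64
P   -8
A  -16
W  -24
H  -32
E  -40
A  -48
E  -56
A  -64
G  -72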
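The fill-in and traceback for this exercise can be checked with a short Python sketch of Needleman-Wunsch. It assumes a linear gap penalty of 8 per gap position, which matches the -8, -16, ... initialization of the matrix. The substitution scores are the standard BLOSUM50 values for the six residues involved, typed in by hand here, so verify them against a published matrix before relying on the result. When several alignments tie for the best score, the traceback returns just one of them.

```python
GAP = 8                      # linear gap penalty (assumption: per position)
RESIDUES = "AEGHPW"
# Upper triangle of BLOSUM50 restricted to A, E, G, H, P, W (hand-typed).
_UPPER = [
    [5, -1, 0, -2, -1, -3],   # A
    [6, -3, 0, -1, -3],       # E
    [8, -2, -2, -3],          # G
    [10, -2, -3],             # H
    [10, -4],                 # P
    [15],                     # W
]
SCORE = {}
for i, a in enumerate(RESIDUES):
    for k, b in enumerate(RESIDUES[i:]):
        SCORE[a, b] = SCORE[b, a] = _UPPER[i][k]


def needleman_wunsch(x, y, gap=GAP):
    """Global alignment; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i                     # leading gaps in y
    for j in range(1, m + 1):
        F[0][j] = -gap * j                     # leading gaps in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + SCORE[x[i - 1], y[j - 1]],
                          F[i - 1][j] - gap,   # gap in y
                          F[i][j - 1] - gap)   # gap in x
    # Traceback from the bottom-right corner to the origin.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and F[i][j] == F[i - 1][j - 1] + SCORE[x[i - 1], y[j - 1]]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, aligned_p1, aligned_p2 = needleman_wunsch("AGWGAHEA", "PAWHEAEAG")
```

Printing `score`, `aligned_p1`, and `aligned_p2` gives the optimal score and one optimal alignment to compare with the hand-filled matrix.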

36 Part 2 #4 Please annotate a genomic sequence from paddy weed, E. crus-galli(

37 Part 2

38 Part 2 Ref: S. bicolor Score: Ref: S. italica Score: 968.5

39 Part 2 #5 Perform a multiple alignment of the protein sequences of plant disease resistance genes (download) and find their potential consensus sequences. otein.txt Note: convert the file to FASTA format first! Online alignment:
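Once the sequences are aligned, a consensus can be read off column by column. The Python sketch below takes the most frequent residue in each column and keeps it only if its share exceeds a threshold. The 50% default, the `x` placeholder for weakly conserved columns, and the three toy sequences are illustrative choices, not values from the course material.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, gap="-", unknown="x"):
    """Majority-rule consensus of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*aligned):               # walk the alignment columns
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(column) > threshold:
            result.append(residue)             # conserved enough: keep it
        else:
            result.append(unknown)             # weakly conserved or gap
    return "".join(result)


# Made-up toy alignment for illustration.
toy_alignment = ["GLPGSGKT", "GMPGSGKT", "GLGGAGKT"]
toy_majority = consensus(toy_alignment)            # "GLPGSGKT"
toy_strict = consensus(toy_alignment, 0.9)         # "GxxGxGKT"
```

A stricter threshold exposes only the fully conserved positions, which is usually what a motif search is after.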

40 Part 2

41 Part 2 MAFFT:

42 Part 2 GeneDoc:

43 Part 2

44 Part 2 #6 Please write a mini review on one of the following two topics:
(1) Progress in plant genomics in 2018
(2) Applications of bioinformatics tools in genomic studies in 2018
*Choose one of the two topics. List the references cited in your review at the end of the paper, for example: Jarvis D, Ho YS, Lightfoot D, Schmöckel SM, Li B, Borm T, Ohyanagi H, Mineta K, Michell C and Saber N. The genome of xxx. Nature 542: (2018)
**The review should have two pages of text.

45 Part 2

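46 Part 2
- Has the basic format, but no ideas of the author's own; reads like a running log.
- Does not explain how the bioinformatics methods are used.
- Content is not from 2017; all of it is definitions.
- Format is wrong to begin with; content does not match the topic; wrong year; the sequencing technologies mentioned are not explained clearly.
- Content appears to be copied; only two references.
- One passage plagiarized directly from an article; submitted late.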

47 Part 2 Course paper format: 1. Topic chosen (1 or 2); 2. Paper title; 3. Name, student ID, major; 4. Abstract; 5. Keywords; 6. Main text; 7. References

48 Part 3

49 Part 3 CASE 1: How to find files quickly on your PC

50 Part 3 CASE 2: How to keep up with the latest literature

51 Part 3 Inoreader

52 Part3 Mendeley

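53 Part 3 CASE 3: How to run a GO analysis for crops?

54 Part 3 CASE 4: How to do simple sequence analysis without bioinformatics programming skills?

55 Part 3 CASE 5: How to make a heatmap quickly and easily?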

56 Happy new year! Please submit your homework in PDF format to
File name format: StudentID_Name_Major_BioinformaticsHomework.pdf (学号_姓名_专业_生信作业.pdf)
Deadline: January 23, 2019