The Federated Plant Database Initiative for the Legumes


1 The Federated Plant Database Initiative for the Legumes Steven Cannon (ARS ISU) Ethy Cannon (ISU) David Fernandez-Baca (ISU; PI) Andrew Farmer (NCGR) Jeremy DeBarry (iPlant) Chris Town (JCVI)

2 The Federated Plant Database Initiative for the Legumes Steven Cannon (ARS ISU) Ethy Cannon (ISU) David Fernandez-Baca (ISU; PI) Andrew Farmer (NCGR) Jeremy DeBarry (iPlant) Chris Town (JCVI) Sudhansu Dash, Vivek Krishnakumar, Nathan Weeks, Jacqueline Campbell, Akshay Yadov. Others in federation projects... Tripal: Dorrie Main, Stephen Ficklin, Lacey Sanderson, Sook Jeung; Phytozome, SoyBase, KnowPulse, MedicagoGenome, USAID chickpea project, Noble Foundation alfalfa project,...

3 "The Federated Plant Database Initiative for the Legumes": "Legume Federation" for short

4 "The Federated Plant Database Initiative for the Legumes": "Legume Federation" for short Outline: 1. Project overview 2. Some tools for plant biology research, especially in the legumes; focusing on the Legume Information System: legumeinfo.org Contact us: legumefederation.org/contact

5 Challenges/opportunities: 1) Many competing/overlapping genomic data portals: SoyBase, the Legume Information System (LIS), PeanutBase, Cool Season Food Legume database (UW), KnowPulse (USask), LegumeIP (Noble), MedicagoGenome (JCVI), PhaseolusGenes (UC Davis), LegumeBase (Kazusa), MedicagoHapmap (UMN).

6 Challenges/opportunities: 1) Many competing/overlapping genomic data portals. 2) Genomic data is growing rapidly, both in volume (about ten legume species have sequenced genomes now, and many of these have many resequenced accessions) and in complexity (genetic data is heterogeneous: maps, phenotypes and traits, literature, various ontologies, breeding information, GWAS data, etc.)

7 Challenges/opportunities: 1) Many competing/overlapping genomic data portals. 2) Genomic data is growing rapidly. 3) It is important to have specialists for each crop. The genetic characteristics often differ (e.g. genetic mapping challenges for peanut, a recent allotetraploid), and each crop has its own breeding objectives, population and germplasm peculiarities, etc.

8 Challenges/opportunities: 1) Many competing/overlapping genomic data portals. 2) Genomic data is growing rapidly. 3) It is important to have specialists for each crop. 4) Software development is hard. All groups are short-staffed. Wheels should not be reinvented.

9 Challenges/opportunities: 1) Many competing/overlapping genomic data portals. 2) Genomic data is growing rapidly. 3) It is important to have specialists for each crop. 4) Software development is hard. 5) We should utilize knowledge gained about many legumes: Medicago and Lotus for, e.g., nodulation, flowering, developmental mutants, population biology, mycorrhization; soybean for, e.g., seed development, protein, disease responses; common bean for, e.g., domestication processes, determinacy, seed patterning; pea for developmental mutants, pod development, carbohydrate storage, etc.

10 Challenges/opportunities: 1) Many competing/overlapping genomic data portals. 2) Genomic data is growing rapidly. 3) It is important to have specialists for each crop. 4) Software development is hard. 5) We should utilize knowledge gained about many legumes.

11 Our response: 1) Get easier-to-use genomic visualization and analysis software closer to the data generators and biologists/breeders. 2) Encourage and help enable those groups to contribute to the development of the same software. 3) Build methods that enable sharing of data among sites. 4) Encourage use of common formats and repositories.

12

13 We're trying not to do this. Instead, build on and support existing standards; look for commonalities among existing data portals; make sharing easier.

14 Some tools for the Legume Federation project(s) Chado is a standardized database schema for storing biological data. Scott Cain, Chris Mungall, et al. Tripal is a set of customizable Drupal modules for constructing biological websites. Dorrie Main, Stephen Ficklin, Lacey Sanderson, et al.

15 Some tools for the Legume Federation project(s) Chado is a standardized database schema for storing biological data. Scott Cain, Chris Mungall, et al. LegumeInfo PeanutBase MedicagoGenome KnowPulse CoolSeasonFoodLegume [Noble Foundation alfalfa] LegumeFederation Tripal is a set of customizable Drupal modules for constructing biological websites. Dorrie Main, Stephen Ficklin, Lacey Sanderson, et al.
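To make the idea of a shared schema concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is a heavily simplified stand-in modeled loosely on three Chado tables (`organism`, `cvterm`, `feature`), not the real Chado DDL, which targets PostgreSQL and has many more columns and constraints. The feature names are hypothetical placeholders, and the helper functions are illustrative only.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a simplified, Chado-like layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE organism (
            organism_id INTEGER PRIMARY KEY,
            genus       TEXT NOT NULL,
            species     TEXT NOT NULL
        );
        -- Controlled-vocabulary terms give every feature an explicit type.
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE feature (
            feature_id  INTEGER PRIMARY KEY,
            organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
            type_id     INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
            uniquename  TEXT NOT NULL,
            name        TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO organism VALUES (?, ?, ?)",
        [(1, "Arachis", "ipaensis"), (2, "Medicago", "truncatula")],
    )
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?)",
        [(1, "gene"), (2, "genetic_marker")],
    )
    # Hypothetical placeholder features, not real identifiers.
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "demo_gene_A", "demo gene A"),
            (2, 1, 2, "demo_marker_1", "demo marker 1"),
            (3, 2, 1, "demo_gene_B", "demo gene B"),
        ],
    )
    return conn


def features_of_type(conn, genus, species, type_name):
    """Return sorted uniquenames of features of one type for one organism."""
    rows = conn.execute(
        """
        SELECT f.uniquename
        FROM feature f
        JOIN organism o ON o.organism_id = f.organism_id
        JOIN cvterm   t ON t.cvterm_id   = f.type_id
        WHERE o.genus = ? AND o.species = ? AND t.name = ?
        ORDER BY f.uniquename
        """,
        (genus, species, type_name),
    ).fetchall()
    return [row[0] for row in rows]
```

Because every site that adopts the same layout answers the same query the same way, a tool written against one portal's database can, in principle, be pointed at another's with little change. That is the appeal of a standardized schema for a federation.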

16 Some tools for the Legume Federation project(s) MedicagoGenome MedicMine Araport (AIP) ThaleMine LegumeInfo 2016 PeanutBase 2016 Biological data warehouse (and interfaces) Gos Micklem et al.

17 Some tools... Ken Youens-Clark; will be rebuilt for greater interactivity (Ethy Cannon, Andrew Wilkey) MedicagoGenome MedicMine Araport (AIP) ThaleMine LegumeInfo 2016 PeanutBase 2016 LegumeInfo.org

18 Some tools for the Legume Federation project(s)

19 Some tools for the Legume Federation project(s) iPlant tools, methods, storage

20

21 Templates for maps, markers, publications, QTLs, GWAS, traits

22 QTL & traits template (7 worksheets) QTL worksheet Researchers as curators: templates available at LegumeInfo, PeanutBase, and LegumeFederation

23 Main open-source packages

24 Various cross-data-set search tools

25 Some tools for the Legume Federation project(s) iPlant tools, methods, storage: Data Store for major data sets; metadata methods (iRODS); integration with CoGe and other tools

26 LegumeInfo.org Browsers

27 Outline: 1. Project overview 2. Some tools for plant biology research, especially in the legumes; focusing on the Legume Information System: legumeinfo.org

28 Monthly updates (first Wed of each month) LegumeInfo.org

29 Coming soon: Lotus 3.0, then mungbean, then lupin LegumeInfo.org

30 Sequence search (BLAST & BLAT); keyword search (of features in the browsers)

31 BLAT or keyword hits on Arachis ipaensis: genome overview Sequence search (BLAST & BLAT); keyword search (of features in the browsers)

32 Synteny mappings between each genome

33 QTLs, traits, maps (search interface)

34 QTLs, traits, maps (CMap: bean, on consensus genetic map)

35 Marker-Assisted Selection pages (peanut) Markers for the trait; informative for learners

36 Marker-Assisted Selection pages (peanut) Researchers as curators: These pages are directly editable by experts

37 Gene families: based on (and linking to) Phytozome families, but legume-focused

38

39

40 Context viewer: displays genomic contexts of orthologs, in terms of gene families (work of Alan Cleary)

41 Context viewer: displays genomic contexts of orthologs, in terms of gene families (work of Alan Cleary) Choose track, search for similar contexts...

42 Context viewer: gene alignments and context views (local & global)
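The idea of comparing genomic contexts "in terms of gene families" can be sketched as follows. This is an illustrative simplification under stated assumptions, not the context viewer's actual algorithm: each track is assumed to be an ordered list of gene-family labels, and similarity is taken to be the fraction of the query's family labels that also occur in a same-sized window of the track. All names (`context_similarity`, `find_similar_contexts`, the `fam…` labels) are hypothetical.

```python
from collections import Counter


def context_similarity(query, candidate):
    """Fraction of the query's gene-family labels also found in candidate.

    Uses multiset overlap, so a family that occurs twice in the query
    must also occur twice in the candidate to count fully.
    """
    if not query:
        return 0.0
    shared = Counter(query) & Counter(candidate)
    return sum(shared.values()) / len(query)


def find_similar_contexts(query, tracks, window, min_score):
    """Slide a window along each track; report the best window per track.

    tracks maps a track name to its ordered list of gene-family labels.
    Returns (track_name, window_start, score) tuples, best score first,
    keeping only tracks whose best window reaches min_score.
    """
    hits = []
    for name, families in tracks.items():
        best_score, best_start = 0.0, None
        last_start = max(0, len(families) - window)
        for start in range(last_start + 1):
            score = context_similarity(query, families[start:start + window])
            if score > best_score:
                best_score, best_start = score, start
        if best_start is not None and best_score >= min_score:
            hits.append((name, best_start, best_score))
    return sorted(hits, key=lambda hit: -hit[2])
```

Working at the level of gene families rather than raw sequence means two regions can be recognized as similar even when the underlying genes have diverged, provided the genes still fall in the same families. This sketch ignores gene order and orientation, which a real synteny comparison would also need to consider.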

43 Examples of interoperability and shared development

44 "The Federated Plant Database Initiative for the Legumes": "Legume Federation" for short Outline: 1. Project overview 2. Some tools for plant biology research, especially in the legumes; focusing on the Legume Information System: legumeinfo.org Contact us: legumefederation.org/contact