Executive Committee Meeting


1 Executive Committee Meeting
To hear the meeting, you must call in.
Toll-free phone number:
Access Code:
For international call-in numbers, please visit:

2 Agenda
  Action Items
  Executive Committee Actions/Discussions
  Core Development Team Actions/Discussions
  Prototype User Interface Demo
  Working Group Actions/Discussions
  bioCADDIE Upcoming Deliverables
Supported by the NIH grant 1U24 AI to the University of California, San Diego

3 Action Items
  Task: Provide suggestions for invitees to the bioCADDIE All-Hands meeting and review invitation letters (Google Drive > All-Hands Meeting 2015). Who: Executive Committee
  Task: Pilot Projects final report submitted 8/31. Who: Admin Team
  Task: Compose a joint response to the Scientific Data draft guidelines and send comments to Lucila. Who: Executive Committee

4 Action Items
BD2K retreat: proposal for two workshops
  Journal publishers and DDI
    Identifiers and metadata information
    Minimal information and processes to allow indexing and sharing of digital objects
  DDI beyond the prototype
    Engage users and stakeholders to explore how to scale the DDI prototype beyond biomedical data

5 Harvester RFA
Pilot Project on Harvester for DDI Schema RFA
Submissions: 7
Applications: biocaddie Google Drive Pilot Project on Harvester reviewers
Institution: Name
  Brown University: Indra Neil Sarkar
  Emory University: Steve Qin
  Lawrence Berkeley National Laboratory: Christopher Mungall
  Mayo Clinic: Guoqian Jiang
  University of North Carolina at Chapel Hill: Arcot Rajasekar
  University of Utah: Ramkiran Gouripeddi
  Yale School of Medicine: Peter Gershkovich

6 Core Dev. Team - Progress
Task (Due)
  ElasticSearch endpoint deployment
    Working with CTRI IT staff to resolve JVM memory issues (Ongoing)
  Data Ingestion Pipeline
    Beginning development of management interface (Ongoing)
    Reconciliation of indices with WG3 metadata model (Ongoing)
  ElasticSearch open access proxy (Deployed)
    API keys created for developers
  Public documentation for ElasticSearch endpoint (8/28)
  Data Set Ingestion (PDB, GEO, LINCS, dbGaP, ICPSR) (dbGaP, ICPSR: 9/30)

7 Core Dev. Team - Progress
Task (Due)
  Pilot project integration (9/30)
    PP 1.1 Integration of specialized advanced search (Ongoing)
    PP 2.1 Ranking function based on citation metrics (Ongoing)
    PP 2.2 isee similarity metric in ElasticSearch, search expansion, DELVE integration (9/1)
    PP 3.2 Data Mention Pipeline for PDB (Presentation on 9/1)
  UI Development (Ongoing)
    Includes integration of PP 2.2 (DELVE part) as an exploratory search and visualization option.
    Preliminary UI ready for demo (9/01)
  Ontology web services: Integration with search and indexing pipelines (Ongoing)

8 WG2 Identifiers (Grethe)
Task/Description/Topic
  1st Meeting: Presented initial draft for the DDI prototype's handling of identifiers. Feedback: agreed with basic internal handling of identifiers.
  Initial Meetings: Began discussion of the identifier ecosystem via working group participants.
  Review Meeting: Based on initial meetings and discussion, develop a list of specific deliverables for bioCADDIE WG2 to address.
  Continue work on specific deliverables discussed as in scope during the review meeting.
Date: 6/11, 6/25, 7/23, 8/6, 9/3

9 WG3 Metadata (Sansone)
Task/Description/Topic (Date)
  Phase 1 = completed (7/14)
    Metadata Specification v1, released: doi: zenodo
    Document describing the process, the material reviewed (Appendix I), and the use cases used to identify an initial set of metadata elements (Appendix II); JSON schemata and examples.
    Live Google document open for comments.
    Files also shared with the BD2K Metadata WG (with George and Mark's approval).
  Evaluation phase = started (ongoing)
    Model being implemented with the Dev. Team. The WG3 members will be consulted only if needed.
  Phase 2 = planned (TBD)
    The results of the evaluation will inform this phase, where the metadata elements and the model may be revised, simplified, and/or enriched as needed.

10 WG4 Use Cases and Testing Benchmarks (Xu)
Progress (Date)
  WG4 membership (Completed)
  First meeting (8/6)
    Goals and deliverables for the WG discussed.
    Meetings to be held biweekly.
    White paper and use cases document to be reviewed by WG members.
    Use cases and datasets to be prioritized for benchmark development.
  Second meeting scheduled (9/3)
  User needs interviews
    Initial round of interviews finished (7/22)
    Analysis completed (8/25)
    Report (Ongoing)

11 WG5 Dataset Citation Metrics (Grethe & Sansone)
Progress (Date)
  Identifying groups outside of bioCADDIE working on citation metrics (e.g., CASRAI/RDA, PLOS/CDL/DataONE) (Completed)
  Initial conference call held with the CASRAI/RDA group and the PLOS/CDL/NISO group (6/26)
    Action items: Additional participants identified (THOR)
  Conference call held with the CASRAI/RDA, PLOS/CDL/DataONE, NISO, and THOR groups (8/27)
    Agenda: Get lessons learned and/or recommendations they can share; also get DDI qualitative and quantitative metrics in their use cases
  Developing an initial working group document outlining current efforts and highlighting what bioCADDIE could potentially adopt and implement (Ongoing)

12 WG6 Criteria for Being Included in the DDI (TBD)
Task/Description/Topic: Discussion of working group activities

13 WG7 Machine Actionable Licenses (Alter)
GOAL
  The DDI should present users with information about licenses and terms of access for data.
  Access conditions should be described using a standard machine-readable format and terminology.
  Many of the resources in the DDI will be accessed by automated processes.
  License information should be both human and machine readable.
ACTIVITIES/RESPONSIBILITIES
  Review existing models of data access conditions (e.g., RDA Practical Policies WG).
  Review mechanisms (e.g., Schema.org) for embedding license information with the data it covers.
DELIVERABLES
  Metadata model and controlled vocabularies for data access that can be incorporated in the DDI core metadata.
  Recommendations for implementing the machine-readable portion of the license.
ACTION
  Discuss participation invitation list: biocaddie > Working Groups > WG7 > Invitation List
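
One way to picture the WG7 goal of license information that is both human and machine readable is a Schema.org record serialized as JSON-LD. The sketch below is illustrative only and is not the DDI metadata model: the `LICENSE_URLS` table, the helper names, and the sample dataset are assumptions made for this example. Only the Schema.org `Dataset` type and its `name`, `identifier`, and `license` properties are taken from the Schema.org vocabulary.

```python
import json

# Hypothetical lookup from short license labels to canonical license URLs.
LICENSE_URLS = {
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def dataset_jsonld(name, identifier, license_label):
    """Build a Schema.org Dataset record that carries its license as a URL."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "identifier": identifier,
        "license": LICENSE_URLS[license_label],
    }


def license_of(record_json):
    """Return the license URL an automated process would read, or None."""
    return json.loads(record_json).get("license")


# Illustrative record; the name and identifier are made up for this sketch.
record = dataset_jsonld("Example dataset", "example:0001", "CC-BY-4.0")
serialized = json.dumps(record, indent=2)
print(serialized)
print(license_of(serialized))
```

A person can read the serialized record directly, while a harvester can call something like `license_of` without parsing free text, which is the dual readability the goal describes.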

14 WG8 Ranking Algorithm (Xu)
GOAL
  Bring together information retrieval experts and biomedical domain experts with expertise in searching and using large datasets to identify priorities for the system users to facilitate aggregation and sorting of the results.
ACTIVITIES/RESPONSIBILITIES
  Define the characteristics of a set of ranking metrics for relevance. Examples might include number of citations to the dataset and/or initial publication or a PageRank-like metric, number of downloads (if available), completeness of metadata, and an estimate of the relative importance of indexed metadata fields to users.
DELIVERABLES
  Report that identifies the categories that are useful features (based on user needs) for ranking of search results for implementation in the prototype.
ACTION
  Discuss participation invitation list: biocaddie > Working Groups > WG8 > Invitation List
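
To make the example metrics above concrete, here is a minimal sketch of how citations, downloads (if available), and metadata completeness might be combined into one score. It is not the WG8 ranking algorithm: the weights, the log scaling, the function name, and the sample datasets are all assumptions chosen for illustration.

```python
import math

# Hypothetical weights; a real system would derive these from user needs.
WEIGHTS = {"citations": 0.5, "downloads": 0.2, "completeness": 0.3}


def ranking_score(citations, downloads, filled_fields, total_fields, weights=WEIGHTS):
    """Combine citation count, download count, and metadata completeness.

    Counts are log-scaled so a few very popular datasets do not dominate.
    `downloads` may be None when a repository does not report it.
    """
    completeness = filled_fields / total_fields if total_fields else 0.0
    downloads_term = math.log1p(downloads) if downloads is not None else 0.0
    return (
        weights["citations"] * math.log1p(citations)
        + weights["downloads"] * downloads_term
        + weights["completeness"] * completeness
    )


# Made-up datasets used only to exercise the scoring function.
datasets = [
    {"id": "A", "citations": 120, "downloads": 3000, "filled": 8, "total": 10},
    {"id": "B", "citations": 5, "downloads": None, "filled": 10, "total": 10},
    {"id": "C", "citations": 0, "downloads": 10, "filled": 2, "total": 10},
]

ranked = sorted(
    datasets,
    key=lambda d: ranking_score(d["citations"], d["downloads"], d["filled"], d["total"]),
    reverse=True,
)
print([d["id"] for d in ranked])
```

Treating a missing download count as zero is one design choice among several; it penalizes repositories that do not publish download statistics, which a working group might prefer to handle differently.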

15 Upcoming Working Groups
Y2Q1 WGs (WG, Title, Leader)
  WG6: Criteria for Being Included in the DDI. Leader: TBD
  WG7: Machine Actionable License. Leader: George Alter
  WG8: Ranking Algorithm. Leader: Hua Xu
Y2Q2 WGs (WG, Title, Leader)
  WG9: End User Evaluation Criteria. Leader: TBD
  WG10: Repository Collaboration. Leader: Sansone & Alter

16 Actions June-August
Due: Task (Status; Who)
  June: Set up WG4 (Hua/Anu) and WG5 (Jeff, Susanna, Steph) (Completed; Anu/Steph)
  June: Prepare RFA for Pilot on Harvester (Completed; Hua/Jeff)
  July: Review and select PP on Harvester (Tech Team)
  July: Tech team to work with existing PPs to provide instructions for PP tool integration (PP1.1, PP2.1, PP2.2) (In Progress; Tech Team)
  August: Collect final reports and slide shows from all existing pilots (Completed; Cleo)

17 Deliverables June-August
Core Dev team
  Data identifier implementation: ON TARGET
Working Groups (WG)/Pilot Projects (PP)
  WG4 - Use cases and testing benchmarks (Hua, Anu): ON TARGET
  Index datasets using metadata standards, tested on 2-3 datasets from BD2K centers (per WG3): REVISION IN DISCUSSION, WAITING ON NIH
  Set up testing benchmarks: ON TARGET
  Design a repository search function: ON TARGET
  WG5 Dataset Citation Metrics (Jeff, Susanna, Steph): ON TARGET
  Wrap-up of Pilot Projects (Cleo): ON TARGET
  RFA for Pilot on Harvester for DDI schema (metadata compliant sets) (per WG3): IN PROGRESS

18 Deliverables

19 Deliverables