Nordic health data project. Magnus Eriksson, Chair of metadatagroup 28. nov 2018

2 Research Infrastructure for Nordic Health Data
NordForsk Programme on Health and Welfare
Recent background:
- NordForsk Policy Paper 2017: "Nordic Biobanks and Registers: A basis for innovative research on health and welfare". Juni Palmgren, Karolinska Institutet, SE. A vision of a Nordic Commons: a Nordic integrated platform for cross-border sharing of data and tools.
- Norwegian Presidency 2017, "Norden i omstilling" (The Nordic Region in Transition): Nordic collaboration in clinical trials and health data. Maiken Engelstad, MoH (Ministry of Health), NO.

3 Nordic Commons: vision

4 Data integration and interoperability? Secure sharing? In the Nordics?
Unique opportunities!
Obstacles:
- Many data owners
- A multitude of regulations and practices
- Weak documentation, metadata and ontologies
- Tedious and complicated within countries, even more so between countries
Several reports: Bo Könberg 2014, NOS-M Nordic White Paper 2014 and others.

5 Interoperability challenges

6 Components of a Commons ecosystem
- A computing environment, such as the cloud and/or HPC (High Performance Computing) resources, which supports access, utilization and storage of digital objects.
- Data and metadata sets that adhere to a set of Digital Object Compliance Principles, which describe the properties that make digital objects findable, accessible, interoperable and reusable (FAIR).
- Software services and tools that enable:
  - scalable provisioning of compute resources;
  - interoperability between digital objects within the Commons;
  - indexing and thus discoverability of digital objects;
  - sharing of digital objects between individuals or groups;
  - access to and deployment of scientific analysis tools and pipeline workflows;
  - connectivity with other repositories, registries and resources that support scholarly research.
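
The slide lists indexing and discoverability of digital objects among the Commons services but does not say how they would be built. The Python sketch below is for illustration only and uses a toy in-memory inverted index. A digital object is found by keyword and then resolved by its persistent identifier (PID). The class and method names (`DigitalObject`, `CommonsIndex`, `register`, `resolve`, `search`) and the PID format are hypothetical and do not come from any Nordic Commons specification.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    # Hypothetical minimal digital object: a persistent identifier (PID) plus metadata.
    pid: str
    title: str
    keywords: list[str] = field(default_factory=list)


class CommonsIndex:
    """Toy in-memory index: resolves PIDs and finds objects by keyword."""

    def __init__(self) -> None:
        self._objects: dict[str, DigitalObject] = {}
        # Inverted index: lower-cased term -> PIDs of objects containing that term.
        self._terms: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, obj: DigitalObject) -> None:
        self._objects[obj.pid] = obj
        terms = obj.title.lower().split() + [k.lower() for k in obj.keywords]
        for term in terms:
            self._terms[term].add(obj.pid)

    def resolve(self, pid: str) -> DigitalObject | None:
        # Accessible: a PID resolves to the object's metadata (None if unknown).
        return self._objects.get(pid)

    def search(self, *terms: str) -> list[str]:
        # Findable: PIDs of objects matching *all* terms, sorted for stable output.
        if not terms:
            return []
        matches = [self._terms.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*matches))
```

For example, after registering two objects that share the keyword "registry", `search("registry")` returns both PIDs, and `search("registry", "oncology")` narrows the result to one. This toy index does not drop stale terms when a PID is re-registered. A production index would also need to handle updates and access control.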

7 Governance
NordForsk Programme Committee, Nordic Programme for Health and Welfare
- Nordic Health Data e-infrastructure: J Palmgren (SE)
- NTA 2.0: O A Opdalshei (NO)
Health Data e-infrastructure:
- Metadata: M Eriksson (SE)
- Secure Private Cloud: P Løngreen (DK)
- Legislation: M Salokannel (FI)

8 Nordic health metadata
Nordic health metadata work group, Magnus Eriksson, 18. april 2018

9 Metadata group vision
To create a Nordic health (meta)data population cohort of 26 million people according to the FAIR principles.

10 Metadata descriptions: goal
- Creating a common Nordic metadata repository ecosystem for harvesting and consuming Nordic metadata resources.
- Giving researchers the opportunity to browse, find and evaluate data resources across the Nordics.
- Automating the contribution of metadata from researchers' processing and analysis back to the ecosystem, enabling efficient reuse.
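
The goal describes harvesting metadata from several Nordic sources into one ecosystem that researchers can browse, with metadata from their own analyses flowing back into it. Below is a minimal Python sketch, assuming each source catalogue is a list of records keyed by a persistent identifier. When the same identifier appears in more than one source, the most recently modified record wins. The names (`MetadataRecord`, `harvest`, `browse`) and the "analysis-feedback" source are hypothetical illustrations, not an actual harvesting protocol.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str  # persistent identifier of the described data resource
    title: str
    country: str     # contributing country, e.g. "SE", "FI"
    modified: str    # ISO 8601 date (YYYY-MM-DD), so string order is date order


def harvest(sources: dict[str, list[MetadataRecord]]) -> dict[str, tuple[MetadataRecord, str]]:
    """Merge records from several catalogues into one, keyed by identifier.

    When the same identifier arrives from more than one source, the most
    recently modified record wins and the name of its source is kept as provenance.
    """
    catalogue: dict[str, tuple[MetadataRecord, str]] = {}
    for source_name, records in sources.items():
        for record in records:
            current = catalogue.get(record.identifier)
            if current is None or record.modified > current[0].modified:
                catalogue[record.identifier] = (record, source_name)
    return catalogue


def browse(catalogue: dict[str, tuple[MetadataRecord, str]], keyword: str) -> list[str]:
    # Case-insensitive title search across all harvested countries.
    needle = keyword.lower()
    return sorted(rec.identifier for rec, _ in catalogue.values() if needle in rec.title.lower())
```

In this sketch, metadata produced during a researcher's analysis is treated as just another source, such as a hypothetical "analysis-feedback" catalogue. As a result, contributions that flow back into the ecosystem use the same merge rule as the national catalogues.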

11 Domain definitions: health data
- Biobanks
- Health registries
- Socioeconomic registries
- Laboratory data
- Clinical quality registries
- Health surveys/cohort studies
- OMICS data
- [Electronic health records: primary use?]

12 Metadata
Descriptions of data content and context, on different levels of detail, supporting FAIR data (findable, accessible, interoperable, reusable).

13 Status in the Nordics: rough estimate (autumn 2018)

14 Levels of detail: metadata and semantics
For each level: description of content, examples, and example (related) standards.
- Dataset-level standards: attributes to describe the dataset. Examples: Creator, Title, Publisher, Publication year, ResourceType, Funding (DataCite). Standards: DataCite, DDI, DCAT-AP.
- Framework standards: how to describe data, and the concepts used for descriptions. Examples: Concept, ConceptSystem, Variable, Population, Dataset. Standards: ISO/IEC 11179, GSIM.
- Domain-specific standards: what should be described, and details on how. Examples (HL7 FHIR): Patient (resource, domain, unit type), Birth time (attribute, variable), Nationality, Organisation, Alias, Period, Medication. Standards: HL7 FHIR, HL7 V3, DDI, OMOP.
- Semantics: concepts and terms that define meaning and context for humans and computers. Examples: Läkemedel (medicinal product), SCTID: , (SNOMED CT); Läkemedel: "Medicinal products for human or veterinary use, in their ready-to-use form. This also includes the substances used in producing the finished preparation form." (MeSH). Standards: SNOMED CT, MeSH, LOINC, Nationellt fackspråk (Swedish national health terminology).
- Persistent identifiers: unique keys for metadata and data resources. Examples: persistent identifiers for researchers and datasets. Standards: DOI, ORCID.
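
At the dataset level, the slide uses attributes from DataCite as examples. In the DataCite Metadata Schema, Identifier, Creator, Title, Publisher, PublicationYear and ResourceType are the mandatory properties, and funding information is not among them. The Python sketch below shows a completeness check for a dataset-level record under that assumption. The function name and the flat dictionary layout are hypothetical simplifications, not the DataCite XML or JSON serialization.

```python
from __future__ import annotations

# Mandatory properties of the DataCite Metadata Schema. The flat dict used here is a
# simplification for illustration, not the DataCite XML or JSON serialization.
MANDATORY = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear", "ResourceType")


def validate_dataset_metadata(record: dict[str, object]) -> list[str]:
    """Return a list of problems; an empty list means the record passes these checks."""
    problems = [f"missing {prop}" for prop in MANDATORY if not record.get(prop)]
    year = record.get("PublicationYear")
    # DataCite expects the publication year in YYYY format.
    if year and not (isinstance(year, str) and len(year) == 4 and year.isdigit()):
        problems.append("PublicationYear must be a four-digit year (YYYY)")
    return problems
```

A check like this could run when metadata is contributed to the common repository. Richer levels from the table, such as framework or domain-specific descriptions, would need their own checks.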

15 Proposed actions: metadata and semantics
Establishing two networks/working groups covering Nordic (i) standards competence and (ii) domain expertise is crucial.
- The former works on how data should be described so that machines and humans can effectively interpret, understand and exchange it. It coordinates the work on metadata for the Nordic orchestrator.
- The latter sets up a common Nordic foundation for a clinical and health language, which describes and defines data in health-domain terminologies for health registers, biobanks, electronic health records, laboratory results, population registers, etc.
International standards and international domain terminology are used where possible.

16 Nordic secure orchestrator
Magnus Eriksson, 28. nov 2018

17 National policy programs for integrated health data (Nordic Secure Cloud Infrastructure)
- Denmark: unique position with integrated health data.
- Finland: Isaacus programme.
- Norway: the Norwegian Health Data Program is working on concepts for a national health analysis platform.
- Sweden: to date, there is no specific national health data program, and the landscape is rather fragmented. Vetenskapsrådet (the Swedish Research Council) has a Register Infrastructure Programme with a RUT data interface. Vinnova has a strategic innovation program, SweLife, and a recent initiative, Genomic Medicine Sweden.

18 Current status: national e-infrastructures for sensitive personal data
- Denmark: Computerome (DeiC National Life Science Supercomputer). Computerome is the national dedicated e-infrastructure for health care and life sciences. It supports 1,600 users locally and on the European scene through its involvement in ELIXIR and in initiatives such as NeIC Tryggve. It provides a secure cloud service.
- Finland: CSC ePouta. ePouta is a Finnish cloud computing environment delivered as IaaS (Infrastructure as a Service) and designed for processing sensitive data. It is used routinely by several user groups, including the national Center of Excellence for Tumor Genetics and the Finnish Institute for Molecular Medicine.
- Norway: TSD. Services for Sensitive Data (TSD), initiated by USIT (the University Centre of Information Technology) at the University of Oslo, is a national service for researchers in Norway and abroad for storing and processing sensitive data, including health data. TSD provides a secure cloud service in a production environment.
- Sweden: Bianca, Mosler, RUT and MONA. There is currently no unified national cloud solution for health and welfare; instead, several actors offer their own local solutions to producers and users of health and welfare data. However, the e-infrastructures for sensitive research data are at the forefront and are the best candidates to be considered national cloud solutions:
  - the Bianca system of the Swedish National Infrastructure for Computing (SNIC);
  - Mosler, the Swedish ELIXIR system;
  - the Swedish Registry Utilizer Tool (RUT), being built;
  - Statistics Sweden's Microdata Online system (MONA).
Collaboration takes place through the Tryggve/Tryggve2 ( ) projects for sensitive data, hosted by the Nordic e-Infrastructure Collaboration (NeIC).

19 A Nordic solution
- The orchestration engine distributes the tasks to the available data centers.
- All operations are logged in a Nordic Log Store.
- The derived data are assigned an identifier (e.g. a DOI).
- Metadata from the analysis process feeds back into the original metadata repositories.
The loop is closed!
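
The slide describes the orchestration loop in four steps: distribute, log, identify and feed back. The Python sketch below walks through that loop under several assumptions. Each data center is modelled as a plain function that returns an aggregate, in-memory lists stand in for the Nordic Log Store and the metadata repositories, and a content hash stands in for minting a real DOI through a registration agency. All names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Orchestrator:
    # Hypothetical: each data center is a function that runs a task locally
    # and returns an aggregate result.
    data_centres: dict[str, Callable[[str], int]]
    log_store: list[dict] = field(default_factory=list)      # stand-in for the Nordic Log Store
    metadata_repo: list[dict] = field(default_factory=list)  # stand-in for the metadata repositories

    def _log(self, event: str, **details: str) -> None:
        entry = {"time": datetime.now(timezone.utc).isoformat(), "event": event, **details}
        self.log_store.append(entry)

    def run(self, task: str) -> dict:
        # 1. Distribute the task to every available data center and log each dispatch.
        results = {}
        for centre, execute in self.data_centres.items():
            self._log("dispatched", task=task, centre=centre)
            results[centre] = execute(task)
        # 2. Assign the derived data an identifier. A content hash stands in for
        #    minting a real DOI through a registration agency.
        digest = hashlib.sha256(json.dumps([task, results], sort_keys=True).encode()).hexdigest()
        identifier = f"hypothetical:{digest[:12]}"
        self._log("identifier_assigned", task=task, identifier=identifier)
        # 3. Feed metadata about the analysis back into the metadata repository.
        self.metadata_repo.append({"identifier": identifier, "task": task, "sources": sorted(results)})
        self._log("metadata_published", identifier=identifier)
        return {"identifier": identifier, "results": results}
```

Because the stand-in identifier is derived from the task and its results, rerunning the same analysis on the same data reproduces the same identifier. A real DOI would instead be registered once and resolved through the DOI system.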

20 Types of Nordic projects
1. Temporary pooling of all data, performing all analyses in one location.
2. Distributed analyses of harmonised data. The results after processing are aggregates of anonymised data.
3. Centralised processing of distributed data that is streamed to the central platform without making any local copies of it.
Project types 1 and 2 are feasible with current infrastructures; project type 3 should be the innovation focus. A Nordic infrastructure must be flexible in its design to accommodate such a wide range of approaches, all the while upholding the FAIR principles.
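
Project type 2 returns only aggregates from each country. The Python sketch below shows one hypothetical way to do this for a harmonised variable. Each site releases a count and a sum, and the central step combines them into a pooled mean. Weighting by the counts gives the same result as pooling the raw records from the sites that reported. The small-group suppression threshold of 5 is an assumed disclosure-control rule, not one stated on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    n: int        # number of individuals contributing
    total: float  # sum of the harmonised variable


def local_aggregate(values: list[float], min_cell: int = 5) -> Aggregate | None:
    """Runs inside one country's secure environment; only a count and a sum leave it.

    Groups smaller than `min_cell` are suppressed (returns None). The threshold of 5
    is a hypothetical disclosure-control rule, not one stated in the source.
    """
    if len(values) < min_cell:
        return None
    return Aggregate(n=len(values), total=float(sum(values)))


def pooled_mean(aggregates: list[Aggregate | None]) -> float:
    # Weighting each site by its n gives the same mean as pooling the raw
    # records of the non-suppressed sites.
    usable = [a for a in aggregates if a is not None]
    n = sum(a.n for a in usable)
    if n == 0:
        raise ValueError("no site returned a usable aggregate")
    return sum(a.total for a in usable) / n
```

For example, a site with six values summing to 21 and a site with five values summing to 50 give a pooled mean of 71/11. A third site with only two individuals is suppressed and contributes nothing.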

21 Example flow (diagram): 3rd-party provider, research institution, biobank/registries, Nordic data source #1, Nordic data source #2

22 Proposed actions: secure orchestration
The goal for the Nordic orchestrator is to establish a federated, secure, scalable environment for using Nordic sensitive biomedical datasets in research.
- The innovation focus lies on centralised processing of distributed data that is streamed to the central platform without making any local copies.
- Data will be available for joint analyses with permission, and analyses will be carried out using traditional software environments (e.g. R, SAS).
- A detailed design plan for a common orchestrator architecture needs to be formulated and agreed on by the respective national cloud compute infrastructures for sensitive data.
- In each country, part of the cloud needs to be allocated to the orchestrator and governed through common software.
- Authorization and authentication procedures need to be defined, as well as the processes for moving data from national repositories to the orchestration space.
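
The innovation focus is processing data as it streams to the central platform, without making local copies. The slide does not specify a transport or an analysis. As one hypothetical illustration, the Python sketch below reads records from several generator-based sources one at a time. It updates a running mean and sample variance with Welford's online algorithm, so the platform keeps only three numbers and never stores the records themselves. The generators stand in for whatever secure channel a real orchestrator would use, and all names are assumptions.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from itertools import chain


@dataclass
class RunningStats:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Welford's online algorithm: constant memory, no record is retained.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two observations.
        if self.count < 2:
            raise ValueError("need at least two observations")
        return self.m2 / (self.count - 1)


def stream_from_source(values: Iterable[float]) -> Iterator[float]:
    # Hypothetical stand-in for a national data source streaming records one by one.
    for v in values:
        yield v


def process_streams(*streams: Iterator[float]) -> RunningStats:
    # Consume every stream in turn; each value is used once and then discarded.
    stats = RunningStats()
    for x in chain(*streams):
        stats.update(x)
    return stats
```

Welford's update is preferred here over keeping a running sum and sum of squares because it is numerically more stable. Statistics other than the mean and variance would need their own streaming formulation, and not every analysis has one.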

23 Nordic Health Data

24 Tack. Takk. Tak. Kiitos. (Thank you, in Swedish, Norwegian, Danish and Finnish.)