The DLR Institute of Data Science

Slide 1 (DLR.de Chart 1): The DLR Institute of Data Science
Marcus Paradies, Sirko Schindler, Bunjamin Memishi
DLR Institute of Data Science, Jena
Data Management Technologies Group

Slide 2: DLR Institute of Data Science, Jena
Partners:

Slide 3: DLR Institute of Data Science, Jena
  - Secure Software Engineering
  - Visual Analytics
  - Citizen Science
  - Data Management Technologies
  - Climate Informatics
  - Digital Production Platforms

Slide 4: Climate Informatics
  - Understand the earth system and climate change
  - Project global mean temperature for different scenarios
Addressed challenges:
  - Reduce uncertainty in climate models
  - Big climate data (petabyte scale)
  - Different nonlinear processes interacting on different time scales

Slide 5: Secure Software Engineering
  - Alarming number of vulnerabilities over the years
  - Software engineering process needs improvement
Addressed challenges:
  - Categorize vulnerabilities and find patterns
  - How are vulnerabilities introduced into software?
  - Can we detect susceptible code fragments at development time and notify the developer?

Slide 6: Digital Production Platforms
  - Digitalize spacecraft manufacturing
  - Reduce human effort in data communication
Addressed challenges:
  - Industry 4.0 platform
  - Unified data sharing platform for all stakeholders
  - Electronic data sheets
  - Translate between different vocabularies using semantic techniques
Diagram labels: Production Planning (CEF); Test (Components + System); Launch; Utilization (Mission Data); Utilization (Health of Components); Platform Services; Agencies: ESA, JAXA, NASA, CNES

Slide 7: Citizen Science
  - Turn citizen-generated data into usable knowledge for science, crisis management, ...
Addressed challenges:
  - Engaging people (gamification, learning, ...)
  - Exact localization without expensive hardware
  - Extracting knowledge from social media
  - Reuse of generated data (FAIR principles)
Diagram labels: Engaging people; Making data fit for (re)use; Gaining insights; Citizen Science Lab Jena

Slide 8: Visual Analytics
  - Interactive and explorative visualizations of huge, complex, multi-dimensional, and heterogeneous datasets
  - Extend human analysis skills
Addressed challenges:
  - Scale-independent analysis
  - Visualization of uncertainties
  - Interactive methods for multi-dimensional data
  - Cooperative data analysis
Diagram labels (scope of Visual Analytics): Data Mining; Statistics; Machine Learning; Data Retrieval; Compression; Parallel Computing Infrastructure; Computer Graphics; Data Analysis; Data Management; Scientific Visualization; Information Visualization; Visualization; Visual Analytics; Perception & Cognition; Human-Computer Interaction; Visual Intelligence; Gestalt Psychology; Sense Making; Memory; Problem Solving; Decision Making Theory; Computer Supported Cooperative Work; Information Design; Interaction Design

Slide 9: Data Management Technologies
  - Handle large volumes of heterogeneous data
  - Ease data access
  - Near-data processing
Addressed challenges:
  - Explore data reduction capabilities on all layers to save resources in processing
  - Storage organization for efficient access
  - Resource allocation in computing clusters
  - Domain-specific metadata descriptions
Diagram labels (around "Data Management Technologies"): Semantic Annotation; Near-Data Processing; Indexing; Storage Organization; Resource Elasticity/Adaptivity

Slide 10: Big Data: Buzzword or Reality?
Source:

Slide 11: Earth Observation Data: End-to-End Production Chain
Diagram labels: Data reception; Application; Calibration; Product generation; Archiving; Access

Slide 12: Data Access, Data Integration, Data Quality
Dataset: (Primary) Data, Metadata
  - Storage organization
  - Multi-level caching strategies
  - Index structures (spatial, temporal, multi-dimensional)
  - Harmonization and translation of vocabularies
  - Discoverability
  - Provenance

Slide 13: Challenge: Hierarchical Storage Management in D-SDA
  - How to pick a workload-aware cache eviction policy?
  - How to organize data on tapes efficiently, i.e., workload-aware?
  - How to schedule incoming requests?
  - How long to keep tapes in the drive until unmounting them again?
Diagram labels: Archive Server; Disk Cache (~175 TB capacity); Tape Drives; Tape Library (~50 PB capacity); operations: Get/Put, Loaded?, Mount, Load, Return, Unmount
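
To make the cache eviction question concrete, here is a minimal sketch of a disk cache in front of a tape archive. It uses plain least-recently-used (LRU) eviction as a baseline, which is an assumption for illustration and not the policy used in D-SDA; a workload-aware policy would replace the eviction step. The class name, file names, and sizes are hypothetical; only the ~175 TB capacity comes from the slide.

```python
from collections import OrderedDict


class LRUDiskCache:
    """Toy disk cache in front of a tape archive (hypothetical model).

    Evicts the least recently used files when space runs out.
    """

    def __init__(self, capacity_tb):
        self.capacity_tb = capacity_tb
        self.used_tb = 0.0
        self.files = OrderedDict()  # file id -> size in TB, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, file_id, size_tb):
        """Serve one request; return "cache" on a hit and "tape" on a miss."""
        if file_id in self.files:
            self.files.move_to_end(file_id)  # mark as most recently used
            self.hits += 1
            return "cache"
        self.misses += 1
        if size_tb > self.capacity_tb:
            return "tape"  # file can never fit, so serve it without caching
        # Evict least recently used files until the new file fits.
        while self.used_tb + size_tb > self.capacity_tb:
            _, evicted_size = self.files.popitem(last=False)
            self.used_tb -= evicted_size
        self.files[file_id] = size_tb
        self.used_tb += size_tb
        return "tape"


# Hypothetical workload: (file id, size in TB).
cache = LRUDiskCache(capacity_tb=175.0)
workload = [("scene_a", 100.0), ("scene_b", 50.0), ("scene_a", 100.0),
            ("scene_c", 60.0), ("scene_b", 50.0)]
sources = [cache.get(file_id, size) for file_id, size in workload]
hit_ratio = cache.hits / (cache.hits + cache.misses)
```

Replaying a recorded request trace through such a model and comparing hit ratios is one simple way to compare candidate eviction policies before touching the production system.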

Slide 14: Approach: Tape Library Simulation (and Benchmarking)
  - Prerequisite for technology generation change and configuration management
  - No dedicated test hardware, so simulation (and benchmarking) of what-if scenarios
  - Goal: Simulate current and potential future setups and configurations (Tape and VTL)
Diagram labels: Synthetic Workload; Archive Access Benchmark; Configuration; Production System; Simulator; Benchmarking; Simulation
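
A what-if simulation of this kind can be sketched in a few lines. The model below is a deliberate simplification and not the simulator from the talk: drives are served sequentially, the least recently used tape is unmounted when all drives are busy, and the timing parameters (`mount_s`, `unmount_s`, `read_s`) are invented placeholder values.

```python
def simulate(requests, num_drives=1, mount_s=60.0, unmount_s=30.0, read_s=10.0):
    """Serve tape requests in the given order.

    Returns (total seconds, number of mounts). All timings are hypothetical.
    """
    mounted = []  # tapes currently in drives, least recently used first
    total_s = 0.0
    mounts = 0
    for tape in requests:
        if tape in mounted:
            mounted.remove(tape)  # already loaded, no mount needed
        else:
            if len(mounted) == num_drives:
                mounted.pop(0)  # free a drive by unmounting the LRU tape
                total_s += unmount_s
            total_s += mount_s
            mounts += 1
        mounted.append(tape)  # now the most recently used tape
        total_s += read_s
    return total_s, mounts


# Synthetic workload: the same requests in arrival order and grouped by tape.
arrival_order = ["A", "B", "A", "B"]
grouped_order = sorted(arrival_order)

fifo_result = simulate(arrival_order)        # (370.0, 4)
grouped_result = simulate(grouped_order)     # (190.0, 2)
two_drives = simulate(arrival_order, num_drives=2)
```

Even this toy model shows why request scheduling and the number of drives matter: grouping requests by tape halves the mounts in the example, and a second drive removes the unmounts entirely.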

Slide 15: Challenge: Semantic Gap between producer and consumer
Image credits: NASA, Jonathan Billinger

Slide 16: Approach: Metadata using Semantic Annotation
  - Metadata uses concepts from ontologies
  - Discovery follows connections to other concepts
  - All stakeholders retain their custom vocabulary
  - Ontologies to connect both worlds
Diagram labels: Sentinel; NDVI; "isa" edges between concepts
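
The idea that discovery follows connections between concepts can be illustrated with a small sketch. Only the concept names Sentinel and NDVI and the "isa" relation appear on the slide; the other concepts, the dataset names, and the functions `ancestors` and `discover` are hypothetical and chosen purely for illustration.

```python
# Hypothetical "isa" edges: concept -> list of more general concepts.
ISA = {
    "Sentinel-2": ["Sentinel"],
    "Sentinel": ["EarthObservationSatellite"],
    "NDVI": ["VegetationIndex"],
}

# Hypothetical datasets, each annotated with ontology concepts.
ANNOTATIONS = {
    "scene_001": {"Sentinel-2"},
    "ndvi_map_2019": {"NDVI", "Sentinel-2"},
    "radar_scene_7": {"EarthObservationSatellite"},
}


def ancestors(concept):
    """Return every concept reachable from `concept` via isa edges."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in ISA.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def discover(query):
    """Return datasets annotated with `query` or with any more specific concept."""
    hits = []
    for dataset, concepts in sorted(ANNOTATIONS.items()):
        if any(query == c or query in ancestors(c) for c in concepts):
            hits.append(dataset)
    return hits


sentinel_hits = discover("Sentinel")            # ['ndvi_map_2019', 'scene_001']
vegetation_hits = discover("VegetationIndex")   # ['ndvi_map_2019']
```

A consumer searching for the general term "Sentinel" finds data that the producer annotated only with the more specific "Sentinel-2", so neither side has to change its vocabulary.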

Slide 17: Questions, Comments, ...?

Slide 18: Contact
Marcus Paradies (team lead)
Sirko Schindler