ESA LTDP PRODUCT FEATURE EXTRACTION AND ANALYSIS


1 ESA LTDP PRODUCT FEATURE EXTRACTION AND ANALYSIS
Project Overview
Norman Fomferra (BC)
Project Kick-Off, ESA ESRIN,

2 Meeting Agenda
Time   Dur.  Item                                                              By
09:40        Project Overview
09:40  10    PFA objectives & Long Term Data Preservation programme            Pier Giorgio
09:50  30    Project overview, technical & management (PPT)                    Norman
10:20  15    Achieve common understanding of user needs (round table)          All
10:35  10    Contractual matters                                               Berenice
10:45  15    Coffee break
11:00        Project Activities by WP Leaders
11:00  30    Scenarios, Algorithms (PPT)                                       Lorenzo
11:30  10    Dataset Collection (PPT)                                          Luis
11:40  10    Architectural Design (PPT)                                        Luis
11:50  20    Algorithm Implementation (PPT)                                    Ralf
12:10  20    Algorithm Evaluation (PPT)                                        Francesca
12:30  10    Roadmap Definition (PPT)                                          Luis
12:40  20    AoB                                                               All
13:00        End of official KO
13:00  40    Lunch
13:40  80    Technical discussions & WP planning details (round table)         Team
15:00        End of meeting

3 Presentation Outline
- Project Objectives and Challenges
- Cornerstones of our Approach
- Team, Deliverables, Schedule
- Seed Questions for Discussion

4 PFA ITT Objectives
- Help users exploit existing EO data archives
  - By extracting features from EO data
  - By using these features to perform EO data queries
- Study, assess and select effective feature extraction and analysis algorithms
- Elaborate and implement a number of utilisation scenarios to better exploit EO data archives
- Develop the concept of enriched EO data products
- Develop a demonstration system as a prototype for later integration into existing ESA EO infrastructure, e.g. PDGS, G-POD, ngeo

5 PFA ITT Basic Data
- Project duration: 24 months
- Overall budget: 300 k
- Six tasks outputting 35 deliverables
  1. Project Management
  2. Analysis and Review
  3. Dataset and Architecture Definition
  4. Algorithm Implementation
  5. Algorithm Evaluation and Production
  6. Roadmap Definition

6 PFA Context
Diagram components:
- EO Image Data Archives: SAR + Optical EO Data
- Feature Extraction: high-throughput EO data processing cluster; background bulk-processing and number-crunching for (primitive) feature extraction
- Feature Database (Aux), holding references to the EO data
- Feature Analysis and Image Retrieval: Aux-backed, enhanced EO catalogue service; interactive, highly responsive user interfaces for CBIR and Image Information Mining

7 PFA Developer Context
Diagram components:
- Developer: develop, configure, test, validate, and improve algorithms
- Feature Extraction
- Aux
- Feature Analysis and Image Retrieval (Request / Response)
- SAR + Optical EO Data
- Images, Features

8 PFA End-User Context
Diagram components:
- Aux (reference)
- Feature Analysis and Image Retrieval (Request / Response)
- SAR + Optical EO Data
- Images, Features

9 Proposed Approach (1)
- Involve existing user communities by
  - extending their use cases
  - providing new use cases based on new capabilities
- State-of-the-art image information mining methods
  - CBIR and classification algorithms
  - Taking into account sensor synergies (data fusion) and methods developed in other domains (medical, microscopic, astronomic)
- Generic utilisation scenarios for representative end-user applications and EO datasets (SAR and optical) in use
- Primary datasets from Supersites, G-POD collections and Envisat MERIS full-mission hosted at BC premises

10 Proposed Approach (2)
- Demonstration system implementation
  - Use of NEST (SAR) and BEAM (Optical) Toolboxes
  - BEAM Development Platform
  - BEAM & NEST plug-ins
  - Calvalus EO Data Bulk Processing Technology, Apache Hadoop
  - Optional 3rd party libraries, e.g. RapidMiner, WEKA, LIRe
- Establish link to related ESA projects and services: ngeo, EOLib, G-POD
- Agile software development approach, short incremental cycles with frequent software releases

11 User and User Communities
Initial communities:
- Forestry Management
  - Users: Canadian Forestry
  - Interest / Phenomena: Land cover classification; Land use change detection
  - EO data: SAR and optical, high resolution
- Coastal Management
  - Users: GMES MarCoast network of Water Quality Service Providers, ESA CoastColour users
  - Interest / Phenomena: Algal bloom detection, classification; Bloom events and trends
  - EO data: Optical, medium resolution
- Natural Disaster Management
  - Users: JRC, Global Security and Crisis Management Unit
  - Interest / Phenomena: Various
  - EO data: SAR and optical, high resolution
Establish link to communities which are already involved in other ESA image data mining studies (RSS, SSE)

12 Community Involvement
- Selected communities shall become multipliers of the PFA ideas
  - Scenario must fulfil user requirements (users must be made happy)
  - Scenario must be representative for other users
  - Communication channels must be established and used
- Community representatives help us define
  - Requirements
  - Algorithms
  - Dataset selection
  - Definition of scenarios
  - Test and validation

13 Communication Channels
- Phone consultations and interviews
- Questionnaires
  - Requirements: what currently constrains data selection or makes it difficult?
  - Scenarios: what currently constrains applications?
  - Algorithms and tools
  - Test cases
  - Datasets preferred
  - Benefits and recommendations (project end)
- Provide and solicit feedback on project documents
  - Community Development Plan
  - Demonstration and Validation Scenarios Technical Note
- Participation in project meetings when possible
- Provide Project Progress Reports to keep communities engaged in the project
- Participation in software iteration cycles
  - BEAM & NEST tools

14 Project Web Page
RSS Join and Share Area Wiki Page Content:
- General information about the objective of the project
- Information about the algorithms, scenarios and applications
- Links to related projects
- Contact persons
- All public documents and reports generated by the project
- All binary software packages, e.g. BEAM and NEST modules
- Source code and generated API documentation
- Issue tracker

15 Feature Extraction and Scenarios
- The feature extraction part aims at effectively deriving sets of primitive features:
  - SAR & optical data, time series of SAR & optical data
  - Applicable to full mission datasets
  - Applicable to various types of EO data
  - Backed by representative end-user applications
- The scenarios part aims at exploiting the extracted features. Initially suggested scenarios:
  - Content based image retrieval (active learning + SVM)
  - Content based time-series retrieval (CBIR for trends and changes)
  - Unsupervised classification (kernel k-means clustering)
  - Optional: Supervised classification (kernel k-means + SVM)
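The unsupervised classification scenario named on this slide (kernel k-means clustering) can be illustrated with a minimal sketch. It is pure Python and makes several assumptions that are not in the presentation: an RBF kernel, a round-robin initial assignment, and one-dimensional toy "patch features". The function names `rbf_kernel` and `kernel_kmeans` are hypothetical and do not belong to BEAM, NEST or any PFA tool.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def kernel_kmeans(features, k, gamma=1.0, max_iter=50):
    """Cluster feature vectors with kernel k-means; returns one label per vector."""
    n = len(features)
    # Precompute the kernel (Gram) matrix once.
    gram = [[rbf_kernel(features[i], features[j], gamma) for j in range(n)]
            for i in range(n)]
    # Deterministic round-robin initialisation (an arbitrary choice for this sketch).
    labels = [i % k for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term: mean of all pairwise kernel values inside each cluster.
        within = [
            sum(gram[a][b] for a in m for b in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster cannot attract points
                cross = sum(gram[i][j] for j in m) / len(m)
                # Squared distance to the cluster mean in kernel feature space.
                dist = gram[i][i] - 2.0 * cross + within[c]
                if dist < best_d:
                    best_c, best_d = c, dist
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels

# Toy primitive features for six image patches forming two obvious groups.
patch_features = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
labels = kernel_kmeans(patch_features, k=2)
```

The Gram matrix is computed once and reused in every iteration, which is the usual reason to express the distance in terms of kernel values rather than explicit cluster centres.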

16 Architecture: System Components

17 Feature Extraction Requirements
- High data throughput, number crunching
- Pluggable algorithms, workflows, processing chains
- Fast, full-mission dataset access
- Processing infrastructure: cluster, bulk job management, massive parallelisation
- Fast and effective feature extractors
- Enhanced aux database for effective persistence of primitive features
- Enriched EO Data Products
- Ultimate goal is operational use
  - RT, NRT, reprocessing, re-extraction
  - Sentinel 1, 2, 3 and Proba-V missions
- Consider on-the-fly processing of higher level products (L2, L3) from (L0, L1)
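The requirement for pluggable algorithms and fast primitive-feature extractors could look roughly like the following sketch. It is an illustration only: the registry `EXTRACTORS`, the decorator `extractor`, the tiling helper and the two statistics (patch mean and standard deviation) are all hypothetical names and choices, not part of the BEAM or NEST APIs.

```python
import math

# Registry of pluggable extractors: name -> function(patch) -> float.
EXTRACTORS = {}

def extractor(name):
    """Decorator that registers a feature extractor under a name."""
    def register(func):
        EXTRACTORS[name] = func
        return func
    return register

@extractor("mean")
def patch_mean(patch):
    values = [v for row in patch for v in row]
    return sum(values) / len(values)

@extractor("std")
def patch_std(patch):
    values = [v for row in patch for v in row]
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

def split_into_patches(image, size):
    """Yield (row, col, patch) for non-overlapping size x size tiles."""
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]

def extract_features(image, size, names):
    """Run the selected extractors on every patch; returns one record per patch."""
    records = []
    for r, c, patch in split_into_patches(image, size):
        record = {"row": r, "col": c}
        for name in names:
            record[name] = EXTRACTORS[name](patch)
        records.append(record)
    return records

# Toy 4 x 4 single-band image split into four 2 x 2 patches.
image = [
    [1, 1, 4, 4],
    [1, 1, 4, 4],
    [2, 4, 0, 0],
    [4, 2, 0, 0],
]
records = extract_features(image, size=2, names=["mean", "std"])
```

Each record is the kind of row that could be persisted in a feature (aux) database, keyed by patch position. New extractors are added by registering another function, without touching the processing loop.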

18 Feature Analysis & CBIR Requirements
- Highly responsive service
- Enhanced EO data catalogue user interface
  - Standard spatial, temporal, metadata search
  - Enhanced aux database enabled search
  - Highly interactive (active learning, supervised classification)
- Fast access to enhanced aux database
- Download satellite data via standard distribution mechanisms
- Fast, pluggable classifiers
- Optional: machine-to-machine interface (OGC WPS, WCS)

19 Toolboxes & Development Platform
- BEAM Java API:
  - Rapid prototyping of efficient raster data processors
  - Abstraction from raster data formats, generic data product model
  - Implementation of (primitive) feature extraction operators, feature analysis and classification
- PFA tools
  - Existing pre-processing chains for optical and SAR
  - PFA tools will be open-source plugins for NEST and BEAM
  - Used by the team and also by the user community

20 Calvalus System
- Originally developed by BC in an ESA LET SME Study
  - Level bulk processing
  - Level 2 match-ups with in-situ data
  - Level 3 time series
  - On-the-fly processing from Level 1b
- Apache Hadoop for EO Data
- Now in operational use
  - 72 x 4-core CPUs
  - ~500 TB disk space
  - Full mission Envisat MERIS L1b
  - Land Cover CCI, Ocean Colour CCI, CoastColour
- Will easily run on G-POD

21 Calvalus = Hadoop for EO Data
- MapReduce programming model allows for very efficient spatial & temporal aggregation
  - Time series analyses
  - Primitive feature extraction
  - Classification
- Processors developed with the BEAM Graph Processing Framework integrate directly with Hadoop's Java MapReduce API
  - Streaming of data
  - Processing chains/graphs
  - On-the-fly processing
- Distributed File System
  - Network I/O bottleneck
  - Data-local processing
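The spatial and temporal aggregation that the MapReduce model enables can be sketched in a few lines. This is a single-process Python stand-in, not Hadoop's Java MapReduce API and not Calvalus code; the observation tuple layout (lat, lon, month, value), the 1-degree grid cell and all function names are assumptions made for illustration.

```python
from collections import defaultdict

def map_observation(obs, cell_size=1.0):
    """Map step: emit ((grid cell, month), value) for one observation."""
    lat, lon, month, value = obs
    cell = (int(lat // cell_size), int(lon // cell_size))
    yield (cell, month), value

def reduce_mean(key, values):
    """Reduce step: aggregate all values that share a key into their mean."""
    return key, sum(values) / len(values)

def run_mapreduce(observations):
    """Group mapper output by key, then reduce each group.

    On a cluster the grouping (shuffle/sort) happens across nodes;
    here it is a plain in-memory dictionary.
    """
    groups = defaultdict(list)
    for obs in observations:
        for key, value in map_observation(obs):
            groups[key].append(value)
    return dict(reduce_mean(k, v) for k, v in sorted(groups.items()))

# Toy observations: (latitude, longitude, month, measured value).
observations = [
    (54.2, 8.1, "2010-05", 2.0),
    (54.7, 8.9, "2010-05", 4.0),
    (54.3, 8.4, "2010-06", 6.0),
    (55.1, 8.2, "2010-05", 1.0),
]
monthly_means = run_mapreduce(observations)
```

Because every mapper call only looks at one observation and every reducer call only at one key, both steps can run in parallel on the nodes that already hold the data, which is the data-local processing the slide refers to.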

22 Workpackage Breakdown
- WP10: Project Management, N. Fomferra (BC)
  - WP110: Coordinate Project at Array
  - WP120: Coordinate Project at UniTN
- WP20: Analysis and Review, L. Bruzzone (UniTN)
  - WP21: Preselect Methods and Algorithms
  - WP22: Prepare Requirements Baseline
  - WP23: Prepare Technical Specification
- WP30: Dataset Collection and Architectural Design, L. Veci (Array)
  - WP31: Define and collect Datasets
  - WP32: Develop Community Involvement Plan
  - WP33: Design Architecture
- WP40: Algorithm Implementation, R. Quast (BC)
  - WP41: Implement Algorithms and Tools
  - WP42: Implement Enrichment Database
  - WP43: Implement Service Infrastr.
- WP50: Algorithm Evaluation, L. Bruzzone (UniTN)
  - WP51: Develop Software Validation Procedure
  - WP52: Perform End-to-end Demonstrations
- WP60: PFA Roadmap Definition, R. Iha (Array)

23 Study Logic

24 Team Roles
- Project Lead: N. Fomferra (dep. Ralf Quast)
  - WP1: Project Management
  - WP4: Algorithm Implementation
  - Roles: PM, Optical tools dev., Service infrastructure & provision
- Science Lead: L. Bruzzone (dep. F. Bovolo)
  - WP2: Analysis and Review
  - WP5: Algorithm Evaluation
  - Roles: Analysis & Review, Algorithms, Validation & Evaluation
- Software Lead: Luis Veci (dep. Jun Lu)
  - WP3: Dataset Collection & Architectural Design
  - WP6: Roadmap Definition
  - Roles: Dataset coll., Architecture, SAR tools dev., Roadmap

25 Team Key Personnel
Responsibility columns: Project Management; Analysis and Review; Dataset coll. / Architect. Design; Algorithm Implementation; Algorithm Evaluation; Roadmap definition
Name                      Responsibility
Norman Fomferra (BC)      X X X X
Ralf Quast (BC)           X X X X
Luis Veci (Array)         X X X X
Jun Lu (Array)            X X X X
Rajesh Iha (Array)        X X
Lorenzo Bruzzone (UoT)    X X X X
Francesca Bovolo (UoT)    X X X X
Begüm Demir (UoT)         X X

26 Project Schedule

27 Schedule
Workpackage                                                                              Start     End       Dur.
1 Project Management                                                                     Jan       Jan 2015  M
2 Analysis and Review (Preselect methods, RB, TS)                                        Jan 2013  Sep 2013  8 M
3 Dataset Collection and Arch. Design (Data, CIP, architecture)                          May 2013  Jan 2014  8 M
4 Algorithm Implementation (Tools, processing chains, database, services)                Nov 2013  Sep 2014  10 M
5 Algorithm Evaluation and Production (SVP, end-to-end demonstrations)                   Apr       Dec 2014  8 M
6 Roadmap Definition                                                                     Nov       Nov 2014  12 M

28 Meeting Plan
Milestone Meeting                     Date                 Location
KO: Kick-off                          Today                ESRIN
SRR: System Requirements Review       May 2013 (KO+4)      (WebEx)
PDR: Preliminary Design Review        Sep 2013 (KO+8)      UoT, Italy
CDR: Critical Design Review           Jan 2014 (KO+12)     ESRIN
AR: Acceptance Review                 Nov 2014 (KO+22)     BC, Germany
FP: Final Presentation of Results     Jan 2015 (KO+24)     ESRIN
Plus intermediate progress meetings every 2 months held via WebEx.

29 Deliverables
Milestone columns: KO, SRR, PDR, CDR, AR, FP
Deliverable                                      Versions
PMP                                              v1 v2 v3 v4 v5
Web site                                         v1 v2 v3
Final Activity Report                            v1 v2
Executive Summary                                v1 v2
RB                                               v1 v2
Emerging Technologies TN                         v1 v2
TS                                               v1 v2
DJF                                              v1 v2 v3
DDF                                              v1 v2 v3
Collection of selected datasets                  v1 v2 v3
Communities and Datasets TN                      v1 v2 v3
Roadmap                                          v1 v2
Software package                                 v0 v0 v1
Software Validation Procedure and Report
Demonstration and Validation Scenarios TN        v1 v1

30 Next Steps
- WP1: Project Management
  - Update PMP from draft to v1
  - Set up web information pages v1
  - Set up issue tracker and source repository
  - Establish team collaboration, let members learn from each other
- WP2: Analysis and Review
  - Compile candidate user communities
    - What data are they looking for?
    - What are their specific problems when searching for data?
  - Map their requirements to scenarios
  - Start state-of-the-art methods and algorithm review for candidate scenarios
  - Study related projects, establish link
  - Start the RB