Big spatio-temporal* data analytics
Hendrik F. Hamann, IBM T.J. Watson Research Center
9/26/2017

Outline
o IBM Research
o Technology trends
o Big data platform for spatio-temporal analytics (PAIRS)
o Applications: seasonal weather forecasting; global irrigation forecasting; land use recognition; super-resolution estimations; industry activity monitoring; archeological discovery using LIDAR data

Research is IBM's Innovation Engine
IBM employee population: ~400K total, ~250K technical
IBM Research: 3K employees (<1%)
Innovative:
o 50% of external honors
o 46% of IBM Fellows
o 25% of all patents filed
o 26% of new academy members
Recognized:
o 5 Nobel Laureates
o 13 National Medals
o 6 Turing Awards
o 80 Members, National Academies
o 11 Inductees, National Inventors Hall of Fame

IBM Research - 70-Year Record of Innovation

IBM Research is worldwide
Locations: T.J. Watson, Austin, Ireland, Haifa, India, China, Almaden, Brazil, Zurich, Africa, Japan, Australia

The Digitization of the Physical World is accelerating
Internet of Things trend (source: BI Intelligence Estimates (2014) & IDC (2012))
o Internet-of-things generated data grows ~4x faster (currently at 20 EBytes/month) than social and computer generated data
Remote sensing & model data
o Numbers of satellites are growing, e.g. geostationary satellites (every 15 min, 1 km resolution, >20 bands), high-resolution satellites (biweekly, <1 m resolution, >20 bands), nano-satellites
o Drones, LIDAR
o Detailed model data for weather, climate
Increase in global scientific climate data (J. T. Overpeck et al., Science 2011;331:700-702)

Data is transforming every industry
Industry    | Past: selling a product | Future: service
Health      | Diabetes pumps          | Diabetes care
Consumer    | Produce                 | Nutrition
C&P         | Chemicals               | Chemistry
Agriculture | Fertilizer              | Fertilization
IT Industry | Computers               | Computation
Leveraging big data, companies are changing from product-based enterprises to service-based ones.

(Big) Data gets physical

Spatio-temporal data includes IoT and geospatial
Diagram labels: spatio-temporal data (vector, raster); IoT data; geo-spatial data; "this presentation"; "95% of today's technologies"; "tiny data - usually on premise"; "mega big data - cloud is a must"

Spatio-temporal data is not discoverable or searchable
Searchability by domain: Internet, business transactions, social networks, spatio-temporal domain
o 45 billion web pages have been indexed, so they can be searched and discovered in ~0.5 seconds.
o Spatio-temporal data is perhaps one of the last frontiers of digital discovery.

Massive spatio-temporal data sets are generated every day
Takes ~22 hours to move from disk to the processor's memory

Spatio-temporal context is key to exploit the data
Everything in the physical world can be linked in space and time but has to be integrated
IBM Confidential. 2017 IBM Corporation

How do we analyze spatio-temporal data today?
TODAY
o Each data set is kept at a different place in different formats, projections, units, etc.
o Exabytes are stored in billions of individual scenes or files, typically on tape.
o Analysts have to order scenes from each source, then download, assemble, re-sample, re-project, align, and classify them.
o Data is moved to the application.
o Time to value is limited by data curation (90%).
o Figure: LandSat scenes/tiles from different satellite passes
THIS WORK (PAIRS*)
o Large-scale, pre-processed big data store with pre-aligned layers and data sets
o Common formats and projections
o Spatial & temporal joins
o A global reference system
o Accessible as a data service
o Analyze without moving data
o Access to new layers of analytics
o Time to value reduced by orders of magnitude
o Figures: pre-aligned data sets / layers; Global Mastergrid with matching resolution layers
* Physical Analytics Integrated Repository and Services

What are we working on? PAIRS

PAIRS*: A big data platform for scalable spatio-temporal data and analytics
o A data bus feeds open spatio-temporal data into PAIRS in (near) real time.
o Full data curation process (filtering, classifying, aligning, resampling, reprojecting, etc.) at ingestion.
o Large-scale Hadoop / HBase system for efficient distributed data storage and processing.
o The system allows complex queries, e.g. "Find all real estate in California with an elevation gradient, high rainfall, and a certain soil type."
o Access to new layers of analytics: irrigation forecasts, improved weather forecasts.
o Curated data and analytics are accessible as a service via an integration layer: REST APIs to run queries, and a basic web interface to run queries.
* Physical Analytics Integrated Repository and Services
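
Once layers are pre-aligned on a common grid, a multi-layer query such as the California example can be read as a cell-by-cell filter. The following is only a minimal sketch of that idea in plain Python. The layer names, the tiny 2x3 grids, and the thresholds are made up for illustration; this is not the PAIRS API and not real data.

```python
# Hypothetical, co-registered raster layers: same shape, same cell positions.
elevation_gradient = [[0.02, 0.15, 0.30],
                      [0.05, 0.22, 0.01]]
annual_rainfall_mm = [[300, 900, 1200],
                      [450, 1100, 800]]
soil_type = [["clay", "loam", "loam"],
             ["sand", "loam", "clay"]]


def query_cells(min_gradient, min_rainfall_mm, wanted_soil):
    """Return (row, col) of cells that satisfy all three layer conditions."""
    hits = []
    for r, row in enumerate(elevation_gradient):
        for c, gradient in enumerate(row):
            if (gradient >= min_gradient
                    and annual_rainfall_mm[r][c] >= min_rainfall_mm
                    and soil_type[r][c] == wanted_soil):
                hits.append((r, c))
    return hits


# Illustrative thresholds only.
matches = query_cells(min_gradient=0.10, min_rainfall_mm=850, wanted_soil="loam")
print(matches)
```

Because every layer shares the same cell positions, no re-projection or resampling is needed at query time; the filter is a plain logical AND per cell.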

PAIRS has a global spatial and temporal reference system
Resolution level | Δθ, Δφ [degree] | Δy [km], Δx [km] (φ=0°) | Δx [km] (φ=40°)
1                | 0.000008        | 0.00089                 | 0.00067
2                | 0.000016        | 0.00178                 | 0.00134
3                | 0.000032        | 0.00356                 | 0.00268
4                | 0.000064        | 0.00712                 | 0.00536
5                | 0.000128        | 0.01424                 | 0.01072
6                | 0.000256        | 0.02848                 | 0.02144
...
26               | 268.43546       | 29863.444               | 22481.469
o The key is a combination of spatial and temporal information.
o Global grid cell resolution spans from 0.8 m grid cell to 260 km grid cell.
o All resolution layers are nested and aligned at the lower left corner of the cell grid.
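
The table suggests that each resolution level doubles the cell size of the previous one, starting from 0.000008 degrees. A minimal sketch of such a nested grid follows. The km-per-degree factor is simply the ratio of the table's km and degree columns, and the grid origin and function names are hypothetical choices, not PAIRS internals.

```python
import math

BASE_CELL_DEG = 0.000008  # level-1 cell size in degrees, read off the table
KM_PER_DEG = 111.25       # assumption: ratio of the table's km and degree columns


def cell_size_deg(level):
    """Cell edge length in degrees; each level doubles the previous one."""
    return BASE_CELL_DEG * 2 ** (level - 1)


def cell_size_km(level):
    """Approximate north-south cell extent in km."""
    return cell_size_deg(level) * KM_PER_DEG


def cell_index(lat, lon, level, origin=(-90.0, -180.0)):
    """(row, col) of the cell containing (lat, lon), counted from the origin.

    The origin (lower left corner of the grid) is a hypothetical choice.
    """
    size = cell_size_deg(level)
    return (math.floor((lat - origin[0]) / size),
            math.floor((lon - origin[1]) / size))


# A sample point, indexed at two neighbouring levels.
row, col = cell_index(41.21, -73.80, level=10)
parent = cell_index(41.21, -73.80, level=11)
# Nesting: halving the finer index gives the index of the enclosing coarser cell.
print((row // 2, col // 2) == parent)
```

Aligning all levels at the same lower left corner is what makes the integer halving work: a coarser cell is always the union of exactly four finer cells.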

PAIRS scales to big data and complex analysis
o PAIRS: query time is (almost) independent of data size
o Conventional systems: require more time for larger data sizes

Anyone can upload and contextualize their own data
Example: drone images from Watson Research Center
Example: curated images in PAIRS from Watson Research Center
User-enabled data ingestion / curation is now online (via REST API). It
o automatically aligns user data with other PAIRS data and analytics
o makes user data searchable along with other PAIRS data and analytics

PAIRS supports multi-dimensional data
Figure: position of a drone camera and images used for reconstruction
Figure: 3D reconstruction of a house in Westchester County, USA

How to access PAIRS? Stay tuned.

PAIRS enabled analytics I: Improved global seasonal weather forecasting
PAIRS data layers: multiple seasonal forecast models; data from > 5000 weather stations; re-analysis
Analytics: machine-learnt, situation-dependent, multi-model blending using historical forecasts and weather data
New analytics layer: improved seasonal forecasts
Figure: Hurricane Ike path forecasts from 8 different weather models*, 1800 UTC 9/9/08
Seasonal model    | Spatial resolution | Coverage | Temporal resolution | Forecasting horizon | Ensemble forecast
NOAA CFS v2       | 0.5 deg            | global   | 6 hourly            | 0 to 6 months       | 4 members
ECMWF ENS Extended | 0.4 deg           | global   | 6 hourly            | 0 to 45 days        | 51 members
ECMWF SEAS        | 0.75 deg           | global   | 6 hourly            | 0 to 7 months       | 51 members
EUROSIP           | 2.5 deg            | global   | 6 hourly            | 0 to 6 months       | 41 members
Beijing CC CGCM   | 2.5 deg            | global   | monthly             | 0 to 11 months      | ensemble mean
Tokyo CC AGCM     | 2.5 deg            | global   | weekly              | 0 to 3 months       | ensemble mean
* M.J. Brennan, S.J. Majumdar, Weather and Forecasting 26, 848 (2011).

Historically, NWP* model accuracy improvements have been (only) ~6% per decade**
* Numerical weather prediction
** Peter Bauer, Alan Thorpe & Gilbert Brunet, doi:10.1038/nature14956

Creating a unique new analytics layer via machine-learnt, situation-dependent multi-model blending

Machine-learnt, situation-dependent multi-model blending
Which model was more accurate, when, where, and under what weather situation?
o Apply functional analysis of variance to understand 0th-, 1st-, 2nd-, 3rd-, ... order errors.
o Model accuracy can depend strongly on the weather situation category.
o The weather situation is determined using a set of parameters, including forecasted ones, on which the model error strongly depends.
[Figure: 2nd-order error (K) of model 1 (NOAA CFSv2), temperature forecast 30 days ahead at Bondville, IL; axes: wind speed (m/s) and solar irradiance (W/m^2)]
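
The idea of weighting models differently per weather situation can be illustrated with a very simple stand-in. The sketch below uses inverse mean-squared-error weights per situation category, which is a deliberately simpler technique than the machine-learnt blending and functional analysis of variance named on the slide. Model names, situation labels, and all numbers are invented for illustration.

```python
from collections import defaultdict


def fit_blending_weights(history):
    """history: list of (situation, {model: forecast}, observation).

    Returns {situation: {model: weight}}, with weights proportional to the
    inverse mean squared error of each model within that situation.
    """
    sq_err = defaultdict(lambda: defaultdict(list))
    for situation, forecasts, observed in history:
        for model, value in forecasts.items():
            sq_err[situation][model].append((value - observed) ** 2)

    weights = {}
    for situation, per_model in sq_err.items():
        # Small constant avoids division by zero for a perfect model.
        inv_mse = {m: 1.0 / (sum(e) / len(e) + 1e-9) for m, e in per_model.items()}
        total = sum(inv_mse.values())
        weights[situation] = {m: v / total for m, v in inv_mse.items()}
    return weights


def blend(weights, situation, forecasts):
    """Weighted average of the model forecasts for the given situation."""
    w = weights[situation]
    return sum(w[m] * forecasts[m] for m in forecasts)


# Synthetic history (temperatures in K), invented for illustration only.
history = [
    ("calm_sunny", {"model_a": 290.0, "model_b": 294.0}, 290.5),
    ("calm_sunny", {"model_a": 288.0, "model_b": 291.0}, 287.5),
    ("windy_cloudy", {"model_a": 280.0, "model_b": 284.5}, 284.0),
    ("windy_cloudy", {"model_a": 279.0, "model_b": 282.5}, 283.0),
]
weights = fit_blending_weights(history)
forecast = blend(weights, "windy_cloudy", {"model_a": 281.0, "model_b": 285.0})
print(weights["calm_sunny"], forecast)
```

In this toy history, model_a dominates the blend in the calm, sunny category and model_b dominates in the windy, cloudy one, which is the situation dependence the slide describes.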

30-day-ahead temperature forecast example (PAIRS)

More than 30% error reduction for 30-day-ahead forecasting
o Developed a gridded forecast
o Validated by 7 weather stations across CONUS

PAIRS enabled analytics II: Global evapo-transpiration forecasting
PAIRS data layers: weather forecast data (radiation, wind, humidity, temperature, ...); soil, elevation, satellite (IR, NDVI); on-farm measurements; weather stations
Analytics: evapo-transpiration modeling
New analytics layer: global irrigation forecasts
Energy balance: $ET = R_n - H - G$
o $ET$: evapo-transpiration
o $R_n$: net radiation flux (W/m^2)
o $H$: sensible heat flux (W/m^2)
o $G$: soil heat flux (W/m^2)
Net radiation: $R_n = (1 - \alpha) R_s + \varepsilon L_{in} - L_{out}$
o $L_{in}$: incoming long wave radiation
o $L_{out}$: outgoing long wave radiation
o $R_s$: solar radiation
o $\varepsilon$: emissivity
o $\alpha$: surface albedo
Sensible heat flux: $H = \rho_{air} c_p (a + b T_s) / r_{ah}$
o $\rho_{air}$: air density
o $c_p$: specific heat
o $a, b$: specific parameters
o $T_s$: surface temperature
o $r_{ah}$: transfer resistance
Soil heat flux: $G = T_s (a + b \alpha) (1 - c \, NDVI^4) R_n$
o $NDVI$: vegetation index
o $a, b, c$: specific parameters
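
The energy balance can be sketched in a few lines of Python. The sketch assumes the slide's formulas read ET = R_n - H - G, R_n = (1 - albedo) R_s + emissivity L_in - L_out, H = rho_air c_p (a + b T_s) / r_ah, and G = T_s (a + b albedo)(1 - c NDVI^4) R_n. All function names, default constants, and input values are placeholders chosen only to give plausible magnitudes; they are not calibrated parameters from the presentation.

```python
def net_radiation(r_s, l_in, l_out, albedo, emissivity):
    """R_n = (1 - albedo) * R_s + emissivity * L_in - L_out, in W/m^2."""
    return (1.0 - albedo) * r_s + emissivity * l_in - l_out


def sensible_heat_flux(t_s, r_ah, a, b, rho_air=1.2, c_p=1004.0):
    """H = rho_air * c_p * (a + b * T_s) / r_ah, in W/m^2.

    rho_air (kg/m^3) and c_p (J/kg/K) defaults are typical textbook values.
    """
    return rho_air * c_p * (a + b * t_s) / r_ah


def soil_heat_flux(r_n, t_s, albedo, ndvi, a, b, c):
    """G = T_s * (a + b * albedo) * (1 - c * NDVI^4) * R_n, in W/m^2."""
    return t_s * (a + b * albedo) * (1.0 - c * ndvi ** 4) * r_n


def evapotranspiration_flux(r_n, h, g):
    """Energy balance residual ET = R_n - H - G, in W/m^2."""
    return r_n - h - g


# Placeholder inputs, for illustration only.
r_n = net_radiation(r_s=800.0, l_in=400.0, l_out=450.0, albedo=0.25, emissivity=0.95)
h = sensible_heat_flux(t_s=300.0, r_ah=50.0, a=-60.0, b=0.22)
g = soil_heat_flux(r_n, t_s=300.0, albedo=0.25, ndvi=0.6, a=0.0002, b=0.0004, c=0.98)
et = evapotranspiration_flux(r_n, h, g)
print(round(r_n, 1), round(h, 1), round(g, 1), round(et, 1))
```

Evapo-transpiration is obtained as the residual of the balance, so errors in any of the three flux terms propagate directly into ET.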

Creating a unique new analytics layer for optimal irrigation
[Diagram: PAIRS layers (temperature, humidity, radiance, wind, ...) at times t0, t1, t2 -> evapo-transpiration model -> new PAIRS layer at times t0, t1, t2]

Highly accurate global evapo-transpiration forecasting
o 2-year validation of evapo-transpiration forecasts across 130 sites in California
o Evapo-transpiration forecasts for China computed and delivered by PAIRS
[Figure axis: date of forecast]

Yield maps after 2 years of precision irrigation show significant improvements
o 26% more yield in the precision area compared to the conventional one
o 11% higher water efficiency
o 50% higher uniformity
o Improved quality index (Brix value)
Technology provided to all of Gallo's vineyards provides $120M of annual value (100,000 acres x $1.0k = $100M)

PAIRS enabled analytics III: Land use recognition
PAIRS data layers: multiple satellites (MODIS, Sentinel, Landsat); weather information, soil data; historical crop surveys
Analytics: deep learning model
New analytics layer: crop acreage forecasts
Figure labels: historical crop type, MODIS satellite vegetation index data, Tmax anomaly, soil (clay %), day of the year

PAIRS enabled analytics IV: Super-resolution
PAIRS data layers: various satellite observations at different spatial and temporal resolutions; contextual information (weather, land use)
Analytics: machine-learnt kernel
New PAIRS layer: super-resolution observations

Resolution enhancement analytics
[Figure: panels showing a high-resolution observation, a low-resolution observation, and the PAIRS resolution-enhanced observation, with captions dated between Aug 2015 and Dec 2016. Legible caption fragments: "for comparison"; "to learn the intelligent kernel function"; "shows major construction"; "Two months before the high resolution satellite revisits, it shows little change"]

PAIRS enabled analytics V: Industrial Productivity Monitoring
PAIRS data layers: weather data, atmospheric conditions; satellites
Analytics: radiative transfer model; energy balance model
New PAIRS layer(s): industry activity

Industrial Productivity Monitoring
Monthly production prediction, March to Aug 2014. Donghua Plant: 4M ton of iron annually.
o Remote sensing provides an accurate estimate of monthly production.
o Result validated using after-the-fact published production data.
[Figure: actual monthly production (10,000 ton) versus satellite-measured heat generation (W/m^2), and actual versus predicted monthly production (10,000 ton)]
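
One simple way to turn a satellite-measured heat signal into a production estimate is a linear calibration against past production figures. The slide does not state which model was used, so the least-squares fit below is only an assumed stand-in, and the calibration pairs are synthetic numbers invented for illustration, not the plant's data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic calibration pairs, invented for illustration only.
heat_w_m2 = [400.0, 500.0, 600.0, 700.0]   # satellite-measured heat generation
production = [27.0, 29.0, 31.0, 33.0]      # monthly production, 10,000 ton

slope, intercept = fit_line(heat_w_m2, production)
# Estimate production for a new month from its measured heat signal.
predicted = slope * 550.0 + intercept
print(round(predicted, 2))
```

Validation would then compare such predictions with production data published after the fact, as the slide describes.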

PAIRS enabled analytics VI: Feature recognition with deep learning
PAIRS data layers: satellite data (multi-spectral); LIDAR data
Cognitive computing: deep learning
New PAIRS layer(s): livestock density

Identification of Ancient Structures
Structures identified, matching the manual search of an expert.
Error rate: ~5% missing

Thank you very much