Verifying Weather Scale Applications: Land From Evaluation to Benchmarking

1 Verifying Weather Scale Applications: Land From Evaluation to Benchmarking Christa D. Peters-Lidard 1, David M. Mocko 1,2, Sujay V. Kumar 1, Grey S. Nearing 3, Youlong Xia 4,5, Michael B. Ek 6 1 NASA/GSFC; 2 SAIC; 3 U. Alabama; 4 NOAA/NCEP/EMC; 5 IMSG; 6 NCAR

2 Outline 1. Evaluation: NLDAS background; Land Verification Toolkit; NLDAS Testbed. 2. Benchmarking in PLUMBER. 3. Benchmarking NLDAS.

3 Background: NLDAS-2 includes four separate land-surface models with output states and fluxes: Mosaic (GSFC), Noah (NCEP), VIC (Princeton Univ. and Univ. of Washington), and SAC (OHD). January 1979 to near real-time (~3.5-day lag); hourly and monthly output available; 1/8th-degree (~12.5 km) over the CONUS domain (25-53 N; W)

4 The NLDAS Drought Monitor is updated daily, and is one of the datasets used for the weekly U.S. Drought Monitor. Percentiles and anomalies of: precipitation, soil moisture, snow, evaporation, runoff, and streamflow (from river routing). [Figure panels: Streamflow Percentile; Soil Moisture Anomaly (mm); Snow Water Equiv. Anomaly (mm)]
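The percentiles shown in a drought monitor rank today's value against a historical sample for the same point and time of year. A minimal sketch of that calculation, with entirely hypothetical soil-moisture numbers:

```python
import numpy as np

def climatological_percentile(current, climatology):
    """Percentile of the current value within a historical sample
    (e.g., same-calendar-day soil moisture from all prior years).
    Returns the percentage of historical values at or below `current`."""
    climatology = np.asarray(climatology, dtype=float)
    return 100.0 * float(np.mean(climatology <= current))

# Hypothetical example: 40 years of same-day total-column soil moisture (mm).
history = np.array([310., 295., 330., 340., 280., 305., 320., 350., 290., 300.] * 4)
print(climatological_percentile(285., history))  # 10.0 -> drier than 90% of years
```

A low percentile (e.g., below the 20th) is what flags a grid cell as anomalously dry.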

5 There are many NLDAS-2 evaluations. Water Budget: Xia et al. (2012a, JGR; 2016, JGR); Streamflow: Xia et al. (2012b, JGR); Soil Temperatures: Xia et al. (2013, JAMC); Soil Moisture: Xia et al. (2014, JoH); Evapotranspiration: Xia et al. (2014, HP), Matsui and Mocko (2014, book chapter), Kumar et al. (2018, RS); Terrestrial Water Storage: Xia et al. (2017, JHM); Diurnal cycle of precipitation: Matsui et al. (2010, GRL); Forcing data issues: Mo et al. (2012, JHM), Ferguson and Mocko (2017, JHM)

6 The Land Verification Toolkit (LVT) computes published LSM metrics in the NLDAS Testbed using in situ, remote-sensing, and reanalysis data. Kumar, S.V., C.D. Peters-Lidard, J. Santanello, K. Harrison, Y. Liu, and M. Shaw, 2012: Land surface Verification Toolkit (LVT) - a generalized framework for land surface model evaluation, Geosci. Model Dev., 5, doi: /gmd

7 Soil Moisture evaluation Anomaly correlations of surface soil moisture for an 11-year period against quality-controlled USDA ARS & SCAN networks: Noah-3.6 and Noah-MP-3.6 versions perform better than the Noah-2.8 LSM, and generally perform amongst the best of all LSMs for this evaluation 7
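The anomaly correlations quoted here measure how well a model reproduces departures from climatology rather than the raw values. A minimal sketch (removing each series' time mean, which reduces to Pearson correlation; operational evaluations typically remove a day-of-year climatology instead):

```python
import numpy as np

def anomaly_correlation(model, obs):
    """Correlation of anomalies, taken here as departures from each
    series' own time mean. Scores pattern/variability agreement while
    ignoring constant biases."""
    m = np.asarray(model, dtype=float)
    o = np.asarray(obs, dtype=float)
    am, ao = m - m.mean(), o - o.mean()
    return float(np.sum(am * ao) / np.sqrt(np.sum(am**2) * np.sum(ao**2)))

# Toy series: model tracks observed wet/dry swings despite a scale offset.
print(anomaly_correlation([0.10, 0.20, 0.30, 0.40], [0.25, 0.45, 0.65, 0.85]))  # 1.0
```

Because the mean is removed, a model with a large constant wet or dry bias can still score well; that is why anomaly correlation is usually reported alongside bias or RMSE.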

8 Streamflow evaluation Comparisons to USGS streamflow gauges in small unregulated river basins (anomaly correlation and Nash-Sutcliffe Efficiency): Noah-3.6 improves upon Noah-3.3. Noah-MP-3.6 does very well over the Northwest and Northeast, but not as well as Noah-3.6 over the central U.S. and in the Southeast.
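The Nash-Sutcliffe Efficiency used in the streamflow comparison is one minus the ratio of model error variance to the variance of the observations about their own mean. A minimal sketch:

```python
import numpy as np

def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe Efficiency (NSE).
    NSE = 1 is a perfect fit; NSE = 0 means the simulation is no more
    skillful than the observed mean; NSE < 0 is worse than the mean."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))

# Toy daily-flow examples (hypothetical numbers, arbitrary units):
obs = [10., 12., 30., 55., 20., 15.]
print(nash_sutcliffe(obs, obs))                 # 1.0  (perfect)
print(nash_sutcliffe([obs_mean := 23.666667] * 6, obs))  # ~0.0 (mean-only skill)
```

NSE heavily weights errors at high flows (squared errors), which is why the slide pairs it with anomaly correlation rather than relying on it alone.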

9 Snow depth evaluation Comparisons to GHCN in situ quality-controlled snow depth observations over CONUS (anomaly correlations and RMSE): Noah-3.6 has high anomaly correlation of snow depth, and low RMSE against the GHCN observations. The Noah-MP dynamic vegetation option has lower RMSE than the default option, but the anomaly correlation is also slightly lower.

10 Evapotranspiration Intercomparison Kumar, S.V., T. Holmes, D.M. Mocko, S. Wang, and C.D. Peters-Lidard, 2018: Attribution of flux partitioning variations between land surface models over the continental U.S. Remote Sensing, 10(5), 751, doi: /rs

11 Evaluation vs. Benchmarking The Protocol for the Analysis of Land Surface models (PALS) and the PALS Land Surface Model Benchmarking Evaluation Project (PLUMBER). A benchmark consists of 1) a specific reference value for 2) a particular performance metric that is computed against 3) a specific dataset.
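The three-part definition above (reference value, metric, dataset) can be made concrete in a few lines. This is only an illustrative sketch with hypothetical numbers, not the PLUMBER code; here the metric is RMSE and the benchmark is "RMSE against this dataset must not exceed the reference value":

```python
import numpy as np

def rmse(sim, obs):
    """Root-mean-square error: the example performance metric."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def beats_benchmark(sim, obs, metric, reference, lower_is_better=True):
    """A model beats the benchmark if its metric score, computed on the
    specified dataset, is on the right side of the reference value."""
    score = metric(sim, obs)
    return score <= reference if lower_is_better else score >= reference

# Hypothetical benchmark: RMSE <= 0.5 on this (tiny) evaluation dataset.
obs = [1.0, 2.0, 3.0]
sim = [1.1, 2.1, 2.9]
print(beats_benchmark(sim, obs, rmse, reference=0.5))  # True (RMSE = 0.1)
```

The key shift from evaluation to benchmarking is that the reference value is fixed a priori, so "good" and "bad" are defined before the model is run.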

12 Evaluation vs. Benchmarking

13 After decades of LSM development, PLUMBER showed that models beat simple physical benchmarks (Penman-Monteith and a Manabe-style bucket) but not empirical regressions
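The empirical benchmarks that the LSMs failed to beat are strikingly simple: regressions from one or a few forcing variables to the observed fluxes. A sketch of the one-predictor case on synthetic data (all numbers hypothetical; PLUMBER's actual benchmarks were trained out of sample across flux-tower sites):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for flux-tower data: latent heat (Qle) made to track
# downward shortwave (SWdown) linearly plus noise.
swdown = rng.uniform(0.0, 1000.0, size=500)            # W m-2
qle = 0.3 * swdown + rng.normal(0.0, 20.0, size=500)   # W m-2

# Fit a linear regression on half the record, score it on the other half.
train, test = slice(0, 250), slice(250, 500)
slope, intercept = np.polyfit(swdown[train], qle[train], 1)
pred = slope * swdown[test] + intercept

bench_rmse = float(np.sqrt(np.mean((pred - qle[test]) ** 2)))
print(bench_rmse)  # close to the 20 W m-2 noise level in this toy setup
```

The PLUMBER result is that benchmarks of roughly this complexity, using only information also available to the LSMs, often score better than the physically based models.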

14 Information for benchmarking: Shannon's mutual information function I(z; ζ)
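Mutual information I(z; ζ) measures, in nats or bits, how much knowing one variable reduces uncertainty about the other, without assuming a linear relationship. A minimal plug-in estimator from a joint histogram (simple and biased for small samples; shown only to make the quantity concrete, not the estimator used in the cited work):

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Plug-in estimate of Shannon mutual information I(x; y) in nats,
    from a 2-D histogram of the paired samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                 # joint probability p(x, y)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, bins)
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
noisy_copy = x + rng.normal(scale=0.1, size=5000)
independent = rng.normal(size=5000)
print(mutual_information(x, noisy_copy) > mutual_information(x, independent))  # True
```

Because I(z; ζ) is metric-free and nonparametric, it bounds how much any model could extract from its inputs, which is what makes it useful for attributing information loss.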

15 Benchmarking with this approach allows us to separate sources of information loss. Nearing, Grey S., David M. Mocko, Christa D. Peters-Lidard, Sujay V. Kumar, and Youlong Xia, 2016: Benchmarking NLDAS-2 Soil Moisture and Evapotranspiration to Separate Uncertainty Contributions. Journal of Hydrometeorology, 17(3), DOI:

16 Summary and Conclusions 1. NLDAS Overview: The NLDAS ensemble does a credible job of monitoring soil moisture, land surface fluxes, and streamflow, with applications for drought monitoring. LVT supports further evaluation of new NLDAS models as well as the impact of satellite data assimilation. 2. Benchmarking in PLUMBER: PLUMBER showed that despite decades of LSM research, we still cannot beat simple regression models. However, this does not explain why, or which elements of the models are responsible. Objective benchmarks are essential to measure progress. 3. Benchmarking NLDAS: By quantifying information in the model and observations, we can track losses of information. For NLDAS models simulating soil moisture, information is lost mostly to parameters, with some lost to forcings and physics. For NLDAS models simulating evapotranspiration, information is lost mostly to forcing data, with some lost to parameters and physics.

17 PLUMBER experimental design 1. Obtain observations of sensible and latent heat fluxes as well as required forcings at 20 sites worldwide. 2. Compute empirical benchmarks at the 20 sites using Swdown alone or combined with air temperature and relative humidity. 3. Run each LSM at each site using available forcing data and standard parameter sets. 4. For each site, compute Qh and Qe statistics for all of the LSMs and all empirical benchmarks. 5. Rank each LSM relative to all of the benchmarks, with the best-performing sample element given a score of 1 and the worst given the highest rank. 6. Average rankings over all statistics and all sites to give an average ranking for Qe and Qh separately.
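Steps 5 and 6 above (rank each LSM against the benchmarks per site/statistic cell, then average the ranks) can be sketched as follows. The array shapes and all numbers are hypothetical; this only illustrates the ranking arithmetic:

```python
import numpy as np

def average_rank(lsm_scores, benchmark_scores, lower_is_better=True):
    """Rank one LSM against all benchmarks in every (site, statistic)
    cell, best = 1, then average the ranks over all cells.

    lsm_scores:        shape (n_cells,)                one score per cell
    benchmark_scores:  shape (n_benchmarks, n_cells)   benchmark scores"""
    lsm = np.asarray(lsm_scores, dtype=float)
    bench = np.asarray(benchmark_scores, dtype=float)
    if not lower_is_better:
        lsm, bench = -lsm, -bench
    # Rank of the LSM in a cell = 1 + number of benchmarks that beat it.
    ranks = 1 + np.sum(bench < lsm, axis=0)
    return float(ranks.mean())

# Hypothetical RMSE-like scores: 3 site/statistic cells, 2 empirical benchmarks.
lsm = [0.5, 0.9, 0.4]
bench = [[0.6, 0.7, 0.5],
         [0.8, 0.6, 0.3]]
print(average_rank(lsm, bench))  # 2.0: ranks 1, 3, and 2 across the cells
```

Averaging ranks rather than raw scores is what lets PLUMBER aggregate across metrics and sites with very different units and error magnitudes.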