Some Uses of R at USDA NASS

Size: px
Start display at page:

Download "Some Uses of R at USDA NASS"

Transcription

1 Some Uses of R at USDA NASS Nathan B. Cruze nathan.cruze@nass.usda.gov United States Department of Agriculture National Agricultural Statistics Service (NASS) EPA R Users Group Workshop Washington, DC September 11, providing timely, accurate, and useful statistics in service to U.S. agriculture. 1

2 Me and My Agency About me Mathematical statistician Research and Development Division, Sampling and Estimation Research Section Day-to-day applied research activities often begin with R About USDA NASS Conducts Census of Agriculture every 5 years Conducts over 400 surveys annually Primarily a SAS shop Some Uses of R at USDA NASS 2

3 Project I: Iterative Sequential Regression (ISR) NASS conducts Agricultural Resource Management Survey (ARMS) in collaboration with USDA s Economic Research Service Main Idea: Proposed ISR imputation methodology improves distributional characteristics and helps preserve relationships in the data Data augmentation and fully conditional specification Bayesian technique implemented through Gibbs sampling Development involved multiple stakeholders and extended parallel testing Some Uses of R at USDA NASS 3

4 Project I: Iterative Sequential Regression (ISR)

5 Project I: Iterative Sequential Regression (ISR) Development and testing resulted in the R scripts used in practice Conditional mean imputation module replaced with ISR module ISR machine imputation performed on a separate Linux box Interaction with existing infrastructure Written in R and C code File input-output with SAS interface ISR output subsequently loaded for editing R has been used to execute ISR imputation in production of ARMS Phase III estimates for the past three years Some Uses of R at USDA NASS 5

6 Basic Mapping: Corn for Grain USDA NASS Corn for Grain Estimation Program Non Spec Speculative Annual program: 41 states (production of grain) Speculative region: 10 states Yield forecasts: August-November, January (final)

7 Basic Mapping: Soybeans USDA NASS Soybean Estimation Program Non Spec Speculative Annual program: 31 states Speculative region: 11 states Yield forecasts: August-November, January (final)

8 Basic Mapping: Upland Cotton USDA NASS Upland Cotton Estimation Program Non Spec Speculative Annual program: 17 states Speculative region: 6 states Yield forecasts: August-January, May (final)

9 Basic Mapping: Winter Wheat USDA NASS Winter Wheat Estimation Program Non Spec Speculative Annual program: 42 states Speculative region: 10 states Yield forecasts: May-September (final)

10 Basic Mapping: Winter Wheat Harvest Progress 50 Winter Wheat Percent Harvested as of Week Percent Harvested lat long Some Uses of R at USDA NASS 10

11 Basic Mapping: Winter Wheat Harvest Progress 50 Winter Wheat Percent Harvested as of Week Percent Harvested lat long Some Uses of R at USDA NASS 11

12 Basic Mapping: Winter Wheat Harvest Progress 50 Winter Wheat Percent Harvested as of Week Percent Harvested lat long Some Uses of R at USDA NASS 12

13 Basic Mapping: Winter Wheat Harvest Progress 50 Winter Wheat Percent Harvested as of Week Percent Harvested lat long Some Uses of R at USDA NASS 13

14 Basic Mapping: Winter Wheat Harvest Progress 50 Winter Wheat Percent Harvested as of Week Percent Harvested lat long Some Uses of R at USDA NASS 14

15 Project II: Crop Yield Forecasting Official NASS crop yield forecasts in Crop Production Report Posterior Density of Yield Forecast for State 2 (June 2012 Forecast) represent consensus of NASS Agricultural Statistics Board (ASB) OYS AYS OYS.adj AYS.adj Covariates Posterior Mean Main Idea: Develop Bayesian hierarchical models to produce one-number forecasts I Synthesize several survey estimates I Produce measures of uncertainty I Gibbs sampler used to obtain Monte Carlo estimates Yield (bushels/acre) I Currently we support speculative regions and member states 80 Posterior Density of Yield Forecast for State 4 (June 2012 Forecast) OYS AYS OYS.adj AYS.adj Covariates Posterior Mean Yield (bushels/acre) Posterior Density of Yield Forecast for State 9 (June 2012 Forecast) Some Uses of R at USDA NASS 15

16 Project II: Crop Yield Forecasting Use in Production: Research staff members have used R as a conduit to execute a Gibbs sampler R calls C code for corn and soybean yield models (2011) and winter wheat yield models (2015) R calls JAGS for new upland cotton yield models (2017) Modeled estimates are provided to Agricultural Statistics Board for their deliberations Crop Production Report Survey and Publication Timeline for Winter Wheat Crop Production Report Crop Production Report Crop Production Report Small Grains Summary OYS OYS OYS OYS OYS OYS AYS AYS AYS AYS AYS APS APS May Jun Jul Aug Sep Oct Month Some Uses of R at USDA NASS 16

17 Project II: Crop Yield Forecasting Visualizing the sequence of model-based forecasts versus NASS official statistics Year 2012 Comparisons: Published Yield and Model based Yield Indications State 1 State 2 State 3 State 4 State 5 State 6 State 7 State 8 State 9 State 10 Region Yield Source ASB Model 95% CI.u 95% CI.l May June July Aug Sept May June July Aug Sept May June July Aug Sept May June July Aug Sept May June July Aug Sept May June July Aug Sept Month May June July Aug Sept May June July Aug Sept May June July Aug Sept May June July Aug Sept May June July Aug Sept Some Uses of R at USDA NASS 17

18 Project II: Crop Yield Forecasting Visualizing a model-based rule-of-thumb for the combination of several disparate estimates 1.00 May June July August September Weight Source AYS Covariates OYS Sept. APS 0.00 CO IL KS MO MT NE OH OK TX WA SPEC CO IL KS MO MT NE OH OK TX WA SPEC CO IL KS MO MT NE OH OK TX WA SPEC CO IL KS MO MT NE OH OK TX WA SPEC CO IL KS MO MT NE OH OK TX WA SPEC Unit Some Uses of R at USDA NASS 18

19 Project III: Integer Calibration for Dual System Estimation NASS adopted dual system estimation (DSE) approach in 2012 Census of Agriculture applied DSE in other surveys Main Ideas: DSE weights help adjust for undercoverage, nonresponse, misclassification Desire integer weights consistent totals Desire estimates that satisfy many calibration targets Integer Calibration (INCA) improves rounding of weights and satisfaction of multiple targets Some Uses of R at USDA NASS 19

20 Project III: Integer Calibration for Dual System Estimation Previous approach: INCA: INCA was developed in R and the INCA package exists on CRAN. DSE with INCA was executed in R for the the 2015 Local Foods Survey. Some Uses of R at USDA NASS 20

21 Project III: Integer Calibration for Dual System Estimation INCA weights are closer to (more highly correlated with) original DSE weights Some Uses of R at USDA NASS 21

22 Project III: Integer Calibration for Dual System Estimation INCA weights ensure that estimated totals miss fewer calibration targets Some Uses of R at USDA NASS 22

23 Additional Production and Research Projects in R R in Production 1. Small area models for cash rental rates (2013) 2. Quarterly model-based estimates of cattle inventory 3. Simulated annealing for substratification of primary sampling units in June Area Survey R in Research and Development 1. Quarterly model-based estimates of hog inventory 2. Small area crop acreage, yield, and production estimates Some Uses of R at USDA NASS 23

24 References Cruze, N. B. (2016). A Bayesian hierarchical model for combining several crop yield indications. In JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association. Miller, D., Dau, A., and Lisic, J. (2015). Comparison of modern imputation methodologies on complex data from agricultural operations. In Proceedings of the 2015 FCSM Research Conference [online, accessed 10-sept-2017]. _Presentations_and_Conferences/reports/conferences/FCSM/D1_ Miller_2015FCSM.pdf. Nandram, B., Berg, E., and Barboza, W. (2014). A hierarchical Bayesian model for forecasting state-level corn yield. Environmental and Ecological Statistics, 21(3): Sartore, L., Toppin, K., and Spiegelman, C. (2016). Integer programming for calibration. In Presentations of 2016 Federal CASIC Workshops [online, accessed 10-sept-2017]. Wang, J. C., Holan, S. H., Nandram, B., Barboza, W., Toto, C., and Anderson, E. (2012). A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental Statistics, 17(1):