Kefei Chen, Rebecca O'Leary & Fiona Evans Department of Agriculture and Food, WA


1 Yield prediction using real paddock data and generalised additive modelling
Kefei Chen, Rebecca O'Leary & Fiona Evans, Department of Agriculture and Food, WA

2 Wheat yield prediction
Yield prediction is a major determinant of many management decisions in crop production, and farmers and advisors favour user-friendly methods for predicting crop yield. Many crop yield prediction models have been proposed, but they can be complex and difficult to parameterise. The dynamics of crop-soil-weather interactions have been proposed as a primary principle for comprehensive yield prediction models. We aim to produce a simple, parsimonious, but still reasonably accurate model that farmers can use to predict wheat yield at the paddock level.

3 Modelling approach
Simulation models
- Based on experiments and knowledge of crop growth and development.
- Most of the important processes within the simulation models are described by empirical functions.
- Employ a large set of input variables.
Statistical models
- Derived from large amounts of yield data (e.g. variety, management and soil conditions).
- Can be very difficult to calibrate because of the large number of uncertain parameters.

4 Yield data
- CVT (Crop Variety Trial) data: 14,053 observations, involving 6 varieties and 457 unique locations.
- NVT (National Variety Trial) data: 13,226 observations, involving 108 varieties and 380 unique locations.
- Focus Paddocks data: 428 observations from 164 different Focus Paddocks.

5 Weather data
Daily weather data from the nearest weather station to each site were extracted from the Patched Point Database (PPD). The amount of water available for crop use in a growing season is captured by the variable $W_{avail}$, calculated as
$$W_{avail} = RF_{GS} + RF_{summer}$$
where $RF_{GS}$ is growing season rainfall (April to October) and $RF_{summer}$ is summer rainfall (January to March).
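The Wavail calculation can be sketched in Python as below. This is a minimal sketch, assuming the PPD daily records for one station have already been read into (date, rainfall in mm) pairs, that the two rainfall totals are added with no weighting, and that summer rainfall comes from January to March of the same calendar year as the growing season. The function name `wavail` and the input layout are hypothetical.

```python
from datetime import date

GROWING_SEASON = range(4, 11)  # April (4) to October (10), inclusive
SUMMER = range(1, 4)           # January (1) to March (3), inclusive


def wavail(daily_rain, year):
    """Return (RF_GS, RF_summer, Wavail) in mm for one season.

    daily_rain: iterable of (datetime.date, rainfall_mm) pairs for a single
    weather station (hypothetical layout for PPD records).
    Assumes Wavail = RF_GS + RF_summer, both totals taken from `year`.
    """
    rf_gs = 0.0
    rf_summer = 0.0
    for day, rain in daily_rain:
        if day.year != year:
            continue  # ignore records from other seasons
        if day.month in GROWING_SEASON:
            rf_gs += rain
        elif day.month in SUMMER:
            rf_summer += rain
        # November and December rainfall is not counted in either total
    return rf_gs, rf_summer, rf_gs + rf_summer
```

For example, with 20 mm on 1 February, 35.5 mm on 10 May and 12 mm on 3 November 2010, `wavail(records, 2010)` gives 35.5 mm growing-season rainfall, 20 mm summer rainfall and a Wavail of 55.5 mm.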

6 Map of the paddocks and PPD stations
Figure: paddocks (red) and their closest PPD stations (white), shown over a map of average rainfall interpolated over 30 years.
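Matching each paddock to its closest weather station can be sketched as a nearest-neighbour search on great-circle distance. This is an illustrative sketch only: it assumes "closest" means smallest haversine distance on a spherical Earth, and the station identifiers and coordinates are hypothetical; it is not necessarily how the PPD lookup was done.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(paddock, stations):
    """Return (station_id, distance_km) of the station closest to a paddock.

    paddock: (lat, lon); stations: dict mapping a (hypothetical) station id
    to its (lat, lon).
    """
    lat, lon = paddock
    best_id, (s_lat, s_lon) = min(
        stations.items(), key=lambda kv: haversine_km(lat, lon, *kv[1])
    )
    return best_id, haversine_km(lat, lon, s_lat, s_lon)
```

A linear scan is adequate for a few hundred paddocks; a spatial index would only matter for much larger station lists.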

7 Soil data
The dominant soil type at each latitude/longitude location was estimated from a GIS system at DAFWA. First, the agricultural soil groups were mapped through the soil Oracle database. The soil classes were then grouped into general functional types:
- clays: clay & shallow loamy duplex; self-mulching clays
- sandy duplex: shallow sandy duplex; alkaline shallow duplex; deep loamy duplex & earth
- gravel: gravel; stony
- loamy: shallow loam; deep loamy duplex & earth; calcareous loamy earth; sandy earth
- sandy: shallow sand; coloured sand; pale sand
- wet: saline wet

8 Soil map in WA
Figure: soil map of WA, provided by Dennis van Gool and Karen Holmes at GIS/DAFWA.

9 Other data
Other variables of interest for the yield prediction model were also obtained:
- variety
- variety maturity type (short, short-mid, mid, mid-long, long)
- maturity length (from germination to harvest)
- germination timing
- rainfall in the first two months after germination
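The two derived timing variables can be sketched from a germination date, a harvest date and daily rainfall. This is a minimal sketch, assuming "first two months" means the two calendar months starting on the germination day and that maturity length is counted in calendar days; the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, n):
    """Shift a date by n calendar months, clamping the day to the month's length."""
    m = d.month - 1 + n
    year = d.year + m // 12
    month = m % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def first_two_month_rainfall(daily_rain, germination):
    """Total rainfall (mm) from germination up to, but not including, two months later.

    daily_rain: iterable of (datetime.date, rainfall_mm) pairs.
    """
    end = add_months(germination, 2)
    return sum(rain for day, rain in daily_rain if germination <= day < end)


def maturity_length_days(germination, harvest):
    """Maturity length as the number of days from germination to harvest."""
    return (harvest - germination).days
```

A fixed 60-day window would be a simpler alternative; the calendar-month version is used here only because it reads closest to "first two months".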

10 Statistical methods
Generalised additive modelling (GAM)
GAM is an extension of the linear or generalised linear model (GLM) that replaces the parametric regression terms, $y_i = \sum_j \beta_j x_{ij}$, with non-parametric functions, $y_i = \sum_j f_j(x_{ij})$. The aim of non-parametric regression is to approximate the contribution of each covariate to the response variable without assuming a particular functional form for the underlying processes or trends, which may be highly complex. GAMs were fitted with the 'mgcv' package in R.
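To illustrate how a single smooth term $f_j(x)$ can be estimated, the sketch below fits a penalised regression spline: a truncated-linear basis with a ridge penalty on the knot coefficients, so that λ controls wiggliness. This is a swapped-in teaching example, not what 'mgcv' does: mgcv defaults to thin plate regression splines and selects the smoothing parameter automatically, whereas here λ is fixed by hand. The function names are hypothetical.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Columns: intercept, x, and (x - k)_+ for each knot k."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)


def fit_penalised_spline(x, y, n_knots=10, lam=1.0):
    """Penalised least squares: minimise ||y - X b||^2 + lam * sum(b_knot^2).

    Knots are placed at interior quantiles of x. Only the knot coefficients
    (the changes in slope) are penalised, not the intercept or linear term.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    X = truncated_linear_basis(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))  # penalty matrix
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return knots, beta


def predict(x_new, knots, beta):
    """Evaluate the fitted smooth at new covariate values."""
    return truncated_linear_basis(np.asarray(x_new, dtype=float), knots) @ beta
```

As λ grows, the knot coefficients shrink towards zero and the smooth collapses to a straight line; this mirrors how a GAM smooth can reduce to a linear term when the data do not support curvature.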

11 Model selection and checking
- Forward stepwise approach
- Model validation: 5-fold cross-validation
- Model checking with deviance residuals
- Model performance statistics:
  - AIC (Akaike Information Criterion)
  - BIC (Bayesian Information Criterion)
  - RMSE (Root Mean Square Error)
  - R² (coefficient of determination)
  - r (correlation coefficient)
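The validation and performance statistics listed above can be sketched in plain Python. Assumptions: R² is computed as 1 − SS_res/SS_tot; AIC and BIC use the Gaussian form with additive constants dropped, so only differences between models fitted to the same data are meaningful and absolute values will not match R's `AIC()`/`BIC()`; `fit_line` is a hypothetical stand-in for the actual GAM.

```python
import math
import random


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def r_squared(obs, pred):
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def gaussian_aic_bic(rss, n, k):
    """AIC and BIC for a Gaussian model with additive constants dropped.

    k is the number of estimated parameters (for a GAM, the effective d.f.).
    """
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)


def k_fold_indices(n, k=5, seed=1):
    """Shuffle 0..n-1 and split into k nearly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, k=5, seed=1):
    """Return (RMSE, r) of pooled out-of-fold predictions.

    fit(train_x, train_y) must return a function that predicts one value.
    """
    preds = [None] * len(ys)
    for test in k_fold_indices(len(ys), k, seed):
        test_set = set(test)
        train = [i for i in range(len(ys)) if i not in test_set]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = predict(xs[i])
    return rmse(ys, preds), pearson_r(ys, preds)


def fit_line(xs, ys):
    """Hypothetical stand-in model: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x
```

Pooling the out-of-fold predictions before computing RMSE and r gives one test-set statistic per model, comparable to the CORR.TST and RMSE.TST columns of a model-selection table.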

12 Bayesian approach
To make the model easy to update with new paddock data, the best GAM was converted from a frequentist to a Bayesian formulation. The 'jagam' function in the 'mgcv' package converts a GAM into BUGS code. Several diagnostic statistics, with associated plots, were used to assess the convergence of the MCMC simulations:
- Geweke's Z-score
- Gelman and Rubin's convergence diagnostic
- Heidelberger and Welch's convergence diagnostic
Predicted yields at nine probability levels (the deciles, 10% to 90%) were computed from the Bayesian model.
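Two of the steps above can be sketched with NumPy: the basic Gelman and Rubin potential scale reduction factor (R-hat) for one parameter, and the nine deciles of a set of predicted-yield draws. This is a sketch of the textbook formulas, assuming the post-burn-in draws are already available as arrays; packages such as R's coda may apply additional corrections, so values can differ slightly. The function names are hypothetical.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    chains: array of shape (m, n), m chains of n post-burn-in draws each.
    Values close to 1 suggest the chains are sampling the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    b = n * chain_means.var(ddof=1)          # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n        # pooled variance estimate
    return float(np.sqrt(var_hat / w))


def yield_deciles(draws):
    """The nine deciles (10th, 20th, ..., 90th percentiles) of yield draws."""
    return np.percentile(np.asarray(draws, dtype=float), np.arange(10, 100, 10))
```

Chains stuck in different regions inflate the between-chain variance `b`, pushing R-hat well above 1; a common rule of thumb treats values below about 1.1 as acceptable.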

13 Comparison among various datasets
D00_14: ; D04_14: 2004_2014; D75_99: ; D75_14: 1975_

14 Variation in yield and Wavail for NVT data

15 Model optimisation for s(wavail) & s(lat, long)

16 Forward stepwise model selection
Table columns: MODEL, MODEL FORMULA, LEVEL, CORR.TST, RMSE.TST, CORR.TRN, RMSE.TRN, AIC, BIC (TST = test set, TRN = training set)
- 1: lm(yield ~ 1), level Null, CORR.TST NA
- lm(yield ~ Wavail)
- gamm(yield ~ s(wavail, k = j))
- gamm(yield ~ s(wavail, k = 3) + s(lat, longi, k = j))
- 79: gamm(yield ~ s(wavail, k = 3) + s(lat, longi, k = 15) + soils)
- 83: gamm(yield ~ s(wavail, k = 3) + s(lat, longi, k = 15) + soils + mattype)
- 84: gamm(yield ~ s(wavail, k = 3) + s(lat, longi, k = 15) + soils + mattype + s(sumrf2m, k = 2))
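The forward stepwise procedure can be sketched as a greedy loop: start from the null model (yield ~ 1), try adding each remaining candidate term, keep the one that most reduces 5-fold cross-validated RMSE, and stop when the best improvement falls below a tolerance. Assumptions: ordinary least squares stands in for the GAM (a smooth term would enter as a block of basis columns), CV RMSE is used as the selection criterion although the table also reports correlations, AIC and BIC, and all names are hypothetical.

```python
import numpy as np


def cv_rmse(X, y, k=5, seed=0):
    """5-fold cross-validated RMSE of an ordinary least-squares fit."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        sq_err += np.sum((y[test] - X[test] @ beta) ** 2)
    return float(np.sqrt(sq_err / n))


def forward_stepwise(candidates, y, tol=1e-3):
    """Greedy forward selection over named candidate terms.

    candidates: dict name -> 2-D array (n rows) of columns for that term.
    Returns (selected term names, history of (term, CV RMSE) pairs).
    """
    n = len(y)
    selected = []
    X = np.ones((n, 1))  # null model: intercept only
    best = cv_rmse(X, y)
    history = [("null", best)]
    remaining = dict(candidates)
    while remaining:
        scores = {name: cv_rmse(np.hstack([X, cols]), y) for name, cols in remaining.items()}
        name = min(scores, key=scores.get)
        if best - scores[name] < tol:
            break  # no remaining term improves CV RMSE enough
        X = np.hstack([X, remaining.pop(name)])
        selected.append(name)
        best = scores[name]
        history.append((name, best))
    return selected, history
```

Using the same fold assignment (fixed `seed`) for every candidate keeps the comparisons between models paired, so differences in CV RMSE reflect the added term rather than a reshuffled split.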

17 Smooth plots
Figure: smooth plot for s(wavail) and smooth contour plot for s(lat, long).

18 Comparison between GAM and F&S
F&S potential yield: $PY = (R + I) \times WUE$, where $R$ is rainfall (mm), $I$ is the intercept (mm) and $WUE$ is the water-use efficiency.
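The French and Schultz (F&S) potential yield, taken here as PY = (R + I) × WUE, can be sketched as a one-line function. The defaults are the classic F&S values (an evaporation intercept of 110 mm, entered as I = −110 mm, and a WUE of 20 kg/ha per mm); treating R as growing-season rainfall and flooring yield at zero are assumptions of this sketch, and the intercept is a parameter so that different environments can use different values.

```python
def fs_potential_yield(rain_mm, intercept_mm=-110.0, wue=20.0):
    """French & Schultz-style potential yield PY = (R + I) * WUE, in kg/ha.

    rain_mm: water supply R in mm (assumed here: growing-season rainfall).
    intercept_mm: I in mm, negative; -110 mm is the classic F&S value.
    wue: water-use efficiency in kg/ha per mm; 20 is the classic F&S value.
    Potential yield is floored at zero when R + I is negative.
    """
    return max(0.0, (rain_mm + intercept_mm) * wue)
```

For example, 310 mm of rainfall gives (310 − 110) × 20 = 4000 kg/ha, while a smaller intercept of −60 mm for the same rainfall gives 5000 kg/ha.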

19 Summary
- Variation in actual wheat yields could be described to a large extent by rainfall, latitude/longitude and soil type.
- In general, a waterlogging effect was observed in the WA wheatbelt, especially on clay soils.
- Comparing the F&S and GAM predictions of potential and actual yields supports the view that yield potential is generally greater than predicted actual yield, and that different intercepts should be applied in different environments when estimating yield potential.
- An interaction between soil type and available water was observed; however, there are not enough data to fit this interaction for the different soil types.

20 Acknowledgments
This project is made possible by Royalties for Regions. Thanks to many colleagues from the Department of Agriculture and Food, Western Australia (DAFWA) and to grain growers for providing data and help. Thanks also to Dennis Van Gool, Karen Holmes, Ted Griffin and Tony Leeming from the GIS soil group at DAFWA for providing the soil type data, and to Alain Baillard, Andrew Van Burgel, Art Diggle, Bas Roemermann, Bill Bowden, Brenda Shackley, Christine Zaicou, David Ferris, David Stephen, Dean Diepeveen, Dion Nicol, Ian Foster, Imma Farre Codina, Jeremy Lemon, Karyn Reeves, Mario D'Antuono and Tim Maling at DAFWA for helpful discussions and suggestions.

21 References
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory.
French RJ, Schultz JE (1984) Water use efficiency of wheat in a Mediterranean-type environment. Aust J Agric Res 35.
Hastie TJ, Tibshirani RJ (1990) 'Generalized additive models.'
McCown R, et al. (1996) APSIM. Agricultural Systems 50.
Oliver YM, et al. (2009) Improving estimates of water-limited yield of wheat by accounting for soil type and within-season rainfall. Crop and Pasture Sci 60.
Ritchie J, et al. (1988) CERES-Wheat. Univ. of Tex. Press, Austin.
Wood S (2006) 'Generalized additive models: an introduction with R.'

22 Thank you for your attention!

23 Crop-Soil-Water interaction

24 Model checking with residuals