School of Environment, University of Auckland, New Zealand


A solution to the problem of calibration of low-cost air quality measurement sensors in networks

Georgia Miskell a,b, Jennifer A. Salmond b, and David E. Williams a,c,*

a School of Chemical Sciences, University of Auckland, Private Bag 9209, Auckland 022, New Zealand
b School of Environment, University of Auckland, New Zealand
c MacDiarmid Institute for Advanced Materials and Nanotechnology, Wellington, New Zealand

Supplementary Information

S1 Modelling the effect of different proxies using simulated data
S2 Site descriptions and locations for the nine co-located sensors
S3 Overall mean and standard deviation of ozone concentration at different sites
S4 Land-use correlations for ozone concentration in the Lower Fraser Valley
S5 Reference site correlation and slope correction coefficients for different land-use categories
S6 Stepping semi-blind calibration results for different locations than Fig 5 in text
S7 Assessment of the reliability of exceedance determination using hourly-averaged data, and the effect of data truncation

S1: Modelling the effect of different proxies using simulated data

Lognormal distributions are used for this modelling because, in general, they give a reasonable representation of air quality data (2,3). Two-parameter lognormal frequency distributions were calculated with arithmetic mean and arithmetic standard deviation similar to those for O3 variation in the Lower Fraser Valley (see Methods). A criterion for data that are reliable for indicative purposes, as defined above, can be set: $a_1 = 1 \pm 0.3$; $a_0 = 0 \pm 5$ ppb, similar to Miskell et al. (4). Figure S1 illustrates that estimates will satisfy this criterion even if the proxy distribution is significantly different from the true data distribution.

Figure S1: Illustration of the method using lognormal distributions as a model.
A: Example of proxy and true distributions with different arithmetic mean and arithmetic standard deviation. The sensor data are a linear transformation of the true distribution, with arbitrary slope and offset. Inset: sensor corrected value predicting true data value. Full distribution: computation with all data; left truncated: computation with values above the arithmetic mean only; right truncated: computation with values below the arithmetic mean only, for both sensor data and proxy data.
B: Prediction slope and offset variation as a result of alteration of the arithmetic mean or standard deviation of the proxy, where the full data distribution up to concentration = 80 is used. (i): proxy standard deviation fixed and mean varied; (ii): proxy mean fixed and standard deviation varied. The full line shows the variation of the slope estimate and the dashed line that of the offset estimate.
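The simulation described in this section can be sketched in a few lines. This is a minimal illustration rather than the authors' code. It assumes that the correction matches the mean and standard deviation of the sensor data to those of the proxy; the function names, the distribution parameters, and the sensor slope and offset are illustrative choices, not values from the study. The conversion from arithmetic mean and standard deviation to lognormal parameters is the standard one.

```python
import numpy as np

def lognormal_params(mean, sd):
    """Convert an arithmetic mean and standard deviation into the (mu, sigma)
    of the normal distribution underlying a two-parameter lognormal."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)

def moment_match(sensor, proxy):
    """Assumed correction: pick slope a1 and offset a0 so that the corrected
    sensor data share the proxy's mean and standard deviation."""
    a1 = np.std(proxy) / np.std(sensor)
    a0 = np.mean(proxy) - a1 * np.mean(sensor)
    return a0, a1

rng = np.random.default_rng(0)
n = 20_000
# Illustrative parameters (ppb): the proxy deliberately differs from the truth.
true = rng.lognormal(*lognormal_params(25.0, 12.0), size=n)
proxy = rng.lognormal(*lognormal_params(28.0, 13.0), size=n)
sensor = 7.0 + 0.6 * true  # linear transformation with arbitrary slope and offset

a0, a1 = moment_match(sensor, proxy)
corrected = a0 + a1 * sensor

# Regress corrected sensor values on the true values and apply the criterion
# slope within 0.3 of 1 and offset within 5 ppb of 0.
slope, offset = np.polyfit(true, corrected, 1)
reliable = abs(slope - 1.0) <= 0.3 and abs(offset) <= 5.0
print(f"slope={slope:.2f}, offset={offset:.2f} ppb, reliable={reliable}")
```

Because the sensor is an exact linear function of the true data, the prediction slope reduces to the ratio of the proxy and true standard deviations, which is why a moderately mismatched proxy can still satisfy the criterion.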

S2: Site descriptions and locations for the nine co-located sensors

Figure S2: Location of the co-location sites

Table S1: Site descriptions for the co-location sites

Co-located regulatory     Station   Abbreviation   Land use in
monitoring station        number                   this study
Abbotsford                T33       abb            res.
Langley                   T27       lan            res.
Maple Ridge               T30       map            res.
North Delta               T3        del            res.
Pitt Meadows              T20       pit            agr.
Port Moody                T9        por            com.
Richmond South            T7        ric            res.
Second Narrows            T6        sec            com.
Surrey East               T5        sur            res.

Further rows of the table (values not recoverable): Latitude (N); Longitude (W); Elevation (m amsl); Population (within km); Land use (% within km): Agr. NA, Com. NA, Res. NA, Other NA.

Further site descriptions are here: ormation.pdf

S3 Overall mean and standard deviation of ozone concentration at different sites

Figure S3: Top: overall means, and bottom: standard deviations, across the analyzer network in the Lower Fraser Valley, Vancouver, Canada during the study period, May-November 2012. Locations whose names are abbreviated in red are those that make up the sub-network. Different shapes represent the three broad land-use categories at 1 km buffer.

S4 Land-use correlations for ozone concentration in the Lower Fraser Valley

The local influences on O3 concentration in the Lower Fraser Valley (LFV) that vary significantly across the region are emissions of volatile organics and nitrogen oxides. The significant regional influences are the ocean background O3, transported by the wind over the urban center at the seaward entrance of the valley, and NO2 transported downwind from the urban center into the rural areas at the head of the valley (5,6). Classification by land use into urban, residential, and rural categories is useful because it captures the effects of both local sources and regional transport: consumption of O3 by emissions dominates in the upwind urban area, and photochemical production from NO2 transported downwind dominates in the eastern residential and rural areas (5,6). Figure S4 shows mean ozone concentration (May-November 2012) correlated with longitude and land use.

Figure S4: Land-use correlations using data from regulatory stations in the Lower Fraser Valley for May-September 2012. A: mean ozone correlated with longitude for those stations in the central band of latitude; B: the ratio of standard deviation of ozone concentration to mean ozone concentration correlated with mean ozone concentration for all regulatory stations (including mountain, coastal and tributary valley sites); C: the ratio of standard deviation of ozone concentration to mean ozone concentration correlated with mean ozone concentration for stations in the central band of latitude. Land uses are: blue squares = urban sites, red circles = rural sites, and green triangles = suburban sites. The regulatory station locations are shown in Figure S3.

Figure S4 shows data pertinent to the present study. For all the regulatory sites around the LFV, the ratio of standard deviation to mean was linearly correlated with the mean (Figure S4B). For the non-elevated sites in the central valley, the ratio of standard deviation to mean was essentially constant, although a linear correlation with the mean that differs between land-use categories could offer further refinement of the result (Figure S4C). These correlations enable estimation of the mean from land-use regression (LUR), and estimation of the standard deviation from LUR together with the correlation with the mean.
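As a rough illustration of the last step, the sketch below fits the ratio of standard deviation to mean as a linear function of the mean across stations, then uses that fit to turn a mean estimate into a standard deviation estimate. The station values and the function name `estimate_sd` are hypothetical placeholders, not data or code from the study.

```python
import numpy as np

# Hypothetical station summaries in ppb (placeholders, not study values).
station_mean = np.array([14.0, 17.0, 19.0, 22.0, 24.0])
station_sd = np.array([7.1, 8.4, 9.6, 10.9, 12.1])

# Fit ratio = b0 + b1 * mean; a near-zero b1 corresponds to the
# "essentially constant" ratio described for the central valley sites.
ratio = station_sd / station_mean
b1, b0 = np.polyfit(station_mean, ratio, 1)

def estimate_sd(mean_estimate):
    """Standard deviation implied by a mean estimate (for example from LUR)."""
    return mean_estimate * (b0 + b1 * mean_estimate)

print(f"ratio fit: b0={b0:.3f}, b1={b1:.4f}")
print(f"estimated sd at mean 20 ppb: {estimate_sd(20.0):.1f} ppb")
```

Fitting a separate line per land-use category would be the natural extension if the refinement suggested by Figure S4C were pursued.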

S5 Reference site correlation and slope correction coefficients for different land-use categories

Figure S5: Pearson correlations across the 19 analyzer sites (axis labels: abb, bkp, bs, chi, coq, hop, lan, mp, map, del, pit, por, ric, rob, sec, sur, tsa, air, kits).

The span correction coefficients were derived as follows:

$$Y_k = 0 + r_{Z,Y} Z_k$$

where $Y_k$ is the observed data, $r_{Z,Y}$ is the derived coefficient for each land use, and $Z_k$ is the entire data in that land use. As we have set three land-use categories, each model run gives three span coefficients. Table S2 shows each model run and its coefficients:

Table S2: Values of $r_{Z,Y}$ obtained for each analyzer against the aggregate of data in each land-use category. Columns: Analyzer as proxy; Abbreviation; Analyzer land-use; Span: urban; Span: suburban; Span: rural.

Analyzer as proxy          Abbreviation   Analyzer land-use
Abbotsford                 Abb            Suburban
Burnaby Kensington Park    Bkp            Suburban
Burnaby South              Bs             Suburban
Chilliwack                 Chi            Rural
Coquitlam                  Coq            Suburban
Hope Airport               Hop            Rural
Kitsilano                  Kits           Suburban            .06
Langley                    Lan            Suburban
Mahon Park                 Mp             Suburban
Maple Ridge                Map            Suburban
North Delta                Del            Suburban
Pitt Meadows               Pit            Rural
Port Moody                 Por            Suburban
Richmond - Airport         Air            Urban
Richmond South             Ric            Suburban
Robson Square              Rob            Urban
Second Narrows             Sec            Urban
Surrey East                Sur            Suburban
Tsawwassen                 Tsa            Suburban

Medians for each land use were as follows, and were used as span coefficients to correct proxy data.

Urban observation, suburban proxy: median(0.68, 6, 2, 0.76,, 0.68,, 0.69,, 4, 0.76, 0.7, 0.66) = 0.76
Urban observation, rural proxy: median(0.69, 0.63, 0.75) = 0.69
Suburban observation, urban proxy: median(.02, .35, .23) = .23
Suburban observation, rural proxy: median(7,, 0.96) = 8
Rural observation, urban proxy: median(.06, .38, .28) = .28
Rural observation, suburban proxy: median(0.94, .3, .07, .04, .06, 0.93, .03, 0.95, .07, .2, 0.99, 0.94, 6) = .03

Where the land use is the same as the proxy, the variation from $r_{Z,Y} = 1$ indicates the uncertainty associated with the assignment of proxies based on broad land-use classification.
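The derivation of a span coefficient and its median across analyzers can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it assumes that $r_{Z,Y}$ is the least-squares slope of a regression forced through the origin, and that the median span is applied to proxy data by simple multiplication. The function names and the toy series are hypothetical. The three coefficients passed to the median are the ones quoted above for an urban observation with a rural proxy.

```python
import numpy as np

def span_coefficient(y, z):
    """Least-squares slope r for the zero-intercept model y = 0 + r * z."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    return float(np.sum(z * y) / np.sum(z * z))

def correct_proxy(proxy, span):
    """Assumed use of the span: scale proxy data from one land use so that
    it represents a site in another land use."""
    return span * np.asarray(proxy, dtype=float)

# Toy check of the fit: y is exactly 0.7 times z, so r should be 0.7.
z = np.array([10.0, 20.0, 30.0, 40.0])
r = span_coefficient(0.7 * z, z)

# Median of the coefficients quoted for "urban observation, rural proxy".
urban_from_rural = float(np.median([0.69, 0.63, 0.75]))

print(r, urban_from_rural, correct_proxy([10.0, 20.0], urban_from_rural))
```

Taking the median rather than the mean keeps a single poorly matched analyzer from dominating the span assigned to a land-use pairing.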

S5 Hourly averaged raw sensor data, compared to co-located analyzer

Figure S5. Hourly-averaged time-series of the nine co-located sensors (black: analyzer data; color: factory-calibrated sensor data).

S6 Stepping semi-blind calibration results for different locations than Fig 5 in text

A = scatterplot of the co-located analyzer data against semi-blind calibrated sensor data. Lighter colors represent earlier measurement weeks and darker colors later measurement weeks.
B = same as A but with uncalibrated ("raw") sensor data.
C = stepping intercept estimate, $a_0$.
D = stepping slope estimate, $a_1$.
The dashed lines on C and D mark the drift detection thresholds used in 22
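Panels C and D can be thought of as the output of a windowed estimate followed by a threshold test. The sketch below is one possible reading, not the procedure from the main text: it assumes a trailing window, a moment-matching estimate of slope and offset against the proxy, and thresholds of slope within 0.3 of 1 and offset within 5 ppb of 0. The window length of 72 hours is borrowed from the td value quoted with Figure S6, and the synthetic series and function names are hypothetical.

```python
import numpy as np

def stepping_estimates(sensor, proxy, window=72):
    """Return an array of (a0, a1) pairs, one per trailing window of
    `window` hourly values, using an assumed moment-matching estimate."""
    sensor = np.asarray(sensor, dtype=float)
    proxy = np.asarray(proxy, dtype=float)
    out = []
    for end in range(window, len(sensor) + 1):
        y = sensor[end - window:end]
        z = proxy[end - window:end]
        a1 = np.std(z) / np.std(y)
        a0 = np.mean(z) - a1 * np.mean(y)
        out.append((a0, a1))
    return np.array(out)

def drift_flags(estimates, slope_tol=0.3, offset_tol=5.0):
    """True where either estimate lies outside its assumed threshold."""
    a0, a1 = estimates[:, 0], estimates[:, 1]
    return (np.abs(a1 - 1.0) > slope_tol) | (np.abs(a0) > offset_tol)

# Synthetic example: the sensor tracks the proxy for 200 h, after which its
# zero drifts upward by 0.1 ppb per hour.
t = np.arange(600)
proxy = 25.0 + 10.0 * np.sin(2.0 * np.pi * t / 24.0)
sensor = proxy + 0.1 * np.clip(t - 200, 0, None)

estimates = stepping_estimates(sensor, proxy, window=72)
flags = drift_flags(estimates)
print(f"first flagged window ends at hour {72 + int(np.argmax(flags))}")
```

A zero that drifts upward shows up as an increasingly negative offset estimate, since the correction has to subtract the drift to bring the sensor back onto the proxy.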


The Port Moody example is for a sensor where a suitable proxy was difficult to identify. The analyzer station is the only O3 monitoring location where water is the predominant land use within 1 km (7). Therefore, the nearest regulatory station, Second Narrows, was selected as the proxy. This proxy analyzer is in a commercial setting with around 35% water within 1 km. At the start of the deployment, the estimates $a_1$ and $a_0$ were within an acceptable range to identify the data as reliable (4); then, around the time of the July fires, they showed that the device was drifting. The major effect was a drift of the zero to higher values, confirmed by the co-location data. Calculation using {4} in the main text was successful in correcting the drifting offset. However, higher values of O3 concentration were likely to be under-predicted; that is, the proxy variance under-estimated the site variance. This would be a result of selecting a proxy in a different land use.

S7 Assessment of the reliability of exceedance determination using hourly-averaged data, and the effect of data truncation

Figure S6 shows the high-concentration results for correction made on hourly-averaged data, and illustrates how data truncation performed on the hourly-averaged data affects the accuracy of prediction of high concentration values. Left truncation was computation with values above the arithmetic mean only; right truncation was computation with values below the arithmetic mean only.
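The two truncations can be written as a small pre-processing step ahead of the slope and offset estimate. The sketch below is illustrative only: it assumes each series is truncated about its own arithmetic mean and that the estimate matches the mean and standard deviation of the retained sensor values to those of the retained proxy values. The simulated data and function names are hypothetical.

```python
import numpy as np

def truncate(values, side):
    """Keep values above the arithmetic mean ("left"), below it ("right"),
    or all values ("full")."""
    values = np.asarray(values, dtype=float)
    m = values.mean()
    if side == "left":
        return values[values > m]
    if side == "right":
        return values[values < m]
    return values

def moment_match(sensor, proxy, side="full"):
    """Assumed estimate of (a0, a1) from the truncated sensor and proxy data."""
    y = truncate(sensor, side)
    z = truncate(proxy, side)
    a1 = np.std(z) / np.std(y)
    a0 = np.mean(z) - a1 * np.mean(y)
    return float(a0), float(a1)

# Simulated example: lognormal "true" and proxy data with the same
# distribution, and a sensor reading 5 + 0.5 * true.
rng = np.random.default_rng(1)
sigma = np.sqrt(np.log(1.0 + (12.0 / 25.0) ** 2))
mu = np.log(25.0) - 0.5 * sigma**2
true = rng.lognormal(mu, sigma, 50_000)
proxy = rng.lognormal(mu, sigma, 50_000)
sensor = 5.0 + 0.5 * true

results = {s: moment_match(sensor, proxy, s) for s in ("full", "left", "right")}
for side, (a0, a1) in results.items():
    print(f"{side:>5}: a0={a0:.2f}, a1={a1:.2f}")
```

With a well-matched proxy, all three variants should recover a slope near 2 and an offset near -10 for this simulated sensor; differences between them become informative when the proxy and site distributions differ mainly in one tail.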

Figure S6: Scatterplots for the high concentration values in hourly-averaged data. Red = calibration performed using all data; blue = calibration performed using left truncation; green = calibration performed using right truncation. The black line is the 1:1 line and td = 72 hours.

Table S3 summarizes the MAE, false positive and false negative scores. The results showed that data truncation had little effect on the calibration fits for high concentrations when the data were hourly-averaged. The overall false negative rate was ~30% and the false positive rate was ~40%.

Table S3: Summary of high concentration values (HV) for hourly-averaged data. FP = false positive = (# corrected sensor data > HV and regulatory data < HV) / (# corrected sensor data > HV). FN = false negative = (# corrected sensor data < HV and regulatory data > HV) / (# regulatory data > HV). Here HV = 4.9 ppb.

Columns: Site; # regulatory values > HV; then MAE (ppb), FP (%) and FN (%) for each of All data, Left truncated data, and Right truncated data.

Abbotsford:      /76 = /69 = /68 = /69 = /9 = /69 = 0.09
Langley:         /40 = /65 = /58 = /65 = /24 = /65 = 0.7
North Delta:     /67 = 0.69 /22 = /66 = 0.68 /22 = /34 = 0.5 5/22 = 0.23
Maple Ridge:     /60 = /7 = /64 = 0.3 5/7 = /47 = /7 = 0.48
Pitt Meadows:    /70 = 0.4 6/58 = /92 = /58 = /55 = /58 = 0.47
Port Moody:      /5 = /33 = /4 = 0 29/33 = /8 = /33 = 8
Richmond:        /52 = /3 = /58 = /3 = /45 = /39 = 0.23
Second Narrows:  /3 = 5 4/6 = /5 = /6 = 3.9 / = 6/6 =
Surrey:          /7 = 0.3 5/54 = /66 = /54 = /84 = /54 = 0.5
SUM:             /454 = /409 = /49 = 0.4 /409 = /427 = /409 = 0.42
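The FP and FN definitions in the caption translate directly into array operations. The sketch below is a minimal illustration with made-up hourly values and a made-up threshold. The function name is hypothetical, and computing the MAE only over hours where the regulatory value exceeds HV is an assumption, since the caption does not state which hours enter the MAE.

```python
import numpy as np

def exceedance_scores(corrected, regulatory, hv):
    """Return (FP, FN, MAE) for a high-value threshold hv.

    FP = #(corrected > hv and regulatory < hv) / #(corrected > hv)
    FN = #(corrected < hv and regulatory > hv) / #(regulatory > hv)
    MAE is taken over hours with regulatory > hv (an assumption).
    """
    corrected = np.asarray(corrected, dtype=float)
    regulatory = np.asarray(regulatory, dtype=float)
    sensor_high = corrected > hv
    reg_high = regulatory > hv

    def ratio(num, den):
        # Undefined when there are no exceedances in the denominator.
        return float(num) / float(den) if den else float("nan")

    fp = ratio(np.sum(sensor_high & (regulatory < hv)), np.sum(sensor_high))
    fn = ratio(np.sum((corrected < hv) & reg_high), np.sum(reg_high))
    if reg_high.any():
        mae = float(np.mean(np.abs(corrected[reg_high] - regulatory[reg_high])))
    else:
        mae = float("nan")
    return fp, fn, mae

# Made-up hourly values (ppb) and threshold, for illustration only.
corrected = [45.0, 42.0, 38.0, 30.0, 41.0, 39.0]
regulatory = [44.0, 37.0, 43.0, 28.0, 46.0, 41.0]
fp, fn, mae = exceedance_scores(corrected, regulatory, hv=40.0)
print(fp, fn, mae)
```

In this toy case one of the three sensor exceedances is not matched by the regulatory data, and two of the four regulatory exceedances are missed by the sensor.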

References

(1) Johnson, N.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions; John Wiley & Sons: NY, 1994; Vol. 1.
(2) Sharma, S.; Sharma, P.; Khare, M.; Kwatra, S. Statistical behavior of ozone in urban environment. Sustainable Environment Research 2016, 26.
(3) Bencala, K. E.; Seinfeld, J. H. On frequency distributions of air pollutant concentrations. Atmospheric Environment (1967) 1976, 10.
(4) Miskell, G.; Salmond, J.; Alavi-Shoshtari, M.; Bart, M.; Ainslie, B.; Grange, S.; McKendry, I. G.; Henshaw, G. S.; Williams, D. E. Data verification tools for minimizing management costs of dense air-quality monitoring networks. Environmental Science & Technology 2015, 50.
(5) Ainslie, B.; Steyn, D.; Reuten, C.; Jackson, P. A retrospective analysis of ozone formation in the Lower Fraser Valley, British Columbia, Canada. Part II: influence of emissions reductions on ozone formation. Atmosphere-Ocean 2013, 51.
(6) Ainslie, B.; Steyn, D. G. Spatiotemporal trends in episodic ozone pollution in the Lower Fraser Valley, British Columbia, in relation to mesoscale atmospheric circulation patterns and emissions. Journal of Applied Meteorology and Climatology 2007, 46.
(7) Station information: Lower Fraser Valley air quality monitoring network.