TEHRAN UNIVERSITY OF MEDICAL SCIENCES, SCHOOL OF PUBLIC HEALTH
Application of an LUR model for chronic exposure estimation to SO2 and PM10 in Tehran, Iran
Hasan Amini, Seyed Mahmood Taghavi Shahri, Sarah Henderson, Masud Yunesian
Institute for Environmental Research and School of Public Health, TUMS, Tehran, Iran
Outline
- Introduction
  - Acute and chronic effects of air pollution
  - Four generations of epidemiological studies
  - Limitations of existing tools
  - Conceptual framework of LUR
- Methods
  - Model development
  - Diagnostics
- Results
- Limitations
- Works in progress and future works
Introduction: Health consequences of exposure to air pollution
- Acute: time series analysis (ecological studies)
- Chronic: cohort and cross-sectional studies
- The importance of accurate exposure measurement
Introduction: Generations of air pollution epidemiology (Irva Hertz-Picciotto in: Modern Epidemiology, 3rd Edition, 2008)
- First generation studies (similar to infectious diseases): within-community comparisons using a before-and-after design
- Second generation (comparing communities with higher versus lower pollutant levels):
  - Many standards and the Clean Air Act of 1970 in the United States
  - Individual and region-specific confounders (sometimes more than 40)
- Third generation (comparisons over time within a given area):
  - Good control of individual confounders
  - Problem of ecological confounders
  - Inability to evaluate chronic effects
- Newer generations (cohorts in which individual-level data are integrated with community-based exposure data):
  - Better exposure measurement (land use regression models)
Introduction: Limitations of existing tools for exposure measurement in Tehran
- The need to capture within-city variation for a large number of people (for cohort and cross-sectional studies)
- Too few monitoring stations to represent all individuals
- Inadequacy of existing models for evaluating personal exposure status in Tehran
Introduction: Limitations of existing exposure measurement tools in Tehran (models)
1. Proximity-based assessment and proxies
2. Geostatistical interpolation approaches: kriging, spline, inverse distance weighting, Thiessen triangulation
3. Dispersion models: Gaussian plume, Eulerian (grid-cell), Lagrangian or puff models
4. Integrated meteorological-emission (IME) models
5. Hybrid models
Introduction: Number of papers indexed in PubMed by year
[Figure: number of papers (y-axis, 0 to 60) by year (x-axis, 2004 to 2013)]
in Megacities, 3-5th September 2013
Conceptual framework: How does it work?
- Construct a model that estimates the averaged level of a given pollutant at each monitoring station (Y) from predictor variables (X)
- Measure the value of each variable in the final model at any location using digitized maps
- Enter these X values into the model to estimate the Y value at that location
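The three steps above can be illustrated with a minimal Python sketch. It assumes a single hypothetical predictor (road length within a buffer) and made-up station values; the function name `fit_simple_ols` and every number are illustrative and not taken from the study.

```python
# Hypothetical illustration: all values below are made up, not study data.

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Step 1: averaged pollutant level (Y) and one GIS predictor (X) per station.
road_length_km = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical X values
so2_level = [12.0, 14.0, 16.0, 18.0, 20.0]   # hypothetical Y values

intercept, slope = fit_simple_ols(road_length_km, so2_level)

# Steps 2-3: read X for an unmonitored location from the map,
# then enter it into the fitted model to estimate Y there.
x_new = 2.5
y_hat = intercept + slope * x_new
print(intercept, slope, y_hat)
```

A real LUR model would use several predictors at once; the single-predictor case is used here only to keep the arithmetic visible.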
Study Area Characteristics
- Largest and most populous city of Iran
- Resident population of roughly 8.7 million
- 22 districts
- Surface area = 612 km²
- Average elevation = 1200 m (roughly)
Methods: Air pollution data sources (response variables)
- Air Quality Control Company (AQCC): 16 monitors
- Department of Environment (DOE): 7 monitors
- We ran the Amelia II program (10 times for each pollutant) to impute missing data
- 21 of the 23 monitors were eligible for inclusion in the study
Methods
- Response variables: annual-mean concentrations of PM10 and SO2, averaged from January 1, 2010 to January 1, 2011 for all monitors after imputation of missing data
- Predictor variables: geographic attributes compiled within a GIS
Methods
- The mean of the 10 imputation-filled datasets was calculated
- Warmer season: April through September
- Cooler season: October through March
- (Based on WHO guidelines for countries in the Northern Hemisphere)
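As a rough sketch of the two operations on this slide, the Python below averages several imputed copies of one series value by value and then splits the result into the warmer and cooler seasons. The helper names, the three copies (the study used 10), and all concentrations are hypothetical.

```python
from statistics import mean

WARMER_MONTHS = {4, 5, 6, 7, 8, 9}  # April through September


def season_of(month):
    """Classify a calendar month (1-12) as 'warmer' or 'cooler'."""
    return "warmer" if month in WARMER_MONTHS else "cooler"


def average_imputations(imputed_series):
    """Average several imputed copies of one series, value by value."""
    return [mean(values) for values in zip(*imputed_series)]


# Hypothetical: three imputed copies of four monthly values.
# Observed values repeat across copies; only the imputed gap (2nd value) differs.
months = [1, 4, 7, 11]
imputed_copies = [
    [40.0, 30.0, 25.0, 45.0],
    [40.0, 33.0, 25.0, 45.0],
    [40.0, 36.0, 25.0, 45.0],
]
combined = average_imputations(imputed_copies)

# Seasonal means computed from the combined series.
seasonal_means = {}
for label in ("warmer", "cooler"):
    values = [v for m, v in zip(months, combined) if season_of(m) == label]
    seasonal_means[label] = mean(values)
print(combined, seasonal_means)
```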
Methods
210 potentially predictive variables (PPVs) in six classes and 73 sub-classes:
- Traffic Surrogates
- Land Use
- Distance Variables
- Population Density
- Product Variables
- Geographic Location
Methods: Traffic Surrogates
- The vehicular network in buffers of different radii around the air pollution monitoring stations
Methods: Land Use
Ten land use types within buffers around the stations:
- Residential
- Green space
- Urban facilities
- Industrial/workshop
- Official/commercial
- Transportation
- Military
- Agriculture
- Arid/undeveloped
- Other
Methods: Distance Variables
- The distance (and the natural logarithm of the distance) from each station to each of the Traffic Surrogate and Land Use types, and to a variety of other features
- The logarithm is included because air pollutant concentrations decay exponentially with increasing distance from pollution sources
Methods: Population Density
- The total population
- The population excluding unemployed people and children less than five years of age
Methods: Product Variables
- Variables in the Traffic Surrogates class divided by variables in the Distance Variables class
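The Distance Variables and Product Variables classes can be sketched for a single station as follows. The coordinates, the road length, and the helper `nearest_distance_m` are hypothetical; the study derived these quantities from GIS layers rather than from point lists.

```python
import math


def nearest_distance_m(station_xy, feature_points_xy):
    """Straight-line distance (m) from a station to the closest feature point."""
    return min(math.dist(station_xy, p) for p in feature_points_xy)


# Hypothetical projected coordinates in meters.
station = (0.0, 0.0)
highway_points = [(300.0, 400.0), (1000.0, 0.0)]
road_length_in_buffer_m = 2500.0  # hypothetical traffic surrogate

distance_m = nearest_distance_m(station, highway_points)

# Distance Variables class: the distance and its natural logarithm.
# (A zero distance would need special handling before taking the log.)
log_distance = math.log(distance_m)

# Product Variables class: a traffic surrogate divided by a distance variable.
product_variable = road_length_in_buffer_m / distance_m
print(distance_m, log_distance, product_variable)
```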
Methods: Geographic Location
- The elevation of each monitoring site in meters above sea level, obtained from a digital elevation model of Tehran
- A slope variable
Methods
- The raw GIS inputs were all in vector format (originating from the Japan International Cooperation Agency (JICA) and the Centre for Earthquake and Environmental Studies of Tehran)
- The final predictive variables were all in raster format with a horizontal resolution of 5 meters
Methods: Model development and diagnostics
A step-by-step algorithm considered four key pieces of information:
- Consistency with a priori assumptions about the direction of the effect for each variable
- A p-value of < 0.1
- Increases in R² for a leave-one-out cross-validation (LOOCV)
- A multicollinearity index called the variance inflation factor (VIF)
Methods: Model development and diagnostics
- The algorithm was programmed as a function in the R statistical package
- A single-variable linear regression model was fitted for each of the PPVs in the eligible pool (210 to begin)
- Each model was checked for consistency with a priori assumptions and for its p-value, and the variable with the strongest LOOCV R² value was retained
Methods: Model development and diagnostics
- In the second step, every possible second variable was added to the retained single-variable model and evaluated in the same way
- In the third step, every possible third variable was added to the two-variable model and evaluated in the same way
- Each variable was also removed from the model in turn, and the LOOCV R² value was recalculated
Methods: Model development and diagnostics
- If any of the resulting two-variable models had a higher LOOCV R² value than the model carried forward from the second step, that model was replaced and the third step was restarted
- If not, the third-step model with the highest LOOCV R² value was carried forward to the fourth step
- This process continued until the LOOCV R² value could no longer be increased by adding further variables
Methods: Model development and diagnostics - LOOCV
- First, a model was built using 20 stations, omitting the first one
- The air pollution level at the omitted station (y) was then estimated using the model (x)
- The model was then rebuilt, omitting the second station and using the remaining 20 stations
- This process was repeated for every monitoring station
- The Pearson correlation coefficient between the estimated and measured values, and its square, were used as an index
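The selection loop and the leave-one-out validation described above can be sketched together in Python (the original was an R function that is not shown in these slides). This is a simplified sketch under stated assumptions: it keeps only the a priori sign check and the LOOCV R² criterion, omits the p-value test, the VIF test, and the variable-removal step, and scores a negative LOOCV correlation as 0. All names and data are hypothetical.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Return [intercept, b1, ...] from the normal equations."""
    design = [[1.0, *row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def loocv_r2(rows, y):
    """Refit without each station in turn, predict it, then correlate."""
    held_out = []
    for i in range(len(y)):
        coef = ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        held_out.append(predict(coef, rows[i]))
    r = pearson_r(held_out, y)
    return r * r if r > 0 else 0.0  # assumption: negative r scores as 0


def forward_select(candidates, y, expected_sign):
    """Greedy forward selection on LOOCV R2 with an a priori sign check."""
    selected, best = [], 0.0
    while True:
        winner = None
        for name in candidates:
            if name in selected:
                continue
            trial = selected + [name]
            rows = list(zip(*(candidates[v] for v in trial)))
            coef = ols(rows, y)
            if any(b * expected_sign[v] <= 0 for b, v in zip(coef[1:], trial)):
                continue  # direction contradicts the a priori assumption
            score = loocv_r2(rows, y)
            if score > best:
                winner, best = name, score
        if winner is None:
            return selected, best
        selected.append(winner)


# Hypothetical data for eight stations (not study data).
candidates = {
    "traffic": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "green": [2.0, 5.0, 1.0, 4.0, 2.0, 5.0, 1.0, 4.0],
    "elevation": [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 8.0],
}
expected_sign = {"traffic": 1, "green": -1, "elevation": -1}
y = [9.1, 5.9, 17.2, 13.8, 21.1, 17.9, 29.2, 25.8]

selected, best = forward_select(candidates, y, expected_sign)
print(selected, best)
```

Scoring a negative correlation as 0 guards against a known quirk of the squared Pearson index: leave-one-out predictions from a very weak model can be negatively correlated with the measurements, and squaring would hide that.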
Methods: Model development and diagnostics
- Multicollinearity: if the VIF was greater than 10, the model was considered unacceptable
- Sequentially removing each variable from the available pool of variables
- Stability check of the models: minimum, maximum, and coefficient of variation of the set of coefficients from the LOOCV
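Both diagnostics can be sketched in a few lines of Python. The VIF function below covers only the special case of a model with exactly two predictors, where both share VIF = 1 / (1 - r²); the general case regresses each predictor on all the others. The use of the sample standard deviation in the coefficient of variation is an assumption, and all names and values are hypothetical.

```python
from statistics import mean, stdev


def pearson_r(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def coefficient_of_variation_pct(values):
    """Sample standard deviation as a percentage of the absolute mean."""
    return 100.0 * stdev(values) / abs(mean(values))


# Hypothetical predictors at five stations.
road_length = [1.0, 2.0, 3.0, 4.0, 5.0]
traffic_count = [1.1, 1.9, 3.2, 3.9, 5.1]  # nearly collinear with road_length
green_space = [2.0, 5.0, 1.0, 4.0, 3.0]    # weakly related to road_length

vif_collinear = vif_two_predictors(road_length, traffic_count)
vif_independent = vif_two_predictors(road_length, green_space)
print(vif_collinear > 10, vif_independent > 10)  # flag models with VIF > 10

# Hypothetical slope estimates for one variable across LOOCV refits.
loocv_slopes = [2.0, 2.1, 1.9, 2.0]
stability = (
    min(loocv_slopes),
    max(loocv_slopes),
    coefficient_of_variation_pct(loocv_slopes),
)
print(stability)
```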
Methods: Regression mapping
- Raster cells outside the buffer zones were treated as null
- All null values for the Distance Variables class were set to zero
- All null values for the Product Variables class were set to the maximum values for the layers
- The Raster Calculator of the ArcGIS Spatial Analyst extension was used
- A quantification limit (QL) was applied to predictions at the low end of the concentration distributions, defined as the lowest measured concentration divided by the square root of 2
- Very high predictions were set to 120% of the maximum observed concentration
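The two limits on mapped predictions can be sketched as below. The slide does not say what was done with predictions that fall under the QL, so raising them to the QL is an assumption of this sketch; the function name and all values are hypothetical.

```python
import math


def limit_predictions(predictions, observed):
    """Bound predictions by a lower QL and an upper cap from observed data."""
    ql = min(observed) / math.sqrt(2)  # lowest measured value / sqrt(2)
    cap = 1.2 * max(observed)          # 120% of the maximum observed value
    # Assumption: values below the QL are raised to the QL.
    return [min(max(p, ql), cap) for p in predictions]


# Hypothetical observed station means and raw raster-cell predictions.
observed = [20.0, 50.0, 100.0]
raw_predictions = [-5.0, 60.0, 500.0]

bounded = limit_predictions(raw_predictions, observed)
print(bounded)
```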
Results
Of the 210 variables generated, 19 (9%) were significantly predictive in one or more of the LUR models. Number of variables per model:
- SO2 annual: 6
- SO2 cooler season: 7
- SO2 warmer season: 7
- PM10 annual: 4
- PM10 cooler season: 5
- PM10 warmer season: 4
Results
- The adjusted R² ranged from 0.83 to 0.93 for the SO2 models and from 0.53 to 0.72 for the PM10 models
- The R² values for the leave-one-out cross-validations ranged from 0.61 to 0.82 for the SO2 models and from 0.48 to 0.63 for the PM10 models
Results: Model stability
- The minimum and maximum of the LOOCV coefficients had the same direction for all variables in all models
- For SO2, the maximum coefficients of variation for the LOOCV coefficients in the annual, cooler season, and warmer season models were 12.2%, 10.2%, and 17.9%, respectively
- For PM10, the maximum coefficients of variation for the LOOCV coefficients in the annual, cooler season, and warmer season models were 10.8%, 10.4%, and 8.5%, respectively
Results: Estimated annual SO2 and PM10 concentrations from the final land use regression models
Limitations
- Relatively small sample size (LUR studies have usually used 20-100 monitoring stations)
- Using the LOOCV R² instead of the model adjusted R² (the LOOCV R² tends to be lower in models with fewer monitoring stations)
- Using governmental monitoring stations (a location-allocation approach was not used)
Works in progress using LUR
- Deterioration of multiple sclerosis
- Low birth weight
- Childhood leukemia
- Breast cancer
- District-specific life expectancy (in 22 districts)
Future works
- Construction of models for other criteria air pollutants
- Construction of models capturing both spatial and temporal variations (real-time LUR models)
- Using a location-allocation approach
Acknowledgement
- This work was part of an MS thesis at Tehran University of Medical Sciences
- We would like to express our appreciation to the organizers of this meeting
- We also appreciate the kind cooperation of:
  - Department of Environment
  - Tehran Air Quality Control Company
  - Tehran Municipality
  - And many other people and organizations who helped us
And finally
Thank you all for your attention. We appreciate any questions, comments, or suggestions.