Multi-site Time Series Analysis: Motivation and Methodology
SAMSI Spatial Epidemiology, Fall 2009
Howard Chang, hhchang@jhsph.edu
Epidemiology
The study of factors affecting the health of human populations.
Some objectives of epidemiologic studies:
- Identify the cause of a disease and its risk factors.
- Measure the extent and occurrence of the disease.
- Quantify the burden of the disease.
- Evaluate current methods of health care delivery.
- Create preventive and intervention programs.
- Provide information for policy and regulatory decisions.
First Step in Epidemiology
Exposure --?--> Adverse Health Outcome
Exposure Examples
A few factors studied for breast cancer: genes, physical activity, schizophrenia, birth-weight, obesity, consumption of fruits and vegetables, total visual blindness, arthritis, ... (about 22,000 hits from PubMed)
Health Outcome Examples
Some ways to measure frailty in the elderly: slow walking speed, poor grip strength, exhaustion, unintended weight loss, and low physical activity.
Challenges in Epidemiologic Study
Test subjects = Humans
Study Design
- How to select and recruit subjects?
  - experimental versus observational
  - sample size and cost
- How to define and measure exposure?
  - duration, intensity
- Ethical concerns
Interpretations
- How to establish causation through associations?
- Can the results be generalized to the whole population?
- Bias, Confounder, Interaction
The London Smog (1952)
Adverse health effects of extreme air pollution are well established.
Air Pollution Epidemiology
Scientific Question: Do everyday levels of air pollution affect human health?
Motivations:
- Air pollution is experienced by everyone, and there is no alternative to breathing!
- The health impact on the population and the associated economic cost can be substantial.
- Ambient pollutants are mostly generated by human activities, and regulatory policies are required to protect public health.
Background
The EPA currently regulates six criteria pollutants: ozone, particulate matter, carbon monoxide, nitrogen oxides, sulfur dioxide, and lead.
The National Ambient Air Quality Standards (NAAQS) provide limits on both long-term and short-term exposure.
Example: Fine particulate matter (PM2.5)
  Averaging Time    Level
  24-hour           35 µg/m³
  Annual            15 µg/m³
Similarly, the health effects of air pollution are classified as chronic or acute, and the two are estimated using different study designs.
Time Series Analysis
It is the most common population-based study design to estimate the short-term (acute) health effects of air pollution.
IDEA: Quantify the association between daily variations in air pollution level and variations in daily adverse health outcomes.
Example: Cook, IL
Chronic Health Effects
Chronic effects cannot be estimated with the time series design, which relies on temporal (between-day) comparisons.
A study of chronic health effects quantifies the association between spatial variation in air pollution level and health outcomes in different geographic areas.
[Figure: Annual average level of PM2.5 (µg/m³)]
Multi-site Time Series Analysis
Goal: Estimate the acute health effect of an exposure that varies both spatially and temporally.
[Figure, spatial variation: annual average level of PM2.5 (µg/m³)]
[Figure, daily variation: daily PM2.5 level, 1999-2003, for King County and Kern County]
Multi-site Time Series Analysis
Stage I
A single-site time series analysis is conducted within a community such as a city, a county, or a metropolitan area.
Data:
- Outcome of interest: daily count for an adverse health outcome in the community. Examples: hospital admissions, deaths.
- Exposure of interest: daily community-level exposure to air pollution that reflects the average level of exposure experienced by all at-risk individuals.
- Other known predictors (confounders) of the health outcome, such as temperature and humidity.
Stage II
A multi-site analysis combines the health effects across locations.
Case Study Example: NMMAPS
National Morbidity, Mortality, and Air Pollution Study
- Study period: 1987-2000
- 108 urban communities (cities)
- Daily mortality counts from the National Center for Health Statistics
- Daily air pollution data (PM2.5, PM10, O3, NO2, SO2, CO)
- Weather data from the National Climatic Data Center
- City characteristics from the 2000 Census
NMMAPS Resources
Website: http://www.ihapss.jhsph.edu/
Book:
Case Study Example: MCAPS
Medicare and Air Pollution Study
- Study period: 1999-2005 (ongoing)
- Approximately 204 counties
- Medicare enrollees aged 65 or above
- Daily hospital admission counts by primary diagnosis
- 11.5 million Medicare enrollees residing an average of 5.9 miles from a PM2.5
Case Study
Study Population
Medicare enrollees from 204 US counties with population greater than 200,000.
Exposure Data
Time series of daily county-level average concentrations of PM2.5 were calculated using measurements from EPA's monitoring network.
Health Outcome Data
- Time series of the daily number of hospitalizations for various cardiovascular and respiratory diseases were constructed for each county.
- Time series of the total number of at-risk individuals were constructed for each hospitalization outcome.
Stage I: County-specific Model
For each county separately, we model the count outcome via Poisson regression with over-dispersion:
$$ y_{ct} \sim \mathrm{Poisson}(\mu_{ct}) $$
$$ \log \mu_{ct} = \log N_{ct} + \alpha_c + \beta_c \, x_{c(t-p)} + \text{confounders} $$
For county $c$:
- $y_{ct}$ = number of admissions on day $t$
- $x_{c(t-p)}$ = county-level PM2.5 exposure on day $t - p$, that is, with lag $p$ (e.g. $p = 0$ for same-day exposure; $p = 1$ for previous-day exposure)
- $N_{ct}$ = population at risk on day $t$
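The county-specific model can be sketched in code. This is a minimal illustration, not the software used in the studies: it fits only the intercept and the PM2.5 slope by Newton-Raphson, treats log N_ct as an offset, omits all confounder terms, and handles over-dispersion with a quasi-Poisson scale factor (Pearson chi-square divided by residual degrees of freedom), which is an assumption about how the over-dispersion is handled. The function name and the toy data are hypothetical.

```python
import math


def fit_poisson_offset(y, x, n_at_risk, max_iter=50, tol=1e-10):
    """Fit log(mu_t) = log(N_t) + alpha + beta * x_t by Newton-Raphson.

    Returns (alpha, beta, var_beta, dispersion), where var_beta is the
    model-based variance of beta multiplied by the estimated dispersion.
    """
    def moments(alpha, beta):
        # Fitted means and the entries of the 2x2 Fisher information matrix.
        mu = [n * math.exp(alpha + beta * xi) for n, xi in zip(n_at_risk, x)]
        i00 = sum(mu)
        i01 = sum(xi * m for xi, m in zip(x, mu))
        i11 = sum(xi * xi * m for xi, m in zip(x, mu))
        return mu, i00, i01, i11

    alpha = math.log(sum(y) / sum(n_at_risk))  # start at the log crude rate
    beta = 0.0
    for _ in range(max_iter):
        mu, i00, i01, i11 = moments(alpha, beta)
        # Score vector of the Poisson log-likelihood.
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum(xi * (yi - m) for xi, yi, m in zip(x, y, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        step_a = (i11 * g0 - i01 * g1) / det
        step_b = (i00 * g1 - i01 * g0) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) + abs(step_b) < tol:
            break

    mu, i00, i01, i11 = moments(alpha, beta)
    det = i00 * i11 - i01 * i01
    pearson = sum((yi - m) ** 2 / m for yi, m in zip(y, mu))
    dispersion = pearson / (len(y) - 2)   # two fitted parameters
    var_beta = dispersion * i00 / det
    return alpha, beta, var_beta, dispersion


# Toy data (hypothetical): 200 days, 100,000 people at risk each day.
x = [5 + (7 * t) % 40 for t in range(200)]
n_at_risk = [100_000] * 200
y = [round(n * math.exp(-8.0 + 0.002 * xi)) for n, xi in zip(n_at_risk, x)]
alpha_hat, beta_hat, var_beta, phi = fit_poisson_offset(y, x, n_at_risk)
```

The pair (beta_hat, var_beta) is the kind of county-level summary that a second-stage analysis would take as input.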
Stage I Modelling
Time series analysis is ecological in time:
(1) We regress an aggregated health outcome on an aggregated exposure.
(2) Day serves as the unit of comparison.
Over-dispersion may be due to residual confounding, measurement error, or ecological bias.
The acute health effect $\beta_c$ represents:
- the county-specific log relative risk associated with a unit increase in same-day PM2.5 level, controlling for known confounders;
- the % increase in hospital admissions associated with a unit increase in same-day PM2.5 level, controlling for known confounders (approximately $100 \times \beta_c$ when $\beta_c$ is small);
- a single number with great policy implications!
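The conversion from a log relative risk to a percent increase can be written out explicitly. A small sketch follows; the function names and the Wald-type 95% interval (estimate plus or minus 1.96 standard errors on the log scale) are illustrative assumptions, not taken from the slides.

```python
import math


def percent_increase(beta, delta=1.0):
    """Percent increase in the outcome rate for a `delta`-unit rise in exposure."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


def percent_increase_ci(beta, se, delta=1.0, z=1.96):
    """Approximate interval, computed on the log scale and then transformed."""
    lower = percent_increase(beta - z * se, delta)
    upper = percent_increase(beta + z * se, delta)
    return lower, upper
```

For small beta the exact value 100 * (exp(beta) - 1) and the approximation 100 * beta are nearly identical, which is why the coefficient is often read directly as a percent change.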
Confounders
Also known as hidden variables or lurking variables.
In establishing whether A causes B, factor C is a confounder if:
(1) C is a known risk factor for B, and
(2) C is associated with A but is not on the causal pathway from A to B.
Example: (A) Air Pollution --?--> (B) Health Outcome, with (C) Temperature related to both.
Controlling for Confounders
It is important to rigorously control for confounders. A typical model will include:
- Day of the week
- Age-group categories (under 65 versus 65 to 75 versus 75+)
- Smooth function of calendar time to control for long-term trends and seasonality due to epidemics of influenza and respiratory infections
- Interaction between age group and the smooth function of time
- Smooth functions of current-day and previous-day temperature
- Smooth functions of current-day and previous-day dew-point temperature to control for humidity
Smooth functions for the confounders are modelled via natural cubic splines.
Note that confounders that do not vary with time are automatically controlled for!
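To make the confounder terms concrete, here is a sketch of how one row of a Stage I design matrix might be assembled. It uses the truncated-power form of the natural cubic spline basis; the slides do not say which basis, knot placement, or degrees of freedom were used, so those choices and both function names are assumptions. Only day of week and a smooth function of time are shown; the temperature, dew-point, and age-group terms would be added in the same way.

```python
def natural_spline_basis(x, knots):
    """Non-constant natural cubic spline basis at x (truncated-power form).

    `knots` must be sorted with at least 3 entries; K knots give K - 1
    basis functions (the intercept is left to the regression model).
    """
    last = knots[-1]

    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    def d(k):
        return (cube_plus(x - knots[k]) - cube_plus(x - last)) / (last - knots[k])

    d_penultimate = d(len(knots) - 2)
    return [float(x)] + [d(k) - d_penultimate for k in range(len(knots) - 2)]


def design_row(day_index, n_days, pm, time_knots):
    """One row of a hypothetical design matrix: intercept, PM, weekday, time."""
    # Six weekday indicators; the weekday of day 0 is the reference level.
    dow = [1.0 if day_index % 7 == j else 0.0 for j in range(1, 7)]
    time_scaled = day_index / (n_days - 1)   # calendar time rescaled to [0, 1]
    return [1.0, pm] + dow + natural_spline_basis(time_scaled, time_knots)


knots = [0.0, 0.25, 0.5, 0.75, 1.0]          # illustrative, evenly spaced
row = design_row(day_index=10, n_days=365, pm=12.0, time_knots=knots)
```

More knots per year give the smooth function of time more flexibility, which is the "degrees of freedom per year" choice that sensitivity analyses vary.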
Controlling for Confounders: Examples
(1) Mortality and Temperature
[Figure: Association between lag 1 PM10 and mortality as the number of lags of temperature included in the model is increased, New York, NY, 1987-2000.]
(2) Mortality and Time
[Figure: Estimates of the log relative risk for PM10 for Denver, Colorado, 1987-2000, as the number of degrees of freedom per year in the smooth function of time is varied.]
Source: Peng RD, Dominici F (2008). Statistical Methods for Environmental Epidemiology in R: A Case Study in Air Pollution and Health, Springer.
Stage II: Combining Across Locations
A simple hierarchical model:
$$ \beta_c \sim \mathrm{Normal}(\mu, \tau^2) $$
Assuming the true location-specific log relative risks are independent across locations,
- $\mu$ = (pooled / overall / average / national) log relative risk
- $\tau^2$ = between-county variability (spatial heterogeneity) in log relative risks
One can view the adverse health effects of PM2.5 as treatments that were randomly assigned to the selected counties, or view the risks as exchangeable among counties.
Estimation
We cannot carry out estimation for both Stage I and Stage II simultaneously because of the large number of county-specific regression coefficients for confounders.
A two-stage approximation approach:
1. First estimate the county-specific log relative risk $\hat\beta_c$ and its variance $\hat V_c$.
2. Use an MLE-based Normal approximation:
$$ \hat\beta_c \mid \beta_c \sim \mathrm{Normal}(\beta_c, \hat V_c) $$
$$ \beta_c \sim \mathrm{Normal}(\mu, \tau^2) $$
The above two-level Normal-Normal model can be estimated via MCMC, programs for meta-analysis, or the TLNISE algorithm of Everson and Morris (2000).
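As an illustration of the "programs for meta-analysis" route, the sketch below pools county estimates with the DerSimonian-Laird moment estimator of the between-county variance. This is a standard random-effects meta-analysis method substituted here for simplicity; it is not TLNISE or MCMC, and it is not claimed to be what the cited studies used. The function name is hypothetical.

```python
def pool_dersimonian_laird(beta_hat, var_hat):
    """Pooled mean, its variance, and tau^2 from county estimates.

    beta_hat: Stage I log relative risk estimates, one per county.
    var_hat:  their estimated variances.
    """
    k = len(beta_hat)
    if k < 2:
        raise ValueError("need at least two locations to estimate tau^2")

    # Fixed-effect (inverse-variance) pooled estimate.
    w = [1.0 / v for v in var_hat]
    sum_w = sum(w)
    mu_fixed = sum(wi * b for wi, b in zip(w, beta_hat)) / sum_w

    # Cochran's Q and the moment estimator of tau^2, truncated at zero.
    q = sum(wi * (b - mu_fixed) ** 2 for wi, b in zip(w, beta_hat))
    denom = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / denom)

    # Random-effects weights add tau^2 to each within-county variance.
    w_star = [1.0 / (v + tau2) for v in var_hat]
    mu = sum(wi * b for wi, b in zip(w_star, beta_hat)) / sum(w_star)
    var_mu = 1.0 / sum(w_star)
    return mu, var_mu, tau2
```

Unlike a fully Bayesian fit, this plug-in approach does not propagate the uncertainty in tau^2 into the variance of the pooled estimate.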
National Estimates for PM2.5 and Admissions
Example: County-specific Effect of PM10 on Mortality
[Figure: MLE estimates]
Source: F. Dominici, A. McDermott, M. Daniels, S. L. Zeger, and J. M. Samet. Mortality among residents of 90 cities. In Revised Analyses of Time-Series Studies of Air Pollution and Health, pages 9-24. The Health Effects Institute, Cambridge, MA, 2003.
Example: County-specific Effect of PM10 on Mortality
[Figure: Bayesian estimates]
Source: F. Dominici, A. McDermott, M. Daniels, S. L. Zeger, and J. M. Samet. Mortality among residents of 90 cities. In Revised Analyses of Time-Series Studies of Air Pollution and Health, pages 9-24. The Health Effects Institute, Cambridge, MA, 2003.
County-specific Estimates
The hierarchical framework borrows strength across studies (locations).
In Stage I, county-specific relative risks are often poorly estimated.
Example: Mortality and PM10
[Figure panels: MLE; Bayesian]
Log relative rates of mortality from exposure to PM10. Areas of the circles are proportional to the posterior precisions of the Bayesian estimates; larger circles indicate more precise estimates. Black outlines denote relative rates for which the ratio of posterior mean to posterior standard deviation exceeds 1.96.
Source: Dominici F, McDermott A, Zeger SL, Samet JM. National Maps of the Effects of PM on Mortality: Exploring Geographical Variation. Environmental Health Perspectives, vol 111, no 1, 39-43.
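How borrowing strength works can be seen from the conditional posterior of one county's effect in the Normal-Normal model. The sketch below assumes the pooled mean and between-county variance are known (plugged in), whereas a full Bayesian analysis would also average over their uncertainty. The function names are hypothetical.

```python
import math


def shrink(beta_hat, var_hat, mu, tau2):
    """Posterior mean and variance of beta_c given mu and tau^2.

    The posterior mean is a precision-weighted average of the county's
    own estimate and the pooled mean.
    """
    if tau2 == 0.0:
        return mu, 0.0            # no heterogeneity: every county equals mu
    precision = 1.0 / var_hat + 1.0 / tau2
    mean = (beta_hat / var_hat + mu / tau2) / precision
    return mean, 1.0 / precision


def is_flagged(post_mean, post_var, threshold=1.96):
    """True when |posterior mean| / posterior SD exceeds the threshold."""
    return post_var > 0.0 and abs(post_mean) / math.sqrt(post_var) > threshold
```

A county with a noisy estimate (large var_hat) is pulled strongly toward the pooled mean, while a precisely estimated county keeps most of its own value.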
Risk Heterogeneity
The observed heterogeneity in risks can arise from unmeasured confounders and from effect modification due to county-specific characteristics.
We can include higher-level covariates in the hierarchical model:
$$ \beta_c \sim \mathrm{Normal}(\gamma z_c, \tau^2) $$
County-specific covariates ($z_c$) may include factors that potentially modify the true health effects.
Examples:
  Variable                                          To test the effect of
  % poverty                                         Socio-economic status
  % urbanicity                                      Pollutant composition
  Average distance between residents and monitor    Exposure measurement error
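A second-stage model with a county-level covariate can be sketched as a weighted least-squares meta-regression. The version below assumes a single covariate plus an intercept and a known (plugged-in) between-county variance; the function name and the choice to include an intercept are illustrative assumptions.

```python
def meta_regression(beta_hat, var_hat, z, tau2):
    """Weighted least squares for beta_c = g0 + g1 * z_c.

    Weights are 1 / (V_c + tau^2). Returns (g0, g1, var_g1).
    """
    w = [1.0 / (v + tau2) for v in var_hat]
    s0 = sum(w)
    s1 = sum(wi * zi for wi, zi in zip(w, z))
    s2 = sum(wi * zi * zi for wi, zi in zip(w, z))
    t0 = sum(wi * b for wi, b in zip(w, beta_hat))
    t1 = sum(wi * zi * b for wi, zi, b in zip(w, z, beta_hat))
    det = s0 * s2 - s1 * s1       # zero when all z_c are identical
    g0 = (s2 * t0 - s1 * t1) / det
    g1 = (s0 * t1 - s1 * t0) / det
    var_g1 = s0 / det
    return g0, g1, var_g1
```

A slope g1 that is large relative to the square root of var_g1 suggests that the covariate, for example % poverty, modifies the health effect.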
Example of Risk Heterogeneity
Does region (East versus West) modify the health effects?
Air Pollution --?--> Health Outcome, with East versus West as a potential effect modifier.
Example of Health Burden Estimates
$$ \text{Annual reduction} = \left[ \exp(10\,\mu) - 1 \right] \times N $$
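The burden formula translates directly into code. In this sketch N is read as the baseline annual number of events and 10 as a 10 µg/m³ change in PM2.5; the slide does not define either, so both readings are assumptions, as is the function name.

```python
import math


def annual_reduction(mu, baseline_count, delta=10.0):
    """Events per year attributable to a `delta`-unit change in exposure.

    mu is the pooled log relative risk per unit of exposure.
    """
    return (math.exp(mu * delta) - 1.0) * baseline_count
```

For example, a pooled relative risk of 1.02 per 10 µg/m³ applied to 50,000 annual admissions corresponds to about 1,000 admissions per year.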
Advantages of Multi-site Time Series
- A large study population and a long study period can be achieved by using publicly available national air pollution and health surveillance databases.
- Day-to-day comparison allows a community to serve as its own control, so unmeasured confounders that are relatively constant between days are controlled for.
- A multi-site approach combines evidence, borrows information across locations, and potentially enhances statistical power.
- A multi-site approach ensures that the same analytic method is used at each location, minimizing publication/selection bias and allowing better generalizability of the results.
- By comparing risk estimates from different locations, effect modification due to location-specific characteristics can be examined.
Epidemiologic Evidence and Policy
Regarding the time series design, the EPA's 2004 Criteria document for particulate matter states that "the temporal relationship supports a conclusion of a causal relation, even when both the outcome and the exposure are community indices."
Consistency and Strength
Regarding the evidence on the health effects of fine PM: "A growing body of epidemiologic evidence both (a) confirms associations between short-term ambient exposures to fine-fraction particles (generally indexed by PM2.5) and various mortality or morbidity endpoint effects and (b) supports the general conclusion that PM2.5 (or one or more PM2.5 components), acting alone and/or in combination with gaseous copollutants, are likely causally related to observed ambient fine particle associated health effects."