What Can I Do With My Data?

What Can I Do With My Data? Utilizing Existing Data for Analysis and Hypothesis Development Falgunee Parekh, MPH, PhD

Agenda My Research Background Background on Analysis of Surveillance (or Initial) Data Case Study of Lassa Fever Data Analysis Utilizing Surveillance Data Development of Collaboration Type of Existing Data Developing a research question Analysis Plan Results Questions and Discussion

Research Background Infectious Disease Epidemiologist >15 years of experience Field Epidemiology and Clinical Research Disease Experience Malaria, Zika, Lassa Fever, Influenza, Zoonotic Diseases and One Health Approach Country Experience Peru, Colombia, India, Azerbaijan, Tanzania, Democratic Republic of Congo, Gabon, South Africa, Zimbabwe

Aims of Surveillance Allows for rapid detection of disease outbreaks Supports early identification of disease problems endemic and nonendemic Provides an early warning system able to identify new and emerging diseases Assess the health status of a defined population (estimating level of occurrence/trends among diseases) Confirm absence of a specific disease

Uses and Applications of Surveillance Data Estimate the magnitude of the problem Detect epidemics/define a problem Evaluate control measures Facilitate health planning Determine geographic distribution of illness Portray the natural history of a disease Generate hypotheses, stimulate research Monitor changes in infectious agents and/or health practices

Example: Raw Dataset

Case #  Date of Onset  Disease  Case Classification    Age  Gender
1       22/10/16       Anthrax  Confirmed              19   M
2       25/10/16       Anthrax  Not a case             17   M
3       19/10/16       Anthrax  Probable               23   F
4       15/10/16       Anthrax  Investigation Pending  18   ?
5       23/10/16       Anthrax  Confirmed              21   F
6       27/10/16       Anthrax  Suspect                18   M
7       21/10/16       Anthrax  Confirmed              25   F
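As a rough illustration of a first descriptive pass over a line list like the one above, the sketch below tabulates the seven example cases by classification, gender, and onset date. It uses only the Python standard library. The record layout and the names `RAW_ROWS` and `to_record` are assumptions made for this example, not part of the original presentation, and the "?" in case 4 is treated as a missing gender.

```python
from collections import Counter
from datetime import datetime

# Line list transcribed from the example slide; "?" marks a missing value.
RAW_ROWS = [
    (1, "22/10/16", "Anthrax", "Confirmed", 19, "M"),
    (2, "25/10/16", "Anthrax", "Not a case", 17, "M"),
    (3, "19/10/16", "Anthrax", "Probable", 23, "F"),
    (4, "15/10/16", "Anthrax", "Investigation Pending", 18, "?"),
    (5, "23/10/16", "Anthrax", "Confirmed", 21, "F"),
    (6, "27/10/16", "Anthrax", "Suspect", 18, "M"),
    (7, "21/10/16", "Anthrax", "Confirmed", 25, "F"),
]


def to_record(row):
    """Convert one raw row into a dict with a parsed date and explicit missing values."""
    case_no, onset, disease, classification, age, gender = row
    return {
        "case": case_no,
        "onset": datetime.strptime(onset, "%d/%m/%y").date(),  # day/month/year
        "disease": disease,
        "classification": classification,
        "age": age,
        "gender": None if gender == "?" else gender,
    }


records = [to_record(r) for r in RAW_ROWS]

# Person: frequencies by classification and gender (missing gender excluded).
by_classification = Counter(r["classification"] for r in records)
by_gender = Counter(r["gender"] for r in records if r["gender"] is not None)

# Time: cases per onset date, plus the span of the onset dates.
by_onset_date = Counter(r["onset"] for r in records)
first_onset = min(r["onset"] for r in records)
last_onset = max(r["onset"] for r in records)

print(by_classification.most_common())
print(dict(by_gender))
print(first_onset, "to", last_onset)
```

Keeping the unknown gender as `None` rather than dropping the row means the case still counts toward totals by classification and time.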

Methods of Analysis of Surveillance Data Descriptive Methods Analysis of the data by person, place and time Calculation of rates Use of tables, graphs, and maps Analytical methods Cohort studies Case-Control studies

Developing a Data Analysis Plan To analyze data you need a data analysis plan A series of steps to organize your work The data analysis plan must build upon itself Start with simple descriptive statistics Build to more complex analyses Examine the data for possible errors and correct if possible at every step of the data analysis plan

Components of a Surveillance Analysis Plan Become familiar with the data Check for errors Clean the data Analyze counts and rates by year, months, or weeks (Time) Check for trends and seasonality Analyze data by regions or districts (Place) Analyze data by age and sex (Person) Subgroup analysis

Data Quality Missing Values Completeness of critical variables Data entry errors, Adherence to strict case definitions Biases Severe cases tend to be reported more than mild cases Better surveillance in urban areas than rural Non-standard reporting
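One simple way to check completeness of critical variables is to compute, for each field, the fraction of records that carry a usable value. The sketch below does this with the standard library. The function name `completeness`, the choice of missing-value markers, and the four sample records are hypothetical and exist only for illustration.

```python
def completeness(records, fields, missing=(None, "", "?")):
    """Return, per field, the fraction of records with a non-missing value."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) not in missing) / total
        for field in fields
    }


# Hypothetical mini line list with a few gaps in it.
sample = [
    {"age": 19, "gender": "M", "onset": "22/10/16"},
    {"age": 18, "gender": "?", "onset": "15/10/16"},
    {"age": None, "gender": "F", "onset": ""},
    {"age": 25, "gender": "F", "onset": "21/10/16"},
]

report = completeness(sample, ["age", "gender", "onset"])
for field, fraction in report.items():
    print(f"{field}: {fraction:.0%} complete")
```

Running a check like this at each step of the analysis plan makes it easier to see when a denominator shrinks because of missing data rather than because of a real difference.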

Collaborations Develop collaborations with other investigators Fulfill your knowledge gaps Assist in development of analysis plan Allows for multiple perspectives in interpretation of analysis Allows for hypothesis development and continued collaboration on future projects

Case Study Lassa Fever Data, Sierra Leone

Case Study Lassa Fever Data, Sierra Leone Viral Hemorrhagic Fevers (VHFs) pose serious biological threats and are potent agents of bioterrorism Ease of aerosolized dissemination Low infectious dose High morbidity/mortality rates Lack of effective vaccines or treatments The outbreak of Ebola demonstrates the rapid spread of VHFs across borders and regions due to mobile populations VHFs have a serious impact on public health and place a heavy burden on health care infrastructure and agencies Lassa Fever has been imported to other countries

Background Lassa Fever (LF) Lassa virus (LASV) is an arenavirus Reservoir is the multimammate rat genus Mastomys LF is NOT a rare disease Endemic to West Africa and transmitted throughout the year Occurs in several countries including Guinea, Liberia, Nigeria, and Sierra Leone Estimated that 300,000 cases and 5,000 deaths occur annually One of the only VHFs that can be prospectively studied Understanding how LF spreads can help us better understand other diseases like Ebola LF in Sierra Leone 2004-2011

Study Objective Characterize the morbidity/mortality, epidemiology and risk factors associated with clinical outcome for infection with Lassa virus (LASV)

Description of Dataset LF from Sierra Leone Developed Collaboration: Sierra Leone Ministry of Health and Sanitation (MOHS) provided access to country-wide data on suspected LF cases Surveillance and clinical data of suspected cases reported by MOHS, 2008-2013 Includes data on: Suspected Cases identified through passive and active surveillance Results of diagnostic laboratory testing Epidemiologic data collected from patient questionnaires and clinical assessments Potential contacts identified and approached by active surveillance team

LF Dataset Study Methods: Retrospective analysis of data collected from surveillance of LF in Sierra Leone Assess epidemiologic risk factors associated with disease and mortality

Where Do I Start??

Analysis of Data by Person, Place and Time Analysis by Person Compare counts or frequencies by: Age Gender Ethnicity Occupation Vaccination status Others? Analysis by Place Present geographic distribution of counts or rates Where cases were reported Where exposures might occur Determine the geographic area with the highest rates of infection Analysis by Time Examine occurrence of disease during particular time interval (years, months, weeks) Seasonal trends Analysis of time using person and place subcategories: Gender frequency over time Frequency in a region over time.

Analysis of Subgroups Analysis of sub-groups can reveal additional information Sub-Groups Gender Children Ethnicity Individuals with outdoor occupations Combinations (gender and ethnicity)

Develop an Analysis Plan Univariate analysis Temporal trend analysis across years Risk factor analysis to assess predictors of disease and mortality Age Gender Other subgroups

LF Results 2008-2013 Univariate Analysis by Time
3348 suspected LF cases identified between 2008-2013:
27.0% were LF Positive
31.5% of LF Positive (n=872) Died
56.3% of suspected cases were Female
13.7% of suspected cases received Ribavirin treatment

Figure: Lassa Fever Enrollees, Diagnosis and Mortality 2008-2013 (bar chart; data labels as recovered from the slide)

Year   N     LF Pos  LF Died
2008   178   42      19
2009   317   64      34
2010   673   191     57
2011   776   192     66
2012   806   222     59
2013   598   194     40
Total  3348  905     275

LF Results Analysis by Time 2008-2013
Characteristics of Suspected LF Cases by Year

Characteristic               2008           2009              2010              2011              2012              2013              Total               Chi-Sq. P-value  CA* Trend P-value
N                            178            317               673               776               806               598               3348
Female                       84 (47.2)      177 (55.8)        356 (52.9)        460 (59.3)        473 (805, 58.7)   335 (56.0)        1,885 (3347, 56.3)  .016             .026
Mean Age in Years (Median)   25.5 (26.0)    25.0 (316, 25.0)  23.7 (670, 23.0)  24.3 (766, 24.0)  24.7 (788, 23.0)  23.7 (593, 22.0)  24.3 (3311, 23.0)   .23**            NA
Mean DSOI/days (Median)      9.6 (134, 8.0) 9.2 (307, 7.0)    8.6 (647, 6.0)    9.6 (600, 7.0)    8.2 (418, 6.0)    8.5 (323, 6.0)    8.9 (2429, 7.0)     .0003**          NA

* Cochran-Armitage Trend test, ** Kruskal-Wallis test

The proportion of female suspected cases significantly increased over the years
Days Since Onset of Illness (DSOI) significantly different across the years; appears to be decreasing
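To make the trend test concrete, the sketch below applies a Cochran-Armitage test for trend to the yearly female counts from the table above. It assumes equally spaced scores (0 to 5 for 2008 to 2013), a normal approximation, and a two-sided p-value. Statistical packages differ slightly in how they compute the variance, so the presenter's software may not match this to the last digit. The function name `cochran_armitage` is an assumption for this example.

```python
import math


def cochran_armitage(events, totals, scores=None):
    """Cochran-Armitage trend test (normal approximation, two-sided).

    events[i] is the number with the characteristic in group i,
    totals[i] is the group size, scores[i] is the group's ordinal score.
    """
    if scores is None:
        scores = list(range(len(totals)))  # assumed: equally spaced years
    n = sum(totals)
    p_bar = sum(events) / n
    # Test statistic: score-weighted deviation of observed from expected events.
    t_stat = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Female counts and cases with recorded gender, 2008-2013, from the table.
female = [84, 177, 356, 460, 473, 335]
with_gender = [178, 317, 673, 776, 805, 598]

z, p = cochran_armitage(female, with_gender)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A positive `z` corresponds to an increasing proportion across the ordered years, which is the direction described on the slide.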

LF Results Total Suspected Enrollees with Defined LF Diagnosis, 2008-2013

Figure: LF Positive Cases and Ribavirin Treated by Year (y-axis: Proportion of Suspected Cases; x-axis: Year; series: % LF +, % Ribavirin)
Figure: LF Mortality by Year (y-axis: Proportion of LF Positive; x-axis: Year; series: % Mortality)
Trend p-values shown on the charts: p<.0009*, p<.0001*, p<.0001*
* Cochran-Armitage Trend test

Increased prevalence may be due to improved detection and/or increasing transmission
More mild LF cases may be detected that don't require Ribavirin treatment

LF Results Analysis by Place
Map of Cases in Sierra Leone (one map panel per year, 2008-2013; courtesy of Marc Souris)
LF cases identified from districts that had previously not reported LF
Improved detection of LF
Improved awareness of the population at risk of LF
LF may be spreading

LF Results Analysis by Person
Risk Factors of Lassa Fever Diagnosis, 2008-2013

Characteristic        All Patients      LF               Non-LF             P-value
N                     3233              882 (27.3)       2351 (72.7)
Female                1823              508 (57.6)       1315 (55.9)        NS
Mean Age (Median)     24.5 (24.0)       21.9 (20.0)      25.5 (25.0)        <.0001*
Mean DSOI (Median)    8.96 (2377, 7.0)  9.6 (723, 8.0)   8.7 (1654, 6.0)    <.0001*
House Deaths          167 (927)         36 (295, 12.2)   131 (632, 20.7)    .0017
Contact with LF Case  770 (2009)        145 (565, 25.6)  625 (1444, 43.3)   <.0001
Ribavirin             454 (3218)        406 (871, 46.6)  48 (2347, 2.1)     <.0001

* Wilcoxon Rank Sum Test

LF positive cases were significantly younger and had more days since onset of illness
LF negative cases were significantly more likely to have reported a death in their household and contact with an LF case
Gender was not significantly different between LF positive and LF negative
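For the categorical rows of this table, a comparison of two proportions can be sketched as a Pearson chi-square test on a 2x2 table. The example below uses the household death row (36 of 295 LF positive versus 131 of 632 LF negative). It assumes no continuity correction and one degree of freedom. The slide does not state which test produced its p-values, so this is an illustration rather than a reproduction of the original analysis.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]], 1 degree of freedom.

    Returns (chi2, p). No continuity correction is applied.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 df, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: LF positive, LF negative. Columns: household death yes, no.
lf_pos_yes, lf_pos_no = 36, 295 - 36
lf_neg_yes, lf_neg_no = 131, 632 - 131

chi2, p = chi_square_2x2(lf_pos_yes, lf_pos_no, lf_neg_yes, lf_neg_no)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

The closed-form p-value works only for one degree of freedom; larger tables would need a general chi-square distribution function.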

LF Results
Risk Factors of Lassa Fever Mortality, 2008-2013

Characteristic        Total LF        Non-Survivors    Survivors        P-value
N                     856             271 (31.7)       585 (68.3)
Female                495 (57.8)      146 (53.9)       349 (59.7)       NS
Mean Age (Median)     21.7 (20.0)     18.7 (18.0)      23.1 (21.0)      .0005*
Mean DSOI (Median)    9.6 (704, 8.0)  9.3 (230, 8.0)   9.7 (474, 7.0)   NS*
House Deaths          36 (285)        3 (60, 5.0)      33 (225, 14.7)   .045
Contact with LF Case  139 (549)       16 (141, 11.4)   123 (408, 30.2)  <.0001
Ribavirin             405 (852)       156 (270, 57.8)  249 (582, 42.8)  <.0001

* Wilcoxon Rank Sum Test

Non-Survivors were of significantly younger age (p=.0005)
Survivors significantly more likely to report household death or contact with LF case (p=.045, p<.0001)
Ribavirin significantly associated with mortality (p<.0001); most likely a confounding factor and an indication of disease severity
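To put a size on the Ribavirin association, a crude odds ratio with a 95% confidence interval can be computed from the Ribavirin row (156 of 270 non-survivors treated versus 249 of 582 survivors treated). The sketch below uses the Woolf (log) method, which is an assumption of this example and not a method named in the presentation. Because the estimate is unadjusted, it reflects the same confounding by disease severity noted on the slide and should not be read as a treatment effect.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio for a 2x2 table [[a, b], [c, d]] with a Woolf 95% CI."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Rows: non-survivors, survivors. Columns: Ribavirin treated, not treated.
died_treated, died_untreated = 156, 270 - 156
lived_treated, lived_untreated = 249, 582 - 249

or_est, or_low, or_high = odds_ratio(
    died_treated, died_untreated, lived_treated, lived_untreated
)
print(f"crude OR = {or_est:.2f} (95% CI {or_low:.2f} to {or_high:.2f})")
```

A stratified or adjusted analysis, for example by a severity measure if one is available in the dataset, would be the natural next step before interpreting this estimate.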

LF Results Subgroup Analysis
Children < 5 years of age vs. All Other Suspected LF Cases, 2008-2013

                      Total             Age<5           All Others        P-value
N                     3233              583             2650
LF Positive           882 (27.3)        198 (34.0)      684 (25.8)        <.0001
LF Mortality (N=856)  271 (31.7)        83 (193, 43.0)  188 (663, 28.4)   .0001
Ribavirin             454 (3218, 14.1)  107 (582, 18.4) 347 (2636, 13.2)  .0011
Female                1823 (56.4)       268 (46.0)      1555 (58.7)       <.0001
Household Deaths      167 (927, 18.0)   7 (95, 7.4)     160 (832, 19.2)   .0044
Contact with Case     770 (2009, 38.3)  76 (309, 24.6)  694 (1700, 40.8)  <.0001
Mean DSOI             9.6 (704, 8.0)    8.0 (392, 7.0)  9.1 (1985, 7.0)   NS
Malaria               152 (356, 42.7)   57 (87, 65.5)   95 (269, 35.3)    <.0001

Among LF+, median DSOI for < 5 years was 7.0 compared to 8.0 for all others (p=.065)
Children < 5 years were significantly more likely to be LF positive, receive Ribavirin treatment, and die from LF compared to all others
Children < 5 years were significantly more likely to have malaria
All others significantly more likely to report household death or contact with case; low sample size

LF Results Subgroup Analysis
Pregnant vs. Non-pregnant females (14-49), 2008-2013

                   Total           Pregnant        Non-Pregnant     P-value
N                  345             162 (47.0)      183 (53.0)       -
LF Positive        120 (34.8)      63 (38.9)       57 (31.2)        NS
LF Mortality       44 (117, 37.6)  32 (61, 52.5)   12 (56, 21.4)    .0005
Ribavirin          71 (343, 20.7)  43 (160, 26.9)  28 (15.3)        .0083
Household Deaths   21 (220, 9.6)   5 (68, 7.4)     16 (152, 10.5)   NS
Contact with Case  39 (266, 14.7)  10 (107, 9.4)   29 (159, 18.2)   .04
Mean DSOI          8.4 (278, 6.0)  8.5 (122, 7.0)  8.4 (156, 6.0)   NS
Malaria            41 (117, 35.0)  17 (46, 37.0)   24 (71, 33.8)    NS

Pregnant women significantly more likely to receive Ribavirin treatment and die from LF
Non-pregnant women significantly more likely to report contact with case
Small sample size, so difficult to detect significance for other factors
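With subgroups this small, it helps to report an effect size with its uncertainty alongside the p-value. The sketch below computes a crude risk ratio for LF mortality in pregnant versus non-pregnant women from the mortality row (32 of 61 versus 12 of 56), with a 95% confidence interval on the log scale. The function name `risk_ratio` and the choice of interval method are assumptions for this example, not part of the original analysis.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Crude risk ratio with a 95% CI computed on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    estimate = risk_exposed / risk_unexposed
    se_log = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# LF mortality among LF-positive women aged 14-49 with a known outcome.
rr, rr_low, rr_high = risk_ratio(
    events_exposed=32, n_exposed=61,      # pregnant
    events_unexposed=12, n_unexposed=56,  # non-pregnant
)
print(f"crude RR = {rr:.2f} (95% CI {rr_low:.2f} to {rr_high:.2f})")
```

The width of the interval is a direct reflection of the small sample size mentioned on the slide.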

LF Results Subgroup Analysis
Malaria and LF Co-Infection, 2008-2013
Malaria testing results reported for 356 suspected LF cases
152 (42.7%) of suspected LF cases were Malaria positive
55/141 (39.0%) of LF+ patients were co-infected with malaria
Those who were co-infected were significantly younger compared to those who were only LF positive: 7.0 years vs. 22.7 years (p=.0005)
A large share (41.8%) of co-infection cases occurred in children < 5 years of age
No significant difference in mortality detected between co-infected and LF+ alone; low sample size
Images: Lassa Fever; P. falciparum Malaria (www.cdc.gov)

Summary of LF Results - Interpretation LF prevalence significantly increased over the years, and cases were reported from new districts Could be due to improving detection, increasing transmission, or both LF mortality significantly decreased over the years Earlier detection and improving clinical management may result in better outcome Ribavirin treatment significantly associated with mortality The most severe cases usually receive Ribavirin treatment Ribavirin treatment probably a confounding factor, and an indicator of severe disease

Summary of LF Results Interpretation and Hypothesis Generation Summary of Results Young individuals, especially children < 5 years of age, were significantly more likely to be LF positive, to receive Ribavirin treatment, and to die from LF Early detection and clinical care targeted for LF infected young children may be critical to improving LF outcome Pregnant women were significantly more likely to die from LF compared to non-pregnant counterparts High prevalence of malaria co-infection, especially in younger age Impact of co-infections on LF outcome needs to be further investigated Hypothesis Development Young children have increased risk of LASV infection and severe LF Pregnant women have increased risk of severe LF and death Malaria exacerbates LASV infection and results in more severe LF outcome

Conclusion If you have data, develop a step by step plan for analysis: Define objectives Assess data quality Develop collaborations Develop study methods Develop analysis plan Person, Place, Time Conduct analysis utilizing appropriate resources Interpret Results Present Results Abstract, Presentation, or Manuscript Develop Hypotheses for future studies

Questions