What Can I Do With My Data?


1 What Can I Do With My Data? Utilizing Existing Data for Analysis and Hypothesis Development Falgunee Parekh, MPH, PhD

2 Agenda My Research Background Background on Analysis of Surveillance (or Initial) Data Case Study of Lassa Fever Data Analysis Utilizing Surveillance Data Development of Collaboration Type of Existing Data Developing a research question Analysis Plan Results Questions and Discussion

3 Research Background Infectious Disease Epidemiologist >15 years of experience Field Epidemiology and Clinical Research Disease Experience Malaria, Zika, Lassa Fever, Influenza, Zoonotic Diseases and One Health Approach Country Experience Peru, Colombia, India, Azerbaijan, Tanzania, Democratic Republic of Congo, Gabon, South Africa, Zimbabwe

4 Aims of Surveillance Allows for rapid detection of disease outbreaks Supports early identification of disease problems endemic and nonendemic Provides an early warning system able to identify new and emerging diseases Assess the health status of a defined population (estimating level of occurrence/trends among diseases) Confirm absence of a specific disease

5 Uses and Applications of Surveillance Data Estimate the magnitude of the problem Detect epidemics/define a problem Evaluate control measures Facilitate health planning Determine geographic distribution of illness Portray the natural history of a disease Generate hypotheses, stimulate research Monitor changes in infectious agents and/or health practices

6 Example: Raw Dataset

Case #  Date of Onset  Disease  Case Classification    Age  Gender
1       22/10/16       Anthrax  Confirmed              19   M
2       25/10/16       Anthrax  Not a case             17   M
3       19/10/16       Anthrax  Probable               23   F
4       15/10/16       Anthrax  Investigation Pending  18   ?
5       23/10/16       Anthrax  Confirmed              21   F
6       27/10/16       Anthrax  Suspect                18   M
7       21/10/16       Anthrax  Confirmed              25   F

7 Methods of Analysis of Surveillance Data Descriptive Methods Analysis of the data by person, place and time Calculation of rates Use of tables, graphs, and maps Analytical methods Cohort studies Case-Control studies

8 Developing a Data Analysis Plan To analyze data you need a data analysis plan A series of steps to organize your work The data analysis plan must build upon itself Start with simple descriptive statistics Build to more complex analyses Examine the data for possible errors and correct if possible at every step of the data analysis plan

9 Components of a Surveillance Analysis Plan Become familiar with the data Check for errors Clean the data Analyze counts and rates by year, months, or weeks (Time) Check for trends and seasonality Analyze data by regions or districts (Place) Analyze data by age and sex (Person) Subgroup analysis
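The first steps of this plan can be sketched on the seven records from the Example: Raw Dataset slide. This is a minimal illustration only: the field names, the choice to treat "?" as a missing gender, and the use of ISO week as the time unit are assumptions made here, not part of the original talk.

```python
from collections import Counter
from datetime import datetime

# Records transcribed from the "Example: Raw Dataset" slide.
# Tuple order (assumed): case id, onset date, disease, classification, age, gender.
RAW = [
    (1, "22/10/16", "Anthrax", "Confirmed", 19, "M"),
    (2, "25/10/16", "Anthrax", "Not a case", 17, "M"),
    (3, "19/10/16", "Anthrax", "Probable", 23, "F"),
    (4, "15/10/16", "Anthrax", "Investigation Pending", 18, "?"),
    (5, "23/10/16", "Anthrax", "Confirmed", 21, "F"),
    (6, "27/10/16", "Anthrax", "Suspect", 18, "M"),
    (7, "21/10/16", "Anthrax", "Confirmed", 25, "F"),
]

def clean(rows):
    """Parse day/month/year dates and turn unknown gender codes into None."""
    cleaned = []
    for case_id, onset, disease, status, age, gender in rows:
        cleaned.append({
            "case_id": case_id,
            "onset": datetime.strptime(onset, "%d/%m/%y").date(),
            "disease": disease,
            "status": status,
            "age": age,
            "gender": gender if gender in ("M", "F") else None,
        })
    return cleaned

records = clean(RAW)

# Check for errors: which records are missing a critical variable?
missing_gender = [r["case_id"] for r in records if r["gender"] is None]

# Analysis subset: drop records that were ruled out as cases.
cases = [r for r in records if r["status"] != "Not a case"]

# Person: counts by gender. Time: counts by ISO week of onset.
by_gender = Counter(r["gender"] for r in cases)
by_week = Counter(r["onset"].isocalendar()[1] for r in cases)

print("missing gender:", missing_gender)
print("by gender:", dict(by_gender))
print("by ISO week:", dict(sorted(by_week.items())))
```

Keeping the cleaning step separate from the counting step makes it easier to re-check the data for errors at each later stage of the plan.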

10 Data Quality Missing Values Completeness of critical variables Data entry errors, Adherence to strict case definitions Biases Severe cases tend to be reported more than mild cases Better surveillance in urban areas than rural Non-standard reporting

11 Collaborations Develop collaborations with other investigators Fulfill your knowledge gaps Assist in development of analysis plan Allows for multiple perspectives in interpretation of analysis Allows for hypothesis development and continued collaboration on future projects

12 Case Study Lassa Fever Data, Sierra Leone

13 Case Study Lassa Fever Data, Sierra Leone Viral Hemorrhagic Fevers (VHFs) pose serious biological threats and are potent agents of bioterrorism Ease of aerosolized dissemination Low infectious dose High morbidity/mortality rates Lack of effective vaccines or treatments The outbreak of Ebola demonstrates the rapid spread of VHFs across borders and regions due to mobile populations VHFs have a serious impact on public health and place a heavy burden on health care infrastructure and agencies Lassa Fever has been imported to other countries

14 Background Lassa Fever (LF) Lassa virus (LASV) is an arenavirus Reservoir is the multimammate rat genus Mastomys LF is NOT a rare disease Endemic to West Africa and transmitted throughout the year Occurs in several countries including Guinea, Liberia, Nigeria, and Sierra Leone Estimated that 300,000 cases and 5,000 deaths occur annually One of the only VHFs that can be prospectively studied Understanding how LF spreads can help us better understand other diseases like Ebola LF in Sierra Leone

15 Study Objective Characterize the morbidity/mortality, epidemiology and risk factors associated with clinical outcome for infection with Lassa virus (LASV)

16 Description of Dataset LF from Sierra Leone Developed Collaboration: Sierra Leone Ministry of Health and Sanitation (MOHS) provided access to country-wide data on suspected LF cases Surveillance and clinical data of suspected cases reported by MOHS, Includes data on: Suspected Cases identified through passive and active surveillance Results of diagnostic laboratory testing Epidemiologic data collected from patient questionnaires and clinical assessments Potential contacts identified and approached by active surveillance team

17 LF Dataset Study Methods: Retrospective analysis of data collected from surveillance of LF in Sierra Leone Assess epidemiologic risk factors associated with disease and mortality

18 Where Do I Start??

19 Analysis of Data by Person, Place and Time Analysis by Person Compare counts or frequencies by: Age Gender Ethnicity Occupation Vaccination status Others? Analysis by Place Present geographic distribution of counts or rates Where cases were reported Where exposures might occur Determine the geographic area with the highest rates of infection Analysis by Time Examine occurrence of disease during particular time interval (years, months, weeks) Seasonal trends Analysis of time using person and place subcategories: Gender frequency over time Frequency in a region over time.

20 Analysis of Subgroups Analysis of sub-groups can reveal additional information Sub-Groups Gender Children Ethnicity Individuals with outdoor occupations Combinations (gender and ethnicity)

21 Develop an Analysis Plan Univariate analysis Temporal trend analysis across years Risk factor analysis to assess predictors of disease and mortality Age Gender Other subgroups

22 LF Results Univariate Analysis by Time
3348 suspected LF cases identified between :
27.0% were LF Positive
31.5% of LF Positive (n=872), Died
56.3% of suspected cases were Female
13.7% of suspected received Ribavirin treatment
[Figure: Lassa Fever Enrollees, Diagnosis and Mortality; series: Total N, LF Pos, LF Died]

23 LF Results Analysis by Time
Characteristics of Suspected LF Cases by Year
Columns: one per year, then Total, Chi-Sq. P-value, CA* Trend P-value (year labels, N row, and Female P-values missing)

Female
  By year: 84 (47.2); 177 (55.8); 356 (52.9); 460 (59.3); 473 (805, 58.7); 335 (56.0)
  Total: 1,885 (3347, 56.3)
Age in Years (Median)
  By year: 25.5 (26.0); 25.0 (316, 25.0); 23.7 (670, 23.0); 24.3 (766, 24.0); 24.7 (788, 23.0); 23.7 (593, 22.0)
  Total: 24.3 (3311, 23.0)
  Chi-Sq. P-value: .23**; CA* Trend P-value: NA
Mean DSOI/days (Median)
  By year: 9.6 (134, 8.0); 9.2 (307, 7.0); 8.6 (647, 6.0); 9.6 (600, 7.0); 8.2 (418, 6.0); 8.5 (323, 6.0)
  Total: 8.9 (2429, 7.0)
  Chi-Sq. P-value: .0003**; CA* Trend P-value: NA

*Cochran-Armitage Trend test, **Kruskal-Wallis test
The proportion of female suspected cases significantly increased over the years
Days Since Onset of Illness (DSOI) significantly different across the years
Appears to be decreasing

24 LF Results
[Figure: Total Suspected Enrollees with Defined LF Diagnosis, LF Positive Cases and Ribavirin Treated by Year; y-axis: Proportion of Suspected Cases; x-axis: Year; series: % LF +, % Ribavirin]
[Figure: LF Mortality by Year; y-axis: Proportion of LF Positive; x-axis: Year; series: % Mortality]
Trend P-values shown on the figures: p<.0009*, p<.0001*, p<.0001*
* Cochran-Armitage Trend test
Increased prevalence may be due to improved detection and/or increasing transmission
More mild LF cases may be detected that don't require Ribavirin treatment
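The Cochran-Armitage trend test named on these slides asks whether a proportion (for example, % LF positive) changes linearly across ordered groups such as years. Below is a minimal sketch using the normal approximation. The function name, the equally spaced year scores, and the counts are all hypothetical; the yearly counts behind the figures are not available in this transcript.

```python
import math

def cochran_armitage(events, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    events[i] / totals[i] is the proportion in ordered group i (e.g. year i).
    Returns (z, p) using the normal approximation.
    """
    if scores is None:
        scores = list(range(len(events)))  # equally spaced scores: 0, 1, 2, ...
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled proportion under the null
    s_bar = sum(n * s for n, s in zip(totals, scores)) / n_total
    # Score-weighted difference between observed and expected event counts
    t = sum(s * (r - n * p_bar) for r, n, s in zip(events, totals, scores))
    var_t = p_bar * (1 - p_bar) * sum(
        n * (s - s_bar) ** 2 for n, s in zip(totals, scores)
    )
    z = t / math.sqrt(var_t)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Hypothetical counts for three consecutive years (NOT the Sierra Leone data).
positives = [10, 20, 30]
tested = [100, 100, 100]
z, p = cochran_armitage(positives, tested)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z indicates a proportion that rises with the group score; a statistics package would normally be used in practice, and this sketch is only meant to show what the test computes.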

25 LF Results Analysis by Place Map of Cases in Sierra Leone LF cases identified from districts that had previously not reported LF Improved detection of LF Improved awareness of the population at risk of LF LF may be spreading Courtesy of Marc Souris

26 LF Results Analysis by Person
Risk Factors of Lassa Fever Diagnosis,

Characteristic        All Patients      LF                Non-LF            P-value
N                     [missing]         [missing] (27.3)  2351 (72.2)
Female                [missing]         [missing] (57.6)  1315 (55.9)       NS
Mean Age (Median)     24.5 (24.0)       21.9 (20.0)       25.5 (25.0)       <.0001*
Mean DSOI (Median)    8.96 (2377, 7.0)  9.6 (723, 8.0)    8.7 (1654, 6.0)   <.0001*
House Deaths          167 (927)         36 (295, 12.2)    131 (632, 20.7)   .0017
Contact with LF Case  770 (2009)        145 (565, 25.6)   625 (1444, 43.3)  <.0001
Ribavirin             454 (3218)        406 (871, 46.6)   48 (2347, 2.1)    <.0001

* Wilcoxon Rank Sum Test
LF positive were of significantly younger age and had more days since onset of illness
LF negative were significantly more likely to have reported a death in their household, and contact with a LF case
Gender was not significantly different between LF positive and LF negative
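The slide does not say which test produced the P-values for the categorical rows. As a sketch, assuming an uncorrected Pearson chi-square test on a 2x2 table, the House Deaths comparison can be recomputed from the counts shown (36 of 295 LF positive vs. 131 of 632 LF negative). The function name is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

               outcome+  outcome-
    group 1       a         b
    group 2       c         d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# House Deaths row: 36 of 295 LF positive vs. 131 of 632 LF negative.
chi2, p = chi_square_2x2(36, 295 - 36, 131, 632 - 131)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Under that assumption the result lands close to the .0017 reported on the slide, which suggests (but does not prove) that this is the test that was used.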

27 LF Results
Risk Factors of Lassa Fever Mortality,

Characteristic        Total LF          Non-Survivors     Survivors         P-value
N                     [missing]         [missing] (31.7)  585 (68.3)
Female                495 (57.8)        146 (53.9)        349 (59.7)        NS
Mean Age (Median)     21.7 (20.0)       18.7 (18.0)       23.1 (21.0)       .0005*
Mean DSOI (Median)    9.6 (704, 8.0)    9.3 (230, 8.0)    9.7 (474, 7.0)    NS*
House Deaths          36 (285)          3 (60, 5.0)       33 (225, 14.7)    .045
Contact with LF Case  139 (549)         16 (141, 11.4)    123 (408, 30.2)   <.0001
Ribavirin             405 (852)         156 (270, 57.8)   249 (582, 42.8)   <.0001

* Wilcoxon Rank Sum Test
Non-Survivors were of significantly younger age (p=.0005)
Survivors significantly more likely to report household death or contact with LF case (p=.045, p<.0001)
Ribavirin significantly associated with mortality (p<.0001); most likely confounding factor and an indication of disease severity

28 LF Results Subgroup Analysis
Children < 5 years of age vs. All Other Suspected LF Cases,

Characteristic        Total             Age<5             All Others        P-value
N                     [missing]         [missing]         [missing]
LF Positive           882 (27.3)        198 (34.0)        684 (25.8)        <.0001
LF Mortality (N=856)  271 (31.2)        83 (193, 43.0)    188 (663, 28.4)   .0001
Ribavirin             454 (3218, 14.1)  107 (582, 18.4)   347 (2636, 13.2)  .0011
Female                1823 (56.4)       268 (46.0)        1555 (58.7)       <.0001
Household Deaths      167 (927, 18.0)   7 (95, 7.4)       160 (832, 19.2)   .0044
Contact with Case     770 (2009, 38.3)  76 (309, 24.6)    694 (1700, 40.8)  <.0001
Mean DSOI             9.6 (704, 8.0)    8.0 (392, 7.0)    9.1 (1985, 7.0)   NS
Malaria               152 (356, 42.7)   57 (87, 65.5)     95 (269, 35.3)    <.0001

Among LF+, median DSOI for < 5 years was 7.0 compared to 8.0 for all others (p=.065)
Children < 5 years were significantly more likely to be LF positive, receive Ribavirin treatment, and die from LF compared to all others
Children < 5 years were significantly more likely to have malaria
All others significantly more likely to report household death or contact with case; low sample size
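The table reports P-values only. To express how much higher LF mortality was in children under 5, one option is a risk ratio with a confidence interval. The sketch below uses the LF Mortality counts from the table (83 of 193 vs. 188 of 663); the function name and the choice of a Wald interval on the log scale are assumptions made here, not part of the original analysis.

```python
import math

def risk_ratio(events_1, n_1, events_2, n_2, z=1.96):
    """Risk ratio of group 1 vs. group 2 with a Wald 95% CI on the log scale."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    # Standard error of log(RR) for two independent binomial samples
    se_log = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# LF Mortality row: 83 of 193 children < 5 years vs. 188 of 663 all others.
rr, lower, upper = risk_ratio(83, 193, 188, 663)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 agrees with the significant P-value in the table, and the ratio itself gives the size of the difference, which a P-value alone does not.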

29 LF Results Subgroup Analysis
Pregnant vs. Non-pregnant females (14-49),

Characteristic        Total             Pregnant          Non-Pregnant      P-value
N                     [missing]         [missing] (47.0)  183 (53.0)        -
LF Positive           120 (34.8)        63 (38.9)         57 (31.2)         NS
LF Mortality          44 (117, 37.6)    32 (61, 52.5)     12 (56, 21.4)     .0005
Ribavirin             71 (343, 20.7)    43 (160, 26.9)    28 (15.3)         .0083
Household Deaths      21 (220, 9.6)     5 (68, 7.4)       16 (152, 10.5)    NS
Contact with Case     39 (266, 14.7)    10 (107, 9.4)     29 (159, 18.2)    .04
Mean DSOI             8.4 (278, 6.0)    8.5 (122, 7.0)    8.4 (156, 6.0)    NS
Malaria               41 (117, 35.0)    17 (46, 37.0)     24 (71, 33.8)     NS

Pregnant women significantly more likely to receive Ribavirin treatment and die from LF
Non-pregnant women significantly more likely to report contact with case
Small sample size, so difficult to detect significance for other factors

30 LF Results Subgroup Analysis
Malaria and LF Co-Infection,
Malaria testing results reported for 356 suspected LF cases
152 (42.7%) of suspected LF cases were Malaria positive
55/141 (39.0%) of LF + patients were co-infected with malaria
Those who were co-infected were of significantly younger age compared to those who were only LF positive
7.0 years vs years (p=.0005)
41.8% of co-infection cases occurred in children < 5 years of age
No significant difference in mortality detected between co-infected and LF+ alone; low sample size
[Figure labels: Lassa Fever; P. falciparum Malaria]
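The "low sample size" caveat can be made concrete by putting a confidence interval around the co-infection proportion. The sketch below applies a Wilson score interval to the 55 of 141 LF positive patients reported as co-infected; the function name and the choice of the Wilson method are assumptions made for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)
    ) / denom
    return center - half_width, center + half_width

# 55 of 141 LF positive patients with a malaria result were co-infected.
low, high = wilson_interval(55, 141)
print(f"co-infection: {55 / 141:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The width of the interval shows how much uncertainty remains around the 39.0% estimate with only 141 tested patients.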

31 LF Dataset Study Methods: Retrospective analysis of data collected from surveillance of LF in Sierra Leone Assess epidemiologic risk factors associated with disease and mortality

32 Summary of LF Results - Interpretation LF prevalence significantly increased over the years and reported from new districts Could be due to improving detection, increasing transmission, or both LF mortality significantly decreased over the years Earlier detection and improving clinical management may result in better outcome Ribavirin treatment significantly associated with mortality The most severe cases usually receive Ribavirin treatment Ribavirin treatment probably a confounding factor, and an indicator of severe disease

33 Summary of LF Results Interpretation and Hypothesis Generation Summary of Results Young individuals, especially children < 5 years of age, were significantly more likely to be LF positive, to receive Ribavirin treatment, and to die from LF Early detection and clinical care targeted at LF-infected young children may be critical to improving LF outcome Pregnant women were significantly more likely to die from LF compared to non-pregnant counterparts High prevalence of malaria co-infection, especially in younger age Impact of co-infections on LF outcome needs to be further investigated Hypothesis Development Young children have increased risk of LASV infection and severe LF Pregnant women have increased risk of severe LF and death Malaria exacerbates LASV infection and results in more severe LF outcome

34 Conclusion If you have data, develop a step-by-step plan for analysis: Define objectives Assess data quality Develop collaborations Develop study methods Develop analysis plan Person, Place, Time Conduct analysis utilizing appropriate resources Interpret Results Present Results Abstract, Presentation, or Manuscript Develop Hypotheses for future studies

35 Questions