White Paper. Managing the Challenges of Modern Research Surveys. Because some knowledge is too important not to share.


One of the most pervasive tools in empirical research is the survey. Historically, surveys have been central to social science and psychology research. However, their role continues to expand due to their flexibility in content, format, and audience. Survey methodologies are rapidly evolving with the rise of computerized technologies. Online survey websites and smartphones in particular facilitate faster and cheaper collection of large amounts of rich survey data. But these new opportunities come at a cost: faster and cheaper surveying demands greater attention be paid to data integrity issues.

Prometheus's Research Exchange Database (RexDB) offers a highly customizable software platform exceptionally suited to manage survey data for research. This data pipeline can accept massive amounts of raw data, automatically assess its integrity, and transform it into usable forms. Altogether, RexDB offers a means to simultaneously leverage emerging survey tools and minimize the associated data quality distractions otherwise faced by researchers.

Surveys in research: A brief overview

Whether examining social science, psychological, clinical, economic, or other types of questions, modern surveys often have a general goal of quantifying a construct that cannot easily be measured through external observation alone. Frequently, survey studies take the form of questionnaires where individual items are assessed according to binary (Yes/No), categorical (e.g., occupation category), or Likert (graded) scales. An example of a well-established research survey is the expanded Positive and Negative Affect Schedule (PANAS-X). This 60-item questionnaire has participants self-report levels of guilt, hostility, shyness, joviality, and more, on a 5-point Likert scale.
For example, to a question like "To what extent have you felt cheerful during the past few weeks?", possible answers range from 1 = "very slightly or not at all" to 5 = "extremely" [1]. A contrasting outcomes-oriented research survey was evaluated in Price et al. (2014), which focused on the role of patient surveys in the meaningful assessment of quality of clinical care [2].

Regardless of the question of interest, researchers must address potential threats to the validity of survey studies. These major threats include:

- Confounding variables; for example, can we say a hospital policy has an effect on patient-reported quality of care if the hospitals in question serve very different socioeconomic communities?
- Questionnaires not measuring intended constructs; for example, in a treatment adherence study, will asking diabetics how frequently they fill their prescriptions capture how well they actually follow their medication regimen?
- Limitations to generalizability; for example, is survey data collected from undergraduates in a psychology class representative of the general adult population?

Whether a study employs novel or well-established questionnaires, researchers must search their survey data for signs of the above threats.

Emerging survey platforms

While traditional survey formats like pen-and-paper, direct mailers, and phone calls remain common, new survey platforms like web-based data collection sites and electronic diaries are increasing the rate and depth of information capture. Although these platforms are useful for research purposes, both traditional and emerging survey challenges must still be managed.

Online surveying

Online platforms facilitate the crowdsourcing of research surveys: participants are recruited online and complete the surveys virtually. Popular sites for doing so include SurveyMonkey and Amazon's Mechanical Turk (MTurk). One of the most popular survey crowdsourcing platforms, MTurk has a pool of over 500,000 users [3] from approximately 200 countries. This means certain types of survey data can be collected faster and more cheaply than ever before. Buhrmester et al. (2011), for example, recruited 500 participants in 33 hours to answer a two-question survey at the cost of a single penny [4].

The case for platforms like MTurk being a "one-stop shop" for survey research is backed by analyses suggesting that the psychometric properties of online surveys match or exceed those of traditional pen-and-paper questionnaires [3, 4]. Test-retest reliabilities of MTurk surveys are high, and data missingness rates are approximately equal to those of their pen-and-paper counterparts. Furthermore, compared to traditional research survey samples (i.e., undergraduate student populations), online sample populations are more diverse: they span broader socioeconomic levels, countries of origin, and professions [4]. However, a number of potential issues remain.
Crowdsourced survey data tend to be more biased towards answers that are perceived as more socially desirable; this may result from participants mistakenly believing that payment is tied to the correctness of their answers [5]. Also, the remote nature of online crowdsourcing is restrictive, as with most remote surveys (e.g., phone surveys): such surveys offer limited experimental manipulation capabilities, give researchers little control over environmental influences during survey completion, and cannot be supplemented with other data types (e.g., physiological measurements). Furthermore, while crowdsourced survey samples tend to be more diverse than traditional survey populations, they still do not represent the general population. These demographics are also in flux, with survey-takers increasingly being from non-Western countries [6]. Additionally, platforms like MTurk are supply-and-demand marketplaces. Given the large user base and therefore low cost of recruitment, the financial incentive is for participants to complete surveys quickly, regardless of correctness.
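
One simple way to act on the speed incentive described above is to screen for implausibly fast completions. The sketch below is illustrative only: the function name `flag_fast_completions`, the input layout (respondent ID mapped to completion time in seconds), and the rule of flagging anything under half the median time are all assumptions made for this example, not a method taken from the cited studies or from RexDB.

```python
from statistics import median


def flag_fast_completions(durations, fraction=0.5):
    """Return the IDs of respondents who finished suspiciously fast.

    durations: dict mapping respondent ID -> completion time in seconds
               (hypothetical input layout).
    fraction:  a response is flagged when its time is below
               fraction * median completion time (assumed rule of thumb).
    """
    if not durations:
        return []
    cutoff = fraction * median(durations.values())
    # Sort the IDs so the report is stable from run to run.
    return sorted(rid for rid, secs in durations.items() if secs < cutoff)
```

A flagged response is not necessarily invalid; the list is a starting point for manual review, and the threshold would need tuning for each questionnaire.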

Electronic diaries

Electronic diaries (EDs) are, broadly, a computerized means of recording research data. They are commonly implemented on mobile technologies (e.g., smartphone apps). These technologies can help capture additional dimensions of data, namely temporal and environmental information. For example, an app that prompts a smoker to complete a questionnaire about their level of craving multiple times throughout the day facilitates a better study of craving dynamics in the context of situational influences (e.g., work versus social environment). Furthermore, the increasingly robust capabilities of smartphones and other mobile technologies mean ED data can be augmented with, for example, GPS data and physiological measurements from a Fitbit heart rate monitor. This contrasts with most other forms of remote surveying.

However, supplementing survey data with additional dimensions of information increases the complexity of subsequent data management and analysis tasks. As a simple example, questionnaires completed repeatedly throughout a day are likely recorded on a 12:00 AM to 11:59 PM time scale. However, analyzing the data according to a "social day" scale may be more appropriate, where a day is defined as the interval between a participant waking and their bedtime, regardless of the actual time of day or date. Furthermore, augmenting survey data with locational, physiological, or other types of data greatly increases the number of variables that need to be analyzed simultaneously. Traditional statistical analysis and modeling tools may not be optimally suited to handle such complex datasets. By extension, off-the-shelf database tools can also have difficulty transforming raw survey data into a form suited for cutting-edge analytical programs based in systems science, machine learning, and engineering methods.
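
The "social day" relabeling described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `assign_social_days` and the idea of a per-participant sleep log of (wake, bedtime) pairs are hypothetical, and an entry that falls outside every waking interval is simply left unlabeled.

```python
from datetime import datetime


def assign_social_days(sleep_log, entries):
    """Label each diary entry with the social day it belongs to.

    sleep_log: list of (wake, bedtime) datetime pairs, one per social day,
               in chronological order (hypothetical input layout).
    entries:   list of datetimes at which questionnaires were completed.
    Returns a list of 0-based social-day indices; None marks an entry
    that falls outside every recorded waking interval.
    """
    labels = []
    for ts in entries:
        label = None
        for day_index, (wake, bed) in enumerate(sleep_log):
            # The calendar date is ignored: only the waking interval matters.
            if wake <= ts <= bed:
                label = day_index
                break
        labels.append(label)
    return labels


# A participant who goes to bed at 1:30 AM: the 12:45 AM entry carries the
# next calendar date but still belongs to the first social day.
sleep_log = [
    (datetime(2015, 2, 20, 7, 0), datetime(2015, 2, 21, 1, 30)),
    (datetime(2015, 2, 21, 9, 0), datetime(2015, 2, 21, 23, 0)),
]
entries = [
    datetime(2015, 2, 20, 22, 0),
    datetime(2015, 2, 21, 0, 45),
    datetime(2015, 2, 21, 10, 0),
]
print(assign_social_days(sleep_log, entries))  # [0, 0, 1]
```

The linear scan keeps the logic easy to read; for long diaries, a sorted lookup over wake times would be the more efficient choice.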
Managing modern survey data

Ultimately, emerging survey platforms are expanding the rate at which large amounts of rich, research-grade survey data can be collected. As a direct result, there is an increased responsibility to maintain a high level of integrity of the survey data. However, more flexible database tools can help researchers minimize the increasing data management burden.

Unlike off-the-shelf software, RexDB by Prometheus Research offers a streamlined data pipeline that is flexible and supported by a team of analysts. This makes it exceptionally suited to manage complex survey data. Specifically, RexDB is heavily customizable and can be configured to accept diverse forms of multivariate data, like questionnaires completed through MTurk. Scalable and flexible, RexDB can also securely manage studies with millions of data points.

Prometheus Research's analyst team continuously designs customized study-specific data quality tools for RexDB. These tools can flag data anomalies, automatically report validity issues, and more. For example, as surveys are being completed, a researcher can monitor a custom RexDB report that summarizes potentially skewed demographics of a sample population in an MTurk-based study. With tools like this, RexDB can automatically flag potential validity issues in a survey study in real time, allowing researchers to take immediate action.
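
To make the idea of a skewed-demographics report concrete, the following sketch compares the observed share of each demographic group against a target share. It is plain Python written for illustration, not RexDB configuration or its API; the function name `flag_skewed_groups`, the target shares, and the 10-percentage-point tolerance are all assumptions.

```python
from collections import Counter


def flag_skewed_groups(responses, expected_share, tolerance=0.10):
    """Report groups whose observed share deviates from the target share.

    responses:      list with one group label per completed survey
                    (e.g., an age bracket); hypothetical input layout.
    expected_share: dict mapping group label -> target proportion (0 to 1).
    tolerance:      maximum absolute deviation allowed before a group is
                    flagged (assumed threshold).
    Returns a dict mapping each flagged group to its observed proportion.
    """
    total = len(responses)
    if total == 0:
        return {}
    counts = Counter(responses)
    flags = {}
    for group, expected in expected_share.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            flags[group] = round(observed, 3)
    return flags
```

Re-running a check like this as responses arrive is what allows a skew to be caught while recruitment can still be adjusted, rather than after data collection has closed.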

The customizability of RexDB also offers a means to minimize the effort required to transform raw multivariate survey data into usable forms. For example, Prometheus Research's analyst team can build calculations into RexDB that construct an overall positive mood score from the PANAS-X's individual cheerfulness and joviality questions. Calculations could similarly be configured to seamlessly transform 24-hour time stamps from electronic diary data into social day timeframes.

RexDB's custom query and export tools also facilitate the merging of complex multivariate survey, locational, physiological, and other data into a file format compatible with advanced analytical software. This means RexDB can ease the transformation of complex survey data into a form ready for integration into cutting-edge systems science analytical methods.

Altogether, RexDB offers a suite of customizable capabilities backed by a team of analysts that make survey data management an automated software task, rather than a growing burden placed solely on the shoulders of researchers.

References

[1] D Watson & LA Clark (1994). The PANAS-X: Manual for the Positive and Negative Affect Schedule - Expanded Form. Iowa City: The University of Iowa.
[2] RA Price, MN Elliott, AM Zaslavsky, RD Hays, WG Lehrman, L Rybowski, S Edgman-Levitan, & PD Cleary (2014). Examining the role of patient experience surveys in measuring health care quality. Medical Care Research & Review. Advanced Access.
[3] Amazon Web Services, Inc. (2011). MTurk census: About how many workers were on Mechanical Turk in 2010? Amazon Web Services Forum, retrieved Feb 20, 2015: forums.aws.amazon.com
[4] M Buhrmester, T Kwang, & SD Gosling (2011). Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science 6(1): 3-5.
[5] TS Behrend, DJ Sharek, AW Meade, & EN Wiebe (2011). The viability of crowdsourcing for survey research. Behavior Research Methods 43:
[6] J Ross, L Irani, MS Silberman, A Zaldivar, & B Tomlinson (2010). Who are the crowdworkers?: Shifting demographics in Mechanical Turk. Proceedings of the 28th International Conference on Human Factors in Computing Systems. Extended abstracts, pp

Additional Resources

US Corporate Office: 55 Church Street, 7th Floor, New Haven, CT, USA

For this and other white papers, academic presentations, and publications by Prometheus Research, please visit: