THE GUIDE TO SPSS. David Le

Similar documents
The Dummy s Guide to Data Analysis Using SPSS

SPSS 14: quick guide

SPSS Guide Page 1 of 13

CHAPTER FIVE CROSSTABS PROCEDURE

SCENARIO: We are interested in studying the relationship between the amount of corruption in a country and the quality of their economy.

Statistical Research Consultants BD (SRCBD) Missing Value Management. Entering missing data in SPSS: web:

ANALYSING QUANTITATIVE DATA

Module - 01 Lecture - 03 Descriptive Statistics: Graphical Approaches

Chapter 8 Script. Welcome to Chapter 8, Are Your Curves Normal? Probability and Why It Counts.

Chapter 1 Data and Descriptive Statistics

DDBA8437: Central Tendency and Variability Video Podcast Transcript

Morningstar Direct SM Scorecard

Summary Statistics Using Frequency

Tutorial Segmentation and Classification

Dropping Lowest Score from a Total Column

Lab 2. Analysis of Observational Study/Calculations with SPSS

Multiple Responses Analysis using SPSS (Dichotomies Method) A Beginner s Guide

Excel 2016: Charts - Full Page

MAS187/AEF258. University of Newcastle upon Tyne

Using Excel s Analysis ToolPak Add-In

Chapter 9 Assignment (due Wednesday, August 9)

A Descriptive Analysis of Reported Health Issues in Rural Jamaica Verlin Joseph, Florida Agricultural & Mechanical University

3 Ways to Improve Your Targeted Marketing with Analytics

VIDEO 1: WHY ARE FORMS IMPORTANT?

Basic applied techniques

Who Are My Best Customers?

Lecture 10. Outline. 1-1 Introduction. 1-1 Introduction. 1-1 Introduction. Introduction to Statistics

CEE3710: Uncertainty Analysis in Engineering

Physics 141 Plotting on a Spreadsheet

Concepts for Using TC2000/TCnet PCFs

Session 7. Introduction to important statistical techniques for competitiveness analysis example and interpretations

SPSS Instructions Booklet 1 For use in Stat1013 and Stat2001 updated Dec Taras Gula,

Questionnaire. (3) (3) Bachelor s degree (3) Clerk (3) Third. (6) Other (specify) (6) Other (specify)

Blackboard 9 - Calculated Columns

Getting Started with OptQuest

Tutorial #3: Brand Pricing Experiment

Corrugated Compression Strength

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

Descriptive Statistics Tutorial

Introduction to Control Charts

Performance Pro Update 3.13 June 8, Merit Matrix Enhancements

Multiple Regression. Dr. Tom Pierce Department of Psychology Radford University

Introduction to IBM Cognos for Consumers. IBM Cognos

KING ABDULAZIZ UNIVERSITY FACULTY OF COMPUTING & INFORMATION TECHNOLOGY DEPARTMENT OF INFORMATION SYSTEM. Lab 1- Introduction

Telecommunications Churn Analysis Using Cox Regression

Course on Data Analysis and Interpretation P Presented by B. Unmar. Sponsored by GGSU PART 1

MYOB Support Note. Invoices Aging Incorrectly

Section 9: Presenting and describing quantitative data

directlife DL8710 Quick start guide Get ready to start The assessment period After the assess ment

HR Business Partner Guide

5 key steps to boosting Quality of Hire

Bristow. FAQ- Using Manager Self Service in the System

Marginal Costing Q.8

Opening SPSS 6/18/2013. Lesson: Quantitative Data Analysis part -I. The Four Windows: Data Editor. The Four Windows: Output Viewer

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of

A Production Problem

Sage (UK) Limited Copyright Statement

Using SPSS for Linear Regression

Design Like a Pro. Boost Your Skills in HMI / SCADA Project Development. Part 3: Designing HMI / SCADA Projects That Deliver Results

Data Handling Collecting Data and Averages. Grades F to C

Statistics: Data Analysis and Presentation. Fr Clinic II

Application of Markov Chain in Forecasting: A Study of Customers Brand Loyalty for Mobile Phones

By: Aderatis Marketing

Management Manual. Global Standardreports

Getting Started Quicken 2010 for Windows

Instructions for Creating a ParScore Course

Econometric Analysis Dr. Sobel

Better connections: Attitudes towards product sampling. September 2015

DRM DISPATCHER USER MANUAL

Our objectives today will be to review Outcomes Report layout and how to use the metrics to gauge how your site is doing in relation to all of the

Data Visualization. Prof.Sushila Aghav-Palwe

THE HR GUIDE TO IDENTIFYING HIGH-POTENTIALS

Payroll. Year End Guide Sage One. Payroll

REPORTING ON HISTORICAL CHANGES IN YOUR DATA

Office Virtual Training: Quarter-End Billing

How to Get More Value from Your Survey Data

Percent Lead Time Visibility

Supervisor Overview for Staffing and Scheduling Log In and Home Screen

Tutorial #7: LC Segmentation with Ratings-based Conjoint Data

Excel #2: No magic numbers

Unit III: Summarization Measures:

CHAPTER 10: ANALYSIS AND REPORTING

Credit Card Marketing Classification Trees

Strategy. Strategy Training Guide

LECTURE 17: MULTIVARIABLE REGRESSIONS I

Getting Started with Power BI 8 Easy Steps

Statistical Analysis. Chapter 26

HIMSS ME-PI Community. Quick Tour. Sigma Score Calculation Worksheet INSTRUCTIONS

10 Illegal Interview Questions to avoid at all costs

e-learning Student Guide

The Benchmarking module

(800) Leader s Guide

Learning Objectives. Module 7: Data Analysis

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data

Chapter 2 Part 1B. Measures of Location. September 4, 2008

Bar graph or Histogram? (Both allow you to compare groups.)

Approaches, Methods and Applications in Europe. Guidelines on using SPSS

Tutorial Segmentation and Classification

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney

Creating Simple Report from Excel

Transcription:

THE GUIDE TO SPSS David Le June 2013 1

Table of Contents Introduction... 3 How to Use this Guide... 3 Frequency Reports... 4 Key Definitions... 4 Example 1: Frequency report using a categorical variable (sexual orientation)... 5 Example 2: Frequency report of using a continuous variable (age)... 7 Recoding Variables... 11 Key Definitions... 11 Example 1: Recoding a continuous variable (age)... 12 Example 2: Recoding a categorical variable (sexual orientation)... 19 Crosstabs... 23 Key Definitions... 23 Example: Crosstab Under 30 and Gay Identified... 24 How to read a crosstab output... 30 Computing a variable... 31 Key Definitions... 31 Example: Computing Body Mass Index (BMI)... 32 2

Introduction Welcome to the Investigaytors Guide to SPSS! This is a guide that summarizes the majority of statistical tools that we used while analyzing the 2011 Sex Now survey. Please note that although this guide is meant for the program SPSS, one could also use the free and similar program called PSPP for many of the examples. Thanks and happy investigayting! How to Use this Guide Each section of this guide is focused around different statistical tools (for example, frequency reports or crosstabs). We begin each section with a Key Definitions page that introduces the key concepts related to that statistical tool. Following the Key Definitions, we provide an illustrated example of using that statistical tool in SPSS including additional explanations and definitions of the program's output where needed. There are also hyperlinks throughout this guide, like this one Introduction, that will help you quickly and easily refer back to previous concepts and statistical tools. And with that, let's get started with Frequency Reports (try it out!). 3

Frequency Reports Key Definitions Frequency is the number of times a score occurs. Example: In the data set, A, A, B, C, C, C, the frequency of the score of B is 1. Frequency report (also called a frequency distribution) shows the number of times all scores occurred. Example: In the data set, A, A, B, C, C, C, the frequency distribution would show that A had a frequency of 2, B had a frequency of 1 and C had a frequency of 3. A categorical variable (also called a discrete variable) is a variable that categorizes or classifies. In these variables, you can only belong into one category or group and not the other. Example: In the Sex Now survey, sexual orientation is a categorical variable because it places (or categorizes) men into one of four groups: gay, bisexual, straight or other. A continuous variable is a variable whose amounts continue between scores. Example: Age is a continuous variable because a person can be 23 years old, 23.453 years old, or 24 years old. A person's age continues between 23 and 24 years of age and is, therefore, a continuous variable. A measure of central tendency is a score that summarizes your data. The ones we focus on are the mean, the median and the mode. The mean (also called the average) is the sum of all scores divided by the number of scores. Example: In the data set, 1, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, the mean would be 4.2727. The median (also called the 50th percentile) is the score that 50% are at or below. Example: In the data set, 1, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, the median would be 5. The mode is the most frequently occurring score. Example: In the data set, 1, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, the mode would be 5. Quartiles are the scores that 25%, 50%, and 75% are at or below. Example: In the data set, 1, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, the first quartile is 3, the second is 5 and the third is 5. 4

Example 1: Frequency report using a categorical variable (sexual orientation) 1. Go to Analyze > Descriptive Statistics > Frequencies 2. In the new window that appears, Select the Sexual Orientation variable from the left box > Click the arrow to bring it to the box on the right. Frequency reports will be made for all variables listed in the box on the right (multiple variables can be included in this box to get multiple frequency reports). Click OK. 1. Select variables 3. Variable(s) for frequency reports 2. Click the arrow 5

3. An output will appear that will look similar to the image below. Note: Notice that the last three columns of the table include three different types of percentages. Percent gives the percentage and includes those who did not fill out that question. In this example, there are 9 people who did not fill out this question and are labeled as "missing". Valid Percent gives the percentage excluding missing scores. In this example, the values in the valid percent column are slightly different from the percent column because the missing scores are excluded. Cumulative Percent gives percentages that represent the percentage of the current data plus the percentage of data coming before the current data. In this example, Bi has a cumulative percent of 96.5 because adding the percentages of Bi and Gay (which comes before Bi in this example) produces 96.5. 6

Example 2: Frequency report of using a continuous variable (age) 1. Go to Analyze > Descriptive Statistics > Frequencies 2a. Select the Age variable > Click on the arrow (Don t click on OK just yet) 1. Select variables 3. Variable(s) for frequency reports 2. Click the arrow 7

2b. For measures of central tendency, click on the Statistics button 2b. In the new window that appears, select mean, median, mode, and quartiles > Click Continue > Click OK 8

2b. Your output will look similar to the image below. 2c. If you would like charts that describe your data visually, click on the Charts button. 9

2c. In the new window that appears, Select the chart you would like displayed (we ll try bar charts) > Click Continue > Click OK 2c. Your output will look similar to the image below. 10

Recoding Variables Key Definitions Code is a number that is used to represent something. Example: In our data set, we use numbers to represent different types of responses. A common one is to use 0 for No and 1 for Yes. Recoding is changing the scores of a variable in a systematic way. We usually use recoding to combine a range of scores into different groups. Example: For the age variable, we may want to recode everyone under 25 as 1 and everyone 25 and above as 0, which creates two groups that can then be easily analyzed. Recoding into a different variable means you are recoding the scores and creating a new variable with the recoded scores. Example: After we have recoded the age variable into a different variable, we will have two variables: the original age variable and a new age variable with recoded scores. Important note: When recoding variables, we usually use 0 to represent No and 1 to represent Yes. You always want the group you are most interested in looking at to have the highest value (we use 1 but it does not necessarily need to be 1). 11

Example 1: Recoding a continuous variable (age) 1. Go to Transform > Recode into Different Variables 2. In the new window that appears, Select the Age variable from the left box > Click the arrow to move it to the box on the right > Under Output Variable give your recoded variable a name and label > Click Old and New Values When naming variables, you cannot use spaces or special characters other than the underscore (_). Labels describe the variable and allow spaces. 1. Select Variables 3. Variable(s) for recoding 4. Name the variable here 5. Describe the variable here 2. Click 6. Click 12

3. In the new window that appears, you enter the original values of the variable in the left side under Old Value and enter in the new values on the top right side under New Value In this example, we will recode every from age 1-100 into 3 groups: Aged 1-29, Aged 30-44, and Aged 45-100. Under Old Value, enter 29 into Range, LOWEST through value (this will gives you the lowest score up until 29) > Under New Value, enter 1 > Click Add 2. Enter new value 3. Click 1. Enter old values 13

3. For age groups, 30-44 and 45-100, use Range instead of Range, LOWEST through value Under range enter in 30 and 44 > Under New Value, enter 2 > Click Add > Do the same for 45-100 2. Enter new value 1. Enter old values 3. Click 14

3. Lastly, we want to exclude scores that are above 100 as they are most likely errors. Under Range, value through HIGHEST enter 101 (this will include all score from 100 until the highest score in the variable) > Click System-missing (this will change all scores above 101 to missing scores) > Click Add 2. Click 3. Click 1. Enter old values 15

4. When finished, the screen will look similar to the screen below > Click Continue 5. Click Change to get your new recoded variable > Click OK 16

6. Most importantly, don t forget to change your Value Labels In variable view (which you can change at the bottom of your screen), Click into the Values section of the variable you want to change Value Labels for > Under Values enter in the value of the recoded variables > Under Label enter in what the number represents (in this example, 1 represent those aged 30 and under, so the label is under 30 ) > Click Add > Click OK 2. Click here to open Value Labels window 5. Click 3. Enter value 4. Enter description 6. Click 1. Switch to variable view 7. Your output will look similar to the image below. 17

8. If you run a frequency report on this new variable, it should look similar to the image below. Note the new categories and their labels. 18

Example 2: Recoding a categorical variable (sexual orientation) 1. Go to Transform > Recode into Different Variables 2. In the new window that appears, Select the Sexual Orientation variable from the left box > Click the arrow to move it to the box on the right > Under Output Variable give your recoded variable a name and label > Click Old and New Values When naming variables, you cannot use spaces or special characters other than the underscore (_). Labels describe the variable and allow spaces. 1. Select variables 3. Variable(s) for recoding 4. Name the variable here 5. Describe the variable here 2. Click the arrow 6. Click 19

3. In the new window that appears, you enter the original values of the variable in the left side under Old Value and enter in the new values on the top right side under New Value In this example, we want to recode sexual orientation to group those who identified themselves as gay as one group and to group all other responses (bisexual, straight, other) into another group. First you need to figure out what value gay is in your variable (you can do this by opening up the Value Labels window, please see section 6 of example 1 or Ctrl+Click Here). In this case gay is coded as 1. Under Old Value enter 1 for Value > Under New Value enter 1 for Value > Click Add To recode the remaining values, simply click All other values under Old Value > Under New Value enter 0 for Value > Click Add > Click Continue Your screen should look similar to the image below. 20

4. Click Change to get your new recoded variable > Click OK 4. As in the previous example, change your Value Labels for your newly created variable and click OK 21

5. If you run a frequency report on this new variable, it should look similar to the image below. Note the new categories and their labels. 22

Crosstabs Key Definitions Chi-square gives us the probability that the results of the crosstab occurred by chance alone. If the probability is below 5% or.05, then our analyses are significant. Mantel-Haenszel common odds ratio estimate refers to how much more or less likely one group experienced an outcome or engaged in a behavior compared to another group. A score of 1.00 indicates that there is no difference between the two groups. Scores further from 1.00 (e.g., 0.50, 3.00) indicate that one group is different from another. A common odds ratio estimate gives you a sense of the magnitude of the effect that one variable has on another. *Important note: An odds ratio can only be calculated for variables that fit on a 2 x 2 table. For example, you could get a common odds ratio for the variable "Under 30" (coded with 1 as under 30 and 0 as 30 and older) and "Gay Identified" (coded as 1 for those identifying themselves as gay and 0 for those not identifying themselves as gay). You could NOT get a common odds ratio using variables such as "Age" that have more than two values. Significance (also called statistical significance) means that the results are unlikely to have occurred by chance and, as a result, we assume that the variables we have examined are related in some way. However, relationships that are statistically significant are not always meaningful or practically significant (please see confounding variable). 95% Confidence Interval for Common Odds Ratio indicates that there is a 95% chance that the actual common odds ratio, which is different from the estimate, falls within a range of values. The 95% confidence interval is composed of a lower bound (the lowest value of your range) and an upper bound (the highest value of your range), which gives you your range. A confounding variable (also called a third variable) is an unaccounted for variable that may be partially or fully responsible for the relationship between other variables. When interpreting results, it is always important to keep in mind any other variables that may have been involved in producing the results. By thinking about and taking into account the effects of confounding variables, we can be more confident that our results are valid. Example: We may find that young gay men are more likely to suffer from STIs compared to older gay men. Rather than concluding that age affects the likelihood of getting STIs, it would be a good idea to look at confounding variables such as drug use or access to healthcare to see if they play a larger role in the observed relationship. 23

Example: Crosstab Under 30 and Gay Identified 1. Go to Analyze > Descriptive Statistics > Crosstabs 24

2. In the new window that appears, Select the variables you want to crosstab > Use the arrows to move the variables into either Row(s) or Column(s) Note: Variables that you think predict or affect other variables should go into your Column(s) box. Variables that you think are affected by or are the outcome of other variables should go into your Row(s) box. In this example, we thing that being below 30 affects whether or not one identifies as gay, so under 30 goes into our column(s) and gay identified goes into our row(s). 1. Select variables 2. Click the arrows 25

3. Click on Statistics > In the new window, click on Chi-Square and Cochran s and Mantel-Haenszel statistics Test common odds ratio equal: 1 > Click Continue 26

4. Click Cells > In the new window, click on Column under Percentages > Click Continue > Click OK to get your crosstab 27

5. Your output will look similar to the images below. The Pearson Chi-Square value under Asymp. Sig. (2-sided) should be 0.05 or lower to be significant. So anything between.049 -.000 is significant. If your analysis is significant, then it's time to look at your Mantel-Haenszel Common Odds Ratio Estimate. 28

If the group you are most interested has been coded with the highest value (in this example, we coded under 30 as 1 and 30 and over as 0), the odds ratio will indicate how much more likely this group (those under 30) experienced an outcome or engaged in a behaviour (identifying as gay). In this example, the value of 1.369 means that men under 30 are 1.369 times more likely to identify as gay as compared with men 30 and over. When analyzing your data, you will get a common odds ratio estimate and a 95% confidence interval for that estimate. The closer your "Lower Bound" or "Upper Bound" is to 1, the less confident you should be in the common odds ratio estimate because it means that there is a chance that your actual common odds ratio is close to 1, which means there isn't a difference. If the range of values between your "Lower Bound" and "Upper Bound" include 1.00, it is highly likely that your analysis is not significant. 29

How to read a crosstab output When reading a crosstab, it s easiest to read one row at a time. Starting with "gay identified - no", we can see that, percentage-wise, more men 30 and older (33.4% versus 26.8%) do not identify as gay. Next we'll look at "gay identified - yes". Here, we can see that, percentage-wise, more men under 30 (73.2% versus 66.6%) identify as gay. After reading this crosstab, we may be tempted to suggest that there is a relationship between age and identifying as gay. However, we still need to examine your Pearson Chi- Squared to see if the relationship is significant, the Mantel-Haenszel common odds ratio estimate to see the magnitude of the relationship, the 95% Confidence Interval for Common Odds Ratio to determine our confidence in our estimate and lastly, think about potential confounding variables. Note: Notice that in raw numbers there are more men 30 years of age and over in this crosstab. The number of participants in each group is important and will affect the results you get for your Pearson Chi-Squared, your common odds ratio estimate and your confidence interval. 30

Computing a variable Key Definitions Body Mass Index (BMI) is a ratio of weight-to-height. There are four categories of BMI in the Canadian weight classification system: underweight (BMI less than 18.5); normal weight (BMIs 18.5 to 24.9); overweight (BMIs 25 to 29.9), and obese (BMI 30 and over). To calculate BMI, you divide weight in kilograms by height in metres squared. Example: If someone was 90kg and 1.7 meters tall, they would have a BMI of 31.14 and would fall in the obese category. When something is squared it means that the number is multiplied by itself. It is often represented by the superscript 2 (for example, 4 squared would we represented by 4 2 ). Example: 2 2 = 4. 4 2 = 16. Please be cautious of brackets. -(2) 2 = -4, whereas (-2) 2 = 4. Computing a variable involves calculating a new variable using a combination of numbers, existing variable(s), and mathematical operations (for example, "+", "-", "/", "*"). Example: If we wanted to compute BMI, we would need to use a weight variable, a height variable (in meters), the divide operation and either a multiplication or squaring operation. 31

Example: Computing Body Mass Index (BMI) 1. Go to Transform > Compute Variable 32

2. Under Target Variable, enter the name of the computed variable (following the same naming rules for variables) > Find the variables you need in your calculation in the box on the left > Click on the arrow to move the variables into the "Numeric Expression" box > Use the numerical keypad and the variables you have selected to construct your equation > When finished, click OK. In this example, we are calculating BMI, which is weight in kilograms divided by height in meters squared (multiply height by itself). 1. Name 2. Select Variables 3. Click 4. Expression to be computed 5. Use keypad 6. Click 33

3. Run a frequency report on this new variable > Your output will look similar to the image below. 34

4. Recode the variable to represent the different BMI categories (Underweight = <18.5, Normal weight = 18.5 24.9, Overweight = 25 29.9, Obesity = BMI of 30 or greater) > Your output will look similar to the image below. 35