THE CONTINUING QUANDARY OF SURVEY DATA PART II: Comparison of SAS Procedures and SUDAAN Procedures


Katherine Baisden, SRI International, Menlo Park, California

ABSTRACT

Once upon a time, in the days of simple random sampling, analyzing survey data was straightforward. However, for reasons of efficiency and economy, simple random sampling is rarely used in survey research today. Most survey data are based on a stratified, clustered, or otherwise complex sample design. Such designs affect the accuracy of variance estimates (standard errors) and test statistics (chi-squares, t-tests). Until recently, SAS programmers had to rely on other statistical software packages, such as SUDAAN, WesVar, and STATA, to produce accurate variance estimates and test statistics from complex sample designs. With the release of SAS v9.1, SAS has incorporated survey procedures (e.g., PROC SURVEYMEANS, PROC SURVEYFREQ, PROC SURVEYREG, PROC SURVEYLOGISTIC) to address this issue. This paper examines four basic procedures used in the vast majority of survey research (means, frequencies, regressions, and logistic regressions). It explores the differences between the four SAS survey procedures and the corresponding SAS-callable SUDAAN v8.0 procedures. Using data examples, the paper highlights the differences in syntax and output and discusses the available options, limitations, and recent updates of each package.

INTRODUCTION

To maximize the return on survey data collection and to minimize its cost, researchers continue to develop increasingly complex sample designs. These designs include stratification, clustering, unequal probabilities of selection, and a multitude of combinations of these techniques. Simple random sample designs are a rarity in this day and age of survey research. Complex designs affect the accuracy of variance estimates and test statistics.
The SAS programmer must expand beyond the traditional tools in his or her analytical handbag to deal with survey data today. Until recently, SAS programmers had to use additional software packages, such as SUDAAN, to produce correct variance estimates. Now, with SAS v9.1 (PROC SURVEYSELECT, PROC SURVEYMEANS, PROC SURVEYFREQ, PROC SURVEYREG, and PROC SURVEYLOGISTIC), some of the tools needed to deal with this type of survey data are available in SAS.

This paper compares and contrasts SAS v9.1 and SAS-callable SUDAAN v8.0, focusing on syntax and output for four of the most common analysis procedures: crosstabulation/frequency, means, regression, and logistic regression. The comparison is demonstrated using data from a study of teachers in the state of California. Schools were classified on three criteria: the percentage of emergency credentialed teachers in the school (EMERG: le 10%, 11-19%, 20%+), the size of the district (DISTSIZE: less than 5000, ,000, 10,000+), and the type of school (SCHL_LVL: elementary, middle, and high school). Weights were developed for the data based on these stratification variables. Teachers were then selected from each of the strata. For analysis purposes, our statistician has classified this as a stratified sample with replacement. The data examples highlight the syntax for each of the procedures in SAS-callable SUDAAN and SAS (not all options can be included). The data are presented to illustrate syntax, not to report substantive findings.

GENERAL COMMENTS ABOUT SAS AND SAS-CALLABLE SUDAAN

Both the SAS and the SUDAAN procedures use the Taylor series linearization method to calculate variance estimates. SUDAAN additionally offers the option of using balanced repeated replication (BRR) and jackknife weights. SUDAAN cannot calculate BRR or jackknife weights itself, but it can use them if they are provided on the data set.
The SAS-callable version of SUDAAN is designed for use within the framework of SAS. You can call SUDAAN from within any SAS program, and it reads the SAS data set format. Thus, much of the syntax of a procedure is very similar. However, there are some important differences to note. This paper focuses on PROC SURVEYMEANS, PROC SURVEYFREQ, PROC SURVEYREG, and PROC SURVEYLOGISTIC and their SUDAAN counterparts. Each procedure has many options and statistics available in each package; because of space limitations, this paper highlights only the most common.

DETERMINING THE IMPACT OF SAMPLE DESIGNS ON VARIANCE ESTIMATES

The impact of a complex sample design (CSD) on variance estimates can be measured. A common measure is the Design Effect (DE). The DE is a ratio: it compares the variance obtained under the CSD to the variance that would have been obtained under the assumption of simple random sampling (SRS). If the DE is close to 1.0, one can assume the variances would have come out the same whether the design was a CSD or an SRS design. Most of the time, the DE for a CSD is greater than one. The larger the DE, the more correlated your respondents are within clusters, and the more the variances will be underestimated if the data are analyzed with packages that cannot go beyond the assumption of SRS.

    DE = variance under CSD / variance under SRS

The table below summarizes how point estimates and variance estimates are affected when weighted data from a complex sample design are analyzed with regular SAS procedures (unweighted or weighted), the SAS survey procedures, and SUDAAN procedures.

                                         Point Estimates            Variances
                                         (Percents, Means, Etc.)    (Standard Errors, Variances)
    Unweighted Regular SAS Procedures    Incorrect                  Incorrect
    Weighted Regular SAS Procedures      Correct                    Incorrect
    SAS Proc Survey Procedures           Correct                    Correct
    SUDAAN Procedures                    Correct                    Correct

Unweighted regular SAS procedures produce incorrect point estimates and incorrect variance estimates. Weighted regular SAS procedures produce correct point estimates but incorrect variance estimates. The point estimates will be the same, within rounding, for weighted regular SAS procedures, the SAS survey procedures, and SUDAAN procedures. The variance estimates will be the same for the SAS survey procedures and SUDAAN procedures. There may be slight differences between the two programs because, at this time, they differ slightly in computation and in the handling of missing data.
Differences in estimates and standard errors between the two programs may be due, for example, to different tolerances for matrix inversion or to the number of iterations in regression procedures.

Before beginning any analysis, you must determine what kind of sampling design the survey is based on. SUDAAN offers you a choice of the following:

SAS and SUDAAN offer the following procedures:

    SAS               SUDAAN      PURPOSE
                      RECORDS     Print records from ASCII, SAS, SPSS and SUDAAN
    SURVEYFREQ        CROSSTAB    Produces weighted oneway and multiway frequencies
                      RATIO       Produces ratio estimates and their standard errors for correlated data
    SURVEYMEANS       DESCRIPT    Produces means, medians and quantiles and their standard errors
    SURVEYREG         REGRESS     Fits linear models
    SURVEYLOGISTIC    RLOGIST     Fits logistic regression models
                      MULTILOG    Logistic model with categorical dependent variables
                      SURVIVAL    Fits the discrete proportional hazards model
    SURVEYSELECT                  Helps you select a sample

The majority of my knowledge about these procedures comes from self-discovery and hands-on experience. Although both programs use very similar syntax, SUDAAN requires more detail. For example, SUDAAN version 8 requires that, for every categorical variable in the syntax, you specify the number of levels in that variable (using the LEVELS statement). However, with the release of SUDAAN version 9, Research Triangle Institute (RTI) will introduce the CLASS statement, which will eliminate the need to specify the number of levels for each categorical variable. The new CLASS statement can be used as a replacement for the SUBGROUP/LEVELS combination in all SUDAAN procedures. It also relaxes the restriction that the levels of a variable must be consecutive integer values from 1 through m. A CLASS variable must be numeric, but it can take on any values, including missing values.

In addition, SUDAAN will not accept 0,1 coding schemes for categorical variables; all values for categorical variables must start with 1, with the exception of the dependent variable in PROC RLOGIST. You do have the option to recode your variables on the fly within a SUDAAN procedure, but it is another step that must be taken for successful completion of a procedure.

Likewise, SUDAAN prints no output by default. You must specify exactly which statistics you want printed and in what format.
It is not as simple as requesting statistics as options within a SAS procedure. Unlike SAS, SUDAAN also does not show variable names in the output unless they are included in the label of the variable.

At the present time, you cannot run SAS-callable SUDAAN v8.0 in conjunction with SAS v9.0 or SAS v9.1; you must use SAS v8.2. SUDAAN v9.0 will be compatible with SAS v9.1.

SAS assumes that first-stage sampling is with replacement, although in reality, the vast majority of the time, it is not. This assumption can result in an overestimate of the variance, but the overestimate is very small.

PROC SURVEYMEANS IN SAS

    PROC SURVEYMEANS;
      VAR T4B;
      STRATA EMERG DISTSIZE SCHL_LVL;
      WEIGHT WGTD;
      DOMAIN T40;
      TITLE 'MEAN OF 4B IN SAS';
    RUN;

This analysis requests the overall mean of T4B (number of classes taught) and the mean number of classes taught for each gender (T40). The stratification variables are EMERG, DISTSIZE, and SCHL_LVL, as indicated on the STRATA statement. The DOMAIN statement requests a breakdown of T4B by gender. If no statistic keywords are specified, SAS provides the NOBS, MEAN, STDERR, and CLM statistics by default. The LIST option on the STRATA statement provides basic information about the respondents in each stratum (N, number missing, strata variable levels); it is illustrated in the SAS SURVEYREG example in Exhibit 3A.

Calculated design effects are not available in this procedure. To calculate design effects, you need to run the analysis once as a normal weighted SAS MEANS procedure and once as a PROC SURVEYMEANS, and then apply the DEFF formula to the results of the two runs (DEFF = CSD variance / SRS variance). Output in Exhibit 1A.
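
The two-run design effect calculation described above can be sketched numerically. The following Python example is only an illustration, not the code SAS or SUDAAN actually runs: it assumes a stratified with-replacement design with no clustering (each respondent is its own sampling unit), uses a Taylor linearization for the weighted mean, and takes as its SRS comparison the weighted standard error formula that a weighted means procedure would typically report (sum of weighted squared deviations divided by (n - 1) times the sum of weights). The function names and the small data set are hypothetical.

```python
from collections import defaultdict


def weighted_mean(y, w):
    """Weighted point estimate: sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def srs_variance_of_mean(y, w):
    """Variance of the weighted mean ignoring the design (assumed SRS-style formula)."""
    n = len(y)
    ybar = weighted_mean(y, w)
    ss = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    return ss / ((n - 1) * sum(w))


def stratified_wr_variance_of_mean(y, w, strata):
    """Taylor-linearized variance of the weighted mean, stratified with replacement."""
    total_w = sum(w)
    ybar = weighted_mean(y, w)
    # Linearized value for each respondent, grouped by stratum.
    by_stratum = defaultdict(list)
    for yi, wi, h in zip(y, w, strata):
        by_stratum[h].append(wi * (yi - ybar) / total_w)
    variance = 0.0
    for z in by_stratum.values():
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two observations")
        zbar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zi - zbar) ** 2 for zi in z)
    return variance


def design_effect(y, w, strata):
    """DEFF = CSD variance / SRS variance."""
    return stratified_wr_variance_of_mean(y, w, strata) / srs_variance_of_mean(y, w)


# Hypothetical data: classes taught, weights, and a stratum label per teacher.
y = [5, 6, 4, 5, 2, 3, 1, 2]
w = [10, 10, 12, 12, 30, 30, 25, 25]
strata = ["A", "A", "B", "B", "C", "C", "D", "D"]
print("weighted mean:", weighted_mean(y, w))
print("design effect:", design_effect(y, w, strata))
```

With equal weights and a single stratum the two variances coincide and the ratio is 1, which matches the interpretation of a design effect near 1.0 given earlier. Note that stratification alone can push the ratio below 1; it is clustering and unequal weighting that usually push it above 1.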

As in PROC MEANS, when computing statistics for an analysis variable, SAS omits observations with missing values for that variable. In addition, it is important to note that PROC SURVEYMEANS excludes an observation from the analysis if it has a missing or non-positive value for the weight. Observations are also excluded if they have missing values for the variables on the STRATA or CLUSTER statement, unless the MISSING option is used. When the MISSING option is used, missing values are treated as a valid category.

As an experienced SAS programmer, you may want to sort the data set by T40 (gender) and use a BY statement. Doing so produces a NOTE in the SAS log directing you to the DOMAIN statement, because BY-group processing analyzes each group separately and is not a valid domain (subpopulation) analysis.

PROC DESCRIPT IN SUDAAN

(Overall Mean)

    PROC DESCRIPT DATA=ONE FILETYPE=SAS DESIGN=STRWR;
      NEST EMERG DISTSIZE SCHL_LVL;
      WEIGHT WGTD;
      VAR T4B;
      SETENV LABWIDTH=28 COLSPCE=1 COLWIDTH=10 DECWIDTH=4;
      PRINT NSUM="SAMPLE SIZE" WSUM="POPULATION SIZE" MEAN SEMEAN="S.E." DEFFMEAN="DESIGN EFFECT" /
            STYLE=NCHS NSUMFMT=F6.0 WSUMFMT=F10.0 DEFFMEANFMT=F6.2 SEMEANFMT=F7.4;
      RTITLE "MEAN OF T4B IN SUDAAN";
    RUN;

(Mean by Gender)

    PROC DESCRIPT DATA=ONE FILETYPE=SAS DESIGN=STRWR;
      NEST EMERG DISTSIZE SCHL_LVL;
      WEIGHT WGTD;
      VAR T4B;
      SUBGROUP T40;
      LEVELS 2;
      SETENV LABWIDTH=28 COLSPCE=1 COLWIDTH=10 DECWIDTH=4;
      PRINT NSUM="Sample Size" WSUM="Population size" MEAN SEMEAN="S.E." DEFFMEAN="Design Effect" /
            STYLE=NCHS NSUMFMT=F6.0 WSUMFMT=F10.0 DEFFMEANFMT=F6.2 SEMEANFMT=F10.4;
      RTITLE "Mean of T4B by T40 IN SUDAAN";
    RUN;

This analysis is the same request as the PROC SURVEYMEANS in the preceding section. The STRWR design (stratified with replacement) was used to correspond with the SAS assumptions. In SUDAAN the name of the procedure is DESCRIPT. You must specify the file type and the design. The NEST statement is similar to the STRATA statement in SAS. The SUBGROUP statement corresponds to the DOMAIN statement in SAS; however, you must also include a LEVELS statement indicating the number of levels of the variable.
Design effects can be requested in SUDAAN; this is not true for PROC SURVEYMEANS. Output in Exhibit 1B.

The SETENV statement sets the output environment parameters, similar to the OPTIONS statement in SAS. The PRINT statement is where you must list each statistic you want in the output, along with a label for it. The STYLE option selects a particular output layout; the NCHS style prints according to the standards of the National Center for Health Statistics. Finally, you must give a format for each statistic. If the format you have given is not large enough, asterisks (**) will appear in the output, and you must go back and change the format for that statistic. The RTITLE statement is equivalent to the TITLE statement in SAS. Unlike SAS, you have to execute PROC DESCRIPT twice to get both the overall mean of T4B and the separate means of T4B by gender (T40).

SUDAAN handles missing values very much like SAS. Observations that have missing values for weights or for required sample design variables are excluded from the analysis. With the new CLASS statement you will have the option of including missing values in your analysis.

In both programs the point estimates and the standard errors are the same (within reasonable rounding error). SUDAAN does not provide an option to obtain standard deviations; it calculates only standard errors. SAS provides the flexibility of obtaining standard deviations.

PROC SURVEYFREQ IN SAS

    PROC SURVEYFREQ;
      STRATA EMERG DISTSIZE SCHL_LVL;
      TABLES T40*T6 / CHISQ WCHISQ ROW COL CHISQ1;
      WEIGHT WGTD;
      TITLE 'CROSSTAB OF T40 BY T6 IN SAS';
    RUN;

This syntax is very similar to PROC FREQ in SAS. It is a crosstabulation of gender (T40) and T6 (Did respondent leave a teacher preparation program for employment?). The addition is a STRATA statement indicating the stratification variables. When you request a chi-square analysis with this procedure (CHISQ option), you get a Rao-Scott chi-square test, which applies a design effect correction to the Pearson chi-square and computes that correction from the estimated proportions rather than from the null hypothesis proportions. The CHISQ1 option gives a modified Rao-Scott chi-square test, which bases the design effect correction on the null hypothesis proportions. The WCHISQ option on the TABLES statement gives a Wald chi-square.

By default the procedure prints frequencies, weighted frequencies, the standard errors of the weighted frequencies, percentages, and the standard errors of the percentages. You must specifically request row and column percentages and their standard errors; they are not given by default. Theoretically, the point estimates will not differ significantly from the SUDAAN output. Any differences can usually be accounted for by rounding. Output in Exhibit 2A.

PROC SURVEYFREQ excludes an observation from a crosstabulation table if that observation has a missing value for any of the table, weight, or required sample design variables, unless you specify the MISSING option.
When the procedure excludes observations with missing values from a table, it displays the total frequency of missing observations below that table. With the MISSING option, the procedure treats missing values as a valid category and includes them in the calculation of percentages and other statistics. Unlike PROC FREQ, PROC SURVEYFREQ has no MISSPRINT option, which would display the number of missing values in each cell while still excluding them from the calculation of percentages and other statistics.

PROC CROSSTAB IN SUDAAN

    PROC CROSSTAB DATA=ONE FILETYPE=SAS DESIGN=STRWR;
      NEST EMERG DISTSIZE SCHL_LVL;
      WEIGHT WGTD;
      SUBGROUP T40 T6;
      LEVELS 2 2;
      TABLES T40*T6;
      SETENV COLWIDTH=9 DECWIDTH=2 COLSPCE=2;
      PRINT NSUM WSUM COLPER ROWPER TOTPER /
            WSUMFMT=F9.0 NSUMFMT=F9.0 CMHTEST=ALL TESTS=ALL
            CMHFMT=F8.2 CMHDFFMT=F8.0 CMHPVALFMT=F8.4 CHISQFMT=F11.2;
      RTITLE "CROSSTAB OF T40 BY T6 IN SUDAAN";
    RUN;

PROC CROSSTAB in SUDAAN follows the logic of the syntax presented for PROC DESCRIPT. You must supply a DESIGN option and a NEST statement. Besides specifying the crosstabulation in the TABLES statement, you must include a SUBGROUP statement and a corresponding LEVELS statement. SUDAAN will produce several types of chi-square tests, including the Cochran-Mantel-Haenszel and the Pearson chi-square. Output in Exhibit 2B.

The crosstabulation output prints the totals on the left, the reverse of the traditional SAS output. One disadvantage of SUDAAN output is that it produces a separate page for every table and every test statistic you request. It is not environmentally friendly.
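
To make the idea of a design-corrected chi-square more concrete, the sketch below shows the general shape of a first-order Rao-Scott adjustment: a Pearson chi-square computed from the weighted cell proportions and scaled to the sample size, then divided by a design correction. This is a simplified illustration rather than the exact computation in either package. In particular, the design correction is taken here as an input (in practice it is derived from design effects of the estimated proportions), and the function names and the example table are hypothetical.

```python
def pearson_chisq_weighted(weighted_counts, n):
    """Pearson chi-square from weighted cell proportions, scaled by sample size n."""
    total = sum(sum(row) for row in weighted_counts)
    p = [[cell / total for cell in row] for row in weighted_counts]
    row_p = [sum(row) for row in p]
    col_p = [sum(col) for col in zip(*p)]
    q = 0.0
    for r, row in enumerate(p):
        for c, p_rc in enumerate(row):
            expected = row_p[r] * col_p[c]  # proportion expected under independence
            q += (p_rc - expected) ** 2 / expected
    return n * q


def rao_scott_chisq(weighted_counts, n, design_correction):
    """First-order Rao-Scott statistic: Pearson chi-square / design correction."""
    if design_correction <= 0:
        raise ValueError("design correction must be positive")
    return pearson_chisq_weighted(weighted_counts, n) / design_correction


# Hypothetical 2x2 table of weighted frequencies (gender by T6) from n = 100 respondents.
table = [[10.0, 20.0], [30.0, 40.0]]
print("Pearson:", pearson_chisq_weighted(table, 100))
print("Rao-Scott with an assumed correction of 1.5:", rao_scott_chisq(table, 100, 1.5))
```

A design correction greater than 1 shrinks the test statistic, which is why an uncorrected Pearson chi-square tends to overstate significance for complex sample designs.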

PROC SURVEYREG IN SAS

    PROC SURVEYREG;
      STRATA EMERG DISTSIZE SCHL_LVL / LIST;
      CLASS T40;
      MODEL T36=T40 T41 / ANOVA DEFF ADJRSQ SOLUTION;
      WEIGHT WGTD;
      TITLE 'SURVEYREG OF T36(# YRS TEACHING)=T40 (GENDER)';
      TITLE 'REGRESSION T36 (YRS TEACHING)=T41 (AGE)+ T40(GENDER) W/INTERCEPT IN SAS';
    RUN;

This procedure performs linear regression, taking into account the survey design variables. The dependent variable must be continuous (or assumed to be), and the independent variables can be either continuous or categorical. Any categorical variable on the MODEL statement must appear on the CLASS statement, and the CLASS statement must precede the MODEL statement. PROC SURVEYREG forms dummy indicator variables (coded 1 or 0) for categorical independent variables, with the highest coded value of the variable defined as the reference group. Specifying ANOVA gives a traditional ANOVA table. You can specify DEFF to get the design effects, which is important for understanding how stratification or clustering in the sample design affected your data. SAS produces an estimated regression coefficient table by default if there is no CLASS statement. If you have a CLASS statement in your code, you must specify the SOLUTION option on the MODEL statement to produce this table. To match the parameters of the SUDAAN procedure, I ran the analysis with the intercept included. Output in Exhibit 3A.

If an observation has a missing value or a non-positive value for the WEIGHT variable, PROC SURVEYREG excludes that observation from the analysis. An observation is also excluded if it has a missing value for any STRATA variable, CLUSTER variable, dependent variable, or any variable used in the independent effects. The analysis includes all observations in the data set that have non-missing values for all of these design and analysis variables.
Unlike PROC SURVEYMEANS, PROC SURVEYFREQ, and PROC SURVEYLOGISTIC, PROC SURVEYREG has no MISSING option.

Regression and logistic regression procedures are exercises in developing a model that best explains or predicts your dependent variable. Modeling is an intricate and iterative process that can be very time-consuming, and it requires in-depth knowledge of the subject matter. The choice of syntax and options will be determined by the research questions and by substantive knowledge. The examples used in this paper do not reflect the complexity of this type of analysis.

PROC REGRESS IN SUDAAN

    PROC REGRESS DATA=ONEA FILETYPE=SAS DESIGN=STRWR;
      NEST EMERG DISTSIZE SCHL_LVL;
      WEIGHT WGTD;
      SUBGROUP T40;
      LEVELS 2;
      TEST SATADJCHI ADJWALDF;
      MODEL T36=T40 T41;
      SETENV COLWIDTH=8 DECWIDTH=3 LABWIDTH=44;
      PRINT BETA="BETA" SEBETA="STD ERR" DEFT="DEFF" T_BETA="T:BETA=0" P_BETA="P-VALUE"
            DF="DF" SATADJDF="ADJ DF" SATADCHI="CHI-SQ(SAT)" ADJWALDF="F-TEST(WALD)"
            SATADCHP="P-VALUE(SAT)" ADJWALDP="P-VALUE(WALD-F)" /
            RISK=ALL DFFMT=F6.2 SATADJDFFMT=F6.3 SATADCHIFMT=F7.2 ADJWALDFFMT=F7.2
            BETAFMT=F8.4 SEBETAFMT=F8.4 P_BETAFMT=F7.4 DEFTFMT=F6.2
            SATADCHPFMT=F7.4 ADJWALDPFMT=F7.4;
      RTITLE "SUDAAN REGRESSION PROCEDURES T36=T40 T41";
    RUN;

PROC REGRESS in SUDAAN follows the logic of the syntax presented for PROC DESCRIPT. You must supply a DESIGN option and a NEST statement. Instead of a CLASS statement, you must (at this time) use a SUBGROUP statement and then indicate the number of levels of the categorical variable. As with all the other procedures in SUDAAN, you must list every statistic you want, and its format, in the PRINT statement. You cannot get the traditional ANOVA table you are accustomed to in SAS output. Output in Exhibit 3B.

Comparable pieces of information have been highlighted in Exhibit 3A and Exhibit 3B. The R-square in the SAS PROC SURVEYREG output (Exhibit 3A) is an adjusted multiple R-square; in the SUDAAN output (Exhibit 3B) it is labeled Multiple R-Square for the dependent variable. The information in the Tests of Model Effects table in the SAS PROC SURVEYREG output can be found in the contrast table in the SUDAAN output. The beta coefficients and their standard errors are found in the Estimated Regression Coefficients table in the SAS PROC SURVEYREG output and in the independent variables and effects table in the SUDAAN output.

I have chosen to run PROC SURVEYREG and PROC SURVEYLOGISTIC with an intercept. The NOINT (no intercept) option in both of these procedures uses the uncorrected sum of squares, whereas a model with an intercept uses the corrected sum of squares.

Currently, the SUDAAN modeling procedures exclude from the analysis any record with a missing value for any of the model variables. With the new CLASS statement, records with missing values can be included in the analysis, provided the variable names are listed in the CLASS statement and INCLUDE=MISSING is used. The default is NOMISSING.
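
The reference-group coding that the paper describes for the PROC SURVEYREG CLASS statement (indicator variables coded 1 or 0, with the highest coded value as the reference group) can be illustrated with a short Python sketch. This is an illustration of the coding scheme only, not of how either package builds its design matrix internally, and the function name is hypothetical.

```python
def reference_cell_columns(values):
    """Build 0/1 indicator columns for every level except the highest, which is the reference."""
    levels = sorted(set(values))
    reference = levels[-1]
    columns = {
        level: [1 if v == level else 0 for v in values]
        for level in levels[:-1]
    }
    return reference, columns


# Hypothetical gender codes for four teachers (1 = female, 2 = male).
t40 = [1, 2, 2, 1]
reference, columns = reference_cell_columns(t40)
print("reference level:", reference)
print("indicator columns:", columns)
```

With gender coded 1 and 2, the single indicator column marks the females and the coefficient on it is read as the difference from the male reference group. Keeping an indicator for every level alongside an intercept would make the columns linearly dependent, which is consistent with the singular-matrix note that appears in the SAS output in Exhibit 3A.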
PROC SURVEYLOGISTIC IN SAS

    PROC SURVEYLOGISTIC;
      STRATUM EMERG DISTSIZE SCHL_LVL;
      MODEL T42AB(EVENT='1')=T41 T40 / STB RSQ;
      WEIGHT WGTD;
      TITLE 'SURVEYLOGISTIC OF T42AB T41 T40 W/INTERCEPT';
    RUN;

This procedure performs logistic regression, taking into account the survey design variables. Logistic regression analysis is often used to investigate the relationship between a discrete response and a set of explanatory variables. The dependent variable can be binary (0,1) or ordinal (small, medium, large), and the independent variables can be either continuous or categorical. The vast majority of variables in survey research are limited to binary or ordinal responses. When you have a binary dependent variable, you can choose which category is the event category on the MODEL statement. The RSQ option on the MODEL statement gives a generalized R-square for the fitted model. Output in Exhibit 4A.

In PROC SURVEYLOGISTIC, any observation with missing values for the response, offset, or explanatory variables, or for any required sample design variable, is excluded from the analysis. The estimated linear predictor, its standard error estimate, the fitted probabilities, and their confidence limits are not computed for any observation with missing offset or explanatory variable values. The MISSING option can be used in the same manner as with PROC SURVEYFREQ and PROC SURVEYMEANS.

PROC RLOGIST IN SUDAAN

    PROC RLOGIST DATA=ONEA FILETYPE=SAS DESIGN=STRWR;
      NEST EMERG DISTSIZE SCHL_LVL;
      WEIGHT WGTD;
      SUBGROUP T40;
      LEVELS 2;
      MODEL T42AB=T41 T40;
      TEST SATADJCHI WALDCHI;
      SETENV COLWIDTH=8 DECWIDTH=3 LABWIDTH=44;
      OUTPUT EXPECTED="EXPECTED" RESIDUAL="RESIDUAL" OBSERVED="OBSERVED" WEIGHT="WEIGHT" /
             FILENAME=FILETEST EXPECTEDFMT=F8.4 RESIDUALFMT=F8.4 OBSERVEDFMT=F8.4 WEIGHTFMT=F8.4;
      PRINT BETA="BETA" SEBETA="S.E." DEFT="DESIGN EFFECT" T_BETA="T:BETA=0" P_BETA="P-VALUE"
            OR LOWOR UPOR DF="DF" SATADJDF="ADJ DF"
            WALDCHI=" CHI-SQ (WALD)" SATADCHI=" CHI-SQ (SAT.)"
            WALDCHP=" P-VALUE (WALD)" SATADCHP=" P-VALUE (SAT.)" /
            T_BETAFMT=F8.2 DEFTFMT=F6.2 SEBETAFMT=F8.6 ORFMT=F5.2 LOWORFMT=F6.2 UPORFMT=F6.2
            DFFMT=F7.0 SATADJDFFMT=F8.2 WALDCHIFMT=F8.2 SATADCHIFMT=F8.2 STYLE=NCHS;
      RTITLE "MODEL T42AB(MA Y/N)=T41(AGE) T40(GENDER) IN SUDAAN";
    RUN;

Several procedure names in SAS-callable SUDAAN would be very similar to SAS procedure names. To avoid confusing SAS, SUDAAN uses a naming convention in which such procedures start with the letter R. The syntax of this procedure is comparable to that of PROC REGRESS. In all other SUDAAN procedures the binary coding of 0 and 1 is not accepted; in PROC RLOGIST, however, the dependent variable can be coded as 0 or 1. Output in Exhibit 4B.

Information about the response (dependent) variable is found in the response profile in the SAS PROC SURVEYLOGISTIC output (Exhibit 4A); the same information is found in the Sample and Population Counts for Response Variable table in the SUDAAN PROC RLOGIST output (Exhibit 4B). Beta coefficients and standard errors are found in the Analysis of Maximum Likelihood Estimates table in Exhibit 4A and in the Independent Variables and Effects table in Exhibit 4B. Each package produces odds ratios.

LIMITATIONS OF EACH PACKAGE

One of the major limitations of SAS at this time is that it does not offer the option of using balanced repeated replication (BRR) or jackknife weights. Why is this so important? As a programmer/analyst, it is very common to inherit data sets or to perform secondary analysis of existing data sets.
In many cases we do not have access to the details of how the sampling design was formed. Yet specifying the sample design from such information is essential, especially in SUDAAN. With balanced repeated replicate or jackknife weights, the syntax requires no information beyond the supplied weights. This makes the analysis much more manageable for the analyst.

Although SUDAAN offers more options in terms of survey sampling designs and procedures, it is a cumbersome program to code. SUDAAN documentation is not the easiest to comprehend, especially for a novice. The upcoming release of SUDAAN version 9, with its CLASS statement, will resolve one of the major difficulties of working with SUDAAN and will add the ability to include missing values where appropriate. From an economic viewpoint, using SUDAAN is an additional expense in terms of licensing and training.

SAS gives you ease of coding and more control over printed output; however, at this time it is very limited in what it offers an analyst in terms of designs and procedures.
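
To illustrate why replicate weights let an analyst proceed without knowing the strata or clusters, the sketch below estimates the variance of a weighted mean from nothing but a full-sample weight and a set of replicate weights. It is a minimal illustration and not SUDAAN's implementation: the statistic is re-estimated under each replicate weight, and the squared deviations from the full-sample estimate are summed and scaled by a factor that depends on the replication method. The factors mentioned in the comments (1/R for BRR and (R - 1)/R for a delete-one jackknife) are the commonly used textbook choices and should be checked against the documentation of the data set being analyzed. All names and data are hypothetical.

```python
def weighted_mean(y, w):
    """Weighted point estimate: sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def replicate_variance(y, full_weights, replicate_weights, factor):
    """Replication variance: factor * sum over replicates of (theta_r - theta)^2.

    factor is assumed to be supplied by the data provider, for example
    1/R for BRR or (R - 1)/R for a delete-one jackknife with R replicates.
    """
    theta = weighted_mean(y, full_weights)
    return factor * sum(
        (weighted_mean(y, rw) - theta) ** 2 for rw in replicate_weights
    )


# Hypothetical data: three respondents with delete-one jackknife replicate weights.
y = [1.0, 2.0, 3.0]
full = [1.0, 1.0, 1.0]
replicates = [
    [0.0, 1.5, 1.5],
    [1.5, 0.0, 1.5],
    [1.5, 1.5, 0.0],
]
r = len(replicates)
print("jackknife variance of the mean:", replicate_variance(y, full, replicates, (r - 1) / r))
```

No stratum or cluster identifiers appear anywhere in the calculation; the design information is carried entirely by the replicate weights.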

CONCLUSION

Simple random sampling is a rare gem in social science research today. We are dealing with increasingly complex sample designs, and these designs require the sophistication of the SAS survey procedures or SUDAAN procedures. One must balance variety of choice against ease of coding. At this time SUDAAN is the more desirable package because of the variety of sample designs it supports and the number of procedures it provides for analyzing the data. However, it is cumbersome to program, making the work more labor-intensive than with its counterpart in SAS. The inclusion of the CLASS statement in version 9 of SUDAAN will resolve some of these issues, but you will still have to deal with the specification of the print options in SUDAAN.

My conclusion is that SAS is moving in the right direction, and I hope to see it match the power of SUDAAN in terms of choice and number of procedures in the future. The single most important survey procedure included in SAS version 9.1 is PROC SURVEYFREQ. Contingency analysis is a mainstay of any data analysis. With the incorporation of these survey-based procedures in SAS, we look forward to greater ease in coding when dealing with complex sample designs.

In this electronic age, we are faced with ever-growing mountains of data, and no single software package can meet all our needs to manage and analyze them. It is very common to switch back and forth between EXCEL or ACCESS and SAS. On many occasions we are given data in a spreadsheet format, asked to analyze the data in SAS, and then asked to return the results in a spreadsheet format. As programmers and analysts we see ancillary programs like EXCEL as part of our tool bag. We should think of using SAS v9.1 and SUDAAN in the same way. In the interim, if you have a design that fits the designs and statistical options SAS offers now, welcome to automatic transmission.
If your sample does not fit what SAS offers now, then you must contend with the manual transmission mode of SUDAAN. In the interim, we will have to switch back and forth between the two packages depending on our individual needs.

ACKNOWLEDGEMENTS

I wish to thank the Center for Education Policy at SRI International for their support and the opportunity to learn and expand into complex sample design programming. Special thanks go to Andrea Lash for her mentoring and support. I also want to thank Hal Javitz for his technical assistance. Thanks to my fellow programmers, Peter Godard and Kathryn Valdes, for their comments. I owe a debt of gratitude to Betsy Davies-Mercier for editorial assistance. Overdue thanks to my husband, Rob Robbins, for enduring late nights and lonely meals.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

REFERENCES

An, Anthony and Donna Watts (1998). "New SAS Procedures for Analysis of Sample Survey Data." Proceedings of the Twenty-Third Annual SAS Users Group International Conference. SAS Institute.

Cassell, David L. and AnnMaria Rousey. "Complex Sampling Designs Meet the Flaming Turkey of Glory." Proceedings of the Twenty-Eighth Annual SAS Users Group International Conference. March. Design Pathways and Spirit Lake Consulting, Seattle, WA.

Research Triangle Institute (2001). SUDAAN User's Manual, Release 8.0. Research Triangle Park, NC: Research Triangle Institute.

SAS Institute Inc. SAS/STAT User's Guide, Version 8, Volumes 1, 2, 3. Cary, NC: SAS Institute Inc.

SAS online help that comes with version 9.1.

CONTACT INFORMATION

Katherine Baisden
SRI International
333 Ravenswood Ave, BS381
Menlo Park, CA
Phone: (650)
Fax: (650)
katherine.baisden@sri.com

EXHIBIT 1A
SAS SURVEYMEANS Procedure

    Number of Strata                          27
    Number of Observations                    530
    Number of Observations Used               510
    Number of Obs with Nonpositive Weights    20
    Sum of Weights

Statistics

    Variable    N    Mean    Std Error of Mean    Lower 95% CL for Mean    Upper 95% CL for Mean
    T4b

Domain Analysis: T40

    T40           Variable    N    Mean    Std Error of Mean    Lower 95% CL for Mean    Upper 95% CL for Mean
    (1) FEMALE    T4b
    (2) MALE      T4b

EXHIBIT 1B
SUDAAN PROC DESCRIPT

(OVERALL MEAN)

    Number of observations read : 510    Weighted count :
    Denominator degrees of freedom : 483

    Variance Estimation Method: Taylor Series (STRWR)
    Mean of T4b (# Classes Taught)
    by: Variable, One

    Variable    Sample Size    Population size    Mean    S.E.    Design effect
    T4b

(MEAN BY GENDER)

    Number of observations read : 510    Weighted count :
    Number of observations skipped: 20 (WEIGHT variable nonpositive)
    Denominator degrees of freedom :

    Variance Estimation Method: Taylor Series (STRWR)
    Mean of T4b by T40
    by: Variable, T40:GENDER

    Variable / T40:GENDER                    Sample Size    Population size    Mean    S.E.    Design effect
    T4b:TOTAL NUMBER OF CLASSES TAUGHT
      Total
      (1) FEMALE
      (2) MALE

EXHIBIT 2A
SAS SURVEYFREQ Procedure

Data Summary

    Number of Strata                          27
    Number of Observations                    530
    Number of Observations Used               510
    Number of Obs with Nonpositive Weights    20
    Sum of Weights

    T40            T6         Freq    Wted Freq    Std Wted    %    SE %    Row %    SE Row    Col %    SE Col
    1 (Females)    1 (Yes)
                   (No)
                   Total
    (Males)        1 (Yes)
                   (No)
                   Total
    Total          1 (Yes)
                   (No)
                   Total

    Frequency Missing = 39

Rao-Scott Chi-Square Test

    Pearson Chi-Square
    Design Correction
    Rao-Scott Chi-Square
    DF            1
    Pr > ChiSq    <.0001
    F Value
    Num DF        1
    Den DF        444
    Pr > F        <.0001
    Sample Size = 471

Wald Chi-Square Test

    Chi-Square
    F Value
    Num DF        1
    Den DF        444
    Pr > F
    Sample Size = 471

Rao-Scott Modified Chi-Square Test

    Pearson Chi-Square
    Design Correction
    Rao-Scott Chi-Square
    DF            1
    Pr > ChiSq    <.0001
    F Value
    Num DF        1
    Den DF        444
    Pr > F        <.0001
    Sample Size =

EXHIBIT 2B
SUDAAN PROC CROSSTAB

Number of observations read : 510   Weighted count :
Number of observations skipped : 20 (WEIGHT variable nonpositive)
Denominator degrees of freedom : 483
Variance Estimation Method: Taylor Series (STRWR)
Crosstab of T40 (GENDER) by T6 (PREP PGM)
by: T40:GENDER, T6:LEAVE MA OR PREP PGM FOR FT PAID POSITION

T40:GENDER          T6:LEAVE MA OR PREP PGM FOR FT PAID POSITION
                    Total   1 (YES)   2 (NO)
Total
   Sample Size
   Weighted Size
   Col Percent
   Row Percent
   Tot Percent
   SE Row Percent
   SE Col Percent
   SE Tot Percent
(1) FEMALE
   Sample Size
   Weighted Size
   Col Percent
   Row Percent
   Tot Percent
   SE Row Percent
   SE Col Percent
   SE Tot Percent
(2) MALE
   Sample Size
   Weighted Size
   Col Percent
   Row Percent
   Tot Percent
   SE Row Percent
   SE Col Percent
   SE Tot Percent

Variance Estimation Method: Taylor Series (STRWR)
Chi Square Test of Independence for T40:GENDER and T6:LEAVE MA OR PREP PGM FOR FT PAID POSITION
Crosstab of T40 (GENDER) by T6 (PREP PGM)
ChiSq                          0.41
P-value ChiSq                  0.52
Degrees of Freedom ChiSq       1.00
LLChiSq                        0.41
P-value LLChiSq                0.52
Degrees of Freedom LLChiSq

Variance Estimation Method: Taylor Series (STRWR)
Cochran-Mantel-Haenszel Test of Association for T40:GENDER and T6:LEAVE MA OR PREP PGM FOR FT PAID POSITION
Crosstab of T40 (GENDER) by T6 (PREP PGM)
Cochran-Mantel-Haenszel Chi-Square   0.41
Degrees of Freedom CMH               1
P-value CMH Test

EXHIBIT 3A
SAS SURVEYREG Procedure

Regression Analysis for Dependent Variable T36

Data Summary
Number of Observations   500
Sum of Weights
Weighted Mean of T
Weighted Sum of T

Fit Statistics
R-square
Adjusted R-square
Root MSE
Denominator DF           473

Design Summary
Number of Strata         27

Stratum Information
Stratum Index   EMERG: EMERGENCY STATUS   DISTSIZE: DISTRICT SIZE   SCHL_LVL: SCHOOL LEVEL   N Obs

Class Level Information
Class Variable   Label        Levels   Values
T40              T40:GENDER

ANOVA for Dependent Variable T36
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                  <.0001
Error
Corrected Total

Tests of Model Effects
Effect      Num DF   F Value   Pr > F
Model                          <.0001
Intercept                      <.0001
T
T                              <.0001
NOTE: The denominator degrees of freedom for the F tests is 473.

Estimated Regression Coefficients
Parameter   Estimate   Standard Error   t Value   Pr > t   Design Effect
Intercept                                         <
T
T
T                                                 <
NOTE: The denominator degrees of freedom for the t tests is 473.
Matrix X'WX is singular and a generalized inverse was used to solve the normal equations. Estimates are not unique.

EXHIBIT 3B
SUDAAN PROC REGRESS

Number of observations read : 510   Weighted count:
Number of observations skipped : 20 (WEIGHT variable nonpositive)
Observations used in the analysis : 500   Weighted count:
Denominator degrees of freedom : 483
Maximum number of estimable parameters for the model is 3
File ONEA contains 510 Clusters
500 clusters were used to fit the model
Maximum cluster size is 1 records
Minimum cluster size is 1 records

Weighted mean response is
Multiple R-Square for the dependent variable T36:

Variance Estimation Method: Taylor Series (STRWR)
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Identity
Response variable T36: T36:NUMBER OF YEARS FULLTIME TEACHER
Sudaan Regression Procedures T36=T40 T41 T

Independent Variables and Effects   Beta   STD Err   DEFF   T:Beta=0   P-value
Intercept
T40:GENDER
   (1) FEMALE
   (2) MALE
T41:YEAR OF BIRTH

Variance Estimation Method: Taylor Series (STRWR)
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Identity
Response variable T36: T36:NUMBER OF YEARS FULLTIME TEACHER
Sudaan Regression Procedures T36=T40 T41 T

Contrast   DF   Adj DF   Chi-sq (sat)   F-test (WALD)   P-Value (SAT)   P-Value (Wald-F)
OVERALL MODEL
MODEL MINUS INTERCEPT
INTERCEPT
T
T

EXHIBIT 4A
SAS SURVEYLOGISTIC Procedure

Model Information
Data Set                     WORK.ONE
Response Variable            T42Ab      T42Ab:MASTER DEGREE Y/N
Number of Response Levels    2
Stratum Variables            EMERG      EMERG: EMERGENCY STATUS
                             DISTSIZE   DISTSIZE: DISTRICT SIZE
                             SCHL_LVL   SCHL_LVL: SCHOOL LEVEL
Number of Strata             27
Weight Variable              WGTD       WGTD: WEIGHT FOR RANDOM/TARGET ALL TEACHERS
Model                        Binary Logit
Optimization Technique       Fisher's Scoring
Variance Adjustment          Degrees of Freedom (DF)

Number of Observations Read  530
Number of Observations Used  289
Sum of Weights Read
Sum of Weights Used

Response Profile
Ordered Value   T42Ab   Total Frequency   Total Weight

Probability modeled is T42Ab=1.

NOTE: 231 observations were deleted due to missing values for the response or explanatory variables.
NOTE: 10 observations having nonpositive frequencies or weights were excluded since they do not contribute to the analysis.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC
SC
Log L

R-Square
Max-rescaled R-Square

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio                     <.0001
Score                                <.0001
Wald

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Standardized Estimate
Intercept
T
T

Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
T
T

Association of Predicted Probabilities and Observed Responses
Percent Concordant   49.0   Somers' D
Percent Discordant   30.9   Gamma
Percent Tied         20.1   Tau-a
Pairs                       c

Number of zero responses : 128
Number of non-zero responses : 161

EXHIBIT 4B
SUDAAN PROC RLOGIST

Independence parameters have converged in 5 iterations
Number of observations read : 510   Weighted count:
Number of observations skipped : 20 (WEIGHT variable nonpositive)
Observations used in the analysis : 289   Weighted count:
Denominator degrees of freedom : 483
Maximum number of estimable parameters for the model is 3
File ONEA contains 510 Clusters
289 clusters were used to fit the model
Maximum cluster size is 1 records
Minimum cluster size is 1 records

Sample and Population Counts for Response Variable T42AB
0: Sample Count 128   Population Count
 : Sample Count 161   Population Count

R-Square for dependent variable T42AB (Cox & Snell, 1989):

-2 * Normalized Log-Likelihood with Intercepts Only :
* Normalized Log-Likelihood Full Model :
Approximate Chi-Square (-2 * Log-L Ratio) : 5.02
Degrees of Freedom : 2
Note: The approximate Chi-Square is not adjusted for clustering. Refer to hypothesis test table for adjusted test.

Variance Estimation Method: Taylor Series (STRWR)
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Logit
Response variable T42AB: T42Ab:MASTER DEGREE Y/N
MODEL T42Ab(MA Y/N)=T41(AGE) T40(GENDER)

Independent Variables and Effects   BETA   S.E.   DESIGN EFFECT   T:BETA=0   P-VALUE
Intercept
T41:YEAR OF BIRTH
T40:GENDER
   (1) FEMALE
   (2) MALE

Variance Estimation Method: Taylor Series (STRWR)
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Logit
Response variable T42AB: T42Ab:MASTER DEGREE Y/N
MODEL T42Ab(MA Y/N)=T41(AGE) T40(GENDER)

Contrast   DF   ADJ DF   CHI-SQ (WALD)   CHI-SQ (SAT.)   P-VALUE (WALD)   P-VALUE (SAT.)
OVERALL MODEL
MODEL MINUS INTERCEPT
INTERCEPT
T
T

Variance Estimation Method: Taylor Series (STRWR)
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Logit
Response variable T42AB: T42Ab:MASTER DEGREE Y/N
MODEL T42Ab(MA Y/N)=T41(AGE) T40(GENDER)

Independent Variables and Effects   Odds Ratio   Lower 95% Limit OR   Upper 95% Limit OR
Intercept
T41:YEAR OF BIRTH
T40:GENDER
   (1) FEMALE
   (2) MALE


Using Excel s Analysis ToolPak Add-In Using Excel s Analysis ToolPak Add-In Bijay Lal Pradhan, PhD Introduction I have a strong opinions that we can perform different quantitative analysis, including statistical analysis, in Excel. It is powerful,

More information

THE GUIDE TO SPSS. David Le

THE GUIDE TO SPSS. David Le THE GUIDE TO SPSS David Le June 2013 1 Table of Contents Introduction... 3 How to Use this Guide... 3 Frequency Reports... 4 Key Definitions... 4 Example 1: Frequency report using a categorical variable

More information

All analysis examples presented can be done in Stata 10.1 and are included in this chapter s output.

All analysis examples presented can be done in Stata 10.1 and are included in this chapter s output. Chapter 9 Stata v10.1 Analysis Examples Syntax and Output General Notes on Stata 10.1 Given that this tool is used throughout the ASDA textbook this chapter includes only the syntax and output for the

More information

Spreadsheets in Education (ejsie)

Spreadsheets in Education (ejsie) Spreadsheets in Education (ejsie) Volume 2, Issue 2 2005 Article 5 Forecasting with Excel: Suggestions for Managers Scott Nadler John F. Kros East Carolina University, nadlers@mail.ecu.edu East Carolina

More information

Harbingers of Failure: Online Appendix

Harbingers of Failure: Online Appendix Harbingers of Failure: Online Appendix Eric Anderson Northwestern University Kellogg School of Management Song Lin MIT Sloan School of Management Duncan Simester MIT Sloan School of Management Catherine

More information

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference Unit 6: Simple Linear Regression Lecture 2: Outliers and inference Statistics 101 Thomas Leininger June 18, 2013 Types of outliers in linear regression Types of outliers How do(es) the outlier(s) influence

More information

The SAS System 1. RM-ANOVA analysis of sheep data assuming circularity 2

The SAS System 1. RM-ANOVA analysis of sheep data assuming circularity 2 The SAS System 1 Obs no2 sheep time y 1 1 1 time1 2.197 2 1 1 time2 2.442 3 1 1 time3 2.542 4 1 1 time4 2.241 5 1 1 time5 1.960 6 1 1 time6 1.988 7 1 2 time1 1.932 8 1 2 time2 2.526 9 1 2 time3 2.526 10

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................3

More information

Improving long run model performance using Deviance statistics. Matt Goward August 2011

Improving long run model performance using Deviance statistics. Matt Goward August 2011 Improving long run model performance using Deviance statistics Matt Goward August 011 Objective of Presentation Why model stability is important Financial institutions are interested in long run model

More information

F u = t n+1, t f = 1994, 2005

F u = t n+1, t f = 1994, 2005 Forecasting an Electric Utility's Emissions Using SAS/AF and SAS/STAT Software: A Linear Analysis Md. Azharul Islam, The Ohio State University, Columbus, Ohio. David Wang, The Public Utilities Commission

More information

Defining models using equations...

Defining models using equations... A Course in Statistical Modelling Methods@Manchester August 27, 28 and 29, 2014 session 03: Defining models and test selection Graeme Hutcheson Manchester Institute of Education University of Manchester

More information

CHAPTER 5 RESULTS AND ANALYSIS

CHAPTER 5 RESULTS AND ANALYSIS CHAPTER 5 RESULTS AND ANALYSIS This chapter exhibits an extensive data analysis and the results of the statistical testing. Data analysis is done using factor analysis, regression analysis, reliability

More information

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models. Powerful machine learning software for developing predictive, descriptive, and analytical models. The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

More information

energy usage summary (both house designs) Friday, June 15, :51:26 PM 1

energy usage summary (both house designs) Friday, June 15, :51:26 PM 1 energy usage summary (both house designs) Friday, June 15, 18 02:51:26 PM 1 The UNIVARIATE Procedure type = Basic Statistical Measures Location Variability Mean 13.87143 Std Deviation 2.36364 Median 13.70000

More information

Unit 5 Logistic Regression Homework #7 Practice Problems. SOLUTIONS Stata version

Unit 5 Logistic Regression Homework #7 Practice Problems. SOLUTIONS Stata version Unit 5 Logistic Regression Homework #7 Practice Problems SOLUTIONS Stata version Before You Begin Download STATA data set illeetvilaine.dta from the course website page, ASSIGNMENTS (Homeworks and Exams)

More information

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Data Set for this Assignment: Download from the course website: Stata Users: framingham_1000.dta Source: Levy (1999) National

More information

Using Weights in the Analysis of Survey Data

Using Weights in the Analysis of Survey Data Using Weights in the Analysis of Survey Data David R. Johnson Department of Sociology Population Research Institute The Pennsylvania State University November 2008 What is a Survey Weight? A value assigned

More information

Multilevel/ Mixed Effects Models: A Brief Overview

Multilevel/ Mixed Effects Models: A Brief Overview Multilevel/ Mixed Effects Models: A Brief Overview Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised March 27, 2018 These notes borrow very heavily, often/usually

More information

FOLLOW-UP NOTE ON MARKET STATE MODELS

FOLLOW-UP NOTE ON MARKET STATE MODELS FOLLOW-UP NOTE ON MARKET STATE MODELS In an earlier note I outlined some of the available techniques used for modeling market states. The following is an illustration of how these techniques can be applied

More information

SUGI 29 Statistics and Data Analysis. Paper

SUGI 29 Statistics and Data Analysis. Paper Paper 206-29 Using SAS Procedures to Make Sense of a Complex Food Store Survey Jeff Gossett, University of Arkansas for Medical Sciences, Little Rock, AR Pippa Simpson, University of Arkansas for Medical

More information

Introduction of STATA

Introduction of STATA Introduction of STATA News: There is an introductory course on STATA offered by CIS Description: Intro to STATA On Tue, Feb 13th from 4:00pm to 5:30pm in CIT 269 Seats left: 4 Windows, 7 Macintosh For

More information

Computer Handout Two

Computer Handout Two Computer Handout Two /******* senic2.sas ***********/ %include 'senicdef.sas'; /* Effectively, Copy the file senicdef.sas to here */ title2 'Elementary statistical tests'; proc freq; title3 'Use proc freq

More information

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner SAS Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Melodie Rush Principal

More information

Session 7. Introduction to important statistical techniques for competitiveness analysis example and interpretations

Session 7. Introduction to important statistical techniques for competitiveness analysis example and interpretations ARTNeT Greater Mekong Sub-region (GMS) initiative Session 7 Introduction to important statistical techniques for competitiveness analysis example and interpretations ARTNeT Consultant Witada Anukoonwattaka,

More information

Categorical Data Analysis

Categorical Data Analysis Categorical Data Analysis Hsueh-Sheng Wu Center for Family and Demographic Research October 4, 200 Outline What are categorical variables? When do we need categorical data analysis? Some methods for categorical

More information

SECTION 11 ACUTE TOXICITY DATA ANALYSIS

SECTION 11 ACUTE TOXICITY DATA ANALYSIS SECTION 11 ACUTE TOXICITY DATA ANALYSIS 11.1 INTRODUCTION 11.1.1 The objective of acute toxicity tests with effluents and receiving waters is to identify discharges of toxic effluents in acutely toxic

More information

Introduction to Survey Data Analysis. Linda K. Owens, PhD. Assistant Director for Sampling & Analysis

Introduction to Survey Data Analysis. Linda K. Owens, PhD. Assistant Director for Sampling & Analysis Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis General information Please hold questions until the end of the presentation Slides available at www.srl.uic.edu/seminars/fall15seminars.htm

More information

1. Understand & evaluate survey. What is survey data? When analyzing survey data... General information. Focus of the webinar

1. Understand & evaluate survey. What is survey data? When analyzing survey data... General information. Focus of the webinar What is survey data? Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Data gathered from a sample of individuals Sample is random (drawn using probabilistic

More information