### Survey - Complex

Analysis of Survey Data from Complex Sample Designs

taught by Patricia Berglund

Aim of Course:

To extract maximum information at minimum cost, sample designs are typically more complex than simple random samples. Cluster sampling and stratified designs are common. But how do you analyze the resulting data, and in particular, how do you determine margins of error? This online course, "Analysis of Survey Data from Complex Sample Designs," teaches you how to estimate variances when analyzing survey data from complex samples, and how to fit linear and logistic regression models to complex sample survey data.
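As a rough illustration of the margin-of-error question, the sketch below estimates a weighted mean and its standard error for a stratified, clustered sample using Taylor series linearization, treating primary sampling units (PSUs) as sampled with replacement within strata. This is not course material: the function name `weighted_mean_se`, the record layout, and the toy data are all hypothetical, and the software packages used in the course provide dedicated procedures for this calculation.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(records):
    """Weighted mean and linearized standard error.

    records: iterable of (stratum, psu, weight, y) tuples (a hypothetical layout).
    Assumes PSUs are sampled with replacement within strata and that every
    stratum contains at least two PSUs.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Sum the linearized values w * (y - mean) / total_w within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for stratum, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two PSUs")
        z_bar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())

    return mean, sqrt(variance)


# Toy data (made up for illustration): two strata, two PSUs each.
toy = [
    ("A", 1, 10.0, 3.0), ("A", 1, 10.0, 4.0),
    ("A", 2, 12.0, 6.0), ("A", 2, 12.0, 5.0),
    ("B", 1, 20.0, 8.0), ("B", 1, 20.0, 9.0),
    ("B", 2, 18.0, 7.0), ("B", 2, 18.0, 7.5),
]
mean, se = weighted_mean_se(toy)
print(f"weighted mean = {mean:.3f}, linearized SE = {se:.3f}")
```

The calculation uses only the stratum, PSU, and weight variables, which is why a data user needs to identify them before any analysis. Treating the same observations as a simple random sample would generally give a different standard error.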

This course may be taken individually (one-off) or as part of a certificate program.

Course Program:

## WEEK 1: Overview

- Applied Survey Data Analysis: An Overview
- Important terms, concepts, and notation
- Software Overview
- Getting to Know the Complex Sample Design
- Classification of Sample Designs
- Target Populations and Survey Populations
- Simple Random Sampling
- Complex Sample Design Effects
- Complex Samples: Clustering and Stratification
- Weighting in Analysis of Survey Data
- Multi-stage Area Probability Sample Designs

## WEEK 2: Overview continued

- Foundations and Techniques for Design-based Estimation and Inference
- Finite Populations and Superpopulation Models
- Confidence Intervals for Population Parameters
- Weighted Estimation of Population Parameters
- Probability Distributions and Design-based Inference
- Variance Estimation
- Hypothesis Testing in Survey Data Analysis
- Total Survey Error
- Preparation for Complex Sample Survey Data Analysis
- Analysis Weights: Review by the Data User
- Understanding and Checking the Sampling Error Calculation Model
- Addressing Item Missing Data in Analysis Variables
- Preparing to Analyze Data from Sample Subclasses
- A Final Checklist for Data Users

## WEEK 3: Descriptive Statistics

- Descriptive Analysis for Continuous Variables
- Special Considerations in Descriptive Analysis of Complex Sample Survey Data
- Simple Statistics for Univariate Continuous Distributions
- Bivariate Relationships between Two Continuous Variables
- Descriptive Statistics for Subpopulations
- Linear Functions of Descriptive Estimates and Differences of Means
- Categorical Data Analysis
- A Framework for Analysis of Categorical Survey Data
- Univariate Analysis of Categorical Data
- Bivariate Analysis of Categorical Data
- Analysis of Multivariate Categorical Data

## WEEK 4: Regression Models

- Linear Regression Models
- The Linear Regression Model
- Fitting linear regression models to survey data
- Four Steps in Linear Regression Analysis
- Some Practical Considerations and Tools
- Application: Modeling Diastolic Blood Pressure with the NHANES Data
- Logistic Regression and Generalized Linear Models for Binary Survey Variables
- Generalized Linear Models (GLMs) for Binary Survey Responses
- Building the Logistic Regression Model: Stage 1-Model Specification
- Building the Logistic Regression Model: Stage 2-Estimation of Model Parameters and Standard Errors
- Building the Logistic Regression Model: Stage 3-Evaluation of the Fitted Model
- Building the Logistic Regression Model: Stage 4-Interpretation and Inference
- Analysis Application
- Comparing the Logistic, Probit, and Complementary-Log-Log (C-L-L) GLMs for Binary Dependent Variables

Homework:

The homework in this course consists of short-answer questions to test concepts, guided exercises in writing code, and guided data analysis problems using software.

In addition to assigned readings, this course also has example software code, supplemental readings available online, and an end-of-course data modeling project.

Who Should Take This Course:

Anyone designing surveys or analyzing survey data.

Level:

Advanced/Intermediate

Students should also have some familiarity with generalized linear models (GLM), specifically logistic regression. These topics are covered as part of the Categorical Data Analysis course, and in greater depth in the GLM and Logistic Regression courses.

Organization of the Course:

This course takes place online at the Institute over 4 weeks. During each course week, you participate at times of your own choosing; there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.

At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.

Time Requirement:

About 15 hours per week, at times of your choosing.

Credit:

Students come to the Institute for a variety of reasons. As you begin the course, you will be asked to specify your category:

- You may be interested only in learning the material presented, and not be concerned with grades or a record of completion.
- You may be enrolled in PASS (Programs in Analytics and Statistical Studies), which requires demonstration of proficiency in the subject, in which case your work will be assessed for a grade.
- You may require a "Record of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). For those successfully completing the course, CEUs and a record of course completion will be issued by The Institute upon request.

Course Text:

The course text is *Applied Survey Data Analysis* by Steve Heeringa, Brady West, and Pat Berglund.

PLEASE ORDER YOUR COPY IN TIME FOR THE COURSE STARTING DATE.

Software:

The course centers on learning to use specialized software procedures for the analysis of complex sample survey data, working with real data sets; exercises are selected from the book chapters. Participants may use R, WesVar, or IVEware (free packages) or SAS, Stata, SUDAAN, or SPSS (commercial packages; SPSS users must purchase the Complex Samples module). If you plan to use other software, check that it can analyze data from complex survey designs (clustered, stratified, multistage, etc.).