### Missing Data

taught by Patricia Berglund

Aim of Course:

Data sets often have missing values. Missing data is a particular problem in multivariate modeling: if the analyst must discard an entire record because the value for one variable is missing, valuable information is lost. It is better to find a way to keep the record, adjust for the missing value(s), and let the analysis proceed.

This online course, "Missing Data," teaches the basics of handling missing data, including evaluation of types and patterns of missing data, strategies for analysis of data sets with item-missing data, and imputation of missing data with an emphasis on multiple imputation. Example applications are presented using simple random sample and complex sample design data sets in SAS and Stata, with additional examples in other software packages such as IVEware. Each class session includes homework exercises, and the course ends with a final project.

This course may be taken individually (one-off) or as part of a certificate program.

Course Program:

## WEEK 1: Overview of Missing Data

- Missing Data: An Overview
- Causes of Missing Data
- Types of Missing Data
- Missing Data Patterns
- Issues for Analysis
- Assumptions and Implications for Analysis and Imputation
- Analytic Approaches for Data Sets with Missing Data: Conventional and Novel
- Introduction to Multiple Imputation Process

- Software Options
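As a small illustration of the "Missing Data Patterns" topic, the sketch below tabulates how often each combination of observed and missing variables occurs. It is not taken from the course materials, which use SAS, Stata, and IVEware; Python is used here only for illustration, and the function name `missing_patterns`, the variable names, and the toy records are all hypothetical.

```python
from collections import Counter


def missing_patterns(records, variables):
    """Count how often each missingness pattern occurs.

    A pattern is a tuple of 0/1 flags, one per variable, where 1 means
    the value is missing (None) in that record.
    """
    return Counter(
        tuple(int(rec.get(var) is None) for var in variables)
        for rec in records
    )


# Hypothetical toy data; None marks an item-missing value.
records = [
    {"age": 34, "income": 52000, "educ": 16},
    {"age": 51, "income": None, "educ": 12},
    {"age": 29, "income": None, "educ": None},
    {"age": 42, "income": 61000, "educ": 14},
    {"age": 60, "income": None, "educ": 12},
]
variables = ["age", "income", "educ"]

patterns = missing_patterns(records, variables)
for pattern, count in patterns.most_common():
    # Each flag lines up with the corresponding entry in `variables`.
    print(dict(zip(variables, pattern)), count)
```

A table like this is usually the first thing to inspect before choosing an analysis or imputation strategy, because it shows which variables tend to be missing together.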

## WEEK 2: Overview of the Multiple Imputation Process

- Overview of Multiple Imputation and the Three Step Process of MI
- Details of Three Step Process
- Imputation/Variance Estimation for MI
- Analysis of Imputed Data Sets
- Combining Results from Imputation and Analysis of Imputed Data Sets

- Selection of An Imputation Method
- Overview of Combined Results
- Application: Multiple Imputation of Continuous and Categorical Variables
- What method to use for imputation?
- How to specify method in common software?
- Discussion of Output from the Multiple Imputation Process
- Key output from imputation, analysis of imputed data sets, and combined results

- How is the variability of the MI process incorporated into the output?
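The third step of the MI process, combining results, is commonly done with Rubin's combining rules, and those rules are also how the variability of the imputation process enters the output. The sketch below assumes that approach; the outline itself does not name the rules, and the course's SAS and Stata procedures report these quantities for you. The function name `pool_rubin` and the numbers are hypothetical. The pooled estimate is the mean of the per-data-set estimates, and the total variance is the within-imputation variance plus the between-imputation variance inflated by a factor of (1 + 1/M).

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M point estimates and their variances with Rubin's rules.

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each estimate
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return {
        "estimate": q_bar,
        "within": within,
        "between": between,
        "total": total,
        "std_error": sqrt(total),
    }


# Hypothetical results from M = 5 imputed data sets.
estimates = [2.0, 2.2, 1.8, 2.1, 1.9]
variances = [0.10, 0.12, 0.11, 0.09, 0.08]

pooled = pool_rubin(estimates, variances)
print(pooled)
```

The between-imputation term is what separates a multiple-imputation standard error from one computed on a single filled-in data set: when the imputed data sets disagree, the pooled standard error grows accordingly.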

## WEEK 3: Detailed Multiple Imputation Examples

- Imputation of Continuous Variables (each example demonstrates the full three-step MI process)
    - Consideration of imputation methods for continuous variables
    - Inclusion of categorical predictor variables
    - Application: continuous variable imputation with NCS-R data set
    - How to correctly analyze data set derived from complex sample design
    - Discussion of output from each of the three steps
    - Diagnostic tools

- Imputation of Categorical Variables (each example demonstrates the full three-step MI process)
    - Consideration of imputation methods for categorical variables
    - Inclusion of continuous and categorical predictor variables
    - Application: categorical variable imputation with NCS-R data set
    - How to correctly analyze data set derived from complex sample design
    - Discussion of output from each of the three steps
    - Diagnostic tools

## WEEK 4: Detailed Multiple Imputation Examples and Additional Topics in Missing Data

- Detailed MI Examples, continued
- Imputation of Categorical and Continuous Variables with non-monotone missing data pattern
- Application: Option 1: Use of sequential regression in Stata and SAS, comparison of results
- Application: Option 2: Create monotone missing data pattern with subsequent use of appropriate imputation method given monotone missing data, compare how this might change selection of imputation method

- Additional Topics in Missing Data
- Frequently Asked Questions about Missing Data and Imputation
- Handling missing data in longitudinal data sets
- Non-ignorable missing data
- What is non-ignorable missing data?
- How to handle missing data that is non-ignorable?

- Staying current in research and software developments in handling missing data

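The distinction between monotone and non-monotone missing data patterns in Week 4 can be made concrete with a small check. A pattern is monotone if the variables can be ordered so that, once a record is missing one variable, it is also missing every later variable in that order. The sketch below is an illustration only and is not drawn from the course materials; the function name `is_monotone` and the toy records are hypothetical.

```python
def is_monotone(records, variables):
    """Return (flag, ordering) for a monotone missing-data check.

    Variables are ordered from least to most missing. The pattern is
    monotone if, within every record, no observed value follows a
    missing value in that ordering.
    """
    n_missing = {
        var: sum(rec.get(var) is None for rec in records)
        for var in variables
    }
    ordered = sorted(variables, key=lambda var: n_missing[var])
    for rec in records:
        seen_missing = False
        for var in ordered:
            missing = rec.get(var) is None
            if seen_missing and not missing:
                return False, ordered
            seen_missing = seen_missing or missing
    return True, ordered


# Hypothetical toy data; None marks a missing value.
monotone_records = [
    {"age": 34, "educ": 16, "income": 52000},
    {"age": 51, "educ": 12, "income": None},
    {"age": 29, "educ": None, "income": None},
]
# One extra record with educ missing but income observed breaks the nesting.
non_monotone_records = monotone_records + [
    {"age": 45, "educ": None, "income": 48000},
]

variables = ["age", "educ", "income"]
monotone_flag, monotone_order = is_monotone(monotone_records, variables)
non_monotone_flag, _ = is_monotone(non_monotone_records, variables)
print(monotone_flag, monotone_order)
print(non_monotone_flag)
```

Sorting by missing count is sufficient here because a monotone pattern implies nested sets of incomplete records, so any valid ordering must run from the least to the most missing variable.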

Homework:

The homework in this course consists of short-answer questions to test concepts and guided data analysis problems using software.

This course also has supplemental readings available online.

Who Should Take This Course:

Any statistical analyst will benefit from this course.

Level:

intermediate

You should be familiar with introductory statistics. Try these self tests to check your knowledge.

You should also be confident with topics in regression, as covered in our course

Organization of the Course:

This course takes place online at the Institute for 4 weeks. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.

At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.

Time Requirement:

About 15 hours per week, at times of your choosing.

Credit:

Students come to the Institute for a variety of reasons. As you begin the course, you will be asked to specify your category:

- You may be interested only in learning the material presented, and not be concerned with grades or a record of completion.
- You may be enrolled in PASS (Programs in Analytics and Statistical Studies) that requires demonstration of proficiency in the subject, in which case your work will be assessed for a grade.
- You may require a "Record of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). For those successfully completing the course, CEUs and a record of course completion will be issued by The Institute upon request.

Course Text:

*Missing Data* by Paul D. Allison, University of Pennsylvania © 2001, 104 pages, SAGE Publications, Inc, Series: Quantitative Applications in the Social Sciences, Volume 136. This text may be ordered here.

Software:

Homework assignments will involve the use of standard statistical software. SAS, Stata, R and IVEware can be used for all the assignments, and other software packages such as Solas, Sudaan, and SPSS are suitable for most but not all assignments. The instructor is familiar with SAS, Stata and IVEware and can answer questions about those packages. The teaching assistant can provide support for R.

Course Dates:

- September 29, 2017 to October 27, 2017
- October 05, 2018 to November 02, 2018