# Categorical Data Analysis

Taught by Brian Marx

Aim of Course:

This online course, "Categorical Data Analysis," focuses on a logistic regression approach to the analysis of contingency table data, where the cell entries represent counts that are cross-tabulated using categorical variables. Tests for (conditional) independence are discussed in the context of odds ratios and relative risks, for both two-way and three-way tables. After the groundwork is laid for logistic regression models for binomial responses, more complex data structures are introduced, e.g., those having more categorical variables or even continuous covariates. Because a broad view is taken through the generalized linear model framework, a few model variations are also presented, such as probit regression for binomial responses and Poisson regression for count data. Model checking (residuals, goodness-of-fit), model inference (testing, confidence intervals), model interpretation (odds ratios, EL50s), and model selection (AIC, automatic procedures, testing reduced models) are all covered in detail. Throughout, the focus remains on logistic regression modeling and on the interpretation of these models, rather than on the theory behind them.

This course may be taken individually (one-off) or as part of a certificate program.

Course Program:

## WEEK 1: Categorical Responses and Contingency Tables

- Binomial and multinomial distributions
- Maximum Likelihood
- Test of proportions
- Joint, marginal and conditional probabilities
- Odds ratio and relative risk
- Test of independence
- Three-way tables
- Conditional independence and homogeneous association
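
As a small illustration of two of the Week 1 quantities, the odds ratio and the relative risk of a 2×2 table can be computed directly from the cell counts. The sketch below is not taken from the course materials: the function names and the counts are hypothetical, and it assumes rows are groups and columns are "success" and "failure".

```python
def relative_risk(table):
    """Ratio of the row-wise success proportions for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(table):
    """Cross-product ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts: rows = two groups, columns = (success, failure).
counts = [[20, 80], [10, 90]]

rr = relative_risk(counts)
or_ = odds_ratio(counts)
print(f"relative risk = {rr:.2f}, odds ratio = {or_:.2f}")
```

With these made-up counts the success proportions are 0.20 and 0.10, so the relative risk is 2, while the odds ratio is (20 × 90)/(80 × 10) = 2.25. The two measures are close only when both proportions are small.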

## WEEK 2: Generalized Linear Models

- Components of a generalized linear model
- Binary data: logistic and probit models
- Poisson regression for count data
- Model checking and residual analysis
- Inference about model parameters
- Goodness-of-fit and deviance

## WEEK 3: Applications and Interpretations for Logistic Regression

- Interpretation in logistic regression
- Odds-ratio, EL50, probability rate of change
- Inference and confidence intervals for logistic regression
- Grouped and ungrouped data
- Categorical predictors, indicator variables, and coding
- Multiple logistic regression
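
For orientation, the Week 3 interpretation quantities can be written down for a logistic regression with a single predictor. The notation here ($\alpha$, $\beta$, $\pi(x)$) is an assumption made for this sketch and may differ from the notation used in the course.

$$
\operatorname{logit}\,\pi(x) = \log\frac{\pi(x)}{1-\pi(x)} = \alpha + \beta x
$$

$$
\text{odds ratio per unit increase in } x = e^{\beta}, \qquad
\mathrm{EL}_{50} = -\frac{\alpha}{\beta}, \qquad
\frac{d\pi(x)}{dx} = \beta\,\pi(x)\,\bigl(1-\pi(x)\bigr)
$$

The EL50 is the value of $x$ at which $\pi(x) = 0.5$, and the rate of change of the probability is largest in absolute value there, where it equals $\beta/4$.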

## WEEK 4: Building and Applying Logistic Regression Models

- Strategies in model selection
- Model checking and AIC
- Forward, stepwise, backward algorithms
- Likelihood ratio testing for models
- Deviance and residuals assessment
- Effects of sparse data

HOMEWORK:

Homework in this course consists of short answer questions to test concepts and guided numerical problems using software.

In addition to assigned readings, this course has supplemental readings available online, example software files, and an end-of-course data modeling project.

Who Should Take This Course:

Anyone who needs to analyze data in which the response is in yes/no or categorical form. Market researchers, medical researchers, surveyors, those who study education assessment data, quality control specialists, life scientists, environmental scientists, ecologists.

Level:

Intermediate

Organization of the Course:

This course takes place online at the Institute for 4 weeks. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.

At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.

Time Requirement:

About 15 hours per week, at times of your choosing.

Credit:

Students come to the Institute for a variety of reasons. As you begin the course, you will be asked to specify your category:

- You may be interested only in learning the material presented, and not be concerned with grades or a record of completion.
- You may be enrolled in PASS (Programs in Analytics and Statistical Studies), which requires demonstration of proficiency in the subject, in which case your work will be assessed for a grade.
- You may require a "Record of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). For those successfully completing the course, CEUs and a record of course completion will be issued by The Institute upon request.

Course Text:

The required text for this course is *An Introduction to Categorical Data Analysis, Second Edition* by Alan Agresti.

PLEASE ORDER YOUR COPY IN TIME FOR THE COURSE STARTING DATE.

Software:

Most standard software packages can do various forms of categorical data analysis. No one particular software program is required or used predominantly for course illustrations, but this course does require software that can do tests and confidence intervals for proportions, chi-square tests, and logistic regression. Standard packages such as SAS, Stata, R, SPSS, and Minitab can do this. Free (or nominal-cost) copies of various software packages can be obtained for use during the course.

Note: If you are planning to use R in this course and are not already familiar with it, please consider taking one of our courses where R is introduced from the ground up: "R Programming - Introduction 1," "Introduction to R: Statistical Analysis," or "Introduction to Modeling." R has a learning curve that is steeper than that of most commercial statistical software.

Course Dates:

- April 07, 2017 to May 05, 2017
- October 06, 2017 to November 03, 2017
- April 06, 2018 to May 04, 2018
- October 05, 2018 to November 02, 2018