### Count Data

Modeling Count Data

taught by James Hardin

Aim of Course:

This online course, "Modeling Count Data," deals with regression models for count data, i.e., models in which the response (dependent) variable is a count or a rate. A count is the number of times an event occurs; a rate is the number of events occurring within a specific area or time interval. The course covers the nature of count models; Poisson regression; negative binomial regression; problems of over- and underdispersion; fit and residual tests and graphics for count models; problems with zeros (zero-truncated models, zero-inflated mixture models, and two-part hurdle models); and advanced models such as Poisson inverse Gaussian (PIG), generalized Poisson, and generalized negative binomial models; left-, right-, and interval-truncated and censored count models; finite mixture models; quantile count models; generalized additive modeling; methods for longitudinal and clustered count data; and an overview of Bayesian count models.
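As a small illustration of the count/rate distinction described above, here is a minimal Python sketch. The numbers and the function names `rate` and `poisson_pmf` are hypothetical and are not taken from the course materials. It assumes a rate is a count divided by its exposure, and that a Poisson count has a mean equal to the rate times the exposure.

```python
import math

def rate(count, exposure):
    """Events per unit of exposure (time, area, population, ...)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return count / exposure

def poisson_pmf(k, lam, exposure=1.0):
    """P(Y = k) for a Poisson count with mean lam * exposure."""
    mu = lam * exposure
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical data: the same count observed over different exposures
# gives different rates.
print(rate(12, 4))  # 12 events in 4 years
print(rate(12, 8))  # 12 events in 8 years
```

The same count of 12 corresponds to different rates once exposure is taken into account, which is why count models usually carry an exposure term when observation periods or areas differ.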

This course may be taken individually (one-off) or as part of a certificate program.

Course Program:

## WEEK 1: Fundamentals of Modeling Counts; Poisson Regression

- What are counts
- Understanding a statistical count model
- Variety of count models
- Estimation - the modeling process
- Poisson model assumptions
- Apparent overdispersion
- The basic Poisson model
- Interpreting coefficients and rate ratios
- Exposure; modeling time, area, and space
- Prediction
- Poisson marginal effects

## WEEK 2: Overdispersion, Assessment of Fit, and Negative Binomial Regression

- Count model fit statistics
- Overdispersion: what, why, and how
- Testing overdispersion
- Methods for handling overdispersion – adjusting SEs
- Analysis of residuals
- Likelihood ratio tests
- Model selection criteria
- Validation sample
- Varieties of negative binomial models
- Negative binomial model assumptions
- Examples using real data
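To give a rough sense of what overdispersion means in the topics above, the following Python sketch compares the sample variance of a set of counts with its sample mean. Under a Poisson model the two are expected to be roughly equal, so a ratio well above 1 hints at overdispersion. This is a crude descriptive check on made-up data, not necessarily the test used in the course, and the function name `dispersion_ratio` is hypothetical.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; near 1 under a Poisson model."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; ratio undefined")
    return variance(counts) / m

# Hypothetical counts (not from the course data sets).
counts = [0, 0, 1, 1, 2, 3, 5, 8, 12, 18]
print(dispersion_ratio(counts))  # well above 1 for these values
```

A model-based assessment would normally use a statistic computed from a fitted regression rather than the raw counts, since covariates can account for some of the apparent extra variation.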

## WEEK 3: Alternative Count Models: NB Fit tests, PIG, Problem with Zeros

- General negative binomial fit tests
- Generalized NB-P regression (NBP)
- Heterogeneous negative binomial (NBH)
- Generalized Poisson - modeling underdispersion (GP)
- Poisson inverse Gaussian (PIG)
- Zero-truncated count models
- Two-part hurdle models
- Zero-inflated count models

## WEEK 4: Underdispersed Count Data; Advanced Count Models

- Generalized Poisson – modeling underdispersion
- Exact Poisson regression
- Truncation and censored count models
- Finite mixture models
- Non-parametric and quantile count models
- Overview of longitudinal and clustered count models
- 3-parameter count models
- Overview of Bayesian count models
- Project preparation

HOMEWORK:

Homework in this course consists of short-answer questions to test concepts, guided data analysis problems using software, guided data modeling problems using software, and an end-of-course data modeling project. In addition to the assigned readings, supplemental readings are available online in the course.

Note: The Institute gratefully acknowledges the contribution of Prof. Joseph Hilbe, the original developer and instructor for the course.


Who Should Take This Course:

Analysts and researchers in a wide variety of fields who are concerned with modeling counts and rates.

Level:

Intermediate/Advanced

Some familiarity with linear modeling, such as that provided in Regression Analysis, will be helpful.

Organization of the Course:

This course takes place online at the Institute for 4 weeks. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.

At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.

Time Requirement:

About 15 hours per week, at times of your choosing.

Credit:

Students come to the Institute for a variety of reasons. As you begin the course, you will be asked to specify your category:

- You may be interested only in learning the material presented, and not be concerned with grades or a record of completion.
- You may be enrolled in PASS (Programs in Analytics and Statistical Studies) that requires demonstration of proficiency in the subject, in which case your work will be assessed for a grade.
- You may require a "Record of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). For those successfully completing the course, CEUs and a record of course completion will be issued by The Institute upon request.

Course Text:

The required text is *Modeling Count Data*, Hilbe, Joseph M. (2014), Cambridge University Press. This paperback edition includes R, Stata, SAS, and Excel/CSV code, which can be downloaded from the author’s website. R data and functions are located in the COUNT package on CRAN. An electronic version of the book is also available from the publisher or on Amazon.

Software:

The methods covered in this course are handled well by Stata, R, and, for the most part, SAS. Data sets used in the text are available in Stata, R, SAS, and Excel formats. With respect to code and output:

Stata: Code and output are provided for all examples for which known Stata commands exist.

R: Functions and scripts are available in the COUNT and msme packages; additional R code is provided at the following web site: *http://works.bepress.com/joseph_hilbe/*. Most examples have R support.

SAS: Some code and output are provided (e.g., Chapter 15 on Bayesian count models).

The instructor and TA are familiar with Stata and R. The instructor is familiar with most SAS procedures related to the modeling of count data. No instructional support is available for SAS. If you plan on using R and are not already familiar with it, please consider taking one of our courses where R is introduced from the ground up: "R-Programming: Introduction," "Introduction to R: Data Handling," or "Introduction to R: Statistical Analysis." R has a steeper learning curve than most commercial statistical software.

# Modeling Count Data

- May 19, 2017 to June 16, 2017
- October 27, 2017 to November 24, 2017