### R Modeling

Modeling in R

Taught by Sudha Purohit

Aim of Course:

In this online course, “Modeling in R,” you will learn how to use R to build statistical models and use them to analyze data. Multiple regression is covered first, followed by logistic regression. The generalized linear model is then introduced and shown to include multiple regression and logistic regression as special cases. The Poisson model for count data is introduced, and the concept of overdispersion is described. You will then learn how to analyze longitudinal data, first using relatively straightforward graphics and simple inferential approaches, and then using mixed-effects models and the generalized estimating equation (GEE) approach. The emphasis of the course is on how to use R to fit these models and interpret the R output, rather than on their theoretical background. Consequently, some prior knowledge of linear models is required (statistics.com offers separate courses on each of the models covered).

See also the important note under "Who Should Take This Course."

Course Program:

## WEEK 1: Linear Regression, Logistic Regression

- Multiple linear regression with R
  - Simple examples, dummy explanatory variables, interpreting regression coefficients; finding a parsimonious model

## WEEK 2: Generalized Linear Models With R

- Logistic regression with R
  - The need for a different model when the response variable is binary; the logistic transform; fitting the model to some simple examples; deviance residuals
- Multiple regression and logistic regression as special cases of the generalized linear model
- The Poisson model for count data
- The problem of overdispersion

## WEEK 3: Analyzing Longitudinal Data Using R

- Examples of longitudinal data
- Simple graphics for longitudinal data and simple inference using the summary measure approach
- The 'long form' of longitudinal data
- Mixed-effects models for longitudinal data

## WEEK 4: Generalized Estimating Equations

- Modeling the correlation structure of the repeated measurements
- The generalized estimating equation approach for non-normal response variables in longitudinal data
- The dropout problem

Homework:

Homework in this course consists of guided data analysis and data modeling problems to be completed using R.

In addition to assigned readings, this course includes practice exercises and the instructor's expert write-ups on important concepts.


Who Should Take This Course:

Anyone who is familiar with R and wants to learn how to use it to build and use statistical models.

IMPORTANT: the course covers a variety of techniques at different levels, to meet the needs of different groups of users. Those with minimal-to-moderate statistics preparation will want to spend their time on the more extensive presentation of linear regression and need not attempt to complete all of the more advanced segments on other methods. Those with more experience in statistics may not need as much time in the early stages, but will be better able to work with the more advanced segments.

The goal is to provide guidance in using R to implement various modeling procedures, not to provide conceptual development of the statistical methods. Most of the modeling techniques described here are covered in separate courses at Statistics.com. If you take this course first, you will probably not gain a full understanding of the more advanced techniques, but you will be better positioned, software-wise, to implement them if and when you take those courses. If you take the other courses first, you will understand the concepts behind the techniques before tackling them in R, but you will have had less software preparation while taking those conceptual courses. Either approach will work; each has its own costs and benefits.

Level:

Advanced/Intermediate

Prerequisites:

- R Intro (stats) or equivalent familiarity with R

Organization of the Course:

This course takes place online at the Institute for 4 weeks. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.

At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.

Time Requirement:

About 15 hours per week, at times of your choosing.

Credit:

Students come to the Institute for a variety of reasons. As you begin the course, you will be asked to specify your category:

- You may be interested only in learning the material presented, and not be concerned with grades or a record of completion.
- You may be enrolled in PASS (Programs in Analytics and Statistical Studies), which requires demonstration of proficiency in the subject, in which case your work will be assessed for a grade.
- You may require a "Record of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). For those successfully completing the course, CEUs and a record of course completion will be issued by the Institute upon request.

Course Text:

Course materials will be provided by the instructor.

Software:

Students must have access to R, which is freely available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org.

Course Dates:

- August 18, 2017 to September 15, 2017
- February 02, 2018 to March 02, 2018
- August 17, 2018 to September 14, 2018