Taught by Robert LaBudde
Aim of Course:
This online course, "Multivariate Statistics," covers the theoretical foundations of the topic. Multivariate data typically consist of many records, each with readings on two or more variables, with or without an "outcome" variable of interest. Procedures covered in the course include multivariate analysis of variance (MANOVA), principal components, factor analysis, and classification.
This course may be taken individually (one-off) or as part of a certificate program.
WEEK 1: Multivariate Data
- Descriptive Statistics
- Rows (Subjects) vs. Columns (Variables)
- Covariances, Correlations and Distances
- The Multivariate Normal Distribution
- Plots of More Than Two Variables
- Assessing Normality
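As a rough illustration of the Week 1 ideas (rows as subjects, columns as variables, covariances, correlations and distances), the following is a minimal sketch in plain Python. The course itself works in R and Excel; the data values and function names here are invented for illustration and do not come from the course materials.

```python
import math

# Hypothetical data: 5 subjects (rows), each measured on 2 variables (columns).
data = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 5.0],
    [4.0, 4.0],
    [5.0, 5.0],
]


def column_means(rows):
    """Mean of each variable (column) across all subjects (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix, using the n - 1 divisor."""
    n = len(rows)
    means = column_means(rows)
    p = len(means)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def correlation_matrix(cov):
    """Rescale a covariance matrix so that every diagonal entry is 1."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


def euclidean_distance(a, b):
    """Straight-line distance between two subjects (rows)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


means = column_means(data)
cov = covariance_matrix(data)
corr = correlation_matrix(cov)
dist = euclidean_distance(data[0], data[1])

print("means:", means)
print("covariance:", cov)
print("correlation:", corr)
print("distance between subjects 1 and 2:", dist)
```

The same quantities are available directly in most statistical software; the hand-written versions above are only meant to show what each summary computes from the data matrix.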
WEEK 2: Multivariate Normal Distribution, MANOVA, & Inference
- Details of the Multivariate Normal Distribution
- Wishart Distribution
- Hotelling T² Distribution
- Multivariate Analysis of Variance (MANOVA)
- Hypothesis Tests on Covariances
- Joint Confidence Intervals
WEEK 3: Multidimensional Scaling & Correspondence Analysis
- Principal Components
- Correspondence Analysis
- Multidimensional Scaling
WEEK 4: Discriminant Analysis
- Classification Problem
- Population Covariances Known
- Population Covariances Estimated
- Fisher’s Linear Discriminant Function
Homework in this course consists of short answer questions to test concepts, guided data analysis problems using software, and guided data modeling problems using software.
In addition to assigned readings, this course has an end-of-course data modeling project and supplemental readings available online.
Who Should Take This Course:
Students who are planning to take technique-specific courses (e.g. cluster analysis, factor analysis, logistic regression, GLM, mixed models) or domain-specific courses (e.g. data mining) and who need additional background in multivariate theory and practice prior to doing so.
Multivariate statistics is a wide field, and many courses at Statistics.com cover areas not included in this course. These include Data Mining 1 and Data Mining 2, Cluster Analysis, Logistic Regression, Microarray Analysis, Factor Analysis, Longitudinal Data, and Missing Data, among others.
ADVANCED - INTERMEDIATE: see prerequisites
Recommended, but not required: Maximum Likelihood Estimation
If you are unsure whether you have mastered the material in the introductory statistics courses, test yourself with the placement exams.
Organization of the Course:
This course takes place online at the Institute for 4 weeks. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.
At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.
Time Requirement: About 15 hours per week, at times of your choosing.
Students come to the Institute for a variety of reasons. As you begin the course, you will be asked to specify your category:
- You may be interested only in learning the material presented, and not be concerned with grades or a record of completion.
- You may be enrolled in PASS (Programs in Analytics and Statistical Studies) that requires demonstration of proficiency in the subject, in which case your work will be assessed for a grade.
- You may require a "Record of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). For those successfully completing the course, CEUs and a record of course completion will be issued by the Institute upon request.
The required text is An Introduction to Applied Multivariate Analysis with R, by Brian Everitt and Torsten Hothorn.
The course will be supplemented by notes supplied by the instructor for topics not covered by the text.
The exercises in this course will require the use of statistical software that can do multivariate analysis (plots, MANOVA, discriminant analysis, correspondence analysis, multidimensional scaling) and standard matrix operations.
Output in the course material and the text is based on the R statistical system and Microsoft Excel, as these are the programs the instructor is familiar with. Other software may be used, but you should be prepared to use your program and interpret its output (in comparison with that given in the course) on your own. If you are planning to use R in this course and are not already familiar with it, please consider taking one of our courses where R is introduced from the ground up: "Introduction to R: Data Handling," "Introduction to R: Statistical Analysis," or "Introduction to Modeling." R has a learning curve that is steeper than that of most commercial statistical software.
Information is available on obtaining a free (or nominal-cost) copy of various software packages for use during the course.