On Demand Workshop: Effectively Dealing with Missing Data without Biasing your Results
Learn the new approaches to missing data so you can confidently analyze your data with no loss of power, accurate p-values, and unbiased results.
If you have any experience with missing data, you know it really messes analyses up:
- You lose power when cases get dropped.
- Results become biased when representative cases aren’t included.
- Standard errors and p-values become inaccurate: too large if you lose cases, and too small if you fill in estimates and treat them as real data.
But there is something unique about the way it messes analyses up.
It’s not a data issue like skewness or outliers that you can just ignore (whether you should or not).
Ignoring missing data still means choosing a method of dealing with missing data. You’re just using the default. The default method in most statistical software is listwise deletion: drop any case with any value missing.
Depending on which statistical software you’re using and the patterns and percentage of missing data, the default may be a perfectly acceptable way of dealing with the missing data.
Or it may be the worst possible option.
And in data analysis, it’s always better if you understand the defaults, know what they’re doing in your data set, and decide for yourself if it’s the best approach.
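To make the cost of that default concrete, here is a minimal sketch in Python with pandas (not one of the workshop's packages; the tiny data set and variable names are made up for illustration). Calling `dropna()` with its default arguments mimics listwise deletion, and the sketch compares how many individual values are missing with how many whole cases get dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 6 cases, 3 variables, a few scattered missing values.
df = pd.DataFrame({
    "age":    [23, 35, np.nan, 41, 29, 52],
    "income": [40, np.nan, 55, 62, 48, np.nan],
    "score":  [3.1, 2.8, 3.6, np.nan, 3.0, 3.9],
})

# Share of individual values that are missing.
cell_missing_rate = df.isna().to_numpy().mean()

# Listwise deletion: keep only the cases with no missing value at all.
complete_cases = df.dropna()
case_loss_rate = 1 - len(complete_cases) / len(df)

print(f"values missing: {cell_missing_rate:.0%}")
print(f"cases dropped:  {case_loss_rate:.0%}")
```

In this toy example, 4 of the 18 values are missing, yet only 2 of the 6 cases survive, because the holes fall in different rows.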
Up until about 15 years ago, there weren’t many other options.
There was listwise deletion and there was imputation (inserting estimates for the missing values).
Listwise deletion can decimate your sample and your power, and it can bias your results.
Imputation solved the power issue, but most of the imputation methods were pretty sketchy, and they biased both the overall results and the p-values even more.
It was a “damned if you do, damned if you don’t” kind of situation.
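One of those older methods, mean imputation, shows the problem clearly. The sketch below is a simulation in Python with NumPy, under assumed conditions (a normal variable, about 40% of values removed completely at random; all numbers are illustrative). Filling every hole with the observed mean adds cases without adding any spread, so the standard deviation and the standard error both come out smaller than the observed data support.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variable: 1,000 cases with true mean 50 and true SD 10.
n = 1000
x = rng.normal(loc=50, scale=10, size=n)

# Make roughly 40% of the values missing completely at random.
missing = rng.random(n) < 0.4
x_obs = x[~missing]

# Mean imputation: fill every hole with the observed mean.
x_imputed = x.copy()
x_imputed[missing] = x_obs.mean()

# Spread and standard error of the mean, before and after imputation.
sd_obs = x_obs.std(ddof=1)
sd_imputed = x_imputed.std(ddof=1)
se_obs = sd_obs / np.sqrt(len(x_obs))
se_imputed = sd_imputed / np.sqrt(n)

print(f"observed only: n={len(x_obs)}, SD={sd_obs:.2f}, SE={se_obs:.3f}")
print(f"mean-imputed:  n={n}, SD={sd_imputed:.2f}, SE={se_imputed:.3f}")
```

The imputed values sit exactly at the mean, so they contribute nothing to the sum of squares while still inflating the sample size. That is why the standard error shrinks.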
But it’s different now.
In August 1999, just a month after I started at the Statistical Consulting office at Cornell, I saw a talk by Joe Schafer at the Joint Statistical Meetings about multiple imputation. I was blown away. It seemed too good to be true: it solved pretty much all of the problems with missing data.
So I read all that I could, attended a week-long mini-class, and tried it all out.
At that time, you had to use special stand-alone software to implement it, and all the ones I tried were a bit clunky to use.
Luckily, statistical software has caught up. And in that time, a few new studies have shown that some of the restrictive assumptions of multiple imputation aren’t as restrictive as they at first seemed. So multiple imputation is easier and more accurate than ever.
It’s also become clear that some of those old methods aren’t always as horrible as they seemed–there are some situations when listwise deletion works just fine.
But it pays to know the difference, and how to implement not just multiple imputation, but maximum likelihood approaches, which also give great outcomes and are quite a bit easier to use.
That is what you’ll learn in this workshop: the issues involved in missing data, an in-depth understanding of the approaches and how to implement them, and the steps to diagnose the best approach in your situation.
In this workshop, you will learn:
Module 1: Missing Data–The Problem and Basic Solutions
Part 1. What is Missing Data?
Part 2. Missing Data Mechanisms
Part 3. The Four Main Approaches
Part 4. Complete Case Analysis
Part 5. Imputation
In this first module, you’ll get the big picture: the real issues, the causes, and the solutions.
You’ll learn, step by step, what the different mechanisms are (exactly how random the missingness is) and how each one affects your results.
You’ll get an understanding of where missing data fits into an analysis strategy and its relationship to other types of problem data: censoring, truncation, and other partial information.
And finally, we’ll explore two traditional, simple techniques for dealing with missing data–complete case analysis and imputation. They do work in some situations, but they’re disasters in others. You will learn how to tell the difference, and how to use them well.
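To see how the mechanism changes what complete case analysis gives you, here is a minimal simulation sketch in Python with NumPy. It uses the standard labels from the missing-data literature (MCAR, MAR, MNAR); the description above does not say which terms the workshop itself uses. The data, coefficients, and missingness probabilities are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical data: x is always observed, y has a true mean of 0.
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(scale=0.7, size=n)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three ways y can go missing.
missing = {
    "MCAR": rng.random(n) < 0.3,                       # unrelated to the data
    "MAR":  rng.random(n) < logistic(-1.0 + 1.5 * x),  # depends on observed x
    "MNAR": rng.random(n) < logistic(-1.0 + 1.5 * y),  # depends on y itself
}

# Complete-case mean of y under each mechanism (the true value is 0).
cc_mean = {name: y[~mask].mean() for name, mask in missing.items()}

for name, value in cc_mean.items():
    print(f"{name}: complete-case mean of y = {value:+.3f}")
```

Under these assumptions, the complete-case mean stays close to the true value only when missingness is completely random. In the other two cases, high values of y are preferentially lost, so the mean is pulled downward.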
Module 2: Multiple Imputation
Part 1: What is Multiple Imputation: The Concept
Part 2: When to Use it
Part 3: How to Do it, Step-by-Step, in SPSS and SAS
Multiple Imputation is a godsend in some really hairy missing data situations. Even with up to 50% of data missing, it can give you unbiased parameter estimates, accurate standard errors, and full power.
But it has to be done well, and that’s not always easy. It requires a solid imputation algorithm and model.
This module will teach you, in detail, how to build an imputation model, how it differs from your analysis model, and what to do with the resulting imputed data.
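As a rough illustration of the last step, combining results across imputed data sets, here is a sketch of the standard pooling formulas usually called Rubin's rules, written in Python with NumPy. Whether the workshop presents pooling exactly this way is an assumption. The function name `pool_rubin` and the five estimates and standard errors are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(std_errors, dtype=float) ** 2  # within-imputation variances
    m = len(q)

    pooled_estimate = q.mean()             # average of the m estimates
    within = u.mean()                      # average sampling variance
    between = q.var(ddof=1)                # variance of estimates across imputations
    total = within + (1 + 1 / m) * between

    return pooled_estimate, np.sqrt(total)

# Hypothetical regression coefficient estimated in m = 5 imputed data sets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
std_errors = [0.20, 0.20, 0.20, 0.20, 0.20]

pooled_est, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate = {pooled_est:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is larger than any single data set's standard error because it adds the disagreement between imputations to the ordinary sampling variance. That extra term is how the uncertainty about the missing values gets carried into the final result.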
Module 3: Multiple Imputation in Practice–Special Cases
Part 1: Multiple Imputation for Categorical Variables
Part 2: Imputation of the Dependent Variable
Part 3: The Role of Interaction Terms and Transformations in Imputations
Part 4: Imputing Scales or Scale Items
Multiple Imputation is very simple if only one predictor variable has missing data, that variable is highly correlated with other variables, and it is continuous and normally distributed.
But real data is never so clean.
Luckily, multiple imputation can handle a lot of mess. So in this module, we’ll explore how to do multiple imputation in many messy situations.
So you will know how to make solid analysis decisions even with messy data.
Module 4: Maximum Likelihood and Non-Ignorable Missing Data
Part 1: Maximum Likelihood Approaches
Part 2: Non-Ignorable Missing Data
Multiple Imputation isn’t the only game in town. There are a number of Maximum Likelihood techniques for running models that have all the advantages of Multiple Imputation without the hassle of imputing anything.
You may already be using some of them. And if you’re running linear models, you can take advantage of these techniques right as you run your models. No extra steps required.
It’s actually quite easy to do. But it only works for linear models.
So in part 1 you’ll learn what maximum likelihood estimation is, the types of analyses for which it works, and the exact steps to implement it.
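For a feel of how a likelihood-based estimate uses every case without imputing anything, here is a toy sketch in Python with NumPy. It is not Full Information Maximum Likelihood as run in AMOS. It is the simplest textbook special case: two roughly normal variables, x fully observed, and y missing with a probability that depends only on x. In that setting the maximum likelihood estimate of the mean of y is the complete-case regression line evaluated at the mean of x from all cases. The data and coefficients are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical data: x is always observed, y has a true mean of 0.
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(scale=0.7, size=n)

# y is missing more often when x is high (missingness depends only on x).
p_missing = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * x)))
observed = rng.random(n) >= p_missing

# Complete-case estimate: ignores the cases where y is missing.
mean_cc = y[observed].mean()

# Likelihood-based estimate: fit y on x in the complete cases,
# then evaluate the line at the mean of x from ALL cases.
slope, intercept = np.polyfit(x[observed], y[observed], deg=1)
mean_ml = intercept + slope * x.mean()

print(f"complete-case mean of y: {mean_cc:+.3f}")
print(f"ML estimate of mean y:   {mean_ml:+.3f}   (true value 0)")
```

The cases with y missing still contribute through their x values, which is why no imputation step is needed.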
Then in part 2, we’ll briefly discuss the approaches available for non-ignorable missing data. This is where you really have to make some crazy assumptions because the approaches require you to know something about the missing values.
Module 5: Missing Data Diagnosis
Part 1: Decision Factors in Choosing an Approach
Part 2: Missing Data Diagnosis, Step-by-Step
Part 3: Conclusions
Part of the reason it is so hard to learn how to deal with missing data is that the right approach depends on how much data are missing, patterns of missing data, why the data are missing, and how you will use the data in analysis.
These all vary in different types of research.
Learning how to analyze the patterns and reason your way to an approach may be the most important part of the workshop.
This is actually the first step in dealing with missing data, but we save it for last so you have a clear picture of what your options are once you do the diagnosis.
So in this module, you’ll learn, in detail, how to analyze the data and the patterns of missingness to figure out the most likely mechanism, the effects of the missing data, and the best way to proceed in dealing with it.
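A first pass at that kind of diagnosis might look like the sketch below, in Python with pandas (an assumption for illustration; the workshop's examples use SAS, SPSS, Stata or R). The data set and variable names are hypothetical. The sketch reports how much is missing per variable, which patterns of missingness occur, and whether cases missing one variable differ on another, observed one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: age is complete, income and score have holes.
df = pd.DataFrame({
    "age":    [23, 35, 41, 29, 52, 47, 38, 60],
    "income": [40, np.nan, 62, 48, np.nan, 58, 51, np.nan],
    "score":  [3.1, 2.8, np.nan, 3.0, 3.9, 3.4, np.nan, np.nan],
})

# 1. How much is missing on each variable?
pct_missing = df.isna().mean() * 100

# 2. Which missingness patterns occur, and how often?
#    "O" = observed, "M" = missing, in column order (age, income, score).
pattern = df.isna().apply(
    lambda row: "".join("M" if is_missing else "O" for is_missing in row),
    axis=1,
)
pattern_counts = pattern.value_counts()

# 3. Do cases with missing income differ on an observed variable?
age_when_income_missing = df.loc[df["income"].isna(), "age"].mean()
age_when_income_observed = df.loc[df["income"].notna(), "age"].mean()

print(pct_missing.round(1))
print(pattern_counts)
print(f"mean age, income missing:  {age_when_income_missing:.1f}")
print(f"mean age, income observed: {age_when_income_observed:.1f}")
```

A clear difference in step 3 suggests the missingness is related to observed data rather than completely random, which is one of the factors in choosing an approach.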
But can I keep up?
The workshop is pitched at a level that should make it of interest to both students and professionals.
You do need to spend some time each week. You will learn concepts and get some clarity even if you don’t practice what you’ve learned on your own. But you won’t entirely get it.
This is a workshop where you want to get your hands dirty with some data. Please expect to spend 2-3 hours per module just doing the exercises.
This workshop is for you if you:
- Have struggled with the devastating loss of power that comes from missing data
- Realize that listwise deletion and mean imputation don’t usually work well, and are looking for a better way
- Have heard about the amazing miracle of multiple imputation and want to learn what it is and how to do it
- Have struggled with using multiple imputation and realize that it can be quite difficult to implement well. You want to know when it is really necessary, and when (and how) you can use Maximum Likelihood instead, which is both simple and powerful
- Have had at least two statistics classes and at least two years of experience in data analysis. That background will help you get the most out of the workshop.
- Use SAS, SPSS, Stata or R. You are welcome to use another software package, but at this time the workshop only includes examples in SAS, SPSS, Stata and R. Important: SPSS can only do multiple imputation in version 17.0 and higher, but there is a work-around for earlier versions, which I will show you. For any of the SPSS work, you will need to have the Missing Values add-on module. If you have it, “Missing Values” will appear in your Analyze menu. If you don’t and are employed by a university, you can get a one-year license for Windows or Mac to the full SPSS suite, including all its modules, at On the Hub. Note that the Grad Pack does NOT contain the Missing Values module, but the Faculty Pack does.
- We will use AMOS for the Full Information Maximum Likelihood. AMOS now comes bundled with most versions of SPSS. No prior experience using AMOS is necessary. Full Information Maximum Likelihood can also be run in LISREL and Mplus, but I do not have experience with those programs.
The workshop is an On Demand Online Workshop
On Demand Online Workshops are the ultimate in convenience. Everything is housed on a private workshop website where you can get all the workshop materials when you need them, at your own speed, any time of day or night.
You will get a one-year membership in this workshop website, so feel free to come back again and again.
The workshop videos stream from the site, so as a member, you always have access to the most updated version.
The workshop materials are broken into modules, each on a separate page, so they’re easy to keep track of and you won’t get overwhelmed with too much information at once.
Each module contains:
- A set of streaming training videos. Watch them whenever it’s convenient for you, as quickly or slowly as you need.
- Supplemental handouts, including the presentation slides, syntax files to recreate the examples, and handy checklists and/or worksheets
- SPSS, SAS, Stata, R and Excel data files from real research studies, so you can see how to deal with the challenges inherent in real data
- Exercises to practice what you’ve learned, with answers to check your work.
- A place to type in questions for each module.
Karen has guided and trained researchers through their statistical analysis for over 15 years. Her focus is on helping statistics practitioners gain an intuitive understanding of how statistics is applied to real data in research studies.
Comments from past participants in this workshop:
“I struggled with trying on my own to connect the dots of what I learned in statistical courses, to applying it to my dissertation. The workshop was, in a word, Excellent! It provided me with knowledge and skills that have increased my ability to conduct a missing data analysis. And having the video recordings is the cream in the sauce. Personally I rate my experience as a 99.99% CI.”
“Thank you so much. I learned so much in this workshop and it will change the way I examine (and design) my data sets from now on! I think the homework is an excellent part of the workshop. The video recordings are terrific. All of the resources provided for this workshop were outstanding.”
Prudence Plummer-D’Amato, PhD
“Karen, I would like to thank you for the wonderful job with the workshop. I got a lot from it.
You are a great presenter (very understandable, given the complexity of some of the presented concepts).
It is a nice push for further self-studies when working on my projects. I will definitely register for other ones.”
Edmonton, Alberta, Canada
To help give you some background, you’ll get videos of these webinars:
- What Happened to R squared?: Assessing Model Fit for Logistic, Multilevel, and Other Models that use Maximum Likelihood
- The 11 Steps to Performing any Statistical Model
- Random Intercept and Random Slope Models
We really, truly believe you’ll find this workshop helpful and your satisfaction is guaranteed. If you participate in the full workshop and find you are not satisfied for any reason, we will give you a full refund. Just notify us within 90 days of purchase.