Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have used had there been no measurement error or missing data in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2014b).
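The second step of a standard MI analysis can be illustrated with a minimal sketch. It assumes the analyst has already run the same model on each of m completed data sets and holds one point estimate and one squared standard error per data set. The combination shown is the standard Rubin's rules for MI; the function name `combine_rubin` and the example numbers are hypothetical and are not taken from the authors' software.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine m completed-data estimates of one scalar quantity.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (combined estimate, total variance) under Rubin's rules.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # combined point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from m = 3 completed data sets.
q_bar, total = combine_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(q_bar, total ** 0.5)  # combined estimate and its standard error
```

The between-imputation term is what carries the extra uncertainty due to the unobserved or mismeasured values into the final standard error.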
We extend a unified and easy-to-use approach to measurement error and missing data. In our companion article, Blackwell, Honaker, and King give an intuitive overview of the new technique, along with practical suggestions and empirical applications. Here, we offer more precise technical details, more sophisticated measurement error model specifications and estimation procedures, and analyses to assess the approach’s robustness to correlated measurement errors and to errors in categorical variables. These results support using the technique to reduce bias and increase efficiency in a wide variety of empirical research.
Applications of modern methods for analyzing data with missing values, based primarily
on multiple imputation, have in the last half-decade become common in American politics
and political behavior. Scholars in these fields have thus increasingly avoided the biases
and inefficiencies caused by ad hoc methods like listwise deletion and best-guess imputation.
However, researchers in much of comparative politics and international relations,
and others with similar data, have been unable to do the same because the best available
imputation methods work poorly with the time-series cross-section data structures
common in these fields. We attempt to rectify this situation. First, we build a multiple
imputation model that allows smooth time trends, shifts across cross-sectional units, and
correlations over time and space, resulting in far more accurate imputations. Second, we
build nonignorable missingness models by enabling analysts to incorporate knowledge from
area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret
model parameters. Third, because these tasks could not be accomplished within
existing imputation algorithms, in that they cannot handle as many variables as needed
even in the simpler cross-sectional data for which they were designed, we also develop a
new algorithm that substantially expands the range of computationally feasible data types
and sizes for which multiple imputation can be used. These developments also made it
possible to implement the methods introduced here in freely available open source software
that is considerably more reliable than existing algorithms.
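
The idea of placing a prior on an individual missing cell value can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the model-based prediction for the cell and the expert's prior are normal, in which case the two combine by precision weighting. The function name `combine_cell_prior` and the numbers are hypothetical; this is a textbook normal-normal update and not necessarily the computation performed in the authors' software.

```python
def combine_cell_prior(pred_mean, pred_var, prior_mean, prior_var):
    """Precision-weighted combination of two normal sources for one cell.

    pred_mean, pred_var: model-based prediction for the missing cell
    prior_mean, prior_var: expert prior for the same cell
    Returns (posterior mean, posterior variance).
    """
    if pred_var <= 0 or prior_var <= 0:
        raise ValueError("variances must be positive")
    w_pred = 1.0 / pred_var    # precision of the model prediction
    w_prior = 1.0 / prior_var  # precision of the expert prior
    post_var = 1.0 / (w_pred + w_prior)
    post_mean = post_var * (w_pred * pred_mean + w_prior * prior_mean)
    return post_mean, post_var


# Hypothetical cell: the model predicts 10, the expert believes 14,
# and both sources are treated as equally uncertain (variance 4).
post_mean, post_var = combine_cell_prior(10.0, 4.0, 14.0, 4.0)
print(post_mean, post_var)
```

A tighter prior (smaller `prior_var`) pulls the imputations for that cell toward the expert's value, while a diffuse prior leaves them close to the model-based prediction.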