Publications

2007
Kremer, Michael, Ryan Bubb, and David Levine. 2007. “The Economics of International Refugee Law”. Abstract
We model the current system of refugee protection based on the 1951 Convention Relating to the Status of Refugees as a Pareto-improving contract that binds states to provide a more efficient level of the global public good of refugee protection. Our analysis suggests that the increase in economic migration since the 1951 Convention was adopted has made it more difficult for host states to distinguish between refugees and those who migrate in search of economic opportunities. The response of states to this screening problem has been to shade on performance of their obligations under the 1951 Convention by, inter alia, increasing the standards of proof of their refugee status determination procedures, resulting in more false negatives and refoulement of refugees. We show that the choice of standard of proof can exhibit strategic complementarity; as more states use a high standard of proof, the best response of other states may be to increase their standard of proof. We also model potential reform schemes in which wealthy states pay poorer states to host refugees who initially travel to the wealthy states, and argue that such transfer systems could ameliorate the screening problem by inducing self-selection among those who migrate and result in increased protection of refugees. However, such reforms could also make some developing countries worse off by increasing their burden of hosting refugees without fully compensating them for their increased costs.
Pande, Rohini, and Abhijit Banerjee. 2007. “Parochial Politics: Ethnic Preferences and Politician Corruption”. Abstract
This paper examines how increased voter ethnicization, defined as a greater preference for the party representing one's ethnic group, affects politician quality. If politics is characterized by incomplete policy commitment, then ethnicization reduces average winner quality for the pro-majority party, with the opposite true for the minority party. The effect increases with greater numerical dominance of the majority (and so social homogeneity). Empirical evidence from a survey on politician corruption that we conducted in North India is remarkably consistent with our theoretical predictions.
Also Faculty Research Working Papers Series, John F. Kennedy School of Government.
Download PDF
Kremer, Michael, and Tom Wilkening. 2007. “Antiquities: Long-Term Leases as an Alternative to Export Bans”. Abstract
Most countries prohibit the export of certain antiquities. This practice often leads to illegal excavation and looting for the black market, which damages the items and destroys important aspects of the archaeological record. We argue that long-term leases of antiquities would raise revenue for the country of origin while preserving national long-term ownership rights. By putting antiquities into the hands of the highest-value consumer in each period, allowing leases would generate incentives for the protection of objects.
The lack of "social capital" is increasingly forwarded as an explanation for why communities perform poorly. Yet, to what extent can these community-specific constraints be compensated? I address this question by examining determinants of collective success in a costly problem in developing economies—the upkeep of local public goods. One difficulty is obtaining reliable outcome measures for comparable collective tasks across well-defined communities. In order to resolve this I conduct detailed surveys of community-maintained infrastructure projects in Northern Pakistan. The findings show that while community-specific constraints do matter, they can be compensated by better project design. Inequality, social fragmentation, and lack of leadership in the community do have adverse consequences but these can be overcome by changes in project complexity, community participation and return distribution. Moreover, the evidence suggests that better design matters even more for communities with poorer attributes. Using community fixed effects and instrumental variables offers a significant improvement in empirical identification over previous studies. These results offer evidence that appropriate design can enable projects to succeed even in “bad” communities.
King, Gary, and Daniel Hopkins. 2007. “Extracting Systematic Social Science Meaning from Text”. Abstract
We develop two methods of automated content analysis that give approximately unbiased estimates of quantities of theoretical interest to social scientists. With a small sample of documents hand coded into investigator-chosen categories, our methods can give accurate estimates of the proportion of text documents in each category in a larger population. Existing methods successful at maximizing the percent of documents correctly classified allow for the possibility of substantial estimation bias in the category proportions of interest. Our first approach corrects this bias for any existing classifier, with no additional assumptions. Our second method estimates the proportions without the intermediate step of individual document classification, and thereby greatly reduces the required assumptions. For both methods, we also correct statistically, apparently for the first time, for the far less-than-perfect levels of inter-coder reliability that typically characterize human attempts to classify documents, an approach that will normally outperform even population hand coding when that is feasible. These methods allow us to measure the classical conception of public opinion as those views that are actively and publicly expressed, rather than the attitudes or nonattitudes of the populace as a whole. To do this, we track the daily opinions of millions of people about President Bush and the candidates for the 2008 presidential nominations using a massive data set of online blogs we develop and make available with this article. We also offer easy-to-use software that implements our methods, which we also demonstrate work with many other sources of unstructured text.
This paper describes material that is patent pending. Earlier versions of this paper were presented at the 2006 annual meetings of the Midwest Political Science Association (under a different title) and the Society for Political Methodology.
Download PDF
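The proportion-estimation idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation or their software: it assumes documents have already been reduced to a handful of binary word-stem features, and all function and variable names are illustrative.

```python
# Toy sketch: estimate category proportions in an unlabeled corpus by solving
# P(S) = P(S | D) P(D), without classifying individual documents.

import numpy as np
from scipy.optimize import nnls

def estimate_category_proportions(X_labeled, y_labeled, X_unlabeled):
    """X_*: (n_docs, n_features) 0/1 arrays; y_labeled: category label per labeled doc."""
    X_labeled = np.asarray(X_labeled)
    y_labeled = np.asarray(y_labeled)
    X_unlabeled = np.asarray(X_unlabeled)

    categories = np.unique(y_labeled)
    n_features = X_labeled.shape[1]
    n_profiles = 1 << n_features  # one cell per possible binary feature profile

    def profile_ids(X):
        # Encode each document's binary feature vector as a single integer.
        return X.astype(int).dot(1 << np.arange(n_features))

    # P(S | D): each column is the profile distribution within one hand-coded category.
    P_S_given_D = np.zeros((n_profiles, len(categories)))
    for j, d in enumerate(categories):
        ids = profile_ids(X_labeled[y_labeled == d])
        P_S_given_D[:, j] = np.bincount(ids, minlength=n_profiles) / len(ids)

    # P(S): the profile distribution observed in the unlabeled target population.
    ids = profile_ids(X_unlabeled)
    P_S = np.bincount(ids, minlength=n_profiles) / len(ids)

    # Solve P(S) = P(S | D) P(D) for the category proportions P(D),
    # enforcing nonnegativity and renormalizing to sum to one.
    props, _ = nnls(P_S_given_D, P_S)
    return dict(zip(categories, props / props.sum()))
```

The published method handles the far larger feature spaces of real text (for example, by working with many subsets of word stems) and adds corrections such as the one for imperfect inter-coder reliability; the sketch only shows the linear system at the heart of the approach.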
Kremer, Michael, Paul Glewwe, and Sylvie Moulin. 2007. “Many Children Left Behind? Textbooks and Test Scores in Kenya”. Abstract
A randomized evaluation suggests that a program which provided official textbooks to randomly selected rural Kenyan primary schools did not increase test scores for the average student. In contrast, the previous literature suggests that textbook provision has a large impact on test scores. Disaggregating the results by students’ initial academic achievement suggests a potential explanation for the lack of an overall impact. Textbooks increased scores for students with high initial academic achievement and increased the probability that the students who had made it to the selective final year of primary school would go on to secondary school. However, students with weaker academic backgrounds did not benefit from the textbooks. Many pupils could not read the textbooks, which are written in English, most students’ third language. The results are consistent with the hypothesis that the Kenyan education system and curricular materials are oriented to the academically strongest students rather than to typical students. More generally, many students may be left behind in societies that combine 1) a centralized, unified education system; 2) the heterogeneity in student preparation associated with rapid expansion of education; and 3) disproportionate elite power.
King, Gary, Kosuke Imai, and Olivia Lau. 2007. “Toward a Common Framework for Statistical Analysis and Development”. Abstract
We describe some progress toward a common framework for statistical analysis and software development built on and within the R language, including R’s numerous existing packages. The framework we have developed offers a simple unified structure and syntax that can encompass a large fraction of statistical procedures already implemented in R, without requiring any changes in existing approaches. We conjecture that it can be used to encompass and present simply a vast majority of existing statistical methods, regardless of the theory of inference on which they are based, notation with which they were developed, and programming syntax with which they have been implemented. This development enabled us, and should enable others, to design statistical software with a single, simple, and unified user interface that helps overcome the conflicting notation, syntax, jargon, and statistical methods existing across the methods subfields of numerous academic disciplines. The approach also enables one to build a graphical user interface that automatically includes any method encompassed within the framework. We hope that the result of this line of research will greatly reduce the time from the creation of a new statistical innovation to its widespread use by applied researchers whether or not they use or program in R.
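As a loose illustration of the kind of single unified interface the abstract describes, here is a toy sketch in Python rather than the authors' R, so it is emphatically not their framework: one wrapper exposes the same fit / set-covariates / simulate calls no matter which underlying model is registered. The class, its registry, and the method names are invented for this sketch.

```python
# Toy "one interface, many models" wrapper, illustrating the idea of a
# unified estimation-and-simulation workflow.

import numpy as np
import statsmodels.api as sm

class UnifiedModel:
    # Illustrative registry; a real framework would cover many more models.
    MODELS = {"ls": sm.OLS, "logit": sm.Logit}

    def __init__(self, model):
        self.model_class = self.MODELS[model]

    def fit(self, y, X):
        # Same call regardless of the underlying estimator.
        self.result = self.model_class(y, sm.add_constant(np.asarray(X))).fit()
        return self

    def setx(self, x):
        # Covariate profile at which quantities of interest will be computed.
        self.x0 = np.r_[1.0, np.asarray(x, dtype=float)]
        return self

    def sim(self, n=1000):
        # Draw parameters from their estimated sampling distribution and
        # return draws of the linear predictor at the chosen profile.
        # (A fuller version would also apply each model's inverse link.)
        draws = np.random.multivariate_normal(
            self.result.params, self.result.cov_params(), size=n)
        return draws @ self.x0

# Usage: identical calls whether model="ls" or model="logit".
# qi = UnifiedModel("ls").fit(y, X).setx([2.5, 0.0]).sim()
```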
Verbal autopsy procedures are widely used for estimating cause-specific mortality in areas without medical death certification. Data on symptoms reported by caregivers along with the cause of death are collected from a medical facility, and the cause-of-death distribution is estimated in the population where only symptom data are available. Current approaches analyze only one cause at a time, involve assumptions judged difficult or impossible to satisfy, and require expensive, time-consuming, or unreliable physician reviews, expert algorithms, or parametric statistical models. By generalizing current approaches to analyze multiple causes, we show how most of the difficult assumptions underlying existing methods can be dropped. These generalizations also make physician review, expert algorithms, and parametric statistical assumptions unnecessary. With theoretical results and empirical analyses of data from China and Tanzania, we illustrate the accuracy of this approach. While no method of analyzing verbal autopsy data, including the more computationally intensive approach offered here, can give accurate estimates in all circumstances, the procedure offered is conceptually simpler, less expensive, more general, as replicable or more so, and easier to use in practice than existing approaches. We also show how our focus on estimating aggregate proportions, which are the quantities of primary interest in verbal autopsy studies, can greatly reduce the assumptions necessary for, and thus improve the performance of, many individual classifiers in this and other areas. As a companion to this paper, we also offer easy-to-use software that implements the methods discussed herein.
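Stated in generic notation (not necessarily the authors' own), the back-calculation at the heart of this style of estimation uses the fact that the symptom-profile distribution observed in the population is a mixture of the cause-specific symptom distributions:

$$P(S = s) \;=\; \sum_{d} P(S = s \mid D = d)\, P(D = d),$$

where $P(S = s \mid D = d)$ is estimated from the facility data with known causes of death, and the cause-of-death distribution $P(D = d)$ is recovered by solving this linear system in the population data, without classifying any individual death.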
Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in these fields have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best-guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation. First, we build a multiple imputation model that allows smooth time trends, shifts across cross-sectional units, and correlations over time and space, resulting in far more accurate imputations. Second, we build nonignorable missingness models by enabling analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret model parameters. Third, because these tasks could not be accomplished within existing imputation algorithms, in that they cannot handle as many variables as needed even in the simpler cross-sectional data for which they were designed, we also develop a new algorithm that substantially expands the range of computationally feasible data types and sizes for which multiple imputation can be used. These developments also made it possible to implement the methods introduced here in freely available open source software that is considerably more reliable than existing algorithms.
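As context for how multiply imputed datasets are used once they exist, here is a generic sketch of the standard combining step (Rubin's rules). It is not the authors' imputation algorithm; `impute_once` and `analyze` are hypothetical stand-ins for whatever imputation model and downstream analysis a researcher actually chooses.

```python
# Generic multiple-imputation workflow: create m completed datasets, run the
# same analysis on each, and pool the results with Rubin's rules.

import numpy as np

def combine_rubin(estimates, variances):
    """Pool m point estimates and their sampling variances across imputations."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()            # pooled point estimate
    within = variances.mean()           # average within-imputation variance
    between = estimates.var(ddof=1)     # between-imputation variance
    total_var = within + (1.0 + 1.0 / m) * between
    return q_bar, np.sqrt(total_var)    # estimate and its standard error

def multiple_imputation_analysis(data, impute_once, analyze, m=5):
    """impute_once(data) -> a completed dataset; analyze(d) -> (estimate, variance)."""
    results = [analyze(impute_once(data)) for _ in range(m)]
    point_estimates, variances = zip(*results)
    return combine_rubin(point_estimates, variances)
```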
