2007
We model the current system of refugee protection based on the 1951 Convention
Relating to the Status of Refugees as a Pareto-improving contract that bound states to
provide a more efficient level of the global public good of refugee protection. Our analysis
suggests that the increase in economic migration since the 1951 Convention was adopted has
made it more difficult for host states to distinguish between refugees and those who migrate
in search of economic opportunities. The response of states to this screening problem has
been to shade on performance of their obligations under the 1951 Convention by, inter alia,
increasing the standards of proof of their refugee status determination procedures, resulting
in more false negatives and refoulement of refugees. We show that the choice of standard
of proof can exhibit strategic complementarity; as more states use a high standard of proof,
the best response of other states may be to increase their standard of proof. We also model
potential reform schemes in which wealthy states pay poorer states to host refugees who
initially travel to the wealthy states, and argue that such transfer systems could ameliorate
the screening problem by inducing self-selection among those who migrate and result in
increased protection of refugees. However, such reforms could also make some developing
countries worse off by increasing their burden of hosting refugees without fully compensating
them for their increased costs.
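The strategic-complementarity claim can be illustrated with a toy best-response iteration. The functional form and all numbers below are invented for illustration and are not taken from the paper's model:

```python
import numpy as np

# Toy best-response dynamics (functional form and numbers invented for
# illustration): each of n_states chooses a standard of proof s_i in [0, 1],
# and strategic complementarity means a state's best response rises with the
# average standard chosen by the others, e.g. BR(s_bar) = min(1, a + b*s_bar).
def best_response(s_bar, a=0.3, b=0.6):
    return min(1.0, a + b * s_bar)

def iterate(n_states=5, s0=0.3, rounds=50):
    s = np.full(n_states, s0)
    for _ in range(rounds):
        s_new = np.array([best_response(np.delete(s, i).mean())
                          for i in range(n_states)])
        if np.allclose(s_new, s):
            break
        s = s_new
    return s

# Starting from a low common standard, repeated best responses ratchet every
# state's standard upward toward the fixed point s* = a / (1 - b) = 0.75.
print(iterate())
```

With b > 0 each state's optimal standard increases in the others' average, so a unilateral tightening propagates until all states settle at a higher common standard, the escalation described in the abstract.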
This paper examines how increased voter ethnicization, defined as a greater preference for the
party representing one's ethnic group, affects politician quality. If politics is characterized by
incomplete policy commitment, then ethnicization reduces average winner quality for the pro-majority party, with the opposite true for the minority party. The effect increases with greater
numerical dominance of the majority (and so social homogeneity). Empirical evidence from
a survey on politician corruption that we conducted in North India is remarkably consistent
with our theoretical predictions.
Also Faculty Research Working Papers Series, John F. Kennedy School of Government.
Most countries prohibit the export of certain antiquities. This practice often leads to
illegal excavation and looting for the black market, which damages the items and
destroys important aspects of the archaeological record. We argue that long-term leases
of antiquities would raise revenue for the country of origin while preserving national
long-term ownership rights. By putting antiquities into the hands of the highest value
consumer in each period, allowing leases would generate incentives for the protection of
objects.
The lack of "social capital" is increasingly forwarded as an explanation for why communities
perform poorly. Yet, to what extent can these community-specific constraints be compensated for? I
address this question by examining determinants of collective success in a costly problem in
developing economies—the upkeep of local public goods. One difficulty is obtaining reliable
outcome measures for comparable collective tasks across well-defined communities. In order to
resolve this I conduct detailed surveys of community-maintained infrastructure projects in Northern
Pakistan. The findings show that while community-specific constraints do matter, they can be
compensated by better project design. Inequality, social fragmentation, and lack of leadership in the
community do have adverse consequences but these can be overcome by changes in project
complexity, community participation and return distribution. Moreover, the evidence suggests that
better design matters even more for communities with poorer attributes. Using community fixed
effects and instrumental variables offers a significant improvement in empirical identification over
previous studies. These results offer evidence that appropriate design can enable projects to succeed
even in “bad” communities.
We develop two methods of automated content analysis that give approximately unbiased estimates
of quantities of theoretical interest to social scientists. With a small sample of documents
hand coded into investigator-chosen categories, our methods can give accurate estimates of the
proportion of text documents in each category in a larger population. Existing methods successful
at maximizing the percent of documents correctly classified allow for the possibility of substantial
estimation bias in the category proportions of interest. Our first approach corrects this bias for any
existing classifier, with no additional assumptions. Our second method estimates the proportions
without the intermediate step of individual document classification, and thereby greatly reduces
the required assumptions. For both methods, we also correct statistically, apparently for the first
time, for the far less-than-perfect levels of inter-coder reliability that typically characterize human
attempts to classify documents, an approach that will normally outperform even population hand
coding when that is feasible. These methods allow us to measure the classical conception of public
opinion as those views that are actively and publicly expressed, rather than the attitudes or
non-attitudes of the populace as a whole. To do this, we track the daily opinions of millions of people
about President Bush and the candidates for the 2008 presidential nominations using a massive
data set of online blogs we develop and make available with this article. We also offer easy-to-use
software that implements our methods, which we demonstrate also work with many other sources
of unstructured text.
This paper describes material that is patent pending. Earlier
versions of this paper were presented at the 2006 annual meetings of the Midwest Political Science Association (under a different title) and the Society for Political Methodology.
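The first method's bias-correction idea can be sketched in a toy two-category example. The matrix and proportions below are invented for illustration and are not the authors' estimator:

```python
import numpy as np

# Hypothetical illustration of correcting an existing classifier's category
# proportions (all numbers invented). M[i, j] is the probability, estimated
# from the hand-coded sample, that a document whose true category is j gets
# classified as i. Raw classifier proportions then satisfy
# p_observed = M @ p_true, so the true proportions can be recovered by
# solving the linear system, with no extra assumptions about the classifier.
M = np.array([[0.80, 0.10],
              [0.20, 0.90]])         # columns sum to 1

p_observed = np.array([0.45, 0.55])  # raw classifier output proportions

p_true = np.linalg.solve(M, p_observed)
print(p_true)  # → [0.5 0.5]
```

Here the classifier's asymmetric errors made the raw proportions look like 45/55 when the underlying split is even; the correction undoes exactly that distortion.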
A randomized evaluation suggests that a program which provided official textbooks to randomly
selected rural Kenyan primary schools did not increase test scores for the average student. In
contrast, the previous literature suggests that textbook provision has a large impact on test scores.
Disaggregating the results by students’ initial academic achievement suggests a potential
explanation for the lack of an overall impact. Textbooks increased scores for students with high
initial academic achievement and increased the probability that the students who had made it to
the selective final year of primary school would go on to secondary school. However, students
with weaker academic backgrounds did not benefit from the textbooks. Many pupils could not
read the textbooks, which are written in English, most students’ third language. The results are
consistent with the hypothesis that the Kenyan education system and curricular materials are
oriented to the academically strongest students rather than to typical students. More generally,
many students may be left behind in societies that combine 1) a centralized, unified education
system; 2) the heterogeneity in student preparation associated with rapid expansion of education;
and 3) disproportionate elite power.
We describe some progress toward a common framework for statistical analysis and software development
built on and within the R language, including R’s numerous existing packages. The framework we have
developed offers a simple unified structure and syntax that can encompass a large fraction of statistical procedures
already implemented in R, without requiring any changes in existing approaches. We conjecture that
it can be used to encompass and present simply a vast majority of existing statistical methods, regardless of
the theory of inference on which they are based, notation with which they were developed, and programming
syntax with which they have been implemented. This development enabled us, and should enable others,
to design statistical software with a single, simple, and unified user interface that helps overcome the conflicting
notation, syntax, jargon, and statistical methods existing across the methods subfields of numerous
academic disciplines. The approach also enables one to build a graphical user interface that automatically
includes any method encompassed within the framework. We hope that the result of this line of research
will greatly reduce the time from the creation of a new statistical innovation to its widespread use by applied
researchers whether or not they use or program in R.
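A minimal Python analog of the idea (a hypothetical sketch, not the authors' R implementation) is a single entry point with one syntax that dispatches to registered estimators, so a new method plugs in without users learning a new interface:

```python
# Hypothetical sketch of a unified estimation interface: every statistical
# procedure registers under a name, and users call one function with one
# syntax regardless of which method runs underneath.
from typing import Callable, Dict, List

MODELS: Dict[str, Callable] = {}

def register(name: str):
    def wrap(fn):
        MODELS[name] = fn
        return fn
    return wrap

@register("mean")
def fit_mean(y: List[float]) -> float:
    return sum(y) / len(y)

@register("ols_slope")
def fit_ols_slope(y: List[float], x: List[float]) -> float:
    # slope of the least-squares line of y on x
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def estimate(model: str, **data):
    """Single user-facing call: every registered method is invoked the same way."""
    return MODELS[model](**data)

print(estimate("mean", y=[1.0, 2.0, 3.0]))                           # 2.0
print(estimate("ols_slope", y=[1.0, 2.0, 3.0], x=[0.0, 1.0, 2.0]))   # 1.0
```

Because the registry is the only coupling point, a graphical interface could enumerate `MODELS` and automatically expose any method added later, which is the property the abstract describes.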
Verbal autopsy procedures are widely used for estimating cause-specific mortality in areas
without medical death certification. Data on symptoms reported by caregivers along
with the cause of death are collected from a medical facility, and the cause-of-death distribution
is estimated in the population where only symptom data are available. Current
approaches analyze only one cause at a time, involve assumptions judged difficult or impossible
to satisfy, and require expensive, time-consuming, or unreliable physician reviews,
expert algorithms, or parametric statistical models. By generalizing current approaches
to analyze multiple causes, we show how most of the difficult assumptions underlying existing
methods can be dropped. These generalizations also make physician review, expert
algorithms, and parametric statistical assumptions unnecessary. With theoretical results,
and empirical analyses in data from China and Tanzania, we illustrate the accuracy of
this approach. While no method of analyzing verbal autopsy data, including the more
computationally intensive approach offered here, can give accurate estimates in all circumstances,
the procedure offered is conceptually simpler, less expensive, more general, at least as
replicable, and easier to use in practice than existing approaches. We also show
how our focus on estimating aggregate proportions, which are the quantities of primary
interest in verbal autopsy studies, may greatly reduce the assumptions necessary for, and
thus improve the performance of, many individual classifiers in this and other areas. As a
companion to this paper, we also offer easy-to-use software that implements the methods
discussed herein.
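The aggregate idea can be sketched with toy numbers (invented for illustration; the actual estimator must also handle sampling error and many symptom subsets):

```python
import numpy as np

# Toy sketch of estimating a cause-of-death distribution without classifying
# any individual death (all numbers invented). B[s, c] = P(symptom profile s
# | cause c), estimated from a facility sample with medically certified
# causes; p_symptoms is the profile distribution observed in the population.
# The population cause distribution p_causes then satisfies
# p_symptoms = B @ p_causes and can be recovered by least squares.
B = np.array([[0.6, 0.1, 0.2],
              [0.3, 0.7, 0.1],
              [0.1, 0.2, 0.7]])      # columns sum to 1

p_causes_true = np.array([0.5, 0.3, 0.2])
p_symptoms = B @ p_causes_true       # what a population survey would observe

p_hat, *_ = np.linalg.lstsq(B, p_symptoms, rcond=None)
print(np.round(p_hat, 6))  # recovers [0.5, 0.3, 0.2]
```

Nothing here assigns a cause to any single death: only the aggregate proportions, the quantities of primary interest in verbal autopsy studies, are estimated.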
Applications of modern methods for analyzing data with missing values, based primarily
on multiple imputation, have in the last half-decade become common in American politics
and political behavior. Scholars in these fields have thus increasingly avoided the biases
and inefficiencies caused by ad hoc methods like listwise deletion and best guess imputation.
However, researchers in much of comparative politics and international relations,
and others with similar data, have been unable to do the same because the best available
imputation methods work poorly with the time-series cross-section data structures
common in these fields. We attempt to rectify this situation. First, we build a multiple
imputation model that allows smooth time trends, shifts across cross-sectional units, and
correlations over time and space, resulting in far more accurate imputations. Second, we
build nonignorable missingness models by enabling analysts to incorporate knowledge from
area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret
model parameters. Third, because these tasks could not be accomplished within
existing imputation algorithms, in that they cannot handle as many variables as needed
even in the simpler cross-sectional data for which they were designed, we also develop a
new algorithm that substantially expands the range of computationally feasible data types
and sizes for which multiple imputation can be used. These developments also made it
possible to implement the methods introduced here in freely available open source software
that is considerably more reliable than existing algorithms.
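The combining step shared by all multiple-imputation analyses can be sketched as follows (the estimates and variances below are hypothetical):

```python
import numpy as np

# Sketch of the final step of a multiple-imputation analysis (numbers are
# hypothetical): run the same analysis on each of the m completed datasets,
# then combine with Rubin's rules, which inflate the usual within-imputation
# variance by the variance of the point estimates across imputations.
estimates = np.array([1.9, 2.1, 2.0, 2.2, 1.8])       # coefficient, per dataset
variances = np.array([0.04, 0.05, 0.04, 0.06, 0.05])  # its squared SE, per dataset

m = len(estimates)
q_bar = estimates.mean()             # combined point estimate
w = variances.mean()                 # within-imputation variance
b = estimates.var(ddof=1)            # between-imputation variance
total_var = w + (1 + 1 / m) * b      # Rubin's total variance
print(q_bar, np.sqrt(total_var))
```

The between-imputation term is what ad hoc methods like best-guess imputation omit, which is why they understate uncertainty even when their point estimates happen to be unbiased.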