Political Centralization and Government Accountability

This paper studies fiscal federalism when regions differ in voters’ ability to monitor public officials. We develop a model of political agency in which rent-seeking politicians provide public goods to win support from heterogeneously informed voters. In equilibrium, voter information increases government accountability but displays decreasing returns. Therefore, political centralization reduces aggregate rent extraction when voter information varies across regions. It increases welfare as long as the central government is required to provide public goods uniformly across regions. The need for uniformity implies an endogenous trade-off between reducing rents through centralization and matching idiosyncratic preferences through decentralization. We find that a federal structure with overlapping levels of government can be optimal only if regional differences in accountability are sufficiently large. The model predicts that less informed regions should reap greater benefits when the central government sets a uniform policy. Consistent with our theory, we present empirical evidence that less informed states enjoyed faster declines in pollution after the 1970 Clean Air Act centralized environmental policy at the federal level.


I. Introduction
In the run-up to Scotland's 2014 independence referendum, the Scottish Government published a guide setting out its case for independence. Alex Salmond, then First Minister, argued that Scotland ought to become independent because its people are different from those of other parts of the British Isles and thus need a different government of their own. "After Scotland becomes independent ... the people of Scotland are in charge. It will no longer be possible for governments to be elected and pursue policies against the wishes of the Scottish people" (Salmond 2013, pp. x-xi).
The Scottish leader's argument finds support in the standard economic theory of fiscal federalism. Its core result is the Decentralization Theorem: absent policy spillovers, decentralization is more efficient than centralization if regions are not identical. This proposition, introduced by Oates (1972), has proved a remarkably general paradigm (Lockwood 2006). Local governments can tailor their choices to the particular conditions of each jurisdiction and thus provide higher social welfare than a single policy adopted by a common government. With no economies of scale, each group with distinct preferences should have an independent government (Tiebout 1956; Bewley 1981). Increasing returns and externalities promote political integration, but heterogeneity raises the downsides of large jurisdictions (Alesina and Spolaore 2003). Political-economy frictions provide rigorous microfoundations for the inability of a central government to match local preferences (Lockwood 2002; Besley and Coate 2003; Harstad 2007).
Yet empirical evidence shows that decentralization has not consistently delivered the benefits its advocates predicted in theory (Treisman 2007). The majority of Scottish voters who rejected independence in the referendum may have been risk averse, but not unwise. The experience of countries all over the world teaches that decentralization can harm the quality of government just as it can improve it. Mismanagement and lack of accountability are common in local governments, especially in developing and transition economies (Bardhan and Mookherjee 2006).
This paper develops a model of political agency that explains why decentralization can reduce accountability and answers three key questions. When regions are heterogeneous, what determines whether power should be centralized or decentralized? How many levels of government should there be? How should state borders be drawn? Our theory is grounded in the observation that regions differ not only in preferences (the focus of the classic theory) but also in their ability to monitor elected officials and hold government accountable. Government accountability varies widely within the United States: official corruption in Louisiana and Mississippi is five times as prevalent as in Oregon and Washington (Glaeser and Saks 2006).
We study public goods provided by self-interested politicians whose goal is to extract wasteful rents. To keep extracting rents they need to win re-election, so their corruption is constrained by career concerns. Electoral discipline provides both incentives and screening. Politicians differ in ability and voters try to dismiss unskilled incumbents. Voters infer skill from performance, so politicians have an incentive to refrain from extracting rents: low public-good provision is punished at the polls, whether it stems from incompetence or corruption.
Our model has two key features. First, we study heterogeneous accountability arising from differences in voters' information. Some voters correctly observe and understand policy outcomes, while others do not and cannot infer the incumbent's ability. Second, we develop a dynamic model with a recursive incentive structure. The expectation of future electoral discipline affects the current trade-off between rent extraction and re-election. Thus, a permanent increase in voter information has two effects on electoral discipline. On the one hand, it makes re-election more responsive to performance, raising incentives to reduce rents. On the other hand, this very reduction in equilibrium rents lowers the appeal of re-election and thus indirectly dampens the decline in rent extraction. In our model we find that the direct effect always dominates, but rent extraction falls with voter information at a declining rate because of the countervailing indirect effect. When monitoring improves from a low initial level, politicians react sharply because the value of office is high. Further improvements yield smaller benefits.
Our core theoretical insight follows from the concave impact of an informed population on the quality of government. When different regions have different shares of informed voters, centralization reduces aggregate rent extraction. Political integration creates a single electorate with the average share of informed voters. Rent extraction falls sharply in less informed regions, while it does not increase as much in better informed ones. Thus, centralization yields aggregate efficiency gains.
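The aggregation argument is an application of Jensen's inequality, and a few lines of Python illustrate it. The sketch assumes a hypothetical rent function `rent_share(iota) = 1 / (1 + k * iota)`. This function is decreasing and convex in the share of informed voters, like equilibrium rents in the model, but it is not the model's own formula. The parameter `k` and the two regional information levels are also illustrative. The sketch compares the average rent share when each region elects its own government with the rent share of a single government facing the average informed share.

```python
from statistics import mean


def rent_share(iota: float, k: float = 4.0) -> float:
    """Hypothetical rent share: decreasing and convex in voter information iota.

    Illustrative functional form only, not the model's equilibrium formula.
    """
    return 1.0 / (1.0 + k * iota)


def decentralized_rents(iotas: list[float]) -> float:
    """Average rent share when each region elects its own government."""
    return mean(rent_share(i) for i in iotas)


def centralized_rents(iotas: list[float]) -> float:
    """Rent share of one government facing the average share of informed voters."""
    return rent_share(mean(iotas))


regions = [0.2, 0.8]  # hypothetical expected shares of informed voters
print(f"decentralized: {decentralized_rents(regions):.4f}")  # decentralized: 0.3968
print(f"centralized:   {centralized_rents(regions):.4f}")    # centralized:   0.3333
```

In this example the less informed region's rent share falls from about 0.556 to 0.333, while the better informed region's rises only from about 0.238 to 0.333. With identical regions the two regimes coincide. The gain comes entirely from the convexity of the rent function combined with heterogeneity in information.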
However, the distribution of these efficiency gains is problematic. A centralized government is more accountable, but disproportionately accountable to the most informed regions. If it enjoys discretion over the geographic distribution of public goods, it favors informed regions and neglects uninformed ones. The resulting misallocation is regressive and so costly that centralization lowers social welfare despite reducing rents. Thus, we find that centralization can be welfare-maximizing only if it is accompanied by a uniformity constraint that requires the central government to provide identical public goods to all regions.
As a result of this endogenous need for uniformity, heterogeneous information drives a key trade-off. Centralization improves accountability, but it forgoes the ability to match local public goods to idiosyncratic local preferences. Section 3 analyzes this trade-off and answers our motivating question: should government be decentralized when regions are different?
The answer depends on which type of heterogeneity is starkest. Differences in tastes pull toward decentralization; differences in information push toward centralization instead.
Empirical evidence supports our results. Without a uniformity constraint, politicians allocate spending across regions in response to voter information rather than actual needs (Strömberg 2004). With uniformity, instead, centralization mainly benefits the uninformed: reforms decentralizing public education in Argentina and Italy had regressive effects and worsened inequality (Galiani, Gertler and Schargrodsky 2008; Durante, Labartino and Perotti 2014).
Our prediction that centralization improves government accountability is consistent with American history. Two former state governors, Don Siegelman of Alabama and Rod Blagojevich of Illinois, were sentenced to prison for corruption. Corruption has long been considered a distinctive plague of city and state governments (Steffens 1904; Wilson 1966). Federal intervention during the New Deal eradicated the patronage and political manipulation that had until then characterized state and local welfare programs (Wallis, Fishback, and Kantor 2006). World history offers other examples of accountability gains from centralization: in early modern Europe (Besley and Persson 2011; Dincecco 2011), in pre-colonial Africa (Gennaioli and Rainer 2007) and in transition economies (Blanchard and Shleifer 2001).
European history also provides direct support for our finding that heterogeneous accountability prompts centralization. Germany and Italy were unified as nation-states in the late nineteenth century. Italy had highly heterogeneous pre-unitary institutions and became a centralized nation-state. Germany, instead, had relatively homogeneous institutional quality and was organized as a federal country. Both regional differences in accountability and the degree of centralization remain higher in Italy than in Germany today (Ziblatt 2006).
In Section 4 we study how many levels of government there should be. The standard logic of fiscal federalism suggests there should be many, because every policy should be matched to the right geographic unit. In our framework, however, we find that multiplying government tiers is costly because there are economies of scope in accountability. When politicians are responsible for providing a larger set of public goods, their incentives improve and they devote a lower share of the budget to rents. Such economies of scope imply that a single level of government is best if information is homogeneous. A federal system can be optimal only if differences in information are large enough. Then the federal government provides large accountability gains to poorly informed regions, while their local governments can match their idiosyncratic preferences over policies for which taste heterogeneity is starkest.
Our model can thus explain the empirical finding that government quality declines as the number of government tiers rises. In the United States, the proliferation of overlapping special-purpose local governments in charge of specific policies has been a fiasco (Berry 2009). Special-purpose districts are inefficient and prone to capture by special interests. In Europe, too, multiple sub-national levels of government have led to inefficiencies, and their reduction and simplification are now on the agenda. Cross-country evidence shows a robust positive correlation between corruption and the number of levels of government (Fan, Lin, and Treisman 2009).
Section 5 considers what should determine the boundaries of governments when people are not naturally sorted into internally homogeneous regions. We find that optimal borders have two characteristics: they cluster by tastes, but ensure maximum diversity of information. The second goal can trump the first when geographic constraints create a tension between the two. A disadvantaged uninformed group should not be a local minority; it should rather join better informed voters with similar preferences in a larger polity. For example, breaking up California would reduce welfare because educated San Francisco liberals ought to share a state government with working-class left-wingers in the Central Valley.
This paper furthers the study of fiscal federalism and the geographic structure of government. Starting with Tiebout's (1956) and Oates's (1972) seminal contributions, prior work focused exclusively on differences in preferences. We show that this is only half of the story. Once we also consider differences in voter information across regions, we find that the two kinds of heterogeneity have opposite implications for the optimal architecture of government.
Differences in preferences promote decentralization if the central government cannot tailor policies to local preferences (Oates 1972; Alesina and Spolaore 2003). Assuming that accountability is homogeneous across regions, prior work endogenized the failure of preference matching under centralization through frictions in political bargaining (Lockwood 2002; Besley and Coate 2003; Harstad 2007). We provide an alternative microfoundation through heterogeneous voter information.
More important, we show that differences in information promote centralization because they entail larger accountability gains from political integration. Our finding suggests that heterogeneous information is the key reason why centralization can increase accountability. Prior work instead mainly emphasized why accountability can rise with decentralization. In particular, decentralization can help voters monitor their local governments thanks to yardstick competition (Besley and Case 1995), while centralization entails a common-agency problem that makes politicians less accountable to voters in any single region (Seabright 1996). 1 Furthermore, we provide the first theory of economies of scope in government accountability. Prior work considered each policy instrument in isolation, typically assessing whether it would be best centralized or decentralized (Oates 1999). We extend this line of inquiry by studying the pros and cons of a federal structure with multiple levels of government in charge of providing distinct public goods. 2

II. Political Agency and Public-Good Provision
In this section, we present the model of political agency that underpins our analysis of optimal political integration. Imperfectly informed voters face the problem of selecting and incentivizing self-interested rent-seeking politicians. We model electoral discipline in a framework of political career concerns (Alesina and Tabellini 2008). Voters try to retain competent politicians and dismiss incompetent ones. In solving this screening problem, they also create incentives for politicians to provide public goods. The incumbent moderates rent extraction because higher public-good provision raises voters' inference of his ability and thus his chances of re-election.

II.A. Preferences and Technology
The economy is populated by a continuum of infinitely lived agents, whose preferences are separable over time and additive in utility from private consumption and utility from each of P public goods. Individual i in period t derives instantaneous utility

$$u^i_t = \tilde{u}^i_t + \sum_{p=1}^{P} \alpha^i_p \ln g_{p,t},$$

where $\tilde{u}^i_t$ is exogenous utility from private consumption and $g_{p,t}$ is the provision of public good p. We treat $\tilde{u}^i_t$ as an exogenous mean-zero shock and focus exclusively on public goods. Each public good yields benefits according to a logarithmic utility function. The relative importance of each good for individual i is described by the ideal shares $\alpha^i_p \geq 0$ such that $\sum_{p=1}^{P} \alpha^i_p = 1$. Each public good p is produced by the government with technology

$$g_{p,t} = \theta_{p,t}\, x_{p,t}.$$

The production technology has constant returns to scale: $x_{p,t}$ measures per-capita investment in providing public good p. We rule out economies of scale in public-good provision, which would provide an immediate technological rationale for centralization. Productivity $\theta_{p,t}$ represents the stochastic competence of the incumbent politician in providing public good p. It follows a first-order moving-average process driven by the competence innovations $\varepsilon_{p,t}$ and $\varepsilon_{p,t-1}$. The shocks $\varepsilon_{p,t}$ are independent and identically distributed across goods, over time and across politicians. They have support $[-\bar{\varepsilon}, \bar{\varepsilon}]$, mean zero and variance $\sigma^2$. Our preferred interpretation is that parties are composed of overlapping generations of politicians. The period-t government consists of older party leaders with competence $\varepsilon_{p,t-1}$ and young party members with competence $\varepsilon_{p,t}$. At t + 1, former party leaders retire, rising young politicians take over the leadership, and a new cohort joins the party. Politicians are self-interested rent seekers. Their objective is to maximize the present value of the rents they can extract while in office, discounted by the discount factor $\beta \in (0, 1]$. Each period, the government allocates a fixed government budget b. The incumbent chooses the amount $x_{p,t}$ of expenditure on each public good. He extracts as rent the remainder

$$r_t = b - \sum_{p=1}^{P} x_{p,t},$$

which represents public resources devoted to socially unproductive projects. 3

1 [...] 2006; Treisman 2007). In our framework, instead, centralization unambiguously alleviates moral hazard in political agency.

2 Appendix A provides a more complete discussion of the literature.
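The following Python sketch makes the per-period accounting concrete. Its inputs are illustrative assumptions. The ideal shares, budget and investments are hypothetical numbers. The innovations are drawn uniform on `[-eps_bar, eps_bar]`, although the text only requires bounded support, mean zero and variance σ². Productivity is written as a hypothetical baseline `theta0` plus the two most recent innovations, following the overlapping-generations interpretation.

The sketch checks two things:

* With log utility, spending a given amount in proportion to the ideal shares yields higher utility than another split of the same amount. This is the standard Cobb-Douglas result.
* Productivity is correlated with the previous period's productivity but not with productivity two periods earlier.

```python
import math
import random


def public_goods(theta, x):
    """Constant-returns technology: g_p = theta_p * x_p for each good p."""
    return [t * xp for t, xp in zip(theta, x)]


def rent(b, x):
    """Rent is whatever the incumbent does not invest: r = b - sum_p x_p."""
    return b - sum(x)


def utility(alpha, g, u_private=0.0):
    """Instantaneous utility u = u_private + sum_p alpha_p * ln(g_p)."""
    return u_private + sum(a * math.log(gp) for a, gp in zip(alpha, g))


def simulate_productivity(periods, eps_bar=0.3, theta0=1.0, seed=0):
    """MA(1) productivity theta_t = theta0 + eps_t + eps_{t-1}.

    theta0 (baseline) and the uniform distribution of eps are illustrative
    assumptions; eps has mean zero and bounded support [-eps_bar, eps_bar].
    """
    rng = random.Random(seed)
    eps = [rng.uniform(-eps_bar, eps_bar) for _ in range(periods + 1)]
    return [theta0 + eps[t] + eps[t - 1] for t in range(1, periods + 1)]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    m = sum(series) / n
    var = sum((s - m) ** 2 for s in series) / n
    cov = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n)) / n
    return cov / var


alpha = [0.5, 0.3, 0.2]   # hypothetical ideal shares (sum to one)
b = 1.0                   # fixed per-capita budget
theta = [1.0, 1.0, 1.0]   # productivity held at a common value for this example

x_prop = [0.4, 0.24, 0.16]  # invest 0.8 in proportion to alpha ...
x_alt = [0.3, 0.3, 0.2]     # ... or the same 0.8 split differently

print(f"rent: {rent(b, x_prop):.2f}")
print(f"utility, proportional: {utility(alpha, public_goods(theta, x_prop)):.4f}")
print(f"utility, alternative:  {utility(alpha, public_goods(theta, x_alt)):.4f}")

path = simulate_productivity(100_000)
print(f"lag-1 autocorrelation: {autocorrelation(path, 1):.3f}")  # near 0.5
print(f"lag-2 autocorrelation: {autocorrelation(path, 2):.3f}")  # near 0
```

The lag-1 autocorrelation is one half because consecutive periods share one innovation out of two. The lag-2 autocorrelation is zero because no innovation survives two periods.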

II.B. Elections and Information
The incumbent faces re-election at the end of each period. If ousted, he will never return to power. Politicians cannot make policy commitments, so the election is not based on campaign promises but on retrospective evaluation of the incumbent's track record. Voters observe directly neither the incumbent's competence nor his actions. Their inference is based on an imperfect signal of public-good provision. The textbook model of career concerns assumes that voters observe policy outcomes with additive noise. We assume instead that voter information is binary. An informed voter perfectly observes the vector $g_t$ of realized public goods. An uninformed voter receives no informative signal of $g_t$, or proves completely incapable of understanding information about $g_t$. 4 The electorate consists of a continuum of atomistic voters, partitioned into J groups. Group j comprises a fraction $\omega^j$ of voters, whose preferences are described by the vector $\alpha^j$ of their ideal shares. The share of group-j members who are informed about public-good provision is a random variable $\iota^j_t$, independent and identically distributed over time. Our model is robust to an arbitrary correlation of information across voters. 5 We measure voter information by the expected share of informed voters $\iota^j = E\,\iota^j_t$. We allow for an intensive margin of political support, following the probabilistic voting approach. Each voter's preferences consist of two independent elements. First, agents have preferences over the public goods they expect either politician (the incumbent I or the challenger C) to provide in the following period. These preferences are summarized by the difference

$$\Delta^i_t = E^i_t \sum_{p=1}^{P} \alpha^i_p \left( \ln g^I_{p,t+1} - \ln g^C_{p,t+1} \right),$$

where $E^i_t$ denotes the rational expectation given voter i's information. Second, voters have preferences for candidates' characteristics other than their competence: e.g., personal likability or party ideology. 
These preferences can be decomposed into an aggregate shock $\eta_t$ and an idiosyncratic shock $\eta^i_t$ that is independent and identically distributed across voters. Voting is costless and all voters cast a ballot for their preferred candidate. Thus, voter i votes for the incumbent if and only if $\Delta^i_t + \eta_t + \eta^i_t \geq 0$. As in Baron (1994) and Grossman and Helpman (1996), informed voters cast their ballot based on observed policy outcomes, while uninformed voters choose which candidate to support purely on the basis of preferences unrelated to competence. 6 The distribution of the shocks $\eta_t$ and $\eta^i_t$ is symmetric around zero, so voters do not systematically favor incumbents or challengers. We assume that the two shocks are uniformly distributed: $\eta_t \sim U[-1/(2\psi), 1/(2\psi)]$ and $\eta^i_t \sim U[-\xi, \xi]$. The support of preference shocks is wide enough, and the support of competence innovations $\varepsilon_{p,t}$ narrow enough, that two sets of inequalities hold. The first set ensures that every voter's ballot is imperfectly predictable, irrespective of $g_t$. The second ensures that the outcome of the election is never entirely predictable either.

4 Uninformed voters may not realize that public goods affect their utility. Such ignorance is particularly natural for public goods that yield long-run benefits. Voters may also understand the benefits of public goods, but fail to understand how they depend on the incumbent's actions and competence (Strömberg 2004).

5 Most simply, information could be uncorrelated across voters. Each voter in group j has probability $\iota^j$ of being informed. Then in every period a share $\iota^j$ of group members are informed. This assumption is consistent with imperfect sharing of information within a group (Ponzetto 2011; Ponzetto and Troiano 2014). First, agents privately acquire information. Some fail to observe $g_t$. Second, agents communicate with a finite number of neighbors. Some remain uninformed because none of their neighbors observed $g_t$. If instead information sharing is perfect, information is perfectly correlated within each group. With probability $\iota^j$ the entire group is informed ($\iota^j_t = 1$), and with probability $1 - \iota^j$ the entire group is uninformed ($\iota^j_t = 0$).

6 The standard assumption that uninformed voters vote sincerely could be attributed to their imperfect rationality (Baron 1994; Grossman and Helpman 1996). It is also consistent with full strategic rationality because a continuum of voters entails strategic insignificance: no voter can ever be pivotal.

The timeline within each period t is the following.
1. The incumbent's past competence innovations $\varepsilon_{t-1}$ become common knowledge.
2. The incumbent chooses investments $x_t$ and rent $r_t$.
3. The competence innovations $\varepsilon_t$ are realized and the provision of public goods $g_t$ is determined.
4. Voter information is realized: a share $\iota^j_t$ of members of group j perfectly observe $g_t$. The rest remain completely uninformed. No voter has any direct observation of $\varepsilon_t$, $x_t$, or $r_t$.
5. An election is held, pitting the incumbent against a single challenger, randomly drawn from the same pool of potential office-holders.
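A minimal Python sketch of probabilistic voting illustrates the election in the last step under uniform preference shocks. Its notation is as follows:

* $\omega^j$ is the population weight of group j, and the weights sum to one.
* $\iota^j_t$ is the group's realized share of informed voters.
* $\Delta^j_t$ is the perceived competence advantage of the incumbent among the group's informed members.
* The aggregate shock is uniform on $[-1/(2\psi), 1/(2\psi)]$ and the idiosyncratic shock is uniform on $[-\xi, \xi]$.

The sketch also rests on several assumptions:

* Voter i backs the incumbent when $\Delta^i_t + \eta_t + \eta^i_t \geq 0$.
* Uninformed voters perceive no competence gap.
* Vote shares are interior.

Under these assumptions the incumbent wins with probability $1/2 + \psi \sum_j \omega^j \iota^j_t \Delta^j_t$. The code checks this closed form against a Monte Carlo draw of the aggregate shock. The two-group numbers are hypothetical.

```python
import random


def incumbent_vote_share(eta, deltas, omegas, iotas, xi):
    """Incumbent's vote share for a given aggregate shock eta.

    Assumed convention: voter i backs the incumbent iff
    delta_i + eta + eta_i >= 0, with eta_i ~ U[-xi, xi]. Informed members of
    group j use delta_j; uninformed members expect no competence gap (delta = 0).
    Requires |delta_j + eta| <= xi so that every share is interior.
    """
    share = 0.0
    for delta, omega, iota in zip(deltas, omegas, iotas):
        informed = 0.5 + (delta + eta) / (2 * xi)
        uninformed = 0.5 + eta / (2 * xi)
        share += omega * (iota * informed + (1 - iota) * uninformed)
    return share


def win_probability(deltas, omegas, iotas, psi):
    """Closed form for eta ~ U[-1/(2 psi), 1/(2 psi)]: 1/2 + psi * sum_j omega_j iota_j delta_j."""
    s = sum(o * i * d for d, o, i in zip(deltas, omegas, iotas))
    return min(1.0, max(0.0, 0.5 + psi * s))


def simulated_win_probability(deltas, omegas, iotas, psi, xi, draws=100_000, seed=1):
    """Monte Carlo check: draw eta and count elections with vote share >= 1/2."""
    rng = random.Random(seed)
    half_width = 1 / (2 * psi)
    wins = 0
    for _ in range(draws):
        eta = rng.uniform(-half_width, half_width)
        if incumbent_vote_share(eta, deltas, omegas, iotas, xi) >= 0.5:
            wins += 1
    return wins / draws


# Hypothetical two-group electorate (omegas sum to one).
deltas = [0.1, -0.05]   # informed voters' perceived competence advantage
omegas = [0.5, 0.5]     # group weights
iotas = [0.8, 0.3]      # realized shares of informed voters
print(win_probability(deltas, omegas, iotas, psi=1.0))
print(simulated_win_probability(deltas, omegas, iotas, psi=1.0, xi=2.0))
```

In this sketch the idiosyncratic spread $\xi$ cancels out of the winning probability and only matters for keeping vote shares interior. The probability rises with the informed share of any group whose informed members favor the incumbent.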

II.C. Political Career Concerns
Voters rationally expect every politician to choose the stationary investment $\hat{x}$. The equilibrium allocation is time-invariant because the environment is stationary. It does not vary with the incumbent's observed skills $\varepsilon_{t-1}$ because performance is separable in effort and ability. It cannot vary with the competence innovations $\varepsilon_t$ because they are unknown to the politicians themselves when they make policy choices. 7 Thus, the outcome of the election affects future public-good provision only through differences in politicians' skills. No information exists about future competence innovations (either the incumbent's $\varepsilon_{t+1}$ or the challenger's $\varepsilon^C_{t+1}$), nor about the challenger's current ability ($\varepsilon^C_t$). Thus, their expectation is nil for all voters. Uninformed voters cannot infer the incumbent's ability from realized public-good provision and retain the unconditional expectation $E\,\varepsilon_{p,t} = 0$. 8 Informed voters, instead, can infer the incumbent's ability from their knowledge of public-good provision. In a rational-expectation equilibrium their inference is perfectly accurate ($x_{p,t} = \hat{x}_p$ entails $E(\varepsilon_{p,t} \mid g_{p,t}) = \varepsilon_{p,t}$). From the politician's perspective, the probability of re-election as a function of his policy choices is derived in Appendix C. The incumbent faces a trade-off. Investing in public goods reduces his rents but increases his chances of re-election by raising informed voters' inference of his ability. A politician who values re-election at R chooses his rent extraction to balance this trade-off: for a given R, better informed voters induce more investment and lower rents. In a dynamic equilibrium, the value of re-election R is the expected present value of future rents from holding office. In a rational-expectation equilibrium voters cannot be fooled ($\hat{x}_p = x_{p,t}$). Then in every election the incumbent wins with probability 1/2. Voter preferences are not exogenously biased in favor of incumbents or against them (the distribution of $\eta_t$ and $\eta^i_t$ is symmetric around zero). 
An endogenous incumbency advantage does not arise because politicians' ability evolves as a first-order moving-average process. 9 As a consequence, a politician who rationally anticipates extracting rent r whenever in office has an expected net present value of re-election R.

8 We assume that uninformed voters vote sincerely based on their unconditional expectation because they are strategically insignificant or imperfectly rational (Baron 1994; Grossman and Helpman 1996). With a finite number of voters, an uninformed voter with full strategic rationality would instead care about his vote only when it is pivotal. In the limit as the number of voters diverges, uninformed voters would vote strategically based on expected ability conditional on an exactly tied election. In the equilibrium of our model, this conditional expectation remains $E^i \varepsilon_{p,t} = 0$ given that the aggregate taste shock $\eta_t$ is uniformly distributed on a sufficiently large support. Thus, we could equivalently assume that uninformed voters have a pivotal-voting motivation, provided they cannot infer the aggregate taste shock $\eta_t$ from their own tastes $\eta_t + \eta^i_t$, either because the idiosyncratic shock is diffuse ($\xi \to \infty$) or because their Bayesian reasoning is imperfect.

9 The impact of each competence shock lasts for two periods only, so past screening of incumbents does not translate into a forward-looking electoral advantage as it does with longer-lasting competence shocks (Ashworth and Bueno de Mesquita 2008). If the period-t incumbent was re-elected at t - 1, the expectation of current productivity $\theta_t$ is above average. Senior party leaders have proved their competence and won re-election. However, their known ability $\varepsilon_{t-1}$ is orthogonal to future performance $\theta_{t+1}$ because they are about to retire. A new cohort leads the party into the period-t election. Their skills $\varepsilon_t$ can be inferred from policy outcomes $g_t$, but not from the past re-election of their retiring colleagues.

II.D. Government Accountability from Voter Information
Let $\rho \equiv r/b \in [0, 1]$ denote the fraction of the budget allocated to rents. The unique stationary rational-expectation equilibrium has the following characterization. 10

Lemma 1. In equilibrium, ruling politicians extract rents $\rho$ and have expected ability $E\hat{\theta}_{p,t}$. Rent extraction is a decreasing and convex function of voter information ($\partial\rho/\partial\iota^j < 0$ and $\partial^2\rho/\partial(\iota^j)^2 > 0$). An increase in voter information $\iota^j$ increases the ability of ruling politicians $\hat{\theta}_{p,t}$ in the sense of first-order stochastic dominance.

Better information improves government accountability because it enables voters to monitor politicians more closely. It alleviates both the moral-hazard problem of politicians' incentives and the adverse-selection problem of politicians' selection. Voters can reward public-good provision only when they perceive it accurately. As voter knowledge improves, the incumbent's performance more closely determines his chances of re-election. Ex ante, he extracts lower rents because his career concerns are heightened ($\partial\rho/\partial\iota^j < 0$). Ex post, the average ability of ruling politicians rises because electoral screening improves ($\partial E\hat{\theta}_{p,t}/\partial\iota^j > 0$). 11

10 All proofs are provided in Appendix C.

11 Voters have no incentives to acquire information in order to improve governance because of the rational-voter paradox. Each voter has a negligible likelihood of determining the outcome of the election. His strategic incentives to become informed are likewise negligible. Therefore, information $\iota^j$ reflects exogenous voter characteristics. E.g., education enables voters to grasp the precise role of politicians in providing public goods; social capital reflects civic involvement and the ability to share political knowledge in a wide social network (Ponzetto and Troiano 2014).
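The expected net present value of re-election follows from a one-line recursion. We sketch it here under the timing described above, and the closed form is our own derivation. Under that timing, a re-elected politician extracts rent r in the next period. He then wins the following election with probability 1/2, and each period is discounted by the discount factor $\beta$.

```latex
R = \beta\left(r + \tfrac{1}{2}R\right)
\quad\Longrightarrow\quad
R = \frac{2\beta}{2-\beta}\, r .
```

With $\beta = 1$ this gives $R = 2r$: a re-elected incumbent expects two further periods in office. Because R is proportional to r, lower equilibrium rents reduce the value of office. This is the indirect effect through which a permanent improvement in monitoring partly offsets itself.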
The key result in Lemma 1 is that rent extraction is decreasing but convex in voter information ($\partial^2\rho/\partial(\iota^j)^2 > 0$). 12 Decreasing returns to monitoring follow from the dynamic nature of the politicians' problem. The direct impact of voter information on rent extraction is linear (equation 10). For a given value of re-election R, more informed voters induce one-for-one more investment and lower political rents. A transitory one-period increase in voter information would have no other effect, but a permanent increase in voter information has an indirect effect too. Politicians expect tighter monitoring if they are re-elected, so the expected future rents from holding office decrease. Their decline reduces the incentives to refrain from extracting rents and mitigates the direct effect of improved monitoring. Current rent extraction is more sensitive to the expectation of future rents when voters' average information is higher. Thus, a marginal increase in voters' information causes a smaller decline in rent extraction when the share of informed voters is higher to begin with. 13

A large body of evidence confirms that the quality of government is higher if citizens are more educated and politicians are subject to greater media scrutiny (e.g., Glaeser et al. 2004; Svensson 2005; Glaeser and Saks 2006; Snyder and Strömberg 2010). While none of these studies has explored specifically the concavity of this relationship, the data provide suggestive empirical support for our prediction. Svensson (2005) documents that low human capital is the best predictor of high corruption across countries. Consistent with Lemma 1, Figure 1 shows that corruption is not only a decreasing but also a convex function of the share of people with a tertiary education. A similar relationship emerges in Figure 2, where we proxy information with newspaper circulation instead. Both results are robust to controlling for income. 14

12 Other determinants of the quality of government are straightforward. More patient politicians are more willing to reduce rent extraction in order to raise their chances of re-election ($\partial\rho/\partial\beta < 0$). A higher variance of politicians' ability raises the gains from screening ($\partial E\hat{\theta}_{p,t}/\partial\sigma^2 > 0$). Both incentives and screening improve when voters care more about competence than about other determinants of political popularity ($\partial\rho/\partial\psi < 0$ and $\partial E\hat{\theta}_{p,t}/\partial\psi > 0$).

13 Extreme cases highlight decreasing returns to monitoring with particular clarity. If no voters are informed, career concerns are absent and rent extraction is unchecked ($\iota^j = 0 \Rightarrow \rho = 1$). Introducing a little monitoring induces a forceful reaction by politicians who are afraid of losing very large rents. If all voters are informed, career concerns are at their strongest, but rent extraction cannot be reduced to zero ($\iota^j = 1 \Rightarrow \rho > 0$). Incumbents always extract some rents: only the appeal of future rents induces them to make any productive investment. Marginally worsening perfect monitoring causes a small loss.

14 The multivariate regressions [...] for newspaper circulation (across 100 countries) control for income through $\ln y_l$, the log of real GDP per capita. $\text{Corruption}_l \in [-2.5, 2.5]$ is the negative of the Control of Corruption index from the Worldwide Governance Indicators, averaged across available years (1996-2013). Real GDP per capita is from the Penn World Table 8.0, measured in 1970 following Svensson (2005). The share of people over 25 with a tertiary education is from Barro and Lee's dataset version 2.0, also measured in 1970. Newspaper circulation per capita is from the World Development Indicators, averaged across available years (1997-2005).
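The dynamic mechanism behind decreasing returns to monitoring can be illustrated with a deliberately simplified Python sketch. It is not the model's equilibrium condition. The sketch rests on two assumptions:

* A hypothetical linear best response, $\rho = 1 - c\,\iota\,R/b$, with an illustrative slope c. It captures only the stated property that, for a given value of re-election R, the direct effect of voter information $\iota$ is linear.
* A value of office proportional to equilibrium rents, $R/b = 2\beta\rho/(2-\beta)$. This follows from discounting rents at $\beta$ when each election is won with probability 1/2.

Iterating the best response to its fixed point gives $\rho(\iota) = 1/(1 + k\iota)$ with $k = 2c\beta/(2-\beta)$. This function is decreasing and convex, with $\rho(0) = 1$ and $\rho(1) > 0$, matching the extreme cases in footnote 13.

```python
def value_of_office(rho: float, beta: float) -> float:
    """R/b = 2*beta*rho / (2 - beta): discounted rents when each election is won w.p. 1/2."""
    return 2.0 * beta * rho / (2.0 - beta)


def best_response(r_over_b: float, iota: float, c: float) -> float:
    """Hypothetical linear best response: for a given R, rents fall linearly in iota."""
    return max(0.0, 1.0 - c * iota * r_over_b)


def equilibrium_rent(iota: float, beta: float = 0.9, c: float = 0.5,
                     tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Find rho with rho = best_response(value_of_office(rho)) by fixed-point iteration."""
    rho = 1.0  # start from unchecked rent extraction
    for _ in range(max_iter):
        new_rho = best_response(value_of_office(rho, beta), iota, c)
        if abs(new_rho - rho) < tol:
            return new_rho
        rho = new_rho
    raise RuntimeError("fixed-point iteration did not converge")


def closed_form_rent(iota: float, beta: float = 0.9, c: float = 0.5) -> float:
    """Solving the fixed point by hand gives rho = 1 / (1 + k * iota)."""
    k = 2.0 * c * beta / (2.0 - beta)
    return 1.0 / (1.0 + k * iota)


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
rents = [equilibrium_rent(i) for i in grid]
for iota, rho in zip(grid, rents):
    print(f"iota={iota:.2f}  rho={rho:.4f}  closed form={closed_form_rent(iota):.4f}")
# Successive declines shrink: decreasing returns to monitoring.
print([round(a - b, 4) for a, b in zip(rents, rents[1:])])
```

With the default parameters, rents fall from 1 with no informed voters to 0.55 with fully informed voters, and each step of the grid yields a smaller decline than the previous one. Holding R fixed instead would make the decline linear in $\iota$. The curvature comes entirely from the value of office shrinking along with equilibrium rents.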

[FIGURES 1 AND 2 AROUND HERE]
Our finding that government accountability is an increasing but concave function of voter information has a broader theoretical underpinning. The mechanism in Lemma 1 applies to any determinant of electoral discipline. Information, however, has an additional source of concavity: it can be shared among voters. The share $\iota^j_t$ of informed voters then results from a two-stage process (Ponzetto 2011; Ponzetto and Troiano 2014). First, it includes those who acquired information directly, for instance because they read newspapers or because their human capital enables them to assess politicians' performance accurately. Second, it includes those who did not acquire information directly but obtained it from an informed neighbor. Overall, the expected share of informed voters $\iota^j$ is an increasing and concave function of the probability that each voter acquires information directly, because one voter's knowledge has greater spillovers if his neighbors are less informed. 15
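The concavity created by information sharing can be seen in a simple hypothetical specification, sketched below in Python. We assume that a voter learns directly with probability q. Each of his n neighbors also learns directly and independently with probability q, and a voter is informed if he or any neighbor learned. The probability of being informed is then $1 - (1-q)^{n+1}$. This functional form is our illustrative assumption, and the grid of q values is arbitrary.

```python
def informed_share(q: float, n: int) -> float:
    """Probability that a voter ends up informed (hypothetical specification).

    The voter learns directly with probability q, or hears from at least one of
    n neighbors, each learning directly and independently with probability q.
    """
    return 1.0 - (1.0 - q) ** (n + 1)


def marginal_gains(n: int, grid: list[float]) -> list[float]:
    """Increase in the informed share between consecutive points of the grid."""
    values = [informed_share(q, n) for q in grid]
    return [b - a for a, b in zip(values, values[1:])]


grid = [i / 10 for i in range(11)]   # direct-acquisition probabilities 0.0, 0.1, ..., 1.0
for gain in marginal_gains(3, grid):
    print(f"{gain:.4f}")             # gains shrink as q rises: concavity
```

Each extra unit of direct information acquisition raises the informed share by less when neighbors are already likely to be informed. This is the spillover argument in the text. With no neighbors (n = 0) the relationship is linear, so the concavity here comes entirely from sharing.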

III. Should Government Be Decentralized?
We turn now to our motivating question. Should different regions have different governments whenever there are no spillovers, in accordance with Oates's (1972) classic Decentralization Theorem? When can we expect decentralization to deliver the benefits Alex Salmond touted to Scotland's voters? When will centralization instead curb the graft and mismanagement of local governments, as with welfare spending and the New Deal (Wallis, Fishback, and Kantor 2006)? The key to our answer is that regions differ along several dimensions. They have different preferences but also different levels of voter information.
We consider an economy composed of L regions, each populated by a unit measure of voters. Preferences are homogeneous within each region, but heterogeneous across regions (Tiebout 1956; Oates 1972). E.g., residents of conservative "red states" may prefer greater spending on defence, justice and police, while residents of progressive "blue states" may prefer instead environmental protection, public education, and welfare spending. Our novel contribution lies in studying at the same time differences in voter information. E.g., states with more educated residents have a higher expected share of informed voters, while voters in less educated states are less likely to assess government performance accurately.
Formally, we assume that each region's preference vector is an independent draw from a distribution that is symmetric across goods, so the marginal distribution of each good's ideal share is the same for all goods p and has mean 1/P. 16 Then, preference heterogeneity can range between two limit cases. It is nil when every ideal share equals 1/P deterministically. In this limit case of perfectly homogeneous preferences, everyone desires the same uniform basket of public goods. At the opposite extreme, preference heterogeneity is maximized when each ideal share has a Bernoulli distribution, equal to 1 with probability 1/P and to 0 otherwise. In this limit case of maximum preference heterogeneity, each region desires a single idiosyncratic public good, so the same good yields utility to two regions with negligible probability 1/P. A series of mean-preserving spreads gradually spreads out the distribution of preferences from the first limit case to the second. We parametrize the distribution of preferences by a nonnegative homogeneity parameter such that the distribution becomes less dispersed as the parameter increases, spanning the whole feasible range. That is, an increase in preference homogeneity entails a mean-preserving contraction of the ideal shares. The limit case of maximum preference heterogeneity corresponds to a preference-homogeneity parameter of 0, and the limit case of perfectly homogeneous preferences to a parameter tending to infinity. 17

Information is independent of preferences. Each region's expected share of informed voters is an independent draw from a distribution whose mean lies strictly between 0 and 1. Then, information heterogeneity can range between two limit cases. It is nil when every region's expected share of informed voters equals this mean deterministically. In this limit case of perfectly homogeneous information, every region has the same expected share of informed voters. At the opposite extreme, information heterogeneity is maximized when each region's expected share of informed voters has a Bernoulli distribution, equal to 1 with probability equal to the mean. In this limit case of maximum information heterogeneity, a fraction of regions equal to the mean is perfectly informed (expected informed share of 1), while the remainder is completely uninformed (expected informed share of 0). A series of mean-preserving spreads gradually spreads out the distribution of information from the first limit case to the second. We parametrize the distribution of information by a second nonnegative homogeneity parameter such that the distribution becomes less dispersed as the parameter increases, spanning the whole feasible range. That is, an increase in information homogeneity entails a mean-preserving contraction of the regional shares of informed voters. The limit case of maximum information heterogeneity corresponds to an information-homogeneity parameter of 0, and the limit case of perfectly homogeneous information to a parameter tending to infinity. 18

In a decentralized system, each region l forms a separate constituency with its own expected share of informed voters. It has an independent local government that allocates the regional budget b. Local politicians, whose skills are specific to region l, good p, and period t, invest in the provision of local public goods $x^D_{l,p,t}$ and extract rent $r^D_{l,t} = b - \sum_{p=1}^{P} x^D_{l,p,t}$.

15 If each agent obtains information directly with some probability and shares it within a group of n neighbors, his eventual probability of being informed is one minus the probability that nobody in the group acquires information directly.

16 We abstract from differences between the sample distribution and the population distribution by considering the limit case of a continuum of regions.

17 E.g., each region's preference vector could have a symmetric Dirichlet distribution on the regular (P − 1)-simplex with a positive concentration parameter. Our results do not rely on this particular specification.

18 E.g., each region's expected share of informed voters could have a beta distribution with mean equal to the average informed share and a given sample size (i.e., confidence). Our results do not rely on this particular specification.
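Footnotes 17 and 18 suggest one possible specification for each distribution: a symmetric Dirichlet for preferences and a beta distribution parametrized by mean and sample size for information. The sketch below, which uses only the Python standard library, samples from both and reports closed-form variances, illustrating that raising the concentration (or sample-size) parameter shrinks dispersion while leaving the mean unchanged, i.e., a mean-preserving contraction. Treating these parameters as the homogeneity parameters, the choice P = 4, a mean informed share of 0.5, and all function names are assumptions made for illustration.

```python
import random
import statistics


def sample_symmetric_dirichlet(P: int, concentration: float, rng: random.Random) -> list[float]:
    """One preference vector of P ideal shares summing to 1, drawn from a symmetric
    Dirichlet by normalizing independent Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(P)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_beta_mean_size(mean: float, size: float, rng: random.Random) -> float:
    """One expected informed share from Beta(mean * size, (1 - mean) * size)."""
    return rng.betavariate(mean * size, (1.0 - mean) * size)


def dirichlet_share_variance(P: int, concentration: float) -> float:
    """Variance of a single ideal share; its mean is always 1/P."""
    m = 1.0 / P
    return m * (1.0 - m) / (P * concentration + 1.0)


def beta_variance(mean: float, size: float) -> float:
    """Variance of Beta(mean * size, (1 - mean) * size); its mean is always `mean`."""
    return mean * (1.0 - mean) / (size + 1.0)


rng = random.Random(0)
for c in (0.5, 5.0, 50.0):
    first = [sample_symmetric_dirichlet(4, c, rng)[0] for _ in range(5000)]
    print(f"concentration={c}: mean={statistics.fmean(first):.3f}, "
          f"variance={dirichlet_share_variance(4, c):.4f}")
for size in (0.5, 5.0, 50.0):
    print(f"sample size={size}: variance={beta_variance(0.5, size):.4f}")
```

As the concentration parameter goes to 0 the Dirichlet piles each region's weight onto a single good, and as the beta sample size goes to 0 the informed share collapses to 0 or 1, matching the two limit cases of maximum heterogeneity described in the text.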
Under centralization, instead, the central government is elected by a single unified constituency whose share of informed voters equals the average of the L regional shares. We rule out economies of scale: the central-government budget equals the sum bL of the regional budgets. Central politicians, whose skills are specific to good p and period t, choose expenditures $x^C_{l,p,t}$ for each public good p in each region l and extract rent $r^C_t = bL - \sum_{l=1}^{L}\sum_{p=1}^{P} x^C_{l,p,t}$. The central government may be required to provide public goods uniformly across regions ($g^C_{l,p,t} = g^C_{p,t}$ for all l), either by a technological or by a constitutional constraint (Oates 1972; Alesina and Spolaore 2003). Conversely, it may be able to allocate spending across regions with complete discretion (Lockwood 2002; Besley and Coate 2003).
Different government structures admit the following ranking in terms of aggregate social welfare.
Proposition 1 Aggregate social welfare is higher under decentralization than under centralization without a uniformity constraint. It is highest under centralization with a uniformity constraint if and only if preferences are sufficiently homogeneous (the preference-homogeneity parameter is at least a threshold value). Centralization is more likely to be optimal when information is more heterogeneous (the threshold is increasing in the information-homogeneity parameter) and politicians' ability less variable (the threshold is increasing in the variance of politicians' ability).
Centralization unambiguously reduces total rents when different regions have different information. Merging heterogeneous regions creates a single polity whose level of voter information equals the average across regions. Aggregate rent extraction declines because it is a convex function of voter information, as established in Lemma 1: by Jensen's inequality, rents evaluated at the average level of information are lower than the average of rents across regions. Nonetheless, centralization reduces welfare if the central government can operate without a uniformity constraint that requires public goods to be provided identically in all regions. Office-seeking politicians target government spending to the most politically influential regions. In our model, influence stems from information. Absent a uniformity constraint, central-government spending in each region is proportional to that region's voter information.
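The convexity argument can be illustrated numerically. The rent function below, r(s) = 1/(1 + 4s) where s is a region's expected share of informed voters, is a hypothetical stand-in chosen only because it is decreasing and convex, equals 1 when nobody is informed, and stays positive when everybody is; it is not the equilibrium expression from Lemma 1. With half the regions fully informed and half uninformed, pooling them into one constituency at the average information level cuts average rents, as Jensen's inequality implies.

```python
def rent_share(informed: float, k: float = 4.0) -> float:
    """Hypothetical rent-extraction function: decreasing and convex in the
    expected share of informed voters, equal to 1 at 0 and positive at 1."""
    return 1.0 / (1.0 + k * informed)


def decentralized_rents(shares: list[float]) -> float:
    """Average rent share when every region elects its own government."""
    return sum(rent_share(s) for s in shares) / len(shares)


def centralized_rents(shares: list[float]) -> float:
    """Rent share when one constituency pools all regions' information."""
    return rent_share(sum(shares) / len(shares))


# Maximum information heterogeneity: half the regions informed, half not.
regions = [1.0, 0.0, 1.0, 0.0]
print(round(decentralized_rents(regions), 4))  # 0.6
print(round(centralized_rents(regions), 4))    # 0.3333
```

When every region has the same informed share the two numbers coincide, so in this sketch the rent reduction from centralization comes entirely from heterogeneity in information.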
This equilibrium allocation features harmful regressive redistribution. Independent local governments extract larger rents and provide fewer public goods in less informed regions. Centralization without uniformity further reduces public-good provision in these regions, increasing it instead in better informed ones. Then, aggregate social welfare declines even though the total provision of public goods rises as aggregate rent extraction falls. By contrast, with a uniformity constraint the decrease in rents is accompanied by progressive redistribution that raises welfare further. Centralization slightly increases rent extraction in better informed regions, but greatly reduces it in less informed ones, which have a higher marginal utility of public goods because their local government is worse. Intuitively, the uninformed gain from outsourcing government monitoring to better-informed voters in other regions. The informed can also share in the accountability gains from centralization if a uniformity constraint is imposed on some goods but not others. Then the uninformed enjoy greater accountability in the provision of uniform public goods, and the informed enjoy greater influence over the provision of discretionary ones. Appendix B shows formally how such partial uniformity can make centralization a Pareto improvement, albeit at the cost of sacrificing welfare maximization. 19

The key result in Proposition 1 is that the welfare-maximizing government structure for heterogeneous regions reflects a trade-off between greater preference-matching under decentralization and greater accountability under centralization. On the one hand, the central government must be required to provide public goods uniformly, so centralization sacrifices the ability to tailor local public goods to local preferences. The more regions differ in their ideal allocation, the greater the costs of political integration. Thus, preference heterogeneity is a centrifugal force.
On the other hand, rent extraction falls when the most informed regions hold politicians accountable for everyone. The more regions differ in their monitoring ability, the greater the benefits of political integration. Thus, information heterogeneity is a centripetal force.
If tastes are similar enough across regions, centralization maximizes welfare despite the absence of externalities or economies of scale. Centralization is more likely to be optimal the more information varies across regions. So long as information is not perfectly homogeneous, centralization can be optimal even when preferences are similar but not identical (the preference-homogeneity parameter is finite).
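The trade-off between preference-matching and accountability can be made concrete with a deliberately stripped-down toy, which is not the paper's model: two regions, two public goods, log utility, a budget of 1 per region, the same hypothetical rent function r(s) = 1/(1 + 4s) as a stand-in for Lemma 1, and no role for politicians' skills or screening. Region 1 puts weight a on the first good and 1 − a on the second; region 2 mirrors it, so a = 1/2 means identical tastes and a = 1 means fully idiosyncratic tastes. Local governments tailor their basket; a uniform central government provides the same 50/50 basket everywhere; a discretionary central government tailors baskets but, as described above, allocates spending across regions in proportion to voter information.

```python
import math


def rent_share(informed: float) -> float:
    """Hypothetical rent function (decreasing, convex): a stand-in for Lemma 1."""
    return 1.0 / (1.0 + 4.0 * informed)


def log_utility(weights: tuple[float, float], amounts: tuple[float, float]) -> float:
    """Sum of w_p * log(g_p); goods a region does not value contribute nothing."""
    return sum(w * math.log(g) for w, g in zip(weights, amounts) if w > 0)


def welfare(a: float, informed: tuple[float, float]) -> dict[str, float]:
    """Aggregate welfare of two mirror-image regions under three regimes."""
    prefs = [(a, 1.0 - a), (1.0 - a, a)]
    # Decentralization: each region spends its own net budget on its ideal basket.
    decentralized = 0.0
    for w, s in zip(prefs, informed):
        net = 1.0 - rent_share(s)
        decentralized += log_utility(w, (w[0] * net, w[1] * net))
    # The central government is monitored at the average information level.
    central_net = 1.0 - rent_share(sum(informed) / 2)
    # Uniform centralization: the same 50/50 basket in both regions.
    uniform = sum(log_utility(w, (central_net / 2, central_net / 2)) for w in prefs)
    # Discretionary centralization: tailored baskets, but total spending
    # 2 * central_net is split across regions in proportion to information.
    discretionary = 0.0
    for w, s in zip(prefs, informed):
        regional = 2 * central_net * s / sum(informed)
        discretionary += log_utility(w, (w[0] * regional, w[1] * regional))
    return {"decentralized": decentralized, "uniform": uniform, "discretionary": discretionary}


def best_regime(a: float, informed: tuple[float, float]) -> str:
    """Name of the welfare-maximizing regime in this toy."""
    scores = welfare(a, informed)
    return max(scores, key=scores.get)


for a in (0.6, 0.8, 0.9):
    print(a, best_regime(a, (0.2, 0.8)), best_regime(a, (0.1, 0.9)))
```

In this toy, at a = 0.8 the uniform central government wins only under the wider (0.1, 0.9) information gap, echoing the claim that centralization is more likely to be optimal when information is more heterogeneous; the discretionary regime never beats decentralization for the parameters tried. These are properties of the illustrative numbers, not general results.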
The final result in Proposition 1 reflects the cost of uniformity in government competence. Under decentralization, each region selects, to the best of its imperfect screening ability, ruling politicians who are most talented at providing the public goods the region finds most important. The central government, instead, has average skills that try to satisfy all regions but truly fit none. When the variance of politicians' ability is greater, so is the cost of such uniformity. Then, centralization becomes less appealing because it distorts the allocation of talent but has no impact on average screening. This invariance, however, follows from the assumption that voter information about public goods is independent of the level of government that provides them. This assumption is realistic to the extent that voter knowledge reflects individual characteristics such as human capital, social capital, or civic engagement. Yet voter information also reflects differences in media coverage, which plausibly varies with political integration. In particular, the media are more likely to report on centralized policies because they concern a broader audience (Gentzkow 2006; Snyder and Strömberg 2010). Such an increase in reporting would entail additional efficiency gains from centralization, through better selection as well as better incentives (Glaeser and Ponzetto 2014). Then, greater variance in politicians' ability might make political integration more appealing, rather than less.
Do the theoretical results in Proposition 1 have counterparts in the real world? We certainly cannot prove empirically whether the European Union or an increasing federal share of U.S. government spending is good or bad. There is, however, evidence supporting the key points in our model: discretionary spending by the central government can shortchange less informed groups; decentralized control has often been associated with corruption and limited political accountability; the benefits of centralization are often greater for less informed populations; and decentralization has been more successful where accountability varies less across regions. Strömberg (2004) studies the allocation of discretionary government spending during the New Deal and documents that state governors favored counties with a greater share of radio listeners, and so with better informed voters. If we accept his identifying assumption that ground connectivity and woodland cover have no direct effect on the effectiveness of government expenditure, it follows that voter information alone is driving these differences in public spending across space. The tendency of discretionary spending to follow knowledge is precisely why Proposition 1 finds that discretion is bad.
The downsides of discretion may also explain why uniformity is common in many government policies. It may seem counterintuitive that U.S. federal housing policy should offer similar subsidies to building in areas where supply is constrained, like New York City, and areas where supply seems almost unlimited, like Houston. One explanation for spatial uniformity is that the tendency of locational discretion to harm particular regions is well understood.
The fundamental downside of decentralization in our model is that it leads to less accountability and more corruption. We know of no studies that clearly illustrate the relative corruption of national versus local governments in the United States and Europe. However, the history of America's state and city governments is consistent with our theoretical prediction.
At the turn of the twentieth century, the governments of large American cities were infamous for their corruption. New York's "Boss Tweed" and his formidable Tammany Hall machine live on in popular memory as epitomes of organized graft in local government. 20 Other cities had equally corrupt administrations, a major theme of the progressive movement (Steffens 1904). 21 This urban experience was very far from Tiebout's (1956) and Oates's (1972) vision of local governments responding tightly to the desires of their residents.
Federal intervention eradicated the corrupt manipulation that had characterized U.S. local politics, at least in the context of welfare spending. Until the Great Depression, poverty relief managed by states and localities was a byword for patronage and graft. The New Deal, the most dramatic episode of centralization in the history of the United States, introduced strict federal oversight of welfare programs. One consequence was a striking decrease in corruption (Wallis, Fishback, and Kantor 2006).
While city politics cleaned up after the New Deal, state governments remained notorious for corruption (Wilson 1966). Since the Second World War, ten governors and nine members of state executives have been convicted for official corruption and sentenced to jail. No member of the federal cabinet, let alone a president, has been charged with crimes investigated as part of the federal prosecution of public corruption.
Contemporary cross-country studies have yielded conflicting and inconclusive results on the relationship between decentralization and corruption (Treisman 2007). Historical evidence from around the world, however, shows that political integration often had a positive impact on government accountability. Centralized political institutions in precolonial Africa reduced corruption and fostered the rule of law. They caused a long-lasting increase in the provision of public goods that endured into the postcolonial period (Gennaioli and Rainer 2007). Fiscal centralization was a key element in the modernization of European states. It proved a necessary step for the consolidation of state capacity, which was in turn a critical determinant of economic and political development (Besley and Persson 2011; Dincecco 2011). Blanchard and Shleifer (2001) argue that China grew faster than Russia in recent decades thanks to the greater strength of its central government vis-à-vis local politicians.
Proposition 1 predicts not only that centralization should reduce rent extraction, but that these accountability benefits should flow mostly to the least informed regions, as long as the central government enacts a uniform policy. Empirical evidence on reforms to public education systems bears out this prediction. In the early 1990s, Argentina transferred control of federal secondary schools to provincial governments. Student test scores rose in richer municipalities, but failed to rise or even fell in poor ones (Galiani, Gertler and Schargrodsky 2008). Decentralization increased inequality and harmed those already disadvantaged. A 1998 university reform in Italy transferred responsibility for faculty hiring from the national ministry to individual universities. Faculty hires became significantly more nepotistic in provinces with low newspaper readership. Those with higher readership experienced at best a marginal improvement (Durante, Labartino, and Perotti 2014). Decentralization worsened the quality of academic recruitment and hurt the least informed regions the most.
Environmental policy in the United States also provides suggestive support for our theoretical prediction. The Clean Air Act of 1970 transferred responsibility for pollution regulation from the state and local governments to the federal Environmental Protection Agency. Relative to pre-existing trends, pollutant emissions began to decline considerably faster in states with lower newspaper circulation (we provide a formal difference-in-differences analysis in Boffa, Piolatto and Ponzetto [2014]).
The conclusion of Proposition 1 is that decentralization is desirable only if accountability is relatively homogeneous across regions. Our finding is consistent with historical evidence on the formation of unified nation-states in Germany and Italy. Both countries were unified in the second half of the nineteenth century: the Kingdom of Italy was established in 1861 and the German Empire in 1871. Before unification, Germany comprised many modern and well-functioning states. In Italy, the quality of pre-unitary institutions was lower and more heterogeneous. The Kingdom of Sardinia, which led the process of unification, could be considered the only efficient modern state. Consistent with our theory, these different patterns of institutional quality before unification can explain why Germany was conceived as a federal nation-state and Italy as a unitary one (Ziblatt 2006). Remarkably, both the degree of centralization and the underlying differences in accountability have remained larger in Italy than in Germany up to the present day, excepting the tragic parenthesis of German centralization under Nazism.

IV. How Many Levels of Government Should There Be?
The classic theory of fiscal federalism studies "which functions and instruments are best centralized and which are best placed in the sphere of decentralized levels of government" (Oates 1999, p. 1120). This standard approach suggests that there should be as many levels of government as there are geographic units a function is optimally tied to. Evidence from local governments in the United States, however, paints a different picture. Special-purpose districts managing individual public services for different and overlapping areas have performed poorly in terms of efficiency and accountability (Berry 2009). In this section, we explain why the proliferation of government tiers can harm welfare and we study when it is optimal to create a federal structure in which some policy decisions are centralized and others decentralized.
We assume the same distribution of voter information as in Proposition 1, with the same mean and information-homogeneity parameter. However, we now consider two kinds of public goods at the opposite extremes of preference heterogeneity. First, there is a set of public goods for which all regions have perfectly homogeneous preferences (the preference-homogeneity parameter tends to infinity). By Proposition 1, these public goods would best be provided by a central government if there were no other policy choices. For the second set of public goods, preferences are completely idiosyncratic (the preference-homogeneity parameter equals 0 and P tends to infinity). Each region benefits exclusively from its own ideal variety, and derives no utility at all from any of the L − 1 ideal varieties of the other regions. Absent other policies, Proposition 1 established that these idiosyncratic public goods should be provided by decentralized local governments. With both types of public goods, the utility of a resident i of region l equals $\tilde{u}^i_t$ plus $\log g_{l,0,t}$ weighted by the ideal share of homogeneously desired public goods, plus $\log g_{l,l,t}$ weighted by one minus that share. Here $g_{l,0}$ is a composite bundle of all the homogeneously desired public goods, while $g_{l,l}$ is region l's desired variety of idiosyncratic public goods. The ideal share of homogeneously desired goods, which lies in (0, 1), provides a measure of preference homogeneity in this setting. The structure of government is described by an allocation of powers and budgets to the two levels of government, local and central. As before, full decentralization means that each local government provides the residents of its region l with both the homogeneously desired public goods ($g_{l,0}$) and their ideal variety of idiosyncratic public goods ($g_{l,l}$). Conversely, the government is fully centralized if the central government is tasked with providing all public goods to residents of all regions.
An intermediate possibility is the creation of a federal system. The central government provides homogeneously desired public goods ($g_{l,0}$) to all regions, while each region has its own local government provide the idiosyncratic public good $g_{l,l}$. 22 The overall budget remains exogenously fixed at Lb. Consistent with our focus on expenditures, we assume that all regions must contribute equally to the central-government budget. Its size $b^C$ then suffices to characterize the budget allocation, and local-government budgets are determined residually. The central government may be required to provide any public good uniformly. The uniformity constraint is imposed independently on each good: it may apply to some goods and not others, but it may not apply to an aggregate of goods. This restriction is immediate for a technological constraint because every good is distinct. The aggregate amount of public goods provided to a region ($\sum_{p=0}^{L} g_{l,p,t}$) cannot be constrained constitutionally either. The quantities of different goods cannot be properly compared by an impartial auditor, so it is infeasible to require the provision of "separate but equal" public goods to different regions.
The welfare-maximizing structure of government admits the following characterization.
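Proposition 2 A federal system is optimal if differences in voter information are large enough (the information-homogeneity parameter is below a threshold) while differences in preferences are neither too small nor too large (the ideal share of homogeneously desired goods lies strictly between a lower threshold DF and an upper threshold FC). A federal system is more likely to be optimal when information is more heterogeneous (DF is increasing and FC decreasing in the information-homogeneity parameter) and politicians' ability more variable (the information threshold is increasing in the variance of ability, which raises FC and leaves DF unchanged). Full centralization is optimal if differences in preferences are small (with large information differences, the ideal share is at least FC; with small information differences, it is at least a single threshold DC). Full decentralization is optimal if differences in preferences are large (with large information differences, the ideal share is at most DF; with small information differences, it is below DC). Full centralization is less likely to be optimal when politicians' ability is more variable (DC is increasing in the variance of ability).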
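Because the case structure of Proposition 2 is easy to lose track of, the sketch below encodes it as a decision rule. The thresholds (`t_df`, `t_fc`, `t_dc`, standing for the thresholds labelled DF, FC, and DC) and the flag for "large enough" information differences are inputs that the sketch does not compute, and the numbers in the example are hypothetical placeholders; resolving ties at the thresholds in favour of the unitary regimes is an arbitrary convention.

```python
from enum import Enum


class Structure(Enum):
    DECENTRALIZATION = "full decentralization"
    FEDERAL = "federal system"
    CENTRALIZATION = "full centralization"


def optimal_structure(
    ideal_share: float,
    large_information_gap: bool,
    t_df: float,
    t_fc: float,
    t_dc: float,
) -> Structure:
    """Map the ideal share of homogeneously desired goods to the optimal regime,
    following the case split stated in Proposition 2 (thresholds are inputs)."""
    if not 0.0 < ideal_share < 1.0:
        raise ValueError("ideal_share must lie in (0, 1)")
    if large_information_gap:
        if t_df >= t_fc:
            raise ValueError("expected t_df < t_fc when the information gap is large")
        if ideal_share <= t_df:
            return Structure.DECENTRALIZATION
        if ideal_share >= t_fc:
            return Structure.CENTRALIZATION
        return Structure.FEDERAL
    # Small information gap: a single threshold separates the two unitary regimes.
    return Structure.CENTRALIZATION if ideal_share >= t_dc else Structure.DECENTRALIZATION


print(optimal_structure(0.5, True, t_df=0.3, t_fc=0.7, t_dc=0.5).value)  # federal system
```

With a small information gap the federal branch is unreachable, which is the sense in which, absent large differences in voter information, a single level of government is optimal.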
Our model of accountability reverses the standard logic of fiscal federalism. The existence of some policy instruments that are best centralized and some others that are best decentralized does not immediately imply that the government should be structured on federal lines. On the contrary, if regional differences in voter information are negligible, it is optimal to have a single level of government: either only a central government, or only independent local governments. This key result reflects endogenous economies of scope in government accountability.
Politicians with little power have low-powered incentives. They control a smaller budget, so they have a lower value of holding office. Moreover, their skills have a lower impact on voters' utility, so other factors are more likely to determine their re-election. As a result, their career concerns are weaker. In equilibrium, incumbents have incentives to demonstrate each skill in proportion to its welfare value. E.g., a politician tasked with providing $g_0$ to voters with average information, who values re-election at R, invests an amount $x_0$ proportional both to the ideal share of $g_0$ and to R. Crucially, the equilibrium value of re-election is proportional to the budget a politician controls. Then there are no economies of scale across regions: halving both the budget and the population served leaves spending per capita unchanged. Instead, there are economies of scope across goods: halving both the budget and the set of public goods provided leads to lower spending on each good and a higher share of the budget dissipated as rents.
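The contrast between scale and scope can be checked with simple arithmetic. The sketch assumes, following the paragraph, that spending on each good is proportional to the good's welfare weight and to the value of re-election, and that the value of re-election is proportional to the budget; the single constant `kappa`, which bundles voter information and all other factors, is a hypothetical simplification, as are the numbers.

```python
import math


def spending(weights: list[float], budget: float, kappa: float = 0.8) -> list[float]:
    """Hypothetical equilibrium spending x_p = kappa * w_p * budget: proportional to
    each good's welfare weight and to the value of re-election, itself taken as
    proportional to the budget. kappa bundles information and other factors."""
    return [kappa * w * budget for w in weights]


def rent_share(weights: list[float], budget: float, kappa: float = 0.8) -> float:
    """Share of the budget that is not spent on public goods."""
    return 1.0 - sum(spending(weights, budget, kappa)) / budget


weights = [0.25, 0.25, 0.25, 0.25]  # four goods with equal welfare weights
budget, population = 100.0, 10.0

# Scale: halve budget and population, keep all goods -> per-capita spending unchanged.
base_pc = [x / population for x in spending(weights, budget)]
half_scale_pc = [x / (population / 2) for x in spending(weights, budget / 2)]
print(all(math.isclose(u, v) for u, v in zip(base_pc, half_scale_pc)))  # True

# Scope: halve budget and the set of goods -> less spent per good, larger rent share.
print(round(rent_share(weights, budget), 4))          # 0.2
print(round(rent_share(weights[:2], budget / 2), 4))  # 0.6
```

In this sketch the rent share depends only on the total welfare weight a politician is responsible for, which is why splitting goods across tiers, but not splitting regions, raises the share of the budget dissipated as rents.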
Centralization minimizes aggregate rent extraction because it exploits both these economies of scope and the efficiency benefits of delegating government monitoring to the best monitors. As in Proposition 1, however, the central government fails to match idiosyncratic local needs. Under full centralization, each region unavoidably receives its ideal variety of idiosyncratic public goods in proportion to its residents' information. The optimal provision of homogeneously desired public goods is uniform across regions, so a uniformity constraint suffices to ensure it. By contrast, requiring uniform provision of idiosyncratic public goods only makes misallocation worse. The central government keeps catering disproportionately to the preferences of the informed, but it has to provide their ideal variety to other regions that derive no benefit from it. This uniformity constraint is so wasteful that it makes every region worse off than discretionary central provision of idiosyncratic public goods. Preference heterogeneity then has a natural effect on the optimal structure of government. If preferences are highly idiosyncratic, decentralization is optimal because local governments are best at preference-matching. If preferences are highly homogeneous, centralization is optimal because only rent-minimization matters. In both extreme cases, one class of public goods is marginal, so it is worth sacrificing its optimal provision in order to exploit economies of scope and raise accountability in the provision of the dominant public goods.
When preference heterogeneity is intermediate, both idiosyncratic and homogeneously desired public goods are important. The key result in Proposition 2 is that a federal system is then optimal if and only if differences in voter information across regions are large enough. When the information gap is larger, uninformed regions gain more from delegating monitoring to informed ones. Hence, there are greater benefits from having a central government provide homogeneously desired public goods (the threshold DF is increasing in the information-homogeneity parameter). Greater heterogeneity also implies that uninformed regions lose more from ceding power to informed ones. Thus, there are greater costs of having the central government provide idiosyncratic public goods too (the threshold FC is decreasing in the information-homogeneity parameter).
When differences in voter information are large, it is worth sacrificing economies of scope to reap the large benefits of a progressive transfer of accountability without paying the large costs of a regressive transfer of power. Figure 3 represents graphically the optimal structure of government. The larger the difference in information, the larger the region F in which a federal system is optimal.
[FIGURE 3 AROUND HERE]

As in Proposition 1, a downside of centralization is the uniformity of central politicians' skills. Thus, greater variation in the pool of political talent reduces the appeal of full centralization. As a consequence, not only decentralization but also a federal system becomes more attractive. 23

Proposition 2 established that multiple levels of government come at the cost of reduced government efficiency and accountability, even if they may be desirable for preference-matching and distributive reasons. The experience of local government in the United States bears out our prediction empirically. Many states have overlapping layers of county governments, municipal governments, and multiple special-purpose governments, such as elected school districts and independent districts managing specific public utilities. The performance record of special-purpose governments has been disappointing and they have proved prone to capture by special interests (Berry 2009). The employees of a special-purpose district are often the key voting bloc in its elections. Public libraries provide a telling example of systematic inefficiency. When they are run by directly elected special-purpose library districts they have larger budgets, but neither more visitors nor higher circulation. On the contrary, they hold fewer books and fewer of their employees are actually librarians.
Evidence from Europe confirms that multiplying government tiers has detrimental effects. In England, local government most commonly has two levels: counties and districts. A sizeable minority of areas are governed instead by a unitary authority entrusted with all local-government tasks. Unitary authorities are more efficient, particularly because the two-tier structure is linked to lower labor productivity and excess employment (Andrews and Boyne 2009).
France has three nested tiers of sub-national governments (regions, departments and municipalities) plus various associations of municipalities. This complex and multi-layered structure has been a source of inefficiency and institutional weakness, especially at intermediate levels (Le Galès and John 1997). In its two latest reports on local government finances, the French Court of Auditors stresses that the proliferation of sub-national government tiers results in unproductive public employment. It also highlights inadequate governance mechanisms and advocates intervention by the national parliament to set goals and standards for local governments directly. Pruning the local-government structure is on the French government's agenda. The Attali Commission recommended abolishing the departmental tier within ten years. President Hollande has proposed abolishing elected departmental councils by 2020.
In Germany, since 2000 three states (Rhineland-Palatinate, Saxony-Anhalt and Lower Saxony) have abolished one level of local government. Italy abolished elected provincial councils in 2014, and the government has proposed a constitutional reform to abolish provinces altogether. Italy's three-tier subnational structure (regions, provinces and municipalities) is widely recognized as inefficient: it was arguably designed specifically as a way for political parties to provide patronage and sinecures (Dente 1988).
Cross-country evidence also supports the predictions of Proposition 2. In countries with more levels of government, firms report having to pay more frequent and costlier bribes. This positive correlation between corruption and the number of government tiers is particularly robust, and its magnitude is a first-order concern for developing countries. Fan, Lin and Treisman (2009, p. 32) conclude that "[o]ther things equal, in a country with six tiers of government (such as Uganda) the probability that firms reported 'never' being expected to pay bribes was .32 lower than the same probability in a country with two tiers (such as Slovenia)."

While there is clear evidence that the multiplication of government tiers dilutes accountability, we know of no equally clear evidence on the distributive benefits of federalism. Nonetheless, the pattern of political discourse in the United States is suggestively consistent with our theoretical prediction that the least informed regions benefit the most from a federal structure relative to either unitary alternative. On average, Southern states have less educated voters and lower newspaper readership. They also have more corrupt governments (Glaeser and Saks 2006). The distributive predictions of our model can then help explain why the South is at the same time particularly patriotic (e.g., it provides a disproportionate share of U.S. military personnel) but also keenest on curbing the expansion of federal power and preserving the states' independent policy-making responsibilities.
When neither full centralization nor full decentralization is optimal, we can characterize the precise structure of the optimal federal system.

Corollary 1
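In the optimal federal system, the budget, productivity and accountability of the central government are lower when differences in preferences are larger (the central budget $b^C$ increases with the ideal share of homogeneously desired goods, and so do the central government's productivity and accountability). The budget, productivity and accountability of local governments are higher when differences in preferences are larger (the local budget $b^D$ and the expected productivity of local governments decrease, and local rent extraction increases, with the ideal share of homogeneously desired goods). Rent extraction by local governments increases with differences in information (average local rent extraction across the L regions is decreasing in the information-homogeneity parameter).
Overall rent extraction increases with differences in information. It is a concave function of preference heterogeneity and it reaches a maximum at a value of the ideal share of homogeneously desired goods between 0 and 1/2, at which local governments have on average the same accountability as the central government (central rent extraction equals average local rent extraction). The difference in preferences associated with maximum rents increases with differences in information (the rent-maximizing ideal share is increasing in the information-homogeneity parameter).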
The comparative statics on each level of government highlight the fundamental strength of a federal system. Resources flow to the level of government where they are most useful. All regions prefer the unique efficient budget allocation that gives each level of government resources proportional to the ideal share of the public good it is responsible for providing. Voter monitoring of politicians obeys a similar equilibrium allocation: screening for competence is proportional to the welfare weight of the public goods each politician is in charge of providing. Hence, incentives improve and rent extraction declines when a politician has more important responsibilities: as the ideal share of centrally provided public goods rises, rent extraction falls for the central government and rises for local governments. Aggregate rent extraction is lowest when one level of government accounts for most public-good provision, so that it controls most of the budget and is also the main focus of voter monitoring. Total rents are then low because one level of government is large and accountable, while the other is relatively unaccountable but small. By Proposition 2, when this logic (and the value of the central share) is pushed to an extreme, a federal structure becomes undesirable: the small and unaccountable level of government is best abolished. Hence, Proposition 1 highlights the second-best nature of the optimal government structure. Federalism is welfare-maximizing for intermediate values of the central share, but total rents are then larger too.
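The proportional-allocation result can be checked numerically in a stripped-down case. The sketch below is a minimal illustration, not the paper's model: it assumes a region values the central and the local public good with Cobb-Douglas (log) weights, calls the central weight `ideal_central_share` (a hypothetical name), and grid-searches over splits of a unit budget. The welfare-maximizing budget share for the central government comes out equal to that weight.

```python
import math


def welfare(central_budget_share: float, ideal_central_share: float) -> float:
    """Log (Cobb-Douglas) welfare of a region from splitting a unit budget.

    Hypothetical setup: weight `ideal_central_share` on the centrally provided
    public good, and the remaining weight on the locally provided one.
    """
    g_central = central_budget_share
    g_local = 1.0 - central_budget_share
    return (ideal_central_share * math.log(g_central)
            + (1.0 - ideal_central_share) * math.log(g_local))


def best_split(ideal_central_share: float, grid_size: int = 10_000) -> float:
    """Grid search (interior points only) for the welfare-maximizing split."""
    candidates = [(k + 0.5) / grid_size for k in range(grid_size)]
    return max(candidates, key=lambda s: welfare(s, ideal_central_share))


for alpha in (0.2, 0.4, 0.7):
    # The optimal central budget share tracks the preference weight.
    print(alpha, round(best_split(alpha), 3))
```

Because log(b·s) = log b + log s, scaling the total budget b leaves the optimal split unchanged, which is why a unit budget suffices for the illustration.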
Intuitively, rent extraction is highest when both levels of government are equally accountable. Then, if either grew more important, it would both control a larger budget share and extract proportionally fewer rents from it. Rents are largest when the central government is smaller than the local ones (its ideal budget share is below 1/2). This is a natural consequence of greater accountability at the central level in the presence of heterogeneous information. As differences in voter information grow larger, so does the inefficiency of local governments, and thus of a federal system that includes them. Accordingly, the peak of rent extraction is associated with a greater importance of local governments.

V. What Should Determine the Boundaries of Governments?
Government structure is not entirely described by the number of tiers. The size of subnational jurisdictions can also vary. Is it better to have few large local governments or many small ones? Our model can be applied directly to study the optimal boundaries of governments. Proposition 1 considered a simple symmetric setting in which either all regions should integrate or each should have its independent government. The intuition generalizes to asymmetric cases. Regional boundaries should be drawn so that people with similar preferences but different information share a government, while those with different preferences but similar information do not.
In this section we extend our model by relaxing the assumption that voters are sorted into geographic regions with internally homogeneous preferences. To study optimal boundaries when ideological groups do not naturally coincide with geographic regions, we assume a simple two-fold partition of voters by ideology and information.
Voters have ideological preferences for two distinct public goods, L and R. Left-wingers desire the former and have utility $u^i_{L,t} = \tilde{u}^i_t + \log g_{l,L,t}$. Right-wingers desire the latter and have utility $u^i_{R,t} = \tilde{u}^i_t + \log g_{l,R,t}$. This simple preference structure provides a stylized model of local government consistent with Proposition 2. Preferences over locally provided public goods are highly heterogeneous because public goods that all voters desire homogeneously should be provided by the federal government instead.
Each ideological group comprises voters with different levels of information. Better informed voters succeed at inferring the incumbent's competence from realized policy outcomes with a given probability; relatively uninformed voters have a lower probability of learning it. A country is then characterized by the sizes of four groups: informed left-wingers, uninformed left-wingers, informed right-wingers and uninformed right-wingers. We consider partitions of this overall population into autonomous regions or federal states. Each region is endowed with a budget of b units per resident, so there are no economies of scale. Moreover, a region is the minimal administrative unit, so the regional government is subject to a technological uniformity constraint: it cannot differentiate the provision of public goods across residents.
We begin by characterizing the optimal regional structure when there are no constraints on how citizens can be partitioned into regions. Without exogenous constraints, the optimal partition resolves intuitively the two forces highlighted by Proposition 1. Preference heterogeneity is a centrifugal force that can be accommodated by separating groups with different ideal allocations. Such optimal segregation reflects Tiebout's (1956) classic intuition. It is typically optimal when there are no economies of scale and no constraints on creating as many regions as there are desired bundles of public goods (Bewley 1981). The novelty of our model lies in the centripetal force caused by differences in information. A partition that achieves homogeneous preferences within each region can nonetheless be suboptimal. Optimality also requires the perfect mixing of like-minded voters with different levels of information. Citizens suffer from sharing a government with others with opposite preferences who cause a distributional conflict. They suffer no less from being cut off from better-informed voters with the same preferences, whose influence is necessary to keep the local government accountable.
Proposition 3 highlights that an ideologically homogeneous but uniformly ill-informed region is plagued by bad governance. Its government reflects the preferences of local residents, but it is also unaccountable, inefficient and corrupt. This prediction of our model is consistent with evidence from local governments in the United States. City politicians have at times succeeded in creating large local majorities of their poorer and less educated supporters by encouraging the out-migration of a rival higher-status group. The detrimental consequences of this process are best illustrated by the long career of Boston mayor James Michael Curley (Glaeser and Shleifer 2005). Both his policies and his stark rhetoric championed the poor Irish community against the richer Anglo-Saxon Protestants who had previously dominated the city. The end of Brahmin dominance pleased Boston's Irish and removed the discrimination they had suffered from. However, Curley's administration was inefficient and corrupt; Boston declined under his government. Similar patterns emerge in other cases of populist local politics catering to particular ethnic and socioeconomic constituencies, such as African-Americans in Detroit under Coleman Young.
The optimal partition described by Proposition 3 has two contrasting features. Tension between the two can entail a welfare loss when groups with different preferences are separated. Proposition 1 characterized one set of circumstances leading to this outcome. When voters' preferences are not completely distinct, separation is undesirable if differences in voter information are large enough. Another possibility is that perfect separation à la Tiebout is technologically impossible because residents with different preferences are mixed in a narrow area such as a city or county. If perfect separation is impossible, is partial separation desirable, or is it even worse than perfect integration?
Consider two symmetric atomistic locations. Their total population is identical, but the first location has a majority of left-wing residents and the second a majority of right-wing residents. The distribution of the population is characterized by a degree of ideological sorting between 0 and 1: in the limit as sorting goes to 0, residents with different preferences are perfectly mixed, while in the limit as it goes to 1, there is perfect sorting. Voter information is also symmetric, but not homogeneous across locations. Voters with either preference have a given average probability of being informed in the location in which they belong to the majority. In the location where they are a minority, that probability is scaled down by a factor of one minus a coefficient of information disadvantage, which lies between 0 and 1. The lower information of the minority reflects endogenous media slant (Gentzkow and Shapiro 2010). Local media choose an ideological bias to match the preferences of the local majority. As a consequence, news consumption becomes more appealing for the majority and less for the minority.
The following result characterizes formally whether political integration or partial separation is optimal when perfect segregation by preferences is impossible.

Proposition 4 Aggregate social welfare is higher under political integration than under separation if minorities suffer from an information disadvantage above a threshold. Integration is more likely to be optimal when ideological sorting is less complete and politicians' ability is less variable (the threshold increases with both the degree of sorting and the variance of ability).
Intra-regional heterogeneity entails a new trade-off. The centripetal force is information heterogeneity of a different kind than the one underlying Proposition 1. In Proposition 4 there are no differences in average information across regions, so aggregate rent extraction is invariant. There are, however, differences in information between the majority and the minority within each location. Under separation, uninformed minorities are dominated by better informed local majorities. Political integration restores even power to the two ideological groups. Each uninformed minority gains political influence thanks to the like-minded informed majority in the other location. Thus, political integration can raise welfare even if the efficiency gains from delegated monitoring are absent.
These distributive welfare gains are monotonically increasing in the information disadvantage of the minority. If information is homogeneous, separation is the constrained optimum (the threshold information disadvantage is strictly positive). Imperfect ideological segregation remains costly, and minorities bear a greater share of this cost. Yet political integration merely worsens overall preference matching. At the opposite extreme, if a minority is completely uninformed, it is essentially disenfranchised. Then utilitarian welfare maximization requires political integration to protect the minority (the threshold is below 1 whenever sorting is incomplete).
Ideological sorting provides a countervailing centrifugal force. As groups with opposite preferences become more and more segregated, the difference in preferences across regions increases and so does the appeal of political separation. In the limit, political separation is always optimal if ideological sorting is complete, as Proposition 3 already established (the threshold tends to 1 as sorting becomes complete). Finally, as in Proposition 1, greater variance in politicians' ability makes integration less attractive because of distortions in the allocation of talent.²⁴
Our results speak directly to proposals for the partition of California, which have been put forward several times; most recently, venture capitalist Tim Draper attempted to introduce for 2016 a ballot initiative to split the state into six. By far the largest state in the union, California is composed of several distinct regions. The most salient political divide is between East and West. The differences are both partisan and ideological: Western California is more liberal, even among Republican voters and politicians; Eastern California is considerably more conservative (Kousser 2009). At first glance, such a political divide might suggest that a breakup of coastal and inland California would be optimal on preference-matching grounds.
Proposition 4, however, cautions against this superficial assessment. Both the southeastern Inland Empire and the San Joaquin Valley contain a large Hispanic population that overwhelmingly prefers the Democratic party. This group is much less educated, less politically knowledgeable, and less likely to vote than Republican supporters in the region, who are on average older, whiter, and wealthier.²⁵ At the same time, the left-wing Hispanic working class in the Valley shares the political leanings of highly educated liberals on the coast. This ideological alignment goes beyond mere partisanship and includes shared preferences over policies: "whether they ride in limousines, Volvos, or buses, Democrats in the blue areas of the state share similar policy views" (Kousser 2009, p. 2).
As a consequence, our model suggests that the political integration of California is welfare-maximizing. For relatively uneducated inland minorities to have a government corresponding to their preferences, it is essential that they share a state with ideologically aligned liberal elites in the Bay Area. Right-wing Californians, instead, are sufficiently educated and influential to have a voice in state-wide politics despite being in the minority: California had a Republican governor for twenty-one of the past thirty years.
The lesson of Proposition 4 applies more broadly. Disadvantaged ethnic minorities, which are less educated and often politically underrepresented, should belong whenever possible to the same polity as better educated and higher-status voters with similar political preferences. Only then are politicians effectively held accountable to both groups.

VI. Conclusion
Is government decentralization the right answer to differences across regions? The idea has gained wide currency, from European Union law enshrining the principle of subsidiarity to independence movements in Québec, Scotland or Catalonia and recurring proposals to split California into separate liberal and conservative states. The classic theory of fiscal federalism supports and formalizes the intuitive appeal of this notion: according to Oates's (1972) seminal Decentralization Theorem, decentralization is more efficient than centralization whenever regions are not identical and there are no policy spillovers. This paper offers a different perspective by focusing on a key overlooked dimension of regional heterogeneity: voters' ability to monitor politicians and hold them accountable. Our model explains why decentralization has often failed to deliver the accountability benefits anticipated by its proponents, and why it is more suitable for countries with homogeneous institutional quality, like Germany, than for countries with gaping regional disparities, like Italy. When voter information varies across regions, centralization yields accountability gains. The central government is monitored mainly by the most informed regions and as a result it has better incentives than the average local government. At the same time, however, its incentives are to serve the informed and neglect the uninformed, so it must be forced to provide at least some public goods uniformly in order to avoid unacceptable distributive distortions. Regional heterogeneity thus drives both sides of a trade-off: preference heterogeneity prompts decentralization, but information heterogeneity prompts centralization instead.
As a result, the borders of governments should not reflect only the classic Tiebout (1956) logic of sorting by preferences. It is also crucial to ensure diversity of information because uninformed voters are caught between the hammer of unaccountable politicians and the anvil of better informed voters with contrasting policy priorities. The solution is for them to share a government with highly informed voters with similar tastes. Thus, California should not be broken up: the benefits of separating the liberal local majority on the coast from the conservative local majority inland seem smaller than those of grouping together the coastal liberal elite with the working-class left-wing minority in the Central Valley.
Our analysis hints that the main problem with state boundaries in the United States is not that states like California are too big and diverse, but on the contrary that many states are too small. In our theory, the costs and benefits of fragmentation are driven by observables: respectively, differences in voter information and in political preferences. As a first step in bringing our model to the data, we computed a rough estimate of the net benefits from merging any pair of contiguous American states. We proxied the share of informed voters by that of college graduates, and preferences by presidential vote shares. This simple quantitative exercise suggests that merging the smallest states in the North-East (Delaware, Rhode Island, Vermont) and in the Mountain West (Idaho, Wyoming) with their larger neighbors would yield efficiency gains at a negligible cost in terms of preference matching. Re-uniting Virginia and West Virginia seems most attractive: the two states have very similar party vote shares, but very different levels of human capital. Our rough estimate of the welfare gains from a merger has the same order of magnitude as a permanent increase in the annual growth rate of real income per capita by 10 basis points.²⁶
Our framework also offers new insights on federal systems with multiple levels of government. The standard logic of fiscal federalism suggests there should be many government layers, so that every policy instrument is tied to its optimal geographic unit. Instead, our theory shows economies of scope in government accountability. A unitary government that controls a large budget and multiple policy instruments suffers less from moral hazard than many special-purpose governments, each controlling a specific policy and its separate budget. Our model thus explains why the multiplication of government tiers is empirically associated with inefficiency and poor accountability.
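The merger exercise above expresses its welfare gain as equivalent to a permanent 10-basis-point increase in annual growth. The paper does not state the welfare metric behind that comparison, so the sketch below is only a standard back-of-the-envelope conversion under our own assumptions: log utility over consumption that grows at a constant rate, a constant discount factor, and hypothetical values for baseline growth (2%) and the discount factor (0.96). It computes the permanent percentage increase in consumption that is worth as much as the faster growth.

```python
import math


def consumption_equivalent_of_growth(delta_g: float, g: float, beta: float) -> float:
    """Permanent proportional consumption gain worth the same as growth + delta_g.

    Assumptions (illustrative only): consumption c_t = c_0 * (1 + g)**t,
    per-period utility log(c_t), and a constant discount factor beta < 1.
    """
    # Lifetime utility = log(c_0)/(1 - beta) + log(1 + g) * beta / (1 - beta)**2,
    # using sum_t t * beta**t = beta / (1 - beta)**2.
    gain_from_growth = (math.log(1 + g + delta_g) - math.log(1 + g)) * beta / (1 - beta) ** 2
    # Scaling every c_t by (1 + lam) adds log(1 + lam) / (1 - beta); solve for lam.
    return math.exp(gain_from_growth * (1 - beta)) - 1


lam = consumption_equivalent_of_growth(delta_g=0.001, g=0.02, beta=0.96)
print(f"consumption-equivalent gain: {lam:.2%}")
```

With these illustrative values the gain is roughly 2.4 percent of consumption every year. Because the conversion scales with β/(1−β), a more patient discount factor raises it considerably.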
Furthermore, we have found that a federal structure can be desirable only if information heterogeneity is large enough. This result sounds a note of caution against the embrace of federalism as an answer to independence movements. Devolution has so far been the preferred strategy in Belgium, Spain and the United Kingdom. However, if English and Scottish voters are equally good at monitoring government performance but prefer different government agendas, our model suggests that British federalism could be an inferior alternative either to the old model of centralization in Westminster or to full Scottish independence.
²⁶ The full details of our quantitative exercise are available on request.
Conversely, our analysis sheds a positive light on the European Union. Stark differences in institutional quality across member states have been perceived as a major problem since the start of the Euro crisis. How can the Union include both virtuous "core" countries like Germany, the Netherlands, or Finland, and the troubled Euro "periphery" of Greece, Italy, Portugal and Spain? Our model shows that such differences in government accountability are in fact a motivating strength of the European project. They explain why we can expect efficiency gains from transferring powers to EU institutions, but also why substantial policy choices should remain at the national level.
In this paper we have developed a theoretical framework rather than focusing on concrete policy instruments, but the allocation of specific policies to different levels of government is clearly an important topic for future research. In this context, our theory may help explain an enduring puzzle: why the European Union does exactly what it does (Alesina, Angeloni and Schuknecht 2005). The division of powers between member states and European institutions is not fully explained by classic considerations of externalities and preference heterogeneity. Our model shows that other considerations are equally crucial. Efficiency is maximized by centralizing policies whose understanding by voters varies most widely across countries. Political feasibility may require striking a balance between policies that transfer power to the core and others that transfer accountability to the periphery.

Figure II Corruption and Newspaper Circulation
Corruption is the opposite of the Control of Corruption index from the World Governance Indicators. Newspaper circulation per capita is from the World Development Indicators.

A Extended Literature Review
Early work on the economic theory of fiscal federalism took a technological view of the costs and benefits of decentralization (Tiebout 1956; Oates 1972). The seminal models assumed exogenously that governments act benevolently in the interest of their constituents, but that different structures entail different limitations. A centralized government cannot differentiate public goods across regions. Decentralized governments cannot coordinate to internalize externalities (and may forgo economies of scale). More recent studies have used a political-economy approach both to microfound these classic assumptions and to investigate how federalism affects accountability when political agency is imperfect.
A median-voter model of direct democracy explains local welfare-maximizing policy choices if voter preferences are symmetrically distributed around the mean (Alesina and Spolaore 1997, 2003). Any other distribution entails a wedge between the welfare-maximizing policy and the median voter's choice. If preferences are homogeneous within regions and asymmetric across regions, this wedge adds a cost of centralization (Alesina, Angeloni, and Etro 2005). If preferences are heterogeneous and asymmetric within each region as well, centralization can either alleviate or exacerbate the bias due to the wedge between median and average preferences (Lockwood 2008).
Legislative bargaining models account for the inefficient distribution of expenditures by a central government. Even if voters are homogeneous across regions and each region elects a benevolent representative, a distributive distortion emerges because the government can precisely target a minimum winning coalition and provide no public goods to a minority of regions (Lockwood 2002). In this setting, furthermore, voters have incentives for strategic delegation to representatives that do not share their true preferences (Besley and Coate 2003).
Distributive distortions from unconstrained centralization emerge with considerable generality when the central government is subject to political-economy frictions. Harstad (2007) shows that bargaining between governments with asymmetric information leads to costly delays, which can be eliminated by a commitment to uniformity. Thus, his model microfounds both the uniformity constraint for the central government and the inability of local governments to cooperate efficiently. Hindriks and Lockwood (2009) show that harmful targeting of public goods to a minimum winning coalition may occur even if multiple regions elect a single executive instead of many parliamentary representatives.
Such targeted spending does not emerge in the equilibrium of our model because idiosyncratic voter preferences determine an intensive margin of electoral support. If information is homogeneous across regions, it is equally valuable but cheaper for the incumbent to attain the same expected support in two regions than twice the support in a single region. The latter strategy would require doubling the local voters' utility, and thus raising investment exponentially. As a consequence, in our model centralization and decentralization are identical if regions have identical information.
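The exponential cost of concentrating electoral support follows from log utility alone. A minimal sketch, assuming utility log g (as in the ideological-preference specification of Section V) and that the incumbent must deliver the same total utility gain whether it is spread over two regions or concentrated in one; the baseline provision and the size of the gain are hypothetical values chosen for illustration.

```python
import math


def cost_of_utility_gain(baseline_g: float, utility_gain: float) -> float:
    """Extra spending needed to raise log(g) by `utility_gain` in one region."""
    # log(g_new) - log(g) = utility_gain  =>  g_new = g * exp(utility_gain)
    return baseline_g * (math.exp(utility_gain) - 1.0)


baseline = 1.0  # hypothetical baseline public-good provision per region
delta = 0.5     # hypothetical utility gain per targeted region

spread = 2 * cost_of_utility_gain(baseline, delta)        # +delta in each of two regions
concentrated = cost_of_utility_gain(baseline, 2 * delta)  # +2*delta in a single region
print(f"spread: {spread:.3f}, concentrated: {concentrated:.3f}")
# prints: spread: 1.297, concentrated: 1.718
```

Since e^(2Δ) − 1 = (e^Δ − 1)(e^Δ + 1) > 2(e^Δ − 1) for any Δ > 0, concentrating the gain is always the costlier strategy under these assumptions.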
We find instead that heterogeneous voter information provides a distinct and complementary explanation for the inability of a central government to match public-good provision efficiently to local needs and preferences. Our microfoundation provides particularly strong support for the assumption of a uniformity constraint because Proposition 1 finds that centralization without uniformity is unambiguously dominated, whereas in bargaining models a uniformity constraint sometimes increases but sometimes decreases welfare under centralization.²⁷ Corollary B1 in Appendix B also provides a microfoundation for failed cooperation among local governments that is precisely complementary to Harstad's (2007). In his model, politicians are benevolent but asymmetric information generates bargaining frictions. In ours, bargaining is frictionless but politicians are rent seekers. Then career concerns fail to induce local politicians to internalize policy externalities because doing so would raise voter welfare without signaling the incumbent's ability.
Political-agency models of fiscal federalism have mostly stressed the accountability benefits of decentralization. In particular, decentralization can help voters monitor their local governments thanks to yardstick competition (Besley and Case 1995; Belleflamme and Hindriks 2005; Besley and Smart 2007). When a local government underperforms its neighbors, voters know they should blame the incumbent's incompetence or corruption rather than exogenous underlying conditions that are correlated across regions. There is no scope for yardstick competition in our setting because voters' uncertainty concerns only the competence of their own government, and not also common economic fundamentals. Myerson (2006) presents a related argument that relies on local politicians' competition for national office. In his signaling model, a centralized unitary democracy has multiple equilibria, ranging from no corruption to complete corruption. In a federal system, regional governors are keen on building a reputation in order to run for national office. Thus, federalism eliminates the very worst equilibrium, with complete corruption at both levels, though it does not necessarily reduce aggregate corruption given the multiplicity of equilibria in both systems.
Centralization also entails a common-agency problem that makes politicians less accountable to voters in any single region (Seabright 1996; Persson and Tabellini 2000; Tommasi and Weinschelbaum 2007). If the exogenous "ego rents" from holding office are higher under centralization, sharper incentives to gain re-election counteract the detrimental effects of common agency, but need not fully counterbalance them (Seabright 1996; Persson and Tabellini 2000).
In our model, the value of holding office derives endogenously from rent extraction rather than exogenously from ego rents. Therefore, the proportional increase in the government budget exactly compensates for the reduced pivotality of each region. As a result, aggregate rent extraction is identical under centralization and decentralization when voter information is homogeneous across regions. Following Seabright (1996) and Persson and Tabellini (2000), we could assume instead that the value of office rises less than proportionally with centralization: $b^C < bL$. Then, as in their models, centralization would increase rent extraction when information is homogeneous across regions. It would reduce it if and only if heterogeneity in voter information is large enough.
Prior work has suggested some other channels through which centralization may increase accountability, even when voter information is homogeneous across regions. Unlike Proposition 1, however, these mechanisms are characterized by considerable ambiguity. In models of lobbying, centralization can either decrease or increase the government's susceptibility to capture by special interest groups. Hindriks and Lockwood (2009) highlight conflicting forces in a signaling model of political agency. In their model, some politicians are rent seekers but others are welfare maximizers. In their first term, rent seekers may choose to restrain their rents in order to mimic welfare maximizers and fool voters into re-electing them.
²⁷ A difference between our model and prior work is that we focus on different preferences over the allocation of resources across public goods. The literature has typically neglected this dimension and focused instead on different preferences over the total amount of public goods provided (Lockwood 2002, 2008).
Centralization reduces voters' ability to screen and dismiss corrupt politicians: rent seekers can exploit common agency and get re-elected despite extracting maximum rents in a minority of regions. This loss of accountability is the only consequence of centralization when rent seekers already choose restraint and a chance of re-election at the local level. However, centralization may also incentivize rent seekers to reduce their first-term rents in order to gain re-election and extract large rents in a second term. When this incentive is missing under decentralization, either the improved incentives or the worsened screening may dominate the overall difference in accountability.
²⁸ To the best of our knowledge, Section 4 and Proposition 2 provide the first study of fiscal federalism when different government tiers control different policy instruments.

B Pareto-Improving Centralization
Our baseline analysis focuses on the welfare consequences of government structure. Differences in information across regions make political integration desirable both because it yields efficiency gains from increased accountability and because it is a form of progressive redistribution. Uninformed regions reap large gains while informed ones suffer small losses, as shown in Proposition 1. Such distributional effects of centralization are appealing from the perspective of aggregate social welfare, but they raise a question of feasibility: will informed regions oppose and block optimal integration? This question is particularly relevant in Europe. Propositions 1 and 2 suggest that a federal structure in the European Union may be optimal due to the large disparities in accountability across member states. But why would Finns and Germans agree to a federation whose benefits accrue to Greeks and Italians?
In this appendix, we extend our model in two directions that show how political integration can receive unanimous support. First, we allow for public-good spillovers across regions, a classic element of the fiscal-federalism literature since Oates (1972). In our model, externalities imply not only (mechanically) that the informed care about public goods in uninformed regions, but also that centralization may increase government efficiency in informed regions too. Alternatively, we discuss how unanimity can be obtained at the expense of welfare maximization, by combining centralization with partial discretion in public-good provision.

B.1. Public-Good Spillovers
We introduce externalities with a simple symmetric specification that preserves constant aggregate returns to scale. There is a single composite public good (P = 1), and a resident i of region l has a utility function in which an index in [0, 1] measures interregional spillovers. Citizens' mobility within the United States or the European Union provides an intuitive interpretation of this setup. Each agent has a probability of moving, and conditional on a move he has an equal probability of moving to each region. Public-good spillovers entail systematic differences between the productivity of the central government and that of local governments.
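The mobility interpretation can be made concrete by computing the expected weight a resident places on each region's public good. This is a sketch under our own assumptions: the function and parameter names are hypothetical, a move's destination is drawn uniformly over all L regions including the agent's own, and no particular mapping from the move probability to the paper's spillover index is claimed. The weights always sum to one, which mirrors the constant aggregate returns to scale of the specification.

```python
def spillover_weights(move_prob: float, n_regions: int, home: int) -> list[float]:
    """Expected weight a resident of region `home` places on each region's good.

    Hypothetical reading of the mobility story: with probability `move_prob`
    the agent relocates to one of the n_regions (own region included) chosen
    with equal probability; otherwise the agent stays at home.
    """
    if not 0.0 <= move_prob <= 1.0:
        raise ValueError("move_prob must lie in [0, 1]")
    share_if_moving = move_prob / n_regions
    weights = [share_if_moving] * n_regions
    weights[home] += 1.0 - move_prob  # probability of staying put
    return weights


print(spillover_weights(0.0, 4, home=0))  # no mobility: all weight on the home region
print(spillover_weights(0.3, 4, home=0))  # partial spillovers
print(spillover_weights(1.0, 4, home=0))  # full mobility: equal weight on every region
```

At one extreme each region cares only about its own public good; at the other the good is effectively common to all regions, with intermediate mobility spanning the range of spillovers in between.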
Proposition B1 Suppose there are spillovers in public goods across regions (the spillover index is positive). Then the expected competence of ruling politicians is on average higher under centralization than decentralization ($\hat{E}^C > \sum_{l=1}^{L} \hat{E}^D_l / L$). Aggregate rent extraction is lower under centralization than decentralization regardless of differences in voter information. Both efficiency advantages of centralization are increasing in the extent of spillovers.
Internalizing spillovers through centralization raises the screening value of elections and thus the expected productivity of elected politicians. Informed voters may support an incompetent incumbent because of his personal likability or ideological affinity, but they are less likely to be swayed by such factors when politicians' skills are more important. Public-good spillovers imply that competence is more important for the central than the local government. The ability of local politicians influences local public goods only; that of central politicians also determines spillovers from other regions. Therefore, voters are keener on screening for competence at the central than at the local level. This sharper voter focus on competence improves the monitoring as well as the screening value of elections. As a result, public-good spillovers strengthen the accountability gains from centralization: rent extraction declines with political integration even when regions have identical information. Both efficiency advantages of centralization are monotonically increasing in the extent of spillovers.
The improvement in politicians' selection and incentives described by Proposition B1 is distinct from the benefits of policy coordination that Oates (1972) highlighted as a rationale for centralization. Coordination is reflected in an improvement in resource allocation rather than in government productivity. This additional classic element is also present in our model when we consider both a public good g that generates interregional spillovers (a strictly positive spillover index) and another public good h whose benefits are purely local. Then a resident i of region l has utility in which a parameter in (0, 1) denotes the share of resources that a benevolent planner would allocate to the spillover-generating public good. The equilibrium allocation of resources across public goods then differs systematically between centralization and decentralization.
Corollary B1 Centralization induces the socially optimal allocation of resources across public goods (the central government's budget share for g equals the planner's share). Decentralization induces an insufficient allocation of resources to the spillover-generating public good (in every region l, the local budget share for g falls short of the planner's share). Under-provision is increasing in the size of spillovers.
Incumbents provide public goods merely to showcase their ability to their own constituents. Under centralization, all beneficiaries of each public good vote on the incumbent's re-election. Then career concerns are exactly aligned with social welfare across goods: resources are allocated to public goods in proportion to the full social value of each investment and each skill. Under decentralization, instead, career concerns induce every local politician to ignore all spillovers. Externality-inducing goods are under-provided and purely local goods are over-provided. Incumbents are uninterested in demonstrating their ability at generating welfare for regions that do not vote on their re-election. As a consequence, centralization entails endogenous gains from policy coordination.
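The logic of Corollary B1 can be illustrated with a deliberately simple allocation rule. This rule is an assumption for illustration, not the paper's derivation: each incumbent splits the budget between g and h in proportion to the value of each good that accrues to the voters who decide his re-election. A central incumbent internalizes all of g's benefits; a local incumbent counts only the fraction `own_weight` (a hypothetical parameter) that stays in his region, while he internalizes all of h.

```python
def spillover_good_share(planner_share: float, own_weight: float) -> float:
    """Budget share of the spillover-generating good g under a proportional rule.

    Hypothetical: `planner_share` is the planner's share for g, and the
    decision-maker internalizes a fraction `own_weight` of g's benefits but
    all of the benefits of the purely local good h.
    """
    if not 0.0 < planner_share < 1.0 or not 0.0 < own_weight <= 1.0:
        raise ValueError("need planner_share in (0, 1) and own_weight in (0, 1]")
    value_g = planner_share * own_weight     # internalized value of g
    value_h = 1.0 - planner_share            # h is purely local, fully internalized
    return value_g / (value_g + value_h)


planner = 0.4
print(spillover_good_share(planner, own_weight=1.0))  # central incumbent: the planner's share
for w in (0.9, 0.7, 0.5):                             # smaller w means larger spillovers
    print(w, spillover_good_share(planner, w))        # local share falls further below 0.4
```

Lowering `own_weight` pushes the local share further below the planner's share, mirroring the statement that under-provision grows with the size of spillovers.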
Oates (1972) assumed that local governments maximize local residents' welfare but are exogenously incapable of cooperating to reach Pareto improvements. Such a cooperation failure can be microfounded through frictions in bargaining between benevolent local governments (Harstad 2007). Corollary B1 provides a complementary microfoundation. If bargaining is frictionless but local politicians are rent-seeking rather than benevolent, career concerns give them no incentive to cooperate in the pursuit of aggregate social welfare: cooperation is irrelevant to the pursuit of their own goal, re-election.

B.2. Partial Discretionality
If spillovers are modest or absent, is it ever possible to obtain unanimous support for the transfer of powers to a central government? In this context, the regressive distributive consequences of centralization without a uniformity constraint have a silver lining. Discretionality transfers power to the informed. This transfer is welfare-reducing, but it can be the price to pay to buy their support for an efficiency-increasing institutional reform.
Consider homogeneous, symmetric preferences (the limiting case in which the preference parameter tends to infinity) over a measure-one continuum of public goods. A resident i of region l has utility over this continuum of goods. Centralization is characterized by an index of discretionality ω ∈ [0, 1] such that goods p ∈ [0, ω] are not subject to the uniformity constraint, while goods p ∈ [ω, 1] are. By a straightforward extension of Proposition 1, social welfare is maximized by full uniformity (ω = 0) and declines as discretionality increases. On the other hand, we can establish the following result.
Proposition B2 Suppose that the variance of politicians' ability is not too high (it lies below a threshold). Then there is a level of discretionality ω̃, exceeding the central government's equilibrium rent extraction, such that centralization with discretionality ω̃ is preferred to decentralization by every region. The minimum discretionality ω̃ required for centralization to enjoy unanimous support is lower when voters are more informed (it decreases in average voter information) and when politicians' ability is less variable (it increases in the variance of ability).
Better incentives for central politicians reduce aggregate rent extraction and thus create an overall surplus. Proposition B2 shows that the incentives of the central government can be fine-tuned so that all regions share in the efficiency gains from centralization, irrespective of the distribution of voter information. Centralization transfers power over the allocation of a share ω of public goods from uninformed to informed regions. It also transfers accountability from informed to uninformed regions, inducing uniform rent extraction at the central government's equilibrium level.
The uninformed gain more from the reduction of their local rents to the central level than the informed lose from the increase of their local rents to that level. Then, if discretionality is at least as large as central rent extraction, the gain in power is worth more to the informed than their local decline in accountability. But if discretionality is no larger than central rent extraction, the loss of power is worth less to the uninformed than their local increase in accountability. When rent extraction and discretionality are exactly matched (ω equal to central rent extraction), all regions whose information differs from the average strictly prefer the endogenous allocation of resources under centralization to the one under decentralization (a region with exactly average information is indifferent). Higher voter information implies lower rent extraction by the central government. Then, informed regions require less discretionality to support centralization when average voter information is higher.
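The asymmetry between gains and losses is an instance of Jensen's inequality. The minimal Python sketch below rests on three illustrative assumptions, none of them the paper's equations: a hypothetical rent function 1/(1 + information) that is decreasing and convex, two equally sized regions, and a central rent level set by average information.

```python
def rent(info: float) -> float:
    """Hypothetical equilibrium rent share: decreasing and convex in voter information."""
    return 1.0 / (1.0 + info)


informed, uninformed = 0.9, 0.1
average_info = (informed + uninformed) / 2      # two equally sized regions (assumption)
central = rent(average_info)                    # uniform central rent (assumption)

gain_uninformed = rent(uninformed) - central    # rents fall in the less informed region
loss_informed = central - rent(informed)        # rents rise in the more informed region
print(round(gain_uninformed, 3), round(loss_informed, 3))
print(gain_uninformed > loss_informed)          # True: convexity makes the gain larger
```

Averaging the two local rents gives a higher aggregate than the single central rent, which is the sense in which centralization reduces aggregate rent extraction when information varies.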
Political integration is also redistributive with respect to screening. Central politicians have average skills above those of local politicians in uninformed regions, but below those of local politicians in informed ones. Unanimity requires informed regions to gain enough power to offset this progressive transfer through government selection. Therefore, the required discretionality ω̃ exceeds central rent extraction, and it increases monotonically with the importance of political screening. If the variance of ability were too high (above the threshold in Proposition B2), unanimous support for centralization might prove impossible. However, we view as a natural benchmark the case in which moral hazard is a greater problem than adverse selection in political agency.
The political debate within the European Union, whose treaties are adopted by unanimity of the member states, is consistent with the patterns described by Proposition B2. "Core" countries such as Austria, Finland, Germany, and the Netherlands complain about the low institutional quality and the ineffective and corrupt politicians in "peripheral" countries such as Greece, Italy, Portugal, and Spain. Such complaints chime with our prediction of declining government accountability and productivity for the more informed regions. At the same time, peripheral countries complain that European policy is largely dictated by core countries and disproportionately caters to their needs and interests. Again, this accords with our prediction of declining policy-making power for the less informed regions. Proposition B2 suggests that intra-European frictions may be manifestations of a Pareto-improving agreement that makes the Union beneficial for all members, albeit not welfare-maximizing.

Proof of Lemma 1
Taking into account that the realizations of the uniform idiosyncratic shock i are independent across voters, the share of members of group j who vote for the incumbent conditional on the realizations of g_t, t and j t equals Taking into account the uniform aggregate shock t, the incumbent's probability of re-election conditional on the realizations of public-good provision g_t equals Taking into account the mean-zero competence shocks ε_{p,t}, the incumbent's probability of re-election conditional on his policy choices x_t (and residually r_t) equals The trade-off between current rent extraction and a value R of re-election leads to policy choices and thus current rent extraction For ease of notation, let 2 2 . (C7) By equation (11), equilibrium rent extraction is which is decreasing and convex in voter information. The equilibrium allocation of resources across public goods follows the shares The incumbent is re-elected if and only if Let t be an indicator variable for this condition. The competence of ruling politicians evolves according to^ where the superscripts I and C refer to the incumbent and challenger in the election at the end of period t − 1.
The cumulative distribution function of ability ^ p;t is Pr ^ p;t = Pr t 1 " I p;t 1 + " I p;t + 1 t 1 " C p;t 1 + " C p;t = Pr t 1 = 1^" I p;t 1 + " I p;t where F_ε(ε) is the cumulative distribution function of ε_{p,t} and f_ε(ε) its probability density function. Since an increase in P J j=1 j j j p induces an increase in ^ p in the sense of first-order stochastic dominance.
The unconditional expectation of ability^ p;t is The equilibrium utility of each member of group j equals
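The role of screening in the evolution of ruling politicians' competence can be checked by simulation. The sketch below is illustrative only and follows our reading of the evolution rule in this proof: it assumes standard normal competence shocks, takes a ruler's ability to be the sum of his previous-period and current shocks, and assumes the incumbent is retained when `stake` times his previous shock plus a uniform likability shock on [−1/2, 1/2] is positive. The parameter `stake`, which stands for how much voters care about competence relative to likability, is hypothetical.

```python
import random


def mean_ruling_ability(stake: float, draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the unconditional mean ability of the ruling politician.

    Hypothetical setup: i.i.d. standard normal competence shocks; the ruler's
    ability is his previous-period shock plus his current shock; the incumbent
    is re-elected when stake * (previous shock) + likability > 0, with
    likability uniform on [-0.5, 0.5].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        inc_prev, inc_now = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ch_prev, ch_now = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        reelected = stake * inc_prev + rng.uniform(-0.5, 0.5) > 0.0
        total += (inc_prev + inc_now) if reelected else (ch_prev + ch_now)
    return total / draws


for stake in (0.0, 0.5, 2.0):
    print(stake, round(mean_ruling_ability(stake), 3))  # mean ability rises with stake
```

With `stake` equal to zero, re-election is unrelated to competence and mean ability is close to zero; as voters weigh competence more heavily, selection raises the expected ability of those in office, which is the screening channel invoked in Proposition B1.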

Proof of Proposition 1
In a polity composed of L regions there are LP public goods: g_{l,p,t} is the provision of public good p in region l at time t. Residents of each region l derive utility from public goods in their own region only: l l;p = l p while l m;p = 0 for l ≠ m. Under decentralization, in each region l a local politician with ability D l;p;t independently invests in the provision of public goods x^D_{l,p,t} and extracts rent r^D_{l,t} = b − Σ_{p=1}^{P} x^D_{l,p,t}. Equilibrium rent extraction is the expected ability of a local politician is and the relative shares of each local public good are D l;p Welfare in region l is and aggregate welfare is Under centralization a single politician with ability C p;t chooses investment in public goods x^C_{l,p,t} for all l and extracts rents r^C_t = bL − Σ_{l=1}^{L} Σ_{p=1}^{P} x^C_{l,p,t}. We partition the P public goods into two sets. The set U consists of public goods whose centralized provision is subject to a uniformity constraint g^C_{l,p,t} = g^C_{p,t} for all l. This constraint coincides with a constraint on resource allocation x^C_{l,p,t} = x^C_{p,t} for all l because ability C p;t is common. The complementary set D consists instead of public goods that the central government can provide in different amounts to different regions. Regardless of this partition, equilibrium rent extraction is and the expected ability of a central politician is For expositional convenience, we characterize the allocation of resources under centralization by the shares relative to a region's equal share of net aggregate resources, rather than to the total 1 Welfare in region l is and aggregate welfare is Letting E denote the expected value across a continuum of regions, aggregate welfare under decentralization is while under centralization it is Since the distribution of preferences is symmetric across goods, it is welfare-maximizing to apply the uniformity constraint either to all goods or to none. If no uniformity constraint is applied (U = ∅), then centralization is welfare-reducing because the gain from reduced rent seeking is less than the loss from resource misallocation, even before taking into account the misallocation of ability: Centralization with uniformity (D = ∅) is preferable to decentralization (W^C ≥ W^D) if and only if For a given mean of the distribution of information E l = , the left-hand side can be written as Ef L l ; for a function f L l ; log 1 + 1 deterministic value 1/P and the right-hand side of equation (C35) goes to zero. For every non-degenerate distribution of l (i.e., for any finite ) there is a finite threshold 0 such that centralization with uniformity is preferable to decentralization if and only if . The threshold is increasing in , and also increasing in because so is the right-hand side of equation (C35).

Proof of Proposition 2
The division of powers is described by two indicator variables, with subscripts 0 and 1: the first equals 1 if and only if the central government is tasked with providing the homogeneously desired good; the second equals 1 if and only if it provides the idiosyncratically preferred good.
From equations (C5) and (11), equilibrium rent extraction by a local politician in region l is The politician's expected abilities are Equilibrium rent extraction by a central politician is His expected abilities are E^ C 0 = 0 0 2 and E^ C l = 1 (1 0 ) 2 l L for l = 1, 2, ..., L.
His budget shares given his budget b^C are defined again with the convention that If he is entrusted with providing the homogeneously desired good he chooses a budget share or budget shares If he is entrusted with providing the idiosyncratically preferred good, he sets a budget share or budget shares Welfare in region l can be decomposed into four components The allocation of resources between the two levels of government has a welfare impact The allocation of each government's budget has a welfare impact Rent extraction by the different levels of government has a welfare impact The selection of politicians according to their skills has a welfare impact The allocation of the budget between the two levels of government affects welfare only through the term u b l . Every region desires the unique Pareto efficient allocation such that the local-government budget is Uniformity constraints affect welfare only through the term u l . If 0 = 1, imposing a uniformity constraint on centralized provision of the homogeneously desired public good increases aggregate social welfare by If 1 = 1, imposing a uniformity constraint on centralized provision of the idiosyncratically preferred public good reduces welfare in every region by With the efficient central-government budget and the welfare-maximizing uniformity constraints, With equilibrium rent extraction, With the equilibrium skill of incumbent politicians, Abstracting from differences between sample distributions and population distributions thanks to the assumption of a continuum of regions (L → ∞), aggregate social welfare is Under full decentralization ( 0 = 1 = 0) welfare is Under a federal system ( 0 = 1 and 1 = 0) it is Under full centralization ( 0 = 1 = 1) it is Under a reverse federal system ( 0 = 0 and 1 = 1) welfare would be so this arrangement is dominated by full centralization.
To compare the three undominated government structures, it is convenient to rescale welfare by an additive constant log b+ 0 log 0 +(1 Then welfare under full decentralization is independent of 0 up to the rescaling. Welfare under a federal system is with limits lim Its derivative with respect to 0 is with limits lim 0 !0 @W F @ 0 = 1 and lim It is a globally convex function of 0 : Welfare under full centralization ( 0 = 1 = 1) is with limits lim 0 !0 It is a monotone increasing and concave function of 0 : Its first derivative has limits lim 0 !0 There is a threshold D C ∈ (0, 1) defined by W C ( D C ) = W D such that complete centralization yields higher welfare than complete decentralization if and only if > D C . There is a second threshold D F ∈ (0, 1) defined by D F > 0 and W F ( D F ) = W D such that a federal allocation of powers yields higher welfare than complete decentralization if and only if 0 > D F . There is a threshold F C ∈ (0, 1) defined by F C < 1 and W C ( F C ) = W F ( F C ) such that complete centralization yields higher welfare than a federal allocation of powers if and only if 0 > F C .
Therefore, a mean-preserving spread of l increases Ef D F l ; D F ; . At the same time, @Ef D F l ; D F ; =@ > 0 because @W F ( D F ) =@ > 0 = @W D =@ . Hence, D F is increasing in . The definition of F C can be written Ef F C l ; F C ; ; = 0, where such that Therefore, a mean-preserving spread of l decreases Ef F C l ; F C ; ; . At the same time, @Ef F C l ; F C ; ; =@ > 0 because @W C ( F C ) > @W F ( F C ). Hence, F C is decreasing in .
In the limit as information becomes perfectly homogeneous (lim !1 l = for all l), which is symmetric around its minimum 0 = 1/2, and lim !1 Thus lim In the limit as information becomes maximally heterogeneous ( ! 0 so Pr ( l = 1) = and Pr ( l = 0) = 1 ), lim !0 W D = lim !0 W F = lim !0 W C = 1, with well-defined ratios lim !0 Intuitively, a fraction 1 of regions unavoidably tend towards no provision of their ideal variety of the idiosyncratically preferred public good, but they also tend towards no provision of the homogeneously desired good if and only if its provision is decentralized. Thus . The threshold is increasing in because an increase in shifts down W C while leaving W D and W F unaffected. Hence, @ F C =@ > 0 and @ D C =@ > 0, while @ D F =@ = 0.
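Each threshold in the proof of Proposition 2 is the point where a monotone welfare curve crosses another. When closed forms are unavailable, such a crossing can be located numerically. The minimal Python sketch below uses bisection, with hypothetical stand-ins (an increasing, concave `w_central` and a flat decentralization level) in place of the paper's welfare expressions.

```python
import math
from typing import Callable


def crossing_threshold(w_central: Callable[[float], float], w_decentral: float,
                       lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Bisection for the point where an increasing welfare curve reaches a constant level."""

    def gap(x: float) -> float:
        return w_central(x) - w_decentral

    if gap(lo) > 0.0 or gap(hi) < 0.0:
        raise ValueError("the welfare curves do not cross on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid          # centralization still worse: threshold lies to the right
        else:
            hi = mid          # centralization at least as good: threshold lies to the left
    return 0.5 * (lo + hi)


# Hypothetical stand-ins: W_C(x) = log(0.5 + x) is increasing and concave; W_D is flat.
threshold = crossing_threshold(lambda x: math.log(0.5 + x), math.log(0.9))
print(round(threshold, 6))    # 0.4, since log(0.5 + x) = log(0.9) at x = 0.4
```

Because the welfare curve is monotone, the sign of the gap at the midpoint tells which half of the interval contains the crossing, so the interval halves at each step.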

Proof of Corollary 1
In a federal system 0 = 1 and 1 = 0. Therefore, equilibrium rent extraction is The expected skills of incumbents are E^ C 0 = 0 2 and E^ D l;l = (1 0 ) 2 l , while E^ C l = E^ D l;0 = 0.
The efficient budget allocation is Aggregate rent extraction is such that and Thus, aggregate rent extraction F reaches a maximum at 0 such that For a given mean of the distribution of information, the definition of 0 can be written Ef F l ; 0 ; = 0, where f F l ; ; such that Therefore, a mean-preserving spread of l increases Ef F l ; ; . At the same time, @Ef F ( l ; ; )=@ > 0. Hence 0 is increasing in . In the limit case of homogeneous information, lim !1 0 = 1/2. In the limit case of maximum information heterogeneity lim !0 0 > 0 because the threshold satisfies 1 + 0 2 = 1 + [1 + (1 0 ) ] 2 . A mean-preserving spread of l also increases average rent extraction by local governments because D l is a convex function of l. It does not affect C . Therefore, E D l and @ F are decreasing in . Thus, the welfare-maximizing unconstrained partition equalizes the share of better-informed voters across regions with the same preferences.
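The effect of a mean-preserving spread on average local rents follows from convexity and is easy to check numerically. The sketch assumes, for illustration only, a hypothetical convex rent function 1/(1 + information) and information distributed uniformly on [mean − spread, mean + spread]; widening the spread keeps the mean fixed.

```python
import math


def rent(info: float) -> float:
    """Hypothetical local rent share: decreasing and convex in voter information."""
    return 1.0 / (1.0 + info)


def mean_local_rent(mean_info: float, spread: float, nodes: int = 10_000) -> float:
    """Average rent when information is uniform on [mean_info - spread, mean_info + spread].

    Uses the midpoint rule; spread = 0 gives a degenerate distribution at mean_info.
    """
    step = 2.0 * spread / nodes
    start = mean_info - spread
    return sum(rent(start + (k + 0.5) * step) for k in range(nodes)) / nodes


for spread in (0.0, 0.2, 0.5):
    print(spread, round(mean_local_rent(0.5, spread), 4))  # rises with the spread
print(round(math.log(2.0), 4))   # exact average for spread 0.5: integral of 1/(1+x) on [0, 1]
```

Because the mean is held fixed, a central rent that depended only on average information would be unaffected, while the average of the convex local rents rises with the spread.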

Proof of Proposition 4
Let the total population be exogenously distributed into regions l ∈ {1, 2} and preferences p ∈ {L, R} according to the probability distribution P_{l,p}. Let the average information of each group be l;r . Under separation, the expected utility of each citizen is while under integration it is Eu I l;p = log b + log Thus, welfare under separation is while under integration it is Let the distribution of population be The second derivative is

Proof of Proposition B1
If a voter i in region l has utility The expected ability of a central politician is E^ C = 2 , so Rent extraction under centralization is C = 1 + 1 , so

Proof of Corollary B1
Under centralization, the share of the spillover-inducing good in each region l is with the welfare-maximizing uniformity constraint. Even without a uniformity constraint, so the allocation is socially optimal across goods although not across regions. Under decentralization, such that @ D g;l

Proof of Proposition B2
Under decentralization, region l has welfare The left-hand side is a function f P with @f P @ l = 1 + l 1 !
When ω = ω̃, @f P @ l = 2 (1 + l ) l l " so the only other stationary point of f P is a maximum. f P is monotone increasing in l ∈ ; 1 if 2 (1 + ) 1 + 2 .
If (but not only if) this last condition holds, then every region whose information differs from the average strictly prefers centralization with discretionality ω̃ to decentralization.