Supervised Regionalization Methods: A Survey

This article reviews almost four decades of contributions on the subject of supervised regionalization methods. These methods aggregate a set of areas into a predefined number of spatially contiguous regions while optimizing certain aggregation criteria. The authors present a taxonomic scheme that classifies a wide range of regionalization methods into eight groups, based on the strategy applied for satisfying the spatial contiguity constraint. The article concludes by providing a qualitative comparison of these groups in terms of a set of certain characteristics, and by suggesting future lines of research for extending and improving these methods.

in such a way that the resulting regions are conveniently related to the phenomena under examination.
The Statistical Office of the European Communities (Eurostat 2006) provides a clear differentiation between these two types of regions: Normative regions are the expression of a political will; their limits are fixed according to the tasks allocated to the territorial communities, to the sizes of population necessary to carry out these tasks efficiently and economically, or according to historical, cultural and other factors. Whereas analytical (or functional) regions are defined according to analytical requirements: functional regions are formed by zones grouped together using geographical criteria (e.g., altitude or type of soil) or/and using socio-economic criteria (e.g., homogeneity, complementarity or polarity of regional economies). (para. [4][5] Although the use of normative regions has been a common practice among regional scientists, in many cases statistical inference based on regions of this type may be strongly affected by aggregation problems, such as the ecological fallacy (Robinson 1950), the modifiable areal unit problem (Openshaw 1977a(Openshaw , 1977b(Openshaw , 1984Openshaw and Taylor 1981;Arbia 1989) or aggregation bias (Fotheringham and Wong 1991;Amrhein and Flowerdew 1992;Paelinck and Klaassen 1979;Paelinck 2000).
In this article we focus on methods that can be applied to design analytical regions. We will refer to them as regionalization methods, to use a term that encompasses the wide range of fields in which these methods have been created or applied. 1 The variety makes it difficult to provide a single definition of the regionalization problem, but we can enumerate some characteristics that are common to any method that can be used to define analytical regions: 1. All the methods aggregate geographical areas into a predefined number of regions, while optimizing a particular aggregation criterion; 2. The areas within a region must be geographically connected (the spatial contiguity constraint); 3. The number of regions must be smaller than or equal to the number of areas; 4. Each area must be assigned to one and only one region; 5. Each region must contain at least one area.
Another characteristic of the methods considered here is that they are supervised, in that they assume prior knowledge about the aggregation process: relevant variables for the aggregation, number of regions to be designed, the regional spatial contiguity constraint, and the existence of an aggregation criterion.
The purpose of this study is not to carry out a model competition but to make a qualitative comparison of the characteristics of the different methods that can be used in the context of the regionalization problem (as defined above). So our main contribution is to present a taxonomy of regionalization methods primarily based on the strategy used to satisfy the contiguity constraint, which is the main particularity of all the methods included in this article. Previous related studies are the articles by Fischer (1980), Murtagh (1985) and Gordon (1996Gordon ( , 1999. However, these articles provide in-depth analyses of some of these methods rather than an exhaustive review.
The structure of the article is depicted in figure 1, where the regionalization methods are classified into two main branches. The first branch, discussed in section 2, includes those approaches where the spatial contiguity constraint is indirectly satisfied. Section 3 covers the second branch of figure 1, which includes the approaches where the spatial contiguity constraint is explicitly included within the algorithm. In the last section, we present the main conclusions of this research.

ALGORITHMS WITHOUT AN EXPLICIT SPATIAL CONTIGUITY CONSTRAINT
This section describes regionalization methods in which the contiguity constraint is indirectly satisfied, this is, methods that do not incorporate an explicit procedure for satisfying the spatial contiguity constraint. This condition is usually applied a posteriori.
We classify these methods into two groups: methods that apply conventional clustering and in which the resulting clusters are revised in terms of spatial contiguity, and methods that force spatial contiguity by maximizing regional compactness.

REGIONALIZATION VIA CONVENTIONAL CLUSTERING ALGORITHMS
This is probably the simplest regionalization method. Regionalization via conventional clustering algorithms was proposed by Openshaw (1973) as a methodological approach for regionalizing large data sets, comprising two stages. The first stage applies any conventional partitioning, or hierarchical, clustering algorithm to aggregate areas that are similar in terms of a set of variables. 2 In the second stage, each cluster is revised in terms of spatial contiguity by applying the following rule: if the areas included in the same cluster are geographically disconnected, then each subset of contiguous areas assigned to the same cluster is defined as a different region. Openshaw and Wymer (1995) formalized this method on a step-by-step basis for classifying and regionalizing census data.
Note that the number of clusters defined in the first stage is always smaller than or equal to the number of contiguous regions resulting in the second stage. Thus, adjustments in the number of clusters are required to obtain the number of regions desired. In some cases, this is not possible; for example, an increment (reduction) of one unit in the number of clusters in the first stage can generate an increment (reduction) greater than one in the number of regions in the second stage. Openshaw and Wymer (1995) stressed the fact that regional homogeneity is guaranteed in the first stage. Moreover, this strategy may also help in providing evidence of spatial dependence between the areas. Thus, when the clusters in the first stage tend to be spatially contiguous, this may imply that the classification variables have some spatial pattern.
Another characteristic of this methodology is that it does not impose regional compactness. 3 In this case, the regional shape depends heavily on the spatial distribution of the classification variables and on the clustering algorithm chosen for the analysis. Johnston (1968), Lankford (1969), andFischer (1980) pointed out that the selection of the clustering algorithm is very important for identifying certain spatial patterns. For example, the centroid and Ward's algorithms can easily identify circular and dense spatial patterns, whereas single linkage algorithm is useful to identify elongated spatial patterns.
Finally, when optimizing the aggregation criterion, conventional clustering algorithms like the k-means approach (MacQueen 1967) only allow improving moves. This makes the algorithm converge quickly, mainly because it can easily be trapped in suboptimal solutions. It is also known that the final solution is very sensitive to changes in the initial centroids. One approach to this problem consists of solving it with different initial centroids and then selecting the best solution. Openshaw and Wymer (1995) proposed a simulated annealing variant to this algorithm, 4 which allows nonimproving moves as a way to force the algorithm to explore more solutions and avoid suboptimal ones.

REGIONALIZATION VIA MAXIMIZATION OF REGIONAL COMPACTNESS
Another way to obtain spatially contiguous regions is to force the regions to be as compact as possible. This strategy was introduced in the early 1960s by Weaver and Hess (1963) as a methodological approach to designing political districts. The authors saw the opportunity to adapt the mathematical formulation for solving the warehouse location-allocation problem to the political districting problem. The aim is to select a subset of areas to be region centers (warehouses) to which the other areas (customers) are assigned.
The aggregation criterion consists of maximizing regional compactness by minimizing the sum of the moments of inertia, defined as the product of the population per area and the squared distance from the centroid of each area to the region center to which it was assigned. 5 It is important to note that region centers are not a decision variable but a parameter in the formulation. The only decision variable in the models is the assignation of areas to the predefined region centers.
The formulation also requires exactly equal populations in the regions. To satisfy this constraint, fractional assignations of one or more areas to more than one region must be allowed. An iterative procedure fixes those fractional assignments and recalculates the new regional centers. When the solution without fractional assignments leads to a change of the regional centers, the warehouse location-allocation model is solved again with this new set of centers. The process stops when no change of centers is needed after solving fractional assignments.
The satisfaction of the spatial contiguity constraint is not always guaranteed, since the assignation of the areas to their closest regional center is based on a weighted distance measure (population • distance). Thus, a final inspection of the final solution is required to correct spatial disconnections. Hess et al. (1965) made a more formal presentation of this method. The fragmentation of areas is theoretically solved by relaxing the equal population constraint to a ''nearly equal'' population requirement that allows the regional population to be between a lower and an upper bound. 6 This relaxation makes it possible to formulate the problem as an integer programming model with a decision variable x ij = 1 if the population of area j is assigned to the center i, and x ij = 0 otherwise. x ij = 1 with i = j means that area i is selected as region center. A final revision for spatial contiguity is still required. Kaiser (1966) proposed an aggregation criterion based on a weighted combination of two components. The first component is a measure of population equality, in which the population of each region should be as close as possible to the ratio between the total population and the number of regions to be designed. The second component is a measure of relative geometric compactness, 7 where the shape of each region should be as close as possible to a circle. This relative compactness is calculated as the proportion of the geometric moment of inertia for region j and the moment of inertia of a circle with the same geometric area. The minimum value of this quotient is one, signifying that region j is a perfect circle. For a given solution, the global compactness is measured as the average of the relative compactness for all regions. Kaiser's regionalization procedure starts from an initial feasible solution that is improved, in terms of the aggregation criterion, by moving areas between regions. Two types of moves are allowed: first, moving an area from its region to every other region; and second, exchanging every pair of areas belonging to different regions. 8 Only improving moves are accepted, which means that the process may well be trapped in local optimal solutions and be sensitive to the starting solution. The iteration process stops when no improving moves are possible. Finally, feasibility in terms of contiguity constraint depends on the weight given to the population equality component with respect to the compactness component. Mills (1967) extends the Hess et al. (1965) location-allocation approach by taking into account natural boundaries in the regionalization process in such a way that a region is not split by these types of boundaries. 9 This condition is achieved by performing what he called permanent assignations, which consist of assigning an area to a particular center to avoid this area being assigned to another center located on the opposite side of a given natural boundary. Hess and Samuels (1971) extended the location-allocation methodology to the sales districting model. In this case, the regions are designed to equalize regional workload rather than regional population. The authors prefer the relaxed version proposed by Weaver and Hess (1963), where fractional assignations are adjusted a posteriori. The Hess and Samuels model is also extended by Zoltners (1979) to allow the incorporation of more than one attribute to be equalized across regions. Helbig, Orr, and Roediger (1972) formulated a heuristic and an integer programming model very similar to the one proposed by Hess et al. (1965). The authors assume that each area has a relatively equal population. This assumption makes it possible to replace the two sets of constraints related to the upper and lower bound for regional population with a set of constraints specifying the number of areas per region. Both the number of areas per region and the location of region centers are then adjusted iteratively by running the model several times. This modification avoids having to deal with fractional assignations of areas.
George, Lamar, and Wallace (1997) incorporated modifications to Hess et al. (1965) and presented three new versions of the model. The first one incorporated nonlinear penalty functions for deviations from the target regional population, in such a way that the bigger the deviation from the desired regional population, the bigger the penalty. The second one offers the possibility of inducing the model to create a similar solution to a given preexisting solution by introducing a function that reduces the distance from an area to a given center if they appear together in the preexisting solution. The third one avoids having a region split by natural boundaries by introducing a penalty function that increases the distance between an area and the centers located in the opposite side of the natural boundary. 10 Recently, Bacao, Lobo, and Painho (2005) proposed a methodology that applies genetic algorithms to define the location of the region centers. The algorithm starts by creating an initial set of solutions. Each solution comprises a set of region centers, the assignation of each area to the closest region centers, and a value for the aggregation criterion. With this initial set of solutions, new solutions are created by applying selection, crossover, and mutation operators to improve the aggregation criteria. The algorithm stops when a predefined number of solutions are generated without improvements in the aggregation criteria.
Most of the methods derived from Weaver and Hess (1963) usually place the emphasis on regional equality-for example, population or workload equality, and regional compactness. In these methods, regional homogeneity is assumed to be indirectly reached by imposing compactness. This assumption is related to Tobler's first law of geography: ''Everything is related to everything else, but near things are more related than distant things'' (Tobler 1970, 236). Another common characteristic of these approaches is that they incorporate the concept of regional center, which is usually located in a highly populated area within the region. In some cases, researchers are not interested in having this type of hierarchy in the regionalization process.
An alternative way of regionalizing via compactness, while taking into account regional homogeneity, is to include the x and y coordinates of the centroids of the areas as two additional classification variables. This can be seen as a twoobjective aggregation criterion. The first objective uses nongeographical classification variables (for example, socioeconomic variables) to measure within-region homogeneity. The second objective uses geographical variables (for example, x and y coordinates) to measure the regional compactness. These two objectives are combined to calculate pairwise similarities between areas in such a way that areas geographically near to each other are likely to be assigned to the same cluster. Once the pairwise similarity has been estimated, either hierarchical or partitioning clustering algorithm may be used to aggregate the areas. 11 There are many ways to combine geographical distances and nongeographical dissimilarities into a single pairwise similarity value. Webster and Burrough (1972), Cliff et al. (1975), and Perruchet (1983) proposed different multiplicative and additive forms to combine such elements. These authors also applied hierarchical agglomerative algorithms for merging areas. 12 Wise, Hainings, and Ma (1997) formulated a multiobjective function with three objectives: (1) homogeneity, measured as the within-region variance of a set of attributes; (2) compactness, measured as the distance between the centroid of each area and the centroid of the region it is assigned to; and (3) equality, measured as the difference between the total value of an attribute for all areas in a region and the mean total of this attribute for all regions. These three objectives are then combined in an additive form and the areas are aggregated by applying a k-means clustering algorithm.
The main drawback in formulating a multiobjective aggregation criterion is that we have to determine the proper individual weight for each objective (homogeneity, equality, compactness, etc.). For example, the weight assigned to the compactness objective should be strong enough to guarantee spatial contiguity (Webster and Burrough 1972;Wise, Haining, and Ma 1997;Openshaw, Alvanides, and Whalley 1998). This assignation of weights has been the main source of criticism of this methodology, since it becomes another source of variation in the final solution.
Regional compactness has been a very important characteristic in problems like political districting and sales territory assignment. But it may not be a desired characteristic when the researcher is interested in detecting spatial patterns that can lead to elongated regions. Note also that in these methods there is not much ''freedom'' in the way the relationships (similarities) between areas are defined. These methods require a similarity function directly related to the position of the areas in the geographical space; therefore, the closer the areas to a regional center, the higher the possibility of assigning those areas to that center. Finally, some of these methods represent the areas by their centroids. This representation is often criticized because the final solution is, in most cases, sensitive to the selection of those centroids. This sensitivity can become a big issue when the areas to be aggregated are large (Horn 1995;Martin, Nolan, and Tranmer 2001).

ALGORITHMS WITH AN EXPLICIT SPATIAL CONTIGUITY CONSTRAINT
The methods covered in this section include, within their solution process, additional instruments that ensure the spatial contiguity of each region. This implies that these models require information about the spatial neighboring relationships between areas.
We classify these methods into three main categories: exact optimization models, heuristic models, and hybrid or mixed heuristic models.

USE OF EXACT OPTIMIZATION MODELS FOR THE REGIONALIZATION PROBLEM
The aggregation of n areas into m spatially contiguous regions while optimizing a predefined aggregation criterion is not an easy problem to solve optimally (Altman 1998). Cliff and Haggett (1970), Cliff et al. (1975), and  used combinatorial mathematics to approximate the number of feasible solutions for a regionalization problem. Since this number depends not only on the number of areas and regions but also on how the areas are distributed in the space, these authors focused on providing the lower and upper bounds of the total number of aggregations. The upper bound results from a totally unconstrained formulation where each area can be grouped with any other area. The lower bound tries to incorporate the spatial distribution by assuming the extreme case where the areas form a chain, and each area can only be grouped with its neighbors. 13 However, the impressive progress of computational capabilities has made it possible to provide optimal solutions for problems that only a few years ago were insurmountable. As an example, Bixby et al. (2000) uses a Patient Distribution System problem with 49,944 rows, 177,628 columns, and 393,657 nonzero variables to illustrate the significant improvements in running times through the different versions of CPLEX. With a 296 MHz Sun UltraSparc, the running times went from 57, 840 seconds with CPLEX 1.0 (1988) to 165 seconds with CPLEX 6.5 (1999). 14 The main challenge for solving regionalization problems via exact methods is to find an efficient way to deal with the contiguity constraint. In this section, we describe different strategies for incorporating spatial contiguity constraint in optimization models. Zoltners and Sinha (1983) improved the model proposed by Zoltners (1979). In this new proposal, the authors introduce information into the model about adjacency/contiguity between areas to control the contiguity constraint. There are many alternatives for defining spatial contiguity between areas; in this particular case, Zoltners and Sinha use the road and highway network as a way to represent how areas are interconnected. This representation takes into account natural obstacles such as mountains, lakes, and rivers. Up to this point, the territory has been represented as a network with nodes representing the areas and links representing the roads connecting areas. Next, a sequence of steps is performed to reduce the number of links in the network. First, a set of areas are selected to be region centers, as many areas as there are regions to be designed. Second, the shortest path between each center and other areas is calculated using travel time as link weights. Finally, the solution from the shortest paths is used to obtain information on the number of adjacency levels between each area and the centers. 15 With this information, Zoltners and Sinha formulated a linear optimization model with a decision variable x ij = 1 if region i includes area j, and x ij = 0 otherwise. The key constraint that ensures spatial contiguity is that an area j can be assigned to center i at adjacency level k if there exists a neighboring area of j assigned to the same center at adjacency level k − 1.
Duque (2004) formulated the regionalization problem as a mixed integer programming model. As in the method we just saw, Duque borrows concepts from graph theory to deal with the spatial contiguity constraint. Thus, the areas and their neighboring relationships are represented as a connected graph with nodes representing areas and links representing first order spatial connectivity between areas.
According to graph theory, the aggregation of n areas into m contiguous regions can be achieved by selecting n-m links from the connected graph. These n-m links can be understood as a necessary but not sufficient condition. An additional condition for feasibility imposes that every pair of areas belonging to the same region must be connected by one and only one combination of links. 16 In the literature, this last condition has been referred to as the subtour breaking constraint.
Duque's formulation includes a set of subtour breaking constraints. It is clear that the enumeration of all potential subtours can be extremely difficult and inefficient. An alternative approach to dealing with subtours is proposed by Current, Revelle, and Cohon (1984) for the shortest covering path problem. It consists of solving a relaxed formulation that does not take into account any subtour breaking constraints. Then, if the solution contains subtours, they are eliminated by adding the proper subtour breaking constraints and solving the problem again. This process is repeated until a feasible solution is obtained.
Three sets of binary decision variables are used in Duque's formulation: X ijk 1; if area i and jjj ∈ N i belong to the same region k, with i < j and N i = set of areas j sharing a border with area i, 0; otherwise; The set of variables X ijk are used to select the n À m links, and the set of variables T ij are used to select all the pairwise dissimilarities between the areas assigned to the same region.
The linear objective function is formulated as the minimization of the sum of all pairwise dissimilarities between the areas assigned to the same region,

Min
P n i¼1 P n j¼1 D ij · T ij , with D ij defined as the dissimilarity relationships between areas i and j, with i < j. Later, Duque, Church, and Middleton (2006) formulated three new models for solving the regionalization problem. The three formulations have the same objective function used by Duque (2004), but each one applies a different strategy to satisfy the spatial contiguity constraint. They are named Tour RM , Order RM , and Flow RM .
Tour RM selects n-m links and prevents subtours by adapting the Miller-Tucker-Zemlin (MTZ) constraints with lifting proposed by Desrochers and Laporte (1991). MTZ constraints were originally formulated by Miller, Tucker, and Zemlin (1960) for the traveling salesman problem (TSP). The strategy consists of having a decision variable, u i , that numerates the areas in such a way that area i must have a value lower than area j if there exists a directed link leading from i to j. Whereas the TSP imposes that each area can have at most one entering and one leaving link, Tour RM allows areas to have more than one entering link to provide the flexibility needed to create any regional shape. The set of decision variables are: X ij 1; if the link from area i to jjj ∈ N i is selected, with N i = set of areas j sharing a border with area i, 0; otherwise; Order RM is based on the contiguity constraint formulated by Cova and Church (2000). In this case, the spatial contiguity constraint is achieved by selecting a core area per region. The remaining areas are assigned to one core area by taking into account the spatial contiguity order between the area and its corresponding core area. Thus, area i can be assigned to core area j at order o if and only if there is at least one area in the neighborhood of i that is assigned to j at order o À 1. The set of decision variables are: X iko 1; if areas i is assigned to area j at order o; 0; otherwise; T ij 1; if areas i and j belong to the same region k with i < j; 0; otherwise; Flow RM solves the regionalization problem by adapting Shirabe's model for the spatial unit allocation problem (Shirabe 2005). Shirabe's model is based on network flow theory in which the contiguity constraint is satisfied by designing a connected subnetwork representing fluid movement from multiple sources to a single sink. Flow RM extends Shirabe's model for a single region to solve the problem for m regions. The set of decision variables are: Finally, we should mention two additional approaches to the regionalization problem in the context of optimization models. The first is a nonlinear optimization model formulated by Macmillan and Pierce (1994) where, to guarantee that the n areas assigned to a region are spatially contiguous, it is sufficient to ensure that the (n − 1)th power of their binary contiguity matrix has no zero terms. And second, Garfinkel and Nemhauser (1970) formulated a linear optimization model that requires the enumeration of all the feasible regions as input. Then, the binary decision variable x j is used to select a predefined number of regions, such that each area belongs to one and only one region. Mehrotra, Johnson, and Nemhauser (1998) used a similar approach, but in this case, the set of feasible regions is generated with a column generation technique (Barnhart et al. 1998), which creates new solutions from an initial feasible solution.
Exact formulations for the regionalization problem are computationally intensive and their application is still limited to very small problems. As a very simple illustration, Table 1 presents the results for Tour RM , Order RM , and Flow RM models using ILOG CPLEX 9.0 on a Sun Blade 2500, 500 MHz, 2Gb RAM to aggregate the fourteen counties in Vermont into three and nine regions.

HEURISTIC MODELS FOR THE REGIONALIZATION PROBLEM
This section will cover different heuristic models for regionalizing. These models have been widely applied in the literature since the early 1960s, and they have proved to be highly effective in cases in which a large number of areas are to be aggregated. The main goal of heuristic models consists of finding, in a reasonable time, a solution as close as possible to the optimal, which in most cases is unknown. The ability to escape from local optimal solutions to reach a good solution is a very important component of the heuristic approaches. An additional challenge for heuristic models in the regionalization context is the ability to efficiently move from one solution to another neighboring solution without breaking the spatial contiguity constraint.
This review covers four types of heuristics: (1) heuristics based on hierarchical clustering algorithms where only contiguous regions can be merged, (2) heuristics where each region starts from an area (seed area) to which other neighboring areas are added, (3) heuristics that start from an initial feasible solution and search for improvements by swapping areas between regions, and (4) heuristics based on graph theory.
Adapted hierarchical clustering algorithms. This methodology is another way of using conventional hierarchical clustering methods to solve regionalization problems. In this case, an agglomerative hierarchical clustering algorithm is modified in such a way that only spatially connected regions (clusters) are allowed to be merged.
At the beginning of the algorithm, each area is a region by itself. Further iterations merge one region at a time until reaching a predefined number of regions. The research in this area explores the use of different methods of agglomerative hierarchical clustering, such as single linkage, complete linkage, average linkage, and Ward's method, among others. Two characteristics are explored in these methods: first, their ability to identify different spatial patterns (Lankford 1969;Byfuglien and Nordgard 1973); and second, how the spatial contiguity constraint affects the dendograms, which are a graphical representation of the clustering solutions at different scales (Ferligoj and Batagelj 1982). Other contributions to this topic have been made by Spence (1968), Webster and Borrough (1972), and Margules, Faith, and Belbin (1985), who carry out comparisons between different agglomerative hierarchical techniques when imposing the contiguity constraint; Openshaw (1973), who proposes a way to speed up the regionalization procedure based on hierarchical aggregation techniques by reducing the number of calculations and restricting the size of the similarity matrix by retaining only the similarities between contiguous areas; and Perruchet (1983), who defines spatial contiguity based on the contiguity threshold, which is the maximum distance beyond which two areas are not contiguous.
Hierarchical approaches can be useful when the researcher is interested in nested solutions at different scales (number of regions). This nested solution occurs because when two areas are merged at a given scale they are forced to be together at higher scales. But when the researcher is interested only in a specific scale, this approach may not be the best option, since the solution obtained is conditioned to the solution at every lower scale (Bunge 1966).
Seeded regions. The main characteristic of these heuristics is that each region is the result of selecting one area (seed area) to which other neighboring areas are assigned. The seeds can be generated one at the time, which implies that the next seed is selected after a region is completed; or in parallel, when all the seeds are selected at once.
This methodology was first proposed by Vickrey (1961) for solving districting problems. Vickrey's method starts by selecting an area at random, which is the reference area. Next, the geographically furthest area from the reference area is used as the initial seed for the first region. Then, neighboring contiguous unassigned areas are added to the seed in order of increasing distance from it. The adding process stops when the region satisfies a predefined condition-for example, a minimum population quota. When the first region is finished, the next region is designed by selecting as the initial seed the furthest unassigned area from the reference area. The strategy of having reference areas was implemented by Vickrey as a way to avoid the creation of enclaves, which are the remaining unassigned areas that cannot be a region by themselves. Those enclaves must be assigned to an already created region. Thoreson and Liittschwager (1967) expanded Vickrey's method in several aspects. First, they proposed repeating the regionalization process several times with different reference areas, to explore new solutions. And second, they introduce a modification of the algorithm to be used on a regular lattice.
Later, Gearhart and Liittschwager (1969) introduced other improvements to Thoreson and Liittschwager's algorithm by including new termination rules, new ways to prevent the formation of enclaves, different alternatives to seed the regions, and a more formal, clearer presentation of the algorithm.
The seeded regions approach has also been studied by Taylor (1973), Openshaw (1977aOpenshaw ( , 1977b, and Rossiter and Johnston (1981). A common factor between these contributions is that they select the set of seeds, called ''core areas,'' at the beginning of the algorithm. Then, neighboring areas are added to the cores. The process stops when all the areas are assigned. Rossiter and Johnston's method does not consider the next seed until a region has been finished, whereas Taylor and Openshaw's methods allow the regions to grow simultaneously.
Note that the algorithms derived from Vickrey's contribution assume that the number of regions is known. So, special attention must be paid to the criterion used for stopping the addition of areas to a region. On one hand, if the minimum population per region is set very low, then it is likely that the desired number of regions is reached very fast, with many areas waiting for assignment (enclaves). On the other hand, if the minimum population per regions is set very high, then the algorithm will not be able to create as many regions as wanted.
Modification of an initial feasible solution. The methods covered in this section require as an input an initial feasible solution which is then iteratively modified while searching for improvements in the aggregation criterion. A common factor among these methods is that each iteration must be feasible in terms of the spatial contiguity constraint.
This type of regionalization method was proposed by Nagel (1965) for redistricting problems. The process starts from an initial feasible solution that is modified by moving areas from their current region to another neighboring region. Several conditions must be satisfied before allowing a move: first, the donor region (the region currently containing the area being moved) must have more than one area. Second, the donor region cannot lose its spatial contiguity after removing the area.
Nagel allows two types of moves. The basic one is to move one area at a time. The second type of move involves trading areas, on a one-for-one basis, between two neighboring regions. 17 These moves are allowed only if there is an improvement in the aggregation criterion. The process stops when no improving moves can be performed.
The contiguity constraint is tested by selecting one area within the region. Then neighboring areas belonging to the same region are added until no more areas can be added. If all the areas belonging to the region can be added, then the region is spatially contiguous.
The algorithm automatic zoning procedure (AZP) proposed by Openshaw (1977a) uses the same strategy, but it does not include trading moves. Although the methodological approach was not innovative with respect to earlier contributions, the main contribution made by Openshaw was to use these regionalization models to explore the effects of the modifiable areal unit problem.
Up to this point, all the algorithms were based on a hill-climbing procedure, where only improving moves were allowed. Sammons (1978) highlighted the need to allow these procedures to perform nonimproving moves to be able to explore a wider range of feasible solutions and escape from local optimal solutions, which are very likely to occur when these approaches are used. Sammons also introduced a very interesting additional way to explore more solutions based on merging/splitting regions: the number of merges must be equal to the number of splits to maintain the number of desired regions. In the same book, Openshaw (1978) presents an extension of Openshaw (1977a), where merging/ split moves are incorporated as well. Ferligoj and Batagelj (1982) studied this methodology within the context of clustering with relational constraint. The authors see the trading not only as useful for exploring potential improvements on the aggregation criterion, but also as a way of preserving the number of areas within each region. They also propose several ways to generate the initial feasible solution. Browdy (1990) made an important contribution to this type of model. He formulates a model for the redistricting problem where nonimproving moves are possible by implementing a simulated annealing (SA) structure within the searching process. SA allows nonimproving moves with a probability that diminishes gradually over iteration time. In this algorithm, special attention should be given to the definition of the initial parameters related to the cooling schedule to ensure an appropriate trade-off between the execution time and a good solution. Macmillan and Pierce (1994) also used SA for redistricting problems in an algorithm called ANNEAL. They proposed an alternative method for checking for spatial contiguity called switching points, which proved to be more efficient than their method based on the powers of the binary contiguity matrix presented in the section Use of Exact Optimization Models for the Regionalization Problem. The switching point method does not have to iterate over all the areas assigned to the region; it only focuses on the area that is the candidate to be removed from the donor region and its first order neighbors, including those neighbors assigned to other regions. 18 Empirical evidence on the performance of the switching points method is provided by Macmillan (2001). Openshaw and Rao (1995) proposed two alternative approaches for improving Openshaw's AZP model. In one of the approaches, AZP-SA, nonimproving moves of areas are allowed by using a SA scheme. The other approach, AZPtabu, uses a well-known heuristic procedure called tabu search algorithm, 19 which also attempts to escape from local optimal solutions by allowing nonimproving moves. In AZP-tabu, once an area is moved from one region to another, its reverse move is prohibited for R subsequent iterations. R is then the length of the tabu list and its value is a key parameter in the algorithm. The authors also provide a version of AZP-tabu where the value of R is dynamically adjusted during the searching process. 20 Horn (1995) also took up Sammons' idea of implementing different types of area moves and allowing for temporary changes in the number of regions. Horn modifies an initial feasible solution with three types of moves: zone-at-march, moving an area from its region to a neighboring region; territory-at-march, merging two regions; and territory injection, selecting an area to create a new region. Here the number of regions can deviate from a predefined value to explore new solutions. The allowed maximum deviation from the desired number of regions is gradually reduced to zero to ensure a feasible solution. Finally, Horn stresses the importance of implementing different types of moves and encourages the exploration of moves that involve more than one area at a time. Duque and Church (2004) introduced the algorithm called automatic regionalization with initial seed location, where special attention is paid to the design of a good initial feasible solution before performing a local search by moving areas between regions. The algorithm has two stages. The first stage uses a seeded regions strategy to generate an initial feasible solution. Information on how the aggregation criterion changes through the assignation process is used to make changes in the initial set of seeds. This first stage generates a set of feasible solutions from where the best solution is chosen and then improved in a second stage by applying a local search process based on a tabu search algorithm. Using a good feasible solution as an input to the second stage will reduce the possibility of getting trapped by a local optimal solution and also the number of moves performed during the second stage. 21 Graph theory-based models. Maravalle and Simeone (1995) formulated a heuristic model for regionalization based on graph theory. The heuristic called MIDAS-méthode itérative d'agrégation spatiale represents the areas and their spatial contiguity relationships as a connected graph (G) where the link between nodes/areas i and j means that those two areas share a border. The problem is formulated as follows: ''Given a connected graph G, in which a vector of characteristics is associated with each vertex, find a minimum inertia partition of the vertex-set of G into a prescribed number of connected clusters'' (Maravalle and Simeone, 1995, 625).
The main steps of MIDAS are: first, generation of a spanning tree T of G; 22 second, generation of an initial partition of T by deleting p − 1 links where p is the number of regions to be designed; and third, the set of p − 1 deleted links is iteratively modified by replacing one deleted link by another nondeleted link in T. This replacement, which is a neighboring feasible solution of T, is allowed if the new solution leads to an improvement in the aggregation criteria.
Since, for a given T, the number of neighboring solutions is very restricted, the authors introduce a fourth step called tree-modification which generates a different spanning tree T of G that results from replacing links in T by other unselected links available on G. This new T allows evaluation of new neighboring solutions that are not possible to evaluate from T. This process is repeated until there are no improving moves to perform.
The main advantage of this approach is the fact that spatial contiguity constraint is maintained through the iterations without having to test for it at each move, since deleting any combination p − 1 links from any spanning tree T of G always leads to a feasible solution.

MIXED HEURISTIC MODELS
Under this category, we classified the heuristics that combine heuristic and exact optimization models to solve regionalization problems. The aim of these models is to merge in one model the computational power of heuristic approaches with the mathematical accuracy of exact models. This combination has been made in two different ways: first, the exact model can be applied in a set of neighboring regions to redraw their borders, achieving improvements in the aggregation criteria; second, information obtained from multiple solutions generated by any regionalization heuristic is used to reduce the number of variables and constraints in an exact model.
The first method was proposed by Duque (2004). The algorithm called Regionalization Algorithm with Selective Search (RASS) has as a main assumption that the design of contiguous and homogeneous regions is relevant only if there exist both disparities between the areas, which justify the design of more than one region, and some evidence of spatial dependence, which justifies the requirement of spatial contiguity. If these two properties are present in the data set, then the information available on the relationships between areas can be crucial in directing the search process in a more selective and less random fashion. The algorithm starts by selecting a subset of m neighboring regions. The areas belonging to those regions are passed to an optimization model to reaggregate them into m regions. 23 Next, taking into account information on the relationships between the areas, the algorithm decides which region should leave the set of neighboring regions and which region should be added to the set to run the optimization model again. Thus, the set of regions passed to the optimization model keeps changing throughout the iteration process until a convergence criterion is satisfied.
The aim of RASS is to take advantage of the optimization model by applying it to a set of regions instead of trying to solve the whole problem at once. The local improvements achieved with the optimization model may be more difficult to obtain with an area-swapping scheme. Duque and Church (2004) formulated an alternative way of taking advantage of the optimization model for regionalization problems. The authors propose the adaptation of a well-known heuristic model known as heuristic concentration (Rosing and ReVelle 1997). The central idea of this model is to use information from multiple solutions, obtained by any regionalization heuristic, to reduce the number of variables and constraints in the optimization model. The solution obtained from the reduced optimization model will be at least as good as the best solution obtained from the heuristic. The algorithm has two stages. The first stage runs a regionalization heuristic q times. The best m solutions are taken from the q available solutions. The second stage uses those m solutions to reduce an early version of the Order RM model presented in the section Use of Exact Optimization Models for the Regionalization Problem. Thus, if a set of areas is assigned to the same region in all the m solutions, those areas will be forced to be together in the optimization model. This information can also be used to reduce the index related to the maximum contiguity order (o). Empirical evidence shows that this procedure can reduce, on average, 83.8 percent of the constraints and 84.2 percent of the variables from the fully specified optimization model. Middleton (2006) proposed a similar approach. He formulates a heuristic called heuristic distillation to reduce the Flow RM model proposed by Duque, Church, and Middleton (2006). The algorithm uses information from multiple solutions obtained from any heuristic model. Next, information from the best m solutions is used to reduce the Flow RM model. Information on neighboring areas that are never assigned to the same solution, and on areas that are always assigned to the same region, is used to formulate a reduced version of the optimization model. This process can reduce, on average, 79.53 percent of the constraints and 56.51 percent the variables from the fully specified Flow RM optimization model.

CONCLUSIONS
In this article, we review more than four decades of contributions to regionalization methods, and we have also propose a taxonomy of the different regionalization methods considered in the literature based on the strategy used to satisfy the contiguity constraint. Table 2 shows the main topics considered in the reviewed studies (column 2) following our proposed taxonomy (column 1). Taking this information into account, our aim in this study was to provide a big picture of the alternative methods that can be applied by researchers who see official aggregations as a source of error when analyzing spatial data. This article may also be a good starting point for researchers interested in developing new regionalization techniques.
A first conclusion from this review is that there is no ultimate regionalization technique. Each technique has characteristics that can be desirable in some applications but undesirable in other ones. In table 3 we provide an evaluation of those methods in terms of a set of characteristics. This table can be used as a guide for selecting a suitable regionalization method for a specific analysis.
Another conclusion is that, although there is a high degree of subjectivity in the criteria these methods use, the application of structured and replicable regionalization methods may reduce the criticism that the use of analytical regions usually receives.
could be combined, leading to more powerful algorithms. Second, alternative ways of improving the computational tractability of exact optimization models for regionalization problems are needed. Third, less randomized aggregation processes could be designed by taking advantage of the available data (not only from classification variables but also from the spatial configuration of areas). Fifth, those models based on swapping or trading moves of areas could be extended toward the design of more aggressive moves that allow improvements in the local search process. And sixth, research addressed to assessing the real benefits of using analytical regions is needed.
NOTES 16. For more information on the properties of this (and other) configurations, see Ahuja, Magnanti, and Orlin (1993).
17. When trading two areas, each must touch the neighboring region in an area other than the area being traded. This move cannot break the contiguity constraint of either of the two regions involved in the trading.
18. The authors enumerate the special cases where this methodology may fail. 19. For more information on the tabu search algorithm, see Glover (1977Glover ( , 1989Glover ( , 1990. 20. The possibility of dynamically adjusting the value of R is known as the reactive tabu search (Battiti and Tecchiolli 1994).
21. The number of moves is reduced on average by 36.4 percent with respect to the total number of moves required by AZP-tabu.
22. The authors applied Kruskal's algorithm in this step (Kruskal 1956). 23. Proposed by Duque (2004) and presented in the section Use of Exact Optimization Models for the Regionalization Problem.