Did Residential Racial Segregation in the U.S. Really Increase?
A recent report by the Othering and Belonging Institute at UC Berkeley claimed that, of large metropolitan areas in the U.S., 81% have become more segregated over the period 1990-2019. This finding contradicts the recent sociological literature on changes in residential segregation in the U.S., which has generally found that racial residential segregation has slowly declined since the 1970s, especially between Blacks and Whites. The major question then is: What accounts for this difference?
My new working paper answers this question, and here’s a quick summary:
The segregation measure of the Berkeley study, the “Divergence Index,” is identical to mutual information, also known as the \(M\) index. This index is mechanically affected by changes in racial diversity: if the diversity of a metropolitan area rises and nothing else changes, the index will increase. Given that the U.S. has become more diverse over the period 1990 to 2019, it is not surprising that this index shows increases in segregation. Of course, this doesn’t mean that the index value increases in every metropolitan area where racial diversity is increasing; clearly, other things could also change. The mechanical relationship is also not a statement about the substantive relationship between diversity and segregation: It could be the case, for instance, that more diverse cities really are more segregated. But when one uses the \(M\) index to answer such a question, one will almost always find that such a relationship exists, because of the mechanical dependency between diversity and the \(M\) index.
In mathematical terms, the simplest way to see the influence of diversity on the index value is to write the index in terms of three entropies: \(M=E(\mathbf{p}_{u\cdot})+E(\mathbf{p}_{\cdot g})-E(\mathbf{p}_{ug}).\) The first term is the entropy of the neighborhood distribution, the second term is the entropy of the racial group distribution, and the third term, which is subtracted, is the entropy of the joint distribution. Given that the racial group entropy increases when diversity increases, the \(M\) index is directly affected by rising diversity.
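To make the entropy formula concrete, here is a minimal Python sketch that computes \(M\) from a table of counts (rows are neighborhoods, columns are racial groups). The function names and the two toy metros are made up for illustration and are not taken from the working paper or its replication materials. Both toy metros are completely segregated, yet the more diverse one gets the higher \(M\) value.

```python
import math

def entropy(probs):
    """Shannon entropy with natural logs; empty cells contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def m_index(counts):
    """M = E(p_u.) + E(p_.g) - E(p_ug) for a neighborhoods-by-groups table."""
    total = sum(sum(row) for row in counts)
    joint = [[c / total for c in row] for row in counts]
    p_unit = [sum(row) for row in joint]          # neighborhood shares
    p_group = [sum(col) for col in zip(*joint)]   # racial group shares
    p_joint = [p for row in joint for p in row]   # joint distribution
    return entropy(p_unit) + entropy(p_group) - entropy(p_joint)

# Hypothetical metros: two neighborhoods, two groups, no mixing at all.
low_diversity = [[900, 0], [0, 100]]    # group shares 90% / 10%
high_diversity = [[500, 0], [0, 500]]   # group shares 50% / 50%

m_low = m_index(low_diversity)
m_high = m_index(high_diversity)
print(m_low, m_high)  # m_high is larger, although both metros are fully segregated
```

Under complete segregation the neighborhood entropy and the joint entropy cancel, so \(M\) reduces to the racial group entropy, which is exactly the term that grows with diversity.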
Once I correct for the confounding of index change with diversity using a decomposition method, I find that the results are in line with the sociological literature: Residential racial segregation as a whole has declined modestly in most metropolitan areas of the U.S., although segregation has increased slightly when focusing on Asian Americans and Hispanics. The following plot shows the \(M\) index and the adjusted \(M\) that corrects for the mechanical influence of rising diversity. The \(H\) index is also shown:
Clearly, once we adjust for the mechanical influence of diversity (which the \(H\) index also does), segregation in the median metropolitan area is declining. The figure also shows that all indices increase sharply in 2019. Why is this the case? For the years 1990, 2000, and 2010, Census data are available. For 2019, only estimates from the American Community Survey (ACS) are available, and these are well known to inflate segregation estimates. Hence, even the increase in the \(M\) index, which is almost entirely confined to the period 2010-2019, may be spurious and due to the use of ACS data.
For more details, the working paper is on SocArXiv, as well as a complete set of replication materials.
A note on local measures of segregation
Because this came up in the discussion afterwards, here are some remarks on measures of local segregation. The \(M\) index (called the “Divergence Index” in the Berkeley report) can be written as a weighted average of local segregation scores \(L_u\), where \(L_u\) measures the segregation of neighborhood \(u\):
\[L_u = \sum_{g=1}^{G}p_{g|u}\log\frac{p_{g|u}}{p_{\cdot g}}\]
(\(p_{g|u}\) is the proportion of racial group \(g\) in neighborhood \(u\), and \(p_{\cdot g}\) is the overall proportion of racial group \(g\) in the metropolitan area). This measure is the Kullback-Leibler divergence of the neighborhood’s racial composition from that of the metropolitan area. Once we weight each score by the neighborhood’s population share \(p_{u\cdot}\) and sum over neighborhoods, we obtain the \(M\) index (mutual information):
\[M=\sum_{u=1}^{U}p_{u\cdot}L_{u}\]
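The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `local_scores` and the toy counts are hypothetical, and the sketch assumes that no neighborhood is empty.

```python
import math

def local_scores(counts):
    """Return (p_u., L_u) for each neighborhood in a neighborhoods-by-groups table.

    Assumes every neighborhood has at least one resident.
    """
    total = sum(sum(row) for row in counts)
    p_group = [sum(col) / total for col in zip(*counts)]  # metro composition
    scores = []
    for row in counts:
        n_u = sum(row)
        # Kullback-Leibler divergence of the neighborhood from the metro;
        # groups absent from the neighborhood contribute nothing.
        l_u = sum((c / n_u) * math.log((c / n_u) / p_g)
                  for c, p_g in zip(row, p_group) if c > 0)
        scores.append((n_u / total, l_u))
    return scores

# Hypothetical metro with three equally sized neighborhoods and two groups.
counts = [[80, 20], [20, 80], [50, 50]]
scores = local_scores(counts)

# Weighting the local scores by neighborhood size gives the M index.
m = sum(weight * l_u for weight, l_u in scores)
```

In this toy metro the third neighborhood mirrors the overall 50/50 composition, so its local score is zero, while the first two neighborhoods have identical positive scores.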
The scores \(L_u\) are really useful, but the question is whether they should be used to compare or rank neighborhoods across metros or across time. The issue is unproblematic if we just look at one metro area at one point in time, e.g., to learn which neighborhoods are especially segregated. But what if we want to compare over time or across metros? Then it gets tricky, because local segregation scores are also influenced by the diversity of the metro area. The minimum value of local segregation is zero, but the maximum value is (see if you can guess it) the negative of the logarithm of the proportion of the metro area’s smallest racial group (see here for a proof). If you compare across metropolitan areas, then the range of the local scores will differ if the size of the smallest racial group differs. This makes comparisons really tricky, and I would therefore not be willing to classify neighborhoods across metro areas as “high” or “low” segregation.
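A small numerical check of the upper bound, as a sketch with made-up group shares: a neighborhood that consists only of the metro’s smallest group reaches the maximum, and that maximum differs between two hypothetical metros with different compositions. The function names are illustrative.

```python
import math

def local_score(neigh_shares, metro_shares):
    """L_u for a neighborhood composition, given the metro composition."""
    return sum(p * math.log(p / q)
               for p, q in zip(neigh_shares, metro_shares) if p > 0)

def max_local_score(metro_shares):
    """Upper bound of L_u: minus the log of the smallest group's share."""
    return -math.log(min(metro_shares))

# Hypothetical metros: an evenly split one and one with a 10% minority.
metro_a = [0.5, 0.5]
metro_b = [0.9, 0.1]

# A neighborhood made up entirely of the smallest group attains the bound.
score_a = local_score([0.0, 1.0], metro_a)
score_b = local_score([0.0, 1.0], metro_b)

print(score_a, max_local_score(metro_a))
print(score_b, max_local_score(metro_b))
```

The same kind of neighborhood (100% minority) receives a much higher score in the second metro, only because the minority is smaller there.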
If you want to compare over time, again the decomposition method can be used to adjust for changes in diversity. This map of changes in racial segregation in Brooklyn does this, and shows where segregation increased (red) and declined (blue) in Brooklyn between 2000 and 2010 using diversity-adjusted local segregation scores (source of map).
A final point on the \(H\) index: The only difference between \(M\) and \(H\) is the division by the racial entropy. So we can also define local scores for the \(H\) index. These are not normalized, but they work as a decomposition, i.e.
\[L_{u}^{(H)} = \frac{L_{u}}{E(\mathbf{p}_{\cdot g})}\]
and
\[ H = \sum_{u=1}^{U}p_{u\cdot}L_{u}^{(H)} = \frac{M}{E(\mathbf{p}_{\cdot g})}, \]
where \(E(\mathbf{p}_{\cdot g})\) is the racial group entropy of the metropolitan area.
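As a rough illustration of this decomposition, the sketch below computes the local \(H\) scores for a hypothetical, completely segregated metro. The function name and counts are made up, and the sketch assumes at least two groups (so that the racial entropy is positive) and no empty neighborhoods.

```python
import math

def h_local_scores(counts):
    """Return (p_u., L_u / E(p_.g)) for each neighborhood.

    Assumes at least two groups with residents and no empty neighborhoods.
    """
    total = sum(sum(row) for row in counts)
    p_group = [sum(col) / total for col in zip(*counts)]
    group_entropy = -sum(p * math.log(p) for p in p_group if p > 0)
    scores = []
    for row in counts:
        n_u = sum(row)
        l_u = sum((c / n_u) * math.log((c / n_u) / p_g)
                  for c, p_g in zip(row, p_group) if c > 0)
        scores.append((n_u / total, l_u / group_entropy))
    return scores

# Hypothetical metro: complete segregation, group shares 90% / 10%.
counts = [[900, 0], [0, 100]]
scores = h_local_scores(counts)

# The size-weighted average of the local scores is the H index.
h_index = sum(weight * score for weight, score in scores)
print(scores, h_index)
```

Here the weighted average equals 1, the maximum of \(H\), while the local score of the small neighborhood is far above 1, which shows that the local scores themselves are not confined to the unit interval.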
The scores \(L_{u}^{(H)}\) are again useful to understand where in a metropolitan area segregation is lowest or highest, but they are equally problematic when used to compare across metro areas or across time.