The History of U.S. Housing Segregation Points to the Devastating Consequences of Algorithmic Bias
This is part of The Ethical Machine: Big ideas for designing fairer AI and algorithms, an ongoing series about AI and ethics, curated by Dipayan Ghosh, a former Public Interest Technology fellow.
LAUREN GREENAWALT
PUBLIC INTEREST TECHNOLOGY FELLOW, NEW AMERICA
Algorithms, and algorithmic discrimination, are often presented as recent phenomena. But while big data and computational power have enabled more advanced algorithms and reduced the need for human calculation, simpler algorithms have long been used to allocate private and public goods, and not without prejudice. Indeed, as early as the 1930s, algorithms like those used by the Federal Housing Administration to grant or deny federally insured mortgages relied on significantly biased variables. There is a strong argument to be made that this system increased residential segregation, hastened the decline of urban neighborhoods, and magnified racial inequality that persists today.
Before we look backward, though, it is worth noting two major benefits of elevating examples of historical algorithmic discrimination. First, surfacing cases of algorithms and algorithmic discrimination from decades past can help demystify these concepts. Second, showing the long-term consequences of algorithmic discrimination can give current policymakers lessons that may not be evident from analyzing more recent algorithms, whose effects may still go unrecognized.
The FHA's Use of Algorithms
After the banking crisis of the 1930s, the National Housing Act of 1934 established the Federal Housing Administration (FHA) and tasked it with, among other assignments, insuring privately issued mortgages [1]. FHA-secured loans had more favorable terms than loans available before the creation of the administration: the FHA required that all federally insured loans be fully self-amortizing over a repayment period of at least 25 years, whereas prior loans had shorter repayment periods and typically left the borrower with an outstanding balance on the house. The security afforded by federally insured loans also allowed lenders to lower their required down payments and interest rates, making for far more affordable mortgages. This was true both for the individual mortgages that the FHA insured and for loans made to proprietors of multi-family developments. For those who received FHA-backed loans, "It often became cheaper to buy than to rent" [2].
The Housing Act, however, required that the FHA insure only "economically sound" mortgages [3]. Given that the FHA insured loans throughout the country, the administration needed to provide clear standards for local underwriting staff and contractors to determine whether a mortgage was economically sound. To standardize these decisions, the FHA outlined a risk-rating approach in an Underwriting Manual that was distributed to all FHA staff and contractors. The Manual explained the following:
In order to secure uniformity and consistency in decisions, the risk-rating system prescribes that the elements of risk shall be treated by inter-related groups and then integrated into a final result according to a specified procedure. Adherence to the procedure is mandatory. [4]
In other words, the FHA created algorithms to rate the risk of mortgages and required that staff use these algorithms to determine which mortgages would be insured by the federal government.
The FHA likely did not think of its mortgage-insuring system as algorithm-based. However, a detailed look at the risk-rating system shows clear parallels to algorithms used today. Like modern algorithms, the FHA algorithms gathered and weighted a variety of inputs, and returned a score based on those inputs. Similar to current algorithms, the score generated by the FHA had a direct impact on who gained access to a hugely valuable good.
The FHA outlined 28 "features" that it considered "the most important ratable elements of risk in the making of a mortgage loan on a dwelling property." These features were divided into four categories: the property, the location, the borrower, and the mortgage pattern [5].
Local FHA staff and contractors were required to fill out grids (like the one below) in order to rate the property, the location, and the borrower. Staff rated each "feature," or variable, on a scale of one to five according to instructions provided in the Underwriting Manual. Each variable was assigned a designated weight; the small numbers in the top left of the boxes represented the weighted score. FHA staff carried the weighted score into the "rating" column and summed the rows to produce a total score, or rating, for the category [6].
The mortgage pattern rating grid yielded a total mortgage score based on information from the three categories: property, borrower, and location. The Underwriting Manual instructed staff members to transcribe the scores of the three other categories onto this card, as well as specified mortgage information. As in the other grids, the table assigned a weight to each feature score, which staff members summed to produce a total rating [7].
This prescribed system, or algorithm, provided three opportunities for loans to be automatically rejected. First, loans were rejected if staff members determined that any of the 28 features deserved a score of less than one. To take an example from the borrower category, FHA examiners were instructed to mark an "X" under the "reject" column if a "borrower's reputation is . . . so questionable that undue risk would be involved in insuring a mortgage loan." Second, loans were rejected if any of the three category scoring grids yielded a score of less than 50 percent [8]. Finally, loans were rejected if the total rating, as calculated on the mortgage pattern grid, fell below 50 percent [9]. With each opportunity for rejection, the harms of biased variables compounded.
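The scoring and rejection procedure described above can be sketched in code. The feature names, weights, and example data below are simplified stand-ins for illustration; the Manual's actual 28 features and their weights are not reproduced here.

```python
# Sketch of the FHA-style rating procedure described above.
# Feature names and weights are illustrative stand-ins, not the
# Underwriting Manual's actual values.

def rate_category(ratings, weights):
    """Score one category (property, location, or borrower).

    ratings: feature -> examiner's score on the 1-to-5 scale;
             anything below 1 stands for the "reject" mark.
    weights: feature -> assigned weight.
    Returns (score as a percentage of the maximum, rejected?).
    """
    # Rule 1: any single feature marked for rejection sinks the loan.
    if any(score < 1 for score in ratings.values()):
        return 0.0, True
    total = sum(weights[f] * ratings[f] for f in ratings)
    best = sum(5 * w for w in weights.values())
    pct = 100.0 * total / best
    # Rule 2: a category scoring under 50 percent is rejected.
    return pct, pct < 50.0

def rate_mortgage(category_pcts, pattern_weights):
    """Combine the category percentages into the total
    mortgage-pattern rating (Rule 3: reject if under 50)."""
    total = sum(pattern_weights[c] * category_pcts[c] for c in category_pcts)
    rating = total / sum(pattern_weights.values())
    return rating, rating < 50.0

# A hypothetical application: a low-scoring location category
# rejects the loan regardless of the other categories.
location_pct, rejected = rate_category(
    {"economic_stability": 2, "adverse_influences": 1},
    {"economic_stability": 3, "adverse_influences": 2},
)
```

Note how the three rejection rules compound: a single feature, a single category, or the combined rating can each independently sink the application.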
FHA Algorithms' Anti-Urban, Anti-Integration Biases
The described algorithms and corresponding guidelines for automatic rejection appear objective and innocuous. However, certain variables considered in the rating of the location and the rating of the property created systematic disadvantages for mortgages in urban and heterogeneous neighborhoods.
The image below shows the rating of property grid that FHA staff and contractors filled out when considering a loan for federal insurance.
Whether intended or not, the features, or variables, under the "function" section of the grid disadvantaged many urban properties and increased the chance that a mortgage would be rejected. For example, the Manual instructed staff to lower ratings for "Livability and Functional Plan" for any properties with a "dark or poorly ventilated room" [10]. In rating "Natural Light and Ventilation," staff members were instructed to consider "proximity to adjoining buildings" [11]. Unfortunately, urban properties often lacked these elements of supposed function. As Kenneth T. Jackson wrote in the seminal book Crabgrass Frontier, "While such requirements did provide light and air for new structures, they effectively eliminated whole categories of dwellings, such as the traditional 16-foot-wide row houses of Baltimore, from loan guarantees" [12].
Per Jackson's point, mortgages for some typical urban housing structures were outright rejected based on a single variable. Even where this did not disqualify the applicant right away, the variable would be scored lower, which decreased both the category and overall mortgage rating, increasing the chance that the mortgage would ultimately be denied federal insurance.
The location-rating algorithms also created barriers to federally insuring mortgages in urban neighborhoods [13]. The "Relative Economic Stability" and "Protection from Adverse Influences" variables comprised more than half of the potential score. The Manual instructed FHA staff to rate the "Relative Economic Stability" of a location based on the occupations of the people in the neighborhood.
It noted that "laborers" were the lowest class of workers; neighborhoods where many residents were "laborers" were to be scored lower than neighborhoods with residents who had higher-status jobs. The Underwriting Manual itself acknowledged that this would make it more challenging for people in urban neighborhoods to secure federal insurance for mortgages. A paragraph in the instruction section noted that "a large percentage of the employed population of a city is found working in the capacity of laborers" [14]. By including a variable that would necessarily lower the score of urban neighborhoods, the rating of location algorithm, and the dependent rating of the mortgage pattern algorithm, again increased the chance that mortgages in urban neighborhoods would be rejected.
The "Protection from Adverse Influences" variable likewise biased the algorithms against urban neighborhoods. The Manual listed a variety of "adverse influences" that would lower the score for this variable. These "influences," such as nearby businesses, nearby schools, or "offensive noises and odors," were common in urban neighborhoods [15]. As Jackson explains, "Prospective buyers could avoid many of these so-called undesirable features by locating in suburban sections" [16].
More perniciously, the rating of location algorithm, and therefore the total mortgage score algorithm, also disadvantaged diverse or integrated neighborhoods. The Manual explicitly labeled the presence of "inharmonious racial groups" an "adverse influence" [17]. It meanwhile instructed higher scores for locations with natural or artificial barriers that prevented such "inharmonious" groups from moving in. The Manual even encouraged higher scores when racially restrictive covenants or deed restrictions were incorporated into the mortgage. Potential mortgages lacking these features were penalized, increasing the chance that the FHA would reject the insurance application for a mortgage in a diverse, or potentially diverse, neighborhood [18].
The FHA algorithms are no doubt a historical example of algorithmic discrimination. The input variables included in the rating grids strongly biased the system toward rejecting applications for federally insured loans in urban or heterogeneous neighborhoods.
FHA Algorithms' Legacy: Urban Decline, Residential Segregation, and Racial Inequality
The FHA algorithms' anti-urban and anti-integration biases had penetrating and enduring effects, leading to mortgage-insuring patterns that drove urban decline, residential segregation, and racial inequality. The United States is still grappling with their devastating impacts today.
The algorithms, as they were designed, produced a predictable result: the FHA insured far more loans in the suburbs than in urban neighborhoods. In the first 20 years of the FHA, for example, the suburbs of St. Louis County received five times the FHA investment that the city did, whether measured by number of loans or per-capita dollars insured [19]. Some cities fared even worse when compared to their suburbs: the FHA insured zero mortgages in Newark and Paterson from its inception to 1966 [20].
Previous research shows that the relative availability of FHA-insured mortgages in the suburbs led many city residents to move away from urban centers. Of the FHA-backed loans in St. Louis County, for example, over half were held by people who had most recently lived in the city [21]. This out-migration hurt the neighborhoods and neighbors left behind. As Jackson writes, "This withdrawal of financing often resulted in an inability to sell houses in a neighborhood, so that vacant units often stood empty for months, producing a steep decline in value" [22].
But the suburbs were not equally accessible to people of all races. The urban/suburban disparities in FHA-insured loans therefore intensified racial segregation. As noted earlier, FHA algorithms were more likely to approve mortgages for federal insurance if the neighborhood or the mortgage presented a barrier to integration. Contractors, who sought FHA backing for full developments, added deed language to keep developments segregated and thereby increase the chance that their mortgages would be federally insured [23]. Black people, in large part, were left out of the growing suburbs [24]. A pattern of white suburbs and black urban centers began to emerge, as black/white segregation in metro areas rose 40 percent between 1910 and 1940 and continued to grow, though at a slower pace, until 1970 [25].
The FHA eventually removed language instructing lower scores for mortgages in areas with "inharmonious racial groups," and in 1948 the Supreme Court ruled that the racially restrictive covenants incentivized under the algorithms were legally unenforceable. However, these adjustments did not remedy the damage caused by the bias in the original algorithms. Richard Rothstein explains that houses in the suburbs appreciated in value and rapidly became unaffordable to those who had not been able to secure mortgages during the early days of FHA insurance. As he writes, "By the time the federal government decided finally to allow African-Americans into the suburbs, the window of opportunity for an integrated nation had mostly closed" [26].
The residential segregation partially spurred by FHA algorithms has been stubborn and resistant to intervention. The Brookings Institution's analysis of census data shows that in order for urban neighborhoods to be fully integrated, that is, to have a proportionate number of black and white residents, more than half of the black population would need to move to a different neighborhood [27]. Residential segregation has, expectedly, bled into other aspects of life. K-12 schools are more segregated today than they were 40 years ago [28], and racial residential segregation has been identified as a fundamental cause of racial disparities in health [29], a driver of the racial academic achievement gap [30], and a root of the widening racial wealth gap [31].
A Call to Monitor Modern Algorithms
Today, the government deploys algorithms for a wide range of uses. They are employed to determine eligibility for a variety of public benefits, to decide whether a defendant should be released before trial or detained, and to allocate firefighters or police officers to neighborhoods. The historical case of FHA mortgage rating shows the potential for government-deployed algorithms to systematize bias with devastating results. In the mortgage risk-rating grids, both explicitly discriminatory variables and variables that appeared to be objective biased FHA algorithms against urban, heterogeneous neighborhoods. Though it may have been difficult to predict how influential these algorithms would be in contributing to urban decline, residential segregation, and racial inequality (consequences that appear indefensible by current standards), today's goal must be constant vigilance and forethought. First, these insights should compel current policymakers to evaluate variables in government-deployed algorithms for potential bias.
Policymakers should engage the public and subject-matter experts to evaluate variables used in algorithms that are currently deployed or may be implemented in order to determine the likelihood that use of the algorithm will systematize bias.
In any such analysis, policymakers should consider the following questions:
- Do any variables in the algorithm explicitly favor or disadvantage a particular identity, race, gender, class, or geography?
- What is the correlation between each variable in the algorithm and particular groups of interest, such as race, gender, class, or geographic location? Are any of the variables so closely correlated with a group that they serve as a proxy for that group?
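The second question can be made concrete by measuring the correlation between each input variable and group membership: a variable that correlates almost perfectly with a group is functioning as a proxy for it. A minimal sketch follows; the records, variable names, and the 0.8 cutoff are all hypothetical choices for illustration, not an established standard.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

def flag_proxies(records, group_key, threshold=0.8):
    """Flag variables whose |correlation| with group membership
    exceeds the threshold (the 0.8 cutoff is illustrative)."""
    group = [r[group_key] for r in records]
    flags = {}
    for var in records[0]:
        if var == group_key:
            continue
        r = pearson([rec[var] for rec in records], group)
        if abs(r) > threshold:
            flags[var] = round(r, 2)
    return flags

# Hypothetical application records: "zip_score" tracks group
# membership almost perfectly, so it functions as a proxy.
records = [
    {"group": 1, "zip_score": 5, "rooms": 3},
    {"group": 1, "zip_score": 4, "rooms": 1},
    {"group": 0, "zip_score": 1, "rooms": 3},
    {"group": 0, "zip_score": 2, "rooms": 1},
]
flags = flag_proxies(records, "group")
```

In practice this kind of audit would use far richer data and methods than a pairwise correlation, but even this simple check would have flagged variables like the FHA's location scores.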
While analyzing algorithms' variables is important, policymakers may not be able to fully gauge the disparate impacts or negative consequences of an algorithm until it's deployed. Therefore, policymakers must also track any disparities and unintended consequences after deployment. At a minimum, they should monitor deployed algorithms closely enough to answer the following questions:
- Once implemented, does the algorithm yield systematically different results for different groups? In other words, does the algorithm result in preferences or disadvantages for a particular identity, race, gender, class, or geographic group?
- What are the consequences of the disparity created by the algorithm? What ripple effects does the disparity create?
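The first monitoring question can be answered by tabulating outcomes by group once the algorithm is live. A minimal sketch, using a made-up decision log with hypothetical group labels:

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs logged from a
    deployed algorithm. Returns per-group approval rates and the gap
    between the best- and worst-treated groups."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        if approved:
            approvals[group] += 1
    rates = {g: approvals[g] / totals[g] for g in totals}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap

# Hypothetical decision log: the algorithm approves suburban
# applications three times as often as urban ones.
log = [("urban", True), ("urban", False), ("urban", False), ("urban", False),
       ("suburban", True), ("suburban", True), ("suburban", True), ("suburban", False)]
rates, gap = approval_rates(log)
```

A persistent gap of this kind is exactly the signal that the FHA's insuring patterns would have shown had anyone monitored them: far higher approval rates in suburban than in urban neighborhoods.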
The suggested analysis of algorithms' variables and ongoing monitoring of algorithmic consequences could help surface potential cases of bias and discrimination soon enough to prevent devastating results. However, it's impossible to prescribe how policymakers should respond to the myriad potential results of each analysis. While a review of the FHA risk-rating system certainly shows that the algorithms required reform, analysis of other algorithms may not yield an equally clear direction. In certain cases, analysis of variables may find one that is both critical to the functioning of an algorithm and highly correlated with a particular group. In other cases, the disparate impact of an algorithm may need to be weighed against other values.
The complexities of analyzing algorithms point to the importance of consulting both subject-matter experts of all kinds and the public in the review process, and in decisions based on these analyses. For instance, policy-area experts can help evaluate particular variables and surface unintended consequences of algorithms as they are deployed, while algorithmic decision-making experts can provide frameworks and tools to hold algorithms accountable [32], to assess the ethics of a deployment [33], or to weigh fairness against other objectives [34]. The public, especially, must be able to shape algorithms that allocate public services or goods and should thus be consulted as the ultimate experts.
The FHA algorithms didn't simply reject mortgages; they rejected people. Algorithms deployed by government today also affect human lives. Only careful analysis of these algorithms can prevent the type of harm caused by those used in federal housing policy decades ago.
References
1. "Creation of Federal Housing Administration," 1246 § 847 (1934), Sections 1–2.
2. Kenneth T. Jackson, Crabgrass Frontier: The Suburbanization of America (New York: Oxford University Press, 1984), 205.
3. Federal Housing Administration, Underwriting Manual: Underwriting and Valuation Procedure Under Title II of the National Housing Act, 1938, Section 203I, available at https://babel.hathitrust.org/cgi/pt?id=mdp.39015018409253;view=1up;seq=1.
4. FHA, Underwriting Manual, Pt I, Paragraph 214.
5. FHA, Underwriting Manual, Pt I, Paragraphs 221–222.
6. FHA, Underwriting Manual, Pt I, Paragraphs 224–228. More detail on which staff filled out which section is available in Paragraph 224.
7. FHA, Underwriting Manual, Pt I, Paragraphs 301–310.
8. FHA, Underwriting Manual, Pt I, Paragraphs 227–230.
9. FHA, Underwriting Manual, Pt I, Paragraph 206.
10. FHA, Underwriting Manual, Pt II, Paragraph 133.
11. FHA, Underwriting Manual, Pt II, Paragraph 149.
12. Jackson, Crabgrass Frontier, 208.
13. Economic stability comprised 40 percent of the category score. Weighted scores were left off of the grid, as FHA Insuring Offices determined a maximum score that a neighborhood could receive given the large metropolitan district to which it belonged (FHA, Underwriting Manual, Pt II, Paragraphs 203–217).
14. FHA, Underwriting Manual, Pt II, Paragraph 219.
15. FHA, Underwriting Manual, Pt II, Paragraph 232.
16. Jackson, Crabgrass Frontier, 208.
17. FHA, Underwriting Manual, Pt II, Paragraph 229.
18. FHA, Underwriting Manual, Pt II, Paragraphs 226–229.
19. Jackson, Crabgrass Frontier, 210.
20. Jackson, Crabgrass Frontier, 213.
21. Jackson, Crabgrass Frontier, 209.
22. Jackson, Crabgrass Frontier, 213.
23. Richard Rothstein, The Color of Law: A Forgotten History of How Our Government Segregated America (New York: Liveright Publishing Corporation, 2017), 77.
24. Rothstein, The Color of Law, 67.
25. Douglas S. Massey, "Residential Segregation and Neighborhood Conditions in U.S. Metropolitan Areas," in America Becoming: Racial Trends and Their Consequences 1 (2001), https://www.nap.edu/read/9599/chapter/14.
26. Rothstein, The Color of Law, 129.
27. William H. Frey, "Census Shows Modest Declines in Black-White Segregation," Brookings Institution, 2015, https://www.brookings.edu/blog/the-avenue/2015/12/08/census-shows-modest-declines-in-black-white-segregation/.
28. Rothstein, The Color of Law, 179.
29. David R. Williams and Chiquita Collins, "Racial Residential Segregation: A Fundamental Cause of Racial Disparities in Health," Public Health Reports, 2001, available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1497358/.
30. Sean Reardon, "School Segregation and the Racial Academic Achievement Gaps," Center for Education Policy Analysis, Working paper, 2015, 15–22, https://cepa.stanford.edu/sites/default/files/wp15-12v201510.pdf.
31. Thomas Shapiro, Tatjana Meschede, and Sam Osoro, "The Roots of the Widening Racial Wealth Gap: Explaining the Black-White Economic Divide," Institute on Assets and Social Policy, 2013, https://iasp.brandeis.edu/pdfs/Author/shapiro-thomas-m/racialwealthgapbrief.pdf.
32. See Joshua A. Kroll et al., "Accountable Algorithms," University of Pennsylvania Law Review, 165 (2017), available at https://scholarship.law.upenn.edu/penn_law_review/vol165/iss3/3/.
33. See the Ethics & Algorithms Toolkit at http://ethicstoolkit.ai/.
34. See Sam Corbett-Davies et al., "Algorithmic Decision Making and the Cost of Fairness," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, https://5harad.com/papers/fairness.pdf, for an example of such a framework.