The Race Disparity Unit (RDU) and Centre for Data Ethics and Innovation (CDEI) began a partnership in March 2019 at the start of the CDEI's review into bias in algorithmic decision-making. The RDU is a UK government unit which collates, analyses and publishes government data on the experiences of people from different ethnic backgrounds, in order to drive policy change where disparities are found. The CDEI drew on the RDU's expertise to better understand how algorithmic decision-making could disproportionately impact ethnic minorities. The RDU also fed into roundtables we conducted on algorithmic decision-making in the policing sector.
Events this year have brought inequalities in society to the fore. COVID-19 has demonstrated the compounding impact of inequalities experienced by ethnic minorities across the health, employment and education systems. The current evidence, as published in the Minister for Equalities’ quarterly report on COVID-19 inequalities, shows that it is a range of socioeconomic and geographical factors - such as occupational exposure, population density, household composition and pre-existing health conditions - which contribute to the higher infection and mortality rates for ethnic minority groups.
Data is essential to understanding and addressing these disparities, but data itself can contain inaccuracies, gaps and biases. The CDEI's review into bias in algorithmic decision-making explores this issue through various lenses, but ultimately argues that data is crucial in our fight to combat bias and create a fairer, more equal society.
Understanding, rather than omitting, data
In our report, we explore the different ways algorithmic decision-making can perpetuate or exacerbate bias. One way organisations have looked to mitigate algorithmic bias is by omitting particularly sensitive data (often protected characteristic data such as race or sex) from the training dataset used to develop their algorithm. For example, police forces are unlikely to use protected characteristic data, like race, in their algorithms, but may use information like postcode. However, in some areas postcode can function as a proxy variable for race or community deprivation, thereby having an indirect and undue influence on the outcome prediction. This approach has been referred to as 'fairness through unawareness', and it is insufficient to manage algorithmic bias.
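The proxy problem described above can be illustrated with a small check: if knowing a record's postcode area lets you predict its (withheld) protected group much better than chance, then the postcode is carrying that information into the model even though the protected characteristic was omitted. The sketch below uses entirely made-up illustrative counts, not real data:

```python
from collections import Counter, defaultdict

# Hypothetical records of (postcode_area, protected_group).
# All values are invented for illustration only.
records = (
    [("AB1", "group_x")] * 8 + [("AB1", "group_y")] * 2 +
    [("CD2", "group_y")] * 8 + [("CD2", "group_x")] * 2
)

def proxy_strength(records):
    """Compare the accuracy of always guessing the overall majority
    group with guessing the majority group *within each postcode*.
    A large gap means postcode carries group information, i.e. it
    can act as a proxy for the protected characteristic."""
    overall = Counter(group for _, group in records)
    baseline = overall.most_common(1)[0][1] / len(records)

    by_postcode = defaultdict(Counter)
    for postcode, group in records:
        by_postcode[postcode][group] += 1
    informed = sum(
        counts.most_common(1)[0][1] for counts in by_postcode.values()
    ) / len(records)
    return baseline, informed

baseline, informed = proxy_strength(records)
```

In this toy dataset the groups are evenly split overall (baseline 0.5), yet postcode alone predicts group membership with 0.8 accuracy, which is exactly the leakage that 'fairness through unawareness' fails to prevent.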
If biases in the data are not understood and managed early on, they could lead to the creation of a feedback loop in which future policing, not crime, is predicted. They could also influence how high or low risk certain crimes or areas are deemed by a data analytics tool, and potentially perpetuate or exacerbate biased criminal justice outcomes for certain groups or individuals.
Collecting data in order to test algorithms
Fairness through unawareness is not just an ineffective approach; it also prevents organisations from identifying and addressing bias. Through our research we found that some organisations, in both the public and private sectors, were not collecting protected characteristic data for fear that this would contravene data protection law. However, there are a number of lawful bases in data protection legislation for using protected or special category data when monitoring or addressing discrimination. Another key factor that seems to be stopping organisations from collecting sensitive data is a perception that users would be concerned by or opposed to its collection. However, the polling we carried out for this review suggested that, at least in some specific examples, public support for this was much higher than is sometimes perceived. Indeed, the only way to be sure a model is not directly or indirectly discriminating against a protected group is to check, and doing so requires having the necessary data.
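The check described above can be as simple as comparing selection rates across groups once the protected characteristic data is available. The sketch below uses hypothetical outcome data; the lowest-to-highest rate ratio computed here is one common disparity indicator (in the spirit of the 'four-fifths' rule of thumb), not a legal test of discrimination:

```python
from collections import defaultdict

def selection_rates(outcomes):
    """outcomes: list of (group, selected) pairs, where selected is a
    bool. Returns the per-group selection rate and the ratio of the
    lowest to the highest rate; a ratio well below 1.0 flags a
    disparity worth investigating."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in outcomes:
        totals[group] += 1
        selected[group] += was_selected
    rates = {g: selected[g] / totals[g] for g in totals}
    ratio = min(rates.values()) / max(rates.values())
    return rates, ratio

# Hypothetical monitoring data: group_a selected 6 of 10 times,
# group_b selected 3 of 10 times. Figures are invented.
outcomes = (
    [("group_a", True)] * 6 + [("group_a", False)] * 4 +
    [("group_b", True)] * 3 + [("group_b", False)] * 7
)
rates, ratio = selection_rates(outcomes)
```

A check like this is only possible when the group labels have been collected, which is precisely the point: without the data, the disparity is invisible.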
Collection of data on protected characteristics is becoming increasingly common in recruitment, as part of the wider drive to monitor and improve recruitment outcomes for underrepresented groups. This allows tool vendors or recruiting organisations to test their models for proxies and monitor the drop-out rate of groups across the recruitment process. When we asked a representative sample of the public if they would mind their personal data being collected to help monitor fair outcomes in recruitment, more agreed with the practice than not.
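Monitoring drop-out across a recruitment process can be sketched in the same spirit: given stage-by-stage candidate counts per group, compute the retention rate at each transition and look for stages where one group drops out disproportionately. All figures below are invented for illustration:

```python
def stage_retention(stages):
    """stages: ordered list of (stage_name, {group: count}) pairs for
    a recruitment funnel. Returns, for each transition between
    consecutive stages, the per-group retention rate, so that
    disproportionate drop-out at any single stage is visible."""
    transitions = []
    for (prev_name, prev), (name, cur) in zip(stages, stages[1:]):
        rates = {
            g: cur.get(g, 0) / prev[g] for g in prev if prev[g] > 0
        }
        transitions.append((f"{prev_name} -> {name}", rates))
    return transitions

# Hypothetical funnel: group_b drops out disproportionately at the
# sift-to-interview stage, then at the same rate thereafter.
funnel = [
    ("applied",   {"group_a": 100, "group_b": 100}),
    ("interview", {"group_a": 50,  "group_b": 25}),
    ("offer",     {"group_a": 10,  "group_b": 5}),
]
transitions = stage_retention(funnel)
```

Reporting retention per transition, rather than only end-to-end outcomes, pinpoints which stage of the process is producing the disparity.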
There is a critical need for organisations looking to use algorithms to assist decision-making to collect data on protected characteristics, in order to monitor outcomes and identify bias. To address the misconceptions around data protection law, we recommend in our report that government work with relevant regulators to provide clear guidance on the collection and use of protected characteristic data in outcome monitoring and decision-making processes. Government should then encourage the use of that guidance and data to address current and historic bias in key sectors.
We also set out examples of innovative approaches to address other challenges in collecting this data. In the short term, for example, organisations that find publicly held data insufficient will need to work in partnership with their peers or with bodies that hold additional representative or demographic information. In the private sector this could include industry-specific data sharing initiatives, learning from examples such as Open Banking in finance or Presumed Open in energy.
Broader efforts by organisations such as the RDU to improve the collection and analysis of ethnicity data will be crucial to understanding the key drivers of ethnic disparities in the UK. The RDU is already doing this by working with government departments and academics to prioritise linkage between health, social and employment data, to build a complete picture of ethnic group differences in relation to COVID-19 risk and outcomes.
Further progress in this area is being driven by the RDU through the government’s Commission on Race and Ethnic Disparities and the ongoing work to address COVID-19 disparities. These projects will build vital resources for the public sector to draw on to inform better decision-making.
About the CDEI
The CDEI was set up by government in 2018 to advise on the governance of AI and data-driven technology. We are led by an independent Board of experts from across industry, civil society, academia and government. Publications from the CDEI do not represent government policy or advice.