Yet another biased algorithm

Image of two pencils and a notebook on a wooden surface. Photo by Skitterphoto on

Every year in England and Wales, there is some sort of A-level controversy. Some years, controversy has arisen when too many students take supposedly “easy” A-levels. In other years, the concern has been that too many students are achieving top grades. These exams are the culmination of a student’s secondary education, and are designed to comprehensively evaluate students on two years of work. They are taken to be a demonstration of skills and knowledge. They also allow universities and future employers to compare (and rank) students and schools.

The news story related to A-levels has rarely been explicitly about bias, even when it has often been implicitly about socio-economic or gender ‘patterns’ appearing in exam results. (“Look! A pattern! How did that get there?”) This year’s controversy, on the other hand, wears its bias on its sleeve.

In 2020, for Covid-19 related reasons, the written exams were cancelled. Instead, the government opted to use an algorithm that purported to calculate how a student would have (might have? could have?) done on their exams. Grades were ‘assigned’ according to an algorithm that used 2017-2019 historic grade distribution at a particular school, the entering grades (GCSE) of the particular class, and (in order to bridge the two) a calculation of how well historic entering grades (GCSE) correlated with historic grade distribution in the 2017-2019 exams.
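To make the mechanics concrete, here is a deliberately simplified sketch in Python. It is not the regulator's actual model. It assumes only what is described above: students are ordered by their GCSE results, and grades are then handed out so that the class matches the school's historic grade distribution. The adjustment for how well past GCSE results correlated with past A-level results is left out, and the function name, student names and numbers are all hypothetical.

```python
from typing import Dict

# Grades from best to worst.
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]


def assign_grades(gcse_scores: Dict[str, float],
                  historic_share: Dict[str, float]) -> Dict[str, str]:
    """Hypothetical illustration: rank students by GCSE score, then fill
    the school's historic grade distribution from the top down."""
    ranked = sorted(gcse_scores, key=gcse_scores.get, reverse=True)
    n = len(ranked)
    result: Dict[str, str] = {}
    cumulative = 0.0
    position = 0
    for grade in GRADES:
        # Share of the class that historically achieved this grade or better.
        cumulative += historic_share.get(grade, 0.0)
        cutoff = min(round(cumulative * n), n)
        while position < cutoff:
            result[ranked[position]] = grade
            position += 1
    # Anyone left over because of rounding receives the lowest grade.
    for student in ranked[position:]:
        result[student] = GRADES[-1]
    return result


# Made-up class of five at a school where nobody achieved an A* in past years.
example = assign_grades(
    {"Asha": 9.0, "Ben": 8.1, "Chloe": 7.4, "Dev": 6.0, "Ed": 5.2},
    {"A": 0.2, "B": 0.4, "C": 0.4},
)
print(example)
```

In this toy version, the strongest student in the class cannot receive an A*, however well they might have done, because no one at their school did in previous years. That is the feature of the design that the rest of this post is concerned with.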

It turned out that the algorithm adjusted 40% of grades downwards, and that the top students from state-run schools in deprived areas were disproportionately affected by downgrading. Eventually, the algorithm-generated marks were overturned, and the use of the algorithm itself is being challenged as discriminatory, but not before it had left its mark. The discriminatory pattern was only discovered after the algorithm’s results had been published. By that point, the university admission process had already proceeded to secondary admissions (through a process called ‘clearing’). A lot of damage was done in a very short period of time as a result of a lapse of judgment.

That the algorithm would turn out to be unfair was predictable, and that is part of the basis of the political furore. Ministers and high-level civil servants have resigned or been fired, at least partly because there is evidence that they should have or could have predicted the unfairness: two of the components in the algorithm essentially calculate a student’s mark based on how other students – in previous years – have performed on their exams. As my six-year-old will tell you, it is a basic form of unfairness to give someone credit or blame for what someone else has done. Even the algorithm component that tied the estimated grade to the particular student used GCSE results, so that how a student had done on another, unrelated exam two years previously held significant sway over how the algorithm predicted (suggested?) that they would have (might have? could have?) done on an exam that didn’t, in fact, take place. All of this suggests that anyone who stopped to think about the components of the algorithm could have anticipated that the results would be unfair.

That the algorithm might turn out to exhibit patterns of bias, however, required a bit more insight and understanding of how these exam results normally work. But it, too, was predictable to anyone familiar with historic and recent patterns of bias in the exam results.

A-level results are fundamental to the university admission process. They are seen as a meritocratic – and therefore neutral, unbiased – ranking system. Some offers of university admissions are conditional on receiving certain results, and conditional admissions are revoked if a certain grade isn’t achieved, at least in a non-pandemic year. So, public outcry number one occurred as soon as it was revealed that some of the published results had been the result of downgrading by an algorithm: many students had their university offers of admission withdrawn, and the spots were offered to students on a wait list, who promptly accepted. Within a matter of hours, the university places were no longer available to the original students whose results had been downgraded by the algorithm.

Again, the students whose results were downgraded and whose offers of university admission were rescinded were disproportionately from state-run schools in deprived areas. This was a tangible loss resulting from the use of the algorithm, and it had a disproportionate effect on disadvantaged students. Universities, exam boards, schools, and the governing body for the exam process have been scrambling to repair the damage. Many are hopeful that they will arrive at a solution through a combination of deferred university admissions and additional offers of university admissions.

But since A-levels have always been used to rank students for university admissions, there is an extent to which the use of the algorithm was an attempt to preserve the ranking function of A-levels in the wake of the cancelled exams. Yet the algorithm preserved more than that; it also preserved the socio-economic patterns of the outcomes.

In addition to undergraduate admissions processes, A-levels are also used as part of postgraduate university admissions processes. How you did in an undergraduate degree might be weighed against (or outweighed by) how you had previously done in your A-levels.

A-levels are also used as part of many job applications, including professional job applications, throughout a career. How you did on your A-levels might be taken into consideration when you apply for a job as a barrister, university lecturer, architect or medical doctor, notwithstanding the fact that all of these jobs require you to have completed further qualifications beyond your A-levels.

These various uses of A-levels reveal that the exams serve a gatekeeping function. If you don’t achieve the right set of marks, the gate will remain closed. Only those who pass a certain threshold will be allowed to pass through the gate of university admissions, or of postgraduate admissions, or of professional qualifications.

My worry is that the effects of A-level bias have broader and longer lasting implications than the annual controversy suggests. A-levels are taken to demonstrate more than just a snapshot of a student’s skills and knowledge on a particular subject. They are interpreted as an objective merit-based ranking of students’ abilities. They are treated as having predictive qualities. The suggestion is that students who achieve high marks are not merely good at taking tests; rather, high achievers at A-levels possess “wisdom” or “knowledge” or “understanding” or even “genius”. These marks are used to compare and rank students or job applicants, and to determine who falls above (and who below) various thresholds. And so, the test’s gatekeeping function persists well beyond university admissions week.

A-level results are sticky. Via job applications, they can stay with an individual for decades. As an academic with a PhD who occasionally applies to jobs in the UK, I have filled out job applications that still ask about my (non-existent) A-level results. What could A-level results possibly reveal about someone applying for a position as a university lecturer? Well, in my case, they reveal that I am a foreigner, since I didn’t complete any A-levels. In Brexit Britain that is no small thing to have to admit at the start of an application process.

In more typical domestic cases, however, A-levels in job applications subtly reveal exactly what the algorithm controversy is about: they reveal class and social markers by naming where A-levels were completed. This is true of the various Old Etonians in government, for example. A-level results might, in this subtle way, reveal geographic or class origins; at least, they tend to reveal whether a student attended a state school in a working-class neighbourhood or a £30,000 private school. And in either case, the implicit information could and likely would taint the ‘neutral’ merit ranking that A-levels are supposed to provide in hiring decisions.

But implicit bias itself is sticky. Once a hiring manager or admissions officer (or a hiring manager’s algorithm or AI) knows where you completed your schooling, the neutral merit ranking can also become tarnished by prestige and other accompanying forms of bias.

Barocas and Selbst point out that AI or Data Mining programs in the employment context “tend to assign enormous weight to the reputation of the college or university from which an applicant has graduated, even though such reputations may communicate very little about the applicant’s job-related skills and competencies. If equally competent members of protected classes happen to graduate from these colleges or universities at disproportionately low rates, decisions that turn on the credentials conferred by these schools, rather than some more specific qualities that more accurately sort individuals, will incorrectly and systematically discount these individuals” (Barocas & Selbst 689).

Although the A-level algorithm was a simple (non-AI) algorithm rather than machine learning, many of the concerns that I’ve previously raised about bias in machine learning are present in this case involving a simple algorithm. AI typically ‘learns’ any patterns present in existing data, whether or not the pattern is obvious to the programmers. This algorithm was just a little more explicit – and transparent – about using historic data’s predictive implications to substitute for current judgment.

One lesson that we might draw from the controversy is the reminder that algorithms do exactly what we ask them to do. If there are embedded assumptions in our programming, or if there are embedded biases in the data we feed them, the output will retain those biases. In this case, though, we might also remember that we can use the outputs of an algorithm more or less responsibly. Since the algorithm wears its bias on its sleeve, I am hopeful that the 2020 A-level results will be taken with a grain of salt. But an even better outcome would be a deep reckoning with the uses and purposes served by asking for A-level results.
