Matthew Fisher is a doctor and aspiring barrister with an interest in, and experience of, MedTech.

Josef K., the protagonist of Kafka’s novel ‘The Trial’, was an ambitious and successful banker prior to his unexpected arrest. The criminal charges brought against him were never explained because they were beyond the comprehension of all but the most senior judges. Attempting to understand his guilt consumed K’s every thought: he was distracted at work, subservient to his lawyer and ultimately docile when led to his execution. ‘The Trial’ eloquently argued that transparency is a prerequisite of accountability. In the Age of the Algorithm, Kafka’s novel is more relevant than ever.
Machine learning algorithms increasingly regulate our lives, making decisions about us in finance, education, employment and justice. In the foreseeable future they will become pervasive in most, if not all, aspects of decision-making. But what is a machine learning algorithm? How does it decide? And what rights do data subjects have? This article aims to answer all three questions.
What are Machine Learning Algorithms?
An algorithm is a set of instructions which are followed to complete a task. For example: place bowl on table, pour in cereal and milk, eat with spoon. A more complex example from healthcare is the CHA₂DS₂-VASc score, which allows clinicians to make evidence-based decisions when prescribing blood thinners to patients at risk of stroke. The score comprises eight separate questions covering factors such as age, sex and blood pressure. The answers to these questions are the algorithm’s variables, which determine the CHA₂DS₂-VASc score. Two of the variables, being aged 75 or over and having had a previous stroke, are double-weighted to reflect their significant predictive value.
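To make the fixed-weight logic concrete, it can be sketched in a few lines of Python. The function name and simplified criteria below are illustrative, not a clinical implementation; the real score has precise definitions for each factor which are not reproduced here.

```python
# A minimal sketch of a fixed-weight clinical algorithm, modelled loosely
# on the CHA2DS2-VASc score. The criteria are simplified for illustration
# and must not be used clinically.

def stroke_risk_score(age, sex, heart_failure, hypertension,
                      diabetes, prior_stroke, vascular_disease):
    """Sum fixed, human-chosen weights into a stroke-risk score."""
    score = 0
    score += 2 if age >= 75 else (1 if 65 <= age < 75 else 0)  # age bands
    score += 1 if sex == "female" else 0
    score += 1 if heart_failure else 0
    score += 1 if hypertension else 0
    score += 1 if diabetes else 0
    score += 2 if prior_stroke else 0   # double-weighted variable
    score += 1 if vascular_disease else 0
    return score

# A 78-year-old woman with hypertension and a previous stroke scores 6.
print(stroke_risk_score(78, "female", False, True, False, True, False))
```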
The CHA₂DS₂-VASc algorithm is the product of research studies performed by human clinicians, and its weighted variables are fixed. A machine learning algorithm, by contrast, requires no such human input: its weighted variables change to reflect new data inputs and outputs. Machine learning is a form of artificial intelligence because it allows computers to draw inferences automatically when presented with new data, without being explicitly programmed for the task.
A common type of machine learning algorithm is the artificial neural network, which imitates the human brain. A neural network functions via interconnected neurons, whose connections are the algorithm’s weighted variables. The connections between neurons become stronger if the algorithm arrives at the correct answer and weaker if it arrives at the wrong one. The system has an input layer (e.g. data on age, sex and blood pressure), hidden layers and an output layer (e.g. % risk of stroke). There are large numbers of connections between these layers, each of which can be refined. With time and large data sets, billions of refinements can develop into an algorithm that is very successful at its given task.
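To make the layer structure concrete, here is a toy forward pass through such a network in Python. The weights are random placeholders: training, the strengthening and weakening of connections described above, would adjust them to fit the data.

```python
# A toy feed-forward neural network: input layer -> hidden layer -> output.
# Weights are random here purely for illustration; a real network would
# learn them from large data sets.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([75.0, 1.0, 140.0])        # input layer: age, sex, blood pressure
W1 = rng.normal(size=(3, 4))            # connections to a 4-neuron hidden layer
W2 = rng.normal(size=(4, 1))            # connections to the output layer

hidden = np.tanh(x @ W1)                # hidden layer activations
risk = 1 / (1 + np.exp(-(hidden @ W2))) # output layer: risk of stroke (0 to 1)
print(float(risk[0]))
```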
How do Machine Learning Algorithms decide?
A complex machine learning algorithm is one with many variables. In such algorithms the input and output layers are known, but the intermediate decision-making layers remain opaque. These complex models cannot be explained in their entirety, which has earned them the media label of ‘black boxes’.
To understand why this is the case it is necessary to consider the ‘curse of dimensionality’ from computer science. Data can be represented geometrically: with two variables, all the data can be displayed on a two-dimensional x-y graph; with three variables, a three-dimensional x-y-z graph.
However, complex systems have thousands of variables, and therefore require thousands of dimensions. It is important here to distinguish low-dimensional settings, such as the three-dimensional physical space of everyday experience, from the high-dimensional spaces which arise when analysing data. In these high-dimensional spaces, as the number of dimensions (variables) increases, the number of ways in which all the potential values can be combined grows exponentially.
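This exponential growth is easy to demonstrate. Assuming, purely for illustration, that each variable can take just ten distinct values, the number of possible combinations is ten to the power of the number of variables:

```python
# Exponential growth of possible value combinations with dimensionality.
# Assumes, for illustration, ten distinct values per variable.
for d in (2, 3, 10, 100, 1000):
    combinations = 10 ** d
    print(f"{d} variables: 10^{d} combinations ({len(str(combinations))} digits)")
```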
It is the ‘curse of dimensionality’ that turns complex machine learning algorithms into black boxes: their hidden decision-making layers are beyond human comprehension. Arguably, it is possible to provide an explanation for the role of a specific variable in a complex system, but doing so is challenging for several reasons which are beyond the scope of this article.
The Rights of Data Subjects: The Data Protection Act 2018 and the GDPR
In the US, algorithms are used to make bail, sentencing and parole decisions without human involvement. Since the enactment of the GDPR and the Data Protection Act 2018, however, this should not be the case in the UK.
Section 49 of the Act states that significant decision-making by controllers about data subjects based solely on automated processing is unlawful unless required by law. A significant decision is defined as one which has “an adverse legal effect” or “significantly affects the data subject”. If it is a qualified significant decision (one required by law), the subject has one month following receipt of notification to request that the controller either “reconsiders the decision” or “takes a new decision not based solely on automated processing”.
But what of solely automated decisions which are less than significant? Consider, for example, a patient whose risk of stroke is determined solely by a machine learning algorithm, on the basis that it is significantly more accurate than the human-designed CHA₂DS₂-VASc score. This is not science fiction: the Topol Review, an independent report written on behalf of the Health Secretary, states that
rather than relying on a concept of the normal derived from population studies [i.e. CHA₂DS₂-VASc], AI techniques such as deep learning will be used to define normality for an individual, and hence identify any deviation from it, using that individual’s genomic, anatomical, phenotypic and environmental data, and its variations over time.
This is truly personalised medicine: by combining all the variables that make up you, a stroke prevention management plan can be tailored specifically with you in mind. Personalised medicine will result in considerably better patient outcomes. However, only Dr Algorithm can administer such a complex system. He will make decisions for you and about you. In this scenario, human doctors and health professionals will be akin to the low-level court officials and guards in ‘The Trial’, merely implementing the unexplainable decisions made by a higher authority.
However, this dystopian/utopian future must first circumvent Section 98 of the Data Protection Act:
Right to information about decision-making
(1) Where—
(a) the controller processes personal data relating to a data subject, and
(b) results produced by the processing are applied to the data subject,
the data subject is entitled to obtain from the controller, on request, knowledge of the reasoning underlying the processing.
S.98 only applies when the processing is done solely by automated means, and in the above scenario humans remain ‘in the loop’. However, human involvement can be rendered nominal by “automation bias”, a phenomenon whereby humans either over- or under-rely on decision-making tools. It is fair to assume that human doctors will over-rely on complex machine learning algorithms, whether through choice or for reasons of insurance, rendering their involvement in the decision-making process illusory.
If the role of human doctors is shown to be illusory, s.98 will apply, requiring the data controller to provide “knowledge of the reasoning underlying the processing”. However, as established above, it is not possible to provide meaningful explanations of the decision-making processes underlying complex machine learning algorithms. If the algorithm is simple, having only a few variables combined in a straightforward way, its decision-making process is easier to explain, but it does not perform as well. We therefore end up with a trade-off between performance and explicability. Is a Kafkaesque world a price worth paying for dramatically improved health outcomes?
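The contrast can be sketched briefly. For a simple fixed-weight algorithm, “the reasoning underlying the processing” can be read directly off its weights; the hypothetical explain function below is illustrative only. No comparable summary exists for a network with thousands of interacting hidden-layer connections.

```python
# For a simple, fixed-weight scorer, each variable's contribution to the
# decision can be reported to the data subject in human terms.
# The weights below are illustrative.
weights = {"age >= 75": 2, "prior stroke": 2, "hypertension": 1}

def explain(patient_facts):
    """Return each applicable factor and its contribution to the score."""
    return {factor: weight for factor, weight in weights.items()
            if patient_facts.get(factor)}

print(explain({"age >= 75": True, "hypertension": True}))
# -> {'age >= 75': 2, 'hypertension': 1}: the complete reasoning.
```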
Conclusion
The technology behind today’s machine learning algorithms is not new; most of it dates from the 1970s, ‘80s and ‘90s. What has changed is the vast quantity of data that corporations and governments store on all of us, feeding the algorithms which make decisions for us and about us. As we leave 4G behind and enter a new world of 5G and the ‘internet of things’, our data trails will grow exponentially, as will the role that algorithms play in our lives.
The Data Protection Act 2018 has provided data subjects with powerful rights and controllers with serious obligations. The law must now determine how best to interpret and implement this powerful piece of legislation. If done well, the benefits of machine learning algorithms will lead to a fairer, more prosperous society. If done badly, they will lead to a wildly unequal society and give rise to a new digital aristocracy. It is a sentiment Stephen Hawking shared:
the rise of powerful AI will be either the best, or the worst thing, ever to happen to humanity. We do not yet know which.