Anonymisation of data is neither a corollary of privacy protection nor an oxymoron to the idea of privacy. Instead, it is more likely a gateway to a possible privacy breach, one that has not been addressed in the government’s Personal Data Protection Bill, 2019.
The Bill, heralded as a much-needed safeguard to rein in the digital Wild West, is an embodiment of the constitutional spirit of privacy evoked by the Puttaswamy-I case. It seeks to protect the personal data of individuals collected by companies and the state by laying down a comprehensive framework for processing such data. Accordingly, it also outlines what forms of data processing are exempt from this framework.
One such exemption is the processing of anonymised data. The Bill excludes anonymised data from its coverage, except under Section 91, which covers the use of anonymised and non-personal data by the central government for targeted delivery of services and formulation of evidence-based policies.
In view of the sweeping exemption of anonymised data and its effects on privacy, it is important to understand its meaning, historical treatment and the requisite safeguards necessary for its protection.
What is anonymised data?
Anonymisation is a technique applied to personal data to strip it completely of the characteristics, traits and identifiers that could be used to identify individuals. For this reason, anonymised data is usually not covered by legislation protecting personal data, as it is considered not to affect the privacy of an individual.
The Bill treats ‘anonymisation’ as distinct from ‘de-identification’. Broadly, anonymisation is subject to a regulatory standard of irreversibility, while de-identification is carried out to mask identifying data in accordance with the code of practice prescribed by the Data Protection Authority (Authority). Since de-identification can be reversed, its reversal without adequate consent and transparency to the data principal is now a punishable offence. However, anonymisation is still premised on irreversibility and the impossibility of identification.
Generally, the identifiability of an individual is considered a spectrum, with identification at one end and perfect anonymity at the other. The legal protections correspond with this understanding: personal data and its privacy are protected by law, while no equivalent protection is granted to anonymised data. However, the efficacy and possibility of anonymisation itself are considered suspect by many. Some researchers argue that data can be truly anonymised only by deletion, while others argue that various technological tools can be used to achieve a practical degree of anonymity. In the same vein, there is also a growing recognition of the trade-off between the utility and privacy of a dataset.
Is it possible to truly anonymise data?
This article seeks to question the assumption of irreversibility and perfect anonymity attached to such data. Over the last decade, substantial research has pointed towards the shaky standing of anonymised data. For example, as early as the 1990s, Latanya Sweeney, then an MIT graduate student, identified the governor of Massachusetts from three data points in an anonymised database.
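A linkage of this kind can be sketched in a few lines of Python. The records, the field names (pin, birth_date, sex) and the link_records helper below are all hypothetical and chosen only for illustration; this is not a reconstruction of Sweeney’s actual method or data. The sketch simply joins a name-stripped dataset with a public register on the fields the two share and reports the records that match exactly one named person.

```python
# Hypothetical data: an "anonymised" dataset with names removed but
# indirect identifiers retained, and a public register with names.
anonymised_records = [
    {"pin": "560001", "birth_date": "1980-01-15", "sex": "F", "diagnosis": "asthma"},
    {"pin": "560001", "birth_date": "1975-06-02", "sex": "M", "diagnosis": "diabetes"},
    {"pin": "560034", "birth_date": "1990-11-23", "sex": "F", "diagnosis": "hypertension"},
]

public_register = [
    {"name": "Resident A", "pin": "560001", "birth_date": "1980-01-15", "sex": "F"},
    {"name": "Resident B", "pin": "560001", "birth_date": "1975-06-02", "sex": "M"},
    {"name": "Resident C", "pin": "560001", "birth_date": "1975-06-02", "sex": "M"},
    {"name": "Resident D", "pin": "560095", "birth_date": "1988-03-09", "sex": "F"},
]

# Assumed set of shared fields used for linking (illustrative only).
SHARED_FIELDS = ("pin", "birth_date", "sex")


def link_records(anonymised, public, keys=SHARED_FIELDS):
    """Return (name, diagnosis) pairs for records matching exactly one person."""
    # Index the public register by the combination of shared fields.
    index = {}
    for person in public:
        combo = tuple(person[k] for k in keys)
        index.setdefault(combo, []).append(person["name"])

    matches = []
    for record in anonymised:
        names = index.get(tuple(record[k] for k in keys), [])
        # Only an unambiguous match re-identifies the record.
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


matches = link_records(anonymised_records, public_register)
print(matches)
```

In this toy example only the first record is re-identified: the second matches two residents and the third matches none, which shows how the outcome depends on what auxiliary data happens to be available.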
The European General Data Protection Regulation (“GDPR”) assesses anonymisation on a standards-based approach. It assesses the singling out, linkability and inference that an anonymised set permits. These refer, respectively, to the possibility of isolating an individual’s records within the dataset, of linking the dataset with others to identify an individual, and of drawing inferences about an individual from the dataset. The Article 29 Working Party, established under the erstwhile data protection regime in Europe, had highlighted the potential of all possible techniques of anonymisation falling short of the standard in one situation or the other.
It assessed various techniques used to anonymise data, such as randomisation, generalisation and aggregation, and concluded that, depending on the technique used, the data may be subject to re-identification when processed and combined with other datasets. The risk of re-identification arises if certain data points can indirectly identify an individual, in isolation or in combination with more data. Thus, effective anonymity may be hard to ensure in practice.
In this light, the next section explores whether anonymised data retains some value for privacy and whether individuals should continue to have a right in it.
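The effect of generalisation on the risk of isolating an individual can be illustrated with a minimal Python sketch. The records, the choice of age and PIN code as indirect identifiers, and the helper names smallest_group and generalise are assumptions made for illustration and are not drawn from the Working Party’s opinion. The sketch counts how many records share each combination of indirect identifiers; a group of size one means a record can be isolated.

```python
from collections import Counter


def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same values for keys."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


def generalise(record):
    """Coarsen age to a ten-year band and keep only the first three PIN digits."""
    decade = (record["age"] // 10) * 10
    return {
        "age": f"{decade}-{decade + 9}",
        "pin": record["pin"][:3] + "***",
        "condition": record["condition"],  # sensitive value is left untouched
    }


# Hypothetical records with direct identifiers already removed.
records = [
    {"age": 31, "pin": "560001", "condition": "asthma"},
    {"age": 34, "pin": "560002", "condition": "diabetes"},
    {"age": 38, "pin": "560003", "condition": "asthma"},
    {"age": 52, "pin": "560011", "condition": "hypertension"},
    {"age": 57, "pin": "560017", "condition": "diabetes"},
]

keys = ("age", "pin")
before = smallest_group(records, keys)
after = smallest_group([generalise(r) for r in records], keys)
print(before, after)
```

On this toy data, generalisation raises the smallest group from one record to two, at the cost of less precise ages and locations. Even then, each record still carries its sensitive value, so linkage and inference are not ruled out.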
Reasonable expectations of privacy in anonymised data
Given the residual risk and traits of personhood that anonymised or non-personal data always retains, its coverage in the Bill through Section 91 has the potential to lay down interesting jurisprudence regarding the contours of reasonable expectations of privacy. While the right to privacy is attached to personal data, the aim of this article is to suggest that a residual privacy right also exists in anonymised data.
This is because personal data, under the Bill, includes the derivatives of personal data or ‘data about or relating to a natural person who is directly or indirectly identifiable’ which may also arise in ‘combination with other information’. Anonymised data used to target delivery of services, coupled with the risk of de-anonymisation, renders such data an extension of personal data. It is therefore important to examine whether reasonable expectations of privacy arise in such data, especially in its use.
This enquiry is important as the right to privacy extends only up to where it can be reasonably expected to extend. For example, there is no right to privacy against the examination, under a warrant, of a personal diary in which a criminal has confessed to a crime. On the other hand, a right to privacy and bodily autonomy extends to my face and movements as are currently recorded by CCTV cameras in public spaces.
The reasonable expectations test is derived from American jurisprudence, which suggests that the constitutional privacy protection for an individual is derived by balancing an objective component of privacy against the subjective expectations of that person. While Justice Nariman rejected this test in the case of Puttaswamy-I, it was endorsed in Puttaswamy-II and currently lays down the dominant strand of interpretation for privacy law in India.
The use or processing of anonymised data carries within it the risk of being de-anonymised and turning into personal data. It can be argued that the risk of it being misused is a mere possibility and is not a sufficient reason for recognition of privacy in such data, especially when it is normally understood to be irreversible and thus protected. There are two responses to this presumption of relative sanctity of anonymised data. Firstly, such data may not need to be subject to the same level of privacy protection as personal data. The protection needs to be graded to ensure protection of the principal, by laying down strict standards of anonymisation and punishing de-anonymisation. Secondly, since the privacy right subsists in the culmination of the risk of de-anonymisation, namely the creation of personal data, it is necessary that a more nuanced regulatory framework is applied. Meanwhile, it must be kept in mind that both the state and private parties are involved in the usage of non-personal data and in making data-based decisions that affect us, individually or collectively.
It may also be argued that an individual does not have a subjective expectation of privacy in anonymised data, by virtue of its nature, and thus the question of carving out reasonable expectations of privacy does not arise. This argument does not hold, because in the absence of a subjective expectation the balancing exercise still turns to the objective expectation of privacy. For example, a state university announcing and disclosing the details of a top scorer to newspapers does not imply that the person did not have the right to privacy in such information. While he or she may not want to conceal or protect such information, it is legally protected.
The question can also be viewed from another perspective. Personal data also includes data which indirectly identifies an individual. This may be done using certain specified traits or in combination with other information. The degree of indirect identifiability is not explained or laid down in the Indian context yet. To that extent, any semblance of recognition of a person in an anonymised dataset may overlap with indirectly identifying personal data, where reasonable expectations of privacy naturally subsist. Thus, the Authority would also do well to lay down the extent of indirect identifiability in contrast to anonymisation.
The impact of de-anonymisation on an individual under the Bill
This enquiry arises as the envisaged use of non-personal data, under the Bill, opens up a wide range of possibilities of public use of anonymised data. Even otherwise, anonymisation was generally considered a legal way out for companies to circumvent the application of law. In view of these practices, the primary concern is what happens if the data is de-anonymised by any kind of processor/fiduciary, after further processing – intentionally or otherwise?
The moment anonymisation is removed from data, it becomes personal data and falls within the purview of the Bill. The ex-ante compliance with the irreversibility standards simply allows the conversion of personal data to anonymised data. There are two possibilities in the event of de-anonymisation: the fiduciary complies with the Bill or it does not. These options are more pronounced in the case of de-anonymisation because there is no way for individuals or the Authority to know that the data has been compromised.
It is within the exclusive domain and knowledge of a fiduciary. Envisaging this possibility, the private member Data (Privacy and Protection) Bill, 2017 provided the right to individuals to be informed of a personal data breach arising due to de-anonymisation. Similarly, an ex-post sanction on the re-identification of de-identified data, which includes anonymised data, has been put in place in the UK Data Protection Act, 2018. This is necessary to allocate responsibility where it is due. The processor or fiduciary which collects the data should comply with the irreversibility standard, while the ultimate processor which handles the data and re-identifies it should be sanctioned for the negligence and offence.
As things currently stand, there is no way for individuals to be informed that data which was once part of an anonymised dataset has been de-anonymised and is being used for identification or profiling. To an extent, a data principal or an individual may obtain information from a data fiduciary under Section 17. However, this assumes that individuals actively track their informational privacy, something society is still struggling to understand in real terms.
The exercise of Section 17 by the data principal, to identify a fiduciary which may be using an erstwhile anonymised dataset, is a stretch of both the imagination and the provision. Because processors and fiduciaries lack incentives and oversight to maintain the integrity of anonymised data, it is important to ensure efficient checks through audit oversight by the Authority and an ex-post sanction to curb de-anonymisation.
Section 82 of the Bill, in its current form, only punishes reversal of de-identification, with no sanction for reversal of anonymisation. It also punishes such re-identification without any exemption for the research community. The provision thus has a perceptible chilling effect on research while remaining an inadequate safeguard for the anonymised data of individuals. Thus, all forms of sanction on re-identification must expressly guard against this possibility.
The way forward
The extent of possibility of de-anonymisation can be effectively curtailed by the irreversibility standards laid down by the Authority. But if the global lessons are anything to go by, it is hard to imagine a standard which will ensure complete anonymity.
If that is so, it is important to lay down safeguards: sanctioning de-anonymisation, granting principals a right to transparency over the use of such data, obliging fiduciaries to inform principals the moment they possess de-anonymised personal data (currently, such notice is required to be given ‘as soon as reasonably practicable’ under Section 7), and requiring periodic audits to check the integrity of anonymised data.
It is also hoped that the development of the irreversibility standard will be informed by a technical consideration of the reasonable expectations of privacy arising in such data.
Anushka M. is a research associate at ‘IT for Change,’ a Bangalore-based NGO.