by Aesha Anurag Shah
The members of Parliament are currently discussing the much-awaited Personal Data Protection Bill, 2019 (“PDP Bill”). The big question the PDP Bill raises (as does any law striving to protect data privacy) is whether its aim of data protection can coexist with the age of ‘Artificial Intelligence’ (“AI”) and ‘Machine Learning’. AI and machine learning systems find and exploit patterns in the data they collect. Some claim that AI threatens personal data protection and that today’s regulations cannot keep up with the advancements in AI (Kerry, 2018). Others are convinced that, through ‘Federated Learning’ and ‘Differential Privacy’, data privacy can be maintained without obstructing machine learning (Cronja, 2019). This article seeks to explain and analyse these two sides of the debate.
The Growth of AI and Privacy laws
The AI boom is taking place worldwide, as is evident in the report published by experts from Stanford, MIT, OpenAI, Harvard and the Partnership on AI industry consortium (Perrault, et al., 2019). The report reveals a significant rise in the number of conferences, papers, jobs, and educational enrolments related to AI. Technical performance, that is, the efficiency and accuracy of AI systems, has also been documented to have advanced across the different fields within AI.
On the other hand, there has also been an upsurge in data privacy laws being introduced and implemented. Currently, in 2020, 10% of the world’s population has its personal data protected under modern privacy legislation. However, it has been predicted that by 2023, 65% of the world’s population will be covered by a personal data protection law (Goadsduff, 2020). These predictions are not based on mere speculation but on the fact that in many countries, drafts and plans for such a law are either completed or underway. The increasing number of personal data scandals, such as the Snowden revelations in 2013 and the Cambridge Analytica scandal in 2018, is also fuelling this change. Millions of users have had their personal data compromised in one way or another, and these events have necessitated and led to the introduction of personal data protection laws in many countries.
The contradictory objectives of AI and Data Protection
AI and personal data protection have seemingly opposite aims. For better accuracy and advancement of AI, massive amounts of data are needed for the machine to learn from, which requires access to, and input of, large chunks of data. Conversely, personal data protection aims to limit or delete the data available on an individual, to decrease the possibility of tracking that individual from it.
The rise in the usage of AI has led to a growing amount of data that can be processed to identify an individual. Machine learning, a subset of AI, automates repetitive tasks to find patterns in, and interpret, massive amounts of data without explicit programming. The personal data privacy problem posed by machine learning systems has multiplied, as current systems have developed to a stage where personal identity can be discovered from minimal input data.
It is useful at this stage to understand the structure of a data set. First, there is ‘Personally Identifiable Information’ (“PII”): information that identifies a specific person, such as an email address, Aadhaar number or full name. The second type of information is ‘Quasi Identifiers’ (“QI”): information that is not unique to one individual, like gender, age or job. AI systems have adapted to figure out PII from QI with the help of information available externally or via query results (requests for data by a user, for example on a search engine).
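The recovery of PII from QI can be illustrated with a minimal sketch of a linkage attack, using entirely hypothetical records and field names: a “de-identified” dataset that kept its quasi-identifiers is joined against an external, public dataset that carries names.

```python
# A minimal sketch (hypothetical data) of a linkage attack: matching a
# "de-identified" dataset against an external source on quasi-identifiers.

# Released dataset: PII removed, quasi-identifiers (gender, age, job) kept.
released = [
    {"gender": "F", "age": 34, "job": "lawyer", "diagnosis": "diabetes"},
    {"gender": "M", "age": 29, "job": "teacher", "diagnosis": "asthma"},
]

# External dataset: publicly available, and it carries names (PII).
external = [
    {"name": "Priya", "gender": "F", "age": 34, "job": "lawyer"},
    {"name": "Rahul", "gender": "M", "age": 29, "job": "teacher"},
]

def link(released, external, keys=("gender", "age", "job")):
    """Re-identify released records by matching their quasi-identifiers."""
    matches = []
    for rec in released:
        for person in external:
            if all(rec[k] == person[k] for k in keys):
                matches.append({"name": person["name"], **rec})
    return matches

print(link(released, external))
```

With only three quasi-identifiers, each “anonymous” record is matched back to a named person, together with the sensitive attribute that was meant to stay private.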
This poses a problem: promoting and improving AI might compromise personal data protection, while securing personal data might restrict AI usage and consequently hinder AI’s development. The hindrance would occur because AI systems, and machine learning systems in particular, advance and learn only when more data is fed into them for independent interpretation. To protect data, massive amounts of it cannot be made accessible.
The two possibilities with AI and Personal Data Protection
There are widespread criticisms that while laws and regulations might curb the unethical and non-consensual use of personal data to an extent, they are not foolproof against the development of AI (Kerry, 2018). With systems like machine learning and deep learning, personal data can still be compromised without coming onto the radar of the regulations in place. There is also the problem of controlling the data inferred by machine learning or deep learning systems, which might itself violate privacy.
The other side of the debate highlights that the problems presented are not unsolvable. Technology can be used to deal with problems posed by technology. Through ‘federated learning’ or ‘differential privacy’, the privacy concerns can be addressed while ensuring the development of AI systems (Dorchel, 2019). The federated learning method separates the user’s identity from the machine’s learning outcome. The machine itself anonymises the learning outcome, which is then sent to the developer so that the data can be used without invading an individual’s privacy. This process is called decentralisation of data, as opposed to a centralised version that puts user data in one cloud space or a data centre.
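Federated learning can be sketched in a few lines. The toy model, data, and function names below are illustrative assumptions, not any production system: each client computes a model update on its own device, and only the updated weights, never the raw data, are sent to the server, which averages them.

```python
# A minimal sketch of federated averaging on a one-parameter model y = w * x.
# Raw (x, y) pairs never leave the clients; only weights are shared.

def local_update(w, local_data, lr=0.1):
    """One gradient-descent step on the client's own device."""
    grad = sum(2 * (w * x - y) * x for x, y in local_data) / len(local_data)
    return w - lr * grad

def federated_round(global_w, clients):
    """Each client trains locally; the server averages the returned weights."""
    updates = [local_update(global_w, data) for data in clients]
    return sum(updates) / len(updates)

# Two clients, each holding private samples of the same relationship y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(round(w, 2))  # converges toward 2.0 without pooling the data
```

The server learns the shared pattern (the slope 2) while each client’s individual records stay on-device, which is the decentralisation described above.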
Differential privacy is another method used to withhold individual identity from the data collected. ‘Noise’ is added between the concealed personal identity data and the data that is released. The noise is a deliberate distortion created by the system: anyone trying to obtain personal information is misled and receives an answer that is close to, but not equal to, the truth. In this way, personal data is supposed to be protected.
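A common way to add such noise is the Laplace mechanism; the sketch below (illustrative names and numbers, not any specific vendor’s implementation) answers a counting query with calibrated random noise, so every response is plausible but never exactly the truth.

```python
# A minimal sketch of the Laplace mechanism behind differential privacy.
import random

def private_count(true_count, epsilon=1.0):
    """Answer a counting query with Laplace(0, 1/epsilon) noise added.

    A counting query changes by at most 1 when one person joins or leaves
    the data set (sensitivity 1), so noise of scale 1/epsilon suffices.
    The difference of two Exponential(epsilon) draws is Laplace-distributed.
    """
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

random.seed(1)
print(private_count(100))  # close to 100, but randomised on every call
```

Smaller values of `epsilon` mean more noise and stronger privacy; larger values mean answers closer to the truth, which is the trade-off the text describes.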
However, the solutions offered by this side of the debate are not invincible. Even though methods like federated learning and differential privacy offer some damage control, personal data is still susceptible to breaches. With federated learning, there remains a way to recover private personal data. The training that extracts only the outcome, leaving behind the user identity, is done on the user’s device, and the trained outcome is then sent to the server. But, as the training is based on the user’s data itself, tracking the user can still be done (Gad, 2020), because traces, or parameters, of the original data remain in the transformed data.
Differential privacy is an even weaker data protection mechanism. First, the answer received by someone trying to obtain personal information is close to the truth, not completely false. Second, by querying the personal identity data multiple times and combining the answers, it is possible to recover accurate data (Elamurugaiyan, 2018).
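The repeated-query weakness can be demonstrated with a short averaging-attack sketch (hypothetical numbers, same Laplace-style noise as naive differential privacy uses): each individual reply is noisy, but the mean of many replies converges on the hidden true value.

```python
# A minimal sketch of the averaging attack on repeated noisy answers.
import random

def noisy_answer(true_value, epsilon=1.0):
    # Laplace(0, 1/epsilon) noise via the difference of two exponential draws
    return true_value + random.expovariate(epsilon) - random.expovariate(epsilon)

random.seed(0)
answers = [noisy_answer(42) for _ in range(10000)]  # ask the same query 10,000 times
estimate = sum(answers) / len(answers)
print(round(estimate, 1))  # the averaged estimate lands back near 42
```

This is why practical differential privacy deployments track a cumulative ‘privacy budget’ and refuse, or degrade, answers once too many queries have been made.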
From the analysis of both sides of the debate, we can establish that there cannot be one hundred per cent data protection, as there is always a way to reach personal data when it is collected, processed or separated (McMahan et al., 2016). Personal data violation is like any other crime in the sense that it cannot be stopped completely, but active measures can be taken to prevent it from happening.
While the growth of AI leads to more personal data breaches, it can also lead to solutions that prevent those breaches. Data protection will strive to control data breaches, and those breaching data will try to circumvent data protection tactics, all with the help of a prospering AI ecosystem. Hence, it will always be a never-ending chase of one by the other.
In such a scenario, the job of the legal and regulatory framework governing AI would be to ensure that the collectors and processors of data containing user identity, or sensitive information, follow the latest standards of data protection. The regulations will therefore have to be updated repeatedly to remain relevant in combating the increasing number of breaches.
Cronja, I. (2019, May 21). Machine Learning and Data Privacy: Contradiction or Partnership? Retrieved from Digitalist Magazine: https://www.digitalistmag.com/digital-economy/2019/05/21/machine-learning-data-privacy-contradiction-or-partnership-06198537/
Dorchel, A. (2019, April 25). Data Privacy in Machine Learning. Retrieved from Luminovo.AI: https://luminovo.ai/blog/data-privacy-in-machine-learning
Elamurugaiyan, A. (2018, August 31). A Brief Introduction to Differential Privacy. Retrieved from Georgian Impact Blog: https://medium.com/georgian-impact-blog/a-brief-introduction-to-differential-privacy-eacf8722283b
Gad, A. (2020, April 24). Breaking Privacy in Federated Learning. Retrieved from HeartBeat: https://heartbeat.fritz.ai/breaking-privacy-in-federated-learning-77fa08ccac9a
Goadsduff, L. (2020, September 14). Gartner Says By 2023, 65% of the World’s Population Will Have Its Personal Data Covered Under Modern Privacy Regulations. Retrieved from Gartner: https://www.gartner.com/en/newsroom/press-releases/2020-09-14-gartner-says-by-2023–65–of-the-world-s-population-w
Kerry, C. F. (2018, July 12). Why protecting privacy is a losing game today-and how to change the game. Retrieved from Brookings: https://www.brookings.edu/research/why-protecting-privacy-is-a-losing-game-today-and-how-to-change-the-game/
Perrault, R., Shoham, Y., Brynjolfsson, E., Clark, J., Etchemendy, J., Grosz, B., . . . Niebles, J. C. (2019). Artificial Intelligence Index Report. California: AI Index Steering Committee, Human-Centered AI Institute, Stanford.
About the Author
Aesha is a third-year student at Jindal Global Law School, pursuing BBA LLB (Hons.). She is also an in-house researcher with The Digital Future – Artificial Intelligence team.