Given that AI models require large volumes of data to operate, the GDPR’s expansive definition of personal data means that many applications of AI raise complex data protection issues, especially where the underlying datasets are obtained from third-party sources.

At the Irish DPC’s request, the European Data Protection Board (“EDPB”) has adopted Opinion 28/2024 on data protection considerations when developing and deploying AI models (the “Opinion”).

The Opinion provides high-level views and considerations for DPAs to apply when assessing GDPR compliance in connection with developing or deploying AI models, and offers a helpful indication of the EDPB’s thinking. Importantly, it leaves the door open for businesses to rely on the legitimate interest basis when using personal data to train or deploy AI models.

Below, we discuss how the Opinion responds to the Irish DPC’s four questions and provide our key takeaways.

When can AI Models be considered “anonymous”, such that their data processing activities fall outside the scope of the GDPR?

The EDPB notes that not all AI models are anonymous. Even where an AI model has not been intentionally designed to return personal data from the training data, personal data from the training set may remain “absorbed” by, and reflected within, the model’s mathematical parameters, meaning the model may regurgitate that personal data in future outputs.

Consequently, anonymity of AI models should be assessed on a case-by-case basis. The Opinion notes that an AI model can be considered anonymous where personal data used to train the model cannot subsequently be obtained from it through “means reasonably likely to be used.” Broadly, this involves two main considerations:

  • #1: Technical Controls to Safeguard Against Reidentification. Has the business implemented technical controls to safeguard against the use of “reasonably likely” tools that “realistically” may be used to try to extract personal data from the model, including by malicious threat actors, in the context in which the AI model is being used? This includes implementing controls against exfiltration, regurgitation of training data, and reconstruction attacks, and then (internally or externally) auditing their effectiveness. The EDPB flags that businesses likely need to conduct and document a “thorough evaluation” of the identification risks, and that the risk of identification “should be insignificant for any data subject” for a model to be considered anonymous and to satisfy the GDPR’s accountability principle. The Opinion also emphasises that the amount of attack-resistance testing required will vary depending on the context in which the AI model’s output will be used. For example, an AI model that underpins an internal, employee-facing AI chatbot may require less testing than an external, public-facing chatbot. Based on the Opinion, businesses may want to consider testing against (1) attribute or membership inference; (2) exfiltration; (3) regurgitation of training data; (4) model inversion; or (5) reconstruction attacks. Broadly speaking, this means that when AI models are trained using personal data, tests are conducted to ensure, inter alia, that unauthorised parties are not able to infer, reproduce, reverse-engineer, or reconstruct the model’s training data (see the illustrative sketch after this list).
  • #2: Other measures implemented to reduce the amount of personal data contained in the model. These include (1) the appropriateness of the selection of sources used to train the AI model and steps taken to avoid or limit the collection of personal data; (2) data preparation and minimisation techniques to restrict the volume of personal data involved; (3) the selection of robust methods to reduce or eliminate identifiability; (4) measures added to the AI model itself to lower the likelihood of obtaining personal data from model outputs; and (5) whether the model is subject to effective engineering governance that accounts for all these risks.
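To make the testing point more concrete, the sketch below illustrates one of the simplest checks referenced above: a threshold-based membership-inference test. It is not drawn from the Opinion; the scikit-learn classifier, the synthetic dataset, and the fixed confidence threshold are assumptions chosen purely for illustration, and real-world attack-resistance testing would be considerably more extensive.

```python
# Minimal, illustrative sketch of a threshold-based membership-inference test.
# Assumptions (not from the Opinion): a scikit-learn classifier stands in for the
# "AI model", and synthetic data stands in for the training set. The idea is to
# check whether the model's behaviour lets an attacker tell training records
# ("members") apart from unseen records ("non-members").
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data used to train the model.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.5, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def true_label_confidence(clf, X, y):
    """Model's predicted probability for each record's true label."""
    proba = clf.predict_proba(X)
    return proba[np.arange(len(y)), y]

conf_members = true_label_confidence(model, X_train, y_train)        # seen in training
conf_nonmembers = true_label_confidence(model, X_holdout, y_holdout)  # never seen

# Simple attacker: guess "member" whenever confidence exceeds a fixed threshold.
threshold = 0.9
guesses = np.concatenate([conf_members, conf_nonmembers]) > threshold
truth = np.concatenate([np.ones(len(conf_members)), np.zeros(len(conf_nonmembers))])
attack_accuracy = (guesses == truth).mean()

# ~0.5 means the attacker does no better than chance; values well above 0.5
# suggest the model leaks information about which records were in the training set.
print(f"Membership-inference attack accuracy: {attack_accuracy:.2f}")
```

The intuition is that a model which is markedly more confident on records it was trained on than on unseen records gives an attacker a way to distinguish “members” from “non-members”. An attack accuracy close to 50% is the desired outcome; materially higher values would point to leakage that needs to be addressed before any claim of anonymity could be sustained.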

Can Businesses Ever Rely on Legitimate Interests When Using Personal Data to: (i) Develop AI Models; or (ii) Deploy AI Models?

The Opinion states that legitimate interest is potentially a lawful basis for personal data processing in the context of AI model development and deployment, including fine-tuning.

The EDPB flags the three-step Legitimate Interests Assessment set out in the EDPB’s Legitimate Interest Guidelines 1/2024, and provides examples of how each step can be applied in an AI context:

  • Pursuit of a legitimate interest by the controller or third party: The EDPB emphasises the importance of identifying and documenting a lawful, clear, specific, and real (not speculative) interest that is being pursued when processing the personal data. The EDPB’s examples in the Opinion are: (i) developing the service of a conversational agent to assist users; (ii) developing an AI system to detect fraudulent content or behaviour; and (iii) improving threat detection in an information system.
  • The necessity test: Businesses must also demonstrate the necessity of the personal data processing for meeting the interest. In particular, per the EDPB, there should be no less intrusive way of pursuing the purpose. The Opinion states that, if the purpose could be achieved through an AI model that does not involve personal data, then the necessity test is not met. Implementing technical safeguards to protect personal data may help meet this test.
  • The balancing test: Finally, businesses must demonstrate that their legitimate interest is not overridden by the interests or fundamental rights and freedoms of the individuals concerned. Businesses should consider the types and amount of personal data being processed, the context in which the data was collected (e.g., whether the individual has a direct relationship with the business) and is being processed, and the impact the processing may have on those individuals. The Opinion notes that the technical and organisational safeguards that businesses have implemented to protect any personal data in the AI model from misuse, such as measures to stop users from using the model to create unlawful deepfakes or spread misinformation about individuals, are relevant to the balancing exercise. Finally, the Opinion flags that, when weighing data subjects’ interests, rights and freedoms against the controller’s legitimate interest, a controller may consider certain mitigating measures that limit the impact of the processing on data subjects. For example: pseudonymisation measures (see the illustrative sketch after this list), transparency measures around how the personal data was collected, and technical measures to prevent the storage, regurgitation or generation of personal data.
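By way of illustration only, the sketch below shows one common way pseudonymisation can be applied before personal data enters a training pipeline: replacing a direct identifier with a keyed token. The HMAC-SHA256 approach, the key handling, and the record fields are assumptions made for the example rather than anything prescribed by the Opinion or the GDPR.

```python
# Minimal, illustrative sketch of keyed pseudonymisation during data preparation.
# Assumptions (not from the Opinion): records are simple dicts, and HMAC-SHA256
# with a secret key held outside the training environment is the chosen technique.
# Real pipelines would also need key management, re-identification controls, and
# minimisation of the remaining fields.
import hmac
import hashlib

SECRET_KEY = b"replace-with-a-key-stored-outside-the-training-environment"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed, non-reversible token."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"email": "jane.doe@example.com", "query": "How do I reset my password?"}

prepared = {
    "user_token": pseudonymise(record["email"]),  # stable token, no direct identifier
    "query": record["query"],                     # free text may still need review
}

print(prepared)
```

Because the token is derived with a secret key held separately from the training environment, the data remains pseudonymised, and therefore still personal data under the GDPR, rather than anonymised; the point is to reduce the impact on data subjects for the purposes of the balancing test, not to take the processing outside the GDPR altogether.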

What are the Penalties for AI Models that Have Unlawfully Processed Personal Data?

The Opinion states that it is for each DPA to determine the appropriate penalty for any unlawful personal data processing. The EDPB gives three scenarios to assist DPAs’ assessments.

  • #1: Where the controller unlawfully processes personal data to train an AI model, and the same controller subsequently deploys the AI model. The EDPB opines that unlawful processing at the development stage may impact the lawfulness of the subsequent processing at the deployment stage, as it may tip the legitimate interest assessment in individuals’ favour.
  • #2: Where a controller unlawfully processes personal data to train an AI model, and a different controller subsequently deploys the AI model. The EDPB notes that whether the subsequent controller that deploys the model has liability will depend on whether that controller conducted an appropriate assessment to verify whether the AI model developer had lawfully used personal data to train the AI model. The level of detail required in the assessment will vary based on the level of risk the model’s subsequent deployment may pose to the individuals whose personal data was used to train the AI model.
  • #3: Where the controller unlawfully processes personal data to train an AI model, but the model is anonymised before any subsequent deployment. The EDPB confirms that the lawfulness of the processing carried out at the deployment stage should not be impacted by the unlawfulness of the initial processing in this case, as the GDPR would not apply once the data was anonymised.

Key Takeaways

Given the Opinion, businesses may want to consider the following key points:

  • Adopting a principles-based AI Governance framework: The EU’s future AI regulatory trajectory is still evolving. The EDPB acknowledges that the application of the GDPR principles to AI models “raises systemic, abstract and novel issues” and the Opinion does not aim to be exhaustive. Further, recent shifts in the EU’s political and economic AI agenda may signal a future softening of the EU’s AI regulatory approach. This underscores the importance of developing principles-based AI governance programmes that can adapt as the regulatory approach changes.
  • Record keeping: Businesses using personal data to develop or deploy AI models should document their legitimate interest assessments where relevant. If businesses intend to “anonymise” the personal data they process, they should maintain adequate documentation to demonstrate this to the supervisory authority. Using personal data to train (or fine-tune) AI models continues to be a key risk area under the GDPR. While the Opinion does not block businesses from relying on the legitimate interest lawful basis when developing or deploying AI models, it makes clear that the bar for using this basis, or for claiming that the AI model is anonymised and therefore outside the scope of the GDPR, is high. Carefully documenting the AI model development and/or deployment process will likely be an important part of successfully defending against regulatory scrutiny.
  • Undertaking due diligence on the AI model/system provider’s data protection compliance: For businesses that are procuring AI – whether AI models that will underpin their own AI systems, or ‘turn-key’ AI systems such as AI chatbots – the Opinion emphasises that conducting robust due diligence on the AI model/system provider’s data protection compliance is a precondition to being insulated from downstream liability when deploying the AI model/system.

***


The cover art used in this blog post was generated by ChatGPT.

Author

Avi Gesser is Co-Chair of the Debevoise Data Strategy & Security Group. His practice focuses on advising major companies on a wide range of cybersecurity, privacy and artificial intelligence matters. He can be reached at agesser@debevoise.com.

Author

Robert Maddox is International Counsel and a member of Debevoise & Plimpton LLP’s Data Strategy & Security practice and White Collar & Regulatory Defense Group in London. His work focuses on cybersecurity incident preparation and response, data protection and strategy, internal investigations, compliance reviews, and regulatory defense. In 2021, Robert was named to Global Data Review’s “40 Under 40”. He is described as “a rising star” in cyber law by The Legal 500 US (2022). He can be reached at rmaddox@debevoise.com.

Author

Martha Hirst is an associate in Debevoise's Litigation Department based in the London office. She is a member of the firm’s White Collar & Regulatory Defense Group, and the Data Strategy & Security practice. She can be reached at mhirst@debevoise.com.

Author

Ned Terrace is an associate in the Litigation Department. He can be reached at jkterrac@debevoise.com.

Author

Michiko Wongso is an associate in the firm’s Data Strategy & Security Group. She can be reached at mwongso@debevoise.com.

Author

Olivia Halderthay is a trainee associate in the Debevoise London office.