This article was originally published on Privacy World on April 11, 2023 and was updated on June 1, 2023.
Artificial intelligence (AI) depends on the use of “big data” to create and refine the training models from which the AI “learns”. Although concerns have tended to focus on issues such as inherent bias within the training data, or a lack of transparency about how the AI’s algorithms operate, the Italian data protection authority (the Garante) has made an order that points to an even more fundamental difficulty for AI developers – lawfully acquiring the data required to build training models in the first place. If training models cannot be lawfully built and expanded, the viability of AI itself might be called into question.
On 30 March 2023, the Garante reached for perhaps the most draconian element of its regulatory toolkit – an order temporarily banning OpenAI LLC (OpenAI), provider of the generative AI service ChatGPT, from processing personal data of individuals who are within Italy. This “stop order” reflects the Garante’s view that urgent measures were required in light of the risks posed both to users of ChatGPT and to individuals whose personal data had been collected and used to build its training models.
Key concerns underpinning the Garante’s finding were that:
- OpenAI did not properly inform users, or the individuals whose personal data was collected to train the AI models driving ChatGPT, that their data was being collected
- OpenAI did not identify and communicate a valid lawful basis for collecting personal data to train its algorithm
- ChatGPT processes personal data inaccurately, as the output it provides may not correspond to real facts
- OpenAI did not implement any age verification mechanism, even though, under its own terms, the content ChatGPT generates is intended for users over the age of 13.
Taking those factors together, the Garante found that the processing of personal data to train the AI models breached the transparency and fair processing obligations in EU GDPR Articles 5, 6, 8, 13 and 25. The temporary “stop order” took immediate effect, with the Garante reserving the right either to make the ban permanent or to impose other sanctions, depending on the outcome of its full investigation.
Responding to the order, OpenAI blocked people in Italy from accessing ChatGPT while it worked on its response within the 20-day deadline set by the Garante. The dialogue proved fruitful: ChatGPT was operating again within a couple of weeks. On May 24, the Garante confirmed that OpenAI had made ChatGPT available to Italian users once more, thanks to a number of changes and improvements to its practices. Further changes are still needed, however, and will be scrutinized by the authority, which has not closed its investigation.
A Fundamental Challenge for “Big Data”?
AI has a voracious appetite for data. The sophistication and reliability of an AI system depend on the quality and extent of its training data. That data must be obtained from somewhere, and developers often resort to techniques such as web scraping, web crawling or text mining to obtain it in sufficient quantities.
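To illustrate why such techniques sweep personal data into a training corpus almost by default, here is a minimal, hypothetical web-scraping sketch in Python. The URL is a placeholder, and the use of the `requests` and `BeautifulSoup` libraries is just one common approach, not anything OpenAI is known to use. A scraper that pulls all visible text from a page captures any names, email addresses or other identifiers on that page along with everything else:

```python
# Minimal, hypothetical sketch of text collection via web scraping.
# The URL is a placeholder; real crawlers add politeness (robots.txt,
# rate limiting) and run at a vastly larger scale.
import re

import requests
from bs4 import BeautifulSoup


def scrape_page_text(url: str) -> str:
    """Fetch a page and return its visible text content."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)


text = scrape_page_text("https://example.com/some-public-page")

# The scraper has no notion of "personal data": any email address,
# name or other identifier on the page ends up in the corpus.
emails_swept_in = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
print(f"Collected {len(text)} characters; {len(emails_swept_in)} email-like strings.")
```

Nothing in that pipeline distinguishes personal data from any other text, which is precisely why the transparency and lawful-basis questions raised by the Garante arise at the point of collection rather than later in the training process.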
The Garante’s findings in relation to ChatGPT indicate that where personal data is regarded as being collected directly from individuals, the developer must ensure that it meets its obligations as data controller to identify an appropriate lawful basis, and to provide the information required by GDPR Article 13.
The Garante’s focus on GDPR Article 13 suggests that it considered personal data to have been collected directly from individual data subjects. Techniques such as web scraping may, however, also involve the collection of data from sources other than the data subjects themselves. In those cases, the relevant transparency obligations would include providing the information required by GDPR Article 14. Although Article 14(5)(b) provides an exception where “the provision of such information proves impossible or would involve a disproportionate effort”, other data protection authorities (including Poland’s and the UK’s) have emphasised that “impossible” means “impossible”, not merely extremely difficult or expensive. Nor is it a “disproportionate effort” to provide information to millions of individuals simply because, as in the Polish decision, the costs of doing so would outweigh the revenue and profits expected from the processing activities.
Data protection authorities are not inclined to treat the acquisition of personal data for AI training models as a special case meriting less protection than any other form of personal data acquisition. If anything, acquiring data for training models is likely to attract particularly close scrutiny and ever more resolute enforcement. The risk is not merely administrative fines or exposure to potential compensation claims; as the ChatGPT ban shows, the processing may be stopped altogether.
What Now?
In the days following the announcement of the Garante’s probe, supervisory authorities in France, Germany and Ireland contacted the Garante to ask for more information on its findings. Other supervisory authorities around the world, such as those in Canada and South Korea, have also launched their own investigations. Privacy activists have joined the fray, with two complaints lodged with the French supervisory authority in relation to OpenAI. Jean-Noël Barrot, the French Digital Minister, has publicly stated that the platform does not respect privacy laws. He did not, however, go as far as suggesting that France should ban it.
In light of the Garante’s decision, the European Data Protection Board has also launched a dedicated task force on ChatGPT, which is intended “to foster cooperation and to exchange information on possible enforcement actions conducted by data protection authorities”.
And last, but not least, OpenAI’s CEO, Sam Altman, engaged in a major communication exercise, visiting a number of EU capitals to argue for a reasonable regulatory framework for generative AI systems (and pointing out that, if the framework is too stringent, OpenAI might have to pull out of the EU).
May (A)I? Still to be seen …