Generative artificial intelligence (AI) recently burst onto the scene, producing text, images, sound, and video content that closely resemble human-made content. After being trained on publicly available data, ChatGPT, DALL-E, Bard and other AI models were unleashed to an eager public. The rate of adoption of this technology far surpasses the speed with which legislative bodies can pass the laws needed to ensure safety, reliability and fairness.
Norwegians are trying to get ahead of the game, raising questions about consumer protection and data privacy. The Norwegian Consumer Council published a report in June 2023 to address the harm generative AI might inflict on consumers. The report, Ghost in the machine – addressing the consumer harms of generative AI, presents overarching principles that would ensure generative AI systems are developed and used in a way that protects human rights.
The Norwegian data protection authority, Datatilsynet, is also raising awareness about the ways generative AI violates the General Data Protection Regulation (GDPR). Generative AI models train on large amounts of data taken from many different sources, usually without the knowledge or consent of the originator of the data.
“There are a few issues with generative AI in terms of data collection,” says Tobias Judin, head of the international section at Norway’s data protection authority. “The first questions are around what these companies do to train their models.”
Data privacy may be violated during the training phase
Most of the models used for generative AI are foundational models, meaning they are general-purpose enough to serve a wide variety of applications. The people who train these foundational models compile massive amounts of data from open sources on the internet, including a huge quantity of personal data.
The first concern raised by data protection authorities is whether organisations are entitled to collect all that personal data. Many data privacy experts think the data collection is unlawful.
“Another issue is transparency,” says Judin. “Are people made aware that their personal data will be used to train a model? Probably not. One of the legal principles concerning data collection is data minimisation, which says you shouldn’t collect more data than what is necessary.”
“Companies developing a foundational model will invariably say that, since the model is used for just about anything, it is necessary to collect all available data. This approach doesn’t sit well with GDPR. Another set of issues is around data accuracy and quality. Some of the data may come from web forums, including contested information and personal data. Those data will also be part of the training of this model.”
Once a model is trained, the underlying data is no longer needed. Many organisations assume that deleting the training data at that point makes all the data privacy issues go away, but that assumption has now been challenged. A new class of attacks, known as model inversion attacks, involves sending carefully crafted queries to an AI model to re-identify the data that went into training it. Some of these attacks specifically target generative AI.
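To make the idea concrete, the following is a minimal, purely illustrative Python sketch of such an extraction probe. The generate() function is a stub standing in for a real generative model’s interface (no actual API is assumed), and the “memorised” record is invented for the example; a real attack would send crafted prompts to a deployed model and inspect the completions for personal data.

    # Illustrative sketch of a training-data extraction probe.
    # generate() is a stub standing in for a real generative model;
    # the "memorised" record below is entirely fictional.
    def generate(prompt: str) -> str:
        memorised = {"Contact Jane Doe at ": "jane.doe@example.com, +47 555 0199"}
        return memorised.get(prompt, "[no confident completion]")

    # An attacker feeds prompts that look like fragments of personal records
    # and checks whether the model completes them with plausible personal data.
    probe_prompts = [
        "Contact Jane Doe at ",
        "The patient's national ID number is ",
    ]

    for prompt in probe_prompts:
        completion = generate(prompt)
        if "@" in completion or any(ch.isdigit() for ch in completion):
            print(f"Possible memorised personal data: {prompt!r} -> {completion!r}")

Published attacks of this kind are considerably more elaborate, typically generating many completions and scoring them for signs of memorisation, but the principle is the same: the model itself can leak personal data it was trained on, even after the original dataset has been deleted.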
Data rectification and erasure become very complicated
Another problem stems from the fact that models are trained with personal data: if a data protection authority orders an organisation to erase certain personal data, the whole model may have to be erased, because that data has become an integral part of the model.
“This is a massive compliance issue,” says Judin. “If you didn’t do your due diligence when it came to collecting and curating training data, you’re stuck.
“And then the last stage is when the model becomes operative,” adds Judin. “You have a consumer-facing service, where people can ask questions and get text or other content back. A user may ask the model something personal about somebody else. The model could generate an incorrect answer. If that answer gets posted on a web site – and the person identified by the data asks for the data to be rectified – the site owner might be able to rectify it. But since the source of the error is generative AI, the error cannot be fully corrected. It’s been baked into the model.”
The companies that trained the model can no longer change what the model generates. From a data protection point of view, it’s not clear whether the rights to erasure and rectification can be upheld. The only way to respect these rights may be to scrap the model and start afresh. The organisations that own the models will probably not want to do that because they invested a lot of resources into training the model in the first place.
An additional area of concern is that the queries users enter into such a service – the questions they write – could be used for “service improvements”, meaning the input is fed back into further training. The AI models could also collect input for targeted advertising.
Generative AI models are designed to take existing material and present it in new ways. This means the models are inherently prone to reproducing existing biases and errors.
The Norwegian Consumer Council calls on EU institutions to resist lobbying pressure from big tech companies and make watertight laws to protect consumers. Its recent report says that laws alone are not enough – enforcement agencies also need more resources to make sure the laws are followed.