As organisations accelerate their adoption of artificial intelligence and large language models (LLMs), one question is often overlooked: how do we make these systems unlearn?
In a world where trust and compliance depend on the ability to forget, unlearning may soon become as critical as learning itself. A 2025 meta-review of more than 180 research papers on machine unlearning in LLMs explores the complexity of this challenge.
The real problem
Most AI safety systems today rely on guardrails - filters and refusal policies that tell a model what not to say. But the patterns learned during training remain embedded in the model’s parameters. Those patterns can resurface through paraphrasing, rewording, or jailbreak prompts.
The difference is crucial:
- Guardrails hide knowledge.
- Unlearning removes capability.
Without true unlearning, sensitive or outdated information can linger beneath the surface.
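To make the distinction concrete, here is a deliberately simplified sketch in Python: a guardrail screens the prompt or the output, so a paraphrase can slip past it, whereas unlearning would change the weights so there is no answer left to leak. The blocked phrase and refusal text below are invented purely for illustration.

```python
# Hypothetical illustration only: a guardrail as a surface-level filter.
# The blocked phrase and refusal message are invented for this example.
BLOCKED_PHRASES = ["synthesise compound x"]

def guardrail(prompt: str, model_answer: str) -> str:
    """Refuse known phrasings; the underlying knowledge stays in the model's weights."""
    if any(phrase in prompt.lower() for phrase in BLOCKED_PHRASES):
        return "I can't help with that."
    return model_answer

# A direct match is refused, but a paraphrase sails straight through:
#   guardrail("How do I synthesise compound X?", answer)    -> refusal
#   guardrail("Walk me through making compound X", answer)  -> the answer leaks out
```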
How unlearning works
Unlearning research generally focuses on three stages of the AI lifecycle:
- Training-time methods: Techniques like SISA (Sharded, Isolated, Sliced, and Aggregated training) partition the training data into shards, so a later removal request means retraining only the affected shards rather than the whole model.
- Post-training techniques: Adjust models after training through methods such as gradient editing or representation steering. In one case, Microsoft researchers erased Harry Potter knowledge from Llama 2-7B in a single GPU hour (versus 184,000 GPU hours for pre-training) with almost no loss of benchmark accuracy. A simplified sketch of this family of methods follows this list.
- Inference-time modifications: Fast, surface-level fixes such as filters or blocklists. They’re quick to apply but easy to bypass, as they don’t change what the model has truly learned.
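As a rough illustration of the post-training family, the sketch below pairs gradient ascent on a "forget set" with ordinary training on a "retain set". It is a toy PyTorch example under assumed names (the model, batch format, and the `alpha` weighting are placeholders), not the specific method used in the Microsoft work cited above.

```python
# Toy sketch of post-training unlearning via gradient ascent on a forget set.
# Assumes `model` maps a batch of inputs to logits; all names are placeholders.
import torch.nn.functional as F

def unlearning_step(model, forget_batch, retain_batch, optimizer, alpha=1.0):
    """One update: push the model away from the forget data while staying accurate on retain data."""
    optimizer.zero_grad()

    # Standard cross-entropy on data we still want the model to handle well.
    retain_logits = model(retain_batch["inputs"])
    retain_loss = F.cross_entropy(retain_logits, retain_batch["targets"])

    # Subtracting this term means gradient *ascent* on the forget data.
    forget_logits = model(forget_batch["inputs"])
    forget_loss = F.cross_entropy(forget_logits, forget_batch["targets"])

    loss = retain_loss - alpha * forget_loss
    loss.backward()
    optimizer.step()
    return retain_loss.item(), forget_loss.item()
```

In practice, the hard part is tuning that trade-off and then verifying the forgotten capability does not resurface under paraphrase, which is exactly the brittleness and verification problem noted next.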
The challenge? Unlearning is still brittle, costly, and difficult to verify.
Towards a more robust approach
Anthropic recently tested pre-training filtering - removing harmful content from the training data before the model ever learns it. This proactive step reduced hazardous capability scores by 33% while maintaining standard accuracy levels. It’s an encouraging sign that responsible AI design begins with prevention, not patching.
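Anthropic's actual pipeline is not public, so the snippet below only conveys the shape of the idea: screen every candidate document before it reaches the training corpus. The `is_hazardous` classifier is an assumed placeholder, not a real API.

```python
# Illustrative sketch of pre-training filtering with an assumed hazard classifier.
from typing import Callable, Iterable, Iterator

def filter_corpus(documents: Iterable[str],
                  is_hazardous: Callable[[str], bool]) -> Iterator[str]:
    """Yield only the documents the classifier considers safe to train on."""
    for doc in documents:
        if not is_hazardous(doc):
            yield doc

# Example usage, with a trivial keyword check standing in for a real classifier:
#   clean_docs = list(filter_corpus(raw_docs, lambda d: "nerve agent" in d.lower()))
```

The design point is that anything removed at this stage never enters the weights, so there is nothing to unlearn later.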
Why it matters
From a compliance perspective, the implications are significant. Australia’s Privacy Act gives individuals access and correction rights (APP 12–13), but no direct right to erasure. However, organisations must take reasonable steps to destroy or de-identify personal information once it is no longer needed (APP 11.2). Similarly, under the Consumer Data Right, withdrawal of consent triggers an obligation to delete or de-identify the shared data.
If you’re using cloud-hosted LLMs for inference only (with no fine-tuning) and haven’t opted into provider training, deleting your logs and prompts clears your local data - but the base model itself remains unchanged.
In other words, unlearning only matters once your data has updated the model’s weights. And under Australian law, the duty to delete or de-identify rests with your organisation, not the cloud provider. When consent is withdrawn or data expires, that duty extends to your AI pipelines too.
AI compliance is no longer just about privacy - it’s about how we teach our systems to forget responsibly.
If you’d like to explore how we can help your organisation apply AI ethically and compliantly in the cloud, reach out to me.