Simeon Lobo

Director Consulting Expert, AI & Data

As organisations accelerate their adoption of large language models (LLMs), one question is often overlooked: how do we make these systems unlearn?

In a world where trust and compliance depend on the ability to forget, unlearning may soon become as critical as learning itself. A recent 2025 survey reviewing 180+ research papers on LLM unlearning highlights just how complex this challenge is.

The real problem

Most AI safety systems today rely on guardrails - filters, policies, and monitoring that tell a model what not to say. But the patterns learned during training remain embedded in the model’s parameters and can sometimes resurface via paraphrasing, rewording, or jailbreak-style prompting.

The distinction matters:

  • Guardrails constrain or hide knowledge at the interface level.
  • Unlearning aims to remove or materially reduce a model’s ability to produce specific information or behaviours (often approximately, and hard to verify with absolute certainty).
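
To make the interface-level point concrete, here is a minimal sketch of a guardrail-style output filter. The blocked-topic list and refusal message are placeholders, and production systems typically use trained safety classifiers and policy engines rather than string matching - but the structural point is the same: the model's parameters are untouched.

```python
# Minimal sketch of an interface-level guardrail: we only inspect and block
# the model's output. The blocked-topic list and refusal text are hypothetical.
BLOCKED_TOPICS = ["example_restricted_topic"]

def guarded_generate(model_generate, prompt: str) -> str:
    # model_generate is any callable that turns a prompt into generated text.
    response = model_generate(prompt)
    if any(topic in response.lower() for topic in BLOCKED_TOPICS):
        return "I can't help with that."  # knowledge is hidden, not removed
    return response
```

A paraphrased or adversarial prompt can still elicit the underlying knowledge, which is exactly the gap unlearning tries to close.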

How unlearning works

Some techniques used for unlearning include:

  • Training-time methods: Designing training so removals are cheaper later. Techniques like SISA-style approaches partition training so that later deletions only require retraining affected segments.
  • Post-training methods: Reducing recall/capability after training via targeted parameter updates (often using fine-tuning-style procedures), model editing methods, or representation-level interventions. For example, Microsoft demonstrated “erasing” Harry Potter knowledge from a Llama 2–7B model in roughly a single GPU hour - compared to the much larger compute cost of full pre-training - while keeping general benchmark performance largely unchanged. A simplified sketch of this family follows this list.
  • Inference-time controls: Measures applied at runtime (e.g., policy filters, safety classifiers, prompt/system constraints, retrieval restrictions). These can be quick to deploy, but they typically don’t change what the model has learned, and can be bypassed under some conditions.
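
The sketch below is a deliberately simplified example of the post-training family (gradient ascent on a "forget" set with a retain-set penalty). It is one generic approach from the literature, not the specific technique used in the Harry Potter study, and it assumes a Hugging Face-style causal LM whose forward pass returns a .loss attribute.

```python
import torch

def unlearning_step(
    model: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    forget_batch: dict,
    retain_batch: dict,
    retain_weight: float = 1.0,
):
    """One simplified post-training update: raise the loss on data the model
    should forget while keeping the loss low on data it should retain."""
    optimizer.zero_grad()
    forget_loss = model(**forget_batch).loss
    retain_loss = model(**retain_batch).loss
    # Negating the forget loss turns gradient descent into gradient ascent on
    # that data; the retain term penalises collateral damage to other abilities.
    (-forget_loss + retain_weight * retain_loss).backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```

In practice, teams repeat updates like this until recall on the forget set drops while general benchmarks stay flat - which is exactly where the verification difficulty noted below shows up.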

The reality: Unlearning today can be brittle, costly, and difficult to verify for most organisations - especially when you need strong guarantees, not just best-effort behaviour changes.

Towards a more robust approach

One promising direction is preventing unwanted capabilities from being learned in the first place. For example, Anthropic reported results from filtering harmful information before training, reducing hazardous-capability scores while maintaining standard performance on general tasks.

This complements unlearning: if you reduce exposure to risky data and behaviours early, you may need less “surgical forgetting” later.
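
As a rough sketch of what filtering before training can look like in practice - the scorer, threshold, and document format here are illustrative assumptions, not details of the Anthropic work:

```python
def filter_pretraining_corpus(documents, hazard_scorer, threshold=0.5):
    """Drop documents a safety classifier flags as hazardous before they ever
    reach the training corpus. hazard_scorer is any callable returning a risk
    score in [0, 1]; the threshold is a placeholder to tune per use case."""
    kept, dropped = [], 0
    for doc in documents:
        if hazard_scorer(doc["text"]) >= threshold:
            dropped += 1  # excluded: the model never sees this content
        else:
            kept.append(doc)
    print(f"Filtered out {dropped} of {len(documents)} documents before training.")
    return kept
```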

Why it matters

From a compliance perspective, the implications are significant.

Australia's Privacy Act (APP 12-13 and APP 11)

Australia’s Privacy Act gives you the right to access the personal information an organisation holds about you and to request corrections (APP 12–13). It does not provide a simple, universal “delete everything about me” right of the kind people often associate with some overseas regimes (and privacy reforms in Australia continue to evolve, so treat this as a point-in-time summary).

Even so, organisations are generally expected to take reasonable steps to destroy or de-identify personal information once they no longer need it for any permitted purpose - unless they’re required or authorised to keep it (for example, by another law). In practice, that can extend to copies and traces across systems (including archives and backups), depending on what’s reasonable in context.

Consumer Data Right (CDR)

Under the Consumer Data Right, consumers can withdraw consent for certain collection, use, or disclosure of CDR data. When data is no longer needed, CDR participants may be required to delete or de-identify it (subject to applicable rules and exceptions). Practically, this means consent withdrawal and data lifecycle controls need to be reflected in your data flows - not just your UI.
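
As an illustrative sketch only - the registry interface and store methods below are assumptions, not CDR rules - the same consent state that drives the UI can gate the data flow itself, and withdrawal can trigger deletion or de-identification downstream:

```python
class ConsentRegistry:
    """Toy in-memory consent store; a real implementation would sit on top of
    your consent-management system and the applicable CDR rules."""

    def __init__(self):
        self._active = set()

    def grant(self, consumer_id: str):
        self._active.add(consumer_id)

    def withdraw(self, consumer_id: str, data_stores):
        self._active.discard(consumer_id)
        for store in data_stores:
            # Propagate withdrawal beyond the UI; delete_or_deidentify is a
            # hypothetical method each store would need to implement.
            store.delete_or_deidentify(consumer_id)

    def allows_use(self, consumer_id: str) -> bool:
        return consumer_id in self._active


def use_cdr_data(consumer_id: str, registry: ConsentRegistry, fetch_data):
    # The data flow itself enforces consent, not just the front end.
    if not registry.allows_use(consumer_id):
        raise PermissionError("Consent withdrawn or not granted for this use.")
    return fetch_data(consumer_id)
```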

What this means for cloud-hosted LLMs

If you’re using cloud-hosted LLMs for inference only (i.e., no fine-tuning) and you haven’t opted in to provider training on your inputs/outputs, your organisation can often delete or rotate out prompts, outputs, logs, and RAG indexes under your control - and the provider’s base model weights typically won’t change as a result.

But the details matter:

  • Provider defaults and contracts vary (including retention windows, logging, abuse monitoring, and how “training opt-in” is defined).
  • Your own systems (RAG indexes, caches, analytics logs, observability tools, backups) can retain personal information even if the base model never trained on it.

In other words, unlearning matters most once your data has updated a model’s weights - but deletion/de-identification still matters even when it hasn’t, because your pipeline can store and propagate personal information in many places.
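
To make that concrete, here is a minimal sketch of a deletion pathway across pipeline components under your control. The store objects and their methods are hypothetical placeholders: real vector databases, caches, and logging platforms each expose their own deletion and retention APIs, and backups usually rely on rotation policies rather than in-place deletion.

```python
def delete_personal_data(user_id: str, rag_index, prompt_cache, log_store) -> int:
    """Remove or rotate out data linked to one user across the components we
    control. All three store objects and their methods are illustrative."""
    removed = rag_index.delete_by_metadata({"user_id": user_id})  # RAG chunks
    prompt_cache.evict(prefix=f"user:{user_id}")                  # cached prompts/outputs
    log_store.redact(field="user_id", value=user_id)              # observability logs
    # Backups and archives are typically handled via retention windows and
    # backup rotation rather than targeted deletion.
    return removed
```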

Accountability doesn’t end at the vendor boundary

Under Australian privacy obligations, your organisation remains accountable for handling personal information appropriately across the end-to-end AI pipeline, including vendors. That’s why contracts, retention settings, and technical deletion pathways (logs, backups, indexes, feature stores, and fine-tuning datasets) are as important as model choice.

If you’d like to explore how we can help your organisation apply AI ethically and compliantly in the cloud, reach out to me.

About this author

Simeon Lobo

Director Consulting Expert, AI & Data

Simeon is a Director at CGI Australia where he leads the national AI & Data capability and contributes to the firm's AI strategy across the UK & Australia. With over 25 years of experience at the intersection of business and technology, he transforms ...