The Wrong Layer: Why Obfuscating Data Won’t Fix AI’s Trust Problem

Obfuscating data before it reaches an AI model degrades data quality without removing risk, and just relocates trust rather than resolving it. The actual fix is classification: knowing which data tier belongs where, and who controls the infrastructure it sits on.

A response to Charlie, Tim Berners-Lee’s new approach to AI and privacy

Tim Berners-Lee has spent the last decade trying to undo a mistake he is honest enough to call his own. He gave the world an open web and watched it get captured by a handful of platforms. Now he is trying to make sure the same thing does not happen to AI, and the tool he has built to do it is called Charlie. You keep your data in a vault you control. When an AI model asks a question, Charlie decides what to send, and before it sends it, it twists the data slightly. Your birthday shifts by a few days. Your resting heart rate drifts by a few beats. The model gets something close enough to be useful and far enough from the truth that it cannot, in theory, work back to you.
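To make the mechanism concrete, here is a minimal sketch of that kind of value twisting. It is not Charlie’s implementation, which this piece has not seen; the function names, the uniform noise, and the shift sizes (3 days, 4 beats) are all assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta


def jitter_birthday(birthday: date, max_days: int, rng: random.Random) -> date:
    """Shift a date by a whole number of days drawn uniformly from [-max_days, max_days]."""
    return birthday + timedelta(days=rng.randint(-max_days, max_days))


def jitter_heart_rate(bpm: int, max_beats: int, rng: random.Random) -> int:
    """Shift a resting heart rate by a whole number of beats in [-max_beats, max_beats]."""
    return bpm + rng.randint(-max_beats, max_beats)


# Hypothetical record held in the vault (invented values).
true_record = {"birthday": date(1985, 6, 14), "resting_hr": 62}

rng = random.Random()  # unseeded: every request would see different noise
shared_record = {
    "birthday": jitter_birthday(true_record["birthday"], 3, rng),
    "resting_hr": jitter_heart_rate(true_record["resting_hr"], 4, rng),
}
print(shared_record)
```

Note what the sketch guarantees and what it does not: the shared value always stays within a small, bounded distance of the truth. That bound is what keeps the data useful, and it is also what an attacker gets to exploit.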

It is a genuinely clever idea, and I want to be fair to it before I disagree with it. I worked on this exact problem years ago with digi.me, another data vault company chasing the same goal: give people a single place to hold their data and a mechanism to control what leaves it. The vault concept is sound. Centralising your data under your own control, rather than leaving it scattered across every app and chatbot you have ever used, is real progress. Berners-Lee deserves credit for building toward it again, twenty years after he first sketched Charlie as a thought experiment.

But obfuscation is solving the wrong layer of the problem, and it is worth being precise about why.

Degrading data does not reduce risk, it just moves it

The premise of obfuscation is a trade: a little less accuracy for a little more privacy. Twist the data enough and the model can never reconstruct who you are. The trouble is that this assumes today’s models are the threshold that matters. They are not. A model that cannot re-identify you from a slightly wrong birthdate this year may well be able to next year, once it has more context, more correlated data sources, and better inference. Obfuscation is a moving target dressed up as a fixed solution. You are not removing the risk. You are betting that the risk stays below the model’s capability curve, and that curve only moves in one direction.
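A toy illustration of why correlated sources erode the protection, under loud assumptions: the four-person population, the postcodes, and the 3-day shift bound are all invented, and a real linkage attack would be statistical rather than an exact filter. The point is only that a bounded shift leaves a bounded search window, and each extra attribute shrinks the set of people inside it.

```python
from datetime import date

# Hypothetical auxiliary data an attacker might already hold (invented values).
population = [
    {"name": "person_a", "birthday": date(1985, 6, 14), "postcode": "AB1"},
    {"name": "person_b", "birthday": date(1985, 6, 25), "postcode": "AB1"},
    {"name": "person_c", "birthday": date(1985, 6, 16), "postcode": "CD2"},
    {"name": "person_d", "birthday": date(1990, 1, 2), "postcode": "AB1"},
]


def candidates(shared_birthday, max_shift_days, postcode=None):
    """Return everyone whose true birthday could have produced the shared one."""
    matches = []
    for person in population:
        if abs((person["birthday"] - shared_birthday).days) > max_shift_days:
            continue  # outside the window the obfuscation could have produced
        if postcode is not None and person["postcode"] != postcode:
            continue  # ruled out by the second, correlated source
        matches.append(person["name"])
    return matches


shared = date(1985, 6, 17)  # person_a's birthday, shifted by three days
birthday_only = candidates(shared, 3)
with_postcode = candidates(shared, 3, postcode="AB1")
print(birthday_only, with_postcode)
```

With the birthday alone, two people remain plausible. Add one correlated attribute and only one does. More context and better inference play the role of that second attribute.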

There is a second cost that gets less attention: degraded data produces degraded reasoning. If you feed a model slightly wrong health data, slightly wrong financial data, slightly wrong biometric data, you are not just protecting privacy, you are introducing systematic error into every inference built on top of it. Wrong assumptions, false correlations, conclusions nobody asked for and nobody can audit, because nobody knows exactly how the obfuscation was applied. Obfuscating data to protect people from AI and then using that same obfuscated data to make decisions about those people is not a privacy solution. It is a different failure mode wearing privacy’s clothes.

You haven’t removed trust, you’ve relocated it

The second problem is the one Berners-Lee himself acknowledges, almost in passing: all your information still sits in a vault, and you still have to trust the entity that owns it. Charlie does not eliminate the trust problem. It moves it. Instead of trusting OpenAI or Google with your raw data, you are trusting Inrupt, or a bank, or a government body, with the vault that decides what those companies see. Maybe that is a better trade. Sir Tim Berners-Lee has more credibility on data sovereignty than most large AI labs do, and that counts for something. But “trust a smaller, better-intentioned custodian instead of a large one” is not the same claim as “we have solved the trust problem in AI.” It is an improvement in who holds the keys, not a structural fix for what happens once the door opens.

What actually closes the gap

The piece that is missing from Charlie, and from most of the privacy-by-obfuscation conversation, is classification.

Not all data needs the same protection, and pretending otherwise is itself part of the problem. Some information is genuinely fine to share with a cloud model: low-sensitivity, low-risk, the kind of thing that causes no harm if a large AI lab sees it in the clear. Call that the open tier. Some information should stay inside the community, enterprise, or local government body it belongs to, available to the people who need it but never handed to a third-party model without a clear authorisation step. And some information (health records, biometric data, anything at the core of personal or institutional sensitivity) should never reach a cloud model at all, twisted or not. It should be processed locally, on infrastructure the community or institution actually controls.
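The three tiers above amount to a routing rule, and a rule that simple can be written down. The sketch below is one possible encoding, not a standard: only "open" is a name used in this piece, while `RESTRICTED`, `SENSITIVE`, the two destinations, and the `authorised` flag are labels assumed here for illustration.

```python
from enum import Enum


class Tier(Enum):
    OPEN = "open"              # fine to share with a cloud model in the clear
    RESTRICTED = "restricted"  # stays in the institution unless explicitly authorised
    SENSITIVE = "sensitive"    # health, biometric and similar data: never sent to a cloud model


class Destination(Enum):
    CONTROLLED_INFRA = "controlled_infra"  # run by the community or institution
    CLOUD_MODEL = "cloud_model"            # third-party hosted model


def may_send(tier: Tier, destination: Destination, authorised: bool = False) -> bool:
    """Decide whether data in a given tier may go to a given destination."""
    if destination is Destination.CONTROLLED_INFRA:
        return True  # every tier may be processed on infrastructure you control
    if tier is Tier.OPEN:
        return True
    if tier is Tier.RESTRICTED:
        return authorised  # cloud only after a clear authorisation step
    return False  # SENSITIVE: no flag unlocks the cloud, twisted or not


print(may_send(Tier.RESTRICTED, Destination.CLOUD_MODEL))
print(may_send(Tier.SENSITIVE, Destination.CLOUD_MODEL, authorised=True))
```

The design choice worth noticing is the last line of `may_send`: for the most sensitive tier there is no parameter that opens the door. The protection comes from where the data is allowed to go, not from how it is disguised on the way.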

This is not a new idea on my part. It is the logic behind every serious information classification framework, and it is the logic I think the AI privacy debate keeps skipping past in its rush to find a clever technical trick. Obfuscation tries to make one tier of data safe for all destinations. Classification accepts that different data belongs in different places, full stop, and builds the infrastructure to enforce that rather than the math to disguise it.

That infrastructure point matters more than the technical one. The reason most people cannot make good decisions about what to share is not that they lack a clever obfuscation tool. It is that nobody has taught them to ask the first question: what category is this data in, and where is it allowed to go? Education on classification, plain language, no jargon, has to come before any technical fix, because no vault, however well designed, can protect someone who does not understand what they are putting into it.

And the institutions that should be building the infrastructure to act on that classification are not, for the most part, large AI labs. They are universities, local governments, community bodies, the entities that already hold a duty of care toward the people whose data they are managing. AI processing for sensitive tiers should happen on infrastructure those institutions control, not on infrastructure rented from the same handful of companies whose business model depends on having as much data as possible in as few hands as possible.

The same exposure, a different vendor

This is not an abstract concern. I made the institutional infrastructure argument at length in No Surprises, the piece I wrote on NHS England’s 15 June admission that its Federated Data Platform assessment was wrong about who actually accesses identifiable patient data. Palantir engineers held standing administrative access to a tenant the public had been told was NHS-staff-only. NHS England corrected the document, apologised for the error, and is reviewing supplier access. Those are reasonable first moves. They are not the end of the obligation.

When the promise a public institution makes about sensitive data turns out to be false, the right response is not to fix the page and move on. It is to act immediately and visibly: suspend the access in question while the review runs, not after it concludes; report to the regulator and the public on a timeline measured in days, not the next scheduled update; and treat the correction as the start of an independent audit, not the end of the conversation. Recognising an error is the minimum. Acting on it before the next person’s data is exposed is the actual duty of care. An institution that corrects a document and waits for the next deadline has not closed the gap, it has just lowered the volume on it.

The reason this matters beyond one contract is structural, and it is the same structure I was arguing against with Charlie above. Palantir’s UK operation sits under a US-headquartered parent. Under the CLOUD Act and FISA Section 702, US authorities can in principle compel a US-controlled provider to produce data regardless of where it physically sits, and can attach a gag order forbidding the provider from disclosing that the request was ever made. That capability does not depend on who occupies the White House. It is written into the architecture.

What does depend on the administration is whether anyone outside the United States can hold a US company accountable for how it treats the people whose data it holds. Meta’s own internal documents, surfaced by Reuters in November 2025, showed the company had projected roughly $16 billion in 2024 revenue, about a tenth of its total, from fraudulent and scam advertising, with its own safety staff estimating Meta’s platforms were involved in a third of all successful scams in the US. That is not a foreign government reaching into a vault. It is a US company knowingly tolerating large-scale harm to its own users because the harm was profitable, with no external check strong enough to stop it before the documents leaked. At the same time, the Trump administration has spent 2025 and into 2026 actively pressuring the EU to weaken or abandon enforcement of the Digital Markets Act and Digital Services Act against exactly these companies, threatening tariffs and imposing visa sanctions on the European officials responsible for that enforcement, on the explicit grounds that holding US tech firms accountable amounts to discrimination against American business.

Put those two facts together and the exposure becomes clear. A US-controlled platform holding sensitive data has, built into its legal environment, both the capability to be compelled by its own government and a political climate actively working to remove the external checks, in Europe and by extension anywhere relying on European-style regulation as a backstop, that would otherwise catch misuse before it became a Reuters investigation. NHS data sitting inside that same structural category is not protected by Palantir’s specific intentions, any more than Meta’s users were protected by Meta’s specific intentions. It is exposed by the architecture itself, the same architecture that makes obfuscation the wrong fix for the wrong layer of the problem.

This is exactly why the question is not whether a particular vendor can be trusted today. It is whether a nation can build research and care infrastructure on rented capability from a jurisdiction that is simultaneously weakening the only external mechanisms that would otherwise check it.

Questions a board should be asking, including the NHS trust board

This is not only an argument for an essay. It is a governance test that any board overseeing sensitive data (an NHS trust, a research partnership, a local authority) should be running today, not after the next admission. A few questions that follow directly from the structure above:

Jurisdiction. Which of our data processors are controlled by a US parent entity, and have we had a written legal opinion, not a vendor assurance, on what the CLOUD Act and FISA Section 702 mean for our specific data in practice?
Concentration. How many of our critical data functions sit with a single vendor, and what is our actual fallback if that vendor’s access were suspended tomorrow, not in theory but operationally?
Verification, not assurance. When was our current Data Protection Impact Assessment last independently tested against what suppliers can actually do, rather than what their contract says they are permitted to do?
Trigger, not timetable. If an error like NHS England’s is found in our own DPIA or access model, do we have a pre-agreed protocol to suspend the access in question immediately, or would we wait for a scheduled review cycle?
External accountability. If our primary safeguard against vendor misconduct is foreign regulatory enforcement, such as the EU’s Digital Markets Act or Digital Services Act, what happens to our risk position when that enforcement is itself under sustained political pressure to weaken?
Classification before contract. Have we classified what we hold (open, restricted, confidential, highly confidential) before deciding which tier of infrastructure each category is permitted to touch, or did the vendor relationship come first and the classification get fitted around it afterward?
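Two of these questions, jurisdiction and classification before contract, can be checked mechanically once a board has an inventory of what it holds and who processes it. The sketch below assumes such an inventory exists; the processor names, datasets, and the rule that confidential data with a US-controlled processor needs a written legal opinion are hypothetical illustrations, not a compliance tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    OPEN = 0
    RESTRICTED = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3


@dataclass(frozen=True)
class Processor:
    name: str
    us_parent: bool                     # controlled by a US parent entity?
    max_classification: Classification  # highest tier it is cleared to touch


@dataclass(frozen=True)
class DataFlow:
    dataset: str
    classification: Classification
    processor: Processor


def audit(flows):
    """Return plain-language findings for flows that break the stated rules."""
    findings = []
    for flow in flows:
        proc = flow.processor
        if flow.classification > proc.max_classification:
            findings.append(
                f"{flow.dataset}: classified {flow.classification.name}, "
                f"but {proc.name} is cleared only to {proc.max_classification.name}"
            )
        if proc.us_parent and flow.classification >= Classification.CONFIDENTIAL:
            findings.append(
                f"{flow.dataset}: {flow.classification.name} data with "
                f"US-controlled {proc.name}; written legal opinion needed"
            )
    return findings


# Hypothetical inventory (invented names and values).
vendor_x = Processor("vendor_x", us_parent=True, max_classification=Classification.RESTRICTED)
in_house = Processor("in_house", us_parent=False, max_classification=Classification.HIGHLY_CONFIDENTIAL)
flows = [
    DataFlow("newsletter_list", Classification.OPEN, vendor_x),
    DataFlow("patient_records", Classification.HIGHLY_CONFIDENTIAL, vendor_x),
    DataFlow("research_notes", Classification.CONFIDENTIAL, in_house),
]
findings = audit(flows)
for line in findings:
    print(line)
```

The check only works if the classification column was filled in before the vendor column. If the tiers were fitted around an existing contract, the audit will pass and mean nothing.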

A board that cannot answer these with evidence, not reassurance, has the same exposure NHS England had on 15 June. The only difference is whether anyone has asked yet.

The honest version of Berners-Lee’s instinct

Berners-Lee is right about the thing that matters most: trust is the actual bottleneck, not accuracy, not convenience. He is right that the primacy of the individual, the thing the original web was built on, has to apply to AI too. Where Charlie goes wrong is in trying to solve a governance problem with a data engineering trick. You do not build trust by making data fuzzy enough that a model cannot quite pin you down. You build it by being honest and specific about who controls the infrastructure, what tier your data sits in, and who is allowed to see it under what conditions.

Twisting a birthdate is clever. It is not a substitute for an institution you can actually hold accountable.

Sources: Inrupt and Tim Berners-Lee on Charlie, SXSW London, June 2026 (AFP/Tech Xplore, BigGo Finance, Inrupt); NHS England’s 15 June 2026 response to the National Data Guardian and the eight Caldicott Principles, as set out in full in No Surprises; Reuters investigation into Meta’s projected revenue from fraudulent and scam advertising, November 2025; reporting on Trump administration pressure against EU Digital Markets Act and Digital Services Act enforcement, including visa sanctions against EU officials, December 2025 to January 2026 (Financial Times via Irish Times, CNN Business, Fortune).
