What AI Can Learn from Signal’s Architecture of Trust
A framework for designing AI systems that assume harm will happen—and heal anyway
When Trevor Perrin and Moxie Marlinspike developed the Axolotl Ratchet in 2013, they made a radical philosophical choice: they assumed the system would be compromised. Not might be. Would be. And rather than treating this as failure, they designed for recovery.
The genius of what became the Double Ratchet Algorithm isn’t that it prevents breaches—it’s that breaches don’t cascade. Each message generates new keys. Compromise one, and you’ve gained nothing lasting. The system ratchets forward, healing itself with every exchange.
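For readers who want to see the shape of the mechanism, here is a minimal sketch of the symmetric-key half of that idea: a KDF chain in which each message key is derived from, and then replaces, the chain key. It illustrates forward secrecy only; the real Double Ratchet layers a Diffie-Hellman ratchet on top, and the labels and seed below are illustrative assumptions.

```python
import hmac
import hashlib

def kdf_chain_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive a one-time message key and the next chain key.

    Each step is one-way: someone who captures the current chain key learns
    nothing about earlier chain keys or earlier message keys (forward secrecy).
    """
    message_key = hmac.new(chain_key, b"message", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"chain", hashlib.sha256).digest()
    return message_key, next_chain_key

# Ratchet forward over three messages: every exchange gets a fresh key, and the
# old chain key is discarded, so a later compromise cannot reach back in time.
chain_key = hashlib.sha256(b"initial shared secret").digest()
for i in range(3):
    message_key, chain_key = kdf_chain_step(chain_key)
    print(f"message {i}: key {message_key.hex()[:16]}...")
```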
We need the same thinking for AI.
The Current Architecture of Harm
Social media platforms operate on the opposite principle: they assume engagement is inherently good and optimise relentlessly for it. When harm occurs—when the algorithm amplifies scapegoating, when outrage proves more viral than understanding, when minorities become targets of coordinated hatred—the system treats this as a feature, not a bug. Engagement is engagement. The machine doesn’t distinguish between connection and combustion.
This is the architectural equivalent of a cryptographic system that uses the same key forever: once compromised, everything falls.
We’ve watched this play out. The pattern of authoritarian rise that I traced in my Monday Momentum piece (Signal over Noise)—Hitler, Mussolini, Stalin, Mao—always begins with identifying an enemy minority, blaming complex problems on simple scapegoats, and eliciting hatred to form political majorities. Today’s platforms don’t create this impulse, but they amplify it with unprecedented efficiency. The algorithm doesn’t care whether it’s connecting music lovers or consolidating grievances. It only knows what spreads.
Recent research from Frontiers in Psychology found that after only five days of TikTok usage, there was a four-fold increase in misogynistic content on users’ feeds. The recommender systems don’t just reflect preferences—they amplify and normalise harmful ideologies, increasing users’ exposure to radical material through what researchers call “dosage escalation.” A meta-analysis of 151 algorithmic audits across major platforms found that roughly 8-10% of recommendations are actively harmful—comparable to defect rates that trigger regulatory oversight in food safety.
Forward Secrecy for Social Discourse
What would it mean to apply Signal’s design philosophy to AI and social platforms?
Forward secrecy in cryptography means that even if an attacker obtains current keys, past communications remain protected. The equivalent in social AI would be systems that don’t allow present harms to compound historical ones. Consider recommendation algorithms that actively decay the influence of content that has proven divisive—not through censorship, but through architectural choices that prevent yesterday’s outrage from fuelling tomorrow’s.
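To make that concrete, here is a purely illustrative sketch, not any platform's actual ranking code, and with an arbitrary assumed half-life: a divisiveness penalty that fades exponentially, so that past flare-ups stop compounding into future rankings.

```python
import math
import time

HALF_LIFE_HOURS = 48  # assumed half-life; a real system would tune this empirically

def decayed_divisiveness(divisive_events: list[tuple[float, float]], now: float) -> float:
    """Sum past divisiveness signals for one piece of content, discounted by age.

    divisive_events: (timestamp, severity) pairs. Older outrage contributes
    exponentially less, so yesterday's flare-up cannot keep fuelling
    tomorrow's amplification.
    """
    decay_rate = math.log(2) / (HALF_LIFE_HOURS * 3600)
    return sum(severity * math.exp(-decay_rate * (now - ts))
               for ts, severity in divisive_events)

def ranking_score(base_relevance: float,
                  divisive_events: list[tuple[float, float]]) -> float:
    """Relevance minus a decaying penalty for divisive history."""
    return base_relevance - decayed_divisiveness(divisive_events, time.time())
```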
Post-compromise security means the system recovers even after breach. Translated to AI ethics, this suggests building systems that assume manipulation will occur—that bad actors will game engagement metrics, that hatred will sometimes go viral, that the minority du jour will be targeted—and that are designed for recovery rather than prevention alone.
Key rotation with every message ensures that each exchange is cryptographically independent. The social equivalent might be AI systems that don’t allow accumulated “engagement debt” to distort future interactions. Your past clicks shouldn’t permanently determine your information diet. The algorithm should ratchet forward, offering genuine optionality rather than deepening existing grooves.
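A toy sketch of that forward ratchet, with invented constants and function names: accumulated interest weights decay every session, and a slice of recommendations is always drawn uniformly, so history informs the feed but never imprisons it.

```python
import random

DECAY = 0.9         # assumed per-session decay of accumulated interest weights
EXPLORATION = 0.15  # assumed fraction of picks made uniformly, regardless of history

def ratchet_profile(profile: dict[str, float],
                    session_clicks: dict[str, float]) -> dict[str, float]:
    """Fold one session into the interest profile while decaying everything before it.

    Old weights shrink geometrically, so no single burst of outrage-clicking
    permanently defines the user.
    """
    topics = set(profile) | set(session_clicks)
    return {t: DECAY * profile.get(t, 0.0) + session_clicks.get(t, 0.0) for t in topics}

def recommend(profile: dict[str, float], catalogue: dict[str, list[str]]) -> str:
    """Pick an item: mostly profile-weighted, partly uniform exploration."""
    known = [t for t in catalogue if profile.get(t, 0.0) > 0]
    if not known or random.random() < EXPLORATION:
        topic = random.choice(list(catalogue))
    else:
        topic = random.choices(known, weights=[profile[t] for t in known])[0]
    return random.choice(catalogue[topic])
```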
Toward an Ethical Ratchet Framework
I propose we think about ethical AI through five principles drawn from the Double Ratchet’s architecture:
1. Assume Compromise
Design as if your system will be weaponised. Because it will be. The question isn’t whether bad actors will exploit AI for scapegoating—it’s whether your architecture limits the blast radius when they do. Every platform should ask: if someone successfully uses our system to target a minority group, what prevents that harm from cascading?
2. Heal Forward
Build recovery into the base architecture, not as an afterthought moderation layer. Current approaches treat harm as content to be removed after the fact. An ethical ratchet would treat harm as inevitable and design systems that naturally attenuate it—not through censorship, but through structural choices about how influence accumulates and decays.
3. Rotate the Keys
Don’t let any single metric—engagement, time-on-site, shares—become the master key to the entire system. When platforms optimise for one variable, they create the conditions for that variable to be exploited. Algorithmic diversity isn’t just a nice-to-have; it’s a security feature. Rotate what you optimise for. Keep the system from becoming predictable enough to game.
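As a rough illustration, with metric names and weights that are invented rather than any platform's real objectives, rotation can be as simple as cycling the weight mix that scoring uses from one period to the next:

```python
import itertools

# Hypothetical objective mixes; a real platform would define and measure these differently.
METRIC_SETS = [
    {"engagement": 0.5, "source_diversity": 0.3, "reported_wellbeing": 0.2},
    {"engagement": 0.2, "bridging_score": 0.5, "source_diversity": 0.3},
    {"reported_wellbeing": 0.4, "bridging_score": 0.3, "engagement": 0.3},
]
rotation = itertools.cycle(METRIC_SETS)

def score(item_metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum under whichever objective mix is active this period."""
    return sum(weights.get(name, 0.0) * value for name, value in item_metrics.items())

# Rotate the objective each period, so optimising (or gaming) any single
# metric yields no lasting advantage.
for period in range(3):
    weights = next(rotation)
    print(period, score({"engagement": 0.9, "bridging_score": 0.1}, weights))
```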
4. Independence of Exchange
Each interaction should have the opportunity to be genuinely new. Current recommendation systems treat users as fixed entities—if you clicked on outrage content once, you’re forever marked as an outrage-clicker. An ethical ratchet would preserve the possibility of change, ensuring that present choices aren’t permanently determined by past behaviour.
5. Open Source the Protocol
The Signal Protocol’s security comes partly from its transparency. Anyone can audit the cryptography. The equivalent for AI ethics would be radical openness about algorithmic architecture—not the training data or model weights necessarily, but the principles and mechanisms by which content is amplified or attenuated. Trust architectures require the possibility of verification.
Models Already Working: A Global Survey
The good news is that we’re not starting from zero. Across multiple continents, researchers and institutions have developed approaches that point toward what serious R&D investment could accomplish.
Taiwan: The vTaiwan and Polis Model
Taiwan’s digital democracy initiatives, led by former Digital Minister Audrey Tang, represent the most advanced real-world implementation of consensus-building algorithms. The vTaiwan platform, developed after the 2014 Sunflower Movement protests, uses the Polis system—an open-source platform that clusters participants by voting patterns and surfaces statements that find consensus across ideological divides rather than within them.
The key innovation: Polis has no reply button. This architectural choice eliminates the troll factor by preventing back-and-forth arguments. Instead, participants focus on expressing ideas that will garner support from multiple sides. The algorithm privileges bridge-building narratives over divisive ones.
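Polis's actual pipeline is open source and works by projecting and clustering the participant-by-statement vote matrix; the sketch below captures only the spirit of its consensus logic, scoring each statement by its support in its least supportive opinion cluster, and assumes cluster labels have already been computed.

```python
import numpy as np

def cross_cluster_consensus(votes: np.ndarray, clusters: np.ndarray) -> np.ndarray:
    """Score statements by agreement within their *least* agreeing cluster.

    votes: participants x statements matrix of +1 (agree), -1 (disagree), 0 (pass).
    clusters: a cluster label per participant (e.g. from k-means on the vote matrix).
    A statement only scores well if every opinion group tends to agree with it,
    which is the bridge-building behaviour the platform is designed to surface.
    """
    scores = []
    for s in range(votes.shape[1]):
        per_cluster = [(votes[clusters == c, s] == 1).mean() for c in np.unique(clusters)]
        scores.append(min(per_cluster))
    return np.array(scores)

# Toy example: six participants in two opinion clusters voting on three statements.
votes = np.array([
    [ 1,  1, -1],
    [ 1,  1, -1],
    [ 1,  0, -1],
    [ 1, -1,  1],
    [ 1, -1,  1],
    [ 1, -1,  0],
])
clusters = np.array([0, 0, 0, 1, 1, 1])
print(cross_cluster_consensus(votes, clusters))  # only statement 0 bridges both groups
```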
Results: Over 30 national issues have been discussed on vTaiwan, with 80% leading to decisive government action. The platform has been used for complex regulatory questions including Uber regulation, online alcohol sales, and fintech legislation. In 2023, Taiwan’s Ministry of Digital Affairs partnered with the Collective Intelligence Project to launch “Alignment Assemblies” that apply these methods to AI governance—asking citizens to help define what “good AI” looks like.
Tang describes the approach as “Plurality”—technologies for collaborative diversity that “increase the bandwidth of democracy.” The insight is profound: rather than treating social media as a space for individuals to broadcast, Taiwan has built infrastructure for collective sense-making.
The AT Protocol and Bluesky: Separating Speech from Reach
The AT Protocol (Authenticated Transfer Protocol), developed by Bluesky, offers a different architectural innovation: separating “speech” from “reach.” The base layer remains permissive—anyone can post. But the reach layer—what gets amplified—is independently controllable.
Key features:
- Decentralised identity: Users own their identity via cryptographic keys, independent of any single platform
- Portable data: Users can move their posts, followers, and social graph between providers
- Competing App Views: Different clients can offer different recommendation algorithms, allowing users to choose how content is curated
- Composable moderation: Moderation services operate independently of hosting, enabling community-driven standards
The protocol now has 30+ million users, providing a real-world testbed for alternatives to centralised platform control. As Bluesky CEO Jay Graber notes: “Decentralizing social media at a technical level doesn’t, by itself, solve all these problems, but it distributes power over the network and allows many more entities beyond one company to build solutions.”
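A schematic sketch of that separation, in illustrative Python rather than AT Protocol code: posts live in a permissive repository (the speech layer), while interchangeable feed algorithms chosen by the reader decide amplification (the reach layer).

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Post:
    author: str
    text: str
    likes: int = 0

@dataclass
class Repository:
    """The 'speech' layer: permissive storage that anyone can write to and read from."""
    posts: list[Post] = field(default_factory=list)

    def publish(self, post: Post) -> None:
        self.posts.append(post)

# The 'reach' layer: interchangeable feed generators the reader chooses between.
FeedAlgorithm = Callable[[list[Post]], list[Post]]

def chronological(posts: list[Post]) -> list[Post]:
    return list(reversed(posts))

def most_liked(posts: list[Post]) -> list[Post]:
    return sorted(posts, key=lambda p: p.likes, reverse=True)

def render_feed(repo: Repository, algorithm: FeedAlgorithm) -> list[str]:
    """Same speech, different reach, selected by the reader rather than the host."""
    return [f"{p.author}: {p.text}" for p in algorithm(repo.posts)]
```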
Bridging Algorithms: From Birdwatch to Community Notes
Research on Twitter’s Birdwatch (now Community Notes) demonstrates that “bridging-based ranking” can reduce misinformation spread while increasing trust. The system uses matrix factorisation to identify annotations that appeal broadly across heterogeneous user groups—surfacing notes that both conservatives and liberals find helpful, rather than those that only resonate within one tribe.
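The published approach models each rating as a baseline plus rater and note intercepts plus a dot product of latent factors; a note counts as broadly helpful when its intercept stays high after the factor term has absorbed one-sided, partisan support. The sketch below is a simplified reimplementation of that idea with plain gradient descent, not the production system, and the hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_bridging_scores(ratings, n_raters, n_notes, dim=1, lam=0.1, lr=0.05, epochs=500):
    """Fit rating ~ mu + b_u + b_n + f_u . f_n on (rater, note, rating) triples.

    The note intercept b_n acts as the 'bridging' helpfulness score: agreement
    that survives after the factor term soaks up polarised, one-sided support.
    """
    mu = 0.0
    b_u, b_n = np.zeros(n_raters), np.zeros(n_notes)
    f_u = 0.1 * rng.standard_normal((n_raters, dim))
    f_n = 0.1 * rng.standard_normal((n_notes, dim))
    for _ in range(epochs):
        for u, n, r in ratings:
            err = r - (mu + b_u[u] + b_n[n] + f_u[u] @ f_n[n])
            mu += lr * err
            b_u[u] += lr * (err - lam * b_u[u])
            b_n[n] += lr * (err - lam * b_n[n])
            f_u[u], f_n[n] = (f_u[u] + lr * (err * f_n[n] - lam * f_u[u]),
                              f_n[n] + lr * (err * f_u[u] - lam * f_n[n]))
    return b_n  # higher intercept ~ helpful across heterogeneous raters

# Toy data: note 0 is rated helpful by every simulated rater, note 1 by only one camp,
# so note 0 should earn the higher intercept.
ratings = [(0, 0, 1), (1, 0, 1), (2, 0, 1), (0, 1, 1), (1, 1, 0), (2, 1, 0)]
print(fit_bridging_scores(ratings, n_raters=3, n_notes=2))
```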
Studies show that users who saw bridging-selected annotations were significantly less likely to reshare potentially misleading posts. The approach has been praised by Ethereum co-founder Vitalik Buterin for its ability to “break through echo chambers.”
The Prosocial Ranking Challenge at UC Berkeley’s Center for Human-Compatible AI is testing alternative ranking algorithms through browser extensions with thousands of users, measuring effects on polarisation, well-being, and information quality. Initial results are sobering: improvements are modest, and some interventions worsen the problems they aim to solve. But this is precisely why we need more R&D—to understand what works and what doesn’t.
Collective Constitutional AI: Democratic Input to Model Behaviour
The Collective Intelligence Project has partnered with Anthropic and OpenAI to explore how public input can shape AI behaviour. In their Collective Constitutional AI experiment, 1,000 Americans helped draft principles for an AI chatbot using the Polis platform.
The resulting “public constitution” was used to train a new AI model, which exhibited lower social bias across metrics like race, gender, and disabilities compared to models trained solely on internally developed principles. The experiment demonstrates that Constitutional AI training—where models are taught to follow written normative principles—creates an opening for democratic participation in AI alignment.
Key finding: participants from diverse backgrounds reached high consensus on most statements about AI behaviour, despite initial political differences. The model trained on collective input proved “more responsive to culturally specific issues” than the baseline.
The Investment Landscape: Where Money Is and Isn’t Going
What’s Funded
UK AI Security Institute (formerly AI Safety Institute): ~£50 million annually, focused on pre-deployment testing of frontier AI capabilities including biosecurity, cyber risk, and autonomous systems. Notable for joint testing agreements with US counterparts and development of the Inspect testing platform.
Stanford HAI (Human-Centered AI): Over $40 million channelled into human-centered AI research since 2019, supporting 300+ scholars. Publishes the annual AI Index Report, conducts policy bootcamps for regulators, and operates the Center for Research on Foundation Models.
Mozilla Foundation: $35 million venture fund for responsible tech startups (Mozilla Ventures), $30 million R&D lab for trustworthy AI (Mozilla.ai), plus $2.7 million Responsible Computing Challenge funding universities in Kenya, India, and the US to integrate ethics into computer science curricula. Also operates Common Voice, the world’s largest open-source multilingual voice dataset.
Berkman Klein Center (Harvard): Long-standing research on algorithmic accountability, AI governance, and media information quality. Developed risk assessment tools database for criminal justice algorithms. Runs the Institute for Rebooting Social Media.
Oxford Internet Institute / Digital Ethics Lab: Research on data ethics, algorithmic fairness, and digital governance. Contributors to the Alan Turing Institute’s Data Ethics Group. Professor Sandra Wachter’s work on algorithmic accountability has informed GDPR implementation.
Alan Turing Institute (UK national data science institute): £42 million initial government funding over five years. Ethics and Responsible Innovation Research led by David Leslie. Public policy programme exploring data-driven governance.
Center for Humane Technology: Non-profit founded by former Google design ethicist Tristan Harris and Aza Raskin (inventor of infinite scroll). Produces “Your Undivided Attention” podcast, developed “Foundations of Humane Technology” course (10,000+ participants), and supports litigation on anthropomorphic AI design harms.
What’s Not Funded (The Gap)
Despite this activity, a critical gap remains: almost no investment targets protocol-level architectural research—the fundamental design patterns that would make cascading harm structurally difficult. Current funding focuses on:
- Reactive moderation: Content removal after amplification
- Frontier AI safety: Capabilities testing for advanced models
- Ethics principles: Guidelines and frameworks
- Individual researcher fellowships: Important but fragmented
What’s missing:
- Dedicated protocol design institutes: No equivalent to CERN or EMBL for harm-resistant architectures
- Formal verification methods: Mathematical proofs for algorithmic safety properties
- Systematic bridging algorithm development: Beyond academic papers to production-ready systems
- Public interest App Views: Competitors to commercial recommendation defaults
- Post-compromise recovery mechanisms: Systems that heal after breach
The contrast is stark: platforms spend billions annually on content moderation (Facebook alone employs 15,000+ moderators) while protocol-level R&D receives effectively zero dedicated public investment.
A Concrete R&D Agenda
Based on this analysis, I propose investment organised around three pillars:
Pillar 1: Harm-Resistant Architectures (Protocol Design)
Investment needed: €100-150 million over 5 years
Institutional model: European Protocol Design Institute—analogous to CERN for physics. Could be located in the Alpine region (connecting to Smart Mountains’ territory) to symbolise resilience and long-term thinking.
Research priorities:
- Formal models of “algorithmic forward secrecy”—preventing past engagement from permanently determining future recommendations
- “Ratcheting” recommendation systems that decay influence accumulation over time
- Separation architectures that decouple speech infrastructure from amplification mechanisms (building on AT Protocol research)
- Post-compromise recovery mechanisms for social systems
- Bridging algorithms that surface cross-cutting consensus rather than tribal reinforcement
Key partners: Computational Democracy Project (Polis), Bluesky/AT Protocol team, Taiwan’s Public Digital Innovation Service (PDIS), Stanford CRFM
Pillar 2: Democratic AI Governance Infrastructure
Investment needed: €50-80 million over 5 years
Research priorities:
- Scaling Taiwan’s Alignment Assembly model globally
- Developing tools for collective input into AI constitutional principles
- Creating interoperable standards for algorithmic transparency
- Building citizen oversight mechanisms for recommendation systems
- Federated privacy-preserving methods (differential privacy, secure multi-party computation) adapted for “engagement privacy” (see the sketch below)
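One plausible building block for “engagement privacy”, offered here as an assumption-laden sketch rather than a specific recommendation: differentially private release of engagement counts, so that aggregate signals can inform ranking audits and oversight without revealing whether any individual engaged.

```python
import numpy as np

rng = np.random.default_rng()

def dp_engagement_count(true_count: int, epsilon: float = 1.0) -> float:
    """Release an engagement count with Laplace noise calibrated to epsilon.

    Assuming each user contributes at most one interaction to the count
    (sensitivity 1), adding Laplace(1/epsilon) noise gives epsilon-differential
    privacy: the published aggregate barely changes whether or not any single
    user's engagement is included.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: a regulator or researcher can audit how often a topic was amplified
# without learning whether any particular user engaged with it.
print(dp_engagement_count(12_345, epsilon=0.5))
```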
Key partners: Collective Intelligence Project, Taiwan Ministry of Digital Affairs, OpenAI Democratic Inputs programme, Anthropic, Meta Community Forum
Pillar 3: Public Interest Social Infrastructure
Investment needed: €80-100 million over 5 years
Research priorities:
- Public interest App Views competing with commercial defaults
- Citizen-controlled moderation cooperatives with democratic governance
- Public service media models adapted for social platforms (drawing on BBC/PSM principles)
- Interoperability standards enabling user choice across platforms
- “Community Notes”-style bridging systems scaled beyond single platforms
Key partners: Knight First Amendment Institute, Mozilla Foundation, Center for Humane Technology, European Broadcasting Union
Institutional Recommendations
1. Establish a European Protocol Design Institute
Model: CERN-style dedicated research institution with the mandate, budget, and talent to work on fundamental algorithmic architecture.
Governance: Independent scientific board with rotating industry observers (no veto power). Multi-stakeholder advisory including civil society, affected communities, and domain experts.
Initial budget: €100 million over five years, with sustained funding contingent on demonstrated progress.
Location consideration: The selection of a headquarters for an Ethical European Protocol Design Institute involves a strategic choice between three distinct symbols of governance and human endurance. The Dolomites serve as a beacon for European technological sovereignty and environmental integration, symbolising resilience, where the permanence of the peaks mirrors the long-term thinking required for sustainable ethics. Geneva offers a contrasting institutional gravity, anchoring protocols in the global epicentre of human rights and international legal frameworks. Finally, Valtellina provides a visceral narrative of “heroic” resilience; its UNESCO-recognised stone terraces and the uncompromising traditions of Storico Ribelle and Nebbiolo viticulture embody a centuries-old commitment to shaping a future through a hard-won, respectful harmony with a rugged landscape. Together, these locations represent the vital intersection of environmental stewardship, legal rigour, and the cultural tenacity necessary to define the ethics of tomorrow.
2. Expand AI Safety Institute Mandates
Current UK and EU AI safety efforts focus on frontier model capabilities. Expand their scope to include “systemic recommendation risk”—the slow-burn harms from engagement-optimised architectures.
Specific additions:
- Evaluation frameworks for polarisation, radicalisation, and attention exploitation
- Pre-deployment testing authority for major algorithm changes (not just model releases)
- Mandatory incident reporting for detected amplification of harmful content
- Annual “Algorithmic Harm Index” analogous to the Stanford AI Index
3. Create Regulatory Incentives for Protocol Innovation
The EU’s Digital Services Act and AI Act create compliance burdens. Complement these with positive incentives:
- Reduced regulatory burden for platforms implementing certified “ratchet-compliant” architectures
- Public procurement preferences for prosocial recommendation systems
- Tax incentives for R&D spending on harm-resistant design (similar to R&D tax credits for green technology)
- Interoperability mandates enabling user choice of recommendation algorithms
4. Scale Democratic AI Experiments
Build on the Collective Intelligence Project / Anthropic model:
- Annual €5 million prize pool for prosocial ranking innovations
- Mandatory platform API access for academic testing of alternative algorithms
- Public release of evaluation results enabling comparison across approaches
- Global Alignment Assemblies bringing Taiwan’s model to other democracies
5. Fund Long-Term Research Programmes
Following Mozilla’s principle of “looking where the puck is headed”:
- 10-year research programmes not dependent on annual funding cycles
- Multi-disciplinary teams combining computer science, political science, psychology, and philosophy
- Embedded practitioner fellows from platforms, regulators, and civil society
- Regular public reporting on progress and failures
The Path Forward: Presence Over Purchase
There’s a deeper connection here to how I think about my own work. In my ventures, I’ve come to distinguish between being present and being available for purchase. The former creates genuine relationship; the latter extracts value while hollowing out connection.
Current AI and social platforms are designed for purchase—for extracting attention, monetising engagement, treating human connection as inventory to be optimised. An ethical ratchet would be designed for presence: systems that facilitate genuine exchange, that don’t exploit the gap between what users want and what keeps them scrolling, that assume relationships are more valuable than transactions.
The Double Ratchet works because it makes no assumptions about the goodwill of attackers. It assumes the worst and designs for resilience. We need AI systems that make no assumptions about the goodwill of those who would weaponise them. That assume the worst and heal forward anyway.
The signal protocol. Not signal over noise—signal as architecture. A way of building systems that carry meaning without corruption, that heal themselves even under attack, that ratchet forward toward security rather than backward toward exploitation.
The pieces exist. Taiwan has demonstrated that consensus-building algorithms work at national scale. Bluesky has proven that architectural separation between speech and reach is possible. Bridging algorithms show that prosocial ranking isn’t utopian. The Collective Intelligence Project has trained AI on democratically sourced principles.
What’s missing is the institutional will to assemble these pieces at scale—and the investment to build on what works. The cost of not making this investment grows with every election cycle, every targeted minority, every society that forgets where scapegoating leads.
The time for protocol design is now.
Fabrizio de Liberali Ideas
Appendix: Key Research Initiatives and Resources
Active Research Programmes (by Region)
Taiwan
- vTaiwan (citizen consultation platform using Polis)
- Join (government petition and deliberation platform)
- Alignment Assemblies (AI governance through collective intelligence)
- Public Digital Innovation Service (PDIS)
United States
- Stanford HAI ($40M+ in research funding)
- Berkman Klein Center, Harvard (algorithmic accountability)
- UC Berkeley Center for Human-Compatible AI (Prosocial Ranking Challenge)
- MIT Media Lab (various AI ethics projects)
- Collective Intelligence Project (democratic AI governance)
United Kingdom
- UK AI Security Institute (~£50M annual budget)
- Alan Turing Institute Data Ethics Group
- Oxford Internet Institute Digital Ethics and Defence Technologies Research Group
- Laboratory for AI Security Research (LASR)
Europe
- EU AI Act implementation
- AI Continent Action Plan (April 2025)
- INRIA (France) AI Evaluation programme
- Various national AI Safety Institutes forming
Global
- UNESCO Global AI Ethics and Governance Observatory
- OECD AI Policy Observatory
- Center for Humane Technology (US-based, global reach)
- Mozilla Foundation (global programmes)
Key Academic Research
- Taiwan Model: Hsiao, Lin, Tang et al. (2018) “vTaiwan: A Process Overview” — SocArXiv
- Bridging Algorithms: Wojcik et al. “Birdwatch: Crowd Wisdom and Bridging Algorithms” — ACM
- Sociotechnical Harms: Shelby et al. (2023) “Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy” — ACM FAccT
- Algorithmic Amplification: Knight First Amendment Institute symposium proceedings
- Collective Constitutional AI: Anthropic/CIP (2023) — ACM FAccT
- Recommender System Harms: Meta-analysis of 151 algorithmic audits — ScienceDirect
- AT Protocol: Kleppmann et al. (2024) “Bluesky and the AT Protocol” — ACM CoNEXT
Regulatory Frameworks
- EU AI Act (prohibited-practices provisions effective from February 2025)
- EU Digital Services Act (platform accountability)
- UK Online Safety Act (duty of care)
- NIST AI Risk Management Framework (US voluntary standards)
- ISO/IEC 42001 (global AI governance standards)
Further Reading
- Perrin, T. & Marlinspike, M. (2016). “The Double Ratchet Algorithm.” Signal Foundation
- Masnick, M. (2019). “Protocols, Not Platforms: A Technological Approach to Free Speech”
- Bail, C. (2021). “Breaking the Social Media Prism” — Princeton University Press
- Harris, T. & Raskin, A. — “Your Undivided Attention” podcast
- Tang, A. (2024). “Alignment Assemblies can enable us to govern AI collaboratively” — RSA Journal
- Floridi, L. & Taddeo, M. — “What is Data Ethics?” — Philosophical Transactions A
- Chua, A. (2018). “Political Tribes: Group Instinct and the Fate of Nations”