Friction Engineering — when disagreement becomes the mechanism
2026-04-22 · #ai #method #friction-engineering #sofia


I've been an enterprise architect for over fifteen years. For several of those years, I was skeptical of AI — machines don't create, they digest and produce a statistically correct response. Then I found myself building extensively with AI. Not by conversion. By necessity.

My practice pushed me to structure the approach so that a local success wouldn't remain an exception. Architecture is the decision process that maintains direction over time. These decisions are made in friction with their context: exchanges with peers, challenging business ambitions against available resources, trade-offs between what's desirable and what's feasible. This friction is natural between humans. With AI assistants, it disappears — you have to design it.

I propose a practice I call Friction Engineering: the deliberate design of friction between specialized AI assistants and a human orchestrator, as a quality mechanism that governs decisions before they become a specification and code.

Emerging approaches — structured prompt governance (Zhang & Xia, 2026) or autonomous multi-agent frameworks (Wu et al., 2023) — advance structuring and coordination, but none create cross-challenge between distinct specialized perspectives under practitioner arbitration. This is the gap Friction Engineering aims to fill.


AI says yes. That's the problem.

A single LLM says yes. Not always, but often enough that it's a structural problem rather than an anecdote.

The fundamental problem of human-AI collaboration is not the visible error — it's the silent consensus. The AI acquiesces, the practitioner validates, and nobody challenges the decision. Buçinca et al. (2021) showed this empirically: without deliberate friction, humans accept AI outputs without critical examination. La Rosa & Beretta (2025) raise the question at the systemic level: how can friction models developed for a single human-AI dyad be extended to more complex joint cognitive systems? In a human pair, productive disagreement emerges naturally. With AI, you have to provoke it intentionally.

Four symptoms keep recurring:

  • Scope drift in single-assistant mode. You start with a generalist assistant. After three sessions, it codes, advises, writes, arbitrates — in the same conversation, with the same tone, without constraint. Give it a poorly framed question, it produces a well-formulated answer. Give it a flawed direction, it executes enthusiastically. This is not collaboration. It's compliance.
  • Context loss. In long or multi-activity sessions, context is lost. January's assumptions are still active in April. Last week's decisions have vanished from the window. The assistant starts from scratch without telling you.
  • Proprietary memory. What the assistant "knows" lives in the provider's infrastructure. Switch providers, everything disappears. The dependency is silent but total.
  • Practice opacity. How do you build a work structure that's transferable to another practitioner and auditable? How do you know if what you're doing works, if it's degrading, if you can improve?

What it should be

I have a picture in mind of what human-AI collaboration that holds over time could look like. Against scope drift — assistants that don't do everything but do what they do well, focused on their domain, able to contest with precision rather than acquiesce with elegance. Against forgetting and dependency — a system where nothing is lost, every decision traced, every session historized, memory persisting outside the provider, in my files, in my repo, portable from one provider to another. Against opacity — a transferable, auditable practice that lets you know if what you're doing is degrading before it's too late.

None of this exists in current tools. Providers sell fluidity. Nobody sells friction.


Friction Engineering

The implementation rests on one principle: creating intentional friction — a systematic cross-challenge between specialized assistants, under the practitioner's control, until convergence. It is the means of ensuring decision quality upstream of production.

[Diagram: Cross-challenge. Persona 1 (author) produces an artifact; persona 2 (reviewer) challenges it as [contestable]; the practitioner validates, adjusts, and arbitrates the revision. Cost is linear (1 + N), not combinatorial.]

The point of isolation is challenge and inspection: one persona can contest another's work, always under the practitioner's observation, completing the picture from a different perspective.

[Diagram: Layered isolation. Each persona only sees its own scope: its role and expertise with explicit constraints (what it does not do), its product scope (reference files, constraints, artifacts), and its inter-persona relations, all loaded at boot. The context window is invested in depth, not coverage.]

Isolation is the central mechanism. Each assistant is constrained by a persona — a role, a product scope, explicit relations with others. It only sees its own scope. It contests better because it contests narrower. Project context and persona context nest — one is included in the other. Never forgotten.

Friction governs decisions upstream. Downstream, execution also needs guardrails — that's a second loop.

[Diagram: Two loops, one practitioner. Upstream, the friction loop (decision, contestation, revise) produces the spec; downstream, the harness loop (guardrails, production, iterate) executes it, with the option to reopen the spec. Both loops share the project context.]

On the downstream side, for production, you can use a range of techniques from software development — TDD, hooks, CI. Böckeler (2026) formalizes this loop under the name harness engineering: guides (feedforward) and sensors (feedback), computational or inferential, that frame agent execution. She notes, however, that the behaviour harness — ensuring the system does what was functionally intended — remains an open problem. Friction Engineering completes this picture: the upstream loop (friction between specialized perspectives) aims precisely to clarify design decisions before the downstream harness takes over.

I'm developing a method that addresses these challenges: SOFIA, a method for orchestrating specialized AI personas through intentional friction. In SOFIA, the practitioner is the orchestrator: they activate personas, route artifacts, and arbitrate frictions. Here's how.

The implementation here uses Claude as the provider and a markdown file structure sourced in git for history. The method itself is provider-agnostic.

An assistant that stays in its lane

A generalist assistant, after three sessions, codes, advises, writes, arbitrates — in the same conversation, with the same tone, without constraint. An LLM has a finite context window. The wider the context, the more diluted the signal. The agent knows a bit of everything but contests nothing with precision.

Each persona is defined by a file loaded at session boot. Three layers of isolation: its role and expertise (what it does), its product scope (the files it can read, the artifacts it produces, what it doesn't touch), and its relations (which other personas it interacts with, and how).

A persona won't stray from its activity and will say when a task isn't its domain. It can challenge another persona's work from its own perspective — or even the practitioner's proposals. These challenges are called frictions.
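As an illustration, a persona file can take the same frontmatter-plus-markdown shape as the context files shown later in this post. Every field name and scope entry below is hypothetical, sketched from the three layers just described rather than taken from an actual SOFIA persona:

```markdown
---
persona: mira
role: architect
---
# Persona — Mira (architect)

## Role & expertise
Architecture decisions: ADRs, trade-off reviews, structural audits.

## Product scope
- Reads: architecture docs, validated requirements
- Produces: ADRs, structural review notes
- Does not touch: code, editorial content, pedagogy

## Relations
- Challenges: the developer persona (design reviews)
- Is challenged by: the PO (scope, priorities)
```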

Mira — architect persona — produced pedagogical deliverables five times: user guide, onboarding, quick start, manual mode, derivation grammar. Each time, for lack of a dedicated persona. The result was structurally correct but not pedagogically optimized — a derivation grammar is a pedagogical artifact, not an architectural one. It must make a process adoptable, not specify it.

Isolation made the signal visible: 3+ deflections on the same domain, threshold reached. The finding led to identifying an uncovered axis in the structure — and considering a dedicated pedagogy persona. Without isolation, Mira would have continued absorbing this role by gravity, and nobody would have noticed.


A memory that doesn't fade

In design sessions, context loss is frustrating. In long projects, it's structurally dangerous. January's assumptions are still active in April. Last week's decisions have vanished from the window. The assistant starts from scratch without telling you.

One context file per persona, versioned in git, reloaded at each session. Not the provider's memory — yours. In markdown, in a repo, inspectable and portable.

Here's what a context file looks like:

---
persona: winston
product: oxynoe
---
# Writer Context — Oxynoe

## Scope
Editorial content: articles (fragments), watch (regards),
blue book (voice), launch posts.

## Key documents
| File | Role |
|------|------|
| fragments/ | Short articles — .md sources |
| regards/   | Periodic watch — .md sources |

## Inter-instance flows
- Receives from Products: validated claims (Lea)
- Receives from Methods: accessible method content (Pedagogue)

At each session close, a structured summary is produced — what was done, decisions made, frictions raised, what remains open:

---
persona: mira
date: 2026-04-16
session: "2233"
---
## Produced
- H2A annotation on 25 reviews — 114 friction lines annotated
- Deletion of 35 cross-instance duplicates

## Decisions
- Source of truth for reviews = produits/shared/review/ [PO]
- Sessions are immutable — no reconstruction [mira] → ratified

## Orchestrator friction
- ~ [contestable] shortcut on deletion outside scope
  — [PO] → ratified
- ~ [contestable] Mira proposes not adding initiative tag
  — [PO] → revised

## Open
- 24 reviews with resolutions to qualify

Personas never forget what happened in previous sessions. The next persona starts where the previous one left off. The orchestrator can go back through git history to find any decision.


Decisions that don't happen in silence

Decisions made silently are the most common failure mode. A persona proposes, the orchestrator validates, the next one executes. The chain is smooth — and that's precisely the problem. Smoothness is not quality. It's the absence of friction.

All decisions are submitted to the practitioner and traced. Each friction is qualified with a marker — [sound], [contestable], [simplification], [blind_spot], [refuted] — and resolved explicitly: ratified, revised, contested, or rejected. This mechanism draws on the intelligibility protocol by Mestha et al. (2025) — PXP — which structures iterative human-AI interaction around mutual predictions and explanations. The protocol's four resolutions (ratified, revised, contested, rejected) are an adaptation of PXP gestures, applied at the friction level rather than the message level.

The marker says what was contested. The resolution says what was done about it. Here's a resolved friction from an architecture session:

~ [contestable] Mira proposes not adding the initiative tag
  (redundant with frontmatter) — [PO] → revised
  The PO corrects: the audit script parses in batch, not content.
  The tag is added.
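The marker-plus-resolution format is regular enough to parse mechanically. A minimal sketch, assuming the line form `~ [marker] claim — [arbiter] → resolution` used in the summaries above (the group names are mine):

```python
import re

FRICTION = re.compile(
    r"~ \[(?P<marker>\w+)\] (?P<claim>.+?)"
    r"\s+— \[(?P<arbiter>\w+)\] → (?P<resolution>\w+)",
    re.S,  # claims may wrap across lines, as in the example above
)

def parse_friction(line):
    """Extract marker, claim, arbiter, and resolution from a friction line."""
    m = FRICTION.search(line)
    return m.groupdict() if m else None
```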

Mira had valid reasoning — avoid redundancy. The practitioner saw what she couldn't see: the downstream analysis pipeline's need. The friction surfaced an invisible constraint. Without it, the tag wouldn't have been added, and the dashboard would have been blind.

Another example. A developer persona proposes an ADR for concurrent execution in the rendering engine. Solid design. Coherent architecture. An architect persona reviews it and says: not now. No measured bottleneck. The roadmap says "make it work before make it fast." A security point is unaddressed. The ADR waits. The design will be better when the time comes. Without that pushback, the dev would have implemented — and the risk was that the cleanup engine would break the player a few months later, forcing a complete rework.

Without resolution, friction is an inventory of disagreements — it governs nothing.


A method that transfers

How do you let another practitioner, in another domain, adopt this way of working? My ambition is for this method to scale to multidisciplinary teams.

The formal protocol — H2A (Human-to-Assistant) — separates common work modalities (sessions, frictions, artifacts) from what's specific to each practitioner via canvases. Transfer starts from a simple base: persona, artifact, session. Three building blocks. Friction as a constitutive element for surfacing design decisions. Context management around sessions and artifacts.

The model in practice:

persona (role + constraints)
    ↓ produces
artifact (frontmatter + content)
    ↓ circulates via
session (structured summary + qualified frictions)
    ↓ arbitrated by
practitioner (validates, revises, rejects)
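For readers who think in types, the same model can be sketched as data structures. These class and field names are mine, illustrative only, not the H2A protocol's vocabulary:

```python
from dataclasses import dataclass, field

@dataclass
class Friction:
    marker: str      # sound | contestable | simplification | blind_spot | refuted
    resolution: str  # ratified | revised | contested | rejected

@dataclass
class Artifact:
    frontmatter: dict
    content: str

@dataclass
class Session:
    persona: str
    artifacts: list = field(default_factory=list)   # produced this session
    frictions: list = field(default_factory=list)   # qualified and resolved
```

The practitioner's arbitration is what turns a friction's marker into a resolution; nothing here executes on its own.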

The method is currently used by four practitioners across different domains. This is early adoption — feedback is encouraging but the track record remains limited. Transfer works because the base is simple. A practitioner who has never heard of friction gets an immediate benefit — their setup stops drifting.


A cockpit to observe your own practice

Multiple practitioners use the method. How do you advise them on their practice without reading the dozens of sessions and artifacts in their instance? How do you detect a persona that's too compliant, spot scope drift, identify wear before it sets in?

Wear is the most counter-intuitive failure mode, because it looks like success. The friction surfaces polish each other smooth. The orchestrator rejects contestation, the personas soften their pushback. The system degrades into polite agreement. Sessions run, markers appear, resolutions are logged — but the contestation has lost its teeth.

The H2A protocol formally structures how personas record their sessions and artifacts — making it possible to audit an instance, recalibrate it, and advise the practitioner on their practice. An analysis pipeline reads sessions, extracts frictions, and produces a dashboard with five views:

| View | Question |
|------|----------|
| Map | What does the organization look like? Topology, personas, trajectory |
| Mirror | Am I healthy as an orchestrator? KPIs, radars, flows |
| Lens | What happened over time? Time series, distributions |
| Probe | Is the instance structurally conforming? Pass/warn/fail checks |
| Legend | How do I read all this? Embedded documentation |
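As a flavor of what the Probe view's checks could look like, here is a toy conformance check. The required section names mirror the summary example earlier; the pass/warn/fail logic is my illustration, not the pipeline's actual rules:

```python
REQUIRED = ("Produced", "Decisions", "Open")

def probe_session(text):
    """fail: a required section is missing.
    warn: structure is present but no qualified friction was recorded.
    pass: structure present and at least one friction line found."""
    present = {line[3:].strip() for line in text.splitlines()
               if line.startswith("## ")}
    if not all(name in present for name in REQUIRED):
        return "fail"
    if "~ [" not in text:
        return "warn"
    return "pass"
```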

Wear is detectable through instrumentation: a growing ratio of [sound] and ratified, a gradual disappearance of [contestable] and [blind_spot]. Each persona receives failure mode tags — slip (friction without arbitration), wear (polished surfaces), crush (one side imposes), asymmetry (one-way friction). Detection requires someone to look. And that's the paradox: the better the system runs, the less vigilant the orchestrator becomes.
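The wear signal described above lends itself to a simple metric. A sketch, with window and threshold as illustrative defaults rather than calibrated values:

```python
def confirmatory_share(markers):
    """Share of a session's frictions qualified as [sound]."""
    return markers.count("sound") / len(markers) if markers else 1.0

def wear_alert(per_session_markers, window=5, threshold=0.8):
    """Flag wear: the confirmatory share stayed at or above `threshold`
    for the last `window` sessions, i.e. contestation has gone flat."""
    recent = per_session_markers[-window:]
    return len(recent) == window and all(
        confirmatory_share(m) >= threshold for m in recent)
```

A complementary signal would track the disappearance of [contestable] and [blind_spot] specifically; the point is that wear is a trend, never visible in a single session.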

A concrete example. A post-sprint audit revealed that Mira — architect persona — had absorbed four roles beyond her own during three days of publication: dev, ops, release manager, graphic designer. During that sprint, she produced zero ADRs — she coded instead of specifying. And 3 out of 3 friction mechanisms had been short-circuited. The audit made visible what publication pressure had made invisible. The persona hadn't overflowed out of incompetence — it had overflowed by gravity, because nobody else carried those roles.

The current snapshot of my three instances is accessible as open data: oxynoe.io/h2a.


What it changes

  • Isolation enables focus. Each assistant contests better because it contests narrower. Separation of responsibilities is not an org chart — it's context engineering applied to friction.
  • Decisions are visible, not delegated. Acceleration of quality and maintainability. Decisions can then, in a software development context, be used to frame development via the harness.
  • The method is transferable and replicable. Multiple instances beyond my own work suggest this, with the caveats of still-limited track record.
  • Instance audit enables recalibration. Structure and personas can be adjusted based on observable data, not impressions.

What remains to be built

Friction Engineering is early-stage empirical work. One primary practitioner, nine personas, three project instances, 400+ sessions over several months.

Two open fronts:

  • Scalability to multidisciplinary teams. My intuition: each practitioner on the team could have their own instance and personas, pursuing their own goals while raising their speed and output quality, freeing time for work the team never got to before. But this is an intuition, not a result.
  • Formalization. A comprehensive article developing the position taken in this blog post, and a second one analyzing the metrics and practical findings that I and the other practitioners have gathered.

The data, protocol, and instrumentation are open because these questions are worth testing beyond a single project. Feedback and contestation welcome — it's kind of the point.


Going further

To apply the method, a documentation repository is available, along with a set of tools: canvases to adapt the method to your own practice, and the protocol formalism.


References

  • Buçinca, Z., Malaya, M. B. & Gajos, K. Z. (2021). "To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making". Proc. ACM Hum.-Comput. Interact. (CSCW).
  • La Rosa, B. & Beretta, A. (2025). "Frictional AI in Joint Cognitive Systems". HHAI 2025 Workshop, Pisa (CEUR Vol. 4074).
  • Mestha, R. et al. (2025). "Intelligibility Protocol" (PXP).
  • Böckeler, B. (2026). "Harness Engineering for Coding Agent Users". martinfowler.com.
  • Zhang, W. & Xia, J. (2026). "Structured-Prompt-Driven Development". martinfowler.com.
  • Wu, Q. et al. (2023). "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation". arXiv:2308.08155.