The Compositor as a Contract

WEFT OS — Post 4 of 10


Wayland is one of those technologies that people summarize with a sentence that is technically true and practically useless:

  • “Wayland replaces X11.”

The more accurate statement is:

  • Wayland redraws the boundary between who controls what.

If you’ve never had to debug a display stack, that might sound like philosophy.

It isn’t.

It’s the difference between a system that can make strong promises and a system that can only make suggestions.

If you build a desktop environment, that boundary becomes your entire life.

Not because it’s philosophically interesting.

Because every bug you ship is a violation of an implied contract:

  • who gets focus
  • who is allowed to move a window
  • who decides where pixels end up
  • who gets to observe input

WEFT OS is built around a compositor written in Rust using Smithay.

This post is about why the compositor is not “the part that draws windows”.

It’s the part that defines contracts.

And why WEFT treats those contracts as security boundaries first and as architecture second.

This post is not a tutorial on Wayland.

It’s an attempt to explain why the compositor is the place where an OS either becomes coherent or becomes haunted.

Because once you decide the shell is a document, the compositor becomes the part that keeps the document honest.

🔗X11 vs Wayland in one sentence (the sentence people avoid)

Here’s the uncomfortable simplification.

In X11, the client has a lot of power.

In Wayland, the compositor takes that power back.

That power is not “drawing pixels.”

It’s authority.

It’s things like:

  • who knows where the cursor is
  • who can observe global input
  • who can move which window
  • who can draw above which surface

If you want to build a modern desktop with stronger isolation assumptions, this shift matters.

It doesn’t automatically make anything secure.

But it changes what the OS can enforce.

And in WEFT, enforcement is the whole point.

🔗The four questions a compositor answers

If you strip away implementation details, a compositor answers a small set of questions over and over:

  1. Which surfaces exist right now?
  2. Which surface is visible where?
  3. Which surface is allowed to receive input?
  4. What happens when a client misbehaves or dies?
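
To make the four questions concrete, here is a minimal Rust sketch that writes them down as one trait with a toy implementation. Every name in it (`CompositorContract`, `ToyCompositor`, `SurfaceId`, `ClientId`) is hypothetical. This is not Smithay's API and not WEFT's code. It only shows the shape of the questions.

```rust
// Hypothetical identifiers; not Smithay types.
pub type SurfaceId = u32;
pub type ClientId = u32;

/// The four questions, written as one interface (illustrative only).
pub trait CompositorContract {
    /// 1. Which surfaces exist right now?
    fn live_surfaces(&self) -> Vec<SurfaceId>;
    /// 2. Which surface is visible at this point?
    fn surface_at(&self, x: i32, y: i32) -> Option<SurfaceId>;
    /// 3. Which surface is allowed to receive keyboard input?
    fn keyboard_target(&self) -> Option<SurfaceId>;
    /// 4. What happens when a client dies? Returns the surfaces torn down.
    fn client_died(&mut self, client: ClientId) -> Vec<SurfaceId>;
}

pub struct Surface {
    pub id: SurfaceId,
    pub client: ClientId,
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Toy state: `stack` is ordered bottom to top.
pub struct ToyCompositor {
    pub stack: Vec<Surface>,
    pub focused: Option<SurfaceId>,
}

impl CompositorContract for ToyCompositor {
    fn live_surfaces(&self) -> Vec<SurfaceId> {
        self.stack.iter().map(|s| s.id).collect()
    }

    fn surface_at(&self, x: i32, y: i32) -> Option<SurfaceId> {
        // Walk top to bottom so the topmost surface under the point wins.
        self.stack
            .iter()
            .rev()
            .find(|s| x >= s.x && x < s.x + s.w && y >= s.y && y < s.y + s.h)
            .map(|s| s.id)
    }

    fn keyboard_target(&self) -> Option<SurfaceId> {
        self.focused
    }

    fn client_died(&mut self, client: ClientId) -> Vec<SurfaceId> {
        let dead: Vec<SurfaceId> = self
            .stack
            .iter()
            .filter(|s| s.client == client)
            .map(|s| s.id)
            .collect();
        self.stack.retain(|s| s.client != client);
        // Never leave keyboard focus pointing at a surface that no longer exists.
        if self.focused.map_or(false, |f| dead.contains(&f)) {
            self.focused = None;
        }
        dead
    }
}
```

The point is that each answer lives in one place. Question 4 also touches question 3: a dead client's surface stops being a keyboard target in the same step that removes it.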

If you don’t answer those questions consistently, you don’t have a desktop.

You have a temporary arrangement of pixels.

This is why I describe the compositor as a contract.

The contract is not expressed in a manifesto.

It’s expressed in protocol, in input routing, in the failure model.

If you want to understand why “contract” is the right word, imagine an OS where these questions have inconsistent answers.

You get:

  • focus bugs that feel like the machine is ignoring you
  • input leaks where one app can observe another
  • window stacking glitches that look like corruption
  • animation jitter that looks like the machine is overloaded

None of those are “UI bugs.”

They are violations of authority.

They are the OS lying.

🔗WEFT’s compositor stance: authority lives here

In WEFT, the compositor is the authority for:

  • surface lifecycles
  • stacking and layer ordering
  • focus routing
  • output geometry
  • presentation timing

That list is important because of what it leaves out: it implicitly says what the compositor is not.

The compositor is not the UI.

It is not a big “shell process” that also happens to paint pixels.

It is the place where policy is enforced.

That enforcement is what keeps a web-rendered shell honest.

This is the part that tends to surprise people who come from the web side.

They see the shell as a UI tree and assume the UI tree should be the authority.

But the UI tree is not the authority.

The compositor is.

And the reason is not aesthetic.

It’s that UI code is the place where complexity accumulates.

The system needs one component whose job is to say “no” consistently.

That’s one reason Wayland compositors matter so much architecturally.

In the Wayland model, clients do not get to treat the display server as a public square where everything is globally observable.

The compositor sits in the middle and decides what becomes real:

  • which surfaces are visible
  • which surfaces receive input
  • which requests are allowed to affect the session

That makes the compositor the natural place for policy.

And that, in turn, is why a compositor framework matters.

Something like Smithay is not interesting because “Rust compositor library” sounds fashionable.

It’s interesting because a framework with explicit protocol handlers and centralized state makes the authority model easier to express clearly.

That doesn’t remove the responsibility.

It just gives you a better place to put it.

🔗Why “the shell is HTML” doesn’t mean “the shell owns the world”

The opening post in this series said “the desktop is a document”.

That phrase invites a specific misunderstanding:

  • if the desktop is a document, then apps must be inside the document

That is not the model.

In WEFT, application content is not rendered inside the shell’s DOM tree.

Applications are clients. They have their own surfaces.

The shell is also a client. It draws shell UI and window chrome.

The compositor is the place where those surfaces meet.

This matters for two reasons:

  • It prevents the shell from becoming a single point of failure for the entire session.
  • It keeps application pixels out of the shell’s trust domain.

If you’ve ever tried to recover cleanly from a UI crash, you already know why “single-process desktop” designs age badly.

There’s another reason this separation matters.

If the shell is a document, it will be tempting to treat everything as DOM.

That temptation leads to an architecture where:

  • the shell process becomes the place where “system things” happen

And then, quietly:

  • the shell becomes the place where authority leaks

The compositor contract is the guardrail against that.

🔗Layers are not an aesthetic choice

Every desktop ends up with layers, even if it pretends not to.

WEFT treats layers as a fixed contract, because fixed contracts are easier to reason about than “whatever the UI currently wants”.

At a high level, you can think of four strata:

  • background
  • application surfaces
  • shell chrome
  • overlays

The exact taxonomy can evolve.

The important part is that the compositor enforces it.

If a random app can draw itself above the shell, you don’t have a shell.

You have a suggestion.
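
A minimal sketch of what “the compositor enforces it” can mean, assuming the compositor knows a surface's role (app or shell) from its connection rather than trusting the surface to declare it. `Layer`, `Role`, `effective_layer`, and `stacking_order` are hypothetical names for illustration, not WEFT's code.

```rust
/// The four strata, bottom to top. Deriving `Ord` makes declaration order
/// the stacking order: later variants draw above earlier ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Layer {
    Background,
    Application,
    ShellChrome,
    Overlay,
}

/// Hypothetical: who created the surface, as known by the compositor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    App,
    Shell,
}

/// A surface may ask for a layer, but the request is capped by its role:
/// apps never get above `Application`, whatever they ask for.
pub fn effective_layer(role: Role, requested: Layer) -> Layer {
    let ceiling = match role {
        Role::App => Layer::Application,
        Role::Shell => Layer::Overlay,
    };
    requested.min(ceiling)
}

/// Sort (id, layer, seq) bottom to top: first by layer, then by `seq`
/// within a layer (higher seq = raised more recently).
pub fn stacking_order(mut surfaces: Vec<(u32, Layer, u64)>) -> Vec<u32> {
    surfaces.sort_by_key(|&(_, layer, seq)| (layer, seq));
    surfaces.into_iter().map(|(id, _, _)| id).collect()
}
```

Because the layer is the primary sort key, no amount of raising within a layer can move an app surface past shell chrome.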

Layering is also how you keep the UI legible.

Desktop UIs rely on subtle guarantees:

  • notifications appear above app content
  • modal overlays actually block interaction behind them
  • window chrome remains visible

These aren’t “CSS problems.”

They’re system composition problems.

And they are easier to enforce in the compositor than in the shell.
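
One way to sketch the “modal overlays actually block interaction” guarantee is in pointer routing. This assumes a hypothetical `modal` flag that only the compositor can set. The types are illustrative, not WEFT's.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

impl Rect {
    pub fn contains(&self, px: i32, py: i32) -> bool {
        px >= self.x && px < self.x + self.w && py >= self.y && py < self.y + self.h
    }
}

pub struct StackedSurface {
    pub id: u32,
    pub rect: Rect,
    /// Hypothetical flag: a modal overlay blocks everything beneath it.
    pub modal: bool,
}

/// Route a pointer event. `stack` is ordered bottom to top.
/// Walking from the top, the first surface under the pointer wins. If we
/// reach a modal surface without a hit, the modal swallows the event, so
/// nothing below it can receive input, whatever the geometry says.
pub fn pointer_target(stack: &[StackedSurface], px: i32, py: i32) -> Option<u32> {
    for s in stack.iter().rev() {
        if s.rect.contains(px, py) || s.modal {
            return Some(s.id);
        }
    }
    None
}
```

The shell can draw a dimmed backdrop in CSS, but only a rule like this one makes the backdrop mean anything.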

🔗The reason WEFT defines a shell protocol

Here’s the problem WEFT is trying to solve:

  • the shell wants to represent windows as DOM elements
  • the compositor owns app surfaces

So WEFT defines a small Wayland protocol extension that allows the shell to create and manage compositor-backed “window slots”.

This isn’t about inventing new windowing concepts.

It’s about expressing a very specific division of responsibility:

  • the shell requests a window slot and provides metadata
  • the compositor remains authoritative for effective geometry and focus

This is the key phrase to hold onto:

  • the compositor remains authoritative

If the compositor is not authoritative, a custom protocol doesn’t save you.

It just gives you more surface area to be wrong.

So the protocol’s job is not to create authority.

It’s to expose a narrow interface where the shell can request things and the compositor can accept, reject, or modify them.

If you read the protocol itself, you can see the shape:

  • a single manager global
  • per-window objects

Requests such as:

  • create_window
  • set_geometry
  • update_metadata

Events such as:

  • configure (authoritative effective geometry)
  • focus_changed
  • window_closed

The interesting part isn’t the method names.

The interesting part is the direction of authority.

The shell can ask.

The compositor decides.

That is what prevents “the desktop is a document” from becoming “the desktop is a web page with privileges.”
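
The ask/decide split can be sketched as a message handler. This is a hypothetical in-process model, not the protocol XML or any generated Wayland bindings. The request and event names mirror the list above, but the fields, the minimum-size rule, and `ProtocolError` are assumptions, and `focus_changed` is left out to keep it short.

```rust
use std::collections::HashMap;

// Hypothetical model of the shell protocol. Names mirror the post;
// the fields are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    CreateWindow { title: String },
    SetGeometry { window: u32, x: i32, y: i32, w: i32, h: i32 },
    UpdateMetadata { window: u32, title: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// Authoritative effective geometry; may differ from what was requested.
    Configure { window: u32, x: i32, y: i32, w: i32, h: i32 },
    WindowClosed { window: u32 },
}

#[derive(Debug, PartialEq)]
pub enum ProtocolError {
    /// The request named a window the compositor doesn't (or no longer) know.
    UnknownWindow(u32),
}

pub struct Slot {
    pub title: String,
    pub geometry: (i32, i32, i32, i32), // x, y, w, h
}

pub struct Compositor {
    output: (i32, i32),   // output width, height (assumed single output)
    min_size: (i32, i32), // hypothetical policy: smallest allowed window
    slots: HashMap<u32, Slot>,
    next_id: u32,
}

fn configure(window: u32, (x, y, w, h): (i32, i32, i32, i32)) -> Event {
    Event::Configure { window, x, y, w, h }
}

impl Compositor {
    pub fn new(output: (i32, i32), min_size: (i32, i32)) -> Self {
        Compositor { output, min_size, slots: HashMap::new(), next_id: 1 }
    }

    /// The shell asks; the compositor decides (accept, reject, or modify).
    pub fn handle(&mut self, req: Request) -> Result<Vec<Event>, ProtocolError> {
        match req {
            Request::CreateWindow { title } => {
                let id = self.next_id;
                self.next_id += 1;
                let geometry = (0, 0, self.min_size.0, self.min_size.1);
                self.slots.insert(id, Slot { title, geometry });
                Ok(vec![configure(id, geometry)])
            }
            Request::SetGeometry { window, x, y, w, h } => {
                let (ow, oh) = self.output;
                let (mw, mh) = self.min_size;
                let slot = self.slots.get_mut(&window).ok_or(ProtocolError::UnknownWindow(window))?;
                // Modify rather than obey: enforce the minimum size, cap at the
                // output size, then keep the whole window on screen.
                let w = w.max(mw).min(ow);
                let h = h.max(mh).min(oh);
                let x = x.max(0).min(ow - w);
                let y = y.max(0).min(oh - h);
                slot.geometry = (x, y, w, h);
                Ok(vec![configure(window, slot.geometry)])
            }
            Request::UpdateMetadata { window, title } => {
                let slot = self.slots.get_mut(&window).ok_or(ProtocolError::UnknownWindow(window))?;
                slot.title = title; // metadata changes nothing the compositor must confirm
                Ok(vec![])
            }
        }
    }

    /// The app behind a slot went away. The shell must treat this as final.
    pub fn app_exited(&mut self, window: u32) -> Vec<Event> {
        match self.slots.remove(&window) {
            Some(_) => vec![Event::WindowClosed { window }],
            None => vec![],
        }
    }
}
```

Note that `set_geometry` never returns the shell's numbers directly. The only geometry the shell learns is the one in `configure`.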

🔗What “focus routing” actually means

Focus bugs are one of the fastest ways to make an OS feel broken.

They’re also one of the hardest bugs to debug if you don’t have a clear authority boundary.

When you click on a window, several things happen:

  • the compositor decides which surface is focused
  • the compositor routes keyboard events to that surface
  • the shell updates its visual state (highlighted window, taskbar state)

The shell can render “active window” chrome.

But it should not be the component that decides where the keyboard goes.

That’s not a judgment about the shell’s intentions.

It’s because the shell is complex, and complex things make bad authorities.
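
The three steps above can be sketched as compositor-owned state that the shell only hears about. `FocusState` and `ShellNotice` are hypothetical names, and hit testing is assumed to have already happened.

```rust
/// Hypothetical focus state owned by the compositor. The shell never
/// writes to it; it only receives `ShellNotice::FocusChanged`.
#[derive(Debug, Default)]
pub struct FocusState {
    focused: Option<u32>,
}

#[derive(Debug, PartialEq)]
pub enum ShellNotice {
    FocusChanged(Option<u32>),
}

impl FocusState {
    /// Step 1: a click on `hit` decides focus. Clicking empty space keeps
    /// the current focus. The shell is notified only when focus changes.
    pub fn on_click(&mut self, hit: Option<u32>) -> Option<ShellNotice> {
        if hit.is_some() && hit != self.focused {
            self.focused = hit;
            Some(ShellNotice::FocusChanged(hit))
        } else {
            None
        }
    }

    /// Step 2: keyboard events go to the focused surface and nowhere else.
    pub fn route_key(&self, key: char) -> Option<(u32, char)> {
        self.focused.map(|id| (id, key))
    }

    /// A focused surface that disappears must not keep receiving keys.
    pub fn on_surface_destroyed(&mut self, id: u32) -> Option<ShellNotice> {
        if self.focused == Some(id) {
            self.focused = None;
            Some(ShellNotice::FocusChanged(None))
        } else {
            None
        }
    }
}
```

Step 3 is whatever the shell does with `FocusChanged`: highlight chrome, update the taskbar. Keeping focus on an empty-space click is an assumption; either choice is defensible, as long as the compositor makes it once.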

🔗Presentation timing: the invisible part users feel

There’s another aspect of compositing that people only talk about when it goes wrong:

  • frame pacing

If the compositor is composing at the wrong time, or if feedback about presentation is missing, you get:

  • jitter
  • resize stutter
  • animations that feel “off”

Users don’t say “presentation timing is wrong.”

They say:

  • “it feels laggy.”

This is one of the reasons WEFT cares about the compositor providing timing feedback relevant to shell-managed surfaces.

If the shell is going to animate window chrome, it needs to align that animation with reality.

Otherwise the UI becomes a guess.

And guessy UI is how you lose trust.
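
A minimal sketch of “align that animation with reality”, assuming the compositor reports when the last frame was actually presented and the output's refresh interval, both in nanoseconds (Wayland's presentation-time protocol carries both). The function names and the ease-out curve are illustrative choices, not WEFT's code.

```rust
/// Predict the next presentation time strictly after `now_ns`, given when
/// the last frame actually reached the screen and the refresh interval.
pub fn next_presentation_ns(last_presented_ns: u64, refresh_ns: u64, now_ns: u64) -> u64 {
    assert!(refresh_ns > 0, "refresh interval must be non-zero");
    if now_ns < last_presented_ns {
        return last_presented_ns;
    }
    // Whole refresh cycles elapsed since the last known presentation.
    let elapsed_cycles = (now_ns - last_presented_ns) / refresh_ns;
    last_presented_ns + (elapsed_cycles + 1) * refresh_ns
}

/// Progress in [0.0, 1.0] of a quadratic ease-out animation of `duration_ns`
/// that started at `start_ns`, sampled at the predicted presentation time.
pub fn animation_progress(start_ns: u64, duration_ns: u64, present_ns: u64) -> f64 {
    if duration_ns == 0 || present_ns >= start_ns + duration_ns {
        return 1.0;
    }
    let t = present_ns.saturating_sub(start_ns) as f64 / duration_ns as f64;
    1.0 - (1.0 - t) * (1.0 - t)
}
```

Sampling the curve at the predicted presentation time rather than at `now` keeps chrome animations from drifting a frame behind the surfaces they decorate.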

🔗Policy enforcement: the boring part that keeps the system real

When people talk about compositors, they often focus on rendering.

But a compositor is also a policy engine.

It has to enforce constraints that are simultaneously:

  • boring
  • non-negotiable

Some examples:

🔗Output bounds

If a client requests a geometry that doesn’t fit, the compositor must clamp it.

Not because the client is malicious.

Because the compositor is the only component that knows the real output geometry and the real stacking state.

If you let clients self-report “I’m here now”, you get the X11-style world where the OS can’t make strong promises.
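
A minimal sketch of clamping, assuming a single output described as a rectangle in global coordinates. `Rect` and `clamp_to_output` are hypothetical names. A real compositor also has to deal with multiple outputs, scaling, and client size hints.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Clamp a requested rectangle so it fits inside `output`.
/// Size is capped first (a window larger than the output is shrunk), then
/// the position is moved so the whole rectangle stays on the output.
/// Sizes below 1 are raised to 1 so the result is never empty.
pub fn clamp_to_output(req: Rect, output: Rect) -> Rect {
    let w = req.w.max(1).min(output.w);
    let h = req.h.max(1).min(output.h);
    let x = req.x.max(output.x).min(output.x + output.w - w);
    let y = req.y.max(output.y).min(output.y + output.h - h);
    Rect { x, y, w, h }
}
```

The client's request is an input to this function, never its output. That is the whole difference from self-reporting.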

🔗Stacking policy

Clients should not be able to place themselves above shell overlays.

If they can, you’ve created the simplest possible UI spoofing attack:

  • draw above the system UI

Wayland’s architecture pushes this responsibility into the compositor.

WEFT leans into that.
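
A sketch of a stacking rule that rules out that spoofing attack by construction: a raise request moves a surface to the top of its own layer and no further. `Stack` and its `Layer` enum are hypothetical names for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Layer {
    Background,
    Application,
    ShellChrome,
    Overlay,
}

/// Stack ordered bottom to top. Invariant: layers never decrease.
pub struct Stack {
    pub entries: Vec<(u32, Layer)>,
}

impl Stack {
    /// A client asks to be raised. The surface moves to just below the
    /// first entry of any higher layer, never above it.
    pub fn raise(&mut self, id: u32) -> bool {
        let pos = match self.entries.iter().position(|&(e, _)| e == id) {
            Some(p) => p,
            None => return false, // unknown or stale id: nothing to do
        };
        let entry = self.entries.remove(pos);
        let insert_at = self
            .entries
            .iter()
            .position(|&(_, layer)| layer > entry.1)
            .unwrap_or(self.entries.len());
        self.entries.insert(insert_at, entry);
        true
    }

    /// Surface ids, bottom to top.
    pub fn order(&self) -> Vec<u32> {
        self.entries.iter().map(|&(id, _)| id).collect()
    }
}
```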

🔗Object lifetimes and stale identifiers

A shell is going to maintain a model of windows.

But windows are ultimately backed by compositor-managed objects.

That means the compositor must be able to say:

  • “that thing is dead now”

And the shell must treat that as authoritative.

If the shell tries to act on a stale window object, the compositor must reject it.

This sounds like paperwork.

It’s also how you prevent a whole class of “ghost window” bugs where the UI believes a thing exists and the system disagrees.
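
One common way to get “that thing is dead now” without extra bookkeeping is a generational index: a handle carries the generation it was issued for, and freeing a slot bumps the generation. This is a swapped-in technique for illustration; the post doesn't say WEFT uses it, and `Windows` and `WindowHandle` are hypothetical names.

```rust
/// Handle held by the shell: slot index plus the generation it was issued for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WindowHandle {
    index: usize,
    generation: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    /// The handle refers to a window that has already been destroyed.
    Stale,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

/// Hypothetical generational arena for compositor-side window objects.
pub struct Windows<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
}

impl<T> Windows<T> {
    pub fn new() -> Self {
        Windows { slots: Vec::new(), free: Vec::new() }
    }

    pub fn insert(&mut self, value: T) -> WindowHandle {
        if let Some(index) = self.free.pop() {
            // Reuse a freed slot; its generation was bumped when it was freed.
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            WindowHandle { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            WindowHandle { index: self.slots.len() - 1, generation: 0 }
        }
    }

    pub fn get_mut(&mut self, h: WindowHandle) -> Result<&mut T, Error> {
        match self.slots.get_mut(h.index) {
            Some(slot) if slot.generation == h.generation => slot.value.as_mut().ok_or(Error::Stale),
            _ => Err(Error::Stale),
        }
    }

    /// "That thing is dead now": free the slot and invalidate every old handle.
    pub fn remove(&mut self, h: WindowHandle) -> Result<T, Error> {
        let slot = match self.slots.get_mut(h.index) {
            Some(slot) if slot.generation == h.generation && slot.value.is_some() => slot,
            _ => return Err(Error::Stale),
        };
        slot.generation = slot.generation.wrapping_add(1);
        let value = slot.value.take().expect("checked above");
        self.free.push(h.index);
        Ok(value)
    }
}
```

The useful property: a stale handle can't accidentally address whatever window reused its slot. It is rejected, which is exactly the behavior the shell needs to rely on.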

🔗What happens when a client misbehaves

“Misbehaves” does not only mean “malicious.”

It also means:

  • buggy
  • slow
  • inconsistent
  • crashing

If an app spams configuration requests or renders nonsense buffers, the compositor still has to protect the session.

If the shell asks for something impossible, the compositor has to respond with the closest valid state rather than letting the system drift into undefined behavior.

This is one of the reasons I keep pushing the idea that the compositor is a contract.

The contract isn’t “clients will behave.”

The contract is:

  • the system remains stable even when clients don’t
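
One possible way to keep a spamming client from affecting the session is a per-client request budget, sketched here as an integer token bucket. The post doesn't name a mechanism; this is an assumption for illustration, and `RequestBudget` is a hypothetical name.

```rust
/// Hypothetical per-client request budget (integer token bucket).
pub struct RequestBudget {
    capacity: u64,
    tokens: u64,
    refill_every_ns: u64,
    last_refill_ns: u64,
}

impl RequestBudget {
    /// Starts full: a client may burst `capacity` requests, then earns one
    /// more token every `refill_every_ns` nanoseconds.
    pub fn new(capacity: u64, refill_every_ns: u64, now_ns: u64) -> Self {
        assert!(refill_every_ns > 0, "refill interval must be non-zero");
        RequestBudget { capacity, tokens: capacity, refill_every_ns, last_refill_ns: now_ns }
    }

    /// Returns true if the request may be processed now. `false` means
    /// defer or coalesce it; other clients are unaffected either way.
    pub fn try_acquire(&mut self, now_ns: u64) -> bool {
        if now_ns > self.last_refill_ns {
            let earned = (now_ns - self.last_refill_ns) / self.refill_every_ns;
            if earned > 0 {
                self.tokens = self.tokens.saturating_add(earned).min(self.capacity);
                // Advance by whole intervals only, so partial progress isn't lost.
                self.last_refill_ns += earned * self.refill_every_ns;
            }
        }
        if self.tokens > 0 {
            self.tokens -= 1;
            true
        } else {
            false
        }
    }
}
```

A denied request doesn't have to be an error. A compositor could coalesce pending configure requests and act on the latest one when tokens return, which is another way of responding with the closest valid state.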

🔗Failure is part of the contract

Desktop environments are judged by how they fail.

Not by how they look on day one.

WEFT’s goal is that:

  • if the shell crashes, the session should not necessarily die
  • if an app crashes, the shell should not care
  • if the compositor crashes, clients should be able to reconnect and rebuild cleanly

That goal isn’t fully proven by implementation yet.

But the architecture is shaped around making it possible.

And the shape of the architecture is simple:

  • keep authority in the compositor
  • keep UI in the shell
  • keep app logic isolated

If those roles are mixed, crash recovery becomes fantasy.

This is also where the “document shell” idea can be misunderstood.

People hear “document” and assume:

  • reload fixes everything

That’s a browser habit.

An OS cannot rely on reload as a recovery strategy.

It has to be able to keep functioning under partial failure.

That’s the difference between:

  • a UI experiment

and:

  • a system

🔗The debate I actually want here

If you want to argue about WEFT at a serious level, argue about the contract.

In particular:

  • Is a custom compositor–shell protocol a long-term maintenance trap?
  • Does it duplicate too much of xdg-shell semantics?
  • Is the “shell manages window slots” model clean enough to justify a custom extension?
  • What is the smallest possible protocol surface that still keeps authority correct?

Those questions have real consequences.

Because this protocol is the seam where “document shell” meets “window system reality”.

If the seam is wrong, WEFT collapses into one of the two bad designs:

  • everything is in the shell process
  • or apps live inside the shell’s browsing context

Neither is acceptable if you care about isolation and long-uptime behavior.

🔗What comes next

Next post: WebAssembly.

Not as “the thing that makes JavaScript fast”, but as a runtime model.

Why an app with no ambient authority is a different kind of promise.

And why capability-based security changes what it means to install software.