This post is about experimenting with imitating and extending the window management concepts from the venerable Plan9, Rio. The backstory and motivation is simply that I’ve had the need for a smaller and more ‘hackable’ base than the feature-heavy Durden environment for a while, and this seemed like a very nice fit.
For the TL;DR – here’s a video of a version with some added visual flair, showing it in action:
From this, the prio project has been added to the Arcan family. I wanted this experiment to exhibit three key features, that I’ll cover in more detail:
In addition to these features, it would be a nice bonus if the code base was simple enough to use as a starting point for playing around with kiosk-/mobile-/tablet- level window management schemes.
The user-designated confined spaces is shown in the beginning of the video via the green region that appears after picking [menu->new], which allows the user to “draw” where he wants a new CLI-/terminal- group to spawn. By default, the clients are restricted to the dimensions of this space, and any client-initiated attempts to resize- or reposition- will be ignored or rejected.
This is not something that is perfect for all occasions, and is a “better” fit for situations where you have a virtual machine monitor, emulator, video playback, remote desktop session or a command-line shell.
The reason is simply that these types of applications have quite restrained window management integration requirements. This means that you can get away with forceful commands like “this will be your display dimensions and that’s final” or trust that a “hi, I’m new here and I’d like to be this big” request won’t become invalid while you are busy processing it. By contrast, a normal UI toolkit application may want to spawn sub-windows, popup windows, tooltips, dialogs and so on – often with relative positioning and sizing requirements.
The primary benefits with this model is that it:
- Drastically cuts down on resize events and resize requests.
- [more important] Provides a basis for compartmentalisation.
Resize events are among the more expensive things that can happen between a client and its display server: they both need to agree on a new acceptable size, new memory buffers needs to be allocated with possibly intermediate datastores. Then, the new buffers need to be populated with data, synchronized, composited and scanned out to the display(s). At the 4k/8k HDR formats we are rapidly reaching as the new normal, a single buffer may reach sizes of 265MB (FP16 format @8k), amounting to, at least, a gigabyte of buffer transfers before you’d actually see it on the screen.
You really do not want a resize negotiation to go wrong and have buffers be useless and discarded due to a mismatch between user expectations and client needs.
*very long explanation redacted*
Over the course of a laptop battery cycle, waste on this level matters. This is one of the reasons as to why ‘tricks of the trade’ like rate-limiting / coalescing resize requests on mouse-drag-resize event storms, auto-layouter heuristics for tiling window management and border-colour padding for high-latency clients — will become increasingly relevant. Another option for avoiding such tricks is to pick a window management scheme where the impact is lower.
The basic idea of Compartmentalisation (and compartmentation in the sense of fire safety) here is that you define the location and boundary (your compartment) for a set of clients. The desired effect is that you know what the client is and where it comes from. There should be no effective means for any of these clients to escape its designated compartment or misrepresent itself. To strengthen the idea further, you can also assign a compartment with additional visual identity markers, such as the color of window decorations. A nice example of this, can be found in Qubes-OS.
With proper compartmentalisation, a surreptitious client cannot simply ‘mimic’ the look of another window in order to trick you into giving away information. This becomes important when your threat model includes “information parasites”: where every window that you are looking at is also potentially staring back at some part of you, taking notes. The catch is that even if you know this or have probable cause to suspect it, you are somehow still forced to interact with the surreptitious client in order to access some vital service or data – simply saying “no!” is not an option (see also: How the Internet sees you from 27c3 – a lack of a signal, is also a signal).
The natural countermeasure to this is deception, which suffers from a number of complications and unpleasant failure modes. This is highly uncharted territory, but this feature provides a reasonable starting point for UI assisted compartmentalisation and deception profiles assigned per compartment, but that’s a different story for another time.
Hierarchical Connection Structure
Generally speaking, X and friends maintain soft hierarchies between different ‘windows’ – a popup window is a child of a parent window and so on – forming a tree-like structure (for technical details, see, for instance, XGetWindowAttribute(3) and XQueryTree(3)). The ‘soft’ part comes from the fact that these relations can be manipulated (reparented) by any client that act as a window manager. Such hierarchies are important for transformations, layouting, picking and similar operations.
A hierarchy that is not being tracked however, is the one behind the display server connections themselves – the answer to the question “who was responsible for this connection?”. A simple example is that you run a program like ‘xterm’. Inside xterm, you launch ‘xeyes’ or something equally important. From the perspective of the window management scheme, ‘xeyes’ is just a new connection among many, and the relationship to the parent xterm is lost (there are hacks to sort of retrieve this information, but not in a reliable way as it requires client cooperation).
In the model advocated here, things are a bit more nuanced: Some clients gets invited to the party, and some are even allowed to bring a friend along but if they start misbehaving, the entire party can be ejected at once without punishing the innocent.
In Plan9/Rio, when the command-line shell tries to run another application from within itself, the new application reuses (multiplexes) the drawing primitives and the setup that is already in place. While the arcan-shmif API still lacks a few features that would allow for this model to work in exactly this way, there is a certain middle ground that can be reached in the meanwhile. The following image is taken from the video around the ~40 mark:
Instead of multiplexing multiple clients on the same connection primitive, each confined space acts as its own logical group, mapping new clients within that group to individual tabs, coloured by their registered type. Tabs bound to the same window come from the same connection point. The way this works is as follows:
(feel free to skip, the description is rather long)
The connection model, by default, in Arcan is simple: Nothing is allowed to connect to the display server. No DISPLAY=:0, no XDG_RUNTIME_PATH, nothing. That makes the server part rather pointless, so how is a data provider hooked up?
First, you have the option to whitelist by adding things to a database, and a script that issues an explicit launch_target call which references a database entry, or by exposing a user-facing interface that eventually leads down the same path. The engine will, in turn, spawn a new process which inherits the primitives needed to setup a connection and synchronise data-transfers. When doing so, you also have the option of enabling additional capabilities, such as allowing the client to record screen contents, alter output display lookup-tables or even inject input events into the main event loop, even though such actions are not permitted by default.
Second, you have the option to use designated preset software which performs tasks that are typically prone to errors or compromising security. These are the so called frameservers – intended to improve code re-use of costly and complex features across sandbox domains. The decode frameserver takes care of collecting and managing media parsing, the encode frameserver records or streams video, etc. Strong assumptions are made as to their environment requirements, behaviours and volatility.
Lastly, you have explicit connection points. These are ‘consume-on-use’ connection primitives exposed in some build-time specific way (global shared namespace, domain socket in home directory and so on). The running scripts explicitly allocates and binds connection points on a per-connection basis (with the choice to re-open after an accepted connection) using a custom name. This allows us to:
- Rate-limit connections: external connections can be disabled at will, while still allowing trusted ones to go through.
- Compartmentalise trust and specialise user-interface behaviour based on connection primitives used.
- Redirect connections, you can tell a client that “in the case of an emergency (failed connection)” here is another connection point to use. This is partly how crash recovery in the display server is managed, but can also be used for much more interesting things.
When a new user-designated confined space is created, a connection point is randomly generated and forwarded (the ARCAN_CONNPATH environment variable) to the shell that will be bound to the space. The shell can then chose to forward these connection primitives to the clients it spawns and so on. A caveat is that the authentication against the connection point is currently- (and deliberately) very weak, which means that right now, if the connection points are enumerable through some side-channel (/dev/shm, /proc), a new connection could ‘jump compartments’ by switching connection points. Proper authentication is planned as part of the security- focused branch on the Arcan roadmap.
User-definable preset- roles
Time to add something of our own. An often praised ability of X is its modularity; how you can mix and match things to your heart’s content. The technical downside to this is that it adds quite a bit of complexity in pretty much every layer, with some intractable and serious performance and security tradeoffs.
Other systems have opted for a more rigid approach. Wayland, for instance, ties different surface types and interaction schemes together through the concept of a “sHell”. Roughly put, you write an XML specification of your new shell protocol that encapsulates the surface types you want, like “popup” or “statusbar” and explain how they are supposed to behave and what you should and should not do with it. Then, you run that spec through a generator, adjust your compositor to implement the server side of this spec, and then develop clients that each implement the client side of this new spec. There are a number of sharp edges to this approach that we’ll save for later, though it is an interesting model for comparison.
Arcan has a middle ground: each “segment” (container for buffers, event deliver, etc.) has a preset / locked-down type model (e.g. popup, titlebar, …) but delegates the decision as to how these are to be used, presented or rejected – to a user controlled set of scripts (‘Prio’, ‘Durden’, ‘something-you-wrote’) running inside a scripting environment. This is complemented by the notion of script-defined connection points, that were covered at the end of the previous section.
This approach still decouple presentation and logic from ‘the server’, while maintaining the ‘Window Manager’ flexibility from X, but without the cost and burden of exposing the raw and privileged capabilities of the server over the same protocol that normal clients are supposed to use.
A direct consequence from this design – is that you can quickly designate a connection point to fulfil some role tied to your window management scheme, and apply a different set of rules for drawing, input and so on, depending on the role and segment types. This can be achieved without modifying the clients, the underlying communication protocol or rebuilding/restarting the server.
At the end of the video, you can see how I first launch a video clip normally, and how it appears as a tab. Then, I specify a designated connection point, ‘background’ and relaunch the video clip. Now, its contents are being routed to the wallpaper rather than being treated as a new client.
This means that you can split things up like a single- connection point for a statusbar, launch-bar, HUD or similar desktop elements and enforce specific behaviours like a fixed screen position, filtered input and so on. You can even go to extremes like a connection point for something like a screen recorder that only gets access to non-sensitive screen contents and “lie” when you get an unexpected connection and redirect the output of something nastier.
As the purists will no doubt point out, these three key features do not really cover a big raison d’être for Rio itself – exposing the window management, buffer access and drawing control in the spirit of ‘everything is a file’ API, and through that feature, multiplex / share the UI connection. That is indeed correct, and part of the reason for why this is not supported right now, is that the previous post on ‘Chasing the dream of a terminal-free CLI’, and this one stand to merge paths in another post in the future, when the missing pieces of code have all been found.
As things stand, Prio is obviously not a ‘mature’ project and outside the odd feature now and then, I will not give it that much more attention, but rather merge some of it into the floating management mode in Durden. When the open source Raspberry Pi graphics drivers mature somewhat, or I get around to writing a software rendering backend to Arcan, I’ll likely return to Prio and make sure it is ‘desktop-complete’ and performant enough to be used efficiently on that class of devices.