Arcan as Operating System Design

Time to continue to explain what Arcan actually “is” on a higher level. Previous articles have invited the comparison to Xorg ( part1, part2 ). Another possibility would have been Plan9, but Xorg was also a better fit also for the next (and last) article in this series.

To start with a grand statement:

Arcan is a single-user, user-facing, networked overlay operating system.

With “single-user, user-facing” I mean that you are the core concern; it is about providing you with controls. There is no compromise made to “serve” a large number of concurrent users, to route and filter the most traffic, or to store and access data the fastest anywhere on earth.

With “overlay operating system” I mean that it is built from user-facing components. Arcan takes whatever you have access to and expands from there. It is not hinged on the life and death of neither the Linux kernel, the BSD ones or any other for that matter. Instead it is a vagabond that will move to whatever ecosystem you can develop and run programs on, even if that means being walled inside an app store somewhere.

As such it follows a most pervasive trend in hardware. That trend is treating the traditional OS kernel as a necessary evil to work around while building the “real” operating system elsewhere. For a more thorough perspective on that subject, refer to USENIX ATC’21: It’s time for Operating Systems to Rediscover Hardware (Video).

This is a trick from the old G👀gle book: they did it to GNU/Linux with Android and with ChromeOS. It is not as much the predecessor mantra of “embrace, extend and extinguish” as it is one of “living off the land” — understanding where the best fit is within the ecosystem at large.

From this description of what Arcan is — what is the purpose of that?

The purpose of Arcan is to give you autonomy over all the compute around you.

With “autonomy” I mean your ability to move, wipe, replace, relocate or otherwise alter the state that is created, mutated or otherwise modified on each of these computing devices.

The story behind this is not hard in retrospect; user-facing computers are omnipresent in modern life — they outnumber us. You have phones, watches, tablets, “IoT” devices, e-Readers, “Desktop” Workstations, Laptops, Gaming Consoles and various “smart”- fridges, cars and meters and so on. The reality is that if a computer can be fitted into something, be sure that one or several will be shoved into it, repeatedly.

The fundamentals on how these computers work differ very little; even “embedded” is often grossly overpowered for the task at hand. On the other hand, getting these things you supposedly own to collaborate or even simply share state you directly or indirectly create is often hard- or impossible- to achieve without parasitic intermediaries. The latest take on this subject on the time of writing this is parts of “cloud”; routing- and subjecting- things to someone else’s computing between source (producer) and sink (consumer).

Part of the reason for this is persistent and deliberate balkanisation combined with manipulative monetisation strategies that have been permitted to co-evolve and steer development for a very long time; advances in cryptography has cemented this.

An example: In the nineties it was absurd to think that the entire vertical (all the ‘layers’) from datastore all the way to display would have an unbroken chain of trust. The wettest of dreams that the Hollywood establishment had was that media playback was completely protected from the user tampering with- or even observing- the data stream until presented. That is now far from absurd; it is the assumed default reality and you are rarely allowed to set or change the keys.

The scary next evolution of this is making you into a sensor and it is sold through claims of stronger security features that are supposed to protect you from some aggressor. A convenient side effect is that it actually serves to safeguards the authenticity of the data collected from-, of- and about- you. As a simple indicator: when no authentication primitive (password etc.) is needed, the “ai in the cloud” model of you has locked things down to the point that you best behave if you want to keep accessing those delicious services your comfort depend on.

The overall high-level vision how this development can be countered on a design basis is covered by the 12 principles for a diverging desktop future on our sister blog — but the societal implementation is left as an exercise for the reader.

A short example of the general idea in play can be seen in this old clip:

This demonstrates live migration between different kinds of clients moving to arcan instances with different levels of native integration and no intermediaries. As a native display server on OpenBSD in the centre to a limited application on the laptop to the left (OSX) and to a native display server on the pinephone on the right.

The meat and remainder of this article will provide an overview of the following:

Building Blocks

The following diagram illustrates the different building blocks and how they fit together.

Illustration of Building Blocks and their Interactions

SHMIFShared Memory Interface

SHMIF is a privilege barrier between tasks built only using shared memory and synchronisation primitives (commonly semaphores) as necessary and sufficient components. This means that if an application has few other needs over what shmif provides, all native OS calls can be dropped after connection setup. That makes for a strong ‘least-privilege’ building block.

It fills the role of both asynchronous system calls and asynchronous inter-process communication (IPC), rolled into one interface. The main inspiration for this is the ‘cartridges‘ of yore — how entire computers were plugged in and removed at the user’s behest.

There is a lot of nuance to the layout and specifics of SHMIF which are currently out of scope, notes can be found in the Wiki. The main piece is a shared 128b event structure that is serialised over two fixed size ring buffers, one in-bound and one out-bound. The rest is contested metadata that is synchronously negotiated and transferred to current metadata that both sides maintain a copy of.

This is optionally extended with handle/resource-token blob references when conditions so permit (e.g. unix domain socket availability or the less terrible Win32 style DuplicateHandle calls), as that is useful for by-reference passing large buffers around which can be preferred for accelerated graphics among many other things.

The data model for the events passed around is static and lock stepped. This model is derived from the needs of a single user “desktop”. It has been verified against both X11, Android as well as special purpose “computer-in-computer” clients like whole-system emulation à QEmu, specialised hybrid input/out devices (e.g. streamdeck) and virtual- augmented- and mixed- reality (e.g. safespaces).

A summary of major bits of shmif:

  • Connection: named, inheritance, delegation, redirection and recovery.
  • Rendering: pixel buffers, formatted text, accelerated device buffers.
  • Audio: sources, sinks and synchronisation to/from video.
  • Input: digital, translated (keyboard), mice, touch / tablet, analog sensors, game / assistive, eye tracker, application announced custom labels.
  • Sensors: Positional, Locational and Analog.
  • Window management (relative position, presentation size, ordering, annotation).
  • Color profiles and Display controls (transfer functions, metadata).
  • Non-blocking State transfers (clipboard, drag and drop, universal open/save, fonts).
  • Blocking state controls (snapshot, restore, reset, suspend, resume).
  • Synchronisation (to display, to event handler, to custom clock, to fence, free-running).
  • Color scheme preferences.
  • Key/value config persistence and “per window” UUID.
  • Coarse grained timers.

Most of this is additive – there is a very small base and the rest are opt-in events to respond to or discard. All event routing goes through user-scriptable paths that are blocked by default, forcing explicit forwarding. A client does not get to know something unless the active set of scripts (typically your ‘window manager’) routes it.

Upon establishing a connection, requesting a new ‘window’ (pull) or receiving one initiated by the server side (push), the window is bound to an immutable type. This type hints at policy and content to influence window management, routing and to the engine scheduler.

A game has different scheduling demands from movie playback; an icon can attach to other UI components such as a tray or dock and so on. This solves for accessibility and similar tough edge cases, and translates to better network performance.

A12 – Network Protocol

SHMIF is locally optimal. Whenever the primitives needed for inter-process SHMIF cannot be fulfilled, there is A12. The obvious case is networked communication where there is no low latency shared memory, only comparably high-latency copy-in-copy-out transfers.

There are other less obvious cases, with the rule of thumb being that two different system components cannot synchronise over shared memory with predictable latency and bandwidth. For instance, ‘walled garden’ ecosystems tend to disallow most forms of interprocess communication, while still allowing networked communication to go through.

A12 has lossless translation to/from SHMIF, but comes with an additional set of constraints to consider and builds on the type model of SHMIF to influence buffer behaviour, congestion control and compression parameters.

The constraints placed on the design are many. A12 needs to usable for bootstrapping; operate in hostile environments; isolated/islanded networks and between machines with unreliable clocks; incompatible namespaces and possibly ephemeral-transitive trust. For these reasons, A12 deviates from the TLS model of cryptography. It relies on a static selection of asymmetric- and symmetric- primitives with pre-shared secret ‘Trust On First Use’ management rather than Certificate Authorities.


Around the same era that browsers started investing heavily into sandboxing (late 2000s, early 2010s) Arcan, then a closed source research project, also focused on ephemeral least privilege process separation of security and stability sensitive tasks. The processes that carry out such tasks are referred to as ‘frameservers’.

In principle ‘frameservers’ are simply normal applications using SHMIF with an added naming scheme to them and a chainloader (arcan_frameserver) that is responsible for sanitising and setting up respective execution environments.

In practice ‘frameservers’ have designated roles (archetypes). These control how the rest of the system delegates certain tasks, and gives predictable consequences to what happens if one would crash or be forcibly terminated. It is also used to put a stronger contract on accepted arguments and behavioural response to the various SHMIF defined events.

The main roles worth covering here are ‘encode‘, ‘decode‘ and to some extent, ‘net‘.

Decode samples some untrusted input source e.g. a video file, a camera device or a vector image description, and converts it into something that you can see and/or hear, tuned to some output display. This consolidates ‘parsing’ to single task processes across the entire system. These processes have discrete, synchronous stages where incrementally more privileges, e.g. allocating memory or accessing file storage, can be dropped. The security story section goes a bit deeper into the value of this.

Encode transforms something you can see and/or hear into some alternate representation intended for an untrusted output. Some examples of this are video recording, image-to-text (OCR) and similar forms of lossy irreversible transforms.

Net sets up transition and handover between shmif and a12. It also acts as networked service discovery of pre-established trust relationships (“which devices that I trust are available”, “have any new devices that I trust become available”) and as a name resource intermediary e.g. “give me a descriptor for an outbound connection to <name in namespace>”.

Splitting clients into this “ephemeral one-task” and regular ones lead to dedicated higher level APIs for traditionally complex tasks, as well as acting as delegates for other programs.

It is possible for another shmif client to say “allocate a new window, delegate this to a decode frameserver provided with this file and embed into my own at this anchor point” with very few lines of code. This lets the decode frameserver act as a parser/decode/dependency sponge. Clients can be made simpler and not invite more of the troubles of toolkits.


The ‘engine’ here is covered by the main arcan binary and invites parallels to app frameworks in the node.js/electron sense, as well as the ‘Zygote’ of Android.

It fills two roles — one is to act as the outer ‘display server’ which performs the last composition and binds with the host OS. The scripts that run within this role is roughly comparable to a ‘window manager’, but with much stronger controls as it acts as ‘firewall/filter/deep inspection’ for all your interactions and data.

The other role is marked as ‘lwa’ (lightweight arcan) in the diagram and is a separate build of the engine. This build act as a SHMIF client and is able to connect to another ‘lwa’ instance or the outermost instance running as the display server. This lets the same code and API act as both display server, ‘stream processor’ (see: AWK for Multimedia) and the ‘primitives’ half of a traditional UI toolkit.

Both of these roles are programmable with the API marked as ‘ALT’ in the diagram and will be revisited in the sections on ‘Programmable Interfaces’ and ‘Appl’.

The architecture and internal design of the engine itself is too niche to cover in sufficient detail. Instead we will merely lay out the main requirements that distinguish it against the many strong players in the core 2D- supportive 3D- game engine genre.

Capability – enough advanced graphics and animation support for writing applications and user interfaces on the visual and interactive span of something ranging from your run-of-the-mill web or mobile ‘app’ to the Sci-Fi movie ‘flair over function’ UIs. It should not rely on techniques that would exclude networked rendering or exclude devices which cannot provide hardware acceleration.

Abstraction – The programmable APIs should be focused on primitives (geometry, texturing, filtering), not aggregates/patterns (look and feel). Transforms and animations should be declarative (“I want this to move to here over 20 units of time”), (“I want position of a to be relative to position of b”) and let the engine solve for scheduling, interpolation and other quality of experience parameters.

Robust – The engine should be able to operate reliably in a constrained environment with little to no other support (service managers, display/device managers). It should avoid external dependencies. It should be able to run extended periods of time — months, not hours or days.

Resilient – The engine should be able to recover from reasonable amounts of failures in its own processing, and that of volatile hardware components (primarily GPU). It should be able to hand over/migrate clients to other instances of itself.

Recursive – The engine should be able to treat additional instances of itself as it would any other node in the scene graph, either as an external source node or a subgraph output sink node.

Programmable Interfaces

SHMIF has been covered already from the perspective of its use as an IPC system. As an effect of this, it is also a programmable low-level interface. A thorough article on using it can be found in (writing a low level arcan client), and a more complex/nuanced follow up in (writing a tray icon handler). The QEmu UI driver, the arcan-wayland bridge and Xarcan are also interesting points of reference to hack on.

TUI is an abstraction built on top of SHMIF. It masks out some features and provides default handlers for certain events — as well as translating to/from a grid of cells- or row-list- of formatted text. It comes with the very basics of widgets (readline, listview, bufferview). Its main role in the stack is to replace (n)curses style libraries and improve on text dominant tools as a migration strategy for finally leaving terminal emulation to rest.

ALT is the high level API (and protocol*) for controlling the Engine. The primary way of using it is as Lua scripts, but the intention is a bit more subtle than that. For half of this story see ‘Appl’ below. Lua was chosen for the engine script interface in part for its small size (with that low memory overhead and short start times), the easy binding API and minimal set of included functions. It is treated and thought of as a “safe” function decoration for C more than it is treated as a normal language.

The *protocol part is that the documentation for the API double as a high level Interface Description Language to generate bindings that would use the API out of process — allowing both Lua “monkey patching” by the user and process separation with an intermediate protocol. This makes the render process and ALT into a dynamic sort of postscript for applications with animations and composition effects rather than static printer-friendly pages.


This is not a discrete component, but rather a set of restrictions and naming conventions added on top of the core engine. To understand this, a rough comparison to Android is again in order.

The Android App is, grossly simplified, a Zip archive with some hacks, a manifest XML file, some Java VM byte code, optional resources and optional native code. The byte code traditionally came from compiling Java code, but several languages can compile to it. The manifest covers some metadata, importantly which system resources that the app should have access to.

The Arcan Appl (the ‘l’ is pronounced with a sigh or ‘blowing raspberries’) has a folder structure:

  • A subdirectory to some appl root store with a unique name
  • A .lua file in that directory with the same name.
  • A function with the same name as the directory and the .lua file.

Resources the appl can access and data stores it can create and delete files within are broken down into several namespaces. The main such namespaces are roughly: application-local-dynamic, application-local-static, fonts, library code and shared.

Similarly to how the Android app can load native code via JNI, the Arcan appl can dynamically load shared libraries. In contrast to Android where native code is pulled in for supporting the high level Java/Kotlin/…, the high level scripting in Arcan is to support the native code so that the tedious and error prone tasks are written in a memory safe, user hackable/patchable code by default.

The mapping of the namespaces themselves, restrictions or additional permissions, configuration database and even different sets of frameservers are all controlled by the arguments that was used to start each engine process.

The database act as a key / value store for the appl itself, but also as a policy model for which other shmif capable clients should be allowed to launch (* enforcement for native code is subject to controls provided by the host OS), as well as a key / value store for tracking information for each client launched in such a way.

Other resources permissions are not directly requested or statically defined by the appl itself, it the window manager that ultimately maps and routes such things.

User Interfaces

There are a number of reference appls that have been written and presented on this page throughout the years. These are mainly focused on filling the ‘window manager’ role, but can indeed also be used as building blocks for other applications.

These have been used to drive the design of ALT API itself; to demonstrate the rough scope of what can be accomplished, and they are usable tools in their own right.

The ones suitable as building blocks for custom setups or ‘daily drivers’ are as follows:

Console — (writing a console replacement using Arcan) which act as fullscreen workspaces dedicated to individual clients, with a terminal spawned by default in an empty one. This is analogous to the console in Linux/BSD setups and comes bundled with the default build (but with unicode fonts, OSD keyboard, touchscreen support, …).

Durden — Implements the feature set of traditional desktops and established window management schemes (tiling, stacking/floating and several hybrids). It is arranged as a virtual filesystem tree that UI elements and device inputs reference.

Safespaces — Structurally similar to Durden as far as a ‘virtual filesystem’ goes, but intended for Augmented-, Mixed- and Virtual Reality.

Pipeworld — Covers ‘Dataflow’; a hybrid between a programming environment, a spreadsheet and a desktop.

There are other shorter ones that are not kept up to date but rather written to demonstrate something. A notable such example is the Plan9-like Prio (One night in Rio – Vacation Photos from Plan9).


There are several intimidating uphill battles with established conventions and the network/lock-in effect of large platforms — no interesting applications equals no invested users; no developers equals no interesting applications.

The most problematic part for compatibility comes with ‘toolkit’ (e.q. Qt and GTK) built ones. Although often touted as ‘portable’, what has happened time and again is a convergence to some crude and uninteresting capability set tied to whatever platform abstraction can be found deep in the toolkit code base — it is never pretty.

There is fair reason as for why many impactful projects went with ‘the browser as the toolkit’ (i.e. Electron). The portability aspects for the big toolkits will keep on loosing relevance; the long term survival rate for well-integrated ‘native’ feel portable software looks slim to none. The end-game for these rather looks like banking on one fixed idea/style or niche.

The compatibility strategy for Arcan is “emphasis at the extremes” — first to focus on the extreme that is treating other applications as opaque virtual machines (which include Browsers). Virtualization for compatibility is the strongest tactic we have for taking legacy with us. This calls for multiple tactics to handle integration edge cases and incrementally breaking the opaqueness – such as forced decomposition through side-band communicated annotations, “guest-additions” and virtual device drivers.

The second part to the strategy is to focus on the other extreme that is the ‘text dominant’ applications, hence all the work done on the TUI API. As mentioned before, it is needed as a way out of getting ‘Terminals’ with command lines that are not stuck with hopelessly dated and broken assumptions on anything computing. Terminal emulators will be necessary for a long time, and Arcan comes with one by default — but as a normal TUI client.

TUI is also used as a way of building front-ends to notoriously problematic system controls such as WiFi authentication and dynamic storage management. It is also useful for ‘wrapping’ data around interactive transfer controls; leave the UI wrapping and composition up to the appl stage.

The distant third part of the compatibility strategy are protocol bridges — the main one currently being ‘arcan-wayland’. For a while, this was the intended first strategy, but after so many years of the spec being woefully incomplete, then seriously confused, it is now completely deranged and ready for the asylum. That might sound grim, yet this is nothing compared to the ‘quality’ of the implementations out there.

Security Story

One area that warrants special treatment is security (and the overlap with some of privacy). This is an area where Arcan is especially opinionated. A much longer treatment of the topic is needed, and an article for that is in the queue.

The much condensed overarching problem of major platforms are that they keep piling on ‘security features’ (for your own good they say) and (often pointless) restrictions or interruptions that are incrementally and silently added through updates — with you in the dark of what they are actually supposed to protect you from, and the caveats with that.

The following two screenshots illustrate the entry level of this problem:

sudo-sickness: “bash wants access to control Finder”
“something” wants to “use your microphone”

Note: The very idea that the second one even became a dialog is surprising. Most UIs that predates this idiocy had trained users to route data using their own initiative and interaction alone through “drag and drop”, “copy and paste” and so on. It is a dangerous pattern in its own right, and a mildly competent threat actor knows how to leverage this side channel.

There is a lot to unpack with these two alone, but that is for another time.

The core matter is that these fit in some threat model, but unlikely to be part of your threat model. The tools to actually let you express what your threat model currently is, and tools to select mitigations to fit your circumstances — are both practically nonexistent.

Compare to accessibility: supporting vision impairment, or blindness, has vastly different needs from deafness which has different needs from immobility. Running a screen reader will provide little comfort to someone that is hard of hearing and so on. Turning such features on without the user being informed; or even rudely interrupting by repeatedly asking at every possible junction, is rightly to be met with some contention.

In the other end, someone working on malware analysis has different needs from someone approving financial transactions for a company has different needs from someone forced to use a tracking app from an abusive partner or employer. Yet here protections with different edge cases and failure modes are silently enabled without considering the user.

The security story is dynamic and context dependent by its very nature. A single person could be switching between having any of the needs expressed above at different times over the course of a single day. More technically, it might be fine for your laptop to “on-lid-open: automatically enable baseband radio, scan for known access points, connect to them, request/apply DHCP configuration” coupled with some other service waking up to “on network: request and apply pending updates” and so on from the comforts of your home. It might also get you royally owned while at an airport in seemingly infinitely many ways.

To tie things back to the Arcan design. The larger story comes from the 12 principles linked earlier, and a few of those are further expanded into the following maxims:

The Window Manager defines the set of mitigations for your current threat model.

This is hopefully the least complicated one to understand. To break it down further:

  1. The window manager is first in line to operationalise your intents and actions.
  2. The window manager reflects your preferences as to how your computer should behave.
  3. You, or someone acting on your behalf, should always have the agency to tune or work around undesired behaviours or appearances.
  4. Any interaction should be transformable into automation through your initiative and bindable to triggers that you pick.
  5. Automation and convenience should be easy to define and discover, but not a default.

The point is to make the set of scripts that is the Appl controlling the outermost instance of Arcan (as a display server) the primary control plane for your interests. If you think of each client/application as possibly sandboxed-local or sandboxed-networked, the scripts define the routing/filtering/translation rules between any source to any sink — a firewall for your senses.

There is no IPC but SHMIF.

Memory safety vulnerabilities (typically data/protocol parsers) were for a long time a cheap and easy way to gain access into a system (RCE – Remote Code Execution).

The cost and difficulty increased drastically with certain mitigations, e.g. ASLR, NX, W^X and stack canaries – but also through least-privilege separation of sensitive tasks (sandboxing). Neither of these are panaceas but they have raised the price and effort so substantially that there is serious economy and engineering effort behind just remote code execution alone (public examples) — which is far from what goes into a full implant.

Bad programming patterns break mitigations. If you don’t design the entire solution around least-privilege, very little can safely be sandboxed. In UNIX, everything is a file-descriptor. Subsequently, blocking write() to file-descriptors breaks everything.

What happens when trying to sandbox around non least-privilege friendly components is that you get IPC systems. Without a systemic perspective you end up with a lot of them, and they are really hard to get right. Android developers had serious rigour and a lot of effort in Binder (their primary IPC system) yet it was both directly and indirectly used to break phones for many years — and probably still is.

Few IPC systems actually gets tested or treated as security boundaries, and eventually you get what in offensive security is called ‘lateral movement’.

This is the story of (*deep breath*) how the sandboxed but vulnerable GTK based image parser triggered by a D-Bus (IPC) activated indexer service on your mail attachment exposed RCE via a heap corruption exploited with a payload that proceeded to leverage one of the many Use-after-Frees in the compositor Wayland (IPC) implementation, where it gained persistence by dropping a second stage loader into dconf (IPC) that used PulseAudio (IPC) over a PipeWire (IPC) initiated SIP session to exfiltrate stolen data and receive further commands without your expensive NBAD/IDS or lazy blue team noticing — probably just another voice call.

In reality the scenario above will just be used to win some exotic CTF challenge or impress women. What will actually happen is that some uninteresting pip-installed python script dependency (or glorious crash collection script) just netcats $HOME/.ssh/id_rsa (that you just used everywhere didn’t you?) to a pastebin like service– but that’ll get fixed when everything is rewritten in Rust so stay calm and continue doing nothing.

The point of SHMIF is to have that one IPC system (omitting link to the tired xkcd strip, you know the one); not to end them all, or be gloriously flexible with versioned code-gen ready for another edition of ‘programming pearls’ — but to solve for the data transport, hardening, monitoring and sandboxing for only the flows necessary and sufficient for the desktop.

Least privilege parsing and scrubbing

Far from all memory safety vulnerabilities are created equal. The interesting subset is quite small, and somehow need to be reachable by aggressor controlled data. That tend to be ‘parsers’ for various protocols and document/image formats. If you don’t believe me, believe Sergey and Meredith (Science of Insecurity).

This can (and should be) leveraged. Even parsing the most insane of data formats (PDF) have fairly predictable system demands. With a little care they can do without any system calls at all after they have been buffered, and it is really hard to do anything from that point even with robust RCE.

This is where we return to the ‘decode’ frameserver — a known binary that any application can, and should, delegate parsing of non-native formats to. One which aggressively sandboxes the worst of offenders. With support from the IPC system that tunes the parsing and consumes the results, it also becomes analysis, collection and fuzzing harness in one. Leveraging the display server to improve debugging.

Someone slightly more mischievous can then, run these delegates on single-purpose devices that network boot and reset to a steady state on power-cycle. Let them consume the hand-crafted and targeted phishing expedition, remote-forward a debugger as a crash collector to a team that extract and reverse exploit chain and payload, replicate-inject into a few honey pots with some prices for the threat actor to enjoy. This clip from pipeworld looks surprisingly much like part of that scenario, no?

Really quick ‘gdb attach -p’

Many data formats are becoming apt at embedded compromising metadata. Most know about EXIF today, fewer are aware just how much can be shoved into XMP – where you can find such delicacies as metadata on your motion (gait), or tracking images hidden inside the image as a base64 encoded jpeg inside the xmp block of a jpeg. A good rule of thumb is to never let anything touched by Adobe near your loved ones. If you would manage to systematically strip the one, something new is bound to pop up elsewhere.

By splitting importing (afsrv_decode) and (afsrv_encode) to very distinct tasks with a human observable representation and scriptable intermediary model the transition from the one to the other — you also naturally get designs that lets you define other metadata to encode. If that then is what gets forwarded and uploaded to whatever “information parasite” (social media as some tend to call it) that pretends to strip it away, but really collects it for future modelling, starts to trust it, well shucks, that degrades the value of the signal/selector. The point is not to “win”, the point is to make this kind of creepiness cost more than what you are worth so some are incentivised to try and make a more honest living.

Compartment per device.

With A12 comes the capability to transition back and forth between device-bound desktop computing and several networked forms. This opens up for a mentality of ‘one task well’ per discrete (‘galvanically isolated’) device, practically the strongest form of compartmenting risk that we can reasonably afford.

The best client to compartment on ‘throwaway devices’ is the web browser. The browser has dug its claws so deep into the flesh of the OS and hardware itself, and expose so much of it to web applications that the distance between that and downloading and running a standalone binary is tiny and getting shorter by the day — we just collectively managed to design a binary format that is somehow worse than ELF, a true feat if there ever was one.

The browser offer ample opportunity for persistence and lateral movement, yet itself aggregates so much sensitive and useful information that you rarely need to seek elsewhere on the system.

In these cases lateral movement as covered before is less interesting. Enough ‘gold’ exists within the browser processes that it is a comfortable target for your disk-less ephemeral process parasite to sit and scrape credentials and proxy through; ‘smash and grab’ as the kids say.

There is generally an overemphasis on memory safety to the point that it becomes the proverbial ‘finger pointing towards the moon’ and you miss out on all the other heavenly glory. There are enough great and fun vulnerabilities that require little of the contortionist practices of exploiting memory corruptions, and a few have been referenced already.

A category that has not been mentioned yet is micro-architectural attacks — one reason why the same piece of hardware is getting incrementally slower these days. You might have heard about these attacks through names indicative of movie villains and vaguely sexual sounding positions, e.g. SPECTRE and ROWHAMMER. Judging by various errata sections between CPU microcode revisions alone, there is a lingering scent in the air that we are far away from the end of this interesting journey.

Instead of handicapping ourselves further, assume that ‘process separation’ and similar forms of whole-system virtualization is useful for resiliency and for compatibility still, but not a fair security mechanism; sorry docker. Instead, split up the worst offenders over multiple devices that, again, are wiped and replaced on a regular basis. You now cost enough to exploit that a thug with a wrench is the cheaper solution.

At this juncture, we might as well also make it easier to extract state (snapshot/serialize) and re-inject into another instance of the same software on another device (restore/deserialize). In the end, it is a prerequisite for making the workflow transparent and quick enough that spinning up a new ephemeral browser tab should be near instant.

Another gain is that you reduce the amount of accidental state that accumulates, and you get the ability to inspect what that state entails. This is the story about how your filthy habits built up inside and between the tabs that you, for some reason, refuse to close. Think of the poor forensics examiner that has to sift through it — toss more of your data.

Anything that provides data, should be able to transition to producing noise.

Consider the microphone screenshot from earlier, or ‘screen’ sharing for that matter. What value does it have to your actions over abstracting away from the device?. Having a design language that is ‘provide what you want to share’ might look similar enough to a browser asking for permission to use your microphone or record your desktop — but there is quite some benefit to basic the user interaction to work at this other level of abstraction.

Some are strictly practically, like getting the user to think of ‘what’ to ‘present’ rather than to try and clean the desktop from any accidentally compromising material. Being explicit about source makes it much less likely that the iMessage notification popups from your loved one showing something embarrassing will appear in the middle of a zoom call with upper management.

By decoupling the media stream from the source (again, fsrv_decode and fsrv_encode), there is little stopping you from switching out or modifying the stream as it progress. While it can be used for cutesy effects such as adding googly eyes to everything without the application being directly the wiser — it also permits a semantically valid stream to slowly be morphed to and from a synthetic one.

With style transferring GANs improving, as well as adversarial machine learning for that matter, AI will no doubt push beyond that creepy point where AI synthetic versions of you will plausibly pass the Turing test to any colleagues and loved ones. This also implies that your cyber relations or verbal faux pas will be more plausibly deniable. You can end a conversation, and let the AI keep it going for a while. Let a few months pass and not even you will remember what you actually said. Taint the message history by inserting GPT3 stories in between. Ideally the cost for separating truth from nature’s lies will cost as much as dear old Science.

This is a building block for ‘Offensive Privacy’ — Just stay away from VR.

Posted in Uncategorized | Leave a comment

Introducing Pipeworld: Spreadsheet Dataflow Computing

Now for something completely different. In the spiritual vein of One Night in Rio: Vacation photos from Plan9 and AWK for multimedia, here is a tool that is the link that ties almost all the projects within the Arcan umbrella together into one – and one we have been building towards for a depressing number of years and tens of thousands of hours.

Pipeworld (github link) combines ‘dataflow‘ programming (like Excel or Userland) with a Zooming-Tiling User Interface (ZUI). It builds dynamic pipelines similarly to (Ultimate Plumber), and leverages most of the gamut of Arcan features — from terminal emulator liberated CLIs and TUIs to dynamic network transparency. It follows many of the principles for a Diverging Desktop Future, particularly towards the idea of making clients simpler and more composable by focusing on interactive “breadboarding” data exchange.

There is a whole lot of ground to cover, so let’s get started.

The following is a video (youtube) that joins all the clips in this article together, but you will likely understand more of what is going on by reading through the sections.

Shortlinks to the individual sections are as follows:

Zoomable (Tiling) UI

The core is based around ‘cells’ of various types. These naturally tile in two dimensions as rows and columns. The first cell on each row determines the default behaviour of the row, and moving selection around will ‘pan’ the view to fit the current cell.

Each row has a scale factor, and the same goes for the cell. This means that different sets of cells can have different sizes (‘heterogenous zooming’) with different post-processing based on magnification and so on. Some cells can switch contents based on magnification, while others stay blissfully ignorant.

In the following short clip you see some of this in play, using a combination of keybindings as well as mouse gestures. This might make some feel a bit nauseous as this is not something that generalises well to groups of observers (we have solutions for that too) — but it is a different effect when it is your interaction that initiates and control the “zoom”.

Tiling-Zoomable Window Management

The heterogeneous zooming allows for a large number of cells to be visible at the same time. Even when scaled down, client state such as notifications and alerts can still be communicated through decoration colours and animations.

Befitting of tiling workflows, everything can be keyboard controlled. Just like in Durden, mouse gestures, popup menus, keyboards and exotic input devices simply map to writes into file-system like paths.

When- or why- would tens to hundreds of simultaneous windows be useful? Some examples from my day to day would be monitoring, controlling and automating virtual machines, botnets remote shell connections, video surveillance, ticketing systems and so on.

Dataflow Computing and the Expression Cell

Each cell has a type – with the default one being ‘expression’, where you type in expressions in a minimalistic programming language. The result of that expression mutates the type- or the contents- of the cell.

Expressions are processed differently based on which scope they are operating in – with the current scopes being:

  • Expression – Arithmetic, string processing and other ‘basic types’ operations.
  • Cell – Used for event handlers, changing, annotating or otherwise modifying visual or window management behaviour of the current cell.
  • System – Global configuration, shutdown, idle timers and so on.
  • Factory – Producing new cells or mutating existing ones.

In the following clip you can see some basic arithmetic; changing numeric representation; string processing functions and the ‘cell’ scope as a popup to tag a cell with a reference identifier which is then used in another expression.

Expression cells used for basic arithmetic, number and text processing

In a sense this is 3 different kinds of command-line interfaces wrapped into one. These can modify, import from- and export to- cells of other types. This is where things gets more interesting and powerful.

The following video is an example where I first copy the contents of a file to the clipboard of a terminal (the somewhat unknown OSC;52 sequence). I then create an expression cell where I set its contents to a text message. I then start vim in the first terminal and then run the cell-scope expression:

type_keyboard(concat(a1.clipboard, b1), “fast”)

This reads as ‘send the clipboard content of the A1 cell and the content A2 as simulated keypresses into the currently selected cell using a moderately fast typing speed model. From the perspective of vim itself, this looks just like someone typing on a keyboard.

Simulated keypresses using a function composing the clipboard of a cell and the contents of another

There are many more cell types than just terminals, command lines and expressions. Video playback; capture devices; images; native Arcan clients, such as our Qemu backend or Xorg fork and many more. In this clip you can see some of those – including Pipeworld running pipeworld. You can also see the current state of autocompletion and expression helper.

In this video I first open a capture device (web camera). I then spawn a terminal and copy the contents of a shader (GPU processing program) to its clipboard. This gets compiled and assigned the name ‘red‘. Lastly I sample a frame from the capture device using this shader.

Sample and filter a frame from a capture device using a freshly compiled, all organic, shader.

What all of this means practically is that we can gather measurements, trigger inputs and stitch audio, video and generic data streams from clients together in a fashion that more resembles breadboards as used in electronics, and sequencers as used in sound production, than traditional desktop or terminal work.

Terminal, CLI and Pipelines

Lets go back to the command-line for a bit, as we have yet to poke at pipelines. In this clip I run the system scope function: pipe(find /usr”, “grep –line-buffered share”)

Building a pipeline of shell commands

A lot of magic is happening inside of this one. Each client is spawned separately and suspended. Then their runtime stdin/stdout pipes gets remapped based on the current pipeline before the chain is activated. Note that when I reset the cell representing the “find /usr” cell, the grep one remains intact and unaware that find was actually killed off and re-executed.

The API is lacking somewhat still, but technically there is nothing blocking the ‘pipes’ from being a mix between terminal emulators (isatty() == true), stdin/stdout redirected normal processes (isatty() == false) and fully graphical ones.

Typing # into an expression cell gives us a dusty old terminal emulator. We can also add some command to run afterwards, like #find /tmp. ‘Resetting’ a cell means re-evaluating the expression or command it represents.

In the following clip I first list the contents of /tmp and then setup a timer that will reset the cell ~every second. This is indicated by the coloured flash. Then I spawn a normal terminal and create a file in tmp, and you can see it appearing. To show that the ‘terminal’ behind the first command is still alive, I also swap out its colour scheme and have it resize to fit its contents.

Per-cell reset timer repeating a single terminal command

Typing ! into an expression cell switches it into a CLI one. It uses a special mode in afsrv_terminal (the terminal that comes with Arcan) where no terminal protocols are emulated, no ‘pty devices’ are needed, and the CLI is native. As such we are no longer restricted to running a shell like bash that hides (“premature composition”) the identity, state and inputs/outputs of its children and commands.

Non-vt100 command-line with cell controlling new cell creation

Note that each discrete command becomes its own window, and the cell itself dictates the layout and scale of new and old command outputs.

Individual commands can be re-executed, clients run simultaneously and states such as clipboards are kept separate. Clients can switch to being graphical or embed media elements; they can announce their default keybindings; handle dynamic open and save from cut/paste and drag/drop; runtime redirect stdio; they can spawn sub-windows that attach on the same logical row and much more.

Advanced Networking, Sharing and Debugging

In the following clip you can see how quickly I go from a cell with external content and a debugger attached to the process associated with it, with lots of other options for process manipulation and inspection. Protections like YAMA are kept enabled, yet there is no sudo gdb -p mess going on. To learn more about this, see the article on Leveraging the Display Server to improve Debugging.

Lets go deeper. In the following clip I have set ‘arcan-dbgcapture’ as the kernel ‘core_pattern’ handler. Whenever a process crashes, the collector grabs crash information and the underlying executable. These gets wrapped around a Textual-UI with some options of what to do with the information. I then add a ‘listen’ cell that exposes a connection point for this TUI (‘crash’ in the video). Anything that connects to this point gets added to the row that this cell controls. To show it off, I run a simple binary that just crashes outright a few times, and you can see how the collections appear.

Since it is built using arcan-shmif, these connection points can be routed over a network – so swapping that ‘crash’ point out with something like a12://my.machine, embed into your base VM image, distribute to your fuzzing cluster or CI and enjoy network transparent debugging.

Moving on to the networking path and streaming/sharing. The following clip shows me migrating a terminal and qemu cell from a workstation to my laptop via the A12 network protocol. Note that the cell colours and font size change automatically as it is the target a client is presenting on that controls the current look and feel.

(The slight delay for the qemu window is a bug in the qemu Arcan display backend that does not properly re-submit a frame on migration, so it does not appear until the next time the guest updated its output).

Almost all cells can have an ‘encoder’ attached to them. This is simply a one-way composition transform that will convert the contents to some other format. A very practical example is recording to a video file or RTMP host, or something more obscure like OCR for pixels to text conversion.

In the following clip I set a cell encoder that act as a VNC server and then connect to that using the built-in VNC client that is part of Mac OS X. I then destroy the encoder and assigns a new one to a terminal. The OS X VNC viewer reconnects automatically, and you can see that input also works over the return path.

A VNC cell encoder exposing the cell contents as a VNC server

A cell type that blends well with this is ‘composition’ which puts the contents of multiple cells together using some layouting algorithm. The following clip shows that being used.

Composition cell used to stitch together multiple different sources

Attach an encoder to the composition cell and you have compartmented partial desktop sharing, another potent building block for interesting collaboration tools.

Trajectory and Future

Pipeworld will join Safespaces in acting as the main requirement ‘driver’ in improving Arcan and evolving its set of features, while Durden takes the backseat and moves more towards stabilisation.

These projects are not entirely disjunct. Pipewold has been written in such a way that the dataflow and window management can be integrated as tools in these two other environments so that you can mix and match – have Pipeworld be a pulldown HUD in Durden or 360 degrees programmable layers in Safespaces with 3D data actually looking that way.

The analysis and statistics tools that are part of Senseye will join in here, along with other security/reverse engineering projects I have around here.

Accessibility will be one major target for this project. The zoomable nature helps a bit, but much more interesting is the data-oriented workflow; with it comes the ability to logically address / route and treat clients as multi-representation interactive ‘data sources’ with typed inputs and outputs rather than mere opaque box-trees with prematurely composed (mixed contents) pixels and rigid ‘drag and drop’ as main data exchange.

Another major target is collaboration. Since we can dynamically redirect, splice, compose and transform clients in a network friendly way, new collaboration tools emerge organically from the pieces that are already present.

Where we need much more work is at the edges of client and device compatibility, i.e. modify the bridge tools to provide translations to non-native clients. A direct and simple example is taking our Xorg fork, Xarcan, and intercept ‘screen reading’ requests and substitute for whatever we route to it at the moment – as well as exposing composed cell output as capture devices over v4l2-loopback and so on.

I can editorialise about this for hours, and although the clips here show some of what is already in here, there is so much more already existing and — much more to be done.

Posted in Uncategorized | 2 Comments

Durden 0.6 Released

Hot on the heels of the recent Arcan release, it is also time for a release to our reference desktop environment ‘Durden‘.

To refresh memory somewhat, the closest valid comparison is probably the venerable AwesomeWM – but there is quite a lot more to it in Durden. It has been my daily driver for around 5 years now and implements all popular window management styles, and some unique ones.

During this time, it has been used to drive development of Arcan itself — but that will start to wind down now as there is little need for more major features. Instead updates will mainly be improvements to the existing ones before we can safely go 1.0. The two other projects, Safespaces and [Undisclosed], will take its place in helping the engine reach new heights.

Durden has an unusual way of putting a desktop together where everything is structured as a file-system and reconfigurable at run-time with results immediately visible:


This filesystem can be accessed through build-in HUD and popup menus, as well as externally controlled through a unix socket. Through the arcan-cfgfs tool it can even be mounted as a FUSE-file system.

All higher level UI primitives — no matter if it is decoration buttons, keybindings, statusbar and so on are simply references paths in this file system. There is over 600 such paths at the moment.

You can easily extend or ‘slim it down’ by adding or removing ‘tools’ scripts and ‘widgets’ scripts (for the HUD).

We are at the point where many traditional X window managers styles can be transferred through scripts that translate your old dot files into paths that can be run as schemes (atomic set of paths).

If you are interested in helping out with developing such translation scripts, get in contact.

Here is a link to the full Changelog. The rest of the post will describe some of the major changes but since most of them are not particularly visual, videos will be used more sparingly this time around.

Core / Input

Shutdown has received a new important option, ‘silent’. This will still cause a shutdown of Durden but the clients will treat it as if the connection has crashed and enter a sleep-reconnect cycle. Whenever you start Durden again, the clients should come back as if nothing happened.

Touch – This layer has been refactored to make it much easier to add or modify classifiers. New classifiers have been added for (relative mouse, touch-fit and touch-scaled).

More controls have been added to the touchscreen and trackpad input handler through /global/input/touch for runtime tuning, but many more controls still exist as part of the device profiles (see devmaps/touch/…). Touch device profiles have received more options and bindable gestures, as well as different handling for ‘enter n-finger drag’ versus ‘step n-finger drag’.

Rotary – Through /global/input/rotary and devmaps/rotary/… you can now control binding and mapping for devices like the Surface Dial and GriffinPower Mate. These are great alternatives to the mouse wheel for actions like scrolling. This tool also has basic experimental support for 3D mice, though those are much more complex.

New Tools

As always, we have some new tools:

streamdeck – This is a complex one. There is a separate article on the inner workings of it (Interfacing with a ‘Stream Deck’ Device). This video demos it:

showing off a stream deck working with Durden

This is not strictly limited to this narrow class of devices. The tool can be used to provide any display+input device pair to run as a ‘minimap’ screen, or integrate with wm- defined properties like titlebar buttons (in the video), client announced custom keys, workspace and window miniatures, custom bindings and so on.

Popup – This tool allows you to spawn either a custom menu (defined in the devmaps/menus folder) or any subtree of the normal file-system as a popup. Either tied to another UI component or relative to the current known cursor position. The video below shows how they are mapped to parts of the statusbar and titlebar.

Todo – This is a simple task tracker / helper that can be found under /global/tools/todo. You set a task group, add some tasks with a shortname and description and activate. A task is picked at random and appears as a statusbar button. Click on it to postpone or mark for completion and you get a new one, and so on. The best use of this tool is integration with your other ticketing systems if you, like me, work on many projects and sometimes have a tough time choosing what to focus on.

Tracing – This is simply a debug assistant that will merge the engine tracing facility in Arcan with all the logging going on in Durden into one coherent timeline. Saves to the about:// trace format in Chrome or used through a converter with the much better Tracy.

Extbtn – The ‘external button’ tool was also covered in part by another article (Another low-level Arcan client: A tray icon handler). It allows you to map external clients as icon providers into the statusbar tray. This is also a way to mix the status bar contents with multiple external ‘bar-tools’. The tool inside the Arcan source repository has some helper scripts for doing so using the lemon-bar protocol.

This video is from that article, showing off attaching a terminal emulator as well as Xorg with wmaker as a tray icon popup.

Autostart –This tool allows you to define, view and modify paths that will run on startup or reset.

UI Components

HUD – The built-in HUD file browser now triggers on clients (on user input) requests universal save/load, and sorting / searching received a ‘fuzzy matching’ mode (thanks to Cipharus). By typing ‘%’ into the HUD command line you can switch sorting mode, where fuzzy_relevance is a new entry.

Colour-Picker Widget – HUD settings that request an input colour, such as setting the background to a fix colour, will now popup several different palettes or reference image to pick from. In the video below you can see it being used to set a single colour wallpaper.

Statusbar – The popup tool now gets mapped to custom menus for workspace- switching buttons when right-clicking. This also applies to window titlebar. Display buttons are now dynamically added on display hotplug for quicker switching or migrating windows by drag-and-drop.

If window titlebars are marked as hidden due to client-side decorations, or default- setting them to off, they can now be set to ‘merge’ into the center-fill area of the statusbar instead – saving precious vertical space.

Window Management

Mouse controls have been added to the tiling window management modes. This means that you can re-organise / swap / migrate and so on by drag-and-dropping, with a live preview as to where in the hierarchy they will attach. The video below shows that in action:

A ‘column-‘ tab mode has been added that dedicates a side column for each ‘window as tab’.

The previous ‘CLI group’ mode of handling new connections have been re-written in favour of a new swallowing mode (where a window can share the hierarchy spot of another, swapable). It is more robust than the previous tactic, but currently more limited when dealing with multiple clients.

The option for controlling which workspace new clients spawn on has also been added, along with the (tiling) option of having subwindows spawn as children to their parent window.

Visual / Accessibility

There is now a shared core for caching, loading and generating non-textual icons. Before all icons were picked from a font file, but can now be picked from pre-rastered sources or shaders, and then resampled for the target display output density it is being used on.

The global/displays/<display or current> path has received a ‘zoom’ group that allows you to set or step per-screen magnification around a specific region or relative to the cursor.

Multiple UI components now have optional shadow controls. /global/settings/visual/shadow allows for enabling/disabling the effect, adjusting size, colour and intensity and so on.

The flair – tool has received an effect category for window selection, with a ‘shake’ or ‘flash’ option included.

An ‘invert light’ (color preserving) shader has been added to the collection that can be applied per window, see target/video/shaders.

Posted in Uncategorized | Leave a comment

Arcan versus Xorg: Feature parity and Beyond

This is the follow-up to the ‘Arcan versus Xorg: approaching feature parity’ article which is recommended reading if you have not done so already. 

After that article, there was only one (and a half) real feature left to safely claim parity and that can be covered rather quickly. Thereafter we can nibble on the bites that are in Arcan, but not in Xorg — the reason for the difference in scope is best saved for a different time, although it is a good one.

First, let us not forget that there are more vectors for qualities that are significant to users than just features. Client compatibility is something that has been much lower on the list of priorities, yet is an important quality.

The reason is that prematurely adding support for something like a new display server backend to a toolkit, game engine or windowing library without both necessary and sufficient features in place will lead to a scattered actual feature set. There will be theoretical features, and then the features some clients actually might use some version or interpretation of. These two sets will slip further and further apart unless each affected project has exceptionally alert developers, and the reference implementation having basic hygiene in place regarding conformance verification and validation tools.

We have experimented with such implementations, of course, but that was mainly to test the design and current implementation; to find out where our APIs gets annoying to use, so that nobody else suffers.

The other reason for downplaying compatibility is that client applications has strong biases on how things are “supposed” to work, with the worst offenders being just ‘GUI toolkits’. Small changes can have large impact on application design and you are forced to adapt a certain worldview that lessens your perspective. As a trivial thought exercise, just consider something like a smartphone in a world where touch input was not invented, but eye tracking was flawless.

To re-emphasise the point of the Arcan project as such. It is not, and never was, to reinvent X. The principle program (12 principles for a diverging desktop future) say as much. The project is about showing that there were other paths not taken by the big players, and hint at what those paths offered when followed to their logical conclusion.

Most of the lessons in here are not amazing discoveries from diligent research. Rather, they are uncovering wisdom hidden inside the circuits of elderly arcade machines. It is the reason for header-bar image here still being a cropped PCB from that era.

Enough editorialising, towards the Xorgcism! This will again be painfully long and dry. Appy-polly-loggies in advance for that. Quicknav links are as follows:

Network Transparency

The main missing feature was about network support. Luckily I do not have to cover that again — there are already two articles on the topic: one that digs into the situation in Xorg; one that covers our solution. If there is anything left to re-emphasise, it would be that the networking part is a completely separate process that reinterprets the networked client data.

There are three reasons for this – this first is that optimal local rendering and optimal remote rendering are incompatible ideas that were historically similar enough that the discrepancy could be tolerated. Nowadays we want both and that takes a different architecture where everyone involved adapts to which one of the two that is currently desired.

The second is for security; the client IPC is a privilege boundary. Network access is yet another privilege boundary. Mixing multiple privilege boundaries within the same process practically merges the two into one in this case, greatly reducing security. Even if you have a separate network process, should all it is doing be proxy-patching packets, you gain very little actual protection. This is because your ‘network process’ is just yet another router or proxy on the way to the real parser in the privileged process, and the only primitive often needed (writing to a file-descriptor) is hard or impossible to ‘sandbox’.

The third is for resilience. Recall the lengths we have previously gone to in order to have crash resilience. There has been much more work done in that direction since the linked article, and one of the final stages is to make sure client state can gracefully survive GPU loss and, should the circumstances permit, machine loss. These things do not work cleanly as an after-thought – the consequences permeate everything.

Drawing API

The other feature left was the fabled drawing API – to which I would also consider font management — as that lies somewhere in the middle; fonts were already covered in the previous article as part of the section on ‘mixed-DPI’ drawing.

The thing is, Arcan started with a drawing API – initially for specifying interactive animated presentations, documents and some ehrm, darker things of which only beer and cocktails in shady bars might, briefly, bring back to the surface. The display server angle turned into a detour and necessary evil after certain actors could not ‘leave well-enough alone’. All the projects shown here over the years have been using this API and been driving its development in lieu of a requirements specification / waterfall like design phase.

Anyhow, the ‘animated presentations’ bit brought in all kinds of input devices, as well as audio and video. Video parsing and decoding is a finicky process at the best of times, and the typical libraries for this were obvious exploitation vectors. This meant that process separation with least-privilege was clearly needed. With that you get a realtime/media IPC system, so clean that up, add buttons and there is your basic display server.

As a side-note: Having a ‘drawing API’ on the consumer side is not hard per se and, in fact, is the strategy that won, with local raster being a distant fallback. If you want accelerated rendering through OpenGL or Vulkan, it ends up being command buffers (hey, draw commands!) sent to the GPU that will actually turn it into a signal and/or pixels – the drawing won’t happen in-process through the grace of putting pixels in a buffer. You could add non-accelerated drawing to Wayland with a one line addition to wl_shm defining a buffer format as SVG, wouldn’t that be interesting? ( ͡° ͜ʖ ͡°)

J’accuse! – I hear you zoladats triumphantly proclaim. That is not for the clients to use now is it. So lwa-ts me explain the two sides to this story. A longer version with charts and code and everything can be found in the article on AWK for Multimedia.

The big bit about a drawing API is the level of abstraction it provides. If it is at the low level of “putPixel()”, you lose — the framebuffer is dead, long live the framebuffer. If it is about “bind set of uniforms, buffers, gpu program and supply commands” you get what the high end graphics libraries already provide. The real challenge is finding the right set of mid-level functions that provide something more without getting morbidly obese to the scale of whatever flavour ‘absorb all’ UI toolkit that is in fashion these days; to understand how, where and when the JS/HTML5/CSS/… soup fail.

Thus, a network friendly, compact, mid-level drawing++ related API is the target. One that can be used out-of-process and as an “intermediate language” (think LLVM-IR) for higher level UI components or document interpreters to render to, while at the same time having the feature set expected of ‘web-app’ like targets.

The “++” represents the process control, audio and meta-IPC routing that is also necessary to cover the span of what is expected of ‘a desktop’. The processing of these commands translate to another intermediate layer, ‘AGP’ which is a simplified subset of OpenGL that was needed as a decoupling mechanism to eventually switch to-/provide- Vulkan and Software rasterisation. It aims at at a network friendly level at about the quality of DOOM3/WebGL, not great, but thereafter video compression starts to win out.

The following figure illustrates the different ways of running Arcan to cover all combinations of ‘drawing API’:

So Arcan can be a client to itself, although it is a separate binary (arcan_lwa – (Lightweight-application) marked as ArcanLW in the figure.

ArcanLW will connect to Arcan ‘as a display server’ and act like a normal client. The side benefit here is that if you know how to write an app with the one, you can write or patch the window manager in the other. You get one API to control, manipulate or inspect all the routing for the entirety of your desktop. It privileges separates by design, it has been stable for years.

The last piece to this puzzle is to be able to provide RPC like access to this API. Here is why Lua matters more.

For those not in the know, exposing a C API to the Lua VM is about as simple as it gets and it fits other low level calling conventions; register the symbol name and function pointer of the function in the VM namespace; pop arguments from the stack based on the desired types; process; push the function return to the stack.

It can get more involved than this, but not by much. That actually doubles as a protocol decoder, thus it can be used as a decoupling mechanism. Define a bytecode and there is your \out of process\ drawing/WM API to bind to other languages.

Migration and Recovery

While we did brush upon crash recovery in the previous post, as well as a previous article on the subject – there is still a lot to it. Enough work has been done since to warrant yet another article.

People aware of some GTK history knows that it actually has the underused ability to jump between X Displays. The ‘underused’ part can probably be blamed on the, ehrm, ergonomics of spinning up and maintaining multiple X servers to jump between. Politely put, it was not for the faint of heart.

It is a good idea in principle, although there is not much benefit to do it on the toolkit level versus at the IPC level and, as it turns out, much easier — the feature will always apply regardless if the toolkit had enough foresight and resources to implement it- or not.

In X, if a window manager crashes — it is not that big of a deal since a new window-manager-as-client can reconnect and start rebuilding its view of the world. In reality it is rather janky and there are possible left-overs when the window manager additions to the scene graph in Xorg was of a more complex nature.

To learn from this, we come back to resource allocation in Arcan: Each clients gets a primary segment (window++) and then dynamically renegotiates for more (with rejection being the default). The window manager can kill any and all secondary allocations and ‘reset’ the primary — forcing it to try and renegotiate new resources.

This means you have one guaranteed ‘window’ and the rest that can come and go. This is partly to allow for much crazier window management schemes, while at the same time not losing the more simple ones. A client developer gets a known and guaranteed set of available features to serve as a baseline, with sets of extras to add- or react- to.

The WM also gets access to a key/value store for tracking WM specific information about its clients. Both clients and WM gets GUIDs for repairing the clients to this information so things re-appear where expected. It also gets the ability to dynamically redirect clients to other display servers – as well as inform about alternate ones should the present one disappear.

The WM also gets a specific event entrypoint, ‘adopt’ which references clients of unknown origin. This allows controlled handover between WMs for live-switching only relevant clients, as well as a recovery mechanism for the WM to recovery from errors in itself.

This brings us to a handful of possible error scenarios and recoveries:

  1. Bad WM code → Lua virtual machine error handler → Engine Reloads WM (same WM or ‘fallback’ one) → Clients receives ‘RESET’ event → WM “adopt” entry-point called.
  2. Live-lock in WM →Watchdog process sends signal →(same as 1.)
  3. Crash in Engine → Client-side IPC library detects, connect loop (same or new address) →injects ‘RESET’ event.
  4. Live-lock in Engine → Watchdog process sends kill →3.

The extra twist is that the WM can ‘fake’ a crash for one or many clients, so that the procedure from 3 can be triggered on command. It can also send a new address for the client to connect to. This opens up for dynamically migrating clients between local and remote servers. The short clip below shows this going from local to remote:

While the internals for how all this works is quite complicated — both the WM and the client facing code is quite trivial.

Hook Scripts

A lesson to learn from XTEST, XRANDR and the tools that take advantage of these extensions- is that no matter how well you do your requirements engineering, people will disagree with your world view, and sometimes loudly so. Some just need their ‘hacks’ for the sake of it — the act of hacking and the feeling of agency it brings has value that should not be dismissed outright.

Is it possible to provide an outlet for such expressions without painting yourself into a design corner? The answer here are ‘hook scripts’. To understand how and why, we need something of a schematic.

This provides two possible paths for intercepting and manipulating WM behaviour. In the pre- stage the API that the WM sees can be intercepted and patched before the WM scripts themselves have a chance to run.

The other happens after the initialisation entry point but before the actual event loop starts. This allows you to intercept the event handlers themselves and both pre- and post- hook those.

When combined these let you ‘monkey patch’ any part of the WMs view of things, and work around the “stuff-you-don’t-like”. It can also be used to inject features that were not allowed as part of Arcan itself, for whatever reason.

A practical example is how to add external input drivers:

arcan -H hook/external_input.lua myapp &; ARCAN_CONNPATH=extio_1 my_input

This opens up a single-use connection point (extio_1) and additional ones (_2, _3, …) for each time the hook script is added. Then your external input driver is set to connect to that name, whereby ‘myapp’ sees the inputs as coming from the engine itself.

This is yet another point where the Lua tactic shows its strength — if hundreds of thousands of beginner- to intermediate- level developers managed to do it for World of Warcraft and lots of games thereafter; anyone who can juggle things as clumsy as bourne shell scripts around should be able to ‘scratch their itch’ with ease.

Alternate Representations and Accessibility

Now we are at the more unique stuff. In most other display systems, a client performs some version of the following:

  1. Allocate a window.
  2. Tie it to a pixel buffer.
  3. Signal when it is ready to be used.

Add the ability to control parts of its position and composition order and you are surprisingly close to done — the ‘odd man out’ is practically Wayland where even this is convoluted to the extreme.

Arcan has a different view on clients. While the clients can perform the sequence above, the WM can also tell the client to have a new window — and know if the client accepted it or not.

This was brushed upon in the article on ‘Leveraging the Display Server to improve debugging‘ — go there for the longer explanation. The short form is that if we add the notion of more abstract types to our windows, like ‘Debug’ or ‘Accessibility’ and so on – it implies asking the client ‘can you provide such a view of yourself’.

Thus there is no need for a separate ‘input-accessibility-bus’ or complexities of that nature — it is all about re-use of existing mechanisms to form new features.

The obvious normal case for a feature like this is ‘screen sharing’ – there is strictly no need for the client to be able to enumerate resources and displays in order to implement the sharing and collaborations features; it should not need to know how the sausage gets made.

The rest of the desktop is perfectly capable of providing the UI language to annoy a user for such details, or to automatically map and provide the contextually relevant subset. The role of the sharing client should be to receive and serve, not to define and control.

Application Launching and Handover allocation

Arcan can act as a launcher for applications. As such, it knows if an application dies, and will respond to that. This is done in order to improve security, as well as to be able to provide control flows that are common to gaming consoles, smart phones, set-top boxes and so on.

Most of this feature did not come from the strict idea of acting as a ‘launching frontend’ but as a side effect of wanting to be able to privilege-separate any parsing of untrusted contents. This practically means spawning new limited processes that can send the results back, and controlling their life-cycle.

With the launcher path, it means that a client does not ‘connect’ as an external source, but rather inherits its IPC connection as part of being run. In fact, locking down the ‘external connect’ path is a single ‘Hook Script’ that blocks out the ‘target_alloc(str)’ function.

What it affords us is client identity and authenticity. This is a hard problem to implement as an afterthought with the primitives that a POSIX-style userspace provides you, and ultimately it ends up failing spectacularly or juggling cryptographic primitives and failing subtly; the chain of trust is broken.

It also affords us a client-unique key-value store for the metadata config that is needed to ‘remember’ what some window related to a client did – across crashes, reboots and restarts. It also lets us inform our scheduler, resource allocation and the likes to treat these clients special, in order to make ‘fork-bomb’ like Denial of Service attacks much harder.

The twist to this is ‘handover allocation’. Clients themselves may want to repeat the cycle by spawning new processes, as part of the intended feature set, as well as privilege separation / sandboxing of its own. The alternative would be ‘premature composition’ where the client becomes a display server of its own, which is a bad idea.

The following figure illustrates both the ‘launcher’ path as well as the ‘handover’ form:

The user defines the set of allowed targets that ‘MyWM’ can execute (binary + arguments + metadata) through the ‘arcan_db’ tool and the ‘add_target’ command shown at the bottom of the figure.

MyWM refers to this by calling the launch_target builtin function, which sets resources up and spawns the new process with ‘mybin’ running based on what is in the database as ‘thething’. The code in mybin decides that it wants a new window to delegate something, say embedded privilege separated video playback. It requests a new window of the special handover type.

MyWM is friendly enough and ‘accepts’ by calling accept_target as part of the eventloop tied from the launch_target call. New primitives are allocated and pushed to mybin. Mybin launches the new program ‘somebin’ which inherits and opens the connection to Arcan the same way ‘mybin’ did. At the same time, mybin gets an abstract cookie that can be used to define and reposition ‘somebin’ within its own coordinate system and scene graph subgraph. X style reparenting, but without most of the chaos.

More details on some use of this feature can be found in the article on: ‘Another low-level Arcan application: Tray Icon Handler’

State Controls and Transfer

A (not) so big gamble on client compatibility is that for the future, we really need to improve how virtual machines (a class of applications that include ‘terminals’, emulators and ‘modern’ browsers) integrate with the grander system.

All the big actors are moving in this ominous direction as a preface to ‘clouding’ these virtual machines and then keeping things there.

One of the special needs of virtual machine class clients, is that of state controlssuspend, resume, snapshot and restore.

Arcan is ahead of the curve there since we used emulators as the reference model for the ‘needs’ of such a client from the start. They have been leveraged for testing performance, throughput and latency ever since – emulation has hard real-time requirements after all, and a whole bunch of free test cases in the shape of ‘Tool-Assisted Speedruns’. Another such thing is that ‘state controls’ are a natural thing in well-written emulators.

In the Arcan API, this boils down to trivial commands in the WM layer – something like:

bond_target(vid_client_a, vid_client_b)

Is literally enough to tell one client to pack up its state and forward it to client_b, and to tell client b to restore from the incoming state – even if client_b is networked.

This also means that the ‘sharing’ window feature can be combined with the ‘alternate representation’ feature to fake new devices inside of the virtual machine. Something like the following:

define_recordtarget(vid_client_a, {src1, src2}, more_controls_here)

Would create a composition (of src1, src2) and inject into (vid_client_a) that can then present it as a ‘camera’ or ‘screen sharing’ and so on.


This might come as a surprise to some, but audio and video processing are really quite similar. If you only do graphics and have not studied audio, or the other way around, you are missing out. It might take a course or two on ‘Digital Signal Processing’ or something to that effect to not hit any grand barriers, but that should be part of basic computing hygiene by now.

Not only do they have a large overlap in the theoretical foundation, but their internal structure from an operating system position does as well. Separating the two is something that has increased systemic complexity by a fair amount. Instead of using the same IPC primitives; transfer modes; device navigation; security; hotplug notification; output routing and redirection; we traditionally maintain two or more ecosystems for doing 90% of the same thing – even though the HDMI cables themselves tell us to stop.

What then follows is that a lot of applications need to combine the two in a coordinated and synchronised way. So instead of these two domains work over the same IPC system that has done the painfully heavy lifting, a third is added (Binder, D-Bus and so on), increasing systemic complexity by magnitudes.

While the feature itself is in a more primitive stage, much of it piggybacks from the work done on video and input latency – so when those stages are closer to complete the last bits of audio will get filled in as well. Again similarly to graphics where we can’t realistically aspire to be the next ‘Unreal Engine’, the role of audio support here is not to try replacing ‘pro’ audio but to fill out the ‘simple to middle’ level with coherent controls that meshes well with desktop style coordination – and wrap it in controls that can delegate to ‘pro’ audio consumers, much like we handle features like colour management.

This becomes especially important for the networked desktop use case, for accessibility (the ‘screen reader’) as well as for VR and AR experiences where positional audio suddenly becomes one of the most powerful tools available.

Exotic Input and VR

This is another chapter both for the future, for the present and for improving the past. The previous article mostly skimmed over the input model, and as with the other features — there is a lot of nuance to it.

The input model in Arcan covers both the expected basics (mouse, touch and ‘translated’ – i.e. keyboards) as well as game (‘assistive’) devices, eye trackers and similar sensors. This goes all the way to weird hybrid-I/O (see Interfacing with a stream deck). It is possible to safely and securely attach external input devices as shown in the section on hook scripts.

Two other “input” parts are client-announced input labels and VR.

Starting with the client-announced input labels – any client can announce a set of active input labels. These are short high level tags e.g. “Copy_Clipboard” with some kind of description in the current language settings, “Copy the word at the cursor”, some nature of its type (analog or digital) as well as a suggested default keybinding (Ctrl+C), if digital.

The WM can leverage this feature to provide a coherent binding interface, work around conflicts with existing bindings from other clients or the WM itself. Subsequently it can be used to implement your ‘hotkey’ manager (best done as a hook script) without overreaching and giving it access to anything other than what is strictly necessary.

The WM simply sets the appropriate label ‘tag’ to the next input event that gets routed to a client. This also meshes with other accessibility features, such as routing through a voice synthesizer to say that you pressed ‘copy clipboard’ rather than ctrl-down-C and then guessing whatever that meant in the context of the active application.

VR is an interesting example of a complex input model with a short window of opportunity; the ultimate avatar matches your every muscle at high sample rates.

This is a complex operation that aggregates sensors from multiple timing sources, including arrays of video cameras and laser projects. It does not mesh well with the traditional event loop structure as queues gets saturated instantly.

The span between ‘present’ to ‘possible’ inputs is also much wider – not everyone should need a full body suit, or have all fingers and toes present, and the sample streams themselves are sensitive as a number of strong biometrics can be extracted.

The following diagram tries to take a stab at explaining how the layers mesh together to provide VR composition and input management.

Akin to launch_target mentioned previously, there is a function for launch_avfeed available to the WM for launching the ‘decode’ frameserver. This provides a privilege separated producer process that provides a single input feed. Internally it works like any old client would, but since we have a well defined role and interface for it, the consequences for forcibly killing/restarting it are predictable.

It can be used to retrieve video feeds from webcams, and the file browser in Durden uses it for live previews of images and video. For VR, it allows us to access and map the many possible tracking cameras (inside out, outside in, hands, …) then compose/embed (AR), filter and perform computer vision to improve position tracking and so on.

For device control itself, we have yet another process/binary that is mapped to the function vr_setup. This one provides display metadata about the display panel and lens configuration, as well as hotplug events of ‘limbs’ (slots matched to the average human anatomy). The WM decides which ‘limbs’ map to which objects in 3d space, and how or if they should be forwarded to any clients.

The sample streams will not be processed as a normal input event, that would be too costly. Instead they continuously update designated locations in shared memory, along with timestamps and checksums – based on the currently set limb-map.

A client connects and identifies its primary segment as the L eye, the R eye or LR in a side-by-side configuration. For the L and R cases, a secondary segment is then allocated through the primary, to cover for the missing eye. It requests an extended mapping for VR data transfer (similarly to how colour management transfer its metadata back and forth).

A Path from Curses – TUI

On top of the mid-high level drawing API and the low-level client API we have in the client facing side of the ‘SHMIF’ IPC system, there is also a separate client API for text dominant clients. This API is built on top of SHMIF, and exposes a stable subset of its features.

The included terminal emulator, afsrv_terminal, is built using this API – and it has a basic alternate shell that completely skip terminal protocols and pseudo-terminals.

The API provides basic window segmentation and positioning that integrates with the window manager of the desktop, and similar integration features specifically for command line so that the WM can decide how to present and interpret completions and suggestions. It can trigger alerts and notifications, as well as announce inputs in a similar fashion as described in the ‘exotic input and VR’ category.

The output buffer format, tpack, has been mentioned before. It enables server-side rendering, both for efficient network remoting (since SHMIF is involved, the network transparency parts mentioned earlier applies). It is also for optimal sharing of font caches and resolved glyphs (and GPU texture atlases) transparently between clients, allowing for zero overdraw and minimal input to output latency.

Colors and palette can be defined and redefined by your window manager; TUI clients themselves then have the option of 24-bit colours that they pick themselves – but also the preferred option of semantic labels that map to the active palette. This means a way out of the terminal-legacy “Red is now Green” problem of specifying colours.

Content synchronisation is always explicit, so that no window should have tearing or draw output that will be replaced miliseconds later, or waste a vsync on an incomplete update.

All text is unicode and the clipboard has a side channel so large paste operations do not block or otherwise interfere with stdin/stdout. Locale (input/output) language can be provided, and a getopt- like replacement for input arguments that are packed over an environment variable in order to not complicate legacy argc/argv further.

Attention alerts can be sent, as well as notifications. There are controls for describing content length and byte-precise seeking, as well as job progress status.

Speaking of stdin/stdout/stderr – those can be redirected dynamically by the server. Additional input/output pipes can be dynamically announced and added. As part of that feature a TUI client can also announce support for input and output types, both as immediate requests or capability hints for universal open/save and file browser integration.

There are a few basic primitives to get started. Examples of such primitives are readline, buffer view (everyone deserves a hex editor) and list view, but the core will not go deeper than that. The focus, instead is on providing discoverable command line interfaces and one-off commands that focus on processing and providing data for data flow graphs instead of pipelines.

Posted in Uncategorized | 1 Comment

Arcan 0.6 – ‘M’ – Start Networking

This time around, the changes are big enough across the board that the sub-projects will get individual posts instead of being clumped together, and that will become a recurring theme as the progress cadence becomes less and less interlocked.

We also have a sister blog at that will slowly cover higher level design philosophy, rants and reasoning behind some of what is being done here. A few observant ones have pieced together the puzzle — but most have not.

This release is a thematic shift from low level graphics plumbing to the network transparency related code. We will still make and accept patches, changes and features to the lower video layers, of course — ‘Moby Blit’ is still out there — but focus will be elsewhere. Hopefully this will be one of the last time these massive releases make sense, and we can tick on a (bi-)monthly basis for a while.

As such, it is dedicated to the memory of a fellow explorer and dear friend, Morten, whose insights have been instrumental to the work here and elsewhere. A taste of the depths of such contributions are well illustrated by the ‘Farewell’ post over at Mamedev. Thanks for everything buddy :'(.

A big thanks to Ryan Gordon (and those who chipped in) for the Icculus Microgrant.

Release Time! The following video goes through the demonstrable highlights:

Here are the shortcuts to some of the key advancements:

If all of this isn’t enough, there is much more in the Changelog.

Network Transparency

This release brings us the first working version of ‘arcan-net’. This is an arcan-shmif server that translates clients running locally to another instance across the network. It has been in use for a while, but don’t trust it (yet) in tougher security compartments than you would VNC/RDB.

Not only that, but it is a lot more powerful than the corresponding form in X. It uses a protocol we are lovingly calling ‘A12’ — It is not the X12 some people wanted, but at least it is a twelve. It was also the last missing piece before being able to safely claim to be far beyond feature parity with Xorg, so expect the comparison article series to get its next instalment soon enough. With that it is also time to start chasing compatibility (client support) and performance over features.

There is so much ground to cover here that there is a separate article on the subject with more detailed coverage: A12 – Advancing Network Transparency on the Desktop

Roughly speaking, the features currently cover:

  • Single Client Forwarding (‘X11’-like), Desktop Forwarding (‘RFB/RDP/SPICE’-like)
  • Remote Input and Meta-data (‘Synergy’- style)
  • Audio, Video and Binary Transfers
  • Basic blob- caching
  • Lossless and Lossy video compression, defaults based on window types
  • Interactive adjustment of compression parameters
  • X25519 + Chacha8 + Blake3 for authenticated encryption
  • Live-client migration
  • Multi-threaded processing
  • Crude Privilege separation

This meshes really well with other arcan-shmif features, such as the one on external input drivers mentioned below. The video below shows a single nugget from how ‘transparent’ things can be, live migration, drag and drop:

This is yet another reason as to why ‘crash resilience‘ matters – if you solve the one, you get a lot more interesting things in return.

On-Demand Client Debugging

When the WM initiates window creation, it can now push windows of a debug type. Should the client cooperate, the first time that happens the client will still get a new window to provide its debug data through.

Should the client fail to comply, or the WM explicitly asks it, a debug interface built into the arcan-shmif library will now silently activate, allowing the horror movie scenario of ‘the call is coming from inside the house’ to unfold.

For better understanding the breadth and depth of the feature, please refer to the following article: Leveraging the Display Server to Improve Debugging

KMSCon/FBCon like Console

The default distribution now includes the ‘KMScon- like’ ‘console’ window manager, though it also accepts graphical clients, so if all you need is a “kiosk like” one-application runner per virtual ‘terminal’ (or the template for one), this fits that bill.

The initial form of this was described in the related article on: Writing a console replacement using Arcan.

it has since been extended some, getting support for an OSD keyboard and basic wayland client support as well via the script below. The video below shows it and the keyboard working on a Pinephone:

If you have an unrequited love for minimalism confused as simplicity (minimalism doesn’t have room for love, being simply so minimal), this is the tradeoff path that will get you there, eventually.

Builtins, Hook Script Chaining

There are multiple facilities for encouraging opt-in code reuse between applications and window managers, both for developers and advanced end users. These are provided in the ‘system-scripts’ namespace and consume the folder names ‘builtin’ and ‘hooks’. The idea is to let the heavier subprojects like durden act as incubators, and the implementations gets cleaned up and re-usable as such shared builtin scripts.

This time around, the builtins have been extended with:

  • builtin/decorator.lua – for adding borders and mouse controls for border manipulation to a surface
  • builtin/wayland.lua – for adding some of the special considerations that wayland clients require
  • builtin/osdkbd.lua – for building custom on-screen display keyboards

This means that the WM side of thing for adding Wayland and XWayland support to a project boils down to about 2/3rds of this script that is used for testing. API is stable and supported, has been for a while, will remain as such.

The engine facility for ‘hook scripts’ that are run transparently to the application itself have been improved to allow chaining multiple ones together – greatly improving their utility (actually make them useful).

Previously, these were limited to a single one, which was less than ideal. Now that multiple ones can be chained together, many features that can be implemented in a window manager agnostic way can be shared between all of them.

This is the safer, more secure and much more powerful way to get all the ‘xdotool/xrandr/…’ kind of hacks going without needlessly exposing them over some other IPC.

Some are primarily used for testing, like this auto-shutdown one:

arcan -H hooks/shutdown.lua durden shutdown=200

Regardless of the other scripts running, this will issue a shutdown sequence after 200 logical ticks on a 25Hz timer. Since these scripts can simulate and inject other tricky event paths, like display and input hotplug and playback, they are ideal for a window manager testing corpus as the rest of the system can run headless.

Others are more interesting, such as ‘hooks/external_input.lua’ which opens up dedicated single-client connection points that are allowed to inject input, and only that. This is also used to extend the engine with input drivers that would not be allowed into the main engine due to their dependencies or licenses.

This can even be combined with the network proxy mentioned above to allow, for instance, sharing of input devices from other machines (which would enable Synergy- like virtual KMS workflows) and proprietary drivers running inside of containers.

-- on each desktop participating
arcan -H hooks/external_input.lua durden
ARCAN_CONNPATH=extio_1 arcan-net -t -l 6680

-- remote machine
arcan console
ARCAN_CONNPATH=console ARCAN_ARG=host= afsrv_remoting

Eye Tracking and Stream Decks

One external driver example is the Tobii 4C eye tracker driver. Even though their hardware is cool, it is still a company that are about as 90ies terrible and backwards with their software ecosystem as can be; shrinkwrap licenses restricting allowed use cases of the sensor data (*facepalm*), closed source drivers, firmware clamped hardware capability and little in terms of serious competition. This is just depressing giving the potential eye tracking actually has outside of vile adtech. There used to be a driver available on their website; it was pulled in a move that can only be assumed to be about further fscking their customers over. A few searches should land you with repositories that mirrored the .debs.

A rough Arcan input driver as part of the repository here, and some docker container to reduce the taint here. Inject the blobs from the driver and experiment with eye tracking inside of Arcan. Other exotic input and LED output clients will also be exposed through that repository rather than the main one.

As covered in the article on ‘Interfacing with a stream deck device, there is also a support tool for ElGato “streamdeck” like devices added that lets you map window contents, ui controls, custom macros etc. unto the device, though the real meat of this features comes with the next release of Durden.

The following clip is from that feature, recorded in an AirBnB somewhere in Shinjuku:

You can see how server-side decorations are being leveraged to map the titlebar contents and controls to the display, mixed with window manager controls like workspace switching with live previews, as well as client-announced custom inputs.

XWayland Client Isolation

We give up. The Wayland protocol service has received support for XWayland clients.

For the last five years or so, the plan was to only support native Wayland clients. After all this time, it turns out that the X11 backends for various toolkits are still infinitely more reliable and robust through X than through native Wayland.

As with anything else here, we treat it, and the obligatory meta-window manager every Wayland Compositor has to have in order to work around the hackish way XWayland uses Wayland, as a separate process for the privsep space to be as tight as possible.

The best way of using this is for the individual X clients that you might need to run; browsers and games being the main suspects.

arcan-wayland -exec-x11 chromium

This will spawn an instance of the wayland protocol server side, which spawns the virtual window manager, which inherits a clipboard window and spawns an Xwayland instance along with chromium.

This means that each client gets their own infrastructure, working directory and so on. It consumes more memory and sometimes has a noticeable startup time (most of that being MESA), but enables much stronger separation between clients.

If that is undesired, it can, of course, be run as a normal service:

arcan-wayland -xwl
DISPLAY=:0 xterm

This would give you a global Wayland display that also provides XWayland on demand, and clients that expect to spawn multiple processes that connect at different times, to work.

This does not mean that Xarcan is defunct or will become be abandoned as the goals are quite different. In fact, Xarcan has recently been synched to changes in upstream Xorg. The Wayland path is currently to get the transparent rootless one, while Xarcan behaves more like Xephyr; ‘Virtual Machine’ like controls where you provide your own window manager and so on. Both modes have justified use cases.

That said, it is likely that rootless window support will be added to Xarcan as well to free legacy client support from the hands of Wayland entirely, as I am, quite frankly, so thoroughly disappointed in- and tired of- coding anything Wayland to the point that I’d much rather mess with Xorg internals than read more chapters from this scattered in the wind, half-java RMI love letter, half hand MS-COM+ anti-patterns handbook. Had the hours spent debugging this been billable, I’d buy a Tesla then return it for the money.

Speaking more of this second coming of TERMCAP – this round of “protocol” pokémon added: xdg_decoration, kwin_decoration, xdg_output, zwp_dma_buf, and probably some others that I forgot. This seems to now be enough to run the winner of ‘what the actual f are you doing?’ clients, Firefox – until a nightly update breaks it again, that is.

Synchronisation Conductor

Although it is not a very visible feature, it is still very important in its own right. In preparation for proper VFR/VRR (variable frame-rate/refresh rate), better biased client processing (two sides of the same coin when you factor in network transparency), a lot of the old synchronisation code has been consolidated and reworked into a ‘conductor’. The conductor exposes a uniform set of synchronisation strategies for the policy layer (WM) to switch between.

The strategies themselves will need a lot of tuning before they provide the best contextual balance and trade-off for gaming (minimise latency), processing (maximise throughput), visual fidelity (filters, colour processing, animation smoothness) and minimising energy consumption (mobility, cloud hosting). There is no ‘one size solution fits all’ here, and these are problems that in their own rights would consume a master level thesis, or three, on the subject.

It might help to think of it as analogue to the role of the heart in the human body. A single beat has far reaching consequences in a complex system. The ideal rate depends on what the body is supposed to do; running a marathon is different from a sprint and different from bedrest – yet all need to be supported. If you are academically inclined and actually want to dip into the fundamentals, finish a course or two on queue theory then contact me.

To this effect, there are API additions on both the high and the low end. For the WM side, we have added the ability to set a synchronisation focus target that gets preferential treatment so that when mapped to a variable- output and deadlines are closing in, we can try and bias the way it receives input and how the frames gets synchronised compared to other clients that are fighting for the same resources using differently staged releases for ‘the current focus’ and ‘the herd’ to avoid a twisted take on the sleeping barber problem.

On the low end, we can now continuously forward information about the next upcoming deadline on a per client basis, as well as provide different deadlines based on active strategy, segment type, window management state and client state (networked or local for instance).

With this feature we have also added much needed tracing features for most of our layers so that it is possible to on-demand collect a minimal overhead description of the sequence of events and their timings. There are so may combinations of asynchronous, threaded and non-blocking operations here across so many layers that measuring and troubleshooting anything performance related pretty much demands it. If in doubt when it comes to tracing and viewers, disregard any suggestions about webcrap like Google im-‘Perfetto’, and give the love it deserves, it is a beautiful piece of work.

New Tool: Arcan-trayicon

The list of tools has also been expanded with a convenience wrapper / launcher utility, ‘arcan-trayicon’ which lets you register two SVG icons (inactive and active) into some UI element in a supporting window manager. When clicked, the wrapped application is launched and attached as a popup to this icon. This combined with some bits from the next section should be all that is needed to wrap system services and network interface tools with a UI yet not bring any of that D-Bus stench with it.

This tool also got a write up of its own: Another low level arcan-client: A tray icon handler

TUI, Server-side Rendering, Widgets and non-VTx shell

The rich-text oriented client developer facing API, arcan-tui, has undergone serious backend refactoring. It can now render into a specialised text-encoding scheme (TPACK), and a server-side renderer for the same format has been added to Arcan.

This is the first step to drastically cut down on rendering costs and memory consumption as the glyph cache can now be generated and shared between otherwise independent windows of the same type, as well as using accelerated rendering without unnecessarily forcing the TUI clients to have GPU access. For tracking the feature, check the text rendering section in the wiki on TUI.

It also provides the server with a textual representation to work with for other features, such as lossless magnification and screen readers (see Text-To-Speech below).

It has also received more abstract UI widgets so that certain features gets a shared base. Some of these are visible in the debugging video that was shown earlier. This round we added three widgets (out of a planned total of four):

Bufferwnd – take a buffer and provide a read/write hex-/ascii-/unicode- viewer or editor interface to it.

Listwnd – take a list of items, tag separators, submenus, inactive and hidden items, and prevent it as a selectable list. This is a suitable base for more advanced components (tree view etc).

Readline – act as libreadline/linenoise to query the user for input. It still lacks some of the important bells and whistles to fully cover that use-case, and hopefully those holes will be plugged shortly.

The last planned component (on top of more features and quality work to the above) is Linewnd that will provide scrollback buffer like behaviour. This will eventually remove the last remnants of the libtsm ‘screen’ that we used to build things in the past, but are now slowing us down.

The pre-existing ‘screenshot’ copy widget hidden inside every arcan-tui window also got an added ‘annotation mode’ where it is possible to write and highlight parts of the copied surface.

Speaking of arcan-tui, there is also a prototype neovim UI built using it – it is roughly on par with current curses/terminal emulator based neovim, but will soon leverage many more of the features in arcan-tui, and will be used as a requirements ‘specification’ to make sure that we have not missed anything. It can be found in the repository here.

On a related note, the included terminal emulator, afsrv_terminal, has received a special mode (ARCAN_ARG=cli afsrv_terminal) that serves as a basis for terminal emulator liberated command-line shells. The video below shows some of the current state of that shell, running inside a window manager that is well, still part of my big pile of secrets.

Worthwhile to note is that the ‘new’ windows are negotiated by the shell, and based on what kind of a command that is to be executed – sets up the environment accordingly. Wayland routes through arcan-wayland and so on.

Text to Speech and other Decode improvements

To recap, the ‘decode’ frameserver is the special dedicated client that is intended to absorb all dangerous and likely insecure parsing from the main process to a ‘kill on sight, single role’ one that gets aggressively sandboxed.

It is used both for basic decoding and for asynch- live content previews in parts like the durden filesystem browser. It is also used in a ‘plan9 plumber’-like fashion for clients to hand over media presentation (both embedded and stand-alone).

When finished, no parser working on untrusted data should remain in the arcan process, no matter if Arcan is used as the outer display server or as a ‘browser/electron-lite’ like app-engine nested inside itself or as an X/Wayland client.

In this round, we have added support for text-to-speech and chiselled out basic 3D model transfers. The text to speech, combined with the existing OCR feature of the ‘encode’ frameserver, provides WMs like Durden the last needed missing primitives pieces for greatly improved accessibility.

These improvements also have their role in VR, with positional audio and HRTFs (still disabled in upstream builds) – notification sounds, longer text-to-speech passages and so on can, if carefully configured, be much more natural and comfortable than visual ‘widget’ like alerts when you can position the source ‘in the room’.

Platform Improvements

The way headless mode (no attached displays, ever) operates has been reworked as a separate binary with a slightly different setup. It will now output to an instance of the ‘afsrv_encode’ process, while also using it as input layer.

This allows us to run the whole thing on a server in a container and either record or stream its output, but also to serve a remote desktop or single app host:

ARCAN_VIDEO_ENCODE=protocol=vnc:pass=letmeoutiambeggingforhelp arcan_headless durden 

Of course, you can also switch that ugly ‘vnc’ for a12.

Combined with the dynamic part of the network transparency bits, you can have a headless server where your clients normally ‘live’, then connect to it and temporarily redirect individual clients to a machine of your liking.

As part of our ongoing effort in improving systemic resilience (the ability to recover from unforeseen events), the egl-dri (native graphics on Linux/BSDs) privileged separation part that does driver negotiation has received support for two levels of watchdog triggers.

The first will reset the scripting layer, protecting against bad scripts getting stuck in an infinite loop, while the second will trigger crash recovery for the engine process itself – trying to at least recover clients.

To complement this, and continue down the path towards proper non-cooperative, asymmetric multi-GPU support (runtime swapping graphics stack and driver versions and persisting across multiple graphics stacks at the same time, say GBM+KMS/EGLStreams), it is now possible to set a configuration where one GPU acts as a stand-in for another, and have the WM cooperatively trigger a swap. 

This requires both client and server to be able to completely tear down and rebuild accelerated graphics use. This also provides an easy path for testing that we do not leak state between rebuilds (define a GPU as a backup for itself and set a timer that randomly resets), which makes the last missing step — transferring individual clients and render jobs between being native to one or many different GPUs at once — approachable.

Posted in Uncategorized | 2 Comments

A12 – Advancing Network Transparency on the Desktop

This article is is the main course to the appetiser that was The X Network Transparency Myth (2018). In it, we will go through how the pieces in the Arcan ecosystem tie together to advance the idea of network transparency for the desktop and how it sets the stage for a fully networked desktop.

Some of the points worth recalling from the X article are:

  1. ‘transparency’ is evaluated from the perspective of the user; it is not even desirable for the underlying layers to be written so that they operate the same for local rendering as they would across a network. The local-optimal case is necessarily different from the remote one, the mechanisms are not the same and the differences will keep on growing organically with the advancement of hardware and display/rendering techniques.
  2. side-band protocols splitting up the desktop into multiple IPC systems for audio, meta, fonts, … increases the difficulty to succeed with anything close to a transparent experience, as the network layer needs to take all of these into consideration as well as trying to synchronise them.

To add a little to the first argument: it should also not be transparent to the window manager as some actions have drastically different impact on the user interface side to security and expectations. For example, Clipboard/DND locally is not (supposed to be) a complicated thing. When applied across a network, however, such things can degrade the experience for anything else. Other examples is that you want to block some sensitive inputs from being accidentally forwarded to a networked window and so on, it has happened in the past that the wrong sudo password has, indeed, been sent to the wrong ssh session.

This target has been worked on for a long time, as suggested by this part from the old demo from 2012/2013. Already back then the drag/slice to compose-transform-and-share case exposed out of compositor sharing and streaming; something that only now is appearing elsewhere in a comparably limited form.

We are on the third or fourth re-implementation of the idea, and the first one that is considered having a good enough of a design to commit to using and building upon. There are many fascinating nuances to this problem that only appear when you ‘try to go to 11’.

As per usual, parts of this post will be quite verbose and technical. Here are some shortcuts to jump around so that you don’t lose interest from details that seem irrelevant to you.


Starting with some short clips of the development progress – and then work through the tools and design needed to make this happen. It might be short, but there is a whole world of nuance and detail to it.

(~early 2019) – forced compression, OSX viewer, (bad) audio:

Composited Xarcan (desktop to pinephone), compression based on window type:

Here is a native arcan client with crypto, local GPU “hot-unplug” to software rendering handover and compression negotiation (h264):

Here is ‘server-side’ text rendering of text-only windows, font, style and size controlled by presenting device — client migrates back when window is closed:

In the videos, you can see (if you squint) instances of live migration between display servers over a network, with a few twists. For example, the decorations, input mapping, font preferences and other visuals change to match the machine that the client is currently presenting on and that audio also comes along, because Arcan does multimedia, not only video.

What is less visible is that the change in border colour, a security feature in Durden, is used to signify that the window comes from a networked source, a property that can also be used to filter sensitive actions. The neo-vim window in the video even goes so far as to have its text surfaces rendered server side, as its UI driver is written using our terminal-protocol liberated TUI API. This is also why the font changes; it is the device you present on that defines visuals and input response, not the device you run the program on.

Also note how the clients “jumps” back when the window is closed on the remote side; this is one of the many payoffs from having a systemic mindset when it comes to  ‘crash resilience‘ – the IPC system itself is designed in such a way that necessary state can be reconstructed and dynamic state is tracked and renegotiated when needed. The effect is that a client is forcefully detached from the current display server with the instruction of switching to another. The keystore (while a work in progress) allows you to define the conditions for when and how it jumps to which machines and picks keys accordingly.

That dynamic state is tracked and can be renegotiated as a ‘reset’ matters on the client level as well, the basic set of guaranteed features when a client opens a local connection roughly generalises between all imaginable window management styles. Those that are dynamically (re-) negotiated cannot be relied upon. So when a client is migrated to a user that has say, accessibility needs, or is in a VR environment, the appropriate extras gets added when the client connects there, and then removed when it moves somewhere else. This is an essential primitive for network transparency as a collaboration feature.

Basic Primitives: Arcan-net, SHMIF and A12

There are three building blocks in play here, a tool called arcan-net which combines the two others: A12 and SHMIF.

A12 is a ‘work in progress’ protocol – it’s not the X12 that some people called for, but it’s “a” twelve. It strives to be remote optimal – compression tactics based on connectivity, content type and context of use, deferred (presentation side) rendering with data-native representation when possible (pixel buffers as a last resort, not the default); support caching of common states such as fonts; handle cancellation of normally ‘atomic’ operations such as clipboard cut and paste and so on.

SHMIF is the IPC system and API used to work with most other parts of Arcan. It is designed to be locally optimal: shared memory and system ABI in lock free ring-buffers preferred over socket/pipe pack/unpack transfers; minimal sustained set of system calls needed (for least-privilege sandboxing); resource allocations on a strict regimen (DoS prevention and exploit mitigation); fixed based set of necessary capabilities and user-controlled opt-in for higher level ones.

SHMIF has a large number of features that were specifically picked for correcting the wrongs done to X- like network transparency by the gradual introduction of side-bands and good old fashioned negligence. Part of this is that all necessary and sufficient data exchange used to compose a desktop goes over the same IPC system — one that is free of unnecessary Linuxisms to boot. While it would hurt a bit and take some effort, there are few stops for packing our bags and going someplace else, heck it used to run on Windows and still works on OSX. Rumour has it there are iOS and Android versions hidden away somewhere.

Contrast this with other setups where you need a large weave of IPC systems to get the same job done; Wayland for video and some input and some metadata; PulseAudio for audio; PipeWire for some video and some audio; D-Bus for some metadata and controls; D-Conf for some other metadata; Spice/RFB(VNC)/RDP for composited desktop sharing; Waypipe for partial Wayland sharing, X11 for partial X / XWayland sharing: SSH+VT***+Terminal emulator for CLI/TUI and less unsafe Waypipe / X11 transport; Synergy for mouse and keyboard and clipboard and so on. Each of these with their own take (or lack thereof) on authentication and synchronization, implementing many of the most difficult tasks again and again in incompatible ways yet still end up with features missing and exponentially more lines of code when compared to the solution here.

Back to Arcan-net. It exposes an a12 server and an a12 client, as well as acting as a shmif server, a shmif client and taking care of managing authentication keys. In that sense it behaves like any old network proxy. While not going too far into the practical details, showing off some of the setup might help.

On the active display server side:

void@ arcan-net -l 31337

This will listen for incoming connections on the marked port, and map them to the currently active local connection point. To dive further into the connection point concept, either read the comparison between Arcan vs Xorg or simply think ‘Desktop UI address’; The WM exports named connection points and assigns different policies based on that.

On the client side we can have the complex-persistent option that forwards new clients as they come:

arcan-net -s netdemo 31337
ARCAN_CONNPATH=netdemo one_arcan_client &
ARCAN_CONNPATH=netdemo another_arcan_client &

Or the one-time simpler version which forks/exec arcan-net and inherits the connection primitive needed to setup a SHMIF connection:

ARCAN_CONNPATH=a12://keyid@host:port one_arcan_client

Or, and this is important for understanding the demo, an api function through the WM:

target_devicehint(client_vid,"a12://keyid@", true)

This triggers the SHMIF implementation tied to the window of a client to disconnect from the current display server connection, connect to a remote one through arcan-net, then tell the application part of the client to rebuild essential state as the previous connection has, in a sense, ‘crashed’. The same mechanism is then used to define a fallback (‘should the connection be lost, go here instead’). This is the self-healing aspect of proper resilience.

There are WM APIs for all the possible network sharing scenarios so it can be handled as user interfaces without any command line work.

I mentioned ‘authentication’ before, where is that happening? So this is another part of the implementation that is still settling. We are reasonably comfortable with the cryptography engineering, which is currently undergoing external/independent review. 

For those versed in that peculiar part of the world, the much condensed form is currently AE/EtM, ChaCha8 for stream cipher, BLAKE3 for KDF and HMAC, x25519 for asymmetric key exchange (optionally with initial MinimaLT like ephemeral exchange) and PAKE-like (possibly changing to IZKP) n-unknown-initial PK authentication (rather than PKI), with periodic rekeying.

Example Uses

For great security (and justice): As demoed, moving applications to present on other devices is a neat thing for getting more use out of computers that are otherwise cast aside when you have a big beefy workstation; put those SBCs to use, security compartment per device and happily ignore most of the endless stream of speculative attacks instead of allowing mitigations to bring processing power back a decade.

Take that clusterboard of SOPINE modules, boot them into a ramdisk with a single-use browser and let that be your ‘tab’, pull the reset pin when done and that very expensive chrome- or less expensive firefox- exploit chain will neither be able to grab many interesting tokens nor gain meaningful persistence or lateral movement. Hook it up to the same collection/triaging system used for the fuzzers you have running and pluck those sweet exploit chains out of the air ready for re-use responsible disclosure.

Take those privacy eroding apps and adtech drivel that managed to weasel their way into a position of societal necessity and run them ‘per device’ but far away from your person; feed their sensory inputs with sweet, GPT-3-esque plausibly modelled nonsense, tainting whatever machine-learning trend that feeds on this particular perversion until they are forced to pay cost in order to discern truth from tales.

For great mobility: Ideally, the use and experience should be seamless enough that you would not need to care where the client itself is currently running – that is to continue down the path of Sun Ray thin clients towards thin applications as a counterpoint to the mental- and technical dead end that is the modern web.

Applications that are capable of serialising their seed- and delta- states and responds to the corresponding SHMIF events for RESTORE/STORE and RESET can first remote render as the safe default, while in the background synchronise state with the viewing device (should the same or compatible software exist there) and switch over to device-local execution when done.

Using the ‘headless’ arcan binary like this:

ARCAN_VIDEO_ENCODE=protocol=a12:port=666 arcan_headless my_wm

Then have an Arcan instance provide ‘your real composited desktop’ running ‘in the cloud’. Let the applications and their state live there while travelling across unsafe spaces, and redirect the relevant ones to your device when it is safe and necessary to do so.

For great collaboration:

Think of an ‘window’ as a view into an application and its state as a living document that can both be shared and transferred between machines and between people. What one would call ‘clipboard cut and paste’ or ‘drag and drop’ is not far from ‘file sharing’ if you adjust the strength of your reading glasses a little bit. There are pitfalls, but that is our job to smooth over.

With decorations, fonts, semantic colour palette, input bindings and so on being defined by the device a client is presenting on rather than the settings of the local client through some god-forsaken config service or font specification, differences in aesthetics, accessibility and workflow can be accounted for rather than accentuated.

Take that ‘window’ and throw it to your colleagues, they can pull out what state they need or add to it and return it back to you when finished. Splice your webcam and direct it to your friend and there is your video conferencing without any greedy middle-men.

For great composability:

The arcan-net setup, as shown before, covers remote presentation/interaction with  SHMIF clients. This includes exotic external input drivers and sensors, enabling advanced takes on “virtual KMS” workflows that applications like Synergy provides.

Take one of those spare tablets gathering dust and use it as a remote viewer/display – or forward otherwise useless binary blobbed special input devices (Tobii Eyetrackers come into mind) from the confines of a VM.

Take those statusbar- applications, run them on your compute cluster or home automation setup and forward to your viewing device for SCADA- like monitoring and control.

Personally, my fuzzing clusters ‘phone home’ by forwarding wrapper and control clients whenever a node finds something to my desktop. By leveraging the display server to improve debugging over the same communication channels I get my crash inspections over without juggling symbols, network file systems and what not around. Redirect the interesting ones to the specialist on call for such purposes.

The range of applications that various combinations of these tools enable is daunting. Your entire desktop, with live application state, can literally be made out of distributed components that migrate between your devices – drag and drop:able.

Protocol State

The reference implementation is in heavy/daily use, but there are a number of things to sort out before we would dream to push- or put- it to use for sensitive tasks. If you want to run your terror-cell or other shady business with the tools, go ahead. Anything or anyone more honest and decent – stay away until told that it is reasonably safe.

All the infrastructure around this will be actively developed for a long time, and there is a fair list of coming interesting and advanced things as the focus for this work. Track the for changes.

Some of the highlights from my perspective:

  • Safety features (side channel analysis resistance, transitive trust) – machine learning is listening in and interactive web is no different; the field is ripe with tools that can reconstruct much of plaintext from rather minor measurements without having to attack the cryptography engineering or primitives themselves.
  • Spliced interactive/non-interactive subviews – for group collaboration.
  • Compressed video passthrough negotiation – to ensure that no pack/unpack stage for already compressed sources like video persist.
  • ALT/AGP packing (Arcan rendering command buffers) – to stream both mid-level graphics and its virtual-GPU like backend as draw primitives when no better representation is available.
  • Latency/Performance work – Better domain specific carrier protocols (UDT and the likes), progressive compression for certain client types, buffer backpressure mitigations strategies. Network state deadline estimation for better client animations.

But more on those at a later time.

Demo Walkthrough

Having, hopefully, wet your appetite – lets close this one by explaining what went on in the demo.

As you might have seen in the example command lines at the top, or in the previous articles, “connection points” is a key primitive here. They allow the window manager (WM) and the user to define ‘UI addresses’ so clients know where to go, and the WM knows which policy to use to regulate the IPC mechanisms provided.

Any ‘segment’ (group of audio/video/event-io roughly corresponding to a window) has a primary connection point, and a fallback one. Should the primary connection point fail, the client will try to reconnect to a mutable fallback, and then rebuild itself there. This allows clients to move between display server instances, something that covers elaborate sharing, more safe/robust multi-GPU support etc.

This fallback is provided through the WM via a call to target_devicehint. It comes in two main flavours, “soft” (use only on a severed connection), and “hard” (leave now or die).

In the videos, the Durden WM is used. To make a long story short, it has an overwhelming amount of features (600+ unique paths last I counted), structured as a virtual filesystem (even mountable). In this filesystem, the specific path “/target/share/migrate=connpoint” tells the currently selected window to “hard” migrate to a connection point. In this case it is the special flavour of a12://my_other_machine that indirectly fires up arcan-net to do most of the dirty work.

With the stacking workspace layout mode, there is a feature called “cursor regions” – parts of the workspace that trigger various file system paths depending on if the mouse is moving, dragging or clicking.

The way things are setup here then, is that:

  1. When a window is dragged into the left edge area of the screen, draw a portal graphic.
  2. Default the contents of the portal to be noise, spawn an instance of the ‘remoting’ frameserver connecting to the host IP of the left laptop (why you can see parts of the remote wallpaper in one of the videos).
  3. If connected, show the contents of this remoting connection.
  4. If the window is dropped over the portal, send the /target/share/migrate command, pointing to the remote server in question.
  5. If the drag action leaves the region, kill the remoting connection and hide the portal.

Onwards and upwards towards more interesting things. /B

Posted in Uncategorized | Leave a comment

Leveraging the “Display Server” to Improve Debugging

I spend most of my time digging through software-in-execution rather than software-at-rest (e.g. source code). Sometimes the subject of study is malware hissing like a snake and lashing out at the barriers of a virtual machine; sometimes it is terrible software deserving of an exploit being written; sometimes it is a driver on a far-away device that flips bits when a clock got skewed or circuits gets too hot — most of the time it is something between these extremes.

Many years ago, a friend and I distilled some thoughts on the matter into the free-as-in-beer ‘systemic software debugging‘. Maybe one day I have enough of both structure and incentive to revisit the topic in a less confined setting. Until such a time, it might be useful to document some experiments and experiences along the way, which  brings us to the topic of this post: ways of using the display server infrastructure and its position in the user-space stack, to reason about- and bootstrap- debugging. 

While that may come off as a bit strange, first recall that the”Display server” is a misnomer (hence the “”): the tradition is that it, of course, serve much more than the display. In the majority of cases you would also find that they ‘serve’ user-initiated blob transfers (‘clipboard’ and ‘drag&drop’) as well as a range of input devices (keyboard, mice, …). In that sense a terminal emulator, web browser and virtual machine monitor rightly fall into this category. What I try to refer to is the IPC system or amalgamation of IPC systems that stitch user-space together into the thing you interact with.

The article is structured around three groups of problems and respective solutions:

  • Activation and Output
  • Chain of Trust
  • Process and Identity

Each explaining parts of what goes on in the following video:

The central theme is that the code that comes with the IPC system (xlib, xcb, arcan-shmif, …) can be used to construct debugging tools from inside the targeted process, and piggyback on the same IPC system for activation, inputs and information gathering/output.

The gains is that you get a user friendly, zero-cost until activation, high-bandwidth, variable number of channels to collect process information, that can cooperate with the client and let it provide its higher level view of its debug state, while at the same time add custom primitives with few to none additional allocations post-instrumentation. 


To add some demarcation, the more interesting target here is live inspection of mature software as part of a grander system – far away from the comforts of a debug build with click-to-set breakpoints launched from the safety of an integrated development environment. The culprit is not obvious and the issue might not be reliably repeatable.

Your goals might not be to fix the issue, but to gather evidence and communicate to someone that can.

Also bear in mind that this is not directly about “the debugger” in the sense of the nuclear powered Suisse army knife that is also known a ‘symbolic debugger’ tool such as ‘gdb’ and ‘lldb’, but rather the whole gamut of tooling needed to understand what role a piece of software fulfils and how it is currently functioning within that role.

Think how the ‘intervention’ friendly version of this intimidating chart from Brendan Gregg’s post on Linux Performance would look like for the ‘applications block’ and you get closer to the idea:

Activation and Output

This first group of problems covers software that wants to cooperate by adding features now that may be useful for debugging later. Some refer to this as ‘pre-bugging’.

Consider the notion that you are a responsible, pro-active developer. You understand the need for others to inspect what your application or library is doing, and that there are things in the execution environment you simply cannot account for while remaining sane and getting things done. You want to make it easier for the next one in line, and get higher quality feedback about what went wrong out there in the field.

What are your normal practical options?

  1. Command-line argument
  2. Environment Variable
  3. Specific User Interface toggle

These are all problematic, though in somewhat different ways. 

With the first two options you have the problem of communicating that the feature is available (how will the user discover it, how will you remember it is there) –, man pages, FAQ/Wiki, ancients words of wisdom spray painted on a live chicken, and so on – something need to announce that the option is there.

Options 1 and 2 are also quite static; they get set and evaluated when the program is started, and if you want to activate debug output dynamically, well, tough luck. Your problem then needs to be reproducible both with- and without- debug output enabled.

The actual output also comes with noticeable system impact – sending strings to STDOUT, STDERR may break (introduce new bugs) other processing steps the user might have, common in the traditional UNIX pipes and filters structure. They are also not necessarily ‘dumb pipes’, isatty() is very much a thing (as is baud rate), as is threading. The combination of these properties makes for an awful communication channel to try and multiplex debug output on.

The other option, writing to a log device or file can clog up storage, wear down flash storage, and inadvertently leave sensitive user information around. Formatting the output itself is also a thing to consider, even ‘simple’ printf has some with serious gotcha’s (read up on locale-defined behaviour) and information that is better presented as changes over time than a stream of samples will need other tools to be built for post-processing. 

Option 3 involve quite a lot more of work to implement, as the feature should mesh with other user interface properties. When properly done however, it can add quite a lot of power to the software – look no further than the set of debugging tools built into Chrome or Firefox. Of course, it helps that these are also development environment in and of itself to incentive the cost and effort. While often a better option, it still composes poorly with other non-cooperative information collection.

Post-mortem (crash-dump) is a slightly different story and one that calls for a much longer discussion. This is out of scope for this article, though the primitives that emerge will work as both a transport for core dumps and as a way of annotating them, but is a decent follow-up topic.

Enough with the problem space description, where does the display server fit in?

In the most trivial of window systems, a client connects somehow, requests/allocates some kind of storage container (window), draws into it either directly as characters or pixels, or indirectly through drawing commands to the server itself or through an intermediary (like the GPU). In return, the client gets input events, clipboard actions and so on back over the same or related channel. This holds true even for a terminal emulator.

These windows may come with some kind of type model attached; indirectly through the combination of properties set (parent window, draw order) and directly through some other type tag (X11, for instance, has a long list of popups, tooltips, menus, dnd, and so on), and arcan has a really (too much actually) long one.

Step 1 – Add a debug type

This allows other client agnostic tools to enumerate windows of this type, compose them together and record/stream/share. Controls for activation are there, as well as a data channel that is high bandwidth, capable of cleanly multiplex multiple data types.

Step 2 – Two-sided allocation

Now for the more complicated step. Recall the problem of saying when debugging features are needed (command-line or environment). The ideal approach would be initiated at user request during run time, with zero cost when not in use.

This is more complicated to achieve as it taps into the capabilities of the IPC system and its API design. A simplified version would be possible in the context of X11 as a notification message about a debug window being requested, then let the client allocate  a window with the right type. This still leaves us with some of the drawbacks of the third option, namely what to do with uncooperative clients, which will be covered in the next section.

Now the first part of the video is explainable – The heart-emoji button in the title-bar simply sends a request to the client that a debug window is wanted. The client in question (the terminal emulator that comes with Arcan) responds by creating a new window, then renders relevant parts of the emulator state machine.

Chain of Trust

After the trace- like outputs of getting debug data out of a client, the more complicated matter comes with proper debug interfaces that also provides process control, resource modification and so on; interfaces that do not rely on the client actually cooperating.

Assuming we have a process identifier to the client in question (also a problem, see the last category, identity, for that). Lets try and attach a debugger. The normal way for that is firing up an IDE or terminal, with something to the effect of:

# gdb -p SWEETPIDOFMINE ptrace

Only to be promptly met with something like:

Operation not permitted

Low level debugging interfaces tend to be ‘problematic’ in every conceivable way.  If you are not convinced, read the horror show of a manpage that belongs to ptrace if you have not done so already; Search around a little bit and count the number of vulnerabilities it has had a leading role in. Study the role of J-Tag in embedded system exploitation. My point is that these things are not designed as much as they spring in to life as a intermittent necessity that evolves into a permanent liability. Originally, the /proc (PDF, Processes as Files) filesystem was proposed as a better design. One can safely say that it did not turn out that way.

So what is at fault in the example above? 

To lessen the damage, and to make malware authors work for it a little bit, Linux-land has YAMA (and other, equally intimidating mechanisms) which imposes certain rules on who gets to attach as a tracer depending on what is to be traced and when. You can turn it off universally – a universally bad idea – by poking around in procfs.

You can also use the ‘I don’t have time for this’ workaround of prefixing ‘gdb -p’ with the battering ram that is sudo. My regular stance on anything sudo, suid, polkit etc. is that it is an indicator of poor design somewhere or everywhere. Friend’s don’t let friends sudo. From power-on to regular use, there should not ever be a need or even an implementation for converting a lower privilege context to a higher one. Any privilege manipulation a client should have access to is reducing towards the least amount of privileges. You should, of course, have means to place yourself where you (or your organization) like in the chain of trust and have the controls to reduce from there – but I digress.

The problem with the YAMA design from the ptrace perspective is that you are practically left with no other choice. Given that your lower privilege client (gdb) now gets escalated to a much higher one, then attached to parse  super complex data   from a process that you by the very action indicate that you don’t comprehend or trust is a fantastically poor idea.

So what to do about the situation? Well there are other rules to modify ptrace relationships. Normally, a parent is allowed to trace its child – and that is how gdb attaches to begin with, but that does not work in the attach case.

Enter the subtleties of prctl, and now we come into the Fantastic (debug) Voyage of Isaac Asimov-fame part of the adventure.

From the last section we got the mechanisms for negotiating a debug control channel initiated from the display server. Now the extension is a bit more complicated, and this is one place where arcan-shmif has the upper hand. Instead of just sending an event saying ‘could you please create a window of type X’, we have the ability to force a window upon a client.

The shorthand form of what happened in the demo was roughly (pseudo-Lua):

local buf = alloc_buffer(width, height)
target_alloc(client_handle, buf, event_handler, "debug")

This actually creates a full IPC channel, and sends it to the client over the existing one, with type already locked to DEBUG.

This is a server-side initiated signal of user-intent. The same mechanism is used to request a screen-reader friendly view for accessibility options, and it is used for initiating screenshots, video streams, … (the later angel is covered in the article on interfacing with a stream deck device). 

The client gets it as part of its normal event loop (see also: the handover part in the trayicon article). Pseudo-code wise, the important bit is:

if segment_type == "debug" or does_not_match(request):
debug_wnd = arcan_shmif_acquire(...)

But what happens if the client does not actually map it? 

Well, if you look at this part of the video, the second window is populated from within the client process, but without cooperation from the client.

The IPC library code knows if the client mapped an incoming window or not. If it didn’t, the library takes it upon itself to provide an implementation. The current one is a bit bare, but there is a lot of fun potential in this one – pending a much needed primitive for ‘suspend all threads except me’. This is quite close to a ‘grey area set’ of techniques known as process parasites

Now, we can use this to spin off a debugger, though it is neither clean nor easy.

Prctl has a property called ‘PR_SET_PTRACER’. This allows a process to specify another process that will be allowed to attach to it via ptrace. The naive approach here would be to fork out, set the tracer to the pid returned from fork(). It also would not work, for multiple reasons.

One is that gdb then lacks a way to draw, and distinguishes from stdin/stdout being TTYs or not. Luckily enough we have a terminal emulator that can inherit an arcan-shmif debug window and use as its display. 

So the hidden debug interface uses the debug window to request another debug window that gets inherited into the terminal and used to set up the drawing and emulation for gdb to work.

The experienced POSIX connoisseur will see the chicken and egg problem here; the process to be debugged needs to wait until it knows the PID of the debugger process, in order to set the permitted tracer, and the debugger needs the PID of the target process in order to attach.

So the current solution is:

  1. Create two pipe pairs [parent to debugger, debuffer to parent] and inherit the appropriate ends.
  2. Have the terminal emulator stop before exec()ing the child process, and write back the ‘to become debugger’ PID back over the pipe. Block on read.
  3. (In parent/debug target) receive the PID, set prtctl, write the own process PID back.
  4. The child process received the trace-target PID, adds it to the arguments and continues exec() into the debugger.
  5. Profit.

The clip below shows it in action:

If the tool to launch this way does not come from a controlled context because it, in turn, spawns new processes where some subprocess needs to perform the attach action it becomes even more masochistic. In that case, one would need to repeat this dance by ptracing the chain of children until a child attempts to call ptrace, and then perform the setup.

Now we have communication channels and bootstrapping ptrace-tools, retaining chain of trust and negotiated over the display server, initiated by the user. The tools have access to APIs that can break free of the terminal emulator shackles. Good building blocks for more advanced instruments.

Process & Identity

Procfs can be used to explore how the operating system perceives a specific process, which resources are currently allocated and so on. In order to do that, some identifier is needed. So the trick question to start this off, is how do you figure out the process identifier of a process tied to an X window?

A quick search around and we get this (stack overflow suggestion):

xprop _NET_WM_PID | cut -d' ' -f3

The problem is just that this does not really work. The atom, _NET_WM_PID is based on the client being nice enough to provide it, and nice enough to provide the real one and not just some random nonsense, like the pid of the X server itself – fun basic anti-debugging.

Even if the process ID is retrieved is truthful and correct, it might not be the case when you start reading from its proc entries – it is inherently race:y.

In modern times, this problem does not get easier when we take containers and other para-virtualization techniques into account where the view of the outer system and its resources from the perspective of a process is vastly different from other processes.

On the other hand, if the code collecting the data runs from within the target process itself, the /proc/self directory can be relied on. There are a number of really interesting venues to pursue; look at the debug-if implementation in the code-base for some of those ideas, or ping me if you are interested in chatting about these things.

For the remainder of this article, we will settle in the bits implemented thus far, which brings us to the last part of this video.

The bits showcased here is that we can open and modify environment variables, as well as enumerate the currently open file descriptors – and even make a copy if their contents from the current state. The HUD menu that appears in the outer WM is  server-side file-picking, and the hex-editor that appears is one of the standard arcan-tui widgets, bufferwnd.

What is not showcased is that we can spawn out a shell from the context of this process for normal interactive exploration, and redirect/intercept/man-in-the middle intercept pipes and sockets (with live editing through the hex editor).

Rounding things off, two powerful venues around the corner here is that:

  1.  When combined with dynamic network transparency/redirection, we have an easy to use and lightning fast way of setting up multiple attached troubleshooting tools and ‘one-click’ share them with someone else.
  2. Since these primitives can nest by ‘handing over’ new window primitives, we get ways of building dynamic hierarchies reusing the carrier for multiple types of data – think in the context of a whole-system virtualization like Qemu. All layers [Qemu itself], [Guest OS shell], [Application inside guest].

If you use Arcan, that is 🙂

Posted in Uncategorized | Leave a comment

Interfacing with a ‘Stream Deck’ Device

Continuing the series on using the various Arcan APIs, we get to another use case that works a bit differently here than elsewhere. What makes it interesting enough for a post is how the low and high levels fit together for a device class that challenges normal boundaries on what are valid input and output devices.

Before we start, here are some references to the previous articles in the series:


The lucky device in question will be an “ElGato Stream Deck“, which is basically a low resolution budget display with an overlay of translucent buttons and a thin input grid layer that captures button presses.

Operating on the ‘mechanism, not policy’ principle, the tools we will be writing are not strictly bound to this kind of hardware. You can run it inside a Raspberry Pi with a touch display and use arcan-net to run the client remotely and you get quite the nifty visual remote controller. Why not bind it to a “display-pad” secondary small display that some recent laptops replace the trackpad with, or use the output as a model texture in safespaces to remove the need for most of the keybindings.

Here is a video of the results of this walkthrough in action, mainly showing the produced output from a short script being mapped to the display:

Demo of the arcan-streamdeck driver connected to a sample application

The (low level) repository can be found here:

There is also a high level tool that has been added to Durden(streamdeck) for these kinds of displays. It uses the same low level code, communication and setup as we will go through here. The rest of this tool is a bit more complicated, but for good reason; it can handle multiple devices of different display sizes and cell dimensions; it exposes dynamic WM state and live contents preview; titlebar decoration buttons; client announced icon bindings; and a ton of other features. It can basically be used as a very competent accessibility tool.

This video shows that tool in more real action:

Demo of the ‘streamdeck’ tool in Durden connected to the ‘arcan-streamdeck’ driver

Note how the buttons relayout based on the window that gets selected. The titlebar decoration buttons gets moved to the display, and if the client that is tied to the selected window announces any custom key inputs, those are added to the list as well. Window contents and workspace previews also gets real-time mapped to the display and can be used for window selection.

We will skip the build system and USB/reversing specific part, but basically each button is updated by sending a chunked raw bitmap image with a header that specifies button index and chunk, and inputs are received as reports with a byte per designated key.

For the Arcan side, we need both the low-level code that maps to the device, and a high-level script that defines the client behaviour. In this article, we will jump between the two as needed. The low-level parts will be prefixed with “Client-side” and the high-level with “Server-side”.

(Client-side) We start with the normal connection dance:

struct arcan_shmif_cont C = arcan_shmif_open(
hid_device* deck_device = hid_open(0x0fd9, 0x0060, NULL);

Note the SEGID_ENCODER type. This tells the server side that the client wants to receive audio/video data rather than to provide it. Just building and running this would yield complaints that hey, it couldn’t connect anywhere. Even if the ARCAN_CONNPATH environment variable would point to a valid connection point, it is very likely that the listener on the other end rejects attempts at creating an encoder.

To remedy this, we need to specify the high-level reaction. Normally, we can write or modify an Arcan appl specifically for this, but this time we will use another nifty engine feature, and that is ‘hook scripts’.

Hook scripts are generic scripts that can be user-enabled, regardless of the appl:

arcan -H "hooks/streamdeck.lua" /path/to/my/appl

This would prompt the engine to first start the application as normal, but when it has finished executing the entry point, the hook script will be loaded. These scripts are located in the ‘system-scripts’ namespace, defaulting to /usr/share/arcan/scripts and redirectable via the ARCAN_SCRIPTPATH environment variable.

(Server-side) The skeleton of our hook script (stremdeck.lua) will start like this:

local function open_streamdeck_cp(name)
    local maxw = 72 * 5
    local maxh = 72 * 3
    target_alloc(name, maxw, maxh,
        function(source, status)
            if status.kind == "terminated" then


This reads as:

  1. Open a connection point called streamdeck, tied to a (72 * 5), (72 * 3)’ buffer
  2. When that connection terminates, re-open the connection point

Had the “open_streamdeck_cp” call been moved to the state where status.kind == “connected”, multiple clients would be allowed to hook up and receive data over this connection point. This time, we enforce a single active client at a time – but the trick of forcing manual reactivation of a connection point makes it trivial to rate-limit connections and prevent clients from causing stalls or staging denial-of-service like attacks.

The code above still behaves like ‘normal’ clients, even though this client wants to be fed video. Running everything and stitching together like this:

ARCAN_SCRIPTPATH=. arcan -H "streamdeck.lua" console &
git clone
cd arcan-devices/streamdeck ; meson build ; cd build ; ninja
ARCAN_CONNPATH=streamdeck ./arcan-streamdeck

would still yield nothing of interest. We need to tell the engine what to forward to the client, and how. No external client is permitted to have a working primary segment acting as an encoder for security (confidentiality) reasons, as there is no visual identity for a user to make an informed judgement about what to forward to any specific client. The exception has traditionally been the ‘encode’ frameserver, as that is spawned from the server side where the chain of trust would remain intact. Even then, it is the server side (through the window manager) that specifies what an encoder gets to see.

The normal way of allowing a client to receive audio/video is through a new subsegment established over the primary. Scripting wise, this is initiated server-side through calling define_recordtarget on an identifier referencing a client segment; client side receives a new segment event and maps it by calling arcan_shmif_acquire.

There is an exemption to this, which we will leverage here. The internals of ‘define_recordtarget‘ is that an offscreen rendertarget (a so called ‘FBO’) is created, with an automated or scripted refresh trigger. The contents can be forwarded to an internal or external recipient whenever there is an update. With the recently added ‘rendertarget_bind‘ function, we can swap out the recipient of a rendertarget with the handle of our connected client.

(Server-side) First, we hook up a little check to see that the connected client is of the right type.

if status.kind == "registered" then
    if status.segkind ~= "encoder" then
    build_rendertarget(source, maxw, maxh) -- to implement

(Server-side) Then we implement this ‘build_rendertarget’ function:

local function build_rendertarget(source, w, h)
    local buf = alloc_surface(w, h)
    local img = color_surface(w, h, 0, 255, 0)
    blend_image(img, 1.0, 100)
    define_rendertarget(buf, {img})
    rendertarget_bind(buf, source)
    link_image(buf, source)

This should be quite straight forward; first we allocate a buffer that will fit our offscreen rendering. Then we allocate a single coloured image that we fade in over the course of (100 / 25 = 4) seconds. Then we create the rendertarget with this image attached. Lastly comes the magic that makes the source act as a recipient of the rendertarget contents, as well as a life-cycle trick that makes sure buf gets deleted automatically when source is destroyed.

Now time for the client side steps. We add a normal event loop:

    struct arcan_event ev;
    if (arcan_shmif_poll(&C, &ev) > 0){
        if (ev.category != EVENT_TARGET)
        if (ev.tgt.kind == TARGET_COMMAND_EXIT)
        if (ev.tgt.kind == TARGET_COMMAND_STEPFRAME)
            consume_video(&C, deck_device);

Whenever a STEPFRAME event is received, the video buffer pointed to inside of the arcan_shmif_cont structure has been updated, and will be held in that state until released through arcan_shmif_signal(&C, SHMIF_SIGVID);

The code in the repository is more complicated as it handles repacking into checksummed tiles and sending those that have changed to the USB device. Reprinting that here would just add noise, so a simple loop implementation for consume_video would be:

for (size_t y = 0; y < C->h; y++)
  for (size_t x = 0; x < C->w; x++){
    uint8_t r, g, b, a;
    SHMIF_RGBA_DECOMP(C->vidp[y * C->pitch + x], &r, &g, &b, &a);
/* do something with r, g, b, a */
arcan_shmif_signal(C, SHMIF_SIGVID);

Video covered, time for the input routine. The astute reader might have noticed that the shmif poll loop would always spin and never block. This is because we will use libusb for a crude form of blocking here.

Patching the loop from before, we get this:

int old_mask = 0;
  uint8_t inbuf[64];    
  ssize_t nr = hid_read_timeout(deck_device,
                                inbuf, sizeof inbuf, 64);
  if (-1 == nr)

  int mask = get_mask(inbuf, nr);
  if (old_mask != mask){
      update_mask(C, old_mask, mask);
      old_mask = mask;

We will get a single report in inbuf, so just enumerate the number of bytes read, and map each to a bit in the mask (get_mask). The last Arcan piece will be in update_mask:

static void update_mask(struct arcan_shmif_cont* C,
                        int mask,
                        int changed)
    int ind;
    struct arcan_event ev = {
        .category = EVENT_IO,
        .io = {
            .devkind = EVENT_IDEVKIND_GAMEDEV,
            .datatype = EVENT_IDATATYPE_DIGITAL

    while((ind = ffs(changed)){ = ind; = !!(changed & (1 << (ind-1)));
        changed = changed & ~(1 << (ind - 1));
        arcan_shmif_enqueue(C, &ev);

What happens above is simply that we enumerate the bit positions that have changed since last time, translate its position to a numeric 0-index, and mark it as pressed or released. The io substructure in the arcan_event structure is quite complex as it contains some legacy left-overs, and encompasses a really wide range of devices. The Device-kind ‘GAMEDEV’ used here is the catch-all that does not cover other established input device types (keyboards, mice, touch panels, eye trackers, …).

For the last step, we should extend the input handler in the Lua script as well. If the callback set when issuing target_alloc is changed to contain this:

function(source, status, input)
    if status.kind == "input" and _G[APPLID .. "_input"] then
        key = translate_input(input)
        if key then
            _G[APPLID .. "_input"](key)
-- code from previous sections goes here

The missing translate_input function should filter out unwanted input events so that the client does not get free reign to inject whatever it wants into the main input handler of the running appl. Recall that we are writing a hook-script that can be forced into any appl.

The implementation of translate_input then:

local function translate_input(input)
    if not then
    local keyind = 97 + (input.subid % 25)
    return {
        kind = "digital",
        digital = true,
        devid = 0,
        subid = input.subid,
        translated = true,
        number = keyind,
        keysym = keyind,
        active =,
        utf8 = and string.char(keyind) or ""

Would have the input be translated to act as a keyboard input. The input table is quite complex, as shown in the documentation for the related target_input function that forwards input to a client.

This brings us to a point where the device is functional. Before ending this, let us do two minor adjustments – one technical and one cosmetic. Adding another state to the event handler in the target_alloc handler:

if status.kind == "connected" then
     target_flags(source, TARGET_BLOCKADOPT)   

This one is subtle – but should the engine reset the Lua VM due to a scripting error, or as a user trigger from switching appl, the frameserver created through target_alloc would be re-exposed to the scripts through a possible _adopt event handler. Setting the flag above will cause the source to be destroyed rather than adopted, preventing this from causing the client to ‘appear’ as a normal one in that edge condition.

Lastly, and for fun since just a green screen is not very interesting, we go a bit old school demo scene through this that adds a plasma as well as showing how GPU programs are set up and used:

local frag = [[
varying vec2 texco;
uniform int timestamp;
uniform float fract_timestamp;
float f1(float x, float y, float t){
    return sin(sqrt(10.0*(x*x+y*y)+0.05)+t)+0.5);
float f2(float x, float y, float t){
    return sin(5.0*sin(t*0.5)+y*cos(t*0.3)+t)+0.5;
float f3(float x, float y, float t){
    return sin(3.0*x+t)+0.5;

const float pi = 3.1415926;
void main()
    float time = float(timestamp) + fract_timestamp;
    float sin_ht = sin(t*0.5)+0.5;
    float cx = texco.s*4.0;
    float cy = texco.t*2.0;
    float v = f1(cx, cy, t);
    v += f2(cx, cy, t);
    v += f3(cx*sin_ht, cy*sin_ht, t);
    float r = sin(t);
    float g = cos(t+v*pi);
    float b = sin(t+v*pi);
    gl_FragColor = vec4(r,g,b,1.0);

local shid = build_shader(nil, frag, "plasma")

-- and at the end of build_rendertarget:
image_shader(img, shid);

Posted in Uncategorized | Leave a comment

Another low-level Arcan client: A tray icon handler

This is the third part in the ongoing series exploring the different levels of APIs that are available for Arcan clients. For reference, the previous parts in the series are here:


In this part, we will write a really nifty little tool that will act as a tray icon (and more). When clicked or triggered via a custom input binding or a global hotkey, the tool will launch a second application that attach as a “popup” to the tray icon, with the icon representing whether the application is alive or not. This video shows it running inside of the Durden desktop, first wrapping a terminal emulator, then an X server running windowmaker (tradition by now).

The API features it covers are as follows:

  1. Registering custom key bindings.
  2. Negotiating a connection on behalf of a third party.
  3. Spawning a new process that inherits a negotiated connection.
  4. Server-side triggered coarse-grained timers.

The tool itself turned out useful enough to be added inside the tool collection in the main Arcan repository, so you can find it here:

A neat thing is that this clean and small tool can actually cover a number of use-cases without any modifications to the ‘real’ client, staying entirely in line with the ‘do one thing and well’ philosophy.

From an outline perspective, this is what our tool will do:

  1. Connect and identify as an ‘ICON’.
  2. Render a SVG into the segment from 1.
  3. On activation (click or otherwise), request a ‘HANDOVER’ subsegment.
  4. Inherit this handover connection into a new process and executable.
  5. Keep track of the child and update the icon state accordingly.

Getting the icon (1, 2)

The structure of the code itself will be similar to that of the previous article, so we will use that as a reference and describe the relevant modifications.The first change for this is easy, switch out the MEDIA part of the connection call to an ICON:

struct arg_arr* args;
struct arcan_shmif_cont conn =
arcan_shmif_open(SEGID_ICON, SHMIF_ACQUIRE_FATALFAIL, &args);

As a recap, the primary segment type is a strong hint to the window manager about what kind of a connection this is. This type has some impact on how the connection will be synchronised and prioritised, what decorations will be assigned, which kinds of subsegments will be allowed to be allocated and so on. The rule of thumb is that type + connection-point is the main selector for behaviour.

Capture the initial data like before, this time around we care about the initial size, upper dimensions and output density as we need to follow them:

struct arcan_shmif_initial* cfg;
arcan_shmif_initial(&conn, &cfg);

struct trayicon_state state = {
.density = cfg->density * 0.393700787f,
.bin = argv[3],
.argv = &argv[4],
.client_pid = -1,
.envv = environ,
.icon_normal = argv[1],
.icon_pressed = argv[2]

Only thing of note is that the SVG renderer we will use is stuck in the dark ages and uses the archaic ‘DPI’ for density, so convert to metric. Rendering SVG is out of scope, so we pull that from a 3rd party: nanosvg. The actual drawing and signalling is not different than what was done in the last article, so little point in reiterating that here. This link takes you to the lines in question.

Using the naive event loop from last time would cause some problems , so we will modify it slightly:

static void event_loop(
struct arcan_shmif_cont* C, struct trayicon_state* trayicon)

struct arcan_event ev;
while(arcan_shmif_wait(C, &ev)){
do_event(C, &ev, trayicon);

while(arcan_shmif_poll(C, &ev) > 0)
do_event(C, &ev, trayicon);

if (trayicon->dirty){
arcan_shmif_signal(C, SHMIF_SIGVID);

The innermost while loop is new. For this specific application, it is ok that the first wait is blocking as the program is entirely event driven. Responding to some of the events can be expensive however, particularly those events that would redraw the icon state. To be more responsive to ‘event storms’ where a stall somewhere else in the system could cause multiple events that would counteract each-other, we flush them out and only update after that is done.

Intermission: Durden tool, “external buttons”.

Features like a tray requires cooperation with the window manager. The easiest way to achieve this is through an addressing scheme we call ‘connection points’. Review the previous articles in the series if you do not know about this concept already.

For Durden, there is an explicit script that maintains the connections for external statusbar button, reroutes input and deals with positioning, user configuration and so on. This link points to the source of that tool.

In short, it exposes ways of creating connection points that wait for an icon to connect, and maps actions and data on this connection into the statusbar. It also adds strong restrictions on what this icon connection can do, what other resources it can try and negotiate, and how input gets routed to it.

As with anything else in Durden, the settings and controls for enabling this feature is exposed as a filesystem (one that can be mounted over FUSE). The specific path that needs to be enabled for this demo is:


Then the settings are exposed and discoverable in following path:


This allows the destination and behaviour to be highly tuneable and composes well with all other desktop mechanisms, from input macros to timers.

On Activation, Request a Subsegment (3)

Back to our tool. Lets modify the event loop to respond to click events, and to register a custom input binding slot. This labelhint can also carry a suggestion as to the default keysymbol (F1, l, UP, …) + modifier (ctrl, alt, …) though it is, as normal, something for the WM to decide what to do with. As soon as we have the connection from arcan_shmif_open:

arcan_shmif_enqueue(&conn, &(struct arcan_event){
.ext.labelhint = {
.label = "ACTIVATE",
.descr = "tooltip description goes here"

Next in the event loop we will check for any digital event matching our label, or a digital press from a mouse type device.

if (ev->category == EVENT_IO){
if (ev->io.kind != EVENT_IO_BUTTON ||

if (strcmp(ev->io.label, "ACTIVATE") == 0 ||
ev->io.devkind == EVENT_IDEVKIND_MOUSE){
toggle_client(C, trayicon);

In toggle_client, we do the normal ‘check if the child is alive, if so, kill it and spawn a TERM to KILL thread’. It is just verbose and not very interesting. See this link for that code. The more interesting is if we want to spawn a child. First, we do this:

arcan_shmif_enqueue(C, &(struct arcan_event){
.ext.kind = ARCAN_EVENT(SEGREQ),
.ext.segreq.kind = SEGID_HANDOVER

This says ‘hey, I would like to create a new segment on behalf of someone else’. This allows the server side to accept the new client in, but also track origin without resorting to fundamentally broken things like asserting that pid == connection which was the X11 dance with _NET_WM_PID. This is an asynchronous request, so we need to do something in the event loop as well.

Handover to new Process (4)

We are reaching the end of the journey. Time to extend the event loop with something. There are two possible events that can come following a segment request. The boring one is TARGET_COMMAND_REQFAILED, which means that the server side said no. This is also the default if the WM does not explicitly opt-in via a call to target_accept as part of the event handling routine on the server side.

If the request is accepted, the proper resources are allocated and pushed to the client. To help dealing with multiple pending requests and so on, the type can be verified and paired against a client chosen ID, though we do not really care about that here for the time being.

trayicon->client_pid =
arcan_shmif_handover_exec(C, *ev,
trayicon->bin, trayicon->argv, trayicon->envv, false);

if (-1 != trayicon->client_pid){
render_to(C, trayicon);

It is worthwhile to note that NEWSEGMENT events can arrive at any time. If they do not match a previous request, it means that the WM force pushed a resource as a feature probe and signalling intent. This will become important in an upcoming article.

If we do not handle this event, the internals of the client side library will drop the resources on the next call into any of the arcan shmif functions that is not directly related to accepting the request. Here we use a specific derivative of the exec- family, arcan_shmif_handover_exec. Internally speaking, it sets up a new process but duplicates the control socket and mapping descriptor, passes it as an environment so the new client will find it on arcan_shmif_open instead of using the connection path mechanism.

Track the Child (5)

The last piece of the puzzle is to keep track of our new pid. Now this pose something of a problem with the event loop we have. The proper options tend to be to fire up a monitoring thread that waitpid()s or a signal handler, then forward that back to the event loop where we, instead of the normal wait / poll I/O multiplexes on the raw descriptors. That is a lot of ugly boring code. For this round, we take something simpler.

arcan_shmif_enqueue(&cont, &(struct arcan_event){
.ext.clock.rate = 5

This will periodically emit STEPFRAME events at a rate of one event every 5 ticks on a 25Hz clock (chosen for my PALs). This is a mechanism that can be used for getting frame delivery notifications, server- initiated variable frame pacing mechanism and for client requested low frequency timers. Here we will just use it to check if the child is still alive and flip the icon if the state changed.

In Closing

If you are curious as to how the client side of the tray icon work in Xorg, look no further than the “System Tray Protocol Specification“. A wrapper similar to this one would of course be possible to make within good old’ X, but in a quite more painful and convoluted way ,with much less security, safety and accountability towards the user. Recall, even the smallest of ‘icon’ in X is no less capable than a full blown application, and this lack of nuance is is kind of the problem.

The main point of this article was mainly to show off some of the basic features, primarily the HANDOVER execution, as it is something we will use later for much more advanced things. The secondary point was to illustrate the balance of power and control, with clients still able to achieve traditional edge case features, but not without the consent and constraints set by the user, and that can only be a good thing.

Posted in Uncategorized | Leave a comment

Writing a low-level Arcan Client

This is a follow up article to the higher level writing a kmscon/console replacement. In this article, we will instead use the low-level C API to write a simple client.

To recap, there are 3 APIs for writing clients:

  1. Advanced/Low- level: “shmif”
  2. Mid- level: TUI (text-dominant, basic-multimedia)
  3. High-level: ALT (interactive/advanced multimedia, think ‘app’)

We will cover each of these in turn over several articles, starting with the first one, shmif.

SHMIF or “SHared Memory InterFace” is intended for extending the engine with features like new input drivers; specialised tools that are supposed to extend the window manager you are running; for tight multi-process integration with the engine scene graph and all of its features. The wayland protocol implementation is built on this API, as are things like the libretro support, the QEmu and Xorg backends and so on. In short: it is not for every day. Technically speaking, it is comparable to xlib/xcb/wayland-client, but with quite a few more features, and a lot more comfortable to actually use.

As with the WM article, we will use the ‘console’ git at: to store the code.


For this we will need a C compiler like gcc or clang, and a build system that can deal with pkgconfig. This time around we will use the meson build system.

Create a file and an empty main.c file in a folder. Define the meson contents as:

project('demo', 'c', default_options : ['c_std=c11'])
shmif = dependency('arcan-shmif')
executable('demo', 'main.c', dependencies : shmif)

Note the standard being set to C11. Arcan-shmif makes quite heavy use of C11 features and its more advanced memory model, going back to C99 or even older is not an option.

Making a Connection

Open the empty main.c file and add:

#include <arcan_shmif.h>
int main(int argc, char** argv)

An unorthodox detail here is that the shmif headers, by default, pulls in the other headers that it needs from the standard library. The rationale being that it is just a frustrating and annoying time waste remembering all the <stdatomic.h>, <stdint.h>, <inttypes.h> etc. used. There is an ifdef to get rid of that behaviour should one be so religiously inclined.

struct arg_arr* args;
struct arcan_shmif_cont conn =

This is the simplified version of opening a connection, there is also a shmif_open_ext function that allows you to specify a UUID (for settings persistence), application title, current identity and other metadata.

Three things of note here. The first is SEGID_MEDIA, which is the lose type that we identify as. This is primarily a guide for the WM to decide a suitable ‘policy’ in terms of graphics post processing, scheduling and so on.

The second is SHMIF_ACQUIRE_FATALFAIL. There are a number of connection control flags that regulate connection looping, crash recovery, guard threads and other advanced features. The FATALFAIL one here says to simply exit if a connection could not be made.

The third thing is the args parameter (which could be NULL). This act as a way of letting the user or server pass arguments to the application without interfering with the standard mandated argv.

We can also use a slightly more verbose setup:

struct arg_arr* args;
struct arcan_shmif_cont conn = arcan_shmif_open_ext(
  SHMIF_ACQUIRE_FATALFAIL, &args, (struct shmif_open_ext){
.type = SEGID_MEDIA, .title = "demo"
}, sizeof struct shmif_open_ext

This version allows more information to be provided on connection startup, particularly static / fixed connection title, and dynamic content identity, ident, along with a 128-bit GUID that can be used to help the server side maintain settings persistence across connections.

Inside shmif_cont

This is a technical explanation that can be skipped. If you are not interested, jump forward to the section called ‘Signalling Data’.

The basis of shmif is as the name implies, shared memory. When you make a connection, you get a ‘primary segment’, each connection has one. Destroy it (arcan_shmif_drop) and all other resources are terminated.

The following sketch shows how such a segment is organised:

The context (shmif-cont) is the developer facing structure that contains a local copy of current negotiated properties, like dimensions and buffer sizes. It also has pointers that map to the currently active buffers. Some operations, like arcan_shmif_signal and arcan_shmif_resize may modify this structure and renegotiate the contents of the shared memory.

There is a control socket, a semaphore for audio, video and events that are used to synchronise with the server side. These are primarily there for optimisation purposes, descriptor passing and I/O multiplexing and could be omitted in favour of better memory- based OS primitives (futex + named GPU buffers) where available.

The shared memory is split into two regions, one fixed- size and one variable- sized. The fixed sized region contains negotiation metadata on renegotation, transfer direction (server to client or client to server) as well as a ring buffer for input events, and one for output events. With this setup, events can be routed from their source zero-copy, as well as allowing both sides to have visibility into queue saturation, allowing optimisations like reordering and merging.

The variable-sized region can change on renegotiation (arcan_shmif_resize), where the typically pattern is a change to ‘aproto’ and ‘audio’ after connection setup, then video changes dynamically in response to user actions. Thus for most cases, this can be cheaply extended with truncate+remap.

Aproto contains various more advanced substructures, and are only used for very specialised targets like VR an HDR metadata. Audio works either with multiple-small buffers that the server side initialises to the ideal size of the playback device for latency, or as a big slush buffer for non-latency sensitive streaming audio. Lastly, the video buffer is something we will get into later in the ‘Data Signalling’ section.

Getting Parameters

This is a side-track that can be omitted in our short example, but it is still relevant to bring up. If you recall the ‘writing a WM’ article, there was this weird connection state marked as ‘preroll’, where the WM gets a synchronous step of telling the client all the things it should need to know in order to produce a frame that is immediately useful. This cuts down on tons of connection negotiation verbiage, and greatly reduces the risk of WM policy immediately invalidating the first buffers being produced.

By adding:

struct arcan_shmif_initial* cfg;
arcan_shmif_initial(&conn, &cfg);

We get access to any and all such data that the window manager provided, or some build time default or fallback. This includes properties like:

  • accelerated GPU device (resource-handle)
  • preferred font(s) (resource-handle) and respective size
  • preferred output dimensions, density, colour channel orientation
  • audio samplerate
  • language, locale and colour scheme preferences

These are soft hints. The client has no obligation to use them, though it would be foolish not to. The related resources will be automatically freed on the next call into shmif.

All of these properties can be changed dynamically as part of the normal event loop, and it is possible for the client to forego the preroll- stage with a flag to the _open call, and thus gain a few ms connection setup time. For this example, since we are not drawing internationalised shaped text, composing UI elements or playing back audio, we can safely ignore all this.

Data Signalling

Time to send some data. In this example we will just draw a gradient that moves to easily see frame, present size and updates. The outer loop will look like this:

size_t ts = 0;
draw_frame(&conn, ts++);
arcan_shmif_signal(&conn, SHMIF_SIGVID);

This will block until the server side has acknowledged the transfer and released the client to allow it to produce another frame.

Then the drawing routine something like this:

void draw_frame(struct arcan_shmif_cont* C, size_t ts)
float ss = 255.0 / C->w;
float st = 255.0 / C->h;
for (size_t y = 0; y < C->h; y++){
for (size_t x = 0; x < C->w; x++){
uint8_t r = ts + ss * x;
uint8_t g = ts + st * y;
size_t pos = y * C->pitch + x;
C->vidp[pos] = SHMIF_RGBA(r, g, 0, 0xff);

The more eye catching thing about this whole affair should be the shmif_pixel vidp[pos] = SHMIF_RGBA part. In the same way computers have a native endian and so on, this display system has a native compile-time defined pixel format (and audio sample format for that matter). While it can be changed for specialised builds, the vast majority of the time this will be some permutation of 32-bit RGBA, and we statically commit to one and use a packing macro to help clients. The reason is simply to have a safe, guaranteed, default, and leave the more compact- or more complex- formats to advanced transfer mechanisms, like GPU opaque buffer handles.

Now there is easily a ten page essay hidden in unpacking the cases where the render-loop above would be tolerable, and where you would want something more refined, but that is best left for another time; the real subject matter there, synchronisation, is the single most complex system graphics topic around – one that even trumps the subjects of colour spaces and mixed capability output compositing.

Other available options over the same interfaces include:

  • Switching to a GPU accelerated context and pass opaque handles
  • Toggle alpha channel and linear-/ non-linear RGB colour space
  • Dirty-rectangles updates
  • Request upcoming deadline
  • Set a desired presentation time
  • Enable poll()-able event-, timer- or ‘no-synch’ triggered transfers
  • Switch to mail-box n-buffers transfers

This would also work with SHMIF_SIGAUD for audio transfers, and both operations could be going on in its own thread with another being responsible for managing the event loop, which is next on the list.

Event Loop

So far our communication has been quite one-sided, how about we replace the frame counter with something event driven, such as when the user presses a certain key, we advance the timer and submit a frame.

Lets get rid of the for (;;) entirely and move to an event loop.

static void event_loop(struct arcan_shmif_cont* C)
struct arcan_event ev;
size_t step = 0;

while(arcan_shmif_wait(C, &ev)){
bool dirty = false;

if (ev.category == EVENT_IO)
dirty |= handle_input(C, &ev);
else if (ev.category == EVENT_TARGET)
dirty |= handle_target(C, &ev);
if (!dirty)

draw_frame(conn, step++);
arcan_shmif_signal(conn, SIGVID);

This will loop until the parent decides to close the connection. As mentioned before, the event model for engine internals and shmif are actually shared, even though most of the categories are masked out for IPC purposes. The three relevant ones are IO (input devices), TARGET (server->client) and EXTERNAL (client -> server).

We will stub handle target for now:

static bool handle_target(
struct arcan_shmif_cont* C, struct arcan_event* ev)
return false;

While for input, we set the return to indicate that a new frame should be generated.

static bool handle_input(
  struct arcan_shmif_cont* C, struct arcan_event* ev)
return (
ev->io.kind == EVENT_IO_BUTTON &&
ev->io.devkind == EVENT_IDEVKIND_KEYBOARD &&
ev->io.datatype == EVENT_IDATATYPE_TRANSLATED &&
ev->io.input.translated.keysym == 108 &&

The IO event structure itself is a big lady that has evolved since practically the start of this project back in ~2003. It encompasses game devices, generic sensors, keyboards, mice, touch screens and eye trackers so far.

Here we are simply concerned if a key is pressed on any keyboard, which has a button like behaviour and translation tables, and if that key resolves to the ‘L’ key (108). That number is actually defined in a static table, inside arcan_tuisym.h and in the builtin/keyboard.lua script.

Just as with synchronisation, there is a lot more that could, and should, be elaborated on regarding problems with this kind of event processing, but small incremental steps is the recipe here.

Renegotiation (Resize)

Time to handle at least one TARGET event, and that is DISPLAYHINT. This event correlates to the WM calling the target_displayhint() function, providing updates about on how the segment will be displayed. This carries density, dimensions, visibility and so on.

So in the previously stubbed handle_target function:

static bool handle_target(
struct arcan_shmif_cont* C, struct arcan_event* ev)
size_t w = ev->tgt.ioevs[0].uiv;
size_t h = ev->tgt.ioevs[1].uiv;
if (w && h){
arcan_shmif_resize(C, w, h);
return true;
return false;

This structure has a more “syscall”- like approach with a small number of primitive data type slots described in the header close to where each event itself is defined. This deviates from both _IO and _EXTERNAL which has actual C union/struct style fields. The reality of that design is mostly legacy, and one that could be adjusted to union- like overloads, but it is low on the list of priorities.

That aside, there is never any direct harm done in not reacting to these events, the server side always assumes minimal capabilities as client default behaviour.

For the event handler here, we simply sanity-check that we get an updated width/height in the DISPLAYHINT, and forward it to the resize operation on the connection. While most operations are asynchronous across the API barrier, resize is one of the exceptions. Partly because it is the most costly and cross-cutting action there is, and partly because so much data / state can change meaning across a resize that it is a reasonable synchronous barrier to have.

Do note that the resize action here does not trigger a resized event in the perspective of the WM. As long as the client stays within allocation tolerances, the synchronous resize is just between the engine’s view of memory mappings and that of the client. The event is actually generated when the newly resized buffer is signalled, and that is true if running across a network as well.

In Closing

This is a good place to stop until next time. If we combine this with the ‘Console’ WM from the last article, we are already at a stage where most of the old school “I want a framebuffer” style applications can be implemented. When we return to this particular API in the series later, we can start to explore the large space of things to do from here, like:

  • Spawning new windows
  • Defining a custom icon, mouse cursor, titlebar
  • Switching to 3D rendered or accelerated handle passing mode
  • Handling state block transfers
  • Registering custom key bindings, key/value config persistence
  • Announcing format encoding/decoding support for universal file picking / browser integration
  • Playing back audio

But none of these require anything fundamentally different from what has been covered, merely extending the local event loop handler and emitting a few events.

Posted in Uncategorized | Leave a comment