This article is is the main course to the appetiser that was The X Network Transparency Myth (2018). In it, we will go through how the pieces in the Arcan ecosystem tie together to advance the idea of network transparency for the desktop and how it sets the stage for a fully networked desktop.
Some of the points worth recalling from the X article are:
- ‘transparency’ is evaluated from the perspective of the user; it is not even desirable for the underlying layers to be written so that they operate the same for local rendering as they would across a network. The local-optimal case is necessarily different from the remote one, the mechanisms are not the same and the differences will keep on growing organically with the advancement of hardware and display/rendering techniques.
- side-band protocols splitting up the desktop into multiple IPC systems for audio, meta, fonts, … increases the difficulty to succeed with anything close to a transparent experience, as the network layer needs to take all of these into consideration as well as trying to synchronise them.
To add a little to the first argument: it should also not be transparent to the window manager as some actions have drastically different impact on the user interface side to security and expectations. For example, Clipboard/DND locally is not (supposed to be) a complicated thing. When applied across a network, however, such things can degrade the experience for anything else. Other examples is that you want to block some sensitive inputs from being accidentally forwarded to a networked window and so on, it has happened in the past that the wrong sudo password has, indeed, been sent to the wrong ssh session.
This target has been worked on for a long time, as suggested by this part from the old demo from 2012/2013. Already back then the drag/slice to compose-transform-and-share case exposed out of compositor sharing and streaming; something that only now is appearing elsewhere in a comparably limited form.
We are on the third or fourth re-implementation of the idea, and the first one that is considered having a good enough of a design to commit to using and building upon. There are many fascinating nuances to this problem that only appear when you ‘try to go to 11’.
As per usual, parts of this post will be quite verbose and technical. Here are some shortcuts to jump around so that you don’t lose interest from details that seem irrelevant to you.
- Basic primitives: Arcan-net, A12 and SHMIF
- Example Usecases
- Protocol State and Development
- Demo Explained
Starting with some short clips of the development progress – and then work through the tools and design needed to make this happen. It might be short, but there is a whole world of nuance and detail to it.
(~early 2019) – forced compression, OSX viewer, (bad) audio:
Composited Xarcan (desktop to pinephone), compression based on window type:
Here is a native arcan client with crypto, local GPU “hot-unplug” to software rendering handover and compression negotiation (h264):
Here is ‘server-side’ text rendering of text-only windows, font, style and size controlled by presenting device — client migrates back when window is closed:
In the videos, you can see (if you squint) instances of live migration between display servers over a network, with a few twists. For example, the decorations, input mapping, font preferences and other visuals change to match the machine that the client is currently presenting on and that audio also comes along, because Arcan does multimedia, not only video.
What is less visible is that the change in border colour, a security feature in Durden, is used to signify that the window comes from a networked source, a property that can also be used to filter sensitive actions. The neo-vim window in the video even goes so far as to have its text surfaces rendered server side, as its UI driver is written using our terminal-protocol liberated TUI API. This is also why the font changes; it is the device you present on that defines visuals and input response, not the device you run the program on.
Also note how the clients “jumps” back when the window is closed on the remote side; this is one of the many payoffs from having a systemic mindset when it comes to ‘crash resilience‘ – the IPC system itself is designed in such a way that necessary state can be reconstructed and dynamic state is tracked and renegotiated when needed. The effect is that a client is forcefully detached from the current display server with the instruction of switching to another. The keystore (while a work in progress) allows you to define the conditions for when and how it jumps to which machines and picks keys accordingly.
That dynamic state is tracked and can be renegotiated as a ‘reset’ matters on the client level as well, the basic set of guaranteed features when a client opens a local connection roughly generalises between all imaginable window management styles. Those that are dynamically (re-) negotiated cannot be relied upon. So when a client is migrated to a user that has say, accessibility needs, or is in a VR environment, the appropriate extras gets added when the client connects there, and then removed when it moves somewhere else. This is an essential primitive for network transparency as a collaboration feature.
Basic Primitives: Arcan-net, SHMIF and A12
There are three building blocks in play here, a tool called arcan-net which combines the two others: A12 and SHMIF.
A12 is a ‘work in progress’ protocol – it’s not the X12 that some people called for, but it’s “a” twelve. It strives to be remote optimal – compression tactics based on connectivity, content type and context of use, deferred (presentation side) rendering with data-native representation when possible (pixel buffers as a last resort, not the default); support caching of common states such as fonts; handle cancellation of normally ‘atomic’ operations such as clipboard cut and paste and so on.
SHMIF is the IPC system and API used to work with most other parts of Arcan. It is designed to be locally optimal: shared memory and system ABI in lock free ring-buffers preferred over socket/pipe pack/unpack transfers; minimal sustained set of system calls needed (for least-privilege sandboxing); resource allocations on a strict regimen (DoS prevention and exploit mitigation); fixed based set of necessary capabilities and user-controlled opt-in for higher level ones.
SHMIF has a large number of features that were specifically picked for correcting the wrongs done to X- like network transparency by the gradual introduction of side-bands and good old fashioned negligence. Part of this is that all necessary and sufficient data exchange used to compose a desktop goes over the same IPC system — one that is free of unnecessary Linuxisms to boot. While it would hurt a bit and take some effort, there are few stops for packing our bags and going someplace else, heck it used to run on Windows and still works on OSX. Rumour has it there are iOS and Android versions hidden away somewhere.
Contrast this with other setups where you need a large weave of IPC systems to get the same job done; Wayland for video and some input and some metadata; PulseAudio for audio; PipeWire for some video and some audio; D-Bus for some metadata and controls; D-Conf for some other metadata; Spice/RFB(VNC)/RDP for composited desktop sharing; Waypipe for partial Wayland sharing, X11 for partial X / XWayland sharing: SSH+VT***+Terminal emulator for CLI/TUI and less unsafe Waypipe / X11 transport; Synergy for mouse and keyboard and clipboard and so on. Each of these with their own take (or lack thereof) on authentication and synchronization, implementing many of the most difficult tasks again and again in incompatible ways yet still end up with features missing and exponentially more lines of code when compared to the solution here.
Back to Arcan-net. It exposes an a12 server and an a12 client, as well as acting as a shmif server, a shmif client and taking care of managing authentication keys. In that sense it behaves like any old network proxy. While not going too far into the practical details, showing off some of the setup might help.
On the active display server side:
email@example.com# arcan-net -l 31337
This will listen for incoming connections on the marked port, and map them to the currently active local connection point. To dive further into the connection point concept, either read the comparison between Arcan vs Xorg or simply think ‘Desktop UI address’; The WM exports named connection points and assigns different policies based on that.
On the client side we can have the complex-persistent option that forwards new clients as they come:
arcan-net -s netdemo 18.104.22.168 31337
ARCAN_CONNPATH=netdemo one_arcan_client &
ARCAN_CONNPATH=netdemo another_arcan_client &
Or the one-time simpler version which forks/exec arcan-net and inherits the connection primitive needed to setup a SHMIF connection:
Or, and this is important for understanding the demo, an api function through the WM:
This triggers the SHMIF implementation tied to the window of a client to disconnect from the current display server connection, connect to a remote one through arcan-net, then tell the application part of the client to rebuild essential state as the previous connection has, in a sense, ‘crashed’. The same mechanism is then used to define a fallback (‘should the connection be lost, go here instead’). This is the self-healing aspect of proper resilience.
There are WM APIs for all the possible network sharing scenarios so it can be handled as user interfaces without any command line work.
I mentioned ‘authentication’ before, where is that happening? So this is another part of the implementation that is still settling. We are reasonably comfortable with the cryptography engineering, which is currently undergoing external/independent review.
For those versed in that peculiar part of the world, the much condensed form is currently AE/EtM, ChaCha8 for stream cipher, BLAKE3 for KDF and HMAC, x25519 for asymmetric key exchange (optionally with initial MinimaLT like ephemeral exchange) and PAKE-like (possibly changing to IZKP) n-unknown-initial PK authentication (rather than PKI), with periodic rekeying.
For great security (and justice): As demoed, moving applications to present on other devices is a neat thing for getting more use out of computers that are otherwise cast aside when you have a big beefy workstation; put those SBCs to use, security compartment per device and happily ignore most of the endless stream of speculative attacks instead of allowing mitigations to bring processing power back a decade.
Take that clusterboard of SOPINE modules, boot them into a ramdisk with a single-use browser and let that be your ‘tab’, pull the reset pin when done and that very expensive chrome- or less expensive firefox- exploit chain will neither be able to grab many interesting tokens nor gain meaningful persistence or lateral movement. Hook it up to the same collection/triaging system used for the fuzzers you have running and pluck those sweet exploit chains out of the air ready for
re-use responsible disclosure.
Take those privacy eroding apps and adtech drivel that managed to weasel their way into a position of societal necessity and run them ‘per device’ but far away from your person; feed their sensory inputs with sweet, GPT-3-esque plausibly modelled nonsense, tainting whatever machine-learning trend that feeds on this particular perversion until they are forced to pay cost in order to discern truth from tales.
For great mobility: Ideally, the use and experience should be seamless enough that you would not need to care where the client itself is currently running – that is to continue down the path of Sun Ray thin clients towards thin applications as a counterpoint to the mental- and technical dead end that is the modern web.
Applications that are capable of serialising their seed- and delta- states and responds to the corresponding SHMIF events for RESTORE/STORE and RESET can first remote render as the safe default, while in the background synchronise state with the viewing device (should the same or compatible software exist there) and switch over to device-local execution when done.
Using the ‘headless’ arcan binary like this:
ARCAN_VIDEO_ENCODE=protocol=a12:port=666 arcan_headless my_wm
Then have an Arcan instance provide ‘your real composited desktop’ running ‘in the cloud’. Let the applications and their state live there while travelling across unsafe spaces, and redirect the relevant ones to your device when it is safe and necessary to do so.
For great collaboration:
Think of an ‘window’ as a view into an application and its state as a living document that can both be shared and transferred between machines and between people. What one would call ‘clipboard cut and paste’ or ‘drag and drop’ is not far from ‘file sharing’ if you adjust the strength of your reading glasses a little bit. There are pitfalls, but that is our job to smooth over.
With decorations, fonts, semantic colour palette, input bindings and so on being defined by the device a client is presenting on rather than the settings of the local client through some god-forsaken config service or font specification, differences in aesthetics, accessibility and workflow can be accounted for rather than accentuated.
Take that ‘window’ and throw it to your colleagues, they can pull out what state they need or add to it and return it back to you when finished. Splice your webcam and direct it to your friend and there is your video conferencing without any greedy middle-men.
For great composability:
The arcan-net setup, as shown before, covers remote presentation/interaction with SHMIF clients. This includes exotic external input drivers and sensors, enabling advanced takes on “virtual KMS” workflows that applications like Synergy provides.
Take one of those spare tablets gathering dust and use it as a remote viewer/display – or forward otherwise useless binary blobbed special input devices (Tobii Eyetrackers come into mind) from the confines of a VM.
Take those statusbar- applications, run them on your compute cluster or home automation setup and forward to your viewing device for SCADA- like monitoring and control.
Personally, my fuzzing clusters ‘phone home’ by forwarding wrapper and control clients whenever a node finds something to my desktop. By leveraging the display server to improve debugging over the same communication channels I get my crash inspections over without juggling symbols, network file systems and what not around. Redirect the interesting ones to the specialist on call for such purposes.
The range of applications that various combinations of these tools enable is daunting. Your entire desktop, with live application state, can literally be made out of distributed components that migrate between your devices – drag and drop:able.
The reference implementation is in heavy/daily use, but there are a number of things to sort out before we would dream to push- or put- it to use for sensitive tasks. If you want to run your terror-cell or other shady business with the tools, go ahead. Anything or anyone more honest and decent – stay away until told that it is reasonably safe.
All the infrastructure around this will be actively developed for a long time, and there is a fair list of coming interesting and advanced things as the focus for this work. Track the README.md for changes.
Some of the highlights from my perspective:
- Safety features (side channel analysis resistance, transitive trust) – machine learning is listening in and interactive web is no different; the field is ripe with tools that can reconstruct much of plaintext from rather minor measurements without having to attack the cryptography engineering or primitives themselves.
- Spliced interactive/non-interactive subviews – for group collaboration.
- Compressed video passthrough negotiation – to ensure that no pack/unpack stage for already compressed sources like video persist.
- ALT/AGP packing (Arcan rendering command buffers) – to stream both mid-level graphics and its virtual-GPU like backend as draw primitives when no better representation is available.
- Latency/Performance work – Better domain specific carrier protocols (UDT and the likes), progressive compression for certain client types, buffer backpressure mitigations strategies. Network state deadline estimation for better client animations.
But more on those at a later time.
Having, hopefully, wet your appetite – lets close this one by explaining what went on in the demo.
As you might have seen in the example command lines at the top, or in the previous articles, “connection points” is a key primitive here. They allow the window manager (WM) and the user to define ‘UI addresses’ so clients know where to go, and the WM knows which policy to use to regulate the IPC mechanisms provided.
Any ‘segment’ (group of audio/video/event-io roughly corresponding to a window) has a primary connection point, and a fallback one. Should the primary connection point fail, the client will try to reconnect to a mutable fallback, and then rebuild itself there. This allows clients to move between display server instances, something that covers elaborate sharing, more safe/robust multi-GPU support etc.
This fallback is provided through the WM via a call to target_devicehint. It comes in two main flavours, “soft” (use only on a severed connection), and “hard” (leave now or die).
In the videos, the Durden WM is used. To make a long story short, it has an overwhelming amount of features (600+ unique paths last I counted), structured as a virtual filesystem (even mountable). In this filesystem, the specific path “/target/share/migrate=connpoint” tells the currently selected window to “hard” migrate to a connection point. In this case it is the special flavour of a12://my_other_machine that indirectly fires up arcan-net to do most of the dirty work.
With the stacking workspace layout mode, there is a feature called “cursor regions” – parts of the workspace that trigger various file system paths depending on if the mouse is moving, dragging or clicking.
The way things are setup here then, is that:
- When a window is dragged into the left edge area of the screen, draw a portal graphic.
- Default the contents of the portal to be noise, spawn an instance of the ‘remoting’ frameserver connecting to the host IP of the left laptop (why you can see parts of the remote wallpaper in one of the videos).
- If connected, show the contents of this remoting connection.
- If the window is dropped over the portal, send the /target/share/migrate command, pointing to the remote server in question.
- If the drag action leaves the region, kill the remoting connection and hide the portal.
Onwards and upwards towards more interesting things. /B