This time around, the changes are big enough across the board that the sub-projects will get individual posts instead of being clumped together, and that will become a recurring theme as the progress cadence becomes less and less interlocked.
We also have a sister blog at www.divergent-desktop.org that will slowly cover higher level design philosophy, rants and reasoning behind some of what is being done here. A few observant ones have pieced together the puzzle — but most have not.
This release is a thematic shift from low level graphics plumbing to the network transparency related code. We will still make and accept patches, changes and features to the lower video layers, of course — ‘Moby Blit’ is still out there — but focus will be elsewhere. Hopefully this will be one of the last time these massive releases make sense, and we can tick on a (bi-)monthly basis for a while.
As such, it is dedicated to the memory of a fellow explorer and dear friend, Morten, whose insights have been instrumental to the work here and elsewhere. A taste of the depths of such contributions are well illustrated by the ‘Farewell’ post over at Mamedev. Thanks for everything buddy :'(.
A big thanks to Ryan Gordon (and those who chipped in) for the Icculus Microgrant.
Release Time! The following video goes through the demonstrable highlights:
Here are the shortcuts to some of the key advancements:
- Network Transparency
- On-Demand Client Debugging
- Built-ins and Chained Hook Scripts
- KMSCon/FBCon like Window Manager
- XWayland Client Isolation and other Wayland Bits
- Eye Tracking, Stream Decks
- Arcan-trayicon and ‘KMSCon’ Console Window Manager
- Server-side Text Rendering and TUI-Widgets
- Synchronisation Conductor
- Platform Improvements: Headless, Watchdog, GPU Swapping
If all of this isn’t enough, there is much more in the Changelog.
This release brings us the first working version of ‘arcan-net’. This is an arcan-shmif server that translates clients running locally to another instance across the network. It has been in use for a while, but don’t trust it (yet) in tougher security compartments than you would VNC/RDB.
Not only that, but it is a lot more powerful than the corresponding form in X. It uses a protocol we are lovingly calling ‘A12’ — It is not the X12 some people wanted, but at least it is a twelve. It was also the last missing piece before being able to safely claim to be far beyond feature parity with Xorg, so expect the comparison article series to get its next instalment soon enough. With that it is also time to start chasing compatibility (client support) and performance over features.
There is so much ground to cover here that there is a separate article on the subject with more detailed coverage: A12 – Advancing Network Transparency on the Desktop
Roughly speaking, the features currently cover:
- Single Client Forwarding (‘X11’-like), Desktop Forwarding (‘RFB/RDP/SPICE’-like)
- Remote Input and Meta-data (‘Synergy’- style)
- Audio, Video and Binary Transfers
- Basic blob- caching
- Lossless and Lossy video compression, defaults based on window types
- Interactive adjustment of compression parameters
- X25519 + Chacha8 + Blake3 for authenticated encryption
- Live-client migration
- Multi-threaded processing
- Crude Privilege separation
This meshes really well with other arcan-shmif features, such as the one on external input drivers mentioned below. The video below shows a single nugget from how ‘transparent’ things can be, live migration, drag and drop:
This is yet another reason as to why ‘crash resilience‘ matters – if you solve the one, you get a lot more interesting things in return.
On-Demand Client Debugging
When the WM initiates window creation, it can now push windows of a debug type. Should the client cooperate, the first time that happens the client will still get a new window to provide its debug data through.
Should the client fail to comply, or the WM explicitly asks it, a debug interface built into the arcan-shmif library will now silently activate, allowing the horror movie scenario of ‘the call is coming from inside the house’ to unfold.
For better understanding the breadth and depth of the feature, please refer to the following article: Leveraging the Display Server to Improve Debugging
KMSCon/FBCon like Console
The default distribution now includes the ‘KMScon- like’ ‘console’ window manager, though it also accepts graphical clients, so if all you need is a “kiosk like” one-application runner per virtual ‘terminal’ (or the template for one), this fits that bill.
The initial form of this was described in the related article on: Writing a console replacement using Arcan.
it has since been extended some, getting support for an OSD keyboard and basic wayland client support as well via the script below. The video below shows it and the keyboard working on a Pinephone:
If you have an unrequited love for minimalism confused as simplicity (minimalism doesn’t have room for love, being simply so minimal), this is the tradeoff path that will get you there, eventually.
Builtins, Hook Script Chaining
There are multiple facilities for encouraging opt-in code reuse between applications and window managers, both for developers and advanced end users. These are provided in the ‘system-scripts’ namespace and consume the folder names ‘builtin’ and ‘hooks’. The idea is to let the heavier subprojects like durden act as incubators, and the implementations gets cleaned up and re-usable as such shared builtin scripts.
This time around, the builtins have been extended with:
- builtin/decorator.lua – for adding borders and mouse controls for border manipulation to a surface
- builtin/wayland.lua – for adding some of the special considerations that wayland clients require
- builtin/osdkbd.lua – for building custom on-screen display keyboards
This means that the WM side of thing for adding Wayland and XWayland support to a project boils down to about 2/3rds of this script that is used for testing. API is stable and supported, has been for a while, will remain as such.
The engine facility for ‘hook scripts’ that are run transparently to the application itself have been improved to allow chaining multiple ones together – greatly improving their utility (actually make them useful).
Previously, these were limited to a single one, which was less than ideal. Now that multiple ones can be chained together, many features that can be implemented in a window manager agnostic way can be shared between all of them.
This is the safer, more secure and much more powerful way to get all the ‘xdotool/xrandr/…’ kind of hacks going without needlessly exposing them over some other IPC.
Some are primarily used for testing, like this auto-shutdown one:
arcan -H hooks/shutdown.lua durden shutdown=200
Regardless of the other scripts running, this will issue a shutdown sequence after 200 logical ticks on a 25Hz timer. Since these scripts can simulate and inject other tricky event paths, like display and input hotplug and playback, they are ideal for a window manager testing corpus as the rest of the system can run headless.
Others are more interesting, such as ‘hooks/external_input.lua’ which opens up dedicated single-client connection points that are allowed to inject input, and only that. This is also used to extend the engine with input drivers that would not be allowed into the main engine due to their dependencies or licenses.
This can even be combined with the network proxy mentioned above to allow, for instance, sharing of input devices from other machines (which would enable Synergy- like virtual KMS workflows) and proprietary drivers running inside of containers.
-- on each desktop participating arcan -H hooks/external_input.lua durden ARCAN_CONNPATH=extio_1 arcan-net -t -l 6680 -- remote machine arcan console ARCAN_CONNPATH=console ARCAN_ARG=host=10.0.0.1 afsrv_remoting
Eye Tracking and Stream Decks
One external driver example is the Tobii 4C eye tracker driver. Even though their hardware is cool, it is still a company that are about as 90ies terrible and backwards with their software ecosystem as can be; shrinkwrap licenses restricting allowed use cases of the sensor data (*facepalm*), closed source drivers, firmware clamped hardware capability and little in terms of serious competition. This is just depressing giving the potential eye tracking actually has outside of vile adtech. There used to be a driver available on their website; it was pulled in a move that can only be assumed to be about further fscking their customers over. A few searches should land you with repositories that mirrored the .debs.
A rough Arcan input driver as part of the repository here, and some docker container to reduce the taint here. Inject the blobs from the driver and experiment with eye tracking inside of Arcan. Other exotic input and LED output clients will also be exposed through that repository rather than the main one.
As covered in the article on ‘Interfacing with a stream deck device, there is also a support tool for ElGato “streamdeck” like devices added that lets you map window contents, ui controls, custom macros etc. unto the device, though the real meat of this features comes with the next release of Durden.
The following clip is from that feature, recorded in an AirBnB somewhere in Shinjuku:
You can see how server-side decorations are being leveraged to map the titlebar contents and controls to the display, mixed with window manager controls like workspace switching with live previews, as well as client-announced custom inputs.
XWayland Client Isolation
We give up. The Wayland protocol service has received support for XWayland clients.
For the last five years or so, the plan was to only support native Wayland clients. After all this time, it turns out that the X11 backends for various toolkits are still infinitely more reliable and robust through X than through native Wayland.
As with anything else here, we treat it, and the obligatory meta-window manager every Wayland Compositor has to have in order to work around the hackish way XWayland uses Wayland, as a separate process for the privsep space to be as tight as possible.
The best way of using this is for the individual X clients that you might need to run; browsers and games being the main suspects.
arcan-wayland -exec-x11 chromium
This will spawn an instance of the wayland protocol server side, which spawns the virtual window manager, which inherits a clipboard window and spawns an Xwayland instance along with chromium.
This means that each client gets their own infrastructure, working directory and so on. It consumes more memory and sometimes has a noticeable startup time (most of that being MESA), but enables much stronger separation between clients.
If that is undesired, it can, of course, be run as a normal service:
This would give you a global Wayland display that also provides XWayland on demand, and clients that expect to spawn multiple processes that connect at different times, to work.
This does not mean that Xarcan is defunct or will become be abandoned as the goals are quite different. In fact, Xarcan has recently been synched to changes in upstream Xorg. The Wayland path is currently to get the transparent rootless one, while Xarcan behaves more like Xephyr; ‘Virtual Machine’ like controls where you provide your own window manager and so on. Both modes have justified use cases.
That said, it is likely that rootless window support will be added to Xarcan as well to free legacy client support from the hands of Wayland entirely, as I am, quite frankly, so thoroughly disappointed in- and tired of- coding anything Wayland to the point that I’d much rather mess with Xorg internals than read more chapters from this scattered in the wind, half-java RMI love letter, half hand MS-COM+ anti-patterns handbook. Had the hours spent debugging this been billable, I’d buy a Tesla then return it for the money.
Speaking more of this second coming of TERMCAP – this round of “protocol” pokémon added: xdg_decoration, kwin_decoration, xdg_output, zwp_dma_buf, and probably some others that I forgot. This seems to now be enough to run the winner of ‘what the actual f are you doing?’ clients, Firefox – until a nightly update breaks it again, that is.
Although it is not a very visible feature, it is still very important in its own right. In preparation for proper VFR/VRR (variable frame-rate/refresh rate), better biased client processing (two sides of the same coin when you factor in network transparency), a lot of the old synchronisation code has been consolidated and reworked into a ‘conductor’. The conductor exposes a uniform set of synchronisation strategies for the policy layer (WM) to switch between.
The strategies themselves will need a lot of tuning before they provide the best contextual balance and trade-off for gaming (minimise latency), processing (maximise throughput), visual fidelity (filters, colour processing, animation smoothness) and minimising energy consumption (mobility, cloud hosting). There is no ‘one size solution fits all’ here, and these are problems that in their own rights would consume a master level thesis, or three, on the subject.
It might help to think of it as analogue to the role of the heart in the human body. A single beat has far reaching consequences in a complex system. The ideal rate depends on what the body is supposed to do; running a marathon is different from a sprint and different from bedrest – yet all need to be supported. If you are academically inclined and actually want to dip into the fundamentals, finish a course or two on queue theory then contact me.
To this effect, there are API additions on both the high and the low end. For the WM side, we have added the ability to set a synchronisation focus target that gets preferential treatment so that when mapped to a variable- output and deadlines are closing in, we can try and bias the way it receives input and how the frames gets synchronised compared to other clients that are fighting for the same resources using differently staged releases for ‘the current focus’ and ‘the herd’ to avoid a twisted take on the sleeping barber problem.
On the low end, we can now continuously forward information about the next upcoming deadline on a per client basis, as well as provide different deadlines based on active strategy, segment type, window management state and client state (networked or local for instance).
With this feature we have also added much needed tracing features for most of our layers so that it is possible to on-demand collect a minimal overhead description of the sequence of events and their timings. There are so may combinations of asynchronous, threaded and non-blocking operations here across so many layers that measuring and troubleshooting anything performance related pretty much demands it. If in doubt when it comes to tracing and viewers, disregard any suggestions about webcrap like Google im-‘Perfetto’, and give https://github.com/wolfpld/tracy the love it deserves, it is a beautiful piece of work.
New Tool: Arcan-trayicon
The list of tools has also been expanded with a convenience wrapper / launcher utility, ‘arcan-trayicon’ which lets you register two SVG icons (inactive and active) into some UI element in a supporting window manager. When clicked, the wrapped application is launched and attached as a popup to this icon. This combined with some bits from the next section should be all that is needed to wrap system services and network interface tools with a UI yet not bring any of that D-Bus stench with it.
This tool also got a write up of its own: Another low level arcan-client: A tray icon handler
TUI, Server-side Rendering, Widgets and non-VTx shell
The rich-text oriented client developer facing API, arcan-tui, has undergone serious backend refactoring. It can now render into a specialised text-encoding scheme (TPACK), and a server-side renderer for the same format has been added to Arcan.
This is the first step to drastically cut down on rendering costs and memory consumption as the glyph cache can now be generated and shared between otherwise independent windows of the same type, as well as using accelerated rendering without unnecessarily forcing the TUI clients to have GPU access. For tracking the feature, check the text rendering section in the wiki on TUI.
It also provides the server with a textual representation to work with for other features, such as lossless magnification and screen readers (see Text-To-Speech below).
It has also received more abstract UI widgets so that certain features gets a shared base. Some of these are visible in the debugging video that was shown earlier. This round we added three widgets (out of a planned total of four):
Bufferwnd – take a buffer and provide a read/write hex-/ascii-/unicode- viewer or editor interface to it.
Listwnd – take a list of items, tag separators, submenus, inactive and hidden items, and prevent it as a selectable list. This is a suitable base for more advanced components (tree view etc).
Readline – act as libreadline/linenoise to query the user for input. It still lacks some of the important bells and whistles to fully cover that use-case, and hopefully those holes will be plugged shortly.
The last planned component (on top of more features and quality work to the above) is Linewnd that will provide scrollback buffer like behaviour. This will eventually remove the last remnants of the libtsm ‘screen’ that we used to build things in the past, but are now slowing us down.
The pre-existing ‘screenshot’ copy widget hidden inside every arcan-tui window also got an added ‘annotation mode’ where it is possible to write and highlight parts of the copied surface.
Speaking of arcan-tui, there is also a prototype neovim UI built using it – it is roughly on par with current curses/terminal emulator based neovim, but will soon leverage many more of the features in arcan-tui, and will be used as a requirements ‘specification’ to make sure that we have not missed anything. It can be found in the repository here.
On a related note, the included terminal emulator, afsrv_terminal, has received a special mode (ARCAN_ARG=cli afsrv_terminal) that serves as a basis for terminal emulator liberated command-line shells. The video below shows some of the current state of that shell, running inside a window manager that is well, still part of my big pile of secrets.
Worthwhile to note is that the ‘new’ windows are negotiated by the shell, and based on what kind of a command that is to be executed – sets up the environment accordingly. Wayland routes through arcan-wayland and so on.
Text to Speech and other Decode improvements
To recap, the ‘decode’ frameserver is the special dedicated client that is intended to absorb all dangerous and likely insecure parsing from the main process to a ‘kill on sight, single role’ one that gets aggressively sandboxed.
It is used both for basic decoding and for asynch- live content previews in parts like the durden filesystem browser. It is also used in a ‘plan9 plumber’-like fashion for clients to hand over media presentation (both embedded and stand-alone).
When finished, no parser working on untrusted data should remain in the arcan process, no matter if Arcan is used as the outer display server or as a ‘browser/electron-lite’ like app-engine nested inside itself or as an X/Wayland client.
In this round, we have added support for text-to-speech and chiselled out basic 3D model transfers. The text to speech, combined with the existing OCR feature of the ‘encode’ frameserver, provides WMs like Durden the last needed missing primitives pieces for greatly improved accessibility.
These improvements also have their role in VR, with positional audio and HRTFs (still disabled in upstream builds) – notification sounds, longer text-to-speech passages and so on can, if carefully configured, be much more natural and comfortable than visual ‘widget’ like alerts when you can position the source ‘in the room’.
The way headless mode (no attached displays, ever) operates has been reworked as a separate binary with a slightly different setup. It will now output to an instance of the ‘afsrv_encode’ process, while also using it as input layer.
This allows us to run the whole thing on a server in a container and either record or stream its output, but also to serve a remote desktop or single app host:
ARCAN_VIDEO_ENCODE=protocol=vnc:pass=letmeoutiambeggingforhelp arcan_headless durden
Of course, you can also switch that ugly ‘vnc’ for a12.
Combined with the dynamic part of the network transparency bits, you can have a headless server where your clients normally ‘live’, then connect to it and temporarily redirect individual clients to a machine of your liking.
As part of our ongoing effort in improving systemic resilience (the ability to recover from unforeseen events), the egl-dri (native graphics on Linux/BSDs) privileged separation part that does driver negotiation has received support for two levels of watchdog triggers.
The first will reset the scripting layer, protecting against bad scripts getting stuck in an infinite loop, while the second will trigger crash recovery for the engine process itself – trying to at least recover clients.
To complement this, and continue down the path towards proper non-cooperative, asymmetric multi-GPU support (runtime swapping graphics stack and driver versions and persisting across multiple graphics stacks at the same time, say GBM+KMS/EGLStreams), it is now possible to set a configuration where one GPU acts as a stand-in for another, and have the WM cooperatively trigger a swap.
This requires both client and server to be able to completely tear down and rebuild accelerated graphics use. This also provides an easy path for testing that we do not leak state between rebuilds (define a GPU as a backup for itself and set a timer that randomly resets), which makes the last missing step — transferring individual clients and render jobs between being native to one or many different GPUs at once — approachable.
I’ve been following (and coding) human interfaces since the mid-90s including a very basic Finder replacement way back in the day. Without getting into a huge diatribe I’ll instead meekly ask if you have considered if windows were a mistake. Certainly few people use separate windows for web pages anymore, instead preferring to use tabs. MS Metro doesn’t use them. IOS and Android don’t either while even Mac OS apps are increasingly used as full screen only.
Windows clutter the screen obscuring whatever is behind them necessitating constant window management by the user. They ARE programmatically easier to implement especially when you have multiple processes demanding a frame to draw in. Having every process choose how big a window is means the WM doesn’t have to negotiate this between processes. This is why I fell in love with the idea of OpenDoc back in the day as it negotiated how components could draw in fields within a single document (OD documentation is still available if you look hard enough).
Anyway, for the record I think windows were a mistake. Historically they came into being via a dirty hack which eventually became the “multifinder” without any concern how different apps would interact.
I have every oh so often – the unsatisfying conclusion is the Clintonesque ‘that depends on your definition of what constitutes windowy’. Complex window hierarchies as a generic thing is rarely worth it, 90ies era MDI going the way of the dodo and all that. Even in Wayland xdg-wm-base protocol they settle on one active “toplevel” at a time, but at the same time sort-of reinvents all of the same problems through subsurfaces and accentuating the problem through “layers” and so on.
In the end the input context matters a whole lot – is it a “write once, patch sometimes, read often” form that is our document? is it an interactive rapid back-and-forth with something exploratory? analog, digital or hybrids? The crude precision, visual occlusion and poor haptics of “touch-displays” certainly contribute to the drift towards “one large task at a time”. The larger problem, and that holds for the single-windows application as well, is what I go into at divergent-desktop.org as ‘premature composition’; the client deliberately takes on the role of compositing multiple data sources into the single window that gets forwarded. It is often a one-way transforms that throw away a lot of useful metadata which ultimately limits the kinds of processing graphs we can actually build. That boils down to a control vs freedom thing. The next article, ‘Introducing Pipeworld’ will take a rather different approach to the concept.
That said, ‘Windows’ are a bit down on the list of “problematic areas”. I would, for instance, rank accidental singleton-IPC such as the clipboard, or worse, drag’n’drop higher on my list of grievances.