Time for another fairly beefy Arcan release. For those thinking in semantic versioning still surprised at the change-set versus version number here (‘just a minor update?’) do recall that as per the roadmap, disregarding the optimistic timeline, we work with release-rarely-build-yourself thematic versions until that fated 1.0 where we switch to semantic and release-often.
On that scale, 0.5.x was the display server role, 0.6.x is focused on the networking layer as the main feature and development driver. 0.7.x will be improving audio and some missing compatibility/3D/VR bits. Then it gets substantially easier/faster – 0.8.x will be optimization and performance. 0.9.x will be security — hardening attack surface, verification of protections, continuous fuzzing infrastructure and so on.
After the FreeNode IRC kerfuffle, do note that the oldtimers IRC channel has moved to Libera and whatever remains of the old freenode channel is to be assumed malicious. For some of the younger folks we have added a Discord server pending more manageable and sane alternatives – pick your poison(s).
The detailed changelog can be found here, and the release highlights are as follows:
- Client support (X11, Wayland, …)
- Text-UI API
- Native Graphics
- Lua API
- Build and Ports
For the main engine there has been quite some refactoring to reduce input latency; better accommodate variable-refresh rate display; prepare for asymmetric uncooperative multi-GPU and GPU handover; explicit synchronisation and runtime transitions back and forth between low (16-bit) to standard (32-bit) to high-definition rendering (10-bit + fp16/fp32).
The biggest task has been to add support for “direct-to-drain” in the event queue management. This allows for more event producers (e.g. input drivers) to get a short path for minimising input latency. It also allows our scheduler (“conductor”) to reorder event delivery to better accommodate a WM specified focus target.
The major challenge has been that some operations are unsafe to perform when a GPU is busy drawing and scanning out to displays. Misuse can cause subtle visual glitches or stutter in inputs that are fiendishly hard to catch and diagnose. This was a big ticket item as part of the synchronization topic.
Since the Window Manager needs to be able to be first in line to filter- translate- remap- resample- mask- and otherwise respond- to input events — quite some care had to be taken to extend the API to make sure that it remains ergonomic enough to respond to input events in both a safe mode and a low latency one.
With this also comes the ability to allow selected clients to get a higher direct-to-drain queue dispatch so that we can have more efficient hot pluggable external input drivers, something that would, for example, benefit our VR service.
As mentioned before, the main focus of this development branch is still the networking layer, and our related SSH/VNC/RDP/X11 replacement A12 network protocol, which is a priority for some of our sponsors.
While much of this work is not directly visible, some of the bigger changes have been to drop any and all traces of the DEFLATE support and go all in on ZSTD for binary transfers, TPACK and lossless image modes.
Connection-modes: The ‘last’ missing connection mode has mostly been sorted, though there are some nuances still to work out, particularly for serving complex clients like Xarcan.
To elaborate: since the tools (not the protocol or library) were built to mimic the use of X11 style forwarding, the ‘normal’ default push model worked like this:
"sink-inbound": arcan-net -l 6680 "source-outbound": ARCAN_CONNPATH=a12://some.ip:6680 my_client
This does not work for all cases and some find it rather non-intuitive to use. In some scenarios you might want to ‘serve’ an application instead:
"source-inbound": arcan-net -l 6680 -exec /usr/bin/afsrv_terminal "sink-outbound" arcan-net some.ip 6680 (or arcan-net keyid@)
While simpler, this form has the downside that ‘crash recovery’ won’t work the same, as the listening end don’t know where to redirect ‘some_arcan_client’ here. For the other form, the keyid@ (see keystore below) can have multiple redundancies should one host lost connectivity or fail to respond.
Keystore: A basic keystore has been added for managing cryptographic key material and identities, and while the tooling is lacking, it should be sufficient for testing and non-sensitive use.
The following example shows how to setup one up, using a one-time password for authenticating the public keys, with the ‘push’ form of connectivity:
"source-outbound": export ARCAN_STATEPATH=/path/to/store arcan-net keystore myhost 192.168.0.2 6680 echo 'somepass' | arcan-net -a 1 -s out myhost@ & ARCAN_CONNPATH=out afsrv_terminal "sink-inbound": echo 'somepass' | arcan-net -a 1 -l 6680
One difference from ssh “type yes to be subject to man in the middle” is that a shared secret is used the first time (-a n == read or prompt secret from stdin and accept the first n public keys that had a valid shared secret). Another one is that the host tag for outbound connections can have multiple possible endpoints by adding more hosts to the same keyid. This ties into crash recovery as that also covers loss of connectivity.
The new default is that if there is no working keystore, arcan-net will refuse to run. You have to opt out of stronger authentication in one or two forms:
1: cat mypw | arcan-net -a --soft-auth -l 6680 2: arcan-net --soft-auth -l 6680
With the first form using a password stored in ‘mypw’, while the second will just appear protected (asymmetric primitives generated on the fly and assumed trusted) if you look at the network traffic (i.e. requires active MiM still), anyone is still allowed to connect.
For anything more serious the suggestion would still be to tunnel over Wireguard. I personally have no problems running a12 over untrusted networks for certain tasks — but my threat model is not your threat model and all that.
Backpressure: Changes in both protocol and implementation has been made to better track the current video frame back-pressure as part of congestion control. This is a rather involved topic that will need several iterations to find good parameter sets to switch between depending on the contents — media playback that doesn’t depend on user interaction to produce input can go with deeper buffers, while games would prefer shallow ones, with depth based on latency and bandwidth estimation.
The upside is that improvements here also translate to the local non-networked case — every layer is involved in getting timing better.
What is next for the networking stack is to find an optimal way of leveraging the keystore to answer the question of “which of the devices that know and have a previously established relationship to are currently reachable, at home or overseas” in a privacy preserving way (i.e. service discovery).
Other big ticket items is to make the keystore friendlier for setup and authentication of unknown public keys, expand back-pressure management with dynamic video codec bitrate selection and pre-compressed video passthrough so that a client that exposes a window with a h264 stream first tries to forward that intact, and only recompress if the network conditions are more strict.
XArcan, Wayland and other transports
While not strictly part of this ‘version’ as it is a standalone client and separate repository, our Xorg fork “Xarcan” now handles GPU transitions more gracefully and has received basic clipboard integration as well as accelerated cursor support.
The last updates to this should be pushed and tagged within the coming weeks, but worth mentioning nonetheless, and much more work is planned for this one as we continue to move further away from using Wayland as a client compatibility solution — even if that means writing custom Qt and Chrome-ozone platform plugins.
One thing on the compatibility workbench that, albeit experimental still, is that it is now possible to use X Window Managers to manage and decorate Arcan windows — while still not granting default access to their contents or input routing. Thus for those who are stuck in old ways when it comes to configuration, keyboards and window management, there is a migration path close to being useable.
A longer article on how/why this works, and how it can also be used to move Xorg forward, is being written and should be published shortly.
The Wayland implementation in the arcan-wayland service have also seen some updates, with the main ones being repackaging wl-shm (shared memory) buffers to forward as dma-buf when possible (keeping the GPU transfer cost outside of the composition process), and forwarding pulseaudio sockets into the temporary directory used with -exec and -exec-x11 single client use.
Our “curses/vt100 replacement” client API switches to server-side text rendering only. Cell attributes can now operate in indexed-colour mode where the server-defined colour scheme gets resolved and applied automatically.
API calls for hinting about window contents for server-side scroll controls are now exposed.
The related Lua bindings have been updated and are closer to complete, and now quite usable – with mostly the widgets left that are missing a Lua friendly API.
The NeoVIM frontend have been updated accordingly and now also allows buffer transfers. This is useful in Pipeworld pipelines for building mixed graphical/text/tui processing pipelines and using VIM as the editor for REPL like workflows.
There is also experimental integration with the Kakoune editor that can be found here thanks to Cipharius.
Our “single-purpose” designated special clients for engine core offloading and security compartmentation of sensitive tasks has also seen some new capabilities:
Decode: Added support for proto=img with a similar codebase and sandboxing setup as the aloadimage tool uses. It can also be run in a daemon like mode where the main engine can offload image decoding and vector image re-rasterisation (for low<->high DPI transitions).
The UVC based decode video capture stage has also received controls for toggling tracking dirty regions when running multi-buffered, so if you are using UVC based capture devices such as the ElGato Camlink 4k with stable low frequency changing sources (HDMI or good quality security cameras) fewer wasteful frames should propagate.
Terminal: the terminal emulator received support for dynamically switching colour schemes and server-side defined colour preferences. It can now resize to fit to contents when kept alive after the shell exits. It also received a pty-less ‘piped’ mode for having multiple instances displaying stdin/stdout contents as part of inspectable and individually controllable pipelines.
This is heavily used in Pipeworld as seen here:
The low level graphics platform used when arcan acts as the native / primary display server has seen a number of smaller fixes to how modifiers are setup and propagated. More importantly though:
Front buffer text-surfaces: When the WM requests that a source with a TUI (TPACK) is to be mapped as the source to a single display, it will switch to single-front buffer rendering. This means that our terminal emulator and similar simple TUI clients can be run with close to the lowest possible latency and CPU cost.
With the pending updates to the default included ‘console’ window manager (see: Writing a console replacement using Arcan ) it is shaping up as a competent graphics-capable getty/fbcon/kmscon replacement.
FBO/EGLScreen scanout transitions: We can now also dynamically switch between the preferred ‘FBO’ direct scanout and the much maligned EGLScreen based recomposition scanout dynamically.
The reblit-to-EGL stage is still the safe default, as there are some really quirky multi-rotated-display resolution switching corner cases to work out still.
Low/SDR/Deep/HDR (+-VRR) transitions: The transitions back and forth between these were a big and fairly complicated part of the HDR/Color Processing work and are now working for the low/sdr/deep cases. There are low level dragons to slay here before the last stretch for working HDR composition, somewhat being blocked by my rather pricey DisplayHDR1000 monitor crashing outright on amdgpu when enabled, while on my intel GPUs the driver crashes instead. Fun times.
On the scripting side, there is now an optional “input_raw” entry point for indicating support for out-of-bound/direct-to-drain input event processing to reflect the changes to core input routing mentioned before.
Non-blocking I/O: The open_nonblock set of functions now has a “data_handler” function for setting a rising-edge event handler to make it cleaner to use than the non-blocking periodic polling approach. Similarly, the write call has added automatic queueing and a completion event. The scheduler will try to favour I/O operations while the platform is busy rendering/scanning out with GPUs in a locked state.
Platform control functions: The video_displaymodes() function has received an overloaded form for switching between desired sink colour depth (Low, Normal, Deep, HDR) as well as variable refresh toggles. Similarly, the map_video_display() function now also supports mapping several layers for hardware platforms with hardware controlled basic composition.
For platforms with low level input translation tables on keyboards (e.g. evdev), the input_remap_translation() function has been added to control/hint/swap desired LMVO (layout/model/variant/option) being applied to one or many keyboards.
Staggered client launch: The state control functions resume_target/suspend_target controls can now be applied to clients that are still locked in the initial preroll/open. This allows for staggering client releases on event storms — or when you want to have more precise control over client environment stage, e.g. remap stdin/stdout/stderr to descriptors delivered over the shmif connection. Pipeworld, for example, uses this to implement its pipe(cmd_1, cmd_2, …) command where the entire pipeline is first built and executed, then activated in one go.
Build / Packaging / OS Specific
BSDs: We should now be usable on all the major BSDs (Open, Free, Net, Dragonfly) thanks to Jan Beich, Zrj, Antonio, Leonid and others.
Also worthwhile to note is that the OpenBSD performance regression have been solved for 6.9, but it seems 7.0 brought other timing issues. It turned out to be tied to the VRR conductor refactoring making assumptions about kernel tick granularity.
This is now dynamically probed and reverted to a display-clocked scheduling tactic if the OS Kernel has a low scheduler tick configured (the default in OpenBSD being 100Hz still). While not recommended for prolonged use, the “-W processing” synchronisation strategy mitigates sluggishness at the expense of CPU.
A new ‘arcan-dbgcapture’ tool has been added. This acts as a tui- wrapper for core dumps, and is best used as a core_pattern handler on Linux. Here you can see it being used in this way inside Pipeworld: