Sunsetting Cursed Terminal Emulation

This is the final part concluding the long journey on how to migrate away from terminal emulation as the main building block for command-lines, text-dominant shells and user interfaces in general.

First, a few links to previous parts:

On top of those, and the many related demos, we have specialised tangents on accessibility and networking. The only remaining thing we have yet to cover is the programmer’s interface to all of this.

Before slipping into that dark night, I would like to summarise the mental model I use for ‘terminals’:

Terminal emulators emulate a machine that has never existed, a fantasy computer. They are constructed just like other emulators for the weakest of devices; a big old state table and switch (*pc++){case OP_1: implement OP_1(); case OP_2 ...} with a quirk being that the real horrors are hidden elsewhere. Many indulge in expanding this fantasy computer, adding all kinds of crazy expansion boards for faster video modes and co-processors. This is nothing new in itself — we have done that to practically every computer ever.

The tragedy is that it is taken seriously as a default.

Another day at the office is just running a terminal emulator (xterm) inside a terminal emulator (console) so you can use a terminal emulator-translator (ssh) to emulate more terminals (tmux). It is emulation all the way down instead of as a last resort for compatibility.

The other quirk is the flora of instruction sets to emulate and the subset selection heuristics. These are downplayed as ‘just text’ but everything that touches American Standard Code for Information Interchange are raw instructions for this machine. Programs for it are everywhere and most of them transform or expand the code, without being treated with the same scrutiny as one would for other code generators such as compilers.

A related take worth highlighting is this quote:

It feels a little unsettling that what we now use, effectively, as a data encoding format (ASCII) continues to hold – and it’s up to us to handle when writing software, too – things that correspond not to textual data, but TTY key press and serial link events, and that they follow you everywhere, even in places where the are no teletypes, no serial links, and no keys. Like most people who poked at their innards I used to think that terminal emulators are nasty, but for the longest time I thought that’s just because we took the “emulator” part a bit too seriously and didn’t bother to question it because we’re intellectually lazy. Now I realize it’s legitimately hard to question it, because a lot of assumptions are actually baked into the damn data model – terminal emulators aren’t leaky abstractions, they’re quite literally the abstraction being leaked :-D.

x64k@lobste.rs

Computer people have tried to tame this beast before, using abstraction libraries such as TurboVision and Curses (hence the title of this post). One might be tricked into thinking that keeping these interfaces and substituting the implementation with something befitting the real computing environment would be enough. That is where the ‘leaky abstraction’ part meets ‘just text’ pops up like a little demon and plants itself firmly on your shoulder. Since it is so easy to sneak in arbitrary instructions in-band, people have done a lot of that, all over the place — going so far as to publishing cutesy little articles encouraging the malpractice.

Which building blocks do we have to plug the leaks with?

The Arcan project is suspiciously large in scope. Much of it is not needed for this particular problem. There are a few parts to single out, and a simplified build-mode (-DBUILD_PRESET=client) to assist with that. This will produce:

  • libarcan-tui
  • libarcan-shmif
  • libarcan-shmif-server
  • libarcan-a12
  • afsrv_terminal
  • arcan-net

[libarcan-tui] is the meat in this dish. It fits the ‘ncurses’ bin (.. and ‘libreadline’ and other widgets). There are also Lua bindings with some extra conveniences to smooth over other related POSIX details to help escape the clutches of /dev/tty.

Its reference implementation uses [libarcan-shmif] to talk to the desktop, using some variant of [libarcan-shmif-server] for the other half of that equation. One could substitute in [libx11/win32/…] for those, but some features make that quite difficult. For an technical walkthrough how those work, see the recent writeup of A deeper dive into the SHMIF IPC system.

To fill the portability gap across mobile and desktop, we have a (young) separate project, smash, managed by Valts Liepiņš. For the Zig capable, this is a good place to support.

[libarcan-a12] provides the reference implementation for [A12], the wire transport for this across devices. [arcan-net] providing a standalone binary wrapper for common use cases.

[afsrv_terminal] is the last piece in the equation. It is a standalone binary which, at first glance, is a terminal emulator(!) that outputs into [libarcan-tui]. That is needed for compatibility and proof of feature parity. When passed an environment (ARCAN_ARG=cli=lua) it ignores the terminal emulation part; enables the Lua bindings with some added functions useful specifically for shell work; and runs a bootstrap script that loads the real shell. An example of such is the Lash#Cat9 link from before.

Enough babbling. Time for some code.

We start with a low level C ‘hello world’, then repeat it using the Lua bindings:

#include <arcan_tui.h>

static void redraw(struct tui_context *C)
{
  arcan_tui_erase_screen(C, false);
  struct tui_screenattr text =
      arcan_tui_defattr(C, TUI_COL_ALERT);
  arcan_tui_writestr(C, "hello world", text);
}

static void resized(struct tui_context *C,
                    size_t px_w, size_t px_h,
                    size_t cols, size_t rows, void *T)
{
  redraw(C);
}

int main(int argc, char* argv[])
{
  struct tui_cbcfg cbcfg = {
   .resized = resized
  };

  arcan_tui_conn *con =
    arcan_tui_open_display("myname", "hello");

  struct tui_context *tui =
    arcan_tui_setup(con, NULL, &cbcfg, sizeof cbcfg);

  if (!tui)
    return EXIT_FAILURE;

/* draw something immediately */
  redraw(tui);
  arcan_tui_refresh(&tui);

/* main processing loop */
  int inf[] = {STDIN_FILENO};
  while(1){

/* block (-1) until there are events happening on a set of contexts or inbound data on a set of descriptors (inf) */
    struct tui_process_res result =
      arcan_tui_process(&tui, 1, inf, 1, -1);

/* synch any changes imposed by the event handlers */
    arcan_tui_refresh(&tui);
  }

/* shutdown, with no error message */
  arcan_tui_destroy(tui, NULL);

  return EXIT_SUCCESS;
}

There are quite a few nuances to cover here already.

For starters, opening a display matches the pattern used in EGL and elsewhere in using an opaque pointer to allow the implementation to support different outer display systems. It takes an immutable ‘name’ of the application and a mutable ‘identity’ (e.g. the current open document, path or other resource).

With the display, we can acquire a context (window) and set a table of event handlers. The table also lets the implementation know which features you support/need and which ones you do not.

Some of the possible handlers include:

  • Input: Text codepoint, Abstract announced label, Native keys with modifiers
  • Input: Mouse buttons and motion
  • Window Management State: Visibility, size and colourscheme changes
  • Acquiring new window contexts
  • Language and location preferences
  • Clipboard paste
  • State transfer requests
  • File transfer requests
  • Seeking requests (horizontal / vertical scrolling)
Note: Passing the size of the vtable doubles as an additive version identifier. This makes it possible for features to be appended to the table without breaking backwards compatibility.

A tui context represents a single window. You can request additional ones from a set of types, and have ones pushed to you as a user request for alternate representations (mainly accessibility and debugging).

Calls to arcan_tui_refresh will atomically forward changes to the contents of the window. If nothing has changed it will return immediately.

When drawing we have a lot of shaping attributes to apply, from the regular bold/italic/underline/strikethrough to shaping hints, double-width characters, ligature substitutions, border drawing and foreground/background colours.

Note: Colours are a special case. The terminal tradition is ugly - there used to be a small set of reference colours, the emulator remapping them as desired (RED is now YELLOW) and several different extensions to specify explicit red/green/blue values or a wider palette. Here there is a semantic palette (ALERT, LABEL, ...) as well as the legacy one. The values they map to are tracked per window and can be dynamically updated by the outer display system (this triggers a 'recolor' event). They are resolved to red/green/blue values when the window is packed on refresh.

The process function takes a set of tui contexts, a set of file descriptors, their sizes and a timeout. The set of contexts is for handling multiple windows and the set of file descriptors and timeout for the common pattern of being input triggered.

Note: Since the UI processing is all out of band, STDIN and STDOUT are left intact and you can mask all signal handling. This removes the need for isatty() and you get both pipeline processing through an intact STDIN/STDOUT and an interactive UI. The choice of colouring or not colouring output is up to the final presentation.

In Lua, it would look like this:

tui = require 'arcantui'
local function redraw(wnd)
 local attr = tui.attr({fg = tui.ALERT})
 wnd:write_to(0, 0, "Hello World")
end

local handlers = 
{
  resized = redraw
}

wnd = tui.open("myname", "hello", {handlers = handlers})
redraw(wnd)

repeat
  wnd:refesh()
until not wnd:process()

There are a lot of bells and whistles in order to cover everything people have ever done with terminals, far too many to go through here. You have helpers for asynchronous transfers between file descriptors with progress updates; arbitrarily many cursors; verifying if a certain unicode codepoint has a representation in the current set of fonts; spawning new processes with embed-able windows; window manager integration controls; per row attributes; notifications and much more. Then there are widgets for showing buffers, navigating lists of items and so on.

More Windows

Since there is no restriction on a single screen we can be more adamant about avoiding premature composition – the desktop and its window manager should naturally be responsible for such decisions.

To request a new window, you do this:

static bool on_window(struct tui_context *T,
                      arcan_tui_connection *C,
                      uint32_t id,
                      uint8_t type,
                      void *T)
{
/* we use a custom id to distinguish between multiple
 * request and windows pushed to us */
    if (id != 0xcafe)
        return false;

    struct tui_cbcfg cbcfg = {
/* fill in as needed */
    };

    struct tui_context* new = 
          arcan_tui_setup(T, C, &cbcfg, sizeof(cbcfg));

/* window allocation can always fail */
    if (!new)
       return false;

/* do something with new */
  return true;
}

/* set in initial handler table before context creation */
cbcfg.subwindow = on_window;

...

struct tui_subwnd_req req = {
 .hint = TUIWND_SPLIT_LEFT,
 .rows = 80,
 .cols = 25
};

arcan_tui_request_subwnd_ext(tui, TUI_WND_TUI,
                             0xcafe, req, sizeof(req));

As you can see it is very similar to how the original context was created. [new] should be added to the processing loop, or spawned into a thread with its own separate one.

Note: The context setup now uses a parent as a reference. This inherits all dynamic properties regarding locale, colours and text so there is a starting point for the implementation to work with.

Among the nuances is the HINT. This is a way to communicate to the outer display system how its window is intended to be used. Among the possible values are for setting it to a discrete tab, but also for embedding into other windows or be ‘swallowed’ (take the parents place until closed).

In this clip from Cat9 you can see that in two forms:

In the first form we take the contents of a previous job, request a new window and move the job output into it.

The second one is something more advanced. The requested type is actually HANDOVER, and we use the helper:

pid_t arcan_tui_handover(struct tui_context*,
                         arcan_tui_conn*,
                         const char *path,
                         char *const argv[],
                         char *const env[], int flags);

This helper creates a new process that inherits window ownership. The context can still be referenced as a proxy and used with the arcan_tui_wndhint function to reposition or reanchor.

This is useful for embedding as done in the clip, where we also get feedback on scaling and dimensions. This is where some of [shmif] bleeds through; if the new process is graphical and uses that library, such as the video player and PDF viewer in the clip, we can embed and even interact with graphical applications even though the TUI process itself has no concept of bitmap graphics.

In Lua, it would look like this:

wnd:new_window("handover",
    function(wnd, new)
        if not new then
           return
        end
        local in, out, err, pid = 
           wnd:phandover("/usr/bin/afsrv_terminal", "")
    end
)

Reading a line

Lets take a look at the common case: a command-line, i.e. our ‘libreadline’ replacement. This is a rather complicated affair since it involves text editing; suggestions; prompt; auto-completion; help information; keybinding controls; masking characters for password entry; searchable command history and so on.

Some of that you can see in this clip, also from Cat9.

The prompt is live updated (shown by the ticking clock), feedback on syntax/validation errors are highlighted, toggle-able extended help as to what a completion does and error messages explaining the validation failure.

#include <arcan_tui_readline.h>

...
struct tui_context *T;
/* context has been setup like before */
...

struct tui_readline_opts options = 
{
  .anchor_row = 1
  .allow_exit = true
};

arcan_tui_readline_setup(T, options, sizeof options);
int status;
char* out;

while (1){
    struct tui_process_res result =
      arcan_tui_process(&T, 1, NULL, 0, -1);

    status = arcan_tui_readline_finished(T, &out);
    if (status)
        break;
}

if (status == READLINE_STATUS_DONE){
/* do something with *out */
   free(out);
}

This swaps out relevant parts of whatever previous handler table you have for ones provided by the readline widget, and reverts it once finished. There are a lot of options to pick from when setting up the widget, including callback handlers for requesting history, validating current input, password masking, filtering undesired characters and so on.

For Lua, we can go with a more involved example:

local readline = 
  wnd:readline(
               function(_, message)
                   print("you entered", message)
               end
              )

readline:set_prompt("type something: ")
readline:set_history({"things", "from", "the", "past"})
readline:suggest({"some", "thing", "to", "add"})

while (wnd:process()) do
    wnd:refresh()
end

Networking

All this would be only marginally useful if it did not also work across the network. Remote administration and similar tasks are, of course, passport holding citizens in this world and even more work has gone into the networking protocol itself.

Say that you have written a new application using libarcan-tui and you want to share it with others or access it from another computer. From the developer angle, there is nothing to do, the existing tooling provides everything.

We will omit authentication key management here, since that is an big topic in its own right. That’s done with the ‘soft-auth’ argument for the pull apprach, and a12:// rather than a12s:// for the push one.

Speaking of push and pull, running something like:

arcan-net --soft-auth -l 6680 -- /path/to/my_thing

Would set the device up to serve an instance of my_thing to every inbound connection. This is what we mean with ‘pull’, the display end connects, gets served an application and pulls the output to its display. This is what you would be most familiar with through ssh.

Push flips the direction around and is what you would expect from X11 style remoting with something like DISPLAY=some.host:port xterm.

Here ‘my.other.host’ would be setup with arcan-net --soft-auth -l 6680. To push my_thing: ARCAN_CONNPATH="a12://my.other.device" /path/to/my_thing.

Both have its uses. Normally I have a number of devices fuzzing, doing traffic monitoring, or monkey testing with a core dump crash handler installed. All of these attach their corresponding TUI interaction tool and on-demand push to whichever device I am currently working on directly, or to a fallback multiplexing server that keeps them around until I check in.

What is interesting in the network case is how it differs from terminal times over the wire. This is nuanced with many factors in play as congestion control isn’t exactly XON/XOFF or the RS-232 RTS/CTS of yore. Is the priority interactive latency? memory consumption locally? memory consumption remotely? low network bandwidth?

To our disadvantage there is potentially more data per packed cell, but ZSTD compresses that very well. At the same time it is only the visible changes that are transferred, not some theatre piece with ASCII-character LOGO turtle on acid playing the lead. Any back-pressure propagates from the network tool to the source so pending updates merge there.

In a simple line-mode screen, ssh + your shell doing a ‘find /’ would start sending a colossal amount of data on the wire. Here the strategy is partly up to the shell.

In the following clip from Cat9, you can see that it detects a data heavy job and switches to a slow updating statistics view until it terminates or I instruct it to show what is going on. Importantly I can keep on working and I am not shoving 127MiB of data over the wire.

For an alt-screen tool, like vim and tmux, they instead get to pay the price for cursor movements to draw at non-continuous locations; style changes; border drawing characters and so on. Then suffer all the crazy things ncurses does (like padding or delaying writes to align with the baudrate of the connection). Then suffer the crazy things emulators have to do (no incoming data last n miliseconds, then it’s probably safe to draw). Then suffer the crazy things they themselves have to do to understand if a key is a key or a control character.

Posted in Uncategorized | Leave a comment

Arcan 0.7 – The All Tomato

Just as we concluded our first NLnet grant, it is also time to say goodbye to the second phase of the project, ‘anarchy on the desktop’, and enter the third and final one.

As per the old roadmap, 0.7 is the last window of opportunity for any trailing features. For 0.8 we finally get to pull out the performance tuning tricks, 0.9 for hardening the attack surfaces, to be followed by me disappearing in a puff of smoke.

I am also happy to say that we have received two new NLnet grants, one administrated by me, the other by Cipharius. Both of these build on the networking layer. For my part we’ll extend the directory server end to have a more flexible programmable remote side, for supporting linking multiple ones together and making the entire stack ‘turnkey’ easy to deploy and use.

In the other end there will be a simpler access layer for letting a device join your ‘one desktop, many devices’ network and extend it with its storage and sensors (cameras, displays, IMUs, and so on), or act as a portable viewer, Smash, for accessing it from within the confines of the corporate walled gardens.

I have often and, rightly so, been accused of not being a very public person. As an exemption to that rule I did partake in an interview by the venerable David Chisnall which can be read over at lobste.rs here: https://lobste.rs/s/w3zkxx/lobsters_interview_with_bjorn_stahl — if you are at all curious about my background and motivation.

Let’s briefly look at the building blocks in place:

IPC systemSHMIF.

Way out of terminal insanity – From The Dawn of a new Command Line Interface into The Day of a new Command-Line Interface: Shell.

Covering special needsaccessibility, debugging and security.

NetworkingA12: Visions of the Fully Networked Desktop.

Legacy Strategy – SHMIF and A12 were both verified to ensure that anything that could be done (arcan vs Xorg part 1, part 2) through previous popular tools and protocols can still be done and persist onwards (missing – bringing X along).

All these latch into the scheme layed out in ‘Arcan as OS design‘.

Now we can braid these strands together into one rope, whip it into shape and see what should dangle from it.

This release doesn’t have many demonstrable items for the core engine itself, mostly fixes all across the board. Instead we will recap some changes and ongoing work for Lash#Cat9 and Xarcan. Our reference desktop environment, Durden, will get a separate release post when some of the tools has received more polish work.

As a teaser though, the following clip shows how the network tool can map in the key-local private file store and route it to our decode plumber:

Directory server triggered file streaming

In a similar vein, the following clip shows the same approach to picking a file from the directory store, setting it as the drag source and then dropping it into Cat9 and letting it expand it into a job view.

This act as the development baseline for more things, particularly letting controller appls running server-side define other namespaces, scan / index and annotate, distribute across a network of directories and search.

Other highlights for that release will be a new streaming/sharing composition tool; trainable mouse gestures; new initial setup configuration helper; desktop icons; on-screen keyboard; input typer helper with dictionary/IME completion; stereoscopic support for Xreal Air style glasses and more bespoke displays such as Looking Glass (lenticular) and Tilt5 (projected, retroreflective AR).

Lash#Cat9

Technically, Lash is a Lua scripting environment that is wrapper around our curses replacement, libarcan-tui, adding some convenience features to make it easier to write a command-line shell. Cat9 is the current reference shell for this.

The major features added to Cat9 over the last ~2 years since its inception has all received individual write ups:

In A spreadsheet and a debugger walk into a shell we cover a Debug Adapter Protocol Implementation as well as an interactive spreadsheet that can be populated by regular shell commands. The following two clips come from that article:

Sampling debugger watchset into a spreadsheet
Creating a spreadsheet, running cells that map to shell commands, mixing with builtin functions

In Cat9 Microdosing: stash and list we cover a ‘ls’ replacement that effectively turns it into a file manager, and a scratchpad for accumulating files into a workset of files to monitor for changes, or forward to compression / transfer tools. The following two clips from that article:

Using list to navigate with first mouse, then keyboard, opening a media file with open-swallow
Using list to create a stash of files, and removing them in one fell swoop.

In Cat9 Microdosing: each and contain we add the option to absorb the output of previous or current jobs into a collection that can then be referenced by other commands as a single unit, as well as defining asynchronous processing options over a range of data from other jobs (best used with the stash mentioned above). The following two clips come from that article:

Using contain to swallow ongoing jobs, merging them into an overview
Using list to build a stash, then each to run a command on each item, merging into a container job

The next planned such write-up contains a social timeline for things like mastodon and email timeline by frontending ‘madonctl’ and ‘himalaya’ as well as more developer tools like scm based integration. The following clip is a sneak peak from that:

SCM monitor detecting a fossil repository, exposing its tickets, controls for staging commits, viewing diffs and navigating timeline.

Xarcan

In the last release post we hinted at how Xarcan can be used to keep your own window manager, letting Arcan act as a display driver — as well as a security, control and configuration plane that selectively merge in arcan native clients with options for how, or if, X clients gets to see inputs and clipboard action.

The following clip was used for that:

Window Maker managing windows and decorations, Arcan handling display control and mixing in arcan-shmif clients.

It’s almost as if what Adam Jackson said long ago rings true: (https://lists.freedesktop.org/archives/wayland-devel/2017-September/034960.html)

One thing I’ve never really been thrilled with about the Xwayland design is that the wayland compositor wants also to be the X window manager, and that all the related state about window position and whatnot is just internal to the compositor. xwin and xquartz don’t have this problem, you can run twm as your window manager and still move windows, but in xwayland that doesn’t work because wayland refuses to allow you to position your own window for (what I consider) essentially religious reasons [5]. And as a result the compositor gets a lot more complicated and you _still_ need to change the X server to get things to work. What I’d have preferred is a wl_x11_interop protocol that Xwayland could use to send this kind of advisory state, which mutter could optionally only expose to Xwayland if it really wanted to be less functional than X.

‘Essentially religious’ reasons indeed. What really happens is that Xarcan synchronises X11 ConfigureWindow and Stacking information with the corresponding VIEWPORT events.

This lets the Arcan side of the WM equation take the output video and decompose it into degenerate regions of linked coordinate spaces. It then uses these regions to determine if it should forward input or not.

For Arcan clients, it creates empty placeholder windows in X11 space, and swaps out its contents for the real one during scanout. It lets the X11 side know there is an object there, but prevents it for having access to its contents – unless desired.

That last part can be solved with controlled sharing. In the following clip you can see how I drag the contents of an Arcan window into it, letting legacy ‘screen sharing’ or ‘screenshotting’ (as flawed as those concepts are, but that is a long rant) applications see and capture it:

Sharing an arcan application window into an Xserver letting GIMP screenshot it

There is quite a lot more going on and planned:

An interesting coming change to arcan-shmif will have it re-use the ‘platform’ layer in Arcan that is responsible for managing input devices and controlling displays when there is no other arcan-shmif server to connect to.

This will effectively turn the Xarcan DDX into a standalone Xserver that works as expected, but at the same time keep up to date with features and changes to the KMS/GBM part of the graphics stack.

This lets us drop a whole lot of cruft from Xorg: XNest/Xephyr is no longer needed, neither is XAce or XFree86. We can eliminate quite a few slowdowns and pitfalls in startup.

Since it’s trivial for us to compartment and apply policy based on client origin, it will also be possible to partially share in the other direction, such as letting Xarcan act exclusively as an input-driver in order to bring xf86-input-wacom and XIM along.

Before then we have a few minor immediate cleanups left, mainly fixing regressions in GLAMOR caused by changes to DRI3 and PRESENT and some work-arounds for XDND to be able to work between multiple, networked, X servers. It’s not hard, just tedious.

Now it’s time to head off to 38c3. See you next year for the final reveal.

Posted in Uncategorized | 8 Comments

A deeper dive into the SHMIF IPC system

This is a technical description of the IPC system used throughout the Arcan project, from both a designer and developer perspective, with annotations on legacy and considerations along the way. It’s one of a few gems inside of the Arcan ecosystem, and thousands of hours have gone into it alone.

The write-up assumes a basic computer science background, while the sections prefixed with ‘comment’ are more advanced.

History

SHMIF, or “SHared Memory InterFace” has a long history, dating back to around 2007. It was first added to cover the need to least-privilege separate all parsing of untrusted data in the main engine, simply because the ffmpeg libraries couldn’t stop corrupting memory – generally a bad thing and we had more than enough of that from GPU drivers doing their part.

With parsers sandboxed, it evolved to also work as a linker interposed- or injected shellcode- way of manipulating 3rd party audio/video processing and event loop without getting caught doing so. Rumour has it that it was once used to automate a lot of the tedium in games such as World of Warcraft, and was not caught doing so.

It was written to be portable across many operating systems. The initial version ran on both Windows, OSX, BSDs and Linux. There were also non-public versions that ran on Android and iOS. These days the focus remains on BSDs and Linux, with the networking version of the same model, “A12”, intended to retain compatibility with the others.

Its design is based on lessons learned from emulating arcade games of yore, as they represent the most varied and complex display systems to date. The data model evolved from increasingly complex experiments, up to- and beyond- the point of painstakingly going through every single dispatch function in X11 to guarantee that we did not miss anything. The safety and recovery aspects come from many lessons learned breaking and fixing control systems for power grids. The debugging and performance choices came from working on a last-resort debugging tiger team on (mainly) Android.

Layout

There is a shared memory region, and a set of OS specific primitives to account for inadequacies in how various kernels expose controls over memory allocation and use. Combined we refer to these as a segment. The first one established is referred to as the ‘primary’ and it is the only one that is guaranteed on a successful connection. Additional ones are negotiable, and the default is to reject any new allocation. This decision is ultimately left to the window management policy.

Comment: In this form, there is no mechanism for a client to influence allocations in the server end. Further compromises are possible (as we come to later) in order to gain more features, but for a hardened setup, this is one out of several ways we reduce the options for exploiting any vulnerabilities or staging a denial of service attack.

The shared memory is split into a fixed static region and a dynamic one.

The following figure shows the rough contents of these regions:

Fields (not to scale) and layout of the two regions of the shared memory region of a segment

The order of the fields in the static region is organic, it has been extended over time. To avoid breaking compatibility, changes have been appended as more metadata was needed. The region marked ‘aux’ is 0- sized by default; it is only used for negotiating advanced features e.g. HDR metadata and VR device support.

Some of the more relevant and non-obvious members of the static regions are:

  • DMS – Dead Man’s Switch. If it is ever modified the segment is considered dead. After this point no modifications to the page will be processed by the other side. (See ‘Safety Measures’ section).
  • Verification Cookie. This is a checksum comprised of calculating the offsets and values of other members in the region. Both sides periodically calculate and compare this value to detect version mismatches or corruption.
  • Inbound/Outbound event buffers. – These are fixed slot ring buffers of 128b events. They can be thought of as asynchronous ‘system’ calls (See ‘Event Processing’ section).
  • Segment Token. A unique identifier for this specific segment. This can be used by the client end to reference other events if the identifier has been shared by some other mechanism. The ‘VIEWPORT’ event, for instance, instructs window management for repositioning or embedding segments owned by other clients or threads.

The entire memory region is treated as an unsafe contested area; one side populates it with changes it wants to see done and through some synchronisation trigger, and the other side verifies and applies or rejects them.

Comment: For debugging and inspection, this means a single snapshot of the mapped memory range is sufficient to inspect the state of the connection and trivial to write analysis, fuzzing and reporting tools for.

The raw layout is not necessarily exposed to the consumer of the corresponding library. Instead a context structure (struct arcan_shmif_cont) contains the developer-relevant pointers to the corresponding subregions.

Comment: While the implementations for this interface live in userspace, the design intent was to be able to have the server end live completely in a kernel, and have this act as the sole system call interface.

Each segment has a type that is transferred once from the client to the server during the REGISTER event (or when requesting a new one through a SEGREQ event). This is mainly a window management and server hint to control event response, but also determines if video and audio buffers are for (default) client to server or (screen recording and similar features) server to client.

First Connection (Client)

The client end comes as a library, libarcan-shmif. The rough skeleton that we will unpack here looks like this.

#include <arcan_shmif.h>
int main(int argc, char **argv)
{
  struct arg_arr args;
  struct arcan_shmif_cont C =
    arcan_shmif_open(SEGID_APPLICATION,
                     SHMIF_ACQUIRE_FATALFAIL,
                     &args);

  struct arcan_shmif_initial *config;
  arcan_shmif_initial(&C, &config);

/* send audio/video */
/* event processing */

  arcan_shmif_drop(&C);
}

The SEGID_ part is a hint to the server as to the intended use of this connection and how it could manage its resource allocation and scheduling. There is a handful of types available, but APPLICATION is a safe generic one. A video player would be wise to use MEDIA (extremely sensitive to synchronisation but not input), while a game would use, well, GAME (high resource utilisation, input latency and most-recent presentation more important than “perfect” frames).

The FATALFAIL part simply marks that there is no point to continue if a connection can’t be negotiated. It saves some error checking and unifies 'fprintf(stderr, "Couldn't connect")' like shenanigans.

The arg_arr ‘args’ is a form of passing command line arguments to the client without breaking traditional getopt/argv. It can be used to check for key=value pairs through something like ‘if (arg_lookup(&args, "myopt", 0, &val)){ ... }‘ .

A good question here would be, how does the client know to find the server? The actual mechanism is OS dependent, but for the POSIX case there are two main options that the library is looking for: the ARCAN_CONNPATH and ARCAN_SOCKIN_FD environment variables. The value for CONNPATH is the name of a connection point and is defined by the server side.

Comment: The connection point name is semantic. This stands in contrast to how Xorg does with its DISPLAY=:[number] where number normally came from the virtual terminal the user was starting Xorg from. The server end can spawn multiple connection points with different names and apply different policies based on the name.

ARCAN_SOCKIN_FD is used to reference a file descriptor inherited into the process running arcan_shmif_open. This is used when the server itself spawns the new process. It is also used in the special case of ARCAN_CONNPATH being set to “a12://” or “a12s://”. This form actually starts arcan-net to initialise a network connection to a remote host, which creates a single-use shmif server for the client to bind to. This is one of the ways we translate from local IPC to network protocol.

The 'arcan_shmif_initial' part gives the information needed to create correct content for the first frame. This includes user preferred text format (size, hinting), output density (DPI aware drawing over ‘scaling’ idiocy), colour scheme (contrast for accessibility, colour blindness or light/dark) and even locale (to migrate away from LC_… /getlocale) and location (latitude/longitude/elevation).

Comment: For accelerated graphics it also contains a reference to the GPU device to use for rendering, this lets the server compartment or load-balance between multiple accelerators.

Now for the ‘send audio/video’ part.

shmif_pixel px = SHMIF_RGBA(0x00, 0xff, 0x00, 0xff);
for (size_t y = 0; y < C.h; y++)
  for (size_t x = 0; x < C.w; x++)
    C.vidp[Y * C.pitch + x] = px;

arcan_shmif_signal(C, SHMIF_SIGVID);

This fills the dynamic video buffer part of the segment with a full opaque green pixel in linear RGB in whatever packing format the system considers native (embedded in the SHMIF_RGBA macro).

Comment: While on most systems (these days) that would be 32-bit RGBA, it is treated as compile time native as CPU endianness would be. Low-end embedded might want RGB565, special devices like eInk might want RGB800 and so on. 

There are a lot of options available here, but most have to deal with special synchronisation or buffering needs. These are covered in the ‘Special Case’ sections on Synchronisation and on Accelerated Graphics.

For this example we omitted the aural representation, but if you have a synthesizer core or even tone-generator the same pattern apply; switch vidp for audp and SHMIF_SIGVID for SHMIF_SIGAUD (it is a bitmask, use both if you have both).

Comment: The common distinction between audio and video is something we strongly oppose. It causes needless complexity and suffering trying to have one IPC system for audio, then another for video and then trying to repair and synchronise  the two after the fact. It is one of those historical mistakes that should have ended yesterday, but the state of audio on most systems is almost as bad as video.

At this stage we are already done (13 lines of code, zero need for error handling) but for something more polite, we will flesh out the ‘event processing’ part.


struct arcan_event ev;
while (arcan_shmif_wait(&C, &ev)){
 if (ev.category == EVENT_IO){
/* mouse, keyboard, eyetracker, ... handling goes here */
 }

 switch(ev.tgt.kind){
 case TARGET_COMMAND_EXIT:
/* any custom cleanup goes here */
 break;
 case TARGET_COMMAND_RESET:
/* assume that we are back where we started */
 default:
 break;
 }
}

This will block until an event is received, though more options are covered in the section on ‘Synchronisation’. No action is ever expected of us, we just get polite suggestions ‘it would be nice if you do something about this’. The category part will only be EVENT_IO or EVENT_TARGET and the next section will dip into why.

Comment: The _RESET event in particular is interesting and will be covered in the 'Recovery' Special Case. It can be initiated by the outer desktop for whatever reason, and just suggest 'go back to whatever your starting state was, I have forgotten everything' but is also used if the server has crashed and the implementation recovered from it, or is shutting down and have already handed responsibilities over to another.

The event data model cover 32 different server to client possibilities, and 22 client to server. Together they cover everything needed for a full desktop and more, but it is descriptive, not normative. React to the ones relevant to you, ignore the others.

First Connection (Server)

There are two implementations for the server end; one inside the arcan codebase tailored to work better with its more advanced resource management, crash resiliency and scripting runtime. The other comes as a library, libarcan-shmif-server, and is mainly used by the arcan-net networking tool which translates this into the A12 network protocol.

Let’s walk through a short example which accepts a single client connection, and in the next section do the same thing for the client application end. Normal C foreplay is omitted for brevity.

#include <arcan_shmif.h>
#include <arcan_shmif_server.h>

struct shmifsrv_client *cl = 
  shmifsrv_allocate_connpoint("demo", NULL, S_IRWXU, fd);

shmifsrv_monotonic_rebase();

This creates a connection point for a client to bind to. There are two alternatives, shmifsrv_spawn_client and shmifsrv_inherit_connection. Spawn takes care of creating a process with the primitives inside. Inherit takes some preexisting primitive and builds from there. Both return the same shmifsrv_client structure.

Comment: For a controlled embedded or custom OS setting, the spawn client approach is the safest bet. The inherit connection approach is for when there is a delegate responsible for spawning processes and reduces the number of system calls needed to a bare minimum.

The shmifsrv_monotonic_rebase() call sets up the internal timekeeping necessary to provide a coarse (25Hz) grained CLK (clock) signal.

Now we need some processing that interleaves with rendering/input processing loops, which is a larger topic and out of scope here.

int status;
while ((status = shmifsrv_poll(cl) <= CLIENT_NOT_READY)
{
/* event and buffer processing goes here */
}

if (status == CLIENT_DEAD)
{
  shmifsrv_free(cl, SHMIFSRV_FREE_FULL);
  exit(EXIT_SUCCESS);
}

It is possible to extract an OS specific identifier for I/O multiplexing so that _poll is only invoked when there is some signalled inbound data via shmifsrv_client_handle().

Comment: The flag passed to _free determines the client side visibility. It is possible to just free the server side resources and not signal the dead man's switch. This can be used to transparently pass the client to another shmif server process or instance.

Before we get to the event and buffer processing part, there is also some timekeeping that should be managed outside of a higher frequency render loop.

int left;
int ticks = shmifsrv_monotonic_tick(&left);
while (ticks--){
  shmifsrv_tick(cl);
}

This provides both integrity and liveness checks and manages client requested timers. The (left) returns the number of milliseconds until the next tick. This is used as feedback for a more advanced scheduler, if you have one (and you should).

Now to the event processing:

struct arcan_event buf[64];
size_t n_events, index = 0;

if ((n_events = shmifsrv_dequeue_events(cl, buf, 64)){
  while (index != n_events){
    struct arcan_event* ev = &buf[index++];
    if (shmifsrv_process_event(cl, ev))
      continue;

/* event handlers go here */
  }
}

This will dequeue, at most, 64 events into the buffer. Each event is forwarded back into the library in order to allow a subset of internally managed ones. They are just routed through the developer to allow complete visibility. You can use arcan_shmif_eventstr() to get a human readable representation of its contents.

Comment: The reason for having a limit is that a clever and malicious client could set things up in a way that would race to stall the server or exhaust its file descriptor space as part of a denial of service, either to affect the user directly or as part of trying to make an exploitation chain more robust.

Now for the ‘event processing’ part.

if (ev->category != EVENT_EXTERNAL){
  fprintf(stderr, "unexpected event category\n");
  continue;
}

switch (ev->ext.kind){
case EVENT_EXTERNAL_REGISTER:
/* only allow this once on the client */
  arcan_shmifsrv_enqueue(cl, &(struct arcan_event){
   .category = TARGET_COMMAND,
   .tgt = {
     .kind = TARGET_COMMAND_ACTIVATE
   }
  }
break;
default:
break;
}

The event data model has a lot of display server specific nuances to it, neither is necessary except for the one above. This unlocks the client from the ‘preroll’ state where it accumulates information received into the “arcan_shmif_initial” structure as covered in the client section. Any information necessary for a client to produce a correct first frame goes before the ‘ACTIVATE’ one. The most likely ones you want is DISPLAYHINT, OUTPUTHINT and FONTHINT to instruct the client about the size it will be scaled to, the density, colourspace and subchannel layout it will be presented through, as well as the preferred size of the most important of primitives, text.

Comment: There are a number of event categories, but only one reserved for clients (EVENT_EXTERNAL). The other categories are for display server internals. The reason they are exposed over SHMIF is for the 'server' end to be split across many processes and still interleave with event processing in the server. This allows us to have external sensors, input drivers etc. all as discrete threads or processes without changing anything else in the architecture. It also allows a transformation from using it as a kernel-userspace boundary to a microkernel form.

The last part is to deal with ‘buffer processing’ part of the previous code.

/* separate function */
bool audio_cb(shmif_asample *buf,
              size_t n_samples,
              unsigned channels, unsigned rate,
              void *tag)
{
  /* forward buf[n_samples] to audio device or mixer
   * configured to handle [channels] at [rate] */
  return true;
}

if (status & CLIENT_VBUFFER_READY){
  struct shmifsrv_vbuffer vbuf = shmifsrv_video(cl);
  /* forward vbuf content to GPU */
  shmifsrv_video_step(cl);
}

if (status & CLIENT_ABUFFER_READY){
  shmifsrv_audio(cl, audio_cb, NULL);
}

The contents of vbuf is nuanced. There is a raw buffer or opaque GPU system handle + metadata (timing, dirty regions, …), or TPACK (see section on ‘Text Only Windows’) and a set of flags corresponding to the ‘presentation-hints’ on how buffer contents should be interpreted concerning coordinate system, alpha blending and so on.

Comment: Most of the graphics processing properties are things any competent scanout engine has hardware acceleration for, excluding TPACK (and even ancient graphics adaptors used to have those as 'text mode' display resolution). There are support functions to unpack this into a compact list of text lines and their colouring and formatting in "arcan_tui.h" as arcan_tui_tunpack().
Comment: For accelerated GPU handles it is possible to refuse it by sending a BUFFERFAIL event. This will force the client implementation to convert accelerated GPU-local content into the shared pixel format on their end. This is further covered in 'Special Case: Accelerated Graphics'. It doubles as a security measure, preventing the client from submitting command buffers to the GPU that will never finish and livelock composition that way (or leverage any of the many vulnerabilities GPU side). On a hardened system this would be used in tandem with IO-MMU isolation.

In total this lands us with less than 100 lines of code with very granular controls, a fraction of what other systems need to just boilerplate graphics alone. If you only want a 101 level – take on how SHMIF works, we are done; there is a lot more to it if the topic fascinates you, but it gets more difficult from here.

Synchronisation

While one might be tempted to think that ‘display servers’ are about well, providing access to the display, its real job is actually desktop IPC with soft realtime constraints. The bread and butter for such systems is synchronisation. If you fail to realise this you are in for a world of hurt and dealing with it after the fact will brew a storm of complexity.

Comment: It is also the hardest problem in the space - figuring out who among many stakeholders knows what; when do they know it; when is something said relevant or dated. All of those are difficult as is, but it gets much worse when you also need to factor in resonance effects, malicious influence and that some stake holders reason asynchronously about some things, and synchronously about others. As icing on an already fattening cake you need to weigh in the domain specific (audio / video) nuances. Troubleshooting boils down to profiling, and problems manifest as 'jitter' and 'judder' and how those fit with human cognition. Virtual Reality is a good testing ground here even if you are otherwise uninterested in the space.
Comment: beginner mistakes here are fairly easy to spot; if someone responds to synchronisation problems by increasing buffer sizes (landing in a version of the network engineering famous 'buffer bloat' problem) or arbitrary sleep calls (even though some might be necessary without adequate kernel level primitives) they are only shifting the problem around.

Recommended study here is ‘queuing theory’ and ‘signal processing’ for a deeper understanding.

To revisit the previous code examples on the client end:

arcan_shmif_signal(&C, SHMIF_SIGVID);

This is synchronous and blocking. The thread will not continue execution until the server end has said it is ok (the shmifsrv_video_step code). The server can defer this in order to prioritise other clients or to stagger releases to mitigate the ‘thundering herd’ problem.

For normal applications, this is often sufficient and comparable to ‘VSYNC’. When you have tighter latency requirements and/or it is costly to produce a frame, you need something more. The historically ‘easier’ solution has been to just add another buffer:

arcan_shmif_resize_ext(&C, C.w, C.h,
                       (struct shmif_resize_ext){
                         .vbuf_cnt = 2
                       });

_resize and _resize_ext calls are both also synchronous and blocking. This is because the server end needs the controls to guarantee that enough memory is available and permitted. It will recalculate all the buffer offsets (vidp, audp, …) and verification cookie in the context and possibly move the base address around to satisfy virtual or physical memory constraints.

Comment: Some accelerated display scanout controllers have hard requirements on physically continuous memory at fixed linear addresses and those are a limited and scarce resource and why such resize request might fail, especially in tight embedded development settings. Same applies when dealing with virtual GPUs in virtual machines and so on. The other option to still satisfy a request is to buffer in the server end, causing an extra copy with increased power consumption and less memory bandwidth available for other uses.

The request above would make the first arcan_shmif_signal call return immediately, and only block if another signal call happens before the server is able to consume the buffer from the first. Otherwise the context pointer (C.vidp) will be changed to point to the new free buffer slot. This also has the added cost of adding another display refresh period of latency.

Comment: It is possible to increase the buffer count even further, but this changes the semantics to indicate that only the most recently submitted buffer should be considered and others can be discarded. This counters the latency problem of the double buffering problem at the expense of memory consumption. This has historically been called 'triple buffering' but, to much confusion, has also been used for the 'double buffering' behaviour with just deeper buffer queues and is thus meaningless.

Not every part of a buffer might have actually changed. A common optimisation is to annotate which region that should be considered, and for regular UI applications (blinking cursor, single widget updates, …) this substantially cuts down on memory transfers. To cover this you can mark such regions with calls to arcan_shmif_dirty() prior to signalling.

Comment: While some might be tempted to annotate every pixel, there are strong diminishing returns as soon as you go above just one region due to constraints on memory transfers. Internally the shmif client library implementation will just merge multiple calls to _dirty into the extents of all changes. For the triple buffering behaviour mentioned in the previous comment, dirty regions won't have any effect at all as changes only present in the one buffer would not guarantee to transfer over in the next and the cost of trying to merge them on the composition end would cancel out the savings in the first place.

There are more synchronisation nuances to cover, but to avoid making this section even more exhausting, we will stick to the two most relevant. The first of these look like this:

arcan_shmif_signal(&C, SHMIF_SIGVID | SHMIF_SIGBLK_NONE);

This returns immediately and you can chose to check if it is safe to draw into the video buffer yourself (through arcan_shmif_signalstatus), or to simply continue writing into the buffer and risk ‘tearing’ in favour of lower latency. This amounts to what is commonly called ‘Disable VSYNC’ in games.

Comment: For those that have written games in days of yore, you might also recall 'chasing the beam', unless dementia has taken over by now. Since game rendering can have predictable write patterns and display scanout can have predictable read patterns it is possible to align your writes such that you raster lines up to just before the one the display is currently reading. This is neither true for modern rendering nor is it true for modern displays, 'the framebuffer' is a lie. Still, for emulation of old systems, it is possible, but impractical, to repeatedly access the 'vpts' field of the static region to figure out how many milliseconds are left until the next VBLANK and stage your rendering accordingly.

The last option is to to keep the SHMIF_SIGBLK_NONE behaviour, but adding the flag SHMIF_RHINT_VSIGNAL_EV to C.hints prior to a _resize call. This will provide you with a TARGET_COMMAND_STEPFRAME event and you can latch your rendering to that one alone and let your event loop block entirely.

Comment: Enabling STEPFRAME event delivery by sending a CLOCKREQ request provides a secondary path for advanced latency management as it enables feedback in presentation timing. Sampling the 'vpts' field of the static region would provide information about upcoming deadline and STEPFRAME contains metadata about presentation timing as to when the contents was actually presented to scale up/down effects and quality. Current game development is full of these kinds of tricks.

Event Processing

Event loop construction is an interesting and nuanced topic. We start by returning to the naive one introduced in the client section:

struct arcan_event ev;
while (arcan_shmif_wait(&C, &ev)){
/* interpretation goes here */
}

Responding to each event that arrives here should be as fast as possible. This is easy most of the time. Whenever a response includes rendering however, response times can vary by a lot. Some events are more prone to this, with mouse motion and resize requests being common offenders.

What happens then is that the number of events in the incoming queue starts to grow. If the rate of dispatched events is lower than that of the incoming one, we get buffer back-pressure.

This applies to both the server and the client side. Each side has different constraints and call for different mitigation. The server end is more vulnerable here as it has multiple clients to process, and higher costs for processing events as most prompt some form of managerial decision.

Comment: Events from client A can directly or indirectly cause a larger number of responses from clients B and C (amplification), which in turn can cascade into further responses from A. This can snowball fast into 'event storms' and back-pressure building up in others as 'resonance effects'.

One small change to the loop improves on the client end of the equation:

bool process_event(struct arcan_event *ev)
{
/* interpretation now goes here */
}

struct arcan_event ev;

while (arcan_shmif_wait(&C, &ev)){
  bool dirty = process_event(&ev);
  size_t cap = PP_QUEUE_SIZE;

  while (arcan_shmif_poll(&C, &ev) > 0 && cap--){
    dirty |= process_event(&ev);
  }

  if (dirty){
    render();
  }
}

This will flush out as much of the inbound queue (or up to a cap corresponding to the size of the ring buffer) as possible, and only render after all have been applied. This prevents earlier events in the queue from being overdrawn by responses to later ones in the queue.

Comment: Since the event queue is a ring-buffer visible to both sides, it is possible for either party to atomically inspect the head and tail values to determine the current state of the other end, as well as incoming workload. This is a powerful advantage over other possible carriers, e.g. sockets.

The full data model would take another lengthy post to flesh out, so we will only look at one event which highlights library internals. That event is ‘TARGET_COMMAND_DISPLAYHINT’. This event is used to indicate the size that the server end would prefer the window to have. The client is free to respond to this by resizing to the corresponding dimensions. If it doesn’t, the server can still scale and post process – it has the final say on the matter.

As mentioned earlier, resize is synchronous and blocking due to its systemic cost so it makes sense to keep them at a minimum. Some of that responsibility falls on the window manager to ensure that a drag resize using a 2kHz mouse doesn’t also result in 2000 DISPLAYHINTs. Even if that would happen, the implementation has another trick up its sleve.

There is a small number of events which are considered costly and can be coalesced. When _wait or _poll encounters such an event, it sweeps the entire pending queue looking for similar ones, merging them together into the one eventually returned, providing only the most recent state.

Comment: There is a tendency for IPC systems to be designed as generally as possible, even if their actual narrow use is known. This defers the decision to commit to any one domain specific data model, making optimisations such as this one impossible -- you can't coalesce or discard what you don't know or understand, at least not responsibly with predictable outcome. This does not make complexity go away, to the contrary, now every consumer has increased responsibility to manage queuing elsewhere. The problem doesn't magically disappear just because you have an XML based code generator or some-such nonsense.

Most of this article has been about a single segment, though in ‘rich’ applications you would have more: child windows, popups, custom mouse cursors and so on. We already mentioned it is possible to request more, even though only one is ever guaranteed. This is not more difficult than setting up the primary one. You submit a SEGREQ event with a custom identifier and type. Eventually you either get a NEWSEG event or REQFAIL event back with the same identifier. For the NEWSEG you forward the event structure to arcan_shmif_acquire and you get a new arcan_shmif_cont structure back.

What does this have to do with queue management? Well, each new segment has their own separate queues and each segment can be processed and rendered on separate threads independent of each other. There is a monotonic per-client global counter and timestamp as part of each event to account for ordering requirements between ‘windows’, but in practice those are exceedingly rare.

A final part about the events themselves. SHMIF is an IPC system, it is not a protocol, it doesn’t cross device boundaries. We have a separate and decoupled networking protocol specifically for that. As an IPC system we can and should take advantage of device and operating system specific nuances.

Two such details is that each event has a fixed size, 128 bytes (64 did not cover all cases) which amounts to 2 cache lines for the vast majority of architectures out there. They are in linear continuous buffers at a native aligned base with access patterns that prefetch very well. The packing of different fields is tied to the system ABI which is designed to be optimal for whatever you are running on.

Safety Measures

We are almost done with the overall walkthrough, then we can finish off with some special cases and exotic features. Before then there is one caveat to cover.

In previous sections we have brushed upon a few tactics that protect against misuse; the validation cookie to detect corruption, version and code generation mismatches as well as the dead man’s switch. There is still one glaring issue from shared memory event management and audio/video signalling solution: what happens if the other end livelocks or crashes while we are locked waiting for a response?

In a socket based setup the common ‘solution’ for a crash in the other end would cause it to detach and you can detect that. For a live lock it is to resort to a kind of ping-pong protocol and somehow disambiguate between that and a natural stall for some other part of the system (very often, GPU driver).

By default (there is a flag to disable this) each segment gets a guard thread. This guard thread periodically (default: every second) checks the aliveness of a monitoring process identifier that the server filled out, as well as if the dead man’s switch has been released. If that happens, it immediately unlocks all internal semaphores causing any locked call into the shmif library to release and any further calls to error out so the natural event loop takes hold. This setup is also used to not only detect- but recover from- crashes (see ‘Special Case: Recovery and Migration’).

This might not be enough for troubleshooting or even communicating to a user that something is wrong. For this we have the ‘last_words’ part of the memory region. This is an area the client can fill out with a human presentable error message that the server end can forward to relevant stakeholders.

The Arcan engine itself splits out into two parts. One potentially privileged parent supervision process that is used to negotiate device access, and the main engine. This supervision process also acts as a watchdog. Every time the engine enters and exits a dangerous area, e.g. the graphics platform or the scripting VM, it registers a timestamp with the parent. If this exceeds some threshold, the parent first signals the engine to try and gracefully recover (the scripting VM is able to do that) and if the problem persists, shuts down the engine. This will trigger the guard threads inside clients and they, in turn, enter a reconnect, migrate or shutdown state.

Special Case: Handover Allocation

As we covered previously, requesting a new segment to use as a window takes a type that indicates its role and purpose. One such type that sticks out is ‘SEGID_HANDOVER’. This means that the real type will be provided later and that the segment will be handed over to a new client.

To better illustrate, let’s take a code example:

arcan_shmif_enqueue(&C, &(struct arcan_event){
  .category = EVENT_EXTERNAL,
  .ext.kind = ARCAN_EVENT(SEGREQ),
  .ext.segreq.kind = SEGID_HANDOVER
});

...

uint32_t new_token;

/* in event handler, 'ev' being the inbound event */
case TARGET_COMMAND_NEWSEGMENT:
  arcan_shmif_handover_exec(&C, ev,
                            "/path/to/something",
                            argvv, envv,
                            0 /* detach-options */);  
  new_token = ev.ext.ioevs[4].uiv;
break;

This would launch “/path/to/something” so that when it calls arcan_shmif_open it will actually use the segment we received in order to connect. We can then use new_token in other events to manage some of it, e.g. reposition its windows, inject input and more. All of this retains the chain of trust: the server end knows who invited the new client in and can treat it accordingly.

This can be used to embed other clients into your own window, to build an external window manager and so on. In our ‘command lines without terminal emulation’ shell, Lash#Cat9, we use that to manage graphical clients while still being ‘text only’ ourselves. For other examples, see the article on Another Low Level Arcan Client: A Trayicon Handler.

Special Case: Recovery and Migration

A key feature of SHMIF is that it can redirect and reconnect clients manually. Through this we can even transition back and forth between local and networked operations. The section on ‘Safety Measures’ covered how it works in SHMIF internals. There is also an article on ‘Crash Resilient Wayland Compositing’ (2017) that demonstrates this.

When a client connects, the library enqueues a ‘REGISTER’ event that contains a generated UUID. This can be leveraged by the window manager to persist location on the desktop and so on. At any stage it can also send a ‘DEVICEHINT’ event back.

This event is used to provide an opaque handle for GPU access in operating systems which requires that (which can further be used to load balance between multiple GPUs), but it can also mention a ‘fallback connection point’. Should the server end die (or pretend that it has died), the library will try to connect to that connection point instead.

If it is successful, it will inject the ‘TARGET_COMMAND_RESET’ event that we covered earlier. We will use the following clip from “A12: Visions of the Fully Networked Desktop“. as a starting point.

Migration of a client between devices through simulated recovery

In it, you see Lash#Cat9 CLI shell inside the ‘Durden’ Window manager having a video clip as an embedded handover allocation. This has previously used the local discovery feature of the network protocol to detect that the tablet in front of the screen (a Surface Go) is available as a sink and has added it as an icon in the statusbar — unfortunately occluded by the monitor bezel in the clip.

When I drag the window and drop it on that icon, Arcan sends a DEVICEHINT with the connection primitive needed for the network embedded into the event. It then pulls the dead man’s switch, forcing the shmif library to go into recover. Since it remembers the connection from the DEVICEHINT, it reconnects and rebuilds itself there.

This feature is not only leveraged for network migration as shown, but also for compartmentalisation between multiple instances; for crash recovery; for driver upgrades and for upgrading the display server itself. All using the same code paths.

Special Case: Accelerated Graphics

Many ‘modern’ clients unfortunately have a hard dependency to a GPU, and unfortunately the mechanisms for binding accelerated graphics between display server and client are anything but portable.

Comment: Khronos (of OpenGL and Vulkan fame) tried to define a solution of their own (OpenWF) that failed miserably. What happened instead is even worse; the compromise once made for embedded systems, 'EGL' got monkey patched with a few extensions that practically undoes near all of its original design and purpose, and it is suddenly what most are stuck with.

There is a lot of bad blood and vitriol on the subject that we will omit here and just focus on the SHMIF provided interface. Recall the normal way of starting a SHMIF client:

#include <arcan_shmif.h>
int main(int argc, char **argv)
{
  struct arg_arr args;
  struct arcan_shmif_cont C =
    arcan_shmif_open(SEGID_APPLICATION,
                     SHMIF_ACQUIRE_FATALFAIL,
                     &args);
}

This does still apply. A client is always expected to start a normal connection first, and then try to bootstrap that to accelerated, which can fail. The reasoning for that is iff permissions or GPU driver problems stop us from providing an accelerated connection, the regular one can still be used to communicate that to the user rather than have them dig through trace outputs for the answer.

To extend a context to being accelerated you can do something like this:

struct arcan_shmifext_setup cfg = {
  .api = API_OPENGL,
  .major = 4,
  .minor = 2
/* other options go here */
};

int status = arcan_shmifext_setup(&C, &cfg);
if (status != SHMIFEXT_OK){
/* configuration couldn't be filled */
}

There are a number of options to provide in the config that requires some background with OpenGL etc. to make any sense so we skip those. If you know, you know and if you don’t, enjoy the bliss. If the setup is OK, the passed ‘cfg’ is modified to return the negotiated values, which might be slightly different than what you requested.

Afterwards, you can then continue with arcan_shmifext_lookup() to extract the function pointers to the parts of the requested API that you need to use, bound to the driver end of the created context.

When writing platform backends to existing applications, they often provide their own way of doing all this and we do need a way to work with that. If there already is a context living in your process and you want to manually export and forward a resource, it is possible through:

size_t n_planes = 4;
struct shmifext_buffer_plane planes[n_planes];
n_planes = arcan_shmifext_export_image(&C, (uintptr_t) MyDisplay, (uintptr_t) my_texture_id, n_planes, planes);

if (n_planes){
  arcan_shmifext_signal_planes(&C,
                               SHMIF_SIGVID,
                               n_planes, planes);
}

There are a several more support functions with similar patterns for context management, importing already exported images and so on, but this should be enough to get an idea of the process.

Special Case: Debugging and Accessibility

We have already shown how the client end can request new segments through SEGREQ events and how those are provided through NEWSEGMENT events coming back. Another nuance to this is that the server end can push a NEWSEGMENT without having to wait for a SEGREQ in advance.

This can be used to probe for support for things, such as custom client mouse cursors, or to signal a paste or drag and drop action (clipboard is just yet another segment being pushed), as the server end will know if the client mapped the segment or not.

There is nothing stopping us from actually mapping and populating the segment from within libarcan-shmif, and there are two cases where that is actually done. One is for SEGID_DEBUG and another for SEGID_ACCESSIBILITY.

There are two longer articles related to how this works in more depth, one on ‘Leveraging the Display Server to Improve Debugging’ and another on ‘Accessible Arcan: Out of Sight’.

If one of these are received, libarcan-shmif will (unless the feature is compiled out) internally spawn a new thread. In the debugging case it will provide a text interface for attaching a debugger, for exploring open files, inspecting environment and memory from within the process itself. In the accessibility case it will latch on to frame delivery (SHMIF_SIGVID) in order to overlay text descriptions that gets clocked to the video frames being delivered and the dirty regions being updated.

Special Case: Text-only Windows

There are more reasons as to why a set of fonts and desired font size is provided during the preroll state and why there is a ‘text rows and columns’ field in the static region.

Among the hints that can be set for the video region, there is SHMIF_RHINT_TPACK. This changes the interpretation of the contents of the video buffer region to use a packing format (TPACK) which is basically a few bytes of headers and then a number of rows where each row has a header covering how it should be processed (shaping, right-to-left) along with a number of cells with the formatting, colour and possible font local glyph indices (for ligature substitutions).

The format is complete enough to draw anything a terminal emulator would be able to throw at it, but also do things the terminal emulator can’t, such as annotation layers or borders that fall outside of the ‘grid’, saving precious space.

This approach lets the server end handle the complex task of rendering text. It also means that the costly glyph caches, GPU related acceleration primitives like distance fields and so on can all be shared between windows. It means that the server can apply the heuristic for ‘chasing the beam’ style minimal latency or tailor updates to the idiosyncrasy of eInk displays when appropriate, and that the actual colours used will fit with the overall visual theme of the desktop while letting the client focus on providing ‘just text’.

While it is possible to build these yourself, there is a higher level abstraction support library, ‘libarcan-tui’ for that purpose. The details of that, however, is a story for another time.

Posted in Uncategorized | Leave a comment

Accessible Arcan : Out of Sight

Our reference desktop environment, Durden, rarely gets covered here these days. This is mostly due to the major features are since long in place and that part of the project is biding its time with smaller fixes while waiting for improvements in the rest of the stack.

Recently the stars aligned and I had some time over to work on the accessibility story, particularly on the ‘no vision’ parts. Other forms (e.g. limited mobility, low vision) will be covered in future articles but most things are already in place, including eye tracking and bespoke devices like stream decks.

Here is a short recording of the first run of a clean installation, setting things up.

This is enabled by default during first setup. The first question presented will be to disable it, but there is no hidden trapdoor combination or setting to enable it for the first time.

The following recording shows using the builtin menu system to start a terminal emulator and open a PDF and the mouse cursor to navigate, with OCR results as I go. These will be elaborated on further in this article.

There is a previous higher-level article which covered a rougher outline of what is intended, but this one is more specific about work that can be used today — albeit with some rough corners still.

One detail to take away from that article is how the architecture splits user data processing into specific one-purpose replaceable programs. These can be isolated and restricted to a much stronger degree than a more generic application.

These are referred to as frameservers. They have a role (archetype) and the ones of interest here are decode and encode. Decode translates from a computer native representative to a human presentable one, like image loading into a pixel soup or annotated text into sound via synthesised speech. Encode goes from potentially lossy computer to human translation such as pixel soup to text via OCR, image description or transcribing audio.

Another detail is that the “screen reader” here is not the traditionally detached component that tries to stitch narration together through multiple sidebands. Instead, it is an “always present first class mechanism” that the window manager should leverage. There is no disconnect between how we provide visual information from aural and they naturally blend with the extensive network transparency as part of our ‘many devices, one desktop’ design target.

Key Features

Here is a short list of the things currently in place:

  • Multiple simultaneous positional voice profiles for different types of information
  • On-Demand Client requested Accessibility windows
  • Force-injected client accessibility fallback with default OCR latched to frame updates
  • Buffered Text Input with lookup oracle
  • All desktop interaction controls as a file-system
  • Command-line shell aware
  • Keyboard controlled OCR
  • Content aware mouse- audio feedback
  • Special controls for text-only windows tracking changes

There is a lot more planned or in progress:

  • Premade bootable live / VM image and generator
  • Bindings to additional client support (AccessKit)
  • Compatibility preset profiles to help transition from NVDA/Orca
  • Extend accessibility fallback with description oracles (LLM:offline, …)
  • Extend speech process format with waveform/tone/samples for formatting
  • Extended lookup oracle
  • Stronger language controls/toggles
  • Scope based ‘few-line’ text editor extension to shell
  • Haptic support
  • Braille Terminal output (lacks hardware access currently)
  • Indexer for extracting or generating alt-text descriptions

Screen Reading Basics

Let’s unpack some of what happens during setup.

The TTS tool in Durden uses one or many speech profiles that can be found in the durden/devmaps/tts folder. Each profile describe one voice, how its speech synthesis should work, which kinds of information it should convey and any input controls.

They allow for multiple voices to carry different information at different positions to form a soundscape, so you can use ‘clean’ voices for important notifications and ‘fast robotic’ for dense information where actions like flush pending speech buffer doesn’t accidentally cancel out something important.

A compact form starts something like this:

model = "English (Great Britain)",
gain = 1.0, gap = 10, pitch = 60, rate = 180, range = 60, channel = "l", name = "basic_eng", punct = 1, cappitch = 5000

These are just the basic voice synthesis parameters one would expect. Then it gets better.

actions = {
select = {"title", "title_text"},
menu = "menu",
clipboard = "clip",
clipboard_paste = "clip-paste",
notification = "notify"
}

bindings = {
m1_r = "/global/tools/tts/voices/basic_eng/flush",
m1_t = "/global/tools/tts/voices/basic_eng/slow_replay"
}

These tell what this voice gets to do. The action keys mark which subsystems it should jack into, and some custom prefix announcement as value. The profile above would present all menu navigation, system notifications, clipboard access and window title on selection.

The bindings are keyboard binding overlays that take priority when the voice is active. Just like any other binding, they map to paths in the virtual filesystem that Durden is structured around. The two examples shown cancels all pending speech for the specific voice, or turns down the speech rate to low and repeats the last message then returns it back to the voice default. There are, of course, many others to chose from including things like key echo and so on.

Holding down any of the meta or accessibility bound buttons for a few seconds without activating a specific keybinding will play the current ones back for you to ease learning or refresh your memory.

By adding a position = {-10, 0, -10} attribute to the profile the system switches to 3D positional audio and, in this example, positions the voice to your back left. This feature also introduced the /target/audio/position=x,y,z which lets you move any audio from the selected window to a specific position around you, along with /target/audio/move=x,y,z,dt which would slide the voice around over time.

With an event trigger, e.g. /target/triggers/select/add=/target/audio/position=0,0,0 and /target/triggers/deselect/add=/target/audio/position=10,0,-10 the soundscape around you also match window management state itself.

The cursor profile has a few more things to it:

cursor = {
alt_text = "over ",
xy_beep = {65.41, 523.25},
xy_beep_tone = "sine",
xy_tuitone = "square",
xy_tuitone_empty = "triangle"
...
}

The alt_text option would read any tagged UI elements with their text description. XY beeps specifies the frequency range to map the mouse cursor coordinates based on their screen position so that the pitch and gain changes as you slide it across the screen.

The waveform used to generate the wave will also change with the type of contents the cursor is over, so that text windows such as terminal emulators will get a distinct tone and distinguishes between empty and populated cells.

The following clip shows navigating over UI elements, an browser window and a terminal window. You can also hear how the ‘select text to copy to clipboard’ doubles as a more reliable means of hearing text contents in uncooperative windows.

There are also more experimental parts to the cursor, such as using a GPU preprocessing stage to attenuate edge features and then convert a custom region beneath the cursor into sounds. While it takes some training to decipher, this is another form of seeing with sound and applies to any graphical content, including webcam feeds. After some hours I (barely) managed to play some graphical adventure games with it.

Kicking it up a notch

Time to bring out the spice weasel and get back to the OCR part I briefly mentioned earlier.

The astute reader of this blog will recall the post on Leveraging the “Display Server” to Improve Debugging. A main point of that is that the IPC system is designed such that the window manager can push a typed data container (window) and the client can map and populate it. If a client doesn’t, there is an implementation that comes along with the IPC library. That is not only true for the debug type but for the accessibility one as well.

This means that with the push of a button we can probe the accessibility support for a window, and if none exists, substitute our own. This is partly intended to provide support for AccessKit which will complete the solution with cooperative navigation of the data model of an application.

The current fallback spawns an encode session which latches frame delivery to the OCR engine. The client doesn’t get to continue rendering without the OCR pass completed so the contents is forced to stay in synch.

This also where our terminal replacement comes in, particularly the TUI library used to provide something infinitely better than ncurses. The current implementation turns the accessibility implementation into a window attachment with a larger font (for the ‘low vision’ case), the shell populates it with the most important content and the text to speech tool kicks in.

Such text-only windows also get added controls, /global/tools/tts/voices/basic_eng/text_window/(speak_new, at_cursor, cursor_row, synch_cursor, step_row=n) for one dedicated reading cursor, that you move around separately.

Our custom CLI shell, Cat9, probes accessibility at startup. If found, it will adjust its presentation, layout and input control to match so that there is instant feedback. There is still the normal view to explore with keyboard controls, but added aural cues. The following clip demonstrates this both visually and aurally:

clip of using basic cat9 shell controls, with readline, completion and content feedback through the accessibility window

All this is, of course, network transparent.

A final related nugget covers both accessibility and security. The path /target/input/text lets you prepare a text locally that will be sent as simulated discrete characters. These are spaced based on a typing model, meaning that for the many poor clients that stream over a network, someone with signal processing 101 doing side channel analysis for reconstructing plaintext from encrypted channel metadata will be none the wiser.

This is useful for other things as well. For an input prompt one can set an oracle which provides suggestion completions. This is provided by the decode frameserver through hunspell, though other ones are just around the corner for filling the role of Input Method Engines, Password Manager integration, more complex grammar suggestions and offline LLM.

The following clip shows how I first type something into this tool, then read it back from the content of the window itself.

The current caveat is that it still does not work with X11 and Wayland clients due to their embarrassingly poor input models. Some workarounds are on their way, but there are a lot of problems to work around, especially for non-latin languages.

Posted in Uncategorized | Leave a comment

A Spreadsheet and a Debugger walk into a Shell

Here we continue the series of posts on the development of a command-line shell which defies terminal emulation by using the display server API locally and a purpose built network protocol remotely.

Previous episodes include: The Day of a new Command-Line Interface: Shell (high level) Whipping up a new Shell – Lash#Cat9 (technical demo), Cat9 Microdosing: Stash and List (feature highlight), Cat9 Microdosing: Each and Contain (feature highlight)

For this round we have added an interactive spreadsheet representation and a Debug Adapter Protocol implementation. These come as two discrete sets of builtins (groups of commands), ‘dev’ and ‘spreadsheet’ with a surprise interaction or two.

The intent for the dev builtin is to eventually collect more and more useful tools for managing all aspects of software development, from source control management to building, testing and fuzzing.

Starting with the spreadsheet. By typing:

builtin spreadsheet
new

You would get something like the following screenshot:

The following clip shows how I spawn new spreadsheet, populated by a CSV source and using the mouse cursor to interact with the layout.

This complements the form intended for media processing that was presented a few years ago as “Pipeworld” and re-uses the same language and parsing logic.

Cells can be populated with static contents, expressions like =max(A1:A4) or even shell commands through the ! prefix, e.g. !date +%Y:%M:%S. The following clip shows some basic expression use, as well as forced reprocessing of expressions and shell commands.

Combining shell commands with expressions that are re-executed on request

More useful is to populate the sheet with outputs from some command and processed by Lua patterns. The following screenshot shows the result of running insert #0 4 separate "%s+:" !cat /proc/cpuinfo

Populating a spreadsheet at some insertion point using a shell command split and separated with Lua patterns

Exporting can be done using the existing copy builtin with control over output format, subrange and so on: copy #4(csv, compact, a1:b5) to export the resolved values of a1,b1,a2,b2 … as CSV.

There is still more to be done for this to be a complete replacement for the likes of sc-im, and to add things like plotting via gnuplot (if I can ever get their plot language to behave) and graphviz, as well more experimental things like importing Makefiles – — but at least for my, admittedly humble, spreadsheet uses it is good enough for daily driving.

Onwards to the Debugger

To start things you would do something like the following:

builtin dev
debug launch ./test

The ‘dev’ builtin set is used as it will eventually accumulate all developer task related commands like building, deployment, source control management and so on.

The following clip shows the default behaviour for that in action. In it you can see how multiple discrete jobs are created for managing threads, breakpoints and so on, detachable and with mouse cursor handling in place.

Basic debugger use and navigation

In the clip I also showed the process of stepping through threads, spawning source view, setting and toggling breakpoints, inspecting registers and modifying target variables.

I have spent thousands of hours staring at the GDB CLI prompt, and hated nearly every second of it. Not because of the relaxing task of debugging code or exploring software state itself, but for the absolutely atrocious terminal thrashing interface even with mitigation layers such as pwndbg. In fairness, a debugger TUI is in the deepest end of the complexity pool to get going.

There is a lot that goes into handling the protocol, and quite a few telltale signs of its designers, so we have just passed the point of basic bring-up. Importantly it is all composable with the data manipulation, filtering and transfer tools we already have elsewhere.

As an example of that we have ‘contain’ from the previous article, for instance, to bunch all the subwindows together into one contained job, useful when running multiple debug sessions side by side to step through multi-process issues.

We do have some other conveniences in place. Stepping controls are defined by the granularity of the job it represents, so stepping in the disassembly view would step instructions, while stepping in the source view would go by line or statement and so on.

Now to close the loop and mix in the spreadsheet part. In the following clip you see me picking a few registers and thread source location that gets added to a watch set. Whenever thread execution is stopped, these will be resampled and updated.

I then click the ‘spreadsheet’ option which will create a spreadsheet and populate it with the contents of the watchset as I go.

Live mapping watched dataset to spreadsheet

With all this in place we can almost start stitching the many other related projects together, from the data visualization from Senseye (closing in on 10 years…) with the window management from Pipeworld to the harnessing and setup from Leveraging the Display Server to Improve Debugging and build the panopticon of debugging from the plan presented in “Retooling and Securing Systemic Debugging” (2012, doi:10.1007/978-3-642-34210-3_10). But that is for another time.

Posted in Uncategorized | 1 Comment

Cat9 Microdosing: Each and Contain

Time to continue the journey towards better CLI shells without the constraints of terminal emulation. In the previous post we looked into the new commands list and stash. List was for providing an interactive and live updated take on ‘ls’, and stash for batching file operations.

This time we get contain for merging jobs together into datasets and each for batched command execution.

First a quick look at the latest UI convenience: view #job detach . This simply allows us to take any job and pop it out into its own window. The following clip shows how it works with ‘list’, initiated with mouse drag on the job bar. List was chosen as it uses both keyboard and mouse navigation, as well as spawns new windows of its own.

Showing detaching a list job, navigating and watching a media resource and re-attaching it again.

Onwards to contain. In the following clip I create a new job container by typing contain new. I then spawn a few noisy jobs and tell contain to adopt them through contain add.

Creating a container, manually adding two jobs to it and then toggling between viewing their contents and an overview of their status.

By default contain will show an overview, coloured by their current run status. I can step through the job outputs either by clicking on their respective index in the job bar or type in contain show 1 (or any other valid index).

The container can also be set to automatically capture new jobs. In the following clip I spawn such a container and then run some commands. Those get added into the container automatically.

Using contain capture to spawn a new container that absorbs new jobs until cancelled.

Contain meshes with commands like repeat, applying the action for all contained jobs at once. It gets spicier when I chose to merge the output of multiple contained jobs, either by right-clicking their entry in the job bar, or manually by running contain #0 show 1 2. These are then treated as a single dataset for any other commands, e.g. copy #0(1-100) that operate on the data for a job.

Contain even applies to interactive jobs. In the following clip I contain a ‘list’ in a detached window and show that mouse navigation is still working.

contain catch on an interactive builtin (list) detached into a separate window working as expected

Moving on to each. Each is related to ‘for’ in Bash and similar shells, locally known as the syntax that I never recall when I need to and rarely get to do precisely what I want. Since we accumulate previous command outputs in discrete and typed contexts, we can avoid the “for I in file1 file2 file 3 do xyz $I done;’ form and reference the data to operate through our job and slicing syntax.

Starting simple, running this:

each #0(1,3,5-7) !! cat $arg

Anything before !! will be treated as part of the each command, and anything after will be reprocessed and parsed with $arg substituted for the sliced data, with some special sauce as $dir which will check if it is referring to a file and substitute its path, or use the path of the referenced job.

While it might at the quickest of glances look similar to the ‘for’ setup, the actual processing is anything but. Recall that everything we do here is asynchronous. If I would swap out ‘cat $arg‘ for ‘v! cat $arg‘ each invocation would spawn a new vertically split window, attach a legacy terminal emulator to it, and run the cat command.

Each also supports processing arguments:

each (sequential) #0(1,3,5-7) !!open $arg

Would functionally make it into a playlist. In this clip you can see how the media in the stash opens, and each time I close the window it launches the next in line.

Using each on a stash of files to build a sequential playlist, running the next command when the previous finishes

Since we are not fighting for a single stdin/stdout pipeline, we have more interesting options:

each (merge) #0 !!cat $arg

This joins forces with the contain command by spawning a new container and attach the new jobs automatically.

The contained set of jobs also interact well with other commands, like trigger or repeat. In the following clip I repeat the previous form of running each on a stash of files. I then run the merge / cat command listed above and you can see how the commands keep progressing in parallel. Running repeat on the container would repeat the commands that had finished executing, merging output with the previous run, while letting the ongoing ones continue until completion.

Each with (merge) option on a stash of files, using sh ‘cat’ to read the contents of the files, repeating the completed subset.

The container here would also respect commands like trigger. Having a stash of makefiles, running them through a contained each like this:

each (merge) #stash !! make -f $arg -C $dir
trigger #0 ok alert "completed successfully"

Would treat each in the stash as a makefile, dispatch make, merge it into a container, associate a trigger with all merged jobs completing successfully and trigger a desktop notification.

That is enough for this time, next time around we will (likely) see what we can do to assist developer tooling such as the venerable gdb.

Posted in Uncategorized | Leave a comment

Cat9 Microdosing: Stash and List

There has been numerous quality of life improvements added to our terminal emulation liberated command-line environment, Lash, to the curses-like “libarcan-tui” it relies on, but particularly to its reference shell implementation, Cat9. These have mainly been subtle enough that there is little point in making the effort with longer dedicated write-ups on their various implications and overall splendour.

In general my criteria for any non-release write-up on this site is that the contents should, at the very least, highlight and counter some of the many deficiencies in the popular solutions, as well as fit into a grander story.

For the chapter on text-dominant shells there are quite a few candidates in the pipeline. Rather than building them up for an information overload like I did with the original release post, I decided to space them out with a focus on one or two of interest.

For this round I have selected ‘list‘ and ‘stash‘, as they interact and complement each other in a nice way, as well as showcase better integration with the outer desktop.

Before that, lets highlight a few conveniences first. In the following screenshot you see how the suggestion popup gained some width after I pressed F1 and how the borders of the popup itself defies the grid, as it is not using line drawing characters. Many commands now support descriptions of what they or any subcommands do.

screenshot of command completion with extended hints

They also give stronger validation feedback about the validation error and the affected offset.

screenshot of stash map command failing validation, region for failure and reason

As well as hint about what the current argument is expecting:

completion showing that “stash map” is expecting a “source item” argument.

List supplants the well-known ‘ls’ (dating back to the very first AT&T UNIX). Although we still can fork and run ‘ls’ just like any old shell can with the generic improvement that we can reference its contents and context long after it is gone; repeat its execution later in-place and detect changes; slice out selected results and so on. Just as it is bland and boring, it is also problematic.

The file in the following screenshot does not exist, or at least isn’t called that and you can’t refer to it using the output of ls alone. If isatty() applies to stdout it behaves one way and you might get escaping that may or may not work for referencing it interactively depending on your libreadline implementation. It behaves in another when in a pipeline. All parsing of the output is ambiguous, we know that since long, don’t parse ls and all that. The data might be outdated the second it is produced and you wouldn’t know. Tab completion output presents it in yet another way.

screenshot of ls and bash struggling with a file, since everything apparently is a file, all is struggle.

Enter list. As shown in the following screenshot, as soon as I run the command some subtle things happen. One is that the number of ‘active’ jobs hinted at in the prompt does not go back to zero.

screenshot of Cat9 having run ‘ls’ versus our own ‘list’

That is because list is still very much alive. If I create a new file in the folder, the list automatically updates and marks that an item is new. If I click on folders it navigates to them, and if I click on regular files some default open action will be triggered. I can change sorting order without re-execution. For the setup I have here, the default open action tells the outer window manager to swallow up until terminated, and hand over to a dedicated player or viewer and this composes over the network if run that way. The same naturally works if I hand input focus to the list job and navigate using the keyboard. The following clip demonstrates all that:

video clip demonstration interacting with the output of list using both keyboard and mouse

There are quite a few sneaky details to this. If I would, for instance, change my working directory and try to reference contents in the list, what would happen? In the following screenshot you can see the result of copy #0(2,4,5) and how the list resolves to its full path and the presentation metadata is stripped away. The crazy file is now gone after composing with /bin/rm by running rm #0(8) — no fragile escaping necessary.

screenshot of running copy #0(2,4,5) with list correctly separating content from presentation

This brings us to stash. Internally the mouse-2 binding for list was set to stash add #csel($crow) which means add to stash the currently selected row in the currently selected job. Stash is a singleton active job (only one running). It acts as a dynamic list of sorts, but there is, of course, more to it. Stash accumulates files on demand from other source, such as list, drag and drop actions and so on. The following clip shows that in action.

right clicking items in list outputs automatically adds to stash, then using the stash to remove picked files

It also act as a generic staging area. A simple danger is that you “ls |grep pattern” and based on that information run a destructive command like “rm another_pattern”. The caveat being that infinitely many things might have happened between your ls and rm and innocent files are lost in the process. With stash I explicitly build my change set, then stash unlink yes to explicitly commit to only apply to this set.

As with list and others, we have deeper mouse and keyboard interaction as well as visual layout separated from the actual content because we are not a dot matrix printer. Resizing will prioritise content based on the space allotted. Presentation and content is separated. This screenshot shows a compact window having run ‘cat #stash’ to show composition with traditional cli tools.

screenshot show composition of stash contents with /bin/cat via ‘cat #stash’ appearing as job #2 at the top

Perhaps you can mimic some of this (not really, I am just being characteristically smug) via some of the recent trendy perversions of putting more logic into a terminal emulator clinging on to that riced PDP-11 experience as the default.

Another stash action is verify. This queues a checksum calculation for every file referenced in the stash. When that command is repeated, any entry where the checksum has changed gets highlighted. The following screenshot shows a stash of files where one was still being modified when the stash was re-verified.

This is useful to ensure that you are not operating on a stash that is currently being modified, or having preset tripwire protocol, mark the stash as part of the serialisable cat9 state set and move between machines to check that they match. It also goes well with the next item.

You should have noticed the arrows and that the name appears twice. This is for the last feature to demonstrate. If I run stash map <src> <new-dst> the arrow changes. This lets me give a name to the nameless, e.g. file descriptors being passed to the window. If I combine that with stash archive an archive (arc, tar, …) will be built using the mapping presented. The following screenshot shows such an archive built and extracted.

the result of building a stash, remapping (#1) building that as an archive (#2) and extracting it (#3)

In the following clip I use the file-picker in my outer desktop (durden) to drag and drop two files into the shell window. As you can see these are automatically added to the stash. Note that they are not shown by their path, because I am actually running this remotely over our network protocol.

drag and drop into remote shell, resolving into local files and archiving

I then use resolve to ensure that the items (marked !) are resolved to locally readable files as tar is not capable of accepting sets of passed file descriptors and working from (so eventually needs replacement with something better). Lastly I archive them together.

Posted in Uncategorized | 2 Comments

Arcan 0.6.3 – I, pty: the fool

It has been well over a year since last time, but also some fairly beefy updates to sum up for our dear old Desktop Engine. Strap in.

In general project news, there are both grim and good ones. Starting with the grim: Mr. T-iago, a great friend and avid supporter of the project since days of Recon, has left for the mainframe of yore. T forever remains a gentleman; a scholar as Portuguese as Danish Pride can get; a rare fellow reverser knowledgable in the ways of +Fravia; A BJJ brawling DC black badge bearer and Big Lebowsky sweater wearer. Fuck cancer.

In ‘good news everybody’, the project has received promises of generous funding from both a private entity and from EU-NGI/NLNets NGI0 Entrust fund.

The private funding will be used to improve client compatibility starting with the KVM/QEmu UI/Audio driver and working from there down the list of relevant targets; Xorg, Qt, Imgui, SDL/GLFW, Chrome-ozone.

The NLNet sponsored parts aims for improving and documenting our network transparency layer and its associated protocol, A12. The main focus is on safety and privacy aspects of service/device discovery of the implicit web of trust that emerge from keystore management, but also on the server-end state controls and coordination. A deeper dive into parts of this can be found in the recent post on ‘A12: Visions of the Fully Networked Desktop‘. Some of its video clips are used here as well, either because they were so good that they are worth repeating, or the release simply was not visual enough otherwise.

On related projects there are some pop:ing up that attempt to package and build OSes with Arcan as a base component. One early such attempt is Eltanin.

A note for the packagers and others with related interest in the configuration management of the project: we will move away from GitHub and git posthaste. Microsoft has completely obliterated what little good faith that was left in their operation and my account is no longer active. While peeling off that scab, the plan is to move to self-hosted Fossil with a public mirror or two. This will also be used as an opportunity to swap out CMake for Xmake and solidify our stance on ‘Lua pretty much everywhere’. Thanks to the advancements of our network tooling, the hope is that pseudonymous issue collaboration, forum and ephemeral communication all can be migrated away from our discord with GitHub and well, Discord, soon enough, but that is a different post. Since we are practically rolling release yet such distributions rarely apply that rule to packages, we will soon start automatically / incrementally tag updates .topic.release.patch on a weekly basis assuming there are any changes since last.

True to form, the following clip is a collection of the ones in this post, set to a fitting tune and padded with some from our terminal-free CLI shell Lash#cat9 (including something not shown before). Mr.T was a strong instigator behind that specific work, though he never got to see it come to fruition. Thank you, Mr. T – the rains have ceased old friend.

Discovery and Networking

The networking protocol, A12 consists of several loosely connected components. First we have the protocol implementation itself, src/a12. This is used to build two different interfaces. One which does not require all the rest of arcan, only the IPC system libraries (libarcan-shmif, libarcan-shmif-server). This is the arcan-net binary.

As shown in previous release posts, it can be used to setup and manage most practical aspects of the protocol, but it is a developer or a system tool more than an end user thing.

The other is afsrv_net and its corresponding path in the high level Lua APIs, e.g “net_open, net_discover) …” for building networked arcan appls. While using nearly the same code for key management and communication as arcan-net itself, since a graphical environment is guaranteed, there are more options for building user accessible interfaces and integrations.

In this old clip you can see a tool in our reference desktop environment, Durden , used to connect to an arcan-a12 directory server hosting appls (such as durden itself, or our more experimental dataflow ZUI, pipeworld), selecting an appl, downloading, running, changing some visible configuration and synching state.

Showing networked configuration persistence and appl download/execution.

In this clip we take things even further.

Live-editing and synching an appl across multiple devices through a shared external server.

Here I repeat the ‘download and run appl’ scenario, but I also open it up, modify it and you can see how other devices using it live updates as well.

The code for this part of the server has also been refactored to strongly (stdio + fdpassing + no-filesystem) contain the processing of each client.

A non-polling local network discovery form has also been added. Previously the keystore was just swept, trying to make connections to the hosts mapped to each petname. Now devices beacon a challenged set of identities that others can trigger from.

In the following clip there is a debug build running on an old 1st gen Surface Go connected over a dodgy WiFi. It has been been marked as discoverable (arcan-net discover beacon). On the main desktop I tell Durden to look for devices, and it is set to alert as a status-bar button on new discoveries. I drag a browser window to it, and decide to share-active, meaning to allowing it to provide input for this window only. I type a few things on the attached keyboard to show that it indeed doing that. I toggle a post-processing effect on the desktop end, showing that it is possible to mix-in accessibility features while sharing.

Lan network discovery with streamed sharing and contained input injection.

Yet another nice step towards the ideals defined in our principles as well as for implementing Arcan as OS Design.

Tan, Tui!

We are finally closing to the end of the long standing goal and sub-project of migrating away from terminal emulators as a way of building command line and text dominant applications, replacing it with the much more display server friendly arcan-tui library.

The more recent details of this can be found in the blog post on: Day of a new command line interface: shell along with the technical counterpart: Whipping up a new shell: Lash#Cat9, and tangentially Writing a console replacement using Arcan (for the getty like- replacement), although there is one article yet missing for the developer side about how we also have the means to get rid of curses.

Many of the deprecated functions previously around only to support how our terminal emulator used the screen abstraction in TSM are now blocked out by default, and the screen abstraction itself is gone from the TUI libraries, making updates faster and memory requirements much lower.

Border drawing is now processed as cell attributes and not actual ‘line character’ glyphs, making them faster to draw, consuming no precious grid space and does not interfere with text selection or confuse screen readers. The readline widget uses them by default, shown in the completion popup in the next clip.

This clip also shows how cooperation with the outer WM continues to advance. It is possible to dock into a possible tray or status-bar, as well as supporting ‘swallowing’ window management mode. The swallow trigger can be seen at the end of the clip, prompted by the s! command prefix.

CLI shell with non-grid borders, file system monitoring and WM negotiated client open modes

The gritty details on how embedding and controlling external processes work has also advanced. In the following clip you can see how a TUI client delegates embedded media playback, but can influence both scaling, positioning and some input routing. I then tear it out, and send it on its merry way to another computing device thanks to proper network transparency.

Transition between embedded external media in a tui surface to WM management to network redirection.

Cipharius also added a caching mechanism to the server side text processing, meaning that text surfaces (including all Tui clients) will now share glyph caches between each other, reducing the cost for windows that share default (font, size, hinting and density).

There are still some more gains to be had on the server side text processing end, especially for XR and accessibility by moving to atlases with distance field representation. Then we can re-add some of the features that were temporarily stripped during the move to server-side text, like smooth scrolling and BiDi/shaping. Combining that with map_video_display(…) and we can have all the knobs and tools to get fullscreen TUIs with embedded media racing the beam, reliably and tear-free across the stack — getting dangerously close to perfect balance between latency and power consumption.

Compatibility Work and NSA 0 Day

As stated in the previous release post, we’re pretty much ‘done’ wizzbang feature wise – anything left to add is rather minor. This is the point when one should think about compatibility in all kinds of directions — backwards, forwards and sideways.

In that spirit, much more work gone into our Xorg DDX, Xarcan. A number of interesting things has happened, warranting an article of its own, but several different ways of using it has emerged.

One is that it now supports the ‘Redirected’ way of translating window hierarchies into windows that integrate with the Arcan side in the way pioneered by Xwin and Xquartz way back in the early 00s. This means you can push a single X11 application over A12 — network transparent outbound:

 ARCAN_CONNPATH=a12://my.host Xarcan -redirect -exec chromium

Or serve it up inbound:

arcan-net --soft-auth -l 6680 -- /usr/bin/Xarcan -redirect -exec chromium

This clip shows that in Durden:

Network pushed chrome through single rootless Xarcan.

You can still go the old route through arcan-wayland -exec-x11 chromium but it will increasingly lag behind in features and performance unless someone steps up, as I am personally absolutely completely done touching anything Wayland code wise — the post mortem analysis of how ‘so simple it is nigh useless’ into ‘X but magnitudes worse’ is one for the ages.

With Xorg basically up for grabs – there are a lot of fun ventures to pursue here and it is contrary to popular belief quite salvageable, much more so than its proposed replacement.

As is the case with window management, there is a larger story on how to fix the remaining Xorg security and accessibility nuances. A historical oddity uncovered while playing around with this that will be left as an exercise to the reader to exploit: here is a part of the XAce security mechanism and attempt to define an intended security perimeter. You should be able to come up with at least three ways of circumventing it, without ever chasing overemphasized trails on langsec ‘memory safety’ and dull things like that — In code contributed by no other than the NSA.

More interesting still is the third mode shown in this clip:

Never before have Xeyes looked this confused.

Ignoring some glitches courtesy of my HDMI capture setup, you can see me starting Xarcan that proceeds to setup window maker. This makes Durden automatically assign it a new fullscreen workspace, with some special rules applied. I start Xeyes to feel less lonely, and, as per usual, it begins to death stare the mouse cursor. I start Pipeworld through the Durden HUD, but any window that spawns in a workspace owned by Xarcan somehow gets decorated and treated as part of the Xorg space, yet Xeyes loses track. Not depicted but also true: not even the mighty xinput --test-xi2--root knows what is going on. There might be something less neighbourly to this story.

In other news, afsrv_terminal has received support for another terminal emulation state machine, that of suckless. The point was partly to get something slimmer across the hips than TSM, and to get a very close 1:1 example “writing something like this in Arcan-tui vs X11”, server side text included. It also gives us a better point of comparison for quality aspects like text rendering, and quantifiers for latency and memory consumption.

For those interested in the Zig programming language, the true C-lang successor: Cipharius wrote an Arcan-Tui friendly frontend to the Kakoune editor, https://github.com/cipharius/kakoune-arcan.

Not only that, but he also whipped up a Remarkable2 friendly tool that will be used for various forms of sharing its features and resources over A12, what a guy! Hopefully the same approach can be used to bridge other sheltered ecosystems that thought they were safe from assimilation.

Audio

As a prelude to what is actually a large part of the topic for the 0.7 series of releases, Cipharius also dug into the build system and started to split out the engine paths previously tied to OpenAL.

While the work is mainly structural, it leads to being able to build LWA, Lightweight Arcan (compare the relationship to a less architecturally tragic form of what Electron is to Chrome) without having to patch and link a special version of OpenAL, which is one of the heavier parts of the current build environment.

Testing this out, it is now possible to chose a ‘stub’ audio platform which disables all audio processing. Thus if you never use the audio parts in neither playback nor recording or streaming, you can disable it and let the CPU idle some more.

With that a segment type for an audio only segment has been added. The use for this is high-performance critical situations where the synchronous nature of resizes could cause underrun artifacts like audible clicks on window resize operations. It will also be used to convey dynamic positional audio sources when mixing for HRTFs in XR and for general surround sound.

Video

On the video platform side, the EGLStreams code have been evicted and atomic modeset is now the default over legacy – layering violations and robust synchronisation primitives be damned. Now Nvidia users can enjoy things randomly working in a different way from before — yet one that seem to basically use the same internal driver paths according to my friend Ghidra Binaryblobbington; almost as if the one was shoehorned into the other at minimal effort. Curious how that came to be.

More plumbing have been made to HDR processing, and experimenting with the WM/Client side of things are next up as the core- bits are usable enough.

In a previous release we added the option for mapping / setting 10-bit and 16-bit /channel modes to corresponding rendertarget storage passes and mapping them onwards.

On the scripting level the WM can use “image_metadata(…)” to attach custom metadata to a video object (such as the rendertarget output from a colour corrected offscreen pass). If that object is mapped via “map_video_display(…)” the metadata will be passed on to the next layer (presumably your display).

For the client size, if one uses extended resize and toggles SHMIF_META_HDR as the list of buffer complements and the WM has marked that as permissible through target_flags(…), each signalled frame will now sync the associated metadata (though the WM can still override this through the image_metadata function, since HDR content in the wild come from various degrees of brokenness).

Next up is to wire this through our main testing tools for the purpose, aloadimage and afsrv_decode – and thereafter make sure that when arcan is used as a client/toolkit for itself, it too can tag the post-composition metadata.

Compressed Passthrough

The decode frameserver, used to aggregate media capture and parsing into one-off sandboxed processes, has received support for h264 video frames when running against a video device using UVC. With this there is also the option to try and pass-through the still-compressed frames through to the display server end.

Arcan rejects such frames as any local client should just offload into afsrv_decode and embed the resulting surface as shown in the Tui example. When redirecting over arcan-net on the other hand, that is a different story. That one will now happily try to passthrough and just multiplex/encrypt/stream – saving us a a full decompress/compress cycle and reduce degradation in transit.

This will turn out to be useful in quite a few networked video cases, like game streaming, networked surveillance cameras and low-tier media hardware.

Tracing

Cipharius has also been busy elsewhere in the project. The tracing hooks that are sprinkled all over the place can now be built with a version that fully integrates with the awesome Tracy profiler. This will be instrumental for the 0.8 branch when it is time to finally start optimizing for real.

Posted in Uncategorized | Leave a comment

A12: Visions of the Fully Networked Desktop

This is a higher level update on what has been going on with the current focus topic for Arcan releases, that is the network transparency (2020). It comes as a companion to the upcoming release post as a way to give it more context.

Backstory

A12 is our network protocol that lets applications written against libarcan-shmif communicate remotely. A12 and SHMIF share most of the same data and event model, but they have very different approaches to transmission, queuing, compression, authentication, confidentiality and so on.

Just as SHMIF consolidates a forest of IPC systems with an untapped common ground into one system around this common ground, A12 consolidates a forest of protocols into one. The history for this part spans many years, starting way before the naughty bits of protocol design and implementation.

The preparations cover everything from decoupling command line and shell from the terminal; due diligence decomposing expectations of features from existing solutions and their respective flaws; working in crash resilience to transition from local to networked operation and back; making sure it fits the security and accessibility story; that it is observable and debuggable; that it meshes with obscure input devices and as many varied windowing schemes as possible — on a foundation presented over a decade ago as a dissertation on the traditional and future role of networked display servers for surveillance and control in critical infrastructure.

This is far from the full ambition and scope of Arcan as a whole, but a substantial building block making the rest attainable and not just a pipe (world) dream. For the rest of this article I will cover some of the scenarios that the design is verified against, and match that to work that is either completed or close to completion.

This has been graciously sponsored by NLnet as part of their Zero Entrust fund.

With the networked desktop we make a few grand assertions:

  • The digital ‘you’ is the combined swarm of your devices — phone, smart watch, desktop, laptop, gaming devices, home server, network glasses, security token devices, note-taking eInk tablet, etc.
  • The digital ‘you’ persists as a story written in data captured and processed by these devices.
  • The digital ‘you’ is fluidly pieced together over an otherwise invisible communication substrate.

It is the ability to coordinate the swarm; control the sampling, accuracy and truthfulness of the data; to route the communication that, when combined, sets the perimeter of your digital agency. If you are unable to redirect-, delay- or deny- the communication; to intercept sampling; to erase storage or to re-purpose the devices — somewhere those parts of your digital story got surrendered, or were never yours to begin with.

The modern ‘cloud’ stitches these together in such convenient and seamless a way that it is easy to miss how it knows more about your digital you than you yourself — that you are confined to a stall rather than roaming the cyber-scented grasslands. This has been the game for a long time, even though that was not always the case.

I have referenced these talk in the past, but [27C3] Jeroen Massar ‘How the Internet Sees you’ and ‘[28C3] Stefan Burschka – Datamining for hackers’ both remain good reminders of just how visible things are from the ground up, and that is pre-‘Snowden-scare’ days. Very little has changed in this regard, and it looks like the aggressors are getting bolder by the minute.

This is all moody and gloomy, but that is not why we are here, now is it? No! we should try to improve the old and maybe create something new and reach higher heights or at least see different sights.

The Arcan play thus far has been to define and implement a user-scriptable control plane, and figure out a useful set of functions; a window manager in lack of a better word. This is not a particularly accurate one, so we use ‘appl’: not just an app, not really an application. This set has been repeatedly demonstrated to be both useful and sufficient for replicating anything that has been achieved elsewhere within the ‘desktop computing’ mental model — and then some.

Next up is the network communication substrate part, and why A12 is being forged. The current state is that the desktop ‘remoting’ case is all covered. We have a better reply to anything VNC / SSH / X11 / SPICE / RDP / Synergy / … has to say on the matter. This is a small yet important part of the story. For the rest of this article I will go through one of the means for linking the networked parts together.

Directory Server

The ongoing focus is an optional extension to the core protocol, which is the ‘directory server’.

The following diagram tries to cover how it fits together:

To describe the figure and the respective roles; source, sink, directory, appl, applgroup. The directory server act as a rendezvous point for discovery and connection negotiation, but also as a state store. It can act isolated or be linked/unlinked to other directory servers.

The directory hosts 0 to many arcan appls. Assuming a working arcan setup, running:

arcan-net arcan.divergent-desktop.org durden

Would connect to our public/test directory (hosted at openbsd.amsterdam), download, extract and run the ‘durden’ package. This can be kept on your device and handled offline, or it can retain the connection to the directory server and automatically sync to new versions of the appl as well as participate in the applgroup.

The configuration settings (state) of the appl gets its own per-user personal store on the directory server. The same store can also be used to submit debug reports and similar alternate data streams.

The following clip shows me running durden from the directory server as shown above. This is on a clean machine from the Linux console. I change the default display background colour and the mouse cursor to be very large just for something to be easy to spot in the video. Then I shut it down and re-issue the same command but with --block-state added. The desktop starts up with the default ‘calibrate/first-run’ config. I shutdown and repeat again, without the --block-state argument, and you can see how the previously directory-stored configuration returns.

Directory server side state persistence

The authentication and access control is only using the key pair that the client provided as “identity”, meaning that no tracking cookies or similar shenanigans are ever necessary. It is the client that explicitly push the state update and what this entails can be inspected or blocked at will.

Users with permission can also push an update to the appl itself. If that happens the changes would immediately sync to others currently running it.

In the following clip I run the ‘pipeworld’ appl on two machines from the same directory server. On the one machine, I open up the downloaded code, modify the default wallpaper and issue arcan-net --push-appl ./pipeworld arcan.divergent-desktop.org.

You can see how both machines trigger on the update, downloads and switches to the new version. The update cascade is atomic and optional. The effect is that I can live-develop, deploy, test and collect traces at the push of a button across a variety of devices.

As with all the clips here, they are recorded with fairly realistic network conditions: remote VPS, laptop tethered to a spotty mobile link and the desktop on a beefy wired connection.

Live appl editing and updating

Sourcing and Sinking

It is also possible to connect a source or another directory to a dynamic index. Others that are still connected would get notified, and can chose to try and sink it.

The directory will negotiate the connection between the source and sink based on network conditions, and proxy-forward the keymaterial needed for the two parties to establish an end to end encrypted connection.

With this, sharing a compatible piece of software can be as easy as:

A12_IDENT=qemu ARCAN_CONNPATH=a12://my.directory.server /usr/bin/qemu -display arcan disk.img

The above would expose a single arcan client as a source, and have it be used by a single sink at the time, reconnecting to the directory if the sink connection is severed.

Similarly, sinking the source is not more difficult:

arcan-net my.directory.server "<qemu"

If the network conditions makes the source unreachable to the sink, the directory can act as a tunnel, and if the source is currently not available, wait and get notified when it is:

arcan-net --keep-alive --tunnel my.directory.server "<qemu"

From an appl standpoint, there are dedicated APIs for repeating the same process. net_discover and net_open. Those are slightly more nuanced as they also have ways for discovering devices on the local network, or ones that have been tagged in the keystore with information of how to find them elsewhere. The ‘durden’ desktop appl comes with a tool using those functions.

This clip shows a test of me starting a client locally by browsing to a local video clip and launching it in a media player. I then migrating it to my directory as a drop action on the statusbar icon that was added when I connected to the directory. You can see the matching icon on the laptop reacting with an alert when the change is discovered. I request to sink it and it appears.

Moving a local media player client first to the directory, and then tunnel-sinking it to the laptop

Appl Group

Devices that are connected to a directory and running the same appl also get a local messaging group. The appl can leverage this to uni- and broadcast- short messages to other instances of itself, providing an easy building block for collaboration.

Code- wise, a minimal example would look something like this:

local vid = net_open("@stdin",
function(source, status)
    if status.kind == "message" then
        print(status.message)
    end
end)

message_target(vid, "message=hi there")

This example reads as ‘join the appl-group as “me” and send a message to everyone else. Others would receive “from=me:message=hi there”.

The following clip shows a testing tool for Pipeworld, running on both the desktop and the laptop. The tool adds a row for the directory server appl-group, and stores the message from users as a new cell on that row.

Messaging between two clients in the same appl-group

How this is leveraged is of course up to the appl itself.

Ongoing Work

Ongoing work in this area is using the key material from a previous directory-discovered source-sink pairing to re-bond and establish new connections without 3rd party involvement for the local area case.

This means that you can use your cloud hosted VPS as a rendezvous to discover devices, but then have them re-discover mesh networks locally without internet connections or dependencies to the likes of DNS.

More paranoid setups would include having your galvanically isolated attestation and build server inject the keymaterial into the base read-only / flashed firmware OS image and still allow individual devices to re-discover each-other in hostile hotel WiFi-like settings.

There are two big additions coming to the directory server soon.

Arcan supports user-defined namespaces. Normally these are used for providing a logical name mapping from some part of the file-system (e.g. /mount/thumb_drive to “my thumbdrive”) to be used by storage management services, cryptSetup and so on.

The logical extension to this is to have the directory server also provide a shared namespace for others attached to the same appl, as well as a private store so that you have somewhere to store data you create, encrypted and signed by default.

The second part is to let you slot in a server side set of scripts that can interface and govern the appl group, as well as dynamically launch and attach directory server hosted sources.

Other, more experimental bits concern how to query the network structure that emerges from letting others attach their own directory to another, either forming a distributed applgroup for load balancing and network performance or a dynamic mesh network of different hosted appls shared by friends, possibly compiled translations from other networked document formats.

By letting the directory server attached appl governing scripts opt-in to responding to search queries we get the mechanism for asking ‘where in my network did I store a picture or document that match this description, sorted and prioritised by my current location’ and so on.

Posted in Uncategorized | 1 Comment

The quest for a secure and accessible desktop

This article is an overview of accessibility and security efforts in Arcan “the desktop engine”: past, present and those just around the corner. It is not the great and detailed one I had planned, merely a tribute, presented in order to assist ongoing conversations elsewhere and as an overlay to the set of problems described by Casey Reeves here (linux:audio, linux:video).

Because of this the critique of other models is omitted but not forgotten, as is the firey speech about why UI toolkits need to “just fucking die already!” as well as the one about how strapping a walled garden handsfree to your face hardly augments your reality as much as it might “deglove” your brain — but it is all related.

What should be said about the state of things though is that I strongly disagree with the entire premise of some default “works for me”, a set of trapdoor features (repeatedly pressing shift is not a cue to modal dialog a ‘do you want to enable sticky keys?’ in someone’s face) and a soup of accessibility measures hidden in a setting menu somewhere. I consider it, at most, a poor form of ableism.

Due to the topic at hand this post will be text only. Short-links to the sections and their summaries are as follows:

Philosophy argues that accessibility is a broader perspective on how we attach and detach from computing both new and old.

Security extends the philosophy argument to say that security mechanisms and accessibility tooling need to evolve hand in hand, and not as a sacrifice of one quality in order to improve another.

Thereafter it gets technical:

Desktop Engine is a walk-through of the different layers of Arcan specifically from the angle of building more secure and accessible computing interfaces.

Client-Level covers the specific tools external sources and sinks can leverage to increase their level of cooperation towards accessibility.

Frameservers expands on role-designated clients for configurable degrees of protection and segment the desktop privilege level by the kind of data processing being made.

Shell and TUIs covers the middle ground from command-line shell and system management tools to composed desktop.

Examples: Durden – Pipeworld highlights preparations made for accessibility in a traditional desktop environment, and follows up with how it extends in a more unconventional one.

Trajectory briefly covers how this is likely to evolve and where the best opportunities to join in and contribute lie.

Philosophy

‘Secure and accessible’ is one of the grand narratives governed by the principles for a diverging desktop future and has been one since before the very first public presentation of Arcan itself. As such it is not a target of any one single principle, but a goal they all combine in order to deliver.

We view accessibility as a fundamental mission for computing. A rough mission statement would be ‘to increase agency, in spite of our stern and mischievous mother Nature’. It is for all of us or for none of us.

Whatever your preconditions or current state of decay happens to be, computing should still be there to grow the boundaries of your perceivable world. That world can be ever expanded upon, even if you are below 40 years of age and still believe yourself to be more than capable of anything — the mantis shrimp laughs at your crude perception of colours; a computer did not always fit in a pocket.

This reaches into both the future and into the past; gilded memories of youth and retrocomputing alike. Your current desktop and smart phone will eventually become dust in wind and tears in rain. Being suddenly rejected access to that past is a substantial loss and not just a mere hindrance — as the companies that to this day retains a win95 machine for their abandoned CNC control software would attest; some critical infrastructures still run on OS/2.

Computing history does not simply go away, it just becomes less accessible. This is why we chose to use virtualisation (including, but not limited to, virtual machines), not the toolkit, as the default ‘last resort’ model- and compartment- for compatibility that needs to be made accessible.

Security

A special topic with substantial overlap is that of security. Some have scoffed in the past at the idea of security having anything to do with the matter, yet if your threat model grows more complex directly through you or indirectly via friends or family performing sensitive work; managing access to substantial funds; having a tiff with the local ol’ drug peddler; fighting an abusive spouse or spotting the wrong criminal in the act — you will find out first hand how maintaining proper operational security (“opsec”) absolutely forms a substantial disability in a digitally traceable society where targeting information is ridiculously cheap to come by.

A trivial symptom of someone failing to understand this property are sprinkles of ‘we have just given up, here is a deliverable oh great product owner’ kind of dialog boxes: “Do you allow the file manager to access the file system yes/no/potatosalad?”, “The thing you just downloaded to do things with your camera did not pay us to pretend to trust them on your behalf and is trying to access your camera, which we reject by default to inconvenience you further; exit through the app store. To override this action please go to settings/cellar/display department, press flashlight and do beware of the leopard. We reserve the right to remove this option at any point in the future”.

On top of how such measures struggle with accessibility tooling as they, by their very nature and intent, can automate things some lobotomised designer thought should be exclusively interactive, the often ignored ‘Denial of Service‘ attack vector is really important, if not paramount.

This is because you do stupid things when put under wrong kinds of stress and pressure, hence the phrase ‘being under duress’. Stripping someone of agency at choke points is a tried and true tactic to achieve that. With tech advancing, integrating and gate-keeping just about everything it also provides great opportunity to deny someone service.

Revoking features disguised as gavaged-fed updates, reducing interaction or presentation options or dropping compatibility in the name of security is a surefire way of signalling that you do not actually care about it, just pretend to. These are properties that you develop in tandem or not at all.

Desktop Engine

Arcan is presented as a ‘desktop engine’ and not as a ‘display server’ for good reason. In this context it means to consolidate all the computing bits and computer pieces useful when mapping our perception and interaction into an authoritative behaviour and permission model, one entirely controlled by user provided scripts.

Such scripts can be small and short and just enable the bare essentials, as shown in Writing a console replacement using Arcan (2018) as well as reach towards more experimental modes like for VR (Safespaces: An Open Source VR Desktop, 2018). An exclusively aural desktop with spatial sound is fairly simple to achieve in this model, as is one controlled exclusively through eye movement or grunts of arousal, displeasure or both.

This time we are only concerned about how the desktop engine relates to the accessibility topic. To dive deeper, there already is Arcan as Operating System Design (2021) for a broader picture and Arcan versus Xorg – Approaching Feature Parity (2018), Arcan versus Xorg: Feature parity and Beyond (2020) for the display server role.

The core engine has the usual bells and whistles; an OS integration platform; media scene graph traversal; scheduler; scripting runtime; configuration database; supervisory process and interprocess communication library. Most other tools, from network translation and others come through separate processes using this interprocess communication library.

By itself it does nothing. You load a bunch of scripts that has some restrictions on what things are named and how they can access secondary resources like images. This is referred to as an ‘appl’ – more than an app, less than a full blown application. The scripting API is quite powerful, with primitives comparable with that of a web browser but thankfully layered on Lua instead of Javascript.

The end user- or an aggregate acting on their behalf- selects and tunes the active set of scripts. For the desktop perspective, this becomes your window manager. The set is interchangeable at runtime, sources- and sinks- (“clients”) adjust accordingly.

The user can augment the appl they chose to run with hook scripts, intended for agnostically adding specialised modifications of common functions to support exotic input devices and custom workarounds basically aggregating your desktop ‘hacks’.

The appl may open named ‘connection points’ (role like ‘statusbar’) to which external sources and sinks attach with an expressed coarse grained type (e.g. game, sensor or virtual-machine). Sources and sinks can work with both video as well as with audio. They also have an inbound and an outbound event queue. This ties in with the interprocess communication library mentioned above. The relevant parts of the event model is covered further below in the ‘Client Level’ section.

Connections over these named points can detach, reattach and redirect. By doing so the system can recover from crashes and transition between local and networked operation transparently.

This has been demonstrated to be sufficient even for fairly complex assistive devices. For more on that see the article on Interfacing with a ‘Stream Deck’ Device (2019) and on experimenting with LEDs (2017).

The most successful application to date has not been the use as a desktop, but similar to the setup of AWK” for Multimedia (2017) — for industrial machine vision verification and validation setups with hundreds of thousands of accumulated hours with multiple concurrent sources at high frame rates (1000+Hz) and real-time processing, which covered substantial portions of the overall project development costs.

Client Level

As mentioned in the engine section we have the concept of Window-Manager (WM) initiated named connection points (role) to which a client connects and announces a primary type.

These connection points are single-use and need to be explicitly re-opened to encourage the WM to differentiate and rate-limit. This works as a counter measure to Denial of Service through resource exhaustion. It is also one of the main building blocks for crash recovery (2017) and network migration (2020).

The WM can substitute connection point for named launch targets defined in a locked down database. This does not change the actual client code, but provides a chain-of-trust for sensitive specialised clients such as input drivers for input method engines and special assistive devices like eye trackers.

The WM combines connection point and type to pick a set of ‘preroll’ parameters. These parameters contain everything that is supposedly needed to produce a correct first frame.

This includes:

  • Preferred ‘comfortable visible font size’, font, hinting and possible fallbacks for missing glyphs.
  • Target display density and related properties for precise font rasterisation.
  • Preferred GPU device and access tokens.
  • Fallback connection points in the event of a server crash or connection loss.
  • Input and Output language preferences as per ISO-3166-1, ISO-639-2.
  • Basic colour palette (for dark mode, high contrast, colour vision deficiency).

Any and all such parameters can also be changed dynamically per client.

The client now has several options to announce further opt-in capabilities on top of a regular event loop:

  • labelhints are used to define a set of input tags that the WM can attach to input events, as a way of both communicating default keybindings and to let the WM map them to both analog and digital input sources. T2S can leverage this and say, for instance, ‘COPY_AT_CURSOR’ instead of CTRL+C.
  • contenthints are used to tell how much of available content is currently visible. This lets the WM decorate or describe boundaries and control seeking instead of clients embedding notoriously problematic controls like scroll bars that take precise eye-hand-coordination, good edge detection and has poor discoverability with the current ‘autohide’ trends.
  • state/bchunkhints are used to indicate that it can provide alternate data streams from a set of extensions (clipboard and drag and drop are just specialised versions of this), and if the exact current state can be snapshotted and serialised for rollback/network migration/persistence.

The WM now has the option to request alternate representations. This was used to implement a sudo-free debugging path in Leveraging the “Display Server” to Improve Debugging (2020) and the article eludes to other purposes, such as accessibility.

While X11 and others have the idea of a client creating a resource (like a window) and mapping/populating it and so on – Arcan also has a ‘push’ one. The WM can ‘send’ a window to a client. If the window gets mapped/used, it acts as an acknowledgement that the client understands what to do with it. These are typed. In the linked article, the ‘debug’ one was used (though with some extra special sauce).

There is another loosely defined one of ‘accessibility’. It signals that the client can provide a representation of the source window that is reduced to a ‘no-nonsense’ form. Thus you can have a main window with the regular bells and whistles of animated dynamic controls and transitions while also providing a matching secondary output which carries only the attention deficit, text-to-speech friendly and information dense part.

Frameservers

While the scripting itself is in a memory safe environment, it is by definition in a privileged role where caution is warranted regardless — memory is never safe, wear a helmet. For this purpose, traditionally dangerous operations such as parsing untrusted data is delegated to specialised clients referred to as frameservers. This also applies to transfers carrying potentially compromising data, e.g. desktop sharing or a regular image encode where someone might be tempted to embed some tracking metadata.

Frameservers come in build-time defined sets, designated to perform one task and be terminated with predictable consequence and repeatable results. The engine can act as the processing they provide is guaranteed. They get specialised scripting API functions to be easier to use for longer complicated processing chains.

Frameservers are chainloaded by a short stub that sets up logging and debugging; system call filtering and file system namespacing; hypervisors for hardware enforced separation all the way to network translation for per-device compartments — pick your own level of performance to protection trade-offs.

The decode frameserver takes an incoming data stream, be it an image file, input pipe or usb camera feed and outputs a decoded audio or video stream. This is also used for voice synthesis – the data stream would be your instructions for synthesis.

The encode frameserver works in the other direction: a human-native input stream to compute-friendly representation. Video streaming is an obvious one, but of more interest here is OCR — region of image interest goes in, text comes out.

Combine these properties and we have covered most bases for uncooperative forms of “screen reading”.

Shell and TUIs

One of the longest running subproject in Arcan has been that of arcan-tui (dream, dawn, day:shell, day:lash) – a migration path away from terminal emulators to better text-dominant user interfaces which follows the same synch, data and output transfer as everything else without the terminal-emulator legacy.

On top of exposing the CLI shell to the features mentioned in Client Level above, and the prospects of getting clean separation between the outputs and inputs of shell tasks, text is actually rendered server-side.

This means that the WM understands the formatting of each line of text and not merely as a soup of pixels or sequence of glyphs. It knows where the cursor is and where it has been and box drawing characters are cell attributes and not actually part of the text. Any window, including accessibility, can decide to provide this ‘TPACK’ format, deferring rendering to the last possible moment. With that using better composition tactics comes cheap. A big win is using signed distance field forms of the glyphs to let one zoom and pan at minimum cost with maximum sharpness.

Just as the WM model is defined entirely through scripts, so is the local job control and presentation for the command-line shell. Not being restricted to terminal emulation allows for discrete parallel output streams, prompt suggestions and event notifications. These different types can all be forwarded and rendered or played back separately, with different voices at different amplitude and head-relative position.

Examples: Durden and Pipeworld

Returning to the Appl “WM” level, we have several ready to build and expand on. The longest running one, ‘Durden’ has been the daily driver workhorse for many years by now, and is the closest to reaching a stable 1.0 release, although documentation and dissemination lags behind its actual state.

While first a humble tiling window manager, it has grown to cover nearly all possible management schemes, including some quite unique ones. What is especially important here is that it is organised as a file system.

Every possible action (and there are well over 600 by now) has a file system path with a logical name, such as /global/display/displays/current/zoom/cursor, a user intended label, a long form description and rules for validating and setting reasonable values. Keybindings, UI buttons, customised themes and presets are all just references to such paths. This makes it friendly to audio-only discovery, nothing in the model gets hidden behind a strictly visual form. Even file browsing attaches itself onto the same structure.

It is the best starting point for quickly testing out new assistive input and mixed vision methods, and has quite a few of them baked in already. Examples on the internal experimentation board that has yet to be cleaned up for upstream are edge detection and eye tracker guided mouse warping and selection; content update aware OCR and content-adaptive luma stepping to catch ‘flashbangs’.

The upcoming release has text to speech integration added, with the t2s tool supporting dynamic voices, and assigning different roles to different voices.

Even more experimental is the dataflow WM, Pipeworld. Although the introduction post (2021) painted a broad picture with more fancy visual effects and its zooming-UI, it has real potential as an environment for the visually impaired for a few reasons:

  • All actions yield a row:column like address, and the factory expression that sprung them into place is kept and can and modified/repeated.
  • It is 2D spatial, there is a cursor relative ‘above’, ‘below’, ‘left’, ‘right’ that can map to positional audio for alerts and notifications and a distance metric can be used for attenuation.
  • As with durden, there is a file system like view of all actions, but also a command-line interface for any kind of processing or navigation.
  • Each row and cell has magnification-minification as part of the zooming user interface, with zoom-level guided filtration.

Trajectory

The current boom in machine learning based video/audio analysis and synthesis fits like a glove with the frameserver approach to division of responsibilities. While current OCR is based on Tesseract and the voice synthesis uses eSpeak, each integration is in the 100-200 lines of code in a stand alone binary that can be swapped out and applied without restarting.

Adding access to more refined versions with ‘your voice’, transcription, automated translation or image contents description that then apply across the desktop is about as easy at it gets. Such extensions, along with packaging and demographic friendly presets would be suggested points where the project can be leveraged for your own ends without the time investment it would take to get comfortable with all the other project components.

While the current topic / release branch is focused on the network protocol and its supported tooling, when that is at a sufficiently high standard, we will return to a number of features with impact for accessibility and overlap with virtual reality.

Examples in that direction would be individually adapted HRTFs and better head tracking for improved spatial sound and better handling of dynamic audio device routing when you have many of them.

A more advanced one would be indoor positioning for ‘geographically bound computing’ with information being stored, ordered, secured and searched based on where you are – as a trivial example, my cooking recipes are mainly accessed in the kitchen, my doomsday weapon schematics in my workshop and my tax and banking in my home office.

Posted in Uncategorized | Leave a comment