Leveraging the “Display Server” to Improve Debugging

I spend most of my time digging through software-in-execution rather than software-at-rest (e.g. source code). Sometimes the subject of study is malware hissing like a snake and lashing out at the barriers of a virtual machine; sometimes it is terrible software deserving of an exploit being written; sometimes it is a driver on a far-away device that flips bits when a clock gets skewed or circuits get too hot. Most of the time it is something between these extremes.

Many years ago, a friend and I distilled some thoughts on the matter into the free-as-in-beer ‘systemic software debugging’. Maybe one day I will have enough structure and incentive to revisit the topic in a less confined setting. Until then, it might be useful to document some experiments and experiences along the way, which brings us to the topic of this post: ways of using the display server infrastructure, and its position in the user-space stack, to reason about and bootstrap debugging.

While that may come off as a bit strange, first recall that “display server” is a misnomer (hence the quotes): by tradition it serves much more than the display. In the majority of cases you would also find that it ‘serves’ user-initiated blob transfers (‘clipboard’ and ‘drag&drop’) as well as a range of input devices (keyboard, mice, …). In that sense a terminal emulator, a web browser and a virtual machine monitor rightly fall into this category. What I am referring to is the IPC system, or amalgamation of IPC systems, that stitches user-space together into the thing you interact with.

The article is structured around three groups of problems and respective solutions:

  • Activation and Output
  • Chain of Trust
  • Process and Identity

Each explains part of what goes on in the following video:

The central theme is that the code that comes with the IPC system (xlib, xcb, arcan-shmif, …) can be used to construct debugging tools from inside the targeted process, and piggyback on the same IPC system for activation, inputs and information gathering/output.

The gains are that you get a user-friendly, zero-cost-until-activation, high-bandwidth, variable number of channels for collecting process information. These channels can cooperate with the client and let it provide its higher-level view of its debug state, while also allowing custom primitives to be added with few or no additional allocations post-instrumentation.

Context

To add some demarcation, the more interesting target here is live inspection of mature software as part of a grander system – far away from the comforts of a debug build with click-to-set breakpoints launched from the safety of an integrated development environment. The culprit is not obvious and the issue might not be reliably repeatable.

Your goal might not be to fix the issue, but to gather evidence and communicate it to someone who can.

Also bear in mind that this is not directly about “the debugger” in the sense of the nuclear-powered Swiss army knife that is also known as a ‘symbolic debugger’, such as ‘gdb’ and ‘lldb’, but rather the whole gamut of tooling needed to understand what role a piece of software fulfils and how it is currently functioning within that role.

Think of what the ‘intervention’-friendly version of this intimidating chart from Brendan Gregg’s post on Linux Performance would look like for the ‘applications’ block, and you get closer to the idea:

Activation and Output

This first group of problems covers software that wants to cooperate by adding features now that may be useful for debugging later. Some refer to this as ‘pre-bugging’.

Consider the notion that you are a responsible, pro-active developer. You understand the need for others to inspect what your application or library is doing, and that there are things in the execution environment you simply cannot account for while remaining sane and getting things done. You want to make it easier for the next one in line, and get higher quality feedback about what went wrong out there in the field.

What are your normal practical options?

  1. Command-line argument
  2. Environment Variable
  3. Specific User Interface toggle

These are all problematic, though in somewhat different ways. 

With the first two options you have the problem of communicating that the feature is available (how will the user discover it, how will you remember it is there?) – README.md, man pages, FAQ/Wiki, ancient words of wisdom spray-painted on a live chicken, and so on – something needs to announce that the option is there.

Options 1 and 2 are also quite static; they get set and evaluated when the program is started, and if you want to activate debug output dynamically, well, tough luck. Your problem then needs to be reproducible both with- and without- debug output enabled.

The actual output also comes with noticeable system impact – sending strings to STDOUT or STDERR may break other processing steps the user has set up (i.e. introduce new bugs), which is common in the traditional UNIX pipes-and-filters structure. These streams are also not necessarily ‘dumb pipes’: isatty() is very much a thing (as is baud rate), as is threading. The combination of these properties makes for an awful communication channel to try and multiplex debug output on.

The other option, writing to a log device or file, can clog up storage, wear down flash storage, and inadvertently leave sensitive user information around. Formatting the output itself is also a thing to consider: even ‘simple’ printf has some serious gotchas (read up on locale-defined behaviour), and information that is better presented as changes over time than as a stream of samples will need other tools to be built for post-processing.

Option 3 involves quite a lot more work to implement, as the feature should mesh with other user interface properties. When properly done, however, it can add quite a lot of power to the software – look no further than the set of debugging tools built into Chrome or Firefox. Of course, it helps that these are development environments in and of themselves, which justifies the cost and effort. While often a better option, it still composes poorly with other, non-cooperative, information collection.

Post-mortem (crash-dump) analysis is a slightly different story, and one that calls for a much longer discussion. It is out of scope for this article, though it is a decent follow-up topic: the primitives that emerge here work both as a transport for core dumps and as a way of annotating them.

Enough with the problem space description, where does the display server fit in?

In the most trivial of window systems, a client connects somehow, requests/allocates some kind of storage container (window), draws into it either directly as characters or pixels, or indirectly through drawing commands to the server itself or through an intermediary (like the GPU). In return, the client gets input events, clipboard actions and so on back over the same or related channel. This holds true even for a terminal emulator.

These windows may come with some kind of type model attached: indirectly, through the combination of properties set (parent window, draw order), and directly, through some other type tag. X11, for instance, has a long list (popups, tooltips, menus, dnd and so on), and arcan has a really (actually too) long one.

Step 1 – Add a debug type

This allows other, client-agnostic, tools to enumerate windows of this type, compose them together and record/stream/share them. Controls for activation are there, as well as a high-bandwidth data channel capable of cleanly multiplexing multiple data types.

Step 2 – Two-sided allocation

Now for the more complicated step. Recall the problem of saying when debugging features are needed (command-line or environment). The ideal approach would be initiated at user request during run time, with zero cost when not in use.

This is more complicated to achieve, as it taps into the capabilities of the IPC system and its API design. In the context of X11, a simplified version would be possible: send a notification message that a debug window has been requested, then let the client allocate a window with the right type. This still leaves us with some of the drawbacks of the third option, namely what to do with uncooperative clients, which will be covered in the next section.

Now the first part of the video is explainable – The heart-emoji button in the title-bar simply sends a request to the client that a debug window is wanted. The client in question (the terminal emulator that comes with Arcan) responds by creating a new window, then renders relevant parts of the emulator state machine.

Chain of Trust

After the trace-like output of getting debug data out of a client, the more complicated matter is proper debug interfaces that also provide process control, resource modification and so on; interfaces that do not rely on the client actually cooperating.

Assume that we have a process identifier for the client in question (this is also a problem; see the last category, identity, for that). Let's try and attach a debugger. The normal way of doing that is firing up an IDE or terminal, with something to the effect of:

# gdb -p SWEETPIDOFMINE

Only to be promptly met with something like:

ptrace: Operation not permitted.

Low-level debugging interfaces tend to be ‘problematic’ in every conceivable way. If you are not convinced, read the horror show of a manpage that belongs to ptrace, if you have not done so already; search around a little bit and count the number of vulnerabilities it has had a leading role in; study the role of JTAG in embedded system exploitation. My point is that these things are not designed as much as they spring into life as an intermittent necessity that evolves into a permanent liability. Originally, the /proc filesystem (see the paper ‘Processes as Files’) was proposed as a better design. One can safely say that it did not turn out that way.

So what is at fault in the example above? 

To lessen the damage, and to make malware authors work for it a little bit, Linux-land has YAMA (and other, equally intimidating mechanisms) which imposes certain rules on who gets to attach as a tracer depending on what is to be traced and when. You can turn it off universally – a universally bad idea – by poking around in procfs.
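For reference, the Yama knob in question lives at /proc/sys/kernel/yama/ptrace_scope. Below is a small C sketch, with made-up helper names, that reads it and names the four values described in the kernel's Yama documentation:

```c
#include <stdio.h>

/* Names for the values of /proc/sys/kernel/yama/ptrace_scope, as
 * described in the kernel's Yama documentation. */
const char *yama_scope_name(int scope)
{
    switch (scope) {
    case 0: return "classic";    /* same uid may attach (if dumpable) */
    case 1: return "restricted"; /* ancestors only, or via PR_SET_PTRACER */
    case 2: return "admin-only"; /* needs CAP_SYS_PTRACE */
    case 3: return "no-attach";  /* attach disabled; cannot be changed back */
    default: return "unknown";
    }
}

/* Current scope, or -1 if Yama is absent or the file is unreadable. */
int yama_ptrace_scope(void)
{
    int scope = -1;
    FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &scope) != 1)
        scope = -1;
    fclose(f);
    return scope;
}
```

Scope 1 is a common default (Ubuntu ships with it, for instance) and is what produces the error above: gdb, started as a normal user, is neither an ancestor of the target nor a tracer the target has named through PR_SET_PTRACER. Writing 0 to this file as root is the ‘turn it off universally’ option.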

You can also use the ‘I don’t have time for this’ workaround of prefixing ‘gdb -p’ with the battering ram that is sudo. My regular stance on anything sudo, suid, polkit etc. is that it is an indicator of poor design somewhere or everywhere. Friends don’t let friends sudo. From power-on to regular use, there should never be a need, or even an implementation, for converting a lower-privilege context to a higher one. The only privilege manipulation a client should have access to is reduction towards the least amount of privilege. You should, of course, have the means to place yourself where you (or your organization) like in the chain of trust, and have the controls to reduce from there – but I digress.

The problem with the YAMA design from the ptrace perspective is that you are practically left with no other choice. Escalating your lower-privilege client (gdb) to a much higher privilege level, then attaching it to parse super complex data from a process that you, by that very action, indicate that you don’t comprehend or trust, is a fantastically poor idea.

So what to do about the situation? Well there are other rules to modify ptrace relationships. Normally, a parent is allowed to trace its child – and that is how gdb attaches to begin with, but that does not work in the attach case.

Enter the subtleties of prctl, and now we come into the Fantastic (debug) Voyage of Isaac Asimov-fame part of the adventure.

From the last section we got the mechanisms for negotiating a debug control channel initiated from the display server. Now the extension is a bit more complicated, and this is one place where arcan-shmif has the upper hand. Instead of just sending an event saying ‘could you please create a window of type X’, we have the ability to force a window upon a client.

The shorthand form of what happened in the demo was roughly (pseudo-Lua):

on_click:
local buf = alloc_buffer(width, height)
target_alloc(client_handle, buf, event_handler, "debug")

This actually creates a full IPC channel, and sends it to the client over the existing one, with type already locked to DEBUG.

This is a server-side initiated signal of user intent. The same mechanism is used to request a screen-reader-friendly view for accessibility, and it is used for initiating screenshots, video streams, … (the latter angle is covered in the article on interfacing with a stream deck device).

The client gets it as part of its normal event loop (see also: the handover part in the trayicon article). Pseudo-code wise, the important bit is:

case TARGET_COMMAND_NEWSEGMENT:
if segment_type == "debug" or does_not_match(request):
debug_wnd = arcan_shmif_acquire(...)

But what happens if the client does not actually map it? 

Well, if you look at this part of the video, the second window is populated from within the client process, but without cooperation from the client.

The IPC library code knows if the client mapped an incoming window or not. If it didn’t, the library takes it upon itself to provide an implementation. The current one is a bit bare, but there is a lot of fun potential in this one – pending a much-needed primitive for ‘suspend all threads except me’. This is quite close to a ‘grey area’ set of techniques known as process parasites.

Now, we can use this to spin off a debugger, though it is neither clean nor easy.

Prctl has a property called ‘PR_SET_PTRACER’. This allows a process to specify another process that will be allowed to attach to it via ptrace. The naive approach here would be to fork, then set the tracer to the PID returned from fork(). It would also not work, for multiple reasons.

One is that gdb then lacks a way to draw, and it behaves differently depending on whether stdin/stdout are TTYs or not. Luckily enough, we have a terminal emulator that can inherit an arcan-shmif debug window and use it as its display.

So the hidden debug interface uses the debug window to request another debug window that gets inherited into the terminal and used to set up the drawing and emulation for gdb to work.

The experienced POSIX connoisseur will see the chicken and egg problem here; the process to be debugged needs to wait until it knows the PID of the debugger process, in order to set the permitted tracer, and the debugger needs the PID of the target process in order to attach.

So the current solution is:

  1. Create two pipe pairs [parent to debugger, debugger to parent] and inherit the appropriate ends.
  2. Have the terminal emulator stop before exec()ing the child process, and write the ‘to become debugger’ PID back over the pipe. Block on read.
  3. (In parent/debug target) receive the PID, set the tracer with prctl, write its own PID back.
  4. The child process receives the trace-target PID, adds it to the arguments and continues exec() into the debugger.
  5. Profit.
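The steps above can be sketched in C under a few simplifying assumptions: the terminal emulator is collapsed away, so the forked child directly becomes the debugger; spawn_tracer, send_pid and recv_pid are made-up names; and gdb is assumed to be in PATH. With exec_gdb set to 0, the child only verifies that it was handed its parent's PID, which makes the handshake testable without a debugger:

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Move one pid_t over a pipe; 4 bytes is well below PIPE_BUF, so a
 * single write/read is enough. Returns 0 on success, -1 on failure. */
static int send_pid(int fd, pid_t pid)
{
    return write(fd, &pid, sizeof pid) == (ssize_t)sizeof pid ? 0 : -1;
}

static int recv_pid(int fd, pid_t *pid)
{
    return read(fd, pid, sizeof *pid) == (ssize_t)sizeof *pid ? 0 : -1;
}

/* Hypothetical helper: fork a would-be debugger that is allowed to
 * attach to the calling process. If exec_gdb is 0, the child just
 * checks that it received its parent's PID and exits with status 0.
 * Returns the child PID, or -1 on failure. */
pid_t spawn_tracer(int exec_gdb)
{
    int to_dbg[2], to_parent[2];
    pid_t child, tracer = 0;
    int ok;

    if (pipe(to_dbg) == -1)
        return -1;
    if (pipe(to_parent) == -1) {
        close(to_dbg[0]);
        close(to_dbg[1]);
        return -1;
    }

    child = fork();
    if (child == -1) {
        close(to_dbg[0]); close(to_dbg[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1;
    }

    if (child == 0) {
        /* step 2: announce the PID that will become the debugger, block */
        pid_t target = 0;
        char arg[32];
        close(to_dbg[1]);
        close(to_parent[0]);
        if (send_pid(to_parent[1], getpid()) || recv_pid(to_dbg[0], &target))
            _exit(2);
        if (!exec_gdb)
            _exit(target == getppid() ? 0 : 1);
        /* step 4: exec() keeps our PID, so the permission carries over */
        snprintf(arg, sizeof arg, "%d", (int)target);
        execlp("gdb", "gdb", "-p", arg, (char *)NULL);
        _exit(127);
    }

    /* step 3 (in the debug target): receive the PID, allow it, reply */
    close(to_dbg[0]);
    close(to_parent[1]);
    ok = recv_pid(to_parent[0], &tracer) == 0 && tracer == child;
#ifdef PR_SET_PTRACER
    /* fails with EINVAL when Yama is not active; nothing to relax then */
    if (ok)
        (void)prctl(PR_SET_PTRACER, (unsigned long)tracer, 0UL, 0UL, 0UL);
#endif
    if (ok)
        ok = send_pid(to_dbg[1], getpid()) == 0;
    close(to_dbg[1]);
    close(to_parent[0]);
    if (!ok) {
        waitpid(child, NULL, 0);
        return -1;
    }
    return child;
}
```

The ordering is the point: the target calls prctl before writing its own PID back, so by the time the would-be debugger learns whom to attach to, the permission is already in place.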

The clip below shows it in action:

If the tool to launch this way does not come from a controlled context, because it in turn spawns new processes where some subprocess needs to perform the attach action, it becomes even more masochistic. In that case, one would need to repeat this dance by ptracing the chain of children until one of them attempts to call ptrace, and then perform the setup.

Now we have communication channels and bootstrapping ptrace-tools, retaining chain of trust and negotiated over the display server, initiated by the user. The tools have access to APIs that can break free of the terminal emulator shackles. Good building blocks for more advanced instruments.

Process & Identity

Procfs can be used to explore how the operating system perceives a specific process, which resources are currently allocated and so on. In order to do that, some identifier is needed. So the trick question to start this off, is how do you figure out the process identifier of a process tied to an X window?

A quick search around and we get this (stack overflow suggestion):

xprop _NET_WM_PID | cut -d' ' -f3

The problem is just that this does not really work. The atom, _NET_WM_PID is based on the client being nice enough to provide it, and nice enough to provide the real one and not just some random nonsense, like the pid of the X server itself – fun basic anti-debugging.
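For contrast, a server that owns the client's socket does not have to take the client's word for it. On Linux, the SO_PEERCRED socket option asks the kernel which process is on the other end of a connected AF_UNIX socket. This is a swapped-in technique for illustration, not something the post relies on, and peer_pid is a made-up name:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* struct ucred on glibc */
#endif
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel which process is on the other end of a connected
 * AF_UNIX socket. Unlike _NET_WM_PID, the client cannot simply make
 * this up: the credentials are recorded by the kernel at connect()
 * (or socketpair()) time. Returns -1 on failure. */
pid_t peer_pid(int sock)
{
    struct ucred cred;
    socklen_t len = sizeof cred;

    if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &len) == -1)
        return -1;
    return cred.pid;
}
```

This only tells you who created or connected the socket. That process may have exited and its PID been reused since, the descriptor may have been passed on to another process, and a peer in another PID namespace gets its PID translated (or may show up as 0), so the caveats about races and containers still apply.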

Even if the retrieved process ID is truthful and correct, it might no longer be by the time you start reading from its proc entries – it is inherently racy.

In modern times, this problem does not get easier when we take containers and other para-virtualization techniques into account where the view of the outer system and its resources from the perspective of a process is vastly different from other processes.

On the other hand, if the code collecting the data runs from within the target process itself, the /proc/self directory can be relied on. There are a number of really interesting avenues to pursue; look at the debug-if implementation in the code-base for some of those ideas, or ping me if you are interested in chatting about these things.

For the remainder of this article, we will settle for the bits implemented thus far, which brings us to the last part of the video.

The bits showcased here are that we can open and modify environment variables, as well as enumerate the currently open file descriptors – and even make a copy of their contents from the current state. The HUD menu that appears in the outer WM is server-side file-picking, and the hex-editor that appears is one of the standard arcan-tui widgets, bufferwnd.
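Enumerating descriptors from the inside is one of the easier parts to reproduce. A minimal C sketch, assuming Linux procfs; list_open_fds is a made-up name and not the arcan debug-if code. It walks /proc/self/fd and resolves each symlink:

```c
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

#define FD_TARGET_MAX 4096

/* Enumerate this process's open descriptors through /proc/self/fd.
 * Each entry is a symlink whose target describes the resource
 * ("/dev/pts/3", "pipe:[1234]", "socket:[5678]", ...). If out is
 * non-NULL, "fd -> target" lines are written to it. Returns the number
 * of descriptors seen (including the one opendir() itself uses), or -1
 * on error. */
int list_open_fds(FILE *out)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    int count = 0;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        char path[64], target[FD_TARGET_MAX];
        ssize_t n;

        if (ent->d_name[0] == '.')
            continue; /* skip "." and ".." */
        count++;
        if (!out)
            continue;
        snprintf(path, sizeof path, "/proc/self/fd/%s", ent->d_name);
        n = readlink(path, target, sizeof target - 1);
        if (n < 0)
            continue; /* the descriptor may have been closed meanwhile */
        target[n] = '\0';
        fprintf(out, "%s -> %s\n", ent->d_name, target);
    }
    closedir(dir);
    return count;
}
```

Copying the contents, as in the demo, needs more care: for a regular file, pread() on the descriptor reads without disturbing the file offset the program relies on, whereas reading from a pipe or socket consumes data the program expects to get.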

What is not showcased is that we can spawn a shell from the context of this process for normal interactive exploration, and redirect, intercept or man-in-the-middle pipes and sockets (with live editing through the hex editor).

Rounding things off, two powerful avenues around the corner here are:

  1. When combined with dynamic network transparency/redirection, we have an easy to use and lightning fast way of setting up multiple attached troubleshooting tools and ‘one-click’ sharing them with someone else.
  2. Since these primitives can nest by ‘handing over’ new window primitives, we get ways of building dynamic hierarchies that reuse the carrier for multiple types of data – think in the context of a whole-system virtualization like Qemu, with all layers: [Qemu itself], [Guest OS shell], [Application inside guest].

If you use Arcan, that is 🙂

Posted in Uncategorized

Interfacing with a ‘Stream Deck’ Device

Continuing the series on using the various Arcan APIs, we get to another use case that works a bit differently here than elsewhere. What makes it interesting enough for a post is how the low and high levels fit together for a device class that challenges normal boundaries on what are valid input and output devices.

Before we start, here are some references to the previous articles in the series:

  1. https://arcan-fe.com/2019/03/03/writing-a-low-level-arcan-client/
  2. https://arcan-fe.com/2019/05/07/another-low-level-arcan-client-a-tray-icon-handler/

The lucky device in question will be an “ElGato Stream Deck“, which is basically a low resolution budget display with an overlay of translucent buttons and a thin input grid layer that captures button presses.

Operating on the ‘mechanism, not policy’ principle, the tools we will be writing are not strictly bound to this kind of hardware. You can run them on a Raspberry Pi with a touch display, use arcan-net to run the client remotely, and you get quite the nifty visual remote controller. Why not bind them to the small secondary “display-pad” that some recent laptops replace the trackpad with, or use the output as a model texture in safespaces to remove the need for most of the keybindings?

Here is a video of the results of this walkthrough in action, mainly showing the produced output from a short script being mapped to the display:

Demo of the arcan-streamdeck driver connected to a sample application

The (low level) repository can be found here: https://github.com/letoram/arcan-devices

There is also a high-level tool that has been added to Durden (streamdeck) for these kinds of displays. It uses the same low-level code, communication and setup as we will go through here. The rest of this tool is a bit more complicated, but for good reason: it can handle multiple devices of different display sizes and cell dimensions; it exposes dynamic WM state and live content previews; titlebar decoration buttons; client-announced icon bindings; and a ton of other features. It can basically be used as a very competent accessibility tool.

This video shows that tool in more realistic action:

Demo of the ‘streamdeck’ tool in Durden connected to the ‘arcan-streamdeck’ driver

Note how the buttons re-layout based on the window that gets selected. The titlebar decoration buttons get moved to the display, and if the client tied to the selected window announces any custom key inputs, those are added to the list as well. Window contents and workspace previews also get mapped to the display in real time, and can be used for window selection.

We will skip the build system and USB/reversing specific part, but basically each button is updated by sending a chunked raw bitmap image with a header that specifies button index and chunk, and inputs are received as reports with a byte per designated key.
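To give a feel for the chunking, here is a C sketch that splits one button image into fixed-size reports. The report size, header layout and field order are all made up for illustration; the real Stream Deck protocol varies between models, so see the arcan-devices repository for what the driver actually sends:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL report layout, for illustration only. */
#define REPORT_SIZE 1024
#define HEADER_SIZE 8
#define PAYLOAD_SIZE (REPORT_SIZE - HEADER_SIZE)

struct report {
    uint8_t data[REPORT_SIZE];
};

/* Split one button's image into reports. Each header carries an
 * (assumed) report id, the button index, a chunk index, a last-chunk
 * flag and the payload length. Returns the number of reports written,
 * or 0 if the image is empty or does not fit in max_reports. */
size_t chunk_button_image(uint8_t button, const uint8_t *img, size_t len,
                          struct report *out, size_t max_reports)
{
    size_t count = (len + PAYLOAD_SIZE - 1) / PAYLOAD_SIZE;

    if (count == 0 || count > max_reports || count > 255)
        return 0;
    for (size_t i = 0; i < count; i++) {
        size_t off = i * PAYLOAD_SIZE;
        size_t n = (len - off < PAYLOAD_SIZE) ? len - off : PAYLOAD_SIZE;
        uint8_t *r = out[i].data;

        memset(r, 0, REPORT_SIZE);     /* zero-pad the tail of the report */
        r[0] = 0x02;                   /* report id (assumed) */
        r[1] = button;                 /* which key to update */
        r[2] = (uint8_t)i;             /* chunk index */
        r[3] = (i + 1 == count);       /* last-chunk flag */
        r[4] = (uint8_t)(n & 0xff);    /* payload length, little endian */
        r[5] = (uint8_t)(n >> 8);
        memcpy(r + HEADER_SIZE, img + off, n);
    }
    return count;
}
```

Each report would then be handed to something like hidapi's hid_write(). With these made-up numbers, a 72x72 RGB image (15552 bytes) at 1016 payload bytes per report becomes 16 reports, the last one carrying 312 bytes.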

For the Arcan side, we need both the low-level code that maps to the device, and a high-level script that defines the client behaviour. In this article, we will jump between the two as needed. The low-level parts will be prefixed with “Client-side” and the high-level with “Server-side”.

(Client-side) We start with the normal connection dance:

struct arcan_shmif_cont C = arcan_shmif_open(
    SEGID_ENCODER, SHMIF_ACQUIRE_FATALFAIL, NULL);
hid_device* deck_device = hid_open(0x0fd9, 0x0060, NULL);

Note the SEGID_ENCODER type. This tells the server side that the client wants to receive audio/video data rather than to provide it. Just building and running this would yield complaints that hey, it couldn’t connect anywhere. Even if the ARCAN_CONNPATH environment variable would point to a valid connection point, it is very likely that the listener on the other end rejects attempts at creating an encoder.

To remedy this, we need to specify the high-level reaction. Normally, we can write or modify an Arcan appl specifically for this, but this time we will use another nifty engine feature, and that is ‘hook scripts’.

Hook scripts are generic scripts that can be user-enabled, regardless of the appl:

arcan -H "hooks/streamdeck.lua" /path/to/my/appl

This would prompt the engine to first start the application as normal, but when it has finished executing the entry point, the hook script will be loaded. These scripts are located in the ‘system-scripts’ namespace, defaulting to /usr/share/arcan/scripts and redirectable via the ARCAN_SCRIPTPATH environment variable.

(Server-side) The skeleton of our hook script (streamdeck.lua) will start like this:

local function open_streamdeck_cp(name)
    local maxw = 72 * 5
    local maxh = 72 * 3
    target_alloc(name, maxw, maxh,
        function(source, status)
            if status.kind == "terminated" then
                delete_image(source)
                open_streamdeck_cp(name)
            end
        end
    )
end

open_streamdeck_cp("streamdeck")

This reads as:

  1. Open a connection point called streamdeck, tied to a 360x216 (72 * 5 by 72 * 3) buffer
  2. When that connection terminates, re-open the connection point

Had the “open_streamdeck_cp” call been moved to the state where status.kind == “connected”, multiple clients would be allowed to hook up and receive data over this connection point. This time, we enforce a single active client at a time – but the trick of forcing manual reactivation of a connection point makes it trivial to rate-limit connections and prevent clients from causing stalls or staging denial-of-service like attacks.

The code above still behaves like ‘normal’ clients, even though this client wants to be fed video. Running everything and stitching together like this:

ARCAN_SCRIPTPATH=. arcan -H "streamdeck.lua" console &
git clone https://github.com/letoram/arcan-devices
cd arcan-devices/streamdeck ; meson build ; cd build ; ninja
ARCAN_CONNPATH=streamdeck ./arcan-streamdeck

would still yield nothing of interest. We need to tell the engine what to forward to the client, and how. No external client is permitted to have a working primary segment acting as an encoder for security (confidentiality) reasons, as there is no visual identity for a user to make an informed judgement about what to forward to any specific client. The exception has traditionally been the ‘encode’ frameserver, as that is spawned from the server side where the chain of trust would remain intact. Even then, it is the server side (through the window manager) that specifies what an encoder gets to see.

The normal way of allowing a client to receive audio/video is through a new subsegment established over the primary. Scripting wise, this is initiated server-side through calling define_recordtarget on an identifier referencing a client segment; client side receives a new segment event and maps it by calling arcan_shmif_acquire.

There is an exemption to this, which we will leverage here. Internally, ‘define_recordtarget’ creates an offscreen rendertarget (a so-called ‘FBO’) with an automated or scripted refresh trigger. The contents can be forwarded to an internal or external recipient whenever there is an update. With the recently added ‘rendertarget_bind’ function, we can swap out the recipient of a rendertarget with the handle of our connected client.

(Server-side) First, we hook up a little check to see that the connected client is of the right type.

if status.kind == "registered" then
    if status.segkind ~= "encoder" then
        delete_image(source)
        open_streamdeck_cp(name)
    end
    build_rendertarget(source, maxw, maxh) -- to implement
end

(Server-side) Then we implement this ‘build_rendertarget’ function:

local function build_rendertarget(source, w, h)
    local buf = alloc_surface(w, h)
    local img = color_surface(w, h, 0, 255, 0)
    blend_image(img, 1.0, 100)
    define_rendertarget(buf, {img})
    rendertarget_bind(buf, source)
    link_image(buf, source)
end

This should be quite straightforward: first we allocate a buffer that will fit our offscreen rendering. Then we allocate a single-coloured (green) image that we fade in over the course of (100 / 25 = 4) seconds. Then we create the rendertarget with this image attached. Lastly comes the magic that makes the source act as a recipient of the rendertarget contents, as well as a life-cycle trick that makes sure buf gets deleted automatically when source is destroyed.

Now time for the client side steps. We add a normal event loop:

for(;;){
    struct arcan_event ev;
    if (arcan_shmif_poll(&C, &ev) > 0){
        if (ev.category != EVENT_TARGET)
            continue;
        if (ev.tgt.kind == TARGET_COMMAND_EXIT)
            break;
        if (ev.tgt.kind == TARGET_COMMAND_STEPFRAME)
            consume_video(&C, deck_device);
    }
}

Whenever a STEPFRAME event is received, the video buffer pointed to inside of the arcan_shmif_cont structure has been updated, and will be held in that state until released through arcan_shmif_signal(&C, SHMIF_SIGVID);

The code in the repository is more complicated as it handles repacking into checksummed tiles and sending those that have changed to the USB device. Reprinting that here would just add noise, so a simple loop implementation for consume_video would be:

for (size_t y = 0; y < C->h; y++)
  for (size_t x = 0; x < C->w; x++){
    uint8_t r, g, b, a;
    SHMIF_RGBA_DECOMP(C->vidp[y * C->pitch + x], &r, &g, &b, &a);
/* do something with r, g, b, a */
  }
arcan_shmif_signal(C, SHMIF_SIGVID);
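The tile-diffing part mentioned above can be sketched in plain C. This assumes 32-bit pixels, a pitch counted in pixels (as used with vidp in the loop above) and a 5x3 grid of 72x72 keys matching the 360x216 buffer. The FNV-1a hash is simply a convenient choice here, the repository may well use something else, and changed_tiles is a made-up name:

```c
#include <stddef.h>
#include <stdint.h>

#define TILE 72 /* key size in pixels */
#define COLS 5
#define ROWS 3

/* FNV-1a over the bytes of one tile; any cheap hash works for
 * change detection. */
static uint32_t tile_hash(const uint32_t *px, size_t pitch,
                          size_t tx, size_t ty)
{
    uint32_t h = 2166136261u;
    for (size_t y = 0; y < TILE; y++) {
        const uint32_t *row = px + (ty * TILE + y) * pitch + tx * TILE;
        for (size_t x = 0; x < TILE; x++)
            for (int b = 0; b < 4; b++) {
                h ^= (row[x] >> (8 * b)) & 0xff;
                h *= 16777619u;
            }
    }
    return h;
}

/* Compare each tile with its previous checksum; returns a bitmask of
 * changed tiles (bit index = ty * COLS + tx) and updates prev[]. */
uint32_t changed_tiles(const uint32_t *px, size_t pitch,
                       uint32_t prev[COLS * ROWS])
{
    uint32_t mask = 0;
    for (size_t ty = 0; ty < ROWS; ty++)
        for (size_t tx = 0; tx < COLS; tx++) {
            size_t i = ty * COLS + tx;
            uint32_t h = tile_hash(px, pitch, tx, ty);
            if (h != prev[i]) {
                prev[i] = h;
                mask |= 1u << i;
            }
        }
    return mask;
}
```

Only tiles whose bit is set would then need to be repacked and sent to the device.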

Video covered, time for the input routine. The astute reader might have noticed that the shmif poll loop would always spin and never block. This is because we will use the timeout in hidapi's hid_read_timeout as a crude form of blocking here.

Patching the loop from before, we get this:

int old_mask = 0;
for(;;){
  uint8_t inbuf[64];
/* wait at most 64 ms for a report before falling through */
  ssize_t nr = hid_read_timeout(deck_device,
                                inbuf, sizeof inbuf, 64);
  if (-1 == nr)
      break;

  int mask = get_mask(inbuf, nr);
  if (old_mask != mask){
      update_mask(&C, old_mask, mask);
      old_mask = mask;
  }
  ...

We will get a single report in inbuf, so just enumerate the number of bytes read, and map each to a bit in the mask (get_mask). The last Arcan piece will be in update_mask:

static void update_mask(struct arcan_shmif_cont* C,
                        int old_mask,
                        int new_mask)
{
    int ind;
/* bits that flipped since the last report */
    int changed = old_mask ^ new_mask;
    struct arcan_event ev = {
        .category = EVENT_IO,
        .io = {
            .devkind = EVENT_IDEVKIND_GAMEDEV,
            .datatype = EVENT_IDATATYPE_DIGITAL
        }
    };

/* ffs() (<strings.h>) returns the 1-based index of the lowest set bit, 0 if none */
    while ((ind = ffs(changed))){
        int bit = ind - 1;
        ev.io.subid = bit;
        ev.io.input.digital.active = !!(new_mask & (1 << bit));
        changed &= ~(1 << bit);
        arcan_shmif_enqueue(C, &ev);
    }
}

What happens above is simply that we enumerate the bit positions that have changed since last time, translate each position to a numeric 0-based index, and mark it as pressed or released. The io substructure in the arcan_event structure is quite complex, as it contains some legacy left-overs and encompasses a really wide range of devices. The device kind ‘GAMEDEV’ used here is the catch-all for devices not covered by the other established input device types (keyboards, mice, touch panels, eye trackers, …).
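The get_mask helper is not shown above, so here is a hypothetical version together with a portable take on the bit-walking in update_mask. It assumes one byte per key after an optional header (the offset parameter), with non-zero meaning pressed, and uses a callback in place of filling and enqueueing the arcan_event:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical get_mask: one byte per key starting at 'offset' (to
 * skip any report id/header bytes), non-zero meaning "pressed".
 * Keys beyond bit 31 are ignored. */
uint32_t get_mask(const uint8_t *report, size_t len, size_t offset)
{
    uint32_t mask = 0;
    for (size_t i = offset; i < len && i - offset < 32; i++)
        if (report[i])
            mask |= 1u << (i - offset);
    return mask;
}

/* Walk the bits that differ between two masks, lowest first, and hand
 * each to emit as (0-based key index, pressed?). The callback stands in
 * for building and enqueueing an arcan_event; it may be NULL. Returns
 * the number of transitions. */
int diff_mask(uint32_t old_mask, uint32_t new_mask,
              void (*emit)(int index, int active, void *tag), void *tag)
{
    uint32_t changed = old_mask ^ new_mask;
    int n = 0;

    for (int bit = 0; changed; bit++) {
        uint32_t m = 1u << bit;
        if (!(changed & m))
            continue;
        changed &= ~m;
        if (emit)
            emit(bit, (new_mask & m) != 0, tag);
        n++;
    }
    return n;
}
```

The 0-based index handed to emit is what would end up in ev.io.subid.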

For the last step, we should extend the input handler in the Lua script as well. If the callback set when issuing target_alloc is changed to contain this:

function(source, status, input)
    if status.kind == "input" and _G[APPLID .. "_input"] then
        local key = translate_input(input)
        if key then
            _G[APPLID .. "_input"](key)
        end
    end
-- code from previous sections goes here
end

The missing translate_input function should filter out unwanted input events so that the client does not get free rein to inject whatever it wants into the main input handler of the running appl. Recall that we are writing a hook-script that can be forced into any appl.

The implementation of translate_input then:

local function translate_input(input)
    if not input.digital then
       return
    end
    local keyind = 97 + (input.subid % 25)
    return {
        kind = "digital",
        digital = true,
        devid = 0,
        subid = input.subid,
        translated = true,
        number = keyind,
        keysym = keyind,
        active = input.active,
        utf8 = input.active and string.char(keyind) or ""
    }
end

This translates the input so that it acts as keyboard input. The input table is quite complex, as shown in the documentation for the related target_input function that forwards input to a client.

This brings us to a point where the device is functional. Before ending this, let us do two minor adjustments – one technical and one cosmetic. Adding another state to the event handler in the target_alloc handler:

if status.kind == "connected" then
     target_flags(source, TARGET_BLOCKADOPT)   
end

This one is subtle – but should the engine reset the Lua VM due to a scripting error, or as a user trigger from switching appl, the frameserver created through target_alloc would be re-exposed to the scripts through a possible _adopt event handler. Setting the flag above will cause the source to be destroyed rather than adopted, preventing this from causing the client to ‘appear’ as a normal one in that edge condition.

Lastly, and just for fun since a plain green screen is not very interesting, let us go a bit old-school demo scene and add a plasma effect, which also shows how GPU programs are set up and used:

local frag = [[
varying vec2 texco;
uniform int timestamp;
uniform float fract_timestamp;
float f1(float x, float y, float t){
    return sin(sqrt(10.0*(x*x+y*y)+0.05)+t)+0.5;
}
float f2(float x, float y, float t){
    return sin(5.0*sin(t*0.5)+y*cos(t*0.3)+t)+0.5;
}
float f3(float x, float y, float t){
    return sin(3.0*x+t)+0.5;
}

const float pi = 3.1415926;
void main()
{
    float t = float(timestamp) + fract_timestamp;
    float sin_ht = sin(t*0.5)+0.5;
    float cx = texco.s*4.0;
    float cy = texco.t*2.0;
    float v = f1(cx, cy, t);
    v += f2(cx, cy, t);
    v += f3(cx*sin_ht, cy*sin_ht, t);
    float r = sin(t);
    float g = cos(t+v*pi);
    float b = sin(t+v*pi);
    gl_FragColor = vec4(r,g,b,1.0);
}
]]

local shid = build_shader(nil, frag, "plasma")

-- and at the end of build_rendertarget:
image_shader(img, shid);


Another low-level Arcan client: A tray icon handler

This is the third part in the ongoing series exploring the different levels of APIs that are available for Arcan clients. For reference, the previous parts in the series are here:

  1. https://arcan-fe.com/2018/10/31/walkthrough-writing-a-kmscon-console-like-window-manager-using-arcan/
  2. https://arcan-fe.com/2019/03/03/writing-a-low-level-arcan-client/

In this part, we will write a really nifty little tool that acts as a tray icon (and more). When clicked, or triggered via a custom input binding or a global hotkey, the tool launches a second application that attaches as a “popup” to the tray icon, with the icon representing whether the application is alive or not. This video shows it running inside of the Durden desktop, first wrapping a terminal emulator, then an X server running windowmaker (tradition by now).

The API features it covers are as follows:

  1. Registering custom key bindings.
  2. Negotiating a connection on behalf of a third party.
  3. Spawning a new process that inherits a negotiated connection.
  4. Server-side triggered coarse-grained timers.

The tool itself turned out useful enough to be added inside the tool collection in the main Arcan repository, so you can find it here: https://github.com/letoram/arcan/tree/master/src/tools/trayicon/src

A neat thing is that this clean and small tool can actually cover a number of use-cases without any modifications to the ‘real’ client, staying entirely in line with the ‘do one thing and do it well’ philosophy.

From an outline perspective, this is what our tool will do:

  1. Connect and identify as an ‘ICON’.
  2. Render an SVG into the segment from 1.
  3. On activation (click or otherwise), request a ‘HANDOVER’ subsegment.
  4. Inherit this handover connection into a new process and executable.
  5. Keep track of the child and update the icon state accordingly.

Getting the icon (1, 2)

The structure of the code itself will be similar to that of the previous article, so we will use that as a reference and describe the relevant modifications. The first change is easy: switch out the MEDIA part of the connection call for an ICON:

struct arg_arr* args;
struct arcan_shmif_cont conn =
arcan_shmif_open(SEGID_ICON, SHMIF_ACQUIRE_FATALFAIL, &args);

As a recap, the primary segment type is a strong hint to the window manager about what kind of a connection this is. This type has some impact on how the connection will be synchronised and prioritised, what decorations will be assigned, which kinds of subsegments will be allowed to be allocated and so on. The rule of thumb is that type + connection-point is the main selector for behaviour.

Capture the initial data like before. This time around we care about the initial size, upper dimensions and output density, as we need to follow them:

struct arcan_shmif_initial* cfg;
arcan_shmif_initial(&conn, &cfg);

struct trayicon_state state = {
.density = cfg->density * 0.393700787f,
.bin = argv[3],
.argv = &argv[4],
.client_pid = -1,
.envv = environ,
.icon_normal = argv[1],
.icon_pressed = argv[2]
};

The only thing of note is that the SVG renderer we will use is stuck in the dark ages and uses the archaic ‘DPI’ for density, so the value has to be converted between centimetres and inches. Rendering SVG is out of scope, so we pull that in from a third party: nanosvg. The actual drawing and signalling is no different from what was done in the last article, so there is little point in reiterating it here. This link takes you to the lines in question.
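Both units describe the same thing and differ only by the exact factor 1 inch = 2.54 cm (0.393700787 is 1/2.54). The following is a minimal sketch of the conversion in both directions, with hypothetical helper names. Which direction is needed depends on the unit of the density field in arcan_shmif_initial, and this sketch does not assume either.

```c
#include <assert.h>

/* 1 inch is exactly 2.54 cm. */
#define CM_PER_INCH 2.54f

/* Pixels per centimetre -> dots (pixels) per inch. */
static float ppcm_to_dpi(float ppcm)
{
    assert(ppcm >= 0.0f);
    return ppcm * CM_PER_INCH;
}

/* Dots per inch -> pixels per centimetre (same as * 0.393700787). */
static float dpi_to_ppcm(float dpi)
{
    assert(dpi >= 0.0f);
    return dpi / CM_PER_INCH;
}
```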

Using the naive event loop from last time would cause some problems, so we will modify it slightly:

static void event_loop(
    struct arcan_shmif_cont* C, struct trayicon_state* trayicon)
{
    struct arcan_event ev;
    while (arcan_shmif_wait(C, &ev)){
        do_event(C, &ev, trayicon);

        while (arcan_shmif_poll(C, &ev) > 0)
            do_event(C, &ev, trayicon);

        if (trayicon->dirty){
            arcan_shmif_signal(C, SHMIF_SIGVID);
        }
    }
}

The innermost while loop is new. For this specific application, it is fine that the first wait is blocking, as the program is entirely event driven. Responding to some of the events can be expensive, however, particularly those that would redraw the icon state. To be more responsive to ‘event storms’, where a stall somewhere else in the system could cause multiple events that counteract each other, we flush them all out and only update after that is done.
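Here is a sketch of why draining before redrawing pays off. It uses a hypothetical icon state instead of real shmif events. The state follows every event, but the expensive render-and-signal step runs at most once per wakeup. This variation compares against the last drawn state rather than using a plain dirty flag, so a burst that cancels itself out costs no redraw at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical icon state: 'pressed' follows the incoming events,
 * 'drawn' is what was last rendered and signalled. */
struct icon_state {
    bool pressed;
    bool drawn;
    size_t redraws; /* number of expensive render + signal passes */
};

/* One wakeup of the loop: evs[0] is what the blocking wait returned,
 * the rest is what the inner poll loop drained. State is updated per
 * event, but at most one redraw happens per wakeup. */
static void handle_wakeup(struct icon_state* s, const bool* evs, size_t n)
{
    assert(s && (n == 0 || evs));
    for (size_t i = 0; i < n; i++)
        s->pressed = evs[i];

    if (s->pressed != s->drawn){
        s->drawn = s->pressed; /* stands in for render + signal */
        s->redraws++;
    }
}
```

Redrawing per event instead would have cost four redraws for a four-event storm that ends where it started.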

Intermission: Durden tool, “external buttons”.

Features like a tray require cooperation with the window manager. The easiest way to achieve this is through an addressing scheme we call ‘connection points’. Review the previous articles in the series if you do not know about this concept already.

For Durden, there is an explicit script that maintains the connections for external statusbar buttons, reroutes input and deals with positioning, user configuration and so on. This link points to the source of that tool.

In short, it exposes ways of creating connection points that wait for an icon to connect, and maps actions and data on this connection into the statusbar. It also adds strong restrictions on what this icon connection can do, what other resources it can try and negotiate, and how input gets routed to it.

As with anything else in Durden, the settings and controls for enabling this feature are exposed as a filesystem (one that can be mounted over FUSE). The specific path that needs to be enabled for this demo is:

/global/settings/statusbar/buttons/right/add_external=demo

Then the settings are exposed and discoverable in the following path:

/global/settings/statusbar/buttons/right/external/demo

This allows the destination and behaviour to be highly tuneable and composes well with all other desktop mechanisms, from input macros to timers.

On Activation, Request a Subsegment (3)

Back to our tool. Let's modify the event loop to respond to click events, and register a custom input binding slot by sending a LABELHINT. The label hint can also carry a suggestion for the default key symbol (F1, l, UP, …) + modifier (ctrl, alt, …), though, as usual, it is up to the WM to decide what to do with it. As soon as we have the connection from arcan_shmif_open:

arcan_shmif_enqueue(&conn, &(struct arcan_event){
    .ext.kind = ARCAN_EVENT(LABELHINT),
    .ext.labelhint = {
        .idatatype = EVENT_IDATATYPE_DIGITAL,
        .label = "ACTIVATE",
        .descr = "tooltip description goes here"
    }
});

Next in the event loop we will check for any digital event matching our label, or a digital press from a mouse type device.

if (ev->category == EVENT_IO){
    if (ev->io.kind != EVENT_IO_BUTTON ||
        !ev->io.input.digital.active)
        return;

    if (strcmp(ev->io.label, "ACTIVATE") == 0 ||
        ev->io.devkind == EVENT_IDEVKIND_MOUSE){
        toggle_client(C, trayicon);
    }
    return;
}

In toggle_client, we do the usual ‘check if the child is alive; if so, kill it and spawn a TERM-to-KILL thread’. It is just verbose and not very interesting; see this link for that code. The more interesting case is when we want to spawn a child. First, we do this:
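For reference, here is a minimal blocking sketch of the TERM-to-KILL escalation using plain POSIX calls. It is not the tool's actual code, and the 10 ms polling step and the grace period are arbitrary choices. Running it on its own thread, as described above, keeps the event loop responsive while the grace period elapses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Ask a child to exit with SIGTERM, poll for up to grace_ms
 * milliseconds, then escalate to SIGKILL. Blocks the caller. */
static void terminate_child(pid_t pid, unsigned grace_ms)
{
    assert(pid > 0);
    const struct timespec step = {.tv_sec = 0, .tv_nsec = 10L * 1000000L};

    kill(pid, SIGTERM);
    for (unsigned waited = 0; waited < grace_ms; waited += 10){
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return; /* exited in time and has been reaped */
        nanosleep(&step, NULL);
    }

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0); /* reap it so no zombie is left behind */
}
```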

arcan_shmif_enqueue(C, &(struct arcan_event){
    .ext.kind = ARCAN_EVENT(SEGREQ),
    .ext.segreq.kind = SEGID_HANDOVER
});

This says ‘hey, I would like to create a new segment on behalf of someone else’. It allows the server side to let the new client in while still tracking its origin, without resorting to fundamentally broken things like asserting that pid == connection, which was the X11 dance with _NET_WM_PID. This is an asynchronous request, so we need to do something in the event loop as well.

Handover to new Process (4)

We are reaching the end of the journey. Time to extend the event loop with something. There are two possible events that can come following a segment request. The boring one is TARGET_COMMAND_REQFAILED, which means that the server side said no. This is also the default if the WM does not explicitly opt-in via a call to target_accept as part of the event handling routine on the server side.

If the request is accepted, the proper resources are allocated and pushed to the client. To help deal with multiple pending requests and so on, the type can be verified and paired against a client-chosen ID, though we do not really care about that here for the time being.

case TARGET_COMMAND_NEWSEGMENT:{
    trayicon->client_pid =
        arcan_shmif_handover_exec(C, *ev,
            trayicon->bin, trayicon->argv, trayicon->envv, false);

    if (-1 != trayicon->client_pid){
        render_to(C, trayicon);
    }
}
break;

It is worth noting that NEWSEGMENT events can arrive at any time. If they do not match a previous request, it means that the WM force-pushed a resource as a feature probe, signalling intent. This will become important in an upcoming article.
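Pairing responses with requests only needs a small table of outstanding client-chosen IDs. The sketch below is hypothetical bookkeeping. It treats the ID as an opaque number and does not show which event field carries it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical table of segment requests that have not been answered. */
struct pending_reqs {
    uint32_t ids[MAX_PENDING];
    bool used[MAX_PENDING];
};

/* Remember a request; false means the table is full and no further
 * request should be issued until one is answered. */
static bool pending_add(struct pending_reqs* p, uint32_t id)
{
    assert(p);
    for (int i = 0; i < MAX_PENDING; i++)
        if (!p->used[i]){
            p->used[i] = true;
            p->ids[i] = id;
            return true;
        }
    return false;
}

/* True (and the entry is forgotten) if id answers one of our requests;
 * false means the segment was pushed without being asked for. */
static bool pending_claim(struct pending_reqs* p, uint32_t id)
{
    assert(p);
    for (int i = 0; i < MAX_PENDING; i++)
        if (p->used[i] && p->ids[i] == id){
            p->used[i] = false;
            return true;
        }
    return false;
}
```

On NEWSEGMENT, a failed pending_claim() marks the segment as an unsolicited push from the WM. On REQFAILED, the matching entry would be claimed and discarded in the same way.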

If we do not handle this event, the internals of the client-side library will drop the resources on the next call into any of the arcan shmif functions that are not directly related to accepting the request. Here we use a specific derivative of the exec- family, arcan_shmif_handover_exec. Internally, it sets up a new process, duplicates the control socket and mapping descriptors, and passes them through the environment, so that the new client finds them on arcan_shmif_open instead of using the connection path mechanism.
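The general idea of passing a descriptor to an exec()ed process through the environment can be sketched with plain POSIX calls. This is not shmif's code. The variable name DEMO_HANDOVER_FD is made up, and the real handover passes more than one descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Parent side: duplicate fd (dup() clears FD_CLOEXEC on the copy, so
 * it survives exec) and publish its number under env_key.
 * Returns the duplicate, or -1 on failure. */
static int export_fd(int fd, const char* env_key)
{
    assert(fd >= 0 && env_key);
    int dupfd = dup(fd);
    if (dupfd == -1)
        return -1;
    char buf[16];
    snprintf(buf, sizeof(buf), "%d", dupfd);
    if (setenv(env_key, buf, 1) == -1){
        close(dupfd);
        return -1;
    }
    return dupfd;
}

/* Child side: parse the number back and check that it is an open fd. */
static int import_fd(const char* env_key)
{
    const char* val = getenv(env_key);
    if (!val)
        return -1;
    char* end;
    errno = 0;
    long fd = strtol(val, &end, 10);
    if (errno || end == val || *end != '\0' || fd < 0 || fd > INT_MAX)
        return -1;
    if (fcntl((int)fd, F_GETFD) == -1)
        return -1;
    return (int)fd;
}
```

The parent would call export_fd() between fork() and exec(), and the new program would call import_fd() during startup.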

Track the Child (5)

The last piece of the puzzle is to keep track of our new pid. This poses something of a problem with the event loop we have. The proper options tend to be either a monitoring thread that calls waitpid() or a signal handler. Either one forwards the result back to the event loop, which then has to multiplex I/O on the raw descriptors instead of using the normal wait/poll calls. That is a lot of ugly, boring code. For this round, we take something simpler:

arcan_shmif_enqueue(&conn, &(struct arcan_event){
    .ext.kind = ARCAN_EVENT(CLOCKREQ),
    .ext.clock.rate = 5
});

This will periodically emit STEPFRAME events, one every 5 ticks on a 25Hz clock (chosen for my PALs), i.e. five times per second. The same mechanism can be used for frame delivery notifications, server-initiated variable frame pacing and client-requested low-frequency timers. Here we just use it to check whether the child is still alive and flip the icon if the state has changed.
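Below, the arithmetic behind the rate is sketched together with the kind of non-blocking check that could run on every STEPFRAME, using POSIX waitpid(). The helper names are hypothetical, and the 25 Hz figure is taken from the text above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Interval between STEPFRAME events for a given rate, in milliseconds,
 * assuming a 25 Hz clock (40 ms per tick). */
static unsigned clockreq_interval_ms(unsigned rate_ticks)
{
    return rate_ticks * (1000u / 25u);
}

/* Non-blocking liveness check: true while the child runs. Once it has
 * exited it is reaped and *pid is reset to -1. */
static bool child_alive(pid_t* pid)
{
    assert(pid);
    if (*pid <= 0)
        return false;
    pid_t rc = waitpid(*pid, NULL, WNOHANG);
    if (rc == 0)
        return true; /* still running */
    *pid = -1;       /* exited (rc == pid) or not our child (rc == -1) */
    return false;
}
```

Comparing the result with the previous check is then enough to decide whether the icon needs to be redrawn.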

In Closing

If you are curious as to how the client side of the tray icon works in Xorg, look no further than the “System Tray Protocol Specification”. A wrapper similar to this one would of course be possible to make within good ol’ X, but in a much more painful and convoluted way, with much less security, safety and accountability towards the user. Recall that even the smallest ‘icon’ in X is no less capable than a full-blown application, and this lack of nuance is kind of the problem.

The main point of this article was to show off some of the basic features, primarily HANDOVER execution, as it is something we will use later for much more advanced things. The secondary point was to illustrate the balance of power and control: clients are still able to achieve traditional edge-case features, but not without the consent and constraints set by the user, and that can only be a good thing.


Writing a low-level Arcan Client

This is a follow-up to the higher-level article on writing a kmscon/console replacement. In this article, we will instead use the low-level C API to write a simple client.

To recap, there are 3 APIs for writing clients:

  1. Advanced/Low- level: “shmif”
  2. Mid- level: TUI (text-dominant, basic-multimedia)
  3. High-level: ALT (interactive/advanced multimedia, think ‘app’)

We will cover each of these in turn over several articles, starting with the first one, shmif.

SHMIF, or “SHared Memory InterFace”, is intended for extending the engine with features like new input drivers, for specialised tools that are supposed to extend the window manager you are running, and for tight multi-process integration with the engine scene graph and all of its features. The wayland protocol implementation is built on this API, as are things like the libretro support, the QEmu and Xorg backends and so on. In short: it is not for everyday use. Technically speaking, it is comparable to xlib/xcb/wayland-client, but with quite a few more features, and a lot more comfortable to actually use.

As with the WM article, we will use the ‘console’ git at: https://github.com/letoram/console to store the code.

Prelude

For this we will need a C compiler like gcc or clang, and a build system that can deal with pkgconfig. This time around we will use the meson build system.

Create a meson.build file and an empty main.c file in a folder. Define the meson contents as:

project('demo', 'c', default_options : ['c_std=c11'])
shmif = dependency('arcan-shmif')
executable('demo', 'main.c', dependencies : shmif)

Note the standard being set to C11. Arcan-shmif makes quite heavy use of C11 features and its more advanced memory model; going back to C99 or older is not an option.

Making a Connection

Open the empty main.c file and add:

#include <arcan_shmif.h>
int main(int argc, char** argv)
{
    return EXIT_SUCCESS;
}

An unorthodox detail here is that the shmif headers, by default, pull in the other headers they need from the standard library. The rationale is that remembering all the <stdatomic.h>, <stdint.h>, <inttypes.h> etc. is a frustrating and annoying waste of time. There is an ifdef to get rid of that behaviour, should one be so religiously inclined.

struct arg_arr* args;
struct arcan_shmif_cont conn =
arcan_shmif_open(SEGID_MEDIA,
SHMIF_ACQUIRE_FATALFAIL, &args);

This is the simplified version of opening a connection; there is also an arcan_shmif_open_ext function that allows you to specify a UUID (for settings persistence), application title, current identity and other metadata.

Three things of note here. The first is SEGID_MEDIA, which is the loose type that we identify as. This is primarily a guide for the WM to decide on a suitable ‘policy’ in terms of graphics post-processing, scheduling and so on.

The second is SHMIF_ACQUIRE_FATALFAIL. There are a number of connection control flags that regulate connection looping, crash recovery, guard threads and other advanced features. The FATALFAIL one here says to simply exit if a connection could not be made.

The third thing is the args parameter (which could be NULL). This acts as a way of letting the user or server pass arguments to the application without interfering with the standard-mandated argv.

We can also use a slightly more verbose setup:

struct arg_arr* args;
struct arcan_shmif_cont conn = arcan_shmif_open_ext(
    SHMIF_ACQUIRE_FATALFAIL, &args, (struct shmif_open_ext){
        .type = SEGID_MEDIA, .title = "demo"
    }, sizeof(struct shmif_open_ext)
);

This version allows more information to be provided on connection startup, particularly a static connection title and a dynamic content identity (ident), along with a 128-bit GUID that can help the server side maintain settings persistence across connections.

Inside shmif_cont

This is a technical explanation that can be skipped. If you are not interested, jump forward to the section called ‘Data Signalling’.

The basis of shmif is, as the name implies, shared memory. When you make a connection, you get a ‘primary segment’; each connection has one. Destroy it (arcan_shmif_drop) and all other resources are terminated.

The following sketch shows how such a segment is organised:

The context (shmif-cont) is the developer facing structure that contains a local copy of current negotiated properties, like dimensions and buffer sizes. It also has pointers that map to the currently active buffers. Some operations, like arcan_shmif_signal and arcan_shmif_resize may modify this structure and renegotiate the contents of the shared memory.

There is a control socket, and a semaphore each for audio, video and events, used to synchronise with the server side. These are primarily there for optimisation purposes, descriptor passing and I/O multiplexing, and could be omitted in favour of better memory-based OS primitives (futex + named GPU buffers) where available.

The shared memory is split into two regions, one fixed-size and one variable-sized. The fixed-size region contains negotiation metadata for renegotiation and transfer direction (server to client or client to server), as well as one ring buffer for input events and one for output events. With this setup, events can be routed from their source zero-copy, and both sides have visibility into queue saturation, allowing optimisations like reordering and merging.
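The ring buffers can be pictured as single-producer/single-consumer queues built from C11 atomics, the kind of construct the C11 memory model requirement mentioned earlier makes possible. This is a simplified, hypothetical sketch and not the actual shmif layout. It stores bare 32-bit values instead of events, and it assumes lock-free atomics that remain valid across processes when placed in shared memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8 /* power of two, so indices wrap with a mask */

/* No pointers inside, so the struct could live in shared memory. */
struct event_ring {
    _Atomic size_t head; /* next slot to write, advanced by producer */
    _Atomic size_t tail; /* next slot to read, advanced by consumer */
    uint32_t slots[RING_SIZE];
};

static void ring_init(struct event_ring* r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

static bool ring_push(struct event_ring* r, uint32_t ev)
{
    assert(r);
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false; /* saturated: caller may merge, drop or wait */
    r->slots[head & (RING_SIZE - 1)] = ev;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

static bool ring_pop(struct event_ring* r, uint32_t* ev)
{
    assert(r && ev);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false; /* empty */
    *ev = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Either side can read the fill level, i.e. the queue saturation;
 * it is only a hint while the other side is active. */
static size_t ring_used(struct event_ring* r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    return head - tail;
}
```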

The variable-sized region can change on renegotiation (arcan_shmif_resize). The typical pattern is a change to ‘aproto’ and ‘audio’ after connection setup, and then video changes dynamically in response to user actions. Thus, in most cases, this region can be cheaply extended with truncate+remap.

Aproto contains various more advanced substructures, and is only used for very specialised targets like VR and HDR metadata. Audio works either with multiple small buffers that the server side sizes to suit the latency of the playback device, or as a big slush buffer for non-latency-sensitive streaming audio. Lastly, the video buffer is something we will get into later in the ‘Data Signalling’ section.

Getting Parameters

This is a side-track that can be omitted in our short example, but it is still relevant to bring up. If you recall the ‘writing a WM’ article, there was this weird connection state marked as ‘preroll’, where the WM gets a synchronous step of telling the client all the things it should need to know in order to produce a frame that is immediately useful. This cuts down on tons of connection negotiation verbiage, and greatly reduces the risk of WM policy immediately invalidating the first buffers being produced.

By adding:

struct arcan_shmif_initial* cfg;
arcan_shmif_initial(&conn, &cfg);

We get access to any and all such data that the window manager provided, or some build time default or fallback. This includes properties like:

  • accelerated GPU device (resource-handle)
  • preferred font(s) (resource-handle) and respective size
  • preferred output dimensions, density, colour channel orientation
  • audio samplerate
  • language, locale and colour scheme preferences

These are soft hints. The client has no obligation to use them, though it would be foolish not to. The related resources will be automatically freed on the next call into shmif.

All of these properties can be changed dynamically as part of the normal event loop, and it is possible for the client to forego the preroll- stage with a flag to the _open call, and thus gain a few ms connection setup time. For this example, since we are not drawing internationalised shaped text, composing UI elements or playing back audio, we can safely ignore all this.

Data Signalling

Time to send some data. In this example we will just draw a moving gradient, which makes frames, the presented size and updates easy to see. The outer loop will look like this:

size_t ts = 0;
for (;;){
    draw_frame(&conn, ts++);
    arcan_shmif_signal(&conn, SHMIF_SIGVID);
}

This will block until the server side has acknowledged the transfer and released the client to allow it to produce another frame.

The drawing routine then looks something like this:

void draw_frame(struct arcan_shmif_cont* C, size_t ts)
{
    float ss = 255.0 / C->w;
    float st = 255.0 / C->h;
    for (size_t y = 0; y < C->h; y++){
        for (size_t x = 0; x < C->w; x++){
            /* go through size_t: the wrap into uint8_t is then well
               defined, an out-of-range float -> uint8_t conversion is not */
            uint8_t r = ts + (size_t)(ss * x);
            uint8_t g = ts + (size_t)(st * y);
            size_t pos = y * C->pitch + x;
            C->vidp[pos] = SHMIF_RGBA(r, g, 0, 0xff);
        }
    }
}

The more eye-catching part of this whole affair should be the C->vidp[pos] = SHMIF_RGBA(...) assignment, where vidp points to shmif_pixel values. In the same way computers have a native endianness, this display system has a native, compile-time defined pixel format (and audio sample format, for that matter). While it can be changed for specialised builds, the vast majority of the time it will be some permutation of 32-bit RGBA. We statically commit to one and use a packing macro to help clients. The reason is simply to have a safe, guaranteed default, and leave the more compact or more complex formats to advanced transfer mechanisms, like GPU opaque buffer handles.
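To see the packing and the pitch at work outside of a connection, here is the gradient drawn into a plain buffer. DEMO_RGBA is a hypothetical stand-in for SHMIF_RGBA with an arbitrary channel order, since the real layout is a build-time choice of the engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packing macro: R in the low byte, A in the high byte. */
#define DEMO_RGBA(r, g, b, a) ((uint32_t)(r) | ((uint32_t)(g) << 8) | \
    ((uint32_t)(b) << 16) | ((uint32_t)(a) << 24))

/* Same gradient as draw_frame, into a buffer whose rows are 'pitch'
 * pixels apart (pitch >= w, extra pixels are row padding). */
static void draw_gradient(uint32_t* buf, size_t w, size_t h,
                          size_t pitch, size_t ts)
{
    assert(buf && w && h && pitch >= w);
    float ss = 255.0f / w;
    float st = 255.0f / h;
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++){
            uint8_t r = (uint8_t)(ts + (size_t)(ss * x));
            uint8_t g = (uint8_t)(ts + (size_t)(st * y));
            buf[y * pitch + x] = DEMO_RGBA(r, g, 0, 0xff);
        }
}
```

With w = 3, h = 2 and pitch = 4, the pixel at (1, 0) gets red 85, the pixel at (0, 1) gets green 127, and the fourth column of each row is never touched.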

Now there is easily a ten page essay hidden in unpacking the cases where the render-loop above would be tolerable, and where you would want something more refined, but that is best left for another time; the real subject matter there, synchronisation, is the single most complex system graphics topic around – one that even trumps the subjects of colour spaces and mixed capability output compositing.

Other available options over the same interfaces include:

  • Switching to a GPU accelerated context and pass opaque handles
  • Toggle alpha channel and linear-/ non-linear RGB colour space
  • Dirty-rectangles updates
  • Request upcoming deadline
  • Set a desired presentation time
  • Enable poll()-able event-, timer- or ‘no-synch’ triggered transfers
  • Switch to mail-box n-buffers transfers

This would also work with SHMIF_SIGAUD for audio transfers, and each of these operations could run in its own thread, with another thread responsible for managing the event loop, which is next on the list.
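Of the options listed earlier, the mail-box transfer is easiest to picture as a policy. Rather than blocking until the previous frame has been consumed, the producer replaces whatever is still waiting. Below is a hypothetical, single-threaded sketch of that policy alone. A real implementation needs atomics or a lock, plus n buffers to draw into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Latest-wins hand-off slot; frame ids stand in for buffer indices. */
struct mailbox {
    uint32_t frame;
    bool full;
    unsigned dropped; /* frames replaced before being consumed */
};

/* Producer side: never waits, a newer frame replaces a pending one. */
static void mailbox_post(struct mailbox* m, uint32_t frame)
{
    assert(m);
    if (m->full)
        m->dropped++;
    m->frame = frame;
    m->full = true;
}

/* Consumer side: takes the most recent frame, if any. */
static bool mailbox_take(struct mailbox* m, uint32_t* frame)
{
    assert(m && frame);
    if (!m->full)
        return false;
    *frame = m->frame;
    m->full = false;
    return true;
}
```

The trade-off is latency against completeness. The consumer always sees the newest frame, but intermediate frames may never be shown.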

Event Loop

So far our communication has been quite one-sided. How about we replace the frame counter with something event driven: when the user presses a certain key, we advance the timer and submit a frame.

Let's get rid of the for (;;) entirely and move to an event loop.

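/* defined further down */
static bool handle_input(struct arcan_shmif_cont* C, struct arcan_event* ev);
static bool handle_target(struct arcan_shmif_cont* C, struct arcan_event* ev);

static void event_loop(struct arcan_shmif_cont* C)
{
    struct arcan_event ev;
    size_t step = 0;

    while (arcan_shmif_wait(C, &ev)){
        bool dirty = false;

        if (ev.category == EVENT_IO)
            dirty |= handle_input(C, &ev);
        else if (ev.category == EVENT_TARGET)
            dirty |= handle_target(C, &ev);
        if (!dirty)
            continue;

        draw_frame(C, step++);
        arcan_shmif_signal(C, SHMIF_SIGVID);
    }
}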

This will loop until the parent decides to close the connection. As mentioned before, the event model for engine internals and shmif are actually shared, even though most of the categories are masked out for IPC purposes. The three relevant ones are IO (input devices), TARGET (server->client) and EXTERNAL (client -> server).

We will stub handle_target for now:

static bool handle_target(
    struct arcan_shmif_cont* C, struct arcan_event* ev)
{
    return false;
}

While for input, we set the return to indicate that a new frame should be generated.

static bool handle_input(
    struct arcan_shmif_cont* C, struct arcan_event* ev)
{
    return (
        ev->io.kind == EVENT_IO_BUTTON &&
        ev->io.devkind == EVENT_IDEVKIND_KEYBOARD &&
        ev->io.datatype == EVENT_IDATATYPE_TRANSLATED &&
        ev->io.input.translated.keysym == 108 &&
        ev->io.input.translated.active);
}

The IO event structure itself is a big lady that has evolved since practically the start of this project back in ~2003. It encompasses game devices, generic sensors, keyboards, mice, touch screens and eye trackers so far.

Here we are simply concerned with whether a key is pressed on any keyboard (a device with button-like behaviour and translation tables), and whether that key resolves to the ‘L’ key (108). That number is defined in a static table, inside arcan_tuisym.h and in the builtin/keyboard.lua script.

Just as with synchronisation, there is a lot more that could, and should, be elaborated on regarding problems with this kind of event processing, but small incremental steps are the recipe here.

Renegotiation (Resize)

Time to handle at least one TARGET event: DISPLAYHINT. This event corresponds to the WM calling the target_displayhint() function, providing updates about how the segment will be displayed. It carries density, dimensions, visibility and so on.

So in the previously stubbed handle_target function:

static bool handle_target(
    struct arcan_shmif_cont* C, struct arcan_event* ev)
{
    switch (ev->tgt.kind){
    case TARGET_COMMAND_DISPLAYHINT:{
        size_t w = ev->tgt.ioevs[0].uiv;
        size_t h = ev->tgt.ioevs[1].uiv;
        if (w && h){
            arcan_shmif_resize(C, w, h);
            return true;
        }
    }
    break;
    default:
    break;
    }
    return false;
}

This structure takes a more “syscall”-like approach, with a small number of primitive data-type slots described in the header close to where each event is defined. This deviates from both _IO and _EXTERNAL, which have actual C union/struct-style fields. That design is mostly legacy, and one that could be adjusted to union-like overloads, but it is low on the list of priorities.

That aside, there is never any direct harm done in not reacting to these events, the server side always assumes minimal capabilities as client default behaviour.

For the event handler here, we simply sanity-check that we get an updated width/height in the DISPLAYHINT, and forward it to the resize operation on the connection. While most operations are asynchronous across the API barrier, resize is one of the exceptions. Partly because it is the most costly and cross-cutting action there is, and partly because so much data / state can change meaning across a resize that it is a reasonable synchronous barrier to have.

Do note that the resize action here does not trigger a resized event in the perspective of the WM. As long as the client stays within allocation tolerances, the synchronous resize is just between the engine’s view of memory mappings and that of the client. The event is actually generated when the newly resized buffer is signalled, and that is true if running across a network as well.

In Closing

This is a good place to stop until next time. If we combine this with the ‘Console’ WM from the last article, we are already at a stage where most of the old school “I want a framebuffer” style applications can be implemented. When we return to this particular API in the series later, we can start to explore the large space of things to do from here, like:

  • Spawning new windows
  • Defining a custom icon, mouse cursor, titlebar
  • Switching to 3D rendered or accelerated handle passing mode
  • Handling state block transfers
  • Registering custom key bindings, key/value config persistence
  • Announcing format encoding/decoding support for universal file picking / browser integration
  • Playing back audio

But none of these require anything fundamentally different from what has been covered, merely extending the local event loop handler and emitting a few events.


The X Network Transparency Myth

This article presents an interpretation of the history surrounding the ability for X clients to interact with X servers that are running on other machines over a network; recent arguments as to that ability being defunct and broken; problems with the feature itself; going into what it was, what happened along the way, and where things seem to be heading.

The high level summary of the argumentation herein is that there is validity to the claims that, to this very day, there is such a thing as network transparency in X. It exists on a higher level than streaming pixel buffers, but has a diminishing degree of practical usability and interest. Its technical underpinnings are fundamentally flawed, dated and criminally inefficient. Alas, similarly dated (VNC/RFB) or perversely complex (RDP) solutions are far from reasonable alternatives.

What are the network features of X?

If you play things strict, all of X is. It should be the very point of having a client/server protocol and not an API/ABI.

Protocol vs. API/ABI tangent: Communication that travels across hard system barriers needs to consider things like differences in endianness, loss in transit, remote addressing and so on, while the abstract state machine(s) need to account for parameters that are fairly invisible locally. Some examples of such parameters would be the big sporadic delays caused by packet corruption and retransmission, a constantly high base latency (100+ms) and buffer back-pressure (clients keep sending new frames and commands exceeding the available bandwidth of the communication channel, accumulating into local buffers, like stepping on a garden hose and watching the bubble grow). The interplay between versions and revisions also tends to matter more in protocol design than in API design, unless you go cheap and reject client/server version mismatches.
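Endianness is the simplest of these concerns to show. A local API can pass native integers around, while a protocol has to settle on a byte order. X11 itself lets the client announce its byte order during connection setup and leaves the swapping to the server. The sketch below shows the simpler fixed-order alternative, encoding 32-bit values big-endian regardless of the host.

```c
#include <assert.h>
#include <stdint.h>

/* Write v in big-endian ("network") byte order, one byte at a time,
 * so the bytes on the wire are the same on any host. */
static void put_u32be(uint8_t out[4], uint32_t v)
{
    assert(out);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read back a big-endian 32-bit value into a native integer. */
static uint32_t get_u32be(const uint8_t in[4])
{
    assert(in);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```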

Back to X: The real (and only) deal for X networking is in its practical nature; the way things work from a user standpoint. In the days of yore, one could simply chant the following incantation:

DISPLAY=some.ip:port xeyes

Should the gods be willing, you would have its very soul stare back at you through heavily aliased portals. The only difference from the local version would be the “DISPLAY=:0” form; other than that, everything was transparent to the user.

Now, the some.ip:port form assumed you were OK with anyone between you and the endpoint being able to listen in “on the wire”, possibly doing all kinds of nasty stuff with the information in transit. To add insult to injury, pixel buffers were also not compressed, so when they became too numerous or large, the network was anything but happy. The feature was really only ever ‘good’ through the rose-tinted glasses of nostalgia on a local area network (your home, school, or business), certainly not across the internet.

The form above also assumes that the X server itself had not been started with the “-nolisten tcp” argument, or that you were using the better option of letting an SSH client configure forwarding, introduce compression and provide otherwise preferential treatment like disabling Nagle’s algorithm. Even then, you had to be practically fine with the idea that some of your communication could be deduced from side-channel analysis (hint: even your keypresses look very distinct on a packet-size-over-time plot) and so on. Details like this also put a bit of a dent in the ‘transparent to the user’ idea.
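Disabling Nagle's algorithm is a single socket option. Nagle's algorithm holds back small writes so that they can be coalesced. Below is a minimal sketch of what such a client can do for an interactive session, using the standard TCP_NODELAY option. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send small writes (a single key press, a pointer motion) immediately
 * instead of holding them back to be merged with later data.
 * Returns 0 on success, -1 on error (see errno). */
static int disable_nagle(int sock)
{
    assert(sock >= 0);
    int on = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

The flip side is more, smaller packets on the wire, which is exactly what makes the keypress timing visible in the side-channel plot mentioned above.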

Those details aside, this was a workable scenario for a long time, even for relatively complex clients like the venerable Quake 3. The reason is that even GLX, the X-related extensions to OpenGL, only had local ‘direct rendering’ as an optional thing. But that was about the tipping point on the OpenGL timeline, where the distance between locally optimal rendering and remotely optimal rendering became much too great, and the large swath of developers and users in charge largely favoured the locally optimal case for desktop-like workloads.

The big advantage non-local X had over other remote desktop solutions, of which there are far too many, is exactly this part. As far as the pragmatic user could care, the idea of transparency (or should it be translucency?) was simply to be able to say “hey you, this program, and only this program on this remote machine, get over here!”.

The principal quality was the relative seamlessness of the entire set of features on a per window basis, and that, sadly, goes unmatched to this very day, but with every ‘integrated desktop environment’ advancement, the feature grows weaker and the likelihood of applications being usable partially, or even at all, like this decreases drastically.

What Happened?

An unusably short answer would be: the convergence of many things happened. A slightly longer answer can be found here: X’s network transparency has wound up mostly being a failure. My condensed take is this:

Evolution of accelerated graphics happened, or the ‘Direct Rendering Infrastructure, DRI’ as it is generationally referenced in the Xorg and Linux ecosystems. Applications started to depend heavily on network-unfriendly IPC systems that were being used as a sideband to X rather than in cooperation with it. You wanted sound to go with your application? Sorry. Notification popups going to the wrong machine? Oops, now you need D-Bus! And so on.

This technical development is what one side of the argument is poking fun at when they go ‘X is not network transparent!’, while the other side are quick to retort that they are, in fact, running emacs over X on the network to this very day. The easy answer is to try it for yourself, it is not that the mechanisms have suddenly disappeared; it should be a short exercise to gain some practical experience. From my own experiments just prior to writing this article, the results varied wildly from pleasant to painful depending on how the application and its toolkit were written.

Thus far, I have mostly painted a grim portrait, yet there are more interesting sides to this. These more interesting things are XPRA and X2go. X2go addresses some of the shortcomings in ways that still leverage parts of X without falling back to the lowest “no way out” common denominator of sending an already composited framebuffer across the wire. It does so by using a custom X server with a different line protocol for external communication, and a carrier for adding in sound, among other things. Try it out! It is pretty neat.

Alas, this approach also falls flat when it comes to accelerated composition past a specific feature set, which can be seen in the compatibility documentation notes. That aside, X2go is still very actively developed and used. The activity on mailing lists, IRC and gatherings all act as testament to the relevance of the feature and its current form, from both a user and a developer perspective.

What does the future hold?

So, outside of succumbing to using the web browser, and possibly bastardised versions like ‘electron’, as its other springboard, what options are there?

Let’s start with the ‘design by committee’ exercise that is Wayland, and use it as an indicator of things that might become a twisted reality.

From what I could find, there is a total of one good blog post/PoC that, in stark contrast to the rambling fever dreams of most forum threads on the subject, experiments technically with the possibility of transparency in the sense of “a client connecting/bridged to a remote server”, and not opacity in the sense of “a server compositing and translating n clients to a different protocol”. Particularly note the issues around keyboard and descriptor passing. Those are significant, yet still only the tip of a very unpleasant iceberg.

The post itself does a fair job of providing notes on some of the problems, and you can discover a few more for yourself if you patch or proxy the Wayland client library implementation to simulate various latencies in the buffer dispatch routine: sprinkle a few “timesleeps” in there. Enjoy troubleshooting why clients get disconnected or crash sporadically. It turns out that reliably testing asynchronous, event-driven implementations is really hard, and not enough effort is being put into toolkit backends for Wayland; too bad most of the responsibilities have been pushed to the toolkit backends in order to claim that the server side is so darn simple.

That is not to say that it cannot be done, of course; the linked blog post showed as much. The issue is that crossing the chasm between a. the “basic” proxy-server/patched support libraries writing over a socket, even with some video compression, and b. getting to even the level of x2go with the aforementioned problems is a daunting task. Then you would still fight the sharp corners of queueing around back-pressure so that data-device (clipboard) actions do not stall everything; the usability problems from D-Bus dependent features breaking; audio not being paired, synched and resampled to the video it is tied to; and so on.

The reason I bring this up is that what will eventually happen is alluded to in the Wayland FAQ:

This doesn’t mean that remote rendering won’t be possible with Wayland, it just means that you will have to put a remote rendering server on top of Wayland. One such server could be the X.org server, but other options include an RDP server, a VNC server or somebody could even invent their own new remote rendering model.

The dumbest thing that can happen is that people take it for the marketing gospel it is, and actually embed VNC on the compositor side. I tried this out of sheer folly back in ~2013 and the experience was most unpleasant.

RFB, the underlying protocol in ‘VNC’, is seriously terrible, even if you factor in the many extensions, proprietary as well as public. Making fun of X for having a dated view on graphics and in the next breath considering VNC has quite some air of irony to it. RFB’s qualities are the inertia of clients being available on nearly every platform, and that the public part of the protocol (RFC 6143) is documented in such a coherent and beautiful way that it puts the soup of scattered XML files and TODO-sprinkled PDFs that is “modern” Wayland forever in the corner.

The counterpoint to the inertia quality is that the RFB implementations have subtle incompatibilities with each other, so you do not know which features can be relied on, when they can be relied on, or to what extent; assuming the connection does not just terminate during the handshake. The latter was, as an example, the case for many years when connecting to Apple’s VNC server with a client not written by Apple.

The second dumbest thing is to use RDP. It has features. Lots of them. Even a printer server, a USB server and file system mount translation. Heck, all the things that Xorg was made fun of for having are in there, and then some. The reverse-engineered implementation of this proprietary Microsoft monstrosity, FreeRDP, is about the code size of the actually used parts of Xorg, give or take some dependencies. In C. In network-facing code. See where this is heading? Embed that straight into your privileged Wayland compositor process, and I will just sit here in bitter silence and be annoyed by the fireworks.

The least bad available technology to try and get in there would be the somewhat forgotten SPICE project, which is currently ‘wasted’ as a way of integrating and interacting with KVM/Qemu. In many ways, with the local buffer passing modifications, it makes a reasonably apt local display server API as well. 

Rounding things off, the abstract point of the ‘VNC-‘ idea argument is, of course, the core concept of treating client buffers solely as opaque texture bitmaps in relation to an ordered stream of input and display events; not the underlying protocol as such.

The core of the argument is that networked ‘vector’ drawing is defunct and dead or dying. The problem with that argument is that it is trivially shown to be false, well illustrated by the web browser, which shows some of the potential and public interest. We are not just streaming pixel buffers, and for good reason. The argument is only partially right in the X case, as X2go shows that there is validity to proper segmentation of the buffers, so that the networking part can optimise and choose compression, caching and other transfer parameters based on the actual non-composited contents.

If you made it this far and want to punish yourself extra – visit or revisit this forum thread and contrast it in relation to this article.


Writing a console replacement using Arcan

In this article, I will show just how little effort it takes to specify graphics and window management sufficient to provide features that surpass kmscon and the ‘regular’ Linux console, with a directFB-like API for clients to boot. It comes with the added bonus that it should work on OSX, FreeBSD, OpenBSD, and within a normal Arcan, X or Wayland desktop as well. Thus, it is not an entry in the pointlessly gimmicky ‘how small can you make XYZ’ genre, but rather something useful yet instructional.

Two things motivated me to write this. The first is simply that some avid readers asked for it after the article on approaching feature parity with Xorg. The second is that this fits the necessary compatibility work needed with the TUI API subproject – a ‘terminal emulator’ (but in reality much more) that will be able to transition from a legacy terminal to handling terminal-freed applications.

The final git repository can be found here: https://github.com/letoram/console


Each Part starts by referencing the relevant Git commit, and elaborates on some part of the commit that might be less than obvious. To get something out of this post, you should really look at both side by side.

Prelude: Build / Setup

For setup, we need two things – a working Arcan build and a directory tree that Arcan can interpret as the set of scripts (‘appl’) to use. Building from source:

git clone https://github.com/letoram/arcan arcan
cd arcan/external ; ./clone.sh ; mkdir ../build ; cd ../build
cmake -DVIDEO_PLATFORM=XXX ../src ; make

There are a lot of other options, but the important one here is marked with XXX. Some of them are for embedded purposes, some for debugging or performance versus security trade offs.

Even though Arcan supports many output methods, the choice of the video platform has far-reaching effects and it is hardcoded into the build and thus into the supporting tools and libraries. Why this is kept like this and not, say, as “dynamically loadable plugins” is a long topic, but suffice it to say that it saves a lot of headache in deal-breaking edge cases.

There are two big options for video platform (and others, like input and audio are derived from the needs of the video platform). Option one is sdl or sdl2, which is a lot simpler as it relies on an outer display server to do much of the job, which also limits its features quite a bit.

Option two is the ‘egl-dri’ platform which is the more complicated beast needed to act the part of a display server. The ‘egl-dri’ platform build is the one packaged on voidlinux (xbps-install arcan). There is a smaller ‘egl-gles’ platform hacked together for some sordid binary-blob embedded platforms, but disregard that one for now.

The directory structure is simple: the name of the project as a folder, a .lua file with the same name inside that folder, with a function with the same name. The one we will use will simply be called ‘console’, so:

mkdir console
echo "function console() end" > console/console.lua

This is actually enough to get something that would be launchable with:

 arcan /path/to/console

But it won’t do much, and if you use the native egl-dri platform, you will need to somehow kill the process for the resources to be released, or rely on the normal keybindings to switch virtual terminals.

The first entry point for execution will always be named the same as the appl, tied to the name of the script file and the name of the folder. This is enforced to make it easy to find where things ‘start’. Any code executed in the scope outside of that will not have access to anything but a bare minimum Lua API.

All event hooks (like input, display hotplug, …) are implemented as suffixes to these naming rules, e.g. the engine will look for a console_input when providing input events.
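To make the naming rule concrete, here is a small pure-Lua sketch of how a host could resolve an appl's entry point and its event hooks by name from the global table. It only illustrates the convention; resolve_hook is a hypothetical helper, not how the engine actually performs the lookup internally.

```lua
-- Hypothetical illustration of the appl naming convention:
-- the entry point is named after the appl, hooks are "<appl>_<event>".
local function resolve_hook(appl, event)
  local name = event and (appl .. "_" .. event) or appl
  local fn = _G[name]
  if type(fn) == "function" then
    return fn, name
  end
  return nil, name
end

-- What a minimal console.lua might define (as globals).
function console()
  -- entry point, runs once at startup
end

function console_input(input)
  -- would receive input event tables; here it just echoes the kind
  return input.kind
end

local entry = resolve_hook("console")
local input_hook = resolve_hook("console", "input")
-- A hook this sketch does not define resolves to nil.
local missing, missing_name = resolve_hook("console", "clock_pulse")
```

A missing hook simply means that event category is not handled, which matches the opt-in style of the rest of the API.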

Part 1: Hello Terminal

Git Commit #1

Breaking down the first function:

function console()
	KEYBOARD = system_load("builtin/keyboard.lua")()
	system_load("builtin/mouse.lua")()
	KEYBOARD:load_keymap(get_key("keymap") or "devmaps/keyboard/default.lua")
	selected = spawn_terminal()
	show_image(selected)
end

Apart from the reserved Lua keywords (function, end and so on) and our own spawn_terminal, the calls here (system_load, get_key, show_image) refer to Arcan-specific functions. The Arcan-specific functions are documented in the /doc/*.lua files, with one file per available function.

The first call, system_load, is used to pull in other scripts or native code (.dll/.so). The way it searches for resources is a bit special, as the engine works on a hierarchy of ‘namespaces’, basic file system paths that are processed in a specific order. Depending on what kind of resource you are looking for, different namespaces may be consulted. The ones relevant here are system scripts, shared resources and appl.

System scripts are shared scripts for common features that should be usable by most projects but are not forced in. These are things like keyboard map translation, the mouse state machine, gestures and so on. ‘appl’ is the namespace of our own scripts here.

The get_key function is used for persistent configuration data storage. This is stored in a database with a table for Arcan-specific configuration, one for shared launch targets (programs that are allowed to be executed by the engine itself) and one for appl-specific configuration (which is what we want here). These can be handled in script, but there is also a command-line tool, arcan_db, with which you can modify these keys yourself.

The show_image function takes a vid and sets its opacity to fully opaque (visible). A ‘vid’ is a central part in Arcan, and is a numeric reference to a video object. We will use these throughout the walkthrough as they also work as a ‘process/resource’ handle for talking to clients.

When created, VIDs start out invisible to encourage animation to ‘fade them in’, something that can be achieved by switching to blend_image and providing a duration and, optionally, an interpolation method. The fancy animations and visuals are out of scope for now though.

Next we add the missing function referenced here, spawn_terminal:

function spawn_terminal()
	local term_arg = get_key("terminal") or "palette=solarized-white"
	return launch_avfeed(term_arg, "terminal", client_event_handler)
end

The get_key part has already been covered, and if we don’t find a ‘terminal’ key to use as an argument to our built-in terminal emulator, the palette=solarized-white argument will be selected.

launch_avfeed(arg, fsrv_type, handler) is where things get interesting. It is used to spawn one of the frameservers, or included external programs that we treat as special ‘one-purpose’ clients. There is one for encoding things, decoding things, networking and so on. There is also one that is a terminal emulator, which is what we are after here. The argument order is a bit out of whack due to legacy and how the function evolved, in hindsight, arg and type should have been swapped. Oh well.

Time for a really critical part, the event handler for a client. This can of course be shared between clients, unique to individual clients based on something like authenticated identity or type but also swapped out at runtime with target_updatehandler.

function client_event_handler(source, status)
	if status.kind == "terminated" then
		return shutdown()

	elseif status.kind == "resized" then
		resize_image(source, status.width, status.height)

	elseif status.kind == "preroll" then
		target_displayhint(source,
			VRESW, VRESH, TD_HINT_IGNORE, {ppcm = VPPCM})
	end
end

There are a ton of possible events that can be handled here, and you can read launch_target for more information. Most of them are related to more advanced opt-in desktop features and can be safely ignored. The ones we handle here are:

‘terminated’ means that the client has, for some reason, died. You can read the last_words field from the status table for a user-presentable explanation. The backing store, that is the video object and the last submitted frame, is kept alive so that we can still render, compose or do other things with the data. Normally you would call delete_image here, but we choose to just shut down.

‘resized’ is kind of important. It means that the size of the backing store has changed, and the next frame drawn will actually be scaled unless you do something. That is, there is a separation between the presentation size that the scripts set with a resize_image call and whatever size the backing store has. Here we just synch the presentation to the store.

‘preroll’ is a special construct in Arcan’s client communication design. Basically, the client is synchronously blocking, waiting for you to tell it as much as you care to about its parameters. Instead of the client reacting to each of them as part of an event loop, they get collected into a structure of parameters like presentation language, display density and so on. Here we only use target_displayhint to tell it about the preferred output dimensions, focus state and specific display properties like density.

Finally, we need some input:

function console_input(input)
	if input.translated then
		KEYBOARD:patch(input)
	end
	target_input(selected, input)
end

This is one of the event handlers that were talked about before, and one of the more varied ones, as it pertains to all possible input states. It grabs everything it can from the lower system levels, and the sources can be as diverse as sensors, game controllers, touch screens, etc.

The two more common ones from this perspective, though, are ‘translated’ devices (your keyboard) and the mouse. Here we just apply the keyboard translation table (map) that we loaded earlier and forward everything to the selected window. The target_input function is responsible for that, with the possibility to forego routing a source table altogether and instead synthesise the input that you want to ‘inject’.

Part 2: Workspaces and Keybindings

Git Commit #2

While this commit is meatier than the last one, most of it is the refactoring needed to go from one client fullscreen to multiple workspaces and workspace switching and barely anything of it is Arcan specific. The two small details of note here would be the calls to valid_vid and decode_modifiers.
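The bookkeeping behind workspace switching is mostly plain Lua. As a minimal sketch, assuming one client per workspace, the following tracks which handle is visible; the visible table is a hypothetical stand-in for calling show_image and hide_image on real VIDs, and the numeric handles are made up.

```lua
-- Hypothetical workspace bookkeeping: one client per workspace, one visible.
-- `visible` stands in for calling show_image / hide_image on a VID.
local workspaces = {}
local visible = {}
local active = 1

-- Place a client (identified by a numeric VID-like handle) on a workspace.
local function assign(index, vid)
  workspaces[index] = {vid = vid}
  visible[vid] = (index == active)
end

-- Hide the current workspace's client, then show the target's (if any).
local function switch_workspace(index)
  if index == active or index < 1 then
    return false
  end
  local cur = workspaces[active]
  if cur then
    visible[cur.vid] = false
  end
  active = index
  local nxt = workspaces[index]
  if nxt then
    visible[nxt.vid] = true
  end
  return true
end

assign(1, 10)
assign(2, 20)
```

Switching to an empty workspace is allowed and simply leaves nothing visible, which is where a WM would typically spawn a new terminal on demand.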

The decode_modifiers function is trivial, though its context is not. Keyboards are full of states, and these are transmitted as a bitmap. This function call helps decompose that bitmap into a more manageable type. There is much more to be said about the input model itself, as it is much more refined and necessarily complex, and it will span multiple articles.
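As a rough illustration of what decoding a modifier bitmap involves, the sketch below turns a bitmask into a list of names. The bit assignments are hypothetical placeholders rather than Arcan's actual constants, and decode_mask is not the engine's decode_modifiers; it only shows the decomposition step. Arithmetic is used instead of bitwise operators so that it runs on Lua 5.1/LuaJIT as well as Lua 5.3+.

```lua
-- Hypothetical bit layout; Arcan defines its own values.
local MODIFIER_BITS = {
  {name = "lshift", bit = 1},
  {name = "rshift", bit = 2},
  {name = "lctrl",  bit = 64},
  {name = "lalt",   bit = 256},
}

-- True if the single-bit value `bit` is set in `mask`.
local function has_bit(mask, bit)
  return math.floor(mask / bit) % 2 == 1
end

-- Decompose a modifier bitmask into an ordered list of names,
-- e.g. for building keybinding lookup strings like "lshift_lctrl".
local function decode_mask(mask)
  local names = {}
  for _, m in ipairs(MODIFIER_BITS) do
    if has_bit(mask, m.bit) then
      names[#names + 1] = m.name
    end
  end
  return names
end

local mods = decode_mask(1 + 64)
local binding_key = table.concat(mods, "_")
```

Joining the names in a fixed order gives a stable string that can be used directly as a key into a keybinding table.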

The valid_vid function will be used a lot due to a ‘Fail-Early-Often-Hard’ principle in the API design. A lot of functions have a ‘terminal state transition’ note somewhere, meaning that if the arguments you provide do not match what is expected, the engine will terminate and generate a snapshot, traceback etc. Depending on how the program was started, it is likely that it will also switch to one of the crash recovery strategies.

This is to make things easier to debug and to preserve as much relevant program state as possible. Misuse of VIDs is the more common API mistake, and valid_vid calls can be used as a safeguard against that. It is also the function you need to distinguish a video object with, say, a static image source from one with an external client tied to it.

Part 3: Clipboard and Pasteboard

Git Commit #3

More interesting things in this one and it is a rather complex feature to boot. In fact, it is the most complex part of this entire story. In the client_event_handler you can spot the following:

elseif status.kind == "segment_request" and
       status.segkind == "clipboard" then
		local vid = accept_target(clipboard_handler)
		if not valid_vid(vid) then
			return
		end
		link_image(vid, source)
	end

The event itself is that the client (asynchronously) wants a new subsegment tied to its primary one. If we do not handle this event, a reject will be sent instead and the client will have to go on without one.

By calling accept_target we say that the requested type is something we can handle. This function is context sensitive and only valid within the scope of an event handler processing a segment request. Since all VID allocations can fail (there is a soft configurable limit defaulting to a few thousand, and a hard one at 64k) we verify the result.

The link_image call is also vastly important, as it ties properties like coordinate space and lifecycle management of one object to another. When building more complex server-side UIs and decorations, this is typically responsible for hierarchically tying things together. Here we use it to make sure the newly allocated clipboard resources are destroyed automatically when the client VID is deleted.
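The lifecycle part of link_image can be pictured as a parent/child tree where deleting a node cascades to everything linked to it. The pure-Lua model below is a hypothetical stand-in (Node, link and delete are not Arcan functions) that only shows why the clipboard subsegment disappears together with its client. The coordinate-space inheritance that link_image also provides is not modelled.

```lua
-- Minimal model of linked objects: deleting a parent deletes its children.
local Node = {}
Node.__index = Node

local function new_node(name)
  return setmetatable({name = name, children = {}, alive = true}, Node)
end

-- Roughly analogous to link_image(child, parent) for lifecycle purposes.
function Node:link(parent)
  self.parent = parent
  parent.children[#parent.children + 1] = self
end

-- Depth-first cascade, like deleting a client VID with linked subsegments.
function Node:delete()
  for _, child in ipairs(self.children) do
    if child.alive then
      child:delete()
    end
  end
  self.children = {}
  self.alive = false
end

local client = new_node("terminal")
local clipboard = new_node("clipboard")
local titlebar = new_node("titlebar")
local unrelated = new_node("wallpaper")
clipboard:link(client)
titlebar:link(client)

client:delete()
```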

Looking at the clipboard handler:

elseif status.kind == "message" then
    -- the clipboard is linked to its client, so look up the client's table
    local tbl = find_client(image_parent(source))
    if not tbl then
        return
    end
    tbl.clipboard_temp = (tbl.clipboard_temp or "") .. status.message
    if not status.multipart then
        clipboard_last = tbl.clipboard_temp
        tbl.clipboard_temp = ""
    end
end

This is a simple text-only clipboard, there are many facilities for enabling more advanced type and data retrieval — both client to client directly and intercepted. Here we stick to just short UTF-8 messages. On the lower levels, every event is actually transmitted and packed as a fixed size in a fixed ring buffer in shared memory.

This serves multiple purposes. One is to avoid the copy-in-copy-out semantics of the write/read calls over a socket that X or Wayland would incur. Other reasons are to allow ‘peek/out-of-order’ event processing as a heavy optimisation for costly event types like resize, but also to act as a rate limit and punish noisy clients that try to saturate event queues to stall, delay or provoke race conditions in other clients or the WM itself. For this reason, larger paste events need to be split up into multiple messages, and really large ones will likely stall this part of the client in favour of saving the world.
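Because each event slot has a fixed size, a long paste has to be cut into several messages, with every piece except the last flagged as multipart. The sketch below shows one way to make the cut without splitting a UTF-8 sequence in half. MAX_MSG is a hypothetical limit chosen for illustration, not the actual size of Arcan's event message field, and split_message is not part of any Arcan API.

```lua
-- Hypothetical per-event payload limit in bytes (not Arcan's real value).
local MAX_MSG = 8

-- UTF-8 continuation bytes have the form 10xxxxxx (0x80..0xBF).
local function is_continuation(byte)
  return byte ~= nil and byte >= 0x80 and byte <= 0xBF
end

-- Split `text` into chunks of at most `limit` bytes without cutting inside
-- a multi-byte UTF-8 sequence (assumes limit >= 4, the longest sequence).
-- Returns a list of {message = ..., multipart = ...}.
local function split_message(text, limit)
  local out = {}
  local pos, len = 1, #text
  while pos <= len do
    local stop = math.min(pos + limit - 1, len)
    -- If the next byte continues a sequence, back up to before its lead byte.
    while stop < len and stop > pos and is_continuation(text:byte(stop + 1)) do
      stop = stop - 1
    end
    out[#out + 1] = {message = text:sub(pos, stop), multipart = stop < len}
    pos = stop + 1
  end
  return out
end

local parts = split_message("hello world", MAX_MSG)
```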

Long story short, the WM thus has to explicitly concatenate these messages, and, optionally, say when enough is enough. Here we just buffer indefinitely, but a normal approach would be to cut at a certain length and just kill the per-client clipboard as punishment. For longer data streams, we have the ability to open up asynchronous pipes, either intercepted by the WM or by sending the read end to one clipboard, and the write end to another.
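The ‘cut at a certain length’ policy could look something like the following pure-Lua sketch. CLIPBOARD_CAP, the per-client state table and the ‘killed’ flag are hypothetical; in a real WM the punishment would be something like deleting the clipboard VID, which is not modelled here. Buffering parts in a table and joining them once with table.concat also avoids the quadratic cost of repeated string concatenation.

```lua
-- Hypothetical upper bound on one clipboard entry, in bytes.
local CLIPBOARD_CAP = 16

-- Feed one "message" event into a per-client accumulator.
-- Returns the complete text when the last part arrives, nil while waiting,
-- or false plus a reason if the client exceeded the cap.
local function on_clipboard_message(state, message, multipart)
  if state.killed then
    return false, "clipboard disabled"
  end
  state.parts = state.parts or {}
  state.size = (state.size or 0) + #message
  if state.size > CLIPBOARD_CAP then
    -- Drop everything buffered so far and refuse further input.
    state.parts, state.size, state.killed = {}, 0, true
    return false, "clipboard cap exceeded"
  end
  state.parts[#state.parts + 1] = message
  if multipart then
    return nil
  end
  local text = table.concat(state.parts)
  state.parts, state.size = {}, 0
  return text
end
```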

The call to image_parent simply retrieves the parent object of the clipboard, which is the one we linked to earlier and the code itself just pairs a per-client table where we build the clipboard message that the client wants to add.

Lastly, pasting. In the clipboard_paste function we can spot the following:

if not valid_vid(dst_ws.clipboard) then
    dst_ws.clipboard = define_nulltarget(dst_ws.vid, "clipboard",
        function(source, status)
            -- clean up if the recipient drops its clipboard segment
            if status.kind == "terminated" then
                delete_image(source)
            end
        end
    )
end

The important one here is define_nulltarget. There are a number of define_XXXtarget functions depending on how information is to be shared. The common denominator is that they are about allocating and sending data to a client, while most other functions deal with presentation or data coming from a client.

The nulltarget is simple in that it only really allocates and uses the event queues, with no costly audio or video sharing. It allocates the IPC primitives and forces them onto the client, saying ‘here is a new window of a certain type, do something with me!’. Here we use that to create a clipboard inside the recipient (if one doesn’t already exist for the purpose).

We can then target_input the VID of this nulltarget to act as our ‘paste’ operation.

Part 4: External Clients

Git Commit #4

There are quite a few interesting things in this one as well. In the initial console() function, we added a call to target_alloc. This is a special one as it opens up a connection point — a way for an external client to connect to the server. All our previous terminals have been spawned by initiative of the WM itself and using a special code path to ensure that it is the terminal we are getting.

With this connection point, a custom socket name is opened up that a client can access using the ARCAN_CONNPATH environment variable (or by specifying it explicitly with a low-level API). Otherwise it behaves just like normal, with the addition of an event or two.

Thus in the client_event_handler, we add a handler for “registered” and “connected”.

“Connected” simply means that someone has opened the socket and it is now consumed and unlinked. No other client can connect using it. This is by design, to encourage rate limiting, tight resource controls and segmenting the UI into multiple different connection groups, with a different policy based on the connection point used. For old X-like roles, think of having one for an external wallpaper, status bar or launcher.

Our behaviour here is simply to re-open it by calling ‘target_alloc’ again in the connected stage of the event handler.
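The single-use behaviour of a connection point can be modelled in a few lines. This is a hypothetical pure-Lua model (open_point, client_connect and on_connected are made-up names, not Arcan functions) showing that a consumed point rejects further clients until the WM opens it again, which is the natural place to apply rate limiting or other policy.

```lua
-- Model of single-use connection points keyed by name (hypothetical).
local points = {}

local function open_point(name)
  points[name] = {name = name, open = true}
  return points[name]
end

-- A connecting client consumes the point; nobody else can use it afterwards.
local function client_connect(name)
  local p = points[name]
  if not p or not p.open then
    return false
  end
  p.open = false
  return true
end

-- What the WM does on the "connected" event in this walkthrough:
-- open the same name again (a real WM could delay or refuse here).
local function on_connected(name)
  return open_point(name)
end

open_point("console")
```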

“Registered” means that the client has now provided some kind of authentication primitive (optional) and a type. This type can also be used to further segment the policy that is applied to a connection. In the ‘whitelisted’ function below, we select the ones we accept and assign a relevant handler. If the type is not whitelisted, the connection is killed by deleting the VID associated with it.
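The whitelist itself can be as simple as a table lookup. The following is a minimal sketch, not the code from the commit: the handler functions are hypothetical placeholders, and the comments only indicate where a real WM would call target_updatehandler or delete_image.

```lua
-- Hypothetical handlers; in the real WM these would be event handlers.
local function terminal_handler(source, status) end
local function generic_handler(source, status) end

-- Segment types this sketch accepts, mapped to the handler to switch to.
local WHITELIST = {
  terminal = terminal_handler,
  tui = terminal_handler,
  multimedia = generic_handler,
  game = generic_handler,
}

-- Returns the handler for an accepted type, or nil to reject.
local function whitelisted(kind)
  return WHITELIST[kind]
end

-- What the "registered" branch would decide: switch handler or drop.
local function on_registered(kind)
  local handler = whitelisted(kind)
  if handler then
    return "accept", handler -- real WM: target_updatehandler(source, handler)
  end
  return "reject" -- real WM: delete_image(source)
end
```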

Lastly we add a new event handler, _adopt (so console_adopt). This one is a bit special.

function console_adopt(vid, kind, title, have_parent, last)
...
end

We will only concern ourselves with the prototype here. Adopt is called just after the main function entry point, if the engine is in a recovery state. It covers three use cases:

  1. Crash Recovery on Scripting Error
  2. ‘Reset / Reload’ feature for a WM (using system_collapse)
  3. Switch WMs (also using system_collapse)

The engine will save / hide the VIDs of each externally bound connection, and re-expose them to the scripts via this function. The ‘last’ argument will be set on the last one in the chain, and have_parent if it is a subsegment linked to another one, like clipboards.

It is possible for the WM to tag more state in a vid using image_tracetag which can also be recovered here, and that is one way that Durden keeps track of window positions etc. so that they can survive crashes (along with store_key and get_key to get database persistence).

In the handler for this WM, we keep only the primary segment, and we filter type through whitelisted so that we do not inherit connections from a WM switch that we do not know what to do with.
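The filtering rule described above (keep primary segments of known types, drop subsegments and unknown types) can be sketched as a pure decision function. The ACCEPTED table, should_adopt and the simulated list of offered VIDs are all hypothetical; this does not claim to reproduce how console_adopt reports its decision back to the engine, and the `last` argument is not modelled.

```lua
-- Types this hypothetical WM knows how to manage.
local ACCEPTED = {terminal = true, tui = true}

-- Keep only primary segments (no parent) of whitelisted types.
local function should_adopt(kind, have_parent)
  if have_parent then
    return false -- subsegments such as clipboards are dropped
  end
  return ACCEPTED[kind] == true
end

-- Simulate the sequence of adopt calls after a crash or WM switch;
-- each entry is {vid = ..., kind = ..., have_parent = ...}.
local function simulate_adopt(offered)
  local kept = {}
  for _, item in ipairs(offered) do
    if should_adopt(item.kind, item.have_parent) then
      kept[#kept + 1] = item.vid
    end
  end
  return kept
end

local kept = simulate_adopt({
  {vid = 1, kind = "terminal", have_parent = false},
  {vid = 2, kind = "clipboard", have_parent = true},
  {vid = 3, kind = "unknown-type", have_parent = false},
  {vid = 4, kind = "tui", have_parent = false},
})
```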

Part 5: Audio

Git Commit #5

Time for a really short one. Arcan is not just a Display Server, and there are reasons why it is described as a Multimedia Server or a Desktop Engine. One such reason is that it also handles audio. This goes against the grain and traditional wisdom (or lack thereof) of separating a display server and an audio server, and then spending a ton of effort on broken synch, device routing and meta-IPC as an effect.

With every event, you can extract the ‘source_audio’ field, which gives an AID, an audio identifier that pairs with the VID. Though the interface is currently much more primitive, as advanced audio is later on in the roadmap, the basics are being able to pair an audio source with a video one and to control the volume.

Thus, in this patch, we simply add a keybinding that calls audio_gain, and extract the AID to store with other state in the workspace structure.
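The volume keybinding mostly needs a clamped step function. A minimal sketch, assuming a gain range of 0 to 1 and a workspace table that stores the AID; step_gain and GAIN_STEP are hypothetical names, and the actual engine call is only indicated in a comment.

```lua
-- Hypothetical step size for one key press.
local GAIN_STEP = 0.1

-- Adjust and clamp the gain stored in a workspace table.
local function step_gain(ws, delta)
  local gain = (ws.gain or 1.0) + delta
  if gain < 0 then
    gain = 0
  elseif gain > 1 then
    gain = 1
  end
  ws.gain = gain
  -- In the WM this would then be applied with something like
  -- audio_gain(ws.aid, gain), assuming ws.aid holds the source_audio AID.
  return gain
end

local ws = {aid = 1}
```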

Part 6: Extra Font Controls

Git Commit #6

Concluding with another short one. For the details on this one, you should read the Font section in the Arcan vs Xorg article.

We simply add calls to target_fonthint during the ‘preroll’ stage in the client event handler:

local font = get_key("terminal_font")
local font_sz = get_key("font_size")

if font and (status.segkind == "tui" or status.segkind == "terminal") then
    target_fonthint(source, font, (tonumber(font_sz) or 12) * FONT_PT_SZ, 2)
else
    target_fonthint(source, (tonumber(font_sz) or 12) * FONT_PT_SZ, 2)
end

The main quirk is possibly that the size is expressed in cm (as density is expressed in ppcm and not imperial garbage), thus a constant multiplier (built-in) is needed to convert from the familiar font PT size.
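The conversion behind that multiplier is plain arithmetic: a typographic point is 1/72 inch and an inch is 2.54 cm. The sketch below computes it in Lua. The names are illustrative and make no claim about how the engine defines FONT_PT_SZ; check the target_fonthint documentation for the exact unit it expects. pt_to_px assumes a density in pixels per centimetre, the same unit as VPPCM.

```lua
local CM_PER_INCH = 2.54
local PT_PER_INCH = 72

-- Length of one typographic point in centimetres and millimetres.
local CM_PER_PT = CM_PER_INCH / PT_PER_INCH -- about 0.0352778
local MM_PER_PT = CM_PER_PT * 10            -- about 0.352778

local function pt_to_cm(pt)
  return pt * CM_PER_PT
end

-- Rendered height in pixels on a display of `ppcm` pixels per centimetre.
local function pt_to_px(pt, ppcm)
  return pt_to_cm(pt) * ppcm
end

-- A classic 96 DPI screen is 96 / 2.54 (about 37.8) pixels per cm,
-- so 12 pt comes out at 16 px there.
local size_cm = pt_to_cm(12)
local size_px = pt_to_px(12, 96 / CM_PER_INCH)
```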

That is it for now. As stated before, this is supposed to be a small (not minimal) viable WM model that would support many ‘I just need to do XYZ’ users, while at the same time providing most of the building blocks needed for our intended terminal emulator replacement project. For that case, we will revisit this WM once more a little later.

At this stage, most kinds of clients should be working: fsrv_game, fsrv_decode etc. can be used for libretro cores and video decoding. Other Arcan scripts can be built and tested with the arcan_lwa binary, aloadimage can be used for image viewing, and Xarcan for running an X server.

The “only” thing missing, client-support-wise, is the arcan-wayland bridge for Wayland client support, due to the additional complexity of the Wayland allocation scheme and the strong presence that specific window management behaviour has in the protocol.


Arcan versus Xorg – Approaching Feature Parity

This is the first article out of two where I will go through what I consider to be the relevant Xorg feature set, and compare it, point by point, to how the corresponding solution or category works in Arcan.

This article will solely focus on the Display Server set of features and how they relate to Xorg features. The second article will cover the features that are currently missing (e.g. network transparency) once they have been accounted for, as well as the features that are already present in Arcan (and there are quite a few of those) but do not exist in Xorg.

It is worthwhile to stress that this project in no way attempts to ‘replace’ Xorg in the sense that you can expect to transfer your individual workflow and mental model of how system graphics works without any kind of friction or effort. That said, it has also been an unspoken goal to make sure that everything that can be done in an Xorg environment should be possible here. In general there is nothing wrong with the feature set in X (though it is a bit limited); it is the nitty-gritty details of how these features work, are implemented and interact that have not really kept up with the times or been modelled in a coherent way. Thus, it is a decent requirement specification to start with: just be careful with the implementation, and much more can be had for a fraction of the code size.

The very idea of the project as a whole is to find new models for system graphics, preferably those that can survive the prospect of a coming ‘demise’ of the ‘desktop’ as the android/chrome alphabet soup monster keeps slurping it up, all the while established ones make concession after concession to try and cater to a prospective user base that just wants things to “work”, at the expense of advanced user agency.

In terms of the ‘classic’ desktop, Xorg with DRI3 work is already quite close to that dreaded ‘good enough’, and the polish needed is mostly within reach. My skepticism is based on the long standing notion that it won’t actually matter or be anywhere near ‘good’ enough.

This article is about as dry an experience as quaffing a cup of freshly ground cinnamon, even after the wall of text was trimmed down to about half of its original length.

Most of these sections start with a summary of the X(org) perspective, and then contrast it with how it is handled in Arcan.

As a precursor to the other sections, let’s first take a look at two big conceptual differences. The first one is that in Arcan, the only primitive a client really works with is referred to as a ‘segment’. A segment is a typed IPC container that can stream audio and/or video data in one direction, ‘client to server’ or ‘server to client’. It also carries associated events, metadata and event-bound abstract data handles (e.g. file descriptors) in both directions.

The type is a hint to/from the policy layer (which is somewhat similar to a ‘Window Manager’), e.g. popup, multimedia, clipboard, game and so on – used to assign priority and influence decisions such as suppress-suspend or input grabs without letting clients request such state changes, which is otherwise the annoying default in many other environments.

Any connected client gets one segment (the ‘primary’) and closing that one will release all resources associated with the client. A client may negotiate for additional segments, but these can be rejected, and rejection is, just as in real life, the default behaviour. The policy layer has to explicitly opt in to further allocations. A client may also negotiate for extending the capabilities of the segment to support advanced and privileged features, such as access to VR related devices, hardware colour lookup tables and so on.
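To make the default-deny allocation rule concrete, here is a minimal Python model of a client with a primary segment and a policy layer that rejects every additional segment unless it has explicitly opted in. This is not the arcan-shmif API: the class names, the SegKind values and the allow/request_subsegment/close_primary methods are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Hypothetical model of Arcan-style segments; not the arcan-shmif API.
class SegKind(Enum):
    PRIMARY = auto()
    POPUP = auto()
    MEDIA = auto()
    CLIPBOARD = auto()
    CURSOR = auto()

@dataclass
class Segment:
    kind: SegKind
    direction: str  # "client-to-server" or "server-to-client"

@dataclass
class Client:
    name: str
    # every connected client gets exactly one primary segment
    primary: Segment = field(
        default_factory=lambda: Segment(SegKind.PRIMARY, "client-to-server"))
    subsegments: list = field(default_factory=list)

class PolicyLayer:
    def __init__(self):
        self.opted_in = set()  # (client name, segment kind) pairs the WM allows

    def allow(self, client_name, kind):
        self.opted_in.add((client_name, kind))

    def request_subsegment(self, client, kind, direction="client-to-server"):
        # rejection is the default; only explicit opt-ins allocate anything
        if (client.name, kind) not in self.opted_in:
            return None
        seg = Segment(kind, direction)
        client.subsegments.append(seg)
        return seg

    def close_primary(self, client):
        # closing the primary segment releases everything tied to the client
        released = 1 + len(client.subsegments)
        client.subsegments.clear()
        return released
```

Note that the client never gets to escalate on its own: the only path to a second segment goes through a decision the policy layer made beforehand.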

A second key difference is the aforementioned policy layer. This layer acts as both the window manager and the event-driven I/O “routing” rules, combined into one. It is not treated as a normal client, but instead has access to a unique set of privileged functions that can be leveraged to explicitly control everything from display synchronisation and source-to-output mapping to event propagation and synthesis.

The premise is that the choice of policy layer should be modifiable at the user’s behest. It can be thought of as a firewall and router for all UI-related activity, in control of what goes where, when, why, and in what shape. This layer also has the option to not only accept and map client segments, but also to explicitly push a segment to a client, as a means of probing for, and announcing, support for some features.

Clients and Privileges

For those not entirely comfortable with display server parlance, recall the distinction between X11 (X protocol version 11) and Xorg. X11 is a protocol (a literal stack of papers), while Xorg is the current de facto server-side implementation of that protocol, though the protocol itself only exposes a subset of what the server can potentially do. A reasonable parallel is the relationship between HTTP (or SPDY, QUIC) and the rest of the web browser (Firefox, Chrome, …), both in engineering effort and relative importance. For clients to communicate using the protocol, the de facto implementation comes via the xlib library or via the more recent ‘xcb’ library.

This is a setup that is quite rare for user-facing operating systems in general: Windows does not have a display server protocol, and the same goes for OSX and Android. The real value of a protocol is communication between hard system barriers, politics condensed so to speak, but interaction between userspace components is not a particularly hard boundary. The protocol distinction is also not very common in related subsystems like audio. What happens instead is that they do expose APIs for letting other clients interact and integrate with the outer system, but that, by its very definition, only describes the interface – nothing strictly about the communication channel, ordering rules, valid- and invalid- bit patterns or actions to take on non-compliance, as some of those parameters are regulated on an OS level in the shape of the lesser known ABI.

In a similar vein, Arcan does not have a protocol for normal clients as such, and compatibility support for protocols like X11 and Wayland comes via translation services, built as separate opt-in special clients for stronger separation, sanity and sandboxing reasons.

API-wise, Arcan exposes three APIs: ‘Shmif’, ‘Tui’ and ‘Alt’. The philosophy behind the design is to let the server side absorb systemic complexity, and strive to make clients as trivial as possible to both develop and inspect. No toolkit of several million lines of code should ever be necessary. The perspective is instead to emphasise the extremes, that is to focus on the small “one task” kind of clients and on integration with bigger “embedded universes” like virtual machines and web browsers, and leave the middle to fester and rot.

The ‘shmif’ API provides the low-level interface and exposes a large set of features over a deceptively simple setup, one that was mostly inspired by old arcade hardware designs. This API is kept in lock-step with the engine. It may have incompatible changes between versions (but so far, not really), and will remain lock-stepped up until the fabled ‘1.0’ release. This means that the running engine version needs to match the shmif API version. It also enforces the attitude of “if you upgrade the server side, upgrade the clients”. Therefore, it does not provide means for “extensions”. The reasoning for this is that it should be absolutely clear from a version which features are present or not. It should not be left up to guesswork, a “TERM=xxx ; termcap” style database or a Wayland “Charlie Foxtrot” style extension registry.

The ‘tui’ API builds upon shmif and primarily targets text-oriented user interfaces, as a means of dissolving the dependency these have traditionally had on terminal protocols and the need to go via a terminal emulator. This opens the door to better integration with the window management style in the outer desktop; sane input models; multimedia-capable ‘rich text’ command line tools; removing the need for tmux-style multiplexers and so on.

Lastly, the ‘alt’ API is the high level engine interface itself and is currently exposed via Lua (see also: AWK for multimedia). It is used on the server side to implement the policy layer, but it can also be used client side via a special Arcan build (referred to as LWA) that acts as a normal 2D/3D renderer.

Going back to the arcan-shmif API — it is used to implement support for a number of features, but also for supporting other protocols. These typically come as separate processes and services that you switch on and off during runtime. The big two examples, right now, being ‘waybridge’ for Wayland support and ‘xarcan’ for X11 client support. These are kept as separate, privilege separated and, optionally, sandboxed processes and can run in all combinations: one bridge to one client, one bridge to many clients and many bridges to many clients.

Programs that use the arcan-shmif API are split into three distinct groups: ‘subjugated’, ‘authenticated’ and ‘non-authoritative’, collectively referred to as ‘frameservers’ in most of the engine documentation and code.

The authenticated clients are started through the initiative of the window manager and their parameters are predefined in a database. They inherit their respective connection primitives, thus keeping the chain of trust intact.

Subjugated clients have a preset ‘archetype’. This archetype regulates their intended role in the desktop ecosystem, e.g. media decoding, encoding, CLI shell etc., and the window manager may forcefully create and destroy these with predictable consequences. They can therefore be considered a form of ‘do one thing well, then die’ graphical client. The big thing with this is that a window manager can make stronger assumptions about what to make of their presence or lack thereof.

The last group, ‘non-authoritative’ clients, more closely matches the model seen both in other Wayland compositors and in X11 (X has a very mild form of authentication, Wayland completely fails in this regard): it is simply some client started from a terminal (tty) or, more likely, a pseudoterminal (pty), using some OS-specific connection primitive and discovery scheme.

In X11, clients connect and authenticate to an X11 server with a flat (disregarding certain extensions; there are always ‘extensions’) set of quite far-reaching permissions: clients can iterate, modify, map, read and redirect parts of both the event routing and the scene graph (a structure that controls what to draw, how to draw and when to draw). It is a tried and true way of getting very flexible and dynamic clients, which is also its curse.

In Arcan, there is a minimal default set of permissions which roughly corresponds to drawing into a single resizable buffer, along with a bidirectional event queue. Everything else is default-reject and has to be opted in. This includes attempts to retrieve additional buffers, on the principle that a client should have a hard time issuing commands that induce uncontrolled server-side resource allocations. Part of the motivation for the resource stringency is that the display server can run for extended periods of time, and small performance degradations from heap fragmentation and so on accumulate slowly, present themselves as ‘jankiness’, ‘judder’ or ‘lag’, and can be terrifyingly hard to attribute and solve.

The upside is that the ‘window manager’ can extend this on a sliding scale, from accepting custom mouse cursors for certain clients to allowing GPU-accelerated transfers, access to colour controls, buffer readbacks and unfiltered event-loop injection. In principle, albeit unwise, nothing stops this layer from throwing the permission doors wide open immediately and just replicating the X11 model in its entirety, but that happens at the behest of the user, not as a default.

Window Managers (WM)

One of the hallmark features of any decent display system is to be able to allow the user to manipulate, tune, or replace the interface, event routing and event response as he sees fit.

In X, any client can achieve this effect, which is also one of its many problems: if one or many clients try to manage windows, you have the dubious pleasure of choosing between race conditions (which manifest as short intermittent graphics glitches in contested territories like decorations, or buffer-content tearing) or performance degradation in terms of increased latency and/or lowered throughput. In many cases, you actually get both.

Arcan keeps the Window Manager concept as such, but it is not treated as ‘just another client’ with access to the same set of mechanisms as all the others. Instead, it acts as a dominant force with access to a much stronger set of commands. Since the engine both knows and trusts the window manager, it can allow it to drive synchronisation and get around the synchronisation problems inherent to the X approach.

All example window management schemes that have been shown in the videos here over the years are implemented at this level, and they have acted as driving forces for the evolutionary design of the scripting API. Some of the high-level scripts, mouse cursor management being one example, get generalised and added to a shared set of opt-in ‘builtin’ scripts.

Because of these choices, most of the features that the individual schemes provide can be transplanted into other window managers with much lower developer effort than would ever be possible in Xorg, or in any current Wayland compositor for that matter.

Displays vs Connection Points

In X11, a client connects to a ‘display’ specified with the DISPLAY=[host]:display[.screen] environment variable. This is an addressing scheme that maps to a Unix domain or network socket through which you can connect to the server.
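For comparison, the X11 addressing scheme can be resolved in a few lines of Python. This is a simplified sketch: it follows the conventional /tmp/.X11-unix/X<n> socket path for local displays and TCP port 6000 + n for remote ones, and deliberately ignores IPv6 literals, DECnet (::) addresses, abstract sockets and Xauthority handling.

```python
def parse_display(value):
    """Split an X11 DISPLAY string, [host]:display[.screen], into its parts."""
    host, sep, rest = value.rpartition(":")
    if not sep:
        raise ValueError(f"not a DISPLAY string: {value!r}")
    number, _, screen = rest.partition(".")
    return host, int(number), int(screen) if screen else 0

def socket_for(value):
    """Map a DISPLAY string to the socket a client would connect to."""
    host, display, _screen = parse_display(value)
    if host in ("", "unix"):
        # local display: Unix domain socket, one per display number
        return ("unix", f"/tmp/.X11-unix/X{display}")
    # remote display: TCP, port 6000 + display number
    return ("tcp", (host, 6000 + display))
```

The address says which server to reach and nothing about where on the desktop the client belongs, which is the contrast drawn with connection points next.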

In Arcan, the environment variable is ARCAN_CONNPATH="name", which also points to a domain socket (unless the connection is pre-authenticated and the primitives are inherited). The big difference is that the “name” is set by the Window Manager and it is consumed on use, meaning that when a client has connected through it, it disappears until it is reopened by the window manager.

This is a very powerful mechanism as it allows the Window Manager to treat clients differently based on what address they connect to. It means that the API can focus on mechanisms and not policy, and the Window Manager can select policy based on connection origins. Instead of just using the name as an address to a server, it is an address to a desktop location.

Trivial examples would be to have singleton connection points for an external status bar, launchers or the desktop wallpaper, without having to pollute a protocol with window management policies and, worse, modify clients to enable what is, in principle, window-manager-scheme-dependent policy.

This mechanism has also been used to successfully implement load balancing and denial of service protection (something like while true; do connect_client & done from a terminal should not be able to render the UI unresponsive or cause the system to crash or terminate).
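The consume-on-use behaviour, and how it can double as a throttle, can be modelled as below. This is a hypothetical Python sketch rather than Arcan code: the ConnectionPoints class, the policy tags and the cooldown rule for re-opening a point are illustrative assumptions about how a window manager might use the mechanism.

```python
import time

class ConnectionPoints:
    """Hypothetical model: named connection points are consumed on use."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.open_points = {}  # name -> policy tag chosen by the WM
        self.reopen_at = {}    # name -> earliest time the WM will re-arm it

    def listen(self, name, policy):
        self.open_points[name] = policy

    def connect(self, name):
        # consumed: the name disappears after one successful connection;
        # None means nothing is currently listening under that name
        return self.open_points.pop(name, None)

    def wm_reopen(self, name, policy, cooldown):
        """WM re-arms a point, but not before the previous cooldown expired."""
        now = self.clock()
        if now < self.reopen_at.get(name, 0.0):
            return False
        self.listen(name, policy)
        self.reopen_at[name] = now + cooldown
        return True
```

A client launched with ARCAN_CONNPATH=statusbar would land on the single ‘statusbar’ point and get whatever treatment the window manager attached to it; a tight connect loop only gets through once per cooldown, because nothing is listening in between.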

Reading Window or Screen contents

An X client has a few options for reading the contents of a window, its own or others. This can be used to sample whole or parts of the display server output, but at a fairly notable cost. The actual means vary with the set of extensions that are available.

For instance, if GLX (the GL extensions to the X protocol) is available, the better tactic is to create an offscreen “redirected” window on the server side, bind that to a client accessible texture and draw/copy the intended source into it and then let the GPU sort out the transfer path.

In Arcan, the feature is much more nuanced. A client can request a subsegment of an ‘output’ type, but it has no explicit control over what this means more than “output intended for the client”. The same mechanism is used for the drop part in drag and drop, and the paste part in clipboard paste. The window manager explicitly specifies which sources and which transformations should be sampled and forwarded to the client. On the client side, the code can look like this.

This means that some clients may be provided full copies of the screen, or something else entirely, and the contents can be swapped out at will and at any time, transparently to the client. Since each segment can contain both audio and video, the same approach applies to audio. The philosophy is that the window manager provides user controls for defining selection and composition, and the clients provide processing of the composited contents.

The window manager can also choose to force-push an output segment into a client as a means of saying “here’s output that the user decided you should have, do something with it”. The client gets the option to map it and start reading, or just ignore it and implicitly say “I don’t know, I don’t care”. A useful mental model is normal ‘drag and drop’, but where the dragged contents are user-defined and the target is unaware of their origins.
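A hypothetical Python model of that routing decision, seen from the window manager's side, may help: the client only ever reads ‘its’ output segment, while the window manager picks, swaps and force-pushes the source. The OutputRouter class and the source names are invented for illustration and do not correspond to Arcan's Lua or shmif APIs.

```python
class OutputRouter:
    """Hypothetical WM-side routing of 'output' segments to clients."""

    def __init__(self):
        self.routes = {}  # client -> source name chosen by the user/WM

    def grant(self, client, source):
        # the WM, not the client, decides what the output segment carries
        self.routes[client] = source

    def swap(self, client, new_source):
        # contents can be replaced at any time, transparently to the client
        if client in self.routes:
            self.routes[client] = new_source

    def frame_for(self, client, sources):
        # what the client would read from its output segment right now
        src = self.routes.get(client)
        return None if src is None else sources[src]()

    def push(self, client, source, accepts):
        """Force-push: the client may map the segment or silently ignore it."""
        if accepts(source):
            self.grant(client, source)
            return True
        return False
```

The design point is that frame_for never exposes where a frame came from; that knowledge stays with the window manager.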

Input and Input Injection

The input layer is a notoriously complicated part of X11. The specs for XInput2 and XKB alone are intimidating reads for sure, not including deprecated but present methods and so on. They also relate to one of the larger issues with X11: the lack of WM-mandated input coordinate space translation and scaling. There are no race-condition-free ways of controlling who gets control over which input subsystem and when, which is a notable cause of all kinds of problems, and it also severely limits the features a desktop can provide.

Not only is the input model complicated, it is also limited. A lot has happened to input devices, in terms of specialised accessibility devices, gaming devices and hybrid systems like VR positioning and gesture input.

The philosophy in Arcan is to have the engine platform layer gather as much as it can from all input devices it has been given permission to use, and provide them to the window manager scripts with minimal processing in between: no acceleration or gesture detection. This is not always reasonable for performance reasons, which boil down to the sample rate of the device and the propagation cost for a sample. It is therefore possible to set rate limits or primitive filters so as not to over-saturate event queues, but otherwise the input samples are as raw as possible.

This is done in order to defer decisions on what an input device action actually means. For that reason, there are default scripts the window manager can choose to attach to overlay gesture detection and similar higher-level features, then translate and forward the results to a client.

For input injection, things are even more flexible. The “event model” in the arcan-shmif API matches the one the engine uses to translate between the platform layer (where the input events from devices normally originate) and the scripting layer. Client event processing simply means that the event queues in the memory shared with each client get multiplexed and translated onto a master queue, with a priority and saturation limiter.

A key difference, however, is that the events provided by a client get masked and filtered before they are added to the master queue. The default mask is very restrictive, and events associated with input devices are blocked. The scripting API, however, provides target flags that allow this mask to be changed. Combine this with the pre-authenticated setup, and other programs can act as input drivers in a safe way, yet still present UI components (such as on-screen keyboards) or configuration interfaces. This code example illustrates a trivial input injector which presses random keys on a random number of keyboards.
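The masking step itself can be sketched as follows. The idea of separating input-device (IO) events from other event classes comes from the text, but the EvCat flags, the DEFAULT_MASK value, the queue limit and the set_mask call standing in for the scripting API's target flags are all hypothetical. The random_keys helper plays the part of the trivial injector.

```python
import random
from enum import IntFlag

# Hypothetical event categories; not the actual arcan-shmif definitions.
class EvCat(IntFlag):
    TARGET = 1    # window/segment management events
    EXTERNAL = 2  # client-originated notifications
    IO = 4        # input-device events (keys, pointers, ...)

DEFAULT_MASK = EvCat.TARGET | EvCat.EXTERNAL  # input-device events blocked

class MasterQueue:
    def __init__(self, limit=64):
        self.events = []
        self.limit = limit  # crude saturation limiter
        self.masks = {}     # per-client masks set by the WM

    def set_mask(self, client, mask):
        # stands in for the scripting API's target flags
        self.masks[client] = mask

    def submit(self, client, category, payload):
        mask = self.masks.get(client, DEFAULT_MASK)
        if not (category & mask) or len(self.events) >= self.limit:
            return False  # masked out or queue saturated
        self.events.append((client, category, payload))
        return True

def random_keys(queue, client, count, rng):
    """Trivial injector: random keys on a random choice of 'keyboards'."""
    accepted = 0
    for _ in range(count):
        keyboard = rng.randrange(1, 4)
        key = rng.choice("abcdefghijklmnopqrstuvwxyz")
        if queue.submit(client, EvCat.IO, ("key", keyboard, key)):
            accepted += 1
    return accepted
```

Until the window manager widens the mask, every synthesised key press is dropped before it reaches the master queue; afterwards the same client can act as an input driver.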

Displays, Density and Colour Management

X has the XRANDR extension for exposing detailed information about output displays. This interface is also used for low-level control of resolution, orientation, density and accelerated colour lookup tables.

For colour lookup tables, there are at least two big drawbacks. The first is that any client can make a display useless by providing broken or bad tables. The second is that there is no coordination about which client knows what, or in which order and when table transforms are to be applied: a game might need its brightness adjusted while an image viewer needs more advanced colour management. This will become more pronounced as we drift towards a larger variety of displays, where HDR will probably prove to be even more painful to get right than variations in density.

Exposing low-level control of the other display parts comes with the problem that a lot of extra events and coordination are needed for the window manager to understand what is happening. These solutions worked at a time when CRT displays were at their zenith, and not as well now that their collective sun is setting.

Density is a related and complicated subject. A decent writeup of the interactions between X extensions and displays in the context of density can be found here: [Mixed DPI and the X Window System], and it is a better read than what can be covered here.

As for Arcan, there is an extended privileged setup of the SHMIF API where a client gets access to enumerate displays and their properties in a RANDR-like way. It can also be used to both retrieve and submit accelerated colour lookup tables, but it ultimately requires that the window manager opt in and act as an intermediary in order to resolve conflicts. Such tables are therefore tracked and managed on a per-client level and not per display.
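One way a window manager could act as that intermediary is sketched below: tables are stored per client, obviously broken tables are rejected, and the table actually applied is the one belonging to the focused client. The validation rule and the focus-based choice are hypothetical policies chosen for illustration, not Arcan behaviour confirmed by the text.

```python
def valid_ramp(ramp):
    """Hypothetical sanity rule: in range, non-decreasing, not flat."""
    return (len(ramp) >= 2
            and all(0.0 <= v <= 1.0 for v in ramp)
            and all(a <= b for a, b in zip(ramp, ramp[1:]))
            and ramp[-1] > ramp[0])

class LutManager:
    """Per-client colour ramps, resolved by the WM rather than per display."""

    def __init__(self, size=256):
        self.identity = [i / (size - 1) for i in range(size)]
        self.per_client = {}

    def submit(self, client, ramp):
        # a broken table is refused instead of blanking the display
        if not valid_ramp(ramp):
            return False
        self.per_client[client] = list(ramp)
        return True

    def active_ramp(self, focused):
        # conflict resolution: the focused client's table, else identity
        return self.per_client.get(focused, self.identity)
```

With this split, a game can brighten its own output while an image viewer keeps an untouched ramp, and neither can clobber the other's setting.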

For resolution and orientation control, see the ‘Management Interfaces’ section further below, and for more details on density, see the ‘Font Management’ section. What is worth mentioning here is that these properties are all handled ‘per client’, and the window manager is responsible for forwarding relevant information on a per-client basis. Thus, multiple clients can live on the same display but have different ideas of what the density and related parameters are.

Font Management

Font rendering and management is a strong candidate, if not a sure winner, for the murkiest part of all of X, though the input stack also comes close (since the print server is principally dead). The best description available is probably keithp’s “The Xft Font Library”, but it also relies on quite some knowledge of the huge topic of text rendering, how it relates to network transparency, and a historical perspective on printing protocols and printers as “very high on DPI, extremely low on refresh rate” kinds of displays. The topic of X Font Management in the shape of XFS (the X Font Server) and XLFD (X Logical Font Description) will be left in the history department on account of it practically being deprecated, but there is another aspect that is worth bringing up.

Text is one of the absolutely most important primitives being passed around, no doubt about that. Fonts are needed in order to transform text into something we can see.

This transform depends on the output display, such as its density, subpixel layout and orientation. It also requires user context sensitive information such as presentation language in order to provide proper shaping and substitutions.

As a short example of a ‘simple’ problem here: take subpixel hinting where the individual colour channels are biased in order to provide the appearance of higher text resolution. While it may look better locally, taking a ‘screenshot’ or ‘sharing’ such a window with a remote viewer has the opposite effect.

The division of responsibility for text processing is an intractable problem with no solution in sight that doesn’t have far reaching consequences and quality tradeoffs. The legacy compromise is to work in multiples of 90, 96 DPI or similar constants and then express a scale factor. For new designs, this should make every high school maths and physics teacher cringe in disgust.
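To make the legacy arithmetic concrete, the sketch below derives physical density from a panel's pixel width and physical width (as reported in millimetres, e.g. via EDID), expresses it as a scale factor against 96 DPI, and converts a point size to pixels using 1 pt = 1/72 inch. The 3840 px by 600 mm panel is an illustrative example value.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0

def physical_dpi(pixels, millimetres):
    """Pixels per inch along one axis, from the panel's physical size."""
    return pixels / (millimetres / MM_PER_INCH)

def legacy_scale(dpi, base=96.0):
    """The 'multiple of 96 DPI' compromise: one scalar for everything."""
    return dpi / base

def pt_to_px(points, dpi):
    """Font size in points to pixels at a given density."""
    return points * dpi / POINTS_PER_INCH
```

For such a panel the density comes out at roughly 162.6 DPI and the scale factor at an awkward 1.69, which then tends to be rounded to a convenient step and applied uniformly; that single rounded scalar is the kind of approximation objected to above.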

In Arcan, a client is normally responsible for all text drawing while the window manager suggests some of the parameters. The way it is managed is that at the moment a connection is opened, the client immediately gets a set of initial properties that includes a set of fonts (primary, secondary etc.), preferred font size, output display properties and language.

These can be dynamically overridden later via a handful of events, primarily FONTHINT (size or hinting changes, transfer of new descriptors), DISPLAYHINT (preferred output size and target density) and GEOHINT (position and input-output languages).
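On the client side, reacting to these hints amounts to updating a small state record and re-rendering only when something actually changed. The sketch below is a hypothetical Python model: the hint names come from the text, but the TextState fields, the HINT_FIELDS grouping and the apply_hint function are illustrative assumptions, not the arcan-shmif event layout.

```python
from dataclasses import dataclass, field

@dataclass
class TextState:
    # hypothetical client-side rendering parameters
    font_size_pt: float = 12.0
    hinting: str = "none"
    fonts: list = field(default_factory=list)
    width: int = 0
    height: int = 0
    density_dpi: float = 96.0
    language: str = "en"

# which fields each hint may touch (illustrative grouping)
HINT_FIELDS = {
    "FONTHINT": ("font_size_pt", "hinting", "fonts"),
    "DISPLAYHINT": ("width", "height", "density_dpi"),
    "GEOHINT": ("language",),
}

def apply_hint(state, kind, **values):
    """Apply one hint; return True if the client should re-render."""
    allowed = HINT_FIELDS.get(kind)
    if allowed is None:
        return False  # unknown hint: ignore it
    changed = False
    for name, value in values.items():
        if name in allowed and getattr(state, name) != value:
            setattr(state, name, value)
            changed = True
    return changed
```

Returning a "changed" flag lets a client skip re-rasterising glyphs when a hint merely repeats the current state.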

As a twist, clients can switch a segment to use a different output format, TPACK. This format packages enough high level information (glyph, font index, formatting) in a line- or screen- oriented way – and the server side performs the actual font rendering.

Management Interfaces

This category is a bit more eccentric than the others, in that for Xorg a lot of ‘management’ behaviour can be exposed via clients, and the quintessential tools that people associate with it are either part of RANDR or xdotool, which falls more into ‘creative use of XTEST and other game mechanics’. The RANDR-like management is covered in the section on displays.

Some of those roles are split up and covered in the other sections on display management and input injection. Due to the increased role of the window manager, most of the other automation controls have to go through it.

The compromise is that there are support scripts that can be pulled in by the window manager which expose window management controls ‘as a file system’, along with a supporting FUSE (demo) driver that allows this file system to be mounted, so that normal command line tools can be used to discover, control and script higher-level management in cooperation with the window manager, rather than in competition with it.

Client Contents and Decorations

This is a very contentious subject, and it in principle requires reaching an agreement between the clients and the window manager as to the actual contents of client-populated buffers. The client cannot do anything to stop the window manager from decorating its windows, and the window manager cannot stop the client from rendering decorations into the buffer used as canvas. The window manager only has a slight advantage, in that it can forcibly crop the buffer during composition and translate input coordinates accordingly, as shown in this (demo).
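The cropping trick amounts to two coordinated transforms: a source rectangle that hides the client-drawn decorations during composition, and the matching translation of input coordinates. A minimal Python sketch, assuming a client that draws its own titlebar of a known height at the top of its buffer (crop_and_map and its parameters are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

def crop_and_map(buffer_w, buffer_h, crop_top, crop_bottom, placed_at):
    """Return the visible source rect and a screen->buffer input mapper."""
    # only this part of the client buffer is composited on screen
    src = Rect(0, crop_top, buffer_w, buffer_h - crop_top - crop_bottom)
    ox, oy = placed_at  # where the cropped region lands on screen

    def to_client(sx, sy):
        # undo the placement, then re-add the hidden top rows
        bx, by = sx - ox, sy - oy + crop_top
        inside = 0 <= bx < src.w and crop_top <= by < crop_top + src.h
        return (bx, by) if inside else None  # None: not for this client

    return src, to_client
```

Because both transforms are derived from the same crop values, the client keeps receiving coordinates in its own buffer space and never learns that its titlebar is hidden.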

Thus, the client needs to know whether it should draw a titlebar, border and so on. This decision has far-reaching consequences for the information that needs to be conveyed, understood and synchronised between the window manager and the clients. For instance, titlebar buttons drawn on the client side need to reflect server-side window state, and suddenly the client needs to know whether the window is in a ‘minimised’, ‘maximised’ or some other state.

It is further confused by the fact that, technically, what many refer to as ‘server side’ decorations are not exactly ‘server side’: the predominant X solution is that one or even multiple clients (window manager and compositor as possibly separate clients) decorate others. The big problem with this approach, other than it becoming almost uniquely complicated, is synchronisation. The decorations need to reflect the state and size of the decorated client, and the speed at which you can do this is limited by the combined round-trip time of the one doing the decorating and the one being decorated.

The principal standpoint in Arcan is that the server is allowed to be complicated to write as it will only be written once, and the clients should be as trivial as possible. The ‘annotations’ that decorations provide should be entirely up to the window manager.

Since the WM is an entirely different and privileged thing to a client here, we can have ‘true’ server side decorations – with the typical approach being to just synthesise them on demand as a shader during the composition stage at a fraction of the cost, even for fancy things like drop shadows.

The ‘compromise’ is that clients can either explicitly indicate the ‘look’ of a cursor from a preset W3C-derived list, or request subsegments of a ‘cursor’ and/or a ‘titlebar’ type. As is the case with all secondary allocations, these requests are default-deny and window manager opt-in.

Clipboard

The clipboard is traditionally one of the strangest forms of IPC around, with the implementation in Windows arguably being the biggest piece of “WTF” around, tailgated by X. The hallmark “feature” is that you can’t be certain of whatever it is that you “paste”, its origins or its destination, as the interface is inherently prone to race conditions.

The venerable ICCCM defines three ‘selection buffers’: PRIMARY, SECONDARY and CLIPBOARD, with the real difference being the inputs these are bound to (ctrl+c, ctrl+v or menus for CLIPBOARD vs mouse actions for PRIMARY). External clients, clipboard managers, can be used to add to and monitor these buffers. There is also the option for clients to negotiate content format by exchanging lists of supported formats, and a lengthy protocol for Drag and Drop.

In Arcan, there is no abstract singleton ‘clipboard’ or ‘drag and drop’ as such, but rather two types of segments: CLIPBOARD (client to server) and CLIPBOARD_PASTE (server to client). If a client wishes to provide clipboard information, it can request a subsegment of the CLIPBOARD type, and if a paste operation occurs, the server force-pushes a CLIPBOARD_PASTE subsegment into the client. These all accept audio, video, short messages and file descriptor transfer routing.

This allows for an infinite number of “clipboards”, and how these are shared between different clients is determined by the window manager. Thus, it can be completely configurable and safe to allow one client to act as a clipboard manager or segment into trusted and non-trusted groups for that matter. Example code of a simple clipboard manager can be found here.

It can be set to directly copy one client’s clipboard into another client’s clipboard-paste, with the window manager intercepting and modifying everything, or to just act as a file descriptor router; you pick your own poison.
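A minimal, hypothetical sketch of such window-manager-side routing: clients are sorted into trust groups, data arriving on a client's CLIPBOARD segment is recorded per group, and a paste produces (optionally transformed) contents for the target's CLIPBOARD_PASTE segment. The segment type names come from the text; ClipboardRouter, the group names and the transform hook are illustrative.

```python
class ClipboardRouter:
    """Hypothetical WM-side routing between CLIPBOARD and CLIPBOARD_PASTE."""

    def __init__(self):
        self.latest = {}    # trust group -> (source client, payload)
        self.group_of = {}  # client -> trust group chosen by the WM

    def assign(self, client, group):
        self.group_of[client] = group

    def on_clipboard(self, client, payload):
        # data arriving on a client's CLIPBOARD segment
        group = self.group_of.get(client, "untrusted")
        self.latest[group] = (client, payload)

    def on_paste(self, client, transform=lambda p: p):
        """Contents to force-push into the client's CLIPBOARD_PASTE segment."""
        group = self.group_of.get(client, "untrusted")
        entry = self.latest.get(group)
        return None if entry is None else transform(entry[1])
```

Each trust group behaves as its own clipboard, so an unassigned client never sees what was copied inside the trusted group, and the transform hook is where the window manager could intercept and modify contents.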

Synchronisation

While it might be a bit confusing to think of ‘synchronisation’ as a feature, there are a few things to consider that might make it a bit clearer. Modern system graphics is parallelised and asynchronous to a degree that can hardly be overstated, and this hits like a brick when you fail to think things through. The task is not made any easier by the fact that most of the parts in the chain that need synchronising are unreliable, limited or just plain broken.

A fantastic explanation of the absolute basics in this regard can be found in this article by Jasper St. Pierre et al.: XPLAIN. Then follow the developments in this regard to the PRESENT extension and check up on the XSYNC extension. I will not attempt to summarise those here and will instead go with a model of the problem itself.

There are three big areas we should consider for synchronisation:

The first area is synchronisation to the displays.

Historically, display synchronisation was ‘easy’ in that you could literally have an interrupt handler that told you when the display wanted new contents and how much time you had left to produce it. This became difficult when you hooked up multiple displays that do not necessarily share the same refresh rate or refresh method.

As an example, when a 60Hz display and a 50Hz display try to share contents, you’d likely prefer tearing over the latency of using the greatest common divisor (10Hz) as the shared synch period.
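The 10Hz figure follows from the refresh periods: vblanks every 16.67 ms (60Hz) and every 20 ms (50Hz) only coincide every 100 ms, i.e. at gcd(60, 50) = 10Hz. A short Python check, idealising the refresh rates as exact integers:

```python
from functools import reduce
from math import gcd

def shared_sync_hz(*rates):
    """Highest rate at which the vblanks of all displays line up (integer Hz)."""
    return reduce(gcd, rates)

def period_ms(hz):
    """Length of one refresh period in milliseconds."""
    return 1000.0 / hz
```

The same arithmetic gives 12Hz for a 144Hz panel paired with a 60Hz one.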

Xorg tries its hardest to meet the specific deadlines of the displays that are connected. This is a necessity when trying to ‘race the beam’, down to counting microseconds. This was an admirable approach at the time of its conception, and not so much today.

The second area is the window manager.

The window manager needs to react when its UI components are being manipulated. It needs to relayout / redecorate accordingly, and be ready in time for composition and scanout (synch to display). This can be hard, or even impossible, to strike with both accuracy and precision, with many external factors introducing jitter and delays. This is an area where animation quality tends to highlight how well the solution works as a whole, which is actually a good reason to have animations for troubleshooting, regardless of whether they are enabled in production or not.

The third area is synchronising to clients themselves.

A common, if not the most common, kind of client is interactive. This means that it mainly updates or produces new contents in response to the input that it receives. Thus the input samples that are forwarded need to be fresh or latency will suffer. At the same time, the client needs to decide when to stop waiting for input and when to use that input to synthesise the output. The state itself might need client-local coordinate transforms, which might depend on surface dimensions. The source device might be a high sample-rate mouse emitting 1 kHz of samples to a terminal emulator capable of handling a fraction of that. The list goes on and ends up in a web of judgement calls that balance latency against CPU utilisation.

Now, how does Arcan approach the range of synchronisation problems then? Well, on a number of levels.

High level: There is a runtime changeable ‘strategy’ that can be set to favour some property like input latency; animation smoothness; or energy conservation. This is a user choice that gets applied regardless of what the window manager itself decides.

Mid-level: The window manager has controls to set a ‘synchronisation target’ that act as input to the high level strategy. The currently selected window being a good target. This means that the synchronisation target gets favourable treatment.

Low-level: A single client can be ‘mapped’ to an output, side-stepping window management in its entirety. The window manager can also force a synchronisation mode and even manually clock and suggest or enforce the deadline per client.

API-level: A client has some room to manoeuvre itself, in that it can cancel current synchronisation attempts (at the cost of possible tearing), bind itself to ‘as early as possible’, to a dynamic deadline, or to a ‘triple-buffer’-like strategy where the latest complete frame is the one that gets synched out.
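The ‘triple-buffer’-like option can be illustrated with a generic ‘mailbox’ handoff, where the producer never blocks and whatever frame is newest at the deadline wins. This is a textbook sketch in Python, not Arcan's implementation; the Mailbox class and its dropped counter are hypothetical.

```python
import threading

class Mailbox:
    """Latest-complete-frame-wins handoff between a client and the server."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = None
        self.dropped = 0  # frames superseded before any vblank picked them up

    def submit(self, frame):
        # client side: a frame finished rendering; never waits for the display
        with self._lock:
            if self._pending is not None:
                self.dropped += 1
            self._pending = frame

    def on_vblank(self):
        # server side: take whatever is newest, leaving the slot empty
        with self._lock:
            frame, self._pending = self._pending, None
            return frame
```

The trade-off is visible in the counter: the client renders at its own pace and the display always shows the freshest complete frame, at the cost of rendering work that is thrown away.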

Phew. Needless to say, there is a lot more that can be said on the subject while still barely scratching the surface. The main missing pieces, if you have not guessed it by now, are multi-level network transparency and drawing command-stream buffer formats. Then we can focus on what is in Arcan but not in Xorg: features like 3D compositing, VR, audio support, live migration, server recovery, multi-GPU rendering and so on.

Posted in Uncategorized | 17 Comments

Arcan 0.5.5, Durden 0.5

We’ve accumulated enough features and fixes that it is time for yet another release. The beefier changelogs can be found here: (Arcan) and here: (Durden). We start with the following video, which consolidates the visible changes to (indirectly) Arcan and its reference desktop environment, Durden.

Among the many release highlights are the improved OpenBSD support and the fact that we are now packaged in Void Linux (wiki entry), the premier Linux distribution for those of us who would much prefer a user space free of the more incessant “freedesktop.org” impositions. With the exception of the recently added ‘Safespaces’ VR desktop project and its associated hardware device driver bridges, most other key components and supplementary tools are now packaged and ready to be enjoyed inside the void. The Arcan package has been synched to match 0.5.5, and Durden 0.5 will follow in a day or two.

In terms of future steps:

The Arcan 0.5 series, with its focus on graphics-related subsystems, will likely wind down towards the end of the year. Attention will then shift towards advanced network support (that is, not RDP/VNC nonsense), the key part we lack before we can comfortably claim to be way beyond feature parity with Xorg.

Durden has one or two more big feature-focused releases left (particularly for touch displays, tablets and styluses). After that, we will switch over to improving the features we already have, polishing away the ‘developer graphics’ aesthetics and sanding down the edges in general.

After this release, the respective git repositories will be managed a bit differently from the earlier ‘everything goes on master’ approach. After each ‘big’ release such as this one, a set of key features per subsystem is picked from ‘the infinitely big pile’ and added as a branch. Each branch gets a corresponding (VERSION-SUBSYS-NOTES) file in the root, covering a rough breakdown of the changes to be made before the branch is complete.

When a branch has been completed, it gets cleaned up, rebased, squashed and merged onto master, prompting a new version bump (a.b.c.d+1). Once there are no more such branches, that prompts a new blog post/video and an (a.b.c+1) versioned release.
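The numbering scheme can be summarised as a small, hypothetical helper. It assumes that a release resets the fourth component, so that 0.5.5.2 is followed by 0.5.6.

```c
#include <stdio.h>

/* Hypothetical model of the a.b.c.d scheme described above. */
struct version {
    unsigned a, b, c, d;
};

/* A completed topic branch is squashed and merged onto master. */
void version_merge_branch(struct version *v) { v->d++; }

/* No topic branches left: cut a release, bump c and (assumed) reset d. */
void version_release(struct version *v)
{
    v->c++;
    v->d = 0;
}

/* Formats as "a.b.c" for releases and "a.b.c.d" between them. */
int version_format(const struct version *v, char *buf, size_t sz)
{
    if (v->d)
        return snprintf(buf, sz, "%u.%u.%u.%u", v->a, v->b, v->c, v->d);
    return snprintf(buf, sz, "%u.%u.%u", v->a, v->b, v->c);
}
```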

Thus, if you are monitoring the git repositories, be forewarned that activity on master will only come in bursts, and that the topic branches will be the new way to go.

You can see the current ones here: ARCAN and DURDEN. Now, onto the highlighted feature walkthrough.

Browser Improvement: Video Previews

The built-in command-line interface resource browser has been extended with the option to allow known video formats to be automatically previewed when selected or when they pop into view. More detailed controls over preview triggers have also been added.

Relevant menu paths: /global/settings/browser

The next steps for this feature are to add ‘set actions’ for the filtered set, like ‘open as playlist’. We are also investigating letting external clients provide previews (based on priority and level of trust), as well as an ‘open to client’ action, letting the browser act as a universal file picker.

New Feature: Menu system is mountable

All UI interaction and configuration can now be accessed ‘as a file’. Combined with the newly added ‘arcan_cfgfs’ (arcan:src/tools/acfgfs) FUSE file system tool, the entire menu system is now mountable, albeit still a bit slow.

This opens up a number of testing and automation possibilities. It should also make it easier to discover all the available features, since you can simply find and grep. It is implemented over the ‘control’ domain socket. Enable it with the menu path:

/global/settings/system/control=i_am_in_control

and a socket with the name ‘i_am_in_control’ will be created in the durden/ipc path in the appl_temp filesystem namespace. Depending on how you run it, for a local build this might be the same as /path/to/durden, $XDG_DATA_HOME/arcan/appl-out/durden/ipc or, on some distributions, $HOME/.arcan/appl-out/durden/ipc. You can also access the socket directly, manually or with other/custom tools, rather than through FUSE (the fastest approach):

socat - unix-client:/path/to/i_am_in_control

Here you should be able to issue commands like:

ls /
read /global/settings/visual/mouse_scale
exec /browse
write /global/settings/visual/mouse_scale=2
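Custom tools can also drive the socket without socat. The following minimal C sketch assumes the control socket is a stream socket that takes one newline-terminated command per line, as the interactive socat session suggests. The function names are hypothetical, and the replies are read back as raw lines without being interpreted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Connects to the control socket (e.g. .../durden/ipc/i_am_in_control).
 * Returns a connected descriptor, or -1. */
int control_connect(const char *path)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    size_t len = strlen(path);
    if (len >= sizeof addr.sun_path)
        return -1;
    memcpy(addr.sun_path, path, len + 1);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends one command such as "ls /" followed by a newline. */
int control_send(int fd, const char *cmd)
{
    if (write_all(fd, cmd, strlen(cmd)) == -1)
        return -1;
    return write_all(fd, "\n", 1);
}

/* Reads one reply line without its newline. Returns its length, or -1. */
ssize_t control_readline(int fd, char *buf, size_t sz)
{
    if (sz == 0)
        return -1;
    size_t used = 0;
    while (used + 1 < sz) {
        char ch;
        ssize_t n = read(fd, &ch, 1);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0 || ch == '\n')
            break;
        buf[used++] = ch;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}
```

For example, control_send(fd, "write /global/settings/visual/mouse_scale=2") mirrors the last command above.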

This also makes it possible to convert and apply configuration files from other WM systems at runtime, assisting with workflow migration.

It is also possible to monitor events from various desktop subsystems:

monitor wm notification display timers

(current groups being ‘notification’, ‘wm’, ‘display’, ‘connection’, ‘input’, ‘dispatch’, ‘wayland’, ‘ipc’, ‘timers’, ‘clipboard’, ‘clients’)

Combining the monitor feature with the ‘everything is a file’ approach should be complete enough to allow the conversion of old xdotool-like hacks. For the brave, it could even allow translating the relevant xlib subsets to provide support for running Xorg WMs.

New Tool: VRViewer

It is no coincidence that this tool looks eerily similar to the VR Desktop (Safespaces) – it is mostly the exact same code base, with small parts changed to integrate into the desktop.

The primary uses for this tool are developing and testing Safespaces without having to enable/disable an HMD every so often, and data viewing. In the latter case you might have stereoscopic and/or geometry-projected videos or photos that you want to look at quickly.

Relevant menu paths: /global/tools/vrviewer

The video also shows “drag-and-drop” like cursor-tagging, integrated into the tool to allow windows to migrate or clone into the VR tool.

Advanced Float improvements:

The ‘advanced float’ mode is something of a more hidden tool that slowly but surely aggregates all the features that your rodent-friendly DE would provide. There are two new features this time: a reveal/hide mode for the autolayouter, and a Window-to-Wallpaper feature that can forward input when no native window has input focus. Below, you can see an instance of Xarcan being used as the ‘wallpaper’. The gridfit tool, which allows compiz-like pseudo-tiling, has also received a popup mode as a pointer-device-friendly quick path.

Relevant menu paths: /target/window/workspace_background

The final stretch for this part of Durden should be binding menu paths to background icons, and allowing the statusbar, tray or desktop to act as window minimization targets.

New Feature: Decoration Impostors

As a follow up to the argument on client side decorations, we now allow per-client custom cropping of [t | l | d | r] pixels, and the option to define a titlebar impostor that can be toggled on and off. In the video you see gtk3-demo under wayland getting its incessant border and shadow removed, and the “fattyfingerbar” becoming a custom toggle in order to not lose access to client controls. There are still a number of edge cases to work out, but the basic concept is working.

Relevant menu paths: /target/window/crop, /target/window/titlebar/impostor
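The crop itself is simple arithmetic. Below is a hypothetical helper that turns the four values into a visible region, both in pixels and as normalised texture coordinates. It assumes that ‘d’ denotes the bottom edge.

```c
#include <stdbool.h>

/* Visible part of a client surface after cropping, in pixels and as
 * normalised texture coordinates (s1,t1)-(s2,t2). */
struct crop_region {
    int x, y, w, h;
    float s1, t1, s2, t2;
};

/* Applies a crop of t(op), l(eft), d(own) and r(ight) pixels to a w x h
 * surface. Returns false for negative values or when nothing would remain. */
bool crop_surface(int w, int h, int t, int l, int d, int r,
                  struct crop_region *out)
{
    if (w <= 0 || h <= 0 || t < 0 || l < 0 || d < 0 || r < 0)
        return false;
    if (l + r >= w || t + d >= h)
        return false;
    out->x = l;
    out->y = t;
    out->w = w - l - r;
    out->h = h - t - d;
    out->s1 = (float)l / (float)w;
    out->t1 = (float)t / (float)h;
    out->s2 = (float)(w - r) / (float)w;
    out->t2 = (float)(h - d) / (float)h;
    return true;
}
```

Expressing the crop as texture coordinates means the client buffer is left untouched. Only the sampled region changes, which keeps toggling a crop cheap.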

Speaking of decorations, it is now possible to bind arbitrary menu paths to titlebar buttons in UI (previous method was only by patching autostart.lua and worked globally), with unique per window overrides or mode-specific sets that follow the current workspace layout mode. Workspaces in float mode can also have a different border thickness and border area than the other modes.

The next steps for this feature are:

- automatic detection;
- integration with the ‘viewport’ protocol, so that the selected window’s titlebar can ‘merge’ into the statusbar center area or be relaid out to attach to the sides of the window;
- an ‘external impostor’ for the statusbar (just a special version of the titlebar, same UI code), provided via a connection point.

New Widget – notifications:

When various subsystems or clients have anything important to say, the message is added to a queue that will be flushed whenever the HUD is interactively toggled. This also occurs on certain important events, like the error triggering crash recovery or abnormal client termination. You can also attach notifications of your own via the IPC system mentioned earlier.

Relevant menu paths: /global/settings/notifications/* (with send=… to emit)

The video clip showed a test client that displays a single-colour window for a brief moment, then shuts down due to some “problem”. Just before the connection is severed, the client attaches a short ‘last_words’ message explaining, in user-readable form, why it terminated abnormally. This is added as a notification.

Think of it as the UI-friendly equivalent of program exit codes. Crash recovery on the WM side has become seamless enough that you often don’t even notice it happened. For that reason, the error that triggered the recovery also carries over as a notification, which is also shown in the video.
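The queue-then-flush behaviour can be sketched as a bounded ring buffer. The capacity, the message length and the choice to drop the oldest entry when full are assumptions for illustration, not Durden's actual policy. A client's ‘last_words’ would simply be pushed with the client as the source.

```c
#include <stddef.h>
#include <stdio.h>

#define NOTIFY_CAP 8   /* hypothetical capacity */
#define NOTIFY_LEN 128 /* hypothetical maximum message length */

struct notify_queue {
    char msg[NOTIFY_CAP][NOTIFY_LEN];
    size_t head;  /* index of the oldest queued message */
    size_t count; /* number of queued messages */
};

/* Queues "source: text"; when full, the oldest message is overwritten. */
void notify_push(struct notify_queue *q, const char *source, const char *text)
{
    size_t slot = (q->head + q->count) % NOTIFY_CAP;
    if (q->count == NOTIFY_CAP)
        q->head = (q->head + 1) % NOTIFY_CAP; /* drop the oldest */
    else
        q->count++;
    snprintf(q->msg[slot], NOTIFY_LEN, "%s: %s", source, text);
}

/* Called when the HUD is toggled: hands every queued message, oldest
 * first, to show() and empties the queue. Returns how many were shown. */
size_t notify_flush(struct notify_queue *q,
                    void (*show)(const char *msg, void *tag), void *tag)
{
    size_t n = q->count;
    for (size_t i = 0; i < n; i++)
        show(q->msg[(q->head + i) % NOTIFY_CAP], tag);
    q->head = 0;
    q->count = 0;
    return n;
}
```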

 

Posted in Uncategorized | 3 Comments

Revisiting the Arcan Project

Two years have passed since the public presentation where Arcan and its ecosystem of side projects and tools started to creep out of the woodwork, although the project had been alive and worked on for well over a decade at that point. A lot of things have evolved and developed in a positive direction since that presentation, and this post aims to provide an overview of those changes.

But before that, here is the ‘elevator pitch’ for the modern reader with a tweet-length attention span:

Arcan is a curious blend of a streaming multimedia processor, game engine and display server, with a novel design that lends itself well to both complex and simple realtime interactive graphics projects alike. It goes well with anything from Sci-Fi UIs for some homegrown embedded project to full-blown desktops. It is highly modular and low on dependencies, yet comes with all necessary batteries included.

The corresponding pitch for the related project Durden is as follows:

Durden is a desktop environment dedicated to- and designed first and foremost for- hackers and professional users. As such it is keyboard dominant, building a better and more efficient CLI than the flaccid terminal emulators of yore. The strategy is to amass as many relevant features and workflows as possible, then leave them turned off yet within runtime reach – with an ‘everything is a file’ approach to control, configuration and automation.

With these reintroductions in place, let us dig through the major developments to the plot:

Stream processing
Crash Recovery and Crash Resilience
VR support
Plan9/UI Security experiments
LEDs as a first class citizen
TUI: Text-based User Interfaces
X and Wayland support
BSD ports

Stream processing

In order to further stress the point that the processing model, engine design and developer APIs are much more generic and flexible than the mere ‘Display Server’ archetype would permit or warrant, the different client and output connection mechanisms have all been extended so that Arcan composes more easily with itself. This simply means that multiple arcan instances can chain into each other, each with a different set of processing scripts.

This feature makes it possible to interactively (with user input) stream-process multimedia (both audio and video), just like we have been able to do with text for oh so long. In this way, the feature also becomes a building block for more feature rich and dynamic media processing and composition than is provided by the likes of the venerable OBS-Studio. For more on this, see: “AWK” for multimedia.

VR support

One might be tempted to think “supporting VR” is simply stereoscopic rendering with a barrel distortion post-process and camera controls tied to a head attached IMU sensor, but that is just the entry level test.

The real thing, unfortunately, requires careful content planning, load balancing, camera processing, video overlays, positional audio and so on. It also means managing a virtual zoo of complex input devices that makes the nightmares of touchscreen management look like mere push buttons by comparison.

A lot of the basic controls for this have been added to Arcan and many more are to come during the year. The process was recently covered and exemplified with the post Safespaces: An Open Source VR Desktop. It shows how the building blocks are coming together for usable VR (and 3D) desktop workflows.
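For reference, the ‘entry level’ barrel distortion mentioned above is commonly modelled as a radial polynomial. The sketch below shows that common model for a single texture lookup. The coefficients are per-lens values supplied by the caller, the names are hypothetical, and in practice this runs per fragment in a shader rather than in C.

```c
/* Normalised 2D coordinate relative to the lens centre of one eye. */
struct vec2 {
    float x, y;
};

/* Polynomial radial distortion: for an output pixel at p, returns the
 * coordinate to sample the undistorted eye texture from. With positive
 * k1/k2 the lookup moves outwards as the radius grows, squeezing the image
 * towards the centre (barrel distortion) so that the pincushion distortion
 * of the HMD lens cancels it out. */
struct vec2 barrel_lookup(struct vec2 p, float k1, float k2)
{
    float r2 = p.x * p.x + p.y * p.y;
    float scale = 1.0f + k1 * r2 + k2 * r2 * r2;
    struct vec2 out = {p.x * scale, p.y * scale};
    return out;
}
```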

Crash Recovery and Crash Resilience

This part alone pushed the timeline back a year or two from the earlier roadmap, but was well worth it. An almost unreasonable amount of work has been done in order to make sure that there is incrementally stronger separation between the user interface (window manager), the display server and its clients, and that each of these should be able to recover if one or even two of the other parts would fail.

Clients can detach when they detect that the display server connection is severed. They can then either enter a sleep-reconnect state, or switch to a different display server and rebuild themselves there; for Wayland clients, this is done unbeknownst to the client itself. This is one substantial building block towards ‘better than X’ network transparency, an upcoming focus target.

For those that are interested, additional technical detail is presented in the article “Crash-Resilient Wayland Compositing” and this is an area where we are close to surpassing the strongest competitor in the field (Windows).
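A sleep-reconnect state can be as simple as retrying with exponential backoff. This is a minimal sketch. ‘connect_fn’ is a hypothetical hook standing in for whatever actually re-establishes the connection, and the recovery logic described in the article above is considerably more involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical hook that tries to (re)connect to a display server. */
typedef bool (*connect_fn)(void *tag);

static void sleep_ms(unsigned ms)
{
    struct timespec ts = {.tv_sec = ms / 1000,
                          .tv_nsec = (long)(ms % 1000) * 1000000L};
    /* If a signal interrupts the sleep, continue with the remainder. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Retries with exponential backoff (base_ms should be non-zero), capped
 * at max_ms. Gives up after max_attempts; 0 means retry forever.
 * Returns the number of the attempt that succeeded, or -1. */
int reconnect_loop(connect_fn try_connect, void *tag,
                   unsigned base_ms, unsigned max_ms, unsigned max_attempts)
{
    unsigned delay = base_ms < max_ms ? base_ms : max_ms;
    for (unsigned attempt = 1; max_attempts == 0 || attempt <= max_attempts;
         attempt++) {
        if (try_connect(tag))
            return (int)attempt;
        sleep_ms(delay);
        delay = delay > max_ms / 2 ? max_ms : delay * 2;
    }
    return -1;
}
```

Capping the delay keeps a restarted server from waiting long before its clients come back.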

TUI : Text-based User Interfaces

The lessons learnt from writing the built-in terminal emulator led to the start of development towards a new API with a number of advanced goals, the main one being to show that the terminal protocols are ghastly display server protocols in disguise. The associated article, The Dawn of a new Command Line Interface, elaborates on this in greater detail, although it is past due for an update.

The condensed point is this. Due to the terminal protocol heritage, APIs (today, ncurses) and related shells (your CLI window manager, e.g. bash, zsh, …) live in an isolated world. That isolation radically lowers the quality of the unix command line, and doubly so because of the schism it has towards the graphical desktop. Arcan is closing in on a way to bridge that gap.

Plan9/UI Security Experiments

The article ‘One night in Rio – Vacation photos from Plan9’ showed how the design and API could be used to expand on an interesting core feature of the window management scheme used in Plan9. This was achieved in two stages:

First stage was providing hierarchical connection tracking and compartmentation, forcing clients spawned from a specific terminal to be tied to tabs belonging to the same window the terminal has, with no means for breaking free.

The second stage was using the connection point namespace (how an external client finds the server) to force per-connection unique window management policies. These account for ‘special needs’ clients/roles like external wallpapers, a ‘rofi’-like overlay HUD, statusbars etc., without any modifications to client or API.

On top of that, clients get universally split into trust domains based on the origins of their connections and identity tokens, selectively allowing rate limiting and different permission sets on external connections and their sub-windows.
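Both the first stage and the trust-domain split can be expressed as walks up a spawn tree. In this hypothetical sketch, every connection records the client that spawned it. The window owner is the root of the chain, and effective permissions are the intersection along the chain, so a sub-window can never gain more than its ancestors have. The permission bits are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical permission bits. */
enum {
    PERM_CLIPBOARD = 1u << 0,
    PERM_SCREENSHOT = 1u << 1,
    PERM_INPUT_INJECT = 1u << 2,
};

/* Hypothetical connection record: 'parent' is the client whose session
 * spawned this one, NULL for a top-level connection. */
struct conn {
    const char *name;
    const struct conn *parent;
    unsigned perms;
};

/* Walks the spawn chain to the top-level connection: everything started
 * from a terminal ends up as tabs in that terminal's window. */
const struct conn *conn_window_owner(const struct conn *c)
{
    while (c->parent)
        c = c->parent;
    return c;
}

/* Intersects permissions along the chain, so spawning children can never
 * widen a connection's trust domain. */
unsigned conn_effective_perms(const struct conn *c)
{
    unsigned perms = c->perms;
    for (const struct conn *p = c->parent; p; p = p->parent)
        perms &= p->perms;
    return perms;
}
```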

LEDs as a first class Citizen

Historically, LED output requirements have been modest from a display server perspective, boiling down to possible indicator lights on keyboards. Today, the display is far from the only output to control for ergonomics and overall user experience.

The post “Playing with LEDs” covers how the once humble subsystem that dealt with external LED controllers for custom projects has been extended to control display backlight, RGB keyboards and mice, keyboard backlight and so on.

X and Wayland Support

Compatibility with various external clients has been treated as a lower priority than all the other tasks, for a number of reasons that warrant a lengthier, dedicated post. Two years ago, the only real compatibility option was running them in VMs via a patched QEmu backend.

To make this situation more tolerable, a separate Xorg backend, Xarcan, was added after it was deemed to be much less work than adding support for XWayland. This is covered in the post “Dating my X”.

Wayland support was also added by taking a cue from the world of microkernels and implementing it as a translation service. This approach turned out quite well as a means of compartmentalising, sandboxing and least-privilege separating each wayland client at the protocol level. The translation service will likely be extended to shield off and contain the radiation from more of certain “free”-desktop projects.

BSD ports

With some minor build system coaxing and input system modifications, most of Arcan can now be built and run on the various BSDs. The tougher nut to crack in this regard was, unsurprisingly, OpenBSD. The article ‘Towards Secure System Graphics: Arcan and OpenBSD’ shows all the little steps and considerations that had to be taken, and what is left to be done.

Though I would normally like to end a post like this with some tepid visions towards the coming two years and the project future in general, this time I will just restrain myself and hint that it is time to get considerably more hardcore.

Posted in Uncategorized | 4 Comments

Towards Secure System Graphics: Arcan and OpenBSD

Let me preface this by saying that this is a (very) long and medium-rare technical article about the security considerations and minutiae of porting (most of) the Arcan ecosystem to work under OpenBSD. The main point of this article is not so much flirting with the OpenBSD crowd or adding further noise to software engineering topics, but to go through the special considerations that had to be taken, as notes to anyone else that decides to go down this overgrown and lonesome trail, or are curious about some less than obvious differences between how these things “work” on Linux vs. other parts of the world.

A further disclaimer: most of this has been discovered by experimentation and by combining bits and pieces scattered across everything from Xorg code to man pages. There may be smarter ways to solve some of the problems mentioned; this is just the best I could find within the time allotted. I’d be happy to be corrected, in patch/pull request form that is 😉

Each section will start with a short rant-like explanation of how it works in Linux, and what the translation to OpenBSD involved or, in the cases that are still partly or fully missing, will require. The topics that will be covered this time are:

Motivation

One of the many lofty goals behind Arcan has been to reduce system-wide desktop complexity and to push the envelope in terms of system graphics features, quality and performance. Another has been to enable experimentation with security-sensitive workflows and interaction schemes; the One Night in Rio: Vacation Photos from Plan9 article covers one such experiment. Working off the classic Confidentiality-Integrity-Availability information security staple, the article on Crash-Resilient Wayland Compositing also showed the effort being put into the Availability aspect.

The bigger picture of how much this actually entails will be saved for a different article, but the running notes on the current state of things, as well as higher level experiments are kept in [Engine Security] and [Durden-Security/Safety].

Outside of attack surface reduction, exploit mitigations and safety features, there is a more mundane, yet important aspect of security – and that is of software quality itself. This is an area where it is easy to succumb to fad language “fanboyism” or to drift in the Software Homeopathy direction of some bastardised form of code autism, such as counting the number of lines of code per function, indentation style, comment:code ratio and many other ridiculous metrics.

While there are many parts to reaching the lofty goal of supposed ‘quality software’, the one that is the implicit target of this post is the value of a portable codebase as a way of weeding out bugs (where #ifdef hell or ‘stick it in a VM masked as a language runtime’ does not count).

The road to- and value of- a portable codebase is built on the ability to swap out system integration layers as a way of questioning the existing codebase about hidden assumptions and reliance on non-standard, non-portable and possibly non-robust behaviours and interfaces.

This does not always have to be on the OS level, but in other layers in the stack as well. For instance, in the domain that Arcan targets, the OpenGL implementation can be switched out for one that uses GLES, though their respective feature sets manifest many painful yet subtle differences.

Similarly, the accelerated buffer sharing mechanism can be swapped between DMA-Buf and EGLStreams. Low-level DRI and evdev based access for graphics and input can be swapped out for a high-level LibSDL based one. Engine builds are tested with multiple libc:s to avoid “glibc-rot”, and so on. Some of these cases are not without friction, and can even be painful at times. Still, in my experience, the net contribution of being able to swap out key culprits as a means of finding and resolving bugs has been a positive one.
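One common way to express a swappable integration layer in C is a table of function pointers that engine code calls through. This is a hypothetical sketch of that pattern, not Arcan's platform interface, which may well select its implementations at build time instead. The benefit is the same either way: a scripted stub backend can run the engine code with no hardware attached.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical integration-layer interface: the engine only calls through
 * this table, so a DRI/evdev backend, an SDL backend or a scripted test
 * backend can be swapped in without touching engine code. */
struct platform {
    const char *name;
    void *ctx;
    bool (*video_init)(void *ctx, int width, int height);
    /* Writes up to cap pending input events to out, returns the count. */
    size_t (*input_poll)(void *ctx, int *out, size_t cap);
};

/* Engine-side code written only against the interface. */
bool engine_start(const struct platform *p, int width, int height)
{
    return p->video_init && p->video_init(p->ctx, width, height);
}

size_t engine_pump_input(const struct platform *p, int *events, size_t cap)
{
    return p->input_poll ? p->input_poll(p->ctx, events, cap) : 0;
}
```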

Arcan has been ported to quite a few platforms by now, even if not all of them are publicly accessible; there are certain ecosystems (e.g. Android, iOS) that we have very little interest in publicly supporting in a normal way. The latest addition to the list of actually supported platforms is OpenBSD, added about a year after we got support for FreeBSD. So, enough with the high-level banter; time to dive into the technical meat of the matter.

Graphics Device Access Differences

Recall that in the BSDs, most of the DRM+GBM+Mesa (DRI) stack (or as Ilja van Sprundel poignantly puts it – the shit sandwich) is mostly lifted straight from Linux, hence why some of the interfaces are decidedly “non-BSD” in their look and feel.

Device Nodes and drmMaster

A normal (e)udev setup provides device nodes on Linux as /dev/dri/Card + RenderD. In OpenBSD, these nodes are located at /dev/drm, but merely changing the corresponding ‘open’ calls won’t get you very far unless you are running as root. The reason is a piece of ugly known as drmMaster.

As with most things linux these days, if something previously had permissions – there is now likely both a permission, some hidden additional permission layer and some kind of service manager that shift things around if you ever get too comfortable. Something that once looked like a file actually has the characteristics and behaviour of a file system; file permissions and ownership gets “extended” with hidden attributes, capabilities, and so on.

This is the case here as well. A user may well have R/W access to the device node, but only root (or a process with CAP_SYS_ADMIN) is allowed to become “drmMaster”. This is a singleton role that determines who is allowed to configure display outputs (modeset) or send data (scanout) to a display, and who is only allowed to use the device node to access certain buffers and set up accelerated graphics. As a note, this will get more complicated with the ongoing addition of ‘leases’: tokenised, outsourced slicing of display output connectors.

In Linux, there are about 2.5 solutions to this problem. The trivial one is weston-launch: have it installed suid and there you go, along with all of the problems of suid and hard-coded paths. The second involves an unholy union between Logind, D-Bus, udev and the ‘session’ and ‘seat’ abstractions. It ultimately leads back to who is actually on the active VT, as the TTY driver layer is never going away (insert facepalm picture here).

The last .5 is the one we have been using up until recently. It takes advantage of a corner case: you do not have to set drmMaster if no one else has tried to. Buffer scanout and modeset actually work anyway; just ignore the failed call. Set normal user permissions on the card node and there we go. The risk is that an untrusted process running as the same user can do the same and break display state, which leads us to a case of “avoid mixing trust domains within the same uid/gid”. If you really desire to run software that you don’t trust, well, give such miscreants another uid/gid that has permission on the ‘RenderD’ node for privilege separation, and the problem is solved. Let multi-user on desktop unix just die.

Alas, neither the Master-less scanout nor the render node feature currently exists (#ifdef Linux removed) on OpenBSD. That leaves the no-option of running as root, or the remaining one of doing what weston-launch did, but in a saner way. This means taking an improved cue from Xenocara. Make the binary suid, and the very first thing it does (no argument parsing, no hidden __constructor__ nonsense) is set up a socketpair, fork and drop privileges. Implement a trivial open/close protocol over the socketpair, and keep a whitelist of permitted devices. (Fixed in OpenBSD 6.6)

Since we need dynamic and reactive access to a number of other devices anyhow, this tactic works well enough for that as well. There is a caveat here in that the user running the thing can successfully send kill -KILL to the supposedly root process, breaking certain operations like VT switching (see the Missing section).
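The open/close protocol is small enough to sketch. The example below shows the privileged half's side: a whitelist check, and descriptor passing over the socketpair using SCM_RIGHTS. The device paths in the whitelist are hypothetical examples. The fork and setuid/setgid steps are omitted, and a real implementation also needs request parsing and close handling.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical whitelist kept by the privileged half. */
static const char *const permitted[] = {
    "/dev/drm0", "/dev/wskbd0", "/dev/wsmouse0", NULL
};

bool device_permitted(const char *path)
{
    for (size_t i = 0; permitted[i]; i++)
        if (strcmp(permitted[i], path) == 0)
            return true;
    return false;
}

/* Sends a one-byte status and, if fd >= 0, the descriptor as SCM_RIGHTS. */
int send_fd(int sock, int fd, char status)
{
    struct iovec iov = {.iov_base = &status, .iov_len = 1};
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {.msg_iov = &iov, .msg_iovlen = 1};

    if (fd >= 0) {
        memset(&ctrl, 0, sizeof ctrl);
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof ctrl.buf;
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    }
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives the status byte; returns the passed descriptor, or -1. */
int recv_fd(int sock, char *status)
{
    struct iovec iov = {.iov_base = status, .iov_len = 1};
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {.msg_iov = &iov, .msg_iovlen = 1,
                         .msg_control = ctrl.buf,
                         .msg_controllen = sizeof ctrl.buf};
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    int fd = -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm && cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

/* Privileged side of one open request: only whitelisted paths are opened. */
int serve_open(int sock, const char *path)
{
    if (!device_permitted(path))
        return send_fd(sock, -1, 'E');
    int fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd == -1)
        return send_fd(sock, -1, 'E');
    int rv = send_fd(sock, fd, 'O');
    close(fd);
    return rv;
}
```

The privileged process never parses anything beyond a path. It only hands out descriptors, which keeps the code running as root down to a few dozen lines.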

Input

One of the absolutely murkiest corners of any user-kernel space interface has to be that of input devices: keyboards, trackpads, joysticks, mice and so on. This is one of the rare few areas where I can sit and skim Android documentation and silently whisper “if only”. Then I go back to alternating between screaming into a pillow and tracing through an ever expanding tree of “works 100% – 95% of the time” translation tables, where each new branch manages to be just a tad bit dumber than the ones that came before it.

Starting in Linux land again, first open the documentation. Now, you can close it again – it won’t be useful – I just wanted you to get a glimpse of its horrors. The only interface that can really be used, is evdev, written in cursive just to indicate that it should be pronounced with a hiss. It is a black lump of code that will never turn into a diamond. Instead of being replaced, it gets an ever increasing pile of “helper” libraries (libevdev, libudev, libinput, …) that tries to hide how ugly this layer is. This much resembles a transition towards the Windows approach of ‘kernel interfaces as library code’ rather than comparably clean and observable system calls. The reason for this pattern is, I assume, that no one in their right mind wants to actually work on such a thing within the Linux kernel ecosystem, if it can at all be avoided.

For OpenBSD, the better source of information is probably the PDF (Input Handling in wscons and X), along with the related manpages (yes, there are man pages) on wscons, uhid, wsmouse and wskbd. For the most part, these are obvious and simple. The caveats are that most of the low-level controls are left to other tools, and that the devices work as abstract multiplexers without any ‘stream identifier’ for demultiplexing. Then there’s the case of the keyboard layout format. The underlying “problem”, depending on your point of view, is a legacy that exists in both Linux and the BSDs. Tradition has meant going from BIOS -> boot loader(s) -> “text” console -> (display manager) -> Xorg, while we at least need the option of being able to boot straight into UI.

Hotplug

There are two sides to the hotplug coin: input devices and output devices. The normal developer facing Linux strategy uses udev, the source code of which is recommended reading for anyone still looking for more reasons to get away from linux systems programming. If you want to have extra fun, take a look at how udev distinguishes a keyboard from a joystick.

Udev is an abomination that has been on my blacklist for a number of years. It has all the hallmark traits of a parasitic dependency and is a hard one to ignore. For input devices, the choice was simply to go with inotify on a user-provided folder (default: /dev/input). Whatever automatic or manual process the user prefers can then populate that folder with the device nodes that Arcan should try to use.
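Here is a minimal Linux sketch of that folder watch, using the inotify API. The function names are hypothetical, and opening and probing the reported nodes is left to the caller.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Opens a non-blocking watch on a user-provided device folder such as
 * /dev/input. Returns the (pollable) inotify descriptor, or -1. */
int devwatch_open(const char *dir)
{
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd == -1)
        return -1;
    if (inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Drains queued events and calls cb(name, added, tag) for each node that
 * appeared (added = true) or vanished (added = false). Returns the number
 * of callbacks made. */
int devwatch_poll(int fd, void (*cb)(const char *name, bool added, void *tag),
                  void *tag)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;
    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len <= 0)
            break; /* EAGAIN: nothing more queued */
        for (char *p = buf; p < buf + len;) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->len > 0 && !(ev->mask & IN_ISDIR)) {
                cb(ev->name, (ev->mask & IN_CREATE) != 0, tag);
                count++;
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return count;
}
```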

For output, one might easily be fooled into thinking that the card device node should also be used to detect when a display is added to or removed from this card. After all, it is used for everything (almost; see the backlight section) related to displays, like mode setting, buffer scanout and synch triggers. Welcome to Linux. How about, instead, we send an event with data over a socket (Netlink), in a comically shitty format, to a daemon (udev). The daemon maps it to a path in a filesystem (sysfs, best pronounced as Sisyphus), which gets scraped, correlated to a database and then broadcast to a set of listeners over D-Bus. Oh goodie. One might be so bold as to call this an odd design. All we need is a signal to trigger a rescan of the CRTCs (“Cathode Ray Tube Controllers”, for a fun legacy word), as the related API basically forces us to do our own tracking of connector card set deltas anyhow.

In OpenBSD (>= 6.3) you add the file descriptor of the card device node to a kqueue (❤️) set and wait for it to signal.
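Whatever delivers the signal (kqueue on OpenBSD, the udev chain on Linux), the work afterwards is the same: re-query the card's connectors and diff them against the previous scan. With libdrm, the current set would come from drmModeGetResources and drmModeGetConnector, keeping the connectors whose connection is DRM_MODE_CONNECTED. The sketch below only shows the delta step, and its names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static int id_in(const uint32_t *set, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == id)
            return 1;
    return 0;
}

/* Compares the connected connector ids from the previous scan with the
 * current ones, reporting displays that appeared or disappeared. Linear
 * scans are fine: a card rarely has more than a handful of connectors. */
void connector_delta(const uint32_t *prev, size_t n_prev,
                     const uint32_t *cur, size_t n_cur,
                     void (*on_added)(uint32_t id, void *tag),
                     void (*on_removed)(uint32_t id, void *tag), void *tag)
{
    for (size_t i = 0; i < n_cur; i++)
        if (!id_in(prev, n_prev, cur[i]))
            on_added(cur[i], tag);
    for (size_t i = 0; i < n_prev; i++)
        if (!id_in(cur, n_cur, prev[i]))
            on_removed(prev[i], tag);
}
```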

Backlight

Backlight control typically matters in the laptop use case, and there it is an important one: modern screens can not only be murderously bright, but also eat away at your battery. This subsystem is actually quite involved in Arcan, as it maps to yet another LED controller, though one that gets paired with ‘display added’ events. Hence why we can do things like in this (youtube video).

The approach we used for Linux is lifted from libbacklight, from before it became a part of the Katamari Damacy game that is systemd. Again, one might be tempted to think that the backlight control could somehow be tied to the display whose brightness it regulates, but that is not how this works. Just like with hotplug, there’s scraping around in sysfs.

In OpenBSD, you send an ioctl to your wscons handle.
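For comparison, the sysfs side boils down to reading max_brightness and writing brightness in a device folder under /sys/class/backlight. A minimal sketch follows. The device folder is an example, writing normally requires elevated permissions, and the wscons ioctl route is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Reads one integer attribute such as <dir>/max_brightness; -1 on error. */
static long sysfs_read_long(const char *dir, const char *attr)
{
    char path[512];
    snprintf(path, sizeof path, "%s/%s", dir, attr);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long v;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Sets the backlight of one device folder, e.g.
 * /sys/class/backlight/intel_backlight, to a level in [0, 1].
 * Returns the raw value written, or -1. */
long backlight_set(const char *dir, double level)
{
    long max = sysfs_read_long(dir, "max_brightness");
    if (max <= 0)
        return -1;
    if (level < 0.0)
        level = 0.0;
    if (level > 1.0)
        level = 1.0;
    long raw = (long)(level * (double)max + 0.5);

    char path[512];
    snprintf(path, sizeof path, "%s/brightness", dir);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", raw) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? raw : -1;
}

/* Current level in [0, 1], or -1.0 on error. */
double backlight_get(const char *dir)
{
    long max = sysfs_read_long(dir, "max_brightness");
    long cur = sysfs_read_long(dir, "brightness");
    if (max <= 0 || cur < 0)
        return -1.0;
    return (double)cur / (double)max;
}
```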

Xorg

As touched upon in the section on graphics device access, OpenBSD maintains its own Xorg fork known as Xenocara in order to maintain their specific brand of privilege separation. Arcan meanwhile maintains its own Xarcan fork to get more flexibility and security options than what XWayland can offer, at the cost of legacy application “transparency” (though that will be introduced via the Wayland bridge at some point).

The only real difference here right now, apart from the much superior DRI3(000) rendering and data-passing model not working (see the Missing section further below), was adding translation tables to go from SDL1.2 keysyms (the unfortunate format that we have to work with; no one, especially not me, is without sin here).
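A translation table of this kind is mostly a lookup plus a few ranges. The sentence above does not say which format the keysyms are translated into. Purely for illustration, this sketch assumes X11 keysyms as the target and covers only a handful of keys.

```c
#include <stddef.h>
#include <stdint.h>

/* A few SDL 1.2 SDLKey values and the X11 keysyms that correspond to them. */
static const struct {
    uint16_t sdl;
    uint32_t xsym;
} special_keys[] = {
    {8, 0xff08},   /* SDLK_BACKSPACE -> XK_BackSpace */
    {9, 0xff09},   /* SDLK_TAB       -> XK_Tab */
    {13, 0xff0d},  /* SDLK_RETURN    -> XK_Return */
    {27, 0xff1b},  /* SDLK_ESCAPE    -> XK_Escape */
    {127, 0xffff}, /* SDLK_DELETE    -> XK_Delete */
    {273, 0xff52}, /* SDLK_UP        -> XK_Up */
    {274, 0xff54}, /* SDLK_DOWN      -> XK_Down */
    {275, 0xff53}, /* SDLK_RIGHT     -> XK_Right */
    {276, 0xff51}, /* SDLK_LEFT      -> XK_Left */
    {278, 0xff50}, /* SDLK_HOME      -> XK_Home */
    {279, 0xff57}, /* SDLK_END       -> XK_End */
};

/* Returns the X11 keysym for an SDL 1.2 keysym, or 0 (NoSymbol). */
uint32_t sdl12_to_xkeysym(unsigned sdl)
{
    /* Printable ASCII: SDL 1.2 and the Latin-1 X keysyms both use the
     * character code directly (SDLK_a == XK_a == 0x61). */
    if (sdl >= 32 && sdl <= 126)
        return sdl;
    /* SDLK_F1 (282) .. SDLK_F15 (296) map onto XK_F1 (0xffbe) onwards. */
    if (sdl >= 282 && sdl <= 296)
        return 0xffbe + (sdl - 282);
    for (size_t i = 0; i < sizeof special_keys / sizeof special_keys[0]; i++)
        if (special_keys[i].sdl == sdl)
            return special_keys[i].xsym;
    return 0;
}
```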

Pledging

A large part of the work behind Arcan has been to resection the entire desktop side of user space to permit much more fine grained privilege separation, particularly when it comes to parsers working on untrusted data – but also for server to client sharing modes. Thus we split out as much of the data transformations and specialised data providers as possible into their own processes – and the same goes for data parsing.

One of the reasons for this is, of course, that these are error-prone tasks that should not be allowed to damage the rest of the system in any way. Segmenting them out and monitoring for crashes is also a decent passive way of finding new, less than exciting vulnerabilities in FFmpeg, something that still happens much too often some ten years later.

The other goal is, of course, to be able to apply more precise sandboxing as one process gets assigned one task, like a good Mr. Meeseeks. An indirect beneficial side effect to this is that we get a context for experimenting with the different sandboxing experiences that the various OSes provide, such as the case of the supplementary tool ‘aloadimage‘ (a quite paranoid version of xloadimage).
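The ‘one process, one task’ pattern can be sketched as running a parser in a forked child and watching how it exits. This is a hypothetical helper rather than Arcan's code. On OpenBSD the child pledges down to ‘stdio’ before touching the data; elsewhere that step compiles away. Passing results back (for instance over a pipe) is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum worker_result { WORKER_OK, WORKER_FAILED, WORKER_CRASHED, WORKER_ERROR };

/* Runs parse(data, len) in a forked, single-purpose child and reports how
 * it ended; on a crash, *signo receives the terminating signal. */
enum worker_result run_isolated(int (*parse)(const void *data, size_t len),
                                const void *data, size_t len, int *signo)
{
    pid_t pid = fork();
    if (pid == -1)
        return WORKER_ERROR;
    if (pid == 0) {
#ifdef __OpenBSD__
        /* From here on, only basic I/O on already-open descriptors. */
        if (pledge("stdio", NULL) == -1)
            _exit(127);
#endif
        _exit(parse(data, len) == 0 ? 0 : 1);
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return WORKER_ERROR;
    }
    if (WIFSIGNALED(status)) {
        if (signo)
            *signo = WTERMSIG(status);
        return WORKER_CRASHED;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? WORKER_OK
                                                           : WORKER_FAILED;
}
```

A WORKER_CRASHED result on some input file is exactly the kind of passive signal mentioned above. It is worth logging together with the input that triggered it.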

Missing

The OpenBSD port of DRI currently lacks two important features – render nodes and DMA buffers. This means that we have no sane way of sending accelerated buffers between a client using accelerated graphics and Arcan. This breaks the nested ‘arcan-in-arcan’ mode with the arcan_lwa binary, it breaks wayland support and it breaks Xarcan supporting DRI3 as these all rely on that feature. (Added in OpenBSD 6.5)

It is technically possible to explicitly enable older-style GPU sharing (durden exposes this as config/system/GPU delegation). There, trusted clients authenticate to the “drm master” via a challenge-response scheme and then get the keys to the kingdom. It is far from recommended, as this puts the client on just about the same privilege terms as the display server itself. That being said, it is murderously foolish to think that a client who is allowed any form of raw GPU access these days doesn’t have multiple ways of privilege escalation waiting to be discovered. The order of priority is getting GPUs to do what they are told without crashing the system, then accounting for clients that rely on unspecified or buggy behaviour, and maybe getting them to perform under these conditions. Security is a distant dot somewhere on the horizon.

Last, and incidentally my least favourite part of the entire ecosystem: “libwayland-server” still lacks a working port. This comes as no surprise, as this asynch, race-condition-packed, faux-vtable-loving, use-after-free factory is littered with no-nos, the least of which is its pointless reliance on ‘epoll’. Funny how they practically reimagined Microsoft COM and managed to make it worse.

Posted in Uncategorized | 2 Comments