This is a technical description of the IPC system used throughout the Arcan project, from both a designer and developer perspective, with annotations on legacy and considerations along the way. It’s one of a few gems inside of the Arcan ecosystem, and thousands of hours have gone into it alone.
The write-up assumes a basic computer science background, while the sections prefixed with ‘comment’ are more advanced.
History
SHMIF, or “SHared Memory InterFace” has a long history, dating back to around 2007. It was first added to cover the need to least-privilege separate all parsing of untrusted data in the main engine, simply because the ffmpeg libraries couldn’t stop corrupting memory – generally a bad thing and we had more than enough of that from GPU drivers doing their part.
With parsers sandboxed, it evolved to also work as a way of manipulating 3rd party audio/video processing and event loops through linker interposition or injected shellcode, without getting caught doing so. Rumour has it that it was once used to automate a lot of the tedium in games such as World of Warcraft, and went undetected.
It was written to be portable across many operating systems. The initial version ran on Windows, OSX, BSDs and Linux. There were also non-public versions that ran on Android and iOS. These days the focus remains on BSDs and Linux, with the networking version of the same model, “A12”, intended to retain compatibility with the others.
Its design is based on lessons learned from emulating arcade games of yore, as they represent the most varied and complex display systems to date. The data model evolved from increasingly complex experiments, up to- and beyond- the point of painstakingly going through every single dispatch function in X11 to guarantee that we did not miss anything. The safety and recovery aspects come from many lessons learned breaking and fixing control systems for power grids. The debugging and performance choices came from working on a last-resort debugging tiger team on (mainly) Android.
Layout
There is a shared memory region, and a set of OS specific primitives to account for inadequacies in how various kernels expose controls over memory allocation and use. Combined we refer to these as a segment. The first one established is referred to as the ‘primary’ and it is the only one that is guaranteed on a successful connection. Additional ones are negotiable, and the default is to reject any new allocation. This decision is ultimately left to the window management policy.
Comment: In this form, there is no mechanism for a client to influence allocations in the server end. Further compromises are possible (as we come to later) in order to gain more features, but for a hardened setup, this is one out of several ways we reduce the options for exploiting any vulnerabilities or staging a denial of service attack.
The shared memory is split into a fixed static region and a dynamic one.
The following figure shows the rough contents of these regions:

[figure: layout of the static and dynamic regions]
The order of the fields in the static region is organic; it has been extended over time. To avoid breaking compatibility, changes have been appended as more metadata was needed. The region marked ‘aux’ is zero-sized by default; it is only used for negotiating advanced features, e.g. HDR metadata and VR device support.
Some of the more relevant and non-obvious members of the static region are:
- DMS – Dead Man’s Switch. If it is ever modified, the segment is considered dead. After this point no modifications to the page will be processed by the other side (see the ‘Safety Measures’ section).
- Verification Cookie. A checksum calculated over the offsets and values of other members in the region. Both sides periodically recalculate and compare this value to detect version mismatches or corruption.
- Inbound/Outbound event buffers. These are fixed-slot ring buffers of 128-byte events. They can be thought of as asynchronous ‘system’ calls (see the ‘Event Processing’ section).
- Segment Token. A unique identifier for this specific segment. This can be used by the client end to reference other events if the identifier has been shared by some other mechanism. The ‘VIEWPORT’ event, for instance, instructs window management for repositioning or embedding segments owned by other clients or threads.
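To make that concrete, here is a heavily simplified and purely illustrative sketch of the static region; the names, types and ordering are hypothetical and do not match the real headers, which carry many more fields:

/* illustrative sketch only, see the shmif headers for the real layout */
struct static_region_sketch {
    volatile uint8_t dms;   /* dead man's switch */
    uint64_t cookie;        /* verification checksum */
    uint32_t segment_token; /* unique segment identifier */

    struct {
        volatile uint8_t front, back;          /* ring positions */
        struct arcan_event evq[PP_QUEUE_SIZE]; /* fixed 128-byte slots */
    } inevq, outevq;

/* ... appended metadata and the negotiated 'aux' region follow ... */
};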
The entire memory region is treated as an unsafe contested area; one side populates it with the changes it wants to see made and, through some synchronisation trigger, the other side verifies and then applies or rejects them.
Comment: For debugging and inspection, this means a single snapshot of the mapped memory range is sufficient to inspect the state of the connection, and it is trivial to write analysis, fuzzing and reporting tools for it.
The raw layout is not necessarily exposed to the consumer of the corresponding library. Instead a context structure (struct arcan_shmif_cont) contains the developer-relevant pointers to the corresponding subregions.
Comment: While the implementations for this interface live in userspace, the design intent was to be able to have the server end live completely in a kernel, and have this act as the sole system call interface.
Each segment has a type that is transferred once from the client to the server in the REGISTER event (or when requesting a new one through a SEGREQ event). This is mainly a window management and server hint to control event response, but it also determines whether video and audio buffers flow from client to server (the default) or from server to client (screen recording and similar features).
First Connection (Client)
The client end comes as a library, libarcan-shmif. The rough skeleton that we will unpack here looks like this.
#include <arcan_shmif.h>

int main(int argc, char **argv)
{
    struct arg_arr args;
    struct arcan_shmif_cont C =
        arcan_shmif_open(SEGID_APPLICATION,
            SHMIF_ACQUIRE_FATALFAIL, &args);

    struct arcan_shmif_initial *config;
    arcan_shmif_initial(&C, &config);

/* send audio/video */
/* event processing */

    arcan_shmif_drop(&C);
}
The SEGID_ part is a hint to the server as to the intended use of this connection and how it could manage its resource allocation and scheduling. There is a handful of types available, but APPLICATION is a safe generic one. A video player would be wise to use MEDIA (extremely sensitive to synchronisation but not input), while a game would use, well, GAME (high resource utilisation, input latency and most-recent presentation more important than “perfect” frames).
The FATALFAIL part simply marks that there is no point to continue if a connection can’t be negotiated. It saves some error checking and unifies 'fprintf(stderr, "Couldn't connect")' like shenanigans.
The arg_arr ‘args’ is a form of passing command line arguments to the client without breaking traditional getopt/argv. It can be used to check for key=value pairs through something like ‘if (arg_lookup(&args, "myopt", 0, &val)){ ... }’.
A good question here would be, how does the client know to find the server? The actual mechanism is OS dependent, but for the POSIX case there are two main options that the library is looking for: the ARCAN_CONNPATH and ARCAN_SOCKIN_FD environment variables. The value for CONNPATH is the name of a connection point and is defined by the server side.
Comment: The connection point name is semantic. This stands in contrast to how Xorg does it with DISPLAY=:[number], where the number normally came from the virtual terminal the user started Xorg from. The server end can spawn multiple connection points with different names and apply different policies based on the name.
ARCAN_SOCKIN_FD is used to reference a file descriptor inherited into the process running arcan_shmif_open. This is used when the server itself spawns the new process. It is also used in the special case of ARCAN_CONNPATH being set to “a12://” or “a12s://”. This form actually starts arcan-net to initialise a network connection to a remote host, which creates a single-use shmif server for the client to bind to. This is one of the ways we translate from local IPC to network protocol.
The 'arcan_shmif_initial' part gives the information needed to create correct content for the first frame. This includes user preferred text format (size, hinting), output density (DPI aware drawing over ‘scaling’ idiocy), colour scheme (contrast for accessibility, colour blindness or light/dark) and even locale (to migrate away from LC_… /getlocale) and location (latitude/longitude/elevation).
Comment: For accelerated graphics it also contains a reference to the GPU device to use for rendering, this lets the server compartment or load-balance between multiple accelerators.
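As a sketch of consuming it (the member names are assumed from the public shmif headers and may differ; load_font is a hypothetical stand-in for your text stack):

/* density is expressed in pixels per centimetre (assumed) */
float px_per_mm = config->density * 0.1;

/* the primary font arrives as a descriptor plus a size in
 * millimetres (assumed layout) */
if (config->fonts[0].fd != -1)
    load_font(config->fonts[0].fd,
        config->fonts[0].size_mm * px_per_mm);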
Now for the ‘send audio/video’ part.
shmif_pixel px = SHMIF_RGBA(0x00, 0xff, 0x00, 0xff);
for (size_t y = 0; y < C.h; y++)
    for (size_t x = 0; x < C.w; x++)
        C.vidp[y * C.pitch + x] = px;
arcan_shmif_signal(&C, SHMIF_SIGVID);
This fills the dynamic video buffer part of the segment with fully opaque green pixels in linear RGB, in whatever packing format the system considers native (embedded in the SHMIF_RGBA macro).
Comment: While on most systems (these days) that would be 32-bit RGBA, it is treated as compile time native as CPU endianness would be. Low-end embedded might want RGB565, special devices like eInk might want RGB800 and so on.
There are a lot of options available here, but most have to deal with special synchronisation or buffering needs. These are covered in the ‘Special Case’ sections on Synchronisation and on Accelerated Graphics.
For this example we omitted the aural representation, but if you have a synthesizer core or even a tone generator the same pattern applies; switch vidp for audp and SHMIF_SIGVID for SHMIF_SIGAUD (it is a bitmask, use both if you have both).
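A minimal sketch, assuming the segment was negotiated with an audio buffer and that abufsize carries its size in bytes:

size_t n_samples = C.abufsize / sizeof(shmif_asample);
for (size_t i = 0; i < n_samples; i++)
    C.audp[i] = 0; /* silence; substitute your synthesizer output */
arcan_shmif_signal(&C, SHMIF_SIGVID | SHMIF_SIGAUD);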
Comment: The common distinction between audio and video is something we strongly oppose. It causes needless complexity and suffering trying to have one IPC system for audio, then another for video and then trying to repair and synchronise the two after the fact. It is one of those historical mistakes that should have ended yesterday, but the state of audio on most systems is almost as bad as video.
At this stage we are already done (13 lines of code, zero need for error handling) but for something more polite, we will flesh out the ‘event processing’ part.
struct arcan_event ev;
while (arcan_shmif_wait(&C, &ev)){
    if (ev.category == EVENT_IO){
        /* mouse, keyboard, eyetracker, ... handling goes here */
    }

    switch (ev.tgt.kind){
    case TARGET_COMMAND_EXIT:
        /* any custom cleanup goes here */
    break;
    case TARGET_COMMAND_RESET:
        /* assume that we are back where we started */
    break;
    default:
    break;
    }
}
This will block until an event is received, though more options are covered in the section on ‘Synchronisation’. No action is ever expected of us; we just get polite suggestions of ‘it would be nice if you did something about this’. The category part will only be EVENT_IO or EVENT_TARGET, and the next section will dip into why.
Comment: The _RESET event in particular is interesting and will be covered in the 'Recovery' Special Case. It can be initiated by the outer desktop for whatever reason, and just suggests ‘go back to whatever your starting state was, I have forgotten everything’, but it is also used if the server has crashed and the implementation recovered from it, or if it is shutting down and has already handed responsibilities over to another.
The event data model covers 32 different server-to-client possibilities, and 22 client-to-server ones. Together they cover everything needed for a full desktop and more, but it is descriptive, not normative. React to the ones relevant to you, ignore the others.
First Connection (Server)
There are two implementations for the server end; one inside the arcan codebase tailored to work better with its more advanced resource management, crash resiliency and scripting runtime. The other comes as a library, libarcan-shmif-server, and is mainly used by the arcan-net networking tool which translates this into the A12 network protocol.
Let’s walk through a short example which accepts a single client connection, and in the next section do the same thing for the client application end. Normal C foreplay is omitted for brevity.
#include <arcan_shmif.h>
#include <arcan_shmif_server.h>

struct shmifsrv_client *cl =
    shmifsrv_allocate_connpoint("demo", NULL, S_IRWXU, fd);
shmifsrv_monotonic_rebase();
This creates a connection point for a client to bind to. There are two alternatives, shmifsrv_spawn_client and shmifsrv_inherit_connection. Spawn takes care of creating a process with the primitives inside. Inherit takes some preexisting primitive and builds from there. Both return the same shmifsrv_client structure.
Comment: For a controlled embedded or custom OS setting, the spawn client approach is the safest bet. The inherit connection approach is for when there is a delegate responsible for spawning processes and reduces the number of system calls needed to a bare minimum.
The shmifsrv_monotonic_rebase() call sets up the internal timekeeping necessary to provide a coarse-grained (25Hz) CLK (clock) signal.
Now we need some processing that interleaves with rendering/input processing loops, which is a larger topic and out of scope here.
int status;
while ((status = shmifsrv_poll(cl)) <= CLIENT_NOT_READY){
    /* event and buffer processing goes here */
}

if (status == CLIENT_DEAD){
    shmifsrv_free(cl, SHMIFSRV_FREE_FULL);
    exit(EXIT_SUCCESS);
}
It is possible to extract an OS specific identifier for I/O multiplexing via shmifsrv_client_handle(), so that _poll is only invoked when there is signalled inbound data.
Comment: The flag passed to _free determines the client side visibility. It is possible to just free the server side resources and not signal the dead man's switch. This can be used to transparently pass the client to another shmif server process or instance.
Before we get to the event and buffer processing part, there is also some timekeeping that should be managed outside of a higher frequency render loop.
int left;
int ticks = shmifsrv_monotonic_tick(&left);
while (ticks--){
shmifsrv_tick(cl);
}
This provides both integrity and liveness checks, and manages client requested timers. The (left) out-parameter returns the number of milliseconds until the next tick. This is used as feedback for a more advanced scheduler, if you have one (and you should).
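Putting the pieces together, a sketch of an outer loop that sleeps on the client descriptor and uses (left) as the wakeup deadline, assuming shmifsrv_client_handle returns a poll(2)-able descriptor on a POSIX system:

#include <poll.h>

int left = 0;
for(;;){
    struct pollfd pfd = {
        .fd = shmifsrv_client_handle(cl),
        .events = POLLIN
    };

/* wake on inbound data, or when the next tick is due */
    poll(&pfd, 1, left);

    int ticks = shmifsrv_monotonic_tick(&left);
    while (ticks--)
        shmifsrv_tick(cl);

    /* shmifsrv_poll, event- and buffer- processing goes here */
}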
Now to the event processing:
struct arcan_event buf[64];
size_t n_events, index = 0;

if ((n_events = shmifsrv_dequeue_events(cl, buf, 64))){
    while (index != n_events){
        struct arcan_event* ev = &buf[index++];
        if (shmifsrv_process_event(cl, ev))
            continue;

        /* event handlers go here */
    }
}
This will dequeue, at most, 64 events into the buffer. Each event is forwarded back into the library so it can handle the subset it manages internally; they are routed through the developer only to allow complete visibility. You can use arcan_shmif_eventstr() to get a human readable representation of an event's contents.
Comment: The reason for having a limit is that a clever and malicious client could set things up in a way that would race to stall the server or exhaust its file descriptor space as part of a denial of service, either to affect the user directly or as part of trying to make an exploitation chain more robust.
Now for the ‘event handlers go here’ part.
if (ev->category != EVENT_EXTERNAL){
    fprintf(stderr, "unexpected event category\n");
    continue;
}

switch (ev->ext.kind){
case EVENT_EXTERNAL_REGISTER:
    /* only allow this once per client */
    arcan_shmifsrv_enqueue(cl, &(struct arcan_event){
        .category = TARGET_COMMAND,
        .tgt = {
            .kind = TARGET_COMMAND_ACTIVATE
        }
    });
break;
default:
break;
}
The event data model has a lot of display server specific nuances to it, and none are mandatory except the one above. This unlocks the client from the ‘preroll’ state where it accumulates information received into the “arcan_shmif_initial” structure as covered in the client section. Any information necessary for a client to produce a correct first frame goes before the ‘ACTIVATE’ one. The most likely ones you want are DISPLAYHINT, OUTPUTHINT and FONTHINT, which instruct the client about the size it will be scaled to, the density, colourspace and subchannel layout it will be presented through, as well as the preferred size of the most important of primitives, text.
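As a sketch of such a preroll hint (the ioevs slot layout for DISPLAYHINT is assumed here, consult the shmif event header for the authoritative packing):

/* hint the initial presentation size before the ACTIVATE */
arcan_shmifsrv_enqueue(cl, &(struct arcan_event){
    .category = TARGET_COMMAND,
    .tgt = {
        .kind = TARGET_COMMAND_DISPLAYHINT,
        .ioevs = {
            {.iv = 1280}, /* width (assumed slot) */
            {.iv = 720}   /* height (assumed slot) */
        }
    }
});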
Comment: There are a number of event categories, but only one reserved for clients (EVENT_EXTERNAL). The other categories are for display server internals. The reason they are exposed over SHMIF is for the 'server' end to be split across many processes and still interleave with event processing in the server. This allows us to have external sensors, input drivers etc. all as discrete threads or processes without changing anything else in the architecture. It also allows a transformation from using it as a kernel-userspace boundary to a microkernel form.
The last part is the ‘buffer processing’ part of the previous code.
/* separate function */
bool audio_cb(shmif_asample *buf,
    size_t n_samples,
    unsigned channels, unsigned rate,
    void *tag)
{
/* forward buf[n_samples] to audio device or mixer
 * configured to handle [channels] at [rate] */
    return true;
}

if (status & CLIENT_VBUFFER_READY){
    struct shmifsrv_vbuffer vbuf = shmifsrv_video(cl);
    /* forward vbuf contents to the GPU */
    shmifsrv_video_step(cl);
}

if (status & CLIENT_ABUFFER_READY){
    shmifsrv_audio(cl, audio_cb, NULL);
}
The contents of vbuf are nuanced. There is a raw buffer or an opaque GPU system handle plus metadata (timing, dirty regions, …), or TPACK (see the section on ‘Text Only Windows’), and a set of flags corresponding to the ‘presentation-hints’ on how buffer contents should be interpreted concerning coordinate system, alpha blending and so on.
Comment: Most of the graphics processing properties are things any competent scanout engine has hardware acceleration for, excluding TPACK (and even ancient graphics adaptors used to have those as 'text mode' display resolution). There are support functions to unpack this into a compact list of text lines and their colouring and formatting in "arcan_tui.h" as arcan_tui_tunpack().
Comment: For accelerated GPU handles it is possible to refuse it by sending a BUFFERFAIL event. This will force the client implementation to convert accelerated GPU-local content into the shared pixel format on their end. This is further covered in 'Special Case: Accelerated Graphics'. It doubles as a security measure, preventing the client from submitting command buffers to the GPU that will never finish and livelock composition that way (or leverage any of the many vulnerabilities GPU side). On a hardened system this would be used in tandem with IO-MMU isolation.
In total this lands us at less than 100 lines of code with very granular controls, a fraction of what other systems need just to boilerplate graphics alone. If you only wanted a 101-level take on how SHMIF works, we are done; there is a lot more to it if the topic fascinates you, but it gets more difficult from here.
Synchronisation
While one might be tempted to think that ‘display servers’ are about, well, providing access to the display, the real job is actually desktop IPC with soft realtime constraints. The bread and butter of such systems is synchronisation. If you fail to realise this you are in for a world of hurt, and dealing with it after the fact will brew a storm of complexity.
Comment: It is also the hardest problem in the space - figuring out who among many stakeholders knows what; when do they know it; when is something said relevant or dated. All of those are difficult as is, but it gets much worse when you also need to factor in resonance effects, malicious influence and that some stakeholders reason asynchronously about some things, and synchronously about others. As icing on an already fattening cake, you need to weigh in the domain specific (audio / video) nuances. Troubleshooting boils down to profiling, and problems manifest as 'jitter' and 'judder' and how those fit with human cognition. Virtual Reality is a good testing ground here even if you are otherwise uninterested in the space.
Comment: Beginner mistakes here are fairly easy to spot; if someone responds to synchronisation problems by increasing buffer sizes (landing in a version of the network engineering famous 'buffer bloat' problem) or adding arbitrary sleep calls (even though some might be necessary without adequate kernel level primitives), they are only shifting the problem around.
Recommended study here is ‘queuing theory’ and ‘signal processing’ for a deeper understanding.
To revisit the previous code examples on the client end:
arcan_shmif_signal(&C, SHMIF_SIGVID);
This is synchronous and blocking. The thread will not continue execution until the server end has said it is ok (the shmifsrv_video_step code). The server can defer this in order to prioritise other clients or to stagger releases to mitigate the ‘thundering herd’ problem.
For normal applications, this is often sufficient and comparable to ‘VSYNC’. When you have tighter latency requirements and/or it is costly to produce a frame, you need something more. The historically ‘easier’ solution has been to just add another buffer:
arcan_shmif_resize_ext(&C, C.w, C.h,
    (struct shmif_resize_ext){
        .vbuf_cnt = 2
    });
The _resize and _resize_ext calls are both also synchronous and blocking. This is because the server end needs control in order to guarantee that enough memory is available and permitted. It will recalculate all the buffer offsets (vidp, audp, …) and the verification cookie in the context, and possibly move the base address around to satisfy virtual or physical memory constraints.
Comment: Some accelerated display scanout controllers have hard requirements on physically contiguous memory at fixed linear addresses. Those are a limited and scarce resource, which is why such resize requests might fail, especially in tight embedded development settings. The same applies when dealing with virtual GPUs in virtual machines and so on. The other option to still satisfy a request is to buffer in the server end, causing an extra copy with increased power consumption and less memory bandwidth available for other uses.
The request above would make the first arcan_shmif_signal call return immediately, and only block if another signal call happens before the server is able to consume the buffer from the first. Otherwise the context pointer (C.vidp) will be changed to point to the new free buffer slot. This comes at the added cost of another display refresh period of latency.
Comment: It is possible to increase the buffer count even further, but this changes the semantics to indicate that only the most recently submitted buffer should be considered and others can be discarded. This counters the latency problem of double buffering at the expense of memory consumption. The behaviour has historically been called 'triple buffering' but, to much confusion, that term has also been used for the 'double buffering' behaviour with just deeper buffer queues, and is thus meaningless.
Not every part of a buffer might have actually changed. A common optimisation is to annotate which region should be considered, and for regular UI applications (blinking cursor, single widget updates, …) this substantially cuts down on memory transfers. To cover this you can mark such regions with calls to arcan_shmif_dirty() prior to signalling.
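As a sketch, updating only a blinking cursor cell; draw_cursor is a hypothetical helper, and the arguments to _dirty are assumed to be the extents of the changed region:

size_t cx = 64, cy = 64, cw = 8, ch = 16;
draw_cursor(&C, cx, cy, cw, ch);
arcan_shmif_dirty(&C, cx, cy, cx + cw, cy + ch);
arcan_shmif_signal(&C, SHMIF_SIGVID);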
Comment: While some might be tempted to annotate every pixel, there are strong diminishing returns as soon as you go above just one region due to constraints on memory transfers. Internally, the shmif client library implementation will just merge multiple calls to _dirty into the extents of all changes. For the triple buffering behaviour mentioned in the previous comment, dirty regions won't have any effect at all, as changes present in only one buffer are not guaranteed to carry over into the next, and the cost of trying to merge them on the composition end would cancel out the savings in the first place.
There are more synchronisation nuances to cover, but to avoid making this section even more exhausting, we will stick to the two most relevant. The first of these looks like this:
arcan_shmif_signal(&C, SHMIF_SIGVID | SHMIF_SIGBLK_NONE);
This returns immediately and you can choose to check whether it is safe to draw into the video buffer yourself (through arcan_shmif_signalstatus), or to simply continue writing into the buffer and risk ‘tearing’ in favour of lower latency. This amounts to what is commonly called ‘Disable VSYNC’ in games.
Comment: For those that have written games in days of yore, you might also recall 'chasing the beam', unless dementia has taken over by now. Since game rendering can have predictable write patterns and display scanout can have predictable read patterns it is possible to align your writes such that you raster lines up to just before the one the display is currently reading. This is neither true for modern rendering nor is it true for modern displays, 'the framebuffer' is a lie. Still, for emulation of old systems, it is possible, but impractical, to repeatedly access the 'vpts' field of the static region to figure out how many milliseconds are left until the next VBLANK and stage your rendering accordingly.
The last option is to keep the SHMIF_SIGBLK_NONE behaviour, but add the flag SHMIF_RHINT_VSIGNAL_EV to C.hints prior to a _resize call. This will provide you with a TARGET_COMMAND_STEPFRAME event; you can latch your rendering to that one alone and let your event loop block entirely.
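A sketch of that pattern, with render_frame as a hypothetical stand-in for your drawing code:

C.hints |= SHMIF_RHINT_VSIGNAL_EV;
arcan_shmif_resize(&C, C.w, C.h);

/* then, in the event loop */
case TARGET_COMMAND_STEPFRAME:
    render_frame(&C);
    arcan_shmif_signal(&C, SHMIF_SIGVID | SHMIF_SIGBLK_NONE);
break;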
Comment: Enabling STEPFRAME event delivery by sending a CLOCKREQ request provides a secondary path for advanced latency management, as it enables feedback on presentation timing. Sampling the 'vpts' field of the static region provides information about the upcoming deadline, and STEPFRAME contains metadata about when the contents were actually presented, which can be used to scale effects and quality up or down. Current game development is full of these kinds of tricks.
Event Processing
Event loop construction is an interesting and nuanced topic. We start by returning to the naive one introduced in the client section:
struct arcan_event ev;
while (arcan_shmif_wait(&C, &ev)){
/* interpretation goes here */
}
Responding to each event that arrives here should be as fast as possible. This is easy most of the time. Whenever a response includes rendering however, response times can vary by a lot. Some events are more prone to this, with mouse motion and resize requests being common offenders.
What happens then is that the number of events in the incoming queue starts to grow. If the rate of dispatched events is lower than that of the incoming one, we get buffer back-pressure.
This applies to both the server and the client side. Each side has different constraints and calls for different mitigations. The server end is more vulnerable here, as it has multiple clients to process and higher costs for processing events, since most prompt some form of managerial decision.
Comment: Events from client A can directly or indirectly cause a larger number of responses from clients B and C (amplification), which in turn can cascade into further responses from A. This can snowball fast into 'event storms' and back-pressure building up in others as 'resonance effects'.
One small change to the loop improves on the client end of the equation:
bool process_event(struct arcan_event *ev)
{
    /* interpretation now goes here, return true if a redraw is needed */
}

struct arcan_event ev;
while (arcan_shmif_wait(&C, &ev)){
    bool dirty = process_event(&ev);
    size_t cap = PP_QUEUE_SIZE;

    while (arcan_shmif_poll(&C, &ev) > 0 && cap--){
        dirty |= process_event(&ev);
    }

    if (dirty){
        render();
    }
}
This will flush out as much of the inbound queue (or up to a cap corresponding to the size of the ring buffer) as possible, and only render after all have been applied. This prevents earlier events in the queue from being overdrawn by responses to later ones in the queue.
Comment: Since the event queue is a ring-buffer visible to both sides, it is possible for either party to atomically inspect the head and tail values to determine the current state of the other end, as well as incoming workload. This is a powerful advantage over other possible carriers, e.g. sockets.
The full data model would take another lengthy post to flesh out, so we will only look at one event which highlights library internals. That event is ‘TARGET_COMMAND_DISPLAYHINT’. This event is used to indicate the size that the server end would prefer the window to have. The client is free to respond to this by resizing to the corresponding dimensions. If it doesn’t, the server can still scale and post process – it has the final say on the matter.
As mentioned earlier, resize is synchronous and blocking due to its systemic cost, so it makes sense to keep resizes to a minimum. Some of that responsibility falls on the window manager to ensure that a drag resize using a 2kHz mouse doesn’t also result in 2000 DISPLAYHINTs. Even if that were to happen, the implementation has another trick up its sleeve.
There is a small number of events which are considered costly and can be coalesced. When _wait or _poll encounters such an event, it sweeps the entire pending queue looking for similar ones, merging them together into the one eventually returned, providing only the most recent state.
Comment: There is a tendency for IPC systems to be designed as generally as possible, even if their actual narrow use is known. This defers the decision to commit to any one domain specific data model, making optimisations such as this one impossible -- you can't coalesce or discard what you don't know or understand, at least not responsibly with predictable outcome. This does not make complexity go away, to the contrary, now every consumer has increased responsibility to manage queuing elsewhere. The problem doesn't magically disappear just because you have an XML based code generator or some-such nonsense.
Most of this article has been about a single segment, though in ‘rich’ applications you would have more: child windows, popups, custom mouse cursors and so on. We already mentioned it is possible to request more, even though only one is ever guaranteed. This is not more difficult than setting up the primary one. You submit a SEGREQ event with a custom identifier and type. Eventually you either get a NEWSEG event or REQFAIL event back with the same identifier. For the NEWSEG you forward the event structure to arcan_shmif_acquire and you get a new arcan_shmif_cont structure back.
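A sketch of that round trip for a popup; the .id field used for matching and the exact arcan_shmif_acquire signature are assumptions based on the public headers:

arcan_shmif_enqueue(&C, &(struct arcan_event){
    .category = EVENT_EXTERNAL,
    .ext.kind = ARCAN_EVENT(SEGREQ),
    .ext.segreq = {
        .kind = SEGID_POPUP,
        .id = 0x1234 /* echoed back in NEWSEGMENT or REQFAIL */
    }
});

/* later, in the event loop */
case TARGET_COMMAND_NEWSEGMENT:{
    struct arcan_shmif_cont popup =
        arcan_shmif_acquire(&C, NULL, SEGID_POPUP, 0);
    /* hand 'popup' off to its own thread and event loop */
}
break;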
What does this have to do with queue management? Well, each new segment has their own separate queues and each segment can be processed and rendered on separate threads independent of each other. There is a monotonic per-client global counter and timestamp as part of each event to account for ordering requirements between ‘windows’, but in practice those are exceedingly rare.
A final part about the events themselves. SHMIF is an IPC system; it is not a protocol and it doesn’t cross device boundaries. We have a separate and decoupled networking protocol specifically for that. As an IPC system, we can and should take advantage of device and operating system specific nuances.
Two such details are that each event has a fixed size of 128 bytes (64 did not cover all cases), which amounts to 2 cache lines on the vast majority of architectures out there, and that events live in linear contiguous buffers at a natively aligned base with access patterns that prefetch very well. The packing of the different fields is tied to the system ABI, which is designed to be optimal for whatever you are running on.
Safety Measures
We are almost done with the overall walkthrough; after this we can finish off with some special cases and exotic features. Before then, there is one caveat to cover.
In previous sections we have brushed upon a few tactics that protect against misuse: the validation cookie to detect corruption, version and code generation mismatches, as well as the dead man’s switch. There is still one glaring issue with the shared memory event management and audio/video signalling solution: what happens if the other end livelocks or crashes while we are locked waiting for a response?
In a socket based setup, a crash on the other end would cause it to detach, which you can detect. For a livelock, the common ‘solution’ is to resort to a kind of ping-pong protocol and somehow disambiguate between that and a natural stall in some other part of the system (very often, the GPU driver).
By default (there is a flag to disable this) each segment gets a guard thread. This guard thread periodically (default: every second) checks the aliveness of a monitoring process identifier that the server filled out, as well as whether the dead man’s switch has been released. If that happens, it immediately unlocks all internal semaphores, causing any locked call into the shmif library to release and any further calls to error out so that the natural event loop takes hold. This setup is used to not only detect, but also recover from, crashes (see ‘Special Case: Recovery and Migration’).
This might not be enough for troubleshooting or even communicating to a user that something is wrong. For this we have the ‘last_words’ part of the memory region. This is an area the client can fill out with a human presentable error message that the server end can forward to relevant stakeholders.
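On the client side this amounts to a one-liner before giving up; a sketch, assuming the arcan_shmif_last_words helper:

/* leave a human readable reason for the failure */
arcan_shmif_last_words(&C, "decoder initialisation failed");
arcan_shmif_drop(&C);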
The Arcan engine itself splits into two parts: one potentially privileged parent supervision process that is used to negotiate device access, and the main engine. The supervision process also acts as a watchdog. Every time the engine enters and exits a dangerous area, e.g. the graphics platform or the scripting VM, it registers a timestamp with the parent. If the time spent exceeds some threshold, the parent first signals the engine to try and gracefully recover (the scripting VM is able to do that) and, if the problem persists, shuts down the engine. This will trigger the guard threads inside clients and they, in turn, enter a reconnect, migrate or shutdown state.
Special Case: Handover Allocation
As we covered previously, requesting a new segment to use as a window takes a type that indicates its role and purpose. One such type that sticks out is ‘SEGID_HANDOVER’. This means that the real type will be provided later and that the segment will be handed over to a new client.
To better illustrate, let’s take a code example:
arcan_shmif_enqueue(&C, &(struct arcan_event){
    .category = EVENT_EXTERNAL,
    .ext.kind = ARCAN_EVENT(SEGREQ),
    .ext.segreq.kind = SEGID_HANDOVER
});
...
uint32_t new_token;

/* in event handler, 'ev' being the inbound event */
case TARGET_COMMAND_NEWSEGMENT:
    arcan_shmif_handover_exec(&C, ev,
        "/path/to/something",
        argvv, envv,
        0 /* detach-options */);
    new_token = ev.tgt.ioevs[4].uiv;
break;
This would launch “/path/to/something” so that when it calls arcan_shmif_open it will actually use the segment we received in order to connect. We can then use new_token in other events to manage some of it, e.g. reposition its windows, inject input and more. All of this retains the chain of trust: the server end knows who invited the new client in and can treat it accordingly.
This can be used to embed other clients into your own window, to build an external window manager and so on. In our ‘command lines without terminal emulation’ shell, Lash#Cat9, we use that to manage graphical clients while still being ‘text only’ ourselves. For other examples, see the article on Another Low Level Arcan Client: A Trayicon Handler.
Special Case: Recovery and Migration
A key feature of SHMIF is that it can redirect and reconnect clients manually. Through this we can even transition back and forth between local and networked operations. The section on ‘Safety Measures’ covered how it works in SHMIF internals. There is also an article on ‘Crash Resilient Wayland Compositing’ (2017) that demonstrates this.
When a client connects, the library enqueues a ‘REGISTER’ event that contains a generated UUID. This can be leveraged by the window manager to persist location on the desktop and so on. At any stage, the server can also send a ‘DEVICEHINT’ event back.
This event is used to provide an opaque handle for GPU access on operating systems which require that (which can further be used to load balance between multiple GPUs), but it can also mention a ‘fallback connection point’. Should the server end die (or pretend that it has died), the library will try to connect to that connection point instead.
If it is successful, it will inject the ‘TARGET_COMMAND_RESET’ event that we covered earlier. We will use the following clip from “A12: Visions of the Fully Networked Desktop” as a starting point.
In it, you see the Lash#Cat9 CLI shell inside the ‘Durden’ window manager with a video clip as an embedded handover allocation. It has previously used the local discovery feature of the network protocol to detect that the tablet in front of the screen (a Surface Go) is available as a sink, and has added it as an icon in the statusbar (unfortunately occluded by the monitor bezel in the clip).
When I drag the window and drop it on that icon, Arcan sends a DEVICEHINT with the connection primitive needed for the network embedded into the event. It then pulls the dead man’s switch, forcing the shmif library to go into recovery. Since it remembers the connection from the DEVICEHINT, it reconnects and rebuilds itself there.
This feature is not only leveraged for network migration as shown, but also for compartmentalisation between multiple instances; for crash recovery; for driver upgrades and for upgrading the display server itself. All using the same code paths.
Special Case: Accelerated Graphics
Many ‘modern’ clients unfortunately have a hard dependency on a GPU, and the mechanisms for binding accelerated graphics between display server and client are anything but portable.
Comment: Khronos (of OpenGL and Vulkan fame) tried to define a solution of their own (OpenWF) that failed miserably. What happened instead is even worse; the compromise once made for embedded systems, 'EGL', got monkey patched with a few extensions that practically undo nearly all of its original design and purpose, and that is suddenly what most are stuck with.
There is a lot of bad blood and vitriol on the subject that we will omit here and just focus on the SHMIF provided interface. Recall the normal way of starting a SHMIF client:
#include <arcan_shmif.h>

int main(int argc, char **argv)
{
    struct arg_arr args;
    struct arcan_shmif_cont C =
        arcan_shmif_open(SEGID_APPLICATION,
            SHMIF_ACQUIRE_FATALFAIL, &args);
}
This still applies. A client is always expected to start a normal connection first, and then try to bootstrap it into an accelerated one, which can fail. The reasoning is that if permissions or GPU driver problems stop us from providing an accelerated connection, the regular one can still be used to communicate that to the user, rather than have them dig through trace outputs for the answer.
To extend a context to being accelerated you can do something like this:
struct arcan_shmifext_setup cfg = {
    .api = API_OPENGL,
    .major = 4,
    .minor = 2
/* other options go here */
};

int status = arcan_shmifext_setup(&C, &cfg);
if (status != SHMIFEXT_OK){
    /* configuration couldn't be filled */
}
There are a number of options to provide in the config that require some background with OpenGL etc. to make any sense, so we skip those. If you know, you know, and if you don’t, enjoy the bliss. If the setup is OK, the passed ‘cfg’ is modified to return the negotiated values, which might be slightly different from what you requested.
Afterwards, you can then continue with arcan_shmifext_lookup() to extract the function pointers to the parts of the requested API that you need to use, bound to the driver end of the created context.
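As a sketch, with glClear standing in for whichever entry points you need (the lookup returning a plain function pointer is an assumption):

typedef void (*glclear_fn)(unsigned int mask);

glclear_fn my_clear =
    (glclear_fn) arcan_shmifext_lookup(&C, "glClear");
if (my_clear)
    my_clear(0x4000 /* GL_COLOR_BUFFER_BIT */);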
When writing platform backends for existing applications, they often provide their own way of doing all this, and we do need a way to work with that. If there is already a context living in your process and you want to manually export and forward a resource, it is possible through:
size_t n_planes = 4;
struct shmifext_buffer_plane planes[n_planes];

n_planes = arcan_shmifext_export_image(&C,
    (uintptr_t) MyDisplay, (uintptr_t) my_texture_id,
    n_planes, planes);

if (n_planes){
    arcan_shmifext_signal_planes(&C,
        SHMIF_SIGVID, n_planes, planes);
}
There are several more support functions with similar patterns for context management, importing already exported images and so on, but this should be enough to get an idea of the process.
Special Case: Debugging and Accessibility
We have already shown how the client end can request new segments through SEGREQ events and how those are provided through NEWSEGMENT events coming back. Another nuance to this is that the server end can push a NEWSEGMENT without having to wait for a SEGREQ in advance.
This can be used to probe for support for things such as custom client mouse cursors, or to signal a paste or a drag and drop action (the clipboard is just yet another segment being pushed), as the server end will know whether the client mapped the segment or not.
There is nothing stopping us from actually mapping and populating the segment from within libarcan-shmif itself, and there are two cases where that is actually done: one for SEGID_DEBUG and another for SEGID_ACCESSIBILITY.
There are two longer articles related to how this works in more depth, one on ‘Leveraging the Display Server to Improve Debugging’ and another on ‘Accessible Arcan: Out of Sight’.
If one of these is received, libarcan-shmif will (unless the feature is compiled out) internally spawn a new thread. In the debugging case it will provide a text interface for attaching a debugger, exploring open files, and inspecting environment and memory from within the process itself. In the accessibility case it will latch on to frame delivery (SHMIF_SIGVID) in order to overlay text descriptions that get clocked to the video frames being delivered and the dirty regions being updated.
Special Case: Text-only Windows
There are more reasons as to why a set of fonts and a desired font size are provided during the preroll state, and why there is a ‘text rows and columns’ field in the static region.
Among the hints that can be set for the video region, there is SHMIF_RHINT_TPACK. This changes the interpretation of the contents of the video buffer region to use a packing format (TPACK), which is basically a few bytes of header followed by a number of rows, where each row has a header covering how it should be processed (shaping, right-to-left) along with a number of cells carrying formatting, colour and possibly font-local glyph indices (for ligature substitutions).
The format is complete enough to draw anything a terminal emulator would be able to throw at it, but also do things the terminal emulator can’t, such as annotation layers or borders that fall outside of the ‘grid’, saving precious space.
This approach lets the server end handle the complex task of rendering text. It also means that the costly glyph caches, GPU related acceleration primitives like distance fields and so on can all be shared between windows. It means that the server can apply the heuristics for ‘chasing the beam’ style minimal latency or tailor updates to the idiosyncrasies of eInk displays when appropriate, and that the actual colours used will fit the overall visual theme of the desktop, while letting the client focus on providing ‘just text’.
While it is possible to build these yourself, there is a higher level abstraction support library, ‘libarcan-tui’, for that purpose. The details of that, however, are a story for another time.