Interfacing with a ‘Stream Deck’ Device

Continuing the series on using the various Arcan APIs, we get to another use case that works a bit differently here than elsewhere. What makes it interesting enough for a post is how the low and high levels fit together for a device class that challenges normal boundaries on what are valid input and output devices.

Before we start, here are some references to the previous articles in the series:

  1. https://arcan-fe.com/2019/03/03/writing-a-low-level-arcan-client/
  2. https://arcan-fe.com/2019/05/07/another-low-level-arcan-client-a-tray-icon-handler/

The lucky device in question will be an “ElGato Stream Deck“, which is basically a low resolution budget display with an overlay of translucent buttons and a thin input grid layer that captures button presses.

Operating on the ‘mechanism, not policy’ principle, the tools we will be writing are not strictly bound to this kind of hardware. You can run it inside a Raspberry Pi with a touch display and use arcan-net to run the client remotely and you get quite the nifty visual remote controller. Why not bind it to a “display-pad” secondary small display that some recent laptops replace the trackpad with, or use the output as a model texture in safespaces to remove the need for most of the keybindings.

Here is a video of the results of this walkthrough in action, mainly showing the produced output from a short script being mapped to the display:

Demo of the arcan-streamdeck driver connected to a sample application

The (low level) repository can be found here: https://github.com/letoram/arcan-devices

There is also a high level tool that has been added to Durden(streamdeck) for these kinds of displays. It uses the same low level code, communication and setup as we will go through here. The rest of this tool is a bit more complicated, but for good reason; it can handle multiple devices of different display sizes and cell dimensions; it exposes dynamic WM state and live contents preview; titlebar decoration buttons; client announced icon bindings; and a ton of other features. It can basically be used as a very competent accessibility tool.

This video shows that tool in more real action:

Demo of the ‘streamdeck’ tool in Durden connected to the ‘arcan-streamdeck’ driver

Note how the buttons relayout based on the window that gets selected. The titlebar decoration buttons gets moved to the display, and if the client that is tied to the selected window announces any custom key inputs, those are added to the list as well. Window contents and workspace previews also gets real-time mapped to the display and can be used for window selection.

We will skip the build system and USB/reversing specific part, but basically each button is updated by sending a chunked raw bitmap image with a header that specifies button index and chunk, and inputs are received as reports with a byte per designated key.

For the Arcan side, we need both the low-level code that maps to the device, and a high-level script that defines the client behaviour. In this article, we will jump between the two as needed. The low-level parts will be prefixed with “Client-side” and the high-level with “Server-side”.

(Client-side) We start with the normal connection dance:

struct arcan_shmif_cont C = arcan_shmif_open(
    SEGID_ENCODER, SHMIF_ACQUIRE_FATALFAIL, NULL);
hid_device* deck_device = hid_open(0x0fd9, 0x0060, NULL);

Note the SEGID_ENCODER type. This tells the server side that the client wants to receive audio/video data rather than to provide it. Just building and running this would yield complaints that hey, it couldn’t connect anywhere. Even if the ARCAN_CONNPATH environment variable would point to a valid connection point, it is very likely that the listener on the other end rejects attempts at creating an encoder.

To remedy this, we need to specify the high-level reaction. Normally, we can write or modify an Arcan appl specifically for this, but this time we will use another nifty engine feature, and that is ‘hook scripts’.

Hook scripts are generic scripts that can be user-enabled, regardless of the appl:

arcan -H "hooks/streamdeck.lua" /path/to/my/appl

This would prompt the engine to first start the application as normal, but when it has finished executing the entry point, the hook script will be loaded. These scripts are located in the ‘system-scripts’ namespace, defaulting to /usr/share/arcan/scripts and redirectable via the ARCAN_SCRIPTPATH environment variable.

(Server-side) The skeleton of our hook script (stremdeck.lua) will start like this:

local function open_streamdeck_cp(name)
    local maxw = 72 * 5
    local maxh = 72 * 3
    target_alloc(name, maxw, maxh,
        function(source, status)
            if status.kind == "terminated" then
                delete_image(source)
                open_streamdeck_cp(name)
            end
        end
    )
end

open_streamdeck_cp("streamdeck")

This reads as:

  1. Open a connection point called streamdeck, tied to a (72 * 5), (72 * 3)’ buffer
  2. When that connection terminates, re-open the connection point

Had the “open_streamdeck_cp” call been moved to the state where status.kind == “connected”, multiple clients would be allowed to hook up and receive data over this connection point. This time, we enforce a single active client at a time – but the trick of forcing manual reactivation of a connection point makes it trivial to rate-limit connections and prevent clients from causing stalls or staging denial-of-service like attacks.

The code above still behaves like ‘normal’ clients, even though this client wants to be fed video. Running everything and stitching together like this:

ARCAN_SCRIPTPATH=. arcan -H "streamdeck.lua" console &
git clone https://github.com/letoram/arcan-devices
cd arcan-devices/streamdeck ; meson build ; cd build ; ninja
ARCAN_CONNPATH=streamdeck ./arcan-streamdeck

would still yield nothing of interest. We need to tell the engine what to forward to the client, and how. No external client is permitted to have a working primary segment acting as an encoder for security (confidentiality) reasons, as there is no visual identity for a user to make an informed judgement about what to forward to any specific client. The exception has traditionally been the ‘encode’ frameserver, as that is spawned from the server side where the chain of trust would remain intact. Even then, it is the server side (through the window manager) that specifies what an encoder gets to see.

The normal way of allowing a client to receive audio/video is through a new subsegment established over the primary. Scripting wise, this is initiated server-side through calling define_recordtarget on an identifier referencing a client segment; client side receives a new segment event and maps it by calling arcan_shmif_acquire.

There is an exemption to this, which we will leverage here. The internals of ‘define_recordtarget‘ is that an offscreen rendertarget (a so called ‘FBO’) is created, with an automated or scripted refresh trigger. The contents can be forwarded to an internal or external recipient whenever there is an update. With the recently added ‘rendertarget_bind‘ function, we can swap out the recipient of a rendertarget with the handle of our connected client.

(Server-side) First, we hook up a little check to see that the connected client is of the right type.

if status.kind == "registered" then
    if status.segkind ~= "encoder" then
        delete_image(source)
        open_streamdeck_cp(name)
    end
    build_rendertarget(source, maxw, maxh) -- to implement
end

(Server-side) Then we implement this ‘build_rendertarget’ function:

local function build_rendertarget(source, w, h)
    local buf = alloc_surface(w, h)
    local img = color_surface(w, h, 0, 255, 0)
    blend_image(img, 1.0, 100)
    define_rendertarget(buf, {img})
    rendertarget_bind(buf, source)
    link_image(buf, source)
end

This should be quite straight forward; first we allocate a buffer that will fit our offscreen rendering. Then we allocate a single coloured image that we fade in over the course of (100 / 25 = 4) seconds. Then we create the rendertarget with this image attached. Lastly comes the magic that makes the source act as a recipient of the rendertarget contents, as well as a life-cycle trick that makes sure buf gets deleted automatically when source is destroyed.

Now time for the client side steps. We add a normal event loop:

for(;;){
    struct arcan_event ev;
    if (arcan_shmif_poll(&C, &ev) > 0){
        if (ev.category != EVENT_TARGET)
            continue;
        if (ev.tgt.kind == TARGET_COMMAND_EXIT)
            break;
        if (ev.tgt.kind == TARGET_COMMAND_STEPFRAME)
            consume_video(&C, deck_device);
    }
}

Whenever a STEPFRAME event is received, the video buffer pointed to inside of the arcan_shmif_cont structure has been updated, and will be held in that state until released through arcan_shmif_signal(&C, SHMIF_SIGVID);

The code in the repository is more complicated as it handles repacking into checksummed tiles and sending those that have changed to the USB device. Reprinting that here would just add noise, so a simple loop implementation for consume_video would be:

for (size_t y = 0; y < C->h; y++)
  for (size_t x = 0; x < C->w; x++){
    uint8_t r, g, b, a;
    SHMIF_RGBA_DECOMP(C->vidp[y * C->pitch + x], &r, &g, &b, &a);
/* do something with r, g, b, a */
  }
arcan_shmif_signal(C, SHMIF_SIGVID);

Video covered, time for the input routine. The astute reader might have noticed that the shmif poll loop would always spin and never block. This is because we will use libusb for a crude form of blocking here.

Patching the loop from before, we get this:

int old_mask = 0;
for(;;){
  uint8_t inbuf[64];    
  ssize_t nr = hid_read_timeout(deck_device,
                                inbuf, sizeof inbuf, 64);
  if (-1 == nr)
      break;

  int mask = get_mask(inbuf, nr);
  if (old_mask != mask){
      update_mask(C, old_mask, mask);
      old_mask = mask;
  }
  ...

We will get a single report in inbuf, so just enumerate the number of bytes read, and map each to a bit in the mask (get_mask). The last Arcan piece will be in update_mask:

static void update_mask(struct arcan_shmif_cont* C,
                        int mask,
                        int changed)
{
    int ind;
    struct arcan_event ev = {
        .category = EVENT_IO,
        .io = {
            .devkind = EVENT_IDEVKIND_GAMEDEV,
            .datatype = EVENT_IDATATYPE_DIGITAL
        }
    };

    while((ind = ffs(changed)){
        ev.io.subid = ind;
        ev.io.input.digital.active = !!(changed & (1 << (ind-1)));
        changed = changed & ~(1 << (ind - 1));
        arcan_shmif_enqueue(C, &ev);
    }
}

What happens above is simply that we enumerate the bit positions that have changed since last time, translate its position to a numeric 0-index, and mark it as pressed or released. The io substructure in the arcan_event structure is quite complex as it contains some legacy left-overs, and encompasses a really wide range of devices. The Device-kind ‘GAMEDEV’ used here is the catch-all that does not cover other established input device types (keyboards, mice, touch panels, eye trackers, …).

For the last step, we should extend the input handler in the Lua script as well. If the callback set when issuing target_alloc is changed to contain this:

function(source, status, input)
    if status.kind == "input" and _G[APPLID .. "_input"] then
        key = translate_input(input)
        if key then
            _G[APPLID .. "_input"](key)
        end
    end
-- code from previous sections goes here
end

The missing translate_input function should filter out unwanted input events so that the client does not get free reign to inject whatever it wants into the main input handler of the running appl. Recall that we are writing a hook-script that can be forced into any appl.

The implementation of translate_input then:

local function translate_input(input)
    if not input.digital then
       return
    end
    local keyind = 97 + (input.subid % 25)
    return {
        kind = "digital",
        digital = true,
        devid = 0,
        subid = input.subid,
        translated = true,
        number = keyind,
        keysym = keyind,
        active = input.active,
        utf8 = input.active and string.char(keyind) or ""
    }
end

Would have the input be translated to act as a keyboard input. The input table is quite complex, as shown in the documentation for the related target_input function that forwards input to a client.

This brings us to a point where the device is functional. Before ending this, let us do two minor adjustments – one technical and one cosmetic. Adding another state to the event handler in the target_alloc handler:

if status.kind == "connected" then
     target_flags(source, TARGET_BLOCKADOPT)   
end

This one is subtle – but should the engine reset the Lua VM due to a scripting error, or as a user trigger from switching appl, the frameserver created through target_alloc would be re-exposed to the scripts through a possible _adopt event handler. Setting the flag above will cause the source to be destroyed rather than adopted, preventing this from causing the client to ‘appear’ as a normal one in that edge condition.

Lastly, and for fun since just a green screen is not very interesting, we go a bit old school demo scene through this that adds a plasma as well as showing how GPU programs are set up and used:

local frag = [[
varying vec2 texco;
uniform int timestamp;
uniform float fract_timestamp;
float f1(float x, float y, float t){
    return sin(sqrt(10.0*(x*x+y*y)+0.05)+t)+0.5);
}
float f2(float x, float y, float t){
    return sin(5.0*sin(t*0.5)+y*cos(t*0.3)+t)+0.5;
}
float f3(float x, float y, float t){
    return sin(3.0*x+t)+0.5;
}

const float pi = 3.1415926;
void main()
{
    float time = float(timestamp) + fract_timestamp;
    float sin_ht = sin(t*0.5)+0.5;
    float cx = texco.s*4.0;
    float cy = texco.t*2.0;
    float v = f1(cx, cy, t);
    v += f2(cx, cy, t);
    v += f3(cx*sin_ht, cy*sin_ht, t);
    float r = sin(t);
    float g = cos(t+v*pi);
    float b = sin(t+v*pi);
    gl_FragColor = vec4(r,g,b,1.0);
}
]]

local shid = build_shader(nil, frag, "plasma")

-- and at the end of build_rendertarget:
image_shader(img, shid);

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s