(… and for system graphics, games and other interactive applications but that would make the title just a bit too long…)
Many of the articles here have focused on the use of Arcan as a “desktop engine” or “display server”; even though those are rather fringe application areas that only showcase a fraction of the feature set – the display server target just happens to be part of my current focus.
This post is about another application area. For those of you not ‘in the know’, AWK is a programming language aptly suited for scripted stream processing of textual data, and a notable part of basic command-line hygiene. Arcan can be used in a similar way, but for scripted- and/or interactive- stream processing of multimedia; and this post will demonstrate how that can be achieved.
The example processing pipeline we’ll set up first takes an interactive gaming session in one instance (our base contents). This is forwarded to a second instance which applies a simple greyscale effect (transformation). The third instance finally mixes in a video feed and an animated watermark (our overlay or metadata). The output gets spliced out to a display and to a video recording.
The end result from the recording looks like this:
The invocation looks like this:
./arcan_lwa -w 480 -h 640 --pipe-stdout ./runner demo default | ./arcan_lwa --pipe-stdin --pipe-stdout ./gscale | ./arcan --pipe-stdin -p data ./composite mark.png demo.mp4 out.mkv
There are a number of subtle details here, particularly the distinction between “arcan_lwa” and “arcan”. The main difference is that the former can only connect to another “arcan” or “arcan_lwa” instance, while “arcan” will connect to some outer display system; this might be another display server like Xorg, or it can be through a lower level system interface. This is important for accelerated graphics, format selection, zero-copy buffer transfers and so on – but also for interactive input.
Structurally, it becomes something like this:
All of these interconnects can be shifted around or merged by reconfiguring the components, either to reduce synchronisation overhead or to rebalance the system composition. The dashed squares indicate process- and possibly privilege- separation. The smaller arcan icons represent the lwa (lightweight) instances, while the bigger one shows the normal instance responsible for hardware/system integration.
Note that content flows from some initial source towards the output system, while input moves in the other direction. Both can be transformed, filtered or replaced with something synthesised at any step in the (arbitrarily long) chain. The example here only uses a single pipe-and-filter chain, but there is nothing preventing arbitrary, even dynamic, graphs from being created.
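As an illustration of the input direction: each stage receives input through its _input entry point and decides what, if anything, to forward downstream. A minimal sketch (using a hypothetical appl named “filter”, with ‘client’ assumed to be a connection established elsewhere, and assuming the input table flags mouse samples with a ‘mouse’ field) that drops mouse samples but forwards everything else could look like:

function filter_input(iotbl)
    if iotbl.mouse then
        return -- drop mouse samples, keep keyboards, game devices, ...
    end
    if valid_vid(client, TYPE_FRAMESERVER) then
        target_input(client, iotbl)
    end
end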
Going from left to right, let’s take a look at the script bundles (“appls”) for each individual instance. These have been simplified here by removing error handling, showing only the normal control flow.
Runner
This reads like “On start, hook up an external program defined by the two command line arguments, and make its buffers visible. Shut down when the program terminates, and force-scale it to fit whatever dimensions were provided at startup. Whenever input is received from upstream, forward it without modification to the external program”.
function runner(argv)
    client = launch_target(argv[1], argv[2], LAUNCH_INTERNAL, handler)
    show_image(client)
end

function handler(source, status)
    if status.kind == "terminated" then
        return shutdown("", EXIT_SUCCESS)
    end

    if status.kind == "resized" then
        resize_image(source, VRESW, VRESH)
    end
end

function runner_input(iotbl)
    if valid_vid(client, TYPE_FRAMESERVER) then
        target_input(client, iotbl)
    end
end
Note that the scripting environment is a simple event-driven imperative style using the Lua language, but with a modified and extended API (extensions being marked with cursive text). There are a number of “entry points” that will be invoked when the system reaches a specific state. These are prefixed with the name of the set of scripts and resources (‘appl’) that you are currently running. In this case, it is “runner”.
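As a sketch of the convention (assuming an appl named “runner”, i.e. a runner/runner.lua script), the entry points resolve like this:

-- runner/runner.lua
function runner(argv)               -- initialiser, called once with the command line arguments
end

function runner_input(iotbl)        -- called for every input sample routed to this instance
end

function runner_adopt(source, kind) -- called when an external connection is handed over
end

function runner_clock_pulse()       -- called on every logical clock tick, if defined
end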
Starting with the initialiser, runner(). It takes the first two command line arguments (“demo”, “default”) and passes them through to launch_target. This function performs a lookup in the current database for a ‘target’ (=demo) and a ‘configuration’ (=default). To set this up, I had done this from the command line:
arcan_db add_target demo RETRO /path/to/libretro-mrboom.so
arcan_db add_config demo default
The reason for this indirection is that the scripting API doesn’t expose any arbitrary eval/exec primitives (sorry, no “rm -rf /”). Instead, a database is used for managing allowed execution targets, their sets of arguments, environment, and so on. This doubles as a key/value store with separate namespaces for both arcan/arcan_lwa configuration, script bundles and individual targets.
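As a hedged sketch of the key/value side: the appl-local namespace can be read and written with get_key/store_key, and the targets/configurations registered through arcan_db can be enumerated from the scripts:

-- persist and retrieve a value in the appl-local namespace
store_key("last_output", "out.mkv")
local last = get_key("last_output")
if last then
    print("previous recording: " .. last)
end

-- enumerate the targets and configurations registered via arcan_db
for _, tgt in ipairs(list_targets()) do
    for _, cfg in ipairs(target_configurations(tgt)) do
        print(tgt, cfg)
    end
end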
RETRO indicates that we’re using libretro as the binary format here, and the demo is the ‘MrBoom‘ core. This can be substituted for anything that has a backend or dependency that can render and interact via the low-level engine API, shmif. At the time of this article, this set includes Qemu (via this patched backend), Xorg (via this patched backend), SDL2 (via this patched backend), Wayland (via this tool), SDL1.2 (via this preload library injection). There’s also built-in support for video decoding (afsrv_decode), terminal emulation (afsrv_terminal) and a skeleton for quickly hooking your own data providers (afsrv_avfeed), though these are spawned via the related launch_avfeed call.
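As a hedged sketch of the built-in providers (assuming the launch_avfeed(argument string, archetype, handler) form), spawning and showing the terminal emulator could look like:

-- sketch: spawn the built-in terminal emulator (afsrv_terminal) and show it
local term = launch_avfeed("", "terminal",
    function(source, status)
        if status.kind == "resized" then
            show_image(source)
            resize_image(source, VRESW, VRESH)
        elseif status.kind == "terminated" then
            delete_image(source)
        end
    end
)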
Gscale
This reads like “On start, compile a GPU processing program (“shader”). Idle until the adoption handler provides a connection on standard input, then assign this shader and an event loop to the connection. Forward all received interactive input. If the client attempts a resize, orient its coordinate system to match.”
function gscale()
    shader = build_shader(nil, [[
        uniform sampler2D map_tu0;
        varying vec2 texco;

        void main(){
            float i = dot(
                texture2D(map_tu0, texco).rgb,
                vec3(0.3, 0.59, 0.11)
            );
            gl_FragColor = vec4(i, i, i, 1.0);
        }
    ]], "greyscale")
end

function gscale_adopt(source, type)
    if type ~= "_stdin" then
        return false
    end

    client = source
    target_updatehandler(source, handler)
    image_shader(source, shader)
    show_image(source)
    resize_image(source, VRESW, VRESH)
    return true
end

function handler(source, status)
    if status.kind == "terminated" then
        return shutdown("", EXIT_SUCCESS)
    elseif status.kind == "resized" then
        resize_image(source, VRESW, VRESH)
        if status.origo_ll then
            image_set_txcos_default(source, true)
        end
    end
end

function gscale_input(iotbl)
    if valid_vid(client, TYPE_FRAMESERVER) then
        target_input(client, iotbl)
    end
end
This shouldn’t be particularly surprising given the structure of the ‘runner’. The first thing to note is that build_shader automatically uses the rather ancient GLSL120, for the simple reason that it was/is at the near-tolerable intersection of GPU programming in terms of feature set versus driver bugs versus hardware compatibility.
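If the effect needs tuning at runtime, uniforms can be pushed from the script side. A hedged sketch, assuming a hypothetical ‘mix_factor’ uniform had been added to the shader source and the shader_uniform(shader, name, format, value) form:

-- blend 50/50 between the original colors and the greyscale version
-- ('mix_factor' is a hypothetical uniform, not part of the shader above)
shader_uniform(shader, "mix_factor", "f", 0.5)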
The interesting part here is the _adopt handler. This can be activated in three very different kinds of scenarios. The first is when you want to explicitly switch or reload the set of scripts via the system_collapse function and keep external connections. The second is when there’s an error in a script and the engine has been instructed to automatically switch to a fallback appl to prevent data loss. The third is the one being demonstrated here, and relates to the –pipe-stdin argument. When this is set, the engine will read a connection point identifier from standard input and set it up via target_alloc. When a connection arrives, it is forwarded to the adopt handler with a “_stdin” type. The return value of the _adopt handler tells the engine whether to keep or delete the connection that is up for adoption.
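Outside of the –pipe-stdin convenience, the same mechanism can be wired up manually. A minimal sketch, assuming a connection point name of “demo_cp” (clients would reach it by setting ARCAN_CONNPATH=demo_cp in their environment):

target_alloc("demo_cp",
    function(source, status)
        if status.kind == "connected" then
            target_updatehandler(source, handler) -- hand over to the regular event loop
            show_image(source)
        end
    end
)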
A subtle detail that will be repeated later is in the origo_ll “resized” part of the event handler.
The skippable backstory is that in this area of graphics programming there are many generic truths. Truths such as: color channels will somehow always come in an unexpected order; GPU uploads will copy the wrong things into the wrong storage format in the most inefficient way possible; things you expect to be linear will be non-linear and vice versa; if something seems easy to implement, the only output you’ll get is a blank screen. The one relevant here is that at least one axis in whatever coordinate system is used will be inverted for some reason.
Any dynamic data provider here actually needs to cover for when or if a data source decides that a full copy can be saved by placing the origo in the lower left corner rather than the default upper left corner. For this reason, the script needs to react when the origo_ll flag flips.
Composite
This reads like “On start, load an image into layer 2, force-scale it to 64×64 and animate it moving up and down forever. Spawn a video decoding process that loops a user-supplied video and draw it translucent in a corner at 20% of the output size. Record the contents of the screen and the mixed audio output into a file as h264/mp3/mkv. Terminate if the ESCAPE key is pressed, otherwise forward all input”.
function composite(argv)
    setup_watermark(argv[1])
    setup_overlay(argv[2])
    setup_recording(argv[3])
    symtable = system_load("symtable.lua")()
end

function setup_watermark(fn)
    watermark = load_image(fn, 2, 64, 64)
    if not valid_vid(watermark) then
        return
    end

    show_image(watermark)
    move_image(watermark, VRESW-64, 0)
    move_image(watermark, VRESW-64, VRESH-64, 100, INTERP_SMOOTHSTEP)
    move_image(watermark, VRESW-64, 0, 100, INTERP_SMOOTHSTEP)
    image_transform_cycle(watermark, true)
end

function setup_overlay(fn)
    overlay = launch_decode(fn, "loop",
        function(source, status)
            if status.kind == "resized" then
                blend_image(overlay, 0.8)
                resize_image(overlay, VRESW*0.2, VRESH*0.2)
                order_image(overlay, 2)
            end
        end
    )
end

function setup_recording(dst)
    local worldcopy = null_surface(VRESW, VRESH)
    local buffer = alloc_surface(VRESW, VRESH)
    image_sharestorage(WORLDID, worldcopy)
    define_recordtarget(buffer, dst, "",
        {worldcopy}, {}, RENDERTARGET_DETACH, RENDERTARGET_NOSCALE, -1,
        function(source, status)
            if status.kind == "terminated" then
                print("recording terminated")
                delete_image(source)
            end
        end
    )
end

local function handler(source, status)
    if status.kind == "terminated" then
        return shutdown("", EXIT_SUCCESS)
    elseif status.kind == "resized" then
        resize_image(source, VRESW, VRESH)
        if status.origo_ll then
            image_set_txcos_default(source, true)
        end
    end
end

function composite_adopt(source, type)
    if type ~= "_stdin" then
        return false
    end

    client = source
    show_image(source)
    resize_image(source, VRESW, VRESH)
    target_updatehandler(source, handler)
    return true
end

function composite_input(iotbl)
    if iotbl.translated and symtable[iotbl.keysym] == "ESCAPE" then
        return shutdown("", EXIT_SUCCESS)
    end

    if valid_vid(client, TYPE_FRAMESERVER) then
        target_input(client, iotbl)
    end
end
Composite is a bit beefier than the two other steps, but some of the structure should be familiar by now. The addition of system_load simply reads/parses/executes another script, and the symtable.lua used here provides additional keyboard translation (which is how we can know which key is ESCAPE).
In setup_watermark, the things to note are the last two move_image commands and the image_transform_cycle one. The time and interpolation arguments tell the engine to schedule this as a transformation chain, and the transform_cycle says that when an animation step completes, it should be reinserted at the back of the chain. This reduces the amount of scripting code that needs to run to update animations, and lets the engine heuristics determine when a new frame should be produced.
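The same chaining applies to the other transforms as well. A short, purely illustrative sketch (not part of the appl above) that would instead pulse the watermark opacity forever:

-- queue two blend steps of 100 ticks each, then cycle the chain
blend_image(watermark, 0.3, 100)
blend_image(watermark, 1.0, 100)
image_transform_cycle(watermark, true)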
In setup_overlay, launch_decode is used to set up a video decoding process that loops a single clip. If the decoding process tries to renegotiate the displayed size, it will be forcibly overridden to 20% of the output width/height and set at 80% opacity.
The setup_recording function works similarly to setup_overlay, but uses the more complicated define_recordtarget, which is used to selectively share contents with another process. Internally, a separate offscreen rendering pipeline is set up with the contents provided in a table. The output buffer is sampled and copied or referenced from the GPU at a configurable rate and forwarded to the target client. In this case, the offscreen pipeline is populated with a single object that shares the same underlying datastore as the display output. The empty table that follows simply means that we do not add any audio sources to the mix.
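Since the record pipeline just takes a set of video objects, the same call can share a single client rather than the composited output. A hedged sketch (mirroring setup_recording, but with a hypothetical client vid as the only source):

local function record_single(client_vid, dst)
    local copy = null_surface(VRESW, VRESH)
    image_sharestorage(client_vid, copy) -- reference the client's backing store
    local buffer = alloc_surface(VRESW, VRESH)
    define_recordtarget(buffer, dst, "",
        {copy}, {}, RENDERTARGET_DETACH, RENDERTARGET_NOSCALE, -1,
        function(source, status)
            if status.kind == "terminated" then
                delete_image(source)
            end
        end
    )
end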
Final Remarks
I hope this rather long walkthrough has demonstrated some of the potential that is hidden in here, even though we have only scratched the surface of the full API. While the example presented above is, in its current form, very much a toy – slight variations of the same basic setup have been successful in a number of related application areas, e.g. surveillance systems, computer vision, visual performances, embedded-UIs and so on.
Even more interesting opportunities present themselves when taking into account that most connections can be dynamically rerouted, and that things can be proxied over networks with fine granularity, but that remains material for another article.