disclaimer: this is a technical post aimed at developers who are somewhat aware of the problem space. There will be a concluding ‘the day of…’ post aimed at end users, where some of the benefits will be demonstrated in a stronger light.
A few months back, I wrote a lighter post about an ongoing effort to reshape the venerable Linux/BSD CLI to be free of the legacy cruft that comes with having to emulate old terminal protocols, stressing the point that these protocols make the CLI less efficient and hard to work with from both a user and a developer perspective. In this post, we’ll recap some of the problems, go through the pending solution, update on the current progress and targets, and see what’s left to do.
To recap, some of the key issues to address were:
- Split between terminal emulator and command line shell breaks desktop integration – Visual partitions such as windows, borders and popups are simulated with characters, which end up unwanted in copy-paste operations and fail to integrate with an outer desktop shell (if any).
- Code/data confusion – both the terminal emulator and text-oriented user interfaces (TUIs) try to separate content from metadata using a large assortment of encoding schemes, all prone to errors and abuse, difficult to parse and riddled with legacy.
- Uncertain capabilities/feature-set – basic things like color depth, palette, character encoding schemes and so on are all probed through a broken mishmash of environment variables and capability databases, and the actual support varies with the terminal emulator being used (see the sketch after this list).
- Confusion between user-input and data – programs can’t reliably distinguish between interactive (keyboard) input, pasted/”IPC” input and other forms of data entry.
- Lack of synchronisation – this makes it impossible for the terminal emulator to know when it is supposed to draw, and signal propagation contributes to making resize operations slow.
- Crazy encoding schemes for representing non-character data – such as Sixel.
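To make the code/data and input/data confusion concrete, here is a small sketch (plain POSIX C, nothing Arcan-specific) of the kind of in-band probing a TUI has to resort to today: asking the emulator where the cursor is with an escape sequence, then fishing the reply out of stdin, where it arrives on the same channel as – and possibly interleaved with – whatever the user happens to be typing at that moment.

```c
/* Minimal sketch of the in-band probing criticised above: query the
 * cursor position (DSR, "\x1b[6n") and parse the reply from stdin,
 * where it shares the channel with keyboard input. */
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
	struct termios old, raw;
	tcgetattr(STDIN_FILENO, &old);
	raw = old;
	raw.c_lflag &= ~(ICANON | ECHO);      /* raw-ish mode so the reply isn't line buffered */
	tcsetattr(STDIN_FILENO, TCSANOW, &raw);

	write(STDOUT_FILENO, "\x1b[6n", 4);   /* Device Status Report: 'where is the cursor?' */

	char buf[32] = {0};
	ssize_t nr = read(STDIN_FILENO, buf, sizeof(buf) - 1); /* hope the reply is what we get */

	int row = -1, col = -1;
	if (nr > 0)
		sscanf(buf, "\x1b[%d;%dR", &row, &col); /* reply and user input are indistinguishable */

	tcsetattr(STDIN_FILENO, TCSANOW, &old);
	printf("cursor at row %d, col %d (if we were lucky)\n", row, col);
	return 0;
}
```

Every feature probe, paste, mouse event and status report goes through variations of this pattern, each with its own quirks per emulator.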
This just scratches the surface: it doesn’t go into related issues of user interaction and consistency, and it ignores the entire problem space of system integration when it comes to tty devices, input modes, virtual terminal switching and so on.
If you consider the entire feature-set of all the protocols that are already around and in use, you get a very “Cronenberg” take on a display server, and I, at least, find the eerie similarities between terminal emulators and the insect typewriters from Naked Lunch amusing, tragic and frightening at the same time; the basic features one would expect are there, along with some very unwanted ones, but pieced together in an outright disgusting way. If we also include related libraries and tools like curses and Turbo Vision, we get a clunky version of a regular point-and-click UI toolkit. Even though the scope is arguably more narrow and well-defined, these libraries are conceptually not far from the likes of Qt, GTK and Electron. Study Unicode and it shouldn’t be hard to see that ‘text’ is mostly graphics; the largest difference by far is the size of the smallest atom, and the biggest state-space explosion comes from saying ‘pixel’ instead of ‘cell’.
So the first question is: why even bother to do anything at all within this spectrum instead of just maintaining the status quo? One may argue that we can, after all, write good CLIs/TUIs using Qt running on Xorg today, no change needed – it’s just not the path people typically take; maybe it’s the paradigm of favouring mouse- or touch-oriented user interaction that is “at fault” here, along with favouring style and aesthetics over substance. One counterpoint is that the infrastructure needed to support the toolkit+display server approach is morbidly obese, running into millions of lines of code, when the problem space should be solvable within tens of thousands – but “so what, we have teraflops and gigabytes to spare!”. Ok, how about the individual investment of writing software? Accommodating disabilities? Attack surface? Mobility and mutability of produced output? Efficiency for a (trained) operator? Or when said infrastructure isn’t available? The list goes on.
There is arguably a rift here between those who prefer the ‘shove it in a browser’ approach or flashy UIs that animate and morph as you interact, and those who prefer staring into a text editor. It seems to me that the former category gets all the fancy new toys, while the latter mutters on about insurmountable levels of legacy. What I personally want is many more “one-purpose” TUIs, and for them to be much easier to develop. They need to be simpler, more consistent, obvious to use, and more configurable. That’s nice and dreamy, but how are “we” supposed to get there?
First, let’s consider some of the relevant components of the Arcan project as a whole, as the proposed solution reconfigures these in a very specific way. The following picture shows the span of the current components:
This time around, we’re only interested in the parts marked SHMIF, Terminal and TUI; everything else can be ignored. SHMIF is the structural glue/client IPC. TUI is a developer-facing API built on top of SHMIF, but with actual guarantees of being a forward/backward compatible API. Terminal is a vtXXX terminal emulator/state machine built using a modified and extended version of libtsm.
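To give a feel for the developer-facing side before going further, here is a rough sketch of what a minimal TUI client can look like. The function names follow the arcan_tui.h header (arcan_tui_open_display, arcan_tui_setup, arcan_tui_process, arcan_tui_refresh, …), but the exact signatures and struct fields shown here are from memory – treat them as assumptions to be checked against the real header, not as a definitive example.

```c
/* Sketch only: a "hello world" TUI client. Names follow arcan_tui.h,
 * but signatures/fields here are assumptions - verify against the header. */
#include <stdint.h>
#include <arcan_shmif.h>
#include <arcan_tui.h>

int main(int argc, char** argv)
{
/* event handler table (resize, key input, paste, ...) - defaults used here */
	struct tui_cbcfg cbcfg = {0};

/* resolve the connection (ARCAN_CONNPATH etc.) and build a drawing context */
	arcan_tui_conn* conn = arcan_tui_open_display("hello", "");
	struct tui_settings cfg = arcan_tui_defaults(conn, NULL);
	struct tui_context* ctx = arcan_tui_setup(conn, &cfg, &cbcfg, sizeof(cbcfg));
	if (!ctx)
		return 1;

/* draw into cells through the API instead of emitting escape sequences */
	arcan_tui_writeu8(ctx, (const uint8_t*) "hello, tui", 10, NULL);

/* block on events, flush updates when something has changed */
	for(;;){
		struct tui_process_res res = arcan_tui_process(&ctx, 1, NULL, 0, -1);
		if (res.errc != 0)
			break;
		arcan_tui_refresh(ctx);
	}

	return 0;
}
```

The point is not the specific calls, but that capabilities, input and drawing all go through one typed API rather than an in-band character stream.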
Inside the ‘Arcan’ block from the picture, we have something like this:
From this, we take the frameserver (IPC) block and put it into its own shmif-server library. We take the platform block and split it out into its own library, libarcan-abc. Terminal is extended to be able to use these two APIs, along with optional Lua/whatever bindings for the TUI API, so that the higher level shell/CLI logic with all its string processing ickiness can be written in something that isn’t C. This opens the door for two configurations. Starting with the more complex one, we get this figure:
Here, Arcan is used as the main display server, or hooked up to render using another one (there are implementations of the platform layer for both low-level and high-level system integration). The running ‘appl’ acts as the window manager (which can in practice be a trivial one that just runs everything fullscreen, or does alt+Fn VT-switching style management, in only a few lines of code), and it may spawn one or many instances of afsrv_terminal. These can be run in ‘compatibility mode’, where the emulator state machine is activated and it acts just like xterm and friends.
We can also run it in a simpler form:
In this mode, the terminal works directly with the platform layer to drive displays and sample input. It can even perform this role directly at boot if need be. An interesting property of shmif here is the support for different connection modes (which I’ll elaborate on in another post), where you can both interactively migrate and delegate connection primitives. This means that you can switch between these two configurations at runtime without data loss – and even have the individual clients survive and reconnect in the event of a display server crash.
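As a sketch of what that migration looks like from the client side: the arcan_shmif_migrate call is part of the shmif API, but the exact signature and status value used below are assumptions from memory and should be checked against the shmif headers.

```c
/* Sketch, not a verified example: move a live shmif connection over to a
 * new connection point (e.g. when the outer display server goes away or
 * the configuration is switched). Names and return values are assumptions. */
#include <stdbool.h>
#include <arcan_shmif.h>

static bool switch_connection(struct arcan_shmif_cont* cont, const char* newpath)
{
/* renegotiate the segment against the new connection point; client-side
 * state survives, only the 'other end' of the connection changes */
	return arcan_shmif_migrate(cont, newpath, NULL) == SHMIF_MIGRATE_OK;
}
```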
No matter the configuration, you (the ghost behind the shell) get access to all the features in shmif and can decide which ones should be used and which should be rejected. You are in control of the routing via the choice of shell (and of appl, for the complex configuration). Recall that the prime target now is local text-oriented command line interfaces – not changing or tampering with the awk | sed | grep | … flow; that’s an entirely different beast. In contrast to curses and similar solutions, this approach also avoids tampering with stdin, stdout, stderr or argv, because connection primitives and invocation arguments are inherited or passed via the environment. This should mean that retrofitting existing tools can be done without much in the way of ifdef hell or breaking existing code.
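As an illustration of that retrofit pattern: ARCAN_CONNPATH is the environment variable shmif clients actually use to find their connection point, while the draw_with_tui helper below is purely hypothetical.

```c
/* Sketch of the retrofit pattern: stdin/stdout/stderr and argv are left
 * untouched; the only question is whether a connection primitive was
 * handed down via the environment. draw_with_tui() is a hypothetical
 * placeholder for 'render through the TUI API instead'. */
#include <stdio.h>
#include <stdlib.h>

static void draw_with_tui(void)
{
/* hypothetical: set up a tui context and draw there instead of stdout */
}

int main(int argc, char** argv)
{
	if (getenv("ARCAN_CONNPATH"))
		draw_with_tui();
	else
		printf("plain stdout fallback, pipes and redirection keep working\n");

	return 0;
}
```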
Anyhow, most of this is not just vapours from some hallucinogenic vision: it has, in fact, already been implemented and been in testing for quite some time. What is being worked on now and for the near future is improving the quality of some of the existing stages, and adding:
- Double buffering at the virtual cell-screen level, to add support for sub-cell “smooth” scrolling, text shaping, BiDi, and non-monospaced, properly kerned text rendering.
- API and structures for designating regions (alt-screen mode) or lines (normal mode) for custom input, particularly mixing/composing contents from other tui clients or frameservers.
Then comes some more advanced refactoring:
- Shmif-server API still being fleshed out.
- Libarcan-abc platform split, as it depends on another refactoring effort.
- Lua bindings and possibly an example shell.
And more advanced “some time in the future” things:
- Shmif-server-proxy tool that can convert to/from a network- or pipe-passed ‘line format’ (protocol), to enable networking support and to test high-latency/packet-loss behavior.
- CPU-only platform rasteriser (the current form uses GL2.1+ or GLES2/3).
- Ports to more OSes (currently only Linux, FreeBSD, OSX).
Should all these steps succeed, the last ‘nail in the coffin’ will be to provide an alternative platform output target that undoes all this work and outputs a VT100-compliant mess again – all for the sake of backwards compatibility. That part is comparably trivial, as it is simply the end result of ‘composition’ (merging all the layers); it is the premature composition that is (primarily) at fault here, since information is irreversibly lost. It is just made worse in this case because the feature scope of the output side (a desktop computer instead of a dumb terminal) and the capabilities of the input side (the clients) are mismatched by the communication language.