This article introduces the first release of ‘Lash#Cat9’, a different kind of command-line shell.
A big change is that it communicates with the display server directly, instead of being restricted and filtered by a terminal emulator. The source code repository, with instructions for running it yourself, can be found here: https://github.com/letoram/cat9. A concatenation of all the clips here can be found in this (youtube-link).
Cat9 serves as the practical complement to the article ‘The day of a new command-line interface: shell’. That article covers the design/architectural considerations on a system level, as well as more generic advancements toward displacing the terminal emulator.
The rest of the article will work through the major features and how they came about.
A guiding principle is the role of the textual shell as a frontend instead of a clunky programming environment. The shell presents a user-facing, interactive interface to make other complex tools more approachable or to glue them together into a more advanced weapon. Cat9 is entirely written in Lua, so scripting in it is a given, but also relatively uninteresting as a feature — there are better languages around for systems programming, and better UI paradigms for automating work flows.
Another is that of delegation – textual shells naturally evolved without assuming a graphical one is present. That is rarely the case today, yet the language for sharing between the two is unrefined, crude and fragile. The graphical shell is infinitely more capable of decorating and managing windows, animating transitions, routing inputs and tuning pixels for specific displays. It should naturally be in charge of such actions.
Another is to make experience self-documenting – the emergent patterns in how you use command-line processing get extracted and remembered in a form where re-use becomes natural. Primitive forms of this are completions from command history and aliases, but there is much more to be done here.
I collected history from a few weeks of regular terminal use along with screen recordings of the desktop window management side. I then proceeded to manually sift through these, looking for signs of poor posture. I found plenty.
This is a humbling experience. The main conclusion drawn is that I am mostly a hapless twit who defaults to repeating the same things hoping for different outcomes. I consistently confuse ‘src’ and ‘dst’ for ‘ln -s’; ‘ls’ gets spelled ‘sl’ much too often; ifconfig remains the preferred choice over ‘ip’ even though its main output typically is ‘file not found’ these days; nearly every tool that expects regular expressions is first fed plaintext strings. When I actually want to use a regular expression I consistently pick the wrong expression language.
The signal-to-noise ratio in the history is abysmal. About 90% of scrollback contents were leftovers from cd, ls and tab completion, sprinkled with repeated runs of the same command through sudo, with minor tweaks to the arguments or to get a redirection for stderr. Redirections that were then left in the file system, with descriptive names like “boogeraids2000”.
The screen recordings were also revealing. Some notable time sinks:
- Copy paste across line-feeds and resizing windows to deal with incorrect wrapping.
- Spinning up new terminals to work around man or vim hogging the alt screen.
- Digging around in ps/proc/… for PIDs.
- Redirecting to temporary files to transfer job outputs between windows or for later comparison.
- Switching vim buffers between horizontal/vertical to fight the tiling WM.
All these can be fixed with relatively minor effort.
Get the prompt out of the way
Starting with the prompt – the obvious bits are that its contents should be ephemeral and disappear after running a command. It should reflect information about the current context (directory, etc.) and whatever else is of immediate, short-lived value. The point is to clean this up:
Instead we get this:
- Prompt is updated live regardless of input and can change its layout template dynamically.
- Prompt format and contents depend on window management state (focused, unfocused).
- Silent commands are kept away from the history.
- Completions come up without interaction and do not trample/shuffle actual contents.
- Commands that only resulted in errors are automatically purged after a delay.
The previous options for compartmentation came down to juggling a ‘foreground’ job and ‘background’ jobs. For this to work you needed either a fragile weave of signalling (SIGTSTP, …) and file redirections – or spinning up new terminals, either through a terminal multiplexer (a terminal emulator inside a terminal emulator inside …) or as new windows.
I find those solutions both noisy and distracting. Instead, I now have this:
- Every command-line submitted now becomes its own job.
- Jobs can reference each other.
- Job context (environment variables, working directory, …) is saved and tracked.
- The jobs are presented in order of importance (active ones take priority over passive ones).
- Spawning new jobs automatically folds old ones into a collapsed form.
- Individual controls, status and statistics are added to a stateful bar at the top of the job.
- Job contexts can be reused for new commands.
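To give a rough feel for job referencing at the prompt (the command lines and the annotations to the right are illustrative; only the `#n` reference form is actual cat9 syntax, as used elsewhere in the article):

```text
find . -name '*.c'    the submitted line becomes its own tracked job, say #2
#2 | grep main        a later command references job #2's retained output
```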
Remember everything, but with the right to be forgotten
In the terminal world, all job outputs either get composed onto one shared buffer with a certain amount of memory (scrollback history), fight for a scratchpad (“altscreen mode”), or are redirected to files or other jobs. This happens regardless of stream source or job state (foreground/background).
With real compartmentation and much larger memory and CPU budgets thanks to server side text rendering, we can do much better:
- Stdout and Stderr are tracked separately.
- All job output is kept, tracked and addressed individually.
- Contents can be forgotten, or selectively processed.
- Completed jobs can be repeated, appending to the existing output or replacing that of previous runs.
- Jobs can be repeated with an edited command-line.
Cooperate with the outer windowing system
Now that the shell can talk directly to the window manager without having the conversation dumbed down by a terminal emulator sitting in between, new integration options are possible:
- Snapshot the output of a job to a new window.
- Window creation hints to the window manager, like vertical split or tabbed.
- Open applications and media embedded, with controls for position and size.
- Detach and reattach embedded media, preserving input routing.
- Directly route contents to clipboard and other data sharing mechanisms.
- Trigger GUI file pickers.
Let legacy in
Now with a fairly functional environment, the last part is to account for all the edge cases where we still need access to the old world in various degrees:
- Send data from a job to external processing pipes (#0 | grep hi).
- Request a new window, attach a terminal emulator to it and run a pty dependent command (!vim).
- Set up a PTY and attach a VTxxx view to it (p! ls --color=yes).
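Taken together, the three escape hatches look like this at the prompt (the `#0 | grep hi`, `!vim` and `p!` forms are the ones from the list above; the annotations are mine):

```text
#0 | grep hi         pipe the retained output of job #0 into an external process
!vim                 open a new window, attach a terminal emulator, run vim
p! ls --color=yes    attach a PTY and a VTxxx view for colour escape sequences
```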
Streamline command structure
The foundation of cat9 is the command-line language itself. All the UI elements that you see, mouse gestures and key bindings map to the same things that you could type in manually:
- Hooks and event actions can be added after a command has been set up or is running.
- Mouse actions, bindings (clicking shown in clip: view #csel $=crow as in ‘cursor job, cursor row’).
- Aliases and pre-commit expansion.
With these basics sorted out, it is time to build something more interesting.
Special Topic: Views on Life
Now that jobs keep their data around in nicely tracked structures rather than a prematurely composed and broken ‘scrollback buffer’, we can do something more. While we have data in its raw form, we can look at it through various lenses to get different representations of the data. These are baked into the ‘view’ builtin.
Simply put, they parse the data and reformat the contents by adding annotations, structures, formatting and so on. The current builtin ones are all shown in this clip:
In this one you see ‘wrap’ and ‘filter’ along with some options like line numbers and column wrapping. Filter even goes so far as to have an interactive mode that live-applies the filter as it is being written.
With the original data retained, re-executing previous pipelines is not needed, and the choice between using the formatted output and the original data is available when copying in/out.
This is one of the features that will be expanded heavily in future versions as we try to improve the presentation of the many ad-hoc text formats.
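The lens idea – raw output retained, presentations derived on demand – can be sketched in a few lines of Python (a conceptual illustration only; cat9's actual views are written in Lua and are richer than this):

```python
# Conceptual sketch: job output is kept raw, and "views" derive
# presentations on demand without ever mutating the source data.
class Job:
    def __init__(self, lines):
        self.raw = list(lines)          # original output, always retained

    def view_filter(self, needle):
        # in the spirit of 'filter': present only matching lines
        return [l for l in self.raw if needle in l]

    def view_linenum(self):
        # in the spirit of the line-number option: annotate, don't alter
        return [f"{i:>4}  {l}" for i, l in enumerate(self.raw, 1)]

job = Job(["error: disk full", "ok: synced", "error: timeout"])
print(job.view_filter("error"))     # one presentation of the data
print(job.view_linenum()[0])        # another, from the same raw lines
assert job.raw[1] == "ok: synced"   # raw data untouched either way
```

Since nothing is destroyed, switching between lenses never requires re-running the pipeline that produced the data.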
Special Topic: State Actor
This is a good one. Regular windowing systems provide the clipboard as well as drag-and-drop as forms of interactive data sharing. Some go further and also allow sequenced picking/sharing, like the “share” button popular in mobile operating systems. Arcan adds a state store/restore action to the mix.
This means that at any point, the windowing system can request that a state snapshot is created, or request that the application reverts to a provided one.
Examples of what gets stored in such a state blob here are configuration changes; command history; environment variables; aliases and so on. While this offloads the ‘where are my dot files’ responsibility, more interesting is that states can be transferred between instances at runtime.
Combine this with the job system: by marking a job as persistent, the command creating a job will be added to the state store. In the following clip you can see it being used to an interesting effect:
I first start a new cat9 session, run two jobs and mark one as manually persistent and the other as automatic. After shutting down and restarting, you can see how the jobs come back, with the automatic one starting immediately. In the next clip I go one step further and copy the state between two live instances.
When combined with remote shells, this becomes a really potent administration and automation tool. Perform a task once; visually confirm that the results matched expectations; save the state and replay wherever and whenever. Use that for knowledge sharing, or hook it up to an event source for snapshotting and rollback to give anything history/undo.
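The snapshot/restore cycle itself is conceptually simple – the value is in what gets captured. A rough Python illustration (the actual Arcan state blob format is opaque to the outside; the keys below are examples drawn from the article, and the JSON encoding is made up for the sketch):

```python
import json

# Illustrative only: serialise the pieces of shell state the article
# mentions (history, env, aliases, persistent jobs) into one blob that
# a windowing system could store, transfer, or hand back later.
def snapshot(state):
    return json.dumps(state, sort_keys=True)

def restore(blob):
    return json.loads(blob)

session = {
    "env": {"EDITOR": "vim"},
    "aliases": {"ll": "ls -l"},
    "history": ["make", "make install"],
    # persistent jobs are recreated on restore; "auto" ones start at once
    "jobs": [{"cmd": "tail -f build.log", "persist": "auto"}],
}

blob = snapshot(session)   # e.g. when the WM requests a state snapshot
clone = restore(blob)      # e.g. inside another live instance
assert clone == session    # the session can be resurrected elsewhere
```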
Special Topic: Frontending
There is little consistency between many popular tools, no matter if they come as “argv hell”, “CLIs within the CLI” or “lots of small binaries”. This is natural, but also undesirable from a user perspective. It feels rather futile to have gone to the lengths of building a CLI that behaves like you want it to – just to have the work be undone by the tools you launch from it.
I am no stranger to uphill battles, but the odds of getting the likes of wpa_supplicant, git, gdb/lldb or ffmpeg to change their evil ways and follow the one true path are slim to none. The passive-aggressive way of dealing with this is what bash_completion and the likes do – create helper scripts that at least make polite suggestions while building the command line. This works poorly when the tool is interactive. Other options include defining better programmable interfaces or language-server-style external oracles, then hoping for the main drivers to convert.
With the extensive scripting, parsing and rendering options available to us now – there is a more actively aggressive way. In Cat9, you can define multiple sets of builtins and views, and switch between them. This means that you can create a set of builtins for a specific logical function, like networking, programming or debugging, then swap between those as needed.
This, along with views, will be the more active area being developed for future releases. The following short clip shows an early ‘in progress’ such set for networking.
In the clip you can see the set of builtins being swapped to ‘networking’, which adds new builtins such as ‘wifi’. You can see the live completion of available SSIDs appearing asynchronously as a scan completes. Commands can still be forwarded ‘raw’, with the output packaged into its own job that can be used by the other builtins. It can also attach polling status about signal levels and connection state to the prompt, using all the same infrastructure as the previous demonstrations.
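As a rough transcript of that interaction (the `builtin network` and `wifi connect` forms come up again in the discussion below; the SSID and annotations are made up):

```text
builtin network       swap the active set of builtins to the networking one
wifi connect mynet    SSID completions fill in live as the scan completes
```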
I hope this conveyed some of the benefits of leaving the shackles of terminal emulators, and their more abstract form of ‘virtualisation for compatibility through emulation as default’, behind. There are a whole lot more ideas to squeeze into this setup now that all the grunt work has been dealt with.
Better CLIs as part of better TUIs are key for making professional computing more accessible to budding sprout experts and the cognitively challenged alike. The building blocks are here for your ‘speech-assisted’ command-lines without having to have a screen reader try to make sense of a poorly segmented word soup, or for your red-team-approved secret “leave no trace” cleanup sauce.
The last article in this series will dip into the programmable surface – how the APIs replacing curses work and integrate with the display server / window manager.
Is it currently Arcan-only? Could it work in other environments? E.g. I use Plasma + i3wm.
Arcan can be built with SDL2 as its renderer; in that case it can run as a window in your Plasma environment, but several integration features will be diminished and performance will be slower (i.e. closer to how konsole would be).
TUI (the API this is written against) is not tied to Arcan itself; it is deliberately decoupled from how you write Arcan-native clients, so there could be an implementation for other display servers, though it is not something I’d spend time on.
This is really cool. I can’t help but feel like the compartmentation features with persistent results and (re)transformation scream out for object-based shell features rather than text – I mean, it’s the same logic you are using once you start thinking about ditching legacy shackles.
Leaving terminal emulators behind is one dimension, the other is leaving text streams behind, which is what things like powershell do.
I deliberately stayed away from doing too much with the command language itself; the lexer/parser was lifted straight out of Pipeworld for this reason.
This is partly why the “Lash” bit is there: it is the support structure for writing your own without having to deal with figuring out process control, asynch-I/O, WM integration etc.
Even jacking in your own parser for an OO view of things is possible in cat9; I just have to be very careful about picking my battles so the grander project can actually be finished some day 😉
Yeah I figured that was the case 🙂
Can’t climb every mountain at once.
Still it’s a nice thought.
Love it. One point regarding frontending I didn’t get, though.
When it comes to completion helpers and the like, will this project only go the actively aggressive way (providing builtins and the means for user-defined additions to those) or will it still also attempt to define a standard way of interaction between the app (which probably must mean more than just an executable file, then?) and the shell for the app to provide such things?
There is an interface for runtime probing / feedback in place as part of the ncurses replacement this is written in (arcan-tui), but it is not exposed or used (yet), as it would only apply if you are writing a new tool and actually follow it, and it would compete with similar, more generic efforts that oil-shell is looking into; I’d rather implement that.
The point of frontending is not a “static database of completion helpers” for single tools; as you can see with the networking/wifi one, it is actively probing the tools themselves. This approach is used elsewhere; an example would be how most IDEs use gdb: the CLI tool is actually running in a hidden PTY+emulator somewhere, and the editor maps this to source view, breakpoints and so on.
The real value is for two different groups:
1. the ones that have become “feature monsters”, basically programming languages in their own right by now (ffmpeg), where there are simplified recipes that are not accessible (and you go to stack overflow and cut and paste and …), or that have a logical grouping but are fractured into multiple binaries that need to cooperate/coordinate (debugging and networking).
2. the “nested CLIs” that force interactive use through a different prompt language (gdb as per the example above, sftp, …) that would break consistency in keybindings, colours and presentation.
Maybe I got it all wrong, or your response went over my head.
I was thinking about some interface/protocol/standard that’d enable the respective tool’s author to also provide functionality aiding in command-line composition at composition time. Like validating what the user has written so far, suggesting completions at the cursor position (informed by system state maybe, or, in the case of options, annotated with comments), or providing further live information that might be relevant, for the shell to present in some way.
So I guess some standardized way to inform shells, where to find some function that takes the current command line portion relevant to some specific tool-about-to-be-invoked and spits out something useful in some format or the other.
Are there attempts to establish mechanics for something like this?
Are the oil-shell efforts you mentioned about that? I looked over there, but I couldn’t find it, or couldn’t recognize it.
Yes (to both: oil having a sketch somewhere, though I can’t recall where, and the response going over your head) – there is a way for the shell to launch an application, detect that it supports arcan-tui (~ncurses) and, over that, communicate a set of supported keys/values/types/descriptions for the shell to use, and for the shell to provide them (without restarting) over another channel (packed in an ENV or over a socket). It is not passed over argv in order to avoid breaking compatibility with the many commands that expect things over argv.
‘2. the “nested CLIs” that force interactive use through a different prompt language (gdb as per the example above, sftp, …) that would break consistency in keybindings, colours and presentation.’
What would that look like?
Some kind of layer in between, that in the background uses the tool in interactive mode, but to the user looks like cat9 with an extended command set?
Yes, that is what is happening in the demo clip with “builtin network” and “wifi connect” – in the background it connects to wpa_supplicant, creates networks, scans, etc., and at the same time any commands not matching the set of current builtins are forwarded to wpa_supplicant itself.
This is fantastic stuff. Really innovative. I look forward to seeing where it goes.