In this post, I will go through the current stages of work on a 3D and (optionally) VR desktop for the Arcan display server. It is tentatively called safespaces (Github link) as an ironic remark on the ‘anything but safe’ state of what is waiting inside. For the impatient, here is a video of it being used (Youtube link):
The explanation of what is going on can be found in the ‘High Level Use’ section further below.
Background and Motivation
One of the absolute main goals with the Arcan project as a whole is to explore different models for all the technical aspects that go into how we interact with computers, and to provide the infrastructure for reducing the barrier to entry for such explorations.
The overall ambition is that it should be ‘patching scripts’ levels of difficulty to piece together or tune a complete desktop environment to fit your individual fancy, and that major parts of such scripts could – with little to no modification – be shared or reused across projects.
For this reason, I’ve deliberately tried to avoid repeating or imitating the designs that have been used by projects such as Windows, Android, Xorg, OS X and so on. The reason was not to question or challenge their technical or business related soundness as such; those merits are already fact. Instead, the reason was to find things that were not exactly “sound” business – hence why this has all been kept as a non-profit, self-financed thing.
During the last few months, I’ve started unlocking engine parts that were intended for building VR environments, and many of the design choices had this idea as part of the implicit requirements specification very early on. The biggest hurdle has been hardware quality and availability, an area where we are finally starting to reach a workable level – the Vive/PSVR level of hardware is definitely “good enough” to start experimenting – yet terrible enough to not be taken seriously.
Arcan as a whole is in a prime position to do this “right”, in part because of its possible role as both a display server and a streaming multimedia processor / aggregator. Some of the details of the TUI subproject and the SHMIF IPC subsystem also fit into the bigger puzzle, as their implicit effect is to push for thinking of clients as interactive streams of different synchronised data types – rather than opaque pixmaps produced in monolithic ‘GUI toolkit thiefdoms’.
High level use and Demo
Starting with a shorter but slightly more crowded video from the same desktop session:
In the video, you see parts of the first window management scheme that I am trying out. Although the video is presented as monoscopic, the content itself is stereoscopic – 3D encoded video looks 3D when viewed on an HMD.
The window management scheme is a riff on the tiling window manager, mostly due to the current lack of more sophisticated input devices that would benefit from something more refined. Actual input is keyboard and mouse in this round, though there are ongoing experiments with building a suitable glove.
“Windows”, or rather, models – since you have a number of different shapes and mappings to choose from – are grouped in cylindrical layers. The user is fixed at the centre and the input focused window is positioned at 12 o’clock, with other sibling windows rotated to face the user (“billboarding”), scaled down and positioned around the layer geometry.
Each layer can either be arranged at a fixed distance, which suits heads-up display components and infinite geometry like skyboxes, or the layers can be swapped back and forth – with optional opacity fade-offs, level of detail triggers and so on based on layer distance.
In each layer you can attach various primitive models, from cylinders and spheres to rectangles, flat or curved. Each model can be assigned an external source, an image and an optional activation connection point. Individual properties, such as stereoscopic mapping modes, scale, opacity and so on can also be set, and the models can swap places with each other within the layer. Each layer can have a ‘layouter’ attached that automatically determines window positions and scale. The builtin default works like in this figure:
This shows a single layer with the user at a fixed position, dead center. The window that has input focus has the center slot at 12 o’clock, and the other windows are evenly laid out around a circle that matches the set per-layer radius. If the focus window spawns subwindows or hierarchically bound subconnections, like when a terminal starts a graphics program, such windows get positioned vertically.
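As a rough illustration of the default layout described above, here is a minimal Python sketch. It is not the safespaces layouter itself: the function name, the coordinate convention (user at the origin, looking along -z, with +x to the right) and the clockwise slot ordering are all assumptions made for the example, and the scaling of unfocused windows and the vertical stacking of subwindows are left out.

```python
import math

def layout_layer(window_count, radius, focus_index=0):
    """Place windows evenly on a circle around a user at the origin.

    Hypothetical helper, not part of safespaces. Returns one
    (x, z, angle_degrees) tuple per window. The focused window gets
    angle 0, which is straight ahead (12 o'clock) along -z.
    """
    if window_count <= 0:
        return []
    step = 360.0 / window_count
    slots = []
    for i in range(window_count):
        # Count slots clockwise starting from the focused window.
        angle = ((i - focus_index) % window_count) * step
        rad = math.radians(angle)
        x = radius * math.sin(rad)
        z = -radius * math.cos(rad)
        slots.append((x, z, angle))
    return slots
```

With four windows, a radius of 2 and the second window focused, the focused window lands straight ahead at (0, -2) and its next sibling lands to the right of the user at 90 degrees.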
The activation connection point is a unique name where the ARCAN_CONNPATH environment variable points. In the video you can see a designated ‘moviescreen’ appearing at the end when I redirect video there, and disappearing when the specific client disconnects.
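To make the connection point mechanism a bit more concrete, the following Python sketch builds the environment for a client so that ARCAN_CONNPATH points at a named connection point. The helper name is made up for the example, and the client command in the trailing comment is a placeholder rather than a real program; only the ARCAN_CONNPATH variable and the ‘moviescreen’ name come from the text above.

```python
import os

def client_environment(connection_point, base_env=None):
    """Return a copy of the environment with ARCAN_CONNPATH set.

    Hypothetical helper for illustration. base_env defaults to the
    current process environment; the input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ARCAN_CONNPATH"] = connection_point
    return env

# Example use (the client name is a placeholder, not a real binary):
#   import subprocess
#   subprocess.Popen(["some_arcan_client"],
#                    env=client_environment("moviescreen"))
```

A client started this way would connect to the model that registered ‘moviescreen’ as its activation connection point, assuming such a model exists in the running scene.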
A little twist is that safespaces was actually written as a tool (‘vrviewer’) for Durden – even though it is also a full window manager in disguise. The reason why I went this path is prototyping agility: taking advantage of the tons of existing code and features in Durden shortens the ‘edit, run, test’ cycle drastically. It also eliminates the headache of picking the right display for the right purpose and other low level, time consuming details – and you can move your workflows back and forth between the 3D/VR state and the 2D one.
The downside is that there is quite some overhead running it nested like this since Arcan also needs to take the normal desktop management into account, and there is some interference in the refresh rates of the different displays I have hooked up.
Setup and Architecture
There is a tool in the Arcan codebase that you have to build and enable, vrbridge (arcan_vr). It has some weird constraints since we definitely don’t want to do the device management in-process – yet the sampled data has a tiny window of opportunity for use. It requires extended, privileged features in the SHMIF API, so the engine has to launch and manage the program; instructions for enabling it are in the README.md file. The current version supports devices via OpenHMD, but it is trivial to interface with other VR-related APIs.
The design of the vrbridge interface is such that it allows for selective plugging of a wide range of devices (gloves, haptic suits, eye trackers, …). The scripting layer activates the VR bridge and gets announcements of ‘limbs’ arriving or disappearing. A limb gets activated by the script mapping the limb to a 3D model, and the engine will take care of synchronising limb orientation and so on at both a monotonic rate (for collision detection and response) and when preparing new output frames.
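The limb lifecycle can be modelled roughly as follows. This is a toy Python model of the flow described above and not the Arcan scripting API: the class, its method names and the limb and model identifiers are all invented for the illustration. The one behaviour it tries to capture is that a limb is announced first and only has its orientation synchronised once the script has mapped it to a model.

```python
class LimbTracker:
    """Toy model of limb announcement, mapping and synchronisation.

    All names here are hypothetical; this only mirrors the flow in
    the text, not any real engine interface.
    """

    def __init__(self):
        self.announced = set()   # limbs the bridge has reported
        self.models = {}         # limb id -> mapped model name
        self.orientation = {}    # limb id -> latest orientation

    def limb_added(self, limb_id):
        self.announced.add(limb_id)

    def limb_lost(self, limb_id):
        # A lost limb is forgotten entirely, including its mapping.
        self.announced.discard(limb_id)
        self.models.pop(limb_id, None)
        self.orientation.pop(limb_id, None)

    def map_limb(self, limb_id, model):
        if limb_id not in self.announced:
            raise KeyError("unknown limb: %r" % (limb_id,))
        self.models[limb_id] = model

    def apply_sample(self, limb_id, orientation):
        # Samples for limbs without a mapped model are dropped.
        if limb_id not in self.models:
            return False
        self.orientation[limb_id] = orientation
        return True
```

In this model, a sample for an unmapped limb is simply ignored, which is one plausible reading of "a limb gets activated by the script mapping the limb to a 3D model".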
The biggest problem right now is interfacing with devices and the vast array of possible pluggable input devices that extend the VR setup capabilities. It doesn’t help that practically everyone tries to push their own little lock-in happy frameworks and makes it hard (deliberately or by accident) to access the very few control primitives and sensor samples that are actually needed. As an added bonus, there is also “incentive” to merge with the next round of walled garden app-stores, because things weren’t bad enough as is.
This setup is still in its infancy, and the work so far has highlighted a few sharp corners that will need sanding, although the biggest eyesores (literally) – the quality of the 3D pipeline and the positional audio – will have to wait for a while until the relevant stages of the longer Arcan roadmap have been reached.
Another part of the reason is the low level of system integration and the high portability requirements that Arcan needs to follow; we are really restricted as to the set of GPU features that can be relied upon and purposely restrictive when it comes to introducing dependencies.
There are two big low level targets for the near future:
The first is improved asymmetric multi-GPU support. The challenge of this task scales drastically with what you actually do: sending textured quads back and forth is trivial, but then it goes bad quickly and nightmarish almost as fast. The two worst parts are effective load balancing and rewriting much of the synchronisation and storage management code to multithread better and take advantage of adaptive synchronisation outputs (FreeSync).
The second is fleshing out the interaction with clients so that there is intelligent level-of-detail, and the ability for the client to output 3D representations rather than just rendered pixel buffers – think mesh icons, voxel buffers etc. for server-side rendering in order to get better immersion and seamless, gradual transition / handover between desktop use and dedicated client use.