This article presents an interpretation of the history surrounding the ability of X clients to interact with X servers running on other machines: the recent arguments that this ability is defunct and broken, the problems with the feature itself, what it was, what happened along the way, and where things seem to be heading.
The high-level summary of the argument herein is that there is validity to the claims that, to this very day, there is such a thing as network transparency in X. It exists at a higher level than streaming pixel buffers, but has a diminishing degree of practical usability and interest. Its technical underpinnings are fundamentally flawed, dated and criminally inefficient. Alas, similarly dated (VNC/RFB) or perversely complex (RDP) solutions are far from reasonable alternatives.
What are the network features of X?
If you play things strictly, all of X is. That is the very point of having a client/server protocol and not an API. Communication goes across hard system barriers, and data packets need to consider things like endianness, packet loss and remote addressing, while the state machine(s) need to account for parameters that are fairly invisible locally. Some examples of such parameters would be the big sporadic delays caused by packet corruption and retransmission, a constantly high base latency (100+ ms) and buffer back-pressure (clients keep sending new frames and commands at a high rate, accumulating into local buffers, like stepping on a garden hose and watching the bubble grow).
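As a concrete illustration of the endianness point: X lets the client announce its own byte order in the very first byte of the connection setup request, and the server adapts. A sketch of the 12-byte setup message (no authentication data) in both encodings:

```python
import struct

# X11 connection setup request (xSetupReq, 12 bytes, no auth data):
# byte-order byte, pad, protocol major/minor, auth name/data lengths, pad.
# ord('B') = most significant byte first, ord('l') = least significant first.
setup_le = struct.pack("<BxHHHHxx", ord('l'), 11, 0, 0, 0)
setup_be = struct.pack(">BxHHHHxx", ord('B'), 11, 0, 0, 0)

# Same protocol version (11.0), opposite wire encodings of the CARD16 fields.
```

Both messages mean the same thing; the burden of decoding either order falls on the server, one of many costs an API-only design never has to pay.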
The interplay between versions and revisions matters much more in protocol design than in API design, unless you go cheap and reject any client-server version mismatch (don't do that). The consequences of the tradeoffs between synchronous (processing blocked waiting for a reply) and asynchronous forms of communication also become highly visible.
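A back-of-the-envelope model shows why the synchronous form is so punishing at the 100+ ms latencies mentioned above. The numbers here are assumptions for the sketch, not measurements:

```python
# Illustrative model: N requests over a link with a fixed round-trip time.
rtt_s = 0.1        # 100 ms round trip, the base-latency figure used above
n_requests = 50    # e.g. a burst of property queries at client startup

# Synchronous: each request blocks until its reply arrives.
sync_total_s = n_requests * rtt_s        # ~5 seconds of pure waiting

# Asynchronous/pipelined: fire everything, then wait for the last reply
# (ignoring bandwidth and server processing time for simplicity).
pipelined_total_s = rtt_s                # ~0.1 seconds
```

The fifty-fold difference is why round-trip-heavy clients feel fine locally and unusable remotely.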
Let us go back to X. The real (and only) deal for X networking is in its practical nature; the way things work from a user standpoint. In the days of old, one could simply speak the following incantation:
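The exact form varied with setup, but a sketch of the classic incantation (the address is a placeholder for illustration) would be:

```shell
# On the remote machine: point DISPLAY at the X server running on the
# machine in front of you, then launch a client -- its window appears
# on your local screen.
DISPLAY=my.workstation.example:0 xeyes
```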
Utter it and you can have its very soul stare back at you through heavily aliased portals. This was practically similar to the local “DISPLAY=:0” form often snuck into your terminal environment from some profile file, hence transparent.
In the way it worked, the remote display thing assumed you were OK with anyone being able to listen in "on the wire" and do all kinds of nasty things with the information gathered. Pixel buffers were not compressed, so when they became too numerous or large, the network was anything but happy. It was good only through the rose-tinted glasses of nostalgia, and for a local area network: your home, school or business; certainly not across the internet.
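To put numbers on "anything but happy" (the resolution is an assumed modern example; the arithmetic is the point):

```python
# Uncompressed pixel buffers at a common desktop resolution.
width, height, bytes_per_pixel, fps = 1920, 1080, 4, 60

bytes_per_frame = width * height * bytes_per_pixel   # 8,294,400 bytes
bytes_per_second = bytes_per_frame * fps             # 497,664,000 bytes
gigabits_per_second = bytes_per_second * 8 / 1e9     # ~4 Gb/s
```

Roughly 4 Gb/s for one full-rate 1080p window, before protocol overhead, on links that for most of X's history topped out at 10 or 100 Mb/s.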
It also assumed that the X server itself was not started with the "-nolisten tcp" argument set, or that you were using the better option of letting SSH configure forwarding, introduce compression and provide otherwise preferential treatment, like disabling Nagle's algorithm. Even then, you had to be practically fine with the idea that some of your communication could be deduced from side-channel analysis and so on.
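A sketch of the SSH route, as a configuration fragment (the host name is a placeholder; trusted forwarding via `ForwardX11Trusted` carries security implications of its own):

```shell
# ~/.ssh/config -- illustrative fragment, "devbox" is a placeholder
Host devbox
    ForwardX11 yes     # sets DISPLAY on the remote end, tunnels X traffic
    Compression yes    # helps somewhat with uncompressed pixel transfers

# One-off equivalent on the command line:
#   ssh -X -C devbox xterm
```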
Digressing a bit on the security angle: in spite of speculative-execution vulnerabilities (Spectre, …) being all the rage yet hard to exploit practically at scale, the side-channel thing is highly relevant in much more modern protocols and efforts in this 'web browser as the reinvention of thin clients' day and age. There is low-hanging fruit to be found here, but let's move on before I say too much.
Those details aside, this was a workable scenario for a long time, even for relatively complex clients like that of Quake 3. The reason being that even GLX, the X-related extension to OpenGL, only had local 'direct rendering' as an optional thing. But that was about the tipping point on the OpenGL timeline where the distance between locally optimal rendering and remote optimal rendering became much too great, and the large swath of developers and users in charge largely favoured the locally optimal case for desktop-like workloads.
The big advantage non-local X had over other remote desktop solutions, of which there are far too many, is exactly this part. As far as the pragmatic user cared, the idea of transparency (or should it be translucency?) was simply being able to say "hey you, this program, and only this program on this remote machine, get over here!".
The principal quality was the relative seamlessness of the entire set of features on a per-window basis, and that goes unmatched to this very day. But with every 'integrated desktop environment' advancement, the feature grows weaker and the likelihood of applications working tolerably like this decreases drastically.
So what happened?
An unusably short answer would be: the convergence of many things happened. A slightly longer answer can be found here: X’s network transparency has wound up mostly being a failure. My condensed take is this:
The evolution of accelerated graphics happened, or the 'Direct Rendering Infrastructure' (DRI), as it is generationally referenced in the Xorg and Linux ecosystem. Applications started to depend heavily on network-unfriendly IPC systems that were used in addition to X rather than in cooperation with it. You wanted sound to go with your application? Sorry. Notification popups going to the wrong machine? Oops, D-Bus! And so on.
This technical development is what one side of the argument is poking at when they go 'X is not network transparent!', while the other side is quick to retort that they are, in fact, running emacs over X on the network to this very day. Try it for yourself: it is not that the mechanisms have suddenly disappeared, and it should be a short exercise to gain some experience. From my own experiments just prior to writing this article, the results varied wildly, from pleasant to painful, depending on the toolkit.
Thus far, I have mostly painted a grim portrait, yet there are more interesting sides to this, namely Xpra and X2go. X2go addresses some of the shortcomings in ways that still leverage parts of X without falling back to the lowest "no way out" common denominator of sending an already composited framebuffer across the wire. It does so by using a custom X server with a different line protocol for external communication, and a carrier for adding in sound, among other things. Try it out, experiment with it! It is pretty neat.
Alas, this approach also falls flat when it comes to accelerated composition past a specific feature set, which can be seen in the compatibility notes in the documentation. That aside, X2go is still both actively developed and actively used. The activity on mailing lists, IRC and gatherings all acts as testament to the relevance of the feature in its current form, from both a user and a developer perspective.
What does the future hold?
So outside of succumbing to the web browser (and possibly Electron as its other springboard), what options are there?
Let's start with the 'design by committee' exercise that is Wayland and use it as an indicator of things that might become twisted reality.
From what I could find, there is a total of one good blog post/PoC that, in stark contrast to the rambling fever dreams of most forum threads on the subject, technically experiments with remoting that is transparent in the sense of "a client connecting/bridged to a remote server", and not opaque in the sense of "a server compositing and translating n clients to a different protocol". Particularly note the issues around keyboard handling and descriptor passing. Those are significant, yet still only the tip of a very unpleasant iceberg.
The post itself does a fair job providing notes on some of the problems, and you can discover a few more for yourself if you patch or proxy the Wayland client library implementation to simulate various latencies in the buffer dispatch routine. Enjoy troubleshooting why clients get disconnected or crash sporadically. It turns out that reliably testing asynchronous, event-driven implementations is hard, and not that much effort is being put into the toolkit backends for Wayland; too bad most of the responsibilities have been pushed to those backends in order to claim that the server side is so darn simple.
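A minimal sketch of such a proxy, assuming a Unix-socket display. Note that plain recv/send silently drops the SCM_RIGHTS file descriptors Wayland passes for buffers and keymaps, which is itself one of the failure modes you will observe:

```python
import socket
import threading
import time

def latency_proxy(listen_path, target_path, delay_s):
    """Accept one client on listen_path and relay bytes to/from
    target_path, sleeping delay_s before forwarding each chunk.
    Caveat: recv()/sendall() drops ancillary data (SCM_RIGHTS),
    so fd-passing clients will misbehave on top of the added latency."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(listen_path)
    srv.listen(1)
    client, _ = srv.accept()
    upstream = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    upstream.connect(target_path)

    def pump(src, dst):
        while True:
            data = src.recv(4096)
            if not data:
                break
            time.sleep(delay_s)     # simulated one-way network latency
            dst.sendall(data)
        dst.close()

    threading.Thread(target=pump, args=(client, upstream),
                     daemon=True).start()
    pump(upstream, client)
```

Pointing WAYLAND_DISPLAY at the proxy socket and varying `delay_s` is enough to surface sporadic disconnects in some toolkits.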
The reason I bring this up is that what will eventually happen is alluded to in the Wayland FAQ:
This doesn’t mean that remote rendering won’t be possible with Wayland, it just means that you will have to put a remote rendering server on top of Wayland. One such server could be the X.org server, but other options include an RDP server, a VNC server or somebody could even invent their own new remote rendering model.
The dumbest thing that can happen is that people take it for the marketing gospel it is, and actually embed VNC on the compositor side. I tried this out of sheer folly back in ~2013 and the experience was most unpleasant.
RFB, the underlying protocol in 'VNC', is seriously terrible, even if you factor in the many extensions, proprietary as well as public. Making fun of X for having a dated view of graphics and in the next breath considering VNC has quite an air of irony to it. Its single quality is the inertia of clients being available on nearly every platform. At least the public part of the protocol (RFC 6143) is documented in such a coherent and beautiful way that it puts the soup of scattered XML files and TODO-sprinkled PDFs that is "modern" Wayland forever in the corner.
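For a taste of that coherence, the entire version handshake is one fixed 12-byte ASCII message; a parsing sketch following RFC 6143, section 7.1.1:

```python
def parse_rfb_version(msg: bytes):
    """Parse an RFB ProtocolVersion message (RFC 6143, section 7.1.1):
    exactly 12 bytes of the form b"RFB xxx.yyy\n"."""
    if (len(msg) != 12 or msg[:4] != b"RFB "
            or msg[7:8] != b"." or msg[11:12] != b"\n"):
        raise ValueError("not an RFB ProtocolVersion message")
    return int(msg[4:7]), int(msg[8:11])
```

That is the whole of it; compare the page count needed to pin down the equivalent stage of most other remoting protocols.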
The counterpoint to that quality is that RFB implementations have subtle incompatibilities with each other, so you do not know which features can be relied on when, or to what extent; assuming the connection does not just terminate during the handshake. This was, as an example, the case for many years when Apple's VNC client or server connected to one not written by Apple.
The second dumbest thing is to use RDP. It has features. Lots of them. Even a printer server, a USB server and file system mount translation. Heck, all the things that Xorg was made fun of for having are in there, and then some.
The reverse-engineered implementation of this proprietary Microsoft monstrosity, FreeRDP, is about the code size of the actually used parts of Xorg, give or take some dependencies. In C. In network-facing code. See where this is heading? Embed that straight into your privileged Wayland compositor process and I will just sit here in silence and enjoy the fireworks.
The least bad available technology to try and get in there would be the somewhat forgotten SPICE project, which is currently ‘wasted’ as a way of integrating and interacting with KVM/Qemu.
Rounding things off: the abstract point of the 'VNC' argument is, of course, the core concept of treating client buffers solely as opaque texture bitmaps relative to an ordered stream of input and display events, not the underlying protocol as such. The core of the argument is that networked 'vector' drawing is defunct and dead or dying. The problem with that argument is that it is trivially shown to be false, well illustrated by the web browser, which shows some of the potential. It is only partially right in the X case, as X2go shows that there is validity to proper segmentation of the buffers, so that the networking layer can optimise and choose compression, caching and other transfer parameters based on the actual non-composited contents.
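A hypothetical sketch of that per-buffer decision; the buffer kinds and codec names here are invented for illustration, not taken from X2go or any other implementation:

```python
def pick_codec(buffer_kind: str) -> str:
    """Hypothetical transfer policy: with non-composited, typed buffers
    the transport can pick compression per content type, something a
    flat, already-composited framebuffer makes impossible."""
    policy = {
        "text":  "lossless-rle",   # crisp glyphs, highly repetitive
        "video": "lossy-dct",      # already noisy, tolerates loss
        "ui":    "lossless-png",   # mixed, mostly static content
    }
    return policy.get(buffer_kind, "raw")
```

Once everything is composited into one opaque bitmap, this choice collapses to a single codec for the whole screen.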
If you made it this far and want to punish yourself extra, visit or revisit this forum thread. Since the network transparency option in Arcan is nearing a usable state, next in line will be a follow-up to this article.