This article presents an interpretation of the history surrounding the ability of X clients to interact with X servers running on other machines over a network. It covers the recent arguments that this ability is defunct and broken, the problems with the feature itself, what it was, what happened along the way, and where things seem to be heading.
The high-level summary of the argument herein is that the claims have merit: to this very day, there is such a thing as network transparency in X. It exists on a higher level than streaming pixel buffers, but its practical usability and interest are diminishing. Its technical underpinnings are fundamentally flawed, dated and criminally inefficient. Alas, similarly dated (VNC/RFB) or perversely complex (RDP) solutions are far from reasonable alternatives.
What are the network features of X?
If you play it strictly, all of X is. That should be the very point of having a client/server protocol rather than an API/ABI.
Protocol vs. API/ABI tangent: Communication that travels across hard system barriers needs to consider things like differences in endianness, loss in transit and remote addressing. Meanwhile, the abstract state machine(s) need to account for parameters that are fairly invisible locally. Examples of such parameters are the big sporadic delays caused by packet corruption and retransmission, and a constantly high base latency (100+ ms). Another is buffer back-pressure: clients keep sending new frames and commands beyond the available bandwidth of the communication channel, and these accumulate in local buffers, like stepping on a garden hose and watching the bubble grow. The interplay between versions and revisions also tends to matter more in protocol design than in API design, unless you go cheap and reject any client–server version mismatch.
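To make the garden-hose image concrete, here is a minimal Python sketch of one way a sender can refuse to let that bubble grow. It uses a bounded queue that throws away stale full-frame updates once the link falls behind. This is a hypothetical illustration: the class and its parameters are made up for this example and are not taken from X or any other display protocol. It only works because each frame is assumed to fully replace the previous one; commands with side effects could not be dropped like this.

```python
from collections import deque
from typing import Deque, Optional


class CoalescingFrameQueue:
    """Bounded send queue for full-frame updates (hypothetical sketch).

    Assumes every frame completely replaces the previous one, so when the
    link cannot keep up, stale frames can be discarded instead of piling up.
    """

    def __init__(self, max_pending: int = 2):
        self.max_pending = max_pending
        self.pending: Deque[bytes] = deque()
        self.dropped = 0

    def push(self, frame: bytes) -> None:
        # Producer side: the client renders faster than the link drains.
        if len(self.pending) >= self.max_pending:
            self.pending.popleft()  # discard the oldest, already stale frame
            self.dropped += 1
        self.pending.append(frame)

    def pop(self) -> Optional[bytes]:
        # Consumer side: called whenever the socket can accept more data.
        return self.pending.popleft() if self.pending else None
```

Pushing five frames into a queue with max_pending=2 discards the first three, leaving only the two newest to send. Memory stays bounded at the cost of skipped frames, which is usually the right trade for display updates but never for input or state-changing requests.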
Back to X: The real (and only) deal for X networking is in its practical nature; the way things work from a user standpoint. In the days of yore, one could simply chant the following incantation:
Should the gods be willing, you would have its very soul stare back at you through heavily aliased portals. The only difference from the local version was the change from the “DISPLAY=:0” form; other than that, everything was transparent to the user.
Now, the some.ip:port form assumed you were OK with anyone between you and the endpoint being able to listen in “on the wire”, possibly doing all kinds of nasty stuff with the information in transit. Strictly speaking, the number after the colon is a display number, served on TCP port 6000 plus that number. To add insult to injury, pixel buffers were not compressed, so when they became too numerous or too large, the network was anything but happy. The feature was really only ever ‘good’ through the rose-tinted glasses of nostalgia on a local area network: your home, school or business; certainly not across the internet.
The form above also assumes that the X server itself had not been started with the “-nolisten tcp” argument. The alternative, and the better option, was letting an SSH client configure forwarding, add compression and provide other preferential treatment such as disabling Nagle’s algorithm. Even then, you had to be practically fine with the idea that some of your communication could be deduced from side-channel analysis (hint: even your keypresses look very distinct on a plot of packet size over time), and so on. Details like these also put a bit of a dent in the ‘transparent to the user’ idea.
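For reference, here is a simplified Python sketch of how a DISPLAY string maps to a network endpoint. It assumes the conventional TCP port of 6000 plus the display number. It is illustrative only, and the names are hypothetical. It ignores the protocol/ prefix, IPv6 literals and special host names (such as “unix”) that real Xlib/xtrans handle, and it treats any non-empty host as TCP.

```python
import re
from typing import NamedTuple, Optional

X_TCP_BASE_PORT = 6000  # display :n is conventionally served on TCP port 6000 + n

_DISPLAY_RE = re.compile(r"(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?")


class DisplayTarget(NamedTuple):
    host: Optional[str]      # None: local display, normally a Unix socket
    display: int
    screen: int
    tcp_port: Optional[int]  # None when no TCP connection is implied


def parse_display(value: str) -> DisplayTarget:
    """Split a DISPLAY value such as some.host:1.0 into its parts (simplified)."""
    m = _DISPLAY_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a supported DISPLAY value: {value!r}")
    host = m["host"] or None
    display = int(m["display"])
    screen = int(m["screen"] or 0)
    # In this sketch, only a named host implies TCP.
    port = X_TCP_BASE_PORT + display if host else None
    return DisplayTarget(host, display, screen, port)
```

With SSH forwarding, the remote side typically sees something like localhost:10.0, which this sketch maps to port 6010 on the loopback interface. There, the SSH daemon, not an X server, is listening.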
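For the curious, here is a minimal Python sketch of that plain TCP route. It opens a connection with Nagle’s algorithm disabled (TCP_NODELAY) and sends the 12-byte X11 connection setup request. The request’s first byte declares the byte order for every multi-byte field that follows, which is one concrete instance of the endianness concern in the protocol tangent. The sketch assumes a server that was not started with “-nolisten tcp”. It sends empty authorization data and stops before reading the reply, and the function names are hypothetical. Everything on this path travels unencrypted.

```python
import socket
import struct
import sys


def _pad4(data: bytes) -> bytes:
    # X11 pads variable-length fields to a multiple of 4 bytes.
    return data + b"\x00" * (-len(data) % 4)


def build_setup_request(auth_name: bytes = b"", auth_data: bytes = b"") -> bytes:
    """X11 connection setup request, encoded in this machine's byte order."""
    # The first byte tells the server how to decode every multi-byte field
    # that follows: 0x42 ('B') means MSB first, 0x6C ('l') means LSB first.
    if sys.byteorder == "little":
        order_byte, fmt = b"l", "<"
    else:
        order_byte, fmt = b"B", ">"
    # Protocol version 11.0, then the lengths of the two auth fields.
    header = order_byte + b"\x00" + struct.pack(
        fmt + "HHHH2x", 11, 0, len(auth_name), len(auth_data))
    return header + _pad4(auth_name) + _pad4(auth_data)


def connect_x_tcp(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a plain, unencrypted TCP connection and send the setup request."""
    sock = socket.create_connection((host, port), timeout=timeout)
    # Disable Nagle's algorithm so small requests are written immediately
    # instead of being held back until earlier data has been acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.sendall(build_setup_request())
    return sock
```

Against a real server, the port would be 6000 plus the display number. The first byte of the server’s reply is 1 on success, 0 on failure, and 2 if further authentication is required.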
Those details notwithstanding, this was a workable scenario for a long time, even for relatively complex clients like the venerable Quake 3. The reason is that even GLX, the X extension for OpenGL, treated local ‘direct rendering’ as optional. Without it, GL commands were sent as part of the X protocol stream (‘indirect rendering’). But that was about the tipping point on the OpenGL timeline where the distance between locally optimal and remotely optimal rendering became much too great. The large swath of developers and users in charge largely favoured the locally optimal case for desktop-like workloads.
The big advantage non-local X had over other remote desktop solutions, of which there are far too many, is exactly this part. As far as the pragmatic user was concerned, the idea of transparency (or should it be translucency?) was simply to be able to say “hey you, this program, and only this program on this remote machine, get over here!”.
The principal quality was the relative seamlessness of the entire set of features on a per-window basis, and that, sadly, goes unmatched to this very day. But with every ‘integrated desktop environment’ advancement, the feature grows weaker. The likelihood of applications being usable like this, partially or even at all, decreases drastically.
So what happened? An unusably short answer would be: the convergence of many things. A slightly longer answer can be found here: X’s network transparency has wound up mostly being a failure. My condensed take is this:
Accelerated graphics evolved, in the form of the ‘Direct Rendering Infrastructure’ (DRI), as its generations are referred to in the Xorg and Linux ecosystems. Applications started to depend heavily on network-unfriendly IPC systems that were used as a sideband to X rather than in cooperation with it. You wanted sound to go with your application? Sorry. Notification popups going to the wrong machine? Oops, now you need D-Bus! And so on.
This technical development is what one side of the argument is poking fun at when they say ‘X is not network transparent!’. The other side is quick to retort that they are, in fact, running Emacs over X on the network to this very day. The easy answer is to try it for yourself: the mechanisms have not suddenly disappeared, so it should be a short exercise to gain some practical experience. In my own experiments just prior to writing this article, the results varied wildly from pleasant to painful depending on how the application and its toolkit were written.
Thus far, I have mostly painted a grim portrait, yet there are more interesting sides to this, namely XPRA and X2go. X2go addresses some of the shortcomings in ways that still leverage parts of X. It avoids falling back to the lowest “no way out” common denominator of sending an already composited framebuffer across the wire. It does so by using a custom X server with a different line protocol for external communication, plus a carrier for adding in sound, among other things. Try it out! It is pretty neat.
Alas, this approach also falls flat when it comes to accelerated composition beyond a specific feature set, as the compatibility notes in the documentation show. That aside, X2go is still very actively developed and used. The activity on mailing lists, IRC and gatherings all act as testament to the relevance of the feature in its current form, from both a user and a developer perspective.
What does the future hold?
So, short of succumbing to the web browser (and possibly bastardised versions of it like ‘Electron’) as the other springboard, what options are there?
Let’s start with the ‘design by committee’ exercise that is Wayland, and use it as an indicator of things that might become a twisted reality.
From what I could find, there is a total of one good blog post/PoC that experiments technically with this. It stands in stark contrast to the rambling fever dreams of most forum threads on the subject. It explores transparency in the sense of “a client connecting/bridged to a remote server”, as opposed to opacity in the sense of “a server compositing and translating n clients to a different protocol”. Note in particular the issues around the keyboard and descriptor passing. Those are significant, yet still only the tip of a very unpleasant iceberg.
The post itself does a fair job of noting some of the problems. You can discover a few more for yourself if you patch or proxy the Wayland client library implementation to simulate various latencies in the buffer dispatch routine. Sprinkle a few “timesleeps” in there, then enjoy troubleshooting why clients get disconnected or crash sporadically. It turns out that testing asynchronous, event-driven implementations reliably is really hard, and not enough effort is being put into toolkit backends for Wayland. Too bad most of the responsibilities have been pushed to the toolkit backends in order to claim that the server side is so darn simple.
That is not to say that it cannot be done, of course; the linked blog post showed as much. The issue is the chasm between two points: (a) the “basic” proxy server or patched support libraries writing over a socket, even with some video compression, and (b) even the level of X2go, with the aforementioned problems. Crossing it is a daunting task. Then you would still have to fight the sharp corners of queueing around back-pressure so that data-device (clipboard) actions do not stall everything. There are also the usability problems from D-Bus-dependent features breaking, audio not being paired, synced and resampled to the video it is tied to, and so on.
The reason I bring this up is that what will eventually happen is alluded to in the Wayland FAQ:
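To illustrate the descriptor-passing problem: Wayland clients typically hand buffers to the compositor as file descriptors attached to Unix socket messages (SCM_RIGHTS), and a descriptor number means nothing on another machine. The minimal Python sketch below requires Python 3.9+ for socket.send_fds and socket.recv_fds, and is Unix only. It uses a temporary file as a stand-in for a shared-memory pool and a made-up 4-byte size header, not real Wayland wire format. It shows the step a network proxy is forced into: reading the buffer contents and shipping those bytes instead of the descriptor. The function names are hypothetical.

```python
import os
import socket
import tempfile


def send_buffer_locally(sock: socket.socket, pixels: bytes) -> None:
    """Client side: pass a buffer by descriptor over a Unix socket."""
    # A temporary file stands in for a shared-memory pool. Only the descriptor
    # and a small (made-up) 4-byte size header travel over the socket.
    with tempfile.TemporaryFile() as pool:
        pool.write(pixels)
        pool.flush()
        header = len(pixels).to_bytes(4, "little")
        socket.send_fds(sock, [header], [pool.fileno()])
    # Closing our copy is fine: the receiver gets its own duplicate descriptor.


def serialize_for_network(sock: socket.socket) -> bytes:
    """Proxy side: turn a received descriptor into bytes that can cross a network."""
    header, fds, _flags, _addr = socket.recv_fds(sock, 4, 1)
    size = int.from_bytes(header, "little")
    try:
        # The descriptor number is only meaningful inside this process on this
        # machine, so the actual contents have to be read and shipped instead.
        # pread at offset 0, because the shared file offset sits at the end.
        return header + os.pread(fds[0], size, 0)
    finally:
        for fd in fds:
            os.close(fd)
```

Locally, only a few bytes cross the socket. Once a network is involved, every byte of every buffer has to travel, every time, and that is where compression and caching start to matter.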
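A crude version of the “timesleeps” experiment can be done as a relay between two sockets rather than a patched library. The sketch below is an assumption-laden illustration, not a Wayland tool. It forwards plain bytes only, so on a real Wayland socket any passed file descriptors would not be forwarded. The delay parameters and function names are arbitrary and hypothetical.

```python
import random
import socket
import threading
import time
from typing import Optional


def delayed_relay(src: socket.socket, dst: socket.socket,
                  base_delay: float = 0.1, jitter: float = 0.05,
                  seed: Optional[int] = None) -> None:
    """Copy bytes from src to dst, sleeping before forwarding each chunk.

    Chunks are forwarded one at a time, so ordering is preserved and the
    delays accumulate: this behaves like a slow, jittery link rather than a
    fixed-latency pipe. Plain bytes only; passed descriptors are not forwarded.
    """
    rng = random.Random(seed)
    while True:
        chunk = src.recv(4096)
        if not chunk:  # peer finished sending
            break
        time.sleep(base_delay + rng.uniform(0.0, jitter))
        dst.sendall(chunk)
    dst.shutdown(socket.SHUT_WR)  # propagate end-of-stream


def start_relay(src: socket.socket, dst: socket.socket, **kwargs) -> threading.Thread:
    """Run one direction of the relay in a background thread."""
    t = threading.Thread(target=delayed_relay, args=(src, dst),
                         kwargs=kwargs, daemon=True)
    t.start()
    return t
```

A bidirectional link needs two relays, one per direction. Varying the seed and jitter between runs is a cheap way to shake out timing assumptions in an event loop.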
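The clipboard back-pressure point can be sketched as a scheduler that never lets a bulk transfer sit in front of input. This is a hypothetical illustration, not taken from X2go or any Wayland proxy. Large payloads are cut into chunks, and each scheduling round sends every pending interactive message before at most one bulk chunk. A real implementation would also need receiver-driven flow control, framing and cancellation of stale transfers, and the 16 KiB default chunk size is an arbitrary choice.

```python
from collections import deque
from typing import Deque, List, Tuple

Frame = Tuple[str, bytes]  # (channel name, payload)


class Multiplexer:
    """Interleave small interactive messages with chunked bulk transfers."""

    def __init__(self, chunk_size: int = 16 * 1024):
        self.chunk_size = chunk_size
        self.interactive: Deque[bytes] = deque()
        self.bulk: Deque[Frame] = deque()

    def send_interactive(self, msg: bytes) -> None:
        self.interactive.append(msg)

    def send_bulk(self, channel: str, payload: bytes) -> None:
        # Cut a large transfer (say, a big clipboard paste) into chunks up
        # front so it never occupies the link for long in one go.
        for i in range(0, len(payload), self.chunk_size):
            self.bulk.append((channel, payload[i:i + self.chunk_size]))

    def next_round(self) -> List[Frame]:
        """All pending interactive messages first, then at most one bulk chunk."""
        frames: List[Frame] = [("input", m) for m in self.interactive]
        self.interactive.clear()
        if self.bulk:
            frames.append(self.bulk.popleft())
        return frames
```

A keypress queued behind a large paste still goes out in the very next round; the paste simply takes more rounds to finish.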
This doesn’t mean that remote rendering won’t be possible with Wayland, it just means that you will have to put a remote rendering server on top of Wayland. One such server could be the X.org server, but other options include an RDP server, a VNC server or somebody could even invent their own new remote rendering model.
The dumbest thing that can happen is that people take it for the marketing gospel it is, and actually embed VNC on the compositor side. I tried this out of sheer folly back in ~2013 and the experience was most unpleasant.
RFB, the underlying protocol in ‘VNC’, is seriously terrible, even if you factor in the many extensions, proprietary as well as public. Making fun of X for having a dated view on graphics and in the next breath considering VNC has quite some air of irony to it. RFB has two qualities. The first is the inertia of clients being available on nearly every platform. The second is that the public part of the protocol (RFC 6143) is documented in such a coherent and beautiful way that it puts the soup of scattered XML files and TODO-sprinkled PDFs that is “modern” Wayland forever in the corner.
The counterpoint to the inertia quality is that RFB implementations have subtle incompatibilities with each other. You do not know which features can be relied on, when, or to what extent, assuming the connection does not just terminate during the handshake. The latter was, for example, the case for many years when connecting to Apple’s VNC server with a client not written by Apple.
The second dumbest thing is to use RDP. It has features. Lots of them. Even a printer server, a USB server and file system mount translation. Heck, all the things that Xorg was made fun of for having are in there, and then some. The reverse-engineered implementation of this proprietary Microsoft monstrosity, FreeRDP, is about the code size of the actually used parts of Xorg, give or take some dependencies. In C. In network-facing code. See where this is heading? Embed that straight into your privileged Wayland compositor process, and I will just sit here in bitter silence and be annoyed by the fireworks.
The least bad available technology to try and get in there would be the somewhat forgotten SPICE project, which is currently ‘wasted’ as a way of integrating and interacting with KVM/QEMU. In many ways, with local buffer-passing modifications, it would make a reasonably apt local display server API as well.
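As a small illustration of where the trouble starts, the RFB handshake opens with a 12-byte ProtocolVersion message such as “RFB 003.008” followed by a newline. The sketch below follows RFC 6143. That document lists 3.3, 3.7 and 3.8 as the published versions and says other reported numbers should be interpreted as 3.3. It also says a client should never request a higher version than the server offered. The function names are hypothetical. Note that the 3.3 fallback is a guess by design. Two implementations that guess differently about an unusual version number could easily end up dropping the connection during the handshake.

```python
import re
from typing import Tuple

# Published versions per RFC 6143; anything else is treated as 3.3.
KNOWN_VERSIONS = {(3, 3), (3, 7), (3, 8)}
_VERSION_RE = re.compile(rb"RFB (\d{3})\.(\d{3})\n")


def parse_protocol_version(msg: bytes) -> Tuple[int, int]:
    """Parse the 12-byte ProtocolVersion message into (major, minor)."""
    m = _VERSION_RE.fullmatch(msg)
    if m is None:
        raise ValueError(f"malformed ProtocolVersion: {msg!r}")
    return int(m.group(1)), int(m.group(2))


def choose_version(server_msg: bytes,
                   client_max: Tuple[int, int] = (3, 8)) -> bytes:
    """Return the ProtocolVersion reply a client would send back.

    client_max is assumed to be one of the published versions.
    """
    server = parse_protocol_version(server_msg)
    if server not in KNOWN_VERSIONS:
        server = (3, 3)  # unknown numbers: fall back to the 3.3 handshake
    major, minor = min(server, client_max)  # never exceed what the server offered
    return b"RFB %03d.%03d\n" % (major, minor)
```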
Rounding things off, the abstract point of the ‘VNC’ argument is, of course, the core concept of treating client buffers solely as opaque texture bitmaps in relation to an ordered stream of input and display events, not the underlying protocol as such.
The core of the argument is that networked ‘vector’ drawing is defunct, dead or dying. The problem with that argument is that it is trivially shown to be false: the web browser illustrates some of the potential and the public interest. We are not just streaming pixel buffers, and for good reason. The argument is only partially right in the X case. X2go shows that there is validity to properly segmenting the buffers, so that the networking part can optimise and choose compression, caching and other transfer parameters based on the actual non-composited contents.
If you made it this far and want to punish yourself extra – visit or revisit this forum thread and contrast it with this article.
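A rough Python sketch of that idea follows, assuming per-window buffers are available before compositing. It picks a transfer strategy per surface from the surface’s own contents and skips contents the receiver is assumed to have cached already. The class name, the colour-count heuristic and its threshold are hypothetical. The ‘lossy’ branch is only a placeholder for where a real system would use a video codec. This is not a description of how X2go works internally.

```python
import hashlib
import zlib
from typing import Set, Tuple


class SurfaceEncoder:
    """Hypothetical per-surface encoder that looks at uncomposited contents."""

    def __init__(self, lossless_colour_limit: int = 256, bytes_per_pixel: int = 4):
        self.lossless_colour_limit = lossless_colour_limit
        self.bpp = bytes_per_pixel
        self.sent: Set[str] = set()  # digests the receiver is assumed to cache

    def classify(self, pixels: bytes) -> str:
        # Few distinct colours suggests text or UI chrome; many suggests
        # photographic or video content.
        colours = {pixels[i:i + self.bpp] for i in range(0, len(pixels), self.bpp)}
        return "lossless" if len(colours) <= self.lossless_colour_limit else "lossy"

    def encode(self, pixels: bytes) -> Tuple[str, bytes]:
        digest = hashlib.sha256(pixels).hexdigest()
        if digest in self.sent:
            # Identical contents were sent before: send only a reference.
            return "cached", digest.encode()
        self.sent.add(digest)
        if self.classify(pixels) == "lossless":
            return "lossless", zlib.compress(pixels)
        # Placeholder: a real system would hand this surface to a video codec.
        return "lossy", pixels
```

None of these decisions are possible once everything has been flattened into one composited framebuffer. That is the practical case for keeping the buffers segmented.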