This article presents an interpretation of the history surrounding the ability of X clients to interact with X servers running on other machines: recent arguments that this ability is defunct and broken, problems with the feature itself, what it was, what happened along the way, and where things seem to be heading.
The high-level summary of the argument herein is this: there is still, to this day and age, such a thing as network transparency in X at a higher level than streaming pixel buffers, but it has a diminishing degree of practical usability and interest, and its technical underpinnings are fundamentally flawed, dated and criminally inefficient; yet similarly dated (VNC) or perversely complex (RDP) solutions are not reasonable alternatives.
What are the network features of X?
If you play things strictly, all of it is. That is the very point of having a client/server protocol rather than an API. Communication goes across hard system barriers, and data packets need to consider things like endianness, loss and addressing, while the state machine(s) need to account for delays, latencies and buffer back-pressure. Interplay between versions and revisions matters much more in protocol design than in API design, and the effects of tradeoffs between synchronous communication (processing blocked waiting for a reply) and asynchronous forms become highly visible.
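To make the endianness point concrete, here is a minimal sketch (not taken from any real implementation) of how the X11 connection setup prefix works: the very first byte declares which byte order the client will speak, and every multi-byte field thereafter is encoded in that order.

```python
import struct

def setup_prefix(little_endian=True):
    # X11 connection setup prefix: the first byte is 'l' (LSB first)
    # or 'B' (MSB first); the remaining fields are then encoded in
    # whichever order the client declared.
    order = b"l" if little_endian else b"B"
    fmt = ("<" if little_endian else ">") + "HHHH"
    # protocol version 11.0, no authorization name/data in this sketch;
    # one pad byte after the order byte, two pad bytes at the end
    return order + b"\x00" + struct.pack(fmt, 11, 0, 0, 0) + b"\x00\x00"
```

The same logical request thus has two distinct wire encodings, and the server must be prepared to byte-swap; an API call has no such concern.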
The real (and only) deal for X networking is in its practical nature; the way things work from a user standpoint. In the days of old, one could simply utter the following incantation:
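Something along these lines (the host name and display number here are placeholders):

```shell
# point the client at an X server on another machine, over plain TCP
DISPLAY=other.machine:0 xeyes
```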
and have its soul stare back at you through heavily aliased portals. This was practically identical to the local “DISPLAY=:0 xeyes” form, hence transparent.
The remote display thing assumed that you were OK with anyone on the wire being able to listen in and do all kinds of nasty things with the information gathered. Pixel buffers were not compressed, so when they grew too numerous or too large, the network was anything but happy. It was good only through the rose-tinted glasses of nostalgia, and only for a local area network; your home, school, or business; certainly not across the internet.
It also assumes that the X server itself was not started with the “-nolisten tcp” argument, or that you were using the better option of letting SSH set up forwarding, compress the stream and provide otherwise preferential treatment. Even then, it assumes that you were practically fine with the idea that some of your communication could be deduced from side-channel analysis and so on. Digressing a bit: in spite of speculative vulnerabilities being all the rage yet hard to exploit practically at scale, the side-channel issue is highly relevant to much more modern protocols and efforts in this ‘web browser as a dumb terminal’ day and age. Just sayin’.
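The safer route mentioned above looks something like this (user and host names are placeholders); sshd sets up the forwarded DISPLAY on the remote side:

```shell
# -X enables X11 forwarding, -C compresses the stream
ssh -X -C user@remote.host xeyes
```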
Those details aside, this was a workable scenario for a long time, including for relatively complex clients like that of Quake 3. The reason being that even GLX, the X extension to OpenGL, treated local ‘direct rendering’ as a very optional thing. That said, it was just about at the tipping point where the distance between locally optimal and remotely optimal rendering became much too great, and the large swath of developers and users in charge largely favoured the locally optimal case for desktop-like workloads.
The big advantage non-local X had over other remote desktop solutions, of which there were far too many, is exactly this part. As far as the pragmatic user cared, the idea of transparency (or should it be translucency?) was simply being able to say “hey you, this program, and only this program on this remote machine, get over here!”.
The principal quality was the relative seamlessness of the entire set of features on a per-window basis, and that goes unmatched to this very day; but with every ‘integrated desktop environment’ advancement, the feature grows weaker and the likelihood of applications working tolerably like this decreases drastically.
So what happened?
An unusably short answer would be: the convergence of many things. A slightly longer answer can be found here: X’s network transparency has wound up mostly being a failure. My condensed take is this:
The evolution of accelerated graphics happened, or the ‘Direct Rendering Infrastructure (DRI)’ as it is generationally referenced in the Xorg and Linux ecosystem. Applications started to depend heavily on network-unfriendly IPC systems used in tandem with X. You wanted sound to go with your application? Sorry. Notification popups going to the wrong machine? Oops, D-Bus! And so on.
This is what one side of the argument is poking at when they go ‘X is not network transparent!’, while the other side is quick to retort that they are, in fact, running emacs over X on the network to this very day. Try it for yourself; the mechanisms have not suddenly disappeared, and it should be a short exercise to gain some experience. From my own experiments just prior to writing this article, the results varied wildly from pleasant to painful depending on the toolkit. No points are awarded for guessing which ones fared the worst.
Thus far, I have mostly painted a grim portrait, yet there are more interesting sides to this, perhaps best represented by XPRA or X2go. X2go addresses some of the shortcomings in ways that still leverage parts of X without falling back to the lowest “no way out” denominator of a composited framebuffer. It does so by using a custom X server with a different line protocol for external communication, and a carrier for muxing in sound, among other things.
While this approach still falls flat when it comes to accelerated composition past a specific feature set, as can be seen in the compatibility notes in the documentation, it is still very actively developed and used. The activity on mailing lists, IRC and gatherings all acts as testament to the relevance of the feature in its current form, from both a user and a developer perspective.
What does the future hold?
… or the alternative section title of “Wayland, and the irony of dressing up the past as the future”.
It is no secret that I have become increasingly disappointed in the technical and architectural merits of Wayland over the course of the last few years. This comes empirically from my quite public code, along with some not-so-public experiments and consultation work. That said, the Wayland ecosystem (because hey, it’s not “just a protocol”) is useful for reference here as an indicator of things that might become reality for some poor souls.
There is a total of one blog post/PoC that, in stark contrast to the rambling fever dreams of most forum threads on the subject, technically experiments with the possibility of ‘transparent’ in the sense of “a client connecting/bridged to a remote server”, rather than ‘opaque’ in the sense of “a server compositing and translating n clients to a different protocol”. Note in particular the issues around keyboard handling and descriptor passing. Those are significant, yet still only the tip of a very unpleasant iceberg.
The post itself does a fair enough job of noting some of the problems, and you can discover a few more for yourself if you patch or proxy the Wayland client library implementation to simulate various latencies in the buffer dispatch routine. Enjoy troubleshooting why clients get disconnected or crash sporadically (or more often than usual). It turns out that reliably testing asynchronous, event-driven implementations is hard, and not much effort is being put into the toolkit backends for Wayland; too bad near all the responsibility has been pushed to those backends in order to claim that the server side looks simple.
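A rough sketch of the “proxy and inject latency” experiment, for the curious: a relay that sits between a client and the compositor’s Unix socket, delaying every message. The socket paths, delay value and buffer sizes are all placeholders; note that Wayland passes file descriptors via SCM_RIGHTS, so the ancillary data must be forwarded too (this sketch forwards but never closes the received fds).

```python
import socket
import threading
import time

DELAY = 0.05  # seconds of artificial latency; arbitrary placeholder value

def pump(src, dst, delay):
    # Forward wire data plus any SCM_RIGHTS ancillary data (the fds
    # Wayland clients and servers exchange), sleeping before each write
    # to simulate network latency. Stops on EOF.
    while True:
        data, ancdata, _, _ = src.recvmsg(4096, socket.CMSG_SPACE(16 * 4))
        if not data and not ancdata:
            break
        time.sleep(delay)
        dst.sendmsg([data], ancdata)

def proxy(real_path, fake_path, delay=DELAY):
    # Listen where the client expects the compositor (fake_path, e.g.
    # pointed at via WAYLAND_DISPLAY) and relay each connection to the
    # real compositor socket, one pump thread per direction.
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(fake_path)
    srv.listen(1)
    while True:
        cli, _ = srv.accept()
        up = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        up.connect(real_path)
        threading.Thread(target=pump, args=(cli, up, delay), daemon=True).start()
        threading.Thread(target=pump, args=(up, cli, delay), daemon=True).start()
```

Even this naive delay is enough to surface ordering and timeout assumptions in some clients; a real experiment would also want jitter and back-pressure.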
The reason I bring this up is that what will eventually happen is alluded to in the Wayland FAQ. All of the suggested options are ‘opaque’ in the aforementioned sense.
The dumbest thing that can happen is that people take it too literally and actually embed VNC on the compositor side. RFB, the underlying protocol, is seriously terrible, even if you factor in the many extensions, proprietary as well as public. Making fun of X for having a dated view on graphics and, in the next breath, considering VNC has some air of irony to it.
Putting aside the seriously dodgy implementations that persist in all the places you would not want them to be, the one redeeming feature is the inertia in that there are clients on lots of platforms.
The counterpoint to that feature is that practically all of them have subtle incompatibilities with the many servers that exist, so you do not know which features can be relied on, when, or to what extent; assuming the connection does not simply terminate during the handshake, as was the case for many years with Apple’s VNC client or server talking to one not written by Apple. At least the public part of the protocol (RFC 6143) is documented in such a coherent and beautiful way that it puts the soup of scattered XML files and TODO-sprinkled PDFs that is “modern” Wayland forever in the corner.
A fun holiday depression exercise, and a CCC tradition in some circles I have heard, is to script a scan of the open internet for port 5900, connect, take a screenshot of each open display, print them all out and stitch them together into a seriously depressing virtual quilt.
The second dumbest thing is to use RDP. It has features. Lots of them. Even a printer server and file system mount translation. Heck, all the things Xorg was made fun of for having are in there, and then some. The reverse-engineered implementation of this proprietary Microsoft monstrosity is about the code size of the actually used parts of Xorg, give or take some dependencies. In C. In network-facing code. See where this is heading? Does that not sound like the greatest thing to embed in that oh-so-simple and minimalistic Wayland compositor you pretend to use for other things than XWayland?
The least bad option – outside of the more remote possibility of a purpose-fit design – is to realise that SPICE still exists, mostly wasted as a way of integrating with KVM/Qemu, and that it too has quite some room for improvement.
Rounding things off: the abstract point of the referenced ‘VNC-like’ argument is, of course, the core concept of treating client buffers as opaque texture bitmaps in relation to an ordered stream of input and display events; not the underlying protocol as such.
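That core concept, opaque buffers ordered against input and display events, can be caricatured in a few lines; all type and field names here are made up for illustration:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class BufferUpdate:
    seq: int                            # position in the shared event order
    window: int
    damage: Tuple[int, int, int, int]   # x, y, w, h of the dirty region
    pixels: bytes                       # opaque: encoding/compression elsewhere

@dataclass
class InputEvent:
    seq: int
    window: int
    kind: str                           # e.g. "key", "motion"
    payload: bytes

Event = Union[BufferUpdate, InputEvent]

def replay(events: List[Event]) -> List[int]:
    # The receiving side needs only the total ordering, never the buffer
    # contents: sort by sequence number and apply/route each event in turn.
    return [e.seq for e in sorted(events, key=lambda e: e.seq)]
```

The point being that the transport never interprets what is inside the buffers, only where they go and in what order; which is precisely what the ‘vector drawing’ side of the argument objects to.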
The core of that argument is that networked ‘vector’ drawing is defunct and dead or dying. The problem with the argument is that it is trivially false, well illustrated by the web browser, which shows some of the potential; and it is only partially right in the X case, as X2go shows that there is validity to proper segmentation of the buffers so that the networking layer can optimize their contents and the caching thereof.
If you made it this far and want to punish yourself extra – visit or revisit this forum thread.