Saturday, September 09, 2017

radeonsi: out-of-order rasterization on VI+

I've been polishing a patch by Marek to enable out-of-order rasterization on VI+. Assuming it goes through as planned, this will be the first time we're adding driver-specific drirc configuration options that are unfamiliar to the enthusiast community (there's radeonsi_enable_sisched already, but Phoronix has reported on the sisched option often enough). So I thought it would make sense to explain what those options are about.

Background: Out-of-order rasterization

Out-of-order rasterization is an optimization that can be enabled in some cases. Understanding it properly requires some background on how tasks are spread across shader engines (SEs) on Radeon GPUs.

The frontends (vertex processing, including tessellation and geometry shaders) and backends (fragment processing, including rasterization and depth and color buffers) are spread across SEs roughly like this:

(Not shown are the compute units (CUs) in each SE, which is where all shaders are actually executed.)

The input assembler distributes primitives (i.e., triangles) and their vertices across SEs in a mostly round-robin fashion for vertex processing. In the backend, work is distributed across SEs by on-screen location, because that improves cache locality.

This means that once the data of a triangle (vertex position and attributes) is complete, most likely the corresponding rasterization work needs to be distributed to other SEs. This is done by what I'm simplifying as the "crossbar" in the diagram.

OpenGL is very precise about the order in which the fixed-function parts of fragment processing should happen. If one triangle comes after another in a vertex buffer and they overlap, then the fragments of the second triangle had better overwrite the corresponding fragments of the first triangle (if they weren't rejected by the depth test, of course). This means that the "crossbar" may have to delay forwarding primitives from a shader engine until all earlier primitives (which were processed in another shader engine) have been forwarded. This only happens rarely, but it's still sad when it does.

There are some cases in which the order of fragments doesn't matter. Depth pre-passes are a typical example: the order in which triangles are written to the depth buffer doesn't matter as long as the "front-most" fragments win in the end. Another example is some of the operations involved in stencil shadows.

Out-of-order rasterization simply means that the "crossbar" does not delay forwarding triangles. Triangles are instead forwarded immediately, which means that they can be rasterized out-of-order. With the in-progress patches, the driver recognizes cases where this optimization can be enabled safely.
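To make the difference concrete, here is a toy model in Python. It is only a sketch: the function names and the "ready" times are made up for illustration, and the real hardware is far more involved. Each primitive has a time at which its vertex processing is finished. With in-order forwarding, a primitive has to wait for all earlier primitives; with out-of-order forwarding, it goes out as soon as it is ready.

```python
from itertools import accumulate

def forward_times_in_order(ready_times):
    """Primitive i is forwarded only once primitives 0..i are all ready."""
    # Running maximum over the ready times, in submission order.
    return list(accumulate(ready_times, max))

def forward_times_out_of_order(ready_times):
    """Every primitive is forwarded as soon as it is ready."""
    return list(ready_times)

# Hypothetical ready times for four primitives, in submission order.
ready = [5, 2, 9, 3]

in_order = forward_times_in_order(ready)
out_of_order = forward_times_out_of_order(ready)

# How long each primitive sits at the "crossbar" only to preserve ordering.
stall = [a - b for a, b in zip(in_order, out_of_order)]

print(in_order)      # [5, 5, 9, 9]
print(out_of_order)  # [5, 2, 9, 3]
print(stall)         # [0, 3, 0, 6]
```

In this toy model, the second and fourth primitives are the ones that finish early on their shader engine and then have to wait.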

By the way #1: From this explanation, you can immediately deduce that this feature only affects GPUs with multiple SEs. So integrated GPUs are not affected, for example.

By the way #2: Out-of-order rasterization is entirely disabled by setting R600_DEBUG=nooutoforder.

Why the configuration options?

There are some cases where the order of fragments almost doesn't matter. It turns out that the most common and basic type of rendering is one of these cases. This is when you're drawing triangles without blending and with a standard depth function like LEQUAL with depth writes enabled. Basically, this is what you learn to do in any first 3D programming tutorial.

In this case, the order of fragments is mostly irrelevant because of the depth test. However, it might happen that two triangles have the exact same depth value, and then the order matters. This is very unlikely in common scenes though. Setting the option radeonsi_assume_no_z_fights=true makes the driver assume that it indeed never happens, which means out-of-order rasterization can be enabled in the most common rendering mode!
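Here is a small Python sketch of why equal depth values are the problem case. It models a single pixel with a LEQUAL depth test and depth writes enabled; the function name and the fragment values are invented for this illustration and have nothing to do with the actual driver code.

```python
from itertools import permutations

def rasterize_pixel(fragments, depth_clear=1.0):
    """Apply a LEQUAL depth test with depth writes to one pixel.

    Each fragment is a (depth, color) pair; returns the final (depth, color).
    """
    depth, color = depth_clear, None
    for frag_depth, frag_color in fragments:
        if frag_depth <= depth:  # LEQUAL: pass if the new depth is <= the stored depth
            depth, color = frag_depth, frag_color
    return depth, color

# Distinct depth values: every ordering gives the same final pixel.
distinct = [(0.7, "red"), (0.3, "green"), (0.5, "blue")]
results = {rasterize_pixel(order) for order in permutations(distinct)}
print(results)  # {(0.3, 'green')}

# Equal depth values (a "z-fight"): whichever fragment comes last wins.
tied = [(0.3, "green"), (0.3, "blue")]
print(rasterize_pixel(tied))        # (0.3, 'blue')
print(rasterize_pixel(tied[::-1]))  # (0.3, 'green')
```

With distinct depths, the front-most fragment wins no matter the order. With a tie, the result depends on the order, which is exactly what the option asks the driver to ignore.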

Some other cases occur with blending. Some blending modes (though not the most common ones) are commutative in the sense that from a purely mathematical point of view, the end result of blending two triangles together is the same no matter which order they're blended in. Unfortunately, additive blending (which is one of those modes) involves floating point numbers in a way where changing the order of operations can lead to different rounding, which leads to subtly different results. Using out-of-order rasterization would break some of the guarantees the driver has to give for OpenGL conformance.
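The rounding issue is easy to reproduce outside of any GPU. The following Python sketch adds the same three values to a destination in two different orders. It uses Python's double-precision floats and made-up values, so it only illustrates the principle; the precision actually used for blending depends on the render target format.

```python
def blend_add(dst, sources):
    """Additively blend the sources onto dst, one after another."""
    for src in sources:
        dst = dst + src
    return dst

# Same fragments, two different orders.
forward = blend_add(0.0, [0.1, 0.2, 0.3])
backward = blend_add(0.0, [0.3, 0.2, 0.1])

print(forward)              # 0.6000000000000001
print(backward)             # 0.6
print(forward == backward)  # False
```

Mathematically both sums are 0.6, but the intermediate results are rounded differently, so the final values differ in the last bit.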

The option radeonsi_commutative_blend_add=true tells the driver that you don't care about these subtle errors and will lead to out-of-order rasterization being used in some additional cases (though again, those cases are rarer, and many games probably don't encounter them at all).


Out-of-order rasterization can give a very minor boost on multi-shader engine VI+ GPUs (meaning dGPUs, basically) in many games by default. In most games, you should be able to set radeonsi_assume_no_z_fights=true and radeonsi_commutative_blend_add=true to get an additional very minor boost. Those options aren't enabled by default because they can lead to incorrect results.
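drirc options are usually set in an XML file such as ~/.drirc. As a rough sketch, the following Python snippet generates a fragment with both options enabled. The helper name make_drirc is made up for this post, the layout follows the usual driconf/device/application/option nesting, and of course the option names only do anything once the patches have actually landed.

```python
import xml.etree.ElementTree as ET

def make_drirc(options, driver="radeonsi", application="Default"):
    """Build a drirc-style XML fragment (hypothetical helper, for illustration)."""
    root = ET.Element("driconf")
    device = ET.SubElement(root, "device", {"driver": driver})
    app = ET.SubElement(device, "application", {"name": application})
    for name, value in options.items():
        ET.SubElement(app, "option", {"name": name, "value": value})
    # ET.indent only exists in Python 3.9+; it only affects whitespace.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

drirc_text = make_drirc({
    "radeonsi_assume_no_z_fights": "true",
    "radeonsi_commutative_blend_add": "true",
})
print(drirc_text)
```

If a game shows rendering glitches with these settings, removing the options again is the first thing to try.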

Sunday, June 25, 2017

ARB_gl_spirv, NIR linking, and a NIR backend for radeonsi

SPIR-V is the binary shader code representation used by Vulkan, and GL_ARB_gl_spirv is a recent extension that allows it to be used for OpenGL as well. Over the last few weeks, I've been exploring how to add support for it in radeonsi.

As a bit of background, here's an overview of the various relevant shader representations that Mesa knows about. There are some others for really old legacy OpenGL features, but we don't care about those. On the left, you see the SPIR-V to LLVM IR path used by radv for Vulkan. On the right is the path from GLSL to LLVM IR, plus a mention of the conversion from GLSL IR to NIR that some other drivers are using (i965, freedreno, and vc4).

For GL_ARB_gl_spirv, we ultimately need to translate SPIR-V to LLVM IR. A path for this exists, but it's in the context of radv, not radeonsi. Still, the idea is to reuse this path.

Most of the differences between radv and radeonsi are in the ABI used by the shaders: the conventions by which the shaders on the GPU know where to load constants and image descriptors from, for example. The existing NIR-to-LLVM code needs to be adjusted to be compatible with radeonsi's ABI. I have mostly completed this work for simple VS-PS shader pipelines, which has the interesting side effect of allowing the GLSL-to-NIR conversion in radeonsi as well. We don't plan to use it soon, but it's nice to be able to compare.

Then there's adding SPIR-V support to the driver-independent mesa/main code. This is non-trivial, because while GL_ARB_gl_spirv has been designed to remove a lot of the cruft of the old GLSL paths, we still need more supporting code than a Vulkan driver. This still needs to be explored a bit; the main issue is that GL_ARB_gl_spirv allows using default-block uniforms, so the whole machinery around glUniform*() calls has to work, which requires setting up all the same internal data structures that are set up for GLSL programs. Oh, and it looks like assigning locations is required, too.

My current plan is to achieve all this by re-using the GLSL linker, giving a final picture that looks like this:

So the canonical path in radeonsi for GLSL remains GLSL -> AST -> IR -> TGSI -> LLVM (with an optional deviation along the IR -> NIR -> LLVM path for testing), while the path for GL_ARB_gl_spirv is SPIR-V -> NIR -> LLVM, with NIR-based linking in between. In radv, the path remains as it is today.

Now, you may rightfully say that the GLSL linker is a huge chunk of subtle code, and quite thoroughly invested in GLSL IR. How could it possibly be used with NIR?

The answer is that huge parts of the linker don't really care that much about the code in the shaders that are being linked. They only really care about the variables: uniforms and shader inputs and outputs. True, there are a bunch of linking steps that touch code, but most of them aren't actually needed for SPIR-V. Most notably, GL_ARB_gl_spirv doesn't require intrastage linking, and it explicitly disallows the use of features that only exist in compatibility profiles.

So most of the linker functionality can be preserved simply by converting the relevant variables (shader inputs/outputs, uniforms) from NIR to IR, then performing the linking on those, and finally extracting the linker results and writing them back into NIR. This isn't too much work. Luckily, NIR reuses the GLSL IR type system.

There are still parts that might need to look at the actual shader code, but my hope is that they are few enough that they don't matter.

And by the way, some people might want to move the IR -> NIR translation to before linking, so this work would set a foundation for that as well.

Anyway, I got a ridiculously simple toy VS-PS pipeline working correctly this weekend. The real challenge now is to find actual test cases...

Sunday, January 22, 2017

Our prosperity is based on inherited knowledge and active brains

Recently, Stefan Pietsch wrote an essay in which he comments, in a fairly free-associating manner, on a range of topics around "work". The text is a bit too unfocused for a comprehensive reply, but I would like to address a few points on which we do not quite agree.

Let me start at the end, with the question of what our prosperity actually rests on. If you consider this in historical comparison, Pietsch overlooks a very important factor: essentially, our prosperity rests on the scientific and technological achievements that we have inherited from our ancestors and continue to develop today.

This has interesting consequences. Since the knowledge and the technologies were inherited from all of humanity, an individual cannot readily claim the resulting gains for themselves. Admittedly, the individual has made a contribution by making the inherited knowledge practically applicable. But this argument only serves to justify relative prosperity compared to others who have contributed less to the practical application of the inherited knowledge. It does not serve to justify absolute prosperity, because absolute prosperity does not stem from the personal contribution. This contradiction can be balanced out by very "left-wing" legislation that ensures everyone enjoys high absolute prosperity, even if relative differences remain (which will then naturally be smaller).

But of course our prosperity does not rest on inherited knowledge alone. Pietsch names a few further points, but they do not really get to the root. Quite fundamentally, our prosperity rests on the fact that as large a share as possible of the human brains walking our planet are involved and put to use as well as possible in society's production processes.

Part of this is, of course, the freedom that Pietsch mentions: brains that want to do something constructive of their own accord should be left to get on with it.

Part of this is also encouraging brains that might not necessarily do something constructive of their own accord to do so anyway. The respect for property that Pietsch mentions is a corresponding incentive system. But if you look at property from this perspective, you also see immediately that trade-offs quickly arise. Lowering top tax rates, for example, is often justified as an incentive. But one has to dig deeper here: how strong is the incentive effect of these tax cuts really? And would it not be better to mobilize the brains of the broad population instead of possibly even demoralizing them through obvious and rising inequality?

Looking at active brains casts a different light on many political topics. Pietsch cites, for example, a self-employment rate of about 23% in Germany in the early 1960s and sees in it commendable personal initiative. (Incidentally, he omits the fact, visible in his own source, that until the year 2000 the figure was higher in Germany than in the USA, even though the latter is commonly perceived as far more entrepreneurial. Never trust a statistic that you have not selectively picked yourself.)

But in this high self-employment rate I also see companies with four people on average. No deep specialization is possible there, and so the optimal use of brains is also subject to a certain limit.

I also see the potential for bogus self-employment or self-employment out of necessity. A large part of people's mental energy is wasted there, because they have to grapple with worries that good left-wing politics could take off their shoulders.

Ultimately, it is the involvement of as many people as possible in productive, creative processes at a high level that is essential for our prosperity. This involvement can take the form of entrepreneurship, but often that is simply the wrong path.