Sunday, January 24, 2016

History of internet culture in acronyms

I've been on the internet for a pretty long time now. At least for me, getting on the internet back in the 90s meant making myself familiar with its "local" culture, customs and traditions. I'm sure that is still the case for many today. Curiously, I started to notice the appearance of acronyms that I wasn't familiar with, and I thought it interesting to consider how changing acronyms reflect broader changes in internet culture.

Back in the old days, a large fraction of what the internet was about was more or less serious, text-based discussion among relatively small groups. This shows in acronyms that facilitate such discussion, like AFAIK (as far as I know) or IM(H)O (in my (humble) opinion). Of course, social interactions could be rough back then as well: for every IMHO there's an IMNSHO (in my not so humble opinion), and of course there is the tradition of telling people to RTFM (read the f{ine, ucking} manual).

The acronyms that I did not recognize at first reflect an internet that has changed. MFW (my face when) reflects a culture of reaction image macros that were made possible by higher bandwidth. ICYMI (in case you missed it) certainly developed as the internet became a mass communication medium where shared stories and memes make the rounds. Similarly, ITT (in this thread) is a pattern in which the author of a posting tries to set themselves apart from the "herd" of others in the thread, clearly a reaction to the increased size of groups and the increasing anonymity in which interactions often take place.

Saturday, December 19, 2015

Debugging a GPU VM fault

Having finished debugging and fixing a rather tricky GPU VM fault bug in the radeonsi driver, I thought I'd document the bug-chasing process I went through (cleaned up a little, with dead ends removed). May it help myself and others in the future.

Fortunately, the original submitter of the bug had already bisected the cause of the VM faults to a change in LLVM, so the fault was clearly due to some shader. Unfortunately, the triggering commit in LLVM was completely unrelated to Radeon, so it was very unclear what was going on. Still, the bug occurred in the publicly available Unreal Elemental Demo and was easily reproducible, so off I went.

Since some shader was to blame, the first thing to do after reproduction was to collect all shaders before and after the bad commit, using RADEON_DUMP_SHADERS=y (R600_DEBUG=ps,vs,gs also does this). This resulted in a lot of output with a large diff between the good and the bad run. Clearly, the change in LLVM subtly affected register allocation and/or instruction scheduling in the compiler in a way that affected many shaders and exposed some pre-existing, underlying bug. I needed to find the exact shader that caused problems.
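Comparing the dumps from the good and the bad run is just a matter of diffing; a trivial sketch of that step (the file names are made up for the example):

```python
import difflib

# Show only the lines that differ between the shader dumps of a good run
# and a bad run, dropping context lines and diff headers.
def dump_diff(good_text, bad_text):
    diff = difflib.unified_diff(good_text.splitlines(),
                                bad_text.splitlines(),
                                fromfile="good.log", tofile="bad.log",
                                lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]
```

In practice a plain `diff` on the two log files does the same job; the point is only that the changed shaders fall out mechanically.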

The next step, to ensure even more reliable and deterministic reproduction, was to record an apitrace. This allows us to replay the exact same sequence of OpenGL calls that leads to the VM faults over and over again, to learn ever more about what's going on. (The Unreal Elemental Demo always plays the same scene, but it is affected by timing.)

Now it was time to find the exact draw call that caused the problems. The driver has some tools to help with that: the GALLIUM_DDEBUG feature is meant to detect lockups, but it conveniently causes a command stream flush after every draw call, so I used it by setting GALLIUM_DDEBUG=800. This makes the replay terribly slow (there's a reason we batch many draw calls into a single CS, also called IB in the kernel). So I implemented a GALLIUM_DDEBUG_SKIP feature in the driver that let me skip the additional flushes and lockup checks for the initial, known-good segment of the trace.

In addition, the driver comes with a debug feature that detects and aborts on VM faults, which is enabled via R600_DEBUG=check_vm. Since the fault comes from a shader, we also need a way to cross-reference the detected fault with the currently bound shader. This is achieved by dumping shaders and enabling the vm debug option, for a full command line of something like
GALLIUM_DDEBUG=800 GALLIUM_DDEBUG_SKIP=170000 \
RADEON_DUMP_SHADERS=y R600_DEBUG=vm,check_vm \
glretrace -v ElementalDemo.trace > runXXX.log 2>&1
The option -v for glretrace dumps all OpenGL calls as they are executed, which also turned out to be useful.

How to find the faulty shader from all that? Well, the check_vm logic not only detects VM faults, but also writes helpful logging dumps to a file in ~/ddebug_dumps/ (use less -R to make sense of the coloring escape codes that are written to the file). Most crucially, this dump contains a list of all buffers that were mapped, obviously including the buffers that contain the shader binaries. In one example run:
        Size    VM start page         VM end page           Usage
...
         245    -- hole --
           1    0x0000000113a9b       0x0000000113a9c       USER_SHADER
         268    -- hole --
...
         564    -- hole --
           2    0x000000016f00d       0x000000016f00f       USER_SHADER
         145    -- hole --
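Picking the USER_SHADER entries out of such a dump can be automated; a small sketch that assumes the whitespace-separated column layout shown above:

```python
import re

# Match buffer lines of the form "size  start-page  end-page  usage";
# "-- hole --" lines and the header do not match.
BUFFER_RE = re.compile(
    r"^\s*\d+\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(\w+)\s*$")

def shader_ranges(dump_text):
    """Return (start_page, end_page) tuples of all USER_SHADER buffers."""
    ranges = []
    for line in dump_text.splitlines():
        m = BUFFER_RE.match(line)
        if m and m.group(3) == "USER_SHADER":
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges
```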
Remember that we enabled the vm debug option together with shader dumping? This means our log contains lots of lines of the form
VM start=0x105249000  end=0x10524A000 | Buffer 4096 bytes
(Note that the log contains byte addresses, while the check_vm dump contains page numbers. Pages are 4KB, so you just need to add or remove three 0s at the end to go from bytes to pages and vice versa.) All we need to do is grep the log file for "VM start=0x113A9B" and "VM start=0x16F00D" (mmh, I'm getting hungry...). And while those might appear multiple times if a buffer is reused or destroyed, the last occurrence will be where the shader binary was created.
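The conversion is mechanical enough to script; a tiny sketch using the page numbers from the example dump:

```python
# check_vm dumps list 4 KiB page numbers, while the R600_DEBUG=vm log lines
# contain byte addresses; converting is a shift by 12 bits, i.e. three hex
# digits.
PAGE_SIZE = 1 << 12  # 4 KiB

def page_to_byte(page):
    return page * PAGE_SIZE

def byte_to_page(addr):
    return addr // PAGE_SIZE

# The two USER_SHADER start pages from the example dump:
for page in (0x113a9b, 0x16f00d):
    print("grep the log for: VM start=0x%X" % page_to_byte(page))
```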

The shader dump contains three versions of the shader: the initial TGSI, which is what Gallium's state tracker provides to the hardware-dependent driver, the LLVM IR after an initial optimization pass, and the disassembly of the final shader binary as provided by LLVM. I extracted the IR of the two shaders (vertex and fragment), and compiled them with LLVM's standalone compiler, once with the "good" version of LLVM and once with the "bad" (in both cases using the command line llc -march=amdgcn -mcpu=tonga < shader.ll). It turned out that both shaders were affected by the change, so I needed to figure out whether it was the vertex or the fragment shader.

The GUI of apitrace has the wonderful ability to let you edit a trace. Remember that I used the -v option of glretrace? That produces lots of lines like
2511163 @2 glDrawRangeElements(mode = GL_TRIANGLES, start = 0, end = 9212, ...
Indeed, that was the final reported draw call, i.e. the one causing the fault. So let's open the trace with qapitrace and jump to that exact call using its number. Sure enough, not long before the call we find
glUseProgram(505)
We need to find where this program is linked, which is typically much earlier in the program. So we search for glLinkProgram(program = 505) and find exactly one such call. A short bit above, we find the calls
glAttachShader(505, 69)
glAttachShader(505, 504)
where the fragment and vertex shaders are attached to the program. Finally, we search for glShaderSource(shader = 69) and glShaderSource(shader = 504) to find the calls where the source is loaded into the shaders. Now, we can edit those calls to replace the shaders by dummy versions. Be careful, though: the length of the shader source is passed as a separate argument, and you must adjust it manually or you will get surprising error messages when running the modified trace.

By doing so, I determined that the fragment shader was at fault: even minor modifications such as re-ordering statements without changing any effects removed VM faults when applied to the fragment shader, but not when applied to the vertex shader.

So... time to stare at the disassembly of the fragment shader. Unfortunately, there was nothing that caught my eye. It was a long shader with more than 700 instructions. So what next? Since even minor changes at the source level fixed the fault, no matter what kind of change, I needed to go deeper and modify the binary directly. I wrote a new feature for radeonsi that would help me do just that, by allowing me to tell the driver to replace the binary created by LLVM on the N'th compile by a binary that is supplied as an environment variable.

I would have loved to be able to edit assembly at this point. Unfortunately, the llvm-mc tool, which can theoretically act as an assembler, is not able to parse all the assembly constructs that llc generates for the AMDGPU backend. So I went with the next best option, creating an ELF object file using llc -march=amdgcn -mcpu=tonga -filetype=obj and editing the binary directly with a hex editor.

That wasn't too bad though: since VM faults are generated by memory instructions, I could just replace those memory instructions by NOPs. Since the shader dumps collected above helpfully include the binary representation of instructions in addition to the assembly, the instructions aren't too hard to find, either. I only needed to take care not to NOP out memory instructions whose output was then later used as addresses or resource descriptors for other memory instructions, or else I would have introduced new sources for VM faults!
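The patching step itself is simple; a minimal sketch, assuming instruction byte offsets taken from the shader dump. 0xBF800000 is the GCN SOPP encoding of "s_nop 0" (worth double-checking against the ISA documentation for your target):

```python
import struct

S_NOP_0 = 0xBF800000  # "s_nop 0" in the GCN SOPP encoding (assumption: VI/Tonga)

def nop_out(binary, offsets):
    """Overwrite the 4-byte instruction words at the given byte offsets
    with s_nop; multi-dword instructions need every dword replaced."""
    patched = bytearray(binary)
    for off in offsets:
        struct.pack_into("<I", patched, off, S_NOP_0)
    return bytes(patched)
```

The resulting bytes then go to the driver's replacement feature, in whatever encoding that feature expects.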

At that point, I ran into a tough problem. My plan was to NOP out large groups of memory instructions initially, and then do a kind of binary search to isolate the bad access. Unfortunately, what happened was, roughly speaking, that when I NOP'ed out group A of instructions or group B of instructions, the VM faults remained, but when I NOP'ed out both groups at the same time, the VM faults disappeared. This can happen when there are really two underlying bugs, but unfortunately I did not see a plausible culprit in either group (in fact, the first bug which I found was actually outside both groups, but - as far as I understand it - depended subtly on timing and on how that affected the scheduling of shader waves by the hardware).

Luckily, at that point I had long suspected the problem to be in wait state handling. You see, in order to save on complicated circuitry in the hardware, there are some rarely occurring data hazards which the compiler must avoid by inserting NOPs explicitly (there is also the s_waitcnt instruction which is part of the strategy for hiding memory latency without complex out-of-order circuitry). So I read the documentation of those hazards again and noticed that the compiler in fact didn't insert enough wait states for a sequence involving register spills (i.e., something affecting only very large shaders). I fixed that, and while it didn't eliminate the VM faults, it changed their pattern sufficiently to give me new hope.

Indeed, with the additional wait states, my original idea of finding the bad instructions by binary search was successful. I narrowed the problem down to accesses to one single texture. At that point, my brain was too exhausted to see the bug that was rather obvious in hindsight, but a colleague pointed it out to me: there was a multi-word register copy which attempted to copy a resource descriptor (consisting of 8 32-bit words) between overlapping register ranges, and it was doing that copy in the wrong direction - kind of like using memcpy when you should be using memmove.
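The effect of the wrong copy direction is easy to simulate; this is just an illustration in Python, not the actual code generated by LLVM:

```python
def copy_regs(regs, dst, src, n):
    # memcpy-style: always copies low-to-high, which reads already
    # overwritten words when dst > src and the ranges overlap.
    for i in range(n):
        regs[dst + i] = regs[src + i]

def copy_regs_safe(regs, dst, src, n):
    # memmove-style: choose the copy direction based on the overlap.
    order = range(n) if dst < src else reversed(range(n))
    for i in order:
        regs[dst + i] = regs[src + i]

# An 8-dword resource descriptor in s[4:11], copied to the overlapping
# range s[8:15]:
regs = list(range(16))
copy_regs(regs, dst=8, src=4, n=8)
print(regs[8:16])  # the upper half repeats the lower: the descriptor is corrupt
```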

Once this second bug was found, coming up with a fix was relatively straightforward. And yes, I checked: both bug fixes were actually required to completely fix the original bug.

That was my story. Hopefully you've learned something if you've come this far, but there is not really much of a moral to it. Perhaps it is this: pray you have deterministically reproducible bugs. If you do, patiently collecting more and more information will lead you to a solution. If you don't, well, sometimes you just have to be lucky.

Sunday, November 29, 2015

Steam with radeonsi on Ubuntu 15.10 (Wily Werewolf)

For public documentation's sake, if you want to run Steam on 64-bit Ubuntu 15.10 together with the open-source radeonsi driver (the part of Mesa that implements OpenGL for recent AMD GPUs), you'll probably run into problems that can be fixed by starting Steam as
LD_PRELOAD=/usr/lib/i386-linux-gnu/libstdc++.so.6 steam
(You'll have to have installed certain 32-bit packages, which you should have automatically been prompted to do while installing Steam.)

The background of this issue is that both Steam and radeonsi contain C++ code and dynamically link against the C++ runtime libraries. However, Steam comes with its own copy of these libraries, which it prefers to link against. Meanwhile, Ubuntu 15.10 is built using the most recent g++, which contains a new C++ ABI. This ABI is not supported by the C++ libraries shipped with Steam, so that by default, the radeonsi driver fails to load due to linking errors.

The workaround noted above fixes the problem by forcing the use of the newer C++ library. (Note that it does so only for 32-bit applications. If you're running 64-bit games from Steam, a similar workaround may be required.)

Monday, July 6, 2015

OXI to parallel worlds

Stark parallel worlds have built up in Europe in the wake of the financial crisis. Last week, a commentator on the Tagesthemen argued that the Greeks should please vote Yes in yesterday's referendum. The commentator's logic: that would be a horrible end (of the Syriza government) rather than horror (stalled negotiations) without end.

Yet the Greeks have been living through the horror (recession) without end for several years already. Consistently, they said No by a clear majority to the creditors' demands. Quite possibly a horrible end (Grexit) has thus been set in motion - that, at least, is how things look in the world I live in, which evidently differs markedly from that of the commentator.

The commentator and I use the same saying - "better a horrible end than horror without end" - and express completely opposite opinions with it.

This has worried me for quite a while. How superior we felt when, more than ten years ago, the public media narrative in the US ran off the rails and made the Iraq war possible! And now something very similar is happening here (except that at least it does not end in a war, but "only" in the never-ending recession of an economy under austerity politics)!

So, regardless of how things proceed with the euro: a heartfelt "Thank you, Greeks!" from me. With their courage and steadfastness, the Greeks have strengthened a politics that forces Europe to confront the blind spots of its elites. The chances are better than ever that we can break open the parallel worlds we live in and find our way back to reason together.