Tuesday, September 29, 2009

Firefox Cache Viewer and Google Books

The other day I wanted to read an article from a journal that is freely accessible but unfortunately only offers the volumes from 1997 onwards as PDFs. The older volumes are kept in the stacks of our library and are also available on Google Books. However, Google Books does not let you print, let alone download PDFs, and requesting an article from the stacks is not exactly user-friendly either.

With the Cache Viewer plugin for Firefox, on the other hand, you can easily export the loaded PNGs in a low-tech way and then convert them into a PDF with the usual command-line tools (or, on Mac OS X, with Automator). I was happy.

By the way, there is also an open-source tool for this task on Windows, although it will probably run into trouble if the internals of Google Books ever change.

Tuesday, September 22, 2009

I'm just doing my civic duty

The days are getting shorter, summer is drawing to a close, election posters are sprouting from the ground, cigarette manufacturers are poking fun at "Wa(h)lwerbung" (a pun on "Wal", whale, and "Wahl", election), in short: it is Bundestag election season.

For my part, owing to where I currently live, I already made my two crosses last week and dropped them into a mailbox.

Now I do not want to beat around the bush any further; after all, it is common knowledge that non-voters are simply spineless wimps. So: take a short walk on Sunday and go vote!

P.S.: I would actually like to ask the question, but since the answer can be found instantly via Google these days, it would be a bit pointless.

Monday, September 14, 2009

Neighborly love

The other day I had the dubious pleasure of reading the Blick, the Swiss counterpart of the worst of all newspapers, on the train. It reported not only on the German cowboy but also on the success of Miss Swiss at the Miss World circus (or was it about the whole universe? I have never quite understood this hubris). The only bad thing, from the perspective of the yellow press, was that she did not make it into the top 5 after all.

The Blick editors did, however, manage to give their fellow Swiss one important consolation: Miss Germany finished among the also-rans.

Tuesday, September 08, 2009

r300: Whither OpenGL 2.0?

As you may know, there are currently two drivers for the Radeon R300-R500 families of GPUs. There is the classic Mesa driver and the r300g Gallium 3D driver.

The classic Mesa driver has obviously been around longer and has therefore seen more bugfixing and general attention. Naturally, r300g is not as mature, even though Gallium 3D is where the future is, because the potential of its many state trackers is only going to grow. Think of unified acceleration logic for the X server, client-side accelerated 2D rendering, OpenCL; the possibilities are endless. Each of these items simply needs a state tracker, and we can then hook our driver up to support them without any additional work.

The question is where the cutoff should be. At which point do we "stop caring" about the classic Mesa driver? Here, "stop caring" obviously means stop implementing new features; bugfixing will remain important.

This has become a more important question for me now that I've entered new feature territory again by exploring GLSL. While the shader compiler is shared between classic Mesa and r300g, some further changes will probably be required. Considering that we also need to support the rest of OpenGL 2.0 to support GLSL well (a lot of applications only test for OpenGL 2.0 and will not use GLSL otherwise, even if the ARB extensions are there), I now have an even bigger incentive to make the break to Gallium.

I believe it's a very viable and sane strategy: Leave the classic Mesa driver at its current OpenGL 1.5 level and let it become a solid base for conservative users (including the next round or two of Linux distributions). In the meantime, get r300g into good shape, particularly against Piglit, and get cracking on those OpenGL 2.0 features over in Gallium territory.

Saturday, September 05, 2009

Slip of the tongue

The nice thing about the spelling reform is the opportunities it offers for philosophizing about language. The other day I read that a technology was "viel versprechend" (literally "promising a lot", the reformed spelling of "vielversprechend", meaning "promising") and marveled at modern technology, which nowadays apparently actively markets itself.

I can say of an unknown artist that he is "vielversprechend" (promising). But if an artist promises a lot, and is thus "viel versprechend", that by no means makes him "vielversprechend". And if someone then also "verspricht sich sehr viel" (which can mean either expecting a great deal or misspeaking a great deal), everything gets even more complicated.

And then people still claim the reform is logical!

Tuesday, September 01, 2009

The shader optimization challenge

During my vacation - a great trip through beautiful Iceland - a lot of important improvements were made to the r300 driver, the Mesa driver that provides hardware-accelerated OpenGL for Radeon R300 to R500 chipsets.

The biggest noticeable improvement is that, mostly thanks to Maciej's (osiris) push, we finally have real support for ARB_vertex_buffer_object (short: VBO) and ARB_occlusion_query (short: OQ).

What does this mean? First of all, it means that Sauerbraten finally approaches good framerates on my Athlon XP 2400 + Radeon X1650 Pro setup (unfortunately still in PCI mode due to a crappy AGP bridge). The performance difference is impressive; the CPU performance profile now looks entirely different, because all of the previously most CPU intensive tasks have simply disappeared thanks to the fact that we don't constantly have to reupload VBOs - and you can expect those performance improvements in essentially all 3D games. It also means that the driver can finally support OpenGL 1.5. It's about damn time.

In the meantime, I have been experimenting with support for the OpenGL Shading Language (short: GLSL). This is still a long way off, but today I would like to give you a taste of the kind of challenges waiting for us.

The glsl/trirast test that comes with Mesa implements a very simple and stupid triangle rasterizer within a fragment shader. Said shader looks like this:
uniform vec2 v0, v1, v2;

float crs(const vec2 u, const vec2 v)
{
    return u.x * v.y - u.y * v.x;
}

void main() {
    vec2 p = gl_FragCoord.xy;
    if (crs(v1 - v0, p - v0) >= 0.0 &&
        crs(v2 - v1, p - v1) >= 0.0 &&
        crs(v0 - v2, p - v2) >= 0.0)
        gl_FragColor = vec4(1.0);
    else
        gl_FragColor = vec4(0.5);
}
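For readers who would rather poke at the test on the CPU, here is a minimal Python sketch of the same edge-function check. The helper names `sub` and `shade` are made up for this illustration and are not part of the Mesa test; the sketch only mirrors the logic of the shader, returning the grey level it would write.

```python
def crs(u, v):
    """2D cross product, same as crs() in the shader."""
    return u[0] * v[1] - u[1] * v[0]


def sub(a, b):
    """Component-wise difference of two 2D points (hypothetical helper)."""
    return (a[0] - b[0], a[1] - b[1])


def shade(p, v0, v1, v2):
    """Return 1.0 if p passes all three edge tests, else 0.5.

    Like the shader, this only accepts triangles whose vertices are
    ordered so that all three cross products are non-negative.
    """
    inside = (crs(sub(v1, v0), sub(p, v0)) >= 0
              and crs(sub(v2, v1), sub(p, v1)) >= 0
              and crs(sub(v0, v2), sub(p, v2)) >= 0)
    return 1.0 if inside else 0.5


if __name__ == "__main__":
    triangle = ((0, 0), (4, 0), (0, 4))
    print(shade((1, 1), *triangle))  # a point inside the triangle
    print(shade((5, 5), *triangle))  # a point outside the triangle
```

Note that Python's `and` short-circuits, just like `&&` in the shader: once one edge test fails, the remaining cross products are never evaluated.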

Mesa's GLSL compiler turns this shader into an assembly program that looks like this (the style is that of ARB_fragment_program, but the control flow instructions are a Mesa invention):
0: MOV TEMP[0].xy, INPUT[0];
1: SUB TEMP[2].xy, UNIFORM[1], UNIFORM[0];
2: SUB TEMP[4].xy, TEMP[0], UNIFORM[0];
3: MUL TEMP[1].y, TEMP[2].xxxx, TEMP[4].yyyy;
4: MUL TEMP[1].z, TEMP[2].yyyy, TEMP[4].xxxx;
5: SUB TEMP[1].x, TEMP[1].yyyy, TEMP[1].zzzz;
6: SGE TEMP[1].y, TEMP[1].xxxx, CONST[3].xxxx;
7: IF TEMP[1].yyyy; # (if false, goto 15);
8: SUB TEMP[2].xy, UNIFORM[2], UNIFORM[1];
9: SUB TEMP[4].xy, TEMP[0], UNIFORM[1];
10: MUL TEMP[1].z, TEMP[2].xxxx, TEMP[4].yyyy;
11: MUL TEMP[1].w, TEMP[2].yyyy, TEMP[4].xxxx;
12: SUB TEMP[1].x, TEMP[1].zzzz, TEMP[1].wwww;
13: SGE TEMP[0].w, TEMP[1].xxxx, CONST[3].xxxx;
14: ELSE; # (goto 17)
15: MOV TEMP[0].w, CONST[3].xxxx;
16: ENDIF;
17: IF TEMP[0].wwww; # (if false, goto 25);
18: SUB TEMP[2].xy, UNIFORM[0], UNIFORM[2];
19: SUB TEMP[4].xy, TEMP[0], UNIFORM[2];
20: MUL TEMP[1].w, TEMP[2].xxxx, TEMP[4].yyyy;
21: MUL TEMP[2].z, TEMP[2].yyyy, TEMP[4].xxxx;
22: SUB TEMP[1].x, TEMP[1].wwww, TEMP[2].zzzz;
23: SGE TEMP[0].z, TEMP[1].xxxx, CONST[3].xxxx;
24: ELSE; # (goto 27)
25: MOV TEMP[0].z, CONST[3].xxxx;
26: ENDIF;
27: IF TEMP[0].zzzz; # (if false, goto 30);
28: MOV OUTPUT[1], CONST[4];
29: ELSE; # (goto 32)
30: MOV OUTPUT[1], CONST[5];
31: ENDIF;
32: END
Observe how the subroutine crs was inlined three times.

There are a lot of instructions here that operate on a single component or on two components. For a chip like the Intel i965 this is fine, because every shader instruction at the hardware level conceptually only operates on a single floating point value. This is in contrast to the Radeon chips, where the hardware level instructions still conceptually operate on four-component vectors.

The important point is that on Intel chips, one could emit the 32 instructions seen above more or less as is, without wasting too many resources. On a Radeon chip - and let's use the R500 fragment processor to make it concrete - one could also emit those 32 instructions as is. The problem, however, is that we would use 32 instruction slots that can potentially operate on 4-component vectors and use them to operate on single components or sometimes two components. In every cycle, we waste two or three of the four available computation channels. Roughly two thirds of the available computational resources are wasted.
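The waste estimate can be checked with a rough count. The following Python sketch parses the Mesa listing above and adds up how many of the four lanes each ALU instruction actually writes. It rests on two simplifying assumptions of mine: flow control instructions (IF, ELSE, ENDIF, END) are left out of the count, and the width of the destination write mask is taken as the number of useful lanes.

```python
import re

# The listing produced by Mesa's GLSL compiler, copied from above.
LISTING = """\
0: MOV TEMP[0].xy, INPUT[0];
1: SUB TEMP[2].xy, UNIFORM[1], UNIFORM[0];
2: SUB TEMP[4].xy, TEMP[0], UNIFORM[0];
3: MUL TEMP[1].y, TEMP[2].xxxx, TEMP[4].yyyy;
4: MUL TEMP[1].z, TEMP[2].yyyy, TEMP[4].xxxx;
5: SUB TEMP[1].x, TEMP[1].yyyy, TEMP[1].zzzz;
6: SGE TEMP[1].y, TEMP[1].xxxx, CONST[3].xxxx;
7: IF TEMP[1].yyyy; # (if false, goto 15);
8: SUB TEMP[2].xy, UNIFORM[2], UNIFORM[1];
9: SUB TEMP[4].xy, TEMP[0], UNIFORM[1];
10: MUL TEMP[1].z, TEMP[2].xxxx, TEMP[4].yyyy;
11: MUL TEMP[1].w, TEMP[2].yyyy, TEMP[4].xxxx;
12: SUB TEMP[1].x, TEMP[1].zzzz, TEMP[1].wwww;
13: SGE TEMP[0].w, TEMP[1].xxxx, CONST[3].xxxx;
14: ELSE; # (goto 17)
15: MOV TEMP[0].w, CONST[3].xxxx;
16: ENDIF;
17: IF TEMP[0].wwww; # (if false, goto 25);
18: SUB TEMP[2].xy, UNIFORM[0], UNIFORM[2];
19: SUB TEMP[4].xy, TEMP[0], UNIFORM[2];
20: MUL TEMP[1].w, TEMP[2].xxxx, TEMP[4].yyyy;
21: MUL TEMP[2].z, TEMP[2].yyyy, TEMP[4].xxxx;
22: SUB TEMP[1].x, TEMP[1].wwww, TEMP[2].zzzz;
23: SGE TEMP[0].z, TEMP[1].xxxx, CONST[3].xxxx;
24: ELSE; # (goto 27)
25: MOV TEMP[0].z, CONST[3].xxxx;
26: ENDIF;
27: IF TEMP[0].zzzz; # (if false, goto 30);
28: MOV OUTPUT[1], CONST[4];
29: ELSE; # (goto 32)
30: MOV OUTPUT[1], CONST[5];
31: ENDIF;
32: END
"""

# Assumption: these opcodes do not occupy vector ALU lanes.
FLOW_CONTROL = {"IF", "ELSE", "ENDIF", "END"}


def lane_usage(listing):
    """Return (lanes_written, lanes_available) over all ALU instructions."""
    written = available = 0
    for line in listing.splitlines():
        match = re.match(r"\s*\d+:\s*(\w+)\s*([^,;#]*)", line)
        if match is None:
            continue
        opcode, dest = match.group(1), match.group(2).strip()
        if opcode in FLOW_CONTROL:
            continue
        # A destination without a write mask writes all four components.
        mask = dest.partition(".")[2]
        written += len(mask) if mask else 4
        available += 4
    return written, available


written, available = lane_usage(LISTING)
print(f"{written} of {available} lanes written ({written / available:.0%})")
```

Under these assumptions the count comes out at 36 of 92 lanes, so about 60% of the ALU lanes sit idle. Leaving out the two full-width output MOVs pushes the idle share to two thirds, which matches the rough estimate above.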

With a little bit of thought, one finds a better program to emit on an R500, particularly by rearranging the register usage a bit:
0: MAD r0, v1.xy11, p.11xy, -v0.xyxy;
1: DP4 r0.x, r0.x00-y, r0.w00z;
2: IF ALU_result >= 0; (if false, goto 11);
3: MAD r0, v2.xy11, p.11xy, -v1.xyxy;
4: DP4 r0.x, r0.x00-y, r0.w00z;
5: IF ALU_result >= 0; (if false, goto 11);
6: MAD r0, v0.xy11, p.11xy, -v2.xyxy;
7: DP4 r0.x, r0.x00-y, r0.w00z;
8: IF ALU_result >= 0; (if false, goto 11);
9: MOV out, .1111;
10: ELSE;
11: MOV out, .HHHH;
12: ENDIF;
13: END
Yes, the IF and ENDIF are unbalanced; this is possible in some cases in the R500 flow control model. By some clever optimizations, including the fact that the R500 fragment shader can negate the w part of an instruction independently from the xyz part, we more than halved the number of instructions to 13. Compare this to the estimate that the previous version would waste about two thirds of the available computational power.
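To see why one MAD followed by one DP4 computes the same value as a call to crs, here is a small Python emulation of the swizzles used in the program above. It is only a sketch of the arithmetic, not of the hardware: the functions `swz`, `mad`, `dp4` and `edge_vectorized` are names invented for this example, and the swizzle parser handles just the notation that appears in the listing (x, y, z, w, the constants 0 and 1, and a per-component minus sign).

```python
def swz(vec, pattern):
    """Apply a swizzle such as 'xy11' or 'x00-y' to a 4-component vector."""
    names = {"x": 0, "y": 1, "z": 2, "w": 3}
    out, sign = [], 1.0
    for ch in pattern:
        if ch == "-":
            sign = -1.0          # negate the next component only
            continue
        value = vec[names[ch]] if ch in names else float(ch)
        out.append(sign * value)
        sign = 1.0
    return out


def mad(a, b, c):
    """Component-wise multiply-add: a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def dp4(a, b):
    """Four-component dot product."""
    return sum(x * y for x, y in zip(a, b))


def crs(u, v):
    """Reference: the 2D cross product from the shader."""
    return u[0] * v[1] - u[1] * v[0]


def edge_vectorized(va, vb, p):
    """Compute crs(vb - va, p - va) the way instructions 0 and 1 do."""
    va4, vb4, p4 = ([v[0], v[1], 0.0, 0.0] for v in (va, vb, p))
    # MAD r0, vb.xy11, p.11xy, -va.xyxy  ->  (vb - va, p - va)
    r0 = mad(swz(vb4, "xy11"), swz(p4, "11xy"),
             [-c for c in swz(va4, "xyxy")])
    # DP4 r0.x, r0.x00-y, r0.w00z  ->  r0.x * r0.w - r0.y * r0.z
    return dp4(swz(r0, "x00-y"), swz(r0, "w00z"))


if __name__ == "__main__":
    va, vb, p = (1, 2), (4, 6), (3, 7)
    u = (vb[0] - va[0], vb[1] - va[1])
    w = (p[0] - va[0], p[1] - va[1])
    print(edge_vectorized(va, vb, p), crs(u, w))
```

The MAD packs both difference vectors into one register, and the DP4 with zeroed middle components and a negated last component then yields the cross product in a single instruction.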

The real challenge is recognizing these opportunities for optimizations automatically and applying them in our driver. The field is wide open here. Incidentally, the example above illustrates why I don't believe LLVM is of too much use for us. Somehow I doubt that a compiler project that has its roots in normal CPUs has useful knowledge about these kinds of optimization problems.

Of course, the first step is to support GLSL at all. Afterwards, we can talk again about such optimizations.

P.S.: Can you reduce the length of the program above even further? I have a version that uses only 8 instructions, though it involves quite significant changes to the flow control logic. Can you get there, too?