While writing the accelerated glReadPixels path for reads to PBOs for Gallium, I wanted to make sure that the various possible format conversions work correctly. They do, but I noticed something strange: when reading from a GL_RGB565 framebuffer to GL_UNSIGNED_BYTE, I was getting tiny differences in the results depending on the code path that was taken. What was going on?
Color values are conceptually floating point values, but most of the time, so-called normalized formats are used to store the values in memory. In fact, many probably think of color values as 8-bit normalized values by default, because of the way many graphics programs present color values and because of the #cccccc color format of HTML.
Normalized formats generalize this well-known notion to an arbitrary number of bits. Given a normalized integer value x in N bits, the corresponding floating point value is x / (2**N - 1) - for example, x / 255 for 8 bits and x / 31 for 5 bits. When converting between normalized formats with different bit depths, the values cannot be mapped perfectly. For example, since 255 and 31 are coprime, the only floating point values representable exactly in both 5- and 8-bit channels are 0.0 and 1.0.
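To make the formula concrete, here is a small Python sketch. The helper names (unorm_to_float, convert_unorm) are made up for this illustration and are not part of any GL or Mesa API. It uses exact rational arithmetic so that "representable exactly in both" can be checked literally rather than up to floating point noise.

```python
from fractions import Fraction

def unorm_to_float(x, bits):
    """Exact value of the normalized integer x stored in `bits` bits."""
    return Fraction(x, 2**bits - 1)

def convert_unorm(x, src_bits, dst_bits):
    """Nearest dst_bits-wide normalized integer to the value of x."""
    scaled = unorm_to_float(x, src_bits) * (2**dst_bits - 1)
    # Values are non-negative, so adding 1/2 and truncating rounds to nearest.
    return int(scaled + Fraction(1, 2))

# Values that both a 5-bit and an 8-bit channel can represent exactly.
values_5 = {unorm_to_float(x, 5) for x in range(2**5)}
values_8 = {unorm_to_float(x, 8) for x in range(2**8)}
common = sorted(values_5 & values_8)

print(common)                   # [Fraction(0, 1), Fraction(1, 1)]
print(convert_unorm(20, 5, 8))  # 165
```

Every other 5-bit value has to be rounded to some neighbouring 8-bit value, which is where different code paths get the chance to disagree.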
So some imprecision is unavoidable, but why was I getting different values in different code paths?
It turns out that the non-PBO path first blits the requested framebuffer region to a staging texture, from where the result is then memcpy()d to the user's buffer. It is the GPU that takes care of the copy from VRAM, the de-tiling of the framebuffer, and the format conversion. The blit uses the normal 3D pipeline with a simple fragment shader that reads from the "framebuffer" (which is really bound as a texture during the blit) and writes to the staging texture (which is bound as the framebuffer).
Normally, fragment shaders operate on 32-bit floating point numbers. However, Radeon hardware allows an optimization where color values are exported from the shader to the CB hardware unit as 16-bit half-precision floating point numbers when the framebuffer does not require the full floating point precision. This is useful because it reduces the bandwidth required for shader exports and allows more shader waves to be in flight simultaneously, because less memory is reserved for the exports.
And it turns out that the value 20 in a 5-bit color channel, when first converted into half-float (fp16) format, becomes 164 in an 8-bit color channel, even though the 8-bit color value that is closest to the floating point number represented by 20 in 5-bit is actually 165. The temporary conversion to fp16 cuts off a bit that would make the difference.
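The arithmetic behind this can be traced by hand. The sketch below does so with exact fractions, assuming that the conversion to fp16 rounds to the nearest representable value (I have not checked what rounding mode the hardware export actually uses, although truncation would give the same result in this case). fp16 carries 11 significant bits, so in the range [0.5, 1) its representable values are spaced 2**-11 apart.

```python
from fractions import Fraction

exact = Fraction(20, 31)  # the value stored as 20 in a 5-bit channel

# In [0.5, 1), fp16 values lie on a grid with step 2**-11.
step = Fraction(1, 2**11)
assert Fraction(1, 2) <= exact < 1

grid_index = round(exact / step)  # nearest grid point (assumed rounding mode)
as_fp16 = grid_index * step       # 1321/2048, slightly below 20/31

def to_8bit(value):
    """Nearest 8-bit normalized integer for a non-negative value."""
    return int(value * 255 + Fraction(1, 2))

print(float(exact * 255))    # roughly 164.516
print(float(as_fp16 * 255))  # roughly 164.480
print(to_8bit(exact), to_8bit(as_fp16))  # 165 164
```

The exact value lands just above 164.5 and the fp16 approximation just below it, so the two paths round in opposite directions.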
Intrigued, I wrote a little script to see how often this happens. It turns out that 20 in a 5-bit channel and 32 in a 6-bit channel are the only cases where the temporary conversion to fp16 causes the resulting 8-bit value to be off by one. Luckily, people don't usually use GL_RGB565 framebuffers... and as a general rule, taking a value from an N-bit channel, converting it to fp16, and then storing it again in an N-bit channel (of the same bit depth!) will always result in what we started out with, as long as N <= 11 (figuring out why is an exercise left to the reader ;-)) - so the use cases we really care about are fine.
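For anyone who wants to repeat the experiment, a script along these lines should do. This is a reconstruction, not the original script, and the function names are invented here. It relies on the "e" (half-precision) format of Python's struct module, which rounds to nearest-even; the hardware's rounding mode is an assumption on my part.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest fp16 value, returned as a float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def quantize(value, bits):
    """Nearest normalized integer with the given bit depth."""
    return int(value * (2**bits - 1) + 0.5)

def mismatches(src_bits, dst_bits=8):
    """Source values whose dst_bits result changes when routed through fp16."""
    found = []
    for x in range(2**src_bits):
        value = x / (2**src_bits - 1)
        direct = quantize(value, dst_bits)
        via_fp16 = quantize(to_fp16(value), dst_bits)
        if direct != via_fp16:
            found.append((x, direct, via_fp16))
    return found

def round_trip_ok(bits):
    """True if every N-bit value survives N-bit -> fp16 -> N-bit unchanged."""
    return all(quantize(to_fp16(x / (2**bits - 1)), bits) == x
               for x in range(2**bits))

print(mismatches(5))  # [(20, 165, 164)]
print(mismatches(6))  # [(32, 130, 129)]
print(all(round_trip_ok(n) for n in range(1, 12)))  # True
```

Each tuple lists the source value, the directly converted 8-bit value, and the 8-bit value obtained via fp16.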