Tagebuch eines Interplanetaren Botschafters<br />Learn how the world really is, but never forget how it ought to be.<br />Nicolai Hähnle<br /><br /><h3>Posits (2017-09-30)</h3>The <a href="https://www.slideshare.net/insideHPC/beyond-floating-point-next-generation-computer-arithmetic">posit number system</a> is a proposed alternative to floating point numbers. Having heard of posits a couple of times now, I'd like to take the time to digest them and, in the second half, write a bit about their implementation in hardware. Their creator makes some bold claims about posits being simpler to implement, and - spoiler alert! - I believe he's mistaken. Posits are still a clever idea and may indeed be a good candidate for replacing floating point in the long run. But trade-offs are an inescapable fact of life.<br /><br /><h3>Floating point revisited</h3>In floating point, numbers are represented by a sign bit, an exponent, and a mantissa:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-3FBYtNpKQrI/Wc_FCc3ZTWI/AAAAAAAABKU/uUwU6VnYGvc_eKb2QKVQC7fpLd3Kw6cYACLcBGAs/s1600/floatingpoint.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="24" data-original-width="336" src="https://2.bp.blogspot.com/-3FBYtNpKQrI/Wc_FCc3ZTWI/AAAAAAAABKU/uUwU6VnYGvc_eKb2QKVQC7fpLd3Kw6cYACLcBGAs/s1600/floatingpoint.png" /></a></div><br />The value of a normal floating point number is <i>±1.m<sub>2</sub>*2<sup>e</sup></i> (actually, <i>e</i> is stored with a bias in order to be able to treat it like an unsigned number most of the time, but let's not get distracted by that kind of detail).
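To make the decomposition concrete, here is a small Python sketch (my own illustration, using the standard struct module) that pulls a single-precision number apart into its three fields:

```python
import struct

def decode_float32(x):
    """Split a single-precision float into (sign, unbiased exponent, mantissa bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    e_biased = (bits >> 23) & 0xFF  # 8 exponent bits, stored with a bias of 127
    m = bits & 0x7FFFFF             # 23 mantissa bits
    # For normal numbers, the value is (-1)^sign * 1.m_2 * 2^(e_biased - 127).
    return sign, e_biased - 127, m

assert decode_float32(1.5) == (0, 0, 0x400000)   # +1.1_2 * 2^0
assert decode_float32(-6.0) == (1, 2, 0x400000)  # -1.1_2 * 2^2
```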
By using an exponent, a wide range of numbers can be represented at a constant relative accuracy.<br /><br />There are some non-normal floating point numbers. When <i>e</i> is maximal, the number is either considered infinity or "not a number", depending on <i>m</i>. When <i>e</i> is minimal, it represents a <i>sub-normal</i> number: either a <i>denormal</i> or zero.<br /><br />Denormals can be confusing at first, but their justification is actually quite simple. Let's take single-precision floating point as an example, where there are 8 exponent bits and 23 mantissa bits. The smallest positive normal single-precision floating point number is <i>1.00000000000000000000000<sub>2</sub>*2<sup>-126</sup></i>. The next larger representable number is <i>1.00000000000000000000001<sub>2</sub>*2<sup>-126</sup></i>. Those numbers are not equal, but their difference is not representable as a normal single-precision floating point number. It would be rather odd if the difference between non-equal numbers were equal to zero, as it would be if we had to round the difference to zero!<br /><br />When <i>e</i> is minimal, the represented number is (in the case of single-precision floating point) <i>±0.m<sub>2</sub>*2<sup>-126</sup></i>, which means that the difference between the smallest normal numbers, <i>0.00000000000000000000001<sub>2</sub>*2<sup>-126</sup></i>, can still be represented.<br /><br />Note how with floating point numbers, the relative accuracy with which numbers can be represented is constant for almost the entire range of representable numbers. Once you get to sub-normal numbers, the accuracy drops very quickly. At the other end, the drop is even more extreme with a sudden jump to infinity.<br /><br /><h3>Posits</h3>The basic idea of posits is to vary the size of the mantissa and to use a variable-length hybrid encoding of the exponent that mixes unary with binary encodings. 
The variable-length exponent encoding is shorter for exponents close to zero, so that more bits of mantissa are available for numbers close to one.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-S6rnu774bzw/Wc_FR_dQHnI/AAAAAAAABKY/tNcC4JfhYyEg_GwC_dprNoAnPAThLCiIgCLcBGAs/s1600/posit.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="24" data-original-width="336" src="https://2.bp.blogspot.com/-S6rnu774bzw/Wc_FR_dQHnI/AAAAAAAABKY/tNcC4JfhYyEg_GwC_dprNoAnPAThLCiIgCLcBGAs/s1600/posit.png" /></a></div><br />Posits have a fixed number <i>es</i> of binary exponent bits <i>e</i> (except in the extreme ranges), and a posit system is characterized by that number. A typical choice appears to be <i>es = 3</i>. The unary part of the exponent is encoded by the <i>r</i> bits. For positive posits, <i>10<sub>2</sub></i> encodes <i>0</i>, <i>110<sub>2</sub></i> encodes <i>1</i>, <i>01<sub>2</sub></i> encodes <i>-1</i>, <i>001<sub>2</sub></i> encodes <i>-2</i>, and so on.
The overall encoded number is then <i>±1.m<sub>2</sub>*2<sup>r*2<sup>es</sup> + e</sup></i>.<br /><br />Let's look at some examples of 16-bit posits with <i>es = 3</i>.<br /><br /><span style="color: red;">0</span> <span style="color: #b45f06;">10</span> <span style="color: blue;">000</span> 0000000000 is <i>1.0<sub>2</sub>*2<sup>0*2<sup>3</sup>+0</sup> = 1</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">10</span> <span style="color: blue;">000</span> 1000000000 is <i>1.1<sub>2</sub>*2<sup>0*2<sup>3</sup>+0</sup> = 1.5</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">10</span> <span style="color: blue;">001</span> 0000000000 is <i>1.0<sub>2</sub>*2<sup>0*2<sup>3</sup>+1</sup> = 2</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">10</span> <span style="color: blue;">111</span> 0000000000 is <i>1.0<sub>2</sub>*2<sup>0*2<sup>3</sup>+7</sup> = 128</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">110</span> <span style="color: blue;">000</span> 000000000 is <i>1.0<sub>2</sub>*2<sup>1*2<sup>3</sup>+0</sup> = 256</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">1110</span> <span style="color: blue;">000</span> 00000000 is <i>1.0<sub>2</sub>*2<sup>2*2<sup>3</sup>+0</sup> = 65536</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">111111111110</span> <span style="color: blue;">000</span> is <i>1.0<sub>2</sub>*2<sup>10*2<sup>3</sup>+0</sup> = 2<sup>80</sup></i>. Note that there is no mantissa anymore! 
The next larger number is:<br /><span style="color: red;">0</span> <span style="color: #b45f06;">111111111110</span> <span style="color: blue;">001</span> is <i>1.0<sub>2</sub>*2<sup>10*2<sup>3</sup>+1</sup> = 2<sup>81</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">111111111110</span> <span style="color: blue;">111</span> is <i>1.0<sub>2</sub>*2<sup>10*2<sup>3</sup>+7</sup> = 2<sup>87</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">1111111111110</span> <span style="color: blue;">00</span> is <i>1.0<sub>2</sub>*2<sup>11*2<sup>3</sup>+0</sup> = 2<sup>88</sup></i>. Now the number of binary exponent bits starts shrinking. The missing bits are implicitly zero, so the next larger number is:<br /><span style="color: red;">0</span> <span style="color: #b45f06;">1111111111110</span> <span style="color: blue;">01</span> is <i>1.0<sub>2</sub>*2<sup>11*2<sup>3</sup>+2</sup> = 2<sup>90</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">1111111111110</span> <span style="color: blue;">11</span> is <i>1.0<sub>2</sub>*2<sup>11*2<sup>3</sup>+6</sup> = 2<sup>94</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">11111111111110</span> <span style="color: blue;">0</span> is <i>1.0<sub>2</sub>*2<sup>12*2<sup>3</sup>+0</sup> = 2<sup>96</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">11111111111110</span> <span style="color: blue;">1</span> is <i>1.0<sub>2</sub>*2<sup>12*2<sup>3</sup>+4</sup> = 2<sup>100</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">111111111111110</span><span style="color: blue;"></span> is <i>1.0<sub>2</sub>*2<sup>13*2<sup>3</sup>+0</sup> = 2<sup>104</sup></i>. 
There are no binary exponent bits left, but the presentation in the slides linked above still allows for one larger normal number:<br /><span style="color: red;">0</span> <span style="color: #b45f06;">111111111111111</span><span style="color: blue;"></span> is <i>1.0<sub>2</sub>*2<sup>14*2<sup>3</sup>+0</sup> = 2<sup>112</sup></i>.<br />Going in the other direction, we get:<br /><span style="color: red;">0</span> <span style="color: #b45f06;">01</span> <span style="color: blue;">111</span> 0000000000 is <i>1.0<sub>2</sub>*2<sup>-1*2<sup>3</sup>+7</sup> = 1/2 = 0.5</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">01</span> <span style="color: blue;">000</span> 0000000000 is <i>1.0<sub>2</sub>*2<sup>-1*2<sup>3</sup>+0</sup> = 1/256 = 0.00390625</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">001</span> <span style="color: blue;">000</span> 000000000 is <i>1.0<sub>2</sub>*2<sup>-2*2<sup>3</sup>+0</sup> = 1/65536 = 0.0000152587890625</i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">000000000001</span> <span style="color: blue;">111</span> is <i>1.0<sub>2</sub>*2<sup>-11*2<sup>3</sup>+7</sup> = 2<sup>-81</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">000000000001</span> <span style="color: blue;">000</span> is <i>1.0<sub>2</sub>*2<sup>-11*2<sup>3</sup>+0</sup> = 2<sup>-88</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">0000000000001</span> <span style="color: blue;">11</span> is <i>1.0<sub>2</sub>*2<sup>-12*2<sup>3</sup>+6</sup> = 2<sup>-90</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">0000000000001</span> <span style="color: blue;">00</span> is <i>1.0<sub>2</sub>*2<sup>-12*2<sup>3</sup>+0</sup> = 2<sup>-96</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">00000000000001</span> <span style="color: blue;">0</span> is 
<i>1.0<sub>2</sub>*2<sup>-13*2<sup>3</sup>+0</sup> = 2<sup>-104</sup></i>.<br /><span style="color: red;">0</span> <span style="color: #b45f06;">000000000000001</span> is <i>1.0<sub>2</sub>*2<sup>-14*2<sup>3</sup>+0</sup> = 2<sup>-112</sup></i>. This is the smallest positive normal number, since we have no choice but to treat 0 specially:<br /><span style="color: red;">0</span> <span style="color: #b45f06;">000000000000000</span> is 0.<br /><br />For values close to 1, the accuracy is the same as for half-precision floating point numbers (which have 5 exponent and 10 mantissa bits). Half-precision floating point numbers do have slightly higher accuracy at the extreme ends of their dynamic range, but the dynamic range of posits is <i>much</i> higher. This is a very tempting trade-off for many applications.<br /><br />By the way: if we had set <i>es = 2</i>, we could have larger accuracy for values close to 1, while still having a higher dynamic range than half-precision floating point.<br /><br />You'll note that we have not encountered an infinity. Gustafson's proposal here is to do away with the distinction between positive and negative zero and infinity. Instead, his proposal is to think of the real numbers projectively, and use a two's complement representation, meaning that negating a posit is the same operation at the bit level as negating an integer. For example:<br /><br /><span style="color: red;">1</span> <span style="color: #b45f06;">111111111111111</span> is <i>-1.0<sub>2</sub>*2<sup>-14*2<sup>3</sup>+0</sup> = -2<sup>-112</sup></i>.<br /><span style="color: red;">1</span> <span style="color: #b45f06;">10</span> <span style="color: blue;">000</span> 0000000000 is <i>-1.0<sub>2</sub>*2<sup>0*2<sup>3</sup>+0</sup> = -1</i>. 
The next smaller number (larger in absolute magnitude) is:<br /><span style="color: red;">1</span> <span style="color: #b45f06;">01</span> <span style="color: blue;">111</span> 1111111111 is <i>-1.0000000001<sub>2</sub>*2<sup>0*2<sup>3</sup>+0</sup></i>.<br /><span style="color: red;">1</span> <span style="color: #b45f06;">01</span> <span style="color: blue;">111</span> 1000000000 is <i>-1.1<sub>2</sub>*2<sup>0*2<sup>3</sup>+0</sup> = -1.5</i>.<br /><span style="color: red;">1</span> <span style="color: #b45f06;">000000000000001</span> is <i>-1.0<sub>2</sub>*2<sup>14*2<sup>3</sup>+0</sup> = -2<sup>112</sup></i>.<br /><br />The bit pattern <span style="color: red;">1</span> <span style="color: #b45f06;">000000000000000</span> (which, like 0, is its own inverse in two's complement negation) would then represent infinity.<br /><br />There's an elegance to thinking projectively in this way. Comparison of posits is the same as comparison of signed integers at the bit level (except for infinity, which is unordered). Even better, it's great that the smallest and largest normal numbers are multiplicative inverses of each other.<br /><br />But to people used to floating point, not having a "sign + magnitude" representation is surprising. I also imagine that it could be annoying for a hardware implementation, so let's look into that.<br /><br /><h3>Hardware implementations</h3>In his presentations, Gustafson claims that by reducing the number of special cases, posits are easier to implement than floating point. No doubt there are fewer special cases (no denorms, no NaNs), but at the cost of a more complicated normal case.<br /><br />Let's take a look at a floating point multiply.
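Here is a toy software model of what that multiply does (my own sketch; it covers only the normal-number path, truncates instead of rounding to nearest even, and ignores zeros, infinities, NaNs, and denormals):

```python
def fp_mul_normal(a, b, mbits=23):
    """Toy model of the normal-number path of a floating point multiply.
    a and b are (sign, exponent, mantissa) triples with an implicit leading 1;
    rounding is plain truncation for brevity."""
    sa, ea, ma = a
    sb, eb, mb = b
    sign = sa ^ sb                                # signs are handled independently
    exp = ea + eb                                 # exponents simply add
    full = (ma | 1 << mbits) * (mb | 1 << mbits)  # multiply the 1.m mantissas
    # A product of two values in [1, 2) lies in [1, 4): at most one normalization shift.
    if full >> (2 * mbits + 1):
        exp += 1
        full >>= 1
    return sign, exp, (full >> mbits) & ((1 << mbits) - 1)

# 1.5 * (1.5 * 2^1) = 4.5 = 1.125 * 2^2
assert fp_mul_normal((0, 0, 1 << 22), (0, 1, 1 << 22)) == (0, 2, 1 << 20)
```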
The basic structure is conceptually quite simple, since all parts of a floating point number can be treated separately:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-e6cBI8WQ1K4/Wc_FjXE-T8I/AAAAAAAABKc/4b-yTkn8nyY4AhkAeshU2lWcQJzXGTAQACLcBGAs/s1600/fp-multiply.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="232" data-original-width="693" src="https://2.bp.blogspot.com/-e6cBI8WQ1K4/Wc_FjXE-T8I/AAAAAAAABKc/4b-yTkn8nyY4AhkAeshU2lWcQJzXGTAQACLcBGAs/s1600/fp-multiply.png" /></a></div><br />By far the most expensive part here is the multiplication of the mantissas. There are of course a bunch of special cases that need to be accounted for: the inputs could be zero, infinity, or NaN, and the multiplication could overflow. Each of these cases is easily detected and handled with a little bit of comparatively inexpensive boolean logic.<br /><br />Where it starts to get complicated is when handling the possibility that an input is denormal, or when the multiplication of two normal numbers results in a denormal.<br /><br />When an input is denormal, the corresponding input for the multiply is <i>0.m</i> instead of <i>1.m</i>. Some logic has to decide whether the most significant input bit to the multiply is 0 or 1. This could potentially add to the latency of the computation. Luckily, deciding whether the input is denormal is fairly simple, and only the most significant input bit is affected. Because of carries, the less significant input bits tend to be more critical for latency. Conversely, this means that the latency of determining the most significant input bit can be hidden well.<br /><br />On the output side, the cost is higher, both in terms of the required logic and in terms of the added latency, because a shifter is needed to shift the output into the correct position.
Two cases need to be considered: When a multiplication of two normal numbers results in a denormal, the output has to be shifted to the right an appropriate number of places.<br /><br />When a denormal is multiplied by a normal number, the output needs to be shifted to the left or the right, depending on the exponent of the normal number. Additionally, the number of leading zeros of either the denormal input or of the multiplication output is required to determine the exponent of the final result. Since the area cost is the same either way, I would expect implementations to determine the leading-zero count of the denormal input, since that allows for better latency hiding.<br /><br />(The design space for floating point multipliers is larger than I've shown here. For example, you could deal with denormals by shifting their mantissa into place <i>before</i> the multiply. That seems like a waste of hardware considering that you cannot avoid the shifter after the multiply, but my understanding of hardware design is limited, so who knows.)<br /><br />So there is a bit more hardware required than just what is shown in the diagram above: a leading-zero-count and a shifter, plus a bit more random logic. But now compare to the effort required for a posit multiply:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-YBVFH6Q2QHU/Wc_K4qqZwoI/AAAAAAAABKs/L99xh6RFOcEVvH-ZuMPJbEKdoHsm3irXACLcBGAs/s1600/posit-multiply.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="298" data-original-width="689" src="https://3.bp.blogspot.com/-YBVFH6Q2QHU/Wc_K4qqZwoI/AAAAAAAABKs/L99xh6RFOcEVvH-ZuMPJbEKdoHsm3irXACLcBGAs/s1600/posit-multiply.png" /></a></div><br />First of all, there is unavoidable latency in front of the multiplier. Every single bit of mantissa input may be masked off, depending on the variable size of the exponent's unary part.
The exponents themselves need to be decoded in order to add them, and then the resulting exponent needs to be encoded again. Finally, the multiplication result needs to be shifted into place; this was already required for floating point multiplication, but the determination of the shift becomes more complicated since it depends on the exponent size. Also, each output bit needs a multiplexer since it can originate from either the exponent or the mantissa.<br /><br />From my non-expert glance, here's the hardware you need in addition to the multiplier and exponent addition:<br /><ul><li>two leading-bit counts to decode the unary exponent parts (floating-point multiply only needs a single leading-zero count for a denormal input)</li><li>two shifters to shift the binary input exponent parts into place</li><li>logic for masking the input mantissas</li><li>one leading bit encoder</li><li>one shifter to shift the binary output exponent part into place</li><li>one shifter to shift the mantissa into place (floating-point also needs this)</li><li>multiplexer logic to combine the variable-length output parts </li></ul>Also note that the multiplier and mantissa shifter may have to be larger, since - depending on the value of <i>es</i> - the mantissa of posits close to 1 can be larger than the mantissa of floating point numbers.<br /><br />On the other hand, the additional shifters don't have to be large, since they only need to shift <i>es</i> bits. The additional hardware is almost certainly dominated by the cost of the mantissa multiplier. Still, the additional latency could be a problem - though obviously, I have no actual experience designing floating point multipliers.<br /><br />There's also the issue of the proposed two's complement representation for negative posits. This may not be too bad for the mantissa multiplication, since one can probably treat it as a signed integer multiplication and automatically get the correct signs for the resulting mantissa. 
However, I would expect some more overhead for the decoding and encoding of the exponent.<br /><br />The story should be similar for posit vs. floating point addition. When building a multiply-accumulate unit, the latency that is added for masking the input based on the variable exponent length can likely be hidden quite well, but there does not appear to be a way around the decoding and encoding of exponents.<br /><br /><h3>Closing thoughts</h3>As explained above, I expect posit hardware to be more expensive than floating point hardware. However, the gain in dynamic range and accuracy is nothing to sneeze at. Posits deserve a fair shot, since the trade-off may well be worth it.<br /><br />There is a lot of legacy software that relies on floating point behavior. Luckily, a posit ALU contains all the pieces of a floating point ALU, so it should be possible to build an ALU that can do both at pretty much the cost of a posit-only ALU. This makes a painless transition feasible.<br /><br />Posits have an elegant design based on thinking about numbers projectively, but the lack of NaNs, the two's complement representation, and not having signed zeros and infinities may be alien to some floating point practitioners. I don't know how much of an issue this really is, but it's worth pointing out that a simple modification to posits could accommodate all these concerns.
Using again the example of 16-bit posits with <i>es = 3</i>, we could designate bit patterns at the extreme ends as NaN and infinity:<br /><span style="color: red;">0</span> <span style="color: #b45f06;">111111111111111</span> is <i>+inf</i> (instead of <i>2<sup>112</sup></i>).<br /><span style="color: red;">0</span> <span style="color: #b45f06;">000000000000001</span> is <i>+NaN</i> (instead of <i>2<sup>-112</sup></i>).<br />We could then treat the sign bit independently, like in floating point, giving us <i>±0</i>, <i>±inf</i>, and <i>±NaN</i>. The neat properties related to thinking projectively would be lost, but the smallest and largest positive normal numbers would still be multiplicative inverses of each other. The hardware implementation may even be smaller, thanks to not having to deal with two's complement exponents.<br /><br />The inertia of floating point is massive, and I don't expect it to be unseated anytime soon. But it's awesome to see people rethinking such fundamental building blocks of computing and coming up with solid new ideas. Posits aren't going to happen quickly, if at all, but it's worth taking them seriously.<br /><br /><h3>Interesting mistakes (2015-02-28)</h3>I taught a course in Linear and Integer Optimization last semester, the exam of which was held last week. I'm fairly pleased with the results. Grading an exam is often painful when you realize that some students failed to understand important concepts, but overall it appears that my expectations about the difficulty of this exam were mostly accurate. Sometimes, however, students fail at a problem for reasons that we do not anticipate.
One problem of the exam was to prove that when P and Q are (convex) <a href="http://en.wikipedia.org/wiki/Polytope">polytopes</a>, then so is their <a href="http://en.wikipedia.org/wiki/Minkowski_addition">Minkowski sum</a> P + Q. <br><br>I am aware of two reasonable approaches to this problem. The first is to use inequality representations to argue that the product P x Q is a polyhedron, and then appeal to the fact (shown in the lecture) that the image under the linear map (p, q) -> p + q is a polyhedron (boundedness is of course easy to see, and naturally there are some variations of this approach). The second approach is to use representations of P and Q as convex hulls of finitely many points, and to show that P + Q is the convex hull of the pairwise sum of such points. This second proof is elementary but requires a bit more work because one has to argue about the coefficients in convex combinations. <br><br>To my surprise, several students attempted a third route. They started with inequality representations and used the boundedness to argue that given one of the inequalities ax <= b that define P, the maximum of ax over Q is finite. They then tried to prove that P + Q can be obtained by simply using both the inequalities defining P and those defining Q together, with the right-hand sides increased by the corresponding (finite) maximum over the <em>other</em> polytope. This approach cannot possibly work because the number of facets of the Minkowski sum can be exponentially larger than the number of facets of the summands (the <a href="http://en.wikipedia.org/wiki/Permutohedron">permutahedron</a> is a cute example of this). And yet, in hindsight, it is a very plausible approach to try if one doesn't know about this fact. <br><br>As mistakes go, I would call this an <em>interesting</em> mistake. 
In some sense it isn't even a mistake at all, just a false trail into the woods that unfortunately some students got lost in.<br /><br /><h3>Finding duplicate elements in an array (2014-04-25)</h3>This is a story about the power of <a href="http://nhaehnle.blogspot.de/2011/12/zwei-briefumschlage-und-ein.html">randomness</a> and universal hashing. <br/><br/> Suppose you are given an array of elements of some fixed type. How would you go about figuring out whether two elements of the array are equal? <br/><br/> At the time that I am writing this post, searching for its title leads almost universally to discussions about an interview-type puzzle question where you are given an array of size <code>N</code> that contains integers in the range from <code>0</code> to <code>N-1</code>. Indeed, clever solutions to this problem are proposed. Most of them amount to the same kind of trickery that <a href="http://en.wikipedia.org/wiki/Radix_sort">radix sort</a> employs to get a linear-time sorting algorithm. I hope the reader agrees that this is cute, but of limited use. <br/><br/> The most straightforward efficient solution to this problem that applies to general types is to sort the array using an arbitrary sorting function. Then a simple linear scan suffices to answer the question. The overall running time of this approach is <code>O(N log N)</code>, and <a href="http://www.mathcs.emory.edu/~cheung/papers/StreamDB/Frequency-count/1982-Misra-FindRepeatedElements.pdf">Misra and Gries</a> showed in the 1980s that no comparison-based algorithm can be asymptotically faster.
<br/><br/> Let me present their argument, which should feel familiar to anybody who has seen the <a href="http://planetmath.org/LowerBoundForSorting">proof that comparison-based sorting requires time <code>Omega(n log n)</code></a> (the result in the paper is more general). <br/><br/> Every deterministic algorithm for finding duplicates in an array using comparisons can be thought of as a family of infinitely many ternary decision trees, one tree for every array size. For any given array size, the algorithm starts at the root node of the tree for that size, comparing the elements at the indices given by that node. Depending on how they compare, the algorithm moves down in the tree. This process continues until a leaf node is reached which is labelled either YES (duplicate elements have been found) or NO (all elements are unique). Here is a decision tree for an array of size 3: <br/><br/> <a href="http://4.bp.blogspot.com/-Gn-pvKc2ywA/U1pXb2_RcYI/AAAAAAAAAaU/qW2zuFRBYUY/s1600/decision-tree.png" imageanchor="1" ><img border="0" src="http://4.bp.blogspot.com/-Gn-pvKc2ywA/U1pXb2_RcYI/AAAAAAAAAaU/qW2zuFRBYUY/s1600/decision-tree.png" /></a> <br/><br/> We will argue that every decision tree for arrays of size <code>N</code> must have more than <code>N!</code> leaves. Via <a href="http://en.wikipedia.org/wiki/Stirling%27s_approximation">Stirling's approximation</a>, this implies that any such tree has depth at least <code>Omega(N log N)</code>, which in turn implies the same bound for the worst-case running time. In fact, the <em>average</em> depth of leaves is <code>Omega(N log N)</code>. This means that even a randomized algorithm, which we can think of as choosing one of many decision trees at random, has an expected worst-case running time of at least <code>Omega(N log N)</code>. <br/><br/> Let us fix a decision tree <code>T</code> for arrays of size <code>N</code>. There are <code>N!</code> different permutations of size <code>N</code>. 
We can think of such a permutation as an array of size <code>N</code> filled with the integers from <code>1</code> to <code>N</code> without duplicates. Let <code>P1</code> and <code>P2</code> be two different such permutations/arrays, and define the array <code>Q</code> as <pre>Q[i] := min(P1[i], P2[i])<br /></pre>The array <code>Q</code> contains duplicate entries. In fact, let <code>m</code> be the smallest number that appears at different positions in <code>P1</code> and <code>P2</code>. Then <code>m</code> appears twice in <code>Q</code>. <br/><br/> As a consequence, the computation for <code>Q</code> must end in a YES-leaf of <code>T</code>, while <code>P1</code> and <code>P2</code> must lead to NO-leaves. What if <code>P1</code> and <code>P2</code> lead to the <em>same</em> NO-leaf <code>F</code>? Let <code>V</code> be a node on the path to <code>F</code>, and let us say that <code>i</code> and <code>j</code> are the array indices that are compared at the node <code>V</code>. Then the comparison <code>P1[i]</code> vs. <code>P1[j]</code> has the same result as the comparison <code>P2[i]</code> vs. <code>P2[j]</code>. But then, by the definition of <code>Q</code>, the comparison <code>Q[i]</code> vs. <code>Q[j]</code> <em>also</em> has the same result. This means that the computation for <code>Q</code> must <em>also</em> follow the same path and end in a NO-leaf! That is clearly a contradiction, and so we can conclude that the computations for <code>P1</code> and <code>P2</code> must end in <em>different</em> NO-leaves. Hence, the decision tree <code>T</code> must have at least <code>N!</code> NO-leaves, and this completes the proof. <br/><br/> So, to summarize up to this point: As long as we can only compare array elements, no algorithm can beat the running time of <code>O(N log N)</code> that a simple sort followed by a linear scan offers. However, this isn't the end of the story.
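Both the sort-and-scan baseline and the Q construction from the proof fit in a few lines (a small Python sketch of my own):

```python
def has_duplicate_sorted(arr):
    """The O(N log N) baseline: sort, then scan adjacent elements."""
    s = sorted(arr)
    return any(s[i] == s[i + 1] for i in range(len(s) - 1))

# The Q = min(P1, P2) construction: P1 and P2 are duplicate-free
# permutations of 1..N, yet Q must contain a duplicate.
P1 = [3, 1, 2, 4]
P2 = [1, 3, 2, 4]
Q = [min(a, b) for a, b in zip(P1, P2)]  # [1, 1, 2, 4]
assert not has_duplicate_sorted(P1) and not has_duplicate_sorted(P2)
# 1 is the smallest value sitting at different positions, and it appears twice:
assert has_duplicate_sorted(Q)
```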
<br/><br/> Suppose you have a <a href="http://en.wikipedia.org/wiki/Universal_hashing">universal family of hash functions</a> for the type of elements in the array (and indeed, such families are not too difficult to construct). That is, suppose you can randomly choose a hash function <code>h:U -> [m]</code>, with <code>m</code> within a constant factor of <code>N</code>, such that for different array elements <code>A[i]</code> and <code>A[j]</code> you know that the probability of a <em>hash collision</em> is low: <pre>Pr[h(A[i]) = h(A[j])] <= 1/N<br /></pre>Note that if the guarantee is slightly weaker, e.g. a bound of <code>2/N</code> on the probability of a collision, the following argument will still work with only minor modifications to the constants in the computations. <br/><br/> We will now follow a simple strategy: Build a hash table of the elements of <code>A</code> according to <code>h</code>. Since equal elements will be hashed into the same bucket, it is then sufficient to check for duplicate elements within each bucket by sorting and a linear scan, or by an even simpler quadratic check. <br/><br/> What is the running time of this approach? If <code>b(i)</code> is the number of array elements mapped into the <code>i</code>-th bucket, then even the running time of the simple quadratic check can be bounded by <pre>sum(b(i)<sup>2</sup>, i=1..m),<br /></pre>which is exactly the number of hash collision pairs: <pre>sum(b(i)<sup>2</sup>, i=1..m) = #{ (i,j) : h(A[i]) = h(A[j]) } = N + #{ (i,j) : i != j, h(A[i]) = h(A[j]) }<br /></pre>The expected cardinality of the last part is bounded by the number of pairs <code>(i,j)</code> with <code>i != j</code> times our bound on the probability that a pair results in a hash collision. This product is <code>N-1</code>. That is, the expected running time of the algorithm we have outlined is linear. <br/><br/> Observe that we must choose the hash function randomly for this approach to work. 
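The whole strategy can be sketched for integer keys as follows (my own illustration; I'm using the textbook multiply-add family h(x) = ((a*x + b) mod p) mod m with a fixed Mersenne prime, which is an assumption on the key type, not something required by the argument):

```python
import random

def has_duplicate(arr):
    """Expected linear time: bucket by a randomly chosen hash function,
    then do the simple quadratic check within each bucket."""
    n = len(arr)
    if n < 2:
        return False
    p = 2**31 - 1               # a prime; assumes keys are integers in [0, p)
    m = 2 * n                   # bucket count within a constant factor of N
    a = random.randrange(1, p)  # the hash function is chosen at random,
    b = random.randrange(0, p)  # which is essential for the guarantee
    buckets = {}
    for x in arr:
        buckets.setdefault(((a * x + b) % p) % m, []).append(x)
    # The expected sum of squared bucket sizes is O(N), so this check is cheap.
    for items in buckets.values():
        for i in range(len(items)):
            if items[i] in items[i + 1:]:
                return True
    return False
```

For this family the collision probability for distinct keys is roughly 1/m, and with m = 2N that is within the 1/N bound assumed above.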
If we were to use a fixed, deterministic hash function, then an adversary could construct an array in which all elements hash to the same bucket (using the <a href="http://en.wikipedia.org/wiki/Pigeonhole_principle">pigeonhole principle</a> and assuming that the universe <code>U</code> of possible array elements is large enough). We would be back in the case where only comparisons are allowed, and hence a linear running time is impossible. <br/><br/> So we see that the <em>combination</em> of hashing and randomness, in the form of universal hashing, allows us to solve the original problem in a running time that is asymptotically better than what any deterministic, comparison-based algorithm can achieve.Nicolai Hähnlehttp://www.blogger.com/profile/18235566517992076346noreply@blogger.com0tag:blogger.com,1999:blog-36137506.post-30385610942257459192011-12-19T11:03:00.000+01:002011-12-19T11:03:00.739+01:00The AI Challenge - a look backFor the last four weeks, I have worked on my submission for the <a href="http://aichallenge.org/">Google AI Challenge</a>. The deadline has passed this morning, so it is time to relax while the official final tournament is running on the contest servers. Until yesterday, my submission was ranked quite consistently in the top 20. Then I uploaded my final version (an upload resets the skill rating), which was quite consistently better on the unofficial TCP servers, but given that everybody else was doing last-minute tweaks, too, it's far too early to call.<br /><br />I enjoyed the spirit of this contest immensely, and now I would like to document some of my thoughts on how my submission works. I have uploaded the source code to Github (<a href="https://github.com/nhaehnle/aiant">https://github.com/nhaehnle/aiant</a>) so you can peruse it while following this blog entry if you wish.<br /><br /><b>High level structure</b><br /><br />The bot is divided into a number of modules that select high-level goals for the ants it controls. 
This is done in a very straightforward way. Every ant initially has no direction to go in (the field <code>Ant::assigneddirection</code> is initialized to <code>false</code> each turn). The high-level modules then simply assign directions to ants that do not have one yet, and the order in which the modules are called reflects the relative importance I assign to the various tasks. For example, the <code>HillDefense</code> module will only assign ants that have not been assigned by the <code>FoodSeeker</code>.<br /><br />There are two modules that fall outside of this basic structure: The <code>Zoc</code> ("zone of control") module does not steer any ants. Instead, it keeps track of how fast my ants vs. enemy ants can reach each square of the map. And the <code>Tactical</code> module overrides the previous decisions, if necessary, for ants that may be involved in combat.<br /><br /><b>The strategy modules</b><br /><br />The following strategy modules are used by the bot, assigning jobs to ants in the given order:<ol><li><code>FoodSeeker</code>: Getting enough food is probably the most important problem of the game, and so this module comes first. It greedily sends the closest ant to each item of food that is visible, using several <a href="http://en.wikipedia.org/wiki/Breadth-first_search">breadth-first searches</a>.</li><li><code>HillDefense</code>: When one of the bot's hills can be reached in a few turns by the enemy, this module ensures that a few ants (adjusted based on the number of enemy ants in sight) stick close to the hill.<br /><br />An important tweak of this code is that it does not send ants back to the hill indiscriminately. Instead, it only recalls an ant if it is too far away from the hill relative to the closest enemy. This way, ants are not needlessly wasted on patrol duty. 
It would probably be a good idea to treat multi-hill maps specially, but this is not done.</li><li><code>OpportunisticAttack</code>: This rather stupid piece of code ensures that the ants move more aggressively towards enemy hills. After all, that is the only way to win the game.</li><li><code>Scout</code>: This module assigns a tiny number of ants to re-explore squares that have not been seen in a long time.<br /><br />This is needed because the rest of the code uses the <code>Zoc</code> module to understand that an enemy can never come out of a cul-de-sac once it's been secured. So without some re-scouting logic, the bot would simply ignore the food in those secured locations!</li><li><code>Diffusion</code>: This is a very ad-hoc heuristic to spread out my ants better than they would otherwise. There would probably have been some potential for improvement in this part of the code.</li><li>Zoc-based movement: any ant that has not been assigned a move up to this point will simply use the <code>Zoc</code> data to move closer to the enemy. This is done in <code>Bot::makeMoves</code> rather than in a separate module.</li></ol>On a "historical" note, it may be interesting to know that I started out with just the <code>FoodSeeker</code> and the Zoc-based movement. Together with <code>Tactical</code>, this was enough to get into the top 100 a bit more than two weeks before the deadline.<br /><br /><b>The tactical logic</b><br /><br />The idea behind the <code>TacticalSm</code> module is simple:<ol><li>Carve out small <code>Submap</code>s that contain all the ants that could potentially be involved in combat on the next turn.</li><li>Generate potential moves for both my bot and the enemy. 
The combination of <code>Submap</code> and associated <code>PlayerMove</code>s is called a <code>Theater</code>.</li><li>Evaluate those moves using some simple scoring function.</li><li>Try to find better moves for both parties.</li><li>Repeat until time runs out.</li><li>Pick a move using one of two strategies.</li></ol>Note that my tactical code doesn't understand situations involving enemy ants from more than one opposing player. This is certainly a limitation, but it's hard to tell how bad it really is.<br /><br />Most of the tactical code is about engineering, and careful consideration of the scoring function. For example, during a major code reorganization halfway through the competition, the tactical code stopped considering proximity to food in the scoring function. That hurt the performance of my bot quite significantly, and my bot's skill recovered when I turned that part of the scoring function back on.<br /><br />There are really only two clever parts. One is to use a scoring function that, by default, assigns a higher value to my own ants, to avoid costly trades of ants, but to turn on <code>aggressive</code> mode if my own ants overpower the enemy. This switch is done randomly based on the ratio of ants, where the probability of turning on aggressive mode is conceptually a <a href="http://en.wikipedia.org/wiki/Logistic_function">logistic function</a> of the logarithm of the ant-count ratio.<br /><br />The second clever part is to use a small bit of <a href="http://en.wikipedia.org/wiki/Rock-paper-scissors">rock-paper-scissors</a> logic in the final move selection. My first method to select a move is a simple min-max (or rather, max-min): pick the move that maximizes the minimum scoring value I could possibly get, given all enemy moves under consideration. 
Since this is a rather conservative choice of moves, especially given that there is absolutely no look-ahead, I implemented max-average as a second strategy: choose the move that maximizes the average scoring value, where the average is over all enemy moves, weighted by their max-min value.<br /><br />Of course, this strategy may sometimes be too aggressive. What's more, the best strategy may depend on the opponent. Therefore, my bot uses a variant of the <a href="http://en.wikipedia.org/wiki/Randomized_weighted_majority_algorithm">randomized weighted majority algorithm</a> to pick the move-selection strategy. At the beginning of each turn, my bot determines what the best strategy would have been <em>in hindsight</em> and updates the strategies' weights accordingly. One important tweak is that the weights depend both on which enemy player my bot is facing, and on the number of ants in play.<br /><br /><b>Discarded experiments</b><br /><br />I experimented with adding more move-selection strategies, but the results were not convincing at all, perhaps because it takes longer for the bot to learn which strategy to choose, so I scrapped that again.<br /><br />I also implemented map symmetry detection to guess unseen enemy hills and a corresponding module for a coordinated offense against enemy hills. The code is still there, but I have disabled it. The simple implementation I have is far too aggressive and wasteful, and I didn't feel like trying to tweak it to a point where it becomes useful.<br /><br />I also did some experiments with an alternative move generation method for my tactical system, as well as a very simple implementation of the sampling-based combat system described by a1k0n on the <a href="http://forums.aichallenge.org/viewtopic.php?f=24&t=2044">challenge forums</a>. This simple method performed worse than my tactical code; I have some ideas for improving it (just like a1k0n obviously did), but ultimately did not have the time to try them out. 
Those experiments can still be found in the Git history if you feel like digging through it.<br /><br />I tried some automatic tuning of parameters using meta-heuristics (see the <code>test/tune.py</code> script), but somehow that didn't yield very convincing results either.<br /><br /><b>Code quality?</b><br /><br />I have to admit that the code is not very well documented, and some of it is rather hackish. I hope you'll forgive me for this type of one-off code.<br /><br />There is one thing that I did consistently that I would never do in another type of project, and that is keeping STL containers around as long as possible to avoid re-allocation. I rely heavily on the fact that <code>std::vector::clear</code> does not free the allocated memory, at least in the default implementation of g++. By keeping those vectors around, I want to avoid unpleasant surprises in the performance of memory management.<br /><br />I don't think this is strictly necessary, given that the bot actually uses surprisingly little memory, but it didn't hurt in this case. It reduces the maintainability of the code, which is why I wouldn't do it on other projects, but maintainability was obviously never a goal for a competition like this one.<br /><br /><b>A neat randomization trick</b><br /><br />"When in doubt, flip a coin" is perhaps the most important lesson that Theoretical Computer Science teaches. When there are several directions to go in that all score the same, which one should you choose? The best approach is to choose randomly.<br /><br />To facilitate this, I pre-generated all 24 permutations of the 4 directions, and all 120 permutations of the 5 possible moves of an ant, and I randomly choose one of those permutations whenever I have a loop through directions to choose from.<br /><br />Sometimes, however, I have a loop through variable-sized vectors, for example, when choosing ants for an offensive move in the tactical logic. 
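The fixed-size case can be sketched as follows (representing the four directions as the integers <code>0..3</code> is my choice for illustration; the bot's actual tables are generated elsewhere):

```cpp
#include <algorithm>
#include <array>

// All 24 orderings of the 4 directions, generated once at startup.
static const std::array<std::array<int, 4>, 24> kDirPerms = [] {
    std::array<std::array<int, 4>, 24> perms{};
    std::array<int, 4> p = {0, 1, 2, 3};
    for (auto &entry : perms) {
        entry = p;
        std::next_permutation(p.begin(), p.end());
    }
    return perms;
}();

// Usage sketch: pick one permutation at random, then loop through the
// directions in that order, e.g.
//   const auto &perm = kDirPerms[rng() % kDirPerms.size()];
//   for (int dir : perm) { /* try moving in direction dir */ }
```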
I would like to have a random permutation for those cases as well, and of course there are algorithms to generate them. But they take time, and the benefit of a perfectly uniform distribution is not that clear.<br /><br />So here's a little trick I use to permute a variable-sized vector of size <code>N</code>. I pick a random prime <code>p</code> out of a decently sized pre-generated list of large primes (larger than any <code>N</code> the code is ever going to see), as well as a random offset <code>ofs</code>, and loop through the numbers <code>0..N-1</code>. But instead of using these numbers <code>i</code> as an index, I use <code>(p*i + ofs) % N</code> as the index. Since <code>p</code> is a prime larger than <code>N</code>, it is coprime to <code>N</code> and hence invertible <a href="http://en.wikipedia.org/wiki/Modular_arithmetic">modulo</a> <code>N</code>, and therefore the map <code>i -> (p*i + ofs) % N</code> is a bijection, aka a permutation. Of course, this is far from a uniformly distributed permutation: there are <code>N!</code> potential permutations, out of which this method can generate at most <code>N * <a href="http://en.wikipedia.org/wiki/Euler%27s_totient_function">φ</a>(N)</code>. But hey: it's good enough for what I need.Nicolai Hähnlehttp://www.blogger.com/profile/18235566517992076346noreply@blogger.com0tag:blogger.com,1999:blog-36137506.post-75574767751039373472009-10-24T15:13:00.011+02:002009-10-24T15:13:00.462+02:00An elegant definition of mathematicsMost non-mathematicians have only a very fuzzy idea of what mathematics actually is. One has to forgive them; after all, mathematicians themselves struggle to give a clear definition. 
I recently came across a very beautiful attempt at a definition of what mathematics actually is in an <a href="http://arxiv.org/abs/math.HO/9404236">article</a> by <a href="http://de.wikipedia.org/wiki/William_Thurston">William Thurston</a>:<br /><br /><i>Mathematics is the smallest subject satisfying the following:<ul><li>Mathematics includes the natural numbers and plane and solid geometry.</li><li>Mathematics is that which mathematicians study.</li><li>Mathematicians are those humans who advance human understanding of mathematics.</li></ul></i><br /><br />Some futurists may take offense at the discriminatory use of the word "human". One could perhaps turn this shortcoming into an advantage in order to clarify the notion of intelligence: we could say that the objects of a class C are at least as intelligent as humans if, in the definition above, the class of humans can be replaced by C without shrinking the field of mathematics that it defines.<br /><br />Have fun speculating about these definitions.Nicolai Hähnlehttp://www.blogger.com/profile/18235566517992076346noreply@blogger.com0tag:blogger.com,1999:blog-36137506.post-65158561761173498562009-10-12T12:30:00.000+02:002009-10-12T12:34:35.675+02:00Math reading #1: Volume bounds for lattice polytopes with interior lattice pointsSo I got a bit into the mood of writing down mathematical stuff for the web. 
Since reading and trying to understand stuff is a good exercise anyway, I decided to start an experiment in which I will attempt to digest and explain some mathematical paper, or chapter from a book, or something similar on a semi-regular basis.<br /><br />I hope that these digests will be useful to other people, but in the end their real purpose is to force myself to organize my own thoughts a bit better - and perhaps to serve as an archive for myself where I can look things up in the future.<br /><br />Without further blah-blah, here's the first installment: <a href="http://sma.epfl.ch/~haehnle/blog/mr01-volume-bounds.xhtml">Volume bounds for lattice polytopes with interior lattice points</a>Nicolai Hähnlehttp://www.blogger.com/profile/18235566517992076346noreply@blogger.com0tag:blogger.com,1999:blog-36137506.post-55127009429094069662009-10-03T11:59:00.000+02:002009-10-03T11:59:00.376+02:00Ehrhart polynomials and integer points in polytopesIt's about time for me to write about something mathematical on this blog. I used the opportunity to experiment with <a href="http://www.w3.org/Math/">MathML</a> and <a href="http://www.w3.org/Graphics/SVG/">SVG</a>. Unfortunately, the state of these technologies is rather horrible, which is why I can't write the actual entry in the blog itself. HTML 5 promises to improve things, but it's not quite there yet.<br /><br /><a href="http://sma.epfl.ch/~haehnle/blog/ehrhart.xhtml">So here is a link to my text on Ehrhart polynomials.</a><br /><br />There's a little bit of backstory here which I should probably mention. I was reading up on Ehrhart polynomials a while ago, and in particular I was looking for a proof of their existence. Unfortunately, the proofs I found immediately by perusing the literature used rather <i>abgewandte Mathematik</i> ("averted" mathematics - a pun on <i>angewandte</i>, i.e. applied, mathematics), which made me sad. So, in a moment of the kind of hubris which is necessary to do these kinds of things, I decided that I could find an elementary proof on my own. 
I succeeded, and I thought to myself, "Hey, that proof is actually rather simple. I've been looking for something mathematical to write up on my blog, let's just use this."<br /><br />So I started, with the goal in mind of explaining my proof in a way that is understandable to ordinary laypeople. In the process, I had to admit to myself that the proof is probably not <em>that</em> simple.<br /><br />You see, I am not writing for the kind of people who are uninterested in mathematics - that would be futile - but I do want my writing to be interesting and useful for other students of mathematics and interested laypeople. Sometimes, I like to try to write a text where my yardstick is, "Would I have been able to follow and appreciate this text at the beginning of my university studies?" Of course it is not always feasible to write texts like that, and it is actually incredibly hard to tell whether I achieve this goal because I have mostly forgotten who I was five years ago. Trying to see things from that earlier perspective is not easy.<br /><br />I do hope that I have succeeded, and while the MathML was annoying to write, it was ultimately enjoyable because I could touch a large number of ideas and areas that are relevant to my daily work.<br /><br />In the future, I will probably experiment with <a href="http://www1.chapman.edu/~jipsen/mathml/asciimathsyntax.html">ASCIIMathML</a>, which I discovered a bit too late. It appears to offer a reasonable solution to the verbosity of MathML.Nicolai Hähnlehttp://www.blogger.com/profile/18235566517992076346noreply@blogger.com0tag:blogger.com,1999:blog-36137506.post-24350027772238178662009-09-29T15:06:00.002+02:002009-09-29T15:06:00.522+02:00Firefox Cache Viewer and Google BooksRecently I wanted to read an article from a journal that is freely accessible, but unfortunately only offers its volumes from 1997 onwards as PDFs. 
The older volumes sit in the closed stacks of our library and are also available on Google Books. However, Google Books does not let you print, let alone download PDFs, and requesting an article from the closed stacks is not exactly user-friendly either.<br /><br />With the help of the <a href="https://addons.mozilla.org/de/firefox/addon/2489">Cache Viewer</a> plugin for Firefox, on the other hand, one can easily export the loaded PNGs at a low-tech level and then convert them into a PDF using common command-line tools (or, on Mac OS X, the Automator). I was happy.<br /><br />Incidentally, there is also an <a href="http://www.codeplex.com/GoogleBookDownloader">open-source tool</a> for this task on Windows, although it will presumably run into trouble if the internals of Google Books ever change in the future.Nicolai Hähnlehttp://www.blogger.com/profile/18235566517992076346noreply@blogger.com0