Tagebuch eines Interplanetaren Botschafters

Posits

2017-09-30T20:25:00.000+02:00

The posit number system is a proposed alternative to floating point numbers. Having heard of posits a couple of times now, I'd like to take the time to digest them and, in the second half, write a bit about their implementation in hardware. Their creator makes some bold claims about posits being simpler to implement, and - spoiler alert! - I believe he's mistaken. Posits are still a clever idea and may indeed be a good candidate for replacing floating point in the long run. But trade-offs are an inescapable fact of life.

Floating point revisited

In floating point, numbers are represented by a sign bit, an exponent, and a mantissa:

The value of a normal floating point number is ±1.m₂*2^e (actually, e is stored with a bias in order to be able to treat it like an unsigned number most of the time, but let's not get distracted by that kind of detail). By using an exponent, a wide range of numbers can be represented at a constant relative accuracy.

There are some non-normal floating point numbers. When e is maximal, the number is either considered infinity or "not a number", depending on m. When e is minimal, it represents a sub-normal number: either a denormal or zero.
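To make the field layout concrete, here is a small Python sketch (the function names are my own, not from any library). It uses the standard struct module to pull the sign, biased exponent and mantissa out of a single-precision number. It then classifies the number according to the rules above and computes its exact value. The bias of 127 is the detail glossed over above. Denormals use the fixed exponent -126.

```python
import struct
from fractions import Fraction

def float32_fields(x: float) -> tuple:
    """Round x to single precision and return (sign, biased exponent, mantissa bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def classify(sign: int, e: int, m: int) -> str:
    if e == 0xFF:                      # maximal exponent: infinity or NaN
        return "nan" if m else "inf"
    if e == 0:                         # minimal exponent: zero or denormal
        return "zero" if m == 0 else "subnormal"
    return "normal"

def value(sign: int, e: int, m: int) -> Fraction:
    """Exact value of a finite single-precision number."""
    if e == 0:
        mantissa, exponent = Fraction(m, 1 << 23), -126          # ±0.m₂*2^-126
    else:
        mantissa, exponent = 1 + Fraction(m, 1 << 23), e - 127    # ±1.m₂*2^(e-127)
    return (-1) ** sign * mantissa * Fraction(2) ** exponent
```

Note that struct.pack raises OverflowError for finite values that are too large for single precision.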

Denormals can be confusing at first, but their justification is actually quite simple. Let's take single-precision floating point as an example, where there are 8 exponent bits and 23 mantissa bits. The smallest positive normal single-precision floating point number is 1.00000000000000000000000₂*2^-126. The next larger representable number is 1.00000000000000000000001₂*2^-126. Those numbers are not equal, but their difference is not representable as a normal single-precision floating point number. It would be rather odd if the difference between non-equal numbers were equal to zero, as it would be if we had to round the difference to zero!

When e is minimal, the represented number is (in the case of single-precision floating point) ±0.m₂*2^-126, which means that the difference between the smallest normal numbers, 0.00000000000000000000001₂*2^-126, can still be represented.
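The argument can be checked numerically. The sketch below builds the two smallest positive normal single-precision numbers from their bit patterns and subtracts them. Python floats are double precision, so the subtraction itself happens in double precision. The point is that the exact difference, 2^-149, survives conversion back to single precision as the smallest denormal instead of being flushed to zero.

```python
import struct

def f32_from_bits(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bits_from_f32(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

smallest_normal = f32_from_bits(0x00800000)  # 1.00000000000000000000000₂*2^-126
next_normal = f32_from_bits(0x00800001)      # 1.00000000000000000000001₂*2^-126
diff = next_normal - smallest_normal         # exact: 2^-149

# Exponent field 0 and mantissa ...001: the smallest positive denormal.
print(hex(bits_from_f32(diff)))  # prints 0x1
```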

Note how with floating point numbers, the relative accuracy with which numbers can be represented is constant for almost the entire range of representable numbers. Once you get to sub-normal numbers, the accuracy drops very quickly. At the other end, the drop is even more extreme with a sudden jump to infinity.

Posits

The basic idea of posits is to vary the size of the mantissa and to use a variable-length hybrid encoding of the exponent that mixes unary with binary encodings. The variable-length exponent encoding is shorter for exponents close to zero, so that more bits of mantissa are available for numbers close to one.

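Posits have a fixed number es of binary exponent bits (except in the extreme ranges), and a posit system is characterized by that number. A typical choice appears to be es = 3. The unary part of the exponent (called the regime in the posit literature) is encoded by the r bits. For positive posits, 10 encodes 0, 110 encodes 1, 01 encodes -1, 001 encodes -2, and so on: a run of k ones encodes k-1, a run of k zeros encodes -k, and the run is terminated by a single opposite bit. Writing r for the value of the unary part and e for the value of the binary exponent bits, the overall encoded number is then ±1.m₂*2^{r*2^es + e}.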

Let's look at some examples of 16-bit posits with es = 3.

0 10 000 0000000000 is 1.0₂*2^{0*2³+0} = 1.
0 10 000 1000000000 is 1.1₂*2^{0*2³+0} = 1.5.
0 10 001 0000000000 is 1.0₂*2^{0*2³+1} = 2.
0 10 111 0000000000 is 1.0₂*2^{0*2³+7} = 128.
0 110 000 000000000 is 1.0₂*2^{1*2³+0} = 256.
0 1110 000 00000000 is 1.0₂*2^{2*2³+0} = 65536.
0 111111111110 000 is 1.0₂*2^{10*2³+0} = 2⁸⁰. Note that there is no mantissa anymore! The next larger number is:
0 111111111110 001 is 1.0₂*2^{10*2³+1} = 2⁸¹.
0 111111111110 111 is 1.0₂*2^{10*2³+7} = 2⁸⁷.
0 1111111111110 00 is 1.0₂*2^{11*2³+0} = 2⁸⁸. Now the number of binary exponent bits starts shrinking. The missing bits are implicitly zero, so the next larger number is:
0 1111111111110 01 is 1.0₂*2^{11*2³+2} = 2⁹⁰.
0 1111111111110 11 is 1.0₂*2^{11*2³+6} = 2⁹⁴.
0 11111111111110 0 is 1.0₂*2^{12*2³+0} = 2⁹⁶.
0 11111111111110 1 is 1.0₂*2^{12*2³+4} = 2¹⁰⁰.
0 111111111111110 is 1.0₂*2^{13*2³+0} = 2¹⁰⁴. There are no binary exponent bits left, but the presentation in the slides linked above still allows for one larger normal number:
0 111111111111111 is 1.0₂*2^{14*2³+0} = 2¹¹².
Going in the other direction, we get:
0 01 111 0000000000 is 1.0₂*2^{-1*2³+7} = 1/2 = 0.5.
0 01 000 0000000000 is 1.0₂*2^{-1*2³+0} = 1/256 = 0.00390625.
0 001 000 000000000 is 1.0₂*2^{-2*2³+0} = 1/65536 = 0.0000152587890625.
0 000000000001 111 is 1.0₂*2^{-11*2³+7} = 2^-81.
0 000000000001 000 is 1.0₂*2^{-11*2³+0} = 2^-88.
0 0000000000001 11 is 1.0₂*2^{-12*2³+6} = 2^-90.
0 0000000000001 00 is 1.0₂*2^{-12*2³+0} = 2^-96.
0 00000000000001 0 is 1.0₂*2^{-13*2³+0} = 2^-104.
0 000000000000001 is 1.0₂*2^{-14*2³+0} = 2^-112. This is the smallest positive normal number, since we have no choice but to treat 0 specially:
0 000000000000000 is 0.
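To double-check the table, here is a minimal Python decoder for n-bit posits with es binary exponent bits. It is my own sketch, not taken from Gustafson's material. It returns exact fractions, treats 0 000...0 as zero and 1 000...0 as the single infinity, and handles negative patterns by two's complement negation, as discussed below.

```python
from fractions import Fraction

def decode_posit(bits: int, n: int = 16, es: int = 3):
    """Decode an n-bit posit with es binary exponent bits into an exact Fraction.

    Returns None for the infinity pattern 1 000...0.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None
    negative = bits >> (n - 1) == 1
    if negative:
        bits = (-bits) & mask            # two's complement negation yields the positive posit
    body = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]  # the n-1 bits after the sign
    # Unary part: a run of equal bits, ended by the opposite bit (if there is room left).
    run = 1
    while run < len(body) and body[run] == body[0]:
        run += 1
    r = run - 1 if body[0] == 1 else -run
    pos = min(run + 1, len(body))
    # Binary exponent: up to es bits; missing low-order bits are implicitly zero.
    exp_bits = body[pos:pos + es]
    e = int("".join(map(str, exp_bits)) or "0", 2) << (es - len(exp_bits))
    # Mantissa: whatever is left, with the implicit leading 1.
    frac_bits = body[pos + es:]
    f = int("".join(map(str, frac_bits)) or "0", 2)
    mantissa = 1 + Fraction(f, 1 << len(frac_bits))
    value = mantissa * Fraction(2) ** (r * (1 << es) + e)
    return -value if negative else value
```

For example, decode_posit(0b0_1111111111110_01) gives 2⁹⁰, matching the row above where the missing exponent bit is implicitly zero.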

For values close to 1, the accuracy is the same as for half-precision floating point numbers (which have 5 exponent and 10 mantissa bits). Half-precision floating point numbers do have slightly higher accuracy at the extreme ends of their dynamic range, but the dynamic range of posits is much higher. This is a very tempting trade-off for many applications.

By the way: if we had set es = 2, we could have higher accuracy for values close to 1 (11 instead of 10 mantissa bits), while still having a higher dynamic range than half-precision floating point.
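The accuracy comparison can be made concrete by counting bits. When r ≥ 0, the unary part uses r+2 bits. When r < 0, it uses -r+1 bits. Whatever remains after the sign bit, the unary part and the es binary exponent bits is mantissa. A small sketch (function names are my own):

```python
def regime_length(r: int) -> int:
    """Bits used by the unary part for value r, including the terminating bit.

    (Ignores the extreme case where the run fills the whole word.)
    """
    return r + 2 if r >= 0 else -r + 1

def fraction_bits(r: int, n: int = 16, es: int = 3) -> int:
    """Mantissa bits left after the sign bit, the unary part, and es exponent bits."""
    return max(0, n - 1 - regime_length(r) - es)

def max_exponent(n: int = 16, es: int = 3) -> int:
    """The largest posit, 0 111...1, has unary value n-2 and equals 2^((n-2)*2^es)."""
    return (n - 2) * (1 << es)
```

For es = 3, fraction_bits(0) is 10, the same as the 10 mantissa bits of half precision. With fraction_bits(0, es=2) we get 11. The largest positive posit is 2¹¹² for es = 3 and 2⁵⁶ for es = 2. For comparison, the largest finite half-precision number is 65504, just below 2¹⁶.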

You'll note that we have not encountered an infinity. Gustafson, who created posits, proposes to do away with the distinction between positive and negative zero, and likewise between positive and negative infinity. Instead, he proposes to think of the real numbers projectively and to use a two's complement representation, meaning that negating a posit is the same operation at the bit level as negating an integer. For example:

1 111111111111111 is -1.0₂*2^{-14*2³+0} = -2^-112.
1 10 000 0000000000 is -1.0₂*2^{0*2³+0} = -1. The next smaller number (larger in absolute magnitude) is:
1 01 111 1111111111 is -1.0000000001₂*2^{0*2³+0}.
1 01 111 1000000000 is -1.1₂*2^{0*2³+0} = -1.5.
1 000000000000001 is -1.0₂*2^{14*2³+0} = -2¹¹².

The bit pattern 1 000000000000000 (which, like 0, is its own negation in two's complement) would then represent infinity.

There's an elegance to thinking projectively in this way. Comparison of posits is the same as comparison of signed integers at the bit level (except for infinity, which is unordered). Another nice property is that the smallest and largest positive normal numbers (2^-112 and 2¹¹² in the example above) are multiplicative inverses of each other.
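Both properties are easy to state in code. In the following sketch (helper names are my own), negation is ordinary 16-bit two's complement negation. The order of posits is the order of the same bit patterns read as signed integers. The test patterns come from the examples above.

```python
def negate_posit(bits: int, n: int = 16) -> int:
    """Negate an n-bit posit: the same bit operation as negating an n-bit integer."""
    return (-bits) & ((1 << n) - 1)

def as_signed(bits: int, n: int = 16) -> int:
    """Read an n-bit pattern as a two's complement signed integer."""
    return bits - (1 << n) if bits >> (n - 1) else bits

# 1 and -1 from the examples above are negations of each other.
assert negate_posit(0b0_10_000_0000000000) == 0b1_10_000_0000000000
# Only 0 and the infinity pattern 1 000000000000000 are their own negation.
assert [b for b in range(1 << 16) if negate_posit(b) == b] == [0, 0x8000]

# Sorting the finite patterns as signed integers sorts them by value:
# -1.5 < -1 < -2^-112 < 0 < 2^-112 < 1 < 1.5
patterns = [0b1_01_111_1000000000, 0b1_10_000_0000000000, 0b1_111111111111111,
            0, 0b0_000000000000001, 0b0_10_000_0000000000, 0b0_10_000_1000000000]
assert sorted(patterns, key=as_signed) == patterns
```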

But to people used to floating point, not having a "sign + magnitude" representation is surprising. I also imagine that it could be annoying for a hardware implementation, so let's look into that.

Hardware implementations

In his presentations, Gustafson claims that by reducing the number of special cases, posits are easier to implement than floating point. No doubt there are fewer special cases (no denorms, no NaNs), but at the cost of a more complicated normal case.

Let's take a look at a floating point multiply. The basic structure is conceptually quite simple, since all parts of a floating point number can be treated separately:

By far the most expensive part here is the multiplication of the mantissas. There are of course a bunch of special cases that need to be accounted for: the inputs could be zero, infinity, or NaN, and the multiplication could overflow. Each of these cases is easily detected and handled with a little bit of comparatively inexpensive boolean logic.

Where it starts to get complicated is when handling the possibility that an input is denormal, or when the multiplication of two normal numbers results in a denormal.

When an input is denormal, the corresponding input for the multiply is 0.m instead of 1.m. Some logic has to decide whether the most significant input bit to the multiply is 0 or 1. This could potentially add to the latency of the computation. Luckily, deciding whether the input is denormal is fairly simple, and only the most significant input bit is affected. Because of carries, the less significant input bits tend to be more critical for latency. Consequently, the latency of determining the most significant input bit can be hidden well.

On the output side, the cost is higher, both in terms of the required logic and in terms of the added latency, because a shifter is needed to shift the output into the correct position. Two cases need to be considered: When a multiplication of two normal numbers results in a denormal, the output has to be shifted to the right an appropriate number of places.

When a denormal is multiplied by a normal number, the output needs to be shifted to the left or the right, depending on the exponent of the normal number. Additionally, the number of leading zeros of either the denormal input or of the multiplication output is required to determine the exponent of the final result. Since the area cost is the same either way, I would expect implementations to count the leading zeros of the denormal input, since that allows for better latency hiding.
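The leading-zero count is what turns a denormal mantissa into the normal form 1.f₂*2^e. In software, Python's int.bit_length can stand in for a leading-zero-count circuit. The sketch below normalizes a 23-bit single-precision denormal mantissa (value 0.m₂*2^-126). It illustrates the arithmetic only and does not model any particular hardware design.

```python
def normalize_subnormal(m: int):
    """Rewrite a denormal 0.m₂*2^-126 (m: nonzero 23-bit field) as 1.f₂*2^e.

    Returns (f, e), where f is the 23-bit fraction after the implicit leading one.
    """
    assert 0 < m < (1 << 23)
    leading_zeros = 23 - m.bit_length()      # leading zeros within the 23-bit field
    shift = leading_zeros + 1                # move the first 1 into the implicit position
    f = (m << shift) & ((1 << 23) - 1)       # drop the now-implicit leading one
    return f, -126 - shift
```

For example, normalize_subnormal(1) returns (0, -149): the smallest denormal is 1.0₂*2^-149.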

(The design space for floating point multipliers is larger than I've shown here. For example, you could deal with denormals by shifting their mantissa into place before the multiply. That seems like a waste of hardware considering that you cannot avoid the shifter after the multiply, but my understanding of hardware design is limited, so who knows.)

So there is a bit more hardware required than just what is shown in the diagram above: a leading-zero-count and a shifter, plus a bit more random logic. But now compare to the effort required for a posit multiply:

First of all, there is unavoidable latency in front of the multiplier. Every single bit of mantissa input may be masked off, depending on the variable size of the exponent's unary part. The exponents themselves need to be decoded in order to add them, and then the resulting exponent needs to be encoded again. Finally, the multiplication result needs to be shifted into place; this was already required for floating point multiplication, but the determination of the shift becomes more complicated since it depends on the exponent size. Also, each output bit needs a multiplexer since it can originate from either the exponent or the mantissa.

From my non-expert glance, here's the hardware you need in addition to the multiplier and exponent addition:

two leading-bit counts to decode the unary exponent parts (floating-point multiply only needs a single leading-zero count for a denormal input)
two shifters to shift the binary input exponent parts into place
logic for masking the input mantissas
one leading bit encoder
one shifter to shift the binary output exponent part into place
one shifter to shift the mantissa into place (floating-point also needs this)
multiplexer logic to combine the variable-length output parts
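The decoding items of this list can be illustrated in software. The sketch below is my own and is of course not a hardware description. It splits a positive 16-bit posit with es = 3 into its fields the way the list suggests:

- a leading-bit count via int.bit_length instead of a bit-serial loop,
- a shift that moves the binary exponent part into place, padding missing bits with zeros,
- a mask that isolates the mantissa.

Every step after the first depends on the run length found by the leading-bit count. This sequential dependency is why the decoding adds latency in front of the multiplier.

```python
def posit_fields(bits: int, n: int = 16, es: int = 3):
    """Split a nonzero, non-negative n-bit posit into (r, e, fraction, fraction_width)."""
    width = n - 1                                  # bits after the sign bit
    ones = (1 << width) - 1
    body = bits & ones                             # sign bit assumed to be 0
    assert body != 0, "zero is handled separately"
    leading = body >> (width - 1)                  # first bit of the unary part
    # Leading-bit count: turn a run of ones into a run of zeros, then count zeros.
    probe = body ^ ones if leading else body
    run = width - probe.bit_length()
    r = run - 1 if leading else -run
    # Drop the unary part and its terminating bit (absent if the run fills the word).
    rest_width = width - min(run + 1, width)
    rest = body & ((1 << rest_width) - 1)
    # Shift the binary exponent into place; missing low bits are implicitly zero.
    exp_width = min(es, rest_width)
    e = (rest >> (rest_width - exp_width)) << (es - exp_width)
    # Mask off everything but the mantissa bits.
    frac_width = rest_width - exp_width
    frac = rest & ((1 << frac_width) - 1)
    return r, e, frac, frac_width
```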

Also note that the multiplier and mantissa shifter may have to be larger, since - depending on the value of es - the mantissa of posits close to 1 can be larger than the mantissa of floating point numbers.

On the other hand, the additional shifters don't have to be large, since they only need to shift es bits. The additional hardware is almost certainly dominated by the cost of the mantissa multiplier. Still, the additional latency could be a problem - though obviously, I have no actual experience designing floating point multipliers.

There's also the issue of the proposed two's complement representation for negative posits. This may not be too bad for the mantissa multiplication, since one can probably treat it as a signed integer multiplication and automatically get the correct signs for the resulting mantissa. However, I would expect some more overhead for the decoding and encoding of the exponent.

The story should be similar for posit vs. floating point addition. When building a multiply-accumulate unit, the latency that is added for masking the input based on the variable exponent length can likely be hidden quite well, but there does not appear to be a way around the decoding and encoding of exponents.

Closing thoughts

As explained above, I expect posit hardware to be more expensive than floating point hardware. However, the gain in dynamic range and accuracy is nothing to sneeze at. It's worth giving posits a fair shot, since the trade-off may be worth it.

There is a lot of legacy software that relies on floating point behavior. Luckily, a posit ALU contains all the pieces of a floating point ALU, so it should be possible to build an ALU that can do both at pretty much the cost of a posit-only ALU. This makes a painless transition feasible.

Posits have an elegant design based on thinking about numbers projectively, but the lack of NaNs, the two's complement representation, and not having signed zeros and infinities may be alien to some floating point practitioners. I don't know how much of an issue this really is, but it's worth pointing out that a simple modification to posits could accommodate all these concerns. Using again the example of 16-bit posits with es = 3, we could designate bit patterns at the extreme ends as NaN and infinity:
0 111111111111111 is +inf (instead of 2¹¹²).
0 000000000000001 is +NaN (instead of 2^-112).
We could then treat the sign bit independently, like in floating point, giving us ±0, ±inf, and ±NaN. The neat properties related to thinking projectively would be lost, but the smallest and largest positive normal numbers would still be multiplicative inverses of each other. The hardware implementation may even be smaller, thanks to not having to deal with two's complement exponents.
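Here is a sketch of how decoding this modified format could look. It is hypothetical, since the variant exists only as the suggestion above. The sign bit is handled separately and the remaining 15 bits are decoded like a positive posit. The two repurposed patterns map to infinity and NaN.

```python
import math

def decode_variant(bits: int, n: int = 16, es: int = 3) -> float:
    """Decode the proposed sign-magnitude posit variant (16-bit, es = 3 by default)."""
    width = n - 1
    ones = (1 << width) - 1
    sign = -1.0 if (bits >> width) & 1 else 1.0
    mag = bits & ones                             # the sign bit is independent
    if mag == 0:
        return sign * 0.0                         # +0 and -0
    if mag == ones:
        return sign * math.inf                    # 0 111111111111111 is +inf
    if mag == 1:
        return math.copysign(math.nan, sign)      # 0 000000000000001 is +NaN
    leading = mag >> (width - 1)
    probe = mag ^ ones if leading else mag
    run = width - probe.bit_length()              # length of the unary run
    r = run - 1 if leading else -run
    rest_width = width - min(run + 1, width)
    rest = mag & ((1 << rest_width) - 1)
    exp_width = min(es, rest_width)
    e = (rest >> (rest_width - exp_width)) << (es - exp_width)
    frac_width = rest_width - exp_width
    frac = rest & ((1 << frac_width) - 1)
    mantissa = 1 + frac / (1 << frac_width)
    return sign * math.ldexp(mantissa, r * (1 << es) + e)
```

For example, 0 111111111111110 decodes to 2¹⁰⁴ and 0 000000000000010 to 2^-104, so the extreme normal numbers are still multiplicative inverses. Negative values like 1 10 000 1000000000 = -1.5 now share their magnitude bits with the positive value.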

The inertia of floating point is massive, and I don't expect it to be unseated anytime soon. But it's awesome to see people rethinking such fundamental building blocks of computing and coming up with solid new ideas. Posits aren't going to happen quickly, if at all, but it's worth taking them seriously.

Interesting mistakes

2015-02-28T20:25:00.000+01:00

I taught a course in Linear and Integer Optimization last semester, the exam of which was held last week. I'm fairly pleased with the results. Grading an exam is often painful when you realize that some students failed to understand important concepts, but overall it appears that my expectations about the difficulty of this exam were mostly accurate. Sometimes, however, students fail at a problem for reasons that we do not anticipate. One problem of the exam was to prove that when P and Q are (convex) polytopes, then so is their Minkowski sum P + Q.

I am aware of two reasonable approaches to this problem. The first is to use inequality representations to argue that the product P x Q is a polyhedron, and then appeal to the fact (shown in the lecture) that the image of a polyhedron under a linear map, here (p, q) -> p + q, is again a polyhedron (boundedness is of course easy to see, and naturally there are some variations of this approach). The second approach is to use representations of P and Q as convex hulls of finitely many points, and to show that P + Q is the convex hull of the pairwise sums of such points. This second proof is elementary but requires a bit more work because one has to argue about the coefficients in convex combinations.
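The second approach translates directly into a computation. The sketch below works in the plane only. Given finite point sets whose convex hulls are P and Q, it forms all pairwise sums and takes their convex hull. The hull routine is Andrew's monotone chain algorithm. That is my choice for illustration, and any convex hull routine would do.

```python
from itertools import product

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def minkowski_sum(P, Q):
    """Vertices of P + Q, computed as the hull of all pairwise sums of points."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p, q in product(P, Q)])
```

Take the unit square and the triangle with vertices (0, 0), (1, 0), (0, 1). For these, minkowski_sum yields the pentagon with vertices (0, 0), (2, 0), (2, 1), (1, 2), (0, 2).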

To my surprise, several students attempted a third route. They started with inequality representations and used the boundedness to argue that given one of the inequalities ax <= b that define P, the maximum of ax over Q is finite. They then tried to prove that P + Q can be obtained by simply using both the inequalities defining P and those defining Q together, with the right-hand sides increased by the corresponding (finite) maximum over the other polytope. This approach cannot possibly work because the number of facets of the Minkowski sum can be exponentially larger than the number of facets of the summands (the permutahedron is a cute example of this). And yet, in hindsight, it is a very plausible approach to try if one doesn't know about this fact.

As mistakes go, I would call this an interesting mistake. In some sense it isn't even a mistake at all, just a false trail into the woods that unfortunately some students got lost in.

Finding duplicate elements in an array

2014-04-25T14:55:00.000+02:00

This is a story about the power of randomness and universal hashing.

Suppose you are given an array of elements of some fixed type. How would you go about figuring out whether two elements of the array are equal?

At the time that I am writing this post, searching for its title leads almost universally to discussions about an interview-type puzzle question where you are given an array of size N that contains integers in the range from 0 to N-1. Indeed, clever solutions to this problem are proposed. Most of them amount to the same kind of trickery that radix sort employs to get a linear-time sorting algorithm. I hope the reader agrees that this is cute, but of limited use.

The most straightforward efficient solution to this problem that applies to general types is to sort the array using an arbitrary sorting function. Then a simple linear scan suffices to answer the question. The overall running time of this approach is O(N log N), and Misra and Gries showed in the 1980s that no comparison-based algorithm can be asymptotically faster.
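For reference, the sort-then-scan baseline is only a couple of lines in Python. It assumes that the elements are mutually comparable.

```python
def has_duplicates_by_sorting(a):
    """O(N log N): sort, then compare neighbours in one linear scan."""
    s = sorted(a)
    return any(x == y for x, y in zip(s, s[1:]))
```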

Let me present their argument, which should feel familiar to anybody who has seen the proof that comparison-based sorting requires time Omega(n log n) (the result in the paper is more general).

Every deterministic algorithm for finding duplicates in an array using comparisons can be thought of as a family of infinitely many ternary decision trees, one tree for every array size. For any given array size, the algorithm starts at the root node of the tree for that size, comparing the elements at the indices given by that node. Depending on how they compare, the algorithm moves down in the tree. This process continues until a leaf node is reached which is labelled either YES (duplicate elements have been found) or NO (all elements are unique). Here is a decision tree for an array of size 3:

We will argue that every decision tree for arrays of size N must have more than N! leaves. Via Stirling's approximation, this implies that any such tree has depth at least Omega(N log N), which in turn implies the same bound for the worst-case running time. In fact, the average depth of leaves is Omega(N log N). This means that even a randomized algorithm, which we can think of as choosing one of many decision trees at random, has an expected worst-case running time of at least Omega(N log N).
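For completeness, here is the depth bound spelled out. A ternary tree of depth d has at most 3^d leaves. Instead of Stirling's approximation, a cruder elementary estimate already suffices: at least N/2 of the factors of N! are at least N/2, so N! ≥ (N/2)^{N/2}.

```latex
3^{d} \;\ge\; \#\text{leaves} \;>\; N! \;\ge\; \left(\tfrac{N}{2}\right)^{N/2}
\quad\Longrightarrow\quad
d \;>\; \tfrac{N}{2}\,\log_3 \tfrac{N}{2} \;=\; \Omega(N \log N)
```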

Let us fix a decision tree T for arrays of size N. There are N! different permutations of size N. We can think of such a permutation as an array of size N filled with the integers from 1 to N without duplicates. Let P1 and P2 be two different such permutations/arrays, and define the array Q as

Q[i] := min(P1[i], P2[i])

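The array Q contains duplicate entries. In fact, let m be the smallest number that appears at different positions in P1 and P2, say at position i in P1 and at position j in P2. Every number smaller than m occupies the same position in both arrays, so P2[i] > m and P1[j] > m. Hence Q[i] = Q[j] = m, i.e., m appears twice in Q.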
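The construction of Q is easy to check by brute force. The following sketch (helper names are my own) builds Q from two permutations. For N = 4 and every pair of distinct permutations, it verifies that the smallest displaced number m appears exactly twice in Q. It can appear nowhere else, since Q[k] = m requires P1[k] = m or P2[k] = m.

```python
from itertools import permutations

def min_merge(p1, p2):
    """Q[i] = min(P1[i], P2[i])."""
    return [min(x, y) for x, y in zip(p1, p2)]

def smallest_displaced(p1, p2):
    """The smallest value whose position differs between P1 and P2."""
    pos1 = {v: i for i, v in enumerate(p1)}
    pos2 = {v: i for i, v in enumerate(p2)}
    return min(v for v in pos1 if pos1[v] != pos2[v])

# Exhaustive check for N = 4.
for p1 in permutations(range(1, 5)):
    for p2 in permutations(range(1, 5)):
        if p1 != p2:
            assert min_merge(p1, p2).count(smallest_displaced(p1, p2)) == 2
```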

As a consequence, the computation for Q must end in a YES-leaf of T, while P1 and P2 must lead to NO-leaves. What if P1 and P2 lead to the same NO-leaf L? Let V be a node on the path to L, and let us say that i and j are the array indices that are compared at the node V. Then the comparison P1[i] vs. P1[j] has the same result as the comparison P2[i] vs. P2[j]. But then, by the definition of Q, the comparison Q[i] vs. Q[j] also has the same result (for example, if P1[i] < P1[j] and P2[i] < P2[j], then min(P1[i], P2[i]) < min(P1[j], P2[j])). This means that the computation for Q must also follow the same path and end in a NO-leaf! That is clearly a contradiction, and so we can conclude that the computations for P1 and P2 must end in different NO-leaves. Hence, the decision tree T must have at least N! NO-leaves, and this completes the proof.

So, to summarize up to this point: As long as we can only compare array elements, no algorithm can beat the running time of O(N log N) that a simple sort followed by a linear scan offers. However, this isn't the end of the story.

Suppose you have a universal family of hash functions for the type of elements in the array (and indeed, such families are not too difficult to construct). That is, suppose you can randomly choose a hash function h:U -> [m], with m within a constant factor of N, such that for different array elements A[i] and A[j] you know that the probability of a hash collision is low:

Pr[h(A[i]) = h(A[j])] <= 1/N

Note that if the guarantee is slightly weaker, e.g. a bound of 2/N on the probability of a collision, the following argument will still work with only minor modifications to the constants in the computations.

We will now follow a simple strategy: Build a hash table of the elements of A according to h. Since equal elements will be hashed into the same bucket, it is then sufficient to check for duplicate elements within each bucket by sorting and a linear scan, or by an even simpler quadratic check.

What is the running time of this approach? If b(i) is the number of array elements mapped into the i-th bucket, then even the running time of the simple quadratic check can be bounded by

sum(b(i)², i=1..m),

which is exactly the number of hash collision pairs:

sum(b(i)², i=1..m) = #{ (i,j) : h(A[i]) = h(A[j]) } = N + #{ (i,j) : i != j, h(A[i]) = h(A[j]) }

The expected cardinality of the last part is bounded by the number of pairs (i,j) with i != j times our bound on the probability that a pair results in a hash collision. This product is N-1. That is, the expected running time of the algorithm we have outlined is linear.
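Here is a sketch of the whole algorithm in Python. The post doesn't fix a particular universal family, so this uses the classic Carter-Wegman family h(x) = ((αx + β) mod p) mod m. Here α ≠ 0 and β are chosen at random, and distinct keys then collide with probability at most 1/m. The sketch assumes that the array elements are non-negative integers smaller than the prime p = 2^61 - 1; other element types would need their own universal family. With m = N buckets, the collision bound matches the 1/N used above.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; elements are assumed to be integers in [0, P)

def has_duplicates_hashing(a, rng=random):
    n = len(a)
    if n < 2:
        return False
    m = n  # number of buckets, within a constant factor of N
    # Pick h(x) = ((alpha*x + beta) mod P) mod m at random from the Carter-Wegman family.
    alpha = rng.randrange(1, P)
    beta = rng.randrange(0, P)
    buckets = [[] for _ in range(m)]
    for x in a:
        buckets[((alpha * x + beta) % P) % m].append(x)
    # Equal elements always share a bucket, so the simple quadratic check
    # within each bucket suffices.
    for b in buckets:
        for i in range(len(b)):
            for j in range(i + 1, len(b)):
                if b[i] == b[j]:
                    return True
    return False
```

The answer is always correct; only the running time depends on the random choice of hash function.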

Observe that we must choose the hash function randomly for this approach to work. If we were to use a fixed, deterministic hash function, then an adversary could construct an array in which all elements hash to the same bucket (using the pigeonhole principle and assuming that the universe U of possible array elements is large enough). We would be back in the case where only comparisons are allowed, and hence a linear running time is impossible.

So we see that the combination of hashing and randomness, in the form of universal hashing, allows us to solve the original problem in a running time that is asymptotically better than what is achievable with comparisons alone or with a fixed, deterministic hash function.

The AI Challenge - a look back

2011-12-19T11:03:00.000+01:00

For the last four weeks, I have worked on my submission for the Google AI Challenge. The deadline passed this morning, so it is time to relax while the official final tournament is running on the contest servers. Until yesterday, my submission was ranked quite consistently in the top 20. Then I uploaded my final version, which resets the skill rating. It was quite consistently better on the unofficial TCP servers, but given that everybody else was doing last-minute tweaks, too, it's far too early to call.

I enjoyed the spirit of this contest immensely, and now I would like to document some of my thoughts on how my submission works. I have uploaded the source code to Github (https://github.com/nhaehnle/aiant) so you can peruse it while following this blog entry if you wish.

High level structure

The bot is divided into a number of modules that select high-level goals for the ants it controls. This is done in a very straightforward way. Every ant initially has no direction to go in (the field Ant::assigneddirection is initialized to false each turn). The high-level modules then simply assign directions to ants that do not have one yet, and the order in which the modules are called reflects the relative importance I assign to the various tasks. For example, the HillDefense module will only assign ants that have not been assigned by the FoodSeeker.
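The assignment scheme can be sketched as follows. The sketch is in Python rather than the bot's actual C++, and the module functions are hypothetical stand-ins. Modules are called in priority order, and each one only gets to see the ants that no earlier module has claimed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Ant:
    position: tuple
    assigned_direction: Optional[str] = None  # plays the role of Ant::assigneddirection

Module = Callable[[List[Ant]], None]

def run_modules(ants: List[Ant], modules: List[Module]) -> None:
    """Call modules in priority order; each only sees ants that are still unassigned."""
    for ant in ants:
        ant.assigned_direction = None          # reset at the start of every turn
    for module in modules:
        module([ant for ant in ants if ant.assigned_direction is None])

# Hypothetical stand-ins for FoodSeeker and HillDefense.
def food_seeker(free: List[Ant]) -> None:
    if free:
        free[0].assigned_direction = "N"

def hill_defense(free: List[Ant]) -> None:
    for ant in free:
        ant.assigned_direction = "S"

ants = [Ant((0, 0)), Ant((5, 5)), Ant((9, 9))]
run_modules(ants, [food_seeker, hill_defense])
```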

There are two modules that fall outside of this basic structure: The Zoc ("zone of control") module does not steer any ants. Instead, it keeps track of how fast my ants vs. enemy ants can reach each square of the map. And the Tactical module overrides the previous decisions if necessary for ants that may be involved in combat.

The strategy modules

The following strategy modules are used by the bot, assigning jobs to ants in the given order:

FoodSeeker: Getting enough food is probably the most important problem of the game, and so this module comes first. It greedily sends the closest ant to each item of food that is visible, using several breadth-first searches.
HillDefense: When one of the bot's hills can be reached in few turns by the enemy, this module ensures that a few ants (adjusted based on the number of enemy ants in sight) stick close to the hill.

An important tweak of this code is that it does not send ants back to the hill indiscriminately. Instead, it only recalls an ant if it is too far away from the hill relative to the closest enemy. This way, ants are not needlessly wasted on patrol duty. It would probably be a good idea to treat multi-hill maps specially, but this is not done.
OpportunisticAttack: This rather stupid piece of code ensures that the ants move more aggressively towards enemy hills. After all, that is the only way to win the game.
Scout: This module assigns a tiny number of ants to re-explore squares that have not been seen in a long time.

This is needed because the rest of the code uses the Zoc module to understand that an enemy can never come out of a cul-de-sac once it's been secured. So without some re-scouting logic, the bot would simply ignore the food in those secured locations!
Diffusion: This is a very ad-hoc heuristic to spread out my ants better than they would otherwise. There would probably have been some potential for improvement in this part of the code.
Zoc-based movement: any ant that has not been assigned a move up to this point will simply use the Zoc data to move closer to the enemy. This is done in Bot::makeMoves rather than in a separate module.

On a "historical" note, it may be interesting to know that I started out with just the FoodSeeker and the Zoc-based movement. Together with Tactical, this was enough to get into the top 100 a bit more than two weeks before the deadline.

The tactical logic

The idea behind the TacticalSm module is simple:

Carve out small Submaps that contain all the ants that could potentially be involved in combat on the next turn.
Generate potential moves for both my bot and the enemy. The combination of Submap and associated PlayerMoves is called a Theater.
Evaluate those moves using some simple scoring function.
Try to find better moves for both parties.
Repeat until time runs out.
Pick a move using one of two strategies.

Note that my tactical code doesn't understand situations involving enemy ants of more than one enemy player. This is certainly a limitation, but it's hard to tell how bad it really is.

Most of the tactical code is about engineering, and careful consideration of the scoring function. For example, during a major code reorganization halfway through the competition, the tactical code stopped considering proximity to food in the scoring function. That hurt the performance of my bot quite significantly, and my bot's skill recovered when I turned that part of the scoring function back on.

There are really only two clever parts. One is to use a scoring function that, by default, assigns a higher value to the bot's own ants, to avoid costly trades of ants, but to turn on aggressive mode if the bot's own ants overpower the enemy. This switch is done randomly based on the ratio of ants, where the probability of turning on aggressive mode is conceptually a logistic function of the log of the ant ratio.
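A sketch of the random switch follows. The slope and midpoint of the logistic function are hypothetical, since the post does not give the actual parameters. With x the log of the ratio of own ants to enemy ants, aggressive mode is chosen with probability 1/(1 + e^(-steepness*(x - midpoint))).

```python
import math
import random

def aggressive_probability(my_ants: int, enemy_ants: int,
                           steepness: float = 3.0, midpoint: float = 0.5) -> float:
    """Logistic function of log(my_ants / enemy_ants); both counts must be positive.

    steepness and midpoint are hypothetical tuning parameters.
    """
    x = math.log(my_ants / enemy_ants)
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

def use_aggressive_mode(my_ants: int, enemy_ants: int, rng=random) -> bool:
    return rng.random() < aggressive_probability(my_ants, enemy_ants)
```

With these parameters, evenly matched forces rarely trigger aggressive mode. The probability passes 1/2 once the bot's ants outnumber the enemy's by a factor of e^0.5 ≈ 1.65.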

The second clever part is to use a small bit of rock-paper-scissors logic in the final move selection. My first method to select a move is a simple min-max (or rather, max-min): pick the move that maximizes the minimum scoring value I could possibly get, given all enemy moves under consideration. Since this is a rather conservative choice of moves, especially given that there is absolutely no look-ahead, I implemented max-average as a second strategy: choose the move that maximizes the average scoring value, where the average is over all enemy moves, weighted by their max-min value.
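The two move-selection rules can be written down directly on a score matrix. In this sketch, scores[i][j] is the (hypothetical) score of my move i against enemy move j. The enemy weights for max-average are passed in as an input rather than computed from the enemy's max-min values, since the exact weighting is not spelled out here.

```python
def max_min(scores):
    """Index of my move with the best worst case.

    scores[i][j] is the score of my move i against enemy move j.
    """
    return max(range(len(scores)), key=lambda i: min(scores[i]))

def max_average(scores, enemy_weights):
    """Index of my move with the best weighted average over enemy moves.

    enemy_weights[j] >= 0 weights enemy move j (in the bot, derived from the
    enemy's max-min values; here it is simply an input).
    """
    total = sum(enemy_weights)
    return max(range(len(scores)),
               key=lambda i: sum(w * s for w, s in zip(enemy_weights, scores[i])) / total)

# My move 0 is risky (great or bad), my move 1 is safe.
scores = [[3, -2],
          [1, 0]]
```

Max-min always picks the safe move 1. Max-average picks the risky move 0 when the enemy is believed to favour its first move (weights [3, 1]).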

Of course, this strategy may sometimes be too aggressive. What's more, the best strategy may depend on the opponent. Therefore, my bot uses a variant of the randomized weighted majority algorithm to pick the move-selection strategy. At the beginning of each turn, my bot determines what the best strategy would have been in hindsight to update the weights of the strategy. One important tweak is that the weights depend both on which enemy player my bot is facing, and on the number of ants in play.
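Here is a minimal sketch of such a selector. It assumes a standard multiplicative-weights update: every strategy that was not the best one in hindsight has its weight multiplied by a factor beta < 1. Weights are kept per context, e.g. per (enemy, ant-count bucket). The value of beta and the context keys are hypothetical. The weights are rescaled after each update so that they cannot underflow over a long game.

```python
import random
from collections import defaultdict

class StrategySelector:
    """Randomized weighted majority over move-selection strategies, per context."""

    def __init__(self, strategies, beta=0.8, rng=None):
        self.strategies = list(strategies)
        self.beta = beta
        self.rng = rng or random.Random()
        self.weights = defaultdict(lambda: [1.0] * len(self.strategies))

    def pick(self, context):
        # Sample a strategy with probability proportional to its weight.
        return self.rng.choices(self.strategies, weights=self.weights[context])[0]

    def update(self, context, best_in_hindsight):
        # Penalize every strategy that was not the best choice this turn.
        w = self.weights[context]
        for k, s in enumerate(self.strategies):
            if s != best_in_hindsight:
                w[k] *= self.beta
        top = max(w)
        for k in range(len(w)):
            w[k] /= top                     # rescale so the largest weight is 1

selector = StrategySelector(["max-min", "max-average"], rng=random.Random(0))
context = ("enemy 1", 10)  # hypothetical (opponent, ant-count bucket) key
for _ in range(50):
    # Suppose max-average was the better choice in hindsight on each of 50 turns.
    selector.update(context, "max-average")
```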

Discarded experiments

I experimented with adding more move selection strategies, but the results were not convincing at all, perhaps because it takes longer for the bot to learn which strategy to choose, so I scrapped that again.

I also implemented map symmetry detection to guess unseen enemy hills and a corresponding module for a coordinated offense against enemy hills. The code is still there, but I have disabled it. The simple implementation I have is far too aggressive and wasteful, and I didn't feel like trying to tweak it to a point where it becomes useful.

I also did some experiments with an alternative move generation method for my tactical system, as well as a very simple implementation of the sampling-based combat system described by a1k0n on the challenge forums. This simple method performed worse than my tactical code; I have some ideas for improving it (just like a1k0n obviously did), but ultimately did not have the time to try them out. Those experiments can still be found in the Git history if you feel like digging through it.

I tried some automatic tuning of parameters using meta-heuristics (see the test/tune.py script), but somehow that didn't yield very convincing results either.

Code quality?

I have to admit that the code is not very well documented, and some of it is rather hackish. I hope you'll forgive me for this type of one-off code.

There is one thing that I did consistently that I would never do in another type of project, and that is keeping STL containers around as long as possible to avoid re-allocation. I rely heavily on the fact that std::vector::clear does not free the allocated memory, at least in the default implementation of g++. By keeping those vectors around, I want to avoid unpleasant surprises in the performance of memory management.

I don't think this is strictly necessary, given that the bot actually uses surprisingly little memory, but it didn't hurt in this case. It reduces the maintainability of the code, which is why I wouldn't do it on other projects, but maintainability was obviously never a goal for a competition like this one.

A neat randomization trick

"When in doubt, throw a coin" is perhaps the most important lesson that Theoretical Computer Science teaches. When there are several directions to go in that all score the same, which one should you choose? The best approach is to choose randomly.

To facilitate this, I pre-generated all 24 permutations of the 4 directions, and all 120 permutations of the 5 possible moves of an ant, and I randomly choose one of those permutations whenever I have a loop through directions to choose from.
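In Python, the pre-generated orders amount to a couple of module-level lists. The direction and move labels below are hypothetical.

```python
import random
from itertools import permutations

DIRECTIONS = ("N", "E", "S", "W")
MOVES = DIRECTIONS + ("stay",)   # hypothetical labels for the 5 possible moves

DIRECTION_ORDERS = list(permutations(DIRECTIONS))  # all 24 orders
MOVE_ORDERS = list(permutations(MOVES))            # all 120 orders

def random_direction_order(rng=random):
    return rng.choice(DIRECTION_ORDERS)

def random_move_order(rng=random):
    return rng.choice(MOVE_ORDERS)
```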

Sometimes, however, I have a loop through variable-sized vectors, for example, when choosing ants for an offensive move in the tactical logic. I would like to have a random permutation for those cases as well, and of course there are algorithms to generate them. But they take time, and the benefit of a perfectly uniform distribution is not that clear.

So here's a little trick I use to permute a variable-sized vector of size N. I pick a random prime p out of a decently sized pre-generated list of large primes (larger than any N the code is ever going to see), as well as a random offset ofs, and loop through the numbers 0..N-1. But instead of using these numbers i as an index, I use (p*i + ofs) % N as index. Since p is a prime larger than N, it does not divide N and is therefore invertible modulo N, so the map i -> (p*i + ofs) % N is a bijection on 0..N-1, aka a permutation. Of course, this is far from a uniformly distributed permutation: there are N! potential permutations, out of which this method can generate at most N * φ(N). But hey: it's good enough for what I need.
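Here is a Python sketch of the trick. The list of primes is a hypothetical stand-in for the pre-generated one. It uses the Mersenne primes 2^17-1, 2^19-1, 2^31-1 and 2^61-1 because their primality is well known. The only property that matters is that the chosen prime does not divide N.

```python
import random

# Hypothetical stand-in for the pre-generated list: Mersenne primes 2^17-1, 2^19-1, 2^31-1, 2^61-1.
LARGE_PRIMES = [131071, 524287, 2147483647, 2305843009213693951]

def scrambled_indices(n: int, rng=random):
    """Yield 0..n-1 in the order (p*i + ofs) % n for a random prime p and offset ofs."""
    assert n < min(LARGE_PRIMES), "p must not divide n"
    p = rng.choice(LARGE_PRIMES)
    ofs = rng.randrange(n) if n > 0 else 0
    for i in range(n):
        yield (p * i + ofs) % n
```

A loop over a vector v then iterates over scrambled_indices(len(v)) and visits every index exactly once.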

An elegant definition of mathematics

2009-10-24T15:13:00.011+02:00

Most non-mathematicians have only a very vague idea of what mathematics actually is. One has to forgive them; after all, mathematicians themselves find it hard to give a clear definition. I recently read a very nice attempt at defining what mathematics actually is in an article by William Thurston:

Mathematics is the smallest subject satisfying the following:
Mathematics includes the natural numbers and plane and solid geometry.
Mathematics is that which mathematicians study.
Mathematicians are those humans who advance human understanding of mathematics.

Some futurists may take offense at the discriminatory use of the word "human". Perhaps this shortcoming could be turned into an advantage to clarify the notion of intelligence: we could say that the objects of a class C are at least as intelligent as humans if the class of humans in the above definition can be replaced by C without shrinking the field of mathematics thereby defined.

Have fun speculating about these definitions.

Math reading #1: Volume bounds for lattice polytopes with interior lattice points

2009-10-12T12:30:00.000+02:00

So I got a bit into the mood of writing down mathematical stuff for the web. Since reading and trying to understand stuff is a good exercise anyway, I decided to start an experiment in which I will attempt to digest and explain a mathematical paper, a chapter from a book, or something similar on a semi-regular basis.

I hope that these digests will be useful to other people, but in the end their real purpose is to force myself to organize my own thoughts a bit better - and perhaps to serve as an archive for myself where I can look things up in the future.

Without further blah-blah, here's the first installment: Volume bounds for lattice polytopes with interior lattice points

Ehrhart polynomials and integer points in polytopes

2009-10-03T11:59:00.000+02:00

It's about time for me to write about something mathematical on this blog. I used the opportunity to experiment with MathML and SVG. Unfortunately, the state of these technologies is rather horrible, which is why I can't write the actual entry in the blog itself. HTML 5 promises to improve things, but it's not quite there yet.

So here is a link to my text on Ehrhart polynomials.

There's a little bit of backstory here which I should probably mention. I was reading up on Ehrhart polynomials a while ago, and in particular I was looking for a proof of their existence. Unfortunately, the proofs I found by perusing the literature relied on mathematics from rather remote areas, which made me sad. So, in a moment of the kind of hubris which is necessary to do these kinds of things, I decided that I could find an elementary proof on my own. I succeeded, and I thought to myself, "Hey, that proof is actually rather simple. I've been looking for something mathematical to write up on my blog, let's just use this."

So I started, and I had this goal in mind that I could explain my proof in a way that is understandable to ordinary laypeople. In the process, I had to admit to myself that the proof is probably not that simple.

You see, I am not writing for the kind of people who are uninterested in mathematics - that would be futile - but I do want my writing to be interesting and useful for other students of mathematics and interested laypeople. Sometimes, I like to try to write a text where my yardstick is, "Would I have been able to follow and appreciate this text at the beginning of my university studies?" Of course it is not always feasible to write texts like that, and it is actually incredibly hard to tell whether I achieve this goal because I have mostly forgotten who I was five years ago. Trying to see things from that older perspective is not easy.

I do hope that I have succeeded, and while the MathML was annoying to write, it was ultimately enjoyable because I could touch a large number of ideas and areas that are relevant to my daily work.

In the future, I will probably experiment with ASCIIMathML, which I discovered a bit too late. It appears to offer a reasonable solution to the verbosity of MathML.

Firefox Cache Viewer and Google Books

2009-09-29T15:06:00.002+02:00

Recently I wanted to read an article from a journal that is freely accessible but unfortunately only offers volumes from 1997 onward as PDFs. The older volumes are kept in our library's closed stacks and are also available on Google Books. However, Google Books does not let you print, let alone download PDFs, and requesting an article from the closed stacks isn't exactly user-friendly either.

With the Cache Viewer plugin for Firefox, on the other hand, you can easily export the loaded PNGs in a low-tech way and then convert them into a PDF with the usual command-line tools (or, on Mac OS X, with Automator). I was happy.

Incidentally, there is also an open-source tool for this task on Windows, although it will probably run into trouble if the internals of Google Books ever change.