Monday, December 19, 2011

The AI Challenge - a look back

For the last four weeks, I have worked on my submission for the Google AI Challenge. The deadline passed this morning, so it is time to relax while the official final tournament runs on the contest servers. Until yesterday, my submission was quite consistently ranked in the top 20. Then I uploaded my final version (which resets the skill rating), and it was quite consistently better on the unofficial TCP servers; but given that everybody else was doing last-minute tweaks, too, it's far too early to call.

I enjoyed the spirit of this contest immensely, and now I would like to document some of my thoughts on how my submission works. I have uploaded the source code to GitHub (https://github.com/nhaehnle/aiant), so you can peruse it while following this blog entry if you wish.

High level structure

The bot is divided into a number of modules that select high-level goals for the ants it controls. This is done in a very straightforward way. Every ant initially has no direction to go in (the field Ant::assigneddirection is initialized to false each turn). The high-level modules then simply assign directions to ants that do not have one yet, and the order in which the modules are called reflects the relative importance I assign to the various tasks. For example, the HillDefense module will only assign ants that have not been assigned by the FoodSeeker.
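
In simplified C++, the dispatch looks something like this. Apart from Ant::assigneddirection, the names here are placeholders for illustration, not the actual interface:

```cpp
#include <cstddef>
#include <vector>

// A minimal sketch of the priority scheme. The real modules are classes
// like FoodSeeker and HillDefense; only Ant::assigneddirection matches
// the actual code.
struct Ant {
    bool assigneddirection; // reset to false at the start of each turn
    int direction;          // only meaningful once assigneddirection is true
};

struct Module {
    virtual ~Module() {}
    // A module may only touch ants whose assigneddirection is still false.
    virtual void run(std::vector<Ant> &ants) = 0;
};

void assignDirections(std::vector<Module *> &modules, std::vector<Ant> &ants)
{
    for (std::size_t i = 0; i < ants.size(); ++i)
        ants[i].assigneddirection = false;

    // Earlier modules get first pick; later modules only see the leftovers.
    for (std::size_t i = 0; i < modules.size(); ++i)
        modules[i]->run(ants);
}
```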

There are two modules that fall outside of this basic structure: The Zoc ("zone of control") module does not steer any ants. Instead, it keeps track of how fast my ants vs. enemy ants can reach each square of the map. And the Tactical module overrides the previous decisions if necessary for ants that may be involved in combat.
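
Conceptually, the Zoc data is just two distance maps obtained by multi-source breadth-first search over the wrapping grid, one seeded with my ants and one with the visible enemy ants. A hedged sketch (the grid representation and names are mine, not the module's actual interface):

```cpp
#include <climits>
#include <cstddef>
#include <queue>
#include <vector>

// Multi-source BFS: the distance from every square to the nearest seed
// square. Run once with my ants as seeds and once with the enemy's.
std::vector<int> distanceMap(int rows, int cols,
                             const std::vector<bool> &water,
                             const std::vector<int> &seeds)
{
    std::vector<int> dist(rows * cols, INT_MAX);
    std::queue<int> open;
    for (std::size_t i = 0; i < seeds.size(); ++i) {
        dist[seeds[i]] = 0;
        open.push(seeds[i]);
    }
    while (!open.empty()) {
        int cur = open.front();
        open.pop();
        int r = cur / cols, c = cur % cols;
        int neigh[4] = {
            ((r + rows - 1) % rows) * cols + c, // north (the map wraps)
            ((r + 1) % rows) * cols + c,        // south
            r * cols + (c + cols - 1) % cols,   // west
            r * cols + (c + 1) % cols           // east
        };
        for (int d = 0; d < 4; ++d) {
            if (!water[neigh[d]] && dist[neigh[d]] == INT_MAX) {
                dist[neigh[d]] = dist[cur] + 1;
                open.push(neigh[d]);
            }
        }
    }
    return dist;
}
```

Comparing the two resulting maps tells the bot, for every square, whether my ants or the enemy's would get there first.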

The strategy modules

The following strategy modules are used by the bot, assigning jobs to ants in the given order:
  1. FoodSeeker: Getting enough food is probably the most important problem of the game, and so this module comes first. It greedily sends the closest ant to each item of food that is visible, using several breadth-first searches (see the sketch after this list).
  2. HillDefense: When one of the bot's hills can be reached by the enemy within a few turns, this module ensures that a few ants (the number adjusted based on the number of enemy ants in sight) stick close to the hill.

    An important tweak of this code is that it does not send ants back to the hill indiscriminately. Instead, it only recalls an ant if it is too far away from the hill relative to the closest enemy. This way, ants are not needlessly wasted on patrol duty. It would probably be a good idea to treat multi-hill maps specially, but this is not done.
  3. OpportunisticAttack: This rather stupid piece of code ensures that the ants move more aggressively towards enemy hills. After all, that is the only way to win the game.
  4. Scout: This module assigns a tiny number of ants to re-explore squares that have not been seen in a long time.

    This is needed because the rest of the code uses the Zoc module to understand that an enemy can never come out of a cul-de-sac once it's been secured. So without some re-scouting logic, the bot would simply ignore the food in those secured locations!
  5. Diffusion: This is a very ad-hoc heuristic to spread out my ants better than they would otherwise. There would probably have been some potential for improvement in this part of the code.
  6. Zoc-based movement: any ant that has not been assigned a move up to this point will simply use the Zoc data to move closer to the enemy. This is done in Bot::makeMoves rather than in a separate module.
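
To illustrate the FoodSeeker's greedy assignment from item 1, here is a sketch of one such search: a BFS outward from a food square that stops at the first ant without a job. Everything here is illustrative rather than the bot's real interface:

```cpp
#include <queue>
#include <vector>

// BFS outward from one food square until the first unassigned ant is found.
// antAt[square] is an ant index or -1; assigned[ant] says whether that ant
// already has a direction. Returns the chosen ant's square, or -1 if no
// free ant is reachable.
int closestFreeAnt(int food, int rows, int cols,
                   const std::vector<bool> &water,
                   const std::vector<int> &antAt,
                   const std::vector<bool> &assigned)
{
    std::vector<bool> seen(rows * cols, false);
    std::queue<int> open;
    seen[food] = true;
    open.push(food);
    while (!open.empty()) {
        int cur = open.front();
        open.pop();
        if (antAt[cur] >= 0 && !assigned[antAt[cur]])
            return cur; // the ant's first step retraces the BFS towards the food
        int r = cur / cols, c = cur % cols;
        int neigh[4] = {
            ((r + rows - 1) % rows) * cols + c, // the map wraps around
            ((r + 1) % rows) * cols + c,
            r * cols + (c + cols - 1) % cols,
            r * cols + (c + 1) % cols
        };
        for (int d = 0; d < 4; ++d) {
            if (!water[neigh[d]] && !seen[neigh[d]]) {
                seen[neigh[d]] = true;
                open.push(neigh[d]);
            }
        }
    }
    return -1;
}
```

The real FoodSeeker runs such searches for all visible food and marks each chosen ant as assigned, which is exactly the greedy scheme described above.
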
On a "historical" note, it may be interesting to know that I started out with just the FoodSeeker and the Zoc-based movement. Together with Tactical, this was enough to get into the top 100 a bit more than two weeks before the deadline.

The tactical logic

The idea behind the TacticalSm module is simple:
  1. Carve out small Submaps that contain all the ants that could potentially be involved in combat on the next turn.
  2. Generate potential moves for both my bot and the enemy. The combination of Submap and associated PlayerMoves is called a Theater.
  3. Evaluate those moves using some simple scoring function.
  4. Try to find better moves for both parties.
  5. Repeat until time runs out.
  6. Pick a move using one of two strategies.
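
Put together, the control flow is roughly the following skeleton. Theater matches the name used above; the helper functions are stubs standing in for the real steps and are not defined here:

```cpp
#include <cstddef>
#include <vector>

// Skeleton of the tactical loop described above; a paraphrase of the
// numbered steps, not the module's actual code.
struct Theater { /* Submap plus PlayerMoves for me and the enemy */ };

bool timeLeft();                // turn-time budget check (stub)
void generateMoves(Theater &t); // step 2
void evaluateMoves(Theater &t); // step 3
void improveMoves(Theater &t);  // step 4
void pickMove(Theater &t);      // step 6: max-min or max-average, see below

void tacticalTurn(std::vector<Theater> &theaters)
{
    for (std::size_t i = 0; i < theaters.size(); ++i)
        generateMoves(theaters[i]);

    while (timeLeft()) { // step 5: iterate until time runs out
        for (std::size_t i = 0; i < theaters.size(); ++i) {
            evaluateMoves(theaters[i]);
            improveMoves(theaters[i]);
        }
    }

    for (std::size_t i = 0; i < theaters.size(); ++i)
        pickMove(theaters[i]);
}
```
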
Note that my tactical code doesn't understand situations involving ants from more than one enemy player. This is certainly a limitation, but it's hard to tell how bad it really is.

Most of the tactical code is about engineering and careful consideration of the scoring function. For example, during a major code reorganization halfway through the competition, the tactical code stopped considering proximity to food in its scoring function. That hurt the performance of my bot quite significantly, and its skill recovered when I turned that part of the scoring function back on.

There are really only two clever parts. One is a scoring function that, by default, assigns a higher value to my own ants, to avoid costly trades, but that switches into an aggressive mode when my ants overpower the enemy's. This switch is made randomly based on the ratio of ants, where the probability of turning on aggressive mode is, conceptually, a logistic function of the log of the ant ratio.
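
In other words, with r denoting the ratio of my ants to the enemy's and k some steepness constant, the probability of going aggressive is 1 / (1 + r^-k). A small sketch (the constant and the function name are made up, not the bot's actual values):

```cpp
#include <cmath>
#include <cstdlib>

// Probability of switching to aggressive mode as a logistic function of
// log(myAnts / enemyAnts). The steepness k is a made-up placeholder.
bool chooseAggressive(int myAnts, int enemyAnts, double k)
{
    double logratio = std::log(double(myAnts) / double(enemyAnts));
    double p = 1.0 / (1.0 + std::exp(-k * logratio));
    return std::rand() < p * RAND_MAX;
}
```

For equal armies (ratio 1) this gives p = 1/2, and p tends towards 0 or 1 as one side outnumbers the other.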

The second clever part is to use a small bit of rock-paper-scissors logic in the final move selection. My first method to select a move is a simple min-max (or rather, max-min): pick the move that maximizes the minimum scoring value I could possibly get, given all enemy moves under consideration. Since this is a rather conservative choice of moves, especially given that there is absolutely no look-ahead, I implemented max-average as a second strategy: choose the move that maximizes the average scoring value, where the average is over all enemy moves, weighted by their max-min value.
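
Given a score matrix over my candidate moves and the enemy's, the two strategies look roughly like this (a sketch on a plain matrix; the real code operates on the Theater structures):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// score[m][e]: my scoring value if I play candidate move m and the enemy
// plays candidate move e.

// Strategy 1: max-min. Maximize the worst case over all enemy replies.
std::size_t maxMinMove(const std::vector<std::vector<double> > &score)
{
    std::size_t best = 0;
    double bestWorst = -1e300;
    for (std::size_t m = 0; m < score.size(); ++m) {
        double worst = 1e300;
        for (std::size_t e = 0; e < score[m].size(); ++e)
            worst = std::min(worst, score[m][e]);
        if (worst > bestWorst) {
            bestWorst = worst;
            best = m;
        }
    }
    return best;
}

// Strategy 2: max-average. Average over all enemy replies, each weighted
// by how good that reply is for the enemy (its max-min value from the
// enemy's point of view), and maximize that weighted average.
std::size_t maxAvgMove(const std::vector<std::vector<double> > &score,
                       const std::vector<double> &enemyWeight)
{
    std::size_t best = 0;
    double bestAvg = -1e300;
    for (std::size_t m = 0; m < score.size(); ++m) {
        double sum = 0.0, wsum = 0.0;
        for (std::size_t e = 0; e < score[m].size(); ++e) {
            sum += enemyWeight[e] * score[m][e];
            wsum += enemyWeight[e];
        }
        double avg = wsum > 0.0 ? sum / wsum : 0.0;
        if (avg > bestAvg) {
            bestAvg = avg;
            best = m;
        }
    }
    return best;
}
```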

Of course, this strategy may sometimes be too aggressive. What's more, the best strategy may depend on the opponent. Therefore, my bot uses a variant of the randomized weighted majority algorithm to pick the move-selection strategy. At the beginning of each turn, it determines, in hindsight, which strategy would have been best on the previous turn, and updates the strategies' weights accordingly. One important tweak is that the weights depend both on which enemy player my bot is facing and on the number of ants in play.
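
A hedged sketch of the core of such a scheme (the learning rate and bookkeeping are illustrative; as said above, the real weights are additionally keyed by opponent and ant count):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Randomized weighted majority over move-selection strategies.
// eta is an illustrative learning rate, not the bot's actual constant.
struct StrategyChooser {
    std::vector<double> weight; // one weight per strategy
    double eta;

    StrategyChooser(int nstrategies, double eta_)
        : weight(nstrategies, 1.0), eta(eta_) {}

    // Penalize each strategy by its hindsight loss for the last turn
    // (0 = it would have been the best choice, 1 = the worst).
    void update(const std::vector<double> &loss) {
        for (std::size_t i = 0; i < weight.size(); ++i)
            weight[i] *= std::exp(-eta * loss[i]);
    }

    // Pick a strategy with probability proportional to its weight.
    int pick() const {
        double total = 0.0;
        for (std::size_t i = 0; i < weight.size(); ++i)
            total += weight[i];
        double r = (std::rand() / (double)RAND_MAX) * total;
        for (std::size_t i = 0; i < weight.size(); ++i) {
            r -= weight[i];
            if (r <= 0.0)
                return (int)i;
        }
        return (int)weight.size() - 1;
    }
};
```

The multiplicative update means consistently bad strategies quickly lose influence, while the randomization keeps opponents from exploiting a deterministic choice.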

Discarded experiments

I experimented with adding more move-selection strategies, but the results were not convincing at all, perhaps because it takes the bot longer to learn which strategy to choose, so I scrapped them again.

I also implemented map symmetry detection to guess unseen enemy hills and a corresponding module for a coordinated offense against enemy hills. The code is still there, but I have disabled it. The simple implementation I have is far too aggressive and wasteful, and I didn't feel like trying to tweak it to a point where it becomes useful.

I also did some experiments with an alternative move generation method for my tactical system, as well as a very simple implementation of the sampling-based combat system described by a1k0n on the challenge forums. This simple method performed worse than my tactical code; I have some ideas for improving it (just like a1k0n obviously did), but ultimately did not have the time to try them out. Those experiments can still be found in the Git history if you feel like digging through it.

I tried some automatic tuning of parameters using meta-heuristics (see the test/tune.py script), but somehow that didn't yield very convincing results either.

Code quality?

I have to admit that the code is not very well documented, and some of it is rather hackish. I hope you'll forgive me for this type of one-off code.

There is one thing I did consistently that I would never do in another type of project: keeping STL containers around as long as possible to avoid re-allocation. I rely heavily on the fact that std::vector::clear does not free the allocated memory, at least in g++'s default implementation. By keeping those vectors around, I avoid unpleasant surprises in the performance of memory management.

I don't think this is strictly necessary, given that the bot actually uses surprisingly little memory, but it didn't hurt in this case. It reduces the maintainability of the code, which is why I wouldn't do it on other projects, but maintainability was obviously never a goal for a competition like this one.
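
The pattern is simply this (a generic illustration, not a quote from the bot; a function-local vector would instead reallocate every turn):

```cpp
#include <vector>

// Keep the container alive across turns: clear() sets size() to zero but
// (in g++'s implementation) leaves capacity() untouched, so later
// push_backs reuse the old allocation instead of asking the allocator
// again.
struct SomeModule {
    std::vector<int> scratch; // lives as long as the module does

    void runTurn() {
        scratch.clear(); // size 0, capacity unchanged
        // ... fill and process scratch as usual ...
    }
};
```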

A neat randomization trick

"When in doubt, throw a coin" is perhaps the most important lesson that Theoretical Compute Science teaches. When there are several directions to go in that all score the same, which one should you choose? The best approach is to choose randomly.

To facilitate this, I pre-generated all 24 permutations of the 4 directions and all 120 permutations of the 5 possible moves of an ant (including standing still), and I randomly choose one of those permutations whenever I loop through directions to choose from.
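
Building and using such a table is straightforward; a sketch with made-up names:

```cpp
#include <algorithm>
#include <cstdlib>

// All 24 orderings of the four directions, built once at startup.
static int dirperm[24][4];

void initDirPerms()
{
    int perm[4] = { 0, 1, 2, 3 };
    for (int i = 0; i < 24; ++i) {
        std::copy(perm, perm + 4, dirperm[i]);
        std::next_permutation(perm, perm + 4);
    }
}

// Whenever several directions would otherwise be tried in a fixed order,
// pick one of the precomputed orderings at random instead.
void forEachDirectionShuffled()
{
    const int *order = dirperm[std::rand() % 24];
    for (int i = 0; i < 4; ++i) {
        int dir = order[i];
        // ... try moving in direction dir ...
        (void)dir; // silence the unused-variable warning in this sketch
    }
}
```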

Sometimes, however, I loop through variable-sized vectors, for example when choosing ants for an offensive move in the tactical logic. I would like a random permutation in those cases as well, and of course there are algorithms to generate one (Fisher-Yates, for example). But they take time, and the benefit of a perfectly uniform distribution is not that clear.

So here's a little trick I use to permute a variable-sized vector of size N. I pick a random prime p from a decently sized pre-generated list of large primes (each larger than any N the code is ever going to see), as well as a random offset ofs, and loop through the numbers 0..N-1. But instead of using each number i as an index directly, I use (p*i + ofs) % N. Since p is a prime larger than N, it is coprime to N and therefore invertible modulo N, so the map i -> (p*i + ofs) % N is a bijection, i.e. a permutation. Of course, this is far from a uniformly distributed permutation: there are N! potential permutations, of which this method can generate at most N * φ(N). But hey: it's good enough for what I need.
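
In code, the trick looks roughly like this (the prime table is a tiny stand-in for the real pre-generated list):

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// A few large primes, each bigger than any vector size the code will
// ever see; the real list is longer and pre-generated.
static const unsigned long long primes[] = {
    1000003ULL, 1000033ULL, 1000037ULL, 1000039ULL, 1000081ULL
};
static const int nprimes = sizeof(primes) / sizeof(primes[0]);

// Fill 'order' with a pseudo-random (not uniformly distributed)
// permutation of 0..N-1 using i -> (p*i + ofs) % N.
void pseudoRandomOrder(std::size_t N, std::vector<std::size_t> &order)
{
    order.resize(N);
    if (N == 0)
        return;
    unsigned long long p = primes[std::rand() % nprimes];
    unsigned long long ofs = (unsigned long long)std::rand() % N;
    for (std::size_t i = 0; i < N; ++i)
        order[i] = (std::size_t)((p * i + ofs) % N);
}
```

Fisher-Yates would produce a uniform permutation in the same O(N) time, but needs one random number per element instead of two in total.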
