Testing...
...is a somewhat complicated topic, also mentioned in the section "Ratings lists".
To find out if a new revision of Aristarch is better than my reference version, I
play 144 games under the following conditions:
- time control 3+3
- hardware: AMD XP 2400 @ 2.0 GHz
- hash: 64 MB
- opening books: none, but 24 predefined starting positions
- opponents: Shredder 8, Fritz 8, Deep Junior 7, List 5.14, Ruffian 2.1.0, Delfi 4.4
This testing method is still unreliable, as 144 games are not enough to detect
small improvements or regressions. But it is necessary to find a compromise between
testing time and the development cycle.
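
To give a rough feeling for why 144 games are too few, here is a minimal sketch that turns a match result into an Elo difference with an approximate 95% error margin. It is not the procedure used for Aristarch. It assumes the standard logistic Elo formula and a normal approximation for the score, and it treats the games as independent. The function names and the win/draw/loss counts are made up for illustration.

```python
import math


def elo_from_score(score):
    """Convert a score fraction (strictly between 0 and 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_error_margin(wins, draws, losses, z=1.96):
    """Return (elo, elo_low, elo_high) for a match result.

    Assumption: games are independent and the mean score is roughly
    normally distributed; z=1.96 corresponds to about 95% confidence.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n

    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    stderr = math.sqrt(variance / n)

    # Clamp so the logarithm stays defined for very lopsided results.
    low = min(max(score - z * stderr, 1e-9), 1.0 - 1e-9)
    high = min(max(score + z * stderr, 1e-9), 1.0 - 1e-9)
    score = min(max(score, 1e-9), 1.0 - 1e-9)

    return elo_from_score(score), elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    # Hypothetical 144-game result: 50 wins, 44 draws, 50 losses.
    elo, elo_low, elo_high = match_error_margin(50, 44, 50)
    print(f"Elo {elo:+.0f}, 95% interval {elo_low:+.0f} to {elo_high:+.0f}")
```

For this hypothetical even result the interval is a little under 50 Elo wide in each direction, so a change worth 10 or 20 Elo cannot be told apart from noise after 144 games.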
Sometimes I send an engine to external
testers, but unfortunately this does not help much, because I can only compare
their results with results obtained under exactly the same test conditions, which are normally not available
or outdated. Especially helpful are testers who have well-defined testing conditions and
are able to play lots of games.
An important point is that testing in practical games is not the only way to
determine playing strength:
- Speed tests. If the code changes do not change the engine's behaviour, but only improve
calculation speed, this is an obvious improvement. The faster an engine is, the better it plays.
Any chess engine must be designed to fulfill this requirement, because an engine that
does not become stronger on faster hardware is not desirable.
- Intuition. I would not exchange intuition for 30 testing work stations.
- Test positions. These can be used to check certain positions by hand and to find new evaluation
algorithms if the engine cannot handle a position. Unfortunately there is little correlation
between results on test positions and results in practical tournaments, so a set of test positions cannot be used
to optimize an engine. I tried this and achieved great results on common test position sets, but Aristarch
simply became weaker.