Parallel search in chess
If you know alphabeta , you will know that it is 'essentially' a serial algorithm : there is a pretty high dependency among the nodes in the tree. (I mention only alphabeta here , though all variations of it have same problems).
This has resulted in two directions of research for parallel search , two approaches so as to speak of :
* Trying to come up with alternate algorithms which try to 'eliminate' (or reduce) this dependency between nodes : lower the dependency , the more 'parallel' the algo becomes.
* Variations of alphabeta itself so that you can parallelize it better.
Most of my experiments with anything other than pvs (variant of alphabeta) has not been very fruitful : they suffer from a variety of problems (MTD(f) , ABDADA , etc) ... and yet some people do seem to be using them quite well !
So it is a case of YMMV I guess.
Among the various algorithms in the second category , I have always found nothing much in the literature which is really good - so I hacked up a algo which is reminiscent of DTS.
I have an iterative search , with YBW (for deciding on split points) and no concrete subtree ownerships (so the tree 'owner' can move between processes) - (skipping the gory details here ;-) - bug me if you want to know more :-P )
Note , I usually run on an SMP box - never have worked on an MPI box and very rarely on a NUMA box : so I have not considered the implications of a MPI or NUMA problems in my design too much. (Yes yes , I try to make sure that localization of mem is highest , etc - but there could be other 'issues').
The above should be boring stuff for most I guess ? :-)
Anyway , the reason why I am blogging about parallel chess algorithm is :
* I have a dual core proc box at home - was eating into me that I am not using it efficiently enough.
* I was bored out of my wits and needed a new challenge - fixing bugs after the major release is done can be a bit boring after a while :-D
and so ....
* I just completed testing my new parallel search which exhibits very good average speedup of around 1.88+ for 2 procs , with a scaling of 1.98+ ! :-D
The worst case behavior is still around 1.4 or so , also the scaling ramp up is not as fast I would like it to be (takes a few secs to get to 1.9x).
Ofcourse other than the parallel search , rest of this new engine is 'borrowed' from my previous engine - so the gameplay is not really much better (I was testing some wacky eval and moveordering ideas earlier using incremental attacktables) : but damn it gives me a kick to see it fly like this :-)
The way I have designed the search , I am not sure till how many processors it will scale , how NUMA will affect its performance , etc.
I expect it to impact it atleast a bit : no I am not even thinking of clusters or other MPI boxes - all that is in the realm of testing and tuning (I suck at tuning :-( ) , but I would expect it to do quite well indeed upto quite a high number of processors.
So tonight , I relax and enjoy the numbers flashing through my screen - tomorrow I port it to solaris and then ... let us see if I can get a 'relatively free' box at office ;-)
2 Comments:
"So tonight , I relax and enjoy the numbers flashing through my screen" - Matrix?
Better than matrix :-)
Matrix codes , I cant understand - these I can :-D
Post a Comment
<< Home