The problem of determining the evolutionary history for a set of species, known as the phylogeny problem, is fundamental to molecular biology. Evolutionary history is typically represented by a phylogeny tree, a tree of species with the root being the oldest common ancestor and the children of a node being the species that evolved directly from that node. Each species in a set is represented by a set of traits or character values. One technique for solving the phylogeny problem, called character compatibility, is to search through the power set of characteristics to see which ones are in a sense consistent [Jon94]. The notion of consistency in this problem is the existence of a particular kind of phylogeny tree called a perfect phylogeny tree. The specifics are not important. What is important is the structure of the search space (a power set) and the following property of the perfect phylogeny trees: if none exists for some set of characters S (the set is inconsistent), then none exists for any superset of S.
Although the search space has a known structure, the above property allows for unpredictable pruning, which leads to load imbalance. The RandQue data structure is not appropriate in this application, because there locality is important. However, work stealing, using the TaskStealer data structure in Multipol, provides load balance that is almost as good as RandQue, but with better locality. (Work stealing is provably optimal, in the same sense that randomized task sharing is [BL94].) The basic difference is that TaskStealer leaves tasks on the processor that created them until another processor becomes idle. The TaskStealer was originally used in the Eigenvalue and Gröbner basis problems, until the simpler RandQue proved to work as well [CRY94].
The choice of a data structure for holding a partial solution is more difficult in Phylogeny than in the other search problems, because we need a representation of the result (success or failure) of every node searched so far. Fortunately, because the structure of the search space, this information can be compressed using a trie, but the trie should ideally be shared between processors. We use a lazily replicated trie with global synchronization points for making the shared copies consistent.