<tt>grobner</tt>

`grobner`

The Gröbner basis program [CY93] is a completion procedure which guarantees to terminate successfully. It computes the Gröbner basis of a set of polynomials as follows. For each pair of polynomial a new polynomial is computed; if the new polynomial is shown to be an linear combination of existing ones it is eliminated, otherwise it is added to the set. The RandQue data structure is used to dynamically assign polynomial pairs to processors. The data structure provides good load balance assuming locality is not a concern.

The basis of polynomials has many more reads than writes. Therefore, it is replicated among the processors to reduce communication. One option is full replication, whereby each polynomial is broadcast when added to the basis. However, polynomials can be large and the processors are not synchronizationed, so this disrupts the computation and leads to poor processor utilization. Instead, we use software caching with a Multipol data structure called ObjLayer. Object caching avoids false sharing and fragmentation problems of hardware caches, but has higher address translation overhead. It also has an advantage of flexibility: we use a consistency protocol that is specific to the data structure, and make scheduling decisions based on cache state. For example, when new polynomials are added or old ones simplified, other processors may have stale or incomplete copies of the basis. Fortunately, this does not prevent them from doing useful work. When a processor finds a polynomial that appears to be new, i.e., did not reduce to zero, it locks the basis, obtains a consistent copy of all elements, performs one final check on reducibility, and finally adds the polynomial.

The locking solves the consistency problems and enforces the uniqueness of elements, but it can lead to performance bottlenecks as processors wait for locks. To avoid these overheads, we use multi-threading. If a processor cannot acquire the basis lock to perform the desired task, it suspends the current work and picks up something unrelated. Multi-threading support is built into the Multipol runtime layer and is done at user level to avoid the costs of saving complete thread contexts. As with caching, some hardware designers would place multi-threading support into the hardware. This may lower the cost of threading operations, but gives up some flexibility. In our approach, the library designer can decide whether a cache miss may be ignored (because the value is not essential) or should result in a change of context.

Chih-Po Wen
Wed Sep 13 23:57:28 PDT 1995