Compiling Non-Strict Parallel Languages for a Threaded Abstract Machine
Klaus Erik Schauser
(Professor D. E. Culler)
IBM Graduate Fellowship and (NSF) CCR-90-58432 (PYI)

The goal of our research is to develop effective compilation techniques for parallel, non-strict languages such as the extended functional language Id. It has been shown that these languages allow an elegant formulation of a broad class of programs while exposing substantial parallelism. However, they require fine-grain dynamic scheduling and synchronization, so an efficient implementation on conventional machines is challenging. Dataflow architectures support fine-grain synchronization and latency tolerance in hardware, but present a host of implementation challenges. We employ dataflow graphs as an intermediate program representation, which is compiled further into a collection of sequential threads for a threaded abstract machine (TAM) [1].

TAM recognizes a hierarchy of synchronization and scheduling events. In the TAM model, a program consists of a collection of code-blocks, each of which has a number of threads that execute in the context of an activation frame (this structure is sketched below). A thread is a sequence of instructions that execute in order; scheduling among threads is dynamic. A key property of the model is that scheduling and resource management are completely visible to the compiler and that switching threads does not imply a context swap, as it does on most multithreaded machines. In the common case, closely related threads are scheduled close together and values are passed between them in registers. Register management under this quasi-static scheduling presents an interesting compilation problem.

The second version of a compiler that translates Id into TAM code is complete and available for the CM-5 multiprocessor as well as for conventional workstations. It is being used for the development of Id applications at several universities. Studies demonstrate performance comparable to that of state-of-the-art dataflow processors. The compiler uses the Id front end developed at MIT and a dual control-flow and dataflow graph representation in the translation to threads. The control-flow graph is used in partitioning the program into threads; the current strategy is to make the threads as large as possible to minimize the number of synchronization points. The dataflow graph is used for register and frame slot assignment. The dual graph form allows us to address several subtle issues, such as interactions between instruction ordering and register management [2].

Further work will improve the partitioning into sequential threads by using interprocedural analysis techniques [3]. In addition, we want to integrate the TAM back end into the Id compiler, so that it can directly produce native code for the CM-5 and other machines. This will allow us to conduct a detailed study of the performance gains due to carrying data in registers across threads that are scheduled together.
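To make the execution model concrete, the following Haskell sketch models the TAM structure described above: code-blocks that group threads, threads guarded by synchronization counters, and activation frames that hold frame slots, per-thread counters, and a queue of enabled threads. The type names, fields, and the postEvent operation are illustrative assumptions made for this summary, not the actual TAM instruction set or data layout.

    -- A minimal sketch (assumed names, not the actual TAM definitions) of the
    -- code-block / thread / activation-frame structure described above.
    module TamSketch where

    import qualified Data.Map as Map

    type ThreadId = Int
    type SlotId   = Int
    type Value    = Int              -- stand-in for machine values

    -- A thread is a straight-line sequence of instructions; its entry count
    -- says how many events must arrive before the thread may run.
    data Instruction = Fork ThreadId | MoveSlot SlotId SlotId | Stop

    data Thread = Thread
      { entryCount   :: Int
      , instructions :: [Instruction]
      }

    -- A code-block groups the threads compiled from one function or loop body.
    data CodeBlock = CodeBlock
      { blockName    :: String
      , blockThreads :: Map.Map ThreadId Thread
      }

    -- An activation frame holds per-invocation state: frame slots, the
    -- remaining synchronization counts (initialized from the entry counts),
    -- and the queue of threads already enabled in this activation.
    data Frame = Frame
      { slots      :: Map.Map SlotId Value
      , syncCounts :: Map.Map ThreadId Int
      , runQueue   :: [ThreadId]
      }

    -- Posting an event decrements the target thread's counter; when it
    -- reaches zero the thread is appended to the frame's run queue.
    postEvent :: ThreadId -> Frame -> Frame
    postEvent tid f
      | n <= 0    = f' { runQueue = runQueue f' ++ [tid] }
      | otherwise = f'
      where
        n  = Map.findWithDefault 1 tid (syncCounts f) - 1
        f' = f { syncCounts = Map.insert tid n (syncCounts f) }

In a sketch of this kind, enabling or switching a thread within one activation touches only the frame's counters and run queue, which illustrates why a change of threads need not involve the context swap mentioned above.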
[1] D. E. Culler, A. Sah, K. E. Schauser, T. von Eicken, and J. Wawrzynek, "Fine-Grain Parallelism with Minimal Hardware Support: A Compiler-Controlled Threaded Abstract Machine," UC Berkeley Computer Science Division, Report No. UCB/CSD 90/594, September 1990. Also in Proc. 4th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Santa Clara, CA, April 1991.
[2] K. E. Schauser, D. E. Culler, and T. von Eicken, "Compiler-Controlled Multithreading for Lenient Parallel Languages," Proc. Conf. on Functional Programming Languages and Computer Architecture, Cambridge, MA, August 1991. Also UC Berkeley Computer Science Division, Report No. UCB/CSD 90/644, July 1991.
[3] K. R. Traub, D. E. Culler, and K. E. Schauser, "Global Analysis for Partitioning Non-Strict Programs into Sequential Threads," Proc. ACM Conf. on LISP and Functional Programming, San Francisco, CA, June 1992.