Compiling Non-Strict Parallel Languages for a Threaded Abstract Machine
Klaus Erik Schauser
(Professor D. E. Culler)
IBM Graduate Fellowship and (NSF) CCR-90-58432 (PYI)

The goal of our research is to develop effective compilation techniques for parallel, non-strict languages such as the extended functional language Id. It has been shown that these languages allow an elegant formulation of a broad class of programs while exposing substantial parallelism. However, they require fine-grain dynamic scheduling and synchronization, so an efficient implementation on conventional machines is challenging. Dataflow architectures support fine-grain synchronization and latency tolerance in hardware, but present a host of implementation challenges. We employ dataflow graphs as an intermediate program representation, which is compiled further into a collection of sequential threads for a threaded abstract machine (TAM) [1].

TAM recognizes a hierarchy of synchronization and scheduling events. In the TAM model, a program consists of a collection of code-blocks, each of which has a number of threads that execute in the context of an activation frame (this structure is sketched below). A thread is a sequence of instructions that execute in order; scheduling among threads is dynamic. A key property of the model is that scheduling and resource management are completely visible to the compiler and that switching threads does not imply a context swap, as it does on most multithreaded machines. In the common case, closely related threads are scheduled close together and values are passed between them in registers. Register management under this quasi-static scheduling presents an interesting compilation problem.

The second version of a compiler that translates Id into TAM code is complete and available for the CM-5 multiprocessor as well as for conventional workstations. It is being used for the development of Id applications at several universities. Studies demonstrate performance comparable to that of state-of-the-art dataflow processors. The compiler uses the Id front end developed at MIT and a dual control-flow and dataflow graph representation in the translation to threads. The control-flow graph is used in partitioning the program into threads; the current strategy is to make the threads as large as possible to minimize the number of synchronization points. The dataflow graph is used for register and frame slot assignment. The dual graph form allows us to address several subtle issues, such as interactions between instruction ordering and register management [2].

Further work will improve the partitioning into sequential threads by using interprocedural analysis techniques [3]. In addition, we want to integrate the TAM back end into the Id compiler, so that it can directly produce native code for the CM-5 and other machines. This will allow us to conduct a detailed study of the performance gains due to carrying data in registers across threads that are scheduled together.
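To make the execution model concrete, the following Haskell sketch models the TAM structure described above: code-blocks that group threads, threads guarded by synchronization counters, and activation frames that hold frame slots, per-thread counters, and a queue of enabled threads. The type names, fields, and the postEvent operation are illustrative assumptions made for this summary, not the actual TAM instruction set or data layout.

    -- A minimal sketch (assumed names, not the actual TAM definitions) of the
    -- code-block / thread / activation-frame structure described above.
    module TamSketch where

    import qualified Data.Map as Map

    type ThreadId = Int
    type SlotId   = Int
    type Value    = Int              -- stand-in for machine values

    -- A thread is a straight-line sequence of instructions; its entry count
    -- says how many events must arrive before the thread may run.
    data Instruction = Fork ThreadId | MoveSlot SlotId SlotId | Stop

    data Thread = Thread
      { entryCount   :: Int
      , instructions :: [Instruction]
      }

    -- A code-block groups the threads compiled from one function or loop body.
    data CodeBlock = CodeBlock
      { blockName    :: String
      , blockThreads :: Map.Map ThreadId Thread
      }

    -- An activation frame holds per-invocation state: frame slots, the
    -- remaining synchronization counts (initialized from the entry counts),
    -- and the queue of threads already enabled in this activation.
    data Frame = Frame
      { slots      :: Map.Map SlotId Value
      , syncCounts :: Map.Map ThreadId Int
      , runQueue   :: [ThreadId]
      }

    -- Posting an event decrements the target thread's counter; when it
    -- reaches zero the thread is appended to the frame's run queue.
    postEvent :: ThreadId -> Frame -> Frame
    postEvent tid f
      | n <= 0    = f' { runQueue = runQueue f' ++ [tid] }
      | otherwise = f'
      where
        n  = Map.findWithDefault 1 tid (syncCounts f) - 1
        f' = f { syncCounts = Map.insert tid n (syncCounts f) }

In a sketch of this kind, enabling or switching a thread within one activation touches only the frame's counters and run queue, which illustrates why a change of threads need not involve the context swap mentioned above.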
[1] D. E. Culler, A. Sah, K. E. Schauser, T. von Eicken, and J. Wawrzynek, "Fine-Grain Parallelism with Minimal Hardware Support: A Compiler-Controlled Threaded Abstract Machine," UC Berkeley Computer Science Division, Report No. UCB/CSD 90/594, September 1990. Also in Proc. 4th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Santa Clara, CA, April 1991.
[2] K. E. Schauser, D. E. Culler, and T. von Eicken, "Compiler-Controlled Multithreading for Lenient Parallel Languages," Proc. Conf. on Functional Programming Languages and Computer Architecture, Cambridge, MA, August 1991. Also UC Berkeley Computer Science Division, Report No. UCB/CSD 90/644, July 1991.
[3] K. R. Traub, D. E. Culler, and K. E. Schauser, "Global Analysis for Partitioning Non-Strict Programs into Sequential Threads," Proc. ACM Conf. on LISP and Functional Programming, San Francisco, CA, June 1992.