Princeton University

School of Engineering & Applied Science

Exploring the Space-Time Continuum of Parallel Architectures

Michael Pellauer, NVIDIA
Engineering Quadrangle, B205
Friday, November 6, 2015 - 12:00pm to 1:00pm

In the days of the single-ALU computer, that ALU was necessarily programmed via time-division multiplexing to execute a complete algorithm. As architectures grew to multiple ALUs and multi-cores, it was natural to extend the existing temporal programming paradigm into cooperative multithreading. This paradigm works best for data-parallel programs, where independent threads can work in isolation and communicate rarely. SIMT architectures such as GPUs leverage the regularity and predictability of these temporal threads to create extremely efficient execution substrates. However, there is a large class of interesting and valuable programs that exhibit limited or hard-to-extract data parallelism. For many of these programs, a "spatial" programming approach can yield large efficiency and performance gains. In this approach, neighboring ALUs communicate on a cycle-by-cycle basis, forming producer-consumer pipelines. Coarse-Grained Reconfigurable Arrays (CGRAs) and Field-Programmable Gate Arrays (FPGAs) are examples of architectures that leverage spatial parallelism.
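The temporal/spatial distinction above can be illustrated with a minimal sketch (not from the talk; the stage functions are hypothetical): in the temporal style one execution unit is time-multiplexed across every stage of the computation, while in the spatial style each stage maps to its own unit and data streams between them as a producer-consumer pipeline, modeled here with Python generators.

```python
def temporal(data):
    # Temporal style: one "ALU" (this loop body) is time-multiplexed
    # across all stages of the algorithm for each element.
    out = []
    for x in data:
        y = x * 2       # stage 1
        z = y + 1       # stage 2
        out.append(z)
    return out

def spatial(data):
    # Spatial style: each stage is its own "ALU" (a generator), and
    # elements stream through a producer-consumer pipeline.
    def stage1(xs):
        for x in xs:
            yield x * 2   # producer for stage 2

    def stage2(ys):
        for y in ys:
            yield y + 1   # consumer of stage 1's output

    return list(stage2(stage1(data)))

# Both styles compute the same function; they differ in how the
# work is mapped onto execution resources over time vs. space.
assert temporal([1, 2, 3]) == spatial([1, 2, 3]) == [3, 5, 7]
```

On real spatial hardware the stages would be physically adjacent units exchanging values every cycle, so the pipeline's throughput comes from all stages operating concurrently rather than from independent threads.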
This talk examines some theory and mechanisms for spatially-programmed architectures, building intuition as to which program properties (control-flow divergence, memory divergence, loop-carried dependencies, etc.) are amenable to the spatial approach. This leads toward a long-term vision in which heterogeneous spatial/temporal architectures work together to efficiently serve a large space of interesting workloads.