Princeton University

School of Engineering & Applied Science

Static and Dynamic Instruction Mapping for Spatial Architectures

Feng Liu
Prof. August
Engineering Quadrangle B327
Tuesday, April 24, 2018 - 9:00am to 10:30am

In response to technology scaling trends, spatial architectures have emerged as a new style of processor for executing programs more efficiently. Unlike traditional out-of-order (OoO) processors, which time-share a small set of functional units, a spatial computer is composed of hundreds or even thousands of simple, replicated functional units. Spatial architectures avoid the overheads of time-sharing and of repeatedly generating schedules by mapping instruction sequences onto the functional units explicitly and reusing the mapping across multiple invocations.
Currently, spatial architectures mainly use static methods to map and schedule instructions onto arrays of functional units. These methods have several limitations. First, for programs with irregular memory accesses and control flow, they yield poor performance because the functional units must be invoked sequentially to respect data and control dependences. Second, static methods cannot fully exploit speculation techniques, which are the dominant source of performance in OoO processors. Finally, static methods cannot adapt to changing workloads and are not compatible across hardware generations.
In this talk, I will present two techniques that address these issues and improve the applicability of spatial architectures. The first, Coarse-Grained Pipelined Accelerators (CGPA), is a static compilation framework that exploits the hidden parallelism within irregular C/C++ loops and translates them directly into spatial hardware modules. The technique has been implemented as a compiler pass, and experiments show a 3.3x speedup over an open-source tool baseline. The second technique, Dynamic Spatial Architecture Mapping (DynaSpAM), reuses the speculation system of an OoO processor to dynamically produce high-performance scheduling and execution on a dedicated spatial fabric. The technique is modeled in a cycle-accurate simulator, and experiments show a 1.4x geomean performance improvement and a 23.9% reduction in energy consumption compared to an aggressive OoO processor baseline.