Princeton University

School of Engineering & Applied Science

Computing on Large, Sparse Datasets and Error-Prone Fabrics

Pareesa Golnari
Prof. Sharad Malik
Engineering Quadrangle J-401
Wednesday, March 14, 2018 - 3:00pm to 4:30pm

This dissertation focuses on two research trends: computation on large, sparse datasets and error-tolerant computing. Datasets grow larger every year, yet many of them are sparse, i.e., the majority of their entries are zero. Skipping the zero elements can therefore considerably accelerate computation on these datasets. This is the first trend we study. The second is error-tolerant computing: as transistor scaling continues, devices become less reliable and cause the systems built from them to behave erroneously. We study computation on processors built from such devices.
1- Accelerating sparse matrix-matrix multiplication (SpMM): We propose a high-performance, scalable systolic accelerator that minimizes the bandwidth requirement and speeds up this operation by 9-30 times over the state of the art. We also study sparse storage formats and modify the popular CRS (compressed row storage) format to obtain the InCRS format. We show that this modification reduces the required memory accesses and consequently accelerates SpMM by 5-12 times.
2- Error tolerance of sparse formats: We provide a framework for comparing the error tolerance of different data formats and choosing the most appropriate format for a given application. As case studies, we compare the performance of different formats on two machine learning applications, RBM and PCA, and on a set of linear algebra operations.
3- Reliability of error-tolerant processors: We also study processors built on error-prone fabrics and define reliability for them. We propose a framework that models the control flow of these processors, capturing the effects of errors and protection mechanisms, and verifies reliability properties on them. As case studies, we verify these properties on two recent error-tolerant processors, PPU and ERSA, and propose modifications to these designs so that they satisfy the reliability requirements.
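
To make the zero-skipping idea behind item 1 concrete, the following is a minimal Python sketch of the standard CRS layout and a textbook row-by-row sparse matrix-matrix product over it. It is an illustration only, not the InCRS format or the systolic accelerator proposed in the dissertation, whose details are not given here. The function names `to_crs` and `crs_matmul` are hypothetical, and the result is kept dense purely for readability.

```python
def to_crs(dense):
    """Convert a dense matrix (list of rows) to CRS arrays.

    Returns (values, col_idx, row_ptr): the nonzero values in row order,
    their column indices, and for each row the offset of its first
    nonzero in `values` (with one extra trailing entry).
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def crs_matmul(a, b, n_cols_b):
    """Multiply two CRS matrices, touching only nonzero entries.

    `a` and `b` are (values, col_idx, row_ptr) triples; `n_cols_b` is the
    number of columns of B. The product is returned as a dense list of rows.
    """
    a_val, a_col, a_ptr = a
    b_val, b_col, b_ptr = b
    n_rows = len(a_ptr) - 1
    result = [[0] * n_cols_b for _ in range(n_rows)]
    for i in range(n_rows):
        # Visit only the nonzeros A[i, k] of row i.
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_col[p], a_val[p]
            # Scale the nonzeros of row k of B and accumulate into row i of C.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                result[i][b_col[q]] += a_ik * b_val[q]
    return result


A = [[1, 0, 2],
     [0, 0, 0],
     [0, 3, 0]]
B = [[0, 4],
     [5, 0],
     [0, 6]]
C = crs_matmul(to_crs(A), to_crs(B), n_cols_b=2)
```

In this sketch the inner loops run once per pair of matching nonzeros, so the work scales with the number of nonzero entries rather than with the full matrix dimensions. The all-zero second row of A costs no multiplications at all.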