Malfunctions in electronic systems can have major consequences, ranging from loss of data and services to financial and productivity losses, or even loss of human life. Such impacts continue to increase as systems become more complex, interconnected, and pervasive. Hardware failures in particular are a growing concern because:
Robust system design is required to ensure that future electronic systems, from supercomputers all the way to embedded systems, perform correctly despite rising levels of complexity and disturbances. Traditional fault-tolerant computing techniques are generally very expensive, and often inadequate, for this purpose. I will present two techniques that are essential for robust system design:
A key aspect of both techniques is their orchestration across multiple abstraction layers: physical design, architecture, and system software. I will demonstrate the effectiveness and practicality of these techniques using results from the industrial OpenSPARC T2 multi-core design and the Intel Core i7 hardware platform. I will also share recent experiences in implementing these techniques in the latest Intel designs.
Yanjing Li is a research scientist at Intel Labs and a visiting scholar at Stanford University. She received her Ph.D. in Electrical Engineering from Stanford University. Her research interests include robust system design, energy-efficient systems, system validation and test, computer architecture, and system software. Dr. Li received the European Design and Automation Association Outstanding Dissertation Award, the IEEE International Test Conference Best Student Paper Award, and the IEEE VLSI Test Symposium Best Paper Award for novel research on robust system design, and two Intel Divisional Recognition Awards for mobile processor designs that are being adopted by product groups at Intel.