Computer Architecture
The Art of Designing Microprocessors

Tiered SIMD Processor Architecture

My research focuses on designing faster and more efficient computers using specialized architectures that are highly optimized for performing general-purpose tasks. Making systems more efficient reduces their cost, increases their availability, and increases their potential to solve challenging problems in other domains. In the past, advances in computer efficiency have enabled streaming video on handheld devices, visual computing and 3D gaming on consoles and desktops, and molecular dynamics simulations that aid the development of new healthcare products.


Until recently, it was possible to improve computer performance by treating all applications the same and optimizing the common case. Around 2005, however, hardware designers in industry began to hit limits on the energy required to run a general-purpose processor at full speed.

To continue improving performance under these new power constraints, it has become necessary to optimize hardware for specific rather than general-purpose operations. GPU architectures from NVIDIA, AMD, and Intel are commercial examples of processors specialized for graphics. However, because not all applications can run efficiently on all processor architectures, application developers are forced to deal with additional complexity.

My research interests are two-fold: 1) determining the sets of architecture and systems features that are highly efficient for domain-specific applications, and 2) folding these features back into general-purpose computing systems.

Individual Projects

Harmony - An Execution Model and Runtime for Heterogeneous Many-Core Systems

Harmony is a programming and execution model for systems with at least one CPU and possibly many accelerators. The goal is to hide inter-accelerator parallelism and architectural heterogeneity from the programmer without sacrificing performance. This is done via automatic parallelization of sequential applications using speculative threading and dynamic mapping of work onto accelerators. We also explore techniques such as performance prediction, variable renaming, and kernel fusion/fission to further optimize this model.
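The dynamic-mapping idea can be illustrated with a small sketch. The names below (Accelerator, Kernel, predict_finish, schedule) and the cost model are illustrative assumptions, not the actual Harmony API: a runtime walks a sequential stream of kernels and greedily assigns each one to whichever accelerator is predicted to finish it soonest, using per-kernel performance predictions.

```python
# Hypothetical sketch of Harmony-style dynamic kernel mapping.
# Assumptions: kernels are independent (real Harmony tracks dependencies
# via speculative threading and renaming), and each accelerator has a
# profiled speedup per kernel type relative to a baseline CPU.
from dataclasses import dataclass

@dataclass
class Accelerator:
    name: str
    speedup: dict            # predicted speedup per kernel kind (assumed profile)
    busy_until: float = 0.0  # time at which this accelerator becomes free

@dataclass
class Kernel:
    kind: str
    base_cost: float         # predicted cost on the baseline CPU

def predict_finish(acc: Accelerator, k: Kernel) -> float:
    """Predicted completion time: wait until the accelerator is free,
    then run the kernel at its profiled speedup (1.0 if unprofiled)."""
    return acc.busy_until + k.base_cost / acc.speedup.get(k.kind, 1.0)

def schedule(kernels, accelerators):
    """Greedily map each kernel, in program order, to the accelerator
    with the earliest predicted finish time (a stand-in for Harmony's
    dynamic work mapping)."""
    mapping = []
    for k in kernels:
        best = min(accelerators, key=lambda a: predict_finish(a, k))
        best.busy_until = predict_finish(best, k)
        mapping.append((k.kind, best.name))
    return mapping
```

For example, with a CPU and a GPU that is fast at BLAS-like kernels but slow at FFTs, the FFT kernel lands on the CPU while both BLAS kernels go to the GPU:

```python
cpu = Accelerator("cpu", {"blas": 1.0, "fft": 1.0})
gpu = Accelerator("gpu", {"blas": 8.0, "fft": 0.5})
work = [Kernel("blas", 8.0), Kernel("fft", 4.0), Kernel("blas", 8.0)]
schedule(work, [cpu, gpu])
# -> [('blas', 'gpu'), ('fft', 'cpu'), ('blas', 'gpu')]
```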