Building Chips Faster: Hardware-Compiler Co-Design for Accelerated RTL Simulation

Kashani, Sahand

doi:10.5075/epfl-thesis-8990

2023

Abstract

The demise of Moore's Law and Dennard scaling has resulted in diminishing performance gains for general-purpose processors, which has prompted a surge in academic and commercial interest in hardware accelerators. Specialized hardware has already redefined the computing landscape by enabling the emergence of disruptive, large-scale applications that would otherwise not have been possible with CPUs alone. RTL simulators play a key role in enabling the accelerated computing revolution: they are to hardware engineers what debuggers and runtime systems are to software engineers. Without RTL simulators, no hardware accelerator could be functionally designed. As accelerators increase in size and complexity, the hardware design industry will increasingly need faster RTL simulators to permit chip design in reasonable time. Since the advent of multicore computers, parallelism has been the preferred approach to improving software performance. RTL simulation seems to offer many opportunities to follow such a path: accelerators are written in hardware description languages that contain parallel constructs for describing independent hardware components that run in parallel and synchronize only at clock edges. Unfortunately, there is a mismatch between RTL simulation and today's multicore systems: tasks in RTL simulation tend to be very small, resulting in fine-grain parallelism. This fine-grain parallelism contrasts with the coarse-grain parallel workloads for which modern multicore systems are built, which leads to simulator designs that can achieve only weak parallel performance scaling. This thesis argues that we need computing architectures that can achieve strong scaling to truly speed up RTL simulation through parallelism. A strong scaling architecture is one that can make effective use of additional cores without having to increase the total workload size. This enables even small or moderate-size designs to exploit parallelism to run quickly.
This thesis contributes Manticore, a co-designed manycore architecture and compiler for RTL simulation that achieves strong parallel performance scaling. Manticore combines a bulk-synchronous parallel execution model with static scheduling to eliminate the runtime overheads of synchronization among hundreds of cores, simplify core design, and significantly increase the parallelism possible on a single chip. Our modest FPGA prototype of Manticore greatly increases parallel RTL simulation rate compared to a state-of-the-art software simulator running on top-of-the-line desktop and server x86 processors. The ideas underlying Manticore's design present a first step towards fast, scale-out RTL simulation.
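The strong-scaling argument in the abstract can be illustrated with a toy cost model. The sketch below is not taken from the thesis: the function name `speedup`, the cost formula, and all numeric values are hypothetical assumptions chosen only to show why per-cycle synchronization overhead limits speedup when the workload size stays fixed.

```python
def speedup(work_per_cycle: float, cores: int, sync_cost: float) -> float:
    """Speedup of one simulated clock cycle relative to a single core.

    Toy model (an assumption, not the thesis's model):
      - work_per_cycle: time to evaluate one simulated cycle on one core
      - sync_cost: fixed per-cycle synchronization overhead when cores > 1
    The total workload is held constant as cores are added (strong scaling).
    """
    if cores == 1:
        return 1.0  # a single core pays no synchronization cost
    parallel_time = work_per_cycle / cores + sync_cost
    return work_per_cycle / parallel_time


if __name__ == "__main__":
    work = 1000.0  # hypothetical time units per simulated cycle
    for cores in (1, 8, 64, 512):
        costly = speedup(work, cores, sync_cost=100.0)  # expensive runtime barrier
        cheap = speedup(work, cores, sync_cost=1.0)     # near-free synchronization
        print(f"{cores:4d} cores: costly sync {costly:7.2f}x, cheap sync {cheap:7.2f}x")
```

In this model, speedup with a costly barrier saturates near `work_per_cycle / sync_cost` no matter how many cores are added, while a near-zero synchronization cost lets the same fixed workload keep benefiting from more cores. That is the intuition behind removing runtime synchronization overheads, as the abstract describes for static scheduling.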

Details

Title Building Chips Faster: Hardware-Compiler Co-Design for Accelerated RTL Simulation

Author(s) Kashani, Sahand

Advisor(s) Larus, James Richard

Pagination 180

Date 2023

Publisher Lausanne, EPFL

Keywords RTL simulation; parallelism; hardware acceleration; manycore architecture; FPGA

Language English

DOI https://doi.org/10.5075/epfl-thesis-8990

Laboratories VLSC

Record creation date 2023-08-24
