Abstract

FPGAs play an increasing role in the reconfigurable accelerator landscape. A key challenge in designing FPGA-based systems is partitioning computation between processor cores and FPGAs. An appropriate division of labor is difficult to predict in advance and requires experiments and measurements. When an investigation requires rewriting part of the system in a new language or with a new programming model, its high cost can delay design-space exploration. A single-language system with an appropriate programming model and a compiler that targets both platforms turns this tedious exploration into a simple recompile with new compiler directives. This work introduces StreamBlocks, a unified open-source software/FPGA compiler and runtime that takes dataflow programs written in Cal and automatically partitions them across heterogeneous CPU/FPGA platforms. The explicit task-parallel semantics of dataflow allow our compiler to exploit thread parallelism in software and spatial parallelism in hardware simultaneously. StreamBlocks also includes a profile-guided autopartitioning tool that helps identify the best hardware/software partitions. We demonstrate our compiler's ability to find the right balance between hardware and software execution on both a high-end datacenter accelerator card and an embedded board. Our experiments show a 4x to 7x speedup over trivial partitions, achieved automatically with zero code modifications.
