Previous Up Next

Chapter 2  Introduction to the Spark High-Level Synthesis Framework



Figure 2.1: An overview of our High-Level Synthesis Framework


Spark is a parallelizing high-level synthesis framework that synthesizes a behavioral description using a set of aggressive compiler, parallelizing compiler and synthesis techniques [1, 2, 3]. An overview of the Spark framework is shown in Figure 2.17Introduction to the Spark High-Level Synthesis Frameworkfigure.2.1. Spark takes a behavioral description in ANSI-C as input albeit with no support for pointers, function recursion and gotos. The output of Spark is synthesizable register-transfer level (RTL) VHDL. As shown in Figure 2.17Introduction to the Spark High-Level Synthesis Frameworkfigure.2.1, Spark also takes as input additional information such as a hardware resource library, resource and timing constraints, data type information, and user controlled scripts that guide the various heuristics and transformations.

The code parallelization and transformation techniques in Spark have been grouped into pre-synthesis, scheduling, dynamic, and basic compiler optimizations -- this grouping improves controllability over the transformations applied to the input description. The pre-synthesis transformations are applied at a source-level, whereas the scheduling and dynamic transformations are applied during scheduling. The basic compiler transformations are applied at each stage of synthesis -- during pre-synthesis, scheduling and post-scheduling.

The pre-synthesis transformations include coarse-level transformations such as loop unrolling and compiler transformations such as common sub-expression elimination (CSE), copy propagation, dead code elimination, loop-invariant code motion and so on. These compiler transformations aim to remove unnecessary and redundant operations and reduce the number of operations within loops.

The pre-synthesis phase is followed by the scheduling phase (see Figure 2.17Introduction to the Spark High-Level Synthesis Frameworkfigure.2.1). Resource allocation and module selection are done by the designer and are given as input to the synthesis tool through a hardware resource library. The transformations toolbox in the scheduler contains the Percolation and Trailblazing code motion techniques, dynamic renaming of variables, and several compiler, parallelizing compiler and synthesis transformations. The synthesis transformations include chaining operations across conditional blocks, scheduling on multi-cycle operations and resource sharing.

The core parallelizing compiler transformations are a set of speculative code motions that not only move operations out of conditionals but sometimes duplicate operations into the branches of conditional blocks [4, 5]. The scheduler also employs several compiler transformations applied dynamically during scheduling. These dynamic transformations, such as dynamic CSE, dynamic copy propagation et cetera exploit the new opportunities created by code motions [6]. A dynamic branch balancing technique dynamically adds scheduling steps in conditional branches to enable code motions -- specifically those code motions that duplicate operations in conditional branches [7, 8].

The scheduling phase is followed by a resource binding and control synthesis phase. This phase binds operations to functional units, ties the functional units together (interconnect binding), allocates and binds storage (registers), generates the steering logic and generates the control circuits to implement the schedule. We employ an interconnect minimizing resource binding methodology by which operations are mapped to functional units and variables are mapped to registers in a way that minimizes the number of inputs to the multiplexers connected to the functional units [5, 9]. This reduces the complexity and size of the multiplexers and the associated control logic.

After resource binding, a control unit is generated using the finite state machine (FSM) controller style. We use a global slicing approach to FSM generation, whereby operations in mutually exclusive control paths that execute in the same cycle are assigned the same state [1]. This reduces the number of states in the controller and also, gives more information to the logic synthesis tool about the control generation for the mutually exclusive operations.

Finally, a back-end code generation pass generates register-transfer level (RTL) VHDL. This RTL VHDL belongs to the subset of VHDL that is synthesizable by commercial logic synthesis tools. This enables our synthesis framework to complete the design flow path from architectural design to final design netlist. Note that, the output VHDL is structural with all operations bound to resources (components in VHDL) and variables bound to registers.

Spark also has back-end code generation passes that generate ANSI-C and behavioral VHDL. These behavioral output codes represent the scheduled and optimized design. The output ``C'' can be used in conjunction with the input ``C'' to perform functional verification and also, to improve visualization for the designer on the affects of the transformations applied by Spark on the design.

The synthesis framework is organized as a toolbox of transformations guided by heuristics that are fairly independent of the transformations. We instrumented our synthesis framework with a scripting ability that provides the designer with knobs to experiment and tune the heuristics and to identify the transformations that are useful in optimizing overall circuit quality.

2.1  Inferring Hardware Intent from C

As mentioned above, the Spark synthesis framework accepts input in ANSI-C, but does not support the synthesis of pointers (arrays are supported) and code with function recursion and irregular jumps (see next Chapter for details). The input ``C'' description is a sequential description of statements. Statements may be operation expressions, conditional constructs (if-then-else, switch-case) and loop constructs (for, while, do-while loops). Besides the operations supported by ``C'', we also support Boolean conditional checks; these Boolean checks are produced by comparison or relational tests such as <, ==, ³, ¹, et cetera. We decompose complex expressions into three-address expressions (of type a=b+c) [10], since hardware functional units are modeled in our framework as 2-input, 1-output resources. Each three-address expression is then called an operation in our representation.

Each function in the input description is mapped to a (concurrent) hardware block. If one function calls another function, then the called function is instantiated as a component in the calling function. We enable the designer to specify the range (bit-widths) of the various data types supported by C as a table in a hardware description file. This table has three columns: the data type, lower data range and upper data range. Hence, we can specify that an ``integer'' ranges from -32767 to 32768; this corresponds to a 16 bit boolean in hardware. Also, this same table can be used to make entries for specific variables from the input description. Hence, a designer can specify that a variable ``myVariable'' in the design description can have a range of only 0 to 15. This enables the designer to use his or her knowledge of the design to provide constraints.

The back-end code generator uses this table of data types to generate a structural hardware representation of the scheduled design in VHDL [11]. To be synthesizable, hardware descriptions require the exact range of data types used in the design. Furthermore, high-level synthesis transformations can take advantage of the designer provided range constraints for specific variables to improve synthesis results [12].

The structure of the input description in terms of control and loop constructs are retained by our framework using a hierarchical intermediate representation (IR) [2]. This hierarchical IR consists of basic blocks1 encapsulated in Hierarchical Task Graphs (HTGs) [13, 14]. HTG nodes can be of three types: single or non-hierarchical nodes, compound or nodes that have sub-nodes), and loop nodes that encapsulate loops. Hence, basic blocks are encapsulated into compound HTG nodes to form hierarchical structures such as if-then-else blocks, switch-case blocks, loop nodes or a series of HTG nodes.

In addition to HTGs, we also maintain control flow graphs (CFGs) that capture the flow of control between basic blocks and data flow graphs (DFGs) that capture the data dependencies between operations. Whereas HTGs are useful for hierarchical code motions and applying coarse-grain transformations, CFGs are used for design traversal during scheduling.

There is currently no support for providing specific timing information to Spark. Hence, Spark assumes that all input variables/signals are available at the start of execution of a hardware block and stay available for the duration of its execution.

2.2  Restrictions on ``C'' Accepted as Input

Spark uses the EDG [15] C/C++ front-end. Spark takes pure ANSI-C with no special constructs. Although the front-end is a complete ANSI-C parser, however, there are some features of the ``C'' language that Spark does not currently support because of fundamental problems or unresolved issues in synthesizability. These are:

112 No support for pointers. However, arrays and array accesses of the type arr[index variable expression] are supported. Also, passing arguments by reference to a function is also supported. No support for function recursion. No support for irregular jumps through goto. Some of these can be resolved in a state machine, but they adversely affect our ability to apply transformations.

Besides these, the following features of C are not supported because they have not yet been implemented in the Spark framework:

112 No support for break and continue. In general, it is possible to convert a program with breaks and continues into a program without them [14]. No support for multi-dimension arrays. Multi-dimensional arrays can be reduced manually to single-dimensional arrays. For example: consider an array a[N][M]. This can be re-declared as a[N*M]. Any access to a[i][j] then becomes a[i*M+j]. Poor/no support for structs and unions. In general, structs are currently not synthesizable. Also, no VHDL generation for user-defined data types. Poor support for expressions of type (a ? b : c). We advise changing this expression to the following statement:
if (a)   b
else   c


We discuss code modifications that can serve as workarounds for unimplemented features in Appendix C28Modifying Input Code to be Synthesizable by Sparkappendix.C.


1
A basic block is a sequence of statements in the input code with no control flow between them.

Previous Up Next