| Center for Embedded Computer Systems | Computer Science and Engineering |
| Microelectronic Embedded Systems Laboratory | University Of California, San Diego |
| |
Based on the number of functional units allocated to the design, resource-constrained scheduling determines the control step during which each operation in the design graph executes. A resource binding pass maps the operations in each control step to specific functional units and the variables in the design to registers. However, the choice of operation to functional unit and variable to register binding has a profound effect on the interconnect of the design. Hence, it is possible to reduce interconnect by careful resource binding, as explained in the next section.
The speculative nature of the code motions -- particularly conditional speculation that duplicates and executes operations conditionally that would have otherwise executed unconditionally -- increases the complexity of the multiplexers and associated control logic by many-fold. The control logic selects the inputs to the functional units based on the current state of the finite state machine (FSM) and the evaluation of the conditions. We collectively refer to the multiplexers, de-multiplexers and the control logic as steering logic or interconnect.
The increase in the complexity of the interconnect leads to an increase in the total design area. Also, the longest combinational paths in the design or the critical paths often pass through the steering logic. A typical critical path in our designs originates in the control logic that generates the select signals for the multiplexers and passes through the multiplexers, the functional unit, the de-multiplexers and terminates in a register.
In a bid to control the increase in interconnect complexity, we developed an interconnect minimization strategy based on resource binding. This resource binding methodology first binds operations with the same inputs or outputs to the same functional unit. The variable to register binding then takes advantage of this by mapping variables that are inputs or outputs to the same functional units to the same register. This reduces the number of registers connected to the inputs and outputs of functional units, thereby, reducing the size of the multiplexers and de-multiplexers connected to them.
![]() Figure 1: (a) Two consecutive sets of concurrent operations (b) An example of binding that leads to a large number of interconnections. |
![]() Figure 2: Reducing interconnect by improved (a) operation binding (b) variable binding. |
To complete the high-level synthesis process, a control unit has to be synthesized that implements the schedule. This control unit generates the signals that drive the functional units, interconnect (multiplexers, de-multiplexers) and registers in the data path -- the sequence of execution is as per the generated schedule.
Although there are several styles of controller architectures to choose from, by far finite state machines (FSMs) are the most popular for digital design. This is also the controller architecture we have chosen in our methodology. We now describe the construction of the finite state machines from the scheduled HTG followed by the subsequent generation of register transfer level (RTL) VHDL. The output VHDL code can be synthesized using commercial logic synthesis tools such as Synopsys Design Compiler.
Finite state machine generation from scheduled designs starts with state assignment of the operations. This is the process of assigning states to each scheduling step in which concurrent operations are scheduled. State assignment in purely data-flow designs is trivial. Each scheduling step is assigned a state of its own. However, in designs with control, first a state transition graph is generated based on the control flow in the scheduled HTG. Operations that execute during the same time, but in scheduling steps that are on mutually exclusive control paths, can either be assigned the same state or can be assigned a state based on the control path that they are in. The former method is known as global slicing and the latter as local slicing.
State assignment by these two techniques is demonstrated by an example in Figures 3(a) and (b). In these figures, the states are demarcated by dashed lines and marked as S0, S1, S2 and so on. The local slicing method of state assignment has been used for the example in Figure 3(a). Here, each set of concurrent operations in a scheduling step is viewed as a single slice and assigned a unique state. Hence, operations g and h in basic block BB_1 are assigned state S_2 and operation k in basic block BB_2 is assigned state S_5. However, in global slicing, concurrent operations in scheduling steps that are on mutually exclusive conditional branches but which execute in the same time step in the scheduled design, are assigned the same global slice and hence, the same state. Figure 3(b) presents the state assignment for the same example using global slicing. With global slicing, operations g and h in basic block BB_1 and operation k in BB_2 are all assigned the same state S_2.
Since global slicing assigns states across mutually exclusive control paths, it results in fewer states in the control unit. However, status registers are required to store the information about which set of mutually exclusive operations to execute in the state. On the other hand, local slicing requires as many states as the sum of the time steps in each basic block, leading to larger state machines. However, no status registers are required.

Weng and Parker have done a comprehensive study on these two types of controllers and found that using global slicing with status registers always produces designs with lower area. Our own experiments support these results. We find that the larger number of states required for local slicing leads to poorer finite state machine optimization. Furthermore, since local slicing executes all sets of concurrent operations in different states, the mutual exclusivity information that can be potentially used for further interconnect optimization is lost to the logic synthesis tool.