Design Article
Control-dominated design
Mike Meredith
7/14/2012 11:16 AM EDT
One of the things that jumped out at me on the floor and in the suites at DAC this year was how many people asked something like "I've heard that HLS can be used for control-dominated stuff now. Explain to me how that's possible."
In this article, I'll try to do that.
What is a control-dominated design?
So, what exactly is a control-dominated design? Unfortunately, I'm not able to give as precise a definition as I'd like.
One definition that makes sense to me is "A control-dominated design is one in which the complexity of the resulting hardware is more related to I/O and memory accesses than to computation."
Another, more prosaic, definition I like is "A datapath-dominated design is about deriving new data values from previous data values. A control-dominated design is about moving data values around."
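To make the second definition concrete, here is a minimal sketch in plain C++ (not synthesizable SystemC). Both functions and their names (moving_sum, route, Routed) are hypothetical illustrations, not taken from any real design. The first derives new values from previous ones; the second only decides where unchanged values go.

```cpp
#include <vector>

// Datapath-style: each output is computed from the current and previous input.
std::vector<int> moving_sum(const std::vector<int>& in) {
    std::vector<int> out;
    int prev = 0;
    for (int x : in) {
        out.push_back(x + prev);  // a new value is derived arithmetically
        prev = x;
    }
    return out;
}

// Control-style: values pass through unchanged; the logic only picks a destination.
struct Routed {
    std::vector<int> even_port;
    std::vector<int> odd_port;
};

Routed route(const std::vector<int>& in) {
    Routed r;
    for (int x : in) {
        // Hypothetical routing rule: even values go to one port, odd to the other.
        (x % 2 == 0 ? r.even_port : r.odd_port).push_back(x);
    }
    return r;
}
```

In hardware terms, the first function would be dominated by the adder, while the second would be dominated by the decisions about which port receives each value.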
Is control-dominated SystemC code really at a higher level of abstraction than RTL?
It certainly is. The key improvements in the abstraction level come from the ability to use an implicit state machine representation and the encapsulation and scheduling of I/O and memory access protocols.
As an example, here's an RTL excerpt (in VHDL) from the instruction fetch and decode logic of an 8051 processor. Note that the state machine is explicit: every state assigns its own successor to exe_state. Also note that the memory access protocols have been broken into phases that are manually scheduled. The RAM has a 1-cycle read latency, while the ROM has a 2-cycle read latency.
when CS_2 =>
  case exe_state is
    when ES_0 =>
      GET_PC_H(pch);
      GET_PC_L(pcl);
      START_RD_ROM(pch, pcl); -- ROM access #1 phase 1
      alu_op_code <= ALU_OPC_PCUADD;
      alu_src_1 <= pcl;
      alu_src_2 <= pch;
      alu_src_3 <= C1_8;
      exe_state <= ES_1; -- Next state
    when ES_1 =>
      START_RD_RAM(R_PSW); -- RAM access #1 phase 1
      exe_state <= ES_2;
    when ES_2 =>
      START_RD_RAM(R_ACC); -- RAM access #2 phase 1
      reg_op1 <= rom_data; -- ROM access #1 phase 2
      exe_state <= ES_3; -- Next state
    when ES_3 =>
      START_RD_ROM(alu_des_2, alu_des_1); -- ROM access #2 phase 1
      SET_PSW(ram_in_data); -- RAM access #1 phase 2
      alu_op_code <= ALU_OPC_PCUADD;
      alu_src_1 <= alu_des_1;
      alu_src_2 <= alu_des_2;
      if( dec_op_in(7) = '1' ) then
        alu_src_3 <= C1_8;
      else
        alu_src_3 <= C0_8;
      end if;
      exe_state <= ES_4; -- Next state
    when ES_4 =>
      START_RD_ROM(alu_des_2, alu_des_1); -- ROM access #3 phase 1
      reg_acc <= ram_in_data; -- RAM access #2 phase 2
      alu_op_code <= ALU_OPC_PCUADD;
      alu_src_1 <= alu_des_1;
      alu_src_2 <= alu_des_2;
      if( dec_op_in(8) = '1' ) then
        alu_src_3 <= C1_8;
      else
        alu_src_3 <= C0_8;
      end if;
      exe_state <= ES_5; -- Next state
    when ES_5 =>
      reg_op2 <= rom_data; -- ROM access #2 phase 2
      SET_PC_1(alu_des_2, alu_des_1);
      exe_state <= ES_6; -- Next state
    when ES_6 =>
      reg_op3 <= rom_data; -- ROM access #3 phase 2
      exe_state <= ES_7; -- Next state
    when ES_7 =>
      SHUT_DOWN_ALU;
      cpu_state <= CS_3;
      exe_state <= ES_0; -- Next state
  end case;
That's 54 lines of code (with 54 opportunities to screw up). The same functionality can be written in high-level synthesizable SystemC in the 16 lines of code below.
reg_op1 = irom[reg_pc++]; // Read the instruction
m_dec_op_in = decode(reg_op1); // decode it
reg_op2 = m_dec_op_in[7] ? irom[reg_pc++] : reg_op1;
reg_op3 = m_dec_op_in[8] ? irom[reg_pc++] : reg_op2;
if (reg_op1[3])
    v8 = iram[GET_RAM_ADDR_1()]; // get (r) - register contents
else
    if (reg_op1(3,1) == 0x3) {
        a8 = iram[GET_RAM_ADDR_2()]; // get (r) indirect address
        v8 = iram.qram[a8]; // get ((r))
    }
    else {
        if (reg_op1(3,0) == 0x5) {
            v8 = iram[reg_op2]; // get (direct)
        }
    }
Writing this in high-level SystemC takes less than one third as much code as the RTL version. The benefit comes from three things: the encapsulation of the memory protocols, the automatic scheduling of the various phases of those protocols, and the elimination of explicitly coded state machine transitions. Of course, you would see an even greater productivity improvement for a computation-heavy datapath design, but this is still more than a threefold reduction in code compared to RTL!
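The difference between the two styles can be sketched in plain C++, without the SystemC library. Everything here is an assumption made for illustration: the Rom model, its start_read/tick interface, and the function names are hypothetical and are not the 8051 design's code. The call to tick() stands in for a clock edge; in real SystemC a wait() in a clocked thread would play that role, and the HLS tool would derive the state machine. Unlike the excerpt above, this sketch does not overlap memory accesses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ROM model with a 2-cycle read latency (interface is an assumption).
struct Rom {
    std::array<std::uint8_t, 8> mem{};
    std::uint8_t pending = 0;  // value currently in flight
    std::uint8_t data = 0;     // output register, valid 2 ticks after start_read
    int countdown = 0;

    void start_read(std::uint8_t addr) {
        pending = mem[addr % mem.size()];
        countdown = 2;
    }
    void tick() {  // one clock edge
        if (countdown > 0 && --countdown == 0) data = pending;
    }
};

using Ops = std::array<std::uint8_t, 3>;

// Demo contents: mem[i] = 0x10 + i.
Rom make_demo_rom() {
    Rom rom;
    for (std::size_t i = 0; i < rom.mem.size(); ++i)
        rom.mem[i] = static_cast<std::uint8_t>(0x10 + i);
    return rom;
}

// Style 1: explicit state machine; the protocol phases are scheduled by hand.
Ops fetch_explicit(Rom& rom, std::uint8_t pc) {
    enum class State { Issue, Wait, Capture, Done };
    State state = State::Issue;
    Ops ops{};
    std::size_t i = 0;
    while (state != State::Done) {
        switch (state) {
        case State::Issue:            // phase 1: present the address
            rom.start_read(pc);
            state = State::Wait;
            break;
        case State::Wait:             // latency cycle
            state = State::Capture;
            break;
        case State::Capture:          // phase 2: data is valid, capture it
            assert(i < ops.size());
            ops[i++] = rom.data;
            ++pc;
            state = (i < ops.size()) ? State::Issue : State::Done;
            break;
        case State::Done:
            break;
        }
        rom.tick();                   // one clock edge per state
    }
    return ops;
}

// Style 2: the protocol is encapsulated once; the caller writes sequential code.
std::uint8_t rom_read(Rom& rom, std::uint8_t addr) {
    rom.start_read(addr);
    rom.tick();
    rom.tick();
    return rom.data;
}

Ops fetch_implicit(Rom& rom, std::uint8_t pc) {
    Ops ops{};
    for (auto& op : ops) op = rom_read(rom, pc++);
    return ops;
}
```

Both functions return the same three bytes for the same starting address. The second never names a state: the sequencing is implied by the order of the statements, which is the property the SystemC version relies on.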


Dr DSP
7/23/2012 2:00 PM EDT
It seems to me that we need a new approach to control definition. Can't we find a way to use a still higher level approach and avoid this level of detail? Then we could get real productivity gains.
cardinalsin
7/24/2012 12:47 PM EDT
"So, what exactly is a control-dominated design?" is a good question. The examples here are impressive -- but they're examples of control within a single-threaded computation. Hardware design gets really difficult only when there are concurrent FSMs that have complex interactions. Specifically, control logic is hard to write and get right due to having to schedule/manage/arbitrate the accesses by multiple different, concurrent operations for common resources (e.g. memory ports, FIFOs, registers, DMA channels, etc.). This is where C++ with threads is a bad model for expressing concurrency. And, it's not clear from these examples what SystemC offers over RTL for managing these types of problems. It would be great to see examples like this too.
George Harper, Bluespec, Inc.