Multiprocessor usage and its supporting infrastructure have grown enormously over the past years. This trend will most likely continue and is already confronting the community with new hardware and software problems.
Interconnects for multiprocessor SoCs are one potential bottleneck and require additional optimization to achieve the necessary data throughput. Interconnects also face tough hurdles in SoCs that combine cores such as graphics engines, decoders and encoders, DMAs and external DRAMs, as can be seen in the field of video applications, for instance.
The instantiation of multiple identical cores such as processors, DSPs and peripherals is also driven by the ever-increasing demands of all kinds of applications. We are moving from 2D to 3D, to multiple audio channels, to increasingly capable network switches, and to multi-channel sensor readout and processing. Last but not least, supercomputers instantiate thousands of identical cores, and that number keeps increasing.
This paper discusses a method for multiplying the functionality of a core by just adding registers to it. Not only does this use less area than instantiating the core multiple times, but it can also have a substantial beneficial impact on the performance of the system as a whole. This method is called “hyper pipelining” and is explained in chapter 2. In chapter 3, different approaches and their impact on the system architecture are discussed. Chapter 4 shows in detail the results for a hyper pipelined complex RISC core (OR1200 from OpenCores).
2. Theory of hyper pipelining
Figure 1: Simplified sequential logic
Figure 2: Sequential logic with intermediate register clocked by clk2
This chapter gives an overview of the theory of hyper pipelining. Figure 1 shows the simplified structure of sequential logic. Inputs and sequential elements clocked by clk1 drive the combinatorial logic. The combinatorial logic drives the outputs and the data inputs of the registers.
In Figure 2, each sequential element is duplicated by adding an intermediate register clocked by a second clock, clk2. If clk2 is synchronous to clk1 but not edge-aligned, and if the timing is right (no setup or hold time violation between the clk1 and clk2 registers), the behavior of the sequential logic does not change.
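The equivalence claimed for Figure 2 can be illustrated with a small cycle-level model. The following Python sketch is only an illustration under stated assumptions: the combinatorial logic is a hypothetical 8-bit accumulator, the intermediate register is assumed to sit between the clk1 register and the logic, the inputs are assumed to be stable for a full clk1 period, and timing is assumed to be met. None of the names come from the paper.

```python
def comb_logic(state, inp):
    """Hypothetical combinatorial logic: an 8-bit accumulator."""
    return (state + inp) & 0xFF


def run_plain(inputs, state=0):
    """Figure 1 style: one register, updated on every clk1 edge."""
    trace = []
    for inp in inputs:
        state = comb_logic(state, inp)  # clk1 edge captures the logic output
        trace.append(state)
    return trace


def run_with_intermediate(inputs, state=0):
    """Figure 2 style: an extra register clocked by clk2 (assumed placement).

    clk2 is modelled as firing once between two consecutive clk1 edges.
    """
    trace = []
    for inp in inputs:
        mid = state                    # clk2 edge: intermediate register copies the clk1 register
        state = comb_logic(mid, inp)   # clk1 edge: register captures logic driven by mid
        trace.append(state)
    return trace


if __name__ == "__main__":
    stimulus = [1, 2, 3, 250]
    print(run_plain(stimulus))
    print(run_with_intermediate(stimulus))
```

Both functions produce the same register trace for the same stimulus, which mirrors the statement that the intermediate register leaves the behavior unchanged as long as the setup and hold requirements are met. The model does not capture timing itself; a violated setup or hold time has no counterpart in this sketch.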
Figure 3: Two functionally independent designs
Figure 4: Hyper pipelined sequential logic with distributed logic