Clocking is one of the most important aspects of block- or SOC-level design, and its architecture needs to be well defined and understood during the conceptualization/planning phase of the design. A single SOC contains various blocks, such as the core, flash, memories and peripherals, which need to run at different frequencies. The maximum operating frequencies may be limited by the implementation technology, the implementation architecture, power targets and the access times of the IPs. Clock divider circuitry is therefore needed to generate divided clocks from the master PLL/oscillator clock, or any system clock, and to feed different divided clocks to different device modules. Because clocking can also be application driven, the clock dividers must be configurable. The need for configurability might arise for a number of reasons, including:
- Running the system clock at a lower frequency to save on dynamic power dissipation
- Running the state machines of peripherals at a higher or lower frequency than that of the processor
- Setting the baud rate for peripheral frame transmission/reception.
This article illustrates various implementations of configurable clock divider logic used in SOCs today and highlights the challenges, advantages and limitations of each relative to the others. There are many ways to implement configurable division; however, the simplest and most frequently used in the digital design industry are:
- Ripple Dividers
- Div decode based 2N dividers with 50% duty cycle
- Clock gating enable-based integer dividers which do not have 50% duty cycle
- Mux based dividers with integer division and 50% duty cycle.
The circuit diagram of a configurable ripple divider is shown below.
Figure 1: Ripple divider
Ripple dividers are the traditional dividers, but they are usually avoided in SOC designs today because they have stringent setup and hold time requirements.
Advantages
- RTL complexity of such dividers is minimal
- The divided clocks generated are of 50 % duty cycle.
Disadvantages
- Clock latency increases with the division factor: the latency of the rising edge of the clock is in the order DIV16 > DIV8 > DIV4 > DIV2 > DIV1.
This drawback can also lead to a larger uncommon clock path if the launch and capture clocks are tapped from different dividers with different division factors.
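The latency ordering and the 50% duty cycle can be illustrated with a small behavioral model. The following is a minimal sketch, not a description of the circuit in Figure 1: it assumes a chain of toggle flops in which each stage is clocked by the rising edge of the previous stage's output, that all flops start at 0, and that the rising and falling clock-to-Q delays are equal. The names `T_CKQ`, `PERIOD` and the helper functions are hypothetical and chosen only for illustration.

```python
# Behavioral sketch of a ripple divider: a chain of toggle flops where each
# stage is clocked by the rising edge of the previous stage's Q output.
# T_CKQ and PERIOD are illustrative assumptions, not values from the article.

T_CKQ = 100      # assumed flop clock-to-Q delay, in ps
PERIOD = 1000    # assumed master clock period, in ps


def ripple_divider_edges(stages, master_cycles):
    """Return {division_factor: [(time_ps, new_level), ...]} for each tap."""
    # Rising edges of the undivided master clock (DIV1).
    rising = [n * PERIOD for n in range(master_cycles)]
    taps = {}
    for stage in range(1, stages + 1):
        level = 0                            # every flop is assumed to start at 0
        edges = []
        for t in rising:
            level ^= 1                       # toggle on each rising input edge
            edges.append((t + T_CKQ, level))  # output changes one CK-Q delay later
        taps[2 ** stage] = edges
        # Only rising output edges clock the next stage in the chain.
        rising = [t for t, lvl in edges if lvl == 1]
    return taps


def first_rising_edge(edges):
    """Latency of the first rising edge relative to the master clock at t = 0."""
    return next(t for t, lvl in edges if lvl == 1)


def duty_cycle(edges):
    """High time divided by period, measured over the first full output cycle."""
    (rise, _), (fall, _), (next_rise, _) = edges[:3]
    return (fall - rise) / (next_rise - rise)


taps = ripple_divider_edges(stages=4, master_cycles=32)
for div, edges in taps.items():
    print(f"DIV{div}: first rising edge at {first_rising_edge(edges)} ps, "
          f"duty cycle {duty_cycle(edges):.0%}")
```

Under these assumptions each additional stage adds one clock-to-Q delay to the rising edge, which reproduces the ordering DIV16 > DIV8 > DIV4 > DIV2, while every tap keeps a 50% duty cycle.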
For example, consider the following simple clock architecture with two ripple dividers, one feeding the core and the other feeding the flash. The ratio between the two clocks must be 4:1, which introduces an unwanted skew that is inherent to the design.
Figure 2: Configurable clocking example
Latency at flash – Latency at platform = CK-Q delay of 2 flops.
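The relation above follows from counting flop stages: a 4:1 ratio between two power-of-two ripple dividers means the slower divider has two more stages than the faster one. The sketch below works through that arithmetic. It assumes both dividers start from the same source clock and that every flop has the same clock-to-Q delay; the function names and the 100 ps delay are hypothetical.

```python
def ripple_stages(div_factor):
    """Number of toggle flops a ripple divider needs for a power-of-two factor."""
    if div_factor < 1 or div_factor & (div_factor - 1):
        raise ValueError("ripple division factors are powers of two")
    return div_factor.bit_length() - 1


def divider_skew(core_div, flash_div, t_ckq):
    """Latency at flash minus latency at core, before any clock tree balancing."""
    return (ripple_stages(flash_div) - ripple_stages(core_div)) * t_ckq


# Core on DIV2 and flash on DIV8 gives the 4:1 ratio from the example.
print(divider_skew(core_div=2, flash_div=8, t_ckq=100))  # two CK-Q delays, in ps
```

Any pair of factors with a 4:1 ratio, such as DIV1 and DIV4, gives the same two-flop difference in this model.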
- Even with clock tree balancing, robust timing signoff is essential before the design is sent for production. For this, the STA engineer needs to define a generated clock at the output of each of the four divider flops, because a clock with a different latency is generated at each flop. This increases the manual effort of defining and checking every such generated clock in the design.
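# PLL stands in for the PLL output pin; the -divide_by values are inferred from the clock names
create_generated_clock -name div_2_clk -master_clock pll_clock -source PLL -divide_by 2 [get_pins F1/Q]
create_generated_clock -name div_4_clk -master_clock div_2_clk -source [get_pins F1/Q] -divide_by 2 [get_pins F2/Q]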
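Part of this manual effort can be reduced by generating the constraints with a script. The following is a minimal sketch that emits one `create_generated_clock` command per divider flop. The flop names `F1` to `F4`, the `PLL` source placeholder and the `div_<n>_clk` naming scheme are assumptions that follow the commands above, and the exact options accepted may vary between STA tools.

```python
def ripple_divider_sdc(master_clock, master_pin, flop_names):
    """Build one generated-clock constraint per flop of a ripple divider chain."""
    lines = []
    src_clock, src_pin, factor = master_clock, master_pin, 1
    for flop in flop_names:
        factor *= 2                          # each stage divides its input by 2
        name = f"div_{factor}_clk"
        out_pin = f"[get_pins {flop}/Q]"
        lines.append(
            f"create_generated_clock -name {name} -master_clock {src_clock} "
            f"-source {src_pin} -divide_by 2 {out_pin}"
        )
        # The clock just defined becomes the master of the next stage.
        src_clock, src_pin = name, out_pin
    return lines


# Hypothetical four-stage divider; instance names are placeholders.
for line in ripple_divider_sdc("pll_clock", "PLL", ["F1", "F2", "F3", "F4"]):
    print(line)
```

The generated commands still need to be reviewed against the actual netlist, since the script only encodes the assumed chain structure.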