Nowadays, chip vendors are devoted to developing new power-saving
techniques in order to extend the battery life of portable devices such
as cell phones, MP3 players, portable media players and notebook PCs. In
general, these techniques can be categorized into two classes: dynamic
techniques and static techniques.
Static techniques include different low-power modes, on-demand
gating of clocks and power domains, etc. The dynamic technique
scales the CPU operating frequency (and voltage, because a CPU
requires a higher voltage when it runs at a higher frequency) according to
the performance requirement of the applications currently running on the
CPU, with the goal of saving energy. In theory, this technique follows
from the formulas below:
$$P = C \cdot V^2 \cdot F \qquad (1)$$
$$E = P \cdot t \qquad (2)$$
where P is the dynamic power, C is the switched capacitance, V is the
supply voltage, F is the clock frequency, E is the energy and t is the
run time.
From the equations above, it can be seen that scaling down the
frequency alone reduces only the power (in watts) and does not save the
energy (in joules) consumed by a task, because for a given task F*t is a
constant. To reduce the energy consumption effectively, the voltage
should also be scaled down when the frequency is decreased.
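The argument can be checked numerically with a minimal sketch, assuming the usual CMOS dynamic power model P = C*V^2*F and E = P*t. The capacitance, cycle count, frequencies and voltages below are illustrative assumptions, not values from any data sheet.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Dynamic power in watts: P = C * V^2 * F."""
    return c_eff * voltage ** 2 * freq_hz


def task_energy(c_eff, voltage, freq_hz, cycles):
    """Energy in joules for a task that needs a fixed number of clock cycles."""
    run_time = cycles / freq_hz  # t = cycles / F, so F * t stays constant
    return dynamic_power(c_eff, voltage, freq_hz) * run_time


C_EFF = 1e-9    # assumed effective switched capacitance, in farads
CYCLES = 532e6  # assumed task length, in clock cycles

full = task_energy(C_EFF, 1.6, 532e6, CYCLES)             # full speed
half_freq_only = task_energy(C_EFF, 1.6, 266e6, CYCLES)   # frequency halved
half_freq_low_v = task_energy(C_EFF, 1.2, 266e6, CYCLES)  # voltage lowered too
```

With these assumed numbers, halving the frequency alone leaves the task energy unchanged, while also lowering the voltage from 1.6 V to 1.2 V reduces it to (1.2/1.6)^2, i.e. about 56% of the original.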
Currently many chips support the dynamic voltage and frequency scaling
(DVFS) feature. For example, Intel supports SpeedStep, while ARM
supports IEM (Intelligent Energy Manager) and
AVS (Adaptive Voltage Scaling).
But chip support alone is not enough for DVFS to take effect and
reach the real goal of saving energy. A comprehensive design of
software and hardware together is needed.
A typical DVFS workflow is as follows:
Step 1: Monitor the signals related to the workload, acquire the
workload data and calculate the current system workload. This job can be
done by either software or hardware. Generally, software does this by
installing hooks in the system calls in the kernel, especially the
scheduler, and calculating the workload according to how frequently
these system calls are called.
The implementation based on hardware, e.g., Freescale's i.MX31,
gets the workload data by gathering the use info of some critical
signals such as interrupt line, cache line, memory bus as well as
Step 2: On the basis of the current workload, predict the performance
requirement of the system in the next time slice. Many prediction
algorithms can be used here, and the choice depends on the actual
application. This prediction can also be done by either software or
hardware.
Step 3: Translate the predicted performance requirement to a frequency
and adjust the CPU clock setting.
Step 4: Calculate the new voltage corresponding to the new
frequency, then notify the power supply module and ask it to adjust the
voltage supplied to the CPU. A special power management IC is needed
here, such as Freescale's MC13783, or ICs from National Semiconductor
that support the PowerWise feature. They support small-step voltage
adjustment and can complete an adjustment very
quickly (~10 microseconds).
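Steps 3 and 4 can be pictured as a table lookup. The following is a minimal sketch, assuming the predicted requirement is expressed as a fraction of the maximum performance; the operating points (frequency and voltage pairs) and the function name are hypothetical and are not taken from any chip's data sheet.

```python
# Hypothetical operating points, sorted by ascending frequency: (Hz, volts).
OPERATING_POINTS = [
    (133e6, 1.2),
    (266e6, 1.3),
    (399e6, 1.4),
    (532e6, 1.6),
]


def select_operating_point(required_perf, points=OPERATING_POINTS):
    """Return the slowest (freq, volt) pair that still meets the requirement.

    required_perf is the predicted requirement as a fraction of the
    performance available at the highest frequency in the table.
    """
    needed_freq = required_perf * points[-1][0]
    for freq, volt in points:
        if freq >= needed_freq:
            return freq, volt
    return points[-1]  # requirement exceeds the table: run at full speed
```

Picking the slowest point that satisfies the requirement keeps the voltage as low as the table allows, which is where the energy saving comes from.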
Additionally, the frequency and voltage should be adjusted in a
specific order. When the frequency is adjusted from high to low, it
should be scaled down before the voltage is decreased. Conversely, when
the frequency is raised, the voltage should be increased before the
frequency is scaled up. Figure 1
below illustrates the simple workflow of DVFS.
Figure 1. Workflow of Prediction Based DVFS
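The ordering rule can be sketched as follows. This is only an illustration: the two callbacks stand in for whatever clock and regulator drivers a real system provides, and their names are assumptions.

```python
def apply_operating_point(current, target, set_frequency, set_voltage):
    """Move from `current` to `target`, each a (freq_hz, volts) pair.

    The voltage must always be high enough for the frequency in use, so it
    is raised first when speeding up and lowered last when slowing down.
    """
    cur_freq, _ = current
    new_freq, new_volt = target
    if new_freq > cur_freq:
        set_voltage(new_volt)    # raise the voltage first
        set_frequency(new_freq)  # then scale the frequency up
    else:
        set_frequency(new_freq)  # scale the frequency down first
        set_voltage(new_volt)    # then lower the voltage
    return target


# Record the order of operations for one upward and one downward transition.
log = []
low, high = (266e6, 1.2), (532e6, 1.6)
apply_operating_point(low, high,
                      lambda f: log.append(("freq", f)),
                      lambda v: log.append(("volt", v)))
apply_operating_point(high, low,
                      lambda f: log.append(("freq", f)),
                      lambda v: log.append(("volt", v)))
```

In both directions the CPU never runs at the higher frequency while the supply is at the lower voltage.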
DVFS Realization Based on Software
In a software-based implementation of DVFS, hooks are installed in
the system calls in the kernel. They gather usage information about the
calls and estimate the system workload. The obvious location for the
hooks is the scheduler. Other locations include read/write
interfaces, timers, etc. For instance, in the Linux kernel, hooks are
installed in the following places:
o hack __schedule(), insert code before and after schedule(), record the execution time of a task
o hack sys_read() and sys_write(), record the number of times they are called
o hack sys_nanosleep() and msleep(), record the sleep time of a task
o hack sys_ioctl(), record the number of times it is called
o hack do_exit(), record the time when a task exits voluntarily
o hack arch_idle(), calculate the time that cpu_idle() thread is
When predicting the system workload of the next time slice, the
workload data acquired in the previous several slices can be used. The
predicted workload can be obtained from the formula below:
The prediction algorithm varies with different h. Following are some
All the algorithms above have their own advantages and disadvantages.
E.g., LMS is similar to adaptive filter and can adjust parameters
automatically but it faces convergence issue.
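As an illustration of how the choice of h changes the predictor, the sketch below assumes the prediction is a weighted sum of the last N measured workloads, with h[0] applied to the newest sample. That form is an assumption, chosen to be consistent with the moving average case where every h equals 1/N; the function names and sample values are hypothetical.

```python
def predict_workload(history, h):
    """Weighted sum of the most recent len(h) workload samples.

    history[-1] is the newest sample and h[0] weights the newest sample.
    """
    if len(history) < len(h):
        raise ValueError("need at least len(h) samples")
    recent = history[::-1][:len(h)]  # newest first
    return sum(w * x for w, x in zip(h, recent))


def moving_average_weights(n):
    """h[k] = 1/N for every k: a plain moving average."""
    return [1.0 / n] * n


def exponential_weights(n, alpha):
    """Truncated exponentially decaying weights, normalised to sum to 1."""
    raw = [alpha * (1 - alpha) ** k for k in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


history = [0.20, 0.40, 0.60, 0.80]  # workload as a fraction of full load
ma = predict_workload(history, moving_average_weights(4))
ew = predict_workload(history, exponential_weights(4, 0.5))
```

On this rising workload the exponentially weighted predictor gives a higher estimate than the plain moving average, because it gives more weight to the newest samples.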
ARM developed Vertigo to
demonstrate the DVS (Dynamic Voltage
Scaling) feature. This software uses the following formulas to
estimate the workload, deadline and performance:
This algorithm works well for those OS tasks whose workload changes
slowly, e.g., an MPEG decoder.
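A minimal sketch of such an estimator follows, assuming the smoothed form described in the Vertigo paper (reference 3): the workload and deadline estimates are weighted averages of their previous values and the latest measurement, and the performance request is their ratio. The function and parameter names, and the default weight k = 3, are assumptions rather than details confirmed here.

```python
def update_estimates(work_est, deadline, work_fse, idle, k=3):
    """One update of the workload, deadline and performance estimates.

    work_fse: work done in the last interval, in full-speed-equivalent time
    idle:     idle time observed in the last interval
    k:        weight given to the old estimates (assumed value)
    """
    new_work = (k * work_est + work_fse) / (k + 1)
    new_deadline = (k * deadline + work_fse + idle) / (k + 1)
    # Requested performance as a fraction of full speed, capped at 1.0.
    perf = new_work / new_deadline if new_deadline > 0 else 1.0
    return new_work, new_deadline, min(perf, 1.0)
```

Because each update moves the estimates only a fraction of the way towards the newest measurement, the requested performance follows a slowly changing workload closely but reacts with a lag to sudden changes.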
In the architecture of Vertigo, once the predictor finishes its
performance estimation, it submits the result to a policy manager. It
is the duty of the policy manager to decide whether to accept the
prediction result and adjust the performance setting. [Please refer to (3) for the detailed
implementation of Vertigo.] The architecture diagram of Vertigo
is shown in Figure 2 below:
Figure 2. Architecture Diagram of Vertigo
DVFS Realization Based on Hardware
As mentioned before, CPU load tracking and performance
prediction can also be done by hardware. This method not only
improves the reliability of workload tracking and calculation, but also
reduces the CPU overhead of performing such calculation and estimation.
Of course, it also has a disadvantage: the prediction algorithm
cannot be selected freely. But this inconvenience can be compensated
to some extent by adjusting the prediction parameters.
Freescale's i.MX31 is a good example of this. It is an application
processor targeting the mobile multimedia market, and it has powerful
audio and video processing performance. An ARM11 core
is integrated into this
chip, which inherits the DVS technique from ARM and derives DVFS from it.
In this chip, CPU workload tracking and performance prediction are
completed by the hardware automatically. The block diagram of the CPU
workload tracking module is shown below.
Figure 3. i.MX31 DVFS Load Tracking Module Block Diagram
In Figure 3 above, 16
general purpose load signals are sampled and weighted. The weighted sum
is sent to the load adder, where it is added to the CPU idleness signal
data (simply averaged to reduce the sample clock frequency).
The output of the load adder is fed to the EMA (Exponential Moving
Average) block, which predicts the performance requirement. The
estimate from the EMA block is
compared with the predefined threshold values.
If the performance prediction is greater than the upper threshold,
the frequency should be scaled up. Otherwise, if the prediction is less
than the lower threshold, the frequency should be scaled down.
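The prediction and threshold comparison can be sketched in software as follows. This is only a model of the idea, not of the i.MX31 hardware: the smoothing factor, thresholds, starting value and load samples are all assumed values.

```python
def ema_update(previous, sample, alpha):
    """Exponential moving average: blend the new sample into the estimate."""
    return alpha * sample + (1 - alpha) * previous


def decide(prediction, lower, upper):
    """Compare the predicted load with the two thresholds."""
    if prediction > upper:
        return "up"    # request a higher frequency
    if prediction < lower:
        return "down"  # request a lower frequency
    return "hold"      # stay at the current frequency


ALPHA, LOWER, UPPER = 0.5, 30.0, 70.0  # assumed parameters
estimate = 40.0                        # assumed starting estimate
decisions = []
for load in [90, 90, 90, 10, 10, 10]:  # assumed load samples, in percent
    estimate = ema_update(estimate, load, ALPHA)
    decisions.append(decide(estimate, LOWER, UPPER))
```

The gap between the two thresholds acts as a dead band: the first heavy sample and the first light sample do not trigger a change, so short bursts do not make the frequency bounce.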
An interrupt request for the frequency or performance adjustment is
then sent to the CPU itself or to an external processor. The processor will
handle the request in the ISR
(interrupt service routine) and set the correct frequency
Figure 4 below illustrates the overall workflow. In this figure,
CCM means Clock Control Module, which
is responsible for adjusting the CPU frequency. PMIC means Power
Management IC, which is responsible for supplying the power needed by
This chip provides two interfaces to the CPU: a normal SPI (Serial
Peripheral Interface) and a dedicated DVS interface for dynamic voltage
scaling. The DVS interface is made up of two lines, and the state of
these two lines encodes the
voltage adjustment request: 00-no change, 01-decrease the voltage by one
step, 10-increase the voltage by one step, 11-increase the voltage to
Figure 4. Workflow of i.MX31 DVFS
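The two-line DVS request encoding can be modelled as a small decoder. This is a hedged sketch: the voltage is represented as an index into a hypothetical table of levels, and because the meaning of state 11 is cut off in the text above, it is treated here as a jump up to a caller-supplied level, which is an assumption.

```python
NO_CHANGE = 0b00  # keep the current voltage
STEP_DOWN = 0b01  # decrease the voltage by one step
STEP_UP = 0b10    # increase the voltage by one step
JUMP_UP = 0b11    # increase the voltage to a given level (assumed meaning)


def next_voltage_index(index, request, num_levels, jump_index):
    """Return the new voltage level index after a two-bit DVS request.

    index:      current level, 0 being the lowest voltage
    num_levels: number of levels in the (hypothetical) voltage table
    jump_index: level used for the 11 state (assumption, see above)
    """
    if request == NO_CHANGE:
        return index
    if request == STEP_DOWN:
        return max(index - 1, 0)
    if request == STEP_UP:
        return min(index + 1, num_levels - 1)
    if request == JUMP_UP:
        return max(index, jump_index)
    raise ValueError("request must be a two-bit value")
```

Clamping at both ends of the table keeps a repeated request from stepping outside the range the regulator supports.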
DPTC in the figure above means Dynamic Process and Temperature
Control. This technique adjusts the supply voltage according to the
chip process and the current ambient temperature and, as a result, saves
energy effectively. It is also an attractive feature of the i.MX31.
Real Effect of DVFS
To verify the real effect of DVFS, applications should be executed
on the given CPU and the actual power consumption should be measured
with DVFS enabled and disabled. Actual power consumption measurements
are presented here for software-based and hardware-based DVFS
implementations respectively.
Intrinsyc ported the IEM
software developed by ARM to WinCE
and measured the power
consumption of the CPU with IEM enabled and disabled. The IEM software
runs on the i.MX31 advanced development suite. It can be thought of as a
software-based implementation of DVFS since it doesn't make use of
the i.MX31 built-in DVFS.
The moving average algorithm is used to estimate the workload [the h
in equation (3) is always 1/N]. In addition, a GPIO is used to indicate whether the
CPU enters IDLE state (cpu_idle() thread is scheduled). The smaller the
IDLE portion is, the higher the CPU utilization is. The benchmarking
result is presented below.
Table 1. Video File Information Used for IEM Testing
Table 2. Power Consumption with IEM Enabled or Disabled
In order to verify the real effect of the hardware-based DVFS
implementation, the author measured the power consumption on the i.MX31
advanced development suite. The multimedia applications (audio or video
player) run on Linux. The benchmarking result is presented in Table 3 below.
Table 3. Power Consumption with i.MX31 Built-in DVFS Enabled or Disabled
It can be seen clearly from the tables above that DVFS
implementation based on either software or hardware can reduce the
power consumption effectively.
Factors Affecting DVFS Application
The idea of dynamic voltage and/or frequency scaling has been around for
a long time, and an open source project, cpufreq, is developing software
for it. But this technique has not been widely applied so far. One of the
key factors is the reliability of performance prediction.
No prediction algorithm is 100% reliable, nor does any single one work
well for all applications. And for real-time applications such as
audio or video, it is not acceptable if the prediction fails. If a
deadline is missed, e.g., an audio or video frame misses its
presentation time, the user will perceive the deterioration in audio or
video quality. This greatly worsens the user experience and
undermines the user's confidence in DVFS. The author met such problems
when performing the DVFS test.
The moving average algorithm used by IEM only works well for simple
use cases, such as only one application running on the CPU. And the EMA
(exponential moving average) used by the i.MX31 built-in DVFS is not a
panacea either. If the built-in DVFS is enabled, the CPU can't play
some Pink Floyd songs
smoothly (after some DVFS parameters,
such as the lower frequency threshold, are modified, these songs can be
played smoothly).
But the author believes that DVFS will be applied more and more widely
as prediction algorithms and other techniques evolve,
because it has demonstrated great potential for power reduction. And
power reduction is often the first requirement for many portable
devices.
Karl Lu graduated from Tsinghua
University in 1999 with a Master's degree in Electrical Engineering.
Before joining Freescale, he worked for ZTE and LSI Logic and developed
software for network equipment and multimedia devices. Now at Freescale,
he serves as a senior system and architecture engineer for mobile
multimedia systems. His research interests are focused on embedded
systems, mobile devices and multimedia. He can be reached at firstname.lastname@example.org.
References
1. Freescale, 2/2006, i.MX31 Multimedia Application Processor Reference Manual, Rev 1
2. Freescale, Boris Bobrov & Michael Priel, 6/2005, i.MX31 Power Management White Paper, Rev 0
3. ARM, Krisztian Flautner, et al, OSDI 2002, Vertigo: Automatic Performance-Setting for Linux
4. ARM, Krisztian Flautner, et al, DesignCon 2003, A Combined Hardware-Software Approach for Low-Power SoCs: Applying Adaptive Voltage Scaling and Intelligent Energy Management Software
5. Intrinsyc, Suji Velupillai & Ken Tough, 9/2006, Intelligent Energy Manager (IEM) Benchmarking on a Freescale's i.MX31 Multimedia Processor