Design Article

Big.LITTLE processing with ARM Cortex-A15 & Cortex-A7

Peter Greenhalgh

10/24/2011 4:33 PM EDT

big.LITTLE task migration use model
In the big.LITTLE task migration use model, the OS and applications only ever execute on Cortex-A15 or Cortex-A7, never on both processors at the same time. This use model is a natural extension of the Dynamic Voltage and Frequency Scaling (DVFS) operating points that current mobile platforms with a single application processor provide, which let the OS match the performance of the platform to the performance required by the application.

However, in a Cortex-A15-Cortex-A7 platform these operating points are applied to both Cortex-A15 and Cortex-A7. When Cortex-A7 is executing, the OS can tune the operating points as it would for an existing platform with a single applications processor. Once Cortex-A7 is at its highest operating point, if more performance is required, a task migration can be invoked that picks up the OS and applications and moves them to Cortex-A15.

This allows low- and medium-intensity applications to be executed on Cortex-A7 with better energy efficiency than Cortex-A15 can achieve, while the high-intensity applications that characterize today’s smartphones can execute on Cortex-A15.
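
The escalation described above can be sketched as a small state machine. This is only an illustration: the operating-point tables, the `PlatformState` type and the `step_up`/`step_down` helpers are hypothetical names, the frequencies are placeholders rather than ARM figures, and the downward path is assumed to mirror the upward path described in the text.

```python
from dataclasses import dataclass

# Hypothetical operating points in MHz, lowest to highest (not ARM figures).
A7_POINTS_MHZ = [200, 400, 600, 800, 1000]
A15_POINTS_MHZ = [800, 1000, 1200, 1500]


@dataclass
class PlatformState:
    cluster: str  # "A7" or "A15"; only one cluster executes at a time
    index: int    # index into that cluster's operating-point table


def points(cluster):
    """Return the operating-point table for the named cluster."""
    return A7_POINTS_MHZ if cluster == "A7" else A15_POINTS_MHZ


def step_up(state):
    """More performance requested: raise the operating point or migrate."""
    if state.index + 1 < len(points(state.cluster)):
        return PlatformState(state.cluster, state.index + 1)
    if state.cluster == "A7":
        # Cortex-A7 is at its highest point: move OS and apps to Cortex-A15.
        return PlatformState("A15", 0)
    return state  # already at the top Cortex-A15 point


def step_down(state):
    """Less performance needed (assumed to mirror step_up)."""
    if state.index > 0:
        return PlatformState(state.cluster, state.index - 1)
    if state.cluster == "A15":
        # Lowest Cortex-A15 point: migrate back to the top Cortex-A7 point.
        return PlatformState("A7", len(A7_POINTS_MHZ) - 1)
    return state  # already at the lowest Cortex-A7 point
```

From the OS point of view the two tables behave like one longer DVFS ladder, with a task migration hidden at the step between the highest Cortex-A7 point and the lowest Cortex-A15 point.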



Fig 4: Cortex-A15-Cortex-A7 DVFS curves

An important consideration in a big.LITTLE system is the time it takes to migrate a task between the Cortex-A15 cluster and the Cortex-A7 cluster. If it takes too long, the migration may become noticeable to the operating system, and the system power cost may outweigh the benefit of task migration for some time. Therefore, the Cortex-A15-Cortex-A7 system is designed to migrate in less than 20,000 cycles, or 20 microseconds with processors operating at 1 GHz.
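
The conversion from cycles to time is simple arithmetic, shown here as a short sketch. The helper name `migration_time_us` is hypothetical, and the 500 MHz case is an assumed example, not a figure from ARM; it only shows that the same cycle budget takes proportionally longer at a lower clock.

```python
def migration_time_us(cycles, clock_hz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles * 1e6 / clock_hz


# Upper bound quoted in the text: 20,000 cycles at 1 GHz.
at_1ghz = migration_time_us(20_000, 1e9)

# Assumed example: the same cycle budget at 500 MHz takes twice as long.
at_500mhz = migration_time_us(20_000, 5e8)

print(at_1ghz, at_500mhz)  # prints 20.0 40.0
```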

One of the reasons the task migration can be so fast is that the amount of processor state involved is relatively small. The processor that is going to be turned off, termed the outbound processor, must have the contents of its integer and Advanced SIMD register files saved, along with the entire CP15 configuration state. The processor that is going to resume execution, termed the inbound processor, must then restore all of the state saved from the outbound processor. Additionally, any active interrupts that are being controlled by the GIC-400 must be migrated. Fewer than 2,000 instructions are required to achieve save-restore, and because the two processors are architecturally identical there is a one-to-one mapping between state registers in the inbound and outbound processors.
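
The one-to-one mapping is what keeps save-restore simple: each saved value goes back into the register of the same name, with no translation step. The sketch below models this with dictionaries. It is a toy model, not the ARM switcher; `save_state` and `restore_state` are hypothetical helpers and the register set shown is a tiny illustrative subset.

```python
def save_state(outbound_regs):
    """Snapshot the outbound processor's architectural state."""
    return dict(outbound_regs)


def restore_state(inbound_regs, saved):
    """Write each saved value into the same-named inbound register."""
    missing = set(saved) - set(inbound_regs)
    if missing:
        # With architecturally identical processors this should never happen.
        raise KeyError(f"no matching inbound register for: {sorted(missing)}")
    inbound_regs.update(saved)


# Illustrative subset: two integer registers, one Advanced SIMD register
# and one CP15 register, with made-up values.
outbound = {"r0": 0x1234, "r1": 0x5678, "q0": 0xABCD, "cp15_sctlr": 0x00C5}
inbound = {name: 0 for name in outbound}

restore_state(inbound, save_state(outbound))
```

If the two processors had different register sets, the restore step would need a translation table instead of a plain copy, which is the cost that architectural compatibility avoids.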



Fig 5: big.LITTLE switch.

Figure 5 describes the task migration process between inbound and outbound processors. Coherency is clearly a critical enabler in achieving a fast task migration time, as it allows the state that has been saved on the outbound processor to be snooped and restored on the inbound processor rather than going via main memory. Additionally, because the level-2 cache of the outbound processor is coherent, it can remain powered up after a task migration to improve the cache warming time of the inbound processor through snooping of data values. However, since the level-2 cache of the outbound processor cannot be allocated to, it will eventually need to be cleaned and powered off to save leakage power.

It should also be observed that normal execution of the thread continues for most of the task migration process. The only “black out” period is the part of the migration when interrupts are disabled and state is transferred from the outbound to the inbound processor.

big.LITTLE MP use model
Since a big.LITTLE system containing Cortex-A15 and Cortex-A7 is fully coherent through CCI-400, another logical use model is to allow both Cortex-A15 and Cortex-A7 to be powered on and simultaneously executing code. This is termed big.LITTLE MP, which is essentially Heterogeneous Multi-Processing. Note that in this use model Cortex-A15 only needs to be powered on and executing alongside Cortex-A7 if there are threads that need that level of processing performance. If not, only Cortex-A7 needs to be powered on.

big.LITTLE MP is compelling because it enables threads to be executed on the processing resource that is most appropriate. Compute-intensive threads that require significant amounts of processing performance, because their output is user visible, can be allocated to Cortex-A15. Threads that are I/O heavy, or that do not produce a result that is time critical to the user, can be executed on Cortex-A7.

A simple example of a non-time-critical thread is one associated with email updates. While web browsing, the user will want email updates to continue, but it does not matter whether they are done at Cortex-A15 performance levels or Cortex-A7 performance levels. Since Cortex-A7 is a more energy-efficient processor, it makes more sense to take a LITTLE longer but consume less battery energy.
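
The placement rule described above can be written as a short sketch. The `Thread` type, its two flags and the helper functions are hypothetical; a real scheduler would infer these properties from load tracking rather than from hand-set labels.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    compute_intensive: bool  # needs significant processing performance
    user_visible: bool       # output is time critical to the user


def choose_cluster(thread):
    """Pick the cluster for a thread using the rule described in the text."""
    if thread.compute_intensive and thread.user_visible:
        return "Cortex-A15"
    # I/O-heavy or non-time-critical work runs on the efficient core.
    return "Cortex-A7"


def a15_needed(threads):
    """Cortex-A15 only needs power if some thread is placed on it."""
    return any(choose_cluster(t) == "Cortex-A15" for t in threads)


page_render = Thread("page_render", compute_intensive=True, user_visible=True)
email_sync = Thread("email_sync", compute_intensive=False, user_visible=False)

print(choose_cluster(page_render))  # prints Cortex-A15
print(choose_cluster(email_sync))   # prints Cortex-A7
```

With only `email_sync` runnable, `a15_needed` returns False and the Cortex-A15 cluster can stay powered off, matching the note in the text that only Cortex-A7 needs to be on in that case.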

Finally, as a fully coherent system can create a significant volume of coherent transactions, Cortex-A15, Cortex-A7 and CCI-400 have been designed to cope with worst case snooping scenarios. This includes the case where a Mali-T604 GPU is connected to one of the I/O coherent CCI-400 ports and every transaction is snooping Cortex-A15 and Cortex-A7 at the same time as Cortex-A15 and Cortex-A7 are snooping each other.

Software
As part of the big.LITTLE system, ARM provides a software switcher for use with Cortex-A15, Cortex-A7, CCI-400 and the GIC-400. The switcher serves two purposes:
  • The first purpose is to provide all of the mechanisms required for task migration between Cortex-A15 and Cortex-A7. As well as the processor state save-restore, this includes the code required to bring the processors in and out of coherency, control snooping in the interconnect and migrate interrupts. The switcher can be used as-is, or the code can be used as a template for integration into the operating system.
  • The second purpose is to hide the small number of programmer’s model differences between Cortex-A15 and Cortex-A7 from the operating system. While Cortex-A15 and Cortex-A7 are architecturally identical and all registers are read and written in an architecturally consistent manner, the contents of the registers may not always be identical. Cortex-A15 and Cortex-A7 are therefore not identical in programmer’s model in every case.
For example, the contents of the Main ID register that identifies the processor will differ between Cortex-A15 and Cortex-A7, as will the contents of the CP15 registers that describe the level-1 and level-2 cache topologies. Fortunately, since both Cortex-A15 and Cortex-A7 implement the virtualization extensions, OS accesses to these registers can be trapped to the hypervisor layer, which is where the switcher can handle them.
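
The trap-and-handle idea can be modeled as follows. This is a toy sketch under stated assumptions, not the ARM switcher: the `Switcher` class and its method are hypothetical, the policy of always presenting one fixed value to the OS is an assumption (the text only says the switcher handles the access), and the Main ID register encodings are illustrative r0p0 values that should be checked against the Technical Reference Manuals.

```python
# Illustrative Main ID register values (assumed r0p0 encodings).
MIDR_A15 = 0x410FC0F0  # primary part number field 0xC0F
MIDR_A7 = 0x410FC070   # primary part number field 0xC07


def part_number(midr):
    """Extract the primary part number, bits [15:4] of the Main ID register."""
    return (midr >> 4) & 0xFFF


class Switcher:
    """Toy model of a hypervisor-level handler for trapped ID reads."""

    def __init__(self, presented_midr):
        # Assumption: the OS always sees one fixed value, whichever
        # processor is physically executing.
        self.presented_midr = presented_midr

    def on_trapped_midr_read(self, physical_midr):
        """Return the value the OS should observe for this read."""
        return self.presented_midr


switcher = Switcher(presented_midr=MIDR_A15)

# The OS sees the same identity before and after a migration to Cortex-A7.
seen_on_a15 = switcher.on_trapped_midr_read(MIDR_A15)
seen_on_a7 = switcher.on_trapped_midr_read(MIDR_A7)
```

The same pattern would apply to the cache topology registers: the handler returns a consistent view so that an unmodified OS never observes a change of processor.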

The switcher enables a big.LITTLE system to be built today with current operating systems. However, as with the state save-restore code, the small number of programmer’s model differences between Cortex-A15 and Cortex-A7 may in future be handled by the OS rather than by the switcher.

Conclusion
This white paper has described the first big.LITTLE system from ARM. The combination of a fully coherent system with Cortex-A15 and Cortex-A7 opens up new processing possibilities beyond what is possible in current high-performance mobile platforms.

Rather than having to compromise the implementation of a single applications processor to cope with high- and low-intensity tasks, the big.LITTLE system opens the door to an extremely high-performance implementation of Cortex-A15, since it will only be powered on when that performance is needed. This is complemented by the opportunity to create an extremely energy-efficient implementation of Cortex-A7, since it will be the workhorse of the platform.

Through these implementation techniques and the variety of use-models, big.LITTLE provides the opportunity to raise performance and extend battery life in the next generation of mobile platforms.

About the author:

Peter Greenhalgh, Consultant Engineer at ARM, is the technical lead for microprocessor hardware development. During his 10 years at ARM, Peter has worked on Cortex-A5, Cortex-R4, Cortex-A8,
On November 8, 2011 he will be presenting a webinar on Addressing the High Performance/Low Energy Requirements of Multicore Platforms. Click here to register.

On October 26 he will present a session discussing how the Cortex-A7 and Cortex-A15 provide adaptive multi-processing to address the high performance/low energy requirements of these platforms by creating a subsystem consisting of a higher performance processor coherently connected to a much smaller, more energy efficient processor.
