Rather than the two-core big-little idea, ARM may need to work on a finer-grained approach: a whole nest of ISA-compatible cores optimized for different levels of power saving and performance. Could it have to evolve to a "biggest-big-little-tiny" strategy?
But what about the real estate? Is a complex hierarchy of tuned ISA-compatible cores, all waiting to perform at a particular point on the power-performance curve, too expensive?
Maybe, maybe not. We should remember that big-little is also partly a response to the ARM idea of "dark silicon," which holds that, for power consumption and thermal reasons, advanced processors cannot afford to have all the silicon in operation at the same time, because the chip would simply burn up. So, the argument runs, you may as well use portions of the IC optimized for different loads and use cases. The counter-argument is that rather than design a complex IC that has to be predominantly dark, I would rather have a simpler chip that is easier to design and less costly to make.
One alternative is to find a manufacturing process that can support a wider range of dynamic voltage and frequency scaling (DVFS).
Such a process already exists in the form of the 28-nm fully-depleted silicon-on-insulator (FDSOI) manufacturing process from STMicroelectronics. This can take operation down to around 0.6 V, compared with bulk CMOS, which is generally limited to a minimum of about 0.9 V. Back-biasing of the FDSOI wafer provides broad dynamic scaling of performance, which should be a good fit for the big-little architecture.
In fact ST-Ericsson has already used the 28-nm FDSOI process to market the idea of a "quad-core" processor, the L8580 ModAp, based on two physical Cortex-A9 cores. ST-Ericsson's argument is that the Cortex-A9 can be operated at low voltage to save dynamic power like a "little" core or pushed to high clock frequency for performance like a "big" core: a virtual big-little.
I don't much like the idea of measuring a chip by the number of virtual cores, which seems to be a trend right now, because of course that number is arbitrary. I prefer to count the cores in three-dimensional physical space.
But nonetheless the intelligent combination of time-slicing, extended DVFS and multiple physical cores is likely to be one way forward for big-little, in a multidimensional extension of what we already have.
What's next for big.Little is that it just got 20 licensees, and Renesas will even take it into the car chip market, which should be another booming chip market, soon:
Mhhh, more interesting techniques exist, like body biasing. Check the conference literature, I would say. Though such techniques are not always available to everyone at a foundry; they favor firms with their own process and modelling departments.
I'll also throw this into the mix. At the highest voltage there also tends to be an increase in non-dynamic (leakage) power consumption.
BUT leakage consumption is a bigger proportion of overall IC power consumption when a chip is idling at low voltage.
ARM has no choice. When your dual-core A15 draws 6 W (just for the cores), how do you compete against Intel Atom, when they can match performance and have lower power? ARM is losing the power battle. ARM is no longer power-performance efficient. Intel took the lead.
Unfortunately, from the perspective of energy consumption, DFS really doesn't help, even though it does lower the power (energy consumed per second). In fact, slowing down the frequency will only increase the total energy consumed for the same task, because it increases the execution time and with it the extra energy consumed in the "supporting" blocks that keep the core running.
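To put rough numbers on that argument, here is a minimal sketch in Python. It assumes the usual first-order model (dynamic power = C x V^2 x f) plus a constant power draw for the supporting blocks; every value and name in it is hypothetical and chosen only to make the arithmetic easy to follow, not measured from any real chip.

```python
def task_energy(cycles, freq_hz, volt, c_eff, p_support_w):
    """Energy in joules to finish a fixed task of `cycles` clock cycles.

    Assumes dynamic power = c_eff * volt^2 * freq_hz and a constant
    supporting-block power p_support_w for as long as the task runs.
    """
    runtime_s = cycles / freq_hz
    p_dynamic_w = c_eff * volt ** 2 * freq_hz
    return (p_dynamic_w + p_support_w) * runtime_s


# Hypothetical illustration values, not data from any real processor.
CYCLES = 1e9        # work to be done, in clock cycles
C_EFF = 1e-9        # effective switched capacitance, farads
VOLT = 1.0          # supply voltage held constant (frequency scaling only)
P_SUPPORT = 0.2     # watts burned by supporting blocks while running

fast = task_energy(CYCLES, 1.0e9, VOLT, C_EFF, P_SUPPORT)   # about 1.2 J
slow = task_energy(CYCLES, 0.5e9, VOLT, C_EFF, P_SUPPORT)   # about 1.4 J

print(f"1.0 GHz: {fast:.2f} J, 0.5 GHz: {slow:.2f} J")
```

In this model halving the clock halves the dynamic power but doubles the runtime, so the dynamic energy for the task is unchanged and the supporting-block energy doubles. Power goes down, energy goes up.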
I would hesitate to claim that it's just DVFS. DFS works primarily on dynamic power, and does little for overall energy consumption, while DVS is where the real meat is, affecting both leakage and dynamic consumption strongly. At the same time, as voltage drops, frequency must often be reduced as well to ensure proper operation. Big.Little is different in that by having two entire cores, you can also use separate fabrication parameters for both.
Because the two cores are entirely independent, it's possible for the .Little processor to be fabricated with a high threshold voltage in mind, for low leakage during the expected long on-times. The Big. processor can then be fabricated with a more aggressive process and less concern about leakage. Rather than designing for the average case, you can design for the expected case for both processors.
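As a rough illustration of why voltage scaling is "where the real meat is", here is a minimal Python sketch of dynamic energy per task under the first-order C x V^2 model. The 0.9 V and 0.6 V figures are the ones quoted in the article for bulk CMOS and 28-nm FDSOI; the capacitance and cycle count are hypothetical, and leakage is deliberately left out, so treat this as a sketch of the dynamic term only.

```python
def dynamic_energy_per_task(cycles, volt, c_eff):
    """Dynamic energy in joules for a fixed number of clock cycles.

    Each cycle switches c_eff at supply voltage volt, so the energy per
    cycle is c_eff * volt^2, independent of clock frequency.
    """
    return c_eff * volt ** 2 * cycles


CYCLES = 1e9    # hypothetical amount of work
C_EFF = 1e-9    # hypothetical effective switched capacitance, farads

e_high = dynamic_energy_per_task(CYCLES, 0.9, C_EFF)   # bulk CMOS floor
e_low = dynamic_energy_per_task(CYCLES, 0.6, C_EFF)    # FDSOI floor
ratio = e_low / e_high                                  # (0.6 / 0.9)^2

print(f"dynamic energy at 0.6 V is {ratio:.0%} of that at 0.9 V")
```

Under these assumptions the same task costs a bit under half the dynamic energy at 0.6 V, before counting any change in leakage or the longer runtime that a lower clock would bring.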
How about using digital power management (Intel's term is IVR, for integrated voltage regulator, to be used in the multicore Haswell processor family) to fine-tune each core?
How many PMUs/PMICs is Samsung's Exynos 5 Octa using?
Oh come on...there ought to be a system-wide solution to this problem. Some types of code (simple logic, event handling etc.) can probably run on the baby processor.
Some larger code (iterating through large data structures, block data handling, signal processing etc.) can be done on the larger core.
These are two different types of programming: one needs to optimize latency while the other might need to optimize throughput. But due to the prevailing convention we use a single programming language and a single processor for both types of work. If ARM wants to tackle this they should invent a new type of programming language or virtual machine or something, and assign threads to different processors. Then let the VM or OS decide which core to turn on, based on software demand.
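A toy Python sketch of that idea, purely to make it concrete: each task carries a tag saying whether it cares about latency or throughput, and a dispatcher uses the tags to pick a core and to decide which cores need to be powered. The `Task` type, the tag names and both functions are hypothetical; this is not how any real OS scheduler or ARM's big.LITTLE software works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    kind: str  # "latency" (event handling) or "throughput" (bulk data)


def assign_core(task):
    """Send throughput-heavy work to the big core, the rest to the little one."""
    return "big" if task.kind == "throughput" else "little"


def cores_to_power(tasks):
    """Return the set of cores that must be on for the pending tasks."""
    return {assign_core(task) for task in tasks}


pending = [
    Task("ui_event", "latency"),
    Task("timer_tick", "latency"),
]
print(cores_to_power(pending))            # only the little core is needed

pending.append(Task("video_decode", "throughput"))
print(sorted(cores_to_power(pending)))    # now both cores are needed
```

The point of the sketch is only that the decision can be driven by what the software declares about itself rather than by measured load.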
Bunch of bollocks! Peter has got to get his facts straight and realize it was not ARM that invented Big Little! It was really an ARM customer that started it; then ARM took the concept, enhanced it and started a guerrilla marketing campaign.