For the reader's benefit, I would like to shed some light on information included in the article.
The indicative power and size mentioned in one of the last paragraphs are both representative of a complete microcontroller based on the Cortex-M7 (i.e. not the processor only), using a mainstream process with embedded flash. In other words, they include the size and power of the volatile and non-volatile on-chip memories, of the I/Os, and also of the digital and analogue peripherals.
When comparing with the Cortex-R, it would be more precise to state that the Cortex-M7 is "more energy-efficient" when run at comparable frequencies. As for performance, Cortex-R processors can, by design, reach higher frequencies thanks to their longer pipeline and can thus offer "higher absolute performance".
Thanks to Jessica and EETimes for all these great articles!
I'd much rather see ARM focus on improved DSP functionality in its Cortex-M family. The "DSP" Cortex-M4 is 4-5x faster than a Cortex-M3 at FFTs, but the Cortex-M4 seems to be 8-10x slower than a Blackfin. I'd rather see an M0-based "M2" Cortex with really good FFT performance than smartphone processor features adapted for the high-end embedded market. ARM has dipped its toe in the DSP waters, but hasn't yet committed to it in a big way.
You're right, slow light switches etc. are typically not due to a slow microcontroller; even a 4-bit 1MHz MCU will be fast enough for that.
Changing channels on a digital decoder relies on the MPEG encoder sending full frames every now and again, as the decoder can only start showing the channel after it sees such a frame. The delay comes from broadcasters encoding these frames infrequently to reduce the bandwidth needed for digital TV; it is not at all an issue with slow CPUs or software (in any case all of the decoding is done by a dedicated hardware decoder).
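To put rough numbers on the full-frame argument, here is a minimal sketch in C. It assumes the viewer switches channel at a uniformly random moment between two full frames and ignores tuner lock and buffering delays; the function names and the idea of a fixed full-frame interval are assumptions made for illustration, not figures from any broadcaster.

```c
/* Worst case: the switch happens just after a full frame was sent,
 * so the decoder waits one whole interval for the next one. */
double worst_case_wait_s(double full_frame_interval_s)
{
    return full_frame_interval_s;
}

/* Average case, assuming the switch moment is uniformly distributed
 * within the interval: half the interval. */
double average_wait_s(double full_frame_interval_s)
{
    return full_frame_interval_s / 2.0;
}
```

Under these assumptions the wait depends only on how often the encoder sends full frames, which is why a faster CPU in the decoder does not shorten it.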
The slow startup time of modern TVs is an issue with the OS booting, so faster flash and a faster CPU should help there.
Airbags, engine control and hard disks are hard real-time problems requiring low, predictable interrupt latency. The latter two have typically used the Cortex-R4 for performance, but it seems an M7 should now be more than fast enough.
ARM Compiler 6 already uses LLVM, but the first release is focused on 64-bit application processors.
A lot of the technologies (e.g. dual-issue pipeline, optional caches, optional double-precision FPU) are arguably already available in other high-end processors. However, putting all of these in a much smaller processor footprint is a big challenge. At the same time, the design remains deterministic and is very easy to use (almost everything can be programmed in C, including interrupt handlers, and the programmer's model is simple and based on the existing Cortex-M architecture).
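As an illustration of interrupt handlers in C: on Cortex-M the hardware saves and restores the caller-saved registers on exception entry and exit, so a handler can be an ordinary C function with no assembly wrapper. A minimal sketch, assuming the startup code places the CMSIS-style name `SysTick_Handler` in the vector table; `tick_count` and `ticks_elapsed` are names made up for this example.

```c
#include <stdint.h>

/* Written by the handler and read by the main loop, so it is volatile. */
static volatile uint32_t tick_count = 0;

/* Plain C function used as the interrupt handler. On a real part the
 * hardware calls it on each SysTick interrupt; no special keyword or
 * assembly prologue is needed. */
void SysTick_Handler(void)
{
    tick_count++;
}

/* Hypothetical helper for the main loop to read the current count. */
uint32_t ticks_elapsed(void)
{
    return tick_count;
}
```

On a host machine the handler can simply be called like any other function, which also makes this kind of code easy to unit test.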
For system designers, in addition to being able to do more with a mass-market MCU product, it also means you could consolidate multiple processors (e.g. some audio products today use an MCU + DSP, which could be replaced with a single Cortex-M7). They can also have a much larger system memory size without losing performance (thanks to the integrated L1 cache, you can execute code from external memories with good performance, including serial flash like QuadSPI).
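The cache point can be made concrete with the usual average-access-cost estimate: hits cost the cache latency, misses cost the external memory latency. A minimal sketch in C; the function name and every number fed into it are illustrative assumptions, not Cortex-M7 or QuadSPI measurements.

```c
/* Average cycles per access for a given cache hit rate (0.0 to 1.0).
 * hit_cycles:  cost of an access served by the L1 cache
 * miss_cycles: cost of an access that goes to external memory */
double avg_access_cycles(double hit_rate, double hit_cycles, double miss_cycles)
{
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
}
```

The higher the hit rate, the closer the average gets to the cache latency, which is why code running from slow external flash can still perform well when most fetches hit in the cache.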
Things will get even more interesting when chip designers start to use more advanced process nodes in high-end MCUs.
Of course, there are many applications that don't need such high performance. That's why ARM's wide range of Cortex-M processor products is important. You can get anything from a tiny processor for an ultra-low-power IoT sensor to a high-performance microcontroller, all based on a consistent architecture.