Lots of Little Cores Do the Trick for ARM and Intel
An impromptu discussion with ARM’s CTO Mike Muller reveals details behind the big.LITTLE processor strategy, while IP interface issues remain.
How does the leading IP supplier optimize trade-offs between high performance and low power? The same way it always has, but with new twists that take advantage of technological advances. The nature of these twists was highlighted by ARM’s co-founder and CTO, Mike Muller, during an impromptu chat with Ed Sperling, editor-in-chief of System-Level Design (SLD), and me after Muller’s keynote address at ARM TechCon. What follows is a portion of that conversation. – JB
Blyler: (Muller had talked earlier about structural advances – like FinFET technology – and voltage scaling as traditional ways to balance processor performance and power constraints.) In addition to these techniques, ARM recently announced the big.LITTLE approach to help this balancing act. While such an approach will help in the short term, will it matter in the long run? (see, “ARM Goes Big and Little at the Same Time”)
Muller: We have always done dynamic voltage scaling (DVS) to crank down the voltage and save power for tasks that need less processing performance. Our observation with big.LITTLE is that there are times when you really need single-thread performance, not big processing. In fact, there are a lot of times when you don’t need big performance. Further, you cannot build as efficient a micro-architecture for the big cores as you can for the little cores, because getting that single-thread performance involves a lot of micro-architecture complexity and speculation, which ultimately costs you power.
But what if you don’t need all of that performance? What if your voltage scaling has run out and you don’t have anywhere to go because the guard band margins are too tight? Then the right thing to do is to task migrate onto an identical but smaller core with a simpler micro-architecture that you can build more efficiently. I think that works wherever you are and whatever the process node. It will always be true that you will be able to build much more efficient little cores than big cores.
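To make Muller's argument concrete, here is a minimal sketch of the trade-off he describes: scale a big core's voltage and frequency down until it hits its floor, then move to a little core. It uses the common approximation that dynamic power scales with C * V^2 * f. Every number, the OperatingPoint class, and the choose_point function are hypothetical illustrations, not ARM data or an ARM interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    core: str            # "big" or "little"
    freq_mhz: int        # clock frequency
    volts: float         # supply voltage at that frequency
    capacitance: float   # relative switched capacitance (hypothetical)
    perf_per_mhz: float  # relative work done per clock (hypothetical)

    @property
    def performance(self) -> float:
        return self.freq_mhz * self.perf_per_mhz

    @property
    def power(self) -> float:
        # Dynamic-power approximation: P is proportional to C * V^2 * f
        return self.capacitance * self.volts ** 2 * self.freq_mhz


# Hypothetical table. 0.90 V stands in for the point where voltage
# scaling "has run out" because guard-band margins are too tight.
POINTS = [
    OperatingPoint("big", 1600, 1.10, 4.0, 2.0),
    OperatingPoint("big", 1000, 0.95, 4.0, 2.0),
    OperatingPoint("big", 600, 0.90, 4.0, 2.0),
    OperatingPoint("little", 1000, 0.95, 1.0, 1.0),
    OperatingPoint("little", 600, 0.90, 1.0, 1.0),
]


def choose_point(required_perf: float) -> OperatingPoint:
    """Pick the lowest-power point that still meets the demand."""
    candidates = [p for p in POINTS if p.performance >= required_perf]
    if not candidates:
        # Demand exceeds everything available: run flat out.
        return max(POINTS, key=lambda p: p.performance)
    return min(candidates, key=lambda p: p.power)


if __name__ == "__main__":
    for demand in (500, 1100, 3000):
        p = choose_point(demand)
        print(demand, p.core, p.freq_mhz)
```

In this toy model the big core at its voltage floor still burns several times the power of the little core, so light workloads land on the little core even though the big core could serve them.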
[Readers may recall my earlier discussions with Intel on this same point: IP Core Interconnect leads to MARC at IDF2011]
Blyler: How about the software – single versus parallel compiler – side of the many little cores equation?
Muller: This is an operating system (OS) level task migration which happens any time you have many SMP cores. If you need to lighten up the processing, you can close down the number of cores you have running. At the same time, you do a task migration and then there is just another step to migrate onto a small core. That is something you build into the OS. You don’t add any extra magic. It is already happening.
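The idea that moving to a little core is "just another step" after shrinking the set of active SMP cores can be sketched as a toy placement routine. This is an illustration under stated assumptions, not how any real OS scheduler is written: the capacities, the cluster size, the core names, and the place_tasks function are all hypothetical.

```python
import math

BIG_CAPACITY = 1.0     # hypothetical: load one big core can absorb
LITTLE_CAPACITY = 0.4  # hypothetical: load one little core can absorb
NUM_BIG = 4            # hypothetical number of big cores in the cluster


def place_tasks(task_loads):
    """Map each task index to a core name for the current total load.

    As load drops, fewer big cores stay online. Once the total fits
    on a little core, every task migrates there instead.
    """
    total = sum(task_loads)
    if total <= LITTLE_CAPACITY:
        cores = ["little0"]
    else:
        needed = min(NUM_BIG, max(1, math.ceil(total / BIG_CAPACITY)))
        cores = [f"big{i}" for i in range(needed)]
    # Simple round-robin placement across whichever cores remain online.
    return {task: cores[task % len(cores)] for task in range(len(task_loads))}


if __name__ == "__main__":
    print(place_tasks([0.9, 0.8, 0.7]))  # heavy load: several big cores
    print(place_tasks([0.3, 0.2]))       # lighter: one big core
    print(place_tasks([0.1, 0.1, 0.1]))  # light: the little core
```

The same placement code path handles all three cases; the little core is simply one more target in the list, which is the sense in which no "extra magic" is needed.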
[For reader reference: Symmetric multiprocessing (SMP) deals with many identical cores that are connected to the same memory and controlled by the same OS. Each core in an SMP architecture is treated as a separate processor. The problem with SMP is that the interconnect becomes a bottleneck as more cores are added. Interconnects are implemented with buses, crossbar switches or even NoCs. (see, "On-Chip Interconnection IP Gains Attention")]
[Steve Leibson’s latest blog covers related memory challenges in ARM’s announcement concerning its 64-bit processor architecture. In particular, Leibson notes the software challenges of the big.LITTLE scheme as implemented by the Cortex-A15 and newly announced Cortex-A7:
“Even though the ARM Cortex-A15 and newly announced Cortex-A7 have extended 40-bit real memory-address spaces, that’s not what the software sees. Because of the inherent 32-bit architecture and 32-bit addressing model of the ARMv7 architecture, the software still sees a 32-bit subset of the 40-bit address space.”]
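For a sense of scale, the gap between the 40-bit and 32-bit address spaces in the quote above is simple arithmetic. The short sketch below only works out the sizes; the variable names are illustrative.

```python
PHYSICAL_BITS = 40  # extended real (physical) address width cited above
VIRTUAL_BITS = 32   # address width the ARMv7 software sees

GIB = 1 << 30  # bytes in one gibibyte

physical_bytes = 1 << PHYSICAL_BITS
virtual_bytes = 1 << VIRTUAL_BITS

print(physical_bytes // GIB)            # 1024 GiB, i.e. 1 TiB addressable
print(virtual_bytes // GIB)             # 4 GiB visible to software at once
print(physical_bytes // virtual_bytes)  # 256 such 4 GiB windows
```

In other words, software working within a 32-bit view can reach only 1/256 of the 40-bit space at any one time.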