Embedded processors are ubiquitous in the products we use on a daily basis, from devices requiring high performance, such as smartphones and tablets, to devices with low energy consumption requirements, including medical monitors, hearing aids and wearable electronics. These electronic devices are increasingly required to execute a greater number of functions while consuming less power and silicon area. While newer process technologies address these low-power/high-performance requirements to some extent, they cannot keep up with the performance, power and area requirements of advanced battery-operated devices. Additional techniques are needed to deliver more features with higher performance, smaller area and lower power consumption. This article explores how Synopsys' DesignWare® ARC® Processors and ARC Processor EXtensions (APEX) technology can optimize processor power and performance.
Custom Processor Extensions
General-purpose (GP) processors are often used to run applications in deeply embedded systems. The drawback is that these GP processors typically are not optimized for dedicated tasks, so it may not be possible to achieve the required performance and stay within the power budget. ARC Processor EXtensions (APEX) technology enables designers to add custom hardware accelerators, registers and condition codes, as well as tightly coupled peripherals, to their ARC processor to optimize it for specific application domains, without the cost of additional hardware bus infrastructure.
Designers can add one or more of the following types of user-defined components to create a set of user-defined hardware resources for functionality that would otherwise require a large number of existing instructions:
- Zero-, single-, and dual-operand APEX ALU instructions that implement a custom function in hardware can be added to an ARC core. In addition to these operands, an APEX ALU instruction can also use APEX core and/or auxiliary registers defined in the same custom processor extension.
- Condition codes can be added to test certain conditions before an APEX instruction is executed.
- Up to 28 additional core registers can be added to the core. These extension registers are available to all instructions and are typically used for information that changes often or must be accessed quickly.
- An almost unlimited number of 32-bit auxiliary registers can be added to the core. These registers are accessible in software using the "LR" and "SR" instructions and are typically used for information that does not change often and does not need to be accessed quickly.
- An ARC processor extension can also contain external interface signals. These additional input, output, and bidirectional signals will propagate to the processor's core boundary, resulting in additional boundary ports.
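The component types listed above can be summarized as a small data model. The following is a minimal sketch, not the format used by the actual APEX tooling: the `ExtensionSpec` class, its field names, and the example mnemonics and register names are all hypothetical, and only the limits it checks (operand counts of zero, one, or two, and at most 28 extension core registers) come from the list above.

```python
from dataclasses import dataclass, field

MAX_EXTENSION_CORE_REGISTERS = 28   # limit stated in the list above
ALLOWED_OPERAND_COUNTS = {0, 1, 2}  # zero-, single-, and dual-operand instructions
ALLOWED_DIRECTIONS = {"input", "output", "bidirectional"}


@dataclass
class ExtensionSpec:
    """Hypothetical description of one custom processor extension."""
    instructions: dict = field(default_factory=dict)       # mnemonic -> operand count
    condition_codes: list = field(default_factory=list)    # names of added condition codes
    core_registers: list = field(default_factory=list)     # fast, frequently changing state
    aux_registers: list = field(default_factory=list)      # 32-bit, accessed with LR/SR
    interface_signals: dict = field(default_factory=dict)  # signal name -> direction

    def problems(self) -> list:
        """Return readable violations of the limits described in the text."""
        found = []
        for name, operands in self.instructions.items():
            if operands not in ALLOWED_OPERAND_COUNTS:
                found.append(f"{name}: {operands} operands not supported")
        if len(self.core_registers) > MAX_EXTENSION_CORE_REGISTERS:
            found.append(
                f"{len(self.core_registers)} core registers exceeds "
                f"{MAX_EXTENSION_CORE_REGISTERS}"
            )
        for name, direction in self.interface_signals.items():
            if direction not in ALLOWED_DIRECTIONS:
                found.append(f"{name}: unknown direction {direction!r}")
        return found


# Illustrative extension with made-up names; the three-operand entry is rejected.
spec = ExtensionSpec(
    instructions={"fmul_x": 2, "mac3_x": 3},
    core_registers=[f"xr{i}" for i in range(4)],
    aux_registers=["x_ctrl"],
    interface_signals={"sensor_irq": "input"},
)
print(spec.problems())  # → ['mac3_x: 3 operands not supported']
```

A model like this is only a bookkeeping aid for reasoning about what an extension contains; the hardware itself is described and integrated through the processor's own configuration flow.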
By giving ARC processors the ability to perform certain software functions in hardware and to directly control the hardware extension from the processor pipeline, APEX eliminates the need for an external hardware bus infrastructure. Additionally, APEX allows the processor to be optimally tailored to perform its dedicated tasks in fewer cycles, which enables the same performance with lower power consumption, higher performance at the same power, and/or new functionality that previously would have required hardware support to be executed.
Different applications have different power, performance, and area (PPA) requirements. APEX gives designers the ability to implement the most suitable hardware accelerators, with the optimal PPA balance for their specific design.
Optimize Your Processor with Hardware Extensions
Sensors are used in an increasing number of applications, from consumer products like smartphones and tablets to a variety of medical devices. An example sensor application is therefore an effective way to illustrate how you can optimize a processor using hardware extensions. Adding custom extensions to an embedded processor to execute typical sensor functions in hardware reduces the cycle count needed to execute the sensor application. This lowers energy consumption in one of two ways: the clock frequency can be lowered while keeping the same execution time, or the power budget can be kept the same while the execution time becomes shorter.
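The two options can be made concrete with a little arithmetic. The sketch below assumes execution time equals cycle count divided by clock frequency; the function names and the numbers (a 20 MHz baseline, a 10 ms processing pass, a 50% cycle reduction) are hypothetical and chosen only for illustration.

```python
def same_deadline_frequency(base_freq_hz: float, cycle_reduction: float) -> float:
    """Clock frequency that keeps execution time unchanged after the cycle count drops."""
    return base_freq_hz * (1.0 - cycle_reduction)


def same_frequency_time(base_time_s: float, cycle_reduction: float) -> float:
    """Execution time at an unchanged clock frequency after the cycle count drops."""
    return base_time_s * (1.0 - cycle_reduction)


# Hypothetical numbers, not measurements from the article.
base_freq_hz = 20e6      # 20 MHz baseline clock
base_time_s = 0.010      # 10 ms per sensor processing pass
cycle_reduction = 0.5    # extensions halve the cycle count

# Option 1: keep the 10 ms execution time and run the clock slower.
print(same_deadline_frequency(base_freq_hz, cycle_reduction))  # → 10000000.0

# Option 2: keep the 20 MHz clock and finish sooner.
print(same_frequency_time(base_time_s, cycle_reduction))       # → 0.005
```

In both cases the work is done in fewer cycles, so the energy spent per processing pass goes down; which option is preferable depends on whether the design is limited by its deadline or by its power budget.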
Figure 1 - ARC EM4 Processor with APEX hardware accelerators
Figure 1 shows a typical ARC EM4 Processor with hardware extensions targeting sensor applications. Running a sensor application on an ARC EM4 configuration with APEX floating point hardware accelerators resulted in a cycle count reduction of 89% and a total dynamic power reduction of 1.2% at the cost of an area increase of 4%, using a TSMC 90-nm LP library and a clock frequency of 10 MHz. Because energy is the product of execution time and power, combining the achieved cycle count and dynamic power reductions results in an energy reduction of almost 10x, as depicted in Figure 2.
Figure 2 - Total energy consumption of sensor application
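The energy figure can be checked from the two reported reductions. This is a minimal sketch that assumes the clock frequency is unchanged, so execution time scales directly with cycle count; the variable names are illustrative, and the input percentages are the ones quoted above.

```python
cycle_reduction = 0.89    # 89% fewer cycles (reported above)
power_reduction = 0.012   # 1.2% lower total dynamic power (reported above)

remaining_time = 1.0 - cycle_reduction    # same clock, so time scales with cycles
remaining_power = 1.0 - power_reduction

# energy = time x power, expressed relative to the baseline configuration
remaining_energy = remaining_time * remaining_power
energy_factor = 1.0 / remaining_energy

print(f"energy relative to baseline: {remaining_energy:.3f}")  # → 0.109
print(f"energy reduction factor: {energy_factor:.1f}x")        # → 9.2x
```

Note that it is the remaining fractions (11% of the cycles, 98.8% of the power) that are multiplied, not the reduction percentages themselves; the result of roughly 9.2x is consistent with the "almost 10x" stated in the text.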
Next, we compare the energy consumption of an extensible ARC EM4 with that of two popular commercial embedded processors without extension support, for typical functions present in sensor applications. Different groups of functions have been identified, and for each function group, processor extensions are added to the ARC EM4 such that all functions within the group are hardware accelerated, with a balanced trade-off between area and cycle-count optimization. Figure 3 illustrates the average relative energy consumed in executing the different function groups on the three processor solutions at a 40-nm technology node. Processor A has the smallest area and low power numbers; however, its overall energy consumption for each function group is the highest because of the large number of processor cycles required during execution. Processor B, with the largest area, is more optimized for performance, and as a result its energy consumption numbers are better than those of Processor A, but not as low as the ARC EM4 + APEX numbers. The combination of a low-power EM4 Processor and APEX to reduce cycle count results in the lowest energy numbers, up to 35x lower than Processor A and up to 20x lower than Processor B. On average, the ARC EM4 + APEX solution consumes 13x less energy than Processor A and 7x less energy than Processor B, which matters because energy consumption is a critical concern in sensor applications.
Figure 3 - Relative energy consumption compared to ARC EM4+APEX
The ever-increasing demand for smaller electronic devices with more functionality, longer battery life, and shorter time to market has accelerated the use of embedded processors to offload commonly executed tasks from the host processor. Processor extensions provide a means to extend a general-purpose processor with custom hardware accelerators that optimize the execution of dedicated applications for reduced power consumption and area, and/or increased performance. Examples comparing the energy consumption of an ARC EM4 processor with processor extensions to that of embedded processor solutions without extensions show that significant energy savings are possible with custom processor extensions.
For a demonstration of how an ARC EM4 Processor delivers the lowest cycle counts and smallest memory footprint compared to popular embedded processors without processor extension support, read the whitepaper entitled Leveraging Processor Extensibility to Build an Ultra-Low Power Embedded Subsystem.