Embedded Systems

ARM introduced its ARMv7 ‘Cortex’ architecture to address the future demands of embedded developers targeting the open systems, real-time embedded, and deeply embedded spaces. The ARM Cortex-M3 processor for deeply embedded applications introduces specific features that enable new microcontroller and system-on-chip solutions to deliver 32-bit performance at established 16-bit and even 8-bit cost and power-consumption levels. These include a combination of techniques that improve efficiency and reduce gate count and memory size, cutting die size and power consumption, as well as a more powerful, integrated interrupt structure that saves clock cycles and gives developers extra flexibility.

Vast families of 8-bit and 16-bit microcontrollers (MCUs) have grown up around a handful of processor cores, such as the ubiquitous 8051. However, as end users’ demands intensify, system developers now need a more capable processing subsystem in every respect: for example, a more complex memory subsystem combining several technologies for code and data storage, including off-chip memory. This requires the greater address range of a 32-bit processor.
 
In addition, new communication protocols including Ethernet and USB are increasingly penetrating deeply embedded applications, but in practical terms are beyond the capabilities of the 8-bit and 16-bit domains. Another factor pushing deeply embedded platforms beyond the reach of 8-bit and 16-bit implementations is the consolidation of multiple functions into a single product to create a compelling sales proposition, for example, replacing numerous 8-bit door and mirror controller microcontrollers with a single more powerful 32-bit device.
 
The need for suitable development tools and techniques is also playing a role in moving developers’ minds on from legacy 8-bit and 16-bit solutions. Today’s short market windows impose short product development schedules, requiring engineers to work at a higher level of abstraction than is feasible with most tools serving 8-bit and 16-bit architectures.

However, migrating to a more sophisticated and capable processor is not a simple matter, given the resource and cost restrictions imposed on the majority of embedded systems. In the past, developers may have balked at the prospect of the extra power consumption that typically comes with a more advanced processor architecture. Per-unit prices have also traditionally tended to be higher, and developers were concerned about the implications of transitioning an established code base and whether they could keep a familiar tool-set when moving to a 32-bit processor, which naturally uses higher-level languages and tools than are typical for 8-bit designs. These factors are of key concern to engineers as they evaluate the trade-offs of moving up to a significantly more powerful processing environment.

Among vendors of 32-bit processors, ARM’s mobile roots meant that processor cores such as the ARM7 and ARM9 were able to address many of the concerns of embedded developers seeking to upgrade to 32-bit. They already offered many power-saving features, such as flexible power-down modes, as well as hardware memory acceleration to reduce the number of clock cycles per fetch. In addition, deep sub-micron process geometry, in some cases as low as 65nm, enables a cost-competitive implementation by minimizing die size. MCUs from various vendors presented a compelling migration path for many 8-bit and 16-bit designs, particularly where small increases in power and cost could be accepted.
 
On the other hand, interrupt handling in the ARM7, for example, supports only two levels, a normal interrupt (IRQ) and a fast interrupt (FIQ), whereas most 8-bit MCUs support significantly more. Some MCU vendors therefore implemented vectored interrupt control as a peripheral function, enabling nested interrupts and thus greater flexibility in diverse MCU applications.

Among the enhancements introduced by the ARM Cortex-M3 processor, multi-level interrupt handling is now implemented close to the core, saving clock cycles as well as gates and providing greater flexibility for developers of end products.

For example, up to 240 interrupts are now supported, thanks to a tightly integrated nested vectored interrupt controller (NVIC), and interrupt latency is reduced by eliminating the repeated stack pop and push actions when moving directly from one exception to another pending exception, a technique called ‘tail chaining’. Previously, up to 42 clock cycles could be required to move between active and pending interrupts. In Cortex-M3, interrupt handling is typically some 65% faster: an exception can now be serviced within a worst case of 12 cycles, and typically in only six cycles using tail chaining.
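
As a rough illustration of how this looks from application code, the sketch below sets two interrupt priorities through the CMSIS-Core functions NVIC_SetPriority and NVIC_EnableIRQ; the device header name and the TIMER0_IRQn and UART0_IRQn identifiers are hypothetical, vendor-defined items rather than part of the core itself.

    /* A minimal sketch of NVIC configuration from C using the CMSIS-Core
     * API. The device header and the IRQ identifiers are hypothetical and
     * would come from the silicon vendor's device support files. */
    #include "device.h"                      /* hypothetical vendor header */

    void configure_interrupts(void)
    {
        /* On Cortex-M3, a lower numeric value means a higher priority. */
        NVIC_SetPriority(UART0_IRQn,  1);    /* time-critical: can pre-empt the timer */
        NVIC_SetPriority(TIMER0_IRQn, 3);

        NVIC_EnableIRQ(UART0_IRQn);
        NVIC_EnableIRQ(TIMER0_IRQn);
    }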
 
The cost barriers to 32-bit migration are addressed through several other measures that further reduce the number of gates required to implement the core. These include a revised programmer’s model that reduces the number of registers from 37 to 12 without incurring any performance trade-off. In addition, tight integration of close system peripherals, including the configurable interrupt system as well as an integrated bus arbiter and advanced debug capability, eliminates much of the interfacing overhead that MCU developers would otherwise incur to add these basic functions. As a result, the core is implemented in only 33,000 gates; even including a minimal set of system peripherals, the total remains below 50,000 gates.
 
With this very compact implementation, MCU integrators have two options to offer their customers. Using a modern design rule, such as 0.13µm or smaller, a very small die size can be achieved even while offering a large and complex set of peripherals, enabling a very competitive price alongside a significant increase in complexity and capability. Power consumption is also very low, at 0.06mW/MHz for a Cortex-M3 fabricated in 0.13µm.
 
Alternatively, the very low gate count also enables fabrication on a more mature process such as 0.25µm, to achieve a very low-cost solution. In fact, this approach has been used to break the $1 price barrier for 32-bit MCUs.
 
Another advantage of the larger design rule is that transistor leakage is significantly lower, leading to more energy-efficient operation. Naturally, the 0.25µm process limits the operating frequency, but a 32-bit core can achieve more MIPS at a given frequency than a traditional 8-bit device. In any case, the lower frequency allows the use of low-cost packaging technologies such as lead-frame construction and leaded interconnect. Even in highly cost-sensitive applications, then, the low gate count of Cortex-M3 increases the appeal of 32-bit on cost grounds, as well as in terms of outright performance.
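
As a simple, hypothetical illustration of that per-cycle advantage, the 32-bit accumulation below compiles to a single Thumb-2 add instruction on Cortex-M3, whereas an 8-bit core such as the 8051 must chain several byte-wide additions with carry propagation to do the same work; the function and variable names are illustrative only.

    /* Illustrative only: a 32-bit accumulation. On Cortex-M3 the addition
     * compiles to a single Thumb-2 instruction; an 8-bit core must execute
     * a sequence of byte-wide adds, propagating the carry between them. */
    unsigned long accumulate(unsigned long total, unsigned long sample)
    {
        return total + sample;               /* one ADD on a 32-bit core */
    }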
 
Code density is traditionally seen as a weak point of software targeted at RISC MCUs. This is not the case with Cortex-M3, which uses the Thumb-2 instruction set architecture (ISA). In practice, code built for Cortex-M3 can achieve up to four times the density of an 8051-based implementation, implying that code memory will actually be smaller, not larger, for a Cortex-M3 based design. This removes a significant barrier to 32-bit migration.
 
A further improvement to exception handling is yet another advantage of the Thumb-2 ISA. In the past, an ARM7 or ARM9 processor executing 16-bit Thumb instructions had to switch back into 32-bit ARM mode to handle exceptions. This is no longer necessary with Thumb-2, which enables smooth transitions between main program execution and exception handling.
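
One practical consequence, sketched below, is that a Cortex-M3 exception handler can be written as an ordinary C function: the processor stacks the caller-saved registers automatically on entry, so no assembly wrapper or mode-switch veneer is needed. The handler name and counter variable are illustrative; the name simply has to match the corresponding entry in the device's vector table.

    /* A minimal sketch of an exception handler written as plain C. On
     * Cortex-M3 the hardware stacks R0-R3, R12, LR, PC and xPSR on entry,
     * so the compiler needs no special keywords or assembly veneer. */
    volatile unsigned long tick_count = 0;   /* illustrative counter */

    void SysTick_Handler(void)
    {
        tick_count++;                        /* simple periodic bookkeeping */
    }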
 
However, since Cortex-M3 supports only the Thumb-2 ISA, existing 32-bit ARM code is not binary compatible with Cortex-M3 designs, although the ARM Unified Assembler, introduced in version 3 of the RealView Developer Suite, enables the direct migration of assembler source code from ARM code to Thumb-2. For new developers choosing Cortex-M3 for their first 32-bit implementation, this is of little consequence. Leaving behind a shelf full of familiar development tools and techniques, on the other hand, matters rather more. Fortunately, the RealView Micro Development Kit (MDK) automates many of the basic tasks required to develop, verify and download software to the target hardware, including configuring the tools, writing device boot code, and analyzing include file dependencies.
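
To give a rough idea of what ‘device boot code’ involves on a Cortex-M3, the sketch below defines a minimal vector table whose first entry is the initial stack pointer and whose second is the reset handler. The section name, the _estack symbol and the use of GCC-style attributes are assumptions tied to a particular tool-chain and linker script, not anything specific to the MDK.

    /* Minimal Cortex-M3 vector table and reset handler, as a sketch only.
     * The ".isr_vector" section name and the _estack symbol are assumed to
     * be defined by the project's linker script. */
    extern unsigned long _estack;            /* top of the main stack */
    void Reset_Handler(void);
    int  main(void);

    __attribute__((section(".isr_vector")))
    void (* const vector_table[])(void) =
    {
        (void (*)(void))&_estack,            /* initial main stack pointer */
        Reset_Handler,                       /* entry taken on reset       */
    };

    void Reset_Handler(void)
    {
        /* Real boot code would also copy .data and zero .bss here. */
        main();
        for (;;) { }                         /* trap if main() ever returns */
    }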
 
These convenient features allow developers to focus on writing code using C. This, in itself, represents a new and more sophisticated approach to application development for deeply embedded systems.