Maturing MCUs and MPUs for edge AI offer new ways to optimize for different application requirements
By Laurent Helin, EMEA Business Development Manager (Embedded Solutions), and
Tom Foltier, AI Business Development Manager, Future Electronics
Read this to find out about:
- Implementing AI on the smallest low-power MCUs
- How AI applications can be scaled up across a family of MCUs and MPUs
- The highest-performing MCUs for AI implementation
Numerous embedded device manufacturers have concluded that the most advantageous way to implement machine learning (ML) and artificial intelligence (AI) applications is often locally, on the device itself. Edge AI implementation can make it easier than a wholly cloud-based AI system to meet end-user requirements for privacy, security, latency, cost and flexibility.
In response to demand for edge AI, big players in the microcontroller and microprocessor markets are building out product portfolios with enhanced AI capabilities. They are competing to provide the easiest routes to the development of valuable AI functions, and the highest performance, within the processing and memory constraints of an edge device.
Some marked differences in the approaches to AI enablement at the edge are already emerging. This means that design engineers can take advantage of the latest products and tools, introduced below, to best meet any one of three different application requirements:
- An AI solution for the most resource-constrained MCUs
- An AI solution for scalability across a range of products
- An AI solution for high performance
When the MCU has little memory and low processing power
At one time, the conventional wisdom was that any useful AI operations required a specialized graphics processing unit (GPU) consuming large amounts of power. In fact, the development of tinyML techniques, which shrink neural networks to fit tiny memory and compute budgets, has shown that even the smallest MCU can perform certain AI functions, such as object detection or limited speech recognition.
The design engineer who implements AI on a low-cost MCU, however, must take care to minimize the memory footprint of the application's inference engine, and to streamline computation so as not to overburden a CPU as small as an Arm® Cortex®-M0 core.
This is the design objective for which STMicroelectronics built the NanoEdge™ AI Studio development environment. It is ideal for common AI applications such as predictive maintenance, in which an MCU analyzes data on parameters including vibration, sound, pressure, magnetic field strength, or temperature changes. By detecting anomalies, or unusual patterns, in the stream of data, the AI system can help to predict when equipment is likely to fail. NanoEdge AI Studio can also be used for intelligent motion sensing and people counting.
ST's aim with the studio was to make ML development simple. It is a PC-based, push-button development environment requiring no advanced data science skills: even with no prior AI experience, any software developer can use it to create a tinyML library for anomaly detection, outlier detection, classification, or regression.
These libraries can be combined to create a complete edge AI solution. In predictive maintenance, for instance, anomaly or outlier detection could be used to detect a problem, classification to find the source of the problem, and regression to extrapolate information and provide insights that enable maintenance engineers to fix the problem faster.
Both learning and inference are performed in the MCU itself via the NanoEdge AI on-device learning library: this streamlines the edge AI development process, and enables useful AI functions to be realized even with small training data sets.
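The sketch below illustrates the typical call pattern for an anomaly-detection library generated by the studio: learn from signals captured in the machine's nominal state, then monitor live signals for anomalies. The function and macro names follow the documented `neai_anomalydetection_*` pattern of generated NanoEdge AI libraries, but exact signatures and buffer sizes depend on the library the studio emits; `read_vibration_buffer()` is a hypothetical sensor-acquisition helper, not part of the library.

```cpp
// Sketch: on-device learning then inference with a NanoEdge AI
// anomaly-detection library generated by NanoEdge AI Studio.
#include <cstdint>
#include <cstddef>
#include "NanoEdgeAI.h"  // header emitted with the generated library

extern void read_vibration_buffer(float *buf, size_t len);  // hypothetical

static float input[DATA_INPUT_USER];  // signal length set in the studio

int main() {
    neai_anomalydetection_init();  // initialize the model state

    // On-device learning: feed signals captured while the machine
    // runs in its known-good, nominal state.
    for (int i = 0; i < 100; ++i) {
        read_vibration_buffer(input, DATA_INPUT_USER);
        neai_anomalydetection_learn(input);
    }

    // Inference: the library returns a similarity score (0-100).
    // A score close to 100 resembles the learned nominal behavior;
    // a low score flags a potential anomaly.
    for (;;) {
        read_vibration_buffer(input, DATA_INPUT_USER);
        uint8_t similarity = 0;
        neai_anomalydetection_detect(input, &similarity);
        if (similarity < 90) {
            // raise a predictive-maintenance alert here
        }
    }
}
```

Because both the learn and detect calls run on the MCU itself, the same binary can adapt to each individual machine it is deployed on, rather than relying on a model trained offline from a large, generic data set.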
The studio integrates a variety of tools for ML algorithm development, including:
- Sampling finder tool for selecting the right data rate and data length
- Datalogger generator, which enables the developer to log data in a few clicks
- Data manipulation tool for datasets
- ML libraries benchmark, a tool to help the developer find the best balance between preprocessing and machine learning models
- Embedded emulator to test library performance live on an STM32 board, or from test data files
- Inference time estimation: this helps the engineer to select the ML model that provides the best balance between speed, accuracy and processing/memory overhead
- Validation tool to compare the libraries available in the studio
AI applications developed within the NanoEdge AI Studio environment run on STM32 MCU development boards with no configuration required, as shown in Figure 1. The applications can also be ported readily between STM32 MCUs, enabling the OEM to change its choice of Arm Cortex-M CPU core without wasting its investment in AI application development.
Fig. 1: The STEVAL-PROTEUS1 development board from STMicroelectronics supports temperature and vibration monitoring in industrial applications. (Image credit: STMicroelectronics)
Chips and tools for scaling AI implementation across a broad product family
While tinyML and similar techniques allow AI applications to run on the smallest and cheapest MCUs, some markets call for a broader range of AI capabilities: an OEM's marketing strategy might aim to meet demand for low-end, mid-range and premium products with a portfolio based on a single platform.
In this case, the OEM's developers will want to scale AI capabilities up or down to fit the resources available across an MCU and MPU portfolio. This is the promise of the eIQ® development environment from NXP Semiconductors, shown in Figure 2. The eIQ environment is compatible with components in the NXP EdgeVerse™ portfolio, which spans both MCUs and MPUs, including the i.MX RT crossover MCUs.
The eIQ system is a comprehensive AI environment, including inference engines, neural network compilers, libraries, and hardware abstraction layers.
For development, the eIQ toolkit features an intuitive GUI, the eIQ Portal, and development workflow tools, along with command-line host tool options. The toolkit also provides graph-level profiling to help developers optimize a neural network architecture for a specific EdgeVerse processor.
In the portal, developers can create, optimize, debug and export ML models, as well as import datasets and models. They can also train and deploy neural network models and ML workloads for vision applications. The portal's output feeds directly into the supported runtime inference engines.
NXP supplies examples which demonstrate various use cases, and provides guidelines for the different process flow options such as importing trained models based on popular frameworks, and creating, importing and augmenting datasets to develop models within the tools.
Fig. 2: The eIQ development environment is compatible with multiple ML inference engines. (Image credit: NXP Semiconductors)
Machine learning inference engines supported by the eIQ environment include:
- DeepViewRT: a proprietary NXP inference engine for i.MX RT crossover MCUs and i.MX 8 series applications processors
- TensorFlow Lite Micro: a faster, smaller inference engine than the familiar TensorFlow Lite, tailored to MCU-class devices (see the sketch after this list)
- TensorFlow Lite: a flexible inference engine for embedded applications running on Arm Cortex-A and Cortex-M CPUs, also suitable for VeriSilicon GPUs and neural processing units
- CMSIS-NN: a performance- and memory-optimized inference engine for Cortex-M CPU cores
- Glow: a neural network compiler which produces high-performance, memory-efficient inferencing on i.MX RT crossover MCUs
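As an illustration of how an exported model reaches one of these engines at run time, the sketch below shows the standard TensorFlow Lite Micro flow in C++. The model byte array `g_model_data`, the arena size, and the operator list are assumptions: they must match the model actually exported from the eIQ portal (or any TensorFlow Lite converter).

```cpp
// Minimal sketch of the TensorFlow Lite Micro runtime flow on an MCU.
// g_model_data is assumed to be the trained model exported as a C array.
#include <cstdint>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];  // exported model (assumption)

constexpr int kArenaSize = 20 * 1024;       // working memory for tensors
static uint8_t tensor_arena[kArenaSize];

int run_inference(const float *features, int n_features,
                  float *scores, int n_scores) {
  const tflite::Model *model = tflite::GetModel(g_model_data);

  // Register only the operators the model uses, to save flash.
  static tflite::MicroMutableOpResolver<2> resolver;
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter interpreter(model, resolver,
                                              tensor_arena, kArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

  // Copy input features into the model's input tensor and run.
  TfLiteTensor *input = interpreter.input(0);
  for (int i = 0; i < n_features; ++i) input->data.f[i] = features[i];
  if (interpreter.Invoke() != kTfLiteOk) return -1;

  // Read class scores back from the output tensor.
  TfLiteTensor *output = interpreter.output(0);
  for (int i = 0; i < n_scores; ++i) scores[i] = output->data.f[i];
  return 0;
}
```

The static tensor arena is the key to scalability here: the same application code can run on a small MCU or a large MPU simply by resizing the arena and swapping the model, which is the portability the eIQ workflow is designed to preserve.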
Combining hardware and software to maximize AI performance at the edge
While the ST NanoEdge AI Studio environment aims to make it easy to do useful AI work on the smallest of MCUs, AI on high-performance hardware can help sophisticated edge devices such as mobile robots to achieve very high levels of autonomy while limiting power consumption.
It is for these high-end application requirements that Renesas built the RZ/V2H AI MPU. This features a dynamically reconfigurable AI accelerator, the DRP-AI3, alongside quad Arm Cortex-A55 application cores running the Linux® operating system, and dual Cortex-R8 real-time processors. An additional dynamically reconfigurable processor can accelerate image processing in computer vision pipelines built with libraries such as OpenCV, and can perform the dynamics calculations required in robotics applications. The RZ/V2H offers the further advantage of very low power consumption, enabling many designs to eliminate heat sinks and fans.
Like ST and NXP, Renesas has recognized the need for wide-ranging software enablement of the AI/ML development process. This is provided through the RZ/V AI software development kit (SDK), a Linux OS-based system which includes Yocto Linux with a bootloader, a Linux kernel, a cross compiler, and a complete set of libraries for the DRP-AI3, graphics and codecs, as shown in Figure 3.
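The general shape of this workflow can be sketched as below. Everything in this example is a hypothetical placeholder illustrating the usual load/set-input/run/read-output pattern for offloading inference to an accelerator such as the DRP-AI3 from Linux application code; it is not the Renesas SDK's actual API, for which the SDK documentation should be consulted.

```cpp
// Illustrative pattern only: handing inference off to an AI accelerator
// from Linux application code. DrpAiRuntime and its methods are
// hypothetical placeholders with stub bodies, not the RZ/V AI SDK API.
#include <string>
#include <vector>

struct DrpAiRuntime {                                       // hypothetical
    bool load_model(const std::string &) { return true; }   // stub
    void set_input(int, const std::vector<float> &) {}      // stub
    void run() {}                  // stub: would execute on the DRP-AI3
    std::vector<float> get_output(int) { return {}; }       // stub
};

int main() {
    DrpAiRuntime runtime;

    // Load a model pre-compiled for the accelerator by the SDK's tools.
    if (!runtime.load_model("/opt/models/detector")) return 1;

    std::vector<float> frame(640 * 480 * 3, 0.0f);  // preprocessed image
    runtime.set_input(0, frame);
    runtime.run();  // application cores stay free while the accelerator works

    // Post-process detections on the Cortex-A55 application cores.
    std::vector<float> detections = runtime.get_output(0);
    return 0;
}
```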
Fig. 3: The Renesas AI solution includes powerful hardware backed by a dedicated AI SDK. (Image credit: Renesas)
Renesas accelerates AI computation with its proprietary DRP-AI3 engine and multiple high-speed CPU cores. Others take a different approach: for its fastest AI products, NXP has created the eIQ Neutron Neural Processing Unit (NPU), a highly scalable accelerator core for ML operations. Its AI-specific features include tightly coupled memory, DMA controllers, data mover cores, control cores, and weight compression/decompression technology.
The eIQ Neutron NPU supports a wide variety of neural network types, such as CNNs, RNNs, TCNs and transformer networks, and is fully integrated into the eIQ machine learning software development environment.
The eIQ Neutron NPU is available in the MCX N series of MCUs and in the new i.MX 95 applications processor family.
Highly integrated offerings from big MCU and MPU suppliers
This article shows that MCU and MPU manufacturers are reacting to surging demand for AI capability with a broad range of components that can perform various ML operations. The advantage for large component manufacturers is that they have the resources required to build a comprehensive, integrated offering which encompasses not only optimized hardware, but also a full suite of the enablement tools required to bring real AI applications to market as quickly and easily as possible.