Friday, August 27, 2021

Evaluating Rust for embedded firmware development

You might be wondering how Rust fits in the embedded firmware development world. So was I, so I decided to give it a try. My intention was to build a simple ecosystem to test with a STM32F401 MCU, similar to the one I made here in plain C, using libopencm3.

So the requirements for this environment are:

  1. A comfortable and easy to use build and deploy (flash) environment.
  2. A Hardware Abstraction Layer (HAL) supporting the most common ARM Cortex-M MCUs.
  3. An operating system allowing some basic multitasking.
  4. Some debugging facilities: traces and a gdb stub.

Did I manage to build an environment fulfilling all these requirements? Let's see.

Scope

This post will briefly cover the environment setup and the support for embedded MCUs. It will not deal with the Rust language itself or how it compares to C; you can have a look here, for example, if you are interested. Nor does it pretend to be a detailed guide, just a brief overview to give you an insight into the process, so you know what to expect. If you want more detailed information, a good place to start is The Embedded Rust Book.

Build environment

The toolchain is dead easy to set up: install rustup from your favorite distro package manager, and use it to automatically install (and keep updated) the pieces you need. For example, for the STM32F401 (an ARM Cortex-M4 MCU), you need to add the thumbv7em-none-eabihf compilation target:
$ rustup target add thumbv7em-none-eabihf
And that's all!
 
My original C build environment was Makefile based. After setting up the toolchain as explained earlier, you will already have Cargo installed; for embedded Rust it works just the same as with standard Rust development. And man, Cargo is just awesome. You have a single place to build your project, manage dependencies, install additional dev tools, flash the MCU, start the debugger... At this point, it is recommended to install the following additional Cargo tools to ease development:
  • cargo-binutils: you can use binutils directly from your ARM cross compiler, but installing this eases the process. For example, once installed, you can see your release binary size by invoking cargo size --release. Note that for the latest versions to work, you will also have to run rustup component add llvm-tools-preview.
  • cargo-flash: allows easily flashing the firmware to the target MCU. Uses probe-rs under the hood.
  • cargo-embed: allows using RTT (Real Time Transfer, more on this later) and starting the GDB server stub. Also uses probe-rs under the hood.
  • cargo-generate: for quick and easy creation of projects from templates (you can see it as a supercharged version of cargo init).

All these tools can be installed with two commands:

$ cargo install cargo-binutils cargo-flash cargo-embed cargo-generate
$ rustup component add llvm-tools-preview

To conclude this point, we have to give a big win to Rust: Cargo is far more powerful than my simple Makefile. It is also easy to use and does not get in the way like modern, bloated IDEs do. Configuring Cargo is easy (using the Cargo.toml and .cargo/config files), and if you need a complex build (for example, generating bindings from Protocol Buffers files), you can create build.rs scripts, which are supported exactly the same as in any other Rust project.
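As an illustration, a minimal .cargo/config for this kind of project might look like the following. This is a sketch based on the usual Cortex-M project template conventions; the file names (openocd.gdb, memory.x) are placeholders you would adapt to your own setup:

```toml
[build]
# Build for the Cortex-M4F target by default, so plain `cargo build` works
target = "thumbv7em-none-eabihf"

[target.thumbv7em-none-eabihf]
# Make `cargo run` launch a cross gdb session driven by an init script
runner = "arm-none-eabi-gdb -x openocd.gdb"
rustflags = [
  # Use the cortex-m-rt linker script (which in turn pulls in memory.x)
  "-C", "link-arg=-Tlink.x",
]
```

With this in place, cargo build, cargo run, cargo flash and friends all pick up the right target without extra flags.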

Hardware Abstraction Layer

For my C projects, I was using libopencm3 to abstract the hardware. This library is really great: small, very easy to use, supporting a lot of devices, with a ton of examples available. It has just one thing that can be seen as a defect (and for some projects a very big one): it is LGPLv3 licensed. And as dynamic linking is not a possibility in the MCU world, if you use this library you must either release your sources under a compatible license or release the object files of your project along with a script allowing them to be linked. This can be problematic in the commercial world, but not only there: it can also cause problems when mixing the library with ones under incompatible licenses. Please do not use the LGPL for libraries intended for embedded devices. Or at least not without a linking exception.

The HAL layer in Rust is split across several crates. There is a cortex-m crate abstracting the CPU and standard ARM peripherals (like the SysTick system timer and the NVIC interrupt controller), and there are many HAL crates abstracting the other MCU peripherals, grouped by device family. For example, for my STM32F401 I have to use the stm32f4xx-hal crate along with the aforementioned cortex-m crate. I have yet to thoroughly evaluate these crates, but code quality looks great. They use the builder pattern to make peripheral configuration easy, and they use idiomatic Rust features to implement zero-cost abstractions that prevent some programming errors without incurring any CPU usage penalty (checks are done at compile time). For example, to configure a GPIO as an input with an internal pull-up, this information (input, internal pull-up) is embedded in the GPIO pin type, so if you try to use the pin as an output, you get a compiler error.
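To illustrate the idea, here is a toy, host-runnable sketch of the type-state pattern these HAL crates use. This is not the actual stm32f4xx-hal API (the names and register accesses are mine); it only shows how encoding the pin mode in the type turns misuse into a compile error at zero runtime cost:

```rust
use std::marker::PhantomData;

// Toy pin states: the mode lives in the type, not in a runtime field.
struct Input;
struct Output;

struct Pin<MODE> {
    _mode: PhantomData<MODE>,
}

impl Pin<Input> {
    fn new() -> Self {
        Pin { _mode: PhantomData }
    }

    // Consuming `self` and returning a differently-typed pin costs nothing
    // at runtime: both structs are zero-sized.
    fn into_output(self) -> Pin<Output> {
        Pin { _mode: PhantomData }
    }

    fn is_high(&self) -> bool {
        true // a real HAL would read an input data register here
    }
}

impl Pin<Output> {
    fn set_high(&mut self) {
        // a real HAL would write an output data register here
    }
}

fn main() {
    let pin = Pin::<Input>::new();
    println!("input reads high: {}", pin.is_high());

    let mut pin = pin.into_output();
    pin.set_high();
    // pin.is_high(); // compile error: `is_high` only exists on Pin<Input>
}
```

The real HAL crates do essentially this, with the mode types (Input<PullUp>, Output<PushPull>, alternate functions...) generated per pin.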

And here I must declare another win for Rust. libopencm3 is great, but its LGPLv3 license can be troublesome, while cortex-m is MIT/Apache licensed and the stm32fxxx-hal crates are usually 0BSD licensed. Also, although libopencm3 is very high quality, the Rust HAL crates leverage the Rust features C lacks to make your life easier (once you learn enough Rust to fix the build errors 😅). Some people could argue that I could use CMSIS directly instead of libopencm3, but it is much lower level. Or that I could use the HAL provided by STMicroelectronics, but it is closed source and its code quality does not seem as good. So, point for Rust.
 

Multitasking

For simple embedded projects, I just use interrupt-based multitasking: you have your main loop doing low priority stuff in the background (or sometimes not even that, a plain empty loop idling or entering a low power mode) and that loop is preempted to process external events using hardware interrupts (a timer fires, or you receive audio samples to process via DMA interrupt, or you receive a command via any communication interface, someone pushes a button, etc.).
 
For more complex projects, I like using an OS that gives you thread implementation and synchronization primitives (semaphores, locks, queues, etc) to allow separating the components in different and almost independent modules that can be easily plugged/unplugged/modified without affecting the rest of the system. In the past I used RTX51-TINY and TI-RTOS (previously known as SYS/BIOS or DSP/BIOS). Nowadays I tend to use FreeRTOS (I have yet to test ChibiOS and Contiki, but FreeRTOS is good enough for me).

Browsing crates.io, it seems there is no widely accepted classic multithreading OS implementation for embedded Rust. I found two shim layers that wrap FreeRTOS under a Rust interface, but they do not seem widely used and look almost abandoned (the latest release was more than a year ago). So I decided to test what nowadays seems to be the most widely accepted solution for concurrency in embedded Rust: RTIC (Real-Time Interrupt-driven Concurrency).

RTIC is elegant and minimalistic. It also looks great for hard real-time applications: the simple dispatcher implements the Immediate Ceiling Priority Protocol (ICPP) to schedule tasks with different priorities in a deterministic manner. This allows analyses like Worst-Case Execution Time (WCET), mandatory in critical systems. Unfortunately, RTIC does not implement classical multithreading: RTIC tasks respond to external events (a button press, new data received, a timer wrap) and must return when their processing is complete (i.e. no infinite loops as in typical threading approaches). This can be enough for many systems, but in my opinion it makes it more difficult to separate components in non-critical systems. In fact, if we compare RTIC with just using a background task plus interrupt handlers in Rust, about the only thing RTIC improves is the sharing of data between the different interrupts and the background task. Other than that, the approach is almost the same!
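To give an idea of what this looks like, here is a sketch of an RTIC application (assuming RTIC 0.5-style syntax on an STM32F4 device; the exact attribute names vary between versions, and this will not compile on a host):

```rust
// Sketch only: assumes the rtic and stm32f4xx-hal crates in a no_std firmware.
#[rtic::app(device = stm32f4xx_hal::stm32)]
const APP: () = {
    struct Resources {
        // Shared between the task below and anything else that claims it;
        // RTIC derives the minimal locking from the task priorities (ICPP).
        counter: u32,
    }

    #[init]
    fn init(_cx: init::Context) -> init::LateResources {
        init::LateResources { counter: 0 }
    }

    // Bound to a hardware interrupt: runs on the event, then must return.
    #[task(binds = EXTI0, resources = [counter])]
    fn on_button(cx: on_button::Context) {
        *cx.resources.counter += 1;
        // ...process the event and return: no infinite loop allowed here.
    }

    #[idle]
    fn idle(_cx: idle::Context) -> ! {
        loop {
            // low-priority background work (the default idle just sleeps)
        }
    }
};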

Another possibility for achieving concurrency in embedded Rust, which I have not explored, is the async/await route. It seems recent additions to the compiler allow asynchronous programming in embedded Rust. But I have to admit I am not the biggest fan (yet) of async/await semantics, so I decided not to explore this route (yet).

Reaching a conclusion in this subsection is not easy, but I think I will give the point to classic C + FreeRTOS. The FreeRTOS and ChibiOS layers in Rust seem barely used and developed and the RTIC implementation does not cover all my needs. But this might change in the near future: maybe the async/await route is the way to go. Or maybe the FreeRTOS support will improve.

Debugging

Having a comfortable debugging environment is very important for embedded systems. I do most of my debugging using simple traces, but I like using gdb when things get ugly and those bugs that make you scratch your head appear.

Debug trace

My trace implementation approach for the C environment is pretty straightforward: use a UART to log the data. To avoid the logs interfering with the program flow as much as possible, I use DMA to get the data out of the chip. I have also defined some fancy macros implementing logging levels (with color support) that do not even compile the logging code if the log level (defined as a constant at compile time) is below the threshold. This is very useful because it allows flooding the logs with verbose and debug messages in debug builds, while easily removing all those logs in release builds just by changing a build flag (this way the log code does not even make it into the final binary).
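The same trick carries over to Rust: with the threshold as a constant, the disabled log branches are constant-false and the optimizer drops them from the binary. A host-runnable sketch (the names and macro are mine, not from any logging crate):

```rust
// Log levels ordered by verbosity. LOG_LEVEL is fixed at compile time,
// so any `log!` call above the threshold is dead code the compiler removes.
const LEVEL_ERROR: u8 = 0;
const LEVEL_INFO: u8 = 1;
const LEVEL_DEBUG: u8 = 2;

const LOG_LEVEL: u8 = LEVEL_INFO; // build-time threshold

macro_rules! log {
    ($level:expr, $($arg:tt)*) => {
        // Constant condition: folded at compile time, no runtime check.
        if $level <= LOG_LEVEL {
            println!($($arg)*);
        }
    };
}

fn main() {
    log!(LEVEL_ERROR, "always compiled in");
    log!(LEVEL_INFO, "kept: at the INFO threshold");
    log!(LEVEL_DEBUG, "eliminated: never reaches the binary");
}
```

In a real project you would set the threshold via a Cargo feature or build script instead of editing a constant.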

I could have used a similar approach for the debug trace in Rust, but I wanted to try a more modern approach that avoids using additional hardware (the UART). So I tried two approaches: gdb semihosting and RTT.

To log using gdb semihosting, you just use the hprintln!() macro included in the cortex-m-semihosting crate. You must enable semihosting in the gdb debugging session, but this is typically taken care of transparently by a gdb init script. This approach is easy, it works, and it is compatible with gdb debug sessions (in fact, it needs a gdb debug session to work!). But unfortunately it has some inconveniences: the hprintln!() macro crashes your program if the debug probe is not connected! Initially I thought this was because I was unwrapping the hprintln!() calls (e.g. hprintln!("Hello World").unwrap();), so that hprintln!() was returning an error and the unwrap() was causing the program to panic. But replacing the unwrap() calls with ok() did not fix the problem, so it seems hprintln!() simply hangs the program if the debug probe is not connected. Another problem with hprintln!() is that it is very slow: when one of these calls is invoked, the CPU is stopped, the debug probe has to realize there is data to transfer, transfer the data, and resume the CPU. This can take several milliseconds!

To log using RTT (Real-Time Transfer) you need the cargo-embed tool installed on the host; you have to initialize RTT by calling rtt_init_print!() and then use the rprintln!() or writeln!() macros to log. RTT is very fast and allows defining several channels, each input or output, blocking or non-blocking. So you are not restricted to a single slow blocking output channel as with semihosting. This is much better than hprintln!(), but unfortunately using RTT and GDB at the same time seems troublesome. Support for using both at once was added a month ago (version 0.11.0) and in my experience it does not work very well: when I enabled both, the interleaved GDB and RTT output gets borked. I hope this improves in the future (or maybe I have just not been able to configure it properly?). Another problem with RTT is that sleep modes break the RTT connection (at least on the STM32F401). This is especially annoying when using RTIC, because the default idle task puts the MCU to sleep, so I had to override the idle task to keep the MCU awake in order for RTT to work.
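For reference, the minimal RTT setup on the firmware side looks roughly like this (a sketch assuming the rtt-target crate, which is what cargo-embed talks to; not host-runnable):

```rust
// Sketch: assumes the rtt-target crate in a no_std firmware.
use rtt_target::{rtt_init_print, rprintln};

fn setup_logging() {
    rtt_init_print!(); // set up the default RTT output channel in RAM
    rprintln!("hello over RTT"); // fast: just writes to a ring buffer
}
```

The probe then drains the ring buffer over SWD in the background, which is why the CPU barely notices the logging.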

GDB stub

I was able to set up and use the GDB stub with both log configurations (semihosting and RTT). If you are using semihosting, the usual way to start a debug session is to first start openocd in a separate terminal, and then either manually run cross gdb or configure the .cargo/config file so it starts automatically via cargo run. When using RTT, you start the gdb server stub by invoking cargo embed (instead of running openocd), and then directly run cross gdb or cargo run as before. The only thing to highlight here is that, as I wrote above, using RTT and GDB at once breaks the RTT output.

Debugging the blinker with cross gdb under cgdb

So the point here goes to... no one (or to both, it's a tie). I had no problem setting up GDB, and although I had some trouble with log traces, nothing prevents me from using a UART to send the trace data as I did in my plain C project template. By the way, I have not bothered to try, but I should also be able to use RTT from my plain C project template, so as I wrote, it's a tie here.
 

Wrapping up

So far, it seems Rust is a very good alternative to C for embedded firmware development on the supported MCUs (shame there is no WiFi support for the esp32 platform). The build environment is just awesome, support for ARM Cortex-M devices looks very complete, and you should be able to debug the same way you do with your C projects. It might lack a bit when it comes to concurrency with multithreading, but for not very complex programs RTIC should be enough, and I am sure threading support will eventually catch up. If you feel adventurous, you can also try the asynchronous approach using async/await. Please let me know in the comments if you do.
 
Happy hacking!
