Trading modularity for performance and portability

TL;DR By merging the interface and implementation of a component into a single file, marking all functions and data as static, and using a few C macros, you can help your C compiler generate highly optimized code.

Engineering is all about trade-offs! A problem can have many different solutions, even many acceptable solutions.

An acceptable solution is one that satisfies the predefined design goals. Defining the design goals is, obviously, part of the designer's job.

I think performance and portability are the most important goals for embedded designs, due to the constrained nature of embedded systems.

So a design that sacrifices modularity is an acceptable solution for me.

Design

So the idea here is to get rid of separate files for the interface and implementation of a software component and merge them into one. By software component, I mean a logical unit that does some useful task, e.g., a "touch component" that interacts with a touch IC.

Physical layout design

As an example, the physical layout of a project with three components (comp-a, comp-b, and comp-c) looks like this (a simplified view; a real-world project needs more files):

\
 +-- platform/
    \
     +-- ...
 +-- comp-a.h
 +-- comp-a.test.c
 +-- comp-b.h
 +-- comp-b.test.c
 +-- comp-c.h
 +-- comp-c.test.c
 +-- app.h
 +-- app.test.c
 +-- main.c

Other than main.c, all .c files are test code, and they run on the build machine (not on the target microcontroller, obviously).

Each component comprises:

an "interface/implementation file" (e.g., comp-a.h),
a unit test file (e.g., comp-a.test.c)

app.h includes and configures all required components, and provides the app_setup and app_loop functions.

There's also an integration test for the whole app in the app.test.c file.

Program design

There are two phases in a program:

Initialization
Main loop

// main.c

#include "app.h"

int
main()
{
  app_setup();  // initialization
  while (1)
    app_loop(); // main loop
  return 0;
}

app.h provides app_setup and app_loop.

Platform abstraction

Each component expects a set of macros to be defined. These macros expose the functionality of the platform (abstraction).

Imagine that comp-a needs to do IO using the GPIO functionality of the platform. We can pass that functionality to comp-a.h like this (assuming that gpio_write is a function defined by the platform):

// app.h

#define COMP_A_GPIO_WRITE(port, pin) gpio_write(GPIO_PORTS_##port, pin)
#include "comp-a.h"
#undef COMP_A_GPIO_WRITE

comp-a.h can provide an explicit error message when the expected macro is not defined:

// comp-a.h

#ifndef COMP_A_GPIO_WRITE
#error COMP_A_GPIO_WRITE should be defined by user
#endif

This way we can improve the modularity :)
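To make this concrete, here's a minimal sketch that you can compile on your development machine. Everything is squeezed into one listing so it stands on its own: the "platform" part is a made-up stand-in (gpio_state, the GPIO_PORTS_A/GPIO_PORTS_B values and the behavior of gpio_write are my assumptions for illustration), and comp_a_led_on is a hypothetical function that a comp-a.h might provide.

```c
#include <stdint.h>

/* --- Hypothetical platform layer (stand-in for a real GPIO driver) --- */
enum { GPIO_PORTS_A, GPIO_PORTS_B, GPIO_PORTS_COUNT };

static uint32_t gpio_state[GPIO_PORTS_COUNT]; /* one bit per pin */

/* Assumption: "write" means "drive the pin high". */
static void
gpio_write(int port, int pin)
{
  gpio_state[port] |= (uint32_t)1 << pin;
}

/* --- What app.h does right before including comp-a.h --- */
#define COMP_A_GPIO_WRITE(port, pin) gpio_write(GPIO_PORTS_##port, pin)

/* --- Body of a hypothetical comp-a.h, inlined here --- */
#ifndef COMP_A_GPIO_WRITE
#error COMP_A_GPIO_WRITE should be defined by user
#endif

/* The component only knows the macro, not the platform behind it. */
static void
comp_a_led_on(void)
{
  COMP_A_GPIO_WRITE(B, 3); /* expands to gpio_write(GPIO_PORTS_B, 3) */
}

/* --- Back in app.h: the macro is no longer needed --- */
#undef COMP_A_GPIO_WRITE
```

The macro is expanded while comp_a_led_on is being compiled, so the #undef afterwards doesn't affect the component. It only keeps the name from leaking into the rest of app.h.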

Now, let's look at a concrete example: a component that interacts with a touch IC over I2C (even though I hate the STM32 HAL library, I'll use it for this example):

// app.h

#define TOUCH_I2C_TX(adr, buf, size) \
  (HAL_I2C_Master_Transmit(&hi2c1, (adr), (buf), (size), 1) == HAL_OK)
#define TOUCH_I2C_RX(adr, buf, size) \
  (HAL_I2C_Master_Receive(&hi2c1, (adr), (buf), (size), 1) == HAL_OK)
#include "touch.h"
#undef TOUCH_I2C_RX
#undef TOUCH_I2C_TX

// Our *expected* declarations for `touch_init` and `touch_read` functions
// which are defined in `touch.h`.
// This is useful for catching possible mismatches.

static void
touch_init(void);

static int
touch_read(uint8_t* pressed, uint8_t* released);

We expect touch.h to give us two functions: touch_init and touch_read.

I put the declarations of touch_init and touch_read after the inclusion of touch.h to catch possible errors early and explicitly. If the definitions in touch.h don't match these declarations, the compiler reports conflicting types. This is a double-check!
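Here's a rough sketch of what the inside of such a touch.h could look like, wired to a mock I2C bus instead of the STM32 HAL so that it builds on a development machine. The device address, the register number, the two-byte reply layout and the "1 on success, 0 on failure" return convention are all made up for illustration. They don't describe any real touch IC. Only touch_read is shown.

```c
#include <stdint.h>
#include <string.h>

/* --- Hypothetical mock of the I2C bus --- */
static uint8_t mock_tx_last[4]; /* last bytes "sent" to the device */
static uint8_t mock_rx_data[2]; /* canned reply served on receive  */

static int
mock_i2c_tx(uint16_t adr, const uint8_t* buf, uint16_t size)
{
  (void)adr;
  if (size > sizeof mock_tx_last)
    return 0;
  memcpy(mock_tx_last, buf, size);
  return 1;
}

static int
mock_i2c_rx(uint16_t adr, uint8_t* buf, uint16_t size)
{
  (void)adr;
  if (size > sizeof mock_rx_data)
    return 0;
  memcpy(buf, mock_rx_data, size);
  return 1;
}

/* --- app.h side: same macro names, different backend --- */
#define TOUCH_I2C_TX(adr, buf, size) mock_i2c_tx((adr), (buf), (size))
#define TOUCH_I2C_RX(adr, buf, size) mock_i2c_rx((adr), (buf), (size))

/* --- Body of a hypothetical touch.h, inlined here --- */
#define TOUCH_ADR_ 0x5Au        /* made-up device address  */
#define TOUCH_REG_STATUS_ 0x00u /* made-up status register */

/* Returns 1 on success, 0 on a bus error (assumed convention). */
static int
touch_read(uint8_t* pressed, uint8_t* released)
{
  uint8_t reg = TOUCH_REG_STATUS_;
  uint8_t data[2];

  if (!TOUCH_I2C_TX(TOUCH_ADR_, &reg, 1))
    return 0;
  if (!TOUCH_I2C_RX(TOUCH_ADR_, data, 2))
    return 0;
  *pressed = data[0];
  *released = data[1];
  return 1;
}

#undef TOUCH_REG_STATUS_
#undef TOUCH_ADR_

/* --- Back in app.h --- */
#undef TOUCH_I2C_RX
#undef TOUCH_I2C_TX

/* The double-check: this must agree with the definition above. */
static int
touch_read(uint8_t* pressed, uint8_t* released);
```

If you change the return type or a parameter in that last declaration, the build stops with a "conflicting types" error, which is exactly the early failure we want.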

Other components are glued together in the same manner in app.h. So app.h would be something like this:

// app.h

#define COMP_A_GPIO_WRITE(port, pin) gpio_write(GPIO_PORTS_##port, pin)
#include "comp-a.h"
#undef COMP_A_GPIO_WRITE

// ...
#include "comp-b.h"
// ...

// ...
#include "comp-c.h"
// ...

static void
app_setup(void)
{
  // ...
}

static void
app_loop(void)
{
  // ...
}

Improving modularity

To improve modularity and reduce user errors, a component must not leak data, macro, or function definitions (other than the expected ones).

Imagine comp-a defines a COMP_A_DATA_ variable (a static, global variable for storing internal data inside the component), and we don't want to leak it outside of comp-a.h.

To accomplish this aim, we can use macros!

#ifndef COMP_A_H_
#define COMP_A_H_

static struct
{
  // ...
} COMP_A_DATA_;

// ...

#define COMP_A_DATA_ PRIVATE_

#endif

The preprocessor will replace every COMP_A_DATA_ identifier that appears after the inclusion of comp-a.h with PRIVATE_, so the user gets a compilation error for the undefined identifier PRIVATE_ if they inadvertently use that variable. Easy!

This technique applies to functions, too. For macros you can use #undef.
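As a small sketch of the same trick applied to a function, here is a made-up "counter" component (all names are hypothetical) that hides both its data and an internal helper, and exposes only counter_bump:

```c
#ifndef COUNTER_H_
#define COUNTER_H_

/* Internal state of the component. */
static struct
{
  unsigned value;
} COUNTER_DATA_;

/* Internal helper: saturate at 10. */
static unsigned
counter_clamp_(unsigned v)
{
  return v > 10u ? 10u : v;
}

/* The only function the user is supposed to call. */
static unsigned
counter_bump(void)
{
  COUNTER_DATA_.value = counter_clamp_(COUNTER_DATA_.value + 1u);
  return COUNTER_DATA_.value;
}

/* From here on, both internal names turn into PRIVATE_. */
#define COUNTER_DATA_ PRIVATE_
#define counter_clamp_ PRIVATE_

#endif

/* User code after the "inclusion":
 *   counter_bump();     // fine
 *   counter_clamp_(3);  // error: PRIVATE_ is not declared
 */
```

The definitions above the two #define lines are compiled before the names are redefined, so the component itself keeps working. Only code that comes later sees PRIVATE_.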

Advantages

Compiler can apply more eager optimizations

Let's take a look at the magic of the static keyword on functions in this example on Compiler Explorer. It's not real-world code; it just shows the effect of static in function declarations.

If you uncomment line 5 of the source code, you'll see that the compiler replaces function calls with simple jumps, which are generally cheaper. (Because the function foo is not that complicated, the benefit is not obvious.)

The reason is that the optimizer knows for sure that this function has internal linkage, so there's no need to preserve the function's calling semantics as defined by the ABI.

But I encourage you to look at the disassembly of your own code (arm-none-eabi-objdump -d -j .text your-firmware.elf). You'll be amazed by the improvement in the generated code.
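If you want something small to experiment with, here is a tiny stand-in (my own hypothetical snippet, not the code behind the Compiler Explorer link). Compile it with optimizations on (e.g., gcc -O2 -S), then remove the static keyword and compile again, and compare the two assembly outputs.

```c
/* Internal linkage: every caller of scale() is visible in this file. */
static int
scale(int x)
{
  return x * 2 + 1;
}

/* External linkage: other translation units may call this one. */
int
sum_scaled(int a)
{
  return scale(a) + scale(a + 1);
}
```

With static, the compiler is free to inline scale into sum_scaled and drop the standalone copy entirely. Without static, it typically has to keep an ABI-conforming copy of scale around, because some other translation unit might call it.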

Debugging on development machine

Because of the platform abstraction, the code is very testable (you can mock the platform easily). So you can track down bugs on your development machine, using your favorite debugger, instead of on the target microcontroller.

Static and dynamic analysis tools

Regardless of your target compiler, you can use the analysis tools provided by gcc and clang, and employ different sanitizers to catch errors dynamically.

You can compile your test code with gcc (version 10 or later) using -fanalyzer to take advantage of gcc's static analyzer (more info here and here).

LLVM has the Clang Static Analyzer project.

You can also compile the test code with different sanitizers like AddressSanitizer (using option -fsanitize=address), UndefinedBehaviorSanitizer (using option -fsanitize=undefined), etc. Check the manual of your compiler for more info. There are a ton of sanitizers which can help you detect errors before the code hits the target microcontroller, thanks to "platform abstraction"!
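Here's a sketch of how a component could be exercised on the host with a mock platform. The "log" component, the LOG_UART_PUT macro and the mock UART are all hypothetical names invented for this example. The build commands in the comment use the options mentioned above; the file name is just a placeholder.

```c
/* Possible host builds (placeholder file name):
 *   gcc   -g -fsanitize=address,undefined log.test.c
 *   clang -g -fsanitize=address,undefined log.test.c
 *   gcc   -fanalyzer -c log.test.c
 */
#include <stddef.h>
#include <stdint.h>

/* --- Mock platform: a UART that accepts 8 bytes, then fails --- */
static uint8_t mock_uart_buf[8];
static size_t mock_uart_len;

static int
mock_uart_put(uint8_t byte)
{
  if (mock_uart_len >= sizeof mock_uart_buf)
    return 0; /* simulate a transmit failure */
  mock_uart_buf[mock_uart_len++] = byte;
  return 1;
}

/* --- What the test file does before including the component --- */
#define LOG_UART_PUT(byte) mock_uart_put(byte)

/* --- Body of a hypothetical log.h, inlined here --- */

/* Sends a string byte by byte; returns the number of bytes sent. */
static size_t
log_write(const char* s)
{
  size_t n = 0;

  while (s[n] != '\0' && LOG_UART_PUT((uint8_t)s[n]))
    ++n;
  return n;
}

#undef LOG_UART_PUT
```

A test can now call log_write a couple of times and check both the happy path and the failure path (the mock runs out of room after 8 bytes), which is tedious to provoke on real hardware. If the component ever wrote past a buffer, AddressSanitizer would report it right on the development machine.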

Disadvantages

Having implementation inside files with .h suffix makes some people grumpy!
Use of macros (some people just don't like macros!)
Requires the programmer to be more disciplined in writing code
Because of more eager optimizations of the compiler, debugging can become harder.

Conclusion

Summary of what we did:

Hardware abstraction using macros (instead of function pointers)
Put all code into a single compilation unit using the C preprocessor
Applied simple macro tricks to improve maintainability

The advantages of this approach outweigh the disadvantages by at least an order of magnitude! So this is a good trade-off for me.

Acknowledgements

I want to thank Ali-Reza Chegini and Hamid Rostami for reading the draft of this post and giving useful feedback.

Mohammad-Reza Nabipoor
