In-Depth Analysis of an ARM Cortex-M4 Program

I've been working with ARM Cortex-M4 microprocessors for a little over a year and a half now. Despite reading popular embedded books and blogs, I have yet to come across an end-to-end overview of the creation and execution of a program.

STM32F4 Chip

For my own education and reference I've created such a document myself and am posting it here as I hope it may be helpful to others as well.

Development Environment

I'm using an STM32F411-Discovery development board. This board comes with an ARM Cortex-M4F processor and various peripherals. It's packaged with 512K of flash memory and 128K of RAM. Currently (Winter 2018), the board can be purchased for around $15 USD plus shipping.

I'm also using the GNU Arm Embedded Toolchain for building and debugging. As I find an IDE helpful at times for debugging, I sometimes also use System Workbench for STM32. System Workbench is basically Eclipse with the same GNU Arm Embedded Toolchain mentioned above.

The Program

Our program will simply repeatedly flash a couple of the LEDs on the development board as shown in the animated gif image below. The code for this program can be found in my GitHub repo here.

STM32F411 LEDs Blinking

The Source Files

The program consists of three source files:
  • startup.s: Contains necessary startup code. It's written in assembly and is assembled into an object file using the GCC ARM assembler.
  • main.c: The actual program. It's compiled into an object file using the GCC ARM C compiler.
  • LinkerScript.ld: Necessary for linking the object files into a binary.

Building Source

Linking results in an ELF (executable and linkable format) file. Using the arm-none-eabi-objcopy tool included in the GNU Arm Embedded Toolchain, the ELF file is converted to a binary file (out.bin). The binary file can then be "flashed" to the STM32F411 development board using a tool like the STM32 ST-Link Utility. The bin file is examined further below, but first let's discuss each of the source files.

The main.c Source File

The C program (main.c) is fairly simple. Using an interrupt, it periodically blinks LEDs on the STM32F411 development board. I won't explain the code in detail here as the source file is well commented, explaining what each line does.

You'll notice there are some poor programming practices in this code - I have some unnecessary variables - some global. These exist purely to show how global, scoped, initialized and uninitialized variables are dealt with during the compilation, linking and execution of the program.

The two main parts of the C program are an initialization function (named Init) and an interrupt handler. The initialization function sets up the GPIO pins the LEDs are connected to and also initializes the SysTick interrupt. The SysTick interrupt handler toggles the LED lights.

The interrupt is configured to occur periodically, about once every second. Since each interrupt toggles the LEDs, they complete a full on/off cycle roughly every two seconds.
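
To make the toggle-on-every-tick behavior concrete, here is a minimal host-side sketch of what the handler does. It is not the code from main.c: `fake_gpiod_odr`, `led_mask`, `tick_count` and the chosen pin bits are stand-ins I made up so the logic can run on a PC, whereas the real handler writes to the memory-mapped GPIOD output data register.

```c
#include <stdint.h>

/* Host-side stand-in for the GPIOD output data register (assumption:
 * on the board this is a memory-mapped register, not a variable). */
static volatile uint32_t fake_gpiod_odr = 0;

/* Hypothetical mask selecting the LED pins to toggle. */
static const uint32_t led_mask = (1u << 12) | (1u << 13);

/* Counts how many simulated SysTick interrupts have fired. */
static uint32_t tick_count = 0;

/* Same shape as the handler described above: XOR the LED bits so they
 * flip on every tick, then count the tick. */
void SysTick_Handler_sim(void)
{
    fake_gpiod_odr ^= led_mask;
    ++tick_count;
}

uint32_t leds_state(void) { return fake_gpiod_odr; }
uint32_t ticks_seen(void) { return tick_count; }
```

Calling `SysTick_Handler_sim()` twice turns the LED bits on and then off again, which is why a one-second tick gives a two-second blink period.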

The startup.s Source File

The startup file (startup.s) contains the following:
  • The reset handler function. This is called at startup (or reset) of the dev board. (Lines 38-45)
  • Code to copy the data segment values into RAM. (Lines 47-60)
  • Code to zero-fill the .bss segment. (Lines 62-71)
  • The definition of the interrupt vector table. (Lines 78-196)
  • The call to the "main" function once initialization is done. (Lines 73-74)
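
The two initialization loops in the list above can be sketched in C. This is only an illustration of the idea, not a translation of startup.s: the function names and parameters are my own, and on the real chip the source and destination addresses come from symbols defined in the linker script.

```c
#include <stdint.h>

/* Copy the initial values of the data section from where they are stored
 * (flash on the real chip) to where the program uses them (RAM), one
 * 32-bit word at a time. The destination range is [data_start, data_end). */
void copy_data_init(const uint32_t *load_start, uint32_t *data_start,
                    const uint32_t *data_end)
{
    const uint32_t *src = load_start;
    for (uint32_t *dst = data_start; dst < data_end; ++dst) {
        *dst = *src++;
    }
}

/* Zero-fill the bss range [bss_start, bss_end). */
void fill_zero_bss(uint32_t *bss_start, const uint32_t *bss_end)
{
    for (uint32_t *p = bss_start; p < bss_end; ++p) {
        *p = 0;
    }
}
```

Only after both loops have run is it safe to call main, since C code assumes initialized globals hold their values and uninitialized globals are zero.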

The Linker Script

The Linker Script (LinkerScript.ld) contains the following:
  • Highest stack address to be used during execution. (Lines 55-56)
  • Size of RAM and ROM. (Lines 58-63)
  • Locations of the interrupt vector table, text section, data section and bss section. (Lines 65-133)

The Bin File

This resulting bin file is what gets flashed to the ROM of the STM32F411 microcontroller. Below is a hex view of the bin file with specific sections identified by color:

Hex View of Bin File

I've learned from Jonathan Valvano's book "Introduction to ARM Cortex-M Microcontrollers" (5th Ed, section 3.1.2) that upon start-up, the 32-bit value at location zero is loaded into the stack pointer register and the 32-bit value at location 4 is loaded into the program counter register. These values are shown in the hex view of the bin file above.

These values make sense as the highest address of the stack is specified as 0x20020000 in the linker script and the stack grows downward. Later on, when we run the program, we'll see how the program counter address (0x0800027D) corresponds to the location of the Reset_Handler function defined in startup.s.
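
Since the Cortex-M4 is little-endian, the bytes of each of these two words appear in the bin file least significant byte first. Here is a small sketch of decoding them from the first 8 bytes of the file. The function names are my own, and the caller is assumed to pass a buffer of at least 8 bytes.

```c
#include <stdint.h>

/* Read a 32-bit little-endian word from a byte buffer at the given offset. */
uint32_t read_le32(const uint8_t *image, uint32_t offset)
{
    return (uint32_t)image[offset]
         | ((uint32_t)image[offset + 1] << 8)
         | ((uint32_t)image[offset + 2] << 16)
         | ((uint32_t)image[offset + 3] << 24);
}

/* Word at offset 0: the value loaded into the stack pointer at start-up. */
uint32_t initial_sp(const uint8_t *image) { return read_le32(image, 0); }

/* Word at offset 4: the value loaded into the program counter at start-up. */
uint32_t reset_vector(const uint8_t *image) { return read_le32(image, 4); }
```

With the values from this article, the byte sequence 00 00 02 20 decodes to 0x20020000 and 7D 02 00 08 decodes to 0x0800027D.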

The next section of the bin file is the interrupt vector table. This simply holds memory addresses of functions to be called when an interrupt or exception occurs. Since the only interrupt we've assigned for use is the SysTick Interrupt, that is the only one that is defined. You'll see it has a memory address of 0x08000189. This is the address of the SysTick_Handler() function defined in main.c.
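
Each vector table entry is one 32-bit word, and entry N sits N words from the start of the table. In the ARMv7-M exception numbering, SysTick is exception 15, so its entry should sit at offset 0x3C in the bin file. A tiny sketch of that arithmetic, with names of my own choosing:

```c
#include <stdint.h>

/* ARMv7-M exception number for SysTick. */
enum { SYSTICK_EXCEPTION_NUMBER = 15 };

/* Byte offset of an exception's entry from the start of the vector table.
 * Entry 0 holds the initial SP and entry 1 holds the reset vector. */
uint32_t vector_offset(uint32_t exception_number)
{
    return exception_number * 4u;
}
```

This is consistent with the start-up values discussed earlier: entry 0 (the initial stack pointer) is at offset 0 and entry 1 (the reset vector) is at offset 4.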

The program code follows the interrupt vector table. I don't think it's possible to recognize much of anything in this section just by examining the binary data as seen in the hex viewer. However, we'll look at the code when we run the program later.

The remaining area in the bin file is the data section. It spans offsets 0x2d4 through 0x2fb (40 bytes), so it ends exactly at the end of the file at 0x2fc. It's easy to tell by looking at the data contained in these memory addresses that this is the initialization data for the initializedArray variable defined on line 50 of main.c.

Adding Up the Sections

When building the program, after compilation and linking occurs, size information is displayed for the different sections:

    arm-none-eabi-size "out.elf"
    text    data     bss     dec     hex filename
     724      40      48     812     32c out.elf 


Using the above hex view of the bin file we can calculate the sizes of each section and verify the size info given at the end of compilation. I believe the given size info considers the text section to be a combination of the initial SP and PC (8 bytes), the vector table (0x188 - 0x8 = 384 bytes) and the code (0x2d4 - 0x188 = 332 bytes). This totals to 724 bytes.

Then the data section is 40 bytes (0x2fc - 0x2d4). This brings the total to 764 bytes (724 bytes + 40 bytes). If we look at the bin file on the file system, we see its size is 764 bytes.

    01/23/2018  07:12 PM               764 out.bin
                   1 File(s)            764 bytes
                   0 Dir(s)  19,913,203,712 bytes free


You might be wondering where the 48 byte bss section is. I believe it doesn't exist in the bin file since this section will simply be initialized to all zeroes at startup. We'll see this happen when running the code below.
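
The bookkeeping above can be double-checked with a few lines of C. The offsets are the ones read from the hex view in this article, taking the end of the file to be offset 0x2fc (764 bytes). The constant and function names are my own.

```c
#include <stdint.h>

/* Offsets read from the hex view of out.bin in this article. */
enum {
    CODE_START = 0x188,  /* end of the vector table, start of the code */
    DATA_START = 0x2d4,  /* end of the code, start of the data section */
    BIN_END    = 0x2fc,  /* one past the last byte of the file */
    BSS_SIZE   = 48      /* reported by arm-none-eabi-size, not in the file */
};

/* "text" as reported by the size tool: initial SP and PC, vectors and code. */
uint32_t text_size(void) { return DATA_START; }

/* Initialization values for the data section. */
uint32_t data_size(void) { return BIN_END - DATA_START; }

/* What actually ends up in out.bin: text plus data, no bss. */
uint32_t bin_file_size(void) { return text_size() + data_size(); }

/* The "dec" column of the size tool: text plus data plus bss. */
uint32_t size_tool_total(void) { return bin_file_size() + BSS_SIZE; }
```

The results line up with both outputs shown above: 724 + 40 = 764 bytes on disk, and 764 + 48 = 812 (0x32c) for the size tool's total.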

The Map File

When building the project, we've specified in the makefile that a map file should be generated during the linking step. The map file can be viewed with a text editor and is helpful to understand and confirm some of the conclusions we've come to when analyzing the bin file.

For example, the starting location of the code section can be seen in the map file around line 208. This line should show the ".text" (code) section starting address (0x08000188). Scrolling down further you should see the ending address for this section (0x080002d4).

We know from the linker script that the ROM location will start at 0x08000000. The math is simple. Subtracting 0x08000000 from these values shows the code section extending from 0x188 to 0x2d4 of the bin file. This corresponds exactly to our hex view of the bin file above.
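
As a small sketch of that subtraction, assuming the ROM base of 0x08000000 from the linker script (the macro and function names are my own):

```c
#include <assert.h>
#include <stdint.h>

/* Start of ROM as given in the linker script. */
#define ROM_BASE_ADDR 0x08000000u

/* Convert an address from the map file into an offset within out.bin.
 * Only meaningful for addresses at or above the ROM base. */
uint32_t rom_addr_to_bin_offset(uint32_t addr)
{
    assert(addr >= ROM_BASE_ADDR);
    return addr - ROM_BASE_ADDR;
}
```

The same conversion works for any symbol listed in the map file that lives in ROM, which makes it easy to find a function's bytes in the hex view.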

Looking through the map file, you'll find plenty of other potentially helpful information about the binary.

Getting the Bin File on the MCU

All of the popular IDE/build environments that support STM32 development should have a way to flash the bin file to the dev board. Usually this is just part of the initialization when running/debugging the software after building it.

If you build from the command line, you can use the STM32 ST-Link Utility. It's quite easy to use. Below is a screenshot of the ST-Link Utility just after the binary was flashed to my STM32F411 development board.

ST-Link Utility Screenshot

Examining the Code: Startup

If you're building from the command line, I have some easy-to-follow steps here explaining how to set up your environment to step through the software using GDB. I frequently use GDB, but also sometimes use Eclipse (which just provides a GUI over GDB), since in certain cases an IDE with multiple windows can be helpful.

The program starts with the following register values:

    (gdb) i r
    r0             0x0      0
    r1             0x0      0
    r2             0x0      0
    r3             0x0      0
    r4             0x0      0
    r5             0x0      0
    r6             0x0      0
    r7             0x0      0
    r8             0x0      0
    r9             0x0      0
    r10            0x0      0
    r11            0x0      0
    r12            0x0      0
    sp             0x20020000       0x20020000
    lr             0xffffffff       -1
    pc             0x800027c        0x800027c
    xPSR           0x1000000        16777216 


Note the stack pointer (sp) is set to 0x20020000. Remember, when we examined the bin file, we saw where both this and the program counter (pc) would come from at startup.

Note that the program counter contains the address of the Reset_Handler code: 0x800027c. Below, we have the first three lines of disassembly of the Reset_Handler function. This should look familiar from the startup code (startup.s).

    (gdb) disass 0x800027c,0x8000282
    Dump of assembler code from 0x800027c to 0x8000282:
    => 0x0800027c <Reset_Handler+0>        movs    r1, #0
       0x0800027e <Reset_Handler+2>        b.n     0x8000288 <Reset_Handler+12>
       0x08000280 <Reset_Handler+4>        ldr     r3, [pc, #36]   ; (0x80002a8 <LoopFillZerobss+12>)
    End of assembler dump.

What may be surprising is that the address is 0x800027c and not 0x800027d. Recall from the analysis of the bin file above that the PC was to be initialized to 0x800027d. So what happened?

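I think this is related to what Miro Samek explains in his "Embedded Programming Lesson 8: Functions and the Stack" tutorial (see 7:00 to 11:30 in the tutorial video). The program counter itself must always hold an even address. The least significant bit (LSB) of a branch target or vector table entry was historically used to select the instruction set (ARM vs. Thumb). Since the Cortex-M4 supports only Thumb, that bit must be 1 in the vector table, and it is cleared when the address is loaded into the PC.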
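
The relationship between the stored value and the address that actually gets executed is simple enough to sketch in C. The function names are my own, and this only models the address arithmetic, not what the hardware does internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the stored entry has its LSB set, indicating Thumb state. */
bool has_thumb_bit(uint32_t vector_entry)
{
    return (vector_entry & 1u) != 0;
}

/* The even address at which execution actually begins. */
uint32_t execution_address(uint32_t vector_entry)
{
    return vector_entry & ~(uint32_t)1u;
}
```

Applied to the values in this article, 0x0800027D in the bin file becomes 0x0800027C in the PC, and the SysTick entry 0x08000189 corresponds to code at 0x08000188.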

This is also discussed in Joseph Yiu's "The Definitive Guide To ARM Cortex-M3 and Cortex-M4 Processors" (3rd Ed, section 4.4.2). Here's a quote:

"Since the instructions must be aligned to half-word or word addresses, the Least Significant Bit (LSB) of the PC is zero. However, when using some of the branch/memory read instructions to update the PC, you need to set the LSB of the new PC value to 1 to indicate the Thumb state. Otherwise, a fault exception can be triggered, as it indicates an attempt to switch to use ARM instructions (i.e., 32-bit ARM instructions as in ARM7TDMI), which is not supported".

Getting back to the code, placing the disassembly side by side with the code in startup.s shows that they match.

Startup Disassembly

As we step through the code, we see LoopCopyDataInit copies 40 bytes starting at 0x80002d4 to 0x20000000. This is the initialization data for the initializedArray variable defined on line 50 of main.c as shown below when viewed in Eclipse:

Data Section Initialization

The LoopFillZerobss code runs next. This sets the bss section to zero:

bss Initialization

The bss section is 48 bytes long. It ranges from 0x20000028 to 0x20000058. It contains the global variables listed in main.c on lines 47 to 49. This size makes sense since the "lightsOn" and "interruptCount" variables each take 4 bytes (unsigned ints) and the uninitializedArray takes 40 bytes (10 elements of 4 bytes apiece). Also, looking at the map file we see lines 271 to 284 show the exact same addresses (0x20000028 to 0x20000058) for the bss section.
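
The 48-byte figure can be reproduced with sizeof. In this sketch I group the three globals into a struct purely for illustration. In main.c they are separate variables and the linker decides their order, and I use uint32_t in place of unsigned int so the sizes also hold on a desktop compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical grouping of the three uninitialized globals from main.c. */
struct bss_globals {
    uint32_t lightsOn;                /* 4 bytes */
    uint32_t interruptCount;          /* 4 bytes */
    uint32_t uninitializedArray[10];  /* 10 * 4 = 40 bytes */
};

/* Total size of the bss contents. */
size_t bss_size(void) { return sizeof(struct bss_globals); }

/* Address one past the end of bss, given its start address. */
uint32_t bss_end(uint32_t bss_start)
{
    return bss_start + (uint32_t)bss_size();
}
```

Starting at 0x20000028, the end address works out to 0x20000058, matching the range shown in the map file.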

Examining the Code: Main

As shown on line 74 of startup.s, once the LoopFillZerobss function is complete, main is called. Stepping through the code in main is not very different from doing so on other platforms. The only surprise I came across was the Init function call in main (line 94 of main.c).

Just before Init is called the stack pointer is 0x2001FFF0. The stack memory is as shown below. Note that I set the yet-to-be-used memory of the stack to 0xDEADBEEF to easily tell if it gets updated.

Before Init is Called

When the function is called, we see the stack pointer is updated to 0x2001FFE0. We see the value 0x2001FFF0 was pushed onto the stack at 0x2001FFEC. It matches the old stack pointer, though it is presumably the caller's frame pointer (r7) being saved by the function prologue. The arguments to the Init function were also stored on the stack:

After Init is Called

I'm not sure why the stack memory address 0x2001FFE8 is being skipped. At first I thought it might be for memory alignment or possibly for a return value, but testing both of these theories has shown this not to be the case.

Examining the Code: The SysTick Interrupt

If you've examined the code in main.c you'll notice the LEDs blink on and off due to the periodic SysTick interrupt. I've learned from Jonathan Valvano's "Introduction to ARM Cortex-M Microcontrollers" (5th Ed, section 9.2) that when an interrupt occurs the following actions happen as part of the context switch:
  • Registers R0, R1, R2, R3, R12, LR, PC and PSR are pushed onto the stack.
  • The LR register is loaded with 0xFFFFFFF9 to signify that an interrupt service routine is being run.
  • The IPSR register (see the xPSR register) will contain the number of the interrupt being processed.
  • The PC is loaded with the address of the interrupt service routine.
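
The four actions above can be modeled with a few C definitions. This is a sketch for reasoning about what shows up in the debugger, not code that runs on the chip: the struct and function names are my own. The register order in the struct is the order of the stacked words from the lowest address upward, and SysTick is exception number 15 in the ARMv7-M numbering.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The eight words pushed on exception entry, lowest address first. */
struct exception_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

/* Number of bytes the stack pointer moves down on exception entry. */
size_t exception_frame_size(void) { return sizeof(struct exception_frame); }

/* IPSR is the low 9 bits of xPSR and holds the active exception number. */
uint32_t ipsr_from_xpsr(uint32_t xpsr) { return xpsr & 0x1FFu; }

/* ARMv7-M exception number for SysTick. */
enum { EXCEPTION_SYSTICK = 15 };

/* True if LR holds the special value described in the list above. */
bool lr_is_exception_return(uint32_t lr) { return lr == 0xFFFFFFF9u; }
```

When stopped inside the handler, these give three quick checks: the stack pointer should be 32 bytes lower than before, the low bits of xPSR should read 15, and LR should hold 0xFFFFFFF9.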

By examining the stack and registers before the interrupt occurs, and after the interrupt occurs (by setting a breakpoint on line 55 of main.c, the first line of the SysTick_Handler), we see each of the four actions mentioned above occur:

Before: Before SysTick Interrupt

After: After SysTick Interrupt

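In addition to the registers being pushed onto the stack, and LR and IPSR updated, we see the program counter is now set to 0x0800018c. The handler itself starts at 0x08000188 (the vector table entry 0x08000189 with its LSB cleared), but the breakpoint on line 55 stops execution after the two-instruction function prologue. This makes sense when viewing the code in GDB: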

    (gdb) disass /m 0x08000188
    Dump of assembler code for function SysTick_Handler:
    54      {
       0x08000188 <+0>:     push    {r7}
       0x0800018a <+2>:     add     r7, sp, #0
    55              ACCESS(GPIOD_ODR) ^= lightsOn;
       0x0800018c <+4>:     ldr     r1, [pc, #48]   ; (0x80001c0 <SysTick_Handler+56>)
       0x0800018e <+6>:     ldr     r3, [pc, #48]   ; (0x80001c0 <SysTick_Handler+56>)
       0x08000190 <+8>:     ldr     r2, [r3, #0]
       0x08000192 <+10>:    ldr     r3, [pc, #48]   ; (0x80001c4 <SysTick_Handler+60>)
       0x08000194 <+12>:    ldr     r3, [r3, #0]
       0x08000196 <+14>:    eors    r3, r2
       0x08000198 <+16>:    str     r3, [r1, #0]
    57              ++interruptCount;
       0x0800019a <+18>:    ldr     r3, [pc, #44]   ; (0x80001c8 <SysTick_Handler+64>)
       0x0800019c <+20>:    ldr     r3, [r3, #0]
       0x0800019e <+22>:    adds    r3, #1
       0x080001a0 <+24>:    ldr     r2, [pc, #36]   ; (0x80001c8 <SysTick_Handler+64>)
       0x080001a2 <+26>:    str     r3, [r2, #0]
    59              // Simply using the arrays so they don't get optimized out
    60              uninitializedArray[0] = interruptCount;
       0x080001a4 <+28>:    ldr     r3, [pc, #32]   ; (0x80001c8 <SysTick_Handler+64>)
       0x080001a6 <+30>:    ldr     r3, [r3, #0]
       0x080001a8 <+32>:    ldr     r2, [pc, #32]   ; (0x80001cc <SysTick_Handler+68>)
       0x080001aa <+34>:    str     r3, [r2, #0]
    61              initializedArray[0] = interruptCount;
       0x080001ac <+36>:    ldr     r3, [pc, #24]   ; (0x80001c8 <SysTick_Handler+64>)
       0x080001ae <+38>:    ldr     r3, [r3, #0]
       0x080001b0 <+40>:    ldr     r2, [pc, #28]   ; (0x80001d0 <SysTick_Handler+72>)
       0x080001b2 <+42>:    str     r3, [r2, #0]
    62      }


When the interrupt service routine completes, the information on the stack is used to restore the registers to the values they held before the interrupt occurred. Since the program counter is one of these registers, the processor knows exactly where to resume execution.

Wrap Up

And that's a wrap on my end-to-end analysis of how an ARM Cortex-M4 binary is created using the GNU Arm Embedded Toolchain, what the bin file contains and what happens when it runs. Happy to hear any corrections or additional info I might've missed.

Author: Terence Darwen
Date: January 24th, 2018