In Depth Analysis of an ARM Cortex-M4 Program
I’ve been working with ARM Cortex-M4 microprocessors for a little over a year and a half now. Despite reading popular embedded books and blogs, I have yet to come across an end-to-end overview of the creation and execution of a program.
For my own education and reference I’ve created such a document myself and am posting it here as I hope it may be helpful to others as well.
I’m using an STM32F411-Discovery development board. This board comes with an ARM Cortex-M4F processor and various peripherals. It's packaged with 512K of flash memory and 128K of RAM. Currently (Winter 2018), the board can be purchased for around $15 USD plus shipping.
I’m also using the GNU Arm Embedded Toolchain for building and debugging. Since an IDE is sometimes helpful for debugging, I also occasionally use System Workbench for STM32. System Workbench is basically Eclipse with the same GNU Arm Embedded Toolchain mentioned above.
Our program will simply repeatedly flash a couple of the LEDs on the development board as shown in the animated gif image below. The code for this program can be found in my GitHub repo here.
The program is built from three source files:
- startup.s: Contains necessary startup code. It's written in assembly and is assembled into an object file using the GCC ARM assembler.
- main.c: The actual program. It's compiled into an object file using the GCC ARM C compiler.
- LinkerScript.ld: Necessary for linking the object files into a binary.
Linking results in an ELF (executable and linkable format) file. Using the arm-none-eabi-objcopy tool included in the GNU Arm Embedded Toolchain, the ELF file is converted to a binary file (out.bin). The binary file can then be "flashed" to the STM32F411 development board using a tool like the STM32 ST-Link Utility. The bin file is examined further below, but first let’s discuss each of the source files.
The C program (main.c) is fairly simple. Using an interrupt, it periodically blinks LEDs on the STM32F411 development board. I won't explain the code in detail here as the source file is well commented, explaining what each line does.
You'll notice there are some poor programming practices in this code - I have some unnecessary variables - some global. These exist purely to show how global, scoped, initialized and uninitialized variables are dealt with during the compilation, linking and execution of the program.
The two main parts of the C program are an initialization function (named Init) and an interrupt handler. The initialization function sets up the GPIO pins the LEDs are connected to and also initializes the SysTick interrupt. The SysTick interrupt handler toggles the LED lights.
The interrupt is configured to occur periodically - about once every second. This results in the LED lights blinking on and off every couple of seconds.
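As a rough illustration of how a one-second period maps onto the SysTick hardware, here is a minimal sketch of the reload calculation. It assumes SysTick is fed by a 16 MHz clock (the default internal oscillator on this chip). That clock setting is my assumption and is not taken from main.c, and the function name `systick_reload` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SysTick is a 24-bit down counter, so the reload value cannot exceed 0xFFFFFF. */
#define SYSTICK_MAX_RELOAD 0x00FFFFFFu

/*
 * Compute the reload value for a period given in milliseconds.
 * clock_hz is the frequency feeding SysTick (assumed, e.g. 16 MHz).
 * Returns false if the period rounds to zero ticks or does not fit in 24 bits.
 */
bool systick_reload(uint32_t clock_hz, uint32_t period_ms, uint32_t *reload)
{
    assert(reload != NULL);
    uint64_t ticks = (uint64_t)clock_hz * period_ms / 1000u;
    if (ticks == 0 || ticks - 1 > SYSTICK_MAX_RELOAD) {
        return false;
    }
    /* The counter wraps after (reload + 1) ticks, hence the minus one. */
    *reload = (uint32_t)(ticks - 1);
    return true;
}
```

With a 16 MHz clock, a 1000 ms period gives a reload value of 15,999,999, which just fits in 24 bits. A 2000 ms period would not fit, which is one reason a longer blink period has to be built by counting interrupts.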
The startup code (startup.s) contains the following:
- The reset handler function. This is called at startup (or reset) of the dev board. (Lines 38-45)
- Code to copy the data segment values into RAM. (Lines 47-60)
- Code to zero-fill the .bss segment. (Lines 62-71)
- The definition of the interrupt vector table. (Lines 78-196)
- Calls the "main" function after initialization is done. (Lines 73-74)
The linker script (LinkerScript.ld) specifies the following:
- Highest stack address to be used during execution. (Lines 55-56)
- Size of RAM and ROM. (Lines 58-63)
- Locations of the interrupt vector table, text section, data section and bss section. (Lines 65-133)
This resulting bin file is what gets flashed to the ROM of the STM32F411 microcontroller. Below is a hex view of the bin file with specific sections identified by color:
I’ve learned from Jonathan Valvano’s book “Introduction to ARM Cortex-M Microcontrollers” (5th Ed, section 3.1.2) that upon start-up, the 32-bit value at location zero is loaded into the stack pointer register and the 32-bit value at location 4 is loaded into the program counter register. These values are shown in the hex view of the bin file above.
These values make sense as the highest address of the stack is specified as 0x20020000 in the linker script and the stack grows downward. Later on, when we run the program, we’ll see how the program counter address (0x0800027D) corresponds to the location of the Reset_Handler function defined in startup.s.
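To make the byte order concrete, here is a minimal sketch that pulls those two words out of the start of a bin image. The Cortex-M4 is little-endian, so the bytes 00 00 02 20 decode to 0x20020000. The type and function names (`boot_words_t`, `read_boot_words`) are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The two words the processor fetches at reset. */
typedef struct {
    uint32_t initial_sp;   /* word at offset 0: loaded into SP */
    uint32_t reset_vector; /* word at offset 4: loaded into PC */
} boot_words_t;

boot_words_t read_boot_words(const uint8_t *image, size_t len)
{
    assert(image != NULL && len >= 8);
    boot_words_t w = { read_le32(image), read_le32(image + 4) };
    return w;
}
```

Feeding it the first eight bytes of out.bin should give 0x20020000 for the stack pointer and 0x0800027D for the reset vector, matching the values discussed above.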
The next section of the bin file is the interrupt vector table. This simply holds memory addresses of functions to be called when an interrupt or exception occurs. Since the only interrupt we’ve assigned for use is the SysTick Interrupt, that is the only one that is defined. You’ll see it has a memory address of 0x08000189. This is the address of the SysTick_Handler() function defined in main.c.
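Here is a minimal sketch of how a vector table entry can be located in the image. It relies on the architectural rule that each entry is 4 bytes and that SysTick is exception number 15, which puts its entry at byte offset 0x3C. The helper names (`vector_offset`, `vector_entry`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SysTick is exception number 15 on Cortex-M processors. */
#define SYSTICK_EXCEPTION_NUMBER 15u

/* Read a 32-bit little-endian word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Each entry is one 32-bit word, so exception N lives at byte offset N * 4. */
size_t vector_offset(uint32_t exception_number)
{
    return (size_t)exception_number * 4u;
}

/* Return the handler address stored for the given exception number. */
uint32_t vector_entry(const uint8_t *image, size_t len, uint32_t exception_number)
{
    size_t off = vector_offset(exception_number);
    assert(image != NULL && off + 4 <= len);
    return read_le32(image + off);
}
```

Applied to out.bin, `vector_entry(image, len, SYSTICK_EXCEPTION_NUMBER)` should return the SysTick handler address noted above.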
The program code follows the interrupt vector table. I don’t think it’s possible to recognize much of anything in this section just by examining the binary data as seen in the hex viewer. However, we’ll look at the code when we run the program later.
The remaining area is the data section. It spans from 0x2d4 to 0x2fc of the bin file. It’s easy to tell by looking at the data contained in these memory addresses that this is the initialization data for the initializedArray variable defined on line 50 of main.c.
When building the program, after compilation and linking occurs, size information is displayed for the different sections:
arm-none-eabi-size "out.elf"
text data bss dec hex filename
724 40 48 812 32c out.elf
Using the above hex view of the bin file we can calculate the sizes of each section and verify the size info given at the end of compilation. I believe the given size info considers the text section to be a combination of the initial SP and PC (8 bytes), the vector table (0x188 – 0x8 = 384 bytes) and the code (0x2d4 – 0x188 = 332 bytes). This totals to 724 bytes.
Then the data section is 40 bytes (0x2fc – 0x2d4). This brings the total to 764 bytes (724 bytes + 40 bytes). If we look at the bin file on the file system, we see its size is 764 bytes.
01/23/2018 07:12 PM 764 out.bin
1 File(s) 764 bytes
0 Dir(s) 19,913,203,712 bytes free
You might be wondering where the 48 byte bss section is. I believe it doesn't exist in the bin file since this section will simply be initialized to all zeroes at startup. We'll see this happen when running the code below.
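The arithmetic above can be checked with a few lines of C. This is only a sketch: the offsets are the ones read from the hex view, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t text_end; /* bin offset where code ends and .data values begin */
    uint32_t data_end; /* bin offset one past the last .data byte */
    uint32_t bss_size; /* zero-filled RAM, not stored in the bin file */
} layout_t;

/* "text" as reported by size: initial SP/PC + vector table + code. */
uint32_t text_size(const layout_t *l) { assert(l != NULL); return l->text_end; }

/* Initialization values for .data follow the text directly. */
uint32_t data_size(const layout_t *l) { assert(l != NULL); return l->data_end - l->text_end; }

/* The bin file holds text and data only. */
uint32_t bin_size(const layout_t *l) { return text_size(l) + data_size(l); }

/* The "dec" column adds bss on top of what is in the bin file. */
uint32_t total_size(const layout_t *l) { return bin_size(l) + l->bss_size; }
```

With `text_end = 0x2d4`, `data_end = 0x2fc` and `bss_size = 48`, this gives 724, 40, 764 and 812 (0x32c), which agrees with both the size output and the file size.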
When building the project, we've specified in the makefile that a map file (out.map) should be generated during the linking step. The map file can be viewed with a text editor and is helpful to understand and confirm some of the conclusions we've come to when analyzing the bin file.
For example, the starting location of the code section can be seen in the map file around line 208. This line should show the “.text” (code) section starting address (0x08000188). Scrolling down further you should see the ending address for this section (0x080002d4).
We know from the linker script that the ROM location will start at 0x08000000. The math is simple. Subtracting 0x08000000 from these values shows the code section extending from 0x188 to 0x2d4 of the bin file. This corresponds exactly to our hex view of the bin file above.
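As a small sketch of that conversion, the helper below maps a flash address to its offset in the bin file. The base address comes from the linker script and the 512K size from the board description. The names `ROM_BASE`, `ROM_SIZE` and `flash_addr_to_offset` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROM_BASE 0x08000000u /* start of ROM per the linker script */
#define ROM_SIZE 0x00080000u /* 512K of flash */

/*
 * Convert an address from the map file into an offset in the bin file.
 * Returns false if the address does not lie in ROM.
 */
bool flash_addr_to_offset(uint32_t addr, uint32_t *offset)
{
    assert(offset != NULL);
    if (addr < ROM_BASE || addr - ROM_BASE >= ROM_SIZE) {
        return false;
    }
    *offset = addr - ROM_BASE;
    return true;
}
```

For example, 0x08000188 maps to offset 0x188 and 0x080002d4 maps to 0x2d4, while a RAM address such as 0x20000000 is rejected.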
Looking through the map file, you'll find other potentially helpful information about the binary.
All of the popular IDE/build environments that support STM32 development should have a way to flash the bin file to the dev board. Usually this is just part of the initialization when running/debugging the software after building it.
If you build from the command line, you can use the STM32 ST-Link Utility. It’s quite easy to use. Below is a screenshot of the ST-Link Utility just after the binary was flashed to my STM32F411 development board.
If you're building from the command line, I have some easy-to-follow steps here explaining how to set up your environment to step through the software using GDB. I frequently use GDB, but also sometimes use Eclipse (which just provides a GUI over GDB), since in certain cases an IDE with multiple windows can be helpful.
The program starts with the following register values:
(gdb) i r
r0 0x0 0
r1 0x0 0
r2 0x0 0
r3 0x0 0
r4 0x0 0
r5 0x0 0
r6 0x0 0
r7 0x0 0
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
sp 0x20020000 0x20020000
lr 0xffffffff -1
pc 0x800027c 0x800027c
xPSR 0x1000000 16777216
Note the stack pointer (sp) is set to 0x20020000. Remember, when we examined the bin file, we saw where both this and the program counter (pc) would come from at startup.
Note that the program counter contains the address of the Reset_Handler code: 0x800027c. Below, we have the first three lines of disassembly of the Reset_Handler function. This should look familiar from the startup code (startup.s).
(gdb) disass 0x800027c,0x8000282
Dump of assembler code from 0x800027c to 0x8000282:
=> 0x0800027c <Reset_Handler+0> movs r1, #0
0x0800027e <Reset_Handler+2> b.n 0x8000288 <Reset_Handler+12>
0x08000280 <Reset_Handler+4> ldr r3, [pc, #36] ; (0x80002a8 <LoopFillZerobss+12>)
End of assembler dump.
What may be surprising is that the address is 0x800027c and not 0x800027d. Recall from the analysis of the bin file above that the PC was to be initialized to 0x800027d. So what happened?
I think this is related to what Miro Samek explains in his “Embedded Programming Lesson 8: Functions and the Stack” tutorial (see 7:00 to 11:30 in the tutorial video). Apparently, the program counter must be even. The least significant bit (LSB) was historically used to select between different instruction sets – ARM vs Thumb.
This is also discussed in Joseph Yiu’s “The Definitive Guide To ARM Cortex-M3 and Cortex-M4 Processors” (3rd Ed, section 4.4.2). Here's a quote:
“Since the instructions must be aligned to half-word or word addresses, the Least Significant Bit (LSB) of the PC is zero. However, when using some of the branch/memory read instructions to update the PC, you need to set the LSB of the new PC value to 1 to indicate the Thumb state. Otherwise, a fault exception can be triggered, as it indicates an attempt to switch to use ARM instructions (i.e., 32-bit ARM instructions as in ARM7TDMI), which is not supported”.
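In code terms, the relationship between the value stored in the vector table and the address the processor actually fetches from can be sketched as below. The function names are hypothetical, and the assert only loosely models the fault the quote describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a branch target or vector entry marks Thumb state. */
bool is_thumb_target(uint32_t vector)
{
    return (vector & 1u) != 0;
}

/*
 * Address the PC ends up holding: the vector value with bit 0 cleared.
 * A Cortex-M core only supports Thumb, so a target with bit 0 clear
 * would fault; the assert stands in for that fault here.
 */
uint32_t fetch_address(uint32_t vector)
{
    assert(is_thumb_target(vector));
    return vector & ~UINT32_C(1);
}
```

So the reset vector 0x0800027D stored in the bin file becomes a PC of 0x0800027C, which is exactly what GDB reports.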
Getting back to the code, placing the disassembly side by side with the code in startup.s, we see that it matches.
As we step through the code, we see LoopCopyDataInit copies 40 bytes starting at 0x80002d4 to 0x20000000. This is the initialization data for the initializedArray variable defined on line 50 of main.c as shown below when viewed in Eclipse:
The LoopFillZerobss function is called next. This sets the bss section to zero:
The bss section is 48 bytes long – ranging from 0x20000028 to 0x20000058. It contains the global variables listed in main.c on lines 47 to 49. This size makes sense since the “lightsOn” and “interruptCount” variables each take 4 bytes (unsigned ints) and the uninitializedArray takes 40 bytes (10 elements of 4 bytes apiece). Also, looking at the map file we see lines 271 to 284 show the exact same addresses (0x20000028 to 0x20000058) for the bss section.
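The two startup loops just described are written in assembly in startup.s, but their effect can be sketched in C. This is not the author's code: the function and parameter names are hypothetical, and on the real target the start and end pointers would come from symbols defined in the linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the initial values of the data section from their load location
 * (in flash) to their run location (in RAM), one word at a time.
 */
void copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    assert(load != NULL && start != NULL && end != NULL);
    while (start < end) {
        *start++ = *load++;
    }
}

/* Fill the bss section with zeroes, one word at a time. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    assert(start != NULL && end != NULL);
    while (start < end) {
        *start++ = 0;
    }
}
```

For this program, the copy would move 10 words (40 bytes) to 0x20000000, and the zero fill would clear the 12 words (48 bytes) from 0x20000028 up to 0x20000058.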
As shown on line 74 of the startup.s, once the LoopFillZerobss function is complete, main is called. Stepping through the code in main is not very different than doing so on other platforms. The only surprise I came across was the Init function call in main (line 94 of main.c).
Just before Init is called the stack pointer is 0x2001FFF0. The stack memory is as shown below. Note that I set the yet-to-be-used memory of the stack to 0xDEADBEEF to easily tell if it gets updated.
When the function is called, we see the stack pointer is updated to 0x2001FFE0. We see the old stack pointer of 0x2001FFF0 was pushed onto the stack at 0x2001FFEC and the arguments to the Init function were also pushed onto the stack:
I’m not sure why the stack memory address 0x2001FFE8 is being skipped. At first I thought it might be for memory alignment or possibly for a return value, but testing both of these theories has shown this not to be the case.
When the SysTick interrupt occurs, the processor does the following:
- Registers R0, R1, R2, R3, R12, LR, PC and PSR are pushed onto the stack.
- The LR register is loaded with 0xFFFFFFF9 to signify that an interrupt service routine is being run.
- The IPSR register (see the xPSR register) will contain the number of the interrupt being processed.
- The PC is loaded with the address of the interrupt service routine.
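The stacked registers listed above form a fixed 8-word frame, which can be sketched as a C struct. This assumes no floating-point context is stacked (the frame is larger when the FPU is in use). The names are hypothetical, but the IPSR field position (bits 8:0 of xPSR) and the SysTick exception number (15) are architectural.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LR value on entry: return to Thread mode using the main stack. */
#define EXC_RETURN_THREAD_MSP 0xFFFFFFF9u

/* Registers pushed on exception entry, lowest stack address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* Copy the eight stacked words, starting at the stack pointer, into a frame. */
exception_frame_t read_frame(const uint32_t *sp)
{
    assert(sp != NULL);
    exception_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* The exception number occupies the low nine bits of xPSR. */
uint32_t ipsr_from_xpsr(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}
```

Inside the SysTick handler, `ipsr_from_xpsr` applied to the current xPSR should yield 15, and the `pc` field of the stacked frame holds the address where the interrupted code will resume.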
By examining the stack and registers before the interrupt occurs, and after the interrupt occurs (by setting a breakpoint on line 55 of main.c, the first line of the SysTick_Handler), we see each of the four actions mentioned above occur:
Before:
After:
In addition to the registers being pushed onto the stack, and LR and IPSR updated, we see the program counter is now set to 0x0800018c. This makes sense when viewing the code in GDB:
(gdb) disass /m 0x08000188
Dump of assembler code for function SysTick_Handler:
54 {
0x08000188 <+0>: push {r7}
0x0800018a <+2>: add r7, sp, #0
55 ACCESS(GPIOD_ODR) ^= lightsOn;
0x0800018c <+4>: ldr r1, [pc, #48] ; (0x80001c0 <SysTick_Handler+56>)
0x0800018e <+6>: ldr r3, [pc, #48] ; (0x80001c0 <SysTick_Handler+56>)
0x08000190 <+8>: ldr r2, [r3, #0]
0x08000192 <+10>: ldr r3, [pc, #48] ; (0x80001c4 <SysTick_Handler+60>)
0x08000194 <+12>: ldr r3, [r3, #0]
0x08000196 <+14>: eors r3, r2
0x08000198 <+16>: str r3, [r1, #0]
56
57 ++interruptCount;
0x0800019a <+18>: ldr r3, [pc, #44] ; (0x80001c8 <SysTick_Handler+64>)
0x0800019c <+20>: ldr r3, [r3, #0]
0x0800019e <+22>: adds r3, #1
0x080001a0 <+24>: ldr r2, [pc, #36] ; (0x80001c8 <SysTick_Handler+64>)
0x080001a2 <+26>: str r3, [r2, #0]
58
59 // Simply using the arrays so they don't get optimized out
60 uninitializedArray[0] = interruptCount;
0x080001a4 <+28>: ldr r3, [pc, #32] ; (0x80001c8 <SysTick_Handler+64>)
0x080001a6 <+30>: ldr r3, [r3, #0]
0x080001a8 <+32>: ldr r2, [pc, #32] ; (0x80001cc <SysTick_Handler+68>)
0x080001aa <+34>: str r3, [r2, #0]
61 initializedArray[0] = interruptCount;
0x080001ac <+36>: ldr r3, [pc, #24] ; (0x80001c8 <SysTick_Handler+64>)
0x080001ae <+38>: ldr r3, [r3, #0]
0x080001b0 <+40>: ldr r2, [pc, #28] ; (0x80001d0 <SysTick_Handler+72>)
0x080001b2 <+42>: str r3, [r2, #0]
62 }
When the interrupt service routine completes, the values saved on the stack are used to restore the registers to what they were before the interrupt occurred. Since the program counter is one of these registers, the processor knows exactly where to resume execution.
So, that’s my end-to-end analysis of how an ARM Cortex-M4 binary is created using the GNU Arm Embedded Toolchain, what the bin file contains and what happens when it runs. Happy to hear from anyone with corrections, questions or additional info. My contact info is here.
Author: Terence Darwen
Tags: Embedded, ARM-Cortex-M4, STM32, STM32F4