ARMv7-M Run Time Stack Checking


NuttX supports facilities to verify the dynamically allocated stacks and fixed stacks used by the tasks and interrupt context running under NuttX. There are two types of stack checking, which can be used together or separately.

  1. The Stack Monitor
  2. Per function Call (ARMV7 Only)

The Stack Monitor

The use of the Stack Monitor application requires that CONFIG_STACK_COLORATION be enabled. This compile time option enables the writing of a known pattern, STACK_COLOR, to the stack memory at creation time. In the case of the idle task and the interrupt stack, this is done in the code that runs just after reset at startup. This is known as stack coloring.

Once the pattern has been established, the functions up_check_stack and friends are used to perform the stack checking by finding the lowest word of the pattern in the allocated stack.
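The coloring and scanning idea can be sketched in portable C. This is only an illustration and not the NuttX implementation: the names sketch_color_stack and sketch_stack_used and the pattern value are assumptions made for the example, and the stack is modeled as a plain array whose index 0 is the lowest address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed pattern value for illustration; NuttX defines the real STACK_COLOR. */
#define SKETCH_STACK_COLOR 0xdeadbeefu

/* Fill every word of a newly allocated stack with the known pattern. */
static void sketch_color_stack(uint32_t *stack, size_t nwords)
{
  assert(stack != NULL);
  for (size_t i = 0; i < nwords; i++)
    {
      stack[i] = SKETCH_STACK_COLOR;
    }
}

/* Return the number of bytes that appear to have been used.
 * The stack grows downward, so stack[0] is the deepest point it can
 * reach.  Scan upward from there until the first word that no longer
 * holds the pattern; everything from that word up counts as used.
 */
static size_t sketch_stack_used(const uint32_t *stack, size_t nwords)
{
  size_t untouched = 0;
  while (untouched < nwords && stack[untouched] == SKETCH_STACK_COLOR)
    {
      untouched++;
    }
  return (nwords - untouched) * sizeof(uint32_t);
}
```

A freshly colored stack reports 0 bytes used. If the deepest write so far landed on word 40 of a 64-word stack, the scan reports 24 words (96 bytes) used.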

The Stack Monitor is enabled with CONFIG_SYSTEM_STACKMONITOR which will enable a daemon that will periodically run and check the stack penetration of the tasks running on the system.

The stack monitor is a good way to size stacks and check the stack usage of tasks. However, it cannot detect certain kinds of stack overrun, nor can it identify the cause of one.

The reason is that a corrupted stack is not necessarily eaten away gradually. An overrun can occur when a function call sets the stack pointer well below the stack bottom as it allocates local variables on the stack. The code in the function can then corrupt the memory below the stack bottom, restore the stack pointer, and return to the caller without ever overwriting the coloring at the base of the stack.
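A small simulation shows how such an overrun escapes a pattern scan. It is a sketch under stated assumptions: memory is modeled as one array in which a neighboring region sits directly below the stack, and the names sketch_words_used and sketch_sparse_call, the pattern value, and the region sizes are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_COLOR   0xdeadbeefu  /* assumed pattern value */
#define NEIGHBOR_WORDS 32           /* memory that lies below the stack */
#define STACK_WORDS    64           /* the stack itself */

/* Count the words that look used, scanning up from the stack bottom. */
static size_t sketch_words_used(const uint32_t *stack, size_t nwords)
{
  size_t clean = 0;
  while (clean < nwords && stack[clean] == SKETCH_COLOR)
    {
      clean++;
    }
  return nwords - clean;
}

/* Simulate a call whose frame of frame_words begins at word index sp
 * (an index into mem) and that writes only to the lowest word of its
 * frame, as a function might do with a large, sparsely used local
 * buffer.  Returns the index of the word that was written.
 */
static size_t sketch_sparse_call(uint32_t *mem, size_t sp, size_t frame_words)
{
  assert(frame_words <= sp);
  size_t new_sp = sp - frame_words; /* stack pointer after allocating locals */
  mem[new_sp] = 0x12345678u;        /* the only write the function performs */
  return new_sp;
}
```

With 44 of the 64 stack words already in use, a 30-word frame places the stack pointer 10 words below the stack bottom. The write lands in the neighboring region, while the 20 colored words at the base of the stack stay intact, so the scan still reports 44 words used.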

This brings us to the next method of stack checking.

Per function Call

This method of stack checking leverages the profiler hook mechanism supported by the compiler. Once enabled using CONFIG_ARMV7M_STACKCHECK, one register is set aside (R10 is the default) and the value of the base of the stack is saved there (rBS). Every function call then has preamble and postamble code added to it. The preamble (__cyg_profile_func_enter) checks the current stack pointer, minus a margin of 64 bytes (with an additional 136 bytes for the FP registers), against the value in the reserved register rBS. If the computed value lies below the value in rBS, a hard fault is generated. The postamble code (__cyg_profile_func_exit) just returns to the caller.
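The comparison made by the preamble can be modeled in portable C. This is a sketch of the arithmetic described above, not the code in up_stackcheck.c: the function name sketch_stack_check_fails and the fpu_margin flag are assumptions, and the exact condition under which the additional 136 bytes apply is not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MARGIN     64u   /* base margin from the text */
#define SKETCH_FPU_MARGIN 136u  /* additional margin for the FP registers */

/* Return true when the entry check would raise a hard fault.
 *   sp         - current stack pointer
 *   rbs        - stack limit held in the reserved register (R10)
 *   fpu_margin - hypothetical flag: include the FP register margin
 */
static bool sketch_stack_check_fails(uintptr_t sp, uintptr_t rbs,
                                     bool fpu_margin)
{
  uintptr_t margin = SKETCH_MARGIN + (fpu_margin ? SKETCH_FPU_MARGIN : 0u);
  assert(sp >= margin);        /* keep the subtraction from wrapping */
  return (sp - margin) < rbs;  /* below the limit: fault */
}
```

For example, with rbs at 0x20000f00 a stack pointer of 0x20000fc0 passes the 64-byte check but fails once the FP margin is included, because 0x20000fc0 minus 200 is 0x20000ef8.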

The rationale for subtracting the margin can be viewed two ways. If the configuration is not using a separate ISR stack, then the space reserved will accommodate the context save of the CPU and optionally the FPU registers to service an interrupt on the user's stack. If the configuration is using a separate ISR stack, some of the 64 bytes will accommodate the transition to the interrupt stack and the remaining 60-200 bytes are just margin. Either way, stacks should always be allocated with at least 200 bytes of margin.

Because the reserved register rBS contains the current context's stack base, and rBS is not updated on entry to an ISR, it is not possible to check the stack penetration of an interrupt with Per function Call stack checking.

One thing to consider is the impact on code size and speed this method of stack checking will have. Each function will have two additional call and return instructions added to it. In the execution path of each function, there will be an added set of instructions to perform the preamble and postamble functionality. In a call tree that is nested several layers deep, this can add up. In one particular use, we saw an increase of 30% to 35% additional CPU utilization required to support per function call stack checking.
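To see how the overhead accumulates, the instrumentation can be imitated by hand. In this sketch the hooks are plain counting functions, not the real __cyg_profile_func_enter and __cyg_profile_func_exit, and the calls that the compiler would insert automatically are written out explicitly; all names are hypothetical.

```c
#include <assert.h>

/* Number of hook invocations since the last reset. */
static unsigned long g_sketch_hook_calls;

/* Stand-ins for the entry and exit hooks. */
static void sketch_enter(void) { g_sketch_hook_calls++; }
static void sketch_exit(void)  { g_sketch_hook_calls++; }

/* A recursive function written the way an instrumented build would
 * effectively behave: one hook call on entry and one before returning.
 * Returns the number of nested levels below this call.
 */
static unsigned int sketch_nested(unsigned int depth)
{
  unsigned int result;

  sketch_enter();
  result = (depth == 0) ? 0 : 1 + sketch_nested(depth - 1);
  sketch_exit();
  return result;
}

/* Run a call chain of the given depth and report the hook calls made. */
static unsigned long sketch_hook_calls_for(unsigned int depth)
{
  g_sketch_hook_calls = 0;
  unsigned int levels = sketch_nested(depth);
  assert(levels == depth);
  return g_sketch_hook_calls;
}
```

A single call costs two hook invocations, and a chain nested four levels below the first call costs ten. The relative cost is highest for small functions that are called often.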

Is this just a debugging tool? One could imagine that in a mission critical application, this might be part of a release build if the code size and speed impact can be tolerated.

Details for Support of Per Call Stack Checking

Currently only ARMV7 derivatives support Per Call Stack Checking. Support requires the following components:

The start function must establish the value in rBS (R10 by default; see below). However, to do this the start function must NOT have the preamble and postamble code added to it. This is accomplished with the use of the following gcc attribute:

  /* we need to get r10 set before we can allow instrumentation calls */
  void __start(void) __attribute__ ((no_instrument_function));

  void __start(void)
  {
    const uint32_t *src;
    uint32_t *dest;

    /* Set the stack limit before we attempt to call any functions */
    __asm__ volatile ("sub r10, sp, %0" : : "r" (CONFIG_IDLETHREAD_STACKSIZE - 64) : );

    /* ... remainder of __start ... */
  }

The minus 64 sets the limit 64 bytes above the bottom of the stack. Note: This may be adding another 64 bytes of margin.

For the creation of a task's context, the following code is needed to set up rBS:

  void up_initial_state(struct tcb_s *tcb)
  {
    struct xcptcontext *xcp = &tcb->xcp;

    /* Initialize the initial exception register context structure */
    memset(xcp, 0, sizeof(struct xcptcontext));

    /* Save the initial stack pointer */
    xcp->regs[REG_SP]      = (uint32_t)tcb->adj_stack_ptr;

    /* Set the stack limit value */
    xcp->regs[REG_R10]     = (uint32_t)tcb->stack_alloc_ptr + 64;

    /* ... remainder of up_initial_state ... */
  }
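The two ways of setting the limit can be compared with a little arithmetic. The sketch below is an illustration only: the function names are hypothetical, and it assumes that the stack pointer in __start still sits at the top of the idle thread stack, so that the bottom of that stack is sp minus CONFIG_IDLETHREAD_STACKSIZE.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LIMIT_OFFSET 64u /* offset of the limit above the stack bottom */

/* Idle thread: mirrors "sub r10, sp, #(stack_size - 64)". */
static uintptr_t sketch_idle_limit(uintptr_t sp, uintptr_t stack_size)
{
  assert(stack_size >= SKETCH_LIMIT_OFFSET && sp >= stack_size);
  return sp - (stack_size - SKETCH_LIMIT_OFFSET);
}

/* Task: mirrors "stack_alloc_ptr + 64". */
static uintptr_t sketch_task_limit(uintptr_t stack_alloc_ptr)
{
  return stack_alloc_ptr + SKETCH_LIMIT_OFFSET;
}
```

Under that assumption both forms give the same result. For a 1024-byte stack allocated at 0x20000000, either function returns 0x20000040, which is 64 bytes above the bottom of the stack.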

And finally up_stackcheck.c needs to be included in the build and the compiler flags set to reserve R10 and enable the instrumentation.

This is done for a given architecture in nuttx/arch/arm/src/<arch>/Make.defs:

  CMN_CSRCS += up_stackcheck.c

The compiler flags are added in nuttx/arch/arm/src/armv7-m/Toolchain.defs (TBD):

  # enable precise stack overflow tracking
  INSTRUMENTATIONDEFINES   = -finstrument-functions -ffixed-r10

Other Considerations

If using the export build feature of NuttX: for the runtime stack checking, both the application and NuttX need to be built with the CONFIG_ARMV7M_STACKCHECK option set to the same state, enabled or disabled. Any mismatch will create either compile time or runtime issues.