Smaller Vector Tables

One of the largest OS data structures is the vector table, g_irqvector[]. This is the table that holds the vector information when irq_attach() is called and that irq_dispatch() uses to dispatch interrupts. Recent changes have made that table even larger. For a 32-bit ARM, the size of that table is given by:

nbytes = number_of_interrupts * (2 * sizeof(void *))

We will focus on the STM32 for this discussion to keep things simple. However, this discussion applies to all architectures.

The number of (physical) interrupt vectors supported by the MCU hardware is given by the definition NR_IRQS, which is provided in a header file in arch/arm/include/stm32. This is, by default, the value of number_of_interrupts in the above equation.

For a 32-bit ARM like the STM32 with, say, 100 interrupt vectors, this size would be 800 bytes of memory. That is not a lot for high-end MCUs with a lot of RAM, but could be a show stopper for MCUs with minimal RAM.

Two approaches for reducing the size of the vector tables are described below. Both depend on the fact that not all interrupts are used on a given MCU. Most of the time, the majority of entries in g_irqvector[] are zero because only a small number of interrupts are actually attached and enabled by the application. If you know that certain IRQ numbers are not going to be used, then it is possible to filter those out and reduce the size of the table to the number of interrupts that are actually used.

For example, if the actual number of interrupts used were 20, then the above requirement would go from 800 bytes to 160 bytes.
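The arithmetic above can be checked with a small C sketch. The helper name irqvector_nbytes() is hypothetical and is not part of NuttX; the pointer size is passed in explicitly (4 bytes for a 32-bit ARM) so the result does not depend on the machine that compiles it.

```c
#include <stddef.h>

/* Hypothetical helper: RAM consumed by g_irqvector[] when each entry holds
 * two pointers (the handler address and its argument).
 */

static size_t irqvector_nbytes(size_t nirqs, size_t ptrsize)
{
  return nirqs * (2 * ptrsize);
}
```

With 4-byte pointers, irqvector_nbytes(100, 4) gives the 800 bytes of the full table and irqvector_nbytes(20, 4) gives the 160 bytes of the reduced one.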

Software IRQ Remapping

[On March 3, 2017, support for this “Software IRQ Remapping” was included in the NuttX repository.]

One of the simplest ways of reducing the size of g_irqvector[] would be to remap the large set of physical interrupt vectors into a much smaller set of interrupts that are actually used. For the sake of discussion, let's imagine two new configuration settings:

  • CONFIG_ARCH_MINIMAL_VECTORTABLE: Enables IRQ mapping
  • CONFIG_ARCH_NUSER_INTERRUPTS: The number of IRQs after mapping.

Then the OS could allocate the interrupt vector table with size CONFIG_ARCH_NUSER_INTERRUPTS instead of the much bigger NR_IRQS:

#ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
#else
struct irq_info_s g_irqvector[NR_IRQS];
#endif

The g_irqvector[] table is accessed in only three places:

irq_attach()

irq_attach() receives the physical vector number along with the information needed later to dispatch interrupts:

int irq_attach(int irq, xcpt_t isr, FAR void *arg);

Logic in irq_attach() would map the incoming physical vector number to a table index like:

#ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
int ndx = g_irqmap[irq];
#else
int ndx = irq;
#endif

where g_irqmap[] is an array indexed by the physical interrupt vector number that contains the new, mapped interrupt vector table index. This array must be provided by platform-specific code.

irq_attach() would then use this index to set the g_irqvector[] entry:

g_irqvector[ndx].handler = isr;
g_irqvector[ndx].arg     = arg;
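Putting these pieces together, a minimal host-side sketch of the attach path might look like the following. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in (including the 8-vector map and the 3-entry table), not the NuttX implementation. The range checks reflect the assumption that both the physical IRQ number and the mapped index must be validated before the table is written.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_IRQS          8          /* Physical vectors (assumed) */
#define DEMO_NUSER_INTERRUPTS 3          /* Entries after mapping (assumed) */
#define DEMO_IRQMAPPED_MAX    UINT8_MAX  /* Marks an unmapped IRQ */

typedef int (*demo_xcpt_t)(int irq, void *context, void *arg);

struct demo_irq_info_s
{
  demo_xcpt_t handler;  /* Address of the interrupt handler */
  void *arg;            /* Argument passed to the handler */
};

static struct demo_irq_info_s g_demo_irqvector[DEMO_NUSER_INTERRUPTS];

/* Physical IRQs 1, 3, and 6 map to table indices 0, 1, and 2 */

static const uint8_t g_demo_irqmap[DEMO_NR_IRQS] =
{
  DEMO_IRQMAPPED_MAX, 0, DEMO_IRQMAPPED_MAX, 1,
  DEMO_IRQMAPPED_MAX, DEMO_IRQMAPPED_MAX, 2, DEMO_IRQMAPPED_MAX
};

/* Trivial handler used only to exercise the sketch */

static int demo_isr(int irq, void *context, void *arg)
{
  (void)irq;
  (void)context;
  (void)arg;
  return 0;
}

static int demo_irq_attach(int irq, demo_xcpt_t isr, void *arg)
{
  unsigned int ndx;

  /* Reject physical IRQ numbers outside of the map */

  if (irq < 0 || irq >= DEMO_NR_IRQS)
    {
      return -EINVAL;
    }

  /* Reject IRQs that have no slot in the reduced table */

  ndx = g_demo_irqmap[irq];
  if (ndx >= DEMO_NUSER_INTERRUPTS)
    {
      return -EINVAL;
    }

  g_demo_irqvector[ndx].handler = isr;
  g_demo_irqvector[ndx].arg     = arg;
  return 0;
}
```

In this sketch, attaching demo_isr to physical IRQ 3 fills table entry 1, while attaching to the unmapped IRQ 2 fails with -EINVAL.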

irq_dispatch()

irq_dispatch() is called by MCU logic when an interrupt is received:

void irq_dispatch(int irq, FAR void *context);

where, again, irq is the physical interrupt vector number.

irq_dispatch() would do essentially the same thing as irq_attach(). First it would map the irq number to a table index:

#ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
int ndx = g_irqmap[irq];
#else
int ndx = irq;
#endif

Then dispatch the interrupt handling to the attached interrupt handler. NOTE that the physical vector number is passed to the handler so it is completely unaware of the underlying shell game:

vector = g_irqvector[ndx].handler;
arg    = g_irqvector[ndx].arg;

vector(irq, context, arg);
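The dispatch path can be sketched in the same way. Again, all demo_ and DEMO_ names are hypothetical stand-ins and the table contents are invented for illustration. The sketch assumes that an unmapped or out-of-range IRQ is routed to an "unexpected interrupt" handler, and it shows that the handler receives the physical IRQ number rather than the table index.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_IRQS          8          /* Physical vectors (assumed) */
#define DEMO_NUSER_INTERRUPTS 3          /* Entries after mapping (assumed) */
#define DEMO_IRQMAPPED_MAX    UINT8_MAX  /* Marks an unmapped IRQ */

typedef int (*demo_xcpt_t)(int irq, void *context, void *arg);

struct demo_irq_info_s
{
  demo_xcpt_t handler;
  void *arg;
};

static int g_demo_seen_irq   = -1;  /* Last IRQ number seen by demo_isr() */
static int g_demo_unexpected = 0;   /* Count of unexpected interrupts */

static int demo_isr(int irq, void *context, void *arg)
{
  (void)context;
  (void)arg;
  g_demo_seen_irq = irq;
  return 0;
}

static int demo_unexpected_isr(int irq, void *context, void *arg)
{
  (void)irq;
  (void)context;
  (void)arg;
  g_demo_unexpected++;
  return 0;
}

/* Physical IRQs 1, 3, and 6 map to table indices 0, 1, and 2 */

static const uint8_t g_demo_irqmap[DEMO_NR_IRQS] =
{
  DEMO_IRQMAPPED_MAX, 0, DEMO_IRQMAPPED_MAX, 1,
  DEMO_IRQMAPPED_MAX, DEMO_IRQMAPPED_MAX, 2, DEMO_IRQMAPPED_MAX
};

/* Only index 1 (physical IRQ 3) has a real handler attached */

static struct demo_irq_info_s g_demo_irqvector[DEMO_NUSER_INTERRUPTS] =
{
  { demo_unexpected_isr, NULL },
  { demo_isr,            NULL },
  { demo_unexpected_isr, NULL }
};

static void demo_irq_dispatch(int irq, void *context)
{
  demo_xcpt_t vector = demo_unexpected_isr;
  void *arg = NULL;

  if (irq >= 0 && irq < DEMO_NR_IRQS)
    {
      unsigned int ndx = g_demo_irqmap[irq];
      if (ndx < DEMO_NUSER_INTERRUPTS)
        {
          vector = g_demo_irqvector[ndx].handler;
          arg    = g_demo_irqvector[ndx].arg;
        }
    }

  /* The handler sees the physical IRQ number, never the table index */

  vector(irq, context, arg);
}
```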

irq_initialize()

irq_initialize() simply sets the g_irqvector[] table to a known state on power-up. It would only have to distinguish the difference in sizes.

#ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
#  define TAB_SIZE CONFIG_ARCH_NUSER_INTERRUPTS
#else
#  define TAB_SIZE NR_IRQS
#endif

for (i = 0; i < TAB_SIZE; i++)
  {
    /* Set g_irqvector[i] to its known, initial state */
  }

g_irqmap[]

An implementation of g_irqmap[] might be something like:

#include <nuttx/irq.h>

const irq_mapped_t g_irqmap[NR_IRQS] =
{
  ... IRQ to index mapping values ...
};

g_irqmap[] is an array of mapped IRQ table indices. It contains the mapped index value and is itself indexed by the physical interrupt vector number. It provides an irq_mapped_t value in the range of 0 to CONFIG_ARCH_NUSER_INTERRUPTS - 1 that is the new, mapped index into the vector table. Unsupported IRQs would simply map to an out of range value like IRQMAPPED_MAX. So, for example, if g_irqmap[37] == 24, then the hardware interrupt vector 37 will be mapped to the interrupt vector table at index 24. If g_irqmap[42] == IRQMAPPED_MAX, then hardware interrupt vector 42 is not used and, if it occurs, will result in an unexpected interrupt crash.
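Because the map is written by hand in platform-specific code, a consistency check can be useful during bring-up. The sketch below is a hypothetical example with an invented 8-entry map. It assumes that irq_mapped_t is a uint8_t, that IRQMAPPED_MAX is UINT8_MAX, and that no two physical IRQs should share a table index; none of these assumptions is confirmed by the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t demo_irq_mapped_t;       /* Assumed width of a map entry */

#define DEMO_NR_IRQS          8          /* Physical vectors (assumed) */
#define DEMO_NUSER_INTERRUPTS 3          /* Entries after mapping (assumed) */
#define DEMO_IRQMAPPED_MAX    UINT8_MAX  /* Marks an unmapped IRQ */

/* Physical IRQs 1, 3, and 6 map to table indices 0, 1, and 2 */

static const demo_irq_mapped_t g_demo_irqmap[DEMO_NR_IRQS] =
{
  DEMO_IRQMAPPED_MAX, 0, DEMO_IRQMAPPED_MAX, 1,
  DEMO_IRQMAPPED_MAX, DEMO_IRQMAPPED_MAX, 2, DEMO_IRQMAPPED_MAX
};

/* Returns true if every entry is either unmapped or a unique, in-range
 * index into the reduced vector table.
 */

static bool demo_irqmap_is_valid(const demo_irq_mapped_t *map,
                                 size_t nirqs, size_t nuser)
{
  bool used[UINT8_MAX] = { false };  /* Covers indices 0 through 254 */
  size_t i;

  if (nuser > UINT8_MAX)
    {
      return false;
    }

  for (i = 0; i < nirqs; i++)
    {
      demo_irq_mapped_t ndx = map[i];

      if (ndx == DEMO_IRQMAPPED_MAX)
        {
          continue;  /* Unused physical vector */
        }

      if (ndx >= nuser || used[ndx])
        {
          return false;  /* Out of range or duplicated index */
        }

      used[ndx] = true;
    }

  return true;
}
```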

Hardware Vector Remapping

[This technical approach is discussed here but is discouraged because of technical “Complications” and “Dubious Performance Improvements” discussed at the end of this section.]

Most ARMv7-M architectures support two mechanisms for handling interrupts:

  1. The so-called common vector handler logic enabled with CONFIG_ARMV7M_CMNVECTOR=y that can be found in arch/arm/src/armv7-m/, and
  2. MCU-specific interrupt handling logic. For the STM32, this logic can be found at arch/arm/src/stm32/gnu/stm32_vectors.S.

The common vector logic is slightly more efficient; the MCU-specific logic is slightly more flexible.

If we don't use the common vector logic enabled with CONFIG_ARMV7M_CMNVECTOR=y, but instead the more flexible MCU-specific implementation, then we can also use this to map the large set of hardware interrupt vector numbers to a smaller set of software interrupt numbers. This involves minimal changes to the OS and does not require any magic software lookup table, but it is considerably more complex to implement.

This technical approach requires changes to three files:

  1. A new header file at arch/arm/include/stm32, say xyz_irq.h for the purposes of this discussion. This new header file is like the other IRQ definition header files in that directory except that it defines only the IRQ number of the interrupts after remapping. So, instead of having the 100 IRQ number definitions of the original IRQ header file based on the physical vector numbers, this header file would define only the small set of 20 mapped IRQ numbers in the range from 0 through 19. It would also set NR_IRQS to the value 20.
  2. A new header file at arch/arm/src/stm32/chip, say xyz_vector.h. It would be similar to the other vector definitions files in that directory: It will consist of a sequence of 100 VECTOR and UNUSED macros. It will define VECTOR entries for the 20 valid interrupts and 80 UNUSED entries for the unused interrupt vector numbers. More about this below.
  3. Modification of the stm32_vectors.S file. These changes are trivial and involve only the conditional inclusion of the new, special xyz_vector.h header file.

Vector Definitions

In arch/arm/src/stm32/gnu/stm32_vectors.S, notice that the xyz_vector.h file will be included twice. Before each inclusion, the macros VECTOR and UNUSED are defined.

The first time that xyz_vector.h is included, it defines the hardware vector table. The hardware vector table is an array of 32-bit addresses, one per physical vector (NR_IRQS entries before any remapping). This is accomplished by setting:

#undef VECTOR
#define VECTOR(l,i) .word l

#undef UNUSED
#define UNUSED(i)   .word stm32_reserved

and then including xyz_vector.h. Consider the following definitions in the original file:

...
VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
VECTOR(stm32_usart2, STM32_IRQ_USART2) /* Vector 16+38: USART2 global interrupt */
VECTOR(stm32_usart3, STM32_IRQ_USART3) /* Vector 16+39: USART3 global interrupt */
...

Suppose that we wanted to support only USART1 and that we wanted to have the IRQ number for USART1 to be 12. That would be accomplished in the xyz_vector.h header file like this:

...
VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
UNUSED(0)                              /* Vector 16+38: USART2 global interrupt */
UNUSED(0)                              /* Vector 16+39: USART3 global interrupt */
...

Where the value of STM32_IRQ_USART1 was defined to be 12 in the arch/arm/include/stm32/xyz_irq.h header file. When xyz_vector.h is included by stm32_vectors.S with the above definitions for VECTOR and UNUSED, the following would result:

...
.word stm32_usart1
.word stm32_reserved
.word stm32_reserved
...

These are the settings for vectors 53, 54, and 55, respectively. The entire vector table would be populated in this way. stm32_reserved, if called, would result in an “unexpected ISR” crash. stm32_usart1, if called, will process the USART1 interrupt normally, as we will see below.

Interrupt Handler Definitions

In the vector table, all of the valid vectors are set to the address of a handler function. All unused vectors are forced to vector to stm32_reserved. Currently, only vectors that are not supported by the hardware are marked UNUSED, but you can mark any vector UNUSED in order to eliminate it.

The second time that xyz_vector.h is included by stm32_vectors.S, the handler functions are generated. Each of the valid vectors points to the matching handler function. In this case, you do NOT have to provide handlers for the UNUSED vectors, only for the used VECTOR vectors. All of the unused vectors will go to the common stm32_reserved handler. The remaining set of handlers is very sparse.

These are the values of the UNUSED and VECTOR macros the second time that xyz_vector.h is included by stm32_vectors.S:

.macro HANDLER, label, irqno
    .thumb_func
\label:
    mov r0, #\irqno
    b       exception_common
.endm

#undef VECTOR
#define VECTOR(l,i) HANDLER l, i

#undef UNUSED
#define UNUSED(i)

In the above USART1 example, a single handler would be generated that will provide the IRQ number 12. Remember that 12 is the expansion of the macro STM32_IRQ_USART1 that is provided in the arch/arm/include/stm32/xyz_irq.h header file:

    .thumb_func
stm32_usart1:
    mov r0, #12
    b       exception_common

Now, when vector 16+37 occurs it is mapped to IRQ 12 with no significant software overhead.
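The double inclusion is an instance of the "X-macro" technique: one list of entries is expanded twice under different macro definitions. The following C analogue is only a sketch of that idea, not the assembly in stm32_vectors.S. The list macro XYZ_VECTORS stands in for the contents of xyz_vector.h, and all demo_ names are hypothetical. Note that the two passes appear in the opposite order here because C requires the handlers to be defined before the table refers to them.

```c
#include <stddef.h>

/* Hypothetical stand-in for xyz_vector.h: three physical vectors, of
 * which only the first is used.  It is remapped to IRQ number 12.
 */

#define XYZ_VECTORS \
  VECTOR(demo_usart1, 12) \
  UNUSED(0) \
  UNUSED(0)

static int g_demo_last_irq = -1;  /* Records what the last vector did */

static void demo_exception_common(int irq)
{
  g_demo_last_irq = irq;
}

static void demo_reserved(void)
{
  g_demo_last_irq = -2;  /* Stands in for the "unexpected ISR" crash */
}

/* Handler pass: one small handler per VECTOR entry, nothing for UNUSED */

#define VECTOR(l, i) static void l(void) { demo_exception_common(i); }
#define UNUSED(i)
XYZ_VECTORS
#undef VECTOR
#undef UNUSED

/* Table pass: one address per physical vector, used or not */

#define VECTOR(l, i) l,
#define UNUSED(i)    demo_reserved,
static void (*const g_demo_vectors[])(void) =
{
  XYZ_VECTORS
};
#undef VECTOR
#undef UNUSED
```

The table still has one slot per physical vector (three here), but only one handler is generated, and it reports the remapped IRQ number 12.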

A Complication

A complication in the above logic has been noted by David Sidrane: When we access the NVIC in stm32_irq.c in order to enable and disable interrupts, the logic requires the physical vector number in order to select the NVIC register and the bit(s) to modify in the NVIC register.

This could be handled with another small IRQ lookup table (20 uint8_t entries in our example situation above). But then this approach is not so much better than the Software IRQ Remapping described above, which does not suffer from this problem. Certainly enabling/disabling interrupts is a much lower rate operation, and at least this does not put the lookup in the critical interrupt path.
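A sketch of that reverse lookup is shown below. The table g_demo_physvector[] and the function demo_nvic_locate() are hypothetical; only the USART1 entry from the running example is filled in. The sketch relies on the ARMv7-M NVIC layout, in which external interrupts begin at vector 16 and each 32-bit enable register covers 32 of them.

```c
#include <stdint.h>

#define DEMO_NMAPPED      20  /* Mapped IRQs in the running example */
#define DEMO_FIRST_EXTINT 16  /* Vectors 0-15 are processor exceptions */

/* Hypothetical reverse map: mapped IRQ number to physical vector number.
 * Only USART1 (mapped IRQ 12, physical vector 16+37) is shown; the
 * entries omitted from this sketch default to zero.
 */

static const uint8_t g_demo_physvector[DEMO_NMAPPED] =
{
  [12] = 53
};

/* Find the NVIC enable register index and bit mask for a mapped IRQ.
 * Returns 0 on success, -1 if the IRQ has no external interrupt.
 */

static int demo_nvic_locate(int irq, unsigned int *regndx, uint32_t *bitmask)
{
  unsigned int extint;

  if (irq < 0 || irq >= DEMO_NMAPPED ||
      g_demo_physvector[irq] < DEMO_FIRST_EXTINT)
    {
      return -1;
    }

  extint   = g_demo_physvector[irq] - DEMO_FIRST_EXTINT;
  *regndx  = extint / 32;                    /* Which 32-bit register */
  *bitmask = (uint32_t)1 << (extint % 32);   /* Which bit within it */
  return 0;
}
```

For mapped IRQ 12 this selects bit 5 of the second enable register, since external interrupt 37 is 32 + 5.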

Another option suggested by David Sidrane is equally ugly:

  • Don't change the arch/arm/include/stm32 IRQ definition file.
  • Instead, encode the IRQ number so that it has both the index and physical vector number:
...
VECTOR(stm32_usart1, STM32_IRQ_USART1 << 8 | STM32_INDEX_USART1)
UNUSED(0)
UNUSED(0)
...

The STM32_INDEX_USART1 would have the value 12 and STM32_IRQ_USART1 would be as before (53). This encoded value would be received by irq_dispatch() and it would decode both the index and the physical vector number. It would use the index to look up in the g_irqvector[] table but would pass the physical vector number to the interrupt handler as the IRQ number.
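The encoding and the matching decode step can be sketched in C as follows. The function names are hypothetical; the 8-bit split simply mirrors the "<< 8" in the VECTOR entry above and assumes that the table index fits in the low byte.

```c
#include <stdint.h>

/* Pack the physical vector number above an 8-bit table index */

static inline uint32_t demo_irq_encode(uint32_t physical, uint32_t index)
{
  return (physical << 8) | (index & 0xff);
}

/* Recover the g_irqvector[] index from the encoded value */

static inline uint32_t demo_irq_index(uint32_t encoded)
{
  return encoded & 0xff;
}

/* Recover the physical vector number that is passed to the handler */

static inline uint32_t demo_irq_physical(uint32_t encoded)
{
  return encoded >> 8;
}
```

For USART1, demo_irq_encode(53, 12) yields 13580; decoding that value gives back index 12 for the table lookup and physical vector 53 for the handler.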

A lookup would still be required in irq_attach() in order to convert the physical vector number back to an index (100 uint8_t entries in our example). So some lookup is unavoidable.

Based upon this analysis, my recommendation is that we do not consider the second option any further. The first option is cleaner, more portable, and generally preferable.

Dubious Performance Improvements

The intent of this second option was to provide a higher performance mapping of physical interrupt vectors to IRQ numbers compared to the pure software mapping of option 1. However, in order to implement this approach, we had to use the less efficient, non-common vector handling logic. That logic is not terribly less efficient; the cost is probably only a 16-bit load immediate instruction and a branch to another location in FLASH (which will cause the CPU pipeline to be flushed).

The variant of option 2 where both the physical vector number and vector table index are encoded would require even more processing in irq_dispatch() in order to decode the physical vector number and vector table index, possibly just AND and SHIFT instructions.

However, the minimal cost of the first, pure software mapping approach was possibly as small as a single indexed byte fetch from FLASH in irq_attach(). Indexing is, of course, essentially free in the ARM ISA; the primary cost would be the FLASH memory access. So my first assessment is that the performance of both approaches is essentially the same. If anything, the first approach is possibly the more performant if implemented efficiently.

Both options would require some minor range checking in irq_attach() as well.

Because of this and because of the simplicity of the first option, I see no reason to support or consider this second option any further.

Complexity and Generalizability

Option 2 is overly complex; it depends on a deep understanding of how the MCU interrupt logic works and on a high level of Thumb assembly language skills.

Another problem with option 2 is that it really only applies to the Cortex-M family of processors and perhaps others that support vectored interrupts in a similar fashion. It is not a general solution that can be used with any CPU architecture.

And even worse, the MCU-specific interrupt handling logic that this support depends upon is very limited. As soon as the common interrupt handler logic was added, I stopped implementing the MCU-specific logic in all newer ARMv7-M ports. So that MCU-specific interrupt handler logic is only present for EFM32, Kinetis, LPC17, SAM3/4, STM32, Tiva, and nothing else. Very limited!

These are further reasons why option 2 is not recommended and will not be supported explicitly.