Debug Output (SYSLOG) Issues

Lately, I have starting thinking a little more about the SYSLOG functions in general (as well as all of those debug macros). I think I am coming to the conclusion that there needs to be some things done.

The original design of syslog() was just some hooks for simple serial debug output. These simple debug hooks were standardized and renamed syslog() so that they provide a portable debug interface. As NuttX has increased in complexity and sophistication over the years, the implementation of syslog() under the hood is still that mindlessly simple serial debug logic from the original NuttX release. Perhaps the time as come to assess what is wrong with the solution and to implement some improvements in the syslog() design?

Here are some of the issues of the current implementation that bother me:

Use of File Descriptors

In the default case, the debug output goes out on file descriptor 1, stdout. But if you think about that, it is a little crazy.

Each task can have I/O redirected in various ways. In most simple systems, a serial console is used for stdout in all tasks and the debug output goes to the serial console so everything is seamless. But if you are using a mixture of serial consoles, USB serial consoles, telnet sessions, etc. then who know were where the output is going to go in the most general case. Bits and pieces could go to different devices.

If you are redirecting stdout to a file, for example, then the debug information could go into your file, corrupting the output that you wanted. As another perverse example, try enabling network debug output using a Telnet session. That is an interesting exercise for anyone who like to see infinite loops: Telnet I/O generates debug output, the debug output goes to the Telnet network connection, which generates more network debug output, and on and one.

Interrupt Handlers

Output from interrupt handlers, of course, cannot use the console device at all. File descriptor I/O is not permitted from interrupt handlers. Attempts to do “normal” SYSLOG output from interrupt handlers will just result in the output going to the bit bucket.

Low-Level Serial Driver

The usual workaround to get debug output from interrupt handlers is to use the low-level serial I/O from interrupt handlers. But there are issues with this as well:

  • First, the console may not be the same serial device. It might be something else althogether. That means that non-interrupt SYSLOG output goes output one way and interrupt level SYSLOG output always goes out the serial port. Potentially very strange behavior could result.
  • Second, the low-level serial output does a busy wait poll! That interferes badly with the behavior of the interrupt handler – it has to wait within the interrupt handler while the serial output is performed. That wait would will be many milliseconds with interrupts disabled!
  • The low level serial output also interacts the serial driver itself making other use of the console impossible. Even, in some cases on some platforms, locking up the serial driver.

Asynchronous Output

Another syslog related issue is the asynchronous behavior with debug output from interrupt handlers. The normal debug output goes to the serial driver and is buffered for sending there. The size of the serial RX buffer is configurable. At any given point in time, the current output is behind realtme depending on that buffer size and the serial BAUD.

The interrupt debug output does not use the serial driver but immdiately commandeers the serial port and outputs dara in real time. The result is it appears to happen earlier in the output.

This is also why you lose the last debug output on a crash… the last debug data is stranded in the serial drivers RX buffer.

Interleaved Output

Because the output is done character at a time, the debug output from different tasks may get multiplexed and unreadable in the most critical of cases. The interleaved output can become totally useless, usually in the most complex situations where you need the debug output the most. Many times my plans to debug a problem with SYSLOG output has been thwarted because the output is just uninterpretable in a highly multitasking context.

Buffer Overrun

The root cause of this problem is the RX buffering in the serial driver: Character output is done one character at a time. That is not usually and issue. But when the system is very busy and he serial RX buffer becomes full, then each character output causes the caller to suspending, waiting for space in the RX buffer. It suspends and is moved to the last the of the FIFO for that priority.

When this happens, many tasks may be suspended waiting space to put the next byte in the serial RX buffer. If the tasks are the same priority, then the output will be interleaved as described since each task gets essentially round-robin access to the serial driver. If one of the tasks is lower priority, then its its output may be deferred for some time until all of the higher priority tasks complete their output. Again, leaving a big time skew in the output data.

When there is is a big time skew in the debug output – whether from asynchronous output from interrupt handlers or from blocked, lower priority tasks – This can lead to misinterpretation of the debug output since our instinct is to treat the output as if it were in sequential order in time.

NOTE that a simple working around for a partical debug scenario is simply to increase the size of the RX buffer in the serial driver. Assuming that the RX buffer overrun is only the result of short-term, bursty behavior, then the large buffer migh prevent tasks blocking waiting for space to write the next byte.

Of course, there is no work around for the perverse case where the debug output is generated at a higher rate than can be transferred on the serial port. You are just basically out-of-luck in that case.

Solutions

Serialization Buffer

Some of these asynchrony problems could also be reduced or eliminated if all debug output were buffered in an in-memory FIFO. That buffer would serialize output from diffrent sources: Various tasks and interrupt level logic.

There is already a outgoing, RX buffer in the serial driver. Why would an additional layer of buffering help? Only because then the interrupt handler debug output could also be serialized. The input to the FIFO is debug output from all concurrent tasks and interrupt handlers; the single output of the FIFO would be the serial driver.

Crash Dump

It might even be possible to flush that serialization buffer at the time of ghe crash.

CONFIG_SYSLOG and the RAMLOG

There is a partial solution for all of these issues when CONFIG_SYSLOG=y. In that case, stdout is not used but instead some custom logic is used that is established by the configuration. Currently that option is only used with the RAMLOG. The RAMLOG actually works very nicely and eliminates most of the above issue (except for the interleaving issue). But the RAMLOG also requires some additional logic to get the debug information out of the RAMLOG. In NSH, you can use the dmesg command to dump the content of the RAMLOG to the NSH console.

But I am thinking that some in-memory buffering and serialization is the solution to most of the issues mentioned about: That would end the reliance on stdout. It would be compatible with debug output from interrupt handlers; interrupt handler output would be serialized in the in-memory buffer in the correct position.

The same interleaving problem could still occur. Adding more buffering can always eliminate the interleaving problem, provided that the problem is due to bursty, high-volume output. But you can just increase the size of the serial RX buffer to accomplish that; any special serialization is of no help (other than it effectively increases the size of the RX buffer).

The existing RAMLOG code could be extended so that the buffered data is agressively dumped to the SYSLOG device and I think all problems would be eliminated. The buffer could be dumped that the end of each non-interrupt level execution of syslog() (and also on a crash). Interrupt level output would have to pend in the buffer until the next, non-interrupt level debug output pushes it out to the serial driver.