Measuring kernel latencies to ensure real-time constraints

Device drivers in the kernel often need to perform a task in response to an event, and there are several ways to do this. These deferred execution methods include the Linux workqueue, the tasklet, the kernel thread, and so on. Different methods have different scheduling priorities and thus different response latencies. They also differ in the execution context they run in (e.g., process vs. interrupt context), which may affect which method is more suitable for a specific purpose.

In this article, I describe what I found when experimenting with the following methods, in terms of their response latencies and how the system load and userspace task priorities affect them:

  1. workqueue
  2. tasklet
  3. kernel thread

The latency here means the time from when a task is invoked until it starts executing. It depends on the Linux scheduler latency, the deferred execution method (workqueue vs. tasklet vs. kthread), and the priorities of competing tasks. The first item, the scheduler latency, is the time between a service being requested and the scheduler being executed. This was a significant issue for early Linux kernels because they were not preemptive, so the scheduler might not run for a fairly long time after an event was raised. In recent kernels, the scheduler latency has been greatly reduced thanks to kernel preemption. The caveat, however, is that some synchronization techniques, such as the spinlock, may still prevent preemption, and thus can still slow down the kernel's response in some conditions.
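
As a rough illustration of this definition, the latency is the difference between two timestamps: one taken when the task is invoked and one taken when it starts to run. The following is a minimal userspace sketch, not the code of the test module; the helper names `now_monotonic` and `latency_us` are hypothetical, and it assumes both timestamps come from the same monotonic clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock (hypothetical helper for this sketch). */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return ts;
}

/* Microseconds from the moment a task was invoked to the moment it ran.
 * Both timestamps are assumed to come from the same monotonic clock. */
static int64_t latency_us(struct timespec invoked, struct timespec executed)
{
    int64_t ns = ((int64_t)executed.tv_sec - (int64_t)invoked.tv_sec) * 1000000000LL
               + ((int64_t)executed.tv_nsec - (int64_t)invoked.tv_nsec);

    return ns / 1000;
}
```

In a measurement loop, the first timestamp would be taken just before the work is queued (or the thread is woken) and the second at the first line of the handler.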

The latency being measured here is not that of the scheduler itself, but the time until the scheduled task is executed. This also depends on the scheduling algorithm and on the priorities of the task and of other competing tasks. A detailed discussion of the Linux scheduler can be found here (http://oreilly.com/catalog/linuxkernel/chapter/ch10.html). For now, it suffices to say that, according to that description, the scheduler does three things in order:

  1. execute deferred tasks in the task queue
  2. execute the bottom half (deferred tasklets and soft irqs)
  3. find a process to run based on its scheduling policy (SCHED_FIFO, SCHED_RR, or SCHED_OTHER) and its priority.

From here, we can see that the tasklet has the highest priority: it is run even before the scheduler looks at the priorities of any kernel task. Here, a kernel task means an execution unit with a kernel struct task_struct data structure. This includes any userspace process, any POSIX thread (implemented by the Native POSIX Thread Library, NPTL), and the worker threads that serve any kernel workqueue (either the kernel global queue or one created by a module). Naturally, any of these has a higher latency than a tasklet. We will see below that the tasklet is indeed the method with the lowest latencies in all conditions, especially when the system load is high. However, this does not mean that we should always use the tasklet for everything. Because the tasklet runs in interrupt context, it cannot be used for operations that may sleep (e.g., some memory allocations and I/O), and it may itself delay kernel preemption and increase kernel latency.

To choose a method wisely, we can measure their runtime performance to understand how quick each method is and how scheduling policies and priorities change the latencies. Here, I describe the data I got from some tests. The tests are done with a kernel module that implements three execution methods: the workqueue (without delay), the tasklet (without delay), and waking up an existing kthread. Each method is tested N times (N = 10,000), and the average, standard deviation, and maximum latencies are recorded. The background system load is 10 real-time userspace threads (with policy SCHED_RR, priority 1). The test thread runs with policy SCHED_RR, priority 20. Its priority is set higher than that of the background threads to avoid starvation.
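
For reference, a userspace thread can be given a real-time policy and priority such as the ones used in this setup with `sched_setscheduler`. The sketch below is only an illustration and is not taken from the test tool; the function names `rr_priority_valid` and `set_rr_priority` are hypothetical. It assumes a Linux system, where the call normally requires root or the CAP_SYS_NICE capability and otherwise fails with EPERM.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>

/* Returns 1 if prio is a valid static priority for SCHED_RR on this system. */
static int rr_priority_valid(int prio)
{
    return prio >= sched_get_priority_min(SCHED_RR) &&
           prio <= sched_get_priority_max(SCHED_RR);
}

/* Switch the caller to SCHED_RR at the given priority.
 * Returns 0 on success, or an errno value on failure
 * (typically EPERM when run without sufficient privilege). */
static int set_rr_priority(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (!rr_priority_valid(prio))
        return EINVAL;
    if (sched_setscheduler(0, SCHED_RR, &sp) != 0)
        return errno;
    return 0;
}
```

With this helper, the test thread would call `set_rr_priority(20)` and each background thread `set_rr_priority(1)`; checking the return value matters, because a thread that silently stays in SCHED_OTHER would distort the measurements.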

System without high-priority load

Latency   Workqueue (global)   Workqueue (private)   Tasklet     Kthread     Userspace
Avg       6 us                 6 us                  5 us        5 us        8 us
Stdev     1.414 us             1.000 us              2.646 us    2.646 us    3.606 us
Max       21 us                19 us                 135 us      30 us       12 us

System with high-priority load

Latency   Workqueue (global)   Workqueue (private)   Tasklet      Kthread       Userspace
Avg       101 us               101 us                6 us         195 us        49693 us
Stdev     222.948 us           242.535 us            99.895 us    333.742 us    43963.480 us
Max       950022 us            950070 us             9992 us      950160 us     49887 us
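
The three rows of the tables above can be computed from the raw samples in a single pass. The following is a minimal sketch under stated assumptions, not the code used to produce these numbers: the names `latency_stats`, `sqrt_newton`, and `summarize` are hypothetical, the standard deviation is assumed to be the population standard deviation, and the running mean and variance use Welford's update. The square root is computed by Newton's iteration only so that the sketch builds without linking the math library.

```c
#include <assert.h>
#include <stddef.h>

struct latency_stats {
    double avg_us;
    double stdev_us;
    double max_us;
};

/* Square root by Newton's iteration, so the sketch links without -lm. */
static double sqrt_newton(double x)
{
    double r = x > 1.0 ? x : 1.0;

    if (x <= 0.0)
        return 0.0;
    for (int i = 0; i < 100; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Average, population standard deviation, and maximum of n samples (in us). */
static struct latency_stats summarize(const double *us, size_t n)
{
    struct latency_stats s = { 0.0, 0.0, 0.0 };
    double mean = 0.0, m2 = 0.0;

    assert(n > 0);
    s.max_us = us[0];
    for (size_t i = 0; i < n; i++) {
        /* Welford's update: running mean and sum of squared deviations. */
        double delta = us[i] - mean;

        mean += delta / (double)(i + 1);
        m2 += delta * (us[i] - mean);
        if (us[i] > s.max_us)
            s.max_us = us[i];
    }
    s.avg_us = mean;
    s.stdev_us = sqrt_newton(m2 / (double)n);
    return s;
}
```

For example, for the samples 2, 4, 4, 4, 5, 5, 7, 9 us, this gives an average of 5 us, a standard deviation of 2 us, and a maximum of 9 us.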

It can be seen that when userspace has some high-priority load, kernel performance is affected as well as userspace performance. The worst-case latencies of kernel tasks (as opposed to tasklets) increase from tens of microseconds to almost a second (about 950 ms), and their average latencies rise from about 6 us to 100-200 us. The good thing, however, is that the kernel tasks are never starved indefinitely by the userspace load, no matter how high its priority is. This is not the case for userspace threads: if the real-time test thread has a lower priority than the real-time background threads, the test thread never gets enough time slices to execute.

Conclusion
The results show that kernel tasks (threads) have latencies at the microsecond level in the best case and around 1 second in the worst case. The worst case for tasklets is lower, at the millisecond level (about 10 ms in these tests). Kernel tasks and threads are not easily starved for longer than a few seconds by userspace workload, while userspace threads may be.

Download
The source code of the kernel module and the test tool can be downloaded from GitHub: http://github.com/dankex/tools/tree/master/linux-kernel/wake_latency/
