PREEMPT_RT Latency: Measure It With cyclictest (Part 1)

This is the first part of a hands-on series on real-time Linux using PREEMPT_RT. The practical skill at the centre of the series is measuring scheduling latency: the delay between when a task should run and when it actually runs. Across the series we will build and boot a real-time kernel, measure its behaviour, tune the system for low latency, and write user-space code that behaves predictably.

Before any of that, we need to answer a basic question: what does “real-time” actually mean on Linux, and how do you measure it? In this part you will install the standard measurement tool, run it correctly, and read its output. That gives you a baseline number you can trust before you change anything.

What you need

A Linux machine on which you have root access. Either a normal x86_64 laptop or a single-board computer will do. You do not need a real-time kernel yet; measuring a standard kernel first is useful for comparison.
The rt-tests package, which provides cyclictest. We install it below.
Basic comfort with the shell and sudo.

This is Part 1, so there is no previous part to revisit. If you already run a PREEMPT_RT kernel, you can still follow every step; you will simply see lower maximum latencies than on a standard kernel.

What PREEMPT_RT changes in the kernel

A standard Linux kernel is built for throughput. It is willing to make any single task wait a little longer if doing so lets the system do more total work. Real-time systems have the opposite priority. They need a bounded, predictable worst-case response time, even if average throughput is lower. PREEMPT_RT is the set of kernel changes that makes this possible.

The main changes PREEMPT_RT introduces are:

Sleeping spinlocks. Most kernel locks of type spinlock_t are converted to sleeping locks built on rt-mutexes. This means a section of kernel code that holds such a lock can be preempted, so a high-priority task does not have to wait for it.
Threaded interrupt handlers. Most interrupt handlers run in kernel threads that can themselves be scheduled and preempted, instead of blocking everything in hard interrupt context.
Priority inheritance. In-kernel locks support priority inheritance, which limits the classic priority-inversion problem where a low-priority task blocks a high-priority one.

A small number of locks, those of type raw_spinlock_t, remain true spinning locks whose critical sections cannot be preempted, because they protect the lowest-level kernel paths. The important point for this series is the result: with PREEMPT_RT, far more of the kernel can be interrupted, which lowers the worst-case time before a high-priority task runs.

An important milestone is that PREEMPT_RT is no longer an out-of-tree patch set for the common case. The core real-time support was merged into the mainline Linux kernel in version 6.12, released in late 2024, and the real-time configuration can now be selected on architectures such as x86_64 and arm64. This series uses that mainline support.

What “latency” means here

When we measure real-time behaviour we are not measuring how fast a program computes. We are measuring scheduling latency: the gap between the moment a task was supposed to wake up and the moment it actually started running. If a task asks to be woken every 200 microseconds, and on one cycle it wakes 30 microseconds late, that 30 microseconds is the latency for that cycle. The number that matters most is the maximum observed latency, because that is what determines whether the system can meet a deadline.
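The arithmetic behind that definition can be sketched in a few lines of shell. This is an illustration only, assuming a task whose wake-ups are scheduled at fixed multiples of its period; the function name cycle_latency_us and the timestamps are hypothetical and not part of any tool.

```shell
# Hypothetical helper: latency of one cycle, all times in microseconds.
# Arguments: start time, period, cycle number, actual wake-up time.
cycle_latency_us() {
    cl_start=$1
    cl_period=$2
    cl_cycle=$3
    cl_actual=$4
    # The intended wake-up time of cycle N is start + N * period.
    cl_intended=$(( cl_start + cl_cycle * cl_period ))
    echo $(( cl_actual - cl_intended ))
}

# Cycle 5 of a 200 microsecond task should wake at 1000; it wakes at 1030.
cycle_latency_us 0 200 5 1030
```

The call prints 30, the latency of that cycle in microseconds.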

Installing the measurement tool

The tool is cyclictest, part of the rt-tests suite, originally written by Thomas Gleixner and now maintained in a repository hosted on kernel.org. On Debian or Ubuntu:

raghu@techveda.org:~$ sudo apt update
raghu@techveda.org:~$ sudo apt install rt-tests

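If your distribution does not package it, build it from the official source. You need a C compiler, make, and the libnuma development headers (libnuma-dev on Debian and Ubuntu):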

raghu@techveda.org:~$ git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
raghu@techveda.org:~$ cd rt-tests
raghu@techveda.org:~$ make
raghu@techveda.org:~$ sudo make install

You can confirm whether you are currently on a real-time kernel in two ways:

raghu@techveda.org:~$ uname -v
#1 SMP PREEMPT_RT ...        # the string PREEMPT_RT appears on an RT kernel

raghu@techveda.org:~$ cat /sys/kernel/realtime
1                            # 1 means a PREEMPT_RT kernel is running

If /sys/kernel/realtime does not exist or uname -v does not mention PREEMPT_RT, you are on a standard kernel. That is fine for a baseline measurement.
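If you want the same check in a script, for example to label saved results, a minimal sketch follows. The function name is_rt_kernel is hypothetical; the optional path argument is there only so the function can be tried against an ordinary file.

```shell
# Hypothetical helper: succeed if a PREEMPT_RT kernel is running.
# The sysfs path can be overridden by passing a different file as $1.
is_rt_kernel() {
    rt_file=${1:-/sys/kernel/realtime}
    # The file must be readable and contain exactly "1".
    [ -r "$rt_file" ] && [ "$(cat "$rt_file")" = "1" ]
}

if is_rt_kernel; then
    echo "PREEMPT_RT kernel"
else
    echo "standard kernel"
fi
```

A missing file counts as a standard kernel, which matches the rule stated above.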

Running cyclictest correctly

Running cyclictest with no options is not useful; it creates a single measuring thread with a 1 millisecond interval and no real-time priority, so the result says little about worst-case behaviour. The Linux Foundation real-time documentation gives a command that is appropriate for most SMP systems:

raghu@techveda.org:~$ sudo cyclictest --mlockall --smp --priority=80 --interval=200 --distance=0

Here is what each option does:

--mlockall locks the process memory so it cannot be paged out, which removes page faults as a source of latency.
--smp runs one measuring thread per CPU, each pinned to its own core.
--priority=80 runs the measuring threads as SCHED_FIFO real-time threads at priority 80.
--interval=200 sets the wake-up period of the first thread to 200 microseconds.
--distance=0 keeps every thread at the same interval instead of spreading them out.

Internally, a non-real-time master thread starts the measuring threads. Each measuring thread sleeps with clock_nanosleep until its next scheduled wake-up time. On each cycle the difference between the intended and actual wake-up time is recorded, and the master thread prints the statistics. Let the test run for at least several minutes; worst-case events are rare, so a short run can look better than the system really is. Press Ctrl+C to stop, or add --duration=10m to end the run automatically after ten minutes.

Reading the output

The output prints one line per measuring thread, updated live:

T: 0 (821)  P:80  I:200  C: 518063  Min:      1  Act:    1  Avg:    1  Max:      15
T: 1 (822)  P:80  I:200  C: 518050  Min:      1  Act:    2  Avg:    1  Max:      23

The columns mean:

T — thread index and its thread ID.
P — the real-time priority used.
I — the wake-up interval in microseconds.
C — the count, that is, how many cycles have been measured so far.
Min, Avg, Max — the minimum, average, and maximum latency observed, in microseconds.
Act — the latency of the most recent cycle.
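When you save the output to a file, the single number you usually want is the largest Max across all threads. The sketch below pulls it out with sed and sort. It assumes the per-thread lines keep the format shown above; the function name worst_max_us is hypothetical, and the two sample lines from this article stand in for real output.

```shell
# Hypothetical helper: print the largest "Max:" value (in microseconds)
# found in cyclictest per-thread lines read from standard input.
worst_max_us() {
    # Keep only the number after "Max:", then pick the largest one.
    sed -n 's/.*Max:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | sort -n | tail -n 1
}

# Sample lines copied from the output shown above.
worst_max_us <<'EOF'
T: 0 (821)  P:80  I:200  C: 518063  Min:      1  Act:    1  Avg:    1  Max:      15
T: 1 (822)  P:80  I:200  C: 518050  Min:      1  Act:    2  Avg:    1  Max:      23
EOF
```

For the two sample lines this prints 23, the worse of the two per-thread maxima.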

Latencies are shown in microseconds by default; add --nsecs to see nanoseconds. To produce data for a latency plot, add --histogram=400 to print a histogram of latencies up to 400 microseconds when the test ends. On SMP, --histofall=400 does the same and adds a summary column covering all threads.

Interpreting PREEMPT_RT latency results

The maximum PREEMPT_RT latency is the value that matters. The average is almost always low and tells you little about worst-case behaviour. On a standard, non real-time kernel under load you may see maximum latencies of hundreds of microseconds to several milliseconds. On a properly configured PREEMPT_RT system, maximum latencies are typically far lower and, more importantly, far more consistent.

There is no single “good” number. A latency is acceptable only in relation to the deadline of the task you care about. A motor control loop running every 1 millisecond and an audio buffer refilled every 10 milliseconds have very different tolerances. Decide the deadline first, then judge the measured maximum against it.
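As a worked example of judging a maximum against a deadline, the sketch below derives a latency budget and compares a measured value with it. All figures are assumptions chosen for illustration: a 1000 microsecond control period, 700 microseconds of computation per cycle, and the 23 microsecond sample maximum shown earlier. The function name within_budget is hypothetical.

```shell
# Hypothetical helper: succeed if the measured maximum latency fits the budget.
# Arguments: measured maximum and latency budget, both in microseconds.
within_budget() {
    wb_max=$1
    wb_budget=$2
    [ "$wb_max" -le "$wb_budget" ]
}

# Assumed figures: the period minus the computation time is what remains
# for scheduling latency before the deadline is missed.
period_us=1000
compute_us=700
budget_us=$(( period_us - compute_us ))

if within_budget 23 "$budget_us"; then
    echo "max latency fits the ${budget_us} us budget"
else
    echo "max latency exceeds the ${budget_us} us budget"
fi
```

The same 23 microseconds that is comfortable here would be a failure for a task with a 20 microsecond budget, which is why the deadline has to come first.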

To make the comparison meaningful, run cyclictest while the system is under realistic load, not idle. An idle system gives optimistic numbers. A common method is to run a load generator in another terminal, for example a parallel kernel build or the stress-ng tool, while cyclictest runs. The Open Source Automation Development Lab (OSADL) publishes continuous latency plots using cyclictest, which are a useful reference for the order of magnitude to expect on different hardware.

One limitation to keep in mind: cyclictest results are slightly optimistic. Its threads wake straight from a timer in the most direct path available, so a real application with extra layers of indirection may see somewhat higher latency than cyclictest reports. A finite test run also cannot prove that a rarer, larger delay will never occur. Treat the measured maximum as a lower bound on the true worst-case latency, not a guarantee.

Measuring before tuning is the same discipline we teach in the real-time module of the TECH VEDA Linux systems engineering training: establish a trustworthy baseline first, then change one thing at a time and measure again.

Key takeaways

Real-time means bounded, predictable worst-case latency, not higher throughput.
PREEMPT_RT makes most of the kernel preemptible through sleeping spinlocks, threaded interrupts, and priority inheritance; its core support is in the mainline kernel from version 6.12.
cyclictest from the rt-tests package measures PREEMPT_RT latency by comparing intended and actual wake-up times.
Use a real test command such as cyclictest --mlockall --smp --priority=80 --interval=200 --distance=0, run it for several minutes, and run it under load.
Read the maximum latency, judge it against your task’s deadline, and remember the result is a lower bound on the true worst case.

What’s next in this series

In Part 2 we will build and boot a kernel with the PREEMPT_RT configuration enabled, then re-run the same cyclictest command to compare the maximum latency against the baseline we measured here.
