Deferred Probe in the Linux Kernel: How It Works

On an embedded board, drivers are not guaranteed to probe in the order you expect. A display driver may need a backlight, a regulator, a clock, or a GPIO line that belongs to a different controller, and that controller may not have been initialised yet. The kernel solves this ordering problem with deferred probe. This article explains how deferred probe works inside the driver core, walks through the exact code paths in drivers/base/dd.c, shows a complete driver that cooperates with the mechanism, and gives a full debugging session for a device that never finishes binding. The behaviour described here applies to every bus that uses the standard Linux device model.

A quick recap of bind, match, and probe

Before deferred probe makes sense, it helps to recall how a device and a driver are joined. Every driver registers with a bus and provides a probe() callback. A platform driver, the most common kind on embedded systems, looks like this:

static const struct of_device_id mydev_of_match[] = {
        { .compatible = "vendor,mydev" },
        { /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, mydev_of_match);

static struct platform_driver mydev_driver = {
        .probe  = mydev_probe,
        .remove = mydev_remove,
        .driver = {
                .name           = "mydev",
                .of_match_table = mydev_of_match,
        },
};
module_platform_driver(mydev_driver);

When a device with compatible = "vendor,mydev" appears in the Device Tree, the bus matches it to this driver and calls mydev_probe(). The probe function is where the driver acquires its resources and brings the hardware up. The problem is that those resources are themselves provided by other devices, and the kernel has no global picture of which device must be ready first.
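The matching step can be pictured with a small userspace model. This is only a sketch: the real Device Tree matcher scores several properties and picks the best entry, whereas the hypothetical model_match() below compares the compatible string alone. The struct and function names are invented for illustration and are not kernel APIs.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct of_device_id: only the compatible string. */
struct model_of_device_id {
        char compatible[128];
};

/* The table ends with an empty entry, like the sentinel in the driver above. */
static const struct model_of_device_id model_table[] = {
        { .compatible = "vendor,mydev" },
        { .compatible = "" },   /* sentinel */
};

/*
 * Hypothetical matcher: walk the table until the sentinel and return the
 * first entry whose compatible string is identical to the node's.
 * Returns NULL when nothing matches, in which case probe() is never called.
 */
static const struct model_of_device_id *
model_match(const struct model_of_device_id *table, const char *node_compat)
{
        for (; table->compatible[0] != '\0'; table++) {
                if (strcmp(table->compatible, node_compat) == 0)
                        return table;
        }
        return NULL;
}
```

A node whose compatible string differs by a single character simply never matches, so the driver's probe() is not called and the device does not show up as deferred either.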

The ordering problem deferred probe solves

Consider an I2C sensor whose Device Tree node references a clock, a regulator, and a reset GPIO that all live behind separate controllers:

&i2c1 {
        sensor@40 {
                compatible = "vendor,mydev";
                reg = <0x40>;
                clocks = <&clk_controller 5>;
                vdd-supply = <&reg_3v3>;
                reset-gpios = <&gpio2 7 GPIO_ACTIVE_LOW>;
        };
};

At boot the kernel registers devices roughly in the order it walks the tree. There is no guarantee that clk_controller, reg_3v3, and gpio2 have all probed before sensor@40 is reached. If the sensor driver runs first and asks for a clock that does not yet exist, it must not fail permanently. It must tell the core to try again later. That signal is the error code -EPROBE_DEFER, and the machinery that acts on it is deferred probe.

Returning -EPROBE_DEFER from a probe function

Here is a probe function that acquires the three resources from the Device Tree node above. Each resource helper can return -EPROBE_DEFER wrapped in an error pointer if the provider is not ready, and the driver passes that code straight back to the core:

struct mydev_priv {
        struct clk       *clk;
        struct regulator *vdd;
        struct gpio_desc *reset;
};

static int mydev_probe(struct platform_device *pdev)
{
        struct device *dev = &pdev->dev;
        struct mydev_priv *priv;
        int ret;

        priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
        if (!priv)
                return -ENOMEM;

        /* Any of these can return -EPROBE_DEFER if the
         * providing device has not been bound yet.        */
        priv->clk = devm_clk_get(dev, NULL);
        if (IS_ERR(priv->clk))
                return dev_err_probe(dev, PTR_ERR(priv->clk),
                                     "failed to get clock\n");

        priv->vdd = devm_regulator_get(dev, "vdd");
        if (IS_ERR(priv->vdd))
                return dev_err_probe(dev, PTR_ERR(priv->vdd),
                                     "failed to get vdd supply\n");

        priv->reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_HIGH);
        if (IS_ERR(priv->reset))
                return dev_err_probe(dev, PTR_ERR(priv->reset),
                                     "failed to get reset gpio\n");

        ret = clk_prepare_enable(priv->clk);
        if (ret)
                return dev_err_probe(dev, ret, "cannot enable clock\n");

        platform_set_drvdata(pdev, priv);
        return 0;
}

Two things make this correct. First, the driver returns the error code unchanged; it does not translate -EPROBE_DEFER into -ENODEV or -EINVAL. Second, every resource is acquired with a managed (devm_) helper, so if the function returns early on a deferral, anything already acquired is released automatically. If you mix raw allocations with managed ones, a deferral can leak whatever you set up before the failing call, and that leak repeats on every retry.
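The "error wrapped in an error pointer" convention is easy to try outside the kernel. The sketch below uses simplified userspace copies of the ERR_PTR(), PTR_ERR(), and IS_ERR() helpers from include/linux/err.h, and defines EPROBE_DEFER as 517, the kernel-internal value. The fake_clk type and the fake_clk_get() and fake_probe() functions are hypothetical stand-ins, not real clock framework calls.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO     4095
#define EPROBE_DEFER  517       /* kernel-internal errno value */

/* Userspace copies of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
        return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
        return (long)(intptr_t)ptr;
}

/* Error codes live in the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_clk {
        unsigned long rate_hz;
};

static struct fake_clk the_clk = { 24000000 };

/* Hypothetical provider lookup: defers until the provider has bound. */
static struct fake_clk *fake_clk_get(int provider_bound)
{
        if (!provider_bound)
                return ERR_PTR(-EPROBE_DEFER);
        return &the_clk;
}

/* Consumer side: decode the error and hand it back unchanged. */
static int fake_probe(int provider_bound)
{
        struct fake_clk *clk = fake_clk_get(provider_bound);

        if (IS_ERR(clk))
                return (int)PTR_ERR(clk);
        return 0;
}
```

The point of the model is the last function: the consumer never inspects or rewrites the code, it only decodes it from the pointer and returns it.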

Inside really_probe(): where the code branches

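When the bus decides to try a driver against a device, it ends up in really_probe() in drivers/base/dd.c. The relevant skeleton is below, trimmed to the parts that matter for deferral. The function has been reworked several times, so the details differ between kernel versions; read the listing as a map of the logic and check it against the source tree you are actually running: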

static int really_probe(struct device *dev, struct device_driver *drv)
{
        int ret = -EPROBE_DEFER;
        int local_trigger_count = atomic_read(&deferred_trigger_count);

        /* fw_devlink: bail out before probe() if a supplier
         * named in the firmware/DT is not ready yet.        */
        ret = device_links_check_suppliers(dev);
        if (ret == -EPROBE_DEFER)
                driver_deferred_probe_add_trigger(dev, local_trigger_count);
        if (ret)
                return ret;

        ...
        dev->driver = drv;
        ...
        if (dev->bus->probe)
                ret = dev->bus->probe(dev);
        else if (drv->probe)
                ret = drv->probe(dev);   /* your mydev_probe() runs here */
        ...

probe_failed:
        ...
        switch (ret) {
        case -EPROBE_DEFER:
                /* Driver requested deferred probing */
                dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
                driver_deferred_probe_add_trigger(dev, local_trigger_count);
                break;
        case -ENODEV:
        case -ENXIO:
                pr_debug("%s: probe of %s rejects match %d\n",
                         drv->name, dev_name(dev), ret);
                break;
        default:
                /* matched but the probe failed for real */
                dev_warn(dev, "probe of %s failed with error %d\n",
                         dev_name(dev), ret);
        }
        ...
}

Notice the two distinct places a deferral can originate. The first is device_links_check_suppliers(), which runs before your probe() and can defer on its own. The second is your driver returning -EPROBE_DEFER from its own probe(). In both cases the core calls driver_deferred_probe_add_trigger(), which records the device on a waiting list. Also note the switch: -ENODEV and -ENXIO are treated as “this driver does not handle this device” and are reported only at debug level, while any other non-deferral error is logged as a warning about a real probe failure. That distinction is why returning the right error code from your driver matters.
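To make the consequence of that switch concrete, here is a small userspace model of the classification. It is a sketch of the branching shown above, not kernel code: the enum, classify_probe_result(), and the deliberately wrong translate_badly() helper are hypothetical names introduced for illustration.

```c
#include <errno.h>

#define EPROBE_DEFER 517        /* kernel-internal errno value */

enum probe_outcome {
        PROBE_BOUND,            /* probe() returned 0 */
        PROBE_DEFERRED,         /* device goes onto the waiting list */
        PROBE_REJECTED,         /* quiet: driver does not handle the device */
        PROBE_FAILED,           /* logged as a real failure, no retry */
};

/* Mirrors the switch at the end of the really_probe() skeleton. */
static enum probe_outcome classify_probe_result(int ret)
{
        if (ret == 0)
                return PROBE_BOUND;

        switch (ret) {
        case -EPROBE_DEFER:
                return PROBE_DEFERRED;
        case -ENODEV:
        case -ENXIO:
                return PROBE_REJECTED;
        default:
                return PROBE_FAILED;
        }
}

/* Anti-pattern: a driver that hides the real error behind -EINVAL. */
static int translate_badly(int ret)
{
        return ret ? -EINVAL : 0;
}
```

Feeding -EPROBE_DEFER through translate_badly() before classifying it yields PROBE_FAILED, which is the model's version of a temporary wait turning into a permanent failure.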

The two lists and the retry workqueue

The deferred probe core keeps two lists and an atomic counter, all defined at the top of drivers/base/dd.c:

static LIST_HEAD(deferred_probe_pending_list);
static LIST_HEAD(deferred_probe_active_list);
static atomic_t deferred_trigger_count = ATOMIC_INIT(0);

A device that defers is appended to the pending list by driver_deferred_probe_add():

void driver_deferred_probe_add(struct device *dev)
{
        mutex_lock(&deferred_probe_mutex);
        if (list_empty(&dev->p->deferred_probe)) {
                dev_dbg(dev, "Added to deferred list\n");
                list_add_tail(&dev->p->deferred_probe,
                              &deferred_probe_pending_list);
        }
        mutex_unlock(&deferred_probe_mutex);
}

Nothing retries that device immediately. The retry is event driven: whenever any driver binds successfully, driver_bound() calls driver_deferred_probe_trigger(), which moves the whole pending list onto the active list and schedules a workqueue item:

static void driver_deferred_probe_trigger(void)
{
        if (!driver_deferred_probe_enable)
                return;

        mutex_lock(&deferred_probe_mutex);
        atomic_inc(&deferred_trigger_count);
        list_splice_tail_init(&deferred_probe_pending_list,
                              &deferred_probe_active_list);
        mutex_unlock(&deferred_probe_mutex);

        /* Kick the re-probe thread. */
        schedule_work(&deferred_probe_work);
}

The scheduled work function drains the active list and re-attempts each device through bus_probe_device():

static void deferred_probe_work_func(struct work_struct *work)
{
        struct device *dev;
        struct device_private *private;

        mutex_lock(&deferred_probe_mutex);
        while (!list_empty(&deferred_probe_active_list)) {
                private = list_first_entry(&deferred_probe_active_list,
                                           typeof(*dev->p), deferred_probe);
                dev = private->device;
                list_del_init(&private->deferred_probe);

                get_device(dev);
                mutex_unlock(&deferred_probe_mutex);

                device_pm_move_to_tail(dev);
                dev_dbg(dev, "Retrying from deferred list\n");
                bus_probe_device(dev);   /* re-enters really_probe() */

                mutex_lock(&deferred_probe_mutex);
                put_device(dev);
        }
        mutex_unlock(&deferred_probe_mutex);
}

This is the whole convergence loop. Drivers probe in whatever order they are reached. Each successful bind moves all waiting devices back to the active list and reruns them. After enough rounds, every device whose dependencies can be satisfied has been bound. The get_device() and put_device() pair is there because the mutex is dropped while probing, and without a reference the device structure could be freed by another thread mid-probe. The device_pm_move_to_tail() call re-orders the power-management list so suspend and resume still see a safe ordering after probe order was shuffled.
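The convergence behaviour can be reproduced with a toy simulation. The sketch below is a single-threaded userspace model under simplifying assumptions: each device has at most one supplier, arrays stand in for the two linked lists, and there is no locking, no trigger counter, and no early-boot gating. All sim_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517        /* kernel-internal errno value */
#define MAX_DEVS     8

struct sim_dev {
        const char *name;
        int supplier;           /* index of the required supplier, -1 for none */
        bool bound;
        int attempts;           /* how many times probe was tried */
};

struct sim_core {
        struct sim_dev *devs;
        size_t ndevs;
        int pending[MAX_DEVS];  /* models deferred_probe_pending_list */
        size_t npending;
        int active[MAX_DEVS];   /* models deferred_probe_active_list */
        size_t nactive;
};

/* Probe succeeds only if the supplier exists and has already bound. */
static int sim_probe(struct sim_core *c, int idx)
{
        struct sim_dev *d = &c->devs[idx];

        d->attempts++;
        if (d->supplier >= 0 &&
            ((size_t)d->supplier >= c->ndevs || !c->devs[d->supplier].bound))
                return -EPROBE_DEFER;
        d->bound = true;
        return 0;
}

/* A successful bind moves everything from pending to active. */
static void sim_trigger(struct sim_core *c)
{
        for (size_t i = 0; i < c->npending; i++)
                c->active[c->nactive++] = c->pending[i];
        c->npending = 0;
}

static void sim_attempt(struct sim_core *c, int idx)
{
        if (sim_probe(c, idx) == -EPROBE_DEFER)
                c->pending[c->npending++] = idx;
        else
                sim_trigger(c);
}

/* Models the work function: drain the active list, retrying each device. */
static void sim_work(struct sim_core *c)
{
        while (c->nactive > 0) {
                int idx = c->active[0];

                for (size_t i = 1; i < c->nactive; i++)
                        c->active[i - 1] = c->active[i];
                c->nactive--;
                sim_attempt(c, idx);
        }
}

/* Probe devices in registration order; return how many are still pending. */
static size_t sim_boot(struct sim_dev *devs, size_t ndevs)
{
        struct sim_core c = { .devs = devs, .ndevs = ndevs };

        if (ndevs > MAX_DEVS)
                return ndevs;   /* model limit exceeded, nothing was probed */
        for (size_t i = 0; i < ndevs; i++) {
                sim_attempt(&c, (int)i);
                sim_work(&c);
        }
        return c.npending;
}
```

Registering a sensor that needs a GPIO controller, then a regulator, then the GPIO controller (which needs a pin controller), then the pin controller, leaves nothing pending, but the sensor is probed four times along the way. A device whose supplier never appears stays on the pending list indefinitely.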

When retries start, and when they stop

Deferred probe retries are held back during early boot so the list does not churn while built-in drivers are probing for the first time. The function deferred_probe_initcall() runs as a late_initcall, switches triggering on, flushes the queue, and creates the debugfs file:

static int deferred_probe_initcall(void)
{
        deferred_devices = debugfs_create_file("devices_deferred", 0444,
                                               NULL, NULL, &deferred_devs_fops);
        driver_deferred_probe_enable = true;
        driver_deferred_probe_trigger();
        flush_work(&deferred_probe_work);
        initcalls_done = true;

        /* Trigger again; this pass won't defer optional dependencies. */
        driver_deferred_probe_trigger();
        flush_work(&deferred_probe_work);

        if (deferred_probe_timeout > 0)
                schedule_delayed_work(&deferred_probe_timeout_work,
                                      deferred_probe_timeout * HZ);
        return 0;
}
late_initcall(deferred_probe_initcall);

There is also an optional timeout, controlled by the kernel command-line parameter deferred_probe_timeout. A subsystem can call driver_deferred_probe_check_state() instead of returning -EPROBE_DEFER directly. That function returns -EPROBE_DEFER while initcalls are still running; once they finish it returns -ENODEV so an optional supplier can be skipped, or -ETIMEDOUT if the deadline passed. The internal helper below makes the decision; where it returns 0, the public wrapper reports -ENODEV to the caller:

static int __driver_deferred_probe_check_state(struct device *dev)
{
        if (!initcalls_done)
                return -EPROBE_DEFER;

        if (!deferred_probe_timeout) {
                dev_WARN(dev, "deferred probe timeout, ignoring dependency");
                return -ETIMEDOUT;
        }
        return 0;
}

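The timeout mainly handles the case where a needed driver is built as a module and is never loaded, so a consumer would otherwise wait forever. The default has changed over time: older kernels left the timeout disabled unless you set a value on the command line, while more recent kernels built with module support ship a non-zero default. Check the deferred_probe_timeout entry in Documentation/admin-guide/kernel-parameters.txt for the kernel you are running.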
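The three possible answers are easier to see as a tiny state model. This sketch follows the helper listed above and assumes, as the description of driver_deferred_probe_check_state() says, that a public wrapper turns the helper's 0 into -ENODEV. Newer kernels structure this logic differently. The model_state struct and both function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define EPROBE_DEFER 517        /* kernel-internal errno value */

struct model_state {
        bool initcalls_done;    /* set once late initcalls have run */
        int  timeout;           /* 0 means the deadline has passed */
};

/* Same decisions as the helper shown in the listing above. */
static int model_check_state_inner(const struct model_state *s)
{
        if (!s->initcalls_done)
                return -EPROBE_DEFER;   /* still booting: keep waiting */
        if (!s->timeout)
                return -ETIMEDOUT;      /* deadline passed: give up */
        return 0;
}

/* Assumed wrapper: "no error" means the optional supplier is skipped. */
static int model_check_state(const struct model_state *s)
{
        int ret = model_check_state_inner(s);

        if (ret < 0)
                return ret;
        return -ENODEV;
}
```

A caller that gets -ENODEV can carry on without the optional supplier, while -EPROBE_DEFER is passed back to the driver core like any other deferral.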

fw_devlink: ordering before probe even runs

Modern kernels add a layer on top of the retry loop. From the firmware description (the Device Tree references shown earlier: clocks, vdd-supply, reset-gpios), the kernel builds supplier and consumer device links automatically. This is fw_devlink. Before really_probe() calls your probe(), it checks those links:

ret = device_links_check_suppliers(dev);
if (ret == -EPROBE_DEFER)
        driver_deferred_probe_add_trigger(dev, local_trigger_count);
if (ret)
        return ret;

If a declared supplier has not bound yet, the core returns -EPROBE_DEFER on the driver’s behalf and your probe() is never entered. This is an important debugging fact: a deferral you see in the logs may have nothing to do with code inside your driver. Putting print statements at the top of your probe() tells you nothing when the device never reaches it. The deferral is coming from the device links layer, and you confirm that by reading the deferred-probe reason rather than by instrumenting your driver.
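A short userspace model shows why instrumenting probe() does not help in this case. It is a sketch only: suppliers are reduced to an array of "bound" flags, and model_dev, model_check_suppliers(), model_driver_probe(), and model_really_probe() are hypothetical names standing in for the device links check and the driver callback.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517        /* kernel-internal errno value */

struct model_dev {
        const bool *supplier_bound;     /* one flag per declared supplier */
        size_t nsuppliers;
        int probe_calls;                /* how often the driver's probe ran */
};

/* Defer if any supplier named in the firmware description is not bound. */
static int model_check_suppliers(const struct model_dev *dev)
{
        for (size_t i = 0; i < dev->nsuppliers; i++) {
                if (!dev->supplier_bound[i])
                        return -EPROBE_DEFER;
        }
        return 0;
}

/* Stand-in for the driver's own probe(); it only counts its invocations. */
static int model_driver_probe(struct model_dev *dev)
{
        dev->probe_calls++;
        return 0;
}

/* The supplier check comes first, so probe() may never be entered. */
static int model_really_probe(struct model_dev *dev)
{
        int ret = model_check_suppliers(dev);

        if (ret)
                return ret;
        return model_driver_probe(dev);
}
```

While one supplier flag is false, model_really_probe() returns -EPROBE_DEFER and probe_calls stays at zero; any print statement inside the driver would never run.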

A full debugging session

Suppose the sensor above never appears and there is no entry under /sys/bus/i2c/devices bound to a driver. Start with the debugfs file the driver core exports. It lists every device on the pending list and, on recent kernels, the reason for the deferral:

raghu@techveda.org:~$ cat /sys/kernel/debug/devices_deferred
1c2ac00.i2c:sensor@40   i2c: supplier 1c20800.gpio not ready

This tells you the sensor is waiting on the GPIO controller at 1c20800. The next question is whether that controller has a driver at all. Check the kernel configuration that is actually running:

raghu@techveda.org:~$ zcat /proc/config.gz | grep -i gpio_sunxi
CONFIG_GPIO_SUNXI=m

Here the GPIO driver is built as a module (=m) and may not have been loaded, which is exactly the kind of dependency that keeps a consumer pending. Confirm whether the module is present:

raghu@techveda.org:~$ lsmod | grep gpio
raghu@techveda.org:~$ modprobe gpio-sunxi
raghu@techveda.org:~$ cat /sys/kernel/debug/devices_deferred
raghu@techveda.org:~$

After the supplier module loads, its successful bind triggers the retry workqueue, the sensor is reprobed, and the deferred list becomes empty. If you want to watch the events as they happen, turn on dynamic debug for the driver core. You can do it at runtime through debugfs:

raghu@techveda.org:~$ echo 'file drivers/base/dd.c +p' > /sys/kernel/debug/dynamic_debug/control
raghu@techveda.org:~$ dmesg | grep -iE 'defer|retr'
platform 1c2ac00.i2c:sensor@40: Added to deferred list
platform 1c2ac00.i2c:sensor@40: Retrying from deferred list
platform 1c2ac00.i2c:sensor@40: bound to driver mydev

The same dynamic-debug switch can be set at boot by appending dyndbg="file drivers/base/dd.c +p" to the kernel command line. To see the timing of every probe in order, boot with initcall_debug, which logs each probe call and how long it took. If the debugfs file is empty but the device is still missing, the device is not deferred at all; it usually means no driver matched it (often a typo in the compatible string) or the driver is simply not built. You can force a manual rebind to test a driver in isolation:

raghu@techveda.org:~$ echo 1c2ac00.i2c:sensor@40 > /sys/bus/platform/drivers/mydev/unbind
raghu@techveda.org:~$ echo 1c2ac00.i2c:sensor@40 > /sys/bus/platform/drivers/mydev/bind
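If you collect the devices_deferred output in a test harness, each line can be split into a device name and a reason. The sketch below assumes the two fields are separated by a single tab and that the reason may be absent on kernels that do not record one; verify both assumptions against your kernel. The struct and parse_deferred_line() are hypothetical helpers, not part of any kernel or library API.

```c
#include <stddef.h>
#include <string.h>

struct deferred_entry {
        char device[64];
        char reason[192];       /* empty string when no reason is recorded */
};

/*
 * Split one line of devices_deferred output into device and reason.
 * Assumes "<device>\t<reason>\n", with the tab and reason optional.
 * Returns 0 on success, -1 if the device name is empty or too long.
 */
static int parse_deferred_line(const char *line, struct deferred_entry *out)
{
        const char *tab = strchr(line, '\t');
        size_t name_len = tab ? (size_t)(tab - line) : strcspn(line, "\n");

        if (name_len == 0 || name_len >= sizeof(out->device))
                return -1;
        memcpy(out->device, line, name_len);
        out->device[name_len] = '\0';

        out->reason[0] = '\0';
        if (tab) {
                size_t reason_len = strcspn(tab + 1, "\n");

                /* Truncate an over-long reason rather than overflow. */
                if (reason_len >= sizeof(out->reason))
                        reason_len = sizeof(out->reason) - 1;
                memcpy(out->reason, tab + 1, reason_len);
                out->reason[reason_len] = '\0';
        }
        return 0;
}
```

A harness can then fail a boot test whenever any entry remains, and print the reason field as the first diagnostic.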

Logging deferrals without flooding the console

A naive driver prints an error on every deferral, and because the device is retried many times during boot, the log fills with the same message. The kernel provides dev_err_probe() for exactly this. It returns the error code you pass in, logs at debug level when that code is -EPROBE_DEFER (and at error level otherwise), and records the human-readable reason that later shows up in /sys/kernel/debug/devices_deferred. Compare the old pattern:

/* Old: noisy, and loses the deferral reason */
priv->clk = devm_clk_get(dev, NULL);
if (IS_ERR(priv->clk)) {
        ret = PTR_ERR(priv->clk);
        if (ret != -EPROBE_DEFER)
                dev_err(dev, "failed to get clock: %d\n", ret);
        return ret;
}

with the modern one, which collapses all of that into a single line:

/* New: quiet on deferral, records the reason */
priv->clk = devm_clk_get(dev, NULL);
if (IS_ERR(priv->clk))
        return dev_err_probe(dev, PTR_ERR(priv->clk),
                             "failed to get clock\n");
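What dev_err_probe() does with the error code can be sketched in userspace. The model below is an approximation of the behaviour described in this section, not the kernel implementation: the device is reduced to a reason buffer plus two counters, log output goes to stderr, and model_dev and model_err_probe() are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define EPROBE_DEFER 517        /* kernel-internal errno value */

struct model_dev {
        char deferred_reason[128];      /* what devices_deferred would show */
        int err_msgs;                   /* messages logged at error level */
        int dbg_msgs;                   /* messages logged at debug level */
};

/*
 * Model of dev_err_probe(): always returns err unchanged. A deferral is
 * recorded as the device's reason and counted as a debug message; any
 * other error is printed and counted as an error message.
 */
static int model_err_probe(struct model_dev *dev, int err,
                           const char *fmt, ...)
{
        char msg[128];
        va_list ap;

        va_start(ap, fmt);
        vsnprintf(msg, sizeof(msg), fmt, ap);
        va_end(ap);

        if (err == -EPROBE_DEFER) {
                snprintf(dev->deferred_reason,
                         sizeof(dev->deferred_reason), "%s", msg);
                dev->dbg_msgs++;
        } else {
                fprintf(stderr, "error %d: %s", err, msg);
                dev->err_msgs++;
        }
        return err;
}
```

Because the function returns its err argument, the call site can stay a single return statement regardless of which branch is taken.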

Understanding this binding model is core to embedded board bring-up.

Practical rules for cooperating with deferred probe

Acquire all external resources (clocks, regulators, GPIOs, PHYs, backlights, DMA channels) early in probe() and return the error code unchanged when a helper returns -EPROBE_DEFER.
Use the managed devm_ helpers so a deferral does not leak resources acquired before the failing call.
Use dev_err_probe() rather than dev_err() for resource acquisition, so retries stay quiet and the reason is captured.
Never convert -EPROBE_DEFER into another error; doing so turns a temporary wait into a permanent failure.
When a device is stuck, read /sys/kernel/debug/devices_deferred first. It almost always names the missing supplier.

Key takeaways

Deferred probe lets drivers bind in any order by retrying devices whose dependencies were not yet ready.
A driver signals this by returning -EPROBE_DEFER from probe(); really_probe() adds the device to a pending list.
Every successful bind triggers a workqueue that moves the pending list to an active list and retries each device, so the system converges over several rounds.
The device links layer (fw_devlink) can defer a probe before your probe() even runs, based on Device Tree dependencies.
/sys/kernel/debug/devices_deferred, dynamic debug on dd.c, and initcall_debug are the fastest ways to see what is stuck and why.
