Bug 217408 - Crash on boot due to race between vt_upgrade() and vt_resume_flush_timer()
Summary: Crash on boot due to race between vt_upgrade() and vt_resume_flush_timer()
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Some People
Assignee: Jonathan T. Looney
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-28 00:09 UTC by Jonathan T. Looney
Modified: 2018-05-28 19:50 UTC (History)
2 users (show)

See Also:


Attachments
This patch makes it more likely we will hit the race condition on bootup. It can be used for seeing the bug and testing fixes. (5.69 KB, patch)
2017-02-28 00:09 UTC, Jonathan T. Looney
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan T. Looney freebsd_committer freebsd_triage 2017-02-28 00:09:24 UTC
Created attachment 180352 [details]
This patch makes it more likely we will hit the race condition on bootup. It can be used for seeing the bug and testing fixes.

It appears this is more fallout in the vt driver from EARLY_AP_STARTUP allowing SMP prior to vt_upgrade() running.

Crash at boot:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address   = 0x0
    fault code              = supervisor read instruction, page not present
    instruction pointer     = 0x20:0x0
    stack pointer           = 0x28:0xfffffe3ec2392908
    frame pointer         = 0x28:0xfffffe3ec23929b0
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 12 (swi4: clock (0))
    [ thread pid 12 tid 100028 ]
    Stopped at      0
    db> bt
    Tracing pid 12 tid 100028 td 0xfffff8018e3a5a00
    ??() at 0
    softclock_call_cc() at softclock_call_cc+0x14a/frame 0xfffffe3ec23929b0
    softclock() at softclock+0x94/frame 0xfffffe3ec23929e0
    intr_event_execute_handlers() at intr_event_execute_handlers+0x20f/frame 0xfffffe3ec2392a20
    ithread_loop() at ithread_loop+0xc6/frame 0xfffffe3ec2392a70
    fork_exit() at fork_exit+0x85/frame 0xfffffe3ec2392ab0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe3ec2392ab0
    --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
    db> show registers
    cs                        0x20
    ds                        0x3b
    es                        0x3b
    fs                        0x13
    gs                        0x28
    ss                        0x28
    rax                        0x4
    rcx                        0x1
    rdx                          0
    rbx         0xfffff8018e3a5a00
    rsp         0xfffffe3ec2392908
    rbp         0xfffffe3ec23929b0
    rsi         0xfffff8018e3a5a00
    rdi                          0
    r8          0xffffffff8029baa0  cam_iosched_ticker
    r9                     0x40002
    r10         0xffffffff80d637f8  vt_consdev+0xd8
    r11                       0x1e
    r12         0xffffffff8102d7b8  cc_cpu+0x78
    r13                       0x84
    rbp         0xfffffe3ec23929b0
    rsi         0xfffff8018e3a5a00
    rdi                          0
    r8          0xffffffff8029baa0  cam_iosched_ticker
    r9                     0x40002
    r10         0xffffffff80d637f8  vt_consdev+0xd8
    r11                       0x1e
    r12         0xffffffff8102d7b8  cc_cpu+0x78
    r13                       0x84
    r14                          0
    r15         0xffffffff8102d740  cc_cpu
    rip                          0
    rflags                 0x10246  _binary_t4fw_cfg_uwire_fw_size+0xae81


It is crashing on the line in softclock_call_cc that calls the function. It is 0 in this case.

The current callout struct is stored at -0x40(%rbp). Examining that location, we see that this is stored in the vt console driver. (Thanks, DDB!)

    db> x/gx 0xfffffe3ec2392970
    0xfffffe3ec2392970:     ffffffff80d63828
    db> x/gx 0xffffffff80d63828,8
    vt_consdev+0x108:       0                               ffffffff8102d818
    vt_consdev+0x118:       1f62b6ace8                      a3d709
    vt_consdev+0x128:       0                               0
    vt_consdev+0x138:       ffffffff80d637f8                800002
    db> 

I suspect we are seeing a race between callout_schedule and callout_reset_on.

CPU1: Running vt_upgrade()
CPU2: Trying to print to the VT device

CPU1 (running vt_upgrade()) is running this code:

    2613        if (!(vd->vd_flags & VDF_ASYNC)) {
    2614                /* Attach keyboard. */
    2615                vt_allocate_keyboard(vd);
    2616
    2617                /* Init 25 Hz timer. */
    2618                callout_init_mtx(&vd->vd_timer, &vd->vd_lock, 0);
    2619
    2620                /* Start timer when everything ready. */
    2621                vd->vd_flags |= VDF_ASYNC;
    2622                callout_reset(&vd->vd_timer, hz / VT_TIMERFREQ, vt_timer, vd);
    2623                vd->vd_timer_armed = 1;
    2624                register_handlers = 1;
    2625        }

In particular, I suspect it is running line 2622 when CPU2 tries to print a '\n'. CPU2 follows this callchain:
{trimmed} -> termcn_cnputc -> teken_input -> teken_input_byte -> teken_input_char -> teken_subr_newline -> teken_funcs_cursor -> vt_resume_flush_timer

vt_resume_flush_timer is pretty simple (but you also need to see vt_schedule_flush() to fully understand the code:

    276static void
    277vt_schedule_flush(struct vt_device *vd, int ms)
    278{
    279
    280        if (ms <= 0)
    281                /* Default to initial value. */
    282                ms = 1000 / VT_TIMERFREQ;
    283
    284        callout_schedule(&vd->vd_timer, hz / (1000 / ms));
    285}
    286
    287void
    288vt_resume_flush_timer(struct vt_device *vd, int ms)
    289{
    290
    291        if (!(vd->vd_flags & VDF_ASYNC) ||
    292            !atomic_cmpset_int(&vd->vd_timer_armed, 0, 1))
    293                return;
    294
    295        vt_schedule_flush(vd, ms);
    296}

Note that this thread does not hold the vd lock at this point. (And, it probably can't hold the vd lock, since the vd lock is a normal mutex and termcn_cnputc() may hold a spin lock.) However, also note that the code told the callout subsystem that this callout structure was protected by the vd lock. Yet, this code path touches the callout structure without holding the vd lock. That means the callout subsystem will make erroneous assumptions about how to handle callout events. (This is a larger problem. And, it is difficult to fix because of the spin lock issue.)

One more relevant piece of code:

    1170int
    1171callout_schedule(struct callout *c, int to_ticks)
    1172{
    1173        return callout_reset_on(c, to_ticks, c->c_func, c->c_arg, c->c_cpu);
    1174}

So, I suspect that what is happening is:

CPU1: Sets (vd->vd_flags |= VDF_ASYNC) and calls callout_reset().
CPU2: Checks (vd->vd_flags & VDF_ASYNC) and finds it is set (line 291).
CPU2: Does atomic_cmpset_int(&vd->vd_timer_armed, 0, 1) and finds it succeeds (line 292).
CPU2: Runs callout_schedule. callout_schedule pulls out the current c->c_func and c->c_arg.
CPU2: Calls callout_reset_on with the c->c_func and c->c_arg it just acquired. (Because CPU1 has not initialized these yet, they are NULL.)
CPU1: callout_reset sets c->c_func and c->c_arg to the correct values and completes.
CPU2: callout_reset_on overwrites c->c_func and c->c_arg with the NULL values it acquired.

Eventually, the callout fires and things blow up.

The easy fix for this problem is to reorder the way the variables are assigned in vt_upgrade().

However, this will not fix the larger error in using the callout system APIs with respect to locking.

I have a reliable replication of this where I lengthen the window to hit the race condition. (I'll attach the reproduction patch to this bug.) It seems to reliably produce a crash without a fix. With a patch to vt_upgrade(), the crash goes away. I've opened review D9828 with the fix.
Comment 1 Ed Schouten freebsd_committer freebsd_triage 2017-03-01 16:32:53 UTC
(In reply to Jonathan T. Looney from comment #0)
> CPU1: Running vt_upgrade()
> CPU2: Trying to print to the VT device

Aha! Yes, vt_upgrade() was never designed with this in mind.

Instead of trying to let vt_upgrade() cope with this, wouldn't it make more sense to stick to the original design, where we call vt_upgrade() in an environment where:

- SMP hasn't been enabled yet, but
- callouts can already be scheduled?

The advantage of doing this is that the console will be set to asynchronous mode a lot earlier than before, which will at the same time really speed up graphics rendering a that point. Right now we see that synchronous graphics rendering can be so slow, that it stalls the boot process.
Comment 2 Jonathan T. Looney freebsd_committer freebsd_triage 2017-03-02 00:35:14 UTC
(In reply to Ed Schouten from comment #1)
> Instead of trying to let vt_upgrade() cope with this, wouldn't it make more sense to stick to the original design, where we call vt_upgrade() in an environment where:
> 
> - SMP hasn't been enabled yet, but
> - callouts can already be scheduled?
> 
> The advantage of doing this is that the console will be set to asynchronous mode a lot earlier than before, which will at the same time really speed up graphics rendering a that point. Right now we see that synchronous graphics rendering can be so slow, that it stalls the boot process.


That might be a good idea. However, I don't think it is a substitute for the patch I proposed. The order in which things happen can change over time (as the EARLY_AP_STARTUP experience has shown) and it would be good to eliminate this potential race condition in case they change again.
Comment 3 Ed Schouten freebsd_committer freebsd_triage 2017-03-03 21:39:51 UTC
I tend to disagree. The console driver, vt(4), was designed to have two modes of operation:

- A synchronous mode that should be used in case there is no concurrency.
- An asynchronous mode that should be used in case there is concurrency.

It doesn't matter how the rest of the kernel is designed; this is simply the two modes it has. Those two modes will remain forever, regardless of how FreeBSD decides to name things. You want the synchronous mode during very early boot and the asynchronous mode otherwise, period.

Now you're proposing that vt(4) should be patched to limp along when this design constraint is violated. You want the synchronous mode to work, even if there is concurrency. I disagree. We should just change the kernel to run vt_upgrade() at the right moment.
Comment 4 Ed Schouten freebsd_committer freebsd_triage 2017-03-03 21:51:17 UTC
Wouldn't this be sufficient?

Index: vt_core.c
===================================================================
--- vt_core.c   (revision 314634)
+++ vt_core.c   (working copy)
@@ -252,7 +252,7 @@
 SYSINIT(vt_update_static, SI_SUB_KMEM, SI_ORDER_ANY, vt_update_static,
     &vt_consdev);
 /* Delay until all devices attached, to not waste time. */
-SYSINIT(vt_early_cons, SI_SUB_INT_CONFIG_HOOKS, SI_ORDER_ANY, vt_upgrade,
+SYSINIT(vt_early_cons, SI_SUB_KICK_SCHEDULER, SI_ORDER_FIRST, vt_upgrade,
     &vt_consdev);

 /* Initialize locks/mem depended members. */
Comment 5 Jonathan T. Looney freebsd_committer freebsd_triage 2017-03-04 01:16:03 UTC
(In reply to Ed Schouten from comment #3)
> I tend to disagree. The console driver, vt(4), was designed to have two modes of operation:
> 
> - A synchronous mode that should be used in case there is no concurrency.
> - An asynchronous mode that should be used in case there is concurrency.
> 
> [...]
> Now you're proposing that vt(4) should be patched to limp along when this design constraint is violated. You want the synchronous mode to work, even if there is concurrency.

Even on a uni-processor system, the current code is broken. It assigns the variables in an order that leaves a window in which other threads will do the wrong thing. Even on a uni-processor system, the code can get pre-empted by an interrupt.

It sets the VDF_ASYNC flag, then schedules the callout, and then sets vd_timer_armed. If another thread tries to print a carriage return while VDF_ASYNC is set, but vd_timer_armed is not, the other thread will call callout_schedule() and we may see this crash. So, if the thread calling vt_upgrade() is pre-empted while the variables are in this (incorrect) state, and a different thread tries to print a carriage return to the console, a crash may result. Again, this can happen even on a uni-processor system.

This is easily fixed by merely changing the order in which we assign the variables, as I proposed.

Granted, EARLY_AP_STARTUP made it more likely we would hit this by introducing multi-processor concurrency; however, it was possible to his this bug even before EARLY_AP_STARTUP. (And, the possibility of this bug will remain as long as this code can be pre-empted.) Fixing the order in which we assign these variables is the right thing to do.


> We should just change the kernel to run vt_upgrade() at the right moment.

I am willing to concede that my fix is not fully sufficient to make SMP operation safe. (This is the second EARLY_AP_STARTUP-exposed vt crash I've fixed in recent days. That tells me that the code at least wasn't tested to run this way.) However, I am not willing to concede that changing the time at which we run vt_upgrade() is sufficient, either.

I still think this is a both/and, rather than an either/or situation.



(In reply to Ed Schouten from comment #4)

> Wouldn't this be sufficient?

I don't think so, for reasons explained above.


> Index: vt_core.c
> ===================================================================
> --- vt_core.c   (revision 314634)
> +++ vt_core.c   (working copy)
> @@ -252,7 +252,7 @@
>  SYSINIT(vt_update_static, SI_SUB_KMEM, SI_ORDER_ANY, vt_update_static,
>      &vt_consdev);
>  /* Delay until all devices attached, to not waste time. */
> -SYSINIT(vt_early_cons, SI_SUB_INT_CONFIG_HOOKS, SI_ORDER_ANY, vt_upgrade,
> +SYSINIT(vt_early_cons, SI_SUB_KICK_SCHEDULER, SI_ORDER_FIRST, vt_upgrade,
>      &vt_consdev);
> 
>  /* Initialize locks/mem depended members. */

I am supportive of you committing this if you have time to thoroughly test it. Personally, I do not have sufficient time to do that at the moment. However, I have thoroughly tested the patch that I proposed and I would like to commit it while you work on testing the change you have proposed.
Comment 6 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:18 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.