Summary: | Crash on boot due to race between vt_upgrade() and vt_resume_flush_timer() | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Jonathan T. Looney <jtl> | ||||
Component: | kern | Assignee: | Jonathan T. Looney <jtl> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Some People | CC: | ed, emaste, markj | ||||
Priority: | --- | ||||||
Version: | CURRENT | ||||||
Hardware: | amd64 | ||||||
OS: | Any | ||||||
Attachments: |
|
Description
Jonathan T. Looney
![]() ![]() (In reply to Jonathan T. Looney from comment #0) > CPU1: Running vt_upgrade() > CPU2: Trying to print to the VT device Aha! Yes, vt_upgrade() was never designed with this in mind. Instead of trying to let vt_upgrade() cope with this, wouldn't it make more sense to stick to the original design, where we call vt_upgrade() in an environment where: - SMP hasn't been enabled yet, but - callouts can already be scheduled? The advantage of doing this is that the console will be set to asynchronous mode a lot earlier than before, which will at the same time really speed up graphics rendering a that point. Right now we see that synchronous graphics rendering can be so slow, that it stalls the boot process. (In reply to Ed Schouten from comment #1) > Instead of trying to let vt_upgrade() cope with this, wouldn't it make more sense to stick to the original design, where we call vt_upgrade() in an environment where: > > - SMP hasn't been enabled yet, but > - callouts can already be scheduled? > > The advantage of doing this is that the console will be set to asynchronous mode a lot earlier than before, which will at the same time really speed up graphics rendering a that point. Right now we see that synchronous graphics rendering can be so slow, that it stalls the boot process. That might be a good idea. However, I don't think it is a substitute for the patch I proposed. The order in which things happen can change over time (as the EARLY_AP_STARTUP experience has shown) and it would be good to eliminate this potential race condition in case they change again. I tend to disagree. The console driver, vt(4), was designed to have two modes of operation: - A synchronous mode that should be used in case there is no concurrency. - An asynchronous mode that should be used in case there is concurrency. It doesn't matter how the rest of the kernel is designed; this is simply the two modes it has. Those two modes will remain forever, regardless of how FreeBSD decides to name things. You want the synchronous mode during very early boot and the asynchronous mode otherwise, period. Now you're proposing that vt(4) should be patched to limp along when this design constraint is violated. You want the synchronous mode to work, even if there is concurrency. I disagree. We should just change the kernel to run vt_upgrade() at the right moment. Wouldn't this be sufficient? Index: vt_core.c =================================================================== --- vt_core.c (revision 314634) +++ vt_core.c (working copy) @@ -252,7 +252,7 @@ SYSINIT(vt_update_static, SI_SUB_KMEM, SI_ORDER_ANY, vt_update_static, &vt_consdev); /* Delay until all devices attached, to not waste time. */ -SYSINIT(vt_early_cons, SI_SUB_INT_CONFIG_HOOKS, SI_ORDER_ANY, vt_upgrade, +SYSINIT(vt_early_cons, SI_SUB_KICK_SCHEDULER, SI_ORDER_FIRST, vt_upgrade, &vt_consdev); /* Initialize locks/mem depended members. */ (In reply to Ed Schouten from comment #3) > I tend to disagree. The console driver, vt(4), was designed to have two modes of operation: > > - A synchronous mode that should be used in case there is no concurrency. > - An asynchronous mode that should be used in case there is concurrency. > > [...] > Now you're proposing that vt(4) should be patched to limp along when this design constraint is violated. You want the synchronous mode to work, even if there is concurrency. Even on a uni-processor system, the current code is broken. It assigns the variables in an order that leaves a window in which other threads will do the wrong thing. Even on a uni-processor system, the code can get pre-empted by an interrupt. It sets the VDF_ASYNC flag, then schedules the callout, and then sets vd_timer_armed. If another thread tries to print a carriage return while VDF_ASYNC is set, but vd_timer_armed is not, the other thread will call callout_schedule() and we may see this crash. So, if the thread calling vt_upgrade() is pre-empted while the variables are in this (incorrect) state, and a different thread tries to print a carriage return to the console, a crash may result. Again, this can happen even on a uni-processor system. This is easily fixed by merely changing the order in which we assign the variables, as I proposed. Granted, EARLY_AP_STARTUP made it more likely we would hit this by introducing multi-processor concurrency; however, it was possible to his this bug even before EARLY_AP_STARTUP. (And, the possibility of this bug will remain as long as this code can be pre-empted.) Fixing the order in which we assign these variables is the right thing to do. > We should just change the kernel to run vt_upgrade() at the right moment. I am willing to concede that my fix is not fully sufficient to make SMP operation safe. (This is the second EARLY_AP_STARTUP-exposed vt crash I've fixed in recent days. That tells me that the code at least wasn't tested to run this way.) However, I am not willing to concede that changing the time at which we run vt_upgrade() is sufficient, either. I still think this is a both/and, rather than an either/or situation. (In reply to Ed Schouten from comment #4) > Wouldn't this be sufficient? I don't think so, for reasons explained above. > Index: vt_core.c > =================================================================== > --- vt_core.c (revision 314634) > +++ vt_core.c (working copy) > @@ -252,7 +252,7 @@ > SYSINIT(vt_update_static, SI_SUB_KMEM, SI_ORDER_ANY, vt_update_static, > &vt_consdev); > /* Delay until all devices attached, to not waste time. */ > -SYSINIT(vt_early_cons, SI_SUB_INT_CONFIG_HOOKS, SI_ORDER_ANY, vt_upgrade, > +SYSINIT(vt_early_cons, SI_SUB_KICK_SCHEDULER, SI_ORDER_FIRST, vt_upgrade, > &vt_consdev); > > /* Initialize locks/mem depended members. */ I am supportive of you committing this if you have time to thoroughly test it. Personally, I do not have sufficient time to do that at the moment. However, I have thoroughly tested the patch that I proposed and I would like to commit it while you work on testing the change you have proposed. batch change: For bugs that match the following - Status Is In progress AND - Untouched since 2018-01-01. AND - Affects Base System OR Documentation DO: Reset to open status. Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed. Based on the age of this PR and the fact that D9828 was committed, plus the lack of any related bug reports, I think this can be closed. Please re-open if I'm mistaken. (In reply to Mark Johnston from comment #7) I agree. ed@ is no longer active; if someone else wants to carry on with the patch in comment #4 they can do so independent of this PR. |