Summary: | Kernel panic at boot under XEN: integer divide fault while in kernel mode | | |
---|---|---|---|
Product: | Base System | Reporter: | Sylvain Garrigues <sylvain> |
Component: | kern | Assignee: | freebsd-xen (Nobody) <xen> |
Status: | Closed FIXED | | |
Severity: | Affects Only Me | CC: | cperciva, dim, meyer.sydney, royger, sylvain, xen |
Priority: | --- | | |
Version: | CURRENT | | |
Hardware: | amd64 | | |
OS: | Any | | |
Description
Sylvain Garrigues
2016-12-11 10:55:25 UTC
I would discourage all FreeBSD users running on Amazon EC2 from upgrading their CURRENT systems! The machine won't come back up after a reboot until this is fixed. This is an urgent bug.

This seems to have been introduced by the import of clang 3.9.0 in r309124. I'll work on tracking this down further next week; I'm working on NFS right now and it's best if I don't context-switch. As you say, best to avoid HEAD on EC2 for now.

I'm aware of this; I noticed it last week but haven't been able to debug it yet. I'll get to it now. In the meantime, you can boot the previous kernel by entering:

> boot kernel.old

at the loader command line.
I don't seem to be able to reproduce this with r309875. Can you check whether you still get the panic with that or any later revision? Thanks, Roger.

Which compiler did you use? I'm seeing this only with clang 3.9.0; if you check out a new src tree on a pre-r309124 system and just run buildkernel, it will use the 3.8.0 compiler.

(In reply to Colin Percival from comment #5)
I had a system running CURRENT as of Nov. 3, so it was pre-r309124, and I upgraded it to CURRENT as of Dec. 10. At first, since the machine didn't survive a reboot, I thought I had botched the mergemaster of the passwd files. So I installed a fresh EC2 AMI, which also happened to be CURRENT-amd64-2016-12-10, and hit the same issue. I looked at the console and saw the kernel panic. I then tried the EC2 AMI from CURRENT-amd64-2016-11-30: same problem. Finally I tried the EC2 AMI from CURRENT-amd64-2016-11-01: NO problem.

A commit references this bug:

Author: cperciva
Date: Tue Dec 13 06:54:14 UTC 2016
New revision: 310013
URL: https://svnweb.freebsd.org/changeset/base/310013

Log:
  Check that blkfront devices have a non-zero number of sectors and a non-zero sector size. Such a device would be a virtual disk of zero bytes; clearly not useful, and not something we should try to attach.

  As a fortuitous side effect, checking that these values are non-zero here results in them not *becoming* zero later on the function. This odd behaviour began with r309124 (clang 3.9.0) but is challenging to debug; making any changes to this function whatsoever seems to affect the llvm optimizer behaviour enough to make the unexpected zeroing of the sector_size variable cease.

  PR: 215209
  Security: The potential for variables to unexpectedly become zero has worrying consequences for security in general, but not so much in this particular context.

Changes:
  head/sys/dev/xen/blkfront/blkfront.c

A commit references this bug:

Author: dim
Date: Wed Dec 14 19:28:19 UTC 2016
New revision: 310086
URL: https://svnweb.freebsd.org/changeset/base/310086

Log:
  In xbd_connect(), use correct scanf conversion specifiers for the feature_barrier and feature_flush variables. Otherwise, adjacent variables on the stack, such as sector_size, may be overwritten, with disastrous results.

  Note that I did not see a good reason to revert the addition of zero checks introduced in r310013. Better safe than sorry.

  PR: 215209
  Tested by: royger
  MFC after: 3 days

Changes:
  head/sys/dev/xen/blkfront/blkfront.c

A commit references this bug:

Author: dim
Date: Sun Dec 18 14:31:12 UTC 2016
New revision: 310228
URL: https://svnweb.freebsd.org/changeset/base/310228

Log:
  MFC r310013 (by cperciva):

  Check that blkfront devices have a non-zero number of sectors and a non-zero sector size. Such a device would be a virtual disk of zero bytes; clearly not useful, and not something we should try to attach.

  As a fortuitous side effect, checking that these values are non-zero here results in them not *becoming* zero later on the function. This odd behaviour began with r309124 (clang 3.9.0) but is challenging to debug; making any changes to this function whatsoever seems to affect the llvm optimizer behaviour enough to make the unexpected zeroing of the sector_size variable cease.

  PR: 215209
  Security: The potential for variables to unexpectedly become zero has worrying consequences for security in general, but not so much in this particular context.

  MFC r310086:

  In xbd_connect(), use correct scanf conversion specifiers for the feature_barrier and feature_flush variables. Otherwise, adjacent variables on the stack, such as sector_size, may be overwritten, with disastrous results.

  Note that I did not see a good reason to revert the addition of zero checks introduced in r310013. Better safe than sorry.

  PR: 215209
  Tested by: royger

Changes:
_U  stable/10/
  stable/10/sys/dev/xen/blkfront/blkfront.c
_U  stable/11/
  stable/11/sys/dev/xen/blkfront/blkfront.c
_U  stable/9/
_U  stable/9/sys/
  stable/9/sys/dev/xen/blkfront/blkfront.c
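To make the two fixes concrete, here is a minimal userland C sketch. It is not the actual blkfront.c code: plain sscanf(3) stands in for however the driver reads its xenstore values, and struct disk_params, disk_params_parse() and disk_blocks_per_page() are hypothetical names. It assumes, as the r310086 log implies, that feature_barrier and feature_flush are int while the old format strings used a wider specifier; on amd64 a %lu conversion stores 8 bytes into a 4-byte int, which is how a neighbouring stack variable such as sector_size can get clobbered. The sketch parses each value with a specifier that matches its destination type and, like r310013, rejects a zero sector count or sector size before anything divides by sector_size (the divide that produced the "integer divide fault").

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical container for the values a block backend advertises. */
struct disk_params {
	unsigned long sectors;		/* parsed with %lu */
	unsigned long sector_size;	/* parsed with %lu */
	int feature_barrier;		/* parsed with %d: %lu would store 8 bytes on LP64 */
	int feature_flush;		/* parsed with %d for the same reason */
};

/* Parse an unsigned long; the specifier matches the pointee type. */
static int
parse_ulong(const char *s, unsigned long *out)
{
	if (s == NULL || sscanf(s, "%lu", out) != 1)
		return (EINVAL);
	return (0);
}

/* Parse an int; the specifier matches the pointee type. */
static int
parse_int(const char *s, int *out)
{
	if (s == NULL || sscanf(s, "%d", out) != 1)
		return (EINVAL);
	return (0);
}

/*
 * Fill in *p from string values.  The feature flags are optional:
 * a missing or unparsable value means "not supported".
 * Returns 0 on success or EINVAL on bad or zero-sized geometry.
 */
static int
disk_params_parse(const char *sectors, const char *sector_size,
    const char *barrier, const char *flush, struct disk_params *p)
{
	if (parse_ulong(sectors, &p->sectors) != 0 ||
	    parse_ulong(sector_size, &p->sector_size) != 0)
		return (EINVAL);

	if (parse_int(barrier, &p->feature_barrier) != 0)
		p->feature_barrier = 0;
	if (parse_int(flush, &p->feature_flush) != 0)
		p->feature_flush = 0;

	/* In the spirit of r310013: a zero-byte disk is not worth attaching. */
	if (p->sectors == 0 || p->sector_size == 0)
		return (EINVAL);
	return (0);
}

/* Only call after disk_params_parse() succeeded, so the divisor is non-zero. */
static unsigned long
disk_blocks_per_page(const struct disk_params *p, unsigned long page_size)
{
	assert(p->sector_size != 0);
	return (page_size / p->sector_size);
}
```

Validating once, at the point where the values are read, keeps every later division safe regardless of what happens further down the function, which matches the reasoning for keeping the r310013 zero checks after the real specifier fix landed.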