Created attachment 188209 [details]
Entire core dump

Hi!

I just upgraded our firewall from 11.0 to 11.1-p4, but after about 35-45 minutes it panics. After some poking around I saw there were quite a few changes in the mps driver, so I'm guessing a bug snuck in there somewhere. I'm happy to apply a debug patch to get you more information to sort this out if you need it.

It panics with this stack trace:

Unread portion of the kernel message buffer:
panic: vm_fault: fault on nofault entry, addr: fffffe00003eb000
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff80aadac7 at kdb_backtrace+0x67
#1 0xffffffff80a6bba6 at vpanic+0x186
#2 0xffffffff80a6ba13 at panic+0x43
#3 0xffffffff80d58b90 at vm_fault_hold+0x2070
#4 0xffffffff80d56ad5 at vm_fault+0x75
#5 0xffffffff80edf927 at trap_pfault+0xe7
#6 0xffffffff80edf0c6 at trap+0x286
#7 0xffffffff80ec36d1 at calltrap+0x8
#8 0xffffffff8067b346 at mps_ioctl+0x2e86
#9 0xffffffff8093ae38 at devfs_ioctl_f+0x128
#10 0xffffffff80ac9415 at kern_ioctl+0x255
#11 0xffffffff80ac914f at sys_ioctl+0x16f
#12 0xffffffff80ee0394 at amd64_syscall+0x6c4
#13 0xffffffff80ec39bb at Xfast_syscall+0xfb

And here is the doadump log:

(kgdb) #0 doadump (textdump=<value optimized out>) at pcpu.h:222
#1 0xffffffff80a6b721 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2 0xffffffff80a6bbe0 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3 0xffffffff80a6ba13 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:690
#4 0xffffffff80d58b90 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=<value optimized out>, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:524
#5 0xffffffff80d56ad5 in vm_fault (map=0xfffff80003000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0) at /usr/src/sys/vm/vm_fault.c:475
#6 0xffffffff80edf927 in trap_pfault (frame=0xfffffe08595cb510, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:708
#7 0xffffffff80edf0c6 in trap (frame=0xfffffe08595cb510) at /usr/src/sys/amd64/amd64/trap.c:421
#8 0xffffffff80ec36d1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#9 0xffffffff80edd63f in copyout () at /usr/src/sys/amd64/amd64/support.S:255
#10 0xffffffff8067b346 in mps_ioctl () at /usr/src/sys/dev/mps/mps_user.c:1040
#11 0xffffffff8093ae38 in devfs_ioctl_f (fp=0xfffff80013466e10, com=3224914180, data=0xfffffe08595cb870, cred=0xfffff80013892500, td=0xfffff8000ab48000) at /usr/src/sys/fs/devfs/devfs_vnops.c:791
#12 0xffffffff80ac9415 in kern_ioctl (td=<value optimized out>, fd=3, com=<value optimized out>, data=<value optimized out>) at file.h:323
#13 0xffffffff80ac914f in sys_ioctl (td=<value optimized out>, uap=0xfffffe08595cba30) at /usr/src/sys/kern/sys_generic.c:745
#14 0xffffffff80ee0394 in amd64_syscall (td=0xfffff8000ab48000, traced=0) at subr_syscall.c:135
#15 0xffffffff80ec39bb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#16 0x0000000000446adc in ?? ()
Previous frame inner to this frame (corrupt stack?)
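
The devfs_ioctl_f frame shows the request as com=3224914180. As a reading aid, the sketch below splits that value into the direction, length, group, and number fields that FreeBSD packs into an ioctl command word (the layout from <sys/ioccom.h>). The struct and function names are hypothetical, made up for this illustration, and the masks are written out locally rather than taken from the kernel headers.

```c
#include <stdint.h>

/* Local copies of the <sys/ioccom.h> field layout; names are illustrative. */
#define SKETCH_IOCPARM_MASK	0x1fffu		/* 13-bit argument length */
#define SKETCH_IOC_DIRMASK	0xe0000000u	/* IOC_VOID | IOC_OUT | IOC_IN */

/* Hypothetical container for the decoded fields. */
struct ioc_fields {
	uint32_t dir;	/* direction bits, still in their high-bit position */
	unsigned len;	/* size in bytes of the ioctl argument structure */
	char group;	/* group character, e.g. 'I' */
	unsigned num;	/* command number within the group */
};

/* Split an ioctl command word into its four fields. */
static struct ioc_fields
decode_ioctl(uint32_t com)
{
	struct ioc_fields f;

	f.dir = com & SKETCH_IOC_DIRMASK;
	f.len = (com >> 16) & SKETCH_IOCPARM_MASK;
	f.group = (char)((com >> 8) & 0xff);
	f.num = com & 0xff;
	return (f);
}
```

For com=3224914180 (0xc0384904) this gives both the in and out direction bits set, a 56-byte argument structure, group 'I', and command number 4. Bidirectional copying is consistent with a request that returns a reply to userland, which matches the copyout() in the mps_ioctl frame.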
Worth noting the chip, perhaps:

mps0 Adapter:
Board Name: SAS9207-4i4e
Board Assembly: H3-25434-00K
Chip Name: LSISAS2308
Chip Revision: ALL
BIOS Revision: 7.39.00.00
Firmware Revision: 20.00.04.00
The technical merits of your bug report notwithstanding, I would downgrade to P19 for sanity's sake. There are a host of issues with the P20 releases even with the official P20 drivers on other platforms (ESX); it's tough to figure out what's FreeBSD's fault and what's Broadcom/Avago's. (Speaking from similar experience after upgrading to 11.0 from 10.1)
You can easily reproduce it by calling the `sas2ircu LABEL` sub-command on any vdev in a zpool. It does not happen (in my case) if the physical disk is not in a zpool.

Some more information taken from vmcore:

(kgdb) bt
#0 doadump () at pcpu.h:234
#1 0xffffffff80b050e8 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:388
#2 0xffffffff80b05508 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:781
#3 0xffffffff80b05343 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:712
#4 0xffffffff80dfc2c6 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=<value optimized out>, fault_flags=<value optimized out>, m_hold=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:561
#5 0xffffffff80df9db5 in vm_fault (map=0xfffff80003000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0) at /usr/src/sys/vm/vm_fault.c:512
#6 0xffffffff80f89675 in trap_pfault (frame=0xfffffe085b757610, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:805
#7 0xffffffff80f88bdd in trap (frame=0xfffffe085b757610) at /usr/src/sys/amd64/amd64/trap.c:438
#8 0xffffffff80f68d9c in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#9 0xffffffff80f8696e in copyout () at /usr/src/sys/amd64/amd64/support.S:254
#10 0xffffffff8069c502 in mps_ioctl (dev=<value optimized out>, cmd=<value optimized out>, arg=<value optimized out>) at /usr/src/sys/dev/mps/mps_user.c:1040
#11 0xffffffff809d24a8 in devfs_ioctl_f (fp=0xfffff80010ed3320, com=3224914180, data=0xfffffe085b7578d0, cred=0xfffff80100e65e00, td=0xfffff800251f5000) at /usr/src/sys/fs/devfs/devfs_vnops.c:791
#12 0xffffffff80b68637 in kern_ioctl (td=0xfffff800251f5000, fd=5, com=3224914180, data=<value optimized out>) at src/sys/sys/file.h:323
#13 0xffffffff80b6835b in sys_ioctl (td=0xfffff800251f5000, uap=0xfffff800251f5538) at /usr/src/sys/kern/sys_generic.c:745
#14 0xffffffff80f8a5f6 in amd64_syscall (td=0xfffff800251f5000, traced=0) at src/sys/amd64/amd64/../../kern/subr_syscall.c:132
#15 0xffffffff80f6967d in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:494
#16 0x0000000000446adc in ?? ()
Previous frame inner to this frame (corrupt stack?)

Frame 10:

(kgdb) up
#10 0xffffffff8069c502 in mps_ioctl (dev=<value optimized out>, cmd=<value optimized out>, arg=<value optimized out>) at /usr/src/sys/dev/mps/mps_user.c:1040
1040 copyout(cm->cm_reply, PTRIN(data->PtrReply), data->ReplySize);
Current language: auto; currently minimal
(kgdb) list
1035 mps_printf(sc, "%s: user reply buffer (%d) smaller "
1036 "than returned buffer (%d)\n", __func__,
1037 data->ReplySize, sz);
1038 }
1039 mps_unlock(sc);
1040 copyout(cm->cm_reply, PTRIN(data->PtrReply), data->ReplySize);
1041 mps_lock(sc);
1042
1043 if ((function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
1044 (function == MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH)) {

Frame 11:

(kgdb) up
#11 0xffffffff809d24a8 in devfs_ioctl_f (fp=0xfffff80010ed3320, com=3224914180, data=0xfffffe085b7578d0, cred=0xfffff80100e65e00, td=0xfffff800251f5000) at /usr/src/sys/fs/devfs/devfs_vnops.c:791
791 error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
(kgdb) list
786 error = copyout(p, fgn->buf, i);
787 td->td_fpop = fpop;
788 dev_relthread(dev, ref);
789 return (error);
790 }
791 error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
792 td->td_fpop = NULL;
793 dev_relthread(dev, ref);
794 if (error == ENOIOCTL)
795 error = ENOTTY;
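
Frame 10 lands on the copyout() at mps_user.c:1040, which copies data->ReplySize bytes out of cm->cm_reply. One plausible reading, which is an assumption and not something confirmed in this thread, is that when the caller's ReplySize is larger than the reply the controller actually returned, the copy reads past the end of the kernel reply buffer and touches an unmapped page. The sketch below is a userland stand-in for bounding such a copy, not the committed patch: `reply_copy_len` and `copy_reply` are hypothetical names, and memcpy() takes the place of copyout().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: the number of bytes that can be copied without
 * reading past the reply or writing past the caller's buffer.
 */
static size_t
reply_copy_len(size_t reply_len, size_t user_len)
{
	return (reply_len < user_len ? reply_len : user_len);
}

/*
 * Userland stand-in for the copyout() step. Copies at most the smaller
 * of the two lengths and returns the number of bytes copied. A missing
 * reply (NULL) copies nothing.
 */
static size_t
copy_reply(const void *reply, size_t reply_len, void *ubuf, size_t user_len)
{
	size_t n;

	if (reply == NULL || ubuf == NULL)
		return (0);
	n = reply_copy_len(reply_len, user_len);
	memcpy(ubuf, reply, n);
	return (n);
}
```

With an 8-byte reply and a 64-byte user buffer, copy_reply() moves 8 bytes and leaves the rest of the user buffer untouched. With a 4-byte user buffer it moves 4 bytes, which corresponds to the case the driver already warns about in the listing above.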
Babak, which firmware are you running?
(In reply to Mahmoud Al-Qudsi from comment #5)

Controller and Firmware information:

Controller type : SAS2008
BIOS version : 7.39.02.00
Firmware version : 20.00.07.00
Firmware Revision : A384
Downgraded firmware to P19 and I can still reproduce the same panic.
Created attachment 215361 [details] proposed patch Would anyone be able to test the attached patch against stable/11?
(In reply to Mark Johnston from comment #8) I will test the patch and provide feedback soon.
(In reply to Mark Johnston from comment #8) I applied the patch on 12-STABLE and could not reproduce the panic. It seems to be fixed. Thanks!
A commit references this bug:

Author: markj
Date: Thu Jun 11 14:48:20 UTC 2020
New revision: 362057
URL: https://svnweb.freebsd.org/changeset/base/362057

Log:
MFC r342660 (by scottl):
Port over the SCSI sense handling fix from mpr(4) in r342528, and fix whitespace to match.

PR: 223813
Reported and tested by: farrokhi

Changes:
_U stable/12/
stable/12/sys/dev/mps/mps_user.c
A commit references this bug:

Author: markj
Date: Thu Jun 11 14:49:38 UTC 2020
New revision: 362058
URL: https://svnweb.freebsd.org/changeset/base/362058

Log:
MFC r342660 (by scottl):
Port over the SCSI sense handling fix from mpr(4) in r342528, and fix whitespace to match.

PR: 223813
Reported and tested by: farrokhi

Changes:
_U stable/11/
stable/11/sys/dev/mps/mps_user.c
(In reply to Babak Farrokhi from comment #10) Thank you. I will work on getting EN patches released for 12.1 and 11.4, since it's too late to patch the release. If anyone can confirm that the patch fixes the problem on stable/11, I would appreciate it.
(In reply to Mark Johnston from comment #13) Successfully tested on 11-STABLE.
(In reply to Babak Farrokhi from comment #14) Thank you. I submitted an EN request to secteam.
Author: gordon
Date: Wed Jul 8 19:58:00 2020
New Revision: 363024
URL: https://svnweb.freebsd.org/changeset/base/363024

Log:
Fix kernel panic in mps(4) driver.

Approved by: so
Security: FreeBSD-EN-20:15.mps

Modified:
releng/11.3/sys/dev/mps/mps_user.c
releng/11.4/sys/dev/mps/mps_user.c
releng/12.1/sys/dev/mps/mps_user.c