Bug 223813

Summary: [mps] page fault in mps_user_pass_thru() -> copyout() on 11.1-RELEASE-p4, sys/dev/mps/mps_user.c:1040
Product: Base System Reporter: Daniel Ylitalo <daniel>
Component: kern Assignee: freebsd-bugs mailing list <bugs>
Status: New ---    
Severity: Affects Some People CC: avos, farrokhi, mqudsi
Priority: --- Keywords: panic
Version: 11.1-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Entire core dump none

Description Daniel Ylitalo 2017-11-23 08:40:49 UTC
Created attachment 188209 [details]
Entire core dump

Hi!

I just upgraded our firewall from 11.0 to 11.1-p4, but it now panics after about 35-45 minutes. After some poking around I saw that there were quite a few changes in the mps driver, so I'm guessing a bug snuck in there somewhere.

I'm happy to apply a debug patch to get you more information if you need it to sort this out.

It panics with this stack trace:

Unread portion of the kernel message buffer:
panic: vm_fault: fault on nofault entry, addr: fffffe00003eb000
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff80aadac7 at kdb_backtrace+0x67
#1 0xffffffff80a6bba6 at vpanic+0x186
#2 0xffffffff80a6ba13 at panic+0x43
#3 0xffffffff80d58b90 at vm_fault_hold+0x2070
#4 0xffffffff80d56ad5 at vm_fault+0x75
#5 0xffffffff80edf927 at trap_pfault+0xe7
#6 0xffffffff80edf0c6 at trap+0x286
#7 0xffffffff80ec36d1 at calltrap+0x8
#8 0xffffffff8067b346 at mps_ioctl+0x2e86
#9 0xffffffff8093ae38 at devfs_ioctl_f+0x128
#10 0xffffffff80ac9415 at kern_ioctl+0x255
#11 0xffffffff80ac914f at sys_ioctl+0x16f
#12 0xffffffff80ee0394 at amd64_syscall+0x6c4
#13 0xffffffff80ec39bb at Xfast_syscall+0xfb


And here is the doadump log:
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:222
#1  0xffffffff80a6b721 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80a6bbe0 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80a6ba13 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80d58b90 in vm_fault_hold (map=<value optimized out>,
    vaddr=<value optimized out>, fault_type=1 '\001',
    fault_flags=<value optimized out>, m_hold=0x0)
    at /usr/src/sys/vm/vm_fault.c:524
#5  0xffffffff80d56ad5 in vm_fault (map=0xfffff80003000000,
    vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0)
    at /usr/src/sys/vm/vm_fault.c:475
#6  0xffffffff80edf927 in trap_pfault (frame=0xfffffe08595cb510, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:708
#7  0xffffffff80edf0c6 in trap (frame=0xfffffe08595cb510)
    at /usr/src/sys/amd64/amd64/trap.c:421
#8  0xffffffff80ec36d1 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff80edd63f in copyout () at /usr/src/sys/amd64/amd64/support.S:255
#10 0xffffffff8067b346 in mps_ioctl () at /usr/src/sys/dev/mps/mps_user.c:1040
#11 0xffffffff8093ae38 in devfs_ioctl_f (fp=0xfffff80013466e10,
    com=3224914180, data=0xfffffe08595cb870, cred=0xfffff80013892500,
    td=0xfffff8000ab48000) at /usr/src/sys/fs/devfs/devfs_vnops.c:791
#12 0xffffffff80ac9415 in kern_ioctl (td=<value optimized out>, fd=3,
    com=<value optimized out>, data=<value optimized out>) at file.h:323
#13 0xffffffff80ac914f in sys_ioctl (td=<value optimized out>,
    uap=0xfffffe08595cba30) at /usr/src/sys/kern/sys_generic.c:745
#14 0xffffffff80ee0394 in amd64_syscall (td=0xfffff8000ab48000, traced=0)
    at subr_syscall.c:135
#15 0xffffffff80ec39bb in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:396
#16 0x0000000000446adc in ?? ()
Previous frame inner to this frame (corrupt stack?)
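
For reference, the com=3224914180 value passed to devfs_ioctl_f() in frame #11 can be decoded by hand. The sketch below is a user-space illustration, assuming the usual BSD ioctl command layout from <sys/ioccom.h> (direction in the top bits, a 13-bit argument length, then group and number bytes). The names decode_ioctl, format_ioctl, and the CMD_* macros are made up for this example. The value comes out as a read/write command in group 'I', number 4, with a 56-byte argument, which would be consistent with the pass-through ioctl named in the summary; the mapping to a specific macro in the mps headers is an assumption and is not shown in this report.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit layout of a BSD ioctl command word (see <sys/ioccom.h>). */
#define CMD_PARM_MASK 0x1fffu      /* 13-bit argument length field */
#define CMD_OUT       0x40000000u  /* kernel copies the argument out to the caller */
#define CMD_IN        0x80000000u  /* kernel copies the argument in from the caller */

/* Hypothetical container for the decoded fields. */
struct ioctl_fields {
    unsigned dir_in;   /* 1 if the CMD_IN bit is set */
    unsigned dir_out;  /* 1 if the CMD_OUT bit is set */
    unsigned len;      /* argument size in bytes */
    char     group;    /* command group character */
    unsigned num;      /* command number within the group */
};

/* Split a command word into its fields. */
static struct ioctl_fields decode_ioctl(uint32_t com)
{
    struct ioctl_fields f;

    f.dir_in  = (com & CMD_IN) != 0;
    f.dir_out = (com & CMD_OUT) != 0;
    f.len     = (com >> 16) & CMD_PARM_MASK;
    f.group   = (char)((com >> 8) & 0xffu);
    f.num     = com & 0xffu;
    return f;
}

/* Render the decoded fields as text; returns snprintf()'s result. */
static int format_ioctl(uint32_t com, char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    struct ioctl_fields f = decode_ioctl(com);

    return snprintf(buf, n, "dir=%s%s group='%c' num=%u len=%u",
        f.dir_in ? "IN" : "", f.dir_out ? "OUT" : "",
        f.group, f.num, f.len);
}
```

Calling format_ioctl(3224914180u, buf, sizeof(buf)) fills buf with "dir=INOUT group='I' num=4 len=56". The same value appears in the later backtrace in comment 3, so both panics went through the same ioctl command.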
Comment 1 Daniel Ylitalo 2017-11-23 09:35:02 UTC
Worth noting the chip perhaps:

mps0 Adapter:
       Board Name: SAS9207-4i4e
   Board Assembly: H3-25434-00K
        Chip Name: LSISAS2308
    Chip Revision: ALL
    BIOS Revision: 7.39.00.00
Firmware Revision: 20.00.04.00
Comment 2 Mahmoud Al-Qudsi 2018-01-02 01:33:56 UTC
The technical merits of your bug report notwithstanding, I would downgrade to P19 for sanity's sake. There are a host of issues with the P20 releases even with the official P20 drivers on other platforms (ESX); it's tough to figure out what's FreeBSD's fault and what's Broadcom/Avago's. 

(Speaking from similar experience after upgrading to 11.0 from 10.1)
Comment 3 Babak Farrokhi freebsd_committer 2020-03-17 14:52:21 UTC
You can easily reproduce it by running the `sas2ircu LABEL` sub-command on any vdev in a zpool. It does not happen (in my case) if the physical disk is not in a zpool.

Some more information taken from vmcore:

(kgdb) bt
#0  doadump () at pcpu.h:234
#1  0xffffffff80b050e8 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:388
#2  0xffffffff80b05508 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:781
#3  0xffffffff80b05343 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:712
#4  0xffffffff80dfc2c6 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=<value optimized out>, 
    fault_flags=<value optimized out>, m_hold=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:561
#5  0xffffffff80df9db5 in vm_fault (map=0xfffff80003000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0)
    at /usr/src/sys/vm/vm_fault.c:512
#6  0xffffffff80f89675 in trap_pfault (frame=0xfffffe085b757610, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:805
#7  0xffffffff80f88bdd in trap (frame=0xfffffe085b757610) at /usr/src/sys/amd64/amd64/trap.c:438
#8  0xffffffff80f68d9c in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#9  0xffffffff80f8696e in copyout () at /usr/src/sys/amd64/amd64/support.S:254
#10 0xffffffff8069c502 in mps_ioctl (dev=<value optimized out>, cmd=<value optimized out>, arg=<value optimized out>)
    at /usr/src/sys/dev/mps/mps_user.c:1040
#11 0xffffffff809d24a8 in devfs_ioctl_f (fp=0xfffff80010ed3320, com=3224914180, data=0xfffffe085b7578d0, cred=0xfffff80100e65e00, td=0xfffff800251f5000)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:791
#12 0xffffffff80b68637 in kern_ioctl (td=0xfffff800251f5000, fd=5, com=3224914180, data=<value optimized out>) at src/sys/sys/file.h:323
#13 0xffffffff80b6835b in sys_ioctl (td=0xfffff800251f5000, uap=0xfffff800251f5538) at /usr/src/sys/kern/sys_generic.c:745
#14 0xffffffff80f8a5f6 in amd64_syscall (td=0xfffff800251f5000, traced=0) at src/sys/amd64/amd64/../../kern/subr_syscall.c:132
#15 0xffffffff80f6967d in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:494
#16 0x0000000000446adc in ?? ()
Previous frame inner to this frame (corrupt stack?)



Frame 10:

(kgdb) up
#10 0xffffffff8069c502 in mps_ioctl (dev=<value optimized out>, cmd=<value optimized out>, arg=<value optimized out>)
    at /usr/src/sys/dev/mps/mps_user.c:1040
1040			copyout(cm->cm_reply, PTRIN(data->PtrReply), data->ReplySize);
Current language:  auto; currently minimal
(kgdb) list
1035				mps_printf(sc, "%s: user reply buffer (%d) smaller "
1036				    "than returned buffer (%d)\n", __func__,
1037				    data->ReplySize, sz);
1038			}
1039			mps_unlock(sc);
1040			copyout(cm->cm_reply, PTRIN(data->PtrReply), data->ReplySize);
1041			mps_lock(sc);
1042	
1043			if ((function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
1044			    (function == MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH)) {
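
One possible reading of the listing above, not confirmed anywhere in this report: the warning at lines 1035-1037 compares data->ReplySize against sz (the size of the returned reply), but the copyout() at line 1040 always copies data->ReplySize bytes. If the caller's buffer is larger than the reply, the copy would read past the end of the kernel-side reply, which could explain a fault on a kernel address inside copyout(). The user-space sketch below only illustrates the clamping idea; reply_copy_len and copy_reply are hypothetical names, and memcpy() stands in for copyout().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: the number of bytes that is safe to copy,
 * i.e. the smaller of the reply size and the caller's buffer size.
 */
static size_t reply_copy_len(size_t reply_size, size_t user_reply_size)
{
    return reply_size < user_reply_size ? reply_size : user_reply_size;
}

/*
 * Stand-in for the copy at mps_user.c:1040, using memcpy() instead of
 * copyout(). Never reads more than reply_size bytes from the source and
 * never writes more than user_reply_size bytes to the destination.
 * Returns the number of bytes copied.
 */
static size_t copy_reply(const void *reply, size_t reply_size,
    void *user_buf, size_t user_reply_size)
{
    assert(reply != NULL && user_buf != NULL);
    size_t n = reply_copy_len(reply_size, user_reply_size);

    memcpy(user_buf, reply, n);
    return n;
}
```

With a 20-byte reply and a 256-byte caller buffer, copy_reply() copies 20 bytes and leaves the rest of the caller's buffer untouched, whereas an unclamped copy would read 236 bytes beyond the reply. Whether this is what actually happens in the driver would need to be checked against how sz is computed in mps_user.c.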



Frame 11:

(kgdb) up
#11 0xffffffff809d24a8 in devfs_ioctl_f (fp=0xfffff80010ed3320, com=3224914180, data=0xfffffe085b7578d0, cred=0xfffff80100e65e00, td=0xfffff800251f5000)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:791
791		error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
(kgdb) list
786				error = copyout(p, fgn->buf, i);
787			td->td_fpop = fpop;
788			dev_relthread(dev, ref);
789			return (error);
790		}
791		error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
792		td->td_fpop = NULL;
793		dev_relthread(dev, ref);
794		if (error == ENOIOCTL)
795			error = ENOTTY;
Comment 4 Babak Farrokhi freebsd_committer 2020-03-17 15:41:07 UTC
(kgdb) up
#11 0xffffffff809d24a8 in devfs_ioctl_f (fp=0xfffff80010ed3320, com=3224914180, data=0xfffffe085b7578d0, cred=0xfffff80100e65e00, td=0xfffff800251f5000)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:791
791		error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
(kgdb) list
786				error = copyout(p, fgn->buf, i);
787			td->td_fpop = fpop;
788			dev_relthread(dev, ref);
789			return (error);
790		}
791		error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
792		td->td_fpop = NULL;
793		dev_relthread(dev, ref);
794		if (error == ENOIOCTL)
795			error = ENOTTY;
Comment 5 Mahmoud Al-Qudsi 2020-03-17 17:35:46 UTC
Babak, which firmware are you running?
Comment 6 Babak Farrokhi freebsd_committer 2020-03-18 08:15:23 UTC
(In reply to Mahmoud Al-Qudsi from comment #5)

Controller and Firmware information:

Controller type      : SAS2008
BIOS version         : 7.39.02.00
Firmware version     : 20.00.07.00
Firmware Revision    : A384
Comment 7 Babak Farrokhi freebsd_committer 2020-03-18 11:21:32 UTC
Downgraded firmware to P19 and I can still reproduce the same panic.