Bug 223813

Summary: [mps] page fault in mps_user_pass_thru() -> copyout() on 11.1-RELEASE-p4, sys/dev/mps/mps_user.c:1040
Product: Base System Reporter: Daniel Ylitalo <daniel>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Some People CC: avos, emaste, farrokhi, markj, mqudsi, pi, re
Priority: --- Keywords: panic, regression
Version: 11.1-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description        Flags
Entire core dump   none
proposed patch     none

Description Daniel Ylitalo 2017-11-23 08:40:49 UTC
Created attachment 188209 [details]
Entire core dump

Hi!

I just upgraded our firewall from 11.0 to 11.1-p4, but it now panics after about 35-45 minutes. After some poking around I saw that there were quite a few changes in the mps driver, so I'm guessing a bug snuck in there somewhere.

I'm happy to apply a debug patch to get you more information to sort this out if you need it.

It panics with this stack trace:

Unread portion of the kernel message buffer:
panic: vm_fault: fault on nofault entry, addr: fffffe00003eb000
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff80aadac7 at kdb_backtrace+0x67
#1 0xffffffff80a6bba6 at vpanic+0x186
#2 0xffffffff80a6ba13 at panic+0x43
#3 0xffffffff80d58b90 at vm_fault_hold+0x2070
#4 0xffffffff80d56ad5 at vm_fault+0x75
#5 0xffffffff80edf927 at trap_pfault+0xe7
#6 0xffffffff80edf0c6 at trap+0x286
#7 0xffffffff80ec36d1 at calltrap+0x8
#8 0xffffffff8067b346 at mps_ioctl+0x2e86
#9 0xffffffff8093ae38 at devfs_ioctl_f+0x128
#10 0xffffffff80ac9415 at kern_ioctl+0x255
#11 0xffffffff80ac914f at sys_ioctl+0x16f
#12 0xffffffff80ee0394 at amd64_syscall+0x6c4
#13 0xffffffff80ec39bb at Xfast_syscall+0xfb


And here is the doadump log:
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:222
#1  0xffffffff80a6b721 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80a6bbe0 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80a6ba13 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80d58b90 in vm_fault_hold (map=<value optimized out>,
    vaddr=<value optimized out>, fault_type=1 '\001',
    fault_flags=<value optimized out>, m_hold=0x0)
    at /usr/src/sys/vm/vm_fault.c:524
#5  0xffffffff80d56ad5 in vm_fault (map=0xfffff80003000000,
    vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0)
    at /usr/src/sys/vm/vm_fault.c:475
#6  0xffffffff80edf927 in trap_pfault (frame=0xfffffe08595cb510, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:708
#7  0xffffffff80edf0c6 in trap (frame=0xfffffe08595cb510)
    at /usr/src/sys/amd64/amd64/trap.c:421
#8  0xffffffff80ec36d1 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff80edd63f in copyout () at /usr/src/sys/amd64/amd64/support.S:255
#10 0xffffffff8067b346 in mps_ioctl () at /usr/src/sys/dev/mps/mps_user.c:1040
#11 0xffffffff8093ae38 in devfs_ioctl_f (fp=0xfffff80013466e10,
    com=3224914180, data=0xfffffe08595cb870, cred=0xfffff80013892500,
    td=0xfffff8000ab48000) at /usr/src/sys/fs/devfs/devfs_vnops.c:791
#12 0xffffffff80ac9415 in kern_ioctl (td=<value optimized out>, fd=3,
    com=<value optimized out>, data=<value optimized out>) at file.h:323
#13 0xffffffff80ac914f in sys_ioctl (td=<value optimized out>,
    uap=0xfffffe08595cba30) at /usr/src/sys/kern/sys_generic.c:745
#14 0xffffffff80ee0394 in amd64_syscall (td=0xfffff8000ab48000, traced=0)
    at subr_syscall.c:135
#15 0xffffffff80ec39bb in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:396
#16 0x0000000000446adc in ?? ()
Previous frame inner to this frame (corrupt stack?)
Comment 1 Daniel Ylitalo 2017-11-23 09:35:02 UTC
Worth noting the chip perhaps:

mps0 Adapter:
       Board Name: SAS9207-4i4e
   Board Assembly: H3-25434-00K
        Chip Name: LSISAS2308
    Chip Revision: ALL
    BIOS Revision: 7.39.00.00
Firmware Revision: 20.00.04.00
Comment 2 Mahmoud Al-Qudsi 2018-01-02 01:33:56 UTC
The technical merits of your bug report notwithstanding, I would downgrade to P19 for sanity's sake. There are a host of issues with the P20 releases even with the official P20 drivers on other platforms (ESX); it's tough to figure out what's FreeBSD's fault and what's Broadcom/Avago's. 

(Speaking from similar experience after upgrading to 11.0 from 10.1)
Comment 3 Babak Farrokhi freebsd_committer 2020-03-17 14:52:21 UTC
You can easily reproduce it by calling the `sas2ircu LABEL` sub-command on any vdev in a zpool. In my case it does not happen if the physical disk is not in a zpool.

Some more information taken from vmcore:

(kgdb) bt
#0  doadump () at pcpu.h:234
#1  0xffffffff80b050e8 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:388
#2  0xffffffff80b05508 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:781
#3  0xffffffff80b05343 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:712
#4  0xffffffff80dfc2c6 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=<value optimized out>, 
    fault_flags=<value optimized out>, m_hold=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:561
#5  0xffffffff80df9db5 in vm_fault (map=0xfffff80003000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0)
    at /usr/src/sys/vm/vm_fault.c:512
#6  0xffffffff80f89675 in trap_pfault (frame=0xfffffe085b757610, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:805
#7  0xffffffff80f88bdd in trap (frame=0xfffffe085b757610) at /usr/src/sys/amd64/amd64/trap.c:438
#8  0xffffffff80f68d9c in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#9  0xffffffff80f8696e in copyout () at /usr/src/sys/amd64/amd64/support.S:254
#10 0xffffffff8069c502 in mps_ioctl (dev=<value optimized out>, cmd=<value optimized out>, arg=<value optimized out>)
    at /usr/src/sys/dev/mps/mps_user.c:1040
#11 0xffffffff809d24a8 in devfs_ioctl_f (fp=0xfffff80010ed3320, com=3224914180, data=0xfffffe085b7578d0, cred=0xfffff80100e65e00, td=0xfffff800251f5000)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:791
#12 0xffffffff80b68637 in kern_ioctl (td=0xfffff800251f5000, fd=5, com=3224914180, data=<value optimized out>) at src/sys/sys/file.h:323
#13 0xffffffff80b6835b in sys_ioctl (td=0xfffff800251f5000, uap=0xfffff800251f5538) at /usr/src/sys/kern/sys_generic.c:745
#14 0xffffffff80f8a5f6 in amd64_syscall (td=0xfffff800251f5000, traced=0) at src/sys/amd64/amd64/../../kern/subr_syscall.c:132
#15 0xffffffff80f6967d in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:494
#16 0x0000000000446adc in ?? ()
Previous frame inner to this frame (corrupt stack?)



Frame 10:

(kgdb) up
#10 0xffffffff8069c502 in mps_ioctl (dev=<value optimized out>, cmd=<value optimized out>, arg=<value optimized out>)
    at /usr/src/sys/dev/mps/mps_user.c:1040
1040			copyout(cm->cm_reply, PTRIN(data->PtrReply), data->ReplySize);
Current language:  auto; currently minimal
(kgdb) list
1035				mps_printf(sc, "%s: user reply buffer (%d) smaller "
1036				    "than returned buffer (%d)\n", __func__,
1037				    data->ReplySize, sz);
1038			}
1039			mps_unlock(sc);
1040			copyout(cm->cm_reply, PTRIN(data->PtrReply), data->ReplySize);
1041			mps_lock(sc);
1042	
1043			if ((function == MPI2_FUNCTION_SCSI_IO_REQUEST) ||
1044			    (function == MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH)) {
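
The listing above hints at how the copy might fault: the driver warns when the caller's ReplySize is smaller than the returned size sz, yet line 1040 passes data->ReplySize to copyout() regardless, so a ReplySize larger than the reply could read past the end of cm->cm_reply. Below is a minimal user-space sketch of bounding such a copy by both sizes. It is an illustration only: bounded_reply_copy is a hypothetical helper, memcpy stands in for copyout(9), and none of it is taken from the committed change, which may differ.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mps(4)): copy a reply into a caller
 * buffer without reading past the source or writing past the destination.
 * memcpy() stands in for copyout(9) in this user-space sketch.
 */
static size_t
bounded_reply_copy(void *dst, size_t dst_size, const void *src,
    size_t src_size)
{
	/* Copy only as many bytes as both buffers can account for. */
	size_t len = dst_size < src_size ? dst_size : src_size;

	memcpy(dst, src, len);
	return (len);
}
```

The return value tells the caller how many bytes were actually copied, so a short copy can be detected rather than assumed.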



Frame 11:

(kgdb) up
#11 0xffffffff809d24a8 in devfs_ioctl_f (fp=0xfffff80010ed3320, com=3224914180, data=0xfffffe085b7578d0, cred=0xfffff80100e65e00, td=0xfffff800251f5000)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:791
791		error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
(kgdb) list
786				error = copyout(p, fgn->buf, i);
787			td->td_fpop = fpop;
788			dev_relthread(dev, ref);
789			return (error);
790		}
791		error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
792		td->td_fpop = NULL;
793		dev_relthread(dev, ref);
794		if (error == ENOIOCTL)
795			error = ENOTTY;
Comment 4 Babak Farrokhi freebsd_committer 2020-03-17 15:41:07 UTC
(kgdb) up
#11 0xffffffff809d24a8 in devfs_ioctl_f (fp=0xfffff80010ed3320, com=3224914180, data=0xfffffe085b7578d0, cred=0xfffff80100e65e00, td=0xfffff800251f5000)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:791
791		error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
(kgdb) list
786				error = copyout(p, fgn->buf, i);
787			td->td_fpop = fpop;
788			dev_relthread(dev, ref);
789			return (error);
790		}
791		error = dsw->d_ioctl(dev, com, data, fp->f_flag, td);
792		td->td_fpop = NULL;
793		dev_relthread(dev, ref);
794		if (error == ENOIOCTL)
795			error = ENOTTY;
Comment 5 Mahmoud Al-Qudsi 2020-03-17 17:35:46 UTC
Babak, which firmware are you running?
Comment 6 Babak Farrokhi freebsd_committer 2020-03-18 08:15:23 UTC
(In reply to Mahmoud Al-Qudsi from comment #5)

Controller and Firmware information:

Controller type      : SAS2008
BIOS version         : 7.39.02.00
Firmware version     : 20.00.07.00
Firmware Revision    : A384
Comment 7 Babak Farrokhi freebsd_committer 2020-03-18 11:21:32 UTC
Downgraded firmware to P19 and I can still reproduce the same panic.
Comment 8 Mark Johnston freebsd_committer 2020-06-08 14:28:03 UTC
Created attachment 215361 [details]
proposed patch

Would anyone be able to test the attached patch against stable/11?
Comment 9 Babak Farrokhi freebsd_committer 2020-06-09 08:53:00 UTC
(In reply to Mark Johnston from comment #8)

I will test the patch and provide feedback soon.
Comment 10 Babak Farrokhi freebsd_committer 2020-06-11 08:44:40 UTC
(In reply to Mark Johnston from comment #8)
Applied the patch on 12-STABLE and I could not reproduce the panic. It seems to be fixed. Thanks!
Comment 11 commit-hook freebsd_committer 2020-06-11 14:48:27 UTC
A commit references this bug:

Author: markj
Date: Thu Jun 11 14:48:20 UTC 2020
New revision: 362057
URL: https://svnweb.freebsd.org/changeset/base/362057

Log:
  MFC r342660 (by scottl):
  Port over the SCSI sense handling fix from mpr(4) in r342528, and fix
  whitespace to match.

  PR:		223813
  Reported and tested by:	farrokhi

Changes:
_U  stable/12/
  stable/12/sys/dev/mps/mps_user.c
Comment 12 commit-hook freebsd_committer 2020-06-11 14:50:33 UTC
A commit references this bug:

Author: markj
Date: Thu Jun 11 14:49:38 UTC 2020
New revision: 362058
URL: https://svnweb.freebsd.org/changeset/base/362058

Log:
  MFC r342660 (by scottl):
  Port over the SCSI sense handling fix from mpr(4) in r342528, and fix
  whitespace to match.

  PR:		223813
  Reported and tested by:	farrokhi

Changes:
_U  stable/11/
  stable/11/sys/dev/mps/mps_user.c
Comment 13 Mark Johnston freebsd_committer 2020-06-11 14:51:45 UTC
(In reply to Babak Farrokhi from comment #10)
Thank you.  I will work on getting EN patches released for 12.1 and 11.4, since it's too late to patch the release.  If anyone can confirm that the patch fixes the problem on stable/11, I would appreciate it.
Comment 14 Babak Farrokhi freebsd_committer 2020-06-17 16:43:12 UTC
(In reply to Mark Johnston from comment #13)
Successfully tested on 11-STABLE.
Comment 15 Mark Johnston freebsd_committer 2020-06-17 19:50:23 UTC
(In reply to Babak Farrokhi from comment #14)
Thank you.  I submitted an EN request to secteam.
Comment 16 Ed Maste freebsd_committer 2020-07-08 20:06:55 UTC
Author: gordon
Date: Wed Jul  8 19:58:00 2020
New Revision: 363024
URL: https://svnweb.freebsd.org/changeset/base/363024

Log:
  Fix kernel panic in mps(4) driver.

  Approved by:  so
  Security:     FreeBSD-EN-20:15.mps

Modified:
  releng/11.3/sys/dev/mps/mps_user.c
  releng/11.4/sys/dev/mps/mps_user.c
  releng/12.1/sys/dev/mps/mps_user.c