Bug 246280 - ciss driver causes immediate system crash&reboot if external SAS cable unplugged
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs mailing list
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2020-05-07 11:17 UTC by Peter Eriksson
Modified: 2020-05-24 21:42 UTC
Description Peter Eriksson 2020-05-07 11:17:33 UTC
If the SAS cable to a D6020 external SAS JBOD cabinet is unplugged (or the I/O module on the D6020 fails), the kernel panics and immediately reboots. This is a bit annoying :-)

I've so far been unable to catch an error message because it reboots so quickly, but I will try to have a serial console attached while unplugging a cable on a test server...
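
For reference, a minimal sketch of a serial-console setup that could capture the panic text before the reboot. It assumes a standard first serial port at 115200 baud and a swap device large enough for a crash dump; the port, speed, and dump settings are assumptions and are not taken from the affected servers.

```sh
# /boot/loader.conf (assumed values; adjust speed to match the terminal side)
boot_multicons="YES"             # enable more than one console
boot_serial="YES"                # make the serial console the primary one
comconsole_speed="115200"        # serial line speed
console="comconsole,vidconsole"  # send console output to serial and video

# /etc/rc.conf
dumpdev="AUTO"                   # write kernel crash dumps to the swap device;
                                 # savecore(8) copies them to /var/crash at next boot
```

With a crash dump saved, the faulting instruction pointer from the trap message can usually be mapped to a source line with kgdb and the kernel debug symbols, if those are installed.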
Comment 1 Peter Eriksson 2020-05-07 11:18:40 UTC
HP D6020 cabinets
HP H241 SmartHBA SAS controllers (in JBOD mode)
HP DL380 Gen 9 servers
FreeBSD 12.1-RELEASE-p3
Comment 2 Peter Eriksson 2020-05-20 14:43:18 UTC
It doesn't happen every time, but I finally managed to provoke it by unplugging the cable and then reconnecting it while some I/O was active:

ciss0: *** Hot-plug drive inserted, Port=1E Box=2 Bay=3 SN=            5PGTTP6C
ciss0: *** Hot-plug drive inserted, Port=1E Box=2 Bay=2 SN=            5PGU1RZC
ciss0: *** Hot-plug drive inserted, Port=1E Box=2 Bay=1 SN=            5PGTSWAC
ciss0: *** Expander Link Up, Port=2E Box=1 Exp=1 Phy=44 Port on module=255
ciss0: *** Expander Link Up, Port=2E Box=1 Exp=1 Phy=45 Port on module=255
ciss0: *** Expander Link Up, Port=2E Box=1 Exp=1 Phy=46 Port on module=255
ciss0: *** Expander Link Up, Port=2E Box=1 Exp=1 Phy=47 Port on module=255
ciss0: *** Enclosure added, Port=2E, storage system 2 - 7CE952P09P
ciss0: Unknown hotplug event 6
ciss0: *** Hot-plug drive inserted, Port=2E Box=2 Bay=35 SN=            5PGLBRRE


Fatal trap 18: integer divide fault while in kernel mode
cpuid = 12; apic id = 14
instruction pointer     = 0x20:0xffffffff805c692a
stack pointer           = 0x28:0xfffffe01fa4f1b60
frame pointer           = 0x28:0xfffffe01fa4f1bb0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 25 (ciss_notify0)
trap number             = 18
panic: integer divide fault
cpuid = 12
time = 1589905482
KDB: stack backtrace:
#0 0xffffffff80c1ce67 at kdb_backtrace+0x67
#1 0xffffffff80bcff4d at vpanic+0x19d
#2 0xffffffff80bcfda3 at panic+0x43
#3 0xffffffff810aabbc at trap_fatal+0x39c
#4 0xffffffff810aa00a at trap+0x6a
#5 0xffffffff810832cc at calltrap+0x8
#6 0xffffffff80b901e3 at fork_exit+0x83
#7 0xffffffff8108431e at fork_trampoline+0xe

I'm guessing this happens in sys/dev/ciss/ciss.c:ciss_notify_thread():

        cr = ciss_dequeue_notify(sc);

        if (cr == NULL)
                panic("cr null");