I'm using iSCSI to connect to a remote drive over an SSH tunnel. When the connection drops, I get a kernel panic and reboot. This has occurred three times in the past three days. dmesg: WARNING: localhost (iqn.cryo): no ping reply (NOP-In) after 5 seconds; reconnecting WARNING: localhost (iqn.cryo): no ping reply (NOP-In) after 5 seconds; reconnecting Fatal trap 9: general protection fault while in kernel mode cpuid = 0; apic id = 00 instruction pointer = 0x20:0xffffffff802e1b6e stack pointer = 0x28:0xfffffe0119c68ae0 frame pointer = 0x28:0xfffffe0119c68b10 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 0 (iscsimt) trap number = 9 panic: general protection fault cpuid = 0 KDB: stack backtrace: #0 0xffffffff80963000 at kdb_backtrace+0x60 #1 0xffffffff80928125 at panic+0x155 #2 0xffffffff80d24f0f at trap_fatal+0x38f #3 0xffffffff80d24b6c at trap+0x75c #4 0xffffffff80d0a772 at calltrap+0x8 #5 0xffffffff81e43f47 at iscsi_session_terminate_task+0x57 #6 0xffffffff81e43ed0 at iscsi_session_cleanup+0x1c0 #7 0xffffffff81e43a78 at iscsi_maintenance_thread+0x58 #8 0xffffffff808f8b6a at fork_exit+0x9a #9 0xffffffff80d0acae at fork_trampoline+0xe Uptime: 15h46m39s (da0:iscsi1:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 (da0:iscsi1:0:0:0): CAM status: Command timeout (da0:iscsi1:0:0:0): Error 5, Retries exhausted (da0:iscsi1:0:0:0): Synchronize cache failed Automatic reboot in 15 seconds - press a key on the console to abort Rebooting... Copyright (c) 1992-2014 The FreeBSD Project. System info: GENERIC kernel uname -a FreeBSD legion.textplain.net 10.1-RELEASE-p6 FreeBSD 10.1-RELEASE-p6 #0: Tue Feb 24 19:00:21 UTC 2015 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Can you obtain a crashdump and do "where" in kgdb, as documented below? Thanks! https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
(In reply to Edward Tomasz Napierala from comment #1) I don't have any core dumps under /var/crash. dumpdev="AUTO" in /etc/rc.conf. I have a ZFS mirror of two drives, each with swap space. Here's some more info: root@legion:/var/crash # swapinfo Device 1K-blocks Used Avail Capacity /dev/ada0p2.eli 2097152 0 2097152 0% /dev/ada1p2.eli 2097152 0 2097152 0% Total 4194304 0 4194304 0% root@legion:/var/crash # savecore -C -vv unable to open bounds file, using 0 checking for kernel dump on device /dev/ada0p2.eli mediasize = 2147483648 sectorsize = 4096 savecore: error reading last dump header at offset 2147479552 in /dev/ada0p2.eli: Invalid argument unable to open bounds file, using 0 checking for kernel dump on device /dev/ada1p2.eli mediasize = 2147483648 sectorsize = 4096 savecore: error reading last dump header at offset 2147479552 in /dev/ada1p2.eli: Invalid argument No dump exists I tried using -f, but got the same output. I'm unable to find the solution online. If you can help me get crashdumps working, I'll try to trigger the crash again.
Could you try to hook an USB flash drive and use that for crashdump?
Dumping to the flash drive worked. I can reliably trigger the panic and I have a dump. I use freebsd-update, so I don't have a kernel.debug and I'm not sure how to build one that matches my 10.1-RELEASE-p6 GENERIC kernel. So here's the dump: https://cumulus.textplain.net/index.php/s/rr8a7OQJybNjl5L dmesg: WARNING: localhost (iqn.cryo): no ping reply (NOP-In) after 5 seconds; reconnecting WARNING: localhost (iqn.cryo): no ping reply (NOP-In) after 5 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting (da1:iscsi1:0:0:0): WRITE(10). CDB: 2a 00 08 20 ff 3a 00 00 1e 00 (da1:iscsi1:0:0:0): CAM status: SCSI Status Error (da1:iscsi1:0:0:0): SCSI status: Check Condition (da1:iscsi1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred) (da1:iscsi1:0:0:0): Retrying command (per sense data) WARNING: localhost (iqn.cryo): no ping reply (NOP-In) after 5 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting WARNING: localhost (iqn.cryo): login timed out after 61 seconds; reconnecting (da1:iscsi1:0:0:0): WRITE(10). CDB: 2a 00 08 20 d4 47 00 00 20 00 (da1:iscsi1:0:0:0): CAM status: SCSI Status Error (da1:iscsi1:0:0:0): SCSI status: Check Condition (da1:iscsi1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred) (da1:iscsi1:0:0:0): Retrying command (per sense data) WARNING: icl_pdu_append_data: failed to allocate mbuf for 131072 bytes (da1:iscsi1:0:0:0): WRITE(10). CDB: 2a 00 0c 20 4d 48 00 00 20 00 (da1:iscsi1:0:0:0): CAM status: Resource Unavailable (da1:iscsi1:0:0:0): Retrying command WARNING: icl_pdu_append_data: failed to allocate mbuf for 126976 bytes (da1:iscsi1:0:0:0): WRITE(10). CDB: 2a 00 08 27 a3 73 00 00 1f 00 (da1:iscsi1:0:0:0): CAM status: Resource Unavailable (da1:iscsi1:0:0:0): Retrying command WARNING: icl_pdu_append_data: failed to allocate mbuf for 126976 bytes (da1:iscsi1:0:0:0): WRITE(10). CDB: 2a 00 08 27 a3 92 00 00 1f 00 (da1:iscsi1:0:0:0): CAM status: Resource Unavailable (da1:iscsi1:0:0:0): Retrying command WARNING: localhost (iqn.cryo): no ping reply (NOP-In) after 5 seconds; reconnecting WARNING: localhost (iqn.cryo): no ping reply (NOP-In) after 5 seconds; reconnecting WARNING: localhost (iqn.cryo): no ping reply (NOP-In) after 5 seconds; reconnecting Fatal trap 9: general protection fault while in kernel mode cpuid = 0; apic id = 00 instruction pointer = 0x20:0xffffffff802e1b6e stack pointer = 0x28:0xfffffe011a1c1ae0 frame pointer = 0x28:0xfffffe011a1c1b10 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 0 (iscsimt) trap number = 9 panic: general protection fault cpuid = 0 KDB: stack backtrace: #0 0xffffffff80963000 at kdb_backtrace+0x60 #1 0xffffffff80928125 at panic+0x155 #2 0xffffffff80d24f0f at trap_fatal+0x38f #3 0xffffffff80d24b6c at trap+0x75c #4 0xffffffff80d0a772 at calltrap+0x8 #5 0xffffffff81e43f47 at iscsi_session_terminate_task+0x57 #6 0xffffffff81e43ed0 at iscsi_session_cleanup+0x1c0 #7 0xffffffff81e43a78 at iscsi_maintenance_thread+0x58 #8 0xffffffff808f8b6a at fork_exit+0x9a #9 0xffffffff80d0acae at fork_trampoline+0xe Uptime: 1d6h58m29s (da1:iscsi1:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 (da1:iscsi1:0:0:0): CAM status: Command timeout (da1:iscsi1:0:0:0): Error 5, Retries exhausted (da1:iscsi1:0:0:0): Synchronize cache failed Dumping 954 out of 3930 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91% Dump complete Automatic reboot in 15 seconds - press a key on the console to abort Steps to reproduce: 1) Forward iSCSI traffic over SSH ssh -L 3260:localhost:3260 example.com 2) Connect to iSCSI device over SSH tunnel iscsictl -A -p localhost -t iqn.cryo 3) Decrypt drive geli attach /dev/da1p1 4) Import zpool zpool import cryo 5) Start zfs send from local pool to iSCSI pool (I use zxfer) 6) Router at remote site crashes and reboots (unrelated problem) 7) After about 10 minutes, kernel panic on local machine Sometimes simply killing the ssh tunnel is enough to cause the panic, but sometimes it causes zfs commands to hang until the connection is reestablished. When the connection is lost due to the remote router crashing, the panic happens every time. If there's anything else I can do to help, just let me know.
This is still an issue on 10.1-RELEASE-p9. Here's a stack trace. (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80927da2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452 #2 0xffffffff80928164 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff80d24f5f in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:865 #4 0xffffffff80d25278 in trap_pfault (frame=0xfffffe011a263a30, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:676 #5 0xffffffff80d248da in trap (frame=0xfffffe011a263a30) at /usr/src/sys/amd64/amd64/trap.c:440 #6 0xffffffff80d0a7c2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232 #7 0xffffffff802e1b6e in xpt_freeze_devq (path=0x0, count=1) at /usr/src/sys/cam/cam_xpt.c:4300 #8 0xffffffff81e43f47 in iscsi_session_terminate_task (is=0xfffff800c06ca000, io=0xfffff800b53f7398, requeue=<value optimized out>) at /usr/src/sys/modules/iscsi/../../dev/iscsi/iscsi.c:297 #9 0xffffffff81e43ed0 in iscsi_session_cleanup (is=0xfffff800c06ca000, destroy_sim=<value optimized out>) at /usr/src/sys/modules/iscsi/../../dev/iscsi/iscsi.c:313 #10 0xffffffff81e43a78 in iscsi_maintenance_thread (arg=0xfffff800c06ca000) at /usr/src/sys/modules/iscsi/../../dev/iscsi/iscsi.c:396 #11 0xffffffff808f8b6a in fork_exit (callout=0xffffffff81e43a20 <iscsi_maintenance_thread>, arg=0xfffff800c06ca000, frame=0xfffffe011a263c00) at /usr/src/sys/kern/kern_fork.c:996 #12 0xffffffff80d0acfe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:606 #13 0x0000000000000000 in ?? ()
In kgdb, can you go up to iscsi_session_terminate_task() and do "p *io" and "p *io->io_ccb"? I didn't notice the NULL path there; looks suspicious.
(In reply to Edward Tomasz Napierala from comment #6) (kgdb) #8 0xffffffff81e43f47 in iscsi_session_terminate_task (is=0xfffff800c06ca000, io=0xfffff800b53f7398, requeue=<value optimized out>) at /usr/src/sys/modules/iscsi/../../dev/iscsi/iscsi.c:297 warning: Source file is more recent than executable. 297 xpt_freeze_devq(io->io_ccb->ccb_h.path, 1); (kgdb) p *io $1 = {io_next = {tqe_next = 0xfffff8005b47df78, tqe_prev = 0xfffff800c06ca180}, io_ccb = 0xfffff8002e0c1800, io_received = 0, io_initiator_task_tag = 74078, io_datasn = 0} (kgdb) p *io->io_ccb Cannot access memory at address 0xfffff8002e0c1800
Hi. I've tried to reproduce this under 11-CURRENT and failed. It's either that I'm doing something differently, or it was fixed in the meantime. Could you see if this still happens under 11-CURRENT, or at least 10.3-RC2? Thanks!
Damn, I was looking in the wrong place all along. Could you try the patch at https://reviews.freebsd.org/D5673? Thanks!
A commit references this bug: Author: trasz Date: Fri Mar 25 16:01:40 UTC 2016 New revision: 297268 URL: https://svnweb.freebsd.org/changeset/base/297268 Log: Fix iSCSI initiator crash that could happen with out-of-memory conditions with in-flight IO and subsequent reconnection. PR: 199117 MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5673 Changes: head/sys/dev/iscsi/iscsi.c