Env:
====

# uname -a
FreeBSD san01.bsh-ru.playkey.net 11.0-RELEASE-p7 FreeBSD 11.0-RELEASE-p7 #2 r314026M: Wed Mar 29 06:54:29 UTC 2017 emz@san01.bsh-ru.playkey.net:/usr/obj/usr/src/sys/SAN amd64

Symptoms:
=========

[root@san01:~]# ps ax | grep ctld
  838  -  Ds    1094:40,74 /usr/sbin/ctld
12354  2  S+       0:00,00 grep ctld

[root@san01:~]# ps alx | grep ctld
  0   838     1   0  21  0 1188720 1173044 cfiscsi_ Ds    -  1094:40,74 /usr/sbin/ctld
  0 12647 81380   0  20  0   14836    2540 piperd   S+    2     0:00,00 grep ctld

[root@san01:~]# procstat -k 838
  PID    TID COMM             TDNAME           KSTACK
  838 102821 ctld             -                mi_switch sleepq_wait _cv_wait cfiscsi_offline ctl_port_offline ctl_ioctl devfs_ioctl_f kern_ioctl sys_ioctl amd64_syscall Xfast_syscall

[root@san01:~]# procstat -k 838
  PID    TID COMM             TDNAME           KSTACK
  838 102821 ctld             -                mi_switch sleepq_wait _cv_wait cfiscsi_offline ctl_port_offline ctl_ioctl devfs_ioctl_f kern_ioctl sys_ioctl amd64_syscall Xfast_syscall

ctld in this state stops reacting to SIGHUP; it still serves the targets from its old config, but is unable to load any new ones.
I'd be very suspicious of a problem with the disk subsystem in this system. Can you describe the disk subsystem and filesystem please?
I have several SAN systems like this one. The issue reproduces on at least two of them. The environment above is reported from the system that falls into this state more often. These two servers have similar, but not identical, hardware configurations.

The system from this PR has the following disk configuration:
===========================

32 GB RAM, default (not tweaked) ZFS ARC, disks in the internal enclosure (one expander).

# camcontrol devlist
<LSI SAS3x40 0601>                 at scbus1 target 4 lun 0 (ses0,pass0)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 5 lun 0 (pass1,da0)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 7 lun 0 (pass2,da1)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 8 lun 0 (pass3,da2)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 9 lun 0 (pass4,da3)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 11 lun 0 (pass5,da4)
<ATA ST2000NX0253 SN02>            at scbus1 target 12 lun 0 (pass6,da5)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 15 lun 0 (pass7,da6)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 16 lun 0 (pass8,da7)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 18 lun 0 (pass9,da8)
<ATA ST2000NX0253 SN02>            at scbus1 target 19 lun 0 (pass10,da9)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 20 lun 0 (pass11,da10)
<ATA ST2000NX0253 SN02>            at scbus1 target 21 lun 0 (pass12,da11)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 22 lun 0 (pass13,da12)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 23 lun 0 (pass14,da13)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 24 lun 0 (pass15,da14)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 25 lun 0 (pass16,da15)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 26 lun 0 (pass17,da16)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 27 lun 0 (pass18,da17)
<ATA TOSHIBA THNSN81Q 6101>        at scbus1 target 28 lun 0 (pass19,da18)
<AHCI SGPIO Enclosure 1.00 0001>   at scbus6 target 0 lun 0 (ses1,pass20)
<AHCI SGPIO Enclosure 1.00 0001>   at scbus13 target 0 lun 0 (ses2,pass21)

# grep mrsas /var/run/dmesg.boot
AVAGO MegaRAID SAS FreeBSD mrsas driver version: 06.709.07.00-fbsd
mrsas0: <AVAGO Fury SAS Controller> port 0x6000-0x60ff mem 0xc7300000-0xc730ffff,0xc7200000-0xc72fffff irq 32 at device 0.0 numa-domain 0 on pci4
mrsas0: Using MSI-X with 24 number of vectors
mrsas0: FW supports <96> MSIX vector,Online CPU 24 Current MSIX <24>
mrsas0: FW supports SED
mrsas0: FW supports JBOD Map
mrsas0: Jbod map is supported
mrsas0: MSI-x interrupts setup success

# zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
data      17,4T  12,2T  5,20T         -    61%    70%  1.00x  ONLINE  -
userdata  1,72T   143G  1,58T         -    62%     8%  1.00x  ONLINE  -
zroot     63,5G  27,7G  35,8G         -    52%    43%  1.00x  ONLINE  -

ctld serves several hundred targets, made up from ZFS zvols and their clones (the list is really long; please tell me if you really want to see it or the ctl.conf).

The other system:
=========================

64 GB RAM, default (not tweaked) ZFS ARC, disks in an external enclosure (actually an external derivative of the previous one: one expander, one external SAS cable).
# grep mpr /var/run/dmesg.boot
mpr0: <Avago Technologies (LSI) SAS3008> port 0xe000-0xe0ff mem 0xfbd40000-0xfbd4ffff,0xfbd00000-0xfbd3ffff irq 56 at device 0.0 numa-domain 1 on pci13
mpr0: Firmware: 14.00.00.00, Driver: 15.03.00.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>
mpr0: SAS Address for SATA device = 4621345283685159
mpr0: SAS Address for SATA device = 4621345284576165
mpr0: SAS Address for SATA device = 46213452845d5553
mpr0: SAS Address for SATA device = 4621345285586058
mpr0: SAS Address for SATA device = 4621345285696061
mpr0: SAS Address for SATA device = 4621345285596163
mpr0: SAS Address for SATA device = 4621345285696265
mpr0: SAS Address for SATA device = 46213452855b4f54
mpr0: SAS Address for SATA device = 46213453716c4f55
mpr0: SAS Address for SATA device = 4621345285695458
mpr0: SAS Address for SATA device = 4621345372686062
mpr0: SAS Address for SATA device = 4621345286584e62
mpr0: SAS Address from SAS device page0 = 500304801f07c980
mpr0: SAS Address from SATA device = 4621345283685159
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x000a> enclosureHandle<0x0002> slot 0
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c981
mpr0: SAS Address from SATA device = 4621345284576165
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x000b> enclosureHandle<0x0002> slot 1
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c982
mpr0: SAS Address from SATA device = 46213452845d5553
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x000c> enclosureHandle<0x0002> slot 2
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c983
mpr0: SAS Address from SATA device = 4621345285586058
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x000d> enclosureHandle<0x0002> slot 3
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c984
mpr0: SAS Address from SATA device = 4621345285696061
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x000e> enclosureHandle<0x0002> slot 4
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c985
mpr0: SAS Address from SATA device = 4621345285596163
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x000f> enclosureHandle<0x0002> slot 5
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c986
mpr0: SAS Address from SATA device = 4621345285696265
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0010> enclosureHandle<0x0002> slot 6
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c987
mpr0: SAS Address from SATA device = 46213452855b4f54
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0011> enclosureHandle<0x0002> slot 7
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c988
mpr0: SAS Address from SATA device = 46213453716c4f55
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0012> enclosureHandle<0x0002> slot 8
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c989
mpr0: SAS Address from SATA device = 4621345285695458
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0013> enclosureHandle<0x0002> slot 9
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c98a
mpr0: SAS Address from SATA device = 4621345372686062
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0014> enclosureHandle<0x0002> slot 10
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c98b
mpr0: SAS Address from SATA device = 4621345286584e62
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0015> enclosureHandle<0x0002> slot 11
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address for SATA device = 46213452836d5761
mpr0: SAS Address for SATA device = 46213452875f5754
mpr0: SAS Address for SATA device = 4621345287684f56
mpr0: SAS Address for SATA device = 46213452836b6351
mpr0: SAS Address for SATA device = 46213452875e4e58
mpr0: SAS Address from SAS device page0 = 500304801f07c99c
mpr0: SAS Address from SATA device = 46213452836d5761
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0016> enclosureHandle<0x0002> slot 12
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c99d
mpr0: SAS Address from SATA device = 46213452875f5754
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0017> enclosureHandle<0x0002> slot 13
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c99e
mpr0: SAS Address from SATA device = 4621345287684f56
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0018> enclosureHandle<0x0002> slot 14
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c99f
mpr0: SAS Address from SATA device = 46213452836b6351
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0019> enclosureHandle<0x0002> slot 15
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c9a0
mpr0: SAS Address from SATA device = 46213452875e4e58
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x001a> enclosureHandle<0x0002> slot 16
mpr0: At enclosure level 0 and connector name (    )
mpr0: SAS Address from SAS device page0 = 500304801f07c9bd
mpr0: Found device <4411<SmpInit,SspTarg,SepDev>,End Device> <12.0Gbps> handle<0x001b> enclosureHandle<0x0002> slot 24
mpr0: At enclosure level 0 and connector name (    )
ses2 at mpr0 bus 0 scbus12 target 32 lun 0
da0 at mpr0 bus 0 scbus12 target 8 lun 0
da1 at mpr0 bus 0 scbus12 target 9 lun 0
da2 at mpr0 bus 0 scbus12 target 10 lun 0
da3 at mpr0 bus 0 scbus12 target 11 lun 0
da5 at mpr0 bus 0 scbus12 target 13 lun 0
da11 at mpr0 bus 0 scbus12 target 19 lun 0
da8 at mpr0 bus 0 scbus12 target 16 lun 0
da9 at mpr0 bus 0 scbus12 target 17 lun 0
da6 at mpr0 bus 0 scbus12 target 14 lun 0
da15 at mpr0 bus 0 scbus12 target 23 lun 0
da7 at mpr0 bus 0 scbus12 target 15 lun 0
da4 at mpr0 bus 0 scbus12 target 12 lun 0
da14 at mpr0 bus 0 scbus12 target 22 lun 0
da12 at mpr0 bus 0 scbus12 target 20 lun 0
da7: 1200.000MB/s transfers
da10 at mpr0 bus 0 scbus12 target 18 lun 0
da16 at mpr0 bus 0 scbus12 target 24 lun 0
da13 at mpr0 bus 0 scbus12 target 21 lun 0

# camcontrol devlist
<AHCI SGPIO Enclosure 1.00 0001>   at scbus4 target 0 lun 0 (ses0,pass0)
<AHCI SGPIO Enclosure 1.00 0001>   at scbus11 target 0 lun 0 (ses1,pass1)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 8 lun 0 (da0,pass2)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 9 lun 0 (da1,pass3)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 10 lun 0 (da2,pass4)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 11 lun 0 (da3,pass5)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 12 lun 0 (da4,pass6)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 13 lun 0 (da5,pass7)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 14 lun 0 (da6,pass8)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 15 lun 0 (da7,pass9)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 16 lun 0 (da8,pass10)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 17 lun 0 (da9,pass11)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 18 lun 0 (da10,pass12)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 19 lun 0 (da11,pass13)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 20 lun 0 (da12,pass14)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 21 lun 0 (da13,pass15)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 22 lun 0 (da14,pass16)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 23 lun 0 (da15,pass17)
<ATA SDLF1CRR-019T-1H ZH06>        at scbus12 target 24 lun 0 (da16,pass18)
<LSI SAS3x40 0601>                 at scbus12 target 32 lun 0 (ses2,pass19)

Please let me know if you need any additional information.
I wouldn't call the disk/controller configs in these systems optimal, but if you are seeing the same problem across two different hardware platforms, it doesn't seem likely to be a disk hardware issue. How large is the ARC when the systems get into this state? ZFS uses roughly 1 MB of RAM per zvol, filesystem, or clone. It's possible that memory pressure is getting the system into a state where a ctld memory allocation fails. The big stick here would be to get a full kernel dump while the system is in the hung state; that would tell us exactly why ctld is getting stuck.
One system has 32 GB of RAM, with the ARC limited to 16 GB. The other system has 64 GB of RAM, with no ARC limit. The primary use of these systems is serving iSCSI, so the only thing that consumes RAM is ZFS itself. I'll be happy to provide a dump the next time I see this, thanks.
From the backtrace, it seems this is because ctld is waiting for iSCSI sessions to terminate. This behaviour was added in this commit:

------------------------------------------------------------------------
r274805 | mav | 2014-11-21 18:05:02 +0000 (Fri, 21 Nov 2014) | 5 lines

Make cfiscsi_offline() synchronous, waiting for connections termination
before return. This should make ctld restart more clean and predictable.

MFC after:	2 weeks
------------------------------------------------------------------------

I'm not sure what actual problem it fixed. Alexander, do you remember what it was?
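For context, here is a minimal sketch of what such a synchronous offline path looks like and why the backtrace ends in _cv_wait. This is not the committed r274805 code; the field and helper names (sessions_cv, cfiscsi_target_has_sessions(), the struct layouts) are assumptions in the style of sys/cam/ctl/ctl_frontend_iscsi.c:

#include <sys/param.h>
#include <sys/condvar.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

/* Illustrative only; the real definitions live in ctl_frontend_iscsi.c. */
static void
cfiscsi_offline_sketch(struct cfiscsi_softc *softc, struct cfiscsi_target *ct)
{
	struct cfiscsi_session *cs;

	mtx_lock(&softc->lock);
	/* Ask every session attached to this target to terminate... */
	TAILQ_FOREACH(cs, &softc->sessions, cs_next) {
		if (cs->cs_target == ct)
			cfiscsi_session_terminate(cs);
	}
	/*
	 * ...then sleep until the last one is gone.  If a termination
	 * request is lost, or a new session races in after the loop,
	 * this cv_wait() never returns: ctld sits in state D inside the
	 * ioctl, exactly as in the procstat -k output above, and since
	 * the sleep is uninterruptible, the pending SIGHUP is never
	 * handled either.
	 */
	while (cfiscsi_target_has_sessions(softc, ct))
		cv_wait(&softc->sessions_cv, &softc->lock);
	mtx_unlock(&softc->lock);
}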
Alexander,

Can you comment on trasz@'s question please?
(In reply to Edward Tomasz Napierala from comment #5)

I don't remember what the problem was exactly, but I suppose it was an attempt to make ctld shutdown and later restart clean, and I have no problems with that now. Maybe the submitter should retest this with the upcoming 11.1, in case something was fixed there.
Okay, I will try to reproduce this on 11.1, thanks.
Finally I got that dump. The dump was taken from a running system, with ctld stuck in the D state; the panic was invoked through sysctl. The dump and the kernel/module binaries are here (nginx autoindex):

http://files2.enaza.ru/ctld-stuck/

Sorry for the size; the server has a lot of memory. Did I forget anything? The dump is taken from an r310734 system; I have serious issues on more recent versions (fatal trap 12 panics, ZFS corruption), so I will wait for a serious uptime on r310734 and then probably report those as a ctld/ZFS regression.
Today I got another one; even after the load had diminished, ctld didn't return from the D state. I manually requested a panic and took the crash dump. It can be found here:

http://files2.enaza.ru/ctld-stuck/new/
(In reply to emz from comment #10)

There are not enough permissions on some of the files.
I'm terribly sorry, permissions are fixed now.
I've taken one more crash dump. It can be downloaded here:

http://san1.linx.playkey.net/info.ctld-stuck
http://san1.linx.playkey.net/vmcore.ctld-stuck

Binaries:

http://san1.linx.playkey.net/r310734/
I keep getting these. I can also say that during the lockup ctld refuses to serve new connections, and they all get stuck in the listen queue:

===Cut===
sonewconn: pcb 0xfffff805f754a740: Listen queue overflow: 1537 already in queue awaiting acceptance (608 occurrences)
sonewconn: pcb 0xfffff805f754a740: Listen queue overflow: 1537 already in queue awaiting acceptance (598 occurrences)
sonewconn: pcb 0xfffff805f754a740: Listen queue overflow: 1537 already in queue awaiting acceptance (562 occurrences)
sonewconn: pcb 0xfffff805f754a740: Listen queue overflow: 1537 already in queue awaiting acceptance (589 occurrences)
sonewconn: pcb 0xfffff805f754a740: Listen queue overflow: 1537 already in queue awaiting acceptance (588 occurrences)
sonewconn: pcb 0xfffff805f754a740: Listen queue overflow: 1537 already in queue awaiting acceptance (592 occurrences)
===Cut===

(I got tonnes of the above.)

===Cut===
[root@san1:~]# netstat -aLn
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/10         127.0.0.1.25
tcp4  0/0/128        *.22
tcp6  0/0/128        *.22
tcp4  0/0/1024       *.80
tcp4  0/0/100        127.0.0.1.1717
tcp4  0/0/128        *.10050
tcp6  0/0/128        *.10050
tcp6  0/0/1024       *.3260
tcp4  1537/0/1024    *.3260
unix  0/0/10         /var/run/snmpd.sock
unix  0/0/100        /var/tmp/uwsgi-api_debug.sock
unix  0/0/100        /var/tmp/uwsgi-api.sock
unix  0/0/4          /var/run/devd.pipe
unix  0/0/4          /var/run/devd.seqpacket.pipe
===Cut===

What is even weirder is that "lsof -iTCP -sTCP:LISTEN -P" cannot find this connection descriptor (0xfffff805f754a740) while this is happening.
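A side note on the numbers: the 1537-vs-1024 figures are self-consistent. With ctld blocked inside the ioctl it never calls accept(), and sonewconn() tolerates a listen queue of up to 1.5 times the configured backlog before it starts dropping connections and logging, so a backlog of 1024 saturates around 1536. Also, the 0xfffff805f754a740 value is a kernel pcb address, not an open-file handle, which would explain why lsof cannot find it. A paraphrased, standalone sketch of the overflow test (the real check lives in sonewconn() in sys/kern/uipc_socket.c; this program just demonstrates the arithmetic):

#include <stdio.h>

/* Paraphrase of sonewconn()'s overflow test: a new connection is
 * dropped once qlen exceeds 3 * qlimit / 2. */
static int
listen_queue_overflows(int qlen, int qlimit)
{
	return (qlen > 3 * qlimit / 2);
}

int
main(void)
{
	/* ctld's backlog on *:3260 is 1024, so the queue holds 1536
	 * unaccepted connections; the 1537th triggers the "Listen
	 * queue overflow" messages quoted above. */
	printf("1537/1024 overflows: %d\n", listen_queue_overflows(1537, 1024));
	printf("1536/1024 overflows: %d\n", listen_queue_overflows(1536, 1024));
	return (0);
}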
I'm also currently testing trasz@'s patch for the session-termination crash; we hoped it might have something to do with this issue too, but I just got an identical hang while running with the patch (and still no crashes with the patch, so it does seem to resolve those). So this seems to be a separate issue from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222898 .
Today we got another livelock with ctld. It seems I triggered this with "ctladm remove -b block -l <id>", and the kernel was running without the MFC of r274805, so it seems that's not it.
A commit references this bug:

Author: trasz
Date: Sat Sep  1 16:16:40 UTC 2018
New revision: 338426
URL: https://svnweb.freebsd.org/changeset/base/338426

Log:
  Try harder in cfiscsi_offline().  This is believed to be the workaround
  for the "ctld hanging on reload" problem observed in some cases under
  high load.  I'm not 100% sure it's _the_ fix, as the issue is rather
  hard to reproduce, but it was tested as part of a larger patch and the
  problem disappeared.  It certainly shouldn't break anything.

  Now, technically, it shouldn't be needed.  Quoting mav@, "After
  ct->ct_online == 0 there should be no new sessions attached to the
  target.  And if you see some problems about it, it may either mean that
  there are some races where a single cfiscsi_session_terminate(cs) call
  may be lost, or, as a guess, while this thread was sleeping the target
  was re-enabled and re-disabled again".  Should such a race be discovered
  and properly fixed in the future, then this and the two followup commits
  can be backed out.

  PR:		220175
  Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
  Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
  Discussed with:	mav
  Approved by:	re (gjb)
  MFC after:	2 weeks
  Sponsored by:	playkey.net

Changes:
  head/sys/cam/ctl/ctl_frontend_iscsi.c
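Reading the log, the shape of the workaround (re-issuing the termination request periodically instead of issuing it once and waiting unbounded) would look roughly like the fragment below. Same assumed names as the earlier sketch, not the committed diff:

	/* softc->lock held; replaces the single terminate-then-cv_wait(). */
	for (;;) {
		if (!cfiscsi_target_has_sessions(softc, ct))	/* assumed helper */
			break;
		/* "Try harder": re-send termination to every session still
		 * attached, so a lost request or a lost wakeup cannot park
		 * this thread forever. */
		TAILQ_FOREACH(cs, &softc->sessions, cs_next) {
			if (cs->cs_target == ct)
				cfiscsi_session_terminate(cs);
		}
		/* Sleep at most one second, then loop and retry. */
		cv_timedwait(&softc->sessions_cv, &softc->lock, hz);
	}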
A commit references this bug:

Author: trasz
Date: Tue Sep 11 11:40:00 UTC 2018
New revision: 338586
URL: https://svnweb.freebsd.org/changeset/base/338586

Log:
  Make the wait in cfiscsi_offline() interruptible.  This is the second
  half of the fix/workaround for the "ctld hanging on reload" problem.

  PR:		220175
  Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
  Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
  Approved by:	re (kib)
  MFC after:	2 weeks
  Sponsored by:	playkey.net

Changes:
  head/sys/cam/ctl/ctl_frontend_iscsi.c
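This second half replaces the unbreakable sleep with a signal-interruptible one. A sketch of the idea, again with the assumed names from the earlier sketches rather than the committed diff:

	/* softc->lock held.  cv_wait_sig() returns non-zero (EINTR or
	 * ERESTART) if the sleeping thread catches a signal. */
	int error;

	while (cfiscsi_target_has_sessions(softc, ct)) {	/* assumed helper */
		error = cv_wait_sig(&softc->sessions_cv, &softc->lock);
		if (error != 0)
			break;	/* Interrupted; let the ioctl return. */
	}

Practically, this means a ctld stuck in the old unbreakable D state should now be killable from userland, instead of being recoverable only through the forced-panic-and-crash-dump procedure used earlier in this PR.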
Is this resolved with the commits identified in comment #17 and comment #18? (git commits d783154e46adab8d7d7a50c8d552df90e644b908 and d4eab1373824ad00681f27765b0bc9200f53480d)
^Triage: feedback timeout (>1 year).