While running some tests with NanoBSD (FreeBSD 10.1-RC3) on ALIX hardware, I discovered a lockup when unmounting filesystems. This hardware is a small motherboard that uses a CF card as main storage, and I usually enable TRIM support on these cards. NanoBSD mounts filesystems read-only, and I use scripts to mount and unmount filesystems when changes need to be saved.

I have seen a deadlock when unmounting. With a debugging kernel I got this:

root@qtest:~ [0]# umount /cfg
panic: detach with active requests
KDB: stack backtrace:
db_trace_self_wrapper(c0968053,c08ea7f0,c2d48800,c23d6bc8,c0536a16,...) at db_trace_self_wrapper+0x2d/frame 0xc23d6b98
kdb_backtrace(c09639e1,c09fa7e8,c095761d,c23d6c54,c095761d,...) at kdb_backtrace+0x30/frame 0xc23d6c00
vpanic(c09fa682,100,c095761d,c23d6c54,c23d6c54,...) at vpanic+0x80/frame 0xc23d6c24
kassert_panic(c095761d,c09575b3,c2d7acc0,4c7,c2d7acc0,...) at kassert_panic+0xe9/frame 0xc23d6c48
g_detach(c2d7acc0,4,c095725c,1c2,c09c8d5c,...) at g_detach+0x1d3/frame 0xc23d6c64
g_wither_washer(c09f7df4,0,c0956544,124,0,...) at g_wither_washer+0x109/frame 0xc23d6c90
g_run_events(0,c23d6d08,c095d42a,3dc,0,...) at g_run_events+0x40/frame 0xc23d6ccc
fork_exit(c05c4e60,0,c23d6d08) at fork_exit+0x7f/frame 0xc23d6cf4
fork_trampoline() at fork_trampoline+0x8/frame 0xc23d6cf4
--- trap 0, eip = 0, esp = 0xc23d6d40, ebp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100006 ]
Stopped at kdb_enter+0x3d: movl $0,kdb_why
db>

I played around with ddb and discovered this:

db> show geom 0xc2e98b40
consumer: 0xc2e98b40
  class: VFS (0xc09c8d5c)
  geom: ffs.ada0s3 (0xc3293600)
  provider: ada0s3 (0xc2e7e200)
  access: r0w0e0
  flags: 0x0030
  nstart: 19
  nend: 18

This shows nstart != nend, while g_detach asserts that they are equal.
Going up the chain of providers, I find that its providers also have nstart - nend == 1:

db> show geom 0xc2e9b7c0
consumer: 0xc2e9b7c0
  class: PART (0xc09c96b0)
  geom: ada0 (0xc2e7e780)
  provider: ada0 (0xc2e7e500)
  access: r2w0e0
  flags: 0x0030
  nstart: 1430
  nend: 1429

db> show geom 0xc2e7e500
provider: ada0 (0xc2e7e500)
  class: DISK (0xc09c8890)
  geom: ada0 (0xc2e7e580)
  mediasize: 4017807360
  sectorsize: 512
  stripesize: 0
  stripeoffset: 0
  access: r2w0e0
  flags: (0x0030)
  error: 0
  nstart: 2085
  nend: 2084
  consumer: 0xc2e9a700 (ada0), access=r0w0e0, flags=0x0030
  consumer: 0xc2e9b480 (ada0), access=r0w0e0, flags=0x0030
  consumer: 0xc2e9b7c0 (ada0), access=r2w0e0, flags=0x0030

Having no idea how to debug further, I started testing various revisions and finally discovered that the commit that broke it is r268815, which MFCed r268205. Disabling TRIM on the FS also "fixes" the problem, which seems to confirm that this change is involved.

Since this depends on hardware support for TRIM, I have been unable to reproduce it in VirtualBox. I'm sorry I'm unable to provide a test case.

I'm CCing imp, who committed r268815, hoping he can offer some more insight into this. This also affects head, obviously.

I'm available for any further testing or information needed.

Thanks in advance.
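For readers unfamiliar with the counters in the ddb output, the following is a minimal userspace sketch of the bookkeeping they represent: one counter goes up when a request is issued, the other when it completes, and a detach is only safe when the two match. This is not the GEOM code; the struct and the model_* function names are made up for illustration, and only the field names nstart and nend come from the output above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the per-consumer I/O counters. */
struct model_consumer {
	unsigned nstart;	/* requests issued */
	unsigned nend;		/* requests completed */
};

/* Issuing a request bumps nstart. */
static void
model_start(struct model_consumer *cp)
{
	cp->nstart++;
}

/* Completing a request bumps nend; it can never overtake nstart. */
static void
model_end(struct model_consumer *cp)
{
	assert(cp->nend < cp->nstart);
	cp->nend++;
}

/* Number of requests that were issued but never completed. */
static unsigned
model_outstanding(const struct model_consumer *cp)
{
	return (cp->nstart - cp->nend);
}

/* Detaching is only safe once every issued request has completed. */
static bool
model_can_detach(const struct model_consumer *cp)
{
	return (model_outstanding(cp) == 0);
}
```

With 19 calls to model_start() and 18 calls to model_end(), as in the VFS consumer above, model_can_detach() returns false: a single request whose completion never arrives is enough to block the detach.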
If you disable trim, does the problem go away? I had a hard time scrounging up a CF card to test with on a SATA system. I'm guessing that I've dropped a biodone given the debug you've posted.
(In reply to Warner Losh from comment #1)
> If you disable trim, does the problem go away?

Yes, it does go away. It is easy to test: simply mounting the FS, editing a file with vi and unmounting it causes the panic, if not on the first try then within 2-3 attempts.

> I had a hard time scrounging up a CF card to test with on a SATA system.
>
> I'm guessing that I've dropped a biodone given the debug you've posted.

Maybe. Unfortunately I don't know much about the kernel and the VFS system, so I can't really help with the code.

Looking at it, I noticed that before that commit the value of softc->trim_running was changed before any operation was performed, while after the patch the code calls the new functions that perform the operation before changing that value, which now happens after the conditional (line 1506). It could be unrelated, since I don't really know what that variable means, but could this be the cause?

If you have a patch I'll be happy to test it and report back. I can perform any kind of test, since this is not production hardware.
A commit references this bug:

Author: smh
Date: Sun Oct 26 18:41:01 UTC 2014
New revision: 273704
URL: https://svnweb.freebsd.org/changeset/base/273704

Log:
  Fix CF ERASE breakage caused by 268205.

  This prevents BIO_DELETE requests getting stuck in the TRIM queue which results in a panic on shutdown due to outstanding requests.

  PR: 194606
  Reported by: Guido Falsi
  Reviewed by: mav
  MFC after: 3 days
  Sponsored by: Multiplay

Changes:
  head/sys/cam/ata/ata_da.c
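As a rough illustration of what "stuck in the queue" means in the log message, here is a toy model of a request queue with a dispatch step. It is an assumption-laden sketch, not the code from ata_da.c: the struct, the model_* names, and the take_off_queue switch are all hypothetical, and the real cause is only described as far as the log message goes.

```c
#include <stdbool.h>

/* Hypothetical model of a queue of pending delete requests. */
struct model_queue {
	unsigned queued;	/* requests waiting on the queue */
	unsigned completed;	/* requests handed back to the caller */
};

/* Add one request to the queue. */
static void
model_enqueue(struct model_queue *q)
{
	q->queued++;
}

/*
 * Dispatch one request. take_off_queue models whether the code path
 * removes the request from the queue so that it can later be completed.
 * When it is false, the request is never completed and stays queued.
 */
static void
model_dispatch(struct model_queue *q, bool take_off_queue)
{
	if (q->queued == 0)
		return;
	if (take_off_queue) {
		q->queued--;
		q->completed++;
	}
}

/* The device can only be torn down once nothing is left on the queue. */
static bool
model_idle(const struct model_queue *q)
{
	return (q->queued == 0);
}
```

In this model, a single dispatch with take_off_queue set to false leaves one request queued forever, so model_idle() never becomes true, which mirrors the "outstanding requests" condition named in the log.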
A commit references this bug:

Author: smh
Date: Wed Oct 29 11:11:55 UTC 2014
New revision: 273818
URL: https://svnweb.freebsd.org/changeset/base/273818

Log:
  MFS10 r273814
  MFC r273704

  Fix ATA CF ERASE breakage caused by 268205

  PR: 194606
  Approved by: re (marius)
  Sponsored by: Multiplay

Changes:
  _U releng/10.1/
  releng/10.1/sys/cam/ata/ata_da.c
Thanks!