| Summary: | [geom] geli livelocks during panic | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Alan Somers <asomers> |
| Component: | kern | Assignee: | Alan Somers <asomers> |
| Status: | Closed FIXED | | |
| Severity: | Affects Some People | CC: | dewayne |
| Priority: | --- | Keywords: | crash |
| Version: | 12.1-STABLE | | |
| Hardware: | Any | | |
| OS: | Any | | |
| URL: | https://reviews.freebsd.org/D24697 | | |
Description
Alan Somers
2020-05-05 01:44:39 UTC
Patch in progress. I found the reason why the livelock doesn't happen every time: g_eli_shutdown_pre_sync only destroys unused geli devices. It marks in-use geli devices with G_ELI_FLAG_RW_DETACH, which causes them to be destroyed on last close. Most systems don't have unused geli devices, which is why they don't livelock. This also suggests an easier solution to the problem: instead of modifying g_eli_destroy, just modify g_eli_shutdown_pre_sync to destroy nothing in the event of a panic.

Steps to Reproduce
==================
$ sudo mdconfig -a -t swap -s 128M md0
$ sudo geli init -i0 md0
Enter new passphrase:
Reenter new passphrase:

Metadata backup for provider md0 can be found in /var/backups/md0.eli
and can be restored with the following command:

	# geli restore /var/backups/md0.eli md0

$ sudo geli attach md0
Enter passphrase:
$ geli list md0
# Note that the provider's mode is r0w0e0
# Disable ddb. Otherwise we'll enter ddb before calling pre-sync event hooks
$ sudo sysctl debug.debugger_on_panic=0
$ sudo sysctl debug.kdb.panic=1
# The console will print something like this.
# Note the lack of a core dump or uptime
panic: kdb_sysctl_panic
cpuid = 0
time = 1588688768
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00603037c0
vpanic() at vpanic+0x182/frame 0xfffffe0060303810
panic() at panic+0x43/frame 0xfffffe0060303870
kdb_sysctl_panic() at kdb_sysctl_panic+0x61/frame 0xfffffe00603038a0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 0xfffffe00603038f0
sysctl_root() at sysctl_root+0x20a/frame 0xfffffe0060303970
userland_sysctl() at userland_sysctl+0x17b/frame 0xfffffe0060303a20
sys___sysctl() at sys___sysctl+0x5f/frame 0xfffffe0060303ad0
amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe0060303bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0060303bf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x80042d56a, rsp = 0x7fffffffd538, rbp = 0x7fffffffd570 ---

Created attachment 214156
Fix dumping core on panic when there are unused geli devices
The attached patch is tested on stable/12 and head.
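For readers who want the idea without opening the diff, here is a rough userland model of the decision described above. It is not the kernel code from the patch: every identifier (`model_dev`, `model_action`, `model_shutdown_pre_sync`, the `panicking` argument) is hypothetical, and the real hook works on GEOM structures and detects a panic by whatever mechanism the kernel provides. The sketch only shows the three outcomes: do nothing during a panic, destroy unused devices on a normal shutdown, and mark in-use devices for destruction on last close.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names come from g_eli.c. */
enum model_action {
	MODEL_NONE,             /* device left untouched */
	MODEL_DESTROYED,        /* unused device destroyed immediately */
	MODEL_DETACH_ON_CLOSE   /* in-use device flagged for destroy on last close */
};

struct model_dev {
	int open_count;             /* combined r/w/e access count; 0 means unused */
	enum model_action action;   /* what the shutdown hook decided */
};

/*
 * Model of the pre-sync shutdown hook. During a panic the scheduler is
 * stopped, so any attempt to destroy a device could never complete;
 * the model therefore returns before touching any device.
 */
void
model_shutdown_pre_sync(struct model_dev *devs, size_t n, bool panicking)
{
	assert(devs != NULL || n == 0);

	if (panicking)
		return;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].open_count == 0)
			devs[i].action = MODEL_DESTROYED;
		else
			devs[i].action = MODEL_DETACH_ON_CLOSE;
	}
}
```

Under this model, a system with one unused and one in-use device leaves both untouched when `panicking` is true, which corresponds to the case that previously livelocked in the reproduction steps (provider mode r0w0e0).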
A commit references this bug:

Author: asomers
Date: Wed May 27 19:13:26 UTC 2020
New revision: 361562
URL: https://svnweb.freebsd.org/changeset/base/361562

Log:
  geli: fix a livelock during panic

  During any kind of shutdown, kern_reboot calls geli's pre_sync event hook,
  which tries to destroy all unused geli devices. But during a panic, geli
  can't destroy any devices, because the scheduler is stopped, so it can't
  switch threads. A livelock results, and the system never dumps core.

  This commit fixes the problem by refusing to destroy any devices during
  panic, used or otherwise.

  PR: 246207
  Reviewed by: jhb
  MFC after: 2 weeks
  Sponsored by: Axcient
  Differential Revision: https://reviews.freebsd.org/D24697

Changes:
  head/sys/geom/eli/g_eli.c

A commit references this bug:

Author: asomers
Date: Fri Jun 12 20:39:42 UTC 2020
New revision: 362118
URL: https://svnweb.freebsd.org/changeset/base/362118

Log:
  MFC r361562:

  geli: fix a livelock during panic

  During any kind of shutdown, kern_reboot calls geli's pre_sync event hook,
  which tries to destroy all unused geli devices. But during a panic, geli
  can't destroy any devices, because the scheduler is stopped, so it can't
  switch threads. A livelock results, and the system never dumps core.

  This commit fixes the problem by refusing to destroy any devices during
  panic, used or otherwise.

  PR: 246207
  Reviewed by: jhb
  Sponsored by: Axcient
  Differential Revision: https://reviews.freebsd.org/D24697

Changes:
  _U stable/12/
  stable/12/sys/geom/eli/g_eli.c