| Summary: | Something spawning many "sh" processes (possibly zfsd), stalled system ("No more processes"), would not boot normally afterward | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Greg <greg> |
| Component: | bin | Assignee: | freebsd-fs (Nobody) <fs> |
| Status: | Closed FIXED | | |
| Severity: | Affects Only Me | CC: | asomers, grahamperrin, leeb |
| Priority: | --- | Keywords: | needs-qa, regression |
| Version: | 13.1-STABLE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
| See Also: | https://reviews.freebsd.org/D6793 | | |
Description
Greg 2022-05-11 00:11:02 UTC
I have made some progress, and I have to report that something is up with zfsd! I cannot say for sure that it is what was spawning all the "sh" processes, but I suspect that to be the case. After commenting out the following from /etc/rc.conf I am now able to boot normally:

#zfsd_enable="YES"
#service zfsd start

This was after figuring out how to zfs set readonly=off and mount -a my zroot pool in single-user mode (a rough sketch of those recovery steps appears at the end of this report), and after first trying all of this:

- Commenting out everything in /etc/crontab
- Removing all the sysctl and other tunable tweaks I had customized

Now it is complaining about my dRAID test pool not being available, and that pool appears to still be listed in zpool.cache. But the system was already failing to boot before I pulled one of the HBAs (the LSI 9361 mentioned previously), so this issue with zfsd existed while that pool was still available. I will double-check, but I am fairly sure there was nothing wrong with that pool.

In any case, I cannot imagine it is intended behavior for zfsd to prevent a system from booting, regardless of the state of any zpools? Beyond perhaps serious issues with zroot, which does not appear to be the case here (it passes a scrub with no issues).

If anyone is interested in more debugging while I still have the test case and hardware set up for this, please let me know. I am willing to put a little more effort into figuring this out. Again, the same setup under 13.0 was not having this issue. Same benchmarks run back to back for days on end. Same dRAID design. Same use of zfsd.

Thanks!
-Greg-

Graham Perrin
(In reply to Greg from comment #1)

13.1-RELEASE was announced around four days later. If it is not too late to ask: can you recall whether the issue persisted with RELEASE?

Greg
(In reply to Graham Perrin from comment #2)

According to my notes, I left these commented out for the remainder of 13.1-RC6 testing:

#zfsd_enable="YES"
#service zfsd start

After 13.1-RELEASE came out and I was ready to move from lab testing to the production setup of this new server, I did a fresh install. I did not run into this issue with zfsd spawning many "sh" processes; it has never come back. The server has been in production use for almost a year with no real issues. So whatever caused this was resolved by the time 13.1-RELEASE hit.

I did still have this issue:

Bug 263906 - MFI driver fails with "Fatal firmware error" line 1155 in ../../dm/src/dm.c

where I had to set:

set hw.mfi.mrsas_enable="1"
set hint.hw.mfi.mrsas_enable="1"

(a persistent loader.conf form is sketched at the end of this report). That had not been an issue under 13.0. Outside of that, 13.1 has been very stable and the server has been performing well.

zfsd never spawns any "sh" processes, so it cannot be the cause of your initial fork bomb. This is expected behaviour: the service executable sources rc.conf, and if the service executable is itself invoked from within rc.conf, it is called recursively. In other words, the "service zfsd start" line in /etc/rc.conf, not zfsd itself, was the source of the runaway "sh" processes (the toy model below makes the recursion concrete).
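To make that concrete, here is a self-contained toy model of the recursion in plain sh. It is not FreeBSD's real rc machinery: the `service` stand-in script, the DEPTH cap, and the temp-directory layout are inventions for the demo, and the only behavior borrowed from the report is that the service machinery sources rc.conf.

```sh
#!/bin/sh
# Toy model (NOT FreeBSD's real rc code) of the fork bomb described above:
# every "service" run sources rc.conf for its settings, and this rc.conf
# wrongly contains a command, so each sourcing launches another "service".
DEMO=$(mktemp -d)

# Stand-in for /etc/rc.conf: one legitimate assignment, one misplaced command.
cat > "$DEMO/rc.conf" <<EOF
zfsd_enable="YES"
$DEMO/service zfsd start
EOF

# Stand-in for service(8): it prints its pid, then sources rc.conf -- which
# re-runs the misplaced command above. DEPTH caps the demo at five levels;
# the real boot had no such cap and ran until "No more processes".
cat > "$DEMO/service" <<EOF
#!/bin/sh
: \${DEPTH:=0}
echo "service \$1 \$2 (sh pid \$\$, depth \$DEPTH)"
[ "\$DEPTH" -lt 5 ] || exit 0
DEPTH=\$((DEPTH + 1)); export DEPTH
. $DEMO/rc.conf
EOF
chmod +x "$DEMO/service"

"$DEMO/service" zfsd start   # each level is a fresh sh process
rm -r "$DEMO"
```

Running it prints six "service zfsd start" lines, each from a new sh pid; remove the DEPTH cap and it recurses until the process table fills, which is exactly the "No more processes" stall in the summary.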
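For anyone who hits a similar boot loop, a rough sketch of the single-user-mode recovery Greg describes. The pool name zroot is taken from this report; the dataset layout, and the assumption that the root pool comes up read-only in single-user mode, may differ on other systems.

```sh
# From the single-user-mode shell: the ZFS root is typically read-only
# at this point, so make it writable before editing /etc/rc.conf.
zfs set readonly=off zroot   # pool name from this report; adjust as needed
zfs mount -a                 # mount the remaining ZFS datasets
mount -a                     # mount any non-ZFS filesystems from /etc/fstab
vi /etc/rc.conf              # comment out the offending lines, then reboot
```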
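On the Bug 263906 workaround: the two `set` lines are loader-prompt syntax and apply only to that one boot. Assuming the stock boot loader configuration mechanism, the persistent equivalent would be the same tunables in /boot/loader.conf:

```sh
# /boot/loader.conf -- persistent form of the loader-prompt workaround,
# handing the controller to mrsas(4) rather than the failing mfi(4) path
hw.mfi.mrsas_enable="1"
hint.hw.mfi.mrsas_enable="1"
```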