Bug 271819

Summary: FreeBSD 13.0 machine becomes unresponsive after some days.
Product: Base System
Reporter: Anwar <anwarcse47us>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Open
Severity: Affects Some People
CC: grahamperrin
Priority: ---
Keywords: needs-qa
Version: Unspecified
Hardware: amd64
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272434

Description Anwar 2023-06-04 14:21:28 UTC
Hi,

We recently upgraded our virtual machines from FreeBSD 10.4 to FreeBSD 13.0.
Since then, a machine becomes unresponsive after running for some days. Sometimes this happens after 2 days, and sometimes it takes up to 2-3 months.
VM configuration:
Number of CPUs: 2
RAM: 6 GB

We have observed that, just before the machine becomes unresponsive, access to one directory hangs. For example:
Say there is a directory /x/y/z/outbox. Any shell command that tries to access outbox goes into uninterruptible sleep, and the shell hangs.
du processes that access the outbox directory, run at regular intervals, get stuck in the ufs state, as seen in top:
34877 root          1  20    0    28M  2952K ufs      0   0:01   0.00% du
75703 root          1  20    0    28M  2628K ufs      1   0:01   0.00% du
 6753 root          1  20    0    28M  2980K ufs      0   0:01   0.00% du
87132 root          1  20    0    28M  2980K ufs      0   0:01   0.00% du
63429 root          1  20    0    28M  2972K ufs      1   0:01   0.00% du
18308 root          1  20    0    28M  2652K ufs      1   0:01   0.00% du
18074 root          1  20    0    28M  2580K ufs      0   0:01   0.00% du
88042 root          1  20    0    27M  2992K ufs      0   0:01   0.00% du
82363 root          1  20    0    27M  2996K ufs      0   0:01   0.00% du
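
To capture the same information without an interactive top session, the stuck processes can be listed from ps output. This is a minimal sketch, not part of the original report: it assumes the ps(1) keywords pid, state, wchan and command, and assumes that the wait channel printed by ps matches the "ufs" state shown by top. The function names are hypothetical.

```python
import subprocess


def find_ufs_waiters(ps_output, wchan="ufs"):
    """Return (pid, state, command) for processes in uninterruptible
    sleep (state starting with "D") on the given wait channel."""
    waiters = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)  # pid, state, wchan, full command
        if len(fields) < 4:
            continue
        pid, state, chan, command = fields
        if state.startswith("D") and chan == wchan:
            waiters.append((int(pid), state, command))
    return waiters


def snapshot():
    """Run ps once and return the processes currently waiting on "ufs".

    Assumption: ps accepts -o pid,state,wchan,command on this system.
    """
    out = subprocess.run(
        ["ps", "-axo", "pid,state,wchan,command"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_ufs_waiters(out)
```

Calling snapshot() from a periodic job and logging the result with a timestamp would show when the first process got stuck and which command it was running.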


A Java process that reads data from the outbox directory has also gone into the STOP state, and we are unable to kill it.
In other occurrences of this issue, a custom Python process got stuck consuming 100% CPU, and its input directory was repaired during reboot.


The problem is resolved by a reboot. During reboot we can see fsck being run to correct that directory.
Reboot logs:
Sometimes the directory becomes unreferenced (UNREF):
kernel: DIR I=2964556 CONNECTED. PARENT WAS I=2964003
kernel:
kernel: UNREF DIR  I=2964499  OWNER=root MODE=40755
kernel: SIZE=2048 MTIME=May 26 13:50 2023
kernel:
kernel: RECONNECT? yes

....
....
kernel: ***** FILE SYSTEM STILL DIRTY *****
kernel: ***** FILE SYSTEM WAS MODIFIED *****
***** PLEASE RERUN FSCK *****
...
...

In another occurrence, the parent directory had a wrong link count.
Logs during reboot:
kernel: /dev/da0p9: LINK COUNT DIR I=700908 OWNER=root MODE=40777
kernel: /dev/da0p9: SIZE=512 MTIME=Jun 2 09:05 2023 COUNT 3 SHOULD BE 2 (ADJUSTED)
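
To compare the inodes that fsck repairs with the directory that hung, the inode numbers can be pulled out of the reboot logs. This is a minimal sketch, not part of the original report: the regular expression covers only the three message shapes quoted above, and the function name is hypothetical.

```python
import re

# Matches only the message shapes quoted in this report:
#   "DIR I=<n> CONNECTED", "UNREF DIR  I=<n>", "LINK COUNT DIR I=<n>"
INODE_RE = re.compile(r"\b(UNREF DIR|LINK COUNT DIR|DIR)\s+I=(\d+)")


def fsck_dir_inodes(log_text):
    """Return (message kind, inode number) for each matching log line."""
    found = []
    for line in log_text.splitlines():
        match = INODE_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found
```

After the reboot, each inode number can be mapped back to a path with find and its -inum primary on the affected file system, to check whether it is the outbox directory or its parent.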


These symptoms are similar to FreeBSD bug report 224292: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292

So we tried applying this patch: https://cgit.freebsd.org/src/commit/sys/ufs/ffs/ffs_softdep.c?id=50acaaef54b4d7811393eb8c05a398d7a1882418

We also added logic to run sync, as mentioned in bug #224292. Neither change helped.

We have observed this only on machines with 2 CPU cores and 6 GB RAM. It does not happen on machines with more CPU cores and more RAM.

Thanks in advance...
Comment 1 Graham Perrin freebsd_committer freebsd_triage 2023-06-04 14:46:18 UTC
> 13.0

Support ceased nine months ago; <https://www.freebsd.org/security/unsupported/>.

Please, can you reproduce symptoms with a supported version? 

<https://www.freebsd.org/security/#sup>

For the upgrade to 13.1 and then (in due course) to 13.2, please note the minor update that should precede a major upgrade: 

<https://www.freebsd.org/releases/13.1R/installation/#upgrade-binary>
<https://www.freebsd.org/releases/13.2R/installation/#upgrade-binary>
Comment 2 Anwar 2023-08-28 12:55:07 UTC
Hi,

The issue has also been reproduced on supported versions, 13.1 and 13.2.

Please refer to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272434

Thanks,
Anwar