Bug 255499

Summary: Freebsd 13 freezes forever on disk activity
Product: Base System
Reporter: teodorsigaev <teodorsigaev>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New
Severity: Affects Many People
CC: bz, chris, grahamperrin, mckusick, rashey, rodrigo, xxjack12xx
Priority: ---
Version: 13.0-STABLE
Hardware: amd64
OS: Any
Attachments (no flags set on any):
  - Output of dtrace -n 'profile-99 /arg0/ { @[stack()] = count(); }'
  - kernel config
  - Dmesg output
  - Postgres kernel traces
  - shell script kernel trace
  - Perls kernel traces
  - Some stats
  - Output of dtrace -n 'profile-99 /arg0/ { @[stack()] = count(); }' (correct version)
  - kernel traces of patched kernel
  - ktr dumps
  - kern.cam stats

Description teodorsigaev@gmail.com 2021-04-29 20:29:45 UTC

    
Comment 1 teodorsigaev@gmail.com 2021-04-29 20:30:37 UTC
Created attachment 224550 [details]
Output of dtrace -n 'profile-99 /arg0/ { @[stack()] = count(); }'
Comment 2 teodorsigaev@gmail.com 2021-04-29 20:31:05 UTC
Created attachment 224551 [details]
kernel config
Comment 3 teodorsigaev@gmail.com 2021-04-29 20:31:25 UTC
Created attachment 224552 [details]
Dmesg output
Comment 4 teodorsigaev@gmail.com 2021-04-29 20:31:52 UTC
Created attachment 224553 [details]
Postgres kernel traces
Comment 5 teodorsigaev@gmail.com 2021-04-29 20:32:45 UTC
Created attachment 224554 [details]
shell script kernel trace

These are traces of the processes at the top of the top -m io output.
Comment 6 teodorsigaev@gmail.com 2021-04-29 20:33:07 UTC
Created attachment 224555 [details]
Perls kernel traces
Comment 7 teodorsigaev@gmail.com 2021-04-29 20:34:14 UTC
Created attachment 224556 [details]
Some stats

Output of systat, top, smartctl, etc.
Comment 8 teodorsigaev@gmail.com 2021-04-29 20:53:35 UTC
Hello!

My desktop freezes. I compiled a new PostgreSQL and ran its tests with 'gmake -j8 check-world'; the tests never finished, and after several minutes the box became unresponsive whenever the current application touched the disk. Even after Ctrl+Alt+Backspace kills Xorg, I don't get back to the console prompt. At that moment, systat shows disk writes of 1.7 GB per second, yet after rebooting I don't see any unexpected files: the postgresql directory takes only 4 GB of space.

Removing the ports tree also causes very similar symptoms, although the OS came back to life about 10 minutes after rm finished. Even after rm had finished, I saw 1.7 GB per second of writes to disk.

I made several attempts to resolve it:
  - I noticed that a root terminal survives longer than an unprivileged
    user's terminals and found that only root has an unlimited memorylocked
    limit. Changing this limit: no success.
  - Tried turning the IOMMU option on and off in the kernel config. No luck.
  - Increased the size of .sujournal: same result.
  - Updated the firmware on the NVMe drive: same result.
  - Suspected that old data on the disk had somehow not been trimmed
    (TRIM/BIO_DELETE command), so I wrote a ~1.5 TB file and erased it in the
    hope that the system would send TRIM commands to the drive. No result.
  
Note that FreeBSD 12.2 worked well even with -j16, and my second box, a laptop running FreeBSD 13, works well with the same arguments.

Please help me resolve this issue.
Comment 9 Konstantin Belousov freebsd_committer freebsd_triage 2021-04-29 22:47:37 UTC
Are you using SU+J?

If yes, try this https://reviews.freebsd.org/D30041 (I have not even booted that
diff).
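A quick way to answer the SU+J question: on FreeBSD, mount(8) lists "journaled soft-updates" among the options of a UFS filesystem that uses SU+J. A minimal sketch, assuming the usual "device on mountpoint (options)" line layout and mount points without spaces; suj_mounts is a hypothetical helper name, not a system command.

```shell
#!/bin/sh
# Print the mount point of every filesystem whose mount(8) line
# contains "journaled soft-updates" (UFS with SU+J enabled).
# Reads mount(8)-style lines on stdin, e.g.
#   /dev/nvd0p2 on / (ufs, local, journaled soft-updates)
# suj_mounts is a hypothetical helper, not part of the base system.
suj_mounts() {
    awk '/journaled soft-updates/ {
        # In "device on mountpoint (options)", field 3 is the mount point.
        print $3
    }'
}
```

Usage: mount | suj_mounts. Filesystems with plain soft updates show only "soft-updates" and are not printed.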
Comment 10 teodorsigaev@gmail.com 2021-04-30 07:38:43 UTC
Created attachment 224568 [details]
Output of dtrace -n 'profile-99 /arg0/ { @[stack()] = count(); }' (correct version)
Comment 11 teodorsigaev@gmail.com 2021-04-30 07:48:49 UTC
Yes, SU+J; I will try it. Thank you.

I've also tried the nda driver instead of nvd: same result.

Note: I re-uploaded the dtrace output; the first attachment was the wrong file (dmesg).
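For re-running the kernel profile, a sketch of a wrapper around the same one-liner. It assumes the DTrace profile provider convention that arg0 is the kernel program counter (zero for samples taken in user mode), so /arg0/ keeps kernel stacks only, and it adds a tick probe that calls exit(0) so the capture stops on its own. profile_kernel is a hypothetical helper name.

```shell
#!/bin/sh
# Sample kernel stacks at 99 Hz for SECONDS seconds, then exit; the
# aggregated stack counts are printed to stdout when dtrace exits.
# Usage: profile_kernel SECONDS
# profile_kernel is a hypothetical helper; run it as root.
profile_kernel() {
    secs=$1
    # /arg0/ : keep only samples taken while in the kernel.
    # tick-Ns: fire once after N seconds and stop tracing.
    dtrace -n "profile-99 /arg0/ { @[stack()] = count(); } tick-${secs}s { exit(0); }"
}
```

Usage: profile_kernel 30 > stacks.txt, started while the freeze is in progress.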
Comment 12 teodorsigaev@gmail.com 2021-04-30 09:34:06 UTC
Unfortunately, it didn't help.
Comment 13 teodorsigaev@gmail.com 2021-04-30 09:35:06 UTC
Created attachment 224572 [details]
kernel traces of patched kernel

Kernel traces of the top processes by I/O (top -m io).
Comment 14 teodorsigaev@gmail.com 2021-04-30 13:16:18 UTC
Created attachment 224573 [details]
ktr dumps

Three KTR dumps, taken with the kernel options:
options KTR
options KTR_ENTRIES=8192

Note: vfs.numdirtybuffers = 0 during the freeze.
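Since vfs.numdirtybuffers reads 0 during the freeze, it may help to log it with timestamps so the value lines up with the moment the box stops responding. A minimal sketch; sample_dirtybufs is a hypothetical helper, and the assumption is that its output goes somewhere that does not depend on the stalled disk (for example a serial console or a memory-backed filesystem).

```shell
#!/bin/sh
# Print "EPOCH_SECONDS VALUE" for vfs.numdirtybuffers COUNT times,
# sleeping INTERVAL seconds between samples.
# Usage: sample_dirtybufs COUNT INTERVAL
# sample_dirtybufs is a hypothetical helper; the sysctl OID is the one
# quoted in this report.
sample_dirtybufs() {
    count=$1
    interval=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%s)" "$(sysctl -n vfs.numdirtybuffers)"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, sample_dirtybufs 600 1 started just before 'gmake -j8 check-world' covers ten minutes at one-second resolution.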
Comment 15 rashey 2021-04-30 20:20:27 UTC
I'm not sure whether this is related to PR 253968, but FreeBSD 13.0 has a big issue with UFS performance.

You can try a custom kernel with the CAM_IOSCHED_DYNAMIC option to mitigate the issue.

# Enable dynamic I/O scheduler optimizations
options        CAM_IOSCHED_DYNAMIC
# Publish additional CAM device statistics via sysctl
options        CAM_IO_STATS
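One way to apply the two options above without editing GENERIC is a small kernel config that includes it. A minimal sketch, assuming an amd64 source tree where such files usually live in /usr/src/sys/amd64/conf; write_kernconf and the config name IOSCHED are hypothetical.

```shell
#!/bin/sh
# Write a kernel config named NAME into DIR that inherits GENERIC and
# adds the two CAM options suggested above.
# Usage: write_kernconf DIR NAME
# write_kernconf is a hypothetical helper.
write_kernconf() {
    dir=$1
    name=$2
    cat > "$dir/$name" <<EOF
include GENERIC
ident $name
# Enable dynamic I/O scheduler optimizations
options CAM_IOSCHED_DYNAMIC
# Publish additional CAM device statistics via sysctl
options CAM_IO_STATS
EOF
}
```

After write_kernconf /usr/src/sys/amd64/conf IOSCHED, the kernel could be built and installed the usual way from /usr/src with make buildkernel KERNCONF=IOSCHED and make installkernel KERNCONF=IOSCHED, followed by a reboot.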
Comment 16 teodorsigaev@gmail.com 2021-05-01 14:57:19 UTC
(In reply to rashey from comment #15)
This looks close: gmake -j4 finished successfully, but -j8 causes the same trouble. I've attached kern.cam stats (3 cases).
Comment 17 teodorsigaev@gmail.com 2021-05-01 14:58:25 UTC
Created attachment 224595 [details]
kern.cam stats

Kernel options:
options        CAM_IOSCHED_DYNAMIC
options        CAM_IO_STATS
Comment 18 Rodrigo Osorio freebsd_committer freebsd_triage 2021-05-04 12:31:24 UTC
It seems I'm facing a similar issue after upgrading an i386 box with UFS+J from FreeBSD 12 to 13. In fact, I'm stuck right after the kernel upgrade.

After booting with 13 kernel, the prompt appears. I enter the user name,
then the login freezes forever. The box stops responding to ping and only
^t works, here is the output:

load: 0.21  cmd: login 1132 [ufs] 1927.56r 0.01u 0.00s 0% 432k
mi_switch+0x13e sleepq_switch+0xfa sleeplk+0xef lockmgr_slock_hard+0x33b lockmgr_lock_flags+0x113 ffs_lock+0x57 _vn_lock+0x3e vget_finish+0x1f cache_fplookup_final_child+0x42 cache_fplookup+0x4ab namei+0x5e vn_open_cred+0x35b vn_open+0x20 kern_openat+0x2f0 sys_openat+0x2f syscall+0x17d __stop_set_sysinit_set+0xda1a18cd
Comment 19 Kirk McKusick freebsd_committer freebsd_triage 2021-05-11 23:55:52 UTC
Does this problem go away if you run with just soft updates rather than journalled soft updates?

You can disable journalled soft updates on disk /dev/ada0p2, mounted on /mnt, using:

# umount /mnt
# tunefs -j disable /dev/ada0p2
Clearing journal flags from inode 4
tunefs: soft updates journaling cleared but soft updates still set.
tunefs: remove .sujournal to reclaim space
# mount /dev/ada0p2 /mnt
# rm -f /mnt/.sujournal