Summary: FreeBSD 13 freezes forever on disk activity
Product: Base System
Reporter: teodorsigaev <teodorsigaev>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New
Severity: Affects Many People
CC: bz, chris, grahamperrin, mckusick, rashey, rodrigo, xxjack12xx
Priority: ---
Version: 13.0-STABLE
Hardware: amd64
OS: Any
Description
teodorsigaev@gmail.com
2021-04-29 20:29:45 UTC
Created attachment 224550 [details]
Output of dtrace -n 'profile-99 /arg0/ { @[stack()] = count(); }'
Created attachment 224551 [details]
kernel config
Created attachment 224552 [details]
Dmesg output
Created attachment 224553 [details]
Postgres kernel traces
Created attachment 224554 [details]
shell script kernel trace
The shell script is at the top of the top -m io output.
Created attachment 224555 [details]
Perl kernel traces
Created attachment 224556 [details]
Some stats
systat, top, smartctl, etc. output
Hello! My desktop freezes. After compiling a new PostgreSQL and running its tests with 'gmake -j8 check-world', the tests never finish, and after several minutes the box becomes unresponsive whenever the current application touches the disk. Even after Ctrl+Alt+Backspace kills Xorg, I don't get back to the console prompt. At that moment, systat shows disk writes of 1.7Gb per second, although after rebooting I don't see any unexpected files; the postgresql directory takes only 4Gb of space.

Removing the ports tree causes very similar symptoms, although the OS recovers about 10 minutes after rm finishes. During that time, with rm already finished, I still see 1.7Gb per second of writes to the disks.

Things I tried:
- I noticed that a root terminal survives longer than the terminals of an unprivileged user, and found that only root has an unlimited memorylocked limit. Changing this limit: no success.
- Toggled the IOMMU option in the kernel config: no luck.
- Increased the size of .sujournal: again, no luck.
- Updated the firmware of the NVMe drive: the same.
- Suspecting that old data on the disk had not been trimmed (TRIM/BIO_DELETE command), I wrote a ~1.5 TB file and erased it, hoping the system would send TRIM commands to the drive. No result.

Note that FreeBSD 12.2 worked well even with -j16 parallelism, and my second box, a laptop running FreeBSD 13, works well with the same arguments. Please help me resolve this issue.

Are you using SU+J? If yes, try this: https://reviews.freebsd.org/D30041 (I have not even booted that diff.)
Created attachment 224568 [details]
Output of dtrace -n 'profile-99 /arg0/ { @[stack()] = count(); }' (correct version)
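The profile-99 one-liner samples every CPU 99 times per second, keeps only samples taken in kernel context (arg0 is the kernel program counter, which is zero while the CPU runs user code), and counts identical kernel stacks. A long aggregation is easier to read when the hottest stacks are ranked first. The following is a sketch that assumes dtrace's default printout for a stack() key: indented frame lines, then a line holding only the count, with entries separated by blank lines. The script name rank_stacks.py and the cut-offs (top 5 stacks, 8 frames each) are arbitrary choices.

```python
# rank_stacks.py (hypothetical name): rank kernel stacks from the output of
#   dtrace -n 'profile-99 /arg0/ { @[stack()] = count(); }'
# Assumes dtrace's default aggregation printout: indented frame lines, then
# a line containing only the sample count, entries separated by blank lines.
import sys
from typing import List, Tuple


def parse_stack_counts(text: str) -> List[Tuple[int, List[str]]]:
    """Return (count, frames) pairs, most-sampled stack first."""
    entries: List[Tuple[int, List[str]]] = []
    block: List[str] = []
    # The extra "" at the end flushes the final block.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line:
            block.append(line)
            continue
        # A block is a stack only if it ends in a bare integer (the count);
        # header lines such as "dtrace: description ..." are skipped.
        if len(block) >= 2 and block[-1].isdigit():
            entries.append((int(block[-1]), block[:-1]))
        block = []
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries


def report(entries: List[Tuple[int, List[str]]], top: int = 5, depth: int = 8) -> None:
    """Print the hottest stacks with their share of all kernel samples."""
    total = sum(count for count, _ in entries) or 1
    for count, frames in entries[:top]:
        print(f"{count:8d} samples ({100.0 * count / total:5.1f}%)")
        for frame in frames[:depth]:
            print(f"    {frame}")
        print()


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        report(parse_stack_counts(fh.read()))
```

Usage would be python3 rank_stacks.py saved-dtrace-output.txt; a stack holding a large share of samples points at where the kernel spends its time during the freeze.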
Yes, SU+J; I will try it. Thank you. I've also tried the nda driver instead of nvd, with the same result. Note: I re-uploaded the dtrace output; the first file was the wrong one (dmesg).

Unfortunately, it didn't help.
Created attachment 224572 [details]
Kernel traces of the patched kernel
Kernel traces of the top processes by I/O
Created attachment 224573 [details]
ktr dumps
Three KTR dumps, taken with the kernel options:
options KTR
options KTR_ENTRIES=8192
Note: vfs.numdirtybuffers = 0 during the freeze.
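Since vfs.numdirtybuffers stays at 0 during the freeze, it may help to log it, together with related buffer-cache sysctls, in the seconds leading up to the hang. The sketch below shells out to sysctl(8) once per interval; pairing it with vfs.lodirtybuffers and vfs.hidirtybuffers is an assumption, and any other sysctl names can be added. Because every sample execs sysctl(8), the sampler may itself stall if that needs disk I/O, so a gap in its output is informative too. Printing to a terminal (for example over ssh) rather than to a file on the affected filesystem avoids adding to the load.

```python
# Hypothetical sampler: log buffer-cache sysctls once per interval on FreeBSD.
import subprocess
import sys
import time
from typing import Callable, Dict, Iterable, Optional

# Assumed selection of counters; extend with any other sysctl names.
COUNTERS = ("vfs.numdirtybuffers", "vfs.lodirtybuffers", "vfs.hidirtybuffers")


def read_sysctl(name: str) -> str:
    """Return one value as printed by `sysctl -n NAME`."""
    result = subprocess.run(["sysctl", "-n", name],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def sample(names: Iterable[str],
           read: Callable[[str], str] = read_sysctl) -> Dict[str, str]:
    """Read every counter once, preserving the order of `names`."""
    return {name: read(name) for name in names}


def watch(names: Iterable[str] = COUNTERS, interval: float = 1.0,
          count: Optional[int] = None,
          read: Callable[[str], str] = read_sysctl,
          emit: Callable[[str], None] = lambda s: print(s, flush=True)) -> None:
    """Emit one timestamped line per sample; run forever if count is None."""
    names = list(names)
    taken = 0
    while count is None or taken < count:
        stamp = time.strftime("%H:%M:%S")
        values = sample(names, read)
        emit(stamp + " " + " ".join(f"{k}={v}" for k, v in values.items()))
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)


if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    watch()
```

The read and emit parameters exist only so the loop can be exercised without a FreeBSD host; on the affected machine the defaults call sysctl(8) and print to the terminal.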
I'm not sure whether this is related to PR 253968, but FreeBSD 13.0 has a big issue with UFS performance. You can try a custom kernel with the CAM_IOSCHED_DYNAMIC option to mitigate the issue:
# Enable dynamic I/O scheduler optimizations
options CAM_IOSCHED_DYNAMIC
# Publish additional CAM device statistics by sysctl
options CAM_IO_STATS

(In reply to rashey from comment #15)
This guess seems close. gmake -j4 finished successfully, but -j8 causes the same trouble. I've added kern.cam stats (3 cases).
Created attachment 224595 [details]
kern.cam stats
Kernel options:
options CAM_IOSCHED_DYNAMIC
options CAM_IO_STATS
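Because gmake -j4 finishes while -j8 hangs, a workload whose concurrency can be dialed up step by step may help narrow the threshold. The following is a hypothetical reproducer, not the PostgreSQL test suite: each job creates a tree of small files and then removes it, roughly imitating the create/unlink churn of a parallel build or of removing the ports tree. The job count, file counts, and file size are arbitrary assumptions. Threads are used for brevity; CPython releases the GIL during blocking file system calls, so the jobs overlap in the kernel.

```python
# churn.py (hypothetical name): metadata-heavy create/delete load with N jobs.
import os
import shutil
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def churn(root: str, worker: int, dirs: int, files_per_dir: int) -> int:
    """Create dirs * files_per_dir small files, then delete them all."""
    base = os.path.join(root, f"w{worker}")
    created = 0
    for d in range(dirs):
        sub = os.path.join(base, f"d{d}")
        os.makedirs(sub, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(sub, f"f{f}"), "wb") as fh:
                fh.write(b"x" * 512)
            created += 1
    shutil.rmtree(base)  # mass unlink, as when removing a source tree
    return created


def run(root: str, jobs: int, dirs: int = 50, files_per_dir: int = 200) -> int:
    """Run `jobs` concurrent churn workers under root; return files created."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(churn, root, w, dirs, files_per_dir)
                   for w in range(jobs)]
        return sum(f.result() for f in futures)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 churn.py /path/on/affected/ufs JOBS
    target, jobs = sys.argv[1], int(sys.argv[2])
    with tempfile.TemporaryDirectory(dir=target) as root:
        print(run(root, jobs), "files created and removed")
```

Running it with 4 and then 8 jobs on the affected filesystem, while watching systat or top -m io, may show whether the hang tracks concurrency the same way the build does. Switching to ProcessPoolExecutor is a one-line change if thread scheduling turns out to matter.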
It seems I'm facing a similar issue after upgrading an i386 box with UFS+J from FreeBSD 12 to 13. In fact, I'm stuck after the kernel upgrade. After booting the 13 kernel, the login prompt appears. I enter the user name, then login freezes forever. The box stops responding to ping, and only ^T works; here is the output:
load: 0.21  cmd: login 1132 [ufs] 1927.56r 0.01u 0.00s 0% 432k
mi_switch+0x13e
sleepq_switch+0xfa
sleeplk+0xef
lockmgr_slock_hard+0x33b
lockmgr_lock_flags+0x113
ffs_lock+0x57
_vn_lock+0x3e
vget_finish+0x1f
cache_fplookup_final_child+0x42
cache_fplookup+0x4ab
namei+0x5e
vn_open_cred+0x35b
vn_open+0x20
kern_openat+0x2f0
sys_openat+0x2f
syscall+0x17d
__stop_set_sysinit_set+0xda1a18cd

Does this problem go away if you run with just soft updates rather than journalled soft updates? You can disable journalled soft updates on disk /dev/ada0p2, mounted on /mnt, using:
# umount /mnt
# tunefs -j disable /dev/ada0p2
Clearing journal flags from inode 4
tunefs: soft updates journaling cleared but soft updates still set.
tunefs: remove .sujournal to reclaim space
# mount /dev/ada0p2 /mnt
# rm -f /mnt/.sujournal
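To confirm which mode a UFS filesystem is actually mounted with, before and after the tunefs -j disable step, the mount(8) listing can be checked. This sketch assumes FreeBSD's usual mount output, in which a filesystem with journaled soft updates carries the option "journaled soft-updates" and one with plain soft updates carries "soft-updates", for example /dev/ada0p2 on /mnt (ufs, local, soft-updates). The function name softdep_mode is hypothetical.

```python
# Hypothetical helper: report SU+J / SU / none for each mounted UFS filesystem.
import re
import subprocess
import sys
from typing import Dict

# Assumed line format: "<device> on <mountpoint> (<fstype>, <opt>, <opt>...)"
MOUNT_LINE = re.compile(r"^(?P<dev>.+?) on (?P<mnt>.+) \((?P<opts>[^()]*)\)$")


def softdep_mode(mount_output: str) -> Dict[str, str]:
    """Map each UFS mount point to 'SU+J', 'SU' or 'none'."""
    modes: Dict[str, str] = {}
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if not m:
            continue
        opts = [o.strip() for o in m.group("opts").split(",")]
        if opts[0] != "ufs":
            continue  # devfs, tmpfs, zfs, ... are not affected by tunefs
        if "journaled soft-updates" in opts:
            modes[m.group("mnt")] = "SU+J"
        elif "soft-updates" in opts:
            modes[m.group("mnt")] = "SU"
        else:
            modes[m.group("mnt")] = "none"
    return modes


if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    listing = subprocess.run(["mount"], capture_output=True, text=True,
                             check=True).stdout
    for mountpoint, mode in softdep_mode(listing).items():
        print(f"{mountpoint}: {mode}")
```

After the steps above, /mnt would be expected to report SU rather than SU+J.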