I currently have some trouble with an amd64 HEAD build machine.
This machine cross-compiles for an ARMv6 host.
During compilation, objcopy enters an infinite loop. The process is stuck (unkillable) in the "RUN" state:
PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
93219 99.5 0.1 12844 3580 - R 11:46 184:34.58 objcopy -j .peh ...
The only thing I can get for now is the kernel backtrace via "procstat -kk 93219", which I run in a loop.
Here is some of the data:
__lockmgr_args+0x62a getblkx+0x154 breadn_flags+0x3d vfs_bio_getpages+0x323 ffs_getpages+0x78 VOP_GETPAGES_APV+0x56 ...
gbincore+0x38 getblkx+0xab breadn_flags+0x3d vfs_bio_getpages+0x323 ffs_getpages+0x78 VOP_GETPAGES_APV+0x56 ...
breadn_flags+0x1e9 vfs_bio_getpages+0x323 ffs_getpages+0x78 VOP_GETPAGES_APV+0x56 ...
__lockmgr_args+0x672 binsfree+0x51 vfs_bio_getpages+0x386 ffs_getpages+0x78 VOP_GETPAGES_APV+0x56 ...
vm_page_grab+0x6b vfs_bio_getpages+0x4ac ffs_getpages+0x78 VOP_GETPAGES_APV+0x56 ...
ffs_getpages+0x78 VOP_GETPAGES_APV+0x56 ...
The "common" part is vfs_bio_getpages, which seems to loop endlessly.
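One way to check which frames the samples share is to tally how often each function name appears across the collected backtraces. The sketch below is only an illustration: it assumes each sample is a whitespace-separated list of `function+0xoffset` frames, as in the excerpts above (with the elided "..." tails left out), and the helper names `frame_counts` and `offsets_in` are hypothetical.

```python
from collections import Counter

# Samples copied from the procstat -kk excerpts above; the trailing "..."
# elisions are dropped because their contents are unknown.
SAMPLES = [
    "__lockmgr_args+0x62a getblkx+0x154 breadn_flags+0x3d vfs_bio_getpages+0x323 ffs_getpages+0x78 VOP_GETPAGES_APV+0x56",
    "gbincore+0x38 getblkx+0xab breadn_flags+0x3d vfs_bio_getpages+0x323 ffs_getpages+0x78 VOP_GETPAGES_APV+0x56",
    "breadn_flags+0x1e9 vfs_bio_getpages+0x323 ffs_getpages+0x78 VOP_GETPAGES_APV+0x56",
    "__lockmgr_args+0x672 binsfree+0x51 vfs_bio_getpages+0x386 ffs_getpages+0x78 VOP_GETPAGES_APV+0x56",
    "vm_page_grab+0x6b vfs_bio_getpages+0x4ac ffs_getpages+0x78 VOP_GETPAGES_APV+0x56",
    "ffs_getpages+0x78 VOP_GETPAGES_APV+0x56",
]


def split_frame(frame):
    """Split 'name+0xoff' into (name, offset)."""
    name, _, offset = frame.partition("+")
    return name, offset


def frame_counts(samples):
    """Count in how many samples each function name appears."""
    counts = Counter()
    for sample in samples:
        # Use a set so a function is counted once per sample.
        counts.update({split_frame(f)[0] for f in sample.split()})
    return counts


def offsets_in(samples, function):
    """Return the sorted distinct offsets seen for one function."""
    seen = set()
    for sample in samples:
        for frame in sample.split():
            name, offset = split_frame(frame)
            if name == function:
                seen.add(offset)
    return sorted(seen)


if __name__ == "__main__":
    for name, n in frame_counts(SAMPLES).most_common():
        print(f"{n}/{len(SAMPLES)}  {name}")
    print(offsets_in(SAMPLES, "vfs_bio_getpages"))
```

On these six samples, vfs_bio_getpages shows up in five, with three different return offsets, while the callees below it change from sample to sample. That pattern is consistent with a thread that keeps making calls from inside one function rather than blocking at a single point, although a handful of samples cannot prove it.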
What can I do to provide more information about this issue?
Some additional info:
- I'm running on UFS
- I'm running with asynchronous mounts
- The machine has 12 CPUs
I updated the issue and changed the affected version. stable/12 has the same problem.
All file systems are OK. I have to run fsck manually each time because the tool reports the error "PARTIALLY TRUNCATED INODE" and is unable to recover from it.
/dev/ufs/root on / (ufs, local, noatime)
devfs on /dev (devfs, local, multilabel)
/dev/ufs/var on /var (ufs, local, noatime)
/dev/ufs/tmp on /tmp (ufs, asynchronous, local, noatime)
/dev/ufs/usr on /usr (ufs, asynchronous, local, noatime)
/dev/ufs/home on /home (ufs, asynchronous, local, noatime)
# tunefs -p /dev/ufs/root (all file systems are the same)
tunefs: POSIX.1e ACLs: (-a) disabled
tunefs: NFSv4 ACLs: (-N) disabled
tunefs: MAC multilabel: (-l) disabled
tunefs: soft updates: (-n) disabled
tunefs: soft update journaling: (-j) disabled
tunefs: gjournal: (-J) disabled
tunefs: trim: (-t) disabled
tunefs: maximum blocks per file in a cylinder group: (-e) 4096
tunefs: average file size: (-f) 16384
tunefs: average number of files in a directory: (-s) 64
tunefs: minimum percentage of free space: (-m) 8%
tunefs: space to hold for metadata blocks: (-k) 5240
tunefs: optimization preference: (-o) time
tunefs: volume label: (-L) root
I have not yet been able to reproduce the problem.
I have a core file from Alexandre's host:
After some investigation, it seems that the condition "if (ma[i]->valid != VM_PAGE_BITS_ALL)" (in vfs_bio_getpages) is always true in my case.
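To illustrate why that condition matters, here is a toy model. It is not the kernel code. It only assumes a retry loop that repeats as long as any page is not fully valid, which is what the observation above suggests. The names `read_pages`, `healthy_fill`, `stuck_fill` and `max_rounds` are hypothetical, the value given to `VM_PAGE_BITS_ALL` is a stand-in, and the round limit exists only so the sketch terminates.

```python
# Stand-in value for this sketch; the real constant is defined by the kernel.
VM_PAGE_BITS_ALL = 0xFF


def read_pages(valid_bits, fill_page, max_rounds=1000):
    """Retry until every page is fully valid, or give up after max_rounds.

    valid_bits: list of per-page valid bitmasks (updated in place).
    fill_page:  callable(index, bits) -> bits after one read attempt.
    Returns the number of rounds used, or None if the limit was reached.
    """
    for rounds in range(1, max_rounds + 1):
        for i, bits in enumerate(valid_bits):
            if bits != VM_PAGE_BITS_ALL:
                valid_bits[i] = fill_page(i, bits)
        if all(bits == VM_PAGE_BITS_ALL for bits in valid_bits):
            return rounds
    return None  # an unbounded loop would spin here forever


def healthy_fill(index, bits):
    """Model a read that makes the whole page valid."""
    return VM_PAGE_BITS_ALL


def stuck_fill(index, bits):
    """Model a read that never validates the last block of the page."""
    return bits | 0x7F


if __name__ == "__main__":
    print(read_pages([0x00, 0xFF, 0x0F], healthy_fill))
    print(read_pages([0x00], stuck_fill, max_rounds=50))
```

With `healthy_fill` the loop ends after one round. With `stuck_fill` the page never reaches `VM_PAGE_BITS_ALL`, so only the artificial limit stops it, which matches a process spinning at 99.5% CPU without ever making progress.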
The problem disappears when I put the /tmp folder (via a symlink) on the same partition as /home (where the build runs).
To recap my disk configuration:
- The build (source + objects) runs on the /home partition
- /tmp is on the same disk as /home, but located before it (/tmp is quicker than /home)
- Both /home and /tmp are "async + noatime"
- I use ccache (but it does not seem relevant)
- Swap is not the problem (the freeze also occurs when I disable it)
- When /tmp is a symlink to a folder in /home, the problem disappears.
What does "swapctl -l" show?
"systat -swap" also helps to monitor swap page usage.
We believe this will be fixed by r359464.
*** This bug has been marked as a duplicate of bug 242626 ***