I am mirroring the KDE Subversion repository via rsync. KDE is currently at rev. 734839, which means there are two subdirectories (revs and revprops) holding 734840 files each. For this to work at all, I have enabled dirhash and set the hash memory area to 32 MB via vfs.ufs.dirhash_maxmem=33554432 in sysctl.conf.
The problem is that whenever the hash has to be built (i.e., when these directories have not been cached in the kernel for some time and are now being accessed again), the dirhash code reads them in completely. This consumes a lot of processor time (my xload jumps to 8+ all at once) and, as far as I can make out, also all (or at least most) of the available disk bandwidth.
On my machine the behavior is so bad that the X Window System freezes completely (including the cursor) for about a minute. (It is actually more like 2 x 30 seconds, apparently once for each of the two directories involved.) The xload spike becomes visible only afterwards. Also, as I am using PPPoA (basically ADSL over USB), the buffers allotted to it are exhausted, as shown by log messages on the console. To me it looks as if even interrupts are no longer being serviced.
I assume the fix involves modifying the dirhash code so that it obeys normal process scheduling, especially with regard to relinquishing the CPU according to the process's scheduling parameters.
This probably means that the syscall in question can no longer be implemented as a single atomic operation (which it currently seems to be).
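To make the idea of periodically relinquishing the CPU more concrete, here is a user-space sketch. It is not the UFS dirhash code: the open-addressing table, the FNV-1a hash, and the names `dir_hash_build`, `dir_hash_lookup` and `YIELD_EVERY` are all hypothetical. It builds a hash over a list of entry names and calls POSIX `sched_yield()` every few thousand entries, so that a long build offers the CPU to other runnable threads instead of holding it for the whole scan. A kernel-side version would additionally have to deal with the directory vnode lock held across the build, which this sketch ignores.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: number of entries hashed between scheduling points. */
#define YIELD_EVERY 4096

/* FNV-1a string hash; a stand-in, not the hash dirhash actually uses. */
static uint32_t name_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/*
 * Insert all n names into an open-addressing table with `slots` entries.
 * slots must be a power of two and larger than n. Returns how many times
 * sched_yield() was called during the build.
 */
static size_t dir_hash_build(const char **names, size_t n,
                             const char **table, size_t slots)
{
    size_t yields = 0;

    assert(slots > n && (slots & (slots - 1)) == 0);
    memset(table, 0, slots * sizeof *table);
    for (size_t i = 0; i < n; i++) {
        size_t pos = name_hash(names[i]) & (slots - 1);
        while (table[pos] != NULL)          /* linear probing */
            pos = (pos + 1) & (slots - 1);
        table[pos] = names[i];
        if ((i + 1) % YIELD_EVERY == 0) {   /* cooperative scheduling point */
            sched_yield();
            yields++;
        }
    }
    return yields;
}

/* Returns 1 if name is in the table, 0 otherwise. */
static int dir_hash_lookup(const char **table, size_t slots, const char *name)
{
    size_t pos = name_hash(name) & (slots - 1);
    while (table[pos] != NULL) {
        if (strcmp(table[pos], name) == 0)
            return 1;
        pos = (pos + 1) & (slots - 1);
    }
    return 0;
}
```

The chunk size is a trade-off: yielding more often keeps other threads (such as X) responsive, at the cost of a slightly longer total build time.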
Since I am no expert in this area, please take those ideas with a grain of salt!
Please note that the e-mail address given above is not valid, as I am paranoid about spam. Simply reply by adding to the PR; I'll monitor it regularly.
How-To-Repeat: Enter a directory with more than 250k entries after it has not been accessed for a long time.
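A minimal sketch for setting up such a directory, assuming a POSIX system; the helpers `populate_dir` and `time_lookup` and the `rNNNNNNN` file names are hypothetical. It creates `n` empty files and then times a single name lookup of a non-existent entry via `stat()`. My understanding is that dirhash is consulted on name lookups rather than on `readdir()`, which is why a lookup is timed here. To reproduce the report, `n` would be above 250000 and the timed lookup would have to run once the directory has dropped out of the dirhash cache (for example after a reboot); the program cannot force that eviction itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Create n empty files r0000000, r0000001, ... in dir.
 * Returns n on success, -1 if a file could not be created. */
static long populate_dir(const char *dir, long n)
{
    char path[4096];

    assert(n >= 0);
    for (long i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/r%07ld", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        close(fd);
    }
    return n;
}

/* Time one lookup of a name that does not exist in dir, in seconds.
 * A miss has to search the whole directory (or its hash). */
static double time_lookup(const char *dir)
{
    char path[4096];
    struct stat st;
    struct timespec t0, t1;

    snprintf(path, sizeof path, "%s/no-such-entry", dir);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    (void)stat(path, &st);   /* expected to fail with ENOENT */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For example, `populate_dir("/var/tmp/bigdir", 250001)` once, then `time_lookup("/var/tmp/bigdir")` after a reboot, should show whether the first lookup stalls the machine.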
Over to Ian, who wrote UFS dirhash.
iedowse is not actively working on this problem ATM.
While the kernel scheduler will not preempt a thread running in the kernel (e.g.
during a system call) when its timeslice expires, it will preempt that thread
for interrupts (assuming you have 'options PREEMPTION' enabled, which has been
on by default in GENERIC on i386 for some time now). Thus the dirhash
calculations should not starve interrupts. However, X is not an interrupt, so
while things like ping should still work, X will not get to run.
While it would be tempting to defer the hashing of large directories to an
asynchronous task running in a low-priority thread, this might have bad side
effects due to priority inversion, since that very low priority thread would
be holding various vnode locks.
Over to maintainer(s).
For bugs matching the following criteria:
Status: In Progress
Changed: (is less than) 2014-06-01
Reset to default assignee and clear in-progress tags.