I have a TYPO3 instance under FreeBSD/UFS running 4 to 16 times slower than on Linux/Ext4. Using XDebug I've found that it's the stat() syscall that spends the biggest part of the time. I've already tweaked:

vfs.ufs.dirhash_maxmem=104857600
vfs.ufs.dirhash_reclaimage=600

The UFS dirhash memory always has plenty of space left, so this can't be the bottleneck. With the following DTrace script I further found that inside the kernel, the spinlock_exit and rdtsc functions take a huge part of the time:

=== snip ===
BEGIN
{
	self->ts = timestamp;
}

fbt:::entry
{
	printf("fbt enter %s\t%d\n", probefunc, timestamp-self->ts);
	self->ts = timestamp;
}
=== snap ===

dtrace -qs teststat.d -c ./cstat | sort -nk4 | less

=== snip ===
fbt enter rdtsc 11744696324432
fbt enter rdtsc 11744696324802
fbt enter rdtsc 11744696324810
fbt enter rdtsc 11744696325033
fbt enter rdtsc 11744696325155
fbt enter rdtsc 11744696428364
fbt enter spinlock_exit 11744696510486
fbt enter spinlock_exit 11744697160417
fbt enter spinlock_exit 11744697807952
fbt enter spinlock_exit 11744704378856
fbt enter spinlock_exit 11744704699253
fbt enter spinlock_exit 11744716098379
fbt enter spinlock_exit 11744718145218
fbt enter spinlock_exit 11744750135790
fbt enter spinlock_exit 11744751498167
fbt enter spinlock_exit 11744767477782
fbt enter spinlock_exit 11744771142875
fbt enter spinlock_exit 11744784852745
fbt enter spinlock_exit 11744797873176
fbt enter spinlock_exit 11744804495582
fbt enter spinlock_exit 11744912093185
fbt enter spinlock_exit 11744981165023
=== snap ===

./cstat is a C program that simply stat()s a few thousand existing files. Is this behavior intentional? What can I do to close the gap with Linux?

Also, it's important to say that this is not an academic exercise: a client has announced they will leave for a Linux-based server. The Linux tests were of course run on absolutely identical hardware.

Hopeful greetings,
Fabiano
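A note on the measurement itself: tracing every fbt entry probe and printf'ing per probe adds enormous overhead of its own, and the deltas above mix all kernel functions together. A minimal sketch that instead measures the stat(2) latency distribution directly, aggregating in the kernel and only printing a histogram on exit — assuming the script is saved as stat_latency.d (a hypothetical name) and cstat is launched via -c so $target resolves; the probe is syscall::stat:entry on this release (sys_stat shows up in the stacks below), newer releases may name it differently:

=== snip ===
/* stat_latency.d -- latency distribution of stat(2) in the traced
 * process; run as: dtrace -qs stat_latency.d -c ./cstat */
syscall::stat:entry
/pid == $target/
{
	self->ts = timestamp;
}

syscall::stat:return
/self->ts/
{
	/* aggregate in the kernel; the histogram prints on exit */
	@["stat(2) latency (ns)"] = quantize(timestamp - self->ts);
	self->ts = 0;
}
=== snap ===

The quantize() histogram makes it easy to see whether a few outliers or a uniformly high per-call cost dominates.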
Do you have the source for cstat?

What's the output from:

sysctl kern.timecounter

What's your hardware spec, and is it real hardware or a VM?
(In reply to Steven Hartland from comment #1)
Hi Steven! Thanks for replying so fast!

This is the cstat source:

=== snip ===
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
	struct stat buf;
	stat("/home/user/file1", &buf);
	/* ... 4198 stat() calls elided ... */
	stat("/home/user/file4200", &buf);
}
=== snap ===

[root@server ~]# sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(-100) ACPI-fast(900) i8254(0) HPET(950) dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1198612000
kern.timecounter.tc.TSC-low.counter: 1461596423
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 16606739
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 4783
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 1292503749
kern.timecounter.tc.HPET.mask: 4294967295

[root@sbach ~]# sysctl hw.model hw.ncpu hw.physmem hw.usermem
hw.model: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
hw.ncpu: 8
hw.physmem: 34334208000
hw.usermem: 33838137344

Both boxes are ESXi guests.
It's a bit of a long shot, but can you try setting the timecounter to TSC and see if it makes any difference?

sysctl kern.timecounter.hardware=TSC-low
No, it doesn't make a difference. Interestingly, though, the last line of the output is now something like

fbt enter spinlock_exit 70396867793752

rather than

fbt enter spinlock_exit 11744981165023

with both the HPET and the TSC-low (lowercase ell) kern.timecounter.hardware settings, even after a reboot.
(In reply to Fabiano Sidler from comment #0) The script looks very strange. What do you expect it to do?
(In reply to Andriy Gapon from comment #5)
Originally, the script was like so:

=== snip ===
fbt:::entry
{
	entry_times[probefunc] = timestamp;
	printf("fbt enter %s\n", probefunc);
}

fbt:::return
{
	printf("fbt return %s\t%d\n", probefunc, timestamp-entry_times[probefunc]);
}
=== snap ===

But unexpectedly, the returns weren't in reverse call order, so now I'm simply printing the time elapsed since the previous probe fired.
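One likely reason the returns looked scrambled: entry_times is a global associative array, so probes firing in other kernel threads (and recursive calls) overwrite each other's slots, and the printf output interleaves all threads. A sketch using a thread-local array, so each return is matched against its own thread's entry timestamp — still approximate under recursion, and still very high-overhead because it instruments every fbt probe; run with -c ./cstat so $target resolves:

=== snip ===
fbt:::entry
/pid == $target/
{
	self->ts[probefunc] = timestamp;
}

fbt:::return
/self->ts[probefunc]/
{
	/* sum elapsed time per kernel function */
	@[probefunc] = sum(timestamp - self->ts[probefunc]);
	self->ts[probefunc] = 0;
}

END
{
	trunc(@, 20);	/* keep only the 20 most expensive functions */
	printa(@);
}
=== snap ===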
Following http://www.brendangregg.com/blog/2015-03-10/freebsd-flame-graphs.html I've done this:

dtrace -x stackframes=100 -n 'profile-197 /arg0/ { @[stack()] = count(); }' -c ./cstat -o out.stacks
./stackcollapse.pl out.stacks | ./flamegraph.pl > out.svg

uploaded to http://imgh.us/out_4.svg

=== snip out.stacks ===
kernel`cache_lookup+0x21c
kernel`namei+0x4d4
kernel`kern_statat_vnhook+0xae
kernel`sys_stat+0x2d
kernel`amd64_syscall+0x357
kernel`0xffffffff80d30a8b
1

kernel`cache_lookup+0x53f
kernel`namei+0x4d4
kernel`kern_statat_vnhook+0xae
kernel`sys_stat+0x2d
kernel`amd64_syscall+0x357
kernel`0xffffffff80d30a8b
1

kernel`acpi_cpu_c1+0x6
kernel`sched_idletd+0x1d5
kernel`fork_exit+0x9a
kernel`0xffffffff80d30cde
14
=== snap out.stacks ===

I don't see anything helpful in it, but maybe someone else does?
Created attachment 166811 [details] graph
DTraced again, with:

dtrace -x ustackframes=100 -n 'profile-99 { @[ustack()] = count(); } tick-60s { exit(0); }'

Very weird graph, actually.
Second graph is not for the process in question, so not much use.
Created attachment 166819 [details] php-cgi graph
DTraced again, with:

dtrace -x ustackframes=100 -n 'profile-99 /uid == 1042/ { @[stack()] = count(); } tick-600s { exit(0); }'

There are no other processes running with uid 1042, and there aren't that many HTTP requests anymore; it's 00:13 local time.
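Two details in that one-liner worth double-checking: @[stack()] aggregates kernel stacks, so the raised ustackframes limit has no effect there — ustack(), as in the earlier one-liner, is needed for user stacks; and an execname predicate is tighter than a uid match if any other process ever runs under that uid during the window. A variant, assuming the workers actually run as php-cgi (an assumption based only on the attachment title):

=== snip ===
dtrace -x ustackframes=100 -n 'profile-99 /execname == "php-cgi"/ { @[ustack()] = count(); } tick-600s { exit(0); }'
=== snap ===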
How much time is this really using? Testing here with:

== gen ==
perl -e 'for (1..4200) {my $file = "file$_"; open(my $fh, ">", $file) or die "cannot open > $file: $!";}'
/== gen ==

== bin ==
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <err.h>
#include <sysexits.h>

int main(int argc, char* argv[])
{
	struct stat buf;
	char name[1024];
	int i;

	for (i = 1; i <= 4200; i++) {
		sprintf(name, "file%d", i);
		if (stat(name, &buf) != 0) {
			printf("Tested %d files\n", i - 1);
			err(EX_OSERR, NULL);
		}
	}
	printf("Tested %d files\n", i - 1);
}
/== bin ==

I'm seeing:

/usr/bin/time ./test
Tested 4200 files
        0.00 real         0.00 user         0.00 sys
Hi Steven! Thank you for your help!

I'm seeing the same:

/usr/bin/time ./test
Tested 4200 files
        0.00 real         0.00 user         0.00 sys

The interesting thing is that in my cstat program the files are NOT actually all in the same directory. Could this be the issue? Then the directory-tree traversal would need to be sped up somehow.
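If the suspicion about the directory layout is right, the extra cost should show up in the kernel path-lookup code — namei() and cache_lookup() are already visible in the out.stacks excerpt earlier in this report. A sketch that measures how long each namei() call issued by the test process takes (the script name namei_latency.d is hypothetical):

=== snip ===
/* namei_latency.d -- per-call path-lookup time for the traced
 * process; run as: dtrace -qs namei_latency.d -c ./cstat */
fbt::namei:entry
/pid == $target/
{
	self->nts = timestamp;
}

fbt::namei:return
/self->nts/
{
	@["namei (ns)"] = quantize(timestamp - self->nts);
	self->nts = 0;
}
=== snap ===

Comparing the histogram for a flat directory against the real multi-directory tree would show whether the lookups themselves get slower as path depth and directory count grow.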
It would be nice to check this on 14.0, given mjg@'s VFS/namecache improvements.
(In reply to Dmitry Chagin from comment #15)
In particular:

> I did some benching by running 20 processes in parallel, each doing stat
> on a tree of 1 million files (one tree per proc, 1000 dirs x 1000 files,
> so 20 mln inodes in total). Box had 24 cores and 24G RAM.
>
> Best times:
> Linux: 7.60s user 1306.90s system 1863% cpu 1:10.55 total
> FreeBSD: 3.49s user 345.12s system 1983% cpu 17.573 total
> OpenBSD: 5.01s user 6463.66s system 2000% cpu 5:23.42 total
> DragonflyBSD: 11.73s user 1316.76s system 1023% cpu 2:09.78 total
> OmniosCE: 9.17s user 516.53s system 1550% cpu 33.905 total

From https://lore.kernel.org/linux-fsdevel/20231019155958.7ek7oyljs6y44ah7@f/
(In reply to Ed Maste from comment #16)
Well, AFAIK mjg@ ran the FreeBSD tests on ZFS, while here it's UFS vs. Ext4; the UFS results might be even better.