Hi, Based on FreeBSD Installer, you can see performance comparison between 12.1-RELEASE and 13.0-BETA4 during extracting base, kernel, lib32 and src: 12.1-RELEASE: ~850-1500 files/sec. 13.0-BETA4: ~360-500 files/sec.
It's even worse with doing specific operations, for example: When I do "rm -fr /usr/ports" with 12.2 takes 5 seconds and the same command with 13.0-BETA2 takes 100 seconds.
(In reply to Christos Chatzaras from comment #1) I wonder if "trim" is involved and, if so, if enabled vs. not makes a difference. Might there now be a lot more trim activity for some reason (when enabled)? (No specific evidence beyond the pure "rm -fr" being reported as so much slower.)
(In reply to Mark Millard from comment #2) In a context that sees the problem (if "trim" is involved): using "gstat -spod" in the fast and slow contexts and comparing might make it obvious if there is a difference in the amount of "d" (delete/trim) activity for similar operations if other things like media differences are unlikely to be contributing.
I'm seeing something similar and have been seeing it on 13 since it was 13-current. Doing a "make extract" before and after building www/firefox takes many, many minutes. The system is a ThinkPad L15, but my file system is SU, not SU+J. When I monitor disc usage, it is inconsistent. It runs at "normal throughput of >40MB/s. It will abruptly drop to <2MB/s and remain there for a time that varies from a few seconds to minutes. It then jumps to normal throughput. This a spinning rust, so TRIM should not be involved. Since 12.2 does not support my graphics (CometLake), I really can't compare to 12.2. Also, I am running 13-STABLE, not BETA4. It was update on the 27th, shortly after BETA4 was announced, so I assume I have all BETA4 fixes. Disk performance monitored with gkrellm2.
(In reply to rkoberman from comment #4) Doing "make fetch" and "make extract" for www/firefox is something I can try on a 128 GiBYTe RAM ThreadRipper 1950X that uses 960 GiByte Optane media and is running main (14-CURRENT) from back on 2021-Feb-08. (Optane does not use trim and is fast.) UFS2+SU (no J). The example context predates any recent potential fixes. So . . . Doing similar "time -l" might happen to expose something common to the good vs. bad cases for those were the issue stands out in their context. # time -l make extract ===> Options unchanged ===> firefox-85.0.2,2 depends on file: /usr/local/sbin/pkg - found ===> Fetching all distfiles required by firefox-85.0.2,2 for building ===> Extracting for firefox-85.0.2,2 => SHA256 Checksum OK for firefox-85.0.2.source.tar.xz. 70.10 real 23.54 user 18.01 sys 77540 maximum resident set size 50 average shared memory size 10 average unshared data size 130 average unshared stack size 37678 page reclaims 0 page faults 0 swaps 12848 block input operations 1175850 block output operations 83 messages sent 79 messages received 0 signals received 1188850 voluntary context switches 1816 involuntary context switches # time -l rm -fr /wrkdirs/usr/ports/www 17.36 real 0.24 user 6.51 sys 4116 maximum resident set size 8 average shared memory size 4 average unshared data size 134 average unshared stack size 587 page reclaims 0 page faults 0 swaps 2 block input operations 555695 block output operations 0 messages sent 0 messages received 0 signals received 607758 voluntary context switches 0 involuntary context switches (No other www/ content.) Looks like "block output operations" and "voluntary context switches" are what has lots of examples in this context. For reference: main 3acea07c1873 based: # ~/fbsd-based-on-what-freebsd-main.sh merge-base: 3acea07c1873b1e4042f4a4fa8668745ee59f15b merge-base: CommitDate: 2021-02-08 19:15:21 +0000 c1845d00f818 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context. 3acea07c1873 (pure-src) Restore the augmented strlen commentary FreeBSD FBSDFHUGE 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n244686-c1845d00f818 GENERIC-NODBG amd64 amd64 1400004 1400004 I may be able to do similarly on some aarch64 SSD based machines (that would have trim involved).
(In reply to Mark Millard from comment #5) [I should have mentioned that the ThreadRipper results are based on non-debug builds of main (14-CURRENT). So are the below. Same FreeBSD vintage (same source tree even). UFS2+SU (no J) again as well.] aarch64 examples that others might be able to compare against . . . From a OverDrive 1000 (4 Cortex-A57 cores 8 GiByte RAM): # time -l make extract ===> firefox-85.0.2,2 depends on file: /usr/local/sbin/pkg - found ===> Fetching all distfiles required by firefox-85.0.2,2 for building ===> Extracting for firefox-85.0.2,2 => SHA256 Checksum OK for firefox-85.0.2.source.tar.xz. 221.36 real 57.32 user 65.94 sys 76068 maximum resident set size 42 average shared memory size 12 average unshared data size 128 average unshared stack size 32646 page reclaims 16 page faults 0 swaps 22893 block input operations 1175983 block output operations 47 messages sent 45 messages received 5 signals received 1198799 voluntary context switches 487 involuntary context switches # time -l rm -fr /wrkdirs/usr/ports/www 111.95 real 0.48 user 18.95 sys 3756 maximum resident set size 7 average shared memory size 3 average unshared data size 126 average unshared stack size 596 page reclaims 0 page faults 0 swaps 2 block input operations 555695 block output operations 0 messages sent 0 messages received 0 signals received 613033 voluntary context switches 1 involuntary context switches The "rm -fr" here took a larger fraction of the "make extract" "real" time compared to the ThreadRipper ratio. A MACCHIATObin Double Shot got similar results (4 Cortex-A72 cores, 16 GiByte RAM): # time -l make extract ===> firefox-85.0.2,2 depends on file: /usr/local/sbin/pkg - found ===> Fetching all distfiles required by firefox-85.0.2,2 for building ===> Extracting for firefox-85.0.2,2 => SHA256 Checksum OK for firefox-85.0.2.source.tar.xz. 209.41 real 49.36 user 62.94 sys 75620 maximum resident set size 42 average shared memory size 12 average unshared data size 127 average unshared stack size 33507 page reclaims 10 page faults 0 swaps 13861 block input operations 1176990 block output operations 47 messages sent 45 messages received 5 signals received 1189697 voluntary context switches 293 involuntary context switches # time -l rm -fr /wrkdirs/usr/ports/www 99.39 real 0.48 user 19.33 sys 3752 maximum resident set size 7 average shared memory size 3 average unshared data size 124 average unshared stack size 596 page reclaims 0 page faults 0 swaps 2 block input operations 555695 block output operations 0 messages sent 0 messages received 0 signals received 639878 voluntary context switches 29 involuntary context switches Of course, the SSD's involved are not likely to be as subject to fragmentation consequences as spinning drives. So comparisons to the above might be complicated by such. (There may be better things to look at with other tools.)
(In reply to Mark Millard from comment #6) [I use -noatime mounts, by the way.] Given the reports I got a surprise booting the OverDrive 1000 via 13.0-BETA4 and trying the operations on the SSD (not the boot media this time), evidence of definite changes from the main (14) vintage that I have things at for normal booting: # time -l make extract ===> firefox-85.0.2,2 depends on file: /usr/local/sbin/pkg - found ===> Fetching all distfiles required by firefox-85.0.2,2 for building ===> Extracting for firefox-85.0.2,2 => SHA256 Checksum OK for firefox-85.0.2.source.tar.xz. 93.39 real 55.46 user 34.47 sys 76576 maximum resident set size 30 average shared memory size 4 average unshared data size 128 average unshared stack size 26460 page reclaims 0 page faults 0 swaps 15569 block input operations 994 block output operations 16 messages sent 14 messages received 0 signals received 16752 voluntary context switches 1192 involuntary context switches Vastly fewer block output operations (and less time). # time -l rm -fr work 12.74 real 0.29 user 7.80 sys 3920 maximum resident set size 8 average shared memory size 4 average unshared data size 128 average unshared stack size 588 page reclaims 0 page faults 0 swaps 644 block input operations 360 block output operations 0 messages sent 0 messages received 0 signals received 1093 voluntary context switches 337 involuntary context switches So even if there are issues still, apparently some have been taken care of. For reference: # uname -apKU FreeBSD generic 13.0-BETA4 FreeBSD 13.0-BETA4 #0 releng/13.0-n244592-e32bc253629: Fri Feb 26 08:02:53 UTC 2021 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 aarch64 1300139 1300139
(In reply to Mark Millard from comment #7) 12.2-RELEASE on the OverDrive 1000 this time, again operating on the SSD: # time -l make extract ===> firefox-85.0.2,2 depends on file: /usr/local/sbin/pkg - found ===> Fetching all distfiles required by firefox-85.0.2,2 for building ===> Extracting for firefox-85.0.2,2 => SHA256 Checksum OK for firefox-85.0.2.source.tar.xz. 120.92 real 55.87 user 61.93 sys 76520 maximum resident set size 27 average shared memory size 10 average unshared data size 127 average unshared stack size 27241 page reclaims 6 page faults 0 swaps 20866 block input operations 6362 block output operations 0 messages sent 0 messages received 4 signals received 21852 voluntary context switches 3121 involuntary context switches # time -l rm -fr work 12.83 real 0.37 user 8.17 sys 3916 maximum resident set size 7 average shared memory size 7 average unshared data size 127 average unshared stack size 934 page reclaims 0 page faults 0 swaps 5 block input operations 348 block output operations 0 messages sent 0 messages received 0 signals received 581 voluntary context switches 736 involuntary context switches 13.0-BETA4 looks a somewhat better for this on the extract. Of course, all my examples are SSDs or Optanes, nothing spinning. I'm unsure how that or other context variations would compare. But now you have some examples for comparison. The early Feb. main (14) had much bigger differences between user+sys vs. real than 13.0-BETA4 or 12.2-RELEASE did. But that may well be expected given the huge "block output operations" count differences. It does not appear that I've managed to run into the 13.0-BETA4 problem(s) via these specific types of experiments for UFS2+SU (no J).
(In reply to Mark Millard from comment #8) I updated the ThreadRipper to be based on c113740f266e (2021-Mar-03): # ~/fbsd-based-on-what-freebsd-main.sh merge-base: c113740f266ecfbe4a34c97607f860d63c08eb76 merge-base: CommitDate: 2021-03-03 22:01:04 +0000 0e72c127fce0 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context. c113740f266e (pure-src) git-arc.1: Fix synopsis for the "update" verb FreeBSD FBSDFHUGE 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n245244-0e72c127fce0 GENERIC-NODBG amd64 amd64 1400005 1400005 And things are much better for the UFS2+SU (not J) test: # time -l make extract ===> Options unchanged ===> firefox-85.0.2,2 depends on file: /usr/local/sbin/pkg - found ===> Fetching all distfiles required by firefox-85.0.2,2 for building ===> Extracting for firefox-85.0.2,2 => SHA256 Checksum OK for firefox-85.0.2.source.tar.xz. 34.08 real 23.09 user 10.73 sys 77260 maximum resident set size 50 average shared memory size 11 average unshared data size 128 average unshared stack size 37271 page reclaims 496 page faults 0 swaps 9214 block input operations 801 block output operations 83 messages sent 79 messages received 0 signals received 11598 voluntary context switches 1656 involuntary context switches # time -l rm -fr /wrkdirs/usr/ports/www 3.30 real 0.09 user 3.21 sys 4180 maximum resident set size 8 average shared memory size 4 average unshared data size 128 average unshared stack size 603 page reclaims 0 page faults 0 swaps 1 block input operations 0 block output operations 0 messages sent 0 messages received 0 signals received 20 voluntary context switches 35 involuntary context switches
FreeBSD 12.2 /usr/bin/time -l rm -fr /usr/ports 12.2 1st run: 8.89 real 0.36 user 1.28 sys 5652 maximum resident set size 7 average shared memory size 3 average unshared data size 126 average unshared stack size 962 page reclaims 0 page faults 0 swaps 38627 block input operations 0 block output operations 0 messages sent 0 messages received 0 signals received 38629 voluntary context switches 0 involuntary context switches 12.2 2nd run: 5.04 real 0.31 user 1.98 sys 5652 maximum resident set size 7 average shared memory size 3 average unshared data size 126 average unshared stack size 952 page reclaims 0 page faults 0 swaps 13492 block input operations 0 block output operations 0 messages sent 0 messages received 0 signals received 13502 voluntary context switches 17 involuntary context switches --------------- FreeBSD 13.0 (from 3-Mar-2021 source) /usr/bin/time -l rm -fr /usr/ports 1st run: 12.29 real 0.68 user 2.63 sys 5580 maximum resident set size 8 average shared memory size 4 average unshared data size 129 average unshared stack size 956 page reclaims 0 page faults 0 swaps 40039 block input operations 0 block output operations 0 messages sent 0 messages received 0 signals received 40041 voluntary context switches 1 involuntary context switches 2rd run: 6.66 real 0.42 user 2.14 sys 5576 maximum resident set size 8 average shared memory size 4 average unshared data size 135 average unshared stack size 955 page reclaims 0 page faults 0 swaps 15705 block input operations 0 block output operations 0 messages sent 0 messages received 0 signals received 15713 voluntary context switches 10 involuntary context switches
FreeBSD 12.2 /usr/bin/time -l portsnap extract 1st run: 126.12 real 34.99 user 59.58 sys 49040 maximum resident set size 21 average shared memory size 3 average unshared data size 87 average unshared stack size 11820155 page reclaims 1 page faults 0 swaps 25079 block input operations 2780 block output operations 0 messages sent 0 messages received 0 signals received 379570 voluntary context switches 240 involuntary context switches 2nd run: 99.90 real 35.29 user 59.80 sys 49040 maximum resident set size 21 average shared memory size 2 average unshared data size 86 average unshared stack size 11820155 page reclaims 0 page faults 0 swaps 27 block input operations 2782 block output operations 0 messages sent 0 messages received 0 signals received 354645 voluntary context switches 370 involuntary context switches --------------- FreeBSD 13.0 (from 3-Mar-2021 source) /usr/bin/time -l portsnap extract 1st run: 210.68 real 73.30 user 110.37 sys 48952 maximum resident set size 21 average shared memory size 3 average unshared data size 90 average unshared stack size 12287686 page reclaims 1 page faults 0 swaps 27790 block input operations 658 block output operations 4412 messages sent 2536294 messages received 0 signals received 382195 voluntary context switches 292 involuntary context switches 2rd run: 170.19 real 70.89 user 107.19 sys 48952 maximum resident set size 21 average shared memory size 3 average unshared data size 90 average unshared stack size 12287686 page reclaims 0 page faults 0 swaps 1405 block input operations 176 block output operations 4412 messages sent 2536294 messages received 0 signals received 355852 voluntary context switches 399 involuntary context switches
With 13.0-BETA2 "rm -fr" was many times slower than 12.2. With 13.0-BETA4+ (with 3-Mar-2021 source) is a lot better but still slower than 12.2.
Being unable to replicate anything similar to comment #4 and comment #11 being a large increase in Inter-Process-Communication and the rest of the rather-modern-FreeBSD examples being UFS SU (no J) that do not show significant problems, then it would appear that the UFS+SU+J (so with journaling) status of the original report may be an essentially separate issue from other fairly recent UFS I/O performance problems that seem to have been addressed. I do not plan to report other examples that try to isolate the context vs. recent history. It seems that only journaled experiments would likely be relevant to the epecific issue being reported. Sorry it took me so long to reach that conclusion. The comment #11's IPC operation counts in doing a portsnap extract should probably be a separate submittal. I'm less clear on comment #4 vs. some context local issue (say, file system fragmentation just to illustrate). But if it can be replicated by others, it too should probably have a separate submittal.
(In reply to Mark Millard from comment #13) Yes, I have SU without journal so it may be different issue. I made some more tests with 12.2, 13.0 and CURRENT which I will post in the "stable" mailing list. The summary is this: 12.2p4 = 98 sec 13.0-BETA2 = 497 sec 13.0-BETA4 = 163 sec 14-CURRENT = 255 sec
We have had others report that they can reproduce the problem and that 1) disabling journaling (but keeping soft updates) makes the problem go away; 2) increasing the journal size moderates the problem; 3) the patch proposed in https://reviews.freebsd.org/D30041 helps to moderate the problem. This patch is hopefully in a final round of testing before being committed. *** This bug has been marked as a duplicate of bug 255473 ***