Created attachment 242568
Patch for ffmpeg

LTO is the default on pretty much all other distros and it's been working fine for years, so just enable it by default on aarch64 and amd64.
Do you see a statistical difference? For decoding, try "ffmpeg -i foo.mp4 -benchmark -f null -" in a loop, then feed the results to ministat(1). For encoding I'm not sure which *builtin* codec(s) to test: ffmpeg uses "-c:v libx264" and "-c:a libvorbis" by default, which are external (indicated by the "lib" prefix) and thus won't benefit from LTO in ffmpeg.
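A minimal sketch of the loop suggested above, assuming ffmpeg writes its -benchmark summary as "bench: utime=...s" lines to stderr. The helper names extract_utimes and run_benchmark and the default run count are hypothetical, not part of ffmpeg or ministat. The result is one utime value per run, which can be written one per line as input for ministat(1).

```python
import re
import subprocess

# Matches the user-time field of lines such as
#   bench: utime=13.090s stime=1.264s rtime=1.746s
UTIME_RE = re.compile(r"bench:\s+utime=([0-9.]+)s")


def extract_utimes(log_text):
    """Return every utime value (in seconds) found in ffmpeg -benchmark output."""
    return [float(m.group(1)) for m in UTIME_RE.finditer(log_text)]


def run_benchmark(sample, runs=3):
    """Decode `sample` to the null muxer `runs` times and collect utime per run."""
    cmd = ["ffmpeg", "-benchmark", "-hide_banner", "-i", sample, "-f", "null", "-"]
    utimes = []
    for _ in range(runs):
        # Assumption: the bench summary is part of the log written to stderr.
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
            errors="replace",
            check=True,
        )
        utimes.extend(extract_utimes(proc.stderr))
    return utimes
```

For example, printing each value of run_benchmark("foo.mp4") on its own line and redirecting that to a file gives one data set; repeating it with the other build gives the second file to compare.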
If LTO has placebo benefit then the port option should be removed in favor of WITH_LTO via Mk/Features/lto.mk e.g., to reduce maintenance. LTO isn't safe with mixed alignment (e.g., in dav1d) or mixed toolchain (LLVM bitcode vs. GCC GIMPLE when using static libraries) but I didn't notice issues when dogfooding LTO in ffmpeg.

Examples documenting statistical difference:
- ports d4e1f93dbb3a (svt-av1)
- ports 32c2b95c682e (libjxl)
- ports 501e1ed88f97 (foot)

However, ports 55006395d27c (mesa-devel) and ports 6ec985b72d58 (firefox) relied on upstream benchmarks.
It is faster, but not by much; it might be different on other archs.

Test setup:
FreeBSD 14.0-BETA3 (amd64)
Ryzen 7 7900 with CPUTYPE set to znver4 in /etc/make.conf

###### Decode MPEG-4 (XviD)
ffmpeg -benchmark -hide_banner -i sample-mpeg4.avi -map 0:v -f null -

=== LTO ===
bench: utime=13.090s stime=1.264s rtime=1.746s
bench: maxrss=60780kB
bench: utime=13.161s stime=1.212s rtime=1.749s
bench: maxrss=58996kB
bench: utime=13.116s stime=1.404s rtime=1.754s
bench: maxrss=62712kB

=== Plain ===
bench: utime=13.248s stime=1.294s rtime=2.067s
bench: maxrss=57208kB
bench: utime=13.377s stime=1.062s rtime=1.806s
bench: maxrss=56084kB
bench: utime=13.286s stime=1.198s rtime=1.812s
bench: maxrss=59928kB

###### Decode H264
ffmpeg -benchmark -hide_banner -i sample-h264.mkv -map 0:v -f null -

=== LTO ===
bench: utime=844.833s stime=18.668s rtime=90.596s
bench: maxrss=210632kB
bench: utime=847.122s stime=17.699s rtime=90.694s
bench: maxrss=206476kB
bench: utime=842.610s stime=17.406s rtime=90.498s
bench: maxrss=209076kB

=== Plain ===
bench: utime=845.918s stime=17.349s rtime=90.132s
bench: maxrss=208204kB
bench: utime=844.163s stime=17.513s rtime=90.112s
bench: maxrss=208424kB
bench: utime=850.070s stime=17.515s rtime=90.462s
bench: maxrss=207528kB

###### Decode HEVC
ffmpeg -benchmark -hide_banner -i sample-h265.mkv -map 0:v -f null -

=== LTO ===
bench: utime=1153.633s stime=9.111s rtime=194.839s
bench: maxrss=269004kB
bench: utime=1152.164s stime=8.431s rtime=194.523s
bench: maxrss=266340kB
bench: utime=1150.939s stime=8.937s rtime=194.489s
bench: maxrss=271228kB

=== Plain ===
bench: utime=1159.971s stime=9.785s rtime=195.534s
bench: maxrss=266032kB
bench: utime=1157.997s stime=8.402s rtime=195.144s
bench: maxrss=266232kB
bench: utime=1159.178s stime=8.512s rtime=195.363s
bench: maxrss=266276kB

###### Decode MPEG-2 (HDTV) and deinterlace using bwdif
ffmpeg -benchmark -hide_banner -i sample.tp -map 0:v -vf bwdif -f null -

=== LTO ===
bench: utime=1030.029s stime=33.336s rtime=109.380s
bench: maxrss=70504kB
bench: utime=1027.714s stime=34.623s rtime=109.114s
bench: maxrss=69032kB
bench: utime=1026.165s stime=34.285s rtime=109.475s
bench: maxrss=68920kB

=== Plain ===
bench: utime=1032.744s stime=34.641s rtime=109.661s
bench: maxrss=68444kB
bench: utime=1034.172s stime=34.125s rtime=109.460s
bench: maxrss=69872kB
bench: utime=1035.130s stime=33.864s rtime=109.430s
bench: maxrss=68116kB

###### Resample audio from 44100Hz to 48000Hz
ffmpeg -benchmark -hide_banner -i sample.mp3 -ar 48000 -f null -

=== LTO ===
bench: utime=2.947s stime=0.263s rtime=2.672s
bench: maxrss=28296kB
bench: utime=3.131s stime=0.436s rtime=2.848s
bench: maxrss=28324kB
bench: utime=3.132s stime=0.317s rtime=2.798s
bench: maxrss=28276kB

=== Plain ===
bench: utime=3.573s stime=0.491s rtime=4.022s
bench: maxrss=28320kB
bench: utime=3.484s stime=0.302s rtime=3.132s
bench: maxrss=28312kB
bench: utime=3.323s stime=0.277s rtime=2.987s
bench: maxrss=28312kB

###### Audio calculate EBUR128 values
ffmpeg -benchmark -hide_banner -i sample.mp3 -filter_complex ebur128 -f null -

=== LTO ===
bench: utime=5.301s stime=0.487s rtime=6.268s
bench: maxrss=31820kB
bench: utime=4.928s stime=0.365s rtime=4.748s
bench: maxrss=31852kB
bench: utime=5.195s stime=0.446s rtime=4.925s
bench: maxrss=31844kB

=== Plain ===
bench: utime=5.667s stime=0.612s rtime=5.283s
bench: maxrss=31184kB
bench: utime=5.151s stime=0.357s rtime=4.917s
bench: maxrss=31184kB
bench: utime=5.052s stime=0.271s rtime=4.769s
bench: maxrss=31200kB

###### Audio calculate ReplayGain
ffmpeg -benchmark -hide_banner -i sample.mp3 -filter_complex replaygain -f null -

=== LTO ===
bench: utime=5.558s stime=0.429s rtime=5.230s
bench: maxrss=29412kB
bench: utime=5.504s stime=0.383s rtime=5.184s
bench: maxrss=29420kB
bench: utime=5.222s stime=0.169s rtime=4.942s
bench: maxrss=29400kB

=== Plain ===
bench: utime=5.442s stime=0.434s rtime=5.178s
bench: maxrss=28752kB
bench: utime=5.393s stime=0.281s rtime=5.083s
bench: maxrss=28768kB
bench: utime=5.295s stime=0.314s rtime=5.062s
bench: maxrss=28756kB

###### File sizes

=== LTO ===
ls -l work/stage/usr/local/bin/ff*
-rwxr-xr-x 1 root wheel 270928 Sep 23 15:50 work/stage/usr/local/bin/ffmpeg
-rwxr-xr-x 1 root wheel 168344 Sep 23 15:50 work/stage/usr/local/bin/ffprobe

ls -l work/stage/usr/local/lib/lib*.so*.*.*
-rwxr-xr-x 1 root wheel 16656888 Sep 23 15:50 work/stage/usr/local/lib/libavcodec.so.60.3.100
-rwxr-xr-x 1 root wheel 27784 Sep 23 15:50 work/stage/usr/local/lib/libavdevice.so.60.1.100
-rwxr-xr-x 1 root wheel 5668272 Sep 23 15:50 work/stage/usr/local/lib/libavfilter.so.9.3.100
-rwxr-xr-x 1 root wheel 3308024 Sep 23 15:50 work/stage/usr/local/lib/libavformat.so.60.3.100
-rwxr-xr-x 1 root wheel 880224 Sep 23 15:50 work/stage/usr/local/lib/libavutil.so.58.2.100
-rwxr-xr-x 1 root wheel 67104 Sep 23 15:50 work/stage/usr/local/lib/libpostproc.so.57.1.100
-rwxr-xr-x 1 root wheel 126760 Sep 23 15:50 work/stage/usr/local/lib/libswresample.so.4.10.100
-rwxr-xr-x 1 root wheel 1449664 Sep 23 15:50 work/stage/usr/local/lib/libswscale.so.7.1.100

=== Plain ===
ls -l work/stage/usr/local/bin/ff*
-rwxr-xr-x 1 root wheel 279688 Sep 23 15:46 work/stage/usr/local/bin/ffmpeg
-rwxr-xr-x 1 root wheel 183200 Sep 23 15:46 work/stage/usr/local/bin/ffprobe

ls -l work/stage/usr/local/lib/lib*.so*.*.*
-rwxr-xr-x 1 root wheel 16418296 Sep 23 15:46 work/stage/usr/local/lib/libavcodec.so.60.3.100
-rwxr-xr-x 1 root wheel 28112 Sep 23 15:46 work/stage/usr/local/lib/libavdevice.so.60.1.100
-rwxr-xr-x 1 root wheel 5527904 Sep 23 15:46 work/stage/usr/local/lib/libavfilter.so.9.3.100
-rwxr-xr-x 1 root wheel 2622664 Sep 23 15:46 work/stage/usr/local/lib/libavformat.so.60.3.100
-rwxr-xr-x 1 root wheel 839488 Sep 23 15:46 work/stage/usr/local/lib/libavutil.so.58.2.100
-rwxr-xr-x 1 root wheel 67536 Sep 23 15:46 work/stage/usr/local/lib/libpostproc.so.57.1.100
-rwxr-xr-x 1 root wheel 127464 Sep 23 15:46 work/stage/usr/local/lib/libswresample.so.4.10.100
-rwxr-xr-x 1 root wheel 1483240 Sep 23 15:46 work/stage/usr/local/lib/libswscale.so.7.1.100
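To make the file-size listings easier to compare, here is a small sketch that computes the relative size change per file. The byte counts are copied from the ls -l output above; the names SIZES and percent_change are hypothetical and the library keys are shortened for readability.

```python
def percent_change(plain, lto):
    """Relative size change of the LTO build versus the plain build, in percent."""
    return (lto - plain) / plain * 100.0


# (plain, LTO) sizes in bytes, copied from the ls -l listings.
SIZES = {
    "ffmpeg": (279688, 270928),
    "ffprobe": (183200, 168344),
    "libavcodec": (16418296, 16656888),
    "libavdevice": (28112, 27784),
    "libavfilter": (5527904, 5668272),
    "libavformat": (2622664, 3308024),
    "libavutil": (839488, 880224),
    "libpostproc": (67536, 67104),
    "libswresample": (127464, 126760),
    "libswscale": (1483240, 1449664),
}

for name, (plain, lto) in SIZES.items():
    change = percent_change(plain, lto)
    print(f"{name:14s} {plain:>9d} -> {lto:>9d}  {change:+6.2f}%")

# Overall change across all listed binaries and libraries.
total_plain = sum(plain for plain, _ in SIZES.values())
total_lto = sum(lto for _, lto in SIZES.values())
print(f"{'total':14s} {total_plain:>9d} -> {total_lto:>9d}  "
      f"{percent_change(total_plain, total_lto):+6.2f}%")
```

The per-file view matters because the changes go in both directions: the executables shrink slightly while some of the larger libraries grow.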
(In reply to Daniel Engberg from comment #3)

Interpreting some results with ministat, because the raw results aren't easy to read. All values here are utime in seconds.

ministat -w 74 -s plain.Decode.MPEG-2.bwdif lto.Decode.MPEG-2.bwdif
x plain.Decode.MPEG-2.bwdif
+ lto.Decode.MPEG-2.bwdif
[ministat ASCII plot]
    N           Min           Max        Median           Avg        Stddev
x   3      1032.744       1035.13      1034.172     1034.0153     1.2006904
+   3      1026.165      1030.029      1027.714     1027.9693     1.9446132
Difference at 95.0% confidence
        -6.046 +/- 3.66291
        -0.584711% +/- 0.353671%
        (Student's t, pooled s = 1.61604)

ministat -w 74 -s plain.Decode.MPEG-4 lto.Decode.MPEG-4
x plain.Decode.MPEG-4
+ lto.Decode.MPEG-4
[ministat ASCII plot]
    N           Min           Max        Median           Avg        Stddev
x   3        13.248        13.377        13.286     13.303667   0.066289768
+   3         13.09        13.161        13.116     13.122333    0.03592121
Difference at 95.0% confidence
        -0.181333 +/- 0.12084
        -1.36303% +/- 0.898767%
        (Student's t, pooled s = 0.0533135)

ministat -w 74 -s plain.Decode.H264 lto.Decode.H264
x plain.Decode.H264
+ lto.Decode.H264
[ministat ASCII plot]
    N           Min           Max        Median           Avg        Stddev
x   3       844.163        850.07       845.918       846.717     3.0334738
+   3        842.61       847.122       844.833       844.855     2.2560805
No difference proven at 95.0% confidence

ministat -w 74 -s plain.Decode.HEVC lto.Decode.HEVC
x plain.Decode.HEVC
+ lto.Decode.HEVC
[ministat ASCII plot]
    N           Min           Max        Median           Avg        Stddev
x   3      1157.997      1159.971      1159.178     1159.0487    0.99333496
+   3      1150.939      1153.633      1152.164     1152.2453     1.3488404
Difference at 95.0% confidence
        -6.80333 +/- 2.68478
        -0.586976% +/- 0.23116%
        (Student's t, pooled s = 1.1845)

ministat -w 74 -s plain.audio.ReplayGain lto.audio.ReplayGain
x plain.audio.ReplayGain
+ lto.audio.ReplayGain
[ministat ASCII plot]
    N           Min           Max        Median           Avg        Stddev
x   3         5.295         5.442         5.393     5.3766667   0.074848736
+   3         5.222         5.558         5.504         5.428    0.18043281

So, yes: ship it!
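For readers who want to check the "Difference at 95.0% confidence" lines by hand, here is a sketch of the pooled-variance Student's t interval applied to the MPEG-2 + bwdif utime values from comment #3. It is not ministat's source code; the function name pooled_t_interval is hypothetical, and the critical value 2.776 (two-sided 95%, 4 degrees of freedom) is hard-coded from a standard t table on the assumption of three samples per set.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 4 degrees of freedom
# (3 + 3 samples). Hard-coded assumption; other sample counts need another value.
T_CRIT_DF4 = 2.776


def pooled_t_interval(plain, lto, t_crit=T_CRIT_DF4):
    """Return (mean difference lto - plain, 95% half-width, pooled stddev)."""
    n1, n2 = len(plain), len(lto)
    diff = statistics.mean(lto) - statistics.mean(plain)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(plain)
                  + (n2 - 1) * statistics.variance(lto)) / (n1 + n2 - 2)
    pooled_s = math.sqrt(pooled_var)
    half_width = t_crit * pooled_s * math.sqrt(1 / n1 + 1 / n2)
    return diff, half_width, pooled_s


# utime in seconds for "Decode MPEG-2 (HDTV) + bwdif", from comment #3.
plain = [1032.744, 1034.172, 1035.130]
lto = [1030.029, 1027.714, 1026.165]

diff, half_width, pooled_s = pooled_t_interval(plain, lto)
print(f"difference {diff:.3f} +/- {half_width:.3f} s (pooled s = {pooled_s:.5f})")
# The difference counts as shown only when the interval excludes zero.
print("difference shown" if abs(diff) > half_width else "no difference proven")
```

With only three runs per build the interval is wide, so small effects such as the H264 result cannot be separated from noise; more runs would narrow it.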
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=4042f2b4959ffeb882c5988f20ea99e74136a757

commit 4042f2b4959ffeb882c5988f20ea99e74136a757
Author:     Daniel Engberg <diizzy@FreeBSD.org>
AuthorDate: 2023-06-03 07:37:44 +0000
Commit:     Jan Beich <jbeich@FreeBSD.org>
CommitDate: 2024-09-27 18:52:46 +0000

    multimedia/ffmpeg: enable LTO by default on aarch64 and amd64

    0.6% faster decode at least with HEVC and MPEG-2 (HDTV) + -vf bwdif

    PR:             271798
    Inspired by:    Alpine, Arch, Chimera, CRUX, Fedora, OpenMandriva, Solus
    Reviewed by:    olivier

 multimedia/ffmpeg/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
Thanks!