Bug 271798 - multimedia/ffmpeg: Enable LTO by default on aarch64 and amd64
Summary: multimedia/ffmpeg: Enable LTO by default on aarch64 and amd64
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-multimedia (Nobody)
URL:
Keywords:
Depends on: 253124
Blocks:
Reported: 2023-06-03 07:37 UTC by Daniel Engberg
Modified: 2024-09-27 21:53 UTC
CC List: 2 users

See Also:


Attachments
Patch for ffmpeg (604 bytes, patch)
2023-06-03 07:37 UTC, Daniel Engberg
no flags

Description Daniel Engberg freebsd_committer freebsd_triage 2023-06-03 07:37:44 UTC
Created attachment 242568
Patch for ffmpeg

LTO is enabled by default on pretty much all other distributions and has been working fine for years, so just enable it by default on aarch64 and amd64.
Comment 1 Jan Beich freebsd_committer freebsd_triage 2023-06-03 10:51:05 UTC
Do you see a statistical difference? For decoding, try "ffmpeg -i foo.mp4 -benchmark -f null -" in a loop, then filter the results through ministat(1). For encoding, I'm not sure which *builtin* codec(s) to test: ffmpeg uses "-c:v libx264" and "-c:a libvorbis" by default, which are external (marked by the "lib" prefix) and thus won't benefit from LTO in ffmpeg.
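A minimal Python sketch of the decode loop suggested above, assuming ffmpeg is on PATH and prints its "bench: utime=... stime=... rtime=..." line on stderr (the format seen in the results posted later in this bug). The helper names parse_bench, run_decode_benchmark and write_ministat_input are hypothetical; the output file holds one utime value per line, which ministat(1) reads as its first column.

```python
import re
import subprocess

# Matches the timing line printed by `ffmpeg -benchmark`, e.g.
# "bench: utime=13.090s stime=1.264s rtime=1.746s"
BENCH_RE = re.compile(r"bench: utime=([\d.]+)s stime=([\d.]+)s rtime=([\d.]+)s")


def parse_bench(stderr_text):
    """Return (utime, stime, rtime) in seconds from ffmpeg's -benchmark output."""
    match = BENCH_RE.search(stderr_text)
    if match is None:
        raise ValueError("no 'bench: utime=...' line found")
    return tuple(float(g) for g in match.groups())


def run_decode_benchmark(input_path, runs=5):
    """Decode input_path `runs` times to the null muxer; return each run's utime."""
    # Same command as suggested above, plus -nostdin so ffmpeg never waits
    # for keyboard input and -hide_banner to keep stderr short.
    cmd = ["ffmpeg", "-nostdin", "-hide_banner", "-i", input_path,
           "-benchmark", "-f", "null", "-"]
    utimes = []
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        utimes.append(parse_bench(result.stderr)[0])
    return utimes


def write_ministat_input(values, path):
    """Write one value per line, the layout ministat(1) reads by default."""
    with open(path, "w") as f:
        for value in values:
            f.write(f"{value}\n")
```

Usage, once per build (file names are placeholders): write_ministat_input(run_decode_benchmark("foo.mp4"), "lto.txt") with the LTO package installed, the same into "plain.txt" with the plain package, then "ministat plain.txt lto.txt".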
Comment 2 Jan Beich freebsd_committer freebsd_triage 2023-06-03 11:36:51 UTC
If LTO only has a placebo benefit, then the port option should be removed in favor of WITH_LTO via Mk/Features/lto.mk, e.g., to reduce maintenance. LTO isn't safe with mixed alignment (e.g., in dav1d) or with mixed toolchains (LLVM bitcode vs. GCC GIMPLE when using static libraries), but I didn't notice issues when dogfooding LTO in ffmpeg.

Examples documenting statistical difference:
- ports d4e1f93dbb3a (svt-av1)
- ports 32c2b95c682e (libjxl)
- ports 501e1ed88f97 (foot)

However, ports 55006395d27c (mesa-devel) and ports 6ec985b72d58 (firefox) relied on upstream benchmarks.
Comment 3 Daniel Engberg freebsd_committer freebsd_triage 2023-09-23 14:00:38 UTC
It is faster, but not by much; results might differ on other architectures.

Test setup:

FreeBSD 14.0-BETA3 (amd64)
Ryzen 7 7900 with CPUTYPE set to znver4 in /etc/make.conf

###### Decode MPEG-4 (XviD)

ffmpeg -benchmark -hide_banner -i sample-mpeg4.avi -map 0:v -f null -

=== LTO ===
bench: utime=13.090s stime=1.264s rtime=1.746s
bench: maxrss=60780kB

bench: utime=13.161s stime=1.212s rtime=1.749s
bench: maxrss=58996kB

bench: utime=13.116s stime=1.404s rtime=1.754s
bench: maxrss=62712kB

=== Plain ===
bench: utime=13.248s stime=1.294s rtime=2.067s
bench: maxrss=57208kB

bench: utime=13.377s stime=1.062s rtime=1.806s
bench: maxrss=56084kB

bench: utime=13.286s stime=1.198s rtime=1.812s
bench: maxrss=59928kB

###### Decode H264

ffmpeg -benchmark -hide_banner -i sample-h264.mkv -map 0:v -f null -

=== LTO ===
bench: utime=844.833s stime=18.668s rtime=90.596s
bench: maxrss=210632kB

bench: utime=847.122s stime=17.699s rtime=90.694s
bench: maxrss=206476kB

bench: utime=842.610s stime=17.406s rtime=90.498s
bench: maxrss=209076kB

=== Plain ===
bench: utime=845.918s stime=17.349s rtime=90.132s
bench: maxrss=208204kB

bench: utime=844.163s stime=17.513s rtime=90.112s
bench: maxrss=208424kB

bench: utime=850.070s stime=17.515s rtime=90.462s
bench: maxrss=207528kB

###### Decode HEVC

ffmpeg -benchmark -hide_banner -i sample-h265.mkv -map 0:v -f null -

=== LTO ===
bench: utime=1153.633s stime=9.111s rtime=194.839s
bench: maxrss=269004kB

bench: utime=1152.164s stime=8.431s rtime=194.523s
bench: maxrss=266340kB

bench: utime=1150.939s stime=8.937s rtime=194.489s
bench: maxrss=271228kB

=== Plain ===
bench: utime=1159.971s stime=9.785s rtime=195.534s
bench: maxrss=266032kB

bench: utime=1157.997s stime=8.402s rtime=195.144s
bench: maxrss=266232kB

bench: utime=1159.178s stime=8.512s rtime=195.363s
bench: maxrss=266276kB

###### Decode MPEG-2 (HDTV) and deinterlace using bwdif

ffmpeg -benchmark -hide_banner -i sample.tp -map 0:v -vf bwdif -f null -

=== LTO ===
bench: utime=1030.029s stime=33.336s rtime=109.380s
bench: maxrss=70504kB

bench: utime=1027.714s stime=34.623s rtime=109.114s
bench: maxrss=69032kB

bench: utime=1026.165s stime=34.285s rtime=109.475s
bench: maxrss=68920kB

=== Plain ===
bench: utime=1032.744s stime=34.641s rtime=109.661s
bench: maxrss=68444kB

bench: utime=1034.172s stime=34.125s rtime=109.460s
bench: maxrss=69872kB

bench: utime=1035.130s stime=33.864s rtime=109.430s
bench: maxrss=68116kB

###### Resample audio from 44100Hz to 48000Hz

ffmpeg -benchmark -hide_banner -i sample.mp3 -ar 48000 -f null -

=== LTO ===
bench: utime=2.947s stime=0.263s rtime=2.672s
bench: maxrss=28296kB

bench: utime=3.131s stime=0.436s rtime=2.848s
bench: maxrss=28324kB

bench: utime=3.132s stime=0.317s rtime=2.798s
bench: maxrss=28276kB

=== Plain ===
bench: utime=3.573s stime=0.491s rtime=4.022s
bench: maxrss=28320kB

bench: utime=3.484s stime=0.302s rtime=3.132s
bench: maxrss=28312kB

bench: utime=3.323s stime=0.277s rtime=2.987s
bench: maxrss=28312kB

###### Audio: calculate EBU R128 loudness values

ffmpeg -benchmark -hide_banner -i sample.mp3 -filter_complex ebur128 -f null -

=== LTO ===
bench: utime=5.301s stime=0.487s rtime=6.268s
bench: maxrss=31820kB

bench: utime=4.928s stime=0.365s rtime=4.748s
bench: maxrss=31852kB

bench: utime=5.195s stime=0.446s rtime=4.925s
bench: maxrss=31844kB

=== Plain ===
bench: utime=5.667s stime=0.612s rtime=5.283s
bench: maxrss=31184kB

bench: utime=5.151s stime=0.357s rtime=4.917s
bench: maxrss=31184kB

bench: utime=5.052s stime=0.271s rtime=4.769s
bench: maxrss=31200kB

###### Audio: calculate ReplayGain

ffmpeg -benchmark -hide_banner -i sample.mp3 -filter_complex replaygain -f null -

=== LTO ===
bench: utime=5.558s stime=0.429s rtime=5.230s
bench: maxrss=29412kB

bench: utime=5.504s stime=0.383s rtime=5.184s
bench: maxrss=29420kB

bench: utime=5.222s stime=0.169s rtime=4.942s
bench: maxrss=29400kB

=== Plain ===
bench: utime=5.442s stime=0.434s rtime=5.178s
bench: maxrss=28752kB

bench: utime=5.393s stime=0.281s rtime=5.083s
bench: maxrss=28768kB

bench: utime=5.295s stime=0.314s rtime=5.062s
bench: maxrss=28756kB

###### File sizes

=== LTO ===

ls -l work/stage/usr/local/bin/ff*
-rwxr-xr-x  1 root wheel 270928 Sep 23 15:50 work/stage/usr/local/bin/ffmpeg
-rwxr-xr-x  1 root wheel 168344 Sep 23 15:50 work/stage/usr/local/bin/ffprobe

ls -l work/stage/usr/local/lib/lib*.so*.*.*
-rwxr-xr-x  1 root wheel 16656888 Sep 23 15:50 work/stage/usr/local/lib/libavcodec.so.60.3.100
-rwxr-xr-x  1 root wheel    27784 Sep 23 15:50 work/stage/usr/local/lib/libavdevice.so.60.1.100
-rwxr-xr-x  1 root wheel  5668272 Sep 23 15:50 work/stage/usr/local/lib/libavfilter.so.9.3.100
-rwxr-xr-x  1 root wheel  3308024 Sep 23 15:50 work/stage/usr/local/lib/libavformat.so.60.3.100
-rwxr-xr-x  1 root wheel   880224 Sep 23 15:50 work/stage/usr/local/lib/libavutil.so.58.2.100
-rwxr-xr-x  1 root wheel    67104 Sep 23 15:50 work/stage/usr/local/lib/libpostproc.so.57.1.100
-rwxr-xr-x  1 root wheel   126760 Sep 23 15:50 work/stage/usr/local/lib/libswresample.so.4.10.100
-rwxr-xr-x  1 root wheel  1449664 Sep 23 15:50 work/stage/usr/local/lib/libswscale.so.7.1.100

=== Plain ===

ls -l work/stage/usr/local/bin/ff*
-rwxr-xr-x  1 root wheel 279688 Sep 23 15:46 work/stage/usr/local/bin/ffmpeg
-rwxr-xr-x  1 root wheel 183200 Sep 23 15:46 work/stage/usr/local/bin/ffprobe

ls -l work/stage/usr/local/lib/lib*.so*.*.*
-rwxr-xr-x  1 root wheel 16418296 Sep 23 15:46 work/stage/usr/local/lib/libavcodec.so.60.3.100
-rwxr-xr-x  1 root wheel    28112 Sep 23 15:46 work/stage/usr/local/lib/libavdevice.so.60.1.100
-rwxr-xr-x  1 root wheel  5527904 Sep 23 15:46 work/stage/usr/local/lib/libavfilter.so.9.3.100
-rwxr-xr-x  1 root wheel  2622664 Sep 23 15:46 work/stage/usr/local/lib/libavformat.so.60.3.100
-rwxr-xr-x  1 root wheel   839488 Sep 23 15:46 work/stage/usr/local/lib/libavutil.so.58.2.100
-rwxr-xr-x  1 root wheel    67536 Sep 23 15:46 work/stage/usr/local/lib/libpostproc.so.57.1.100
-rwxr-xr-x  1 root wheel   127464 Sep 23 15:46 work/stage/usr/local/lib/libswresample.so.4.10.100
-rwxr-xr-x  1 root wheel  1483240 Sep 23 15:46 work/stage/usr/local/lib/libswscale.so.7.1.100
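The two `ls -l` listings above are easier to compare as totals. A small Python sketch, with the byte counts copied from those listings (the dictionary layout and the helper name pct_change are illustrative only), computes the per-file and overall size change of the LTO build relative to the plain one:

```python
# Sizes in bytes, copied from the `ls -l` output of the staged files above.
LTO = {
    "ffmpeg": 270928, "ffprobe": 168344,
    "libavcodec": 16656888, "libavdevice": 27784, "libavfilter": 5668272,
    "libavformat": 3308024, "libavutil": 880224, "libpostproc": 67104,
    "libswresample": 126760, "libswscale": 1449664,
}
PLAIN = {
    "ffmpeg": 279688, "ffprobe": 183200,
    "libavcodec": 16418296, "libavdevice": 28112, "libavfilter": 5527904,
    "libavformat": 2622664, "libavutil": 839488, "libpostproc": 67536,
    "libswresample": 127464, "libswscale": 1483240,
}


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


total_plain = sum(PLAIN.values())
total_lto = sum(LTO.values())
total_pct = pct_change(total_lto, total_plain)

for name in PLAIN:
    print(f"{name:<14} {PLAIN[name]:>10} {LTO[name]:>10} "
          f"{pct_change(LTO[name], PLAIN[name]):+7.2f}%")
print(f"{'total':<14} {total_plain:>10} {total_lto:>10} {total_pct:+7.2f}%")
```

With these numbers the staged files grow by about 3.8% in total under LTO (27,577,592 to 28,623,992 bytes). libavformat grows the most in relative terms (about +26%), while the ffmpeg and ffprobe executables and four of the libraries get smaller.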
Comment 4 Olivier Cochard freebsd_committer freebsd_triage 2024-09-25 17:40:33 UTC
(In reply to Daniel Engberg from comment #3)

Interpreting some of the results with ministat, because the raw results aren't easy to read.
All values here are utime, in seconds.

ministat -w 74 -s plain.Decode.MPEG-2.bwdif lto.Decode.MPEG-2.bwdif
x plain.Decode.MPEG-2.bwdif
+ lto.Decode.MPEG-2.bwdif
+--------------------------------------------------------------------------+
| +           +                  +                    x           x      x |
|                                                      |________A_M_______||
||____________M_A_______________|                                          |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3      1032.744       1035.13      1034.172     1034.0153     1.2006904
+   3      1026.165      1030.029      1027.714     1027.9693     1.9446132
Difference at 95.0% confidence
        -6.046 +/- 3.66291
        -0.584711% +/- 0.353671%
        (Student's t, pooled s = 1.61604)
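ministat's "Difference at 95.0% confidence" verdict comes from a two-sample Student's t-test with a pooled standard deviation. A minimal Python sketch of that test, assuming a two-sided 95% level and the small table of t critical values below (the function name compare is hypothetical), reproduces the MPEG-2 figures above: a difference of -6.046 s, a margin of about 3.663 s, -0.5847% and pooled s of about 1.61604. The ± percentage that ministat prints is derived with a separate relative-error formula, which this sketch does not reproduce.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom
# (standard table values; small df are enough for 3-vs-3 runs).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776,
       5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306}


def compare(baseline, candidate):
    """Pooled two-sample Student's t comparison of candidate vs. baseline.

    Returns (difference, margin, percent_difference, pooled_sd, significant).
    """
    n1, n2 = len(baseline), len(candidate)
    df = n1 + n2 - 2
    # statistics.variance() is the sample (n - 1) variance.
    pooled_var = ((n1 - 1) * statistics.variance(baseline)
                  + (n2 - 1) * statistics.variance(candidate)) / df
    pooled_sd = math.sqrt(pooled_var)
    diff = statistics.mean(candidate) - statistics.mean(baseline)
    margin = T95[df] * pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    pct = 100 * diff / statistics.mean(baseline)
    return diff, margin, pct, pooled_sd, abs(diff) > margin


# utime values (seconds) from comment #3.
plain_mpeg2 = [1032.744, 1034.172, 1035.130]
lto_mpeg2 = [1030.029, 1027.714, 1026.165]
plain_h264 = [845.918, 844.163, 850.070]
lto_h264 = [844.833, 847.122, 842.610]

for name, plain, lto in [("MPEG-2 + bwdif", plain_mpeg2, lto_mpeg2),
                         ("H264", plain_h264, lto_h264)]:
    diff, margin, pct, pooled_sd, significant = compare(plain, lto)
    verdict = "difference" if significant else "no difference proven"
    print(f"{name}: {diff:+.3f} +/- {margin:.3f} s ({pct:+.3f}%), "
          f"pooled s = {pooled_sd:.5f}: {verdict}")
```

A difference counts only when its magnitude exceeds the margin. For the H264 runs the difference is about -1.86 s against a margin of about 6.06 s, which matches ministat's "No difference proven" verdict below.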

ministat -w 74 -s plain.Decode.MPEG-4 lto.Decode.MPEG-4
x plain.Decode.MPEG-4
+ lto.Decode.MPEG-4
+--------------------------------------------------------------------------+
| +     +           +                     x        x                      x|
|                                      |___________M____A_______________|  |
||______M_A________|                                                       |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3        13.248        13.377        13.286     13.303667   0.066289768
+   3         13.09        13.161        13.116     13.122333    0.03592121
Difference at 95.0% confidence
        -0.181333 +/- 0.12084
        -1.36303% +/- 0.898767%
        (Student's t, pooled s = 0.0533135)

ministat -w 74 -s plain.Decode.H264 lto.Decode.H264
x plain.Decode.H264
+ lto.Decode.H264
+--------------------------------------------------------------------------+
|+              x      +         x           +                            x|
|           |____________________M_______A_____________________________|   |
||_____________________A_____________________|                             |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3       844.163        850.07       845.918       846.717     3.0334738
+   3        842.61       847.122       844.833       844.855     2.2560805
No difference proven at 95.0% confidence

ministat -w 74 -s plain.Decode.HEVC lto.Decode.HEVC
x plain.Decode.HEVC
+ lto.Decode.HEVC
+--------------------------------------------------------------------------+
|+         +           +                                  x        x     x |
|                                                         |_______AM______||
||_________MA__________|                                                   |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3      1157.997      1159.971      1159.178     1159.0487    0.99333496
+   3      1150.939      1153.633      1152.164     1152.2453     1.3488404
Difference at 95.0% confidence
        -6.80333 +/- 2.68478
        -0.586976% +/- 0.23116%
        (Student's t, pooled s = 1.1845)

ministat -w 74 -s plain.audio.ReplayGain lto.audio.ReplayGain
x plain.audio.ReplayGain
+ lto.audio.ReplayGain
+--------------------------------------------------------------------------+
|+             x                 x         x          +         +          |
|               |_____________A__M__________|                              |
|     |_________________________________A_____________M___________________||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3         5.295         5.442         5.393     5.3766667   0.074848736
+   3         5.222         5.558         5.504         5.428    0.18043281


So, yes: ship it!
Comment 5 commit-hook freebsd_committer freebsd_triage 2024-09-27 18:55:41 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=4042f2b4959ffeb882c5988f20ea99e74136a757

commit 4042f2b4959ffeb882c5988f20ea99e74136a757
Author:     Daniel Engberg <diizzy@FreeBSD.org>
AuthorDate: 2023-06-03 07:37:44 +0000
Commit:     Jan Beich <jbeich@FreeBSD.org>
CommitDate: 2024-09-27 18:52:46 +0000

    multimedia/ffmpeg: enable LTO by default on aarch64 and amd64

    6% faster decode at least with HEVC and MPEG-2 (HDTV) + -vf bwdif

    PR:             271798
    Inspired by:    Alpine, Arch, Chimera, CRUX, Fedora, OpenMandriva, Solus
    Reviewed by:    olivier

 multimedia/ffmpeg/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
Comment 6 Daniel Engberg freebsd_committer freebsd_triage 2024-09-27 21:53:51 UTC
Thanks!