Created attachment 263728
kyua test report

The 'sys/netpfil/pf/route_to:prefer_ipv6_nexthop_mixed_af_random_table_ipv4' test case fails intermittently in CI:

===> sys/netpfil/pf/route_to:prefer_ipv6_nexthop_mixed_af_random_table_ipv4
Result: failed: Target 2001:db8:4202::42:4 not selected after 10 attempts!

I reproduced this failure using Bricoler with 100 runs for good measure, and attached the failing test report to this bug:

$ bricoler run freebsd-src-regression-suite \
    --param freebsd-src:url=/usr/src \
    --param freebsd-src:branch= \
    --param freebsd-src-regression-suite:hypervisor=bhyve \
    --param freebsd-src-regression-suite:memory=4096 \
    --param freebsd-src-regression-suite:ncpus=2 \
    --param freebsd-src-regression-suite:parallelism=1 \
    --param freebsd-src-regression-suite:count=100 \
    --param freebsd-src-regression-suite:tests='sys/netpfil/pf/route_to:prefer_ipv6_nexthop_mixed_af_random_table_ipv4'

@ks, since you made the last significant edit to these tests in https://cgit.freebsd.org/src/commit/?id=65c318630123fcf2b6f491bf4d02a5cad3031d20 (pf: Add prefer-ipv6-nexthop option for route-to pools), please triage as necessary.
The problem with this test, and a few similar ones, is that they test pf's random load-balancing selection. The test uses a route-to action with lists of good and bad nexthops. For each good nexthop, it attempts a TCP connection up to 10 times and checks whether that nexthop was selected. Since it's testing the *random* algorithm, it can occasionally fail. Even if I raise the number of connections, it could still fail; that only makes failure less likely, not impossible. Any ideas on how to test a random algorithm better?
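To put a rough number on "it can occasionally fail", here is a small Python sketch. It assumes, purely for illustration, that pf picks each of `k` pool nexthops uniformly and independently per connection, and that every checked target gets its own series of attempts as described above. The real pool size and how the bad nexthops factor in are not stated in this thread, and `miss_probability` / `run_failure_probability` are hypothetical helpers, not part of the test suite.

```python
#!/usr/bin/env python3
"""Chance that a random-pool test misses a target purely by bad luck.

Hypothetical model (not taken from the pf tests): each connection picks one
of k nexthops uniformly and independently, and each checked target gets its
own series of `attempts` connections.
"""
import math
from typing import Optional


def miss_probability(k: int, attempts: int) -> float:
    """Probability that one specific nexthop is never picked in `attempts` tries."""
    if k < 1 or attempts < 0:
        raise ValueError("need k >= 1 and attempts >= 0")
    return (1.0 - 1.0 / k) ** attempts


def run_failure_probability(k: int, attempts: int, checked: Optional[int] = None) -> float:
    """Probability that at least one of `checked` targets (default: all k) is missed."""
    targets = k if checked is None else checked
    p_miss = miss_probability(k, attempts)
    if p_miss >= 1.0:
        return 1.0
    # 1 - (1 - p_miss)**targets, via log1p/expm1 so tiny values do not round to 0.
    return -math.expm1(targets * math.log1p(-p_miss))


if __name__ == "__main__":
    # Positive for any finite number of attempts: more connections shrink
    # the chance of a spurious failure but never make it zero.
    for attempts in (10, 20, 100):
        print(attempts, run_failure_probability(2, attempts))
```

Under this model, a two-nexthop pool with 10 attempts per target misses a given target with probability 1/1024, so a test run fails spuriously roughly twice per thousand runs.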
(In reply to Kajetan Staszkiewicz from comment #1) I'm afraid I can't think of anything better than 'ensure the sample size is large enough to make failure very unlikely'.
These tests are continuing to fail in CI (https://ci.freebsd.org/view/Test/job/FreeBSD-main-amd64-test/26918/), so if there is no other way to deterministically seed the random algorithm, then please go ahead and increase the sample size to something like 100 instead.
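Under the same hypothetical uniform-selection model (pool size `k`, all `k` targets checked), the attempt count can be derived from an acceptable spurious-failure rate instead of being guessed. This sketch searches for the smallest number of attempts whose chance-failure probability is at most `epsilon`. `attempts_needed`, the pool sizes and `epsilon` are illustrative assumptions, not values from the pf test suite.

```python
#!/usr/bin/env python3
"""Pick the attempt count for a random-pool test from a target failure rate.

Hypothetical model: each connection picks one of k nexthops uniformly and
independently; each checked target gets its own series of attempts.
"""
import math
from typing import Optional


def run_failure_probability(k: int, attempts: int, checked: Optional[int] = None) -> float:
    """P(at least one checked target is never picked) under uniform selection."""
    targets = k if checked is None else checked
    p_miss = (1.0 - 1.0 / k) ** attempts
    if p_miss >= 1.0:
        return 1.0
    # 1 - (1 - p_miss)**targets, computed accurately for tiny p_miss.
    return -math.expm1(targets * math.log1p(-p_miss))


def attempts_needed(k: int, epsilon: float, checked: Optional[int] = None) -> int:
    """Smallest attempt count whose chance-failure probability is <= epsilon."""
    if k < 1 or not 0.0 < epsilon < 1.0:
        raise ValueError("need k >= 1 and 0 < epsilon < 1")
    attempts = 0
    # Terminates: the failure probability shrinks geometrically with attempts.
    while run_failure_probability(k, attempts, checked) > epsilon:
        attempts += 1
    return attempts


if __name__ == "__main__":
    # Hypothetical pool sizes, aiming for one spurious failure per 10^9 runs.
    for k in (2, 3, 4):
        print(k, attempts_needed(k, 1e-9))
```

The required count grows only logarithmically in 1/epsilon, so even a one-in-a-billion target stays below the suggested 100 attempts for pools of up to four nexthops under this model. Larger pools need proportionally more.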
Any updates on this? This is one of the few remaining failing tests in CI. Can we go ahead and increase the sample size as per Comment 2?
Ping on this bug: can we get confirmation on Comment 2? These tests are still intermittently failing in CI: https://ci.freebsd.org/view/Test/job/FreeBSD-main-amd64-test/27448/testReport/sys.netpfil.pf/route_to/random_table/
(In reply to Siva Mahadevan from comment #5) I'd just leave the test as is and wait for https://reviews.freebsd.org/D54105 to land. It's an obvious candidate for that annotation, as are a number of the dummynet tests. Anything probabilistic is going to have this problem, and Igor has a more systematic solution to it.