Bug 289477 - sys/netpfil/pf/route_to:prefer_ipv6_nexthop_mixed_af_random_table_ipv4 test fails intermittently in CI
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: tests
Version: 16.0-CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Kajetan Staszkiewicz
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2025-09-11 20:11 UTC by Siva Mahadevan
Modified: 2025-12-15 15:48 UTC
CC List: 4 users

See Also:


Attachments
kyua test report (8.35 KB, text/plain)
2025-09-11 20:11 UTC, Siva Mahadevan
Description Siva Mahadevan freebsd_committer freebsd_triage 2025-09-11 20:11:16 UTC
Created attachment 263728
kyua test report

The 'sys/netpfil/pf/route_to:prefer_ipv6_nexthop_mixed_af_random_table_ipv4' testcase fails intermittently in CI:


===> sys/netpfil/pf/route_to:prefer_ipv6_nexthop_mixed_af_random_table_ipv4
Result:     failed: Target 2001:db8:4202::42:4 not selected after 10 attempts!


I have reproduced this error using Bricoler, with 100 runs for good measure, and attached the failing test report to this bug:

$ bricoler run freebsd-src-regression-suite --param freebsd-src:url=/usr/src --param freebsd-src:branch= --param freebsd-src-regression-suite:hypervisor=bhyve --param freebsd-src-regression-suite:memory=4096 --param freebsd-src-regression-suite:ncpus=2 --param freebsd-src-regression-suite:parallelism=1 --param freebsd-src-regression-suite:count=100 --param freebsd-src-regression-suite:tests='sys/netpfil/pf/route_to:prefer_ipv6_nexthop_mixed_af_random_table_ipv4'

@ks, since you made the last significant edit to the tests in https://cgit.freebsd.org/src/commit/?id=65c318630123fcf2b6f491bf4d02a5cad3031d20 (pf: Add prefer-ipv6-nexthop option for route-to pools), please triage as necessary.
Comment 1 Kajetan Staszkiewicz 2025-09-16 14:00:42 UTC
The problem with this test and a few other similar ones is that they test the random selection done by pf load balancing. The test uses a route-to action with lists of good and bad nexthops. For each good nexthop it attempts a TCP connection up to 10 times and checks whether this good nexthop was selected.

Since it's testing the *random* algorithm, it can occasionally fail. Even if I raise the number of connections being made, it could still fail. Any idea how to better test a random algorithm?
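
To put a rough number on "occasionally", here is a minimal sketch of the false-failure probability. It assumes each connection picks a nexthop uniformly at random and independently of the others, which may not match what pf actually does. The pool sizes are hypothetical, since the size of the pool used by the test is not stated in this bug. The function name miss_probability is made up for illustration.

```python
def miss_probability(pool_size: int, attempts: int) -> float:
    """Chance that one particular nexthop is never picked.

    Assumption (not confirmed by the bug report): every attempt draws
    uniformly and independently from a pool of `pool_size` nexthops.
    """
    if pool_size < 1 or attempts < 0:
        raise ValueError("pool_size must be >= 1 and attempts >= 0")
    # Each attempt misses the target with probability 1 - 1/pool_size.
    return (1.0 - 1.0 / pool_size) ** attempts


# Hypothetical pool sizes; 10 is the current attempt limit in the test,
# 100 is just a larger value for comparison.
for pool_size in (2, 4, 8):
    for attempts in (10, 100):
        p = miss_probability(pool_size, attempts)
        print(f"pool={pool_size} attempts={attempts} miss={p:.3g}")
```

Under these assumptions the miss probability shrinks geometrically with the number of attempts but never reaches zero. If the test checks several good nexthops in turn, the chance that at least one of them is missed is at most the sum of the individual miss probabilities.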
Comment 2 Kristof Provost freebsd_committer freebsd_triage 2025-09-17 12:07:22 UTC
(In reply to Kajetan Staszkiewicz from comment #1)
I'm afraid I can't think of anything better than 'ensure the sample size is large enough to make failure very unlikely'.
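
As a sketch of what "large enough" could mean, the snippet below computes the smallest number of attempts that keeps the chance of missing one target under a chosen bound. It again assumes uniform, independent selection, and the pool size and bound are hypothetical values picked for illustration. The function name attempts_needed is not taken from the test suite.

```python
import math


def attempts_needed(pool_size: int, max_miss: float) -> int:
    """Smallest attempt count with miss probability <= max_miss.

    Assumption: uniform, independent selection among `pool_size`
    nexthops, so the miss probability is (1 - 1/pool_size) ** attempts.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be >= 1")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be strictly between 0 and 1")
    if pool_size == 1:
        return 1  # a single nexthop is always selected
    per_attempt_miss = 1.0 - 1.0 / pool_size
    # Solve per_attempt_miss ** n <= max_miss for the smallest integer n.
    return math.ceil(math.log(max_miss) / math.log(per_attempt_miss))


# Hypothetical example: pool of 4 nexthops, one-in-a-million miss rate.
print(attempts_needed(4, 1e-6))
```

The required count grows only logarithmically as the bound is tightened, so a much lower false-failure rate costs relatively few extra connections.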
Comment 3 Siva Mahadevan freebsd_committer freebsd_triage 2025-10-08 14:35:44 UTC
These tests are continuing to fail in CI (https://ci.freebsd.org/view/Test/job/FreeBSD-main-amd64-test/26918/), so if there is no way to deterministically seed the random algorithm, then please go ahead and increase the sample size to something like 100 instead.
Comment 4 Siva Mahadevan freebsd_committer freebsd_triage 2025-11-04 13:28:23 UTC
Any updates on this? This is one of the only remaining failing tests in CI. Can we go ahead and increase the sample size as per Comment 2?
Comment 5 Siva Mahadevan freebsd_committer freebsd_triage 2025-12-15 15:15:56 UTC
Ping on this bug. Can we get confirmation on Comment 2? These tests are still intermittently failing in CI: https://ci.freebsd.org/view/Test/job/FreeBSD-main-amd64-test/27448/testReport/sys.netpfil.pf/route_to/random_table/
Comment 6 Kristof Provost freebsd_committer freebsd_triage 2025-12-15 15:48:18 UTC
(In reply to Siva Mahadevan from comment #5)
I'd just leave the test as is and wait for https://reviews.freebsd.org/D54105 to land. It's an obvious candidate for that annotation, as are a number of the dummynet tests. Anything probabilistic is going to have this problem, and Igor has a more systematic solution to it.