Created attachment 232471 [details] Patch that works around the problem As discussed on the freebsd-net mailing list[0]. Also affects CURRENT. When running on multicore systems, epair interfaces stop forwarding traffic even on moderate load and don't recover unless recreated. This is a critical problem, as it breaks vnet jails running non-trivial workloads. The problem can be reproduced easily using a shell script[1]. This was introduced when adding multi-core improvements to epair[2]. It happens because work is scheduled in taskqueue(s) based on a check if mbuf ring buffers are empty, a logic which is racy on multi-core systems. The race is happening between epair_menq() and epair_tx_start_deferred(). The patch attached to this PR addresses the problem, but it needs to be looked at, profiled, and most likely improved by somebody who has a better understanding of both the code in question and writing lock free-code in general. [0]https://lists.freebsd.org/archives/freebsd-net/2022-March/001449.html [1]https://people.freebsd.org/~grembo/hang_epair.sh [2]https://cgit.freebsd.org/src/commit/?id=24f0bfbad57b9c3cb9b543a60b2ba00e4812c286
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=66acf7685bcd8cf23b6c658a991637238a01859e commit 66acf7685bcd8cf23b6c658a991637238a01859e Author: Michael Gmelin <grembo@FreeBSD.org> AuthorDate: 2022-03-16 22:08:55 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2022-03-16 22:08:55 +0000 if_epair: fix race condition on multi-core systems As an unwanted side effect of the performance improvements in 24f0bfbad57b9, epair interfaces stop forwarding traffic on higher load levels when running on multi-core systems. This happens due to a race condition in the logic that decides when to place work in the task queue(s) responsible for processing the content of ring buffers. In order to fix this, a field named state is added to the epair_queue structure. This field is used by the affected functions to signal each other that something happened in the underlying ring buffers that might require work to be scheduled in task queue(s), replacing the existing logic, which relied on checking if ring buffers are empty or not. epair_menq() does: - set BIT_MBUF_QUEUED - queue mbuf - if testandset BIT_QUEUE_TASK: enqueue task epair_tx_start_deferred() does: - swap ring buffers - process mbufs - clear BIT_QUEUE_TASK - if testandclear BIT_MBUF_QUEUED enqueue task PR: 262571 Reported by: Johan Hendriks <joh.hendriks@gmail.com> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34569 sys/net/if_epair.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=bb9ad300f029d57c84804702daa2652542a2535f commit bb9ad300f029d57c84804702daa2652542a2535f Author: Michael Gmelin <grembo@FreeBSD.org> AuthorDate: 2022-03-16 22:08:55 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2022-03-16 23:38:33 +0000 if_epair: fix race condition on multi-core systems As an unwanted side effect of the performance improvements in 24f0bfbad57b9, epair interfaces stop forwarding traffic on higher load levels when running on multi-core systems. This happens due to a race condition in the logic that decides when to place work in the task queue(s) responsible for processing the content of ring buffers. In order to fix this, a field named state is added to the epair_queue structure. This field is used by the affected functions to signal each other that something happened in the underlying ring buffers that might require work to be scheduled in task queue(s), replacing the existing logic, which relied on checking if ring buffers are empty or not. epair_menq() does: - set BIT_MBUF_QUEUED - queue mbuf - if testandset BIT_QUEUE_TASK: enqueue task epair_tx_start_deferred() does: - swap ring buffers - process mbufs - clear BIT_QUEUE_TASK - if testandclear BIT_MBUF_QUEUED enqueue task PR: 262571 Approved by: re (gjb, early MFC) Reported by: Johan Hendriks <joh.hendriks@gmail.com> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34569 (cherry picked from commit 66acf7685bcd8cf23b6c658a991637238a01859e) sys/net/if_epair.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
A commit in branch releng/13.1 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=1ddb6c5518e02953f3fe37d4da27a15ebd7a5a12 commit 1ddb6c5518e02953f3fe37d4da27a15ebd7a5a12 Author: Michael Gmelin <grembo@FreeBSD.org> AuthorDate: 2022-03-16 22:08:55 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2022-03-16 23:40:40 +0000 if_epair: fix race condition on multi-core systems As an unwanted side effect of the performance improvements in 24f0bfbad57b9, epair interfaces stop forwarding traffic on higher load levels when running on multi-core systems. This happens due to a race condition in the logic that decides when to place work in the task queue(s) responsible for processing the content of ring buffers. In order to fix this, a field named state is added to the epair_queue structure. This field is used by the affected functions to signal each other that something happened in the underlying ring buffers that might require work to be scheduled in task queue(s), replacing the existing logic, which relied on checking if ring buffers are empty or not. epair_menq() does: - set BIT_MBUF_QUEUED - queue mbuf - if testandset BIT_QUEUE_TASK: enqueue task epair_tx_start_deferred() does: - swap ring buffers - process mbufs - clear BIT_QUEUE_TASK - if testandclear BIT_MBUF_QUEUED enqueue task PR: 262571 Approved by: re (gjb, early MFC) Reported by: Johan Hendriks <joh.hendriks@gmail.com> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34569 (cherry picked from commit 66acf7685bcd8cf23b6c658a991637238a01859e) (cherry picked from commit bb9ad300f029d57c84804702daa2652542a2535f) sys/net/if_epair.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0bf7acd6b7047537a38e2de391a461e4e8956630 commit 0bf7acd6b7047537a38e2de391a461e4e8956630 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2022-03-17 02:35:13 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2022-03-17 05:43:47 +0000 if_epair: build fix 66acf7685b failed to build on riscv (and mips). This is because the atomic_testandset_int() (and friends) functions do not exist there. Happily those platforms do have the long variant, so switch to that. PR: 262571 MFC after: 3 days sys/net/if_epair.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f6138d93b5115ff560b24200d1ea002cdc46bb64 commit f6138d93b5115ff560b24200d1ea002cdc46bb64 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2022-03-17 02:35:13 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2022-03-20 00:25:03 +0000 if_epair: build fix 66acf7685b failed to build on riscv (and mips). This is because the atomic_testandset_int() (and friends) functions do not exist there. Happily those platforms do have the long variant, so switch to that. PR: 262571 MFC after: 3 days (cherry picked from commit 0bf7acd6b7047537a38e2de391a461e4e8956630) sys/net/if_epair.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=fb3644ab2afe777fdd2539bc996a390443f052f1 commit fb3644ab2afe777fdd2539bc996a390443f052f1 Author: Michael Gmelin <grembo@FreeBSD.org> AuthorDate: 2022-03-16 22:08:55 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2022-03-20 00:24:51 +0000 if_epair: fix race condition on multi-core systems As an unwanted side effect of the performance improvements in 24f0bfbad57b9, epair interfaces stop forwarding traffic on higher load levels when running on multi-core systems. This happens due to a race condition in the logic that decides when to place work in the task queue(s) responsible for processing the content of ring buffers. In order to fix this, a field named state is added to the epair_queue structure. This field is used by the affected functions to signal each other that something happened in the underlying ring buffers that might require work to be scheduled in task queue(s), replacing the existing logic, which relied on checking if ring buffers are empty or not. epair_menq() does: - set BIT_MBUF_QUEUED - queue mbuf - if testandset BIT_QUEUE_TASK: enqueue task epair_tx_start_deferred() does: - swap ring buffers - process mbufs - clear BIT_QUEUE_TASK - if testandclear BIT_MBUF_QUEUED enqueue task PR: 262571 Reported by: Johan Hendriks <joh.hendriks@gmail.com> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34569 (cherry picked from commit 66acf7685bcd8cf23b6c658a991637238a01859e) sys/net/if_epair.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b1a3f8dccb6203036b7ee81201fd5b5a8de36f0d commit b1a3f8dccb6203036b7ee81201fd5b5a8de36f0d Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2022-03-17 02:35:13 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2022-03-20 00:25:06 +0000 if_epair: build fix 66acf7685b failed to build on riscv (and mips). This is because the atomic_testandset_int() (and friends) functions do not exist there. Happily those platforms do have the long variant, so switch to that. PR: 262571 MFC after: 3 days (cherry picked from commit 0bf7acd6b7047537a38e2de391a461e4e8956630) sys/net/if_epair.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
A commit in branch releng/13.1 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=81c1d7c9c727184dacb31df563fb42d208ccde46 commit 81c1d7c9c727184dacb31df563fb42d208ccde46 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2022-03-17 02:35:13 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2022-03-20 15:25:40 +0000 if_epair: build fix 66acf7685b failed to build on riscv (and mips). This is because the atomic_testandset_int() (and friends) functions do not exist there. Happily those platforms do have the long variant, so switch to that. PR: 262571 MFC after: 3 days Approved by: re (gjb) (cherry picked from commit 0bf7acd6b7047537a38e2de391a461e4e8956630) sys/net/if_epair.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
Can this PR be closed?
Yes.