| Summary: | network locks up with IPv6 udp traffic | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | Dmitry Sivachenko <demon> | ||||
| Component: | kern | Assignee: | Andrey V. Elsukov <ae> | ||||
| Status: | Closed FIXED | ||||||
| Severity: | Affects Only Me | CC: | adrian, ae, jch, net, rwatson | ||||
| Priority: | --- | ||||||
| Version: | 10.0-STABLE | ||||||
| Hardware: | Any | ||||||
| OS: | Any | ||||||
| Attachments: |
|
||||||
|
Description
Dmitry Sivachenko
2015-01-24 23:16:23 UTC
For us this is rather severe problem (it take about 10 seconds to leave machine without working network). If these LORs are not enough to debug this issue, I am more than willing to provide any necessary info, please ask. Some notes form an e-mail from myself to Andrey about this problem: Basically, the general rule, with respect to lock order, is that the network-stack input path can call the output path (e.g., inbound TCP segment triggers an immediate send of a TCP ACK), but you can't directly call in the other direction or you would violate the lock order. In this sort of situation, the output path needs to reinject packets via the netisr, rather than directly invoking the input path. This is how we handle, for example, routing-socket packets triggered by send events -- they are enqueued to the netisr for processing asynchronously, providing a context where transmit-pathj locks can be safely acquired. (Basically, pfctlinput() is never safe to call from the transmit path.) A further note on the problem: A good question is whether the current behaviour actually makes sense: do we really need to notify all sockets of a change in MTU discovered by one socket on transmit? Or can we just let the others sockets discover the change on demand as they next try to transmit? (I don't take a strong view on the answer, except to point out that it would be simpler if, as in IPv4, we didn't try to notify all sockets of the event.) Created attachment 153179 [details] On output path send IPV6_PATHMTU ancillary data only to the socket, that had initiated an error (In reply to Robert Watson from comment #3) > A further note on the problem: > > A good question is whether the current behaviour actually makes sense: do we > really need to notify all sockets of a change in MTU discovered by one > socket on transmit? Or can we just let the others sockets discover the > change on demand as they next try to transmit? > > (I don't take a strong view on the answer, except to point out that it would > be simpler if, as in IPv4, we didn't try to notify all sockets of the event.) I think this was implemented according to what RFC3542 says (p. 11.3):" Note that this also means an application that sets the option may receive an IPV6_MTU ancillary data item for each ICMP too big error the node receives, including such ICMP errors caused by other applications on the node." But this doesn't mean we should send these ancillary data, when message size exceeds link MTU. So, I propose the following patch for testing I can confirm this patch fixes my problem. A commit references this bug: Author: ae Date: Wed Mar 4 11:20:03 UTC 2015 New revision: 279588 URL: https://svnweb.freebsd.org/changeset/base/279588 Log: Fix deadlock in IPv6 PCB code. When several threads are trying to send datagram to the same destination, but fragmentation is disabled and datagram size exceeds link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all sockets wanted to know MTU to this destination. And since all threads hold PCB lock while sending, taking the lock for each PCB in the in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notify all application wanted to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this, when we don't receive ICMPv6 message. Change ip6_notify_pmtu() function to be able use it directly from ip6_output() to notify only one socket, and to notify all sockets when ICMPv6 packet too big message received. PR: 197059 Differential Revision: https://reviews.freebsd.org/D1949 Reviewed by: no objection from #network Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC Changes: head/sys/netinet6/in6_pcb.c head/sys/netinet6/ip6_input.c head/sys/netinet6/ip6_output.c head/sys/netinet6/ip6_var.h A commit references this bug: Author: ae Date: Thu Mar 12 09:04:21 UTC 2015 New revision: 279911 URL: https://svnweb.freebsd.org/changeset/base/279911 Log: MFC r279588: Fix deadlock in IPv6 PCB code. When several threads are trying to send datagram to the same destination, but fragmentation is disabled and datagram size exceeds link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all sockets wanted to know MTU to this destination. And since all threads hold PCB lock while sending, taking the lock for each PCB in the in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notify all application wanted to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this, when we don't receive ICMPv6 message. Change ip6_notify_pmtu() function to be able use it directly from ip6_output() to notify only one socket, and to notify all sockets when ICMPv6 packet too big message received. MFC r279684: tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify(). Check cmdarg isn't NULL before dereference, this check was in the ip6_notify_pmtu() before r279588. PR: 197059 Sponsored by: Yandex LLC Changes: _U stable/10/ stable/10/sys/netinet6/in6_pcb.c stable/10/sys/netinet6/ip6_input.c stable/10/sys/netinet6/ip6_output.c stable/10/sys/netinet6/ip6_var.h A commit references this bug: Author: ae Date: Thu Mar 12 09:16:52 UTC 2015 New revision: 279912 URL: https://svnweb.freebsd.org/changeset/base/279912 Log: MFC r279588: Fix deadlock in IPv6 PCB code. When several threads are trying to send datagram to the same destination, but fragmentation is disabled and datagram size exceeds link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all sockets wanted to know MTU to this destination. And since all threads hold PCB lock while sending, taking the lock for each PCB in the in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notify all application wanted to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this, when we don't receive ICMPv6 message. Change ip6_notify_pmtu() function to be able use it directly from ip6_output() to notify only one socket, and to notify all sockets when ICMPv6 packet too big message received. MFC r279684: tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify(). Check cmdarg isn't NULL before dereference, this check was in the ip6_notify_pmtu() before r279588. PR: 197059 Sponsored by: Yandex LLC Changes: _U stable/9/sys/ stable/9/sys/netinet6/in6_pcb.c stable/9/sys/netinet6/ip6_input.c stable/9/sys/netinet6/ip6_output.c stable/9/sys/netinet6/ip6_var.h |