mlx4 NIC resets when coming back from S3 sleep. dhcpcd does not handle this well. Apparently its reaction to getting a 'Device not configured' error from ps_bpf_recvbpf is to just log and continue like nothing happened? So the end result is, both dhcpcd and syslogd use a whole CPU core each, purely on spamming 'Device not configured' log messages :( Oct 23 13:54:31 ruunvald acpi[6066]: resumed at 20201023 13:54:31 [..snip: USB devices..] Oct 23 13:54:52 ruunvald dhcpcd[5451]: mlxen0: no IPv6 Routers available Oct 23 13:55:21 ruunvald kernel: [47193] mlx4_core0: command 0x49 timed out (go bit not cleared) Oct 23 13:55:21 ruunvald kernel: [47193] mlx4_core0: device is going to be reset Oct 23 13:55:22 ruunvald kernel: [47194] mlx4_core0: device was reset successfully Oct 23 13:55:22 ruunvald kernel: [47194] mlx4_en mlx4_core0: Internal error detected, restarting device Oct 23 13:55:25 ruunvald kernel: [47196] mlx4_core0: Internal error mark was detected on device Oct 23 13:55:25 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:25 ruunvald syslogd: last message repeated 1340 times Oct 23 13:55:25 ruunvald avahi-daemon[47574]: IPV6_DROP_MEMBERSHIP failed: Can't assign requested address Oct 23 13:55:25 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:25 ruunvald syslogd: last message repeated 2172 times Oct 23 13:55:25 ruunvald dhcpcd[5451]: mlxen0: if_getmtu: Device not configured Oct 23 13:55:25 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:25 ruunvald syslogd: last message repeated 396 times Oct 23 13:55:25 ruunvald dhcpcd[5451]: ipv6nd_applyra: Device not configured Oct 23 13:55:25 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:28 ruunvald syslogd: last message repeated 486197 times Oct 23 13:55:28 ruunvald kernel: [47200] mlx4_core0: Old device ETS support detected Oct 23 13:55:28 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:28 ruunvald syslogd: last message repeated 63 times Oct 23 13:55:28 ruunvald kernel: [47200] mlx4_core0: Consider upgrading device FW. Oct 23 13:55:28 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:29 ruunvald syslogd: last message repeated 208939 times Oct 23 13:55:29 ruunvald kernel: [47201] mlx4_core0: Unable to determine PCI device chain minimum BW Oct 23 13:55:29 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:29 ruunvald syslogd: last message repeated 11613 times Oct 23 13:55:29 ruunvald kernel: [47202] mlx4_en mlx4_core0: Activating port:1 Oct 23 13:55:29 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:29 ruunvald syslogd: last message repeated 516 times Oct 23 13:55:29 ruunvald kernel: [47202] mlxen0: Ethernet address: 00:02:c9:[REDACTED] Oct 23 13:55:29 ruunvald kernel: [47202] <4>mlx4_en: mlx4_core0: Port 1: Using 16 TX rings Oct 23 13:55:29 ruunvald kernel: [47202] mlxen0: link state changed to DOWN Oct 23 13:55:29 ruunvald kernel: [47202] <4>mlx4_en: mlx4_core0: Port 1: Using 16 RX rings Oct 23 13:55:29 ruunvald kernel: [47202] <4>mlx4_en: mlxen0: Using 16 TX rings Oct 23 13:55:29 ruunvald kernel: [47202] <4>mlx4_en: mlxen0: Using 16 RX rings Oct 23 13:55:29 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:29 ruunvald syslogd: last message repeated 49 times Oct 23 13:55:29 ruunvald kernel: [47202] <4>mlx4_en: mlxen0: Initializing port Oct 23 13:55:29 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:29 ruunvald syslogd: last message repeated 415 times Oct 23 13:55:29 ruunvald kernel: [47202] mlx4_core0: mlx4_restart_one was ended, ret=0 Oct 23 13:55:29 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:32 ruunvald syslogd: last message repeated 422257 times Oct 23 13:55:32 ruunvald kernel: [47204] <6>mlx4_en: mlxen0: Link Up Oct 23 13:55:32 ruunvald kernel: [47204] mlxen0: link state changed to UP Oct 23 13:55:32 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:52 ruunvald syslogd: last message repeated 3708975 times Oct 23 13:55:52 ruunvald kernel: nfs server fdc0:ffee::1:/solitude<118>[47225] ps_bpf_recvbpf: Device not configured Oct 23 13:55:52 ruunvald kernel: -bolt/greg: not responding Oct 23 13:55:52 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:55:52 ruunvald syslogd: last message repeated 69 times [^^nice^^] Oct 23 13:55:52 ruunvald kernel: nfs server fdc0:ffee::1:/solitude-bolt/greg: is a<118>[47225] ps_bpf_recvbpf: Device not configured Oct 23 13:55:52 ruunvald kernel: live again [^^nfs is up means network is okay now, but dhcpcd does not recover^^] Oct 23 13:55:52 ruunvald dhcpcd[77694]: ps_bpf_recvbpf: Device not configured Oct 23 13:56:22 ruunvald syslogd: last message repeated 5428175 times Oct 23 13:58:23 ruunvald syslogd: last message repeated 21508608 times Oct 23 14:08:24 ruunvald syslogd: last message repeated 107135631 times Oct 23 14:18:25 ruunvald syslogd: last message repeated 108124940 times Oct 23 14:23:38 ruunvald syslogd: last message repeated 56390531 times
Created attachment 219054 [details] Remove interface on ENXIO Pretty hard for me to test this sadly. It looks like your interface does a hardware reset and this confuses dhcpcd. Hopefully the attached patch helps. When this happens, the interface is clearly not available by name because if_getmtu fails. So it won't be reported by ifconfig(8). This at least will stop the constant log spam. However if dhcpcd is started for a single interface it will exit. You will need to run dhcpcd on all interfaces (ie dhcpcd_enable=YES in rc.conf rather than ifconfig_foo=DHCP). If dhcpcd does not recover then it's either that the kernel does not send RTM_IFANNOUNCE messages for this which is very bad or they got lost as route(4) overflowed. I can't really help for the former, but for the latter there is a kernel patch here so that dhcpcd can spot route(4) overflows and recover: https://reviews.freebsd.org/D26652
Created attachment 219060 [details] Close BPF socket on ENXIO Actually this patch is all that's needed for dhcpcd.
Created attachment 219061 [details] Close BPF socket on ENXIO Test if BPF socket has been closed earlier if resending.
Updated to 9.3.2, which includes the fix. Looks like it's good now. Nov 6 15:00:12 ruunvald kernel: [2049] mlx4_core0: device was reset successfully Nov 6 15:00:15 ruunvald dhcpcd[57525]: mlxen0: ps_bpf_recvbpf: Device not configured Nov 6 15:00:15 ruunvald avahi-daemon[82553]: IPV6_DROP_MEMBERSHIP failed: Can't assign requested address Nov 6 15:00:15 ruunvald dhcpcd[55689]: mlxen0: if_getmtu: Device not configured Nov 6 15:00:15 ruunvald dhcpcd[55689]: ipv6nd_applyra: Device not configured [..mlx stuff..] Nov 6 15:00:21 ruunvald kernel: [2059] mlxen0: link state changed to UP and no leftover BPF ARP process spinning on cpu. Thanks!