Bug 258339

Summary: ports-mgmt/poudriere: Poudriere host loses network connectivity during bulk run
Product: Ports & Packages
Component: Individual Port(s)
Version: Latest
Hardware: amd64
OS: Any
Status: New ---
Severity: Affects Only Me
Priority: ---
Reporter: Trix Farrar <trix>
Assignee: Bryan Drewery <bdrewery>
CC: grahamperrin
Flags: bugzilla: maintainer-feedback? (bdrewery)

Description Trix Farrar 2021-09-07 14:00:04 UTC
Overview:  
OS: FreeBSD 13.0-RELEASE-p4 amd64 (root@amd64-builder.daemonology.net) with local ZFS filesystems (root on ZFS)
Poudriere: 3.3.7 (built from Ports)
Hardware: HP Pavilion Desktop 590-p0xxx
  CPU: AMD Ryzen 5 2400G (8) @ 3.593GHz
  RAM: 12GB
    $ grep memory /var/run/dmesg.boot
    real memory  = 12884901888 (12288 MB)
    avail memory = 11296428032 (10773 MB)
    $ 
  NIC: re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet>

/usr/local/etc/poudriere.conf contains NO_ZFS=yes and BASEFS=/opt/poudriere.
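For reference, a minimal sketch of the relevant part of /usr/local/etc/poudriere.conf, which poudriere reads as sh(1) variable assignments. Only the two settings named above are shown; every other setting is assumed to be at its default.

```shell
# /usr/local/etc/poudriere.conf (excerpt; other settings assumed default)

# Do not create ZFS datasets for jails and ports trees; use plain directories.
NO_ZFS=yes

# Base directory for jails, ports trees and build data.
# In this setup /opt/poudriere is itself an NFS mount.
BASEFS=/opt/poudriere
```

With NO_ZFS=yes, everything poudriere writes under BASEFS during a bulk run goes over NFS, so build I/O and network load rise together.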

/opt/poudriere is an NFS mount (from TrueNAS) with options "rw,hard,nfsv3,tcp" over IPv4.

At a seemingly random point in the bulk run, the OS reports that the NFS mount has timed out, and this message repeats. Console messages show that re0's watchdog has timed out and that the interface has gone down and come back up. This sequence then repeats.

ifconfig(8) reports IPv4 and IPv6 addresses. The DHCP address is a reservation and should not time out. Pings sent to the local gateway (or to any address, really) report "no route to host", even though 'netstat -rn4' output appears normal.

The only fix appears to be a power cycle. 'shutdown -r now' eventually terminates due to a timeout after signalling all processes.

The closest I've come to pinpointing the failure is that it appears to coincide with large, memory-intensive port builds, such as multiple C compilers (llvm _and_ gcc) building at the same time.