I noticed that fxp1 was producing a lot of errors. I first noticed it because the NFS clients were dropping a lot of packets, and there were long delays when pinging the servers from the clients. So I looked at the console and saw several of these errors, over and over:

fxp1: SCB timeout: 0x80 0x0 0x50 0x400

In my case, ath0 is bridged with fxp1 to form one network, so the above errors were mixed in with:

ath0: ath_reset: unable to reset hardware; hal status 03

This is the first time I've noticed this with this release, after over 60 days of uptime. Over that time I had noticed that wireless clients sometimes weren't routed correctly through the NAT router (natd(8) + ipfw(4)) even though the fxp1 clients were, but the problem was intermittent. I assume that issue was related to a bug in if_bridge(4), but that's just a guess. All I know is that it started happening with 7.0.

So, the next thing I decided to do was to bring down the bridge0 interface and see whether that would alleviate the issue (again, thinking the Ethernet problems I was seeing were exacerbated by being linked in bridge0, or by a problem with ath0). A few minutes after I downed the bridge0 interface, the kernel panicked. I have minidumps turned on, so on the next boot it was able to scavenge the dump. Here's the backtrace, as seen via kgdb(1):

> sudo kgdb /boot/kernel/kernel vmcore.0
Password:
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
Unread portion of the kernel message buffer:
ath0: device timeout

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xc49a770c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc04b569a
stack pointer = 0x28:0xe3ffebc4
frame pointer = 0x28:0xe3ffebf8
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 14 (swi4: clock sio)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 48d12h18m43s
Physical memory: 1015 MB
Dumping 197 MB: 182 166 150 134 118 102 86 70 54 38 22 6

#0 doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xc059fbd6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc059feae in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0xc08190cc in trap_fatal (frame=0xe3ffeb84, eva=3298457356) at /usr/src/sys/i386/i386/trap.c:899
#4 0xc081933b in trap_pfault (frame=0xe3ffeb84, usermode=0, eva=3298457356) at /usr/src/sys/i386/i386/trap.c:812
#5 0xc0819d32 in trap (frame=0xe3ffeb84) at /usr/src/sys/i386/i386/trap.c:490
#6 0xc080097b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc04b569a in ath_rxbuf_init (sc=0xc3bdf000, bf=0xc3be9324) at /usr/src/sys/dev/ath/if_ath.c:3284
#8 0xc04b5919 in ath_startrecv (sc=0xc3bdf000) at /usr/src/sys/dev/ath/if_ath.c:4928
#9 0xc04bce7c in ath_reset (ifp=0xc3bc8800) at /usr/src/sys/dev/ath/if_ath.c:1145
#10 0xc04bd3bb in ath_watchdog (ifp=0xc3bc8800) at /usr/src/sys/dev/ath/if_ath.c:5774
#11 0xc0630871 in if_slowtimo (arg=0x0) at /usr/src/sys/net/if.c:1478
#12 0xc05b2136 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:274
#13 0xc058242b in ithread_loop (arg=0xc3b00230) at /usr/src/sys/kern/kern_intr.c:1036
#14 0xc057f154 in fork_exit (callout=0xc0582260 <ithread_loop>, arg=0xc3b00230, frame=0xe3ffed38) at /usr/src/sys/kern/kern_fork.c:781
---Type <return> to continue, or q <return> to quit---
#15 0xc08009f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb) print panicstr
$1 = 0xc08f3e00 "page fault"

While the server was fscking everything, I disconnected the cable and rerouted it, since it was tangled with a lot of other cables and I thought the issue could have been the result of cross-talk. I restarted the server, and fxp1 has been working normally for about 5 hours since the restart, with not a single new SCB timeout error in the logs.

As always, here's the kernel configuration, sans the commented lines:

cpu I686_CPU
ident TSERVER-70
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
options SCHED_4BSD # 4BSD scheduler
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options INET6 # IPv6 communications protocols
options SCTP # Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options NFSCLIENT # Network Filesystem Client
options NFSSERVER # Network Filesystem Server
options MSDOSFS # MSDOS Filesystem
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_PART_GPT # GUID Partition Tables.
options GEOM_LABEL # Provides labelization
options COMPAT_43TTY # BSD 4.3 TTY compat [KEEP THIS!]
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE # ktrace(1) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options ADAPTIVE_GIANT # Giant mutex is adaptive.
options STOP_NMI # Stop CPUS using NMI instead of IPI
options AUDIT # Security event auditing
options SMP # Symmetric MultiProcessor Kernel
device apic # I/O APIC
device cpufreq
options IPDIVERT # divert(4)
options IPFIREWALL #firewall
options IPFIREWALL_VERBOSE #enable logging to syslogd(8)
options IPFIREWALL_DEFAULT_TO_ACCEPT #allow everything by default
options DUMMYNET # dummynet(4)
device pci
device fdc
device ata
device atadisk # ATA disk drives
device ataraid # ATA RAID drives
device atapicd # ATAPI CDROM drives
options ATA_STATIC_ID # Static device numbering
device ahc # AHA2940 and onboard AIC7xxx devices
options AHC_REG_PRETTY_PRINT # Print register bitfields in debug
# output. Adds ~128k to driver.
# output. Adds ~215k to driver.
device scbus # SCSI bus (required for SCSI)
device ch # SCSI media changers
device da # Direct Access (disks)
device sa # Sequential Access (tape etc)
device cd # CD
device pass # Passthrough device (direct SCSI access)
device ses # SCSI Environmental Services (and SAF-TE)
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse
device kbdmux # keyboard multiplexer
device vga # VGA video card driver
device splash # Splash screen and screen saver support
device sc
device agp # support several AGP chipsets
device pmtimer
device cbb # cardbus (yenta) bridge
device pccard # PC Card (16-bit) bus
device cardbus # CardBus (32-bit) bus
device sio # 8250, 16[45]50 based serial ports
device uart # Generic UART driver
device ppc
device ppbus # Parallel port bus (required)
device lpt # Printer
device plip # TCP/IP over parallel
device ppi # Parallel port interface device
device miibus # MII bus support
device fxp # Intel EtherExpress PRO/100B (82557, 82558)
device wlan # 802.11 support
device wlan_wep # 802.11 WEP support
device wlan_ccmp # 802.11 CCMP support
device wlan_tkip # 802.11 TKIP support
device wlan_amrr # AMRR transmit rate control algorithm
device wlan_scan_ap # 802.11 AP mode scanning
device wlan_scan_sta # 802.11 STA mode scanning
device ath # Atheros pci/cardbus NIC's
device ath_hal # Atheros HAL (Hardware Access Layer)
device ath_rate_sample # SampleRate tx rate control for ath
device loop # Network loopback
device random # Entropy device
device ether # Ethernet support
device sl # Kernel SLIP
device ppp # Kernel PPP
device tun # Packet tunnel.
device pty # Pseudo-ttys (telnet etc)
device md # Memory "disks"
device gif # IPv6 and IPv4 tunneling
device faith # IPv6-to-IPv4 relaying (translation)
device firmware # firmware assist module
device bpf # Berkeley packet filter
device uhci # UHCI PCI->USB interface
device ohci # OHCI PCI->USB interface
device usb # USB Bus (required)
device ugen # Generic
device uhid # "Human Interface Devices"
device ukbd # Keyboard
device ulpt # Printer
device umass # Disks/Mass storage - Requires scbus and da
device ums # Mouse
device ural # Ralink Technology RT2500USB wireless NICs
device rum # Ralink Technology RT2501USB wireless NICs

How-To-Repeat:
Unsure, unless it's something that can always be reproduced by downing the bridge0 interface, which has two members, fxp1 and ath0. Looking at the traceback, the panic seems to have been caused by ath(4), so I'm not sure that the bridge is at fault here; it may be some scenario that ath(4) doesn't handle.
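As a quick cross-check of the numbers in the first dump, here is a small sketch. It only restates values already shown above: the decimal eva argument in frames #3 and #4, the "fault virtual address" line, the instruction pointer, and the PC of frame #7. The KERNBASE value is my assumption (the usual default for an i386 kernel), not something the report states, and the helper names are made up for illustration.

```python
# Values copied from the first panic above.
EVA_DECIMAL = 3298457356          # eva= argument of trap_fatal()/trap_pfault()
FAULT_ADDRESS = 0xC49A770C        # "fault virtual address" line
INSTRUCTION_POINTER = 0xC04B569A  # "instruction pointer = 0x20:0xc04b569a"
FRAME7_PC = 0xC04B569A            # frame #7, ath_rxbuf_init()

# Assumption (not stated in the report): default i386 kernel base address.
KERNBASE = 0xC0000000


def eva_to_hex(eva):
    """Render a decimal eva value in the hex form the trap message uses."""
    return hex(eva)


def is_kernel_address(addr, kernbase=KERNBASE):
    """True if addr lies in the assumed 32-bit kernel address range."""
    return kernbase <= addr <= 0xFFFFFFFF


if __name__ == "__main__":
    print(eva_to_hex(EVA_DECIMAL))             # eva rendered as hex
    print(EVA_DECIMAL == FAULT_ADDRESS)        # same address, two notations
    print(INSTRUCTION_POINTER == FRAME7_PC)    # fault happened in frame #7
    print(is_kernel_address(FAULT_ADDRESS))    # kernel-range, under the assumption
```

If the KERNBASE assumption holds, the faulting read in ath_rxbuf_init() hit a kernel-range address that was not mapped at the time, rather than a near-NULL pointer.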
OK, I got another panic, which may be related to the one in the original report, judging by how the system behaved right before it panicked (ath0 and fxp1 suddenly giving errors). However, the code path does seem to be different:

> sudo kgdb /boot/kernel/kernel vmcore.0
Password:
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xc
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc05eb700
stack pointer = 0x28:0xe4052b68
frame pointer = 0x28:0xe4052b80
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 23 (irq19: fxp1 uhci0)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 15d1h16m43s
Physical memory: 1015 MB
Dumping 249 MB: 234 218 202 186 170 154 138 122 106 90 74 58 42 26 10

#0 doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xc059fbd6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc059feae in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0xc08190cc in trap_fatal (frame=0xe4052b28, eva=12) at /usr/src/sys/i386/i386/trap.c:899
#4 0xc081933b in trap_pfault (frame=0xe4052b28, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:812
#5 0xc0819d32 in trap (frame=0xe4052b28) at /usr/src/sys/i386/i386/trap.c:490
#6 0xc080097b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc05eb700 in m_copydata (m=0x0, off=0, len=140, cp=0xc6289a74 "&_\026?,M\t7P\020") at /usr/src/sys/kern/uipc_mbuf.c:813
#8 0xc06862b1 in ip_forward (m=0xc54bf800, srcrt=1) at /usr/src/sys/netinet/ip_input.c:1307
#9 0xc0687922 in ip_input (m=0xc54bf800) at /usr/src/sys/netinet/ip_input.c:610
#10 0xc0640a22 in netisr_dispatch (num=2, m=0xc54bf800) at /usr/src/sys/net/netisr.c:185
#11 0xc0637a71 in ether_demux (ifp=0xc3bc1400, m=0xc54bf800) at /usr/src/sys/net/if_ethersubr.c:834
#12 0xc0637e9f in ether_input (ifp=0xc3bc1400, m=0xc54bf800) at /usr/src/sys/net/if_ethersubr.c:692
#13 0xc04c573a in fxp_intr (xsc=0xc3c52000) at /usr/src/sys/dev/fxp/if_fxp.c:1706
---Type <return> to continue, or q <return> to quit---
#14 0xc058242b in ithread_loop (arg=0xc3bc2b10) at /usr/src/sys/kern/kern_intr.c:1036
#15 0xc057f154 in fork_exit (callout=0xc0582260 <ithread_loop>, arg=0xc3bc2b10, frame=0xe4052d38) at /usr/src/sys/kern/kern_fork.c:781
#16 0xc08009f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb) p panicstr
$1 = 0xc08f3e00 "page fault"
(kgdb)

Messages that led up to the panic:

Aug 3 10:27:35 Tserver kernel: ath0: stuck beacon; resetting (bmiss count 4)
Aug 3 10:27:35 Tserver kernel: ath0: ath_reset: unable to reset
Aug 3 10:27:35 Tserver kernel: hardware; hal status
Aug 3 10:27:35 Tserver kernel: 3
Aug 3 10:28:00 Tserver kernel: ath0: device timeout
Aug 3 10:35:06 Tserver kernel: fxp1: SCB timeout: 0x80 0x0 0x50 0x600
Aug 3 10:35:24 Tserver last message repeated 71 times
Aug 3 10:35:28 Tserver kernel: fxp1: device timeout
Aug 3 10:35:28 Tserver kernel: fxp1: SCB timeout: 0x10 0x0 0x40 0x0
Aug 3 10:35:30 Tserver kernel: fxp1: device timeout
[[ panic happened not long after this event ]]
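One possible reading of the second trace, sketched under assumptions: frame #7 shows m_copydata() called with m=0x0, and the fault address is 0xc (eva=12). If the mbuf header on i386 begins with three 4-byte pointers (next mbuf, next packet, data) followed by the length field, which is how I recall sys/mbuf.h of that era, then reading the length through a NULL mbuf pointer would touch address 0xc. The struct below is a hypothetical 32-bit model built with ctypes, not the kernel's actual definition, and the helper name is made up.

```python
import ctypes


class MHdr32(ctypes.Structure):
    """Hypothetical model of the first mbuf header fields on i386.

    Assumption: three 4-byte pointers followed by a 4-byte length,
    in this order. This is not taken from the report.
    """
    _fields_ = [
        ("mh_next", ctypes.c_uint32),     # pointer to next mbuf in chain
        ("mh_nextpkt", ctypes.c_uint32),  # pointer to next packet
        ("mh_data", ctypes.c_uint32),     # pointer to the data
        ("mh_len", ctypes.c_int32),       # amount of data in this mbuf
    ]


def null_deref_address(field, base=0):
    """Address touched when reading `field` through a pointer equal to base."""
    return base + getattr(MHdr32, field).offset


if __name__ == "__main__":
    eva = 12  # eva= value from frames #3 and #4 of the second panic
    print(hex(null_deref_address("mh_len")))
    print(null_deref_address("mh_len") == eva)
```

If that layout assumption is right, the second panic looks like a NULL mbuf handed to m_copydata() from ip_forward(), which is a different failure from the first one even though both followed the same burst of ath0 and fxp1 errors.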
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Responsible Changed
From-To: freebsd-net->freebsd-wireless

Reassigning to freebsd-wireless

Submitter: is this still a problem for you?
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Keyword: crash – in lieu of summary line prefix: [panic]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk).

Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>