Bug 125617 - [ath] [panic] ath(4) related panic
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: 7.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-wireless (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-15 00:10 UTC by Rory Arms
Modified: 2018-05-28 19:46 UTC

See Also:


Attachments
smime.p7s (2.36 KB, application/pkcs7-signature)
2008-08-03 16:57 UTC, rorya

Description Rory Arms 2008-07-15 00:10:02 UTC
I noticed that fxp1 was producing a lot of errors. I first spotted it because the NFS clients were dropping a lot of packets, and pings from the clients to the servers showed long delays as well.

So I looked at the console and saw this error repeated over and over:

fxp1: SCB timeout: 0x80 0x0 0x50 0x400

In my case, ath0 is bridged with fxp1 to form one network, so the above errors were interleaved with:

ath0: ath_reset: unable to reset hardware; hal status 03

This is the first time I've noticed this with this release, after over 60 days of uptime. Over that time I had noticed that the wireless clients sometimes weren't routed correctly through the NAT router (natd(8) + ipfw(4)), even though the fxp1 clients were, but the problem was intermittent. I assume that issue was related to a bug in if_bridge(4), but that's just a guess. All I know is that it started happening with 7.0.

So the next thing I did was bring down the bridge0 interface to see whether that would alleviate the issue (again, thinking the Ethernet problems I was seeing were either exacerbated by fxp1 being a member of bridge0 or caused by a problem with ath0).

A few minutes after I downed the bridge0 interface, the kernel panicked.

I have minidumps turned on, so on the next boot the system was able to recover the dump. Here's the backtrace, as seen via kgdb(1):

> sudo kgdb /boot/kernel/kernel vmcore.0
Password:
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
ath0: device timeout


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc49a770c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc04b569a
stack pointer           = 0x28:0xe3ffebc4
frame pointer           = 0x28:0xe3ffebf8
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 14 (swi4: clock sio)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 48d12h18m43s
Physical memory: 1015 MB
Dumping 197 MB: 182 166 150 134 118 102 86 70 54 38 22 6

#0  doadump () at pcpu.h:195
195     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc059fbd6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc059feae in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc08190cc in trap_fatal (frame=0xe3ffeb84, eva=3298457356)
    at /usr/src/sys/i386/i386/trap.c:899
#4  0xc081933b in trap_pfault (frame=0xe3ffeb84, usermode=0, eva=3298457356)
    at /usr/src/sys/i386/i386/trap.c:812
#5  0xc0819d32 in trap (frame=0xe3ffeb84) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc080097b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc04b569a in ath_rxbuf_init (sc=0xc3bdf000, bf=0xc3be9324)
    at /usr/src/sys/dev/ath/if_ath.c:3284
#8  0xc04b5919 in ath_startrecv (sc=0xc3bdf000)
    at /usr/src/sys/dev/ath/if_ath.c:4928
#9  0xc04bce7c in ath_reset (ifp=0xc3bc8800)
    at /usr/src/sys/dev/ath/if_ath.c:1145
#10 0xc04bd3bb in ath_watchdog (ifp=0xc3bc8800)
    at /usr/src/sys/dev/ath/if_ath.c:5774
#11 0xc0630871 in if_slowtimo (arg=0x0) at /usr/src/sys/net/if.c:1478
#12 0xc05b2136 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:274
#13 0xc058242b in ithread_loop (arg=0xc3b00230)
    at /usr/src/sys/kern/kern_intr.c:1036
#14 0xc057f154 in fork_exit (callout=0xc0582260 <ithread_loop>, 
    arg=0xc3b00230, frame=0xe3ffed38) at /usr/src/sys/kern/kern_fork.c:781
---Type <return> to continue, or q <return> to quit---
#15 0xc08009f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb) print panicstr
$1 = 0xc08f3e00 "page fault"
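
As a cross-check on the trap output above, the `eva` value that kgdb prints in frames #3 and #4 is the fault virtual address written in decimal. A minimal Python sketch of the conversion follows; the variable names are mine, and the 0xc0000000 kernel base is an assumption about the default i386 layout, not something shown in the dump.

```python
# Values copied from the trap output and from frame #3 of the backtrace above.
fault_virtual_address = 0xc49a770c   # "fault virtual address" line
eva_decimal = 3298457356             # eva= argument as printed by kgdb

# kgdb prints eva in decimal; converting it gives the same address.
eva_hex = hex(eva_decimal)
same_address = (eva_decimal == fault_virtual_address)
print(eva_hex, same_address)

# ASSUMPTION: default i386 kernel base. If it holds, the faulting address
# lies in kernel address space, i.e. it is not a plain NULL dereference.
KERNBASE_I386 = 0xc0000000
in_kernel_space = eva_decimal >= KERNBASE_I386
print(in_kernel_space)
```

If that assumption holds, ath_rxbuf_init() touched a kernel address that was no longer mapped, which would fit a buffer that had already been released, but the dump alone does not prove that.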

While the server was fscking everything, I disconnected the cable and rerouted it, since it was tangled with a lot of other cables and I thought the issue could have been the result of crosstalk. I restarted the server and fxp1 has now been working normally for about 5 hours, without a single new SCB timeout error in the logs since the restart.

As always, here's the kernel configuration, sans the commented lines:

cpu             I686_CPU
ident           TSERVER-70


makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

options         SCHED_4BSD              # 4BSD scheduler
options         PREEMPTION              # Enable kernel thread preemption
options         INET                    # InterNETworking
options         INET6                   # IPv6 communications protocols
options         SCTP                    # Stream Control Transmission Protocol
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
options         UFS_GJOURNAL            # Enable gjournal-based UFS journaling
options         NFSCLIENT               # Network Filesystem Client
options         NFSSERVER               # Network Filesystem Server
options         MSDOSFS                 # MSDOS Filesystem
options         CD9660                  # ISO 9660 Filesystem
options         PROCFS                  # Process filesystem (requires PSEUDOFS)
options         PSEUDOFS                # Pseudo-filesystem framework
options         GEOM_PART_GPT           # GUID Partition Tables.
options         GEOM_LABEL              # Provides labelization
options         COMPAT_43TTY            # BSD 4.3 TTY compat [KEEP THIS!]
options         COMPAT_FREEBSD6         # Compatible with FreeBSD6
options         SCSI_DELAY=5000         # Delay (in ms) before probing SCSI
options         KTRACE                  # ktrace(1) support
options         SYSVSHM                 # SYSV-style shared memory
options         SYSVMSG                 # SYSV-style message queues
options         SYSVSEM                 # SYSV-style semaphores
options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev
options         ADAPTIVE_GIANT          # Giant mutex is adaptive.
options         STOP_NMI                # Stop CPUS using NMI instead of IPI
options         AUDIT                   # Security event auditing

options         SMP                     # Symmetric MultiProcessor Kernel
device          apic                    # I/O APIC

device          cpufreq

options         IPDIVERT                # divert(4)

options         IPFIREWALL              #firewall
options         IPFIREWALL_VERBOSE      #enable logging to syslogd(8)
options         IPFIREWALL_DEFAULT_TO_ACCEPT    #allow everything by default

options         DUMMYNET                # dummynet(4)



device          pci

device          fdc

device          ata
device          atadisk         # ATA disk drives
device          ataraid         # ATA RAID drives
device          atapicd         # ATAPI CDROM drives
options         ATA_STATIC_ID   # Static device numbering

device          ahc             # AHA2940 and onboard AIC7xxx devices
options         AHC_REG_PRETTY_PRINT    # Print register bitfields in debug
                                        # output.  Adds ~128k to driver.
                                        # output.  Adds ~215k to driver.



device          scbus           # SCSI bus (required for SCSI)
device          ch              # SCSI media changers
device          da              # Direct Access (disks)
device          sa              # Sequential Access (tape etc)
device          cd              # CD
device          pass            # Passthrough device (direct SCSI access)
device          ses             # SCSI Environmental Services (and SAF-TE)



device          atkbdc          # AT keyboard controller
device          atkbd           # AT keyboard
device          psm             # PS/2 mouse

device          kbdmux          # keyboard multiplexer

device          vga             # VGA video card driver

device          splash          # Splash screen and screen saver support

device          sc

device          agp             # support several AGP chipsets

device          pmtimer

device          cbb             # cardbus (yenta) bridge
device          pccard          # PC Card (16-bit) bus
device          cardbus         # CardBus (32-bit) bus

device          sio             # 8250, 16[45]50 based serial ports
device          uart            # Generic UART driver

device          ppc
device          ppbus           # Parallel port bus (required)
device          lpt             # Printer
device          plip            # TCP/IP over parallel
device          ppi             # Parallel port interface device



device          miibus          # MII bus support
device          fxp             # Intel EtherExpress PRO/100B (82557, 82558)


device          wlan            # 802.11 support
device          wlan_wep        # 802.11 WEP support
device          wlan_ccmp       # 802.11 CCMP support
device          wlan_tkip       # 802.11 TKIP support
device          wlan_amrr       # AMRR transmit rate control algorithm
device          wlan_scan_ap    # 802.11 AP mode scanning
device          wlan_scan_sta   # 802.11 STA mode scanning
device          ath             # Atheros pci/cardbus NIC's
device          ath_hal         # Atheros HAL (Hardware Access Layer)
device          ath_rate_sample # SampleRate tx rate control for ath

device          loop            # Network loopback
device          random          # Entropy device
device          ether           # Ethernet support
device          sl              # Kernel SLIP
device          ppp             # Kernel PPP
device          tun             # Packet tunnel.
device          pty             # Pseudo-ttys (telnet etc)
device          md              # Memory "disks"
device          gif             # IPv6 and IPv4 tunneling
device          faith           # IPv6-to-IPv4 relaying (translation)
device          firmware        # firmware assist module

device          bpf             # Berkeley packet filter

device          uhci            # UHCI PCI->USB interface
device          ohci            # OHCI PCI->USB interface
device          usb             # USB Bus (required)
device          ugen            # Generic
device          uhid            # "Human Interface Devices"
device          ukbd            # Keyboard
device          ulpt            # Printer
device          umass           # Disks/Mass storage - Requires scbus and da
device          ums             # Mouse
device          ural            # Ralink Technology RT2500USB wireless NICs
device          rum             # Ralink Technology RT2501USB wireless NICs

How-To-Repeat: Unsure, unless it can always be reproduced by downing the bridge0 interface, which has two members, fxp1 and ath0. Looking at the backtrace, the panic seems to have been caused by ath(4), so I'm not sure the bridge is at fault here; it may be some scenario that ath(4) doesn't handle.
Comment 1 rorya 2008-08-03 16:57:54 UTC
OK, I got another panic, which may be related to the one in the original
report, judging simply by how the system behaved right before it
panicked (ath0 and fxp1 suddenly giving errors). However, the code path
does seem to be different:


> sudo kgdb /boot/kernel/kernel vmcore.0
Password:
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xc
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc05eb700
stack pointer           = 0x28:0xe4052b68
frame pointer           = 0x28:0xe4052b80
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 23 (irq19: fxp1 uhci0)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 15d1h16m43s
Physical memory: 1015 MB
Dumping 249 MB: 234 218 202 186 170 154 138 122 106 90 74 58 42 26 10

#0  doadump () at pcpu.h:195
195     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc059fbd6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc059feae in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc08190cc in trap_fatal (frame=0xe4052b28, eva=12)
    at /usr/src/sys/i386/i386/trap.c:899
#4  0xc081933b in trap_pfault (frame=0xe4052b28, usermode=0, eva=12)
    at /usr/src/sys/i386/i386/trap.c:812
#5  0xc0819d32 in trap (frame=0xe4052b28) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc080097b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc05eb700 in m_copydata (m=0x0, off=0, len=140,
    cp=0xc6289a74 "&_\026?,M\t7P\020") at /usr/src/sys/kern/uipc_mbuf.c:813
#8  0xc06862b1 in ip_forward (m=0xc54bf800, srcrt=1)
    at /usr/src/sys/netinet/ip_input.c:1307
#9  0xc0687922 in ip_input (m=0xc54bf800)
    at /usr/src/sys/netinet/ip_input.c:610
#10 0xc0640a22 in netisr_dispatch (num=2, m=0xc54bf800)
    at /usr/src/sys/net/netisr.c:185
#11 0xc0637a71 in ether_demux (ifp=0xc3bc1400, m=0xc54bf800)
    at /usr/src/sys/net/if_ethersubr.c:834
#12 0xc0637e9f in ether_input (ifp=0xc3bc1400, m=0xc54bf800)
    at /usr/src/sys/net/if_ethersubr.c:692
#13 0xc04c573a in fxp_intr (xsc=0xc3c52000)
    at /usr/src/sys/dev/fxp/if_fxp.c:1706
---Type <return> to continue, or q <return> to quit---
#14 0xc058242b in ithread_loop (arg=0xc3bc2b10)
    at /usr/src/sys/kern/kern_intr.c:1036
#15 0xc057f154 in fork_exit (callout=0xc0582260 <ithread_loop>,
    arg=0xc3bc2b10, frame=0xe4052d38) at /usr/src/sys/kern/kern_fork.c:781
#16 0xc08009f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb) p panicstr
$1 = 0xc08f3e00 "page fault"
(kgdb)
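
One way to read the second trap: frame #7 shows m_copydata() called with m=0x0, and the fault virtual address is 0xc. That would be consistent with a read of the m_len field through a NULL mbuf pointer. The Python sketch below models the start of the mbuf header with 4-byte fields, as on i386, and computes the field offset. The field list is my assumption about the mbuf header layout of that era, and `MHdrI386` is a hypothetical name, so treat this as a plausibility check rather than a confirmed diagnosis.

```python
import ctypes

# ASSUMPTION: start of the mbuf header as laid out on i386, where pointers
# and ints are 4 bytes. Pointers are modelled as c_uint32 so the offsets
# come out the same no matter which machine runs this script.
class MHdrI386(ctypes.Structure):
    _fields_ = [
        ("mh_next", ctypes.c_uint32),     # next buffer in chain
        ("mh_nextpkt", ctypes.c_uint32),  # next chain in queue/record
        ("mh_data", ctypes.c_uint32),     # location of data
        ("mh_len", ctypes.c_int32),       # amount of data in this mbuf
        ("mh_flags", ctypes.c_int32),     # flags
        ("mh_type", ctypes.c_int16),      # type of data in this mbuf
    ]

m_ptr = 0x0  # m=0x0 in frame #7 of the backtrace above
fault_address = m_ptr + MHdrI386.mh_len.offset
print(hex(fault_address))
```

Under that layout the computed address equals the reported fault virtual address of 0xc, which suggests ip_forward() handed m_copydata() a NULL mbuf.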

Messages that led up to the panic:

Aug  3 10:27:35 Tserver kernel: ath0: stuck beacon; resetting (bmiss count 4)
Aug  3 10:27:35 Tserver kernel: ath0: ath_reset: unable to reset
Aug  3 10:27:35 Tserver kernel: hardware; hal status
Aug  3 10:27:35 Tserver kernel: 3
Aug  3 10:28:00 Tserver kernel: ath0: device timeout
Aug  3 10:35:06 Tserver kernel: fxp1: SCB timeout: 0x80 0x0 0x50 0x600
Aug  3 10:35:24 Tserver last message repeated 71 times
Aug  3 10:35:28 Tserver kernel: fxp1: device timeout
Aug  3 10:35:28 Tserver kernel: fxp1: SCB timeout: 0x10 0x0 0x40 0x0
Aug  3 10:35:30 Tserver kernel: fxp1: device timeout
[[ panic happened not long after this event ]]

Comment 2 Alexander Best freebsd_committer 2010-09-07 00:07:02 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 3 Adrian Chadd freebsd_committer 2011-05-10 11:13:54 UTC
Responsible Changed
From-To: freebsd-net->freebsd-wireless

Reassigning to freebsd-wireless 

Submitter: is this still a problem for you?
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:46:44 UTC
batch change:

For bugs that match the following
- Status is In Progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass, but if you are getting this email, it might be worthwhile to double-check whether this bug ought to be closed.