Bug 47441

Summary: 5.0R: SMP makes xl0 unusable
Product: Base System
Reporter: Bjoern A. Zeeb <bzeeb+freebsd>
Component: kern
Assignee: silby
Status: Closed FIXED
Severity: Affects Only Me
CC: bzeeb+freebsd
Priority: Normal
Version: Unspecified
Hardware: Any
OS: Any

Description Bjoern A. Zeeb 2003-01-24 19:10:01 UTC
	5.0R generic kernel with SMP enabled

	I get:

	xl0: watchdog timeout
	xl0: watchdog timeout
	xl0: watchdog timeout

	and xl0 is completely unusable.

	If I boot GENERIC from the CD this does not happen, but
	GENERIC also does not seem to be SMP enabled.

Fix:

	N/A

How-To-Repeat:
	take 5.0R disc1, install incl. src

	copy GENERIC to mygenericsmp.conf

	enable
	# To make an SMP kernel, the next two are needed
	options         SMP                     # Symmetric MultiProcessor Kernel
	options         APIC_IO                 # Symmetric (APIC) I/O

	make buildkernel KERNCONF=mygenericsmp.conf
	make installkernel KERNCONF=mygenericsmp.conf

	reboot... and watch
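
The "enable" step above can be sketched as a small script. This is only a minimal sketch: it assumes the two options are present in the copied GENERIC file but commented out as "#options SMP" and "#options APIC_IO", which the report does not state explicitly. The function name enable_options is hypothetical.

```python
import re


def enable_options(config_text, names=("SMP", "APIC_IO")):
    """Uncomment '#options NAME' lines for the given option names.

    Assumption: the options exist in the file with a leading '#'.
    Lines that do not match are passed through unchanged.
    """
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"^#\s*(options\s+(?:%s)\b.*)$" % alternatives)
    out = []
    for line in config_text.splitlines():
        match = pattern.match(line)
        # Keep only the part after the '#' when the line names a wanted option.
        out.append(match.group(1) if match else line)
    return "\n".join(out) + "\n"
```

The result would be written to the copied config file before running the buildkernel and installkernel steps listed above.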
Comment 1 nagy.attila 2003-01-27 14:20:33 UTC
Hello,

> 	xl0: watchdog timeout
> 	and xl0 is completely unusable.
> 	If I boot GENERIC from CD this does not happen but
> 	also GENERIC does not seem to be SMP enabled
Are you sure that this is the problem?
I have a box with 5.0-RELEASE, SMP and xl0, and it works OK.

----------[ Free Software ISOs - http://www.fsn.hu/?f=download ]----------
Attila Nagy					e-mail: Attila.Nagy@fsn.hu
Free Software Network (FSN.HU)		  phone @work: +361 210 1415 (194)
						cell.: +3630 306 6758
Comment 2 Bjoern A. Zeeb 2003-01-27 18:55:01 UTC
On Mon, 27 Jan 2003, Attila Nagy wrote:

> > 	xl0: watchdog timeout
> > 	and xl0 is completely unusable.
> > 	If I boot GENERIC from CD this does not happen but
> > 	also GENERIC does not seem to be SMP enabled
> Are you sure that this is the problem?
> I have a box with 5.0-RELEASE, SMP and xl0 and works OK.

Pretty sure, because the new kernel was the only difference
between the plain 5.0 CD installation and the reboot.

When I went back and booted kernel.old, it worked OK again.

Also tried HEAD with same symptoms: GENERIC is ok,
SMP gives watchdog timeouts.


Perhaps it also depends on the card used?

xl0@pci0:15:0:  class=0x020000 card=0x100010b7 chip=0x920010b7 rev=0x74 hdr=0x00
    vendor   = '3COM Corp, Networking Division'
    device   = '3C905C-TX Fast EtherLink for PC Management NIC'
    class    = network
    subclass = ethernet

Also checked IRQs: no sharing from what I could see.

Any more ideas, ways to debug this, or patches to try?

-- 
Bjoern A. Zeeb				bzeeb at Zabbadoz dot NeT
56 69 73 69 74				http://www.zabbadoz.net/
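
For readers comparing cards: the chip= and card= words in the pciconf line above each pack two 16-bit IDs. The sketch below splits them, assuming the usual layout (device or subdevice ID in the upper half, vendor or subvendor ID in the lower half). The helper name split_pci_id is hypothetical.

```python
def split_pci_id(value):
    """Split a 32-bit ID word into (upper 16 bits, lower 16 bits)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


# Values copied from the pciconf output quoted in this comment.
device, vendor = split_pci_id(0x920010B7)        # chip=
subdevice, subvendor = split_pci_id(0x100010B7)  # card=

print("vendor    0x%04x" % vendor)     # vendor    0x10b7
print("device    0x%04x" % device)     # device    0x9200
print("subvendor 0x%04x" % subvendor)  # subvendor 0x10b7
print("subdevice 0x%04x" % subdevice)  # subdevice 0x1000
```

Two xl0 cards that report different device or subdevice values are different hardware variants, which is one way to check the "depends on the card" idea.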
Comment 3 nagy.attila 2003-01-28 09:59:58 UTC
Hello,

> xl0@pci0:15:0:  class=0x020000 card=0x100010b7 chip=0x920010b7 rev=0x74 hdr=0x00
>     vendor   = '3COM Corp, Networking Division'
>     device   = '3C905C-TX Fast EtherLink for PC Management NIC'
>     class    = network
>     subclass = ethernet
> Also checked IRQs: no sharing from what I could see.
Are you sure that this doesn't conflict with the motherboard's ATA
controller? I would try another PCI slot, or manually assign the card
a different IRQ.

Comment 4 Bjoern A. Zeeb 2003-01-28 21:01:25 UTC
On Tue, 28 Jan 2003, Attila Nagy wrote:

Hi,

> > Also checked IRQs: no sharing from what I could see.
> Are you sure that this doesn't conflict with the motherboard's ATA
> controller?

I think so. APIC IRQ routing should do the rest, if I understand this
correctly.


Something is driving me mad...
I booted kernel.old (UP) and cvsupped (the relevant part of my cvsup
log is below). Then I built the kernel exactly the same way, with the
same KERNCONF as in the days before (I have a shell script for this).

What can I say? No more watchdog timeouts.


 Edit src/release/i386/drivers.conf
  Add delta 1.19 2003.01.27.17.54.49 ru
 Edit src/release/pc98/drivers.conf
  Add delta 1.9 2003.01.27.17.54.49 ru
 Edit src/sys/conf/NOTES
  Add delta 1.1123 2003.01.28.07.15.22 phk
 Edit src/sys/dev/sab/sab.c
  Add delta 1.11 2003.01.27.18.39.09 jake
 Edit src/sys/i386/i386/pmap.c
  Add delta 1.381 2003.01.28.03.01.35 alc
 Edit src/sys/kern/kern_sig.c
  Add delta 1.203 2003.01.27.23.01.03 peter
 Edit src/sys/kern/tty_tty.c
  Add delta 1.47 2003.01.27.16.54.17 phk
 Edit src/sys/netinet/ip_input.c
  Add delta 1.222 2003.01.28.03.39.39 silby


For now, consider this PR closable, though I still do not know the
reason.

I will cvsup and build the same kernel over the next days/weeks and try
again. If it breaks again I will tell you.

Comment 5 silby freebsd_committer freebsd_triage 2003-02-01 05:10:28 UTC
Responsible Changed
From-To: freebsd-bugs->silby

If this proves to be a problem, I'll look into it.
Comment 6 Bjoern A. Zeeb 2003-02-02 21:45:51 UTC
On Fri, 31 Jan 2003, Mike Silbersack wrote:

Hi,

> Synopsis: 5.0R: SMP makes xl0 unusable
>
> Responsible-Changed-From-To: freebsd-bugs->silby
> Responsible-Changed-By: silby
> Responsible-Changed-When: Fri Jan 31 21:10:28 PST 2003
> Responsible-Changed-Why:
> If this proves to be a problem, I'll look into it.

I could not reproduce it after the cvsup previously shown in this PR.

I also compiled a HEAD kernel with ip_input.c-1.220 and pmap.c-1.380
but got no watchdog timeouts.

Either you may close it, or I will go back to cvsup'ing specific date= values
from HEAD from around the time the problem stopped, rebuild kernels
and try to nail it down.
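
The date-by-date search described above is a bisection over checkout dates. The sketch below shows the idea under stated assumptions: the behaviour flips exactly once between a known-bad and a known-good date, and the date= value uses the same dotted layout as the timestamps in the cvsup log earlier in this PR. The names bisect_fix_date, cvsup_date and is_good are hypothetical. is_good stands for the manual work of checking out that date, building the SMP kernel, booting it and seeing whether xl0 works.

```python
from datetime import datetime, timedelta


def cvsup_date(when):
    """Format a datetime as yyyy.mm.dd.hh.mm.ss (assumed date= layout)."""
    return when.strftime("%Y.%m.%d.%H.%M.%S")


def bisect_fix_date(bad, good, is_good, resolution=timedelta(hours=1)):
    """Narrow the interval (bad, good] until it is no wider than resolution.

    bad:  a date at which the watchdog timeouts occur.
    good: a later date at which they do not.
    is_good(when): hypothetical callback, True if a kernel built from
                   sources as of `when` has a working xl0.
    """
    while good - bad > resolution:
        mid = bad + (good - bad) / 2
        if is_good(mid):
            good = mid   # problem already gone at mid: look earlier
        else:
            bad = mid    # problem still present at mid: look later
    return bad, good
```

With endpoints a few days apart and a one-hour resolution this takes about seven builds, after which the commits between the two returned dates are the candidates.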

I vote for closing. If it ever comes back I will open a
new PR.

PS: Should it stay open if SCHED_4BSD had not been the default for 5.0 and HEAD
before
	options         SCHED_4BSD              #4BSD scheduler
went into GENERIC (if it had already been in the tree)?

Comment 7 silby freebsd_committer freebsd_triage 2003-02-03 04:49:09 UTC
State Changed
From-To: open->closed

This problem resolved itself, somehow.
Comment 8 Mike Silbersack 2003-02-03 04:56:32 UTC
On Sun, 2 Feb 2003, Bjoern A. Zeeb wrote:

> I could not reproduce it after the cvsup previously shown in this PR.
>
> I also compiled a HEAD kernel with ip_input.c-1.220 and pmap.c-1.380
> but no watchdog timeouts.
>
> Either you may close it or I will go back cvsup'ing special date='s
> from HEAD from around the time the problem stopped, rebuild kernels
> and try to nail it down.
>
> I vote for closing. If it ever comes back I will open a
> new PR.

I'll go ahead and close it; tell me if the problem reappears.

> PS: leave it open if SCHED_4BSD hadn't been default for 5.0 and HEAD
> before
> 	options         SCHED_4BSD              #4BSD scheduler
> went into GENERIC (if it had already been in the tree) ?

SCHED_4BSD has always been the default, but it never appeared in GENERIC
before because there was nothing else to select.  (There is now an
alternate scheduler, which is why a selection must be made.)

Mike "Silby" Silbersack