If the serial device driver probes at load time and gets no response, it simply labels the device as an 8250, even though the device may be absent or broken. If the port is not configured in the BIOS, the driver prints a cryptic message that the irq is not in the bitmap of probed irqs, which again does not make it clear to many users where the problem is.

How-To-Repeat: De-configure a serial device in the BIOS and reboot.
Responsible Changed From-To: freebsd-bugs->bmah I promised Kevin that I'd do this. If anyone else has a compulsion to review and commit this, that's fine by me too.
State Changed From-To: open->analyzed Committed to -CURRENT, awaiting MFC to 4-STABLE (after 4.5-RELEASE).
Adding to the audit trail from pending/48674:

Message-Id: <20030225170511.GA53886@intruder.bmah.org>
Date: Tue, 25 Feb 2003 09:05:11 -0800
From: "Bruce A. Mah" <bmah@freebsd.org>
To: oberman@es.net, bde@freebsd.org
Cc: freebsd-gnats-submit@freebsd.org
Subject: pending/48674: PR bin/33963

I'm trying to figure out what to do with this PR, which has been
languishing for a while and gives me momentary angst every week when it
shows up in my PR reminders list.

Some discussion with bde (shortly after the last PR followup) has made
me realize that the root cause of these misleading sio(4) probe
messages wasn't what I thought it was, and that I don't have the
qualification or time to deal with this.

The alternatives seem to be:

1.  Toss the PR back to freebsd-bugs.

2.  Give it to bde, who's the de facto sio(4) maintainer.

3.  Close it.

Any thoughts?

Thanks.

Bruce.
Adding to audit trail, from pending/48677:

Message-Id: <20030225172014.67F7D5D06@ptavv.es.net>
Date: Tue, 25 Feb 2003 09:20:14 -0800
From: "Kevin Oberman" <oberman@es.net>
To: "Bruce A. Mah" <bmah@freebsd.org>
Cc: bde@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: pending/48677: Re: PR bin/33963

> Date: Tue, 25 Feb 2003 09:05:11 -0800
> From: "Bruce A. Mah" <bmah@freebsd.org>
>
> I'm trying to figure out what to do with this PR, which has been
> languishing for a while and gives me momentary angst every week when it
> shows up in my PR reminders list.
>
> Some discussion with bde (shortly after the last PR followup) has made
> me realize that the root cause of these misleading sio(4) probe
> messages wasn't what I thought it was, and that I don't have the
> qualification or time to deal with this.
>
> The alternatives seem to be:
>
> 1.  Toss the PR back to freebsd-bugs.
>
> 2.  Give it to bde, who's the de facto sio(4) maintainer.
>
> 3.  Close it.

imp vetoed the first section of the patch and I agree with his point
in doing so. But I think the second half (8250 or not responding) is a
legitimate bug fix as the driver currently implies that it actually
could tell the device is an 8250 when the message is really only an
indication that the driver did not receive a response to its query. It
assumes that this means an 8250 as any flavor of 16550 or any decent
clone will respond in some way.

I can re-submit with only the single line change, but if bde "owns"
the sio driver these days, I'll leave it up to him.

R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
Adding to audit trail, from pending/48680:

Message-Id: <20030225184409.GA54634@intruder.bmah.org>
Date: Tue, 25 Feb 2003 10:44:09 -0800
From: "Bruce A. Mah" <bmah@freebsd.org>
To: Kevin Oberman <oberman@es.net>
Cc: "Bruce A. Mah" <bmah@freebsd.org>, bde@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: pending/48680: Re: PR bin/33963

If memory serves me right, Kevin Oberman wrote:

> imp vetoed the first section of the patch and I agree with his point
> in doing so.

Heh.  Actually that was the part that got committed to HEAD.  I made
an offer to bde to back this out but he didn't take me up on it
(yet)...the offer still stands.

> But I think the second half (8250 or not responding) is a
> legitimate bug fix as the driver currently implies that it actually
> could tell the device is an 8250 when the message is really only an
> indication that the driver did not receive a response to its query. It
> assumes that this means an 8250 as any flavor of 16550 or any decent
> clone will respond in some way.

What I recall being told is that the underlying cause was that the
driver shouldn't have been trying to probe the device anyways.  (A
separate, but related problem.)

> I can re-submit with only the single line change, but if bde "owns"
> the sio driver these days, I'll leave it up to him.

Yeah.  I should have done a better job handling this PR.

Bruce.
Adding to audit trail, from pending/48774:

Message-Id: <20030228220111.Y22326-100000@gamplex.bde.org>
Date: Fri, 28 Feb 2003 22:56:27 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: "Bruce A. Mah" <bmah@freebsd.org>
Cc: Kevin Oberman <oberman@es.net>, <bde@freebsd.org>, <freebsd-gnats-submit@freebsd.org>
Subject: pending/48774: Re: PR bin/33963

On Tue, 25 Feb 2003, Bruce A. Mah wrote:

> If memory serves me right, Kevin Oberman wrote:
>
> > imp vetoed the first section of the patch and I agree with his point
> > in doing so.
>
> Heh.  Actually that was the part that got committed to HEAD.  I made
> an offer to bde to back this out but he didn't take me up on it
> (yet)...the offer still stands.

I think both parts got committed to HEAD.  I mostly agreed with the
first part but not with the second part.  All I've done is fix the
formatting and improve the English of the first part in my version:

%%%
Index: sio.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/sio/sio.c,v
retrieving revision 1.382
diff -u -2 -r1.382 sio.c
--- sio.c	11 Oct 2002 20:22:20 -0000	1.382
+++ sio.c	23 Feb 2003 13:59:40 -0000
@@ -766,6 +783,5 @@
 		    "sio%d: configured irq %ld not in bitmap of probed irqs %#x\n",
 		    device_get_unit(dev), xirq, irqs);
-		printf(
-		"sio%d: port may not be enabled\n",
+		printf("sio%d: port might not be enabled\n",
 		    device_get_unit(dev));
 	}
%%%

I think it's worth saying something here to decrypt/disalarm the "not
in bitmap of probed irqs" message.  The "port might not be enabled"
message is just to clarify the previous message, but it's hard to write
paragraphs in boot messages so it looks more like a separate message and
thus may further obscure things.  Some more rewording might help, and
it wouldn't hurt to put this in the man page.  I rather hoped that you
(bmah) would handle the doc aspects of this.
This code is only reached in the !noprobe case, which should be only in
legacy cases where it should not be surprising to get misconfigured irqs,
but I think pnp and/or acpi are now too successful at finding devices
(even when you have disabled them in static hints?) so we now get more
half-working probes.  So the main problems here are:
- the driver doesn't understand that some hints are better than others
  so it shouldn't attempt to check them.
- various problems if multiple sio devices share an interrupt (and
  interrupts are ISAish (edge sensitive, etc.) so they can't be shared
  at runtime).  The message is the first hint of such problems.

> > But I think the second half (8250 or not responding) is a
> > legitimate bug fix as the driver currently implies that it actually
> > could tell the device is an 8250 when the message is really only an
> > indication that the driver did not receive a response to its query. It
> > assumes that this means an 8250 as any flavor of 16550 or any decent
> > clone will respond in some way.
>
> What I recall being told is that the underlying cause was that the
> driver shouldn't have been trying to probe the device anyways.  (A
> separate, but related problem.)

It is in the attach code actually.  I don't understand how the attach
routine can be called on completely "not responding" devices.  The
noprobe case makes the probe sloppy but should only be used for pccards
and certain broken cases where another layer hopefully knows what it's
doing.  Technically, I think we can do the fifo test first to distinguish
the >= 16550 UARTs.  Then the scratch register test would only be needed
to distinguish between ancient UARTs.  NetBSD's com.c doesn't bother
with it.

> > I can re-submit with only the single line change, but if bde "owns"
> > the sio driver these days, I'll leave it up to him.
>
> Yeah.  I should have done a better job handling this PR.

Me too, sigh.
Now I mostly don't run anything near a current -current, so I find it
hard to test things for it, but I have more interest in making -stable
work right.  This doesn't extend to large configuration changes though.
(I've grown to detest large init/config code.)

Bruce
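
The ordering suggested in the message above (FIFO test first, scratch register test only for the older parts) can be sketched as follows. This is a minimal illustration against a simulated register model, not code from sio.c: `struct sim_uart`, `sim_read()`, `sim_write()`, `classify_uart()` and the `UART_*` names are all hypothetical, and the model assumes that an absent device reads back 0, which real buses do not guarantee.

```c
#include <stdint.h>

/* Standard 8250-family register offsets; IIR (read) and FCR (write) share one. */
#define REG_IIR		2
#define REG_FCR		2
#define REG_SCR		7
#define FCR_ENABLE	0x01
#define IIR_FIFO_MASK	0xc0

enum uart_type { UART_8250_OR_ABSENT, UART_16450, UART_16550, UART_16550A };

/* Hypothetical simulated device; assumption: absent hardware reads as 0. */
struct sim_uart {
	int	present;	/* 0: nothing responds at this address */
	int	has_scratch;	/* scratch register is implemented */
	uint8_t	fifo_bits;	/* IIR bits 7:6 shown once the FIFO is enabled */
	uint8_t	scratch;
	int	fifo_enabled;
};

static void
sim_write(struct sim_uart *u, int reg, uint8_t val)
{
	if (!u->present)
		return;
	if (reg == REG_FCR)
		u->fifo_enabled = (val & FCR_ENABLE) != 0;
	else if (reg == REG_SCR && u->has_scratch)
		u->scratch = val;
}

static uint8_t
sim_read(const struct sim_uart *u, int reg)
{
	if (!u->present)
		return (0);
	if (reg == REG_IIR)
		return (u->fifo_enabled ? u->fifo_bits : 0);
	if (reg == REG_SCR)
		return (u->has_scratch ? u->scratch : 0);
	return (0);
}

enum uart_type
classify_uart(struct sim_uart *u)
{
	/* FIFO test first: anything reporting FIFO bits is a 16550 or later. */
	sim_write(u, REG_FCR, FCR_ENABLE);
	switch (sim_read(u, REG_IIR) & IIR_FIFO_MASK) {
	case 0xc0:
		return (UART_16550A);
	case 0x80:
		return (UART_16550);
	}
	/* Only the ancient parts get as far as the scratch register test. */
	sim_write(u, REG_SCR, 0xa5);
	if (sim_read(u, REG_SCR) != 0xa5)
		return (UART_8250_OR_ABSENT);
	sim_write(u, REG_SCR, 0x5a);
	if (sim_read(u, REG_SCR) != 0x5a)
		return (UART_8250_OR_ABSENT);
	return (UART_16450);
}
```

Note that even in this sketch a missing device and a part without a scratch register land in the same bucket, which is why the result is named "8250 or absent" rather than plain "8250".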
Adding to audit trail, from pending/48792:

Message-Id: <20030228211250.84FCA5D04@ptavv.es.net>
Date: Fri, 28 Feb 2003 13:12:50 -0800
From: "Kevin Oberman" <oberman@es.net>
To: Bruce Evans <bde@zeta.org.au>
Cc: "Bruce A. Mah" <bmah@freebsd.org>, bde@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: pending/48792: Re: PR bin/33963

> Date: Fri, 28 Feb 2003 22:56:27 +1100 (EST)
> From: Bruce Evans <bde@zeta.org.au>
>
> On Tue, 25 Feb 2003, Bruce A. Mah wrote:
>
> > If memory serves me right, Kevin Oberman wrote:
> >
> > > imp vetoed the first section of the patch and I agree with his point
> > > in doing so.
> >
> > Heh.  Actually that was the part that got committed to HEAD.  I made
> > an offer to bde to back this out but he didn't take me up on it
> > (yet)...the offer still stands.
>
> I think both parts got committed to HEAD.  I mostly agreed with the
> first part but not with the second part.  All I've done is fix the
> formatting and improve the English of the first part in my version:
>
> %%%
> Index: sio.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/sio/sio.c,v
> retrieving revision 1.382
> diff -u -2 -r1.382 sio.c
> --- sio.c	11 Oct 2002 20:22:20 -0000	1.382
> +++ sio.c	23 Feb 2003 13:59:40 -0000
> @@ -766,6 +783,5 @@
>  		    "sio%d: configured irq %ld not in bitmap of probed irqs %#x\n",
>  		    device_get_unit(dev), xirq, irqs);
> -		printf(
> -		"sio%d: port may not be enabled\n",
> +		printf("sio%d: port might not be enabled\n",
>  		    device_get_unit(dev));
>  	}
> %%%

I like this a LOT better than the current verbiage and better than
mine. Another way to make it clearer that the second line is an
extension of the first would be:

    printf(" port might not be enabled\n", device_get_unit(dev));

In the end the appearance of dmesg will always be a bit ugly and oft
times a bit confusing. Extra lines are evil as they leave less data on
the 24 (or whatever) lines available on the console.
(Perhaps it's time to think about increasing the default value of
SC_HISTORY_SIZE.)

> This code is only reached in the !noprobe case, which should be only in
> legacy cases where it should not be surprising to get misconfigured irqs,
> but I think pnp and/or acpi are now too successful at finding devices
> (even when you have disabled them in static hints?) so we now get more
> half-working probes.  So the main problems here are:
> - the driver doesn't understand that some hints are better than others
>   so it shouldn't attempt to check them.
> - various problems if multiple sio devices share an interrupt (and
>   interrupts are ISAish (edge sensitive, etc.) so they can't be shared
>   at runtime).  The message is the first hint of such problems.

ACPI does not run properly on a great many laptops (both of my IBMs
included) and even PNP can be problematic, although now that we have
IRQ sharing for PCCARD, it's generally not evil. But there are regular
messages on questions asking about this message, so people are still
seeing it a lot in V4.

> > > But I think the second half (8250 or not responding) is a
> > > legitimate bug fix as the driver currently implies that it actually
> > > could tell the device is an 8250 when the message is really only an
> > > indication that the driver did not receive a response to its query. It
> > > assumes that this means an 8250 as any flavor of 16550 or any decent
> > > clone will respond in some way.
> >
> > What I recall being told is that the underlying cause was that the
> > driver shouldn't have been trying to probe the device anyways.  (A
> > separate, but related problem.)
>
> It is in the attach code actually.  I don't understand how the attach
> routine can be called on completely "not responding" devices.  The
> noprobe case makes the probe sloppy but should only be used for pccards
> and certain broken cases where another layer hopefully knows what it's
> doing.
> Technically, I think we can do the fifo test first to distinguish
> the >= 16550 UARTs.  Then the scratch register test would only be needed
> to distinguish between ancient UARTs.  NetBSD's com.c doesn't bother
> with it.

I have a specific case. I maintain (or try to, as it won't work in V5)
the package to support the mWave modem on some older IBM
ThinkPads. Since the 16550A is emulated in software, there are cases
where initialization problems result in the FIFO probe failing and the
device being identified as an 8250. I know just what this means, but
it took me a while to figure it out and others have just given up,
deciding that the port does not work.

R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
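
For readers unfamiliar with the "not in bitmap of probed irqs" diagnostic discussed in this exchange, the check behind it can be sketched roughly as below. This is only an illustration under stated assumptions: `format_irq_diag()` is a hypothetical helper that formats into a buffer instead of calling the kernel printf, the 0..15 range limit is an assumption about ISA-style interrupts, and the message wording follows the diff quoted above.

```c
#include <stdio.h>

/*
 * Hypothetical helper: if the configured irq is not among the irqs that
 * fired during probing, write the two-line diagnostic into buf and return
 * the snprintf() result.  Return 0 (and an empty buf) when nothing needs
 * to be reported.
 */
int
format_irq_diag(char *buf, size_t len, int unit, long xirq, unsigned irqs)
{
	if (len > 0)
		buf[0] = '\0';
	/* Assumption: only irqs 0..15 are meaningful in this sketch. */
	if (xirq < 0 || xirq > 15)
		return (0);
	/* Bit N of the bitmap is set when irq N was seen during the probe. */
	if ((irqs & (1u << xirq)) != 0)
		return (0);
	return (snprintf(buf, len,
	    "sio%d: configured irq %ld not in bitmap of probed irqs %#x\n"
	    "sio%d: port might not be enabled\n",
	    unit, xirq, irqs, unit));
}
```

With a bitmap of 0x10 (only irq 4 seen), a port configured for irq 4 produces no message, while a port configured for irq 3 produces both lines.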
Responsible Changed From-To: bmah->bde I give up on this PR. It's been sitting on my pile for over a year and a half, my reasoning about the solution to this PR was totally wrong, and I'm not likely to come up with the right solution anytime in this lifetime. Giving this PR to the sio(4) maintainer to figure out what should be done about this. My apologies.
This was fixed many months ago after consultation with Bruce Mah, Warner Losh and Bruce. Please close it. -- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: oberman@es.net Phone: +1 510 486-8634
State Changed From-To: analyzed->analyzed some more. Misprobing as an 8250 may be caused by misconfiguring the serial console flag 0x10. This flag is not just a hint. Bad things happen if it is used for a device whose hardware doesn't exist. If the device with nonexistent hardware gets used as a serial console, then the system usually just hangs early. Otherwise, sioprobe() trusts the configuration too much and forces the probe to succeed after printing some diagnostics. Probing for the UART type is delayed until sioattach(). sioattach() doesn't probe the hardware except for this and there is no failure case for this, so nonexistent hardware is usually considered to be an 8250 since it has the same number of special features (none). If the device with nonexistent hardware is opened from userland, the system often hangs then.
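
The control flow described in this analysis can be modelled in a few lines. The sketch below is a simplified illustration, not the sio(4) code: every name in it (`SKETCH_CONSOLE_FLAG`, `struct sketch_port`, `sketch_probe()`, `sketch_attach()`, `sketch_configure()`) is hypothetical, and only the flag value 0x10 comes from the text.

```c
#include <stdbool.h>

/* Flag value from the analysis above; the macro name is hypothetical. */
#define SKETCH_CONSOLE_FLAG	0x10u

enum sketch_result { SKETCH_NOT_ATTACHED, SKETCH_TYPE_8250, SKETCH_TYPE_16550A };

/* Hypothetical description of one configured port. */
struct sketch_port {
	unsigned flags;		/* configuration flags for the port */
	bool	 hw_responds;	/* whether real hardware answers */
};

/* Probe step: a failed probe is forced to succeed for a console port. */
static bool
sketch_probe(const struct sketch_port *p)
{
	if (p->hw_responds)
		return (true);
	/* Diagnostics would be printed here, then the flag is trusted. */
	return ((p->flags & SKETCH_CONSOLE_FLAG) != 0);
}

/* Attach step: type detection has no failure case in this model. */
static enum sketch_result
sketch_attach(const struct sketch_port *p)
{
	/* Hardware with no detectable features is reported as an 8250. */
	return (p->hw_responds ? SKETCH_TYPE_16550A : SKETCH_TYPE_8250);
}

enum sketch_result
sketch_configure(const struct sketch_port *p)
{
	if (!sketch_probe(p))
		return (SKETCH_NOT_ATTACHED);
	return (sketch_attach(p));
}
```

In this model a port with flag 0x10 and no hardware comes out as an 8250, whereas the same port without the flag is simply not attached.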
State Changed From-To: analyzed->closed The submitted patch has been committed and the submitter has requested to close this PR.