Bug 164694 - [ata] Regression in 3726 port multiplier support in 9.0 [regression]
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2012-02-01 19:20 UTC by Allen Belletti
Modified: 2023-12-31 02:59 UTC
Attachments
verbose (97.10 KB, text/plain), 2012-02-01 19:49 UTC, Allen Belletti
lastboot82 (42.72 KB, text/plain), 2012-02-01 19:49 UTC, Allen Belletti
verbose-full.bz2 (14.45 KB, application/x-bzip), 2012-02-01 20:52 UTC, Allen Belletti
verbose-full.bz2 (14.45 KB, application/x-bzip), 2012-02-01 23:02 UTC, Allen Belletti

Description Allen Belletti 2012-02-01 19:20:09 UTC
I've recently upgraded an amd64 disk-to-disk backup server from 8.2 to 9.0 and run into a serious problem.  We have three SiI 3132 dual-port eSATA cards.  Each of those six ports (two per card) connects to an SiI 3726 (possibly 4726, but detected as 3726) port multiplier, which in turn drives four disks.  This worked brilliantly on 8.x, with a total of 24 disks running reliably for quite some time.  After upgrading to 9.0, the controllers are detected by the siis driver, which is aware that the port multipliers exist, but no disks are seen.  I've also booted the 9.0 memstick image and confirmed that the same problem occurs there.

How-To-Repeat: In our case, this problem occurs on every boot.  The affected hardware is a SuperMicro board (model unknown, but I'll check if needed) with a pair of Xeon E5504 CPUs.  The eSATA cards are SiI 3132, connected to KW-5556 port multipliers (the internal version).  Disks are connected four to a multiplier and are Samsung 750GB SATA units.
Comment 1 Allen Belletti 2012-02-01 19:49:45 UTC
Here are two log files which may be of use in investigating this problem.  
The first, "verbose", is taken from dmesg after doing a verbose boot of 
the 9.0 GENERIC kernel.  Unfortunately the start is cut off (presumably 
due to the kernel message buffer size), but that should be fixable if 
needed.  The second log file, "lastboot82", is a non-verbose log (I no 
longer have a way to boot 8.2) from the last time I booted 8.2 GENERIC.

Note that this machine has three types of disk controllers, so the log 
is quite lengthy.  There are six devices on the motherboard (ahci) and 
many devices connected via the mpt driver.  Those are all working fine.

Thanks,
Allen
Comment 2 Allen Belletti 2012-02-01 20:52:57 UTC
Progress!  Following the discussion at 
http://forums.freebsd.org/showthread.php?t=29429, I've discovered that 
the siis driver does not appear to have an interrupt registered; it is 
missing from the vmstat -i output:

interrupt                          total       rate
irq21: hpet0                     1745211       1837
irq23: uhci0 ehci0                   438          0
irq275: cxgbc0                      1959          2
irq276: cxgbc0                        10          0
irq277: cxgbc0                       819          0
irq278: cxgbc0                        29          0
irq279: cxgbc0                        43          0
irq280: cxgbc0                      1757          1
irq281: cxgbc0                        65          0
irq282: cxgbc0                        23          0
irq283: ahci0                      16362         17
irq284: mpt0                       17409         18
irq285: mpt1                        2352          2
irq286: mpt2                       13714         14
Total                            1800191       1894

This would seem to be the root of the problem, rather than anything to 
do with the port multipliers.

Also, please find attached a full verbose boot log, with kern.msgbufsize 
raised.
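
For anyone reproducing the full verbose log, the message buffer can be enlarged with a loader tunable. This is only a sketch: the size below is an assumed example value, not the one used for the attached log.

```
# /boot/loader.conf (sketch; 1048576 is an assumed example size)
kern.msgbufsize="1048576"
```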
Comment 3 Allen Belletti 2012-02-01 23:02:14 UTC
One more piece of information.  It appears that the irqs are being 
assigned as shown in dmesg, but no interrupts are being generated or 
received.  Perhaps the 3132s aren't being configured properly to 
generate them?

# vmstat -ia | grep -A1 siis
irq26: siis0                           0          0
stray irq26                            0          0
--
irq30: siis1                           0          0
stray irq30                            0          0
--
irq48: siis2                           0          0
stray irq48                            0          0
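
In a longer listing, a small filter can pick out the interrupt sources that have never fired. The following is a minimal sketch: the function name zero_irqs is made up for illustration, and it assumes the column layout of the vmstat -ia output shown above (device names followed by total and rate).

```sh
# zero_irqs: hypothetical helper, not part of FreeBSD.
# Reads `vmstat -ia` output on stdin and prints the first device name of
# every non-stray interrupt line whose total count is 0.
zero_irqs() {
    awk '$1 ~ /^irq[0-9]+:$/ && $(NF-1) == 0 { print $2 }'
}

# Usage on the affected machine:
#   vmstat -ia | zero_irqs
```

Lines beginning with "stray", the header, and the Total line are skipped because their first field is not of the form irqN:.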
Comment 4 Allen Belletti 2012-02-02 18:08:13 UTC
I've reached a dead end but come up with a few more bits of information. 
  It's definitely some sort of irq setup/handling problem.  I 
experimented with setting hint.siis.X.msi=1 for these cards. 
Surprisingly, they almost seem to work.  They're able to immediately 
detect the pmp device and recognize the four disks on the other side of 
it.  However, after a few seconds of I/O, it'll get stuck and time out. 
  Presumably MSI just doesn't work on these cards which is why it 
defaults to disabled (I've seen hints of problems like that in my 
search.)  I did also try forcing hint.siisch.X.sata_rev=1 but it didn't 
seem to improve the situation, which makes sense if it's fundamentally 
an interrupt handling problem.
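
For reference, the two hints mentioned above could be written roughly as follows. This is a sketch only: the unit number 0 is an assumption, and each line would need repeating for the other controllers and channels.

```
# /boot/loader.conf (sketch; unit number 0 is an assumption)
hint.siis.0.msi="1"          # enable MSI on the first siis controller
hint.siisch.0.sata_rev="1"   # limit the first siis channel to SATA revision 1 (1.5Gbps)
```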

It's unlikely that I can get any further on my own, but it seems likely 
that some sort of IRQ handling problem was introduced between 8.2 and 9.0.

Thanks,
Allen
Comment 5 Allen Belletti 2012-02-02 20:59:05 UTC
One final thing; I forgot that I still had a mirror of my 8.2 
installation.  I was able to boot that and return to my original 
configuration.  Indeed, the problem with the siis interfaces goes away 
under 8.2, so it's 100% a regression and not a hardware issue.

Also, for reference, the system runs a SuperMicro X8DTH board.
Comment 6 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:01:36 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.
