Bug 281177 - 13.2 works, 13.3 and 14.x installers panic on older qlogic isp card
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash, regression
Depends on:
Blocks:
 
Reported: 2024-08-31 23:01 UTC by cheeky.m
Modified: 2024-11-25 02:44 UTC
CC List: 8 users

See Also:


Attachments
patch to disable FTL handling on 24xx (478 bytes, patch)
2024-09-11 07:12 UTC, Joerg Pulz
14.2-BETA1 memstick isp panic (4.94 KB, image/png)
2024-11-08 17:07 UTC, Vladimir Druzenko
dmesg from boot 2 (no panic) (26.23 KB, text/plain)
2024-11-08 17:13 UTC, Vladimir Druzenko
14.2-BETA1 memstick isp fails before panic (4.27 KB, image/png)
2024-11-08 17:49 UTC, Vladimir Druzenko
dmesg from boot 1 debug (no panic) (88.01 KB, text/plain)
2024-11-08 18:08 UTC, Vladimir Druzenko
dmesg from boot 1 debug=0x10f (no panic) (24.96 KB, text/plain)
2024-11-08 22:43 UTC, Vladimir Druzenko

Description cheeky.m 2024-08-31 23:01:10 UTC
The 13.3 and 14.x install images panic during boot and then reboot.  13.2 works.  We can make 13.3 and 14.x work by breaking to the loader prompt on boot, unloading the kernel, loading it again, loading ispfw.ko, and then booting.  Adding 'ispfw_load="YES"' to /boot/loader.conf keeps the system working after installation.
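
Whether the loader.conf workaround is in place can be checked with a short script. The following is only a sketch: the helper name `has_ispfw_load` is hypothetical, and it does a simplified line-based parse of `name="value"` text rather than reproducing the loader's real loader.conf parser.

```python
def has_ispfw_load(loader_conf_text: str) -> bool:
    """Return True if the text sets ispfw_load to YES (case-insensitive).

    Hypothetical helper: a simplified line-based parse, not the loader's
    actual loader.conf parser.
    """
    enabled = False
    for raw in loader_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == "ispfw_load":
            # Assumption of this sketch: a later assignment overrides an earlier one.
            enabled = value.strip().strip('"').upper() == "YES"
    return enabled


sample = 'hint.isp.0.debug="0x3f"\nispfw_load="YES"\n'
print(has_ispfw_load(sample))  # True
```

On an installed system the same function could be fed the contents of /boot/loader.conf.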

https://mail-archive.freebsd.org/cgi/getmsg.cgi?fetch=75207+0+archive/2024/freebsd-current/20240722.freebsd-current


https://mail-archive.freebsd.org/cgi/getmsg.cgi?fetch=371870+0+archive/2024/freebsd-current/20240805.freebsd-current

sorry for html.



This is an older system with two QLogic isp cards, isp0 and isp1, with nothing attached to them. It panics on boot with the 13.3 and 14.x images.
13.2 works.

Autoloading module: ichsmb
ichsmb0: <Intel 631xESB/6321ESB (ESB2) SMBus controller> port 0x300-0x31f irq 22 at device 31.3 on pci0
smbus0: <System Management Bus> on ichsmb0
isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x9c00-0x9cff mem 0xfcbfc000-0xfcbfffff irq 18 at device 0.0 on pci9
isp1: FLT[DEF]: Invalid length=0xffff(65535)
panic: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe0127d22000
cpuid = 6
time = 1721060956
KDB: stack backtrace:
Uptime: 17s
Dumping 936 out of 24532 MB:..2%..11%..21%..31%..42%..52%..62%..71%..81%..91%
------------------------------------------------------------------------

13.2:

grep ^isp 13.2-dmesg

isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x6c00-0x6cff mem 0xfc4fc000-0xfc4fffff irq 16 at device 0.0 on pci3
isp0: Mailbox Command (0x8) Timeout (5000000us) (isp_reset:439)
isp0: Mailbox Command 'ABOUT FIRMWARE' failed (TIMEOUT)
isp0: isp_reinit: cannot reset card
isp0: See the ispfw(4) man page on how to load known good firmware at boot time
isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x9c00-0x9cff mem 0xfcbfc000-0xfcbfffff irq 18 at device 0.0 on pci9
isp0: Mailbox Command (0x8) Timeout (5000000us) (isp_reset:439)
isp0: Mailbox Command 'ABOUT FIRMWARE' failed (TIMEOUT)
isp0: isp_reinit: cannot reset card
isp0: See the ispfw(4) man page on how to load known good firmware at boot time
isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x6c00-0x6cff mem 0xfc4fc000-0xfc4fffff irq 16 at device 0.0 on pci3
isp_2400: could not load firmware image, error 6
isp0: Mailbox Command (0x8) Timeout (5000000us) (isp_reset:439)
isp0: Mailbox Command 'ABOUT FIRMWARE' failed (TIMEOUT)
isp0: isp_reinit: cannot reset card
isp0: See the ispfw(4) man page on how to load known good firmware at boot time
isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x9c00-0x9cff mem 0xfcbfc000-0xfcbfffff irq 18 at device 0.0 on pci9
isp_2400: could not load firmware image, error 6
isp0: Mailbox Command (0x8) Timeout (5000000us) (isp_reset:439)
isp0: Mailbox Command 'ABOUT FIRMWARE' failed (TIMEOUT)
isp0: isp_reinit: cannot reset card
isp0: See the ispfw(4) man page on how to load known good firmware at boot time
isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x6c00-0x6cff mem 0xfc4fc000-0xfc4fffff irq 16 at device 0.0 on pci3
isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x9c00-0x9cff mem 0xfcbfc000-0xfcbfffff irq 18 at device 0.0 on pci9

working 13.3 dmesg with ispfw loaded in loader.conf

grep ^isp 13.3-dmesg

isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x6c00-0x6cff mem 0xfc4fc000-0xfc4fffff irq 16 at device 0.0 on pci3
isp0: FLT[DEF]: Invalid length=0xffff(65535)
isp0: invalid NVRAM header (55 aa 56)
isp0: invalid NVRAM header (55 aa 56)
isp0: bad frame length (0) from NVRAM - using 1024
isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x9c00-0x9cff mem 0xfcbfc000-0xfcbfffff irq 18 at device 0.0 on pci9
isp1: FLT[DEF]: Invalid length=0xffff(65535)
isp1: invalid NVRAM header (55 aa 56)
isp1: invalid NVRAM header (55 aa 56)
isp1: bad frame length (0) from NVRAM - using 1024

Bug 279381 may be related.
Comment 1 Warner Losh freebsd_committer freebsd_triage 2024-09-06 18:04:46 UTC
Added cc for mav@ to take a look or provide feedback on next steps.
Comment 2 Vladimir Druzenko freebsd_committer freebsd_triage 2024-09-06 20:38:00 UTC
Works fine for me on two hosts with 13.2, 13.3, and now 14.1:

(Host 1)
isp0: <Qlogic ISP 2532 PCI FC-AL Adapter> port 0x2200-0x22ff mem 0x97a00000-0x97a03fff irq 24 at device 0.0 on pci3
isp1: <Qlogic ISP 2532 PCI FC-AL Adapter> port 0x2000-0x20ff mem 0x97a04000-0x97a07fff irq 34 at device 0.1 on pci3
isp0@pci0:21:0:0:       class=0x0c0400 rev=0x02 hdr=0x00 vendor=0x1077 device=0x2532 subvendor=0x1077 subdevice=0x015d
    vendor     = 'QLogic Corp.'
    device     = 'ISP2532-based 8Gb Fibre Channel to PCI Express HBA'
    class      = serial bus
    subclass   = Fibre Channel
isp1@pci0:21:0:1:       class=0x0c0400 rev=0x02 hdr=0x00 vendor=0x1077 device=0x2532 subvendor=0x1077 subdevice=0x015d
    vendor     = 'QLogic Corp.'
    device     = 'ISP2532-based 8Gb Fibre Channel to PCI Express HBA'
    class      = serial bus
    subclass   = Fibre Channel

(Host 2)
isp0: <Qlogic ISP 2532 PCI FC-AL Adapter> port 0x2c00-0x2cff mem 0xbc2fc000-0xbc2fffff irq 26 at device 0.0 on pci1
isp1: <Qlogic ISP 2532 PCI FC-AL Adapter> port 0x2e00-0x2eff mem 0xbc2f8000-0xbc2fbfff irq 28 at device 0.1 on pci1
isp0@pci0:12:0:0:       class=0x0c0400 rev=0x02 hdr=0x00 vendor=0x1077 device=0x2532 subvendor=0x1077 subdevice=0x015d
    vendor     = 'QLogic Corp.'
    device     = 'ISP2532-based 8Gb Fibre Channel to PCI Express HBA'
    class      = serial bus
    subclass   = Fibre Channel
isp1@pci0:12:0:1:       class=0x0c0400 rev=0x02 hdr=0x00 vendor=0x1077 device=0x2532 subvendor=0x1077 subdevice=0x015d
    vendor     = 'QLogic Corp.'
    device     = 'ISP2532-based 8Gb Fibre Channel to PCI Express HBA'
    class      = serial bus
    subclass   = Fibre Channel
Comment 3 Joerg Pulz 2024-09-11 07:12:01 UTC
Created attachment 253494 [details]
patch to disable FTL handling on 24xx

Looks like a problem reading the FLT of your 24xx based controller.
Can't say if this is specific to your controller or to all 24xx based controllers - can't test as this is the only controller I don't have.

Please try the attached patch - disables FLT handling for 24xx based controllers - and let me know how this works.
Comment 4 Yuri Pankov freebsd_committer freebsd_triage 2024-09-11 08:12:05 UTC
I have found some 2432 cards lying around, and I don't see a panic on boot without having ispfw.ko loaded, on 15-CURRENT though:

isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x7000-0x70ff mem 0xb5e40000-0xb5e43fff at device 0.0 numa-domain 0 on pci11
isp0: FLT[DEF]: Invalid length=0xffff(65535)
isp0: invalid NVRAM header (55 aa 59)
isp0: invalid NVRAM header (55 aa 59)
isp0: bad frame length (0) from NVRAM - using 1024
isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x6000-0x60ff mem 0xb5d40000-0xb5d43fff at device 0.0 numa-domain 0 on pci12
isp1: FLT[DEF]: Invalid length=0xffff(65535)
isp1: invalid NVRAM header (55 aa 59)
isp1: invalid NVRAM header (55 aa 59)
isp1: bad frame length (0) from NVRAM - using 1024
Comment 5 Joerg Pulz 2024-09-11 08:54:27 UTC
Yuri,
thanks for your reply.
Would it be possible for you to test my patch? That would be really helpful.

Thanks
Joerg
Comment 6 Yuri Pankov freebsd_committer freebsd_triage 2024-09-11 11:53:31 UTC
(In reply to Joerg Pulz from comment #5)
Sure! With the patch output is a bit different:

isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x7000-0x70ff mem 0xb5e40000-0xb5e43fff at device 0.0 numa-domain 0 on pci11
isp0: invalid NVRAM header (0 0 0)
isp0: invalid NVRAM header (0 0 0)
isp0: bad frame length (0) from NVRAM - using 1024
isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x6000-0x60ff mem 0xb5d40000-0xb5d43fff at device 0.0 numa-domain 0 on pci12
isp1: invalid NVRAM header (0 0 0)
isp1: invalid NVRAM header (0 0 0)
isp1: bad frame length (0) from NVRAM - using 1024

Note that I can't test if the cards work at the moment (missing the cable) and I have never used them with FreeBSD previously, so can't tell if something changed.
Comment 7 Joerg Pulz 2024-09-11 17:58:31 UTC
(In reply to Yuri Pankov from comment #6)

Thanks for testing.
Looks like reading from the card (FLT or NVRAM) is somehow broken.
Without having one of these cards it's hard to fix this.

Yuri, you said you have found some of those cards.
Are you using those or is it possible to get one from you?
I would pay for the card and shipping costs - if you attend EuroBSDcon next week, we could do the handover there.

Joerg
Comment 8 Yuri Pankov freebsd_committer freebsd_triage 2024-09-12 00:06:27 UTC
I was able to reproduce the panic without Joerg's diff by removing isp from the kernel config and kldload'ing it after boot; hopefully this is a bit more readable:

isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x7000-0x70ff mem 0xb5e40000-0xb5e43fff at device 0.0 numa-domain 0 on pci11
isp0: FLT[DEF]: Invalid length=0xffff(65535)
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex isp (isp) r = 0 (0xfffff80004af5800) locked @ /home/yuri/ws/isp/sys/dev/isp/isp_pci.c:1096
stack backtrace:

Fatal trap 12: page fault while in kernel mode
cpuid = 17; apic id = 09
fault virtual address   = 0xfffffe01f793f000
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff832f59f2
stack pointer           = 0x28:0xfffffe01f793d690
frame pointer           = 0x28:0xfffffe01f794d6e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 196 (kldload)
rdi: fffff80004af5800 rsi: 0000000000000004 rdx: fffff800b5e40004
rcx: 0000c0c97e2e00dd  r8: 0000c0c97e2d707e  r9: fffff800081a5740
rax: 00000000ffffffff rbx: fffff80004af5800 rbp: fffffe01f794d6e0
r10: 000000000000001d r11: 000000000000001d r12: 0000000000007530
r13: 000000000000065c r14: 000000007ff11a5e r15: fffffe01f793f000
trap number             = 12
panic: page fault
cpuid = 17
time = 1726098938
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01f793d360
vpanic() at vpanic+0x13f/frame 0xfffffe01f793d490
panic() at panic+0x43/frame 0xfffffe01f793d4f0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe01f793d550
trap_pfault() at trap_pfault+0xa0/frame 0xfffffe01f793d5c0
calltrap() at calltrap+0x8/frame 0xfffffe01f793d5c0
--- trap 0xc, rip = 0xffffffff832f59f2, rsp = 0xfffffe01f793d690, rbp = 0xfffffe01f794d6e0 ---
isp_read_flt_2xxx() at isp_read_flt_2xxx+0x562/frame 0xfffffe01f794d6e0
isp_reset() at isp_reset+0x646/frame 0xfffffe01f794d7e0
isp_reinit() at isp_reinit+0xea/frame 0xfffffe01f794d890
isp_pci_attach() at isp_pci_attach+0xff4/frame 0xfffffe01f794d930
device_attach() at device_attach+0x3aa/frame 0xfffffe01f794d970
device_probe_and_attach() at device_probe_and_attach+0x70/frame 0xfffffe01f794d9a0
pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe01f794d9e0
devclass_driver_added() at devclass_driver_added+0x29/frame 0xfffffe01f794da10
devclass_add_driver() at devclass_add_driver+0x138/frame 0xfffffe01f794da50
module_register_init() at module_register_init+0xb0/frame 0xfffffe01f794da80
linker_load_module() at linker_load_module+0xc23/frame 0xfffffe01f794dd80
kern_kldload() at kern_kldload+0x16e/frame 0xfffffe01f794ddd0
sys_kldload() at sys_kldload+0x5c/frame 0xfffffe01f794de00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe01f794df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01f794df30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x3c38ff72f7da, rsp = 0x3c38fcd188b8, rbp = 0x3c38fcd18e30 ---
Uptime: 42s
Dumping 2546 out of 65179 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Comment 9 Joerg Pulz 2024-09-12 08:55:28 UTC
(In reply to Yuri Pankov from comment #8)

Can you try again with hint.isp.0.debug="0x3f" and send me the output?

Please think about my question about getting one of those 2432 cards from you.

Joerg
Comment 10 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-06 20:40:02 UTC
Possibly related.
I just booted from FreeBSD-14.2-BETA1-amd64-memstick.img on an IBM x3550 M4 v2 and got a lot of errors on isp0 and isp1, followed by a panic.

Booting and installing from FreeBSD-14.1-RELEASE-amd64-memstick.img and then updating to the latest 14.1-p6 went without errors.

But I have a QLogic 8Gb IBM QLE2562 dual FC HBA:
isp0@pci0:12:0:0:       class=0x0c0400 rev=0x02 hdr=0x00 vendor=0x1077 device=0x2532 subvendor=0x1077 subdevice=0x015d
    vendor     = 'QLogic Corp.'
    device     = 'ISP2532-based 8Gb Fibre Channel to PCI Express HBA'
    class      = serial bus
    subclass   = Fibre Channel
isp1@pci0:12:0:1:       class=0x0c0400 rev=0x02 hdr=0x00 vendor=0x1077 device=0x2532 subvendor=0x1077 subdevice=0x015d
    vendor     = 'QLogic Corp.'
    device     = 'ISP2532-based 8Gb Fibre Channel to PCI Express HBA'
    class      = serial bus
    subclass   = Fibre Channel
Comment 11 Mark Johnston freebsd_committer freebsd_triage 2024-11-07 14:25:04 UTC
(In reply to Vladimir Druzenko from comment #10)
If this is a regression relative to 14.1, then the problem is likely different from the original bug report.  It's possibly a regression from one of these commits:

https://cgit.freebsd.org/src/commit/?id=ff9458b30fc3b8748f65eca792be7b6e64c639bf
https://cgit.freebsd.org/src/commit/?id=44ca5d40f36704ffa2fa55f8f1403c824400b3ba

Ken, do you have any idea what might be going on here?

I also wonder if the original bug is still reproducible after those two commits?  That is, do we still panic on boot on the latest stable/13 or stable/14?
Comment 12 Kenneth D. Merry freebsd_committer freebsd_triage 2024-11-07 16:20:21 UTC
(In reply to Mark Johnston from comment #11)

The fixes I made (that are in 14.2) shouldn't change the default behavior with any cards.  You have to set a loader tunable to do that.  Joerg's original changes also didn't change the driver behavior in that it still loads firmware from ispfw (if present) by default on 8Gb and older cards.  Joerg did update the 8Gb 25XX firmware, but that firmware change happened before 14.1.

I think to diagnose Vladimir's problem we'll need dmesg output, stack trace, etc. to narrow it down.
Comment 13 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-07 16:47:05 UTC
(In reply to Kenneth D. Merry from comment #12)
I'll try to test boot from 14.2-BETA1 tomorrow.
Comment 14 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-08 17:07:00 UTC
Created attachment 255031 [details]
14.2-BETA1 memstick isp panic

1. Cold boot 14.2-BETA1 - panic.
2. Warm boot 14.2-BETA1 after boot 14.1 - no panic.
I can share dmesg from boot 2.
Comment 15 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-08 17:13:35 UTC
Created attachment 255032 [details]
dmesg from boot 2 (no panic)
Comment 16 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-08 17:49:17 UTC
Created attachment 255034 [details]
14.2-BETA1 memstick isp fails before panic
Comment 17 Kenneth D. Merry freebsd_committer freebsd_triage 2024-11-08 17:51:31 UTC
(In reply to Vladimir Druzenko from comment #16)

Ok, at the loader prompt, can you do:

load ispfw
set hint.isp.0.debug="0x13f"
set hint.isp.1.debug="0x13f"

And then we'll see what output we get.

I suspect the issue may just be that you've got old firmware on the card.
Comment 18 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-08 18:08:27 UTC
Created attachment 255035 [details]
dmesg from boot 1 debug (no panic)

(In reply to Kenneth D. Merry from comment #17)
Boot without panic.
Comment 19 Kenneth D. Merry freebsd_committer freebsd_triage 2024-11-08 18:12:56 UTC
(In reply to Vladimir Druzenko from comment #18)

Ok, just to make sure we know what's going on (too many debugging messages with all of that turned on), how about trying a cold boot and doing this:

load ispfw
set hint.isp.0.debug="0x10f"
set hint.isp.1.debug="0x10f"

And then another cold boot without loading ispfw and just the debugging messages:

set hint.isp.0.debug="0x10f"
set hint.isp.1.debug="0x10f"

I think you'll panic in the second case.
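
For reference, the difference between the debug masks used in this thread can be checked with a few lines of arithmetic. This sketch only shows which bits differ; the meaning of the individual bits is defined by the isp(4) driver and is not asserted here.

```python
full_mask = 0x13F     # mask requested in comment 17
reduced_mask = 0x10F  # mask requested in comment 19
earlier_mask = 0x3F   # mask requested in comment 9

# Bits present in the comment 17 mask but dropped in comment 19.
dropped = full_mask & ~reduced_mask
print(hex(dropped))  # 0x30

# Bits present in the comment 17 mask but absent from the comment 9 mask.
extra = full_mask & ~earlier_mask
print(hex(extra))  # 0x100
```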
Comment 20 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-08 22:43:42 UTC
Created attachment 255041 [details]
dmesg from boot 1 debug=0x10f (no panic)

Without loading ispfw, a cold boot always panics.
Comment 21 Colin Percival freebsd_committer freebsd_triage 2024-11-10 21:43:20 UTC
Is anyone actively working on this?  Time is rapidly running out if this is going to get fixed in 14.2-RELEASE.
Comment 22 Kenneth D. Merry freebsd_committer freebsd_triage 2024-11-11 03:48:04 UTC
It looks like the solution in both cases (Yuri and Vladimir) is to load ispfw.

You could just put that in the loader.conf by default to fix it.

In almost every case it is better to run with the ispfw firmware than with whatever is flashed on the card.
Comment 23 Colin Percival freebsd_committer freebsd_triage 2024-11-11 03:51:50 UTC
I don't think we want to make the installer load ispfw for everyone; can't it be loaded on demand?
Comment 24 Warner Losh freebsd_committer freebsd_triage 2024-11-11 04:27:39 UTC
The problem is that all cards have the firmware, so you'd always have to load on demand. Or are you suggesting just for isp systems? That's not a terrible idea now that we have binary firmware loading in addition to the old .ko-based scheme. But I'd wager it is too late for the release, despite this regression. We should note the workaround in the release notes and also task the loader folks to export PCI devices to Lua so workarounds like this could be scripted in the future.
Comment 25 Colin Percival freebsd_committer freebsd_triage 2024-11-11 05:09:16 UTC
Yeah I was wondering if the isp driver could be taught to load its firmware like some other drivers do.  But you're right, there really isn't time to do that before 14.2.

I'm tempted to remove isp from GENERIC since it's nonfunctional as it is, but I guess we can't do that for 14.2 since there are going to be 14.0 and 14.1 systems out there which have ispfw.ko listed in their loader.conf and we don't want to break them when they upgrade.

I guess this will be a "not going to be fixed in 14.2" issue.
Comment 26 Warner Losh freebsd_committer freebsd_triage 2024-11-11 05:21:55 UTC
Also, don't mistake "some cards do this" for "all isp cards are broken". We'd likely have a lot more reports than we've seen so far if that were the case.
Comment 27 Joerg Pulz 2024-11-11 11:45:12 UTC
Some background on isp(4) default firmware handling:

Every card has firmware in flash and we have firmware for every supported isp(4) card generation in ispfw(4).

The driver reads the FLT (flash layout table) to get the flash address of the firmware stored on the card.
The firmware header is loaded from flash at this address to get the firmware version.
ispfw(4) is loaded and the firmware header is parsed to get the ispfw(4) firmware version.
After comparison of the available versions the newer one is loaded into the RAM of the card and the card is instructed to execute the loaded firmware.

There are some hints (hint.isp.N.fwload_disable and hint.isp.N.fwload_force) to change the above behavior.

On a running system three sysctl(8) values provide firmware version information:
dev.isp.N.fw_version_flash
  The readonly flash firmware version value in the active region of the controller.

dev.isp.N.fw_version_ispfw
  The readonly firmware version value provided by ispfw(4).

dev.isp.N.fw_version_run
  The readonly firmware version value currently executed on the controller.

As the behavior hasn't changed between 14.1 and 14.2, I wonder what happens here.

It may be that there are some old cards with broken flash and reading the FLT fails or gives bad data.
If that's the case, then reflashing the card may help.
But if that's the case it should happen on 14.1 systems too.

About disabling isp(4) in GENERIC:
Actually we are talking about two or three people with the 24xx (4Gbit/s) and 25xx (8Gbit/s) cards that seem to be problematic. The latest firmware for the 25xx cards is dated 2019.
We seem to have no issues with the 26xx (16Gbit/s), 27xx (32Gbit/s) and 28xx (32 and 64Gbit/s) cards.
Is that enough to justify disabling isp(4) in GENERIC at all? Enabling ispfw(4) in loader.conf(5) is not a requirement: isp(4) loads and uses it automatically if it is available, unless a hint instructs it to do otherwise. So disabling isp(4) in GENERIC will most probably hit everyone who has no explicit isp_enable="YES" in loader.conf(5).

We could think about changing the code to skip FLT reading entirely for those old card generations and change the load and exec behavior back to the state before all my changes.
That's probably not going to happen in time for 14.2.

Binary firmware loading instead of .ko is possible. I already have some code for this. But it would be a complete change away from ispfw(4).
Anyway, I doubt that this would solve the problem we see here.
Again, probably not going to happen for 14.2.


For further tasks to solve this I would need some detailed data where it breaks on 14.2.
Unfortunately my test system is currently running without the 25xx card. I have physical access to this system tomorrow to plug in a 25xx card and run some tests by myself.

In the meantime:
@Vladimir

Can you please boot your panicking 14.2 memstick using

  hint.isp.0.debug="0x3f"

That should reveal all details about reading and parsing the FLT and firmware header and about firmware loading and execution.
I need all the console output until it panics.
Comment 28 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-11 13:45:04 UTC
(In reply to Joerg Pulz from comment #27)
14.1-p6 amd64 live system (not memstick) with ispfw loaded via /boot/loader.conf:
# sysctl dev.isp.0
dev.isp.0.fw_version_run: 8.8.207
dev.isp.0.fw_version_ispfw: 8.8.207
dev.isp.0.fw_version_flash: not loaded
dev.isp.0.use_gff_id: 1
dev.isp.0.use_gft_id: 1
dev.isp.0.topo: 0
dev.isp.0.loopstate: 10
dev.isp.0.fwstate: 3
dev.isp.0.linkstate: 1
dev.isp.0.speed: 8
dev.isp.0.role: 2
dev.isp.0.gone_device_time: 30
dev.isp.0.loop_down_limit: 60
dev.isp.0.wwpn: <cut>
dev.isp.0.wwnn: <cut>
dev.isp.0.%parent: pci1
dev.isp.0.%pnpinfo: vendor=0x1077 device=0x2532 subvendor=0x1077 subdevice=0x015d class=0x0c0400
dev.isp.0.%location: slot=0 function=0 dbsf=pci0:12:0:0
dev.isp.0.%driver: isp
dev.isp.0.%desc: Qlogic ISP 2532 PCI FC-AL Adapter
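
The output above shows the running firmware matching the ispfw(4) version. A minimal sketch of that check follows; `firmware_versions` is a hypothetical helper name, and the sample text is copied from the sysctl output in this comment.

```python
def firmware_versions(sysctl_output: str) -> dict:
    """Extract the dev.isp.N.fw_version_* values from `sysctl dev.isp.N` text."""
    versions = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(": ")
        if sep and ".fw_version_" in name:
            # Key by the suffix: "run", "ispfw" or "flash".
            versions[name.rsplit("_", 1)[1]] = value.strip()
    return versions


sample = """\
dev.isp.0.fw_version_run: 8.8.207
dev.isp.0.fw_version_ispfw: 8.8.207
dev.isp.0.fw_version_flash: not loaded
dev.isp.0.speed: 8
"""
v = firmware_versions(sample)
print(v["run"] == v["ispfw"])  # True: the card is running the ispfw(4) image
```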

> I need all the console output until it panics.
I don't know how to capture the full dmesg if the system panics. I can try to take a "fast screenshot" from the virtual KVM (IBM IMM2).
Comment 29 Joerg Pulz 2024-11-11 14:32:13 UTC
I have to correct myself regarding the firmware handling.

    For 27xx and newer adapters:
    - load ispfw(4) firmware
    - request (active) flash firmware information
    - compare version numbers of ispfw(4) and flash firmware
    - load firmware with highest version into RISC's RAM
    - if loading ispfw(4) is disabled or failed - load firmware from flash
    - if everything else fails use MBOX_LOAD_FLASH_FIRMWARE as fallback

    For 26xx and older adapters nothing changed:
    - load ispfw(4) firmware and load it into RISC's RAM
    - if loading ispfw(4) is disabled or failed use MBOX_EXEC_FIRMWARE
    - for 26xx a preceding MBOX_LOAD_FLASH_FIRMWARE is used
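
The two lists above can be modelled as a small decision function. This is a rough sketch and not the driver's code: the function name, the numeric encoding of the chip generation, the use of version tuples, and the tie-breaking toward ispfw(4) when both versions are equal are all assumptions made for illustration.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def choose_firmware(generation: int,
                    ispfw: Optional[Version],
                    flash: Optional[Version]) -> str:
    """Model which firmware source is used, per the description above.

    generation: chip family as a number, e.g. 25 for 25xx (assumed encoding).
    ispfw: ispfw(4) version, or None if loading it is disabled or failed.
    flash: flash firmware version, or None if it could not be obtained.
    """
    if generation >= 27:
        # 27xx and newer: compare versions, highest wins (tie -> ispfw, assumed).
        if ispfw is not None and (flash is None or ispfw >= flash):
            return "ispfw"
        if flash is not None:
            return "flash"
        return "MBOX_LOAD_FLASH_FIRMWARE"
    # 26xx and older: no version comparison, ispfw(4) is used when available.
    if ispfw is not None:
        return "ispfw"
    if generation == 26:
        return "MBOX_LOAD_FLASH_FIRMWARE then MBOX_EXEC_FIRMWARE"
    return "MBOX_EXEC_FIRMWARE"


# A 25xx card with no ispfw(4) available, as on the cold-booted installer.
print(choose_firmware(25, None, None))  # MBOX_EXEC_FIRMWARE
```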

So for the old 25xx we are talking about, ispfw(4) is always loaded if available and not explicitly disabled by a hint.
Only for the newer cards the version comparison is done.
But, the FLT parsing is always done to get all the addresses, especially NVRAM where we read out WWPN, WWNN, command limit and so on.

So why does this apparently fail on 14.2 while it works on 14.1 when nothing changed there? And why does it not fail if ispfw(4) is loaded by the loader?
Reading the FLT is done at the very beginning, before firmware is loaded into the card's RAM and executed by the card.
I will try it myself tomorrow to get more data and details.
Comment 30 Vladimir Druzenko freebsd_committer freebsd_triage 2024-11-11 14:38:23 UTC
The virtual KVM (IBM IMM2) can record something like a "video"! I'll try to create screenshots from this "video" (but later this evening).
Comment 31 Joerg Pulz 2024-11-14 08:20:09 UTC
Sorry for the delay - here are my results:

Using 14.2-BETA2-memstick:
- 24xx based card: untested - no hardware
- 25xx based card: broken - panic
- 26xx based card: working
- 27xx based card: working
- 28xx based card: working

Situation for the 25xx based cards after a cold boot:
During initialization isp(4) tries to load the isp_2500.ko firmware module.
This fails because firmware(9) is unable to load it without a mounted root fs.
As a fallback, the card is instructed to exec the firmware from flash.
This returns without error.
Afterwards an "about firmware" command is sent to the card.
This command times out (no success message from the card), which leads to a device attach failure.
I don't know why this happens, as nothing changed in how and what isp(4) communicates with the card; I don't know if some general PCI communication code changed between 14.1 and 14.2.
Later in the boot process (right after the root fs mount?) a reprobe somehow happens and isp(4) tries to attach to the card again.
This time firmware(9) returns error 6 when trying to load isp_2500.ko and right afterwards the panic occurs.

It panics in "firmware taskq"!!! This does not happen when cold booting from a 14.1-RELEASE-memstick and should be fixed in firmware(9).

Tried the same with "hint.isp.0.fwload_disable=1" set, so firmware(9) is completely out of the game.
This time the system boots without a panic, but isp(4) always fails with the "about firmware" command timing out.
I see isp(4) trying to attach to the card 5 times (once during normal device enumeration and 4 times after the root fs is mounted) and I don't know why it is happening so many times.

Tried the same with the firmware preloaded (either ispfw.ko or only isp_2500.ko): the system boots and isp(4) attaches to the card, with no timeouts and everything working.

I did a lot of testing and made several changes to the isp(4) code over the past days to find out why device attach fails for 25xx based cards when cold booting and using firmware from flash in 14.2+. So far without success.

For now the best/easiest solution would be to include ispfw(4) into the kernel (like isp(4) already is) or always load it by default (or at least load isp_2500.ko by default).

Thoughts about binary firmware:
Please correct me if I'm wrong but as firmware(9) needs a mounted root fs to either load binary or .ko firmware it would make no difference here.