Created attachment 218172 (boot/dmesg log)

I compiled a FreeBSD kernel from CURRENT with patches D25219, D26493, D26494, D26495, and D26495 yesterday and attempted to boot it on my Raspberry Pi 4B with 8 GB memory. The kernel version is reported as:

FreeBSD 13.0-CURRENT #0 5942f048f5c-c271691(master)-dirty

To this end, I prepared a USB drive (an M.2 SSD attached through an M.2 SATA-to-USB bridge) with a UEFI bootloader and a FreeBSD installation in a zpool. When trying to boot the system with the drive attached to a USB2 port, everything works fine. When I instead use a USB3 port, mounting root fails with a series of IO errors:

da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <WDC WDS2 40G2G0B-00EP UJ43> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number ABCDEFA74566
da0: 400.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 02 c7 77 2e 00 00 05 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 02 c7 77 2e 00 00 05 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 02 c7 77 2e 00 00 05 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 02 c7 77 2e 00 00 05 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 02 c7 77 2e 00 00 05 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Error 5, Retries exhausted

The same happened before I applied the D26493 series of patches.
The drive works just fine on a USB3 port on my Haswell-based laptop. Enabling/disabling the RAM limiter in the UEFI configuration does not change the symptoms. See attached boot console log for details.
Does the device respond to "usbconfig dump_all_desc" after this error? --HPS
I cannot tell: because the root file system fails to mount, I never get a shell to type commands into.
I am running a head -r365932 build on a RPi4B with 8GiByte of RAM booting from a USB3 SSD via uefi/ACPI v1.20, no ZFS involved. Previously it was head -r363590 that also worked just fine. All were based on non-debug buildworld buildkernel. Historically tuned via -mcpu=cortex-a53 but recently via -mcpu=cortex-a72. I use the 3072 MiByte RAM limit setting.

If I gather right, you are at head -r365941 in svn terms. So you should have picked up -r365918, which is an xhci fix (important for -mcpu=cortex-a72 based kernels).

I have https://reviews.freebsd.org/D26495 applied and have had it applied for a long time. I do not have any of the following applied: D26493, D26494, D26495 (you listed this one twice), D26496 (so I guessed this one).

In other words, my software context seems similar to yours from before you "applied the D26493 series of patches". This suggests that something more specific to your context is involved and it may be difficult for others to duplicate your problem.
(In reply to Mark Millard from comment #3) It is plausible that the USB bridge does some funky things. It's a cheapo Chinesium bridge, specifically a FIDECO brand model M203CP B-key M.2 (SATA) to USB 3.1 bridge. Attached to that bridge is a Western Digital WDS240G2G0B SSD. The kernel has been compiled with default options; only -mcpu=cortex-a72 has been added to src.conf. The UEFI code is version 1.20, too. The system appears to have crashed a few hours ago (doesn't ping), so I'll have to wait until I'm back home to bring it back up. What further information can I supply to help you debug this? I could apply some patches or even break out the kernel debugger following your instructions if that helps.
(In reply to Robert Clausecker from comment #4) I do not have much to suggest other than booting off of other media via, say, a USB2 device or a microsd card, and then use that context to investigate plugging in and using the USB3 materials. (I've got very little background in the subjects involved.) This might get you to the point of being able to do something like what Hans Petter Selasky suggested and report the results back to him. I've no clue at this point how to find what code initiated the failing read. A stack backtrace from that code would be nice for identifying the context that gets the problem. Comment #3 was more about the environment not being generally broken and replicating the problem elsewhere possibly being problematical.
Thanks, I'll try building a UFS-based FreeBSD setup on a separate USB drive and boot from that for testing purposes. Could take a few days to get done with it.
(In reply to Mark Millard from comment #3) When I listed what I have applied, I messed up. It should have listed: https://reviews.freebsd.org/D25219 I do not have D26495 applied.
Issue still occurs on r366144 with D25219 applied. The XHCI fixes apparently have not affected whatever the underlying problem is here.
Issue still occurs with the 13.0 release.
(In reply to Robert Clausecker from comment #9)

It is not clear what version(s) of sysutils/rpi-firmware type materials you are using. A way of getting solid information about the RPi firmware (unless it has been mixed-and-matched across releases) is:

# strings start4.elf | grep VC_BUILD_ID_
VC_BUILD_ID_USER: dom
VC_BUILD_ID_TIME: 12:10:40
VC_BUILD_ID_VARIANT: start
VC_BUILD_ID_TIME: Feb 25 2021
VC_BUILD_ID_BRANCH: bcm2711_2
VC_BUILD_ID_HOSTNAME: buildbot
VC_BUILD_ID_PLATFORM: raspberrypi_linux
VC_BUILD_ID_VERSION: 564e5f9b852b23a330b1764bcf0b2d022a20afd0 (clean)

If you are using a build dated before 2021-Feb-21, you likely have problems in part from the firmware. The status for ones after 2021-Feb-21 is not well known as far as I can tell.

If you are still using releases from https://github.com/pftf/RPi4/releases/ to have UEFI (possibly used in ACPI mode), that and the specific version is not clear. (V1.26 is new as of today.) As far as I know no one is officially supporting use of these releases and it is known that the 3 GiByte limitation must be selected for reliable operation to even be possible. From that point of view, these reports might someday be classified as "not a bug". Whether you are using ACPI mode is also not clear.

Whether you are using sysutils/u-boot-rpi4 or sysutils/u-boot-rpi-arm64 is also not clear, including which version(s). These also present a UEFI interface, not necessarily with ACPI as an option but historically with a Device Tree. These also end up using EFI/BOOT/bootaa64.efi (a.k.a. /boot/loader.efi but copied).

None of that is identified by the FreeBSD version(s) unless you also indicate something like that you did a dd of something like FreeBSD-13.0-RELEASE-arm64-aarch64-RPI.img that has more than FreeBSD involved.

So far I've still never had a problem like you report. But I have a UFS context, not ZFS.
In comment #6 you wrote: QUOTE I'll try building a UFS-based FreeBSD setup on a separate USB drive and boot from that for testing purposes END QUOTE. But, I do not see any explicit reports of what was discovered or if you abandoned the effort. You could potentially try a pure FreeBSD-13.0-RELEASE-arm64-aarch64-RPI.img context and if it worked, then try to substitute more of your context to the media and see what step starts the failures. (Then possibly start over, making just that last substitution to see if it is sufficient.) It appears that such would be required for you to supply enough information for someone to repeat the problem. Of course, if FreeBSD-13.0-RELEASE-arm64-aarch64-RPI.img failed up front, it almost certainly means lack of hardware support in some way --and that would mean needing to replicate the hardware context in order for someone to investigate. As stands it is unclear how anyone can help you or investigate.
(In reply to Robert Clausecker from comment #9)

Just for the record, on the lists you have reported:

"There's some stuff about UEFI booting in there which you can ignore. The same problem also appears when booting via U-Boot."
(In reply to Robert Clausecker from comment #9)

As detailed in the below-noted list submittal, I used bsdinstall to set up a RPi4B 8 GiByte with a ZFS USB3 SSD boot/root-file-system media via FreeBSD-13.0-RELEASE-arm64-aarch64-RPI.img on a microsd card as the context bsdinstall ran in. Some RPi4-specific materials had to be copied to the file system in /dev/gpt/efiboot0 since bsdinstall does not deal with such things. The resultant ZFS USB3 SSD worked fine for booting and operating the RPi4B 8 GiByte. See:

https://lists.freebsd.org/pipermail/freebsd-arm/2021-April/023648.html

From the booted system:

root@RPi4_8G_ZFS:~ # uname -apKU
FreeBSD RPi4_8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f: Fri Apr 9 03:54:53 UTC 2021 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 aarch64 1300139 1300139
root@RPi4_8G_ZFS:~ # df -m
Filesystem         1M-blocks Used  Avail Capacity  Mounted on
zroot/ROOT/default    196003 1110 194893     1%    /
devfs                      0    0      0   100%    /dev
/dev/gpt/efiboot0        259   18    241     7%    /boot/efi
zroot/tmp             194893    0 194893     0%    /tmp
zroot/usr/home        194893    0 194893     0%    /usr/home
zroot/var/log         194893    0 194893     0%    /var/log
zroot/var/mail        194893    0 194893     0%    /var/mail
zroot                 194893    0 194893     0%    /zroot
zroot/var/tmp         194893    0 194893     0%    /var/tmp
zroot/usr/src         195594  701 194893     0%    /usr/src
zroot/var/audit       194893    0 194893     0%    /var/audit
zroot/usr/ports       195593  700 194893     0%    /usr/ports
zroot/var/crash       194893    0 194893     0%    /var/crash

Robert's problem seems to be based on some detail(s) specific to his environment, not some sort of general problem with root on ZFS via USB3 for RPi4B's. The problem is to isolate what detail(s). As it stands, it appears only Robert has a context to do that in.
(In reply to Mark Millard from comment #10)

Hi Mark,

> It is not clear what version(s) of sysutils/rpi-firmware type
> materials that you are using. A way of getting solid information
> about the RPi firmware (unless it has been mixed-and-matched across
> releases) is:

I'm using the exact same firmware version shipped on the FreeBSD 13 release images. The strings output is identical to yours.

> If you are still using releases from https://github.com/pftf/RPi4/releases/ to have UEFI (possibly used in ACPI mode), that and the specific version is not clear.

Nope. I gave up on these attempts when it became clear that UEFI is not going to be supported going forward. Interestingly, I recall that the problem might not have occurred on UEFI, but I'm not sure. So it's standard U-Boot stuff right now.

> So far I've still never had a problem like you report. But I have a UFS context, not ZFS.

I did a UFS reinstall some months ago and, to my surprise, the problem went away when I did that. I chalked it up to the problem perhaps having been fixed by an update in CURRENT back then. I had updated that install all the way to FreeBSD 13.0-RELEASE before trying to reinstall on ZFS and never had any problems. Those seem to occur only when installing the system on ZFS.

> But, I do not see any explicit reports of what was discovered or if you abandoned the effort.

After reinstalling on UFS, the problem went away, so I thought it had been addressed and kinda forgot about the bug report. I'll try to set up a separate UFS-based disk and boot from that, mounting the zpool later on, to see if it changes anything.

> It appears that such would be required for you to supply enough information
> for someone to repeat the problem. Of course, if
> FreeBSD-13.0-RELEASE-arm64-aarch64-RPI.img failed up front, it almost certainly
> means lack of hardware support in some way --and that would mean needing to
> replicate the hardware context in order for someone to investigate.

Yes, this is very unfortunate.

> (comment #12)

Very strange. Perhaps it is indeed a power issue (as alluded to by some people on the list).
Created attachment 224184 (usbconfig dump_all_desc output)

I've flashed the UFS-based default installer image to a separate USB drive attached via USB 2.0 and then tried to import the zpool manually. The observed errors are similar:

# zpool import
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
   pool: tau
     id: 11171206566155786428
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
         some features will not be available without an explicit 'zpool upgrade'.
 config:

         tau                            ONLINE
           diskid/DISK-ABCDEFA74566s2a  ONLINE

# zpool import -R /tau tau
(da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 19 81 f3 ad 00 00 07 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 3 more tries remain
(da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 19 81 f3 ad 00 00 07 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 2 more tries remain
(da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 19 81 f3 ad 00 00 07 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 1 more tries remain
(da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 19 81 f3 ad 00 00 07 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 0 more tries remain
(da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 19 81 f3 ad 00 00 07 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Error 5, Retries exhausted

Other operations on the drive, like mounting the UFS partition or reading the whole disk front to back, succeed without problems. I've been able to run the usbconfig command asked for in comment #1, so at least I can attach that. If all else fails, I can mail someone the drive so you can reproduce this for yourself.
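For anyone who wants to replay one of these failing reads outside of ZFS, here is a minimal sketch that turns a READ(10) CDB, as printed by CAM, into the matching dd invocation. The function name read10_to_dd is made up for this example, and bs=512 is an assumption taken from the "512 byte sectors" line in the dmesg output. The function only prints the command; whether running it against the raw device reproduces the error has not been tested here.

```shell
#!/bin/sh
# Print a dd command that replays a READ(10) request against a raw device.
# Usage: read10_to_dd DEVICE B0 B1 ... B9   (ten hex bytes, no 0x prefix)
# Hypothetical helper for illustration; not part of FreeBSD.
read10_to_dd() {
    dev=$1
    shift
    if [ "$#" -ne 10 ] || [ "$1" != "28" ]; then
        echo "expected a 10-byte READ(10) CDB starting with 28" >&2
        return 1
    fi
    # CDB bytes 2-5 hold the big-endian logical block address.
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    # CDB bytes 7-8 hold the big-endian transfer length in logical blocks.
    count=$(( (0x$8 << 8) | 0x$9 ))
    # Assumes 512-byte logical blocks, as reported for this drive.
    echo "dd if=$dev of=/dev/null bs=512 skip=$lba count=$count"
}

# The request that failed during "zpool import -R /tau tau":
read10_to_dd /dev/da1 28 00 19 81 f3 ad 00 00 07 00
```

Since a front-to-back read of the disk succeeds, comparing this small read against the same region read with a larger block size might show whether the request size matters.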
I've now tried a different drive (an ST2000LM007-1R8174; spinning rust) in an external USB-3 enclosure and it works just fine. Either the original drive is faulty or perhaps there is some sort of problem that only affects that drive.
I found another clue: If I create a zpool with ashift=12 on the disk in question, it works fine on the RPi 4. Perhaps the disk does not actually support 512 byte access (you can see from the IO errors that they try to do a transfer of 5 sectors of 512 bytes). Could this perhaps be some sort of ZFS regression? The other system I tested the drive on is still on FreeBSD 12 with the old ZFS code. Perhaps that code did not try to perform such accesses? Or perhaps there was some sort of fallback?
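For reference, a small sketch of what ashift means and how it could be inspected or set. ashift is the base-2 logarithm of the block size ZFS uses for a vdev, so ashift=9 corresponds to 512 bytes and ashift=12 to 4096 bytes. The helper ashift_to_bytes and the device name da0s2a are made up for this example; the zdb and zpool lines are left commented out because they need the actual disk, and zpool create destroys whatever is on it.

```shell
#!/bin/sh
# Convert an ashift value to the block size in bytes it stands for.
# Hypothetical helper for illustration.
ashift_to_bytes() {
    echo $((1 << $1))
}

ashift_to_bytes 9     # block size for ashift=9
ashift_to_bytes 12    # block size for ashift=12

# Read the ashift recorded in an existing vdev label
# (da0s2a is an assumed device name):
#   zdb -l /dev/da0s2a | grep ashift
#
# Create the pool with 4096-byte blocks. This wipes the device,
# so it is deliberately not executed here:
#   zpool create -o ashift=12 tau da0s2a
```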
(In reply to Robert Clausecker from comment #16)

Ronald Klop in https://lists.freebsd.org/pipermail/freebsd-arm/2021-April/023650.html had written:

QUOTE
Could it be a partitioning difference that you are crossing 4K-sector boundaries or something else that amplifies the traffic when using ZFS?
END QUOTE

So it sounds like it may be a well-known type of issue that one is supposed to manage carefully when setting up ZFS. I would guess that bsdinstall in auto mode for creating a ZFS context likely just uses figures such that it avoids running into such issues. Other contexts may fairly generally require more explicit handling to avoid creating issues. (I'm no ZFS expert.)
(In reply to Mark Millard from comment #17) Hi Mark, Apparently bsdinstall sets up vfs.zfs.min_auto_ashift=12 in /etc/sysctl.conf to address this potential problem. So you are right in that this possibility is already addressed. However, as I manually set up the zpool, this was not the case for me and I ran head first into the problem. For future reference: the thing that made me diagnose the problem is the CDB showing a 5 sector read. 5 is not a multiple of 8... Thanks for your excellent help anyway!
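The "multiple of 8" check can be sketched as follows: eight 512-byte sectors make one 4096-byte block, so a request is 4K-aligned only if both its starting LBA and its sector count are multiples of 8. The function name check_4k is made up for this example; the two inputs are the LBA and length taken from the CDBs in the error logs earlier in this report.

```shell
#!/bin/sh
# Report whether a request, given in 512-byte sectors, starts and ends
# on a 4096-byte boundary (8 sectors of 512 bytes = 4096 bytes).
# Usage: check_4k LBA COUNT   (decimal, or hex with a 0x prefix)
# Hypothetical helper for illustration.
check_4k() {
    lba=$(($1))
    count=$(($2))
    if [ $((lba % 8)) -eq 0 ] && [ $((count % 8)) -eq 0 ]; then
        echo "aligned"
    else
        echo "unaligned: lba%8=$((lba % 8)) count%8=$((count % 8))"
    fi
}

check_4k 0x02c7772e 5    # failing read from the first boot log
check_4k 0x1981f3ad 7    # failing read from the manual zpool import
```

Both failing requests come out unaligned in start and length, which fits the observation that a pool created with ashift=12 does not hit the problem.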