hi there, i'm trying to upgrade from 13.5-RELEASE-p14 to 14.0-RELEASE but i'm having a problem with mrsas(4) after init (right after the ntp service comes up): all the da[0-7] units are detached and "periph destroyed" messages fill the screen, causing a ZFS i/o failure and leaving the system completely unresponsive. when i boot into single-user mode this doesn't happen, and i can remount the zroot in rw mode and change settings without any issues. i also tried doing the `freebsd-update install` from single-user mode, but the problem persists after reboot. i'm using `freebsd-update -r 14.0-RELEASE upgrade` to do the upgrade; i also tried going straight to 14.3- and 14.4-, but the error persists. i'm using the `hw.mfi.mrsas_enable="1"` option in /boot/loader.conf. is there any reason why this could happen? i didn't see any changes related to mrsas(4) or cam(4) in the release notes.
hw.mfi.mrsas_enable="1" has to be in /boot/device.hints. I wonder why it works for you; it should not. Can you put it in there? Since you have narrowed the issue down to somewhere between 13.5 and 14.0, are you able to build from source and use boot environments to track down the problematic commit with Git?
hello again, after many retries with different versions and BEs i managed to track the problem down to smartd(8) from smartmontools. the configuration is the default one, i.e. a single line with DEVICESCAN in /usr/local/etc/smartd.conf. for some reason it's causing a fault in the mrsas(4) driver that didn't happen before. Michael, i don't think it worked when i changed /boot/device.hints, but it's been many years. with the setting in /boot/loader.conf it always worked in the 13.x-RELEASE. i can confirm it's also working fine in 14.0- and 14.4-RELEASE. thank you for taking a look at this!
(In reply to rodri from comment #2) In other words, when put into device.hints as written in the manpage it works for you?
(In reply to Michael Osipov from comment #3) no, it doesn't seem to have any effect. when i put it in the loader.conf however, it works.
(In reply to rodri from comment #4) This is weird: when I put the config into loader.conf, mfi(4) is still attached. So when mrsas is enabled via loader.conf, smartmontools doesn't bother the drives, and with device.hints it does, correct?
(In reply to Michael Osipov from comment #5) no no, this is not what i meant. the original issue was simply fixed by disabling the smartd(8) service, i.e. `smartd_enable="NO"` in /etc/rc.conf. i didn't touch anything else. on the other hand, i've always had the `hw.mfi.mrsas_enable="1"` set in /boot/loader.conf not in /boot/device.hints. but that's a different issue altogether. so the question here would be what's smartd(8) doing that's forcing mrsas(4) to detach the drives?
(In reply to rodri from comment #6) OK, got it now. Since smartd(8) didn't change during upgrade and neither did mrsas(4), you have to git bisect from 13.5 to 14.0. Are you able to build from source?
(In reply to Michael Osipov from comment #7) i've never done it but i could learn, i'll have to do it during the weekend though.
(In reply to rodri from comment #8) That would be great. Maybe you can practice in a VM first, before you move on to the actual server?
(In reply to Michael Osipov from comment #9) yeah, definitely a good idea. i'll set up a BE and mount it on a jail to go through the motions. i'll tell you how it goes :)
(In reply to Michael Osipov from comment #9) hello Michael, ok, so i did this:

% cd /usr/src
% git clone https://git.freebsd.org/src.git .
% git checkout releng/13.5

and then

% make -j8 buildworld

gives me these errors:

--- _bootstrap-tools-lib/clang/libclangminimal ---
In file included from /usr/src/contrib/llvm-project/clang/lib/Support/RISCVVIntrinsicUtils.cpp:9:
In file included from /usr/src/contrib/llvm-project/clang/include/clang/Support/RISCVVIntrinsicUtils.h:12:
In file included from /usr/src/contrib/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:12:
In file included from /usr/src/contrib/llvm-project/llvm/include/llvm/ADT/Hashing.h:50:
In file included from /usr/src/contrib/llvm-project/llvm/include/llvm/Support/SwapByteOrder.h:17:
In file included from /usr/src/contrib/llvm-project/llvm/include/llvm/ADT/STLForwardCompat.h:20:
In file included from /usr/include/c++/v1/optional:186:
In file included from /usr/include/c++/v1/__functional/hash.h:20:
/usr/include/c++/v1/__utility/pair.h:19:10: fatal error: '__tuple/sfinae_helpers.h' file not found
   19 | #include <__tuple/sfinae_helpers.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
--- _bootstrap-tools-lib/clang/libllvmminimal ---
In file included from /usr/src/contrib/llvm-project/llvm/lib/Demangle/ItaniumDemangle.cpp:13:
In file included from /usr/src/contrib/llvm-project/llvm/include/llvm/Demangle/Demangle.h:13:
In file included from /usr/include/c++/v1/optional:186:
In file included from /usr/include/c++/v1/__functional/hash.h:20:
/usr/include/c++/v1/__utility/pair.h:19:10: fatal error: '__tuple/sfinae_helpers.h' file not found
   19 | #include <__tuple/sfinae_helpers.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
--- _bootstrap-tools-usr.bin/yacc ---

i'm doing it from the 13.5-RELEASE-p14 BE i had backed up, running on a jail; not sure what's going on. i tried replacing /usr/include with the stuff at /usr/src/include, but then it complained about a missing <sys/ctype.h>. am i missing any steps?
i checked the handbook and the UPDATING file in the repo. still no clue.
i managed to fix it by replacing /usr/include with the one at https://download.freebsd.org/ftp/releases/amd64/13.5-RELEASE/base.txz i think i'm ready to bisect.
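For readers who hit the same header mismatch, one cautious way to pull only the include tree out of a dist set is to extract it into a staging directory first, rather than straight over /usr/include. The sketch below assumes the archive stores its members with a leading `./` (so the subtree is named `./usr/include`); the `extract_include` helper, the stand-in archive and all paths under the temporary directory are made up for illustration, so that the example runs without downloading anything.

```shell
#!/bin/sh
set -eu

# extract_include ARCHIVE DEST
# Unpack only the ./usr/include subtree of ARCHIVE into DEST.
# Assumption: member names in the archive start with "./".
extract_include() {
    mkdir -p "$2"
    tar -xf "$1" -C "$2" ./usr/include
}

# Build a tiny stand-in archive (hypothetical contents, not a real base.txz).
work=$(mktemp -d)
mkdir -p "$work/stage/usr/include" "$work/stage/bin"
echo '/* placeholder header */' > "$work/stage/usr/include/stdio.h"
echo 'placeholder binary' > "$work/stage/bin/sh"
tar -cf "$work/base-standin.tar" -C "$work/stage" .

# Extract just the headers into a staging directory and list them.
extract_include "$work/base-standin.tar" "$work/staging"
ls "$work/staging/usr/include"
```

With the real download, the call would be along the lines of `extract_include base.txz /some/staging/dir`, after which the staged tree can be compared against the live /usr/include before anything is replaced.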
(In reply to rodri from comment #12) Great, I will engage tomorrow. Meanwhile check how "git bisect" works.
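For anyone new to `git bisect`, a minimal practice run on a throwaway repository might look like the sketch below. The repository, its ten commits and the `BROKEN` marker file are invented for illustration; nothing here touches /usr/src or the actual mrsas(4) regression. It only shows the mechanics: mark one bad and one good commit, then let a test command decide each step.

```shell
#!/bin/sh
set -eu

# Create a throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "bisect@example.invalid"
git config user.name "bisect demo"

# Ten commits; commit 7 introduces the (hypothetical) regression marker.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -eq 7 ]; then
        echo "BROKEN" > state
    fi
    echo "$i" > counter
    git add -A
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# The newest commit is known bad, the root commit is known good.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# The test command exits 0 (good) when the marker is absent, 1 (bad) otherwise.
git bisect run sh -c '! grep -q BROKEN state 2>/dev/null' >/dev/null

# refs/bisect/bad now points at the first bad commit.
subject=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $subject"

git bisect reset >/dev/null 2>&1
```

On the real source tree there is no script that can decide good or bad automatically, so each step would instead be marked by hand with `git bisect good` or `git bisect bad` after building and booting the candidate revision.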
Can you try with this and bisect through these:

osipovmi@deblndw011x:~/var/Projekte/freebsd/src (stable/14 =) $ git bisect start
status: waiting for both good and bad commits
osipovmi@deblndw011x:~/var/Projekte/freebsd/src (stable/14 =|BISECTING) $ git bisect bad f9716eee8ab45ad906d9b5c5233ca20c10226ca7
status: waiting for good commit(s), bad commit known
osipovmi@deblndw011x:~/var/Projekte/freebsd/src (stable/14 =|BISECTING) $ git bisect good 712806fc4b5470eb7d9ce537e3cdf3b386455d86
Bisecting: 215 revisions left to test after this (roughly 8 steps)
[e6d405e2bad22fd98f6296a793ad0c97776fe03c] arp(8): fix by-interface and by-host filtering when using netlink
osipovmi@deblndw011x:~/var/Projekte/freebsd/src ((e6d405e2bad2...)|BISECTING) $
(In reply to Michael Osipov from comment #14) yeah, i'm sorry; there was an event this weekend and i could not go through this. i was wondering: is it possible to emulate an mrsas(4) device, for a bhyve vm for example? it would allow for a more accessible testing environment.
(In reply to rodri from comment #15) Unless you pass the entire controller into the VM and use another for your hypervisor, at least I see no other way.
(In reply to Michael Osipov from comment #16) i'm afraid that would defeat the purpose. all right, i see there's a driver for QEMU: https://github.com/qemu/qemu/blob/master/hw/scsi/megasas.c , although it emulates a much older card. i'll give it a try and otherwise will fall back to hardware on weekends. thanks!