While inserting and removing disks in a gmirror, the "Superblock check-hash failed" messaage is seen. fsck_ffs(8) does not find any issues. Details @ https://people.freebsd.org/~pho/stress/log/graid1_7-2.txt
I suspect the taste method, e.g., Tracing command geom pid 13 tid 100429 td 0xfffff80c47554000 (CPU 25) kdb_enter() at kdb_enter+0x3b/frame 0xfffffe00e3bd4f40 vpanic() at vpanic+0x1ba/frame 0xfffffe00e3bd4f90 panic() at panic+0x43/frame 0xfffffe00e3bd4ff0 readsuper() at readsuper+0x1ec/frame 0xfffffe00e3bd5070 ffs_sbget() at ffs_sbget+0xf6/frame 0xfffffe00e3bd5140 g_label_ufs_taste_common() at g_label_ufs_taste_common+0x79/frame 0xfffffe00e3bd5180 g_label_taste() at g_label_taste+0x332/frame 0xfffffe00e3bd5290 g_new_provider_event() at g_new_provider_event+0x1ba/frame 0xfffffe00e3bd52b0 g_run_events() at g_run_events+0x197/frame 0xfffffe00e3bd52f0 fork_exit() at fork_exit+0x84/frame 0xfffffe00e3bd5330 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e3bd5330
Doh, I see you added a panic that clearly illustrates the stack. Yep, clearly readsuper / ffs_sbget should not print anything for a taste operation! :-)
Created attachment 202888 [details] Proposed patch to fix bug. Good analysis. I believe this patch will fix this bug.
This patch does not seem to fix the problem seen: 20190316 08:29:28 all: graid1_7.sh Mar 16 08:35:18 t2 kernel: Superblock check-hash failed: recorded check-hash 0xe4e5b25e != computed check-hash 0x1aaf9179 (Ignored) FAIL graid1_7.sh exit code 5 20190316 08:35:52 all: graid1_7.sh ^C $ svn diff -x -p /usr/src/sys Index: /usr/src/sys/geom/label/g_label_ufs.c =================================================================== --- /usr/src/sys/geom/label/g_label_ufs.c (revision 345222) +++ /usr/src/sys/geom/label/g_label_ufs.c (working copy) @@ -76,8 +76,8 @@ g_label_ufs_taste_common(struct g_consumer *cp, ch label[0] = '\0'; fs = NULL; - if (SBLOCKSIZE % pp->sectorsize != 0 || - ffs_sbget(cp, &fs, STDSB, M_GEOM, g_use_g_read_data) != 0) { + if (SBLOCKSIZE % pp->sectorsize != 0 || ffs_sbget(cp, &fs, + STDSB_NOHASHFAIL, M_GEOM, g_use_g_read_data) != 0) { KASSERT(fs == NULL, ("g_label_ufs_taste_common: non-NULL fs %p\n", fs)); return; $ uname -a FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r345222M: Sat Mar 16 08:26:16 CET 2019 pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO amd64 $
To be fair: The message now has an added "(Ignored)".
(In reply to Peter Holm from comment #5) I would define warning but proceeding as correct operation. If you do not agree with that, what do you think it should do?
Taste should probably be able to use a "silent" ffs_sbget mode to avoid printing anything. This is like the GEOM equivalent of every newbus driver printing a message on failed probe.
I was expecting not to see any message at all.
I don't quite understand why the test is generating these errors in the first place. If the checksum is failing, it means that we saw the UFS2 magic number in the superblock, so we're probably not looking at random garbage. Is it possible the taste is racing with synchronization of a new gmirror disk, and seeing a partial write of a copied superblock?
(In reply to Mark Johnston from comment #9) It is an out-of-date superblock from a filesystem that did not do superblock check hashes (while the current one does do check-hashes). So the correct thing to do is just reject it silently rather than reporting why it was rejected.
The test uses a memory disk. This is basically what the test does: for i in `jot 8`; do while [ -f $cont ]; do for u in md${u1}p2 md${u1}p3; do gmirror forget test gmirror remove test $u gmirror insert test $u done 2>/dev/null done & done
I have uploaded a self contained version of the test scenario to: https://people.freebsd.org/~pho/graid1_7s2.sh I have run this on three different hosts, all with the same result.
A commit references this bug: Author: mckusick Date: Tue Aug 6 18:10:34 UTC 2019 New revision: 350651 URL: https://svnweb.freebsd.org/changeset/base/350651 Log: A race condition existed between the time a UFS/FFS superblock check hash was computed and the time that the superblock was copied to a buffer to be written to disk. The result was a failed superblock check hash the next time that the superblock was read. The fix is to compute the check hash after the superblock has been copied to a buffer to be written. PR: 236504 Reported by: Peter Holm Tested by: Peter Holm Sponsored by: Netflix Changes: head/sys/ufs/ffs/ffs_vfsops.c
The -r350651 commit resolves the check-hash failures. No need to MFC as superblock check hashes are not in 12-stable.