Summary: | ggatel test assumes "ggate create" creates geom_gate device on command completion, whereas in reality it creates it async; need to wait for char device to show up in test | ||
---|---|---|---|
Product: | Base System | Reporter: | Enji Cooper <ngie> |
Component: | tests | Assignee: | Enji Cooper <ngie> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | asomers, fk, pjd |
Priority: | --- | Flags: | asomers:
mfc-stable11+
asomers: mfc-stable10+ |
Version: | CURRENT | ||
Hardware: | Any | ||
OS: | Any |
Description
Enji Cooper
2015-11-17 00:59:12 UTC
A commit references this bug: Author: ngie Date: Tue Nov 17 01:01:03 UTC 2015 New revision: 290965 URL: https://svnweb.freebsd.org/changeset/base/290965 Log: - Use conf.sh - Consolidate the cleanup steps and always execute on exit - Be more verbose on md5 failure - Use better variable names - Expect the testcase to fail, as noted in bug 204616 PR: 204616 Changes: user/ngie/more-tests2/sbin/geom/class/tests/gate/2_test.sh Did you confirm that the test fails with a clean environment and vanilla FreeBSD 11-CURRENT as well? It seems to work for me on ElectroBSD: [fk@kendra /usr/src/tools/regression/geom_gate]$ uname -a ElectroBSD kendra 11.0-CURRENT ElectroBSD 11.0-CURRENT #19 r290716+b520db4(electrobsd): November 12, 2015 at 10:34:01 PM UTC fk@kendra:/usr/obj/usr/src/sys/ELECTRO_BEER amd64 [fk@kendra /usr/src/tools/regression/geom_gate]$ sudo prove -rv test-2.t test-2.t .. 1..2 ok 1 - md5 checksum ok 2 - md5 checksum ok All tests successful. Files=1, Tests=2, 0 wallclock secs ( 0.05 usr 0.01 sys + 0.02 cusr 0.06 csys = 0.14 CPU) Result: PASS I wasn't able to reproduce the issue after refactoring the tests to use random device names/ggate names. Grr... it's happening again! $ sudo prove -rv tests/sys/geom/class/gate/2_test.sh tests/sys/geom/class/gate/2_test.sh .. 1+0 records in 1+0 records out 1048576 bytes transferred in 0.007606 secs (137856722 bytes/sec) 1+0 records in 1+0 records out 1048576 bytes transferred in 0.007719 secs (135848466 bytes/sec) dd: /dev/ggate46: Operation not supported 1..2 not ok 1 - md5 checksums didn't match (317214e9a41d85336e11874a8c5b3b14 != f346628d0d670af41127fc6d4ba196a0) # TODO: bug 204616 not ok 2 # SKIP: can't validate ggate_checksum Failed 1/2 subtests (less 1 skipped subtest: 0 okay) Test Summary Report ------------------- tests/sys/geom/class/gate/2_test.sh (Wstat: 0 Tests: 2 Failed: 1) Failed test: 2 Files=1, Tests=2, 1 wallclock secs ( 0.02 usr 0.02 sys + 0.01 cusr 0.03 csys = 0.08 CPU) Result: FAIL asomers: this issue is happening consistency, post-ATF-ication: sys/geom/class/gate/ggate_test:ggatel_file -> failed: work md5 checksum didn't match [0.096s] (In reply to Ngie Cooper from comment #5) *consistently I can't reproduce this failure. Is ggate synchronous or asynchronous? ngie, could you please try applying this patch and see if the failure goes away? Index: tests/sys/geom/class/gate/ggate_test.sh =================================================================== --- tests/sys/geom/class/gate/ggate_test.sh (revision 312995) +++ tests/sys/geom/class/gate/ggate_test.sh (working copy) @@ -105,6 +105,7 @@ atf_check ggatel create -u $us /dev/$work dd if=/dev/$src of=/dev/ggate${us} bs=1m count=1 conv=notrunc + fsync /dev/ggate${us} checksum /dev/$src /dev/$work } (In reply to Alan Somers from comment #7) The patch unfortunately didn't work -- kyua debug is pretty revealing: # kyua debug sys/geom/class/gate/ggate_test:ggatel_file Executing command [ ggatel create -u 0 work ] 1+0 records in 1+0 records out 1048576 bytes transferred in 0.013324 secs (78699151 bytes/sec) 1+0 records in 1+0 records out 1048576 bytes transferred in 0.013364 secs (78461104 bytes/sec) dd: /dev/ggate0: Operation not supported fsync: open /dev/ggate0: No such file or directory Files left in work directory after failure: ggate.devs, plainfiles, src, work sys/geom/class/gate/ggate_test:ggatel_file -> failed: work md5 checksum didn't match # uname -a FreeBSD fattw-3-4 12.0-CURRENT FreeBSD 12.0-CURRENT #35 r312933+593b97e50212(isilon-atf)-dirty: Sat Jan 28 15:44:49 PST 2017 ngie@wkstn-fbsd-ngie:/usr/obj/usr/src/sys/GENERIC-NODEBUG amd64 The test in and of itself before might have been racy and may be better reproed on faster hardware/GENERIC-NODEBUG kernels. Yup -- it's a race. This fixes the issue: 76 atf_check ggatel create -u $us work 77 78 while [ ! -c /dev/ggate${us} ]; do 79 sleep 0.5 80 done 81 82 dd if=src of=/dev/ggate${us} bs=1m count=1 conv=notrunc 83 84 fsync /dev/ggate${us} (In reply to Ngie Cooper from comment #9) Sidenote: the fsync isn't required; waiting for the character device to appear is. (In reply to Ngie Cooper from comment #10) This is happening because "ggatel create" is async by design. From .../sbin/ggate/ggatel/ggatel.c : 89 if (g_gate_verbose == 0) { 90 if (daemon(0, 0) == -1) { 91 g_gate_destroy(unit, 1); 92 err(EXIT_FAILURE, "Cannot daemonize"); 93 } 94 } I'll fix it once I get home and backport the change as necessary, because IIRC ^/stable/10 is affected by this issue. Thanks ngie. And don't forget to add the polling loop to the other two test cases, too. A commit references this bug: Author: ngie Date: Tue Jan 31 06:12:52 UTC 2017 New revision: 313008 URL: https://svnweb.freebsd.org/changeset/base/313008 Log: Wait for /dev/ggate* to appear after calling `ggatel create` in :ggatel_{file,md} The test assumed that `ggatel create` created a device on completion, but that's incorrect. This squashes the race by waiting for the device to appear, as `ggatel create` daemonizes before issuing an ioctl to geom_gate(4) if not called with `-v`. Discussed with: asomers MFC after: 1 week PR: 204616 Sponsored by: Dell EMC Isilon Changes: head/tests/sys/geom/class/gate/ggate_test.sh A commit references this bug: Author: asomers Date: Wed Feb 15 00:15:25 UTC 2017 New revision: 313753 URL: https://svnweb.freebsd.org/changeset/base/313753 Log: MFC r311893, r313008, r313081 r311893: ATFify the geom gate tests. This ensures their cleanup routines will be run even if they should timeout. tests/sys/geom/class/gate/ggate_test.sh tests/sys/geom/class/gate/Makefile Add an ATF test with three testcases, one for each TAP test. Use ATF-style cleanup functions, and convert sleeps to polling loops. ObsoleteFiles.inc tests/sys/geom/class/gate/conf.sh tests/sys/geom/class/gate/1_test.sh tests/sys/geom/class/gate/2_test.sh tests/sys/geom/class/gate/3_test.sh Delete TAP test files Reviewed by: ngie MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D8891 r313008: Wait for /dev/ggate* to appear after calling `ggatel create` in :ggatel_{file,md} The test assumed that `ggatel create` created a device on completion, but that's incorrect. This squashes the race by waiting for the device to appear, as `ggatel create` daemonizes before issuing an ioctl to geom_gate(4) if not called with `-v`. Discussed with: asomers MFC after: 1 week PR: 204616 Sponsored by: Dell EMC Isilon r313081: Replace for/retry loops with "wait_for_ggate_device" calls and check results of commands As noted in r313008, the underlying issue was that geom_gate device creation wasn't created at ggatel command completion, but some short time after. ggatec(8) employs similar logic when creating geom_gate(4) devices. Switch from retry loops (after the ggatec/dd write calls) to wait_for_ggate_device function calls after calling ggatec(8) instead to detect the presence of the /dev/ggate* device, as this function is sufficient for determining whether or not the character device is ready for testing While here, use atf_check consistently with all dd calls to ensure that data output is as expected. MFC after: 1 week Reviewed by: asomers Differential Revision: D9409 Sponsored by: Dell EMC Isilon Changes: _U stable/11/ stable/11/ObsoleteFiles.inc stable/11/tests/sys/geom/class/gate/1_test.sh stable/11/tests/sys/geom/class/gate/2_test.sh stable/11/tests/sys/geom/class/gate/3_test.sh stable/11/tests/sys/geom/class/gate/Makefile stable/11/tests/sys/geom/class/gate/conf.sh stable/11/tests/sys/geom/class/gate/ggate_test.sh A commit references this bug: Author: asomers Date: Wed Feb 15 00:16:53 UTC 2017 New revision: 313754 URL: https://svnweb.freebsd.org/changeset/base/313754 Log: MFC r311893, r313008, r313081 I had to modify the tests slightly for the MFC to stable/10, because stable/10 lacks r294037, which enabled /sbin/md5 to work on md(4) devices. r311893: ATFify the geom gate tests. This ensures their cleanup routines will be run even if they should timeout. tests/sys/geom/class/gate/ggate_test.sh tests/sys/geom/class/gate/Makefile Add an ATF test with three testcases, one for each TAP test. Use ATF-style cleanup functions, and convert sleeps to polling loops. ObsoleteFiles.inc tests/sys/geom/class/gate/conf.sh tests/sys/geom/class/gate/1_test.sh tests/sys/geom/class/gate/2_test.sh tests/sys/geom/class/gate/3_test.sh Delete TAP test files Reviewed by: ngie MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D8891 r313008: Wait for /dev/ggate* to appear after calling `ggatel create` in :ggatel_{file,md} The test assumed that `ggatel create` created a device on completion, but that's incorrect. This squashes the race by waiting for the device to appear, as `ggatel create` daemonizes before issuing an ioctl to geom_gate(4) if not called with `-v`. Discussed with: asomers MFC after: 1 week PR: 204616 Sponsored by: Dell EMC Isilon r313081: Replace for/retry loops with "wait_for_ggate_device" calls and check results of commands As noted in r313008, the underlying issue was that geom_gate device creation wasn't created at ggatel command completion, but some short time after. ggatec(8) employs similar logic when creating geom_gate(4) devices. Switch from retry loops (after the ggatec/dd write calls) to wait_for_ggate_device function calls after calling ggatec(8) instead to detect the presence of the /dev/ggate* device, as this function is sufficient for determining whether or not the character device is ready for testing While here, use atf_check consistently with all dd calls to ensure that data output is as expected. MFC after: 1 week Reviewed by: asomers Differential Revision: D9409 Sponsored by: Dell EMC Isilon Changes: _U stable/10/ stable/10/ObsoleteFiles.inc stable/10/tests/sys/geom/class/gate/1_test.sh stable/10/tests/sys/geom/class/gate/2_test.sh stable/10/tests/sys/geom/class/gate/3_test.sh stable/10/tests/sys/geom/class/gate/Makefile stable/10/tests/sys/geom/class/gate/conf.sh stable/10/tests/sys/geom/class/gate/ggate_test.sh |