Bug 204616

Summary: ggatel test assumes "ggate create" creates geom_gate device on command completion, whereas in reality it creates it async; need to wait for char device to show up in test
Product: Base System Reporter: Enji Cooper <ngie>
Component: testsAssignee: Enji Cooper <ngie>
Status: Closed FIXED    
Severity: Affects Some People CC: asomers, fk, pjd
Priority: --- Flags: asomers: mfc-stable11+
asomers: mfc-stable10+
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Enji Cooper freebsd_committer freebsd_triage 2015-11-17 00:59:12 UTC
While doing some digging in ^/user/ngie/more-tests2, I discovered that the assumption that the $work and $src files will match isn't true. I need some more digging into the test case to figure out why it's failing.

$ uname -a
FreeBSD fbsd11 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r290924+b4383cc(isilon-atf): Mon Nov 16 01:11:00 PST 2015     ngie@fbsd11:/usr/obj/usr/src/git/sys/GENERIC-NODEBUG  amd64
$ cd tools/regression/geom_gate
$ sudo prove -rv test-2.t 
test-2.t .. 
1..2
not ok 1 - md5 checksum
error: ggatel: ioctl(/dev/ggctl): Device busy.
error: Exiting.
Failed 2/2 subtests 

Test Summary Report
-------------------
test-2.t (Wstat: 0 Tests: 1 Failed: 1)
  Failed test:  1
  Parse errors: Bad plan.  You planned 2 tests but ran 1.
Files=1, Tests=1,  0 wallclock secs ( 0.03 usr  0.01 sys +  0.01 cusr  0.02 csys =  0.07 CPU)
Result: FAIL
Comment 1 commit-hook freebsd_committer freebsd_triage 2015-11-17 01:01:52 UTC
A commit references this bug:

Author: ngie
Date: Tue Nov 17 01:01:03 UTC 2015
New revision: 290965
URL: https://svnweb.freebsd.org/changeset/base/290965

Log:
  - Use conf.sh
  - Consolidate the cleanup steps and always execute on exit
  - Be more verbose on md5 failure
  - Use better variable names
  - Expect the testcase to fail, as noted in bug 204616

  PR: 204616

Changes:
  user/ngie/more-tests2/sbin/geom/class/tests/gate/2_test.sh
Comment 2 Fabian Keil 2015-11-17 10:28:54 UTC
Did you confirm that the test fails with a clean environment and vanilla FreeBSD 11-CURRENT as well?

It seems to work for me on ElectroBSD:

[fk@kendra /usr/src/tools/regression/geom_gate]$ uname -a
ElectroBSD kendra 11.0-CURRENT ElectroBSD 11.0-CURRENT #19 r290716+b520db4(electrobsd): November 12, 2015 at 10:34:01 PM UTC     fk@kendra:/usr/obj/usr/src/sys/ELECTRO_BEER  amd64
[fk@kendra /usr/src/tools/regression/geom_gate]$ sudo prove -rv test-2.t
test-2.t .. 
1..2
ok 1 - md5 checksum
ok 2 - md5 checksum
ok
All tests successful.
Files=1, Tests=2,  0 wallclock secs ( 0.05 usr  0.01 sys +  0.02 cusr  0.06 csys =  0.14 CPU)
Result: PASS
Comment 3 Enji Cooper freebsd_committer freebsd_triage 2015-12-14 16:10:14 UTC
I wasn't able to reproduce the issue after refactoring the tests to use random device names/ggate names.
Comment 4 Enji Cooper freebsd_committer freebsd_triage 2016-01-28 03:50:01 UTC
Grr... it's happening again!

$ sudo prove -rv tests/sys/geom/class/gate/2_test.sh
tests/sys/geom/class/gate/2_test.sh .. 1+0 records in
1+0 records out
1048576 bytes transferred in 0.007606 secs (137856722 bytes/sec)
1+0 records in
1+0 records out
1048576 bytes transferred in 0.007719 secs (135848466 bytes/sec)
dd: /dev/ggate46: Operation not supported

1..2
not ok 1 - md5 checksums didn't match (317214e9a41d85336e11874a8c5b3b14 != f346628d0d670af41127fc6d4ba196a0) # TODO: bug 204616
not ok 2 # SKIP: can't validate ggate_checksum
Failed 1/2 subtests 
        (less 1 skipped subtest: 0 okay)

Test Summary Report
-------------------
tests/sys/geom/class/gate/2_test.sh (Wstat: 0 Tests: 2 Failed: 1)
  Failed test:  2
Files=1, Tests=2,  1 wallclock secs ( 0.02 usr  0.02 sys +  0.01 cusr  0.03 csys =  0.08 CPU)
Result: FAIL
Comment 5 Enji Cooper freebsd_committer freebsd_triage 2017-01-30 20:31:15 UTC
asomers: this issue is happening consistency, post-ATF-ication:

sys/geom/class/gate/ggate_test:ggatel_file  ->  failed: work md5 checksum didn't match  [0.096s]
Comment 6 Enji Cooper freebsd_committer freebsd_triage 2017-01-30 20:31:46 UTC
(In reply to Ngie Cooper from comment #5)

*consistently
Comment 7 Alan Somers freebsd_committer freebsd_triage 2017-01-30 20:48:50 UTC
I can't reproduce this failure.  Is ggate synchronous or asynchronous?  ngie, could you please try applying this patch and see if the failure goes away?

Index: tests/sys/geom/class/gate/ggate_test.sh
===================================================================
--- tests/sys/geom/class/gate/ggate_test.sh     (revision 312995)
+++ tests/sys/geom/class/gate/ggate_test.sh     (working copy)
@@ -105,6 +105,7 @@
        atf_check ggatel create -u $us /dev/$work
 
        dd if=/dev/$src of=/dev/ggate${us} bs=1m count=1 conv=notrunc
+       fsync /dev/ggate${us}
 
        checksum /dev/$src /dev/$work
 }
Comment 8 Enji Cooper freebsd_committer freebsd_triage 2017-01-30 20:59:36 UTC
(In reply to Alan Somers from comment #7)

The patch unfortunately didn't work -- kyua debug is pretty revealing:

# kyua debug sys/geom/class/gate/ggate_test:ggatel_file
Executing command [ ggatel create -u 0 work ]
1+0 records in
1+0 records out
1048576 bytes transferred in 0.013324 secs (78699151 bytes/sec)
1+0 records in
1+0 records out
1048576 bytes transferred in 0.013364 secs (78461104 bytes/sec)
dd: /dev/ggate0: Operation not supported
fsync: open /dev/ggate0: No such file or directory
Files left in work directory after failure: ggate.devs, plainfiles, src, work
sys/geom/class/gate/ggate_test:ggatel_file  ->  failed: work md5 checksum didn't match
# uname -a
FreeBSD fattw-3-4 12.0-CURRENT FreeBSD 12.0-CURRENT #35 r312933+593b97e50212(isilon-atf)-dirty: Sat Jan 28 15:44:49 PST 2017     ngie@wkstn-fbsd-ngie:/usr/obj/usr/src/sys/GENERIC-NODEBUG  amd64

The test in and of itself before might have been racy and may be better reproed on faster hardware/GENERIC-NODEBUG kernels.
Comment 9 Enji Cooper freebsd_committer freebsd_triage 2017-01-30 21:01:35 UTC
Yup -- it's a race. This fixes the issue:

 76         atf_check ggatel create -u $us work
 77
 78         while [ ! -c /dev/ggate${us} ]; do
 79                 sleep 0.5
 80         done
 81
 82         dd if=src of=/dev/ggate${us} bs=1m count=1 conv=notrunc
 83
 84         fsync /dev/ggate${us}
Comment 10 Enji Cooper freebsd_committer freebsd_triage 2017-01-30 21:02:18 UTC
(In reply to Ngie Cooper from comment #9)

Sidenote: the fsync isn't required; waiting for the character device to appear is.
Comment 11 Enji Cooper freebsd_committer freebsd_triage 2017-01-30 21:05:13 UTC
(In reply to Ngie Cooper from comment #10)

This is happening because "ggatel create" is async by design. From .../sbin/ggate/ggatel/ggatel.c :

 89         if (g_gate_verbose == 0) {
 90                 if (daemon(0, 0) == -1) {
 91                         g_gate_destroy(unit, 1);
 92                         err(EXIT_FAILURE, "Cannot daemonize");
 93                 }
 94         }
Comment 12 Enji Cooper freebsd_committer freebsd_triage 2017-01-30 21:06:35 UTC
I'll fix it once I get home and backport the change as necessary, because IIRC ^/stable/10 is affected by this issue.
Comment 13 Alan Somers freebsd_committer freebsd_triage 2017-01-30 21:12:16 UTC
Thanks ngie.  And don't forget to add the polling loop to the other two test cases, too.
Comment 14 commit-hook freebsd_committer freebsd_triage 2017-01-31 06:13:09 UTC
A commit references this bug:

Author: ngie
Date: Tue Jan 31 06:12:52 UTC 2017
New revision: 313008
URL: https://svnweb.freebsd.org/changeset/base/313008

Log:
  Wait for /dev/ggate* to appear after calling `ggatel create` in :ggatel_{file,md}

  The test assumed that `ggatel create` created a device on completion, but that's
  incorrect. This squashes the race by waiting for the device to appear, as
  `ggatel create` daemonizes before issuing an ioctl to geom_gate(4) if not called
  with `-v`.

  Discussed with:	asomers
  MFC after:	1 week
  PR:		204616
  Sponsored by:	Dell EMC Isilon

Changes:
  head/tests/sys/geom/class/gate/ggate_test.sh
Comment 15 commit-hook freebsd_committer freebsd_triage 2017-02-15 00:16:25 UTC
A commit references this bug:

Author: asomers
Date: Wed Feb 15 00:15:25 UTC 2017
New revision: 313753
URL: https://svnweb.freebsd.org/changeset/base/313753

Log:
  MFC r311893, r313008, r313081

  r311893:
  ATFify the geom gate tests.

  This ensures their cleanup routines will be run even if they should timeout.

  tests/sys/geom/class/gate/ggate_test.sh
  tests/sys/geom/class/gate/Makefile
  	Add an ATF test with three testcases, one for each TAP test. Use
  	ATF-style cleanup functions, and convert sleeps to polling loops.

  ObsoleteFiles.inc
  tests/sys/geom/class/gate/conf.sh
  tests/sys/geom/class/gate/1_test.sh
  tests/sys/geom/class/gate/2_test.sh
  tests/sys/geom/class/gate/3_test.sh
  	Delete TAP test files

  Reviewed by:	ngie
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D8891

  r313008:
  Wait for /dev/ggate* to appear after calling `ggatel create` in :ggatel_{file,md}

  The test assumed that `ggatel create` created a device on completion, but that's
  incorrect. This squashes the race by waiting for the device to appear, as
  `ggatel create` daemonizes before issuing an ioctl to geom_gate(4) if not called
  with `-v`.

  Discussed with:	asomers
  MFC after:	1 week
  PR:		204616
  Sponsored by:	Dell EMC Isilon

  r313081:
  Replace for/retry loops with "wait_for_ggate_device" calls and check
  results of commands

  As noted in r313008, the underlying issue was that geom_gate device
  creation wasn't created at ggatel command completion, but some short
  time after. ggatec(8) employs similar logic when creating geom_gate(4)
  devices.

  Switch from retry loops (after the ggatec/dd write calls) to
  wait_for_ggate_device function calls after calling ggatec(8) instead
  to detect the presence of the /dev/ggate* device, as this function is
  sufficient for determining whether or not the character device is ready
  for testing

  While here, use atf_check consistently with all dd calls to ensure that
  data output is as expected.

  MFC after:	1 week
  Reviewed by:	asomers
  Differential Revision:	D9409
  Sponsored by:	Dell EMC Isilon

Changes:
_U  stable/11/
  stable/11/ObsoleteFiles.inc
  stable/11/tests/sys/geom/class/gate/1_test.sh
  stable/11/tests/sys/geom/class/gate/2_test.sh
  stable/11/tests/sys/geom/class/gate/3_test.sh
  stable/11/tests/sys/geom/class/gate/Makefile
  stable/11/tests/sys/geom/class/gate/conf.sh
  stable/11/tests/sys/geom/class/gate/ggate_test.sh
Comment 16 commit-hook freebsd_committer freebsd_triage 2017-02-15 00:17:28 UTC
A commit references this bug:

Author: asomers
Date: Wed Feb 15 00:16:53 UTC 2017
New revision: 313754
URL: https://svnweb.freebsd.org/changeset/base/313754

Log:
  MFC r311893, r313008, r313081

  I had to modify the tests slightly for the MFC to stable/10, because
  stable/10 lacks r294037, which enabled /sbin/md5 to work on md(4)
  devices.

  r311893:
  ATFify the geom gate tests.

  This ensures their cleanup routines will be run even if they should timeout.

  tests/sys/geom/class/gate/ggate_test.sh
  tests/sys/geom/class/gate/Makefile
  	Add an ATF test with three testcases, one for each TAP test. Use
  	ATF-style cleanup functions, and convert sleeps to polling loops.

  ObsoleteFiles.inc
  tests/sys/geom/class/gate/conf.sh
  tests/sys/geom/class/gate/1_test.sh
  tests/sys/geom/class/gate/2_test.sh
  tests/sys/geom/class/gate/3_test.sh
  	Delete TAP test files

  Reviewed by:	ngie
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D8891

  r313008:
  Wait for /dev/ggate* to appear after calling `ggatel create` in :ggatel_{file,md}

  The test assumed that `ggatel create` created a device on completion, but that's
  incorrect. This squashes the race by waiting for the device to appear, as
  `ggatel create` daemonizes before issuing an ioctl to geom_gate(4) if not called
  with `-v`.

  Discussed with:	asomers
  MFC after:	1 week
  PR:		204616
  Sponsored by:	Dell EMC Isilon

  r313081:
  Replace for/retry loops with "wait_for_ggate_device" calls and check
  results of commands

  As noted in r313008, the underlying issue was that geom_gate device
  creation wasn't created at ggatel command completion, but some short
  time after. ggatec(8) employs similar logic when creating geom_gate(4)
  devices.

  Switch from retry loops (after the ggatec/dd write calls) to
  wait_for_ggate_device function calls after calling ggatec(8) instead
  to detect the presence of the /dev/ggate* device, as this function is
  sufficient for determining whether or not the character device is ready
  for testing

  While here, use atf_check consistently with all dd calls to ensure that
  data output is as expected.

  MFC after:	1 week
  Reviewed by:	asomers
  Differential Revision:	D9409
  Sponsored by:	Dell EMC Isilon

Changes:
_U  stable/10/
  stable/10/ObsoleteFiles.inc
  stable/10/tests/sys/geom/class/gate/1_test.sh
  stable/10/tests/sys/geom/class/gate/2_test.sh
  stable/10/tests/sys/geom/class/gate/3_test.sh
  stable/10/tests/sys/geom/class/gate/Makefile
  stable/10/tests/sys/geom/class/gate/conf.sh
  stable/10/tests/sys/geom/class/gate/ggate_test.sh