Bug 267554 - ZFS tests should avoid atf_expect_fail due to coarse integration with ATF
Summary: ZFS tests should avoid atf_expect_fail due to coarse integration with ATF
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: tests (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: Eric van Gyzen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-03 19:16 UTC by Eric van Gyzen
Modified: 2022-12-29 01:21 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Eric van Gyzen freebsd_committer freebsd_triage 2022-11-03 19:16:22 UTC
Since the ZFS tests are coarsely integrated with ATF, using atf_expect_fail in the top-level function should be considered harmful since it can hide unexpected failures.

I'll soon reference this bug in a review/commit that gives more detail and an example.
Comment 1 Eric van Gyzen freebsd_committer freebsd_triage 2022-11-03 20:16:15 UTC
https://reviews.freebsd.org/D37257
Comment 2 commit-hook freebsd_committer freebsd_triage 2022-11-11 20:45:23 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=11ed0a95bfa76791dc6428eb2d47a986c0c6f8a3

commit 11ed0a95bfa76791dc6428eb2d47a986c0c6f8a3
Author:     Eric van Gyzen <vangyzen@FreeBSD.org>
AuthorDate: 2022-11-03 02:42:54 +0000
Commit:     Eric van Gyzen <vangyzen@FreeBSD.org>
CommitDate: 2022-11-11 20:43:47 +0000

    zfs tests: stop writing to arbitrary devices

    TL;DR:  Three ZFS tests created ZFS pools on all unmounted devices listed
    in /etc/fstab, corrupting their contents.  Stop that.

    Imagine my surprise when the ESP on my main dev/test VM would "randomly"
    become corrupted, making it unbootable.  Three tests collect various devices
    from the system and try to add them to a test pool.  The test expects this
    to fail because it _assumes_ these devices are in use and ZFS will correctly
    reject the request.

    My /etc/fstab has two entries for devices in /dev:

        /dev/gpt/swap0  none        swap    sw,trimonce,late
        /dev/gpt/esp0   /boot/efi   msdosfs rw,noauto

    Note the `noauto` on the ESP.  In a remarkable example of irony, I chose
    this because it should keep the ESP more protected from corruption;
    in fact, mounting it would have protected it from this case.

    The tests added all of these devices to a test pool in a _single command_,
    expecting the command to fail.  The swap device was in use, so the command
    correctly failed, but the ESP was added and therefore corrupted.  However,
    since the command correctly failed, the test didn't notice the ESP problem.
    If each device had been added with its own command, the test _might_ have
    noticed that one of them incorrectly succeeded.  However, two of these
    tests would not have noticed:

    hotspare_create_001_neg was incorrectly specified as needing the Solaris
    dumpadm command, so it was skipped.  _Some_ of the test needs that command,
    but it checks for its presence and runs fine without it.

    Due to bug 241070, zpool_add_005_pos was marked as an expected failure.
    Due to the coarse level of integration with ATF, this test would still
    "pass" even if it failed for the wrong reason.  I wrote bug 267554 to
    reconsider the use of atf_expect_fail in these tests.

    Let's further consider the use of various devices found around the system.
    In addition to devices in /etc/fstab, the tests also used mounted devices
    listed by the `mount` command.  If ZFS behaves correctly, it will refuse
    to added mounted devices and swap devices to a pool.  However, these are
    unit tests used by developers to ensure that ZFS still works after they
    modify it, so it's reasonable to expect ZFS to do the _wrong_ thing
    sometimes.  Using random host devices is unsafe.

    Fix the root problem by using only the disks provided via the "disks"
    variable in kyua.conf.  Use one to create a UFS file system and mount it.
    Use another as a swap device.  Use a third as a dump device, but expect
    it to fail due to bug 241070.

    While I'm here:

    Due to commit 6b6e2954dd65, we can simply add a second dump device and
    remove it in cleanup.  We no longer need to save, replace, and restore the
    pre-existing dump device.

    The cleanup_devices function used `camcontrol inquiry` to distinguish disks
    from other devices, such as partitions.  That works fine for SCSI, but not
    for ATA or VirtIO block.  Use `geom disk list` instead.

    PR:             241070
    PR:             267554
    Reviewed by:    asomers
    Sponsored by:   Dell Inc.
    Differential Revision:  https://reviews.freebsd.org/D37257

 tests/sys/cddl/zfs/include/libtest.kshlib          |  2 +-
 .../zfs/tests/cli_root/zpool_add/zpool_add.kshlib  | 68 ----------------------
 .../tests/cli_root/zpool_add/zpool_add_005_pos.ksh | 29 ++++++---
 .../zfs/tests/cli_root/zpool_add/zpool_add_test.sh |  3 +-
 .../zfs/tests/hotspare/hotspare_add_003_neg.ksh    | 60 +++++++++----------
 .../zfs/tests/hotspare/hotspare_create_001_neg.ksh | 60 +++++++++----------
 tests/sys/cddl/zfs/tests/hotspare/hotspare_test.sh |  4 +-
 7 files changed, 83 insertions(+), 143 deletions(-)
Comment 3 Graham Perrin freebsd_committer freebsd_triage 2022-12-28 17:25:37 UTC
Thank you, anything more to do here? 

I see <https://reviews.freebsd.org/D37257#848954> …
Comment 4 Eric van Gyzen freebsd_committer freebsd_triage 2022-12-29 01:21:24 UTC
(In reply to Graham Perrin from comment #3)

> Thank you, anything more to do here? 
> 
> I see <https://reviews.freebsd.org/D37257#848954>

That change fixes one instance of this problem, but there could be many more.  This bug is asking for (1) a review of the ZFS tests that use atf_expect_fail and (2) fixes for those findings, similar to the above change.