Bug 240049

Summary: Flakey test case: lib.libc.gen.getmntinfo_test.getmntinfo_test
Product: Base System Reporter: Li-Wen Hsu <lwhsu>
Component: testsAssignee: Mateusz Guzik <mjg>
Status: Closed FIXED    
Severity: Affects Only Me CC: asomers, cem, mckusick, mjg, ngie
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Li-Wen Hsu freebsd_committer freebsd_triage 2019-08-23 01:13:26 UTC
This case is unstable since 8/20, from this build:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/12261/testReport/junit/lib.libc.gen/getmntinfo_test/getmntinfo_test/

Error Message
/usr/src/lib/libc/tests/gen/getmntinfo_test.c:49: 0
Comment 1 commit-hook freebsd_committer freebsd_triage 2019-08-23 05:25:35 UTC
A commit references this bug:

Author: lwhsu
Date: Fri Aug 23 05:25:22 UTC 2019
New revision: 351416
URL: https://svnweb.freebsd.org/changeset/base/351416

Log:
  lib.libc.gen.getmntinfo_test.getmntinfo_test is unstable since 8/20, skip it
  in CI env temporarily for more offline diagnosis

  PR:		240049
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/lib/libc/tests/gen/getmntinfo_test.c
Comment 2 Conrad Meyer freebsd_committer freebsd_triage 2020-04-28 02:20:31 UTC
Any idea what changed in the CI environment around 8/20?  Seems like some filesystem is or was breaking ABI by setting f_version to zero (probably by bzero'ing the entire statfs structure).  __vfs_statfs() sets f_version before even invoking specific vfs_statfs operations and I don't see an obvious culprit in CURRENT.
Comment 3 Conrad Meyer freebsd_committer freebsd_triage 2020-04-28 05:06:15 UTC
Probably fallout from r351193 / r351215, which went in shortly before the first failure.
Comment 4 Enji Cooper freebsd_committer freebsd_triage 2020-04-29 00:27:15 UTC
(In reply to Conrad Meyer from comment #3)

I agree. Some of the new code doesn't appear equivalent.
Comment 5 Alan Somers freebsd_committer freebsd_triage 2020-05-03 17:28:50 UTC
mjg, please investigate whether r351193 and r351215 could be responsible for this intermittent test failure.
Comment 6 Mateusz Guzik freebsd_committer freebsd_triage 2020-05-03 19:46:58 UTC
Are you sure this is flakey past r351215?
Comment 7 Alan Somers freebsd_committer freebsd_triage 2020-05-03 20:30:43 UTC
Yes.  In fact, it's _only_ flaky after r351215.  The CI jobs that failed were built at revisions 351237 through 351412.  Here's a link to the last failure before lwhsu skipped the test:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/12307/testReport/junit/lib.libc.gen/getmntinfo_test/getmntinfo_test/

For reference, the test was skipped beginning at job 12309.  The jobs that failed were 12261, 12302, 12304, and 12307.
Comment 8 Mateusz Guzik freebsd_committer freebsd_triage 2020-05-03 23:11:39 UTC
Is this testcase running in parallel with mount/unmount requests for something else? If not, what's "mount" output while the test is running?

It is supposed to be an invariant that the fields tested here are set correctly at mount time, but I see a potential for a race. Prior to my changes the kernel kept overwriting these fields before copyout with expected values.

VFS_STATFS(mp, &mp->mnt_stat) called on mount can easily inject transiently "bad" values due to memcpy. Note the rest of the buffer has arbitrary content. If I'm right, the code was already broken prior to my changes but not in a way which could have been detected by this testcase.
Comment 9 Mateusz Guzik freebsd_committer freebsd_triage 2020-05-03 23:13:14 UTC
I'm unable to reproduce the failure myself.
Comment 10 Alan Somers freebsd_committer freebsd_triage 2020-05-03 23:26:12 UTC
In the current CI environment, every test runs exclusively.  Nothing runs in parallel.  But perhaps possible that something that ran just before this test caused the kernel to unmount something asynchronously?  You can look at https://ci.freebsd.org/job/FreeBSD-head-amd64-test/12307/consoleFull to see which tests ran before this one.
Comment 11 Mateusz Guzik freebsd_committer freebsd_triage 2020-07-10 05:12:56 UTC
I don't see the problem yet but something clearly is up.
Comment 12 Mateusz Guzik freebsd_committer freebsd_triage 2020-07-10 06:50:43 UTC
I committed the following (which should have been done anyway): https://svnweb.freebsd.org/base?view=revision&revision=363068

It has a side effect of no longer overwriting the version field at runtime, which as noted earlier I suspect was racing with other stuff. Now this matches the old behavior and I strongly suspect the flakey test will be cleared now.

Note the entire mechanism remains buggy as other fields can get updated as they are being read, but it predates my involvement in the area.

That said, could you please re-enable the test? Should it still cause trouble I'll work on a full fix to the entire thing.
Comment 13 Li-Wen Hsu freebsd_committer freebsd_triage 2020-07-13 18:15:50 UTC
(In reply to Mateusz Guzik from comment #12)
I've run this test case in a loop for few hours and it is fine so far.  I'll go ahead to enable it again to get it tested more in the CI.

BTW, if this entire thing looks buggy, I would suggest let's just fix it. :-)
Comment 14 commit-hook freebsd_committer freebsd_triage 2020-07-13 18:19:24 UTC
A commit references this bug:

Author: lwhsu
Date: Mon Jul 13 18:19:03 UTC 2020
New revision: 363165
URL: https://svnweb.freebsd.org/changeset/base/363165

Log:
  Revert r351416 to let lib.libc.gen.getmntinfo_test.getmntinfo_test get more test

  This is supposed to be fixed by r363068

  PR:		240049
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/lib/libc/tests/gen/getmntinfo_test.c
Comment 15 Kirk McKusick freebsd_committer freebsd_triage 2021-05-19 22:54:14 UTC
This bug has been successfully resolved.