Bug 212323

Summary: tests/sys/acl/01:main fails due to changes in NFSv4 ACL behavior on ^/head
Product: Base System Reporter: Enji Cooper <ngie>
Component: kernAssignee: freebsd-fs (Nobody) <fs>
Status: New ---    
Severity: Affects Some People CC: bugzilla.freebsd, linimon, lwhsu, mav, ngie, trasz
Priority: --- Keywords: regression
Version: 11.0-STABLE   
Hardware: Any   
OS: Any   

Description Enji Cooper freebsd_committer freebsd_triage 2016-09-02 07:37:07 UTC
Some of the output from getfacl after setting NFSv4 ACLs on test files with tests/sys/acl/01:main demonstrate a change in behavior with NFSv4 ACLs:

These errors don't occur on ^/stable/10 and older versions of ^/head (it started failing some time between 05/04/2016 and 08/13/2016):

Repro:

sudo kldload zfs
sudo kyua debug -k /usr/tests/sys/acl/Kyuafile 01:main

e.g.

[376] $ getfacl -qn xxx -- failed
           user:41:-w-----A------:------I:allow !=            user:41:--------------:------I:allow
           user:42:--------------:------I:allow ==            user:42:--------------:------I:allow
           user:42:r-x-----------:------I:allow !=            user:42:r-------------:------I:allow
          group:43:-w---------C--:------I:deny ==           group:43:-w---------C--:------I:deny
            owner@:rw-p--aARWcCos:-------:allow ==             owner@:rw-p--aARWcCos:-------:allow
            group@:r-----a-R-c--s:-------:allow ==             group@:r-----a-R-c--s:-------:allow
         everyone@:r-----a-R-c--s:-------:allow ==          everyone@:r-----a-R-c--s:-------:allow
[385] $ rm xxx -- ok
[386] $ umask 077 -- ok
[387] $ touch xxx -- ok
[388] $ getfacl -qn xxx -- failed
           user:41:-w-----A------:------I:allow !=            user:41:--------------:------I:allow
           user:42:--------------:------I:allow ==            user:42:--------------:------I:allow
           user:42:r-x-----------:------I:allow !=            user:42:--------------:------I:allow
          group:43:-w---------C--:------I:deny ==           group:43:-w---------C--:------I:deny
            owner@:rw-p--aARWcCos:-------:allow ==             owner@:rw-p--aARWcCos:-------:allow
            group@:------a-R-c--s:-------:allow ==             group@:------a-R-c--s:-------:allow
         everyone@:------a-R-c--s:-------:allow ==          everyone@:------a-R-c--s:-------:allow
[397] $ rm xxx -- ok
[398] $ umask 770 -- ok
[399] $ touch xxx -- ok
[400] $ getfacl -qn xxx -- failed
            owner@:rw-p----------:-------:deny ==             owner@:rw-p----------:-------:deny
            group@:rw-p----------:-------:deny ==             group@:rw-p----------:-------:deny
           user:41:-w-----A------:------I:allow !=            user:41:--------------:------I:allow
           user:42:--------------:------I:allow ==            user:42:--------------:------I:allow
           user:42:r-x-----------:------I:allow !=            user:42:--------------:------I:allow
          group:43:-w---------C--:------I:deny ==           group:43:-w---------C--:------I:deny
            owner@:------aARWcCos:-------:allow ==             owner@:------aARWcCos:-------:allow
            group@:------a-R-c--s:-------:allow ==             group@:------a-R-c--s:-------:allow
         everyone@:rw-p--a-R-c--s:-------:allow ==          everyone@:rw-p--a-R-c--s:-------:allow
Comment 1 Edward Tomasz Napierala freebsd_committer freebsd_triage 2016-09-06 18:08:50 UTC
It was caused by one of (relatively) recent commits to ZFS.  I've commented on this on src-commits@.
Comment 2 commit-hook freebsd_committer freebsd_triage 2016-12-03 02:24:43 UTC
A commit references this bug:

Author: ngie
Date: Sat Dec  3 02:24:15 UTC 2016
New revision: 309464
URL: https://svnweb.freebsd.org/changeset/base/309464

Log:
  Expect 01:main to fail

  Changes were made to ZFS in the past year with respect to how ACLs
  are handled, causing failures in this test. Mark it TODO so (hopefully)
  someone more knowledgeable (like mav or trasz) will fix the code or the
  test.

  PR:	212323

Changes:
  head/tests/sys/acl/01.sh
Comment 3 commit-hook freebsd_committer freebsd_triage 2017-01-14 10:23:35 UTC
A commit references this bug:

Author: ngie
Date: Sat Jan 14 10:23:06 UTC 2017
New revision: 312163
URL: https://svnweb.freebsd.org/changeset/base/312163

Log:
  MFC r309464:

  Expect 01:main to fail

  Changes were made to ZFS in the past year with respect to how ACLs
  are handled, causing failures in this test. Mark it TODO so (hopefully)
  someone more knowledgeable (like mav or trasz) will fix the code or the
  test.

  PR:	212323

Changes:
_U  stable/11/
  stable/11/tests/sys/acl/01.sh
Comment 4 Harald Schmalzbauer 2017-01-14 11:32:01 UTC
(In reply to Edward Tomasz Napierala from comment #1)

To make things easier for those who want to follow:
https://lists.freebsd.org/pipermail/svn-src-head/2016-June/088460.html

No time to look into the change right now, but very interesting, since I'm extensively using NFSv4 ACLs in real-world and ever since had big problems working arround many NTFS ACL <-> NFSv4 ACL differences. Will try to find records and see if I can help by showing advantages/disadvantages.

But the commit in question was reverted with r302297 (https://svnweb.freebsd.org/base?view=revision&revision=302297)
So 11-RLEASE was without this commit.

Thanks,

-harry
Comment 5 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-01-14 15:13:40 UTC
Harald, that would be very welcome - I do know a bit or two about NFSv4 ACLs, but have no practical experience with NTFS, so your feedback would be very useful.

As for the revert - _some_ changes were reverted.  Others were not, and that's why the regression tests don't pass.  My personal preference (although I'd like to know your opinion first, because of NTFS) would be to just revert it all, back to how it was throughout 10.X, with the addition of the "I" flag (which is a good thing), verified by passing the regression tests.
Comment 6 Mark Linimon freebsd_committer freebsd_triage 2018-08-29 20:08:15 UTC
Is this PR still relevant?
Comment 7 Harald Schmalzbauer 2018-08-31 12:29:29 UTC
My ZFS setups suffer from an upstream ACL inheritance bug:
https://www.illumos.org/issues/9722 wich is referring https://www.illumos.org/issues/8984
The latter is marked closed/solved but it is only partially solved, like the newer issue describes.

I had to give up using samba for environments where NTFS ACLs are crucial.
I also gave up deploying new NFSv4 ACL setups.  All previous setups were damaged by the mentioned ZFS inheritance bugs.
I remember many questions to be clarified regarding d and D flags (and in respect to NTFS conventions also R and W and C), but lost almost all experience results.

I can't tell anything about the test failure status!
I just can tell that I'm observing issues regading ACL/permission mappings. Time for me to look into tests...


I'm really interested in polishing our ACL implementation, but I'm lacking a good portion of standards knowledge (posix) and my lousy C skills require someone else to fix the ZFS inheritance bug first.

Symlinks are one major issue to discuss regarding inheritance on ZFS – that was/is another big problem in my setups.

I still have my private ACL setups (partially damaged), which is a collection of special usage scenarios (e.g. users can add files and directories, but can't delete anything, even not their own; or shared home directory, which suffers from chmod(2) fallout due to the ACL).

So it's my pleasure to test any changes and put some light in uncommon but usefuls corner cases ;-)
And I'm willing to do my best to catch up with posix knowledge, but unfortunately I can't afford spending much time :-(
My personal estimation was that some very skilled persons would need at least a week to decide/implement and test a new/improved ACL mapping mode.
My needs aren't covered by the current ACL/permission mappings – not for NFSv4 only environments and additional CIFS/NTFS bridges were mutual destructive.

Thanks,

-harry
Comment 8 Edward Tomasz Napierala freebsd_committer freebsd_triage 2018-08-31 15:33:48 UTC
Lack of POSIX knowledge shouldn't be a problem, as most of the NFSv4 ACL semantics is not covered by any kind of standard.  That's actually a large part of the problem: it's hard to fix something if we don't know what the 'fixed' state should be.

Would it be possible for you to set up a system with a FreeBSD version from before the ZFS ACL changes (before r299448, if I'm reading this right), and see if the Samba problems are fixed?  If they are, perhaps we could just revert the ZFS ACL changes.
Comment 9 Harald Schmalzbauer 2018-08-31 17:13:33 UTC
(In reply to Edward Tomasz Napierala from comment #8)

Unfortunately samba is an extra bunch of issues.
The problems which made me stop using samba for distinct jobs in windows environments were not only/directly related to ACLs.
I know that some of them have been fixed meanwhile, but I know of others which are new and unresolved (at least on FreeBSD), so even if you end up with a satisfying NFSv4 ACL overhaul/FreeBSD-standardization, this was only a small part making samba usable in production environments – which is not the goal of this problem report (still have to analyze a trace about WindowsServerBackup failing, compared to windows native implementation of SMB2_02, SMB2_10 and another version, but couldn't find time and have to start over again at next attempt – which probably will never happen because a few hours weren't enough for me to isolate the crucial area last time).

Since it's about a design job rather than fixing, it's best to look at existing conventions for other type of ACLs on other operating systems – Linux and XFS comes to my mind and of course NTFS on Windows.  JFS on AIX was more POSIXACL like, missing the majority of the NFSv4/NTFS flags.
Stearing the unclear semantics and convention into a NTFS friendly direction would be desireable.
But I never read any posixacl standard also :-(
That said, I also missed reading any (official) NTFSv5 documents and I don't even know how the inheritance is done in windows. The NTFS versions I started with (NT4) didn't provide inheritance and since then I did observations only.

I'd need to setup a complete artificial test environment with at least 3 windows versions, since I currently have absolutely no idea which combination does anything else wrong but ACL related stuff...
Let me think about it over the weekend.  I planed to abuse one cold-standby system for extended iflib, if_vlan(4) and ctld(8) tests before 12-release, since there are untracked issues as well (btw. the ctld(8) on 11.2 and Server2016-initiators turns out to be jumbo frames related).
Let me start materializing this test environment and if things run smooth, I'll extend the FreeBSD machine to serve a r299448 bhyve VM.  For NFSv4 ACLs we're not bound to real hardware as far as I can imagine at the moment (while iflib needs...)
The ESXi Hypervisor is yet to setup too, and undusting the equipment I have in mind will most likely discover new problems, and this will bring me into big time troubles. 
Sorry for my hesitation, but I don't want to promise anything I can't do due to real job interference...  But I'll try hard and come back if I made first progress.

Thanks,

-harry


P.S.: samba gained a vfs_freebsd module, which is an attempt to fix various problems which never affected me as far as I understood.  Haven't found any documentation about this module, so I need to look into that code to make sure there's no adaptor to sysutils/libsunacl disturbing test cases. For this reading my C skills should suffice.
Comment 10 Harald Schmalzbauer 2018-09-07 08:59:48 UTC
(In reply to Harald Schmalzbauer from comment #9)

Just a short note:
Made some progress with the test setups, but hardware was in bad condition; replacement parts expected to arrive on monday, so I'll hopefully be able to run tests with 11-current (pre- r299448 was mentioned, will check) the weekend after this, but then I should be able to cover various kinds of interoperability  tests and simply run things :-) [offtopic: and re-use the setup to track down the Server2016 WindowsServerBackup failure with Samba 4.x ].

-harry