I have a FreeBSD 10.1-RELEASE server with some ZFS pools and datasets on it. The datasets on the source server carry ACLs:

# ls -l /pool1/samba/IT
-rw-rwx---+    1 tabolin   domain admins       10244  4 Mar 14:39 .DS_Store
-rw-rwx---+    1 tabolin   domain admins        4096  3 Mar 18:51 ._.DS_Store
drwxrwx---+    3 tabolin   domain admins           3 14 May  2012 Books
drwxrwx---+    3 tabolin   domain admins           3  5 Mar  2014 Common
drwxrwx---+    5 root      domain admins           5 11 Feb  2011 GVP
-rw-rwx---+    1 tabolin   domain admins      609280 13 Dec  2012 IT-Servers.vsd
drwxrwx---+    9 tabolin   domain admins          16  3 Oct 08:50 Other
drwxr-x---+  135 root      wheel                 137 18 Feb 17:00 Print-History
drwxrwx---+    9 gavrilov  domain admins           9  1 Oct  2011 SF
drwxrwx---+    8 tabolin   domain admins         240  4 Mar 13:14 Showroom-video
drwxrwx---+   10 gavrilov  domain admins          13  6 Nov 15:57 Software
d---rwx---+  101 root      domain admins         128 10 Feb 11:19 TS
drwxrwx---+   21 tabolin   domain admins          31 16 Feb 15:49 Ustorage

# getfacl /pool1/samba/IT/Showroom-video
# file: /pool1/samba/IT/Showroom-video
# owner: tabolin
# group: domain admins
user:gurashov:rwxpDdaARWc--s:fd----:allow
user:account:rwxpDdaARWc--s:fd----:allow
user:raev:rwxpDdaARWc--s:fd----:allow
user:becker:rwxpDdaARWc--s:fd----:allow
user:zaretskaya:rwxpDdaARWc--s:fd----:allow
user:dmitrieva:rwxpDdaARWc--s:fd----:allow
user:ddv:rwxpDdaARWc--s:fd----:allow
everyone@:------a-R-c--s:------:allow
owner@:rwxpD-aARWcCos:------:allow
user:ovcharenko:rwxpDdaARWc--s:fd----:allow
user:stepkin:rwxpDdaARWc--s:fd----:allow
user:khitrov:rwxpDdaARWc--s:fd----:allow
user:ivan:rwxpDdaARWc--s:fd----:allow
user:egorov-s:rwxpDdaARWc--s:fd----:allow
user:julia:rwxpDdaARWc--s:fd----:allow
user:polyakova:rwxpDdaARWc--s:fd----:allow
user:koval:rwxpDdaARWc--s:fd----:allow
user:victor:rwxpDdaARWc--s:fd----:allow
user:korg:rwxpDdaARWc--s:fd----:allow
user:zharov:rwxpDdaARWc--s:fd----:allow
user:semenov-y:rwxpDdaARWc--s:fd----:allow
user:kiselev-v:rwxpDdaARWc--s:fd----:allow
group@:rwxpDdaARWcCos:fd----:allow
group:domain users:r-x---a-R-c--s:fd----:allow

I used zfs send/receive to back the datasets up to another server:

# zfs send -vR pool1/samba/IT@-2015-02-24 | ssh tabolin@stor sudo zfs recv -v pool1/samba/IT

The send/receive finishes without any error, but on the destination server the ACLs on some of the same files and folders are lost (see the Showroom-video folder):

# ls -l /pool1/samba/IT
total 3264
-rw-rwx---+    1 gavrilov  10007      10244  4 Mar 14:39 .DS_Store
-rw-rwx---+    1 gavrilov  10007       4096  3 Mar 18:51 ._.DS_Store
drwxrwx---+    3 gavrilov  10007          3 14 May  2012 Books
drwxrwx---+    3 gavrilov  10007          3  5 Mar  2014 Common
drwxrwx---+    5 root      10007          5 11 Feb  2011 GVP
-rw-rwx---+    1 gavrilov  10007     609280 13 Dec  2012 IT-Servers.vsd
drwxrwx---+    9 gavrilov  10007         16  3 Oct 08:50 Other
drwxr-x---+  135 root      wheel        137 18 Feb 17:00 Print-History
drwxrwx---+    9 10000     10007          9  1 Oct  2011 SF
ls: /pool1/samba/IT/Showroom-video: No such file or directory
drwxrwx---     8 gavrilov  10007        240  4 Mar 13:14 Showroom-video
drwxrwx---+   10 10000     10007         13  6 Nov 15:57 Software
d---rwx---+  101 root      10007        128 10 Feb 11:19 TS
drwxrwx---+   21 gavrilov  10007         31 16 Feb 15:49 Ustorage

# ls -l /pool1/samba/IT/Showroom-video
total 1515210995
ls: /pool1/samba/IT/Showroom-video/.DS_Store: No such file or directory
-rwxrwx---   1 gavrilov  10007        24580 18 Apr  2014 .DS_Store
-rwxrwx---+  1 gavrilov  10007         4096 15 Jan  2014 ._.DS_Store
-rwxrwx---+  1 gavrilov  10007         4096  3 Dec  2012 ._AV00_01_30-02_54_04.avi
-rwxrwx---+  1 gavrilov  10007         4096  3 Dec  2012 ._AV00_02_24-02_01_10.avi
----rwx---+  1 gavrilov  10007         4096 18 Apr  2014 ._Корж В. Обучение ДУП по Jira и Confluence.avi
-rwxrwx---+  1 gavrilov  10007   1421801430 17 Sep  2013 13_09_17-11_05_59_Белозерчик_HD.mp4
-rwxrwx---+  1 gavrilov  10007    235532535 11 Feb 12:09 2015-02-10_15.32_Демонстрация_аналитики_SAS._Белозерчик.wmv
-rwxrwx---+  1 gavrilov  10007    594386969 22 Jan 14:51 22-01-2015_13-26-38_Зубов.mp4
----rwx---+  1 10001     10007   5190520832 28 Nov  2011 AV00_01_01-00_24_30.avi
----rwx---+  1 gavrilov  10007  13026264576 28 Feb  2012 AV00_01_01-03_26_14.avi

# getfacl /pool1/samba/IT/Showroom-video
# file: /pool1/samba/IT/Showroom-video
# owner: gavrilov
# group: 10007
getfacl: /pool1/samba/IT/Showroom-video: No such file or directory

# chmod o+r /pool1/samba/IT/Showroom-video
/pool1/samba/IT/Showroom-video: No such file or directory

The same problem exists on hundreds of folders and files on this pool and on another one. After I found this bug, I checked a couple of other FreeBSD ZFS servers and found the same lost ACLs there after send/receive.

The second part of the problem is a kernel panic whenever I create a file or directory in any of those "No such file or directory" directories:

# cd /pool1/samba/IT/Showroom-video
# mkdir 1111
panic: solaris assert: 0 == zfs_acl_node_read(dzp, B_TRUE, &paclp, B_FALSE), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c, line: 1718
cpuid = 16
KDB: stack backtrace:
#0 0xffffffff8096cb00 at kdb_backtrace+0x60
#1 0xffffffff80931c25 at panic+0x155
#2 0xffffffff81b7c1fd at assfail+0x1d
#3 0xffffffff81ab28af at zfs_acl_ids_create+0x1ef
#4 0xffffffff81ad292a at zfs_freebsd_mkdir+0x21a
#5 0xffffffff80e17dd7 at VOP_MKDIR_APV+0xa7
#6 0xffffffff809dde49 at kern_mkdirat+0x209
#7 0xffffffff80cfa581 at amd64_syscall+0x351
#8 0xffffffff80cdf79b at Xfast_syscall+0xfb
Uptime: 11d1h0m35s

The kernel panic happens every time I create a file or directory inside a directory that has lost its ACL. Thanks for any help!
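As an aside (not from the original report): one rough way to enumerate the affected entries on the receive side is to walk the dataset and flag every path whose ACL metadata can no longer be read even though the name still appears in the directory listing. This is only a sketch - the path is an example, and it assumes FreeBSD's getfacl and root access.

```shell
# Sketch: list entries whose ACL/spill metadata is broken after receive.
# /pool1/samba/IT is an example path; run as root on the receiving server.
find /pool1/samba/IT | while read -r p; do
    getfacl "$p" > /dev/null 2>&1 || echo "broken: $p"
done
```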
I experienced the same problem in 2014 and tried again today with the same result. Is there any news regarding this? I cannot believe there are only three people in the world having this issue.
Links to other bug tickets:

https://bugs.freenas.org/issues/5225
https://forums.freenas.org/index.php?threads/zfs-replication-corrupts-entire-zfs-volume-warning.17566/#post-126243

So far this has only been posted in the FreeNAS bug trackers - but it is the same issue here. This is probably a similar or the same issue: http://markmail.org/message/4nl4dzkmuo7gidlu
So, from my reading, it panics because for some reason it cannot gracefully handle the error returned by zfs_acl_node_read(). Why? No idea. Note, however, that this code is not specific to FreeBSD - it might be a good idea to consult the Nexenta or ZoL folks.
Thanks for your comment. I think the main problem is not the handling of the error, but that there is an error at all! I can (probably) understand why the system panics on such a file-system issue - but why does the issue exist in the first place? In summary: why are the ACLs broken after replication?

And another question: why do you think we should contact ZoL? I always thought ZoL and FreeBSD ZFS were two different systems. Actually, I did not choose ZoL because it is not really considered production-ready - but FreeBSD ZFS is.
(In reply to o.bende from comment #4)

No idea. I'm guessing it's some kind of corruption, but one that for some reason only affects ACLs on send/receive.

My advice about talking to the Nexenta/ZoL folks is that, because the code in question is not specific to FreeBSD, it might affect them as well. They may have already found and fixed the bug. Otherwise... This is pure speculation, but the user/group identifiers used natively by ZFS are somewhat complicated (ZFS supports various identifier schemes, e.g. Windows SIDs), so there might be some FUID-related problem. Again - a shot in the dark.
Hi there, any update on this bug? We have the same issue on FreeBSD 10.1 - no kernel panic, but some ACLs are lost. To get access to a file again we have to mv it and then reconfigure the ACL on the file.
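For what it's worth, the mv-based workaround described above can be scripted roughly like this (a sketch only - the path and the ACL entry are examples, and it assumes the desired ACL was recorded from the source server beforehand with getfacl):

```shell
# Rewrite the file's metadata by moving it aside and back, then reapply
# the NFSv4 ACL entry saved from the source server (entry is an example).
f=/pool1/samba/IT/somefile
mv "$f" "$f.tmp" && mv "$f.tmp" "$f"
setfacl -m user:tabolin:rwxpDdaARWc--s:fd----:allow "$f"
```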
Hi. I have found this same issue to still be present in 11.0-RELEASE.

A replicated pool with several TB of data, several volumes, and some 50 snapshots was sent to a new pool on another system. All the files were verified on both pools in the most recent snapshot; md5 hashes generated with cfv matched. This comparison was run as root, and accessing the files caused no problem.

Then the new pool was put into production, supplying a samba volume for windows backups with robocopy (including ACLs). This was meant to replace the original pool. The kernel always crashes shortly after the backup starts, with:

panic: solaris assert: 0 == zfs_acl_node_read(dzp, &paclp, B_FALSE), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c, line: 1692
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b24477 at kdb_backtrace+0x67
#1 0xffffffff80ad97e2 at vpanic+0x182
#2 0xffffffff80ad9653 at panic+0x43
#3 0xffffffff824b520a at assfail+0x1a
#4 0xffffffff82263084 at zfs_acl_ids_create+0x1b4
#5 0xffffffff822689d0 at zfs_make_xattrdir+0x40
#6 0xffffffff82268c95 at zfs_get_xattrdir+0xc5
#7 0xffffffff8227e7e6 at zfs_lookup+0x106
#8 0xffffffff822871d1 at zfs_setextattr+0x181
#9 0xffffffff8110f03f at VOP_SETEXTATTR_APV+0x8f
#10 0xffffffff80b9c404 at extattr_set_vp+0x134
#11 0xffffffff80b9c544 at sys_extattr_set_file+0xf4
#12 0xffffffff80fa26ae at amd64_syscall+0x4ce
#13 0xffffffff80f8488b at Xfast_syscall+0xfb

I have not yet pinned down exactly which files are hit when the crash happens, but the backtrace is always the same.

I'm guessing this bug is not found more often because most people do not put the replicas into production, and the file data itself seems to be copied correctly anyway. It is the metadata - the extended attributes - that gets corrupted. So this will mostly hit people who expose and use volumes in the received pool through samba.
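A side note, not from the report above: since the loss is confined to ACLs and extended attributes, a data-only check such as md5 will pass. A hedged sketch of a metadata-level comparison instead (FreeBSD; the paths and output files are examples):

```shell
# Dump ACLs and user-namespace extended attributes, then diff both sides.
# Run as root on the source; repeat on the replica into /tmp/meta.dst.
find /pool1/samba -exec getfacl {} + > /tmp/meta.src 2>&1
find /pool1/samba -type f -exec lsextattr -q user {} + >> /tmp/meta.src 2>&1
# after producing /tmp/meta.dst on the replica:
# diff /tmp/meta.src /tmp/meta.dst
```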
A commit references this bug:

Author: mav
Date: Sat Nov  3 03:10:06 UTC 2018
New revision: 340096
URL: https://svnweb.freebsd.org/changeset/base/340096

Log:
  9952 Block size change during zfs receive drops spill block

  Replication code in receive_object() falsely assumes that if the received
  object's block size differs from the local one, then it must be a new object,
  and calls dmu_object_reclaim() to wipe it out. In most cases this is not a
  problem, since the dnode, bonus buffer and data block(s) are all immediately
  rewritten anyway, but the spill block (if used) is not. This means loss of
  ACLs, extended attributes, etc.

  This issue can be triggered in a very simple way:
  1. create a 4KB file with 10+ ACL entries;
  2. take a snapshot and send it to a different dataset;
  3. append another 4KB to the file;
  4. take another snapshot and send it incrementally;
  5. witness ACL loss on the receive side.

  PR:             198457
  Discussed with: mahrens
  MFC after:      2 weeks
  Sponsored by:   iXsystems, Inc.

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c
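The five reproduction steps in the commit log can be written out as a shell session roughly like this (a sketch, not taken from the commit itself - it assumes root, a scratch pool named tank, and FreeBSD's setfacl; the repeated ACL entry is just a way to get past 10 entries so a spill block is allocated):

```shell
# Reproduce the dropped spill block on an affected system.
zfs create tank/src
# 1. create a 4KB file with 10+ ACL entries (forces a spill block)
dd if=/dev/zero of=/tank/src/f bs=4k count=1
i=0; while [ $i -lt 12 ]; do
    setfacl -a 0 user:nobody:r::allow /tank/src/f
    i=$((i + 1))
done
# 2. take a snapshot and send it to a different dataset
zfs snapshot tank/src@s1
zfs send tank/src@s1 | zfs recv tank/dst
# 3. append another 4KB to the file (the block size changes)
dd if=/dev/zero of=/tank/src/f bs=4k count=1 seek=1 conv=notrunc
# 4. take another snapshot and send it incrementally
zfs snapshot tank/src@s2
zfs send -i @s1 tank/src@s2 | zfs recv tank/dst
# 5. inspect the receive side - on an affected system the entries are gone
getfacl /tank/dst/f
```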
A commit references this bug:

Author: mav
Date: Wed Nov 21 18:18:57 UTC 2018
New revision: 340737
URL: https://svnweb.freebsd.org/changeset/base/340737

Log:
  Revert r340096: 9952 Block size change during zfs receive drops spill block

  It was reported, and I easily reproduced it, that this change triggers a
  panic when receiving a replication stream with embedded blocks enabled, when
  a short file compressing into one embedded block changes its block size. I
  am not sure whether the problem is in this particular patch or merely
  triggered by it, but since investigation and a fix will take some time, I've
  decided to revert this for now.

  PR: 198457, 233277

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c
By the way, here is how the problem was fixed in ZoL: https://github.com/zfsonlinux/zfs/commit/caf9dd209fdcfccabc2f32b3f23c5386ccfb896c
Created attachment 210363 [details]
spill block send patch

I looked at the ZoL patch and, based on it and the discussion in that pull request, I think the attached patch should be the minimal change necessary to fix this issue.
I have recently run into this issue and was wondering when a fix will be deployed. I am running 11.2-RELEASE-p4 on the source server and 12.1-RELEASE-p5 on the destination server. An rsync from the source to the destination server crashes the server with the panic mentioned in this ticket.

I am able to resolve the issue by clearing and then re-applying the FACL on the source server using setfacl, deleting the directories that return "No such file or directory" on the destination side, and then running rsync again. I do see the "No such file or directory" errors on ZFS replica file systems, but no crash occurs there.