Created attachment 234168 [details] Demo program that does the read(2) With the recent 14.0-CURRENT (n255230) and fusefs-ntfs-2017.3.23 (UBLIO=off), I'm frequently getting errors when copying not-very-small files with scp(1) to remote machines, but not with rsync(1); cat(1)ing file e.g. to /dev/null also fails. Running those programs under truss(1) reveals interesting bug: reading the files on Windows 10 formatted NTFS with block of size 32768 and smaller (stat(2) returns optimal size of 4096 bytes, which both scp(1) and cat(1) use) fails after several blocks; rsync(1) uses 262144 and does not fail. Even more interestingly, it does not fail on first couple of blocks and on previously read ones, i.e. once the file had been read, it can be read again with smaller block size. I've written a program to demonstrate this (attached): $ ls -l xorgproto-2021.5.tbz -rw-r--r-- 1 danfe wheel 245998 21 янв. 14:11 xorgproto-2021.5.tbz $ /tmp/a.out xorgproto-2021.5.tbz 32768 using i/o size 32768 (optimal 4096) read: Invalid argument 1 reads were successful $ /tmp/a.out xorgproto-2021.5.tbz 32768 using i/o size 32768 (optimal 4096) read: Invalid argument 3 reads were successful $ /tmp/a.out xorgproto-2021.5.tbz 32768 using i/o size 32768 (optimal 4096) $ /tmp/a.out xorgproto-2021.5.tbz -1 using i/o size 4096 (optimal 4096) I'm not sure if the bug is in the port or fusefs(5) implementation, so I'm CC'ing Alan in case he might have a clue.
I can't reproduce the problem with a different fuse file system, so it's probably a bug in fuse-ntfs. I can try to investigate, but you'll have to tell me how to create the file system first. Have you tried running the daemon in the foreground, with debug output?
(In reply to Alan Somers from comment #1) > I can try to investigate, but you'll have to tell me how to create the file > system first. $ pkg which `which mkntfs` /usr/local/sbin/mkntfs was installed by package fusefs-ntfs-2017.3.23 $ truncate -s 1G ntfs # mdconfig -f ~alice/ntfs md1 # mkntfs -f /dev/md1 ... Have a nice day. # ntfs-3g /dev/md1 /mnt/bar $ cd /mnt/bar $ truncate -s 10M foo $ /tmp/a.out foo -1 using i/o size 4096 (optimal 4096) read: Invalid argument 1 reads were successful ... $ /tmp/a.out foo -1 using i/o size 4096 (optimal 4096) read: Invalid argument 177 reads were successful ... > Have you tried running the daemon in the foreground, with debug output? Not yet, will try.
I have a WIP updating fuse-ntfs to 2021.8.22 in review D34079 Are you able to test that version?
(In reply to Kubilay Kocak from comment #3) > Are you able to test that version? Well, now that I've provided instructions in comment #2 how to reproduce it locally, you should be able to easily test it yourself, no? :-)
(In reply to Alexey Dokuchaev from comment #4) I'm just trying to assist by providing avenues/possible options to isolate that may save people some time.
(In reply to Alexey Dokuchaev from comment #2) I can't reproduce the bug using these instructions: > ntfs-3g --version ntfs-3g 2017.3.23 external FUSE 29 > uname -a FreeBSD myhostname 14.0-CURRENT FreeBSD 14.0-CURRENT #393 loadavg-1k-n255214-639d0eae55bb-dirty: Fri May 6 17:17:27 MDT 2022 somers@myhostname:/usr/obj/usr/home/somers/src/freebsd.org/src/amd64.amd64/sys/GENERIC amd64
(In reply to Alan Somers from comment #6) > I can't reproduce the bug using these instructions Oh well, thank you for trying in any case! I'll play with the port's options, check out Kubilay's WIP update, and other debugging to obtain and share more data points later.
(In reply to Alan Somers from comment #1) > Have you tried running the daemon in the foreground, with debug output? So I did. Apparently the bug exhibits itself when the controversial (see bug #194526 and bug #206978) UBLIO option is OFF (mentioned in the comment #1). Here's the relevant part of the diff between correct and buggy debug output: [...] unique: 14, opcode: BMAP (37), nodeid: 3, insize: 56, pid: <snip> - unique: 14, error: -78 (Function not implemented), outsize: 16 -unique: 15, opcode: READ (15), nodeid: 3, insize: 80, pid: <snip> -read[0] 131072 bytes from 65536 flags: 0x0 - read[0] 131072 bytes from 65536 - unique: 15, success, outsize: 131088 -unique: 16, opcode: GETATTR (3), nodeid: 3, insize: 56, pid: <snip> +bmap /Temp/qemu6-6.2.0.pkg blocksize: 65536 index: 1 + unique: 14, error: -22 (Invalid argument), outsize: 16 +unique: 15, opcode: GETATTR (3), nodeid: 3, insize: 56, pid: <snip> getattr /Temp/qemu6-6.2.0.pkg - unique: 16, success, outsize: 120 -unique: 17, opcode: GETATTR (3), nodeid: 3, insize: 56, pid: <snip> + unique: 15, success, outsize: 120 +unique: 16, opcode: BMAP (37), nodeid: 3, insize: 56, pid: <snip> +bmap /Temp/qemu6-6.2.0.pkg blocksize: 65536 index: 1 + unique: 16, error: -22 (Invalid argument), outsize: 16 +unique: 17, opcode: FLUSH (25), nodeid: 3, insize: 64, pid: <snip> + unique: 17, error: -78 (Function not implemented), outsize: 16 +unique: 18, opcode: SETATTR (4), nodeid: 3, insize: 128, pid: <snip> getattr /Temp/qemu6-6.2.0.pkg - unique: 17, success, outsize: 120 -unique: 18, opcode: GETATTR (3), nodeid: 3, insize: 56, pid: <snip> -getattr /Temp/qemu6-6.2.0.pkg unique: 18, success, outsize: 120 (In reply to Alan Somers from comment #6) > I can't reproduce the bug using these instructions Can you rebuild the port with UBLIO=off and retry? In bug #194526, comment #8 you say that fusefs(5) now has sane caching builtin and suggest that we eliminate libublio, so I guessed you must have it disabled for a long time. :-)
Yes, when I rebuild it without ublio your test program fails immediately.
(In reply to Kubilay Kocak from comment #3) > I have a WIP updating fuse-ntfs to 2021.8.22 in review D34079 It behaves the same way WRT this bug. Apart from that, proposed update path is quite dubious and doing it literally from scratch is uncalled for. Please hold it off for the time being, thank you.
(In reply to Alexey Dokuchaev from comment #8) > ... > +bmap /Temp/qemu6-6.2.0.pkg blocksize: 65536 index: 1 > + unique: 14, error: -22 (Invalid argument), outsize: 16 Looks like this -EINVAL comes from `static int ntfs_fuse_bmap()' in the src/ntfs-3g.c around line 2773. Adding debug fprintf() like this: > if (blocksize > ctx->vol->cluster_size) { > fprintf(stderr, " >>> blocksize = %zd, cluster_size = %d\n", > blocksize, ctx->vol->cluster_size); > return -EINVAL; >} yields ">>> blocksize = 65536, cluster_size = 4096".
Bumped into this with simple dd if=/media/NTFS Volume/4Gb.file of=/dev/zero bs=1 Interesting observation: 1. Insert the stick, mount the FS. 2. Run the command for the first time: 1+0 records in 1+0 records out 1 bytes transferred in 0.002173 secs (460 bytes/sec) 3. Run the command again: 65537+0 records in 65537+0 records out 65537 bytes transferred in 0.435892 secs (150352 bytes/sec) 4. And again: 131073+0 records in 131073+0 records out 131073 bytes transferred in 0.861837 secs (152086 bytes/sec) 5. Again: 196609+0 records in 196609+0 records out 196609 bytes transferred in 1.306456 secs (150490 bytes/sec) 6. And the last time: 262145+0 records in 262145+0 records out 262145 bytes transferred in 2.224164 secs (117862 bytes/sec) 7. Unmount, eject, insert, mount. 8. Retry the test - everything is same up to numbers. Hope this will give a hint on what's going wrong.
^Triage: chase MOVED.
I researched this a bit more. I also bumped into bmap returning EINVAL, like danfe@ reported in comment #11 , but it seems that this is a red herring. Successful reads also have this bmap failing, but they work nevertheless. I managed to reduce the testcase even more: * Mount a NTFS volume with ntfs-3g -o debug /dev/da0 /mnt * Read the same single byte from the file thrice: dd if=/media/NTFS Volume/4Gb.file of=/dev/zero bs=1 count=1 The first and third call of the command succeed, the second one fails. ===> ntfs-3g log for the 1 read <=== unique: 9, opcode: LOOKUP (1), nodeid: 1, insize: 45, pid: 653 LOOKUP /file getattr /file NODEID: 2 unique: 9, success, outsize: 144 unique: 10, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 653 open flags: 0x0 /file open[0] flags: 0x0 /file unique: 10, success, outsize: 32 unique: 11, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 653 getattr /file unique: 11, success, outsize: 120 unique: 12, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 653 getattr /file unique: 12, success, outsize: 120 unique: 13, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 653 getattr /file unique: 13, success, outsize: 120 unique: 14, opcode: READ (15), nodeid: 2, insize: 80, pid: 653 read[0] 65536 bytes from 0 flags: 0x0 read[0] 65536 bytes from 0 unique: 14, success, outsize: 65552 unique: 15, opcode: BMAP (37), nodeid: 2, insize: 56, pid: 653 bmap /file blocksize: 65536 index: 1 bmap a unique: 15, error: -22 (Invalid argument), outsize: 16 unique: 16, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 653 unique: 16, error: -78 (Function not implemented), outsize: 16 unique: 17, opcode: SETATTR (4), nodeid: 2, insize: 128, pid: 653 getattr /file unique: 17, success, outsize: 120 unique: 18, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 653 release[0] flags: 0x0 unique: 18, success, outsize: 16 ===> ntfs-3g log for the 2 read <=== unique: 19, opcode: LOOKUP (1), nodeid: 1, insize: 45, pid: 88220 LOOKUP /file getattr /file NODEID: 2 unique: 19, success, outsize: 144 unique: 20, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 88220 open flags: 0x0 /file open[0] flags: 0x0 /file unique: 20, success, outsize: 32 unique: 21, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 88220 getattr /file unique: 21, success, outsize: 120 unique: 22, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 88220 getattr /file unique: 22, success, outsize: 120 unique: 23, opcode: BMAP (37), nodeid: 2, insize: 56, pid: 88220 bmap /file blocksize: 65536 index: 1 bmap a unique: 23, error: -22 (Invalid argument), outsize: 16 unique: 24, opcode: SETATTR (4), nodeid: 2, insize: 128, pid: 88220 getattr /file unique: 24, success, outsize: 120 unique: 25, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 88220 release[0] flags: 0x0 unique: 25, success, outsize: 16 ===> ntfs-3g log for the 3 read <=== unique: 27, opcode: LOOKUP (1), nodeid: 1, insize: 45, pid: 82781 LOOKUP /file getattr /file NODEID: 2 unique: 27, success, outsize: 144 unique: 28, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 82781 open flags: 0x0 /file open[0] flags: 0x0 /file unique: 28, success, outsize: 32 unique: 29, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 82781 getattr /file unique: 29, success, outsize: 120 unique: 30, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 82781 getattr /file unique: 30, success, outsize: 120 unique: 31, opcode: SETATTR (4), nodeid: 2, insize: 128, pid: 82781 getattr /file unique: 31, success, outsize: 120 unique: 32, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 82781 release[0] flags: 0x0 unique: 32, success, outsize: 16 I'm baffled by the fact that the READ call only happens once. Does that mean that there is some caching in play? Where does it happen - fuse kernel part or inside ntfs-3g? Alan, does this information ring a bell to you?
(In reply to Gleb Popov from comment #14) > I'm baffled by the fact that the READ call only happens once. Does that mean that there is some caching in play? Where does it happen - fuse kernel part or inside ntfs-3g? > Alan, does this information ring a bell to you? Yes, there is caching. For older fuse file systems (protocol older than 7.23, which includes everything using libfuse-2) it's controlled by the vfs.fusefs.data_cache_mode sysctl The default is write-through caching. But you can change it to zero to disable caching completely or two for write-back caching.