I have a USB stick that I zeroed and formatted on my laptop, which is running FreeBSD.

# dd if=/dev/zero of=/dev/da1 bs=16m
# fdisk -i /dev/da1
# newfs_msdos -F32 /dev/da1s1

I then inserted the USB stick into my Sony PlayStation 4 and copied some video recordings of Alien: Isolation gameplay made with the PS4 to the USB stick.

When I inserted the USB stick back into my laptop, mounted it and tried to look at the files, I got "invalid argument" as shown below.

I suspect that the PS4 may have included the trademark symbol in the names and that this might be what is causing the problem.

Assuming that only the name of the directory is the problem, I think it would be better if it was still possible to descend into the directory. For example, perhaps one could use the "8.3 filename" (<https://en.wikipedia.org/wiki/8.3_filename>) that a file has when decoding the full name of the file fails? In the specific case here, that would be "ALIEN_~1" if I am reading the hexdump of the parent directory correctly.

$ doas mount_msdosfs /dev/da1s1 /mnt/
$ cd /mnt/
$ ls -R PS4/
./PS4:
SHARE/
./PS4/SHARE:
Video Clips/
./PS4/SHARE/Video Clips:
ls: Alien_ Isolation?: Invalid argument

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

$ env -i ls -R PS4
./PS4:
SHARE
./PS4/SHARE:
Video Clips
./PS4/SHARE/Video Clips:
ls: Alien_ Isolation?: Invalid argument

$ env -i locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

$ hd PS4
00000000 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |.          ...].|
00000010 61 4a 61 4a 00 00 5d 19 61 4a 03 00 00 00 00 00 |aJaJ..].aJ......|
00000020 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |..         ...].|
00000030 61 4a 61 4a 00 00 5d 19 61 4a 00 00 00 00 00 00 |aJaJ..].aJ......|
00000040 53 48 41 52 45 20 20 20 20 20 20 10 00 23 5d 19 |SHARE      ..#].|
00000050 61 4a 61 4a 00 00 c2 19 61 4a 04 00 00 00 00 00 |aJaJ....aJ......|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008000

$ hd PS4/SHARE/
00000000 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |.          ...].|
00000010 61 4a 61 4a 00 00 5d 19 61 4a 04 00 00 00 00 00 |aJaJ..].aJ......|
00000020 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |..         ...].|
00000030 61 4a 61 4a 00 00 5d 19 61 4a 03 00 00 00 00 00 |aJaJ..].aJ......|
00000040 41 56 00 69 00 64 00 65 00 6f 00 0f 00 8e 20 00 |AV.i.d.e.o.... .|
00000050 43 00 6c 00 69 00 70 00 73 00 00 00 00 00 ff ff |C.l.i.p.s.......|
00000060 56 49 44 45 4f 43 7e 31 20 20 20 10 00 2f 5d 19 |VIDEOC~1   ../].|
00000070 61 4a 61 4a 00 00 5d 19 61 4a 05 00 00 00 00 00 |aJaJ..].aJ......|
00000080 e5 74 00 6d 00 70 00 5f 00 63 00 0f 00 8e 70 00 |.t.m.p._.c....p.|
00000090 33 00 2e 00 64 00 61 00 74 00 00 00 00 00 ff ff |3...d.a.t.......|
000000a0 e5 4d 50 5f 43 50 33 20 44 41 54 20 00 bd b8 19 |.MP_CP3 DAT ....|
000000b0 61 4a 61 4a 00 00 c2 19 61 4a 5b 46 8d ef 1b 03 |aJaJ....aJ[F....|
000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008000

$ hd PS4/SHARE/Video\ Clips/
00000000 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |.          ...].|
00000010 61 4a 61 4a 00 00 5d 19 61 4a 05 00 00 00 00 00 |aJaJ..].aJ......|
00000020 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |..         ...].|
00000030 61 4a 61 4a 00 00 5d 19 61 4a 04 00 00 00 00 00 |aJaJ..].aJ......|
00000040 42 69 00 6f 00 6e 00 22 21 00 00 0f 00 72 ff ff |Bi.o.n."!....r..|
00000050 ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff |................|
00000060 01 41 00 6c 00 69 00 65 00 6e 00 0f 00 72 5f 00 |.A.l.i.e.n...r_.|
00000070 20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00 | .I.s.o.l...a.t.|
00000080 41 4c 49 45 4e 5f 7e 31 20 20 20 10 00 31 5d 19 |ALIEN_~1   ..1].|
00000090 61 4a 61 4a 00 00 c2 19 61 4a 06 00 00 00 00 00 |aJaJ....aJ......|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008000

$ hd /dev/da1s1 | grep -B10 -A10 '01 41 00 6c 00 69 00 65'
003af8a0 e5 4d 50 5f 43 50 33 20 44 41 54 20 00 bd b8 19 |.MP_CP3 DAT ....|
003af8b0 61 4a 61 4a 00 00 c2 19 61 4a 5b 46 8d ef 1b 03 |aJaJ....aJ[F....|
003af8c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
003b7800 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |.          ...].|
003b7810 61 4a 61 4a 00 00 5d 19 61 4a 05 00 00 00 00 00 |aJaJ..].aJ......|
003b7820 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |..         ...].|
003b7830 61 4a 61 4a 00 00 5d 19 61 4a 04 00 00 00 00 00 |aJaJ..].aJ......|
003b7840 42 69 00 6f 00 6e 00 22 21 00 00 0f 00 72 ff ff |Bi.o.n."!....r..|
003b7850 ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff |................|
003b7860 01 41 00 6c 00 69 00 65 00 6e 00 0f 00 72 5f 00 |.A.l.i.e.n...r_.|
003b7870 20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00 | .I.s.o.l...a.t.|
003b7880 41 4c 49 45 4e 5f 7e 31 20 20 20 10 00 31 5d 19 |ALIEN_~1   ..1].|
003b7890 61 4a 61 4a 00 00 c2 19 61 4a 06 00 00 00 00 00 |aJaJ....aJ......|
003b78a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
003bf800 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |.          ...].|
003bf810 61 4a 61 4a 00 00 5d 19 61 4a 06 00 00 00 00 00 |aJaJ..].aJ......|
003bf820 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 5d 19 |..         ...].|
003bf830 61 4a 61 4a 00 00 5d 19 61 4a 05 00 00 00 00 00 |aJaJ..].aJ......|
003bf840 43 30 00 32 00 32 00 37 00 33 00 0f 00 e9 31 00 |C0.2.2.7.3....1.|
003bf850 2e 00 6d 00 70 00 34 00 00 00 00 00 ff ff ff ff |..m.p.4.........|
003bf860 02 69 00 6f 00 6e 00 22 21 5f 00 0f 00 e9 32 00 |.i.o.n."!_....2.|
003bf870 30 00 31 00 37 00 30 00 33 00 00 00 30 00 31 00 |0.1.7.0.3...0.1.|
003bf880 01 41 00 6c 00 69 00 65 00 6e 00 0f 00 e9 5f 00 |.A.l.i.e.n...._.|
003bf890 20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00 | .I.s.o.l...a.t.|
003bf8a0 41 4c 49 45 4e 5f 7e 31 4d 50 34 20 00 32 5d 19 |ALIEN_~1MP4 .2].|
003bf8b0 61 4a 61 4a 00 00 68 19 61 4a 07 00 d7 8b fc 03 |aJaJ..h.aJ......|
003bf8c0 43 30 00 32 00 32 00 36 00 30 00 0f 00 c9 34 00 |C0.2.2.6.0....4.|
003bf8d0 2e 00 6d 00 70 00 34 00 00 00 00 00 ff ff ff ff |..m.p.4.........|
003bf8e0 02 69 00 6f 00 6e 00 22 21 5f 00 0f 00 c9 32 00 |.i.o.n."!_....2.|
003bf8f0 30 00 31 00 37 00 30 00 33 00 00 00 30 00 31 00 |0.1.7.0.3...0.1.|
003bf900 01 41 00 6c 00 69 00 65 00 6e 00 0f 00 c9 5f 00 |.A.l.i.e.n...._.|
003bf910 20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00 | .I.s.o.l...a.t.|
003bf920 41 4c 49 45 4e 5f 7e 32 4d 50 34 20 00 6b 68 19 |ALIEN_~2MP4 .kh.|
003bf930 61 4a 61 4a 00 00 a7 19 61 4a 01 08 b0 3f f0 17 |aJaJ....aJ...?..|
003bf940 43 30 00 32 00 31 00 37 00 32 00 0f 00 29 35 00 |C0.2.1.7.2...)5.|
003bf950 2e 00 6d 00 70 00 34 00 00 00 00 00 ff ff ff ff |..m.p.4.........|
003bf960 02 69 00 6f 00 6e 00 22 21 5f 00 0f 00 29 32 00 |.i.o.n."!_...)2.|
003bf970 30 00 31 00 37 00 30 00 33 00 00 00 30 00 31 00 |0.1.7.0.3...0.1.|
003bf980 01 41 00 6c 00 69 00 65 00 6e 00 0f 00 29 5f 00 |.A.l.i.e.n...)_.|
003bf990 20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00 | .I.s.o.l...a.t.|
003bf9a0 41 4c 49 45 4e 5f 7e 33 4d 50 34 20 00 66 a7 19 |ALIEN_~3MP4 .f..|
003bf9b0 61 4a 61 4a 00 00 b8 19 61 4a e2 37 14 2e 3c 07 |aJaJ....aJ.7..<.|
003bf9c0 43 30 00 32 00 31 00 34 00 34 00 0f 00 8a 38 00 |C0.2.1.4.4....8.|
003bf9d0 2e 00 6d 00 70 00 34 00 00 00 00 00 ff ff ff ff |..m.p.4.........|
003bf9e0 02 69 00 6f 00 6e 00 22 21 5f 00 0f 00 8a 32 00 |.i.o.n."!_....2.|
003bf9f0 30 00 31 00 37 00 30 00 33 00 00 00 30 00 31 00 |0.1.7.0.3...0.1.|
003bfa00 01 41 00 6c 00 69 00 65 00 6e 00 0f 00 8a 5f 00 |.A.l.i.e.n...._.|
003bfa10 20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00 | .I.s.o.l...a.t.|
003bfa20 41 4c 49 45 4e 5f 7e 34 4d 50 34 20 00 bd b8 19 |ALIEN_~4MP4 ....|
003bfa30 61 4a 61 4a 00 00 c2 19 61 4a 5b 46 8d ef 1b 03 |aJaJ....aJ[F....|
003bfa40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
003c7800 00 00 00 1c 66 74 79 70 4d 53 4e 56 01 6e 23 04 |....ftypMSNV.n#.|
003c7810 4d 53 4e 56 69 73 6f 6d 6d 70 34 32 00 00 00 94 |MSNVisommp42....|
003c7820 75 75 69 64 50 52 4f 46 21 d2 4f ce bb 88 69 5c |uuidPROF!.O...i\|
003c7830 fa c9 c7 40 00 00 00 00 00 00 00 03 00 00 00 14 |...@............|
003c7840 46 50 52 46 00 00 00 00 00 00 00 00 00 00 00 00 |FPRF............|
Sleuthkit confirms that there are trademark symbols in the file names, as I suspected.

$ fls -m "E:/" -r /dev/da1s1
0|E:/PS4|3|d/drwxrwxrwx|0|0|32768|1488322800|1488334258|0|1488334258
0|E:/PS4/SHARE|1029|d/drwxrwxrwx|0|0|32768|1488322800|1488334444|0|1488334258
0|E:/PS4/SHARE/Video Clips|2054|d/drwxrwxrwx|0|0|32768|1488322800|1488334258|0|1488334258
0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™|3079|d/drwxrwxrwx|0|0|32768|1488322800|1488334444|0|1488334258
0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™/Alien_ Isolation™_20170301022731.mp4|4104|r/rrwxrwxrwx|0|0|66882519|1488322800|1488334276|0|1488334258
0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™/Alien_ Isolation™_20170301022604.mp4|4108|r/rrwxrwxrwx|0|0|401620912|1488322800|1488334394|0|1488334277
0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™/Alien_ Isolation™_20170301021725.mp4|4112|r/rrwxrwxrwx|0|0|121384468|1488322800|1488334428|0|1488334395
0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™/Alien_ Isolation™_20170301021448.mp4|4116|r/rrwxrwxrwx|0|0|52162445|1488322800|1488334444|0|1488334429
0|E:/PS4/SHARE/tmp_cp3.dat (deleted)|2056|r/rrwxrwxrwx|0|0|52162445|1488322800|1488334444|0|1488334429
0|E:/$MBR|484143635|v/v---------|0|0|512|0|0|0|0
0|E:/$FAT1|484143636|v/v---------|0|0|1891328|0|0|0|0
0|E:/$FAT2|484143637|v/v---------|0|0|1891328|0|0|0|0
0|E:/$OrphanFiles|484143638|d/d---------|0|0|0|0|0|0|0
How large is the USB stick? Is there any chance you could share an image of the USB stick to aid debugging / reproduction? Thanks.
Well, ok, these are the relevant dirents:

00000040 42 69 00 6f 00 6e 00 22 21 00 00 0f 00 72 ff ff |Bi.o.n."!....r..|
00000050 ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff |................|
00000060 01 41 00 6c 00 69 00 65 00 6e 00 0f 00 72 5f 00 |.A.l.i.e.n...r_.|
00000070 20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00 | .I.s.o.l...a.t.|

It's a two-part LFN, in reverse order. The bytes from 0x41-0x4a are "ion\U2122\0". \U2122 is "Unicode Character 'TRADE MARK SIGN' (U+2122)."

The 8.3 encoding is in a 3rd dirent:

00000080 41 4c 49 45 4e 5f 7e 31 20 20 20 10 00 31 5d 19 |ALIEN_~1   ..1].|
00000090 61 4a 61 4a 00 00 c2 19 61 4a 06 00 00 00 00 00 |aJaJ....aJ......|

"ALIEN_~1. ".
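To double-check that reading of the dirents, here is a minimal Python sketch that reassembles the long name from the two LFN entries and recomputes the checksum that ties them to the 8.3 entry. The field offsets follow the commonly documented VFAT long-name layout; the helper names (`lfn_units`, `decode_lfn`, `short_name_checksum`) are made up for this illustration and are not msdosfs functions.

```python
# Sketch only: helper names are hypothetical, not taken from msdosfs.

# The two LFN dirents from the hexdump, in on-disk order (last part first).
LFN_PART2 = bytes.fromhex(
    "42 69 00 6f 00 6e 00 22 21 00 00 0f 00 72 ff ff "
    "ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff"
)
LFN_PART1 = bytes.fromhex(
    "01 41 00 6c 00 69 00 65 00 6e 00 0f 00 72 5f 00 "
    "20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00"
)
# 11-byte short name from the third dirent: 8 name bytes + 3 extension bytes.
SHORT_NAME = b"ALIEN_~1   "


def lfn_units(entry: bytes) -> bytes:
    """Return the 13 UTF-16LE code units stored in one 32-byte LFN entry."""
    return entry[1:11] + entry[14:26] + entry[28:32]


def decode_lfn(entries) -> str:
    """Join LFN entries by sequence number and decode up to the NUL."""
    # Low bits of byte 0 are the sequence number; 0x40 marks the last part.
    ordered = sorted(entries, key=lambda e: e[0] & 0x3F)
    raw = b"".join(lfn_units(e) for e in ordered)
    text = raw.decode("utf-16-le")
    # The name ends at the first NUL; the rest is 0xFFFF padding.
    return text.split("\x00", 1)[0]


def short_name_checksum(name11: bytes) -> int:
    """Rotate-right-and-add checksum over the 11-byte 8.3 name."""
    total = 0
    for byte in name11:
        total = (((total & 1) << 7) + (total >> 1) + byte) & 0xFF
    return total


print(decode_lfn([LFN_PART2, LFN_PART1]))    # Alien_ Isolation™
print(hex(short_name_checksum(SHORT_NAME)))  # 0x72, byte 13 of both LFN entries
```

The recomputed checksum matches the 0x72 stored in both LFN entries, so the long name and "ALIEN_~1" do describe the same directory.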
It looks like you may be able to use mount_msdosfs -o shortnames (-s) or -o nowin95 (-9) as a workaround to access your file, for now.

$ dd if=/dev/zero of=./testfs bs=1m count=64
$ sudo mdconfig -a -t vnode -f testfs
md1
$ sudo newfs_msdos -c1 -F32 /dev/md1
/dev/md1: 129022 sectors in 129022 FAT32 clusters (512 bytes/cluster)
BytesPerSec=512 SecPerClust=1 ResSectors=32 FATs=2 Media=0xf0 SecPerTrack=63 Heads=16 HiddenSecs=0 HugeSectors=131072 FATsecs=1008 RootCluster=2 FSInfo=1 Backup=2
$ mkdir testdir
$ sudo mount_msdosfs /dev/md1 ./testdir
$ touch "testdir/test™"
touch: testdir/test™: Invalid argument

$ sudo dtrace -n "fbt:::return /arg1==EINVAL/ { stack(); }" -c "touch 'testdir/test™'"
dtrace: description 'fbt:::return ' matched 30012 probes
touch: 'testdir/test™': No such file or directory <<< different error? Only happens under dtrace
dtrace: pid 12937 exited with status 1
CPU ID FUNCTION:NAME
 3 56556 _vhold:return
 kernel`cache_lookup+0xba7
 kernel`vfs_cache_lookup+0xac
 kernel`VOP_LOOKUP_APV+0x87
 kernel`lookup+0x711
 kernel`namei+0x59d
 kernel`vn_open_cred+0x21c
 kernel`kern_openat+0x25f
 kernel`amd64_syscall+0x51e
 kernel`0xffffffff80fc867b

$ sudo dtrace -n "fbt:::return /arg1==EINVAL/ { @[stack()] = count(); }" -c "touch 'testdir/234test™'"
dtrace: description 'fbt:::return ' matched 30012 probes
touch: 'testdir/234test™': No such file or directory
dtrace: pid 12964 exited with status 1

 kernel`cache_lookup+0xba7
 kernel`vfs_cache_lookup+0xac
 kernel`VOP_LOOKUP_APV+0x87
 kernel`lookup+0x711
 kernel`namei+0x59d
 kernel`vn_open_cred+0x21c
 kernel`kern_openat+0x25f
 kernel`amd64_syscall+0x51e
 kernel`0xffffffff80fc867b
 1

 kernel`vn_open_cred+0x10f
 kernel`kern_openat+0x25f
 kernel`amd64_syscall+0x51e
 kernel`0xffffffff80fc867b
 1
... (irrelevant frames elided)

Ok, this also seems to work:

$ sudo mount_msdosfs -L en_US.UTF-8 /dev/md1 ./testdir
$ touch 'testdir/234test™'
$ ls testdir
234test™

Does mounting your USB stick with -L <lang>.UTF-8 work?
Ok, I unmounted and remounted without -L en_US.UTF-8, and reproduced your issue:

$ ls ./testdir
ls: 234test?: Invalid argument

So -L <foo>.UTF-8 should work around the issue. Why a UTF-8 encoding isn't the default, I don't know.
I think I see why ls(1)/fts(3) shows "Invalid". The directory entries are readable:

$ echo ./testdir/*
./testdir/123t�st ./testdir/abc?
$ echo $?
0

However, that '?' is an actual question mark symbol, because ™ cannot be represented in ISO-8859-1, only in Win-1252 and Unicode. So you cannot access that file by its directory entry:

$ ls ./testdir/abc?
ls: ./testdir/abc?: Invalid argument
$ truss stat ./testdir/abc?
...
lstat("./testdir/abc?",0x7fffffffe298) ERR#22 'Invalid argument'

I think that EINVAL return is bogus. lstat() misses should return ENOENT. But that doesn't help you very much.

Maybe VOP_READDIR should prefer 8.3 names if LFN names do not convert into cs_local. Although, lookup of 8.3 names doesn't work. So maybe not.

It seems cs_local should default to UTF-8 or the user's locale, not ISO-8859-1. Or if it must be an 8-bit character set, Win-1252 may be a better choice.
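The charset point can be illustrated outside the kernel. The sketch below uses Python's built-in codecs as a stand-in for the conversion msdosfs does into cs_local (an assumption made for illustration; the kernel uses its own conversion tables), and `can_encode` is a hypothetical helper name.

```python
# Sketch only: Python codecs stand in for the kernel's charset conversion.
NAME = "abc\u2122"  # "abc" followed by TRADE MARK SIGN


def can_encode(name: str, charset: str) -> bool:
    """Return True if every character of name exists in charset."""
    try:
        name.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


for charset in ("iso-8859-1", "cp1252", "utf-8"):
    print(charset, can_encode(NAME, charset))

# A lossy conversion substitutes a literal question mark, which then no
# longer names any file that exists on disk.
lossy = NAME.encode("iso-8859-1", errors="replace")
print(lossy)  # b'abc?'
```

Only the ISO-8859-1 conversion fails, which is consistent with the '?' seen in the readdir output when no -L option is given.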
Thanks for looking into this and for your findings thus far, Conrad.

Would you still like an image of the USB stick, or were the excerpts in my original post sufficient?

The original USB stick is several hundred MB in size, but if desired I can produce a much smaller image for repro and debug, containing only a single small file (a single screenshot instead of multiple MP4 videos) written to an otherwise empty USB stick, which would have the same directory structure and file names.
(In reply to Erik Nordstrøm from comment #7)

Hi Erik,

No, we don't need the USB stick image any more. I can reproduce it with a small in-memory filesystem by mounting with and without "-L en_US.UTF-8".

Did you see comment #5? Can you try mounting with "mount_msdosfs -L no_NO.UTF-8" or similar and see if that allows you to access your file?

Thanks.
(In reply to Conrad Meyer from comment #8)

With

# mount_msdosfs -L en_US.UTF-8 /dev/da1s1 /mnt/

I am indeed able to list the files correctly.

$ ls -laR /mnt/PS4/SHARE/Screenshots/
total 192
drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 ./
drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 ../
drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 Alien_ Isolation™/

/mnt/PS4/SHARE/Screenshots/Alien_ Isolation™:
total 640
drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 ./
drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 ../
-rwxrwxrwx 1 root wheel 255197 Mar 5 11:15 Alien_ Isolation™_20170305101454.jpg*
Furthermore, I share your question about why it doesn't respect the locale settings.

Building on what you said in that and other comments, I think it might be reasonable to try, in order:

1. User defined locale as per the LANG and/or relevant LC_* env variables.
2. en_US.UTF-8
3. win-1252
4. Short names

How one would determine whether the right or wrong choice had been made, I don't know.
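To make the proposed order concrete, here is a rough Python sketch of such a fallback chain. Everything in it is hypothetical: `choose_name` is an invented helper, the charset names are Python codec names rather than locale names, and nothing like this exists in msdosfs as far as this thread shows. Where such a decision would actually be made is left open.

```python
# Sketch only: a hypothetical fallback chain, not existing msdosfs behaviour.
def choose_name(long_name: str, short_name: str, charsets):
    """Return (label, bytes) for the first charset that can hold long_name.

    Falls back to the 8.3 short name if no charset in the list works.
    """
    for charset in charsets:
        try:
            return charset, long_name.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the name, try the next
    return "8.3", short_name.encode("ascii")


LONG = "Alien_ Isolation\u2122"
SHORT = "ALIEN_~1"

# A locale whose charset is UTF-8 succeeds on the first try.
print(choose_name(LONG, SHORT, ["utf-8", "cp1252"]))
# An ISO-8859-1 locale falls through to Windows-1252.
print(choose_name(LONG, SHORT, ["iso-8859-1", "cp1252"]))
# With no usable charset, the short name is the last resort.
print(choose_name(LONG, SHORT, ["iso-8859-1"]))
```

Note that this only checks whether the name can be encoded at all; it does not answer the question of whether the chosen charset is the one the user's terminal expects.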
There's no way for the kernel to know the locale. That's purely a userland construct. msdosfs does have a notion of how to decode things, but it has to be done at mount time. And there's likely a bug or two in the translations that's causing the issues that you're seeing. These bugs are likely in the msdosfs code. I suspect that Conrad's notion of switching to UTF-8 internally is the right way to go, but I worry about compatibility. We should ask the folks that set it to ISO-8859-1 why they made that specific choice, and whether we can shift the interpretation over to UTF-8. E.g., was ISO-8859-1 just a placeholder that then got perverted into a format that Windows never produces, or was there a deeper reason?
(In reply to Warner Losh from comment #11)

> We should ask the folks that set it to ISO-8859-1 why that specific choice, and if we can shift the interpretation over to UTF-8.

I agree. I don't know who to ask and probably couldn't contribute much to the conversation anyway. Could either of you -- Warner or Conrad -- ask them?
I don't know how to find these people either :-(.
ache@ is the only one I can think of.
And if we can't contact him, go ahead with the change.