Summary: | FAT32 formatted USB stick with files written by PS4 - Invalid argument, unable to list directory contents | ||
---|---|---|---|
Product: | Base System | Reporter: | Erik Nordstrøm <erik> |
Component: | bin | Assignee: | freebsd-fs (Nobody) <fs> |
Status: | New --- | ||
Severity: | Affects Only Me | CC: | ache, cem, imp |
Priority: | --- | ||
Version: | 11.0-RELEASE | ||
Hardware: | amd64 | ||
OS: | Any |
Description
Erik Nordstrøm
2017-03-01 02:45:46 UTC
Sleuthkit confirms that there are trademark symbols in the file names, as I suspected.

    $ fls -m "E:/" -r /dev/da1s1
    0|E:/PS4|3|d/drwxrwxrwx|0|0|32768|1488322800|1488334258|0|1488334258
    0|E:/PS4/SHARE|1029|d/drwxrwxrwx|0|0|32768|1488322800|1488334444|0|1488334258
    0|E:/PS4/SHARE/Video Clips|2054|d/drwxrwxrwx|0|0|32768|1488322800|1488334258|0|1488334258
    0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™|3079|d/drwxrwxrwx|0|0|32768|1488322800|1488334444|0|1488334258
    0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™/Alien_ Isolation™_20170301022731.mp4|4104|r/rrwxrwxrwx|0|0|66882519|1488322800|1488334276|0|1488334258
    0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™/Alien_ Isolation™_20170301022604.mp4|4108|r/rrwxrwxrwx|0|0|401620912|1488322800|1488334394|0|1488334277
    0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™/Alien_ Isolation™_20170301021725.mp4|4112|r/rrwxrwxrwx|0|0|121384468|1488322800|1488334428|0|1488334395
    0|E:/PS4/SHARE/Video Clips/Alien_ Isolation™/Alien_ Isolation™_20170301021448.mp4|4116|r/rrwxrwxrwx|0|0|52162445|1488322800|1488334444|0|1488334429
    0|E:/PS4/SHARE/tmp_cp3.dat (deleted)|2056|r/rrwxrwxrwx|0|0|52162445|1488322800|1488334444|0|1488334429
    0|E:/$MBR|484143635|v/v---------|0|0|512|0|0|0|0
    0|E:/$FAT1|484143636|v/v---------|0|0|1891328|0|0|0|0
    0|E:/$FAT2|484143637|v/v---------|0|0|1891328|0|0|0|0
    0|E:/$OrphanFiles|484143638|d/d---------|0|0|0|0|0|0|0

How large is the USB stick? Is there any chance you could share an image of the USB stick to aid debugging / reproduction? Thanks.

Well, ok, these are the relevant dirents:

    00000040 42 69 00 6f 00 6e 00 22 21 00 00 0f 00 72 ff ff |Bi.o.n."!....r..|
    00000050 ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff |................|
    00000060 01 41 00 6c 00 69 00 65 00 6e 00 0f 00 72 5f 00 |.A.l.i.e.n...r_.|
    00000070 20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00 | .I.s.o.l...a.t.|

It's a two-part LFN, in reverse order. The bytes from 0x41-0x4a are "ion\U2122\0". \U2122 is "Unicode Character 'TRADE MARK SIGN' (U+2122)."
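
For readers who want to check the decoding by hand, here is a minimal Python sketch that reassembles the long name from the two LFN dirents quoted above. The byte strings are copied from the hexdump; the field offsets (name fragments at bytes 1-10, 14-25 and 28-31, attribute byte at 11, checksum at 13) follow the usual FAT long-directory-entry layout. The function names are made up for illustration and have nothing to do with the msdosfs kernel code.

```python
# Raw 32-byte long-file-name (LFN) entries, copied from the hexdump in this
# report (offsets 0x40 and 0x60), in on-disk order (last fragment first).
LFN_ENTRIES = [
    bytes.fromhex(
        "42 69 00 6f 00 6e 00 22 21 00 00 0f 00 72 ff ff "
        "ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff"
    ),
    bytes.fromhex(
        "01 41 00 6c 00 69 00 65 00 6e 00 0f 00 72 5f 00 "
        "20 00 49 00 73 00 6f 00 6c 00 00 00 61 00 74 00"
    ),
]

SEQ_MASK = 0x3F  # low bits of byte 0 hold the sequence number; 0x40 marks the last entry


def lfn_fragment(entry: bytes) -> str:
    """Return the (up to 13) UTF-16LE characters stored in one LFN entry."""
    raw = entry[1:11] + entry[14:26] + entry[28:32]
    text = raw.decode("utf-16-le")
    # The name is NUL-terminated; anything after the NUL is 0xFFFF padding.
    return text.split("\x00", 1)[0]


def long_name(entries) -> str:
    """Order the entries by sequence number and join their fragments."""
    ordered = sorted(entries, key=lambda e: e[0] & SEQ_MASK)
    return "".join(lfn_fragment(e) for e in ordered)


if __name__ == "__main__":
    name = long_name(LFN_ENTRIES)
    print(name)
    # List the code points outside ASCII, to make the trademark sign visible.
    print([hex(ord(c)) for c in name if ord(c) > 0x7F])
```

Both entries carry attribute byte 0x0f and the same checksum byte 0x72, which is what ties them to the single short-name dirent that follows them on disk.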
The 8.3 encoding is in a third dirent:

    00000080 41 4c 49 45 4e 5f 7e 31 20 20 20 10 00 31 5d 19 |ALIEN_~1   ..1].|
    00000090 61 4a 61 4a 00 00 c2 19 61 4a 06 00 00 00 00 00 |aJaJ....aJ......|

"ALIEN_~1.   ".

It looks like you may be able to use mount_msdosfs -o shortnames (-s) or -o nowin95 (-9) as a workaround to access your file, for now.

    $ dd if=/dev/zero of=./testfs bs=1m count=64
    $ sudo mdconfig -a -t vnode -f testfs
    md1
    $ sudo newfs_msdos -c1 -F32 /dev/md1
    /dev/md1: 129022 sectors in 129022 FAT32 clusters (512 bytes/cluster)
    BytesPerSec=512 SecPerClust=1 ResSectors=32 FATs=2 Media=0xf0 SecPerTrack=63 Heads=16 HiddenSecs=0 HugeSectors=131072 FATsecs=1008 RootCluster=2 FSInfo=1 Backup=2
    $ mkdir testdir
    $ sudo mount_msdosfs /dev/md1 ./testdir
    $ touch "testdir/test™"
    touch: testdir/test™: Invalid argument

    $ sudo dtrace -n "fbt:::return /arg1==EINVAL/ { stack(); }" -c "touch 'testdir/test™'"
    dtrace: description 'fbt:::return ' matched 30012 probes
    touch: 'testdir/test™': No such file or directory    <<< different error? Only happens under dtrace
    dtrace: pid 12937 exited with status 1
    CPU ID FUNCTION:NAME
    3 56556 _vhold:return
        kernel`cache_lookup+0xba7
        kernel`vfs_cache_lookup+0xac
        kernel`VOP_LOOKUP_APV+0x87
        kernel`lookup+0x711
        kernel`namei+0x59d
        kernel`vn_open_cred+0x21c
        kernel`kern_openat+0x25f
        kernel`amd64_syscall+0x51e
        kernel`0xffffffff80fc867b

    $ sudo dtrace -n "fbt:::return /arg1==EINVAL/ { @[stack()] = count(); }" -c "touch 'testdir/234test™'"
    dtrace: description 'fbt:::return ' matched 30012 probes
    touch: 'testdir/234test™': No such file or directory
    dtrace: pid 12964 exited with status 1

        kernel`cache_lookup+0xba7
        kernel`vfs_cache_lookup+0xac
        kernel`VOP_LOOKUP_APV+0x87
        kernel`lookup+0x711
        kernel`namei+0x59d
        kernel`vn_open_cred+0x21c
        kernel`kern_openat+0x25f
        kernel`amd64_syscall+0x51e
        kernel`0xffffffff80fc867b
        1

        kernel`vn_open_cred+0x10f
        kernel`kern_openat+0x25f
        kernel`amd64_syscall+0x51e
        kernel`0xffffffff80fc867b
        1
    ...
(irrelevant frames elided)

Ok, this also seems to work:

    $ sudo mount_msdosfs -L en_US.UTF-8 /dev/md1 ./testdir
    $ touch 'testdir/234test™'
    $ ls testdir
    234test™

Does mounting your USB stick with -L <lang>.UTF-8 work?

Ok, I unmounted and remounted without -L en_US.UTF-8, and reproduced your issue:

    $ ls ./testdir
    ls: 234test?: Invalid argument

So -L <foo>.UTF-8 should work around the issue. Why a UTF-8 encoding isn't the default, I don't know.

I think I see why ls(1)/fts(3) shows "Invalid". The directory entries are readable:

    $ echo ./testdir/*
    ./testdir/123t�st ./testdir/abc?
    $ echo $?
    0

However, that '?' is an actual question mark symbol, because ™ cannot be represented in ISO-8859-1, only in Win-1252 and Unicode. So you cannot access that file by its directory entry:

    $ ls ./testdir/abc?
    ls: ./testdir/abc?: Invalid argument
    $ truss stat ./testdir/abc?
    ...
    lstat("./testdir/abc?",0x7fffffffe298) ERR#22 'Invalid argument'

I think that EINVAL return is bogus. lstat() misses should return ENOENT. But that doesn't help you very much.

Maybe VOP_READDIR should prefer 8.3 names if LFN names do not convert into cs_local. Although, lookup of 8.3 names doesn't work. So maybe not.

It seems cs_local should default to UTF-8 or the user's locale, not ISO-8859-1. Or if it must be an 8-bit character set, Win-1252 may be a better choice.

Thanks for looking into this and for your findings thus far, Conrad. Would you still like an image of the USB stick, or were the excerpts in my original post sufficient? The original USB stick is several hundred MB in size, but if desired I can produce a much smaller image for repro and debug. It would contain only a single small file (a single screenshot instead of multiple MP4 videos) written to an otherwise empty USB stick, with the same directory structure and file names.

(In reply to Erik Nordstrøm from comment #7)
Hi Erik,
No, we don't need the USB stick image any more.
I can reproduce it with a small in-memory filesystem by mounting with and without "-L en_US.UTF-8". Did you see comment #5? Can you try mounting with "mount_msdosfs -L no_NO.UTF-8" or similar and see if that allows you to access your file? Thanks.

(In reply to Conrad Meyer from comment #8)
With

    # mount_msdosfs -L en_US.UTF-8 /dev/da1s1 /mnt/

I am indeed able to list the files correctly.

    $ ls -laR /mnt/PS4/SHARE/Screenshots/
    total 192
    drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 ./
    drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 ../
    drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 Alien_ Isolation™/

    /mnt/PS4/SHARE/Screenshots/Alien_ Isolation™:
    total 640
    drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 ./
    drwxrwxrwx 1 root wheel 32768 Mar 5 11:15 ../
    -rwxrwxrwx 1 root wheel 255197 Mar 5 11:15 Alien_ Isolation™_20170305101454.jpg*

Furthermore, I agree it is odd that it doesn't respect locale settings. Building on what you said in that and other comments, I think it might be reasonable to try, in order:

1. User-defined locale as per the LANG and/or relevant LC_* env variables.
2. en_US.UTF-8
3. win-1252
4. Short names

How one would determine whether the right or wrong choice had been made, I don't know.

There's no way for the kernel to know the locale. That's purely a userland construct. msdosfs does have a notion of how to decode things, but it has to be done at mount time. And there's likely a bug or two in the translations that's causing the issues that you're seeing. These bugs are likely in the MSDOSFS code. I suspect that Conrad's notion of switching to UTF-8 internally is the right way to go, but worry about compatibility. We should ask the folks that set it to ISO-8859-1 why that specific choice, and if we can shift the interpretation over to UTF-8. E.g., was ISO-8859-1 just a placeholder that then got perverted into a format that Windows never produces, or was there a deeper reason...
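
The character-set point made earlier in this thread (™ has no ISO-8859-1 encoding, but does exist in Win-1252 and Unicode) can be checked from userland. The following is only an illustrative Python sketch: it uses Python's own codecs (`iso-8859-1`, `cp1252` as a stand-in for Win-1252, and `utf-8`) and does not exercise the kernel's iconv tables or msdosfs itself. The helper name `can_encode` is hypothetical.

```python
# File name from this report: ASCII plus U+2122 TRADE MARK SIGN.
NAME = "Alien_ Isolation\u2122"

# Candidate local character sets discussed in the thread.
CANDIDATES = ["iso-8859-1", "cp1252", "utf-8"]


def can_encode(name: str, codec: str) -> bool:
    """Return True if every character of `name` exists in `codec`."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for codec in CANDIDATES:
        print(codec, can_encode(NAME, codec))

    # A lossy conversion substitutes '?' for the unrepresentable character,
    # which resembles the "234test?" / "abc?" names shown by ls and echo above.
    print(NAME.encode("iso-8859-1", errors="replace"))
```

The substituted name no longer corresponds to any on-disk entry, which is consistent with the failed lookups reported above.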
(In reply to Warner Losh from comment #11)
> We should ask the folks that set it to ISO-8859-1 why that specific choice, and if we can shift the interpretation over to UTF-8.

I agree. I don't know who to ask, and I probably couldn't contribute much to the conversation anyway. Could either of you, Warner or Conrad, ask them?

I don't know how to find these people either :-(.

ache@ is the only one I can think of.

And if we can't contact him, go ahead with the change.