Bug 191540

Summary: the FAT32 implementation bugs out on Unicode file names
Product: Base System
Reporter: dt71
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED
Severity: Affects Only Me
CC: jamie, kevlo
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
an archive containing files with fancy names (flags: none)

Description dt71 2014-07-01 22:05:30 UTC
Attempting to read or write files with non-ASCII Unicode names fails on FAT32 partitions; so do other operations on them, such as getting the inode or shell autocompletion. Such a name can exist, for example, because the partition was written to from Windows.
Comment 1 dt71 2014-07-01 22:08:05 UTC
Created attachment 144326
an archive containing files with fancy names

Contains 2 empty files:
1’.txt
2–.txt
Comment 2 dt71 2014-07-01 22:18:19 UTC
To reproduce:

Create and mount a FAT32 partition:
# dd if=/dev/zero of=space bs=1m count=4
# mdconfig -a -u 7 -t vnode -f space
# newfs_msdos -F 32 /dev/md7
# mkdir fat
# mount -t msdosfs /dev/md7 fat

Attempt to create files with fancy names on the partition:
# cd fat
# tar -vxf ../files.zip
The output is:
x 1’.txt: Can't create '1’.txt'
x 2–.txt: Can't create '2–.txt'

Other kinds of file access can also be attempted, e.g.:
(The "–" is a Unicode "en dash".)
# stat –
The output is:
stat: –: stat: Invalid argument

Clean up (optional):
# cd ..
# umount fat
# rmdir fat
# mdconfig -d -u 7
# rm space
Comment 3 Jamie Landeg-Jones 2014-07-02 04:50:22 UTC
With Unix filesystems, any 'binary' characters in a filename are recorded 'as is'.

So, when you do an 'ls', the filename data is preserved: if it was originally a UTF-8 encoded name, it will still be a UTF-8 name, and will be displayed correctly on a UTF-8 terminal.

msdos file systems don't work this way; they translate filenames before storing them. If the msdos filesystem doesn't know what the original character set is, the name won't be encoded correctly, and subsequently won't be displayed correctly.

You therefore need to tell it on mount what character set you are using, with the -L option:

-L locale
             Specify locale name used for file name conversions for DOS and
             Win'95 names.  By default ISO 8859-1 assumed as local character
             set.

Your test filenames are in UTF-8 format, so if you repeat your exercise, but instead mount the partition with:

mount_msdosfs -L en_GB.UTF-8 /dev/md7 fat

then everything will work as expected.
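
To see why the character set matters, it can help to look at the raw bytes of the two test names. The following is only a minimal sketch: hex_bytes is a hypothetical helper (not part of any tool mentioned in this report), and the names are built from octal escapes of their UTF-8 bytes so that the script itself stays plain ASCII.

```shell
#!/bin/sh
# hex_bytes (hypothetical helper): print the bytes of "$1" as hex pairs.
# The unquoted $(...) is intentional: word splitting collapses the
# variable whitespace that od emits into single spaces.
hex_bytes() {
    echo $(printf '%s' "$1" | od -An -tx1)
}

# UTF-8 encodings, written as octal escapes:
#   U+2013 EN DASH                    -> e2 80 93
#   U+2019 RIGHT SINGLE QUOTATION MARK -> e2 80 99
en_dash=$(printf '\342\200\223')
quote=$(printf '\342\200\231')

hex_bytes "2${en_dash}.txt"   # prints: 32 e2 80 93 2e 74 78 74
hex_bytes "1${quote}.txt"     # prints: 31 e2 80 99 2e 74 78 74
```

Each "fancy" character is three bytes in UTF-8. Read one byte at a time as ISO 8859-1 (the default quoted above), 0xe2 is "â" and 0x80, 0x93 and 0x99 fall in the C1 control range, so the filesystem is not handed the single character the user meant. Whether that is exactly what makes the kernel return "Invalid argument" is an assumption here, not something this report confirms.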





.... You are attempting to