Bug 191540 - the FAT32 implementation bugs out on Unicode file names
Summary: the FAT32 implementation bugs out on Unicode file names
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-01 22:05 UTC by dt71
Modified: 2014-07-04 07:39 UTC
CC List: 2 users

See Also:


Attachments
an archive containing files with fancy names (278 bytes, application/zip)
2014-07-01 22:08 UTC, dt71

Description dt71 2014-07-01 22:05:30 UTC
Attempting to read or write files with non-ASCII Unicode names on FAT32 partitions fails, as do other operations on them (getting their inode, shell autocompletion, and so on). Such names can exist, for example, when the files were created under Windows.
Comment 1 dt71 2014-07-01 22:08:05 UTC
Created attachment 144326
an archive containing files with fancy names

Contains 2 empty files:
1’.txt
2–.txt
Comment 2 dt71 2014-07-01 22:18:19 UTC
To reproduce:

Create and mount a FAT32 partition:
# dd if=/dev/zero of=space bs=1m count=4
# mdconfig -a -u 7 -t vnode -f space
# newfs_msdos -F 32 /dev/md7
# mkdir fat
# mount -t msdosfs /dev/md7 fat

Attempt to create files with fancy names on the partition:
# cd fat
# tar -vxf ../files.zip
The output is:
x 1’.txt: Can't create '1’.txt'
x 2–.txt: Can't create '2–.txt'

Other kinds of file access can also be attempted, e.g.:
(The "–" is a Unicode "en dash".)
# stat –
The output is:
stat: –: stat: Invalid argument

Clean up (optional):
# cd ..
# umount fat
# rmdir fat
# mdconfig -d -u 7
# rm space
Comment 3 Jamie Landeg-Jones 2014-07-02 04:50:22 UTC
On Unix filesystems, any 'binary' characters in a file name are recorded as-is.

So when you run 'ls', the file name data is preserved: if it was originally a UTF-8 encoded name, it will still be a UTF-8 name, and it will be displayed correctly on a UTF-8 terminal.

MS-DOS filesystems don't work this way; they translate file names before storing them. If the msdosfs filesystem doesn't know what the original character set is, the names won't be converted correctly, and consequently won't be handled or displayed correctly.

You therefore need to tell msdosfs, at mount time, which character set you are using, with the -L option:

-L locale
             Specify locale name used for file name conversions for DOS and
             Win'95 names.  By default ISO 8859-1 assumed as local character
             set.

Your test file names are in UTF-8 format, so if you repeat your exercise but mount the partition with:

mount_msdosfs -L en_GB.UTF-8 /dev/md7 fat

then everything will work as expected.
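To see why the default fails for these particular names, here is a minimal sketch (not part of msdosfs) that checks whether a UTF-8 name can be represented in ISO 8859-1, the default local character set quoted from the man page above. It assumes an `iconv` command-line utility is installed and exits non-zero when it meets a character it cannot convert (GNU iconv behaves this way); `fits_latin1` is a hypothetical helper name, and the check only approximates the kernel's conversion rather than reproducing it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the UTF-8 string in $1 can be converted
# to ISO 8859-1 (assumed stand-in for msdosfs's default local charset).
# Assumes iconv exits non-zero on unconvertible input.
fits_latin1() {
    printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1
}

# The two attachment names use U+2019 (right single quotation mark) and
# U+2013 (en dash), neither of which exists in ISO 8859-1.
for name in 'plain.txt' 'café.txt' '1’.txt' '2–.txt'; do
    if fits_latin1 "$name"; then
        echo "ok:   $name"
    else
        echo "FAIL: $name"
    fi
done
```

ASCII names and Latin-1 names such as café.txt pass, while both names from the attachment fail, which matches the "Can't create" errors in the reproduction when no -L option is given.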





.... You are attempting to