|Summary:||tar filenames encoding problem|
|Component:||bin||Assignee:||freebsd-bugs mailing list <bugs>|
|Severity:||Affects Some People|
Description aler 2019-06-14 21:49:58 UTC
How to reproduce:

```
#!/bin/sh
rm -Rf d e
mkdir d
touch d/`printf '\306'`
mkdir e
tar -c -f - d | tar -C e -x -f -
```

Doing this with an empty $LANG leads to

```
# sh test.sh
: Can't translate pathname 'd/Ж' to UTF-8
```

However, directory `d` is properly copied into `e`. The error message disappears with `LANG=en_US.ISO8859-1`.

I'm not exactly sure what this error message means, but it is very unclear and may be interpreted as "the file was not archived". I also don't know why tar tries to do charset translation at all. It should be binary-safe with respect to filenames by default.
Comment 1 aler 2019-06-14 21:55:51 UTC
This started happening in FreeBSD 10. Before that, tar never attempted any charset translation.
Comment 2 Conrad Meyer 2019-06-14 22:02:30 UTC
It's a non-fatal warning that changes the exit status to non-zero but, as you note, does not prevent a correct copy. libarchive changes the copy mode from encoding-aware (UTF-8 default, I guess) to binary mode when it prints that text.
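To look at the two effects separately (non-zero exit status versus an intact copy), a minimal sketch based on the reproduction script might look like the following. It assumes that `LC_ALL=C` behaves like the empty `$LANG` from the report; the `create.status` file and the variable names are illustrative only. A pipeline reports only the exit status of its last command, so the status of the archiving tar is saved to a file.

```shell
#!/bin/sh
# Sketch: tell "tar warned" apart from "tar failed to copy".
work=$(mktemp -d) && cd "$work" || exit 1
mkdir d e
touch "d/$(printf '\306')"   # same non-UTF-8 byte as in the report

# Assumption: LC_ALL=C stands in for the reporter's empty $LANG.
# The archiving tar's exit status is written to create.status because
# $? after a pipeline only reflects the last command (the extracting tar).
{ LC_ALL=C tar -c -f - d; echo $? > create.status; } | LC_ALL=C tar -C e -x -f -
extract_status=$?
create_status=$(cat create.status)

# Compare the source tree with the extracted tree.
if diff -r d e/d >/dev/null; then copy_ok=yes; else copy_ok=no; fi

echo "create exit status:  $create_status"
echo "extract exit status: $extract_status"
echo "copy identical:      $copy_ok"
```

If the description above holds, one of the two statuses would be non-zero while the copy is still reported as identical. The exact values depend on the tar implementation and version in use.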
Comment 3 Conrad Meyer 2019-06-14 22:07:14 UTC
This is specified by POSIX's pax: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html

"If there is a hdrcharset extended header in effect for a file, the value field for any gname, linkpath, path, and uname extended header records shall be encoded using the character set specified by the hdrcharset extended header record; otherwise, the value field shall be encoded using UTF-8. The value field for all other keywords specified by POSIX.1-2017 shall be encoded using UTF-8."
Comment 4 Conrad Meyer 2019-06-14 22:08:30 UTC
(Prior to FreeBSD 10, the default tar format was likely the older "ustar" instead of "pax".)
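If the format change is indeed the cause, explicitly requesting the older format should presumably avoid the translation step. The sketch below is untested against FreeBSD's tar and rests on two assumptions: that the installed tar accepts `--format ustar`, and that `LC_ALL=C` stands in for an empty `$LANG`. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch: repeat the reproduction, but ask for the ustar format,
# which has no pax extended header records (and so no UTF-8 path record).
work=$(mktemp -d) && cd "$work" || exit 1
mkdir d e
touch "d/$(printf '\306')"   # same non-UTF-8 byte as in the report

# Assumption: this tar understands --format ustar.
LC_ALL=C tar -c --format ustar -f - d | LC_ALL=C tar -C e -x -f -

# Check that the file name survived the round trip byte for byte.
if diff -r d e/d >/dev/null; then ustar_copy_ok=yes; else ustar_copy_ok=no; fi
echo "ustar copy identical: $ustar_copy_ok"
```

Note that ustar has its own limits (for example on path length), so this is a way to test the hypothesis rather than a general recommendation.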
Comment 5 aler 2019-06-15 11:13:57 UTC
The filesystem has no internal charset, so it is weird to do charset translation from no charset (= BINARY) to any explicit charset. It is also not good that the resulting archive somehow depends on the $LANG environment variable, which is intended for run-time localization and not for abstract data processing. This behavior is also undocumented in the tar manpage.