Bug 246670 - bsdtar: Fails to extract (UTF-8) under QEMU_EMULATING
Summary: bsdtar: Fails to extract (UTF-8) under QEMU_EMULATING
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: Unspecified
Hardware: arm64 Any
Importance: --- Affects Some People
Assignee: Port Management Team
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-23 01:26 UTC by Danilo G. Baio
Modified: 2024-11-06 14:10 UTC
CC List: 10 users

See Also:


Attachments
set C.UTF-8 as ports tree locale (765 bytes, patch)
2022-02-13 12:46 UTC, Charlie Li
vishwin: maintainer-approval? (portmgr)

Description Danilo G. Baio freebsd_committer freebsd_triage 2020-05-23 01:26:20 UTC
root@12-aarch64-default:~ # locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

root@12-aarch64-default:/tmp # tar -zxf /portdistfiles/python/Sphinx-3.0.3.tar.gz
tar: Pathname can't be converted from UTF-8 to current locale.
tar: Error exit delayed from previous errors.
root@12-aarch64-default:/tmp # echo $?
1

root@12-aarch64-default:/tmp # setenv LANG en_US.UTF-8
root@12-aarch64-default:/tmp # setenv LC_ALL en_US.UTF-8
root@12-aarch64-default:/tmp # setenv MM_CHARSET UTF-8
root@12-aarch64-default:/tmp # locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=en_US.UTF-8

root@12-aarch64-default:/tmp # tar -zxf /portdistfiles/python/Sphinx-3.0.3.tar.gz
root@12-aarch64-default:/tmp # echo $?
0


More details on bug #246618
Reported by jbeich@
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2020-05-23 01:40:05 UTC
What's the bug here?  In the C locale, some UTF-8 characters are not representable.  If you're actually using UTF-8, you should set your locale correctly.  In 13-CURRENT this is somewhat improved by making the default locale "C.UTF-8" instead of "C[.USASCII]".
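
To make the representability point concrete, here is a minimal Python sketch. It uses Python's "ascii" codec as a stand-in for the character set of the plain C locale; that is an assumption for illustration, not a description of what bsdtar or libarchive does internally. The helper name representable_in_ascii and the sample pathnames are hypothetical.

```python
def representable_in_ascii(pathname: str) -> bool:
    """Return True if every character of pathname fits in 7-bit ASCII.

    The "ascii" codec is only an approximation of the C locale's
    character set; it is used here purely for illustration.
    """
    try:
        pathname.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample names: "\u00e4" is a precomposed a-umlaut.
for sample in ("docs/index.rst", "docs/im\u00e4ge.png"):
    print(ascii(sample), representable_in_ascii(sample))
```

A name that fails this check is the kind of pathname that can trigger the "can't be converted from UTF-8 to current locale" error when the effective locale is not a UTF-8 one.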
Comment 2 Danilo G. Baio freebsd_committer freebsd_triage 2020-05-23 01:55:01 UTC
(In reply to Conrad Meyer from comment #1)

It only happens under QEMU_EMULATING, on a default system.
Comment 3 Conrad Meyer freebsd_committer freebsd_triage 2020-05-23 01:57:02 UTC
Seems like a problem with QEMU_EMULATING.
Comment 4 Danilo G. Baio freebsd_committer freebsd_triage 2020-05-23 02:01:50 UTC
(In reply to Conrad Meyer from comment #3)

Right. Should I move it to ports (emulators/qemu-user-static) ?
Comment 5 Jan Beich freebsd_committer freebsd_triage 2020-05-23 02:44:42 UTC
(In reply to Conrad Meyer from comment #1)
> In the C locale, some UTF-8 characters are not representable.

Indeed. But why does bsdtar(1) only error out when built statically? QEMU_EMULATING builds use native-xtools, which are native binaries built statically in order to speed up emulated builds.

--- tar tf (en_US.UTF-8)
+++ tar tf (C)
@@ -1258,7 +1258,7 @@
 Sphinx-3.0.3/tests/roots/test-images/subdir/svgimg.pdf
 Sphinx-3.0.3/tests/roots/test-images/subdir/svgimg.svg
 Sphinx-3.0.3/tests/roots/test-images/subdir/svgimg.xx.svg
-Sphinx-3.0.3/tests/roots/test-images/testimäge.png
+Sphinx-3.0.3/tests/roots/test-images/testim?ge.png
 Sphinx-3.0.3/tests/roots/test-index_on_title/
 Sphinx-3.0.3/tests/roots/test-index_on_title/conf.py
 Sphinx-3.0.3/tests/roots/test-index_on_title/contents.rst
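
A quick way to find such members in a distfile is to list the archive and flag names that are not pure ASCII. The following Python sketch does that with the standard tarfile module. The in-memory archive, its member names and both helper functions are hypothetical and exist only so the example is self-contained. The "?" rendering uses Python's errors="replace" and merely mimics the listing above; it is not how bsdtar produces it.

```python
import io
import tarfile


def make_sample_archive(names):
    """Build a small gzip-compressed pax tar archive in memory (illustrative)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.PAX_FORMAT) as tar:
        for name in names:
            data = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def non_ascii_members(fileobj):
    """Return (name, ascii_rendering) for members whose name is not pure ASCII."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        return [
            (name, name.encode("ascii", errors="replace").decode("ascii"))
            for name in tar.getnames()
            if not name.isascii()
        ]


# Hypothetical member names; "\u00e4" is a precomposed a-umlaut.
sample = make_sample_archive(["pkg/tests/svgimg.svg", "pkg/tests/testim\u00e4ge.png"])
for name, rendered in non_ascii_members(sample):
    print(ascii(name), "->", rendered)
```

To check a real distfile, the same non_ascii_members function could be given a file opened in binary mode instead of the in-memory sample.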
Comment 6 commit-hook freebsd_committer freebsd_triage 2020-05-30 12:27:50 UTC
A commit references this bug:

Author: dbaio
Date: Sat May 30 12:27:38 UTC 2020
New revision: 537077
URL: https://svnweb.freebsd.org/changeset/ports/537077

Log:
  textproc/py-sphinx: Fix build (extract) with static bsdtar(1)

  ===>  Extracting for py37-sphinx-3.0.3,1
  => SHA256 Checksum OK for python/Sphinx-3.0.3.tar.gz.
  tar: Pathname can't be converted from UTF-8 to current locale.
  tar: Error exit delayed from previous errors.
  *** Error code 1

  Issue found at least on arm64.aarch64 and mips.mips64 builds using
  native-x-tools/poudriere.

  As /usr/bin/tar is replaced by the binary in /nxb-bin/, pointing EXTRACT_CMD
  to /usr/bin/bsdtar instead.

  root@12-mips64-default:/tmp # file /usr/bin/tar
  /usr/bin/tar: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD),
  statically linked, for FreeBSD 12.1, FreeBSD-style, stripped

  root@12-mips64-default:/tmp # file /usr/bin/bsdtar
  /usr/bin/bsdtar: ELF 64-bit MSB executable, MIPS, MIPS-III version 1 (FreeBSD),
  dynamically linked, interpreter /libexec/ld-elf.so.1, FreeBSD-style, for
  FreeBSD 12.1, stripped

  This patch bypasses the issue here (in all scenarios we have tested), but the
  problem still exists and is being tracked in bug 246670.

  Please, see more details in bug 246618. Thanks to tijl, jbeich, kevans and
  all people who helped in testing.

  PR:		246618, 246670
  Submitted by:	tijl
  Reported by:	jbeich

Changes:
  head/textproc/py-sphinx/Makefile
Comment 7 Charlie Li freebsd_committer freebsd_triage 2021-04-29 21:51:48 UTC
This is now happening with devel/py-wheel.
Comment 8 Charlie Li freebsd_committer freebsd_triage 2021-04-29 21:58:05 UTC
I commented too soon. This happens when nearly every variable reported by locale carries the UTF-8 suffix:

root@aarch64-13-0-default:/ # locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_TIME="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

May need to play with $LC_ALL a bit, probably in poudriere itself.
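
For reference, the locale output above follows the usual POSIX precedence: LC_ALL overrides everything, then the category's own variable, then LANG. The Python sketch below models that lookup. The function name effective_category and the fallback to "C" when nothing is set are assumptions for illustration. It only shows which locale name is selected and does not explain why the static tar fails to use it.

```python
def effective_category(env, category="LC_CTYPE"):
    """Resolve a locale category: LC_ALL first, then the category's own
    variable, then LANG. Unset and empty values are skipped.

    Falling back to "C" when nothing is set is an assumption that matches
    the locale output shown in this report.
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


# Environments modelled on the transcripts in this report.
print(effective_category({"LANG": "", "LC_ALL": ""}))
print(effective_category({"LANG": "C.UTF-8", "LC_ALL": ""}))
print(effective_category({"LANG": "C.UTF-8", "LC_ALL": "en_US.UTF-8"}))
```

With LC_ALL empty, as in the output above, LC_CTYPE is taken from LANG. Setting LC_ALL would override it regardless of the other variables.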
Comment 9 Charlie Li freebsd_committer freebsd_triage 2022-02-13 12:46:59 UTC
Created attachment 231791 [details]
set C.UTF-8 as ports tree locale

This follows bug 246618, comment 10, now that FreeBSD 11 is EOL. A number of ports are affected, at least these that I use or otherwise know of:
* textproc/py-sphinx
* devel/py-wheel
* sysutils/py-ansible-core
Comment 10 Danilo G. Baio freebsd_committer freebsd_triage 2022-02-13 13:01:21 UTC
(In reply to Charlie Li from comment #9)

More examples can be found here (just check the build logs):
https://portsfallout.com/fallout?category=extract
Comment 11 Kyle Evans freebsd_committer freebsd_triage 2022-02-13 15:36:14 UTC
(In reply to Charlie Li from comment #9)

Does this actually work? IIRC the root cause of this problem was that we use statically linked tar in the jail, but now that I say that out loud it may have been trying to use iconv and failing dlopen.
Comment 12 Charlie Li freebsd_committer freebsd_triage 2022-02-13 15:41:05 UTC
(In reply to Kyle Evans from comment #11)
Yes. I was slimming down some of my port overlays, and both devel/py-wheel and sysutils/py-ansible-core, which have UTF-8 paths in their tarballs, build successfully without any further locale adjustments (i.e. EXTRACT_CMD) under native-xtools. Other ports under all of native-xtools, QEMU_EMULATING and native architecture continue unaffected.
Comment 13 Kyle Evans freebsd_committer freebsd_triage 2022-02-13 19:49:04 UTC
(In reply to Charlie Li from comment #12)

Can you open up a new issue for C.UTF-8 in the ports framework and request an exp-run, please?
Comment 14 Christian Ullrich 2023-04-19 12:37:13 UTC
This just bit me with devel/aarch64-none-elf-gcc in poudriere, 13.2-RELEASE jail with native-xtools, building on amd64 for aarch64. The EXTRACT_CMD workaround worked there, too.

tar: Pathname can't be converted from UTF-8 to current locale.
x gcc-11.3.0/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204foo.go
tar: Pathname can't be converted from UTF-8 to current locale.
x gcc-11.3.0/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204main.go
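
The \303\204 sequences in that output are octal escapes for the bytes 0xC3 0x84, which is the UTF-8 encoding of U+00C4 (Ä). A minimal Python sketch for turning such an escaped name back into text follows. The helper name decode_octal_escapes is hypothetical, and the sketch assumes only three-digit octal escapes appear, so other escapes (for example an escaped backslash) are not handled.

```python
import re

# Three-digit octal escape for a single byte (\000 through \377).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def decode_octal_escapes(escaped: str) -> str:
    """Turn a name such as r'\303\204foo.go' back into the text it encodes.

    Assumes the escaped bytes are UTF-8 and that no other escape forms occur.
    """
    raw = _OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 8)]),
        escaped.encode("ascii"),
    )
    return raw.decode("utf-8")


name = decode_octal_escapes(r"\303\204foo.go")
print(ascii(name))
```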
Comment 15 rdunkle 2024-11-06 14:10:51 UTC
I hit this problem with riscv64. These ports fail to build:
shells/bash-completion
devel/riscv64-none-elf-gcc
My workaround was to edit the Makefile and add this:
.if ${ARCH} == riscv64
EXTRACT_CMD=    ${SETENV} LC_ALL=en_US.UTF-8 /usr/bin/bsdtar
.endif