Bug 257353

Summary: lang/python38: Intermittently fails to build under QEMU: BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Product: Ports & Packages
Reporter: Robert Clausecker <fuz>
Component: Individual Port(s)
Assignee: Warner Losh <imp>
Status: Open
Severity: Affects Many People
CC: cyberbotx, danfe, greif, imp, mmpestorich, python
Priority: ---
Keywords: needs-qa
Version: Latest
Flags: bugzilla: maintainer-feedback? (python)
       koobs: maintainer-feedback? (imp)
       koobs: merge-quarterly?
Hardware: Any
OS: Any
URL: https://portsfallout.com/fallout?port=lang%2Fpython38%24

Description Robert Clausecker freebsd_committer freebsd_triage 2021-07-23 13:20:23 UTC
When building lang/python38 with QEMU, the build usually gets stuck with an error like this:

--->8--->8---
Listing '/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/xml/parsers'...
Listing '/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/xml/sax'...
Listing '/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/xmlrpc'...
Traceback (most recent call last):
  File "/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/compileall.py", line 332, in <module>
    exit_status = int(not main())
  File "/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/compileall.py", line 314, in main
    if not compile_dir(dest, maxlevels, args.ddir,
  File "/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/compileall.py", line 93, in compile_dir
    success = min(results, default=True)
  File "/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/wrkdirs/usr/ports/lang/python38/work/stage/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
--->8--->8---

Sometimes it does not fail, which leads me to believe there might be a race condition that emulation makes visible.
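To make the failure mode concrete: concurrent.futures raises BrokenProcessPool for pending futures whenever a worker process exits without returning a result, which is what compileall's process pool reports in the traceback above. The sketch below triggers this deliberately. `os._exit(1)` is only an assumed stand-in for whatever kills the worker under QEMU, and `crash_worker`/`run_pool` are hypothetical names, not part of the port or of compileall.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def crash_worker(n):
    # Hypothetical stand-in for a worker that dies under emulation:
    # os._exit() ends the child immediately, so no result is sent back.
    os._exit(1)


def run_pool():
    """Submit one task that kills its worker; return the resulting error."""
    # Use "fork" explicitly, the default start method on FreeBSD in Python 3.8.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        future = executor.submit(crash_worker, 1)
        try:
            future.result()
        except BrokenProcessPool as exc:
            return exc
    return None


if __name__ == "__main__":
    print(run_pool())
```

Run as a script, it should print the same BrokenProcessPool message that ends the traceback above.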

Many portsfallout reports demonstrate this issue:

    https://portsfallout.com/fallout?port=lang%2Fpython38%24

Unfortunately, because the bug often leaves the build stuck rather than failing outright, it is very annoying to deal with.
Please investigate, and perhaps mark the port BROKEN when QEMU_EMULATING is defined.
Comment 1 Warner Losh freebsd_committer freebsd_triage 2021-10-29 18:11:14 UTC
I'll take a look at this.
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2021-10-30 00:23:41 UTC
Thank you for your report Robert. Could you clarify:

- uname -a output
- What host and qemu architecture Python is being built under?
- Include the full build log, as an attachment, compressed if necessary
- Whether the issue is observable in any other lang/python* ports

@Warner Do you have any tips/suggestions that may assist debugging this?
Comment 3 Robert Clausecker freebsd_committer freebsd_triage 2021-10-30 08:59:30 UTC
(In reply to Kubilay Kocak from comment #2)

> - uname -a output

FreeBSD udon 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 amd64 amd64 FreeBSD


> - What host and qemu architecture Python is being built under?

amd64 FreeBSD 13.0-RELEASE.  Python is built for armv7 FreeBSD 13 with QEMU.

> - Whether the issue is observable in any other lang/python* ports

No. Once a Python package has been produced by some means (e.g. by building it natively and transplanting it to the amd64 machine), Python ports seem to build fine.
Comment 4 Warner Losh freebsd_committer freebsd_triage 2021-10-31 01:44:58 UTC
This is a QEMU issue; assigning it to me for want of a better place.
Comment 5 Alexey Dokuchaev freebsd_committer freebsd_triage 2022-10-26 12:41:45 UTC
Today I hit this bug as well while trying to build Python 3.8 in a tinderbox backed by qemu-user-static-3.1.0_13 for 13.1-RELEASE riscv64. I tried several times to no avail; only pinning DEFAULT_VERSIONS="python=3.7" helped.
Comment 6 Naram Qashat 2022-12-12 18:24:05 UTC
I know this bug is for lang/python38 specifically, but I wanted to mention that I have been hitting this problem repeatedly when building either lang/python39 or lang/python310 in a poudriere aarch64 jail on an amd64 machine. It happens more often than not. I recently moved from the default Python 3.9 to 3.10 because of this problem, but now 3.10 fails the same way.

To answer koobs' questions for my case specifically:

uname -a:
FreeBSD kirby.cyberbotx.com 13.1-RELEASE-p2 FreeBSD 13.1-RELEASE-p2 releng/13.1-n250158-752f813d6cc GENERIC amd64

Host is amd64 13.1-RELEASE-p2, Target is aarch64 13.1-RELEASE-p2

Latest full build log with the failure: https://poudriere.cyberbotx.com:8766/data/local_aarch64-default/2022-12-11_23h33m03s/logs/errors/python310-3.10.9_1.log

(Note that sometimes, when the BrokenProcessPool error happens, the build gets completely stuck instead of proceeding; if I don't kill poudriere myself, it eventually kills the build as a runaway process.)

As mentioned above, I've had this happen repeatedly for many of the lang/python<ver> ports. I don't know what determines whether it completes successfully.
Comment 7 Robert Clausecker freebsd_committer freebsd_triage 2022-12-12 18:40:07 UTC
Bumping importance.
Comment 8 Naram Qashat 2022-12-12 21:04:54 UTC
If it helps, I can show some of the older errors for this.

This was building lang/python38 for aarch64 13.0-RELEASE-p3:
https://poudriere.cyberbotx.com:8766/data/local_aarch64-default/2021-07-27_15h53m08s/logs/errors/python38-3.8.11.log

This was building lang/python39 for aarch64 13.1-RELEASE-p2:
https://poudriere.cyberbotx.com:8766/data/local_aarch64-default/2022-12-05_09h30m51s/logs/errors/python39-3.9.15_1.log

(The second one was just before I switched to lang/python310 to get past this; that worked until I tried to build it again yesterday.)

The same as the previous one, except that it never reached the package step and instead hung for over 6 hours before being killed as a runaway process:
https://poudriere.cyberbotx.com:8766/data/local_aarch64-default/2022-12-05_02h41m58s/logs/errors/python39-3.9.15_1.log

(Usually if I spot it getting into that state I'll kill it off myself instead.)
Comment 9 greif 2024-06-12 15:45:42 UTC
I ran into the same problem while building various Python versions (3.9, 3.10, 3.11) for armv7 via poudriere under QEMU on an amd64 host.
As a workaround, setting ALLOW_MAKE_JOBS=no seems to fix it, though this might be observation bias: the build hung 3 or 4 times, then I set the option and it built fine. Fingers crossed it stays that way.
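ALLOW_MAKE_JOBS=no limits make-level parallelism; whether it also changes how many workers compileall starts depends on how the port's install step invokes compileall, which this report does not show. Assuming the failure is confined to compileall's process pool, a hypothetical helper could retry serially: with `workers=1`, `compileall.compile_dir` compiles in the calling process and does not create a ProcessPoolExecutor. The name `compile_tree` is illustrative only, and this does not help when the build hangs instead of raising, as described in comment 6.

```python
import compileall
from concurrent.futures.process import BrokenProcessPool


def compile_tree(path, workers=0):
    """Byte-compile every .py file under *path* (hypothetical helper).

    workers=0 asks compileall for one worker per CPU. If a worker dies and
    the pool breaks, retry with workers=1, which makes compile_dir compile
    the files in this process without a ProcessPoolExecutor.
    """
    try:
        return bool(compileall.compile_dir(path, quiet=1, workers=workers))
    except BrokenProcessPool:
        return bool(compileall.compile_dir(path, quiet=1, workers=1))
```

For example, `compile_tree("/usr/local/lib/python3.8")` would try a parallel run first and fall back to serial compilation only if the pool breaks.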

# uname -a
FreeBSD poudriere 13.2-RELEASE-p11 FreeBSD 13.2-RELEASE-p11 GENERIC amd64