Bug 265241 - Runaway builds on armv6, armv7 in port cad/iverilog
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 13.0-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-toolchain (Nobody)
URL: http://beefy12.nyi.freebsd.org/data/m...
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-15 19:12 UTC by Yuri Victorovich
Modified: 2022-08-11 06:47 UTC

Description Yuri Victorovich freebsd_committer freebsd_triage 2022-07-15 19:12:45 UTC
Log: /home/yuri/.cache/fallout/main-armv7-default/cad/iverilog/2022-07-15T19:02:59.log
Comment 1 Yuri Victorovich freebsd_committer freebsd_triage 2022-07-15 19:13:40 UTC
Runaway builds happen over and over again.
Comment 2 Dimitry Andric freebsd_committer freebsd_triage 2022-07-15 19:28:39 UTC
I haven't got any way to reproduce this, so if you have a hanging clang instance, can you send a SIGABRT to it and attach the preprocessed .i (or .ii) and .sh files it drops in /tmp?
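
A minimal sketch of that request, assuming the PID of the hung compiler has already been found by hand (for example with pgrep -lf clang). The helper name abort_pid is hypothetical and only wraps kill(1); the PID in the commented example is a placeholder.

```sh
#!/bin/sh
# Sketch only: deliver SIGABRT to a single process.
# abort_pid is a hypothetical helper name, not part of any tool.
abort_pid() {
    kill -ABRT "$1"
}

# Example (commented out; substitute the PID of the hung compiler):
#   abort_pid 12345
#   ls -lt /tmp | head    # the newest entries should include the dropped files
```

A process that exits on SIGABRT is reported by the shell with status 134 (128 + 6), which is one way to confirm that the signal was delivered.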
Comment 3 Yuri Victorovich freebsd_committer freebsd_triage 2022-07-15 19:30:46 UTC
I only saw this in fallout logs.
I don't have an immediate way to reproduce it either.
Comment 4 Mark Millard 2022-07-15 20:02:44 UTC
(In reply to Yuri Victorovich from comment #0)

> Log: /home/yuri/.cache/fallout/main-armv7-default/cad/iverilog/2022-07-15T19:02:59.log

As far as I can tell, this is a private path in your personal context
and does not give anyone else access to the log file in question.

Can you provide public log file access?

Is this a qemu based build environment? An aarch64 that is able to run
aarch32/armv7 code? Native armv7?

So far as I know, qemu has never fully gotten past its own
hang-ups, although, as I understand it, it has been improving.
ld's threading used to be one failure context that would show
up, for example.

FreeBSD's build servers use qemu for armv7 builds.

If the context is qemu based, it is appropriate to have someone
check on a hardware environment in order to isolate qemu vs. other
cause. If it works on hardware, it likely is a qemu problem.

There is also the issue of the scaling of the various poudriere
timeouts in environments with different performance. Looking at
a recent FreeBSD build server log targeting armv7, I see after
an ld command:

=>> Killing runaway build after 21600 seconds with no output
=>> Cleaning up wrkdir
===>  Cleaning for iverilog-11.0
Killed
build of cad/iverilog | iverilog-11.0 ended at Sun Jul 10 04:22:58 UTC 2022
build time: 06:36:20
!!! build failure encountered !!!

Poudriere expects builds to have periodic status output, but
some ports do not have status output during some time consuming
activities.
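
As a rough cross-check of the figures in that log, the 21600-second limit can be compared with the reported build time. This is only an arithmetic sketch; the helper names secs_to_hms and hms_to_secs are hypothetical, and the result ignores any time spent on cleanup after the kill.

```sh
#!/bin/sh
# Convert a count of seconds to HH:MM:SS using POSIX shell arithmetic.
secs_to_hms() {
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# Convert a two-digit-per-field HH:MM:SS value back to seconds.
# The "1$x - 100" form strips a leading zero so that fields such as
# "08" are not rejected as invalid octal.
hms_to_secs() {
    old_ifs=$IFS
    IFS=:
    set -- $1
    IFS=$old_ifs
    echo $(( (1$1 - 100) * 3600 + (1$2 - 100) * 60 + (1$3 - 100) ))
}

# Example (commented out):
#   secs_to_hms 21600                                    # the no-output limit
#   secs_to_hms $(( $(hms_to_secs 06:36:20) - 21600 ))   # time before the silence
```

Read this way, 21600 seconds is exactly six hours, so a build time of 06:36:20 suggests that the last output appeared roughly 36 minutes into the build.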

Unlike the "/nxb-bin/usr/bin/cc" invocations in the log, the link
step just uses plain "ld" --and so looks to be run via qemu
emulation, not via the special native cross-tools (that are used
to make things take much less time than qemu based execution would).

For the FreeBSD build server, I'd guess a qemu related process hangup
from the threading activity --but I do not really know.
Comment 5 Yuri Victorovich freebsd_committer freebsd_triage 2022-07-15 20:06:11 UTC
(In reply to Mark Millard from comment #4)

Sorry, I filled the log URL in the URL field.
Comment 6 Yuri Victorovich freebsd_committer freebsd_triage 2022-07-15 20:10:55 UTC
You can also install the new ports-mgmt/fallout port and run:

$ fallout fetch
$ fallout grep iverilog

to download all relevant logs for iverilog, and similarly for any other port.
Comment 7 Mark Millard 2022-07-15 20:29:51 UTC
(In reply to Yuri Victorovich from comment #1)

I tried on an aarch64 that can execute armv7 code:

# poudriere jail -i -jmain-CA7-bulk_a
Jail name:         main-CA7-bulk_a
Jail version:      14.0-CURRENT
Jail arch:         arm.armv7
Jail method:       null
Jail mount:        /usr/obj/DESTDIRs/main-CA7-poud-bulk_a
Jail fs:           
Jail updated:      2021-12-04 14:54:10
Jail pkgbase:      disabled

The result was (after it built 17 prerequisite ports):

[00:14:22] [01] [00:00:00] Building cad/iverilog | iverilog-11.0
[00:17:15] [01] [00:02:53] Finished cad/iverilog | iverilog-11.0: Success

(It had 16 hardware threads it could use, partially explaining
the build time.)

Looks like the classic qemu issues to me, not something specific to
cad/iverilog .
Comment 8 Yuri Victorovich freebsd_committer freebsd_triage 2022-07-15 20:34:03 UTC
(In reply to Mark Millard from comment #7)

Maybe increasing runaway timeouts on architectures using qemu would be a solution?
Comment 9 Kyle Evans freebsd_committer freebsd_triage 2022-07-15 20:34:55 UTC
(In reply to Yuri Victorovich from comment #8)

No, that will just make it wait longer before finally timing out. Of the previously seen hangs in qemu-bsd-user, none of them are recoverable.
Comment 10 Mark Millard 2022-07-15 21:33:51 UTC
(In reply to Yuri Victorovich from comment #5)

By the way, main-armv7-default does not target 13.0-RELEASE.
It targets main [so: 14 at this point]. So the description and
Version fields are mismatched.

However, the qemu issue is not specific to main vs. stable/13
vs. releng/13.0 or the like.

I'm not sure what Component is appropriate to the evidence,
possibly even the qemu port that is used.
Comment 11 Mark Millard 2022-07-15 21:41:01 UTC
(In reply to Yuri Victorovich from comment #8)

Timeouts can be for either of 2 reasons, both of which
happen:

A) hung process that will never unhang on its own

B) processes that really would just take longer

poudriere already has a bunch of separate internal timeout figures for
qemu contexts. But no specific figure is going to cover all cases well.

In this context, I expect (A), making the specific value be of no
fundamental help.

While I've not used qemu builds in a long time, back when I did there
were examples of builds that would sometimes hang and sometimes not,
even for a sequence of simple rebuilds. At the time, some hang-ups
were racy in some way.
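
For case (B) only, the general knobs live in poudriere.conf. The fragment below is a sketch: NOHANG_TIME and MAX_EXECUTION_TIME are the settings documented in poudriere.conf.sample, the file path assumes a default ports install, and the values are purely illustrative. Nothing here helps with case (A).

```sh
# /usr/local/etc/poudriere.conf fragment (values are illustrative only)

# Seconds a build may go without producing output before it is
# treated as a runaway and killed.
NOHANG_TIME=43200

# Seconds a build may run in total before it is killed.
MAX_EXECUTION_TIME=172800
```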
Comment 12 Yuri Victorovich freebsd_committer freebsd_triage 2022-07-15 21:53:27 UTC
Another option would be for the FreeBSD Foundation to buy appropriate hardware to run builds without qemu.
Comment 13 Mark Millard 2022-07-15 22:00:43 UTC
(In reply to Yuri Victorovich from comment #8)

Workaround . . .?

It would be specific to ld, but if a port allows passing in
a command line option to ld, one that turns off adding
threads would likely avoid the specific hang-up example that
I'd observed during ld activity. The link might take longer
on non-qemu contexts --unless the option could be added only
when running under qemu.

Other hang-up processes (if any) would not be avoided by
this. It is not a general solution to hang-ups under qemu.

If I remember right, llvm's ld has changed the command line
option notation for control of threading over time. If so,
that might complicate things for generating the right command
line option to use --unless the notation-change was far enough
in the past to ignore. (I do not remember the timing or
other details.)
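
A sketch of how the right spelling might be chosen, under assumptions that should be checked against the lld release notes: that LLD 11 and later accept --threads=1, that older releases use --no-threads, and that the version line starts with "LLD <major>." as it does in FreeBSD's base system linker. Both helper names are hypothetical.

```sh
#!/bin/sh
# Pick the lld option that disables multithreaded linking.
# Assumption: LLD >= 11 spells it --threads=1, older releases --no-threads.
lld_single_thread_flag() {
    if [ "$1" -ge 11 ]; then
        echo "--threads=1"
    else
        echo "--no-threads"
    fi
}

# Extract the major version from a line such as
# "LLD 13.0.0 (FreeBSD ...) (compatible with GNU linkers)".
lld_major_version() {
    echo "$1" | sed -n 's/^LLD \([0-9][0-9]*\)\..*/\1/p'
}

# Example (commented out; requires ld.lld in PATH):
#   major=$(lld_major_version "$(ld.lld --version)")
#   LDFLAGS="-Wl,$(lld_single_thread_flag "$major")"
```

Whether a given port passes LDFLAGS through to its link step, and how to apply this only under qemu, would still need to be worked out per port.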
Comment 14 Mark Millard 2022-07-15 22:25:11 UTC
(In reply to Yuri Victorovich from comment #12)

Or you could donate the money to fund adding and supporting such
systems.

More seriously, https://fedoraproject.org/wiki/Changes/RetireARMv7
is about Fedora 37 (or so) working to no longer support armv7 at
all. Also, quoting https://www.linuxserver.io/blog/end-of-an-arch :

"MongoDB dropped support for 32-bit platforms with version 4.x"
"Alpine no longer builds new Java packages for 32-bit platforms"

Things are winding down for armv7 (without aarch64) and
other 32-bit platforms, though it takes a long time. aarch64
with aarch32/armv7 will likely become less common as well.

Looking to the future seems more likely than looking to the
past. The closest fit might be aarch64 hardware that can
execute aarch32/armv7 code if something were to be done.
But how likely would this be the top priority for the required
amount of funds vs. other alternative uses?
Comment 15 Ed Maste freebsd_committer freebsd_triage 2022-07-17 17:19:39 UTC
(In reply to Yuri Victorovich from comment #12)
We have a number of Ampere eMAG systems in the cluster for package building, a combination of FreeBSD Foundation purchases and donations from Ampere.
Comment 16 Mark Millard 2022-07-17 18:27:45 UTC
(In reply to Ed Maste from comment #15)

But do the eMAG's support direct aarch32 and armv7 code execution,
like Cortex-A72's do, for example? I think that was Yuri's point
in the suggestion.

If they did, then armv7 poudriere jails could be used to build
ports for armv7 without use of the problematical qemu
environment. (That is what I do on the HoneyComb that I have
access to. But that is a 16 core Cortex-A72 context. At this
point I build for main [so: 14] but could easily build for
stable/13 and releng/13.1 via such armv7 jails. I've done so
in the past.)

(I'll note that there are a few armv7 ports that work when
there are only 4 cores put to use but that fail for memory
space issues when 16 cores are put to use instead. I've
actually test built such on an RPi4B to check the status on
rare occasion. So I'm not claiming there would be no oddities
for building on aarch64 that was also aarch32/armv7 capable,
but far fewer than for qemu use.)
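
One way to check whether a given aarch64 host can run armv7 binaries directly is to look at the list of supported architectures that the kernel reports. The sketch below assumes the kern.supported_archs sysctl is available and lists armv7 on such hosts; arch_in_list is a hypothetical helper name.

```sh
#!/bin/sh
# Return success if the architecture named in $2 appears in the
# space-separated list given in $1.
arch_in_list() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on a FreeBSD host (commented out so the sketch has no side effects):
#   if arch_in_list "$(sysctl -n kern.supported_archs)" armv7; then
#       echo "this host can run armv7 binaries without qemu"
#   fi
```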
Comment 17 Ed Maste freebsd_committer freebsd_triage 2022-07-17 19:50:53 UTC
(In reply to Mark Millard from comment #16)

> But do the eMAG's support direct aarch32

They do
Comment 18 Mark Millard 2022-07-17 21:25:58 UTC
(In reply to Ed Maste from comment #17)

Cool.

Then, is there a reason that the problematical qemu technique
of covering the building of armv7 ports is still in use instead
of having some eMAG(s) use an armv7 poudriere jail?


As for why I treat armv6 as less important at this stage:

Covering armv6 as well (without qemu use) gets into having an
alternate kernel that is instead built with:

#define MACHINE_ARCH32  "armv6"

in sys/arm64/include/param.h . That would make it a boot-kernel
choice for which of armv7 vs. armv6 can be done without qemu
--and a matching poudriere jail could be used for armv6 when
booted for armv6 support.

But, not a likely way to set up/handle a build server. Having
an eMAG permanently with such a special kernel could limit its
other uses. So, likely, just armv7 coverage (the default
MACHINE_ARCH32) via an armv7 poudriere jail.

(I had some notes on this from having helped someone establish
an armv6 poudriere jail context, where they could reasonably
control the boot-kernel via reboots as needed and they could
build both styles of kernels for themselves.)
Comment 19 Mark Millard 2022-08-11 06:26:30 UTC
Now that armv7 ports are building on hardware that can execute armv7
code, such as ampere2, instead of using qemu,

http://ampere2.nyi.freebsd.org/data/main-armv7-default/p83aeeda2ebb7_s30253da1a/logs/iverilog-11.0.log

is an example from a successful Sun Jul 31 06:38:02 UTC 2022 build.

(This does not cover armv6, just armv7. Possibly armv6 and armv7
reports should be split because of qemu use being involved (armv6)
vs. not (armv7). That is a huge difference in context.)
Comment 20 Yuri Victorovich freebsd_committer freebsd_triage 2022-08-11 06:47:05 UTC
(In reply to Mark Millard from comment #19)

Yes, that's what we will be doing.