Bug 239891 - net/aoe: Fix for FreeBSD 12+ (no member named 'tqh_first' in 'struct ifnethead')
Summary: net/aoe: Fix for FreeBSD 12+ (no member named 'tqh_first' in 'struct ifnethead')
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s) (show other bugs)
Version: Latest
Hardware: Any Any
: --- Affects Some People
Assignee: Robert Clausecker
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-15 21:41 UTC by chadf
Modified: 2023-10-20 22:01 UTC (History)
3 users (show)

See Also:
bugzilla: maintainer-feedback? (fjoe)


Attachments
Patch file for aoenet.c (1.32 KB, patch)
2019-08-15 21:41 UTC, chadf
no flags Details | Diff
Patch file for aoenet.c (1.31 KB, patch)
2023-08-09 21:41 UTC, chadf
no flags Details | Diff
Match for net/aoe/Makefile (424 bytes, patch)
2023-08-09 21:43 UTC, chadf
no flags Details | Diff
Patch to fix ""redefinition of frame" (1.44 KB, patch)
2023-08-28 02:34 UTC, chadf
no flags Details | Diff
net/aoe: fix build on non-x86, modernise (13.81 KB, patch)
2023-09-05 14:04 UTC, Robert Clausecker
no flags Details | Diff
Set a valid VNET context (548 bytes, patch)
2023-09-07 21:17 UTC, chadf
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description chadf 2019-08-15 21:41:34 UTC
Created attachment 206600 [details]
Patch file for aoenet.c

Potential fix patch for FreeBSD 12.x (and 13.x?). Should be applied after existing ports patches.

Compiles on FreeBSD 11.2 and 12.0.

Note: The IFNET_FOREACH() macro could also be shortened to two arguments, as the third will always be the same.
Comment 1 Rene Ladan freebsd_committer freebsd_triage 2021-12-06 12:25:44 UTC
Maintainer reset.
Comment 2 Robert Clausecker freebsd_committer freebsd_triage 2023-08-08 16:10:04 UTC
Patch seems to work.  Are you interested in taking maintainership over this port?
Comment 3 commit-hook freebsd_committer freebsd_triage 2023-08-09 13:05:29 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=ac47064e4aac2fedc24270363fbb1b26eaf0b2bd

commit ac47064e4aac2fedc24270363fbb1b26eaf0b2bd
Author:     Chad Fraleigh <chadf@triularity.org>
AuthorDate: 2023-08-08 16:10:56 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-08-09 13:02:39 +0000

    net/aoe: fix build on FreeBSD 12

    Still broken on FreeBSD 13+, but for a different reason now:

    aoenet.c:313:22: error: use of undeclared identifier 'ifnet'; did you mean 'ifunit'?
            IFNET_FOREACH(ifp, &ifnet, if_link) {
                                ^~~~~
                                ifunit

    PR:             239891
    MFH:            2023Q3

 net/aoe/Makefile             | 13 ++++++-----
 net/aoe/files/patch-aoenet.c | 51 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 7 deletions(-)
Comment 4 commit-hook freebsd_committer freebsd_triage 2023-08-09 13:09:47 UTC
A commit in branch 2023Q3 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=ee03f6a5f18264c0f7434322d56fcfb83c8324f4

commit ee03f6a5f18264c0f7434322d56fcfb83c8324f4
Author:     Chad Fraleigh <chadf@triularity.org>
AuthorDate: 2023-08-08 16:10:56 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-08-09 13:08:21 +0000

    net/aoe: fix build on FreeBSD 12

    Still broken on FreeBSD 13+, but for a different reason now:

    aoenet.c:313:22: error: use of undeclared identifier 'ifnet'; did you mean 'ifunit'?
            IFNET_FOREACH(ifp, &ifnet, if_link) {
                                ^~~~~
                                ifunit

    PR:             239891
    MFH:            2023Q3

    (cherry picked from commit ac47064e4aac2fedc24270363fbb1b26eaf0b2bd)

 net/aoe/Makefile             | 11 +++++-----
 net/aoe/files/patch-aoenet.c | 51 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 5 deletions(-)
Comment 5 Robert Clausecker freebsd_committer freebsd_triage 2023-08-09 14:01:16 UTC
Thank you for your contribution.
I'm once again sorry for how long it took to get this committed.

I found that this patch unfortunately does not suffice to get aoe(4) running on FreeBSD 13.  Perhaps you could check if it can be made to work there, too?
Comment 6 chadf 2023-08-09 21:41:58 UTC
Created attachment 243991 [details]
Patch file for aoenet.c
Comment 7 chadf 2023-08-09 21:43:07 UTC
Created attachment 243992 [details]
Match for net/aoe/Makefile
Comment 8 chadf 2023-08-09 21:45:10 UTC
Added two additional patches to have it compile on 13.x (and load with VIMAGE kernel).

Let me know if I should move these to a new bug report.
Comment 9 Robert Clausecker freebsd_committer freebsd_triage 2023-08-10 08:29:17 UTC
Thanks.  Here are some things I plan to change on commit:

 - It seems like your new option VIMAGE is not required to make the build
   succeed.  The macro seems to be defined internally anyway.  Is it okay if
   I remove the option?
 - I'll go ahead and remove the __FreeBSD_version >= 1200000 bits.
   FreeBSD 11 and before are out of support and need no longer be considered.

Perhaps you might also want to have a look at build failures on other platforms.
The build on arm/arm64 fails because struct frame collides with an existing name:

../../dev/aoe/aoe.h:170:8: error: redefinition of 'frame'
struct frame {
       ^
./machine/frame.h:69:8: note: previous definition is here
struct frame {
       ^
1 error generated.

(sparc is out of support and that should be all failures)
Comment 10 Robert Clausecker freebsd_committer freebsd_triage 2023-08-10 09:08:28 UTC
So I tried to test your patched kernel module on my amd64 server and it crashed the system :-(  The crash occured at:

#7  <signal handler called>
#8  0xffffffff82b9fbe1 in aoenet_xmitbcast () from /boot/modules/aoe.ko
#9  0xffffffff82ba0987 in discover_timer () from /boot/modules/aoe.ko
#10 0xffffffff80c23803 in softclock_call_cc (
    c=0xffffffff82ba2868 <discover_callout>, 
    cc=cc@entry=0xffffffff81cb3940 <cc_cpu>, direct=direct@entry=0)
    at /usr/src/sys/kern/kern_timeout.c:692
#11 0xffffffff80c23c69 in softclock (arg=0xffffffff81cb3940 <cc_cpu>)
    at /usr/src/sys/kern/kern_timeout.c:812
#12 0xffffffff80bc620a in intr_event_execute_handlers (ie=0xfffff80003c1c100, 
    p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1169
#13 ithread_execute_handlers (ie=0xfffff80003c1c100, p=<optimized out>)
    at /usr/src/sys/kern/kern_intr.c:1182
#14 ithread_loop (arg=arg@entry=0xfffff80003c35000)
    at /usr/src/sys/kern/kern_intr.c:1270
#15 0xffffffff80bc2f9e in fork_exit (
    callout=0xffffffff80bc5fb0 <ithread_loop>, arg=0xfffff80003c35000, 
    frame=0xfffffe00e1b39f40) at /usr/src/sys/kern/kern_fork.c:1093
#16 <signal handler called>

This seems to correspond to the line

    IFNET_FOREACH(ifp)

in aoenet_xmitbcast.  The specific faulting instruction seems to be one that tries to load vnet_entry_ifnet.  It fails with what looks like it is a null pointer dereference.
Comment 11 chadf 2023-08-10 18:45:30 UTC
I added the VIMAGE option since it wouldn't load on a 12.1 test system (ifnet symbol missing). After adding it in, it was able to load properly.

I suspect the raw ifnet [ompile time] symbol was dropped in 13.x and it always needs to go through the V_ifnet macro.

I didn't actually try to load it on 13.x, only compiled (as I was using my main server). I could spin up a 13.x VM and do a little more checking. I worry it might be getting beyond my kernel internals expertise, if anything significant about the way the ifnet structures changed.
Comment 12 Robert Clausecker freebsd_committer freebsd_triage 2023-08-11 22:12:50 UTC
It would be great if you could check.  Apparently the file "opt_global.h" has #define VIMAGE 1 in it if that feature is enabled.  Perhaps you can just include the file?

Thank you for your great help in any way.
Comment 13 Robert Clausecker freebsd_committer freebsd_triage 2023-08-13 09:52:21 UTC
Not actually in progress right now; waiting for you to revise the patch.
Comment 14 chadf 2023-08-28 02:29:31 UTC
(In reply to Robert Clausecker from comment #12)

It seems VIMAGE is enabled automatically for 13.x (at least 13.2) except for MIPS. But in 12.1 (12.x?) it was optional. If I remove the makefile option, it will break 12.x builds.
Comment 15 chadf 2023-08-28 02:34:26 UTC
Created attachment 244404 [details]
Patch to fix ""redefinition of frame"

This patch should fix the "redefinition of frame" issue for all architectures. I was unable to get the "implicit declaration of function 'cp15_pmccntr_get' is invalid in C99" error on armv6. Perhaps it was a 12.x error? I only tried building with 13.2 armv6 so far.
Comment 16 Robert Clausecker freebsd_committer freebsd_triage 2023-08-28 08:52:40 UTC
(In reply to chadf from comment #15)

Thanks!  armv6 is obsolete anyway (only supported system is the rpi1).  I'll see that I can get this patch committed following some testing.
Comment 17 Robert Clausecker freebsd_committer freebsd_triage 2023-09-05 14:04:07 UTC
Created attachment 244662 [details]
net/aoe: fix build on non-x86, modernise

This is the patch I synthesised from the patches you submitted.  In addition to your changes, I also tried unsuccessfully to fix the build on FreeBSD 13 and removed some cruft from now unsupported versions of FreeBSD.  I also simplified the installation procedure and installed the kernel module into the standard location.  This builds fine for me on 12.4 (amd64, aarch64).

It also builds fine on 13.2, but fails for the reasons outlined earlier.

Please check if the patch works for you so I can go ahead and commit it.
Comment 18 chadf 2023-09-05 19:19:27 UTC
I'm still having trouble trying to debug the crash problem. I've tried compiling the kernel with remote gdb support, using the directions in section 10.4 of the Kernel Debugging handbook, but it doesn't show up in the debug.kdb.available list.

The way the V_* mapping is implemented is strange and I don't fully understand it, so I'm really just stumbling around, trying to figure it out. It's probably something which someone with in-depth knowledge of how the VNET implementation could look at it for 5 minutes and know exactly what the issue is.

Fixing the "redefinition of frame" issue was something I could do in the meantime, hoping the ifnet crash would eventually be resolved too.
Comment 19 chadf 2023-09-07 21:15:12 UTC
I think I have a usable fix for the vnet crash. But it depends on the flexibility required by this package..

The 3 options are:

 1) It will only be used by the default vnet (vnet0).
   - I have a [tentative] fix for this.

 2) It can be used in any single vnet.
   - This would require more research/work to figure out how to best save and use a specific vnet context.

 3) It can be used in an arbitrary number of vnets.
   - This is same as option #2, but would also require doing a full vnet porting of the code.

For option 1 & 2, if a user attempts to use it in more than one vnet, the OS may allow it, with undefined results such as unexpected behavior and/or crashes. I suppose a check could be added to the MOD_LOAD event to explicitly check/prevent this, if a stern warning in the package description isn't enough.
Comment 20 chadf 2023-09-07 21:17:24 UTC
Created attachment 244700 [details]
Set a valid VNET context

The patch for option #1.
Comment 21 Robert Clausecker freebsd_committer freebsd_triage 2023-09-08 01:09:45 UTC
This is a good question which I am not qualified to answer.
However, if you wish to maintain this port, It is up to you to make a decision.
Comment 22 commit-hook freebsd_committer freebsd_triage 2023-09-10 16:15:15 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=20f05250fa96558ffcb8c93ae4a3f92de646f6ac

commit 20f05250fa96558ffcb8c93ae4a3f92de646f6ac
Author:     Chad Fraleigh <chadf@triularity.org>
AuthorDate: 2023-08-10 08:25:56 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-09-10 16:12:56 +0000

    net/aoe: fix build on non-x86

    While we are at it:

     - remove BROKEN_... lines for FreeBSD versions out of support
     - drop homegrown do-install in favour of USES=uidfix
     - install kernel module into standard location
     - attempt to fix the build on FreeBSD 13

    Still doesn't work on FreeBSD 13.2.  While it builds, the code now
    fails at runtime, apparently when it tries to dereference vnet_entry_ifnet
    in the line

        IFNET_FOREACH(ifp)

    in aoenet_xmitbcast.

    PR:             239891

 net/aoe/Makefile                |  21 ++--
 net/aoe/files/aoe.in            |   1 -
 net/aoe/files/patch-aoe.h (new) |  10 ++
 net/aoe/files/patch-aoenet.c    | 248 +++++++++++++++++-----------------------
 net/aoe/pkg-plist (gone)        |   2 -
 5 files changed, 127 insertions(+), 155 deletions(-)
Comment 23 Robert Clausecker freebsd_committer freebsd_triage 2023-09-17 15:54:45 UTC
See also bug #194663.
Comment 24 chadf 2023-10-20 21:30:18 UTC
As a side thought, I was wondering if the IFNET_FOREACH() macro should be moved into the base sys/net/if_var.h header, as the same foreach pattern is used in various kernel functions. I've noticed other cases, like VNET_FOREACH() and CPU_FOREACH(), doing this already. It could reduce the chance of ports breaking due to the underlying ifnet list type changing again.

Assuming the core kernel developers thought it was something of value, I'd be willing to do leg work of creating/submitting the patches for it.
Comment 25 Robert Clausecker freebsd_committer freebsd_triage 2023-10-20 22:01:23 UTC
(In reply to chadf from comment #24)

This sounds like it could be useful, but I'm not really the right person to judge this.  Perhaps you could ask on the freebsd-net@freebsd.org mailing list.