Bug 281266 - dns/nsd: Versions 4.10.0 and 4.10.1 hanging at startup with high CPU load
Summary: dns/nsd: Versions 4.10.0 and 4.10.1 hanging at startup with high CPU load
Status: Closed Overcome By Events
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-ports-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-09-04 13:42 UTC by regbin
Modified: 2024-12-17 00:59 UTC
CC List: 1 user

See Also:
jaap: maintainer-feedback+


Attachments

Description regbin 2024-09-04 13:42:46 UTC
Versions 4.10.0 and, subsequently, 4.10.1 hang indefinitely at startup with high CPU load. This apparently happens while parsing the zone files: without any zone files configured, the service starts normally. The same configuration works fine with versions earlier than 4.10. There is no log or console output, even with maximum verbosity and debugging enabled.

FreeBSD 13.3-RELEASE/amd64
Comment 1 Jaap Akkerhuis 2024-09-06 12:20:40 UTC
(In reply to regbin from comment #0)

This smells very much like what was reported upstream. See https://github.com/NLnetLabs/unbound/issues/1127 for the details.
Comment 3 regbin 2024-09-10 14:04:33 UTC
I can neither confirm nor dismiss your suggestion, since the nsd service (from the standard FreeBSD package) gives no feedback at all. However, there are only a couple of zone files, each with fewer than 100 entries. Even the minimal zone from the official nsd documentation[1] on its own triggers the same behaviour.


[1] https://nsd.docs.nlnetlabs.nl/en/latest/zonefile.html
Comment 4 Jaap Akkerhuis 2024-09-11 13:42:47 UTC
(In reply to regbin from comment #3)
I cannot reproduce this problem (using the example.com zone file).

nsd-checkconf and nsd-checkzone might give some hints. Raising the server's verbosity might also give more information.
Comment 5 regbin 2024-09-12 13:37:58 UTC
Tested with both nsd 4.9.1 and 4.10.1.

nsd-checkconf reports no errors in either version. nsd-checkzone reports no errors in version 4.9.1; however, the same zone files (including the example.com zone) in version 4.10.1 result in:

# nsd-checkzone example.com zones/example.com.zone 
Illegal instruction (core dumped)

Starting the service (version 4.10.1) with 'verbosity: 3' produces only:

[2024-09-12 15:58:57.130] nsd[12883]: notice: nsd starting (NSD 4.10.1)
[2024-09-12 15:58:57.130] nsd[12883]: notice: listen on ip-address ::1@5353 (udp) with server(s): *
[2024-09-12 15:58:57.130] nsd[12883]: notice: listen on ip-address ::1@5353 (tcp) with server(s): *

which doesn't seem very helpful. (I had already tried all documented verbosity levels, hence my earlier comment about the lack of meaningful debugging output from the service.)

Manually killing the nsd sub-process (since it doesn't respond to any commands) produces:

[2024-09-12 15:59:59.312] nsd[12919]: error: did not get start signal from main

This is how nsd looks in the process list:

nsd     65678   0.0  0.1 68964  8264  -  IsJ  16:15   0:00.04 nsd: xfrd (nsd)
nsd     65765 100.0  0.3 50168 37384  -  RJ   16:15   0:32.13 - nsd: main (nsd)


I cannot reproduce the problem with the same configuration files on any other machine either. Is it possible that I've hit a simdzone zone-parsing bug that only shows up on old hardware? The problematic nsd instance runs on a rather old machine:

CPU: Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz (2809.95-MHz K8-class CPU)

SSE4.2 instructions seem to be available; AVX2, on the other hand, is not. I don't know how nsd/simdzone behaves internally in that situation, or whether that could be the root of the problem at all.
Comment 6 Jaap Akkerhuis 2024-09-13 12:23:10 UTC
(In reply to regbin from comment #5)
Thanks for the extra information. It is indeed possible that you hit a simdzone bug in figuring out which instructions can be used. Something similar was discovered before; see https://github.com/NLnetLabs/simdzone/issues/222 (and 223).

As only the port maintainer, I don't know how to fix this. Could you open an issue upstream? Meanwhile, I will try to alert the developers.
Comment 7 regbin 2024-09-16 15:08:20 UTC
Thanks a lot for the efforts and the info. Opened an issue as requested - https://github.com/NLnetLabs/nsd/issues/382
Comment 8 regbin 2024-10-04 09:44:04 UTC
It turned out to be a simdzone bug after all. The fix was merged into upstream main yesterday and will be available in the next release. More information is available in the GitHub issue (linked in the comment above).
Comment 9 Jaap Akkerhuis 2024-12-13 13:21:42 UTC
Overtaken by events

The nsd-4.11.0 release should incorporate a fix; see bug #283308.
Comment 10 commit-hook freebsd_committer freebsd_triage 2024-12-17 00:59:41 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=33b3bd3a54f58052db01a7703e88ccf85958c4aa

commit 33b3bd3a54f58052db01a7703e88ccf85958c4aa
Author:     Jaap Akkerhuis <jaap@NLnetLabs.nl>
AuthorDate: 2024-12-13 12:42:53 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2024-12-17 00:57:20 +0000

    dns/nsd: Update to 4.11.0

    Changelog: https://www.nlnetlabs.nl/news/2024/Dec/12/nsd-4.11.0-released/

    PR:             283308, 281266

 dns/nsd/Makefile | 2 +-
 dns/nsd/distinfo | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)