After upgrading to FreeBSD 13.2, I notice issues with the Unbound DNS resolver. There seem to be a lot of timeouts to the forward servers:

    Apr 18 14:32:59 canobox0 unbound[98053]: [98053:0] error: SERVFAIL <www.google.com. AAAA IN>: all the configured stub or forward servers failed, at zone . from 9.9.9.9 upstream server timeout

Disabling ASLR seems to resolve the issue.

The Unbound server is using DNS over TLS with the following configuration:

    forward-zone:
        name: .
        forward-tls-upstream: yes
        forward-addr: 9.9.9.9@853 # Quad9
        forward-addr: 149.112.112.112@853 # Quad9
        forward-addr: 2620:fe::fe@853 # Quad9
        forward-addr: 2620:fe::9@853 # Quad9
Except for the timeouts on the upstream servers, it does not crash the Unbound server. It is able to recover but it keeps happening every so often.

Workarounds:

    # elfctl -e +noaslr /usr/local/sbin/unbound

or

    # sysctl kern.elf64.aslr.enable=0

or disable TLS support for the upstream servers.
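To compare how often the timeouts occur before and after applying one of the workarounds, the SERVFAIL lines can be tallied per upstream address. The following is only a rough sketch: the regular expression is modelled on the single log line quoted above and may not match other Unbound versions or verbosity levels, and the function name `tally_timeouts` and the log path in the usage comment are illustrative assumptions.

```python
import re
import sys
from collections import Counter

# Modelled on the one SERVFAIL line quoted in this report (an assumption:
# other Unbound versions may word the message differently).
SERVFAIL_RE = re.compile(
    r"error: SERVFAIL <(?P<query>[^>]+)>.*from (?P<upstream>\S+) upstream server timeout"
)


def tally_timeouts(lines):
    """Count 'upstream server timeout' SERVFAIL entries per upstream address."""
    counts = Counter()
    for line in lines:
        match = SERVFAIL_RE.search(line)
        if match:
            counts[match.group("upstream")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 tally.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for upstream, count in tally_timeouts(log).most_common():
            print(f"{upstream}: {count} timeouts")
```

Running it once on a log taken with ASLR enabled and once on a log taken with the workaround in place gives a simple before/after count.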
It appears to be failing in the SSL/TLS handshake to the upstream forwarders after a period of time. It almost seems like a queue is getting backed up somewhere and things start overflowing. I've spent the last hour trying to debug it, thinking it was a problem with Quad9 itself. The key error in the Unbound log files is this:

> unbound[19]: [19:2] error: SSL_handshake syscall: Connection reset by peer

I've turned up the verbosity, but the additional detail does not shed any light on why the connection was reset (no SSL errors/reasons, etc). It seems like everything just works with each connection until it doesn't.

I started noticing these issues when my Squid proxy would just start returning read errors after launching a browser on one of my desktops. Waiting about 30s-1m after launching the browser seemed to let the dust settle, and then things would seemingly work fine for hours before things started to hiccup again. Websites that do a *lot* of background chatter, like Twitter, Facebook, etc, seemed to trip the issue the most because they pile the queries up and seem to overload Unbound, causing the SSL errors to appear.

A good way to reproduce:

1. Silence outbound network traffic on the network, or at least traffic that will hit a particular Unbound DNS server.
2. Edit the config, set verbosity to '2' and restart the daemon.
3. tail -f the log file.
4. Once it's loaded up, find a Windows box and launch MS Edge.

Edge makes a *ton* of queries at once when it loads. On my end, the first few got resolved fine against Quad9, then the SSL errors would start to appear, causing Unbound to try other forwarders and eventually give up and return SERVFAIL.

Can confirm, though, that turning ASLR off for the Unbound binary appears to make things smooth again.
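For anyone without a Windows box at hand, the burst of simultaneous lookups described above can be approximated with a small script. This is a hedged sketch and not a verified reproducer: it assumes Unbound answers plain UDP queries on 127.0.0.1 port 53, the host names are placeholders, and the helper names (`build_query`, `rcode_of`, `ask`, `burst`) are made up for illustration. It hand-encodes minimal DNS queries, sends them concurrently, and counts how many come back as SERVFAIL or time out.

```python
import random
import socket
import struct
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

QTYPE_A, QTYPE_AAAA = 1, 28


def build_query(name, qtype, txid):
    """Encode a minimal DNS query: 12-byte header plus one question."""
    # Flags 0x0100 = standard query with recursion desired; QDCOUNT = 1.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE and QCLASS (1 = IN).
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def rcode_of(response):
    """Return the RCODE (low 4 bits of the flags word); 2 means SERVFAIL."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def ask(resolver, name, qtype, timeout=5.0):
    """Send one UDP query and classify the outcome as a short string."""
    txid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qtype, txid), (resolver, 53))
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
        except OSError:
            return "error"
    if len(data) < 4:
        return "other"
    return {0: "noerror", 2: "servfail", 3: "nxdomain"}.get(rcode_of(data), "other")


def burst(resolver, names, workers=50):
    """Fire A and AAAA lookups for every name at once and tally the results."""
    jobs = [(name, qtype) for name in names for qtype in (QTYPE_A, QTYPE_AAAA)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(lambda job: ask(resolver, *job), jobs))


if __name__ == "__main__":
    # Resolver address and host names are placeholders; substitute your own.
    names = ["www.google.com", "twitter.com", "www.facebook.com", "www.freebsd.org"]
    print(burst("127.0.0.1", names))
```

Names that Unbound already has cached are answered locally and never reach the forwarders, so the list should be long and made up of names that have not been looked up recently; otherwise the TLS connections to the upstream servers are not exercised.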
Cross-reference: <https://old.reddit.com/r/freebsd/comments/12prz7k/-/>.
This doesn't seem like a port issue to me. Reported to the developers as an issue https://github.com/NLnetLabs/unbound/issues/887
I had ASLR enabled on FreeBSD 13.1-RELEASE and didn't have problems. After the upgrade to 13.2-RELEASE it worked without problems until Tuesday this week, and then it stopped. I disabled ASLR and the problem still exists.
Is it possible that in my case the problem is related to gnutls/libretls? I am using openntpd, which installs libretls, and I think the problem started after an update of openntpd. Thank you.
We should probably use the `USES= elfctl` facility to disable ASLR on unbound (and maybe all the unbound binaries) until this can be fixed upstream. Something like this should do:

    -USES=		autoreconf cpe libtool pkgconfig ssl
    +USES=		autoreconf cpe elfctl libtool pkgconfig ssl
     CPE_VENDOR=	nlnetlabs
    +ELF_FEATURES=	+noaslr:${PORTNAME}
Perhaps you can add a "NOASLR" build option to the port instead of totally disabling ASLR? I am running a couple of dns/unbound servers with ASLR and PIE enabled as standalone recursive DNS servers for a few hundred clients, and both run fine for weeks or months; there is no need to restart or poke them between upgrades.
Created attachment 242199
dns/unbound: introduce NOASLR port option

Good idea. I have attached a patch that adds "NOASLR" as a make option.
A fix has been developed by upstream. There will be a new release with this fix within weeks. For the impatient among us, a prerelease has been made available: <https://github.com/NLnetLabs/unbound/issues/887#issuecomment-1570136710>.
(In reply to Jaap Akkerhuis from comment #10)
There is a new release (1.18.0, see bug #73456) which promises to fix the problem. We can close this, as it has been overtaken by events.