Nomad v1.8.1 is crashing on a newly installed host. This issue seems to match https://github.com/hashicorp/nomad/issues/23385

root@server:~ # /usr/local/bin/nomad agent -config=/usr/local/etc/nomad/client.hcl
==> WARNING: mTLS is not configured - Nomad is not secure without mTLS!
==> Config enable_syslog is `true` with log_level=WARN
==> Loaded configuration from /usr/local/etc/nomad/client.hcl
==> Starting Nomad agent...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3307f5a]

goroutine 227 [running]:
github.com/hashicorp/nomad/client/lib/idset.(*Set[...]).Slice(0x874c69d50?)
	github.com/hashicorp/nomad/client/lib/idset/idset.go:126 +0x1a
github.com/hashicorp/nomad/plugins/base.nomadTopologyToProto(0x8748a47e0)
	github.com/hashicorp/nomad/plugins/base/base.go:171 +0x45
github.com/hashicorp/nomad/plugins/base.(*AgentConfig).toProto(0x8749c2508)
	github.com/hashicorp/nomad/plugins/base/base.go:96 +0x47
github.com/hashicorp/nomad/plugins/base.(*BasePluginClient).SetConfig(0x8745f2600, 0x874ec26f0)
	github.com/hashicorp/nomad/plugins/base/client.go:63 +0x28
github.com/hashicorp/nomad/helper/pluginutils/loader.(*PluginLoader).Dispense(0x8745ed5f0, {0x874c05a7d, 0x3}, {0xb8e215, 0x6}, 0x8749c2508, {0x136b730, 0x874ec2e40})
	github.com/hashicorp/nomad/helper/pluginutils/loader/loader.go:186 +0x3d5
github.com/hashicorp/nomad/helper/pluginutils/singleton.(*SingletonLoader).dispense(0xbed8f5?, 0x874992880, {0x874c05a7d?, 0x316d685?}, {0xb8e215?, 0x8745fc000?}, 0x874188fd0?, {0x136b730?, 0x874ec2e40?})
	github.com/hashicorp/nomad/helper/pluginutils/singleton/singleton.go:109 +0x53
created by github.com/hashicorp/nomad/helper/pluginutils/singleton.(*SingletonLoader).getPlugin in goroutine 189
	github.com/hashicorp/nomad/helper/pluginutils/singleton/singleton.go:85 +0x39d
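The trace above ends in a nil pointer dereference (addr=0x0) inside idset's Slice, reached from nomadTopologyToProto while a plugin is being configured. One plausible way this happens in Go is calling a method that reads a field through a pointer that was never set. The sketch below illustrates that failure mode only; its Set type and callSlice helper are simplified, hypothetical stand-ins, not Nomad's actual idset code.

```go
package main

import "fmt"

// Set is a hypothetical, simplified stand-in for Nomad's idset.Set; the real
// type in client/lib/idset is structured differently.
type Set[T comparable] struct {
	items map[T]struct{}
}

// Slice reads s.items, which dereferences the receiver. Calling it through a
// nil *Set therefore panics with a nil pointer dereference.
func (s *Set[T]) Slice() []T {
	out := make([]T, 0, len(s.items))
	for id := range s.items {
		out = append(out, id)
	}
	return out
}

// callSlice turns the panic into an error so the failure can be observed
// without crashing the program.
func callSlice(s *Set[uint16]) (ids []uint16, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s.Slice(), nil
}

func main() {
	// A set that was never populated, e.g. topology data a platform did not provide.
	var missing *Set[uint16]
	if _, err := callSlice(missing); err != nil {
		fmt.Println(err)
	}

	populated := &Set[uint16]{items: map[uint16]struct{}{0: {}}}
	ids, err := callSlice(populated)
	fmt.Println(ids, err)
}
```

Without the recover in callSlice, the first call would terminate the process with the same "invalid memory address or nil pointer dereference" message seen in the agent output.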
(In reply to Bretton Vine from comment #0)
Can you share:
- Operating system platform and version
- Content of /usr/local/etc/nomad/client.hcl
I can reproduce this, but only if:
- nomad-pot-driver is installed
- plugin_dir = "/usr/local/libexec/nomad/plugins" is in client.hcl

So I guess this is a breaking change in the Nomad plugin structure that needs attention in nomad-pot-driver.
I'll try applying https://github.com/hashicorp/nomad/pull/23399/files
(In reply to Michael Gmelin from comment #1)
14.0-RELEASE-p8

root@server:~ # cat /usr/local/etc/nomad/client.hcl
bind_addr = "0.0.0.0"
datacenter = "redacted"

advertise {
  # This should be the IP of THIS MACHINE and must be routable by every node
  # in your cluster
  http = "10.101.2.1"
  rpc = "10.101.2.1"
  serf = "10.101.2.1"
}

client {
  enabled = true
  options {
    "driver.raw_exec.enable" = "1"
  }
  servers = [ "10.1.2.111" ]
}

data_dir = "/var/tmp/nomad"
plugin_dir = "/usr/local/libexec/nomad/plugins"

consul {
  address = "127.0.0.1:8500"
  client_service_name = "nomad.client.server1.redacted.consul"
  auto_advertise = true
  client_auto_join = true
}

tls {
  http = false
  rpc = false
  verify_server_hostname = false
  verify_https_client = false
}

telemetry {
  collection_interval = "15s"
  publish_allocation_metrics = true
  publish_node_metrics = true
  prometheus_metrics = true
  disable_hostname = true
}

enable_syslog=true
log_level="WARN"
syslog_facility="LOCAL1"
Created attachment 252014
Fix segfault on loading plugins due to missing numa support

@bretton Can you please try this patch? It works for me, but I only did superficial tests.

cd /usr/ports
patch -p1 </path/to/patch
cd sysutils/nomad
make clean reinstall clean

Then test.
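The patch contents are not quoted in this thread. Going only by its title, a rough and hypothetical sketch of the idea is to guarantee the client always has a non-nil topology to hand to plugins, falling back to a single NUMA node when the platform reports none. Topology and topologyOrFallback are invented names for illustration; the actual patch may work quite differently.

```go
package main

import (
	"fmt"
	"runtime"
)

// Topology is a hypothetical stand-in for the CPU topology the Nomad client
// passes to plugins; field names are illustrative only.
type Topology struct {
	NodeIDs []uint8  // NUMA node IDs
	Cores   []uint16 // usable core IDs
}

// topologyOrFallback never returns nil. If no topology was detected (the
// assumed situation on a platform without NUMA support), it models the host
// as a single NUMA node 0 that owns cores 0..numCores-1.
func topologyOrFallback(detected *Topology, numCores int) *Topology {
	if detected != nil {
		return detected
	}
	cores := make([]uint16, numCores)
	for i := range cores {
		cores[i] = uint16(i)
	}
	return &Topology{NodeIDs: []uint8{0}, Cores: cores}
}

func main() {
	var detected *Topology // assumption: NUMA detection found nothing
	topo := topologyOrFallback(detected, runtime.NumCPU())
	fmt.Println("nodes:", topo.NodeIDs, "cores:", len(topo.Cores))
}
```

Defaulting at the point where the topology is produced means every consumer, including the protobuf conversion in the trace, can assume the fields are populated instead of each adding its own nil check.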
(In reply to Michael Gmelin from comment #5)
It's working when I start it manually with /usr/local/bin/nomad agent -config=/usr/local/etc/nomad/client.hcl; the only errors are as follows:

2024-07-13T13:03:55.656Z [WARN] agent.plugin_loader: plugin not referenced in the agent configuration file, future versions of Nomad will not load this plugin until the agent configuration is updated: plugin_dir=/usr/local/libexec/nomad/plugins plugin=nomad-pot-driver
2024-07-13T13:03:55.719Z [WARN] client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="landlock not supported on this platform"
2024-07-13T13:03:55.736Z [WARN] client.fingerprint_mgr.cni_plugins: failed to read CNI plugins directory: cni_path=/opt/cni/bin error="open /opt/cni/bin: no such file or directory"
2024-07-13T13:04:05.759Z [ERROR] client.driver_mgr.docker: failed to list pause containers for recovery: driver=docker error="Get \"http://unix.sock/containers/json?filters=%7B%22label%22%3A%5B%22com.hashicorp.nomad.alloc_id%22%5D%7D\": dial unix /var/run/docker.sock: connect: no such file or directory"

However, it's not starting with "service nomad restart": it starts and then dies. Will investigate some more.
(In reply to Bretton Vine from comment #6)
root@server:~/bin # service nomad restart
/usr/local/etc/rc.d/nomad: DEBUG: pid file (/var/run/nomad.pid): not readable.
/usr/local/etc/rc.d/nomad: DEBUG: checkyesno: nomad_enable is set to YES.
/usr/local/etc/rc.d/nomad: DEBUG: pid file (/var/run/nomad.pid): not readable.
/usr/local/etc/rc.d/nomad: DEBUG: checkyesno: nomad_enable is set to YES.
nomad not running? (check /var/run/nomad.pid).
/usr/local/etc/rc.d/nomad: DEBUG: pid file (/var/run/nomad.pid): not readable.
/usr/local/etc/rc.d/nomad: DEBUG: checkyesno: nomad_enable is set to YES.
/usr/local/etc/rc.d/nomad: DEBUG: run_rc_command: start_precmd: nomad_startprecmd
Starting nomad.
/usr/local/etc/rc.d/nomad: DEBUG: run_rc_command: doit: limits -C daemon env PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/sbin:/bin /usr/sbin/daemon -T nomad -f -t nomad -p /var/run/nomad.pid /usr/bin/env PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/sbin:/bin /usr/local/bin/nomad agent -config=/usr/local/etc/nomad/client.hcl -network-interface=nomad-pseudo

Error when I run /usr/local/bin/nomad agent -config=/usr/local/etc/nomad/client.hcl -network-interface=nomad-pseudo:

==> Failed to parse network-interface: invalid interface name "nomad-pseudo"
(In reply to Bretton Vine from comment #7)
I don't know the exact context you're starting this in, but:

1. The interface passed to "-network-interface" has to exist. In my setup, nomad-pseudo is a VLAN interface that got renamed[0]. Not sure how you're using nomad-pseudo in your setup.

2. The access denied issues are probably due to running nomad as a non-root user on a host where you previously ran it as root. As you need to run nomad as root anyway to use pot, make sure to set nomad_user=root in /etc/rc.conf.

[0] Something like this in our setup:

ifconfig lo create \
  inet 10.20.20.210/24 \
  name nomad-pseudo fib 1
(In reply to Michael Gmelin from comment #8)
Ah, I'm not using nomad-pseudo at all; it was still in my rc.conf from an earlier bug report. My bad.

Restart is working normally with the patch.
@jhixson I think this should land ASAP, as it's breaking nomad deployments that use plugins. Thanks!
(In reply to Michael Gmelin from comment #10)
I should have this updated today. I appreciate the debugging, patch, and heads up. Thanks!
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=d799c7d268ff351136575e6f53b4a7d5420bc3b8

commit d799c7d268ff351136575e6f53b4a7d5420bc3b8
Author: John Hixson <jhixson@FreeBSD.org>
AuthorDate: 2024-07-13 21:15:57 +0000
Commit: John Hixson <jhixson@FreeBSD.org>
CommitDate: 2024-07-13 21:28:59 +0000

    sysutils/nomad: Fix crashing on startup

    PR: 280256
    Reported by: Bretton Vine <bv@honeyguide.eu>
    Obtained from: Michael Gremlin <grembo@FreeBSD.org>

 sysutils/nomad/Makefile | 6 +++++-
 sysutils/nomad/distinfo | 6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)
Port has been updated. Thanks!
Hi John,

Could you MFC/do you mind if I do?

Thanks
Michael
A commit in branch 2024Q3 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=8bd599c5b2b06dda2f72015f8fc2bacac9e640a0

commit 8bd599c5b2b06dda2f72015f8fc2bacac9e640a0
Author: John Hixson <jhixson@FreeBSD.org>
AuthorDate: 2024-07-13 21:15:57 +0000
Commit: Michael Gmelin <grembo@FreeBSD.org>
CommitDate: 2024-08-08 12:52:03 +0000

    sysutils/nomad: Fix crashing on startup

    PR: 280256
    Reported by: Bretton Vine <bv@honeyguide.eu>
    Obtained from: Michael Gremlin <grembo@FreeBSD.org>

    (cherry picked from commit d799c7d268ff351136575e6f53b4a7d5420bc3b8)

 sysutils/nomad/Makefile | 6 +++++-
 sysutils/nomad/distinfo | 6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)
@John I took the liberty of cherry-picking it myself to unbreak the port on quarterly.