Bug 280256 - sysutils/nomad: Nomad v1.8.1 crashing on startup
Summary: sysutils/nomad: Nomad v1.8.1 crashing on startup
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: John Hixson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-13 10:43 UTC by Bretton Vine
Modified: 2024-08-08 12:57 UTC
CC List: 2 users

See Also:
Flags: jhixson: maintainer-feedback+


Attachments
Fix segfault on loading plugins due to missing numa support (2.08 KB, patch)
2024-07-13 12:36 UTC, Michael Gmelin
grembo: maintainer-approval? (jhixson)

Description Bretton Vine 2024-07-13 10:43:16 UTC
Nomad v1.8.1 is crashing on a newly installed host.

This issue seems to match
https://github.com/hashicorp/nomad/issues/23385

root@server:~ # /usr/local/bin/nomad agent -config=/usr/local/etc/nomad/client.hcl
==> WARNING: mTLS is not configured - Nomad is not secure without mTLS!
==> Config enable_syslog is `true` with log_level=WARN
==> Loaded configuration from /usr/local/etc/nomad/client.hcl
==> Starting Nomad agent...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3307f5a]

goroutine 227 [running]:
github.com/hashicorp/nomad/client/lib/idset.(*Set[...]).Slice(0x874c69d50?)
	github.com/hashicorp/nomad/client/lib/idset/idset.go:126 +0x1a
github.com/hashicorp/nomad/plugins/base.nomadTopologyToProto(0x8748a47e0)
	github.com/hashicorp/nomad/plugins/base/base.go:171 +0x45
github.com/hashicorp/nomad/plugins/base.(*AgentConfig).toProto(0x8749c2508)
	github.com/hashicorp/nomad/plugins/base/base.go:96 +0x47
github.com/hashicorp/nomad/plugins/base.(*BasePluginClient).SetConfig(0x8745f2600, 0x874ec26f0)
	github.com/hashicorp/nomad/plugins/base/client.go:63 +0x28
github.com/hashicorp/nomad/helper/pluginutils/loader.(*PluginLoader).Dispense(0x8745ed5f0, {0x874c05a7d, 0x3}, {0xb8e215, 0x6}, 0x8749c2508, {0x136b730, 0x874ec2e40})
	github.com/hashicorp/nomad/helper/pluginutils/loader/loader.go:186 +0x3d5
github.com/hashicorp/nomad/helper/pluginutils/singleton.(*SingletonLoader).dispense(0xbed8f5?, 0x874992880, {0x874c05a7d?, 0x316d685?}, {0xb8e215?, 0x8745fc000?}, 0x874188fd0?, {0x136b730?, 0x874ec2e40?})
	github.com/hashicorp/nomad/helper/pluginutils/singleton/singleton.go:109 +0x53
created by github.com/hashicorp/nomad/helper/pluginutils/singleton.(*SingletonLoader).getPlugin in goroutine 189
	github.com/hashicorp/nomad/helper/pluginutils/singleton/singleton.go:85 +0x39d
Comment 1 Michael Gmelin freebsd_committer freebsd_triage 2024-07-13 11:46:05 UTC
(In reply to Bretton Vine from comment #0)

Can you share:

- Operating system platform and version
- Content of /usr/local/etc/nomad/client.hcl
Comment 2 Michael Gmelin freebsd_committer freebsd_triage 2024-07-13 11:51:58 UTC
I can reproduce this, but only if:

- nomad-pot-driver is installed
- plugin_dir = "/usr/local/libexec/nomad/plugins" is in client.hcl

So I guess this is a breaking change in the Nomad plugin structure that needs attention in nomad-pot-driver.
Comment 3 Michael Gmelin freebsd_committer freebsd_triage 2024-07-13 11:56:00 UTC
I'll try applying https://github.com/hashicorp/nomad/pull/23399/files
Comment 4 Bretton Vine 2024-07-13 12:01:09 UTC
(In reply to Michael Gmelin from comment #1)

FreeBSD 14.0-RELEASE-p8 (amd64)

root@server:~ # cat /usr/local/etc/nomad/client.hcl

bind_addr = "0.0.0.0"
datacenter = "redacted"
advertise {
  # This should be the IP of THIS MACHINE and must be routable by every node
  # in your cluster
  http = "10.101.2.1"
  rpc = "10.101.2.1"
  serf = "10.101.2.1"
}
client {
  enabled = true
  options {
    "driver.raw_exec.enable" = "1"
  }
  servers = [ "10.1.2.111" ]
}
data_dir = "/var/tmp/nomad"
plugin_dir = "/usr/local/libexec/nomad/plugins"
consul {
  address = "127.0.0.1:8500"
  client_service_name = "nomad.client.server1.redacted.consul"
  auto_advertise = true
  client_auto_join = true
}
tls {
  http = false
  rpc = false
  verify_server_hostname = false
  verify_https_client = false
}
telemetry {
  collection_interval = "15s"
  publish_allocation_metrics = true
  publish_node_metrics = true
  prometheus_metrics = true
  disable_hostname = true
}
enable_syslog=true
log_level="WARN"
syslog_facility="LOCAL1"
Comment 5 Michael Gmelin freebsd_committer freebsd_triage 2024-07-13 12:36:49 UTC
Created attachment 252014
Fix segfault on loading plugins due to missing numa support

@bretton Can you please try this patch? It works for me, but I only did superficial tests.

  cd /usr/ports
  patch -p1 </path/to/patch
  cd sysutils/nomad
  make clean reinstall clean

Then test.
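The stack trace ends in idset.(*Set[...]).Slice, called from base.nomadTopologyToProto while a plugin is being configured, and the patch title attributes the segfault to missing NUMA support. One plausible reading, which is an assumption and not a description of the attached patch or of upstream PR 23399, is that on FreeBSD the client topology carries an ID set that was never initialised, so Slice follows a nil internal pointer (the trace shows a non-nil receiver but a fault at addr=0x0). The Go sketch below reproduces that pattern with hypothetical types (ID, inner, Set, Topology, From, UnsafeSlice are invented for illustration and are not Nomad's API) and shows a nil-tolerant Slice.

```go
package main

import (
	"fmt"
	"sort"
)

// ID is a hypothetical constraint standing in for Nomad's ID types.
type ID interface{ ~uint8 | ~uint16 | ~int }

// inner is a hypothetical map-backed set, valid only once constructed.
type inner[T ID] struct{ m map[T]struct{} }

// Size reads a field through the pointer, so it panics when i is nil.
func (i *inner[T]) Size() int { return len(i.m) }

// Set is a hypothetical wrapper whose items pointer is only set by From.
// Its zero value (Set[T]{}) has items == nil.
type Set[T ID] struct{ items *inner[T] }

// From builds a properly initialised set.
func From[T ID](ids ...T) *Set[T] {
	s := &Set[T]{items: &inner[T]{m: make(map[T]struct{})}}
	for _, id := range ids {
		s.items.m[id] = struct{}{}
	}
	return s
}

// UnsafeSlice assumes items was initialised, reproducing the crash pattern.
func (s *Set[T]) UnsafeSlice() []T {
	out := make([]T, 0, s.items.Size()) // nil pointer dereference if items == nil
	for id := range s.items.m {
		out = append(out, id)
	}
	return out
}

// Slice treats a nil or zero-value set as empty instead of panicking,
// and returns the IDs in ascending order.
func (s *Set[T]) Slice() []T {
	if s == nil || s.items == nil {
		return []T{}
	}
	out := make([]T, 0, len(s.items.m))
	for id := range s.items.m {
		out = append(out, id)
	}
	sort.Slice(out, func(a, b int) bool { return out[a] < out[b] })
	return out
}

// Topology is a hypothetical stand-in for the client's NUMA topology; on a
// platform without NUMA detection its NodeIDs set may never be initialised.
type Topology struct{ NodeIDs *Set[uint8] }

func main() {
	noNUMA := &Topology{NodeIDs: &Set[uint8]{}} // zero-value set, items == nil

	// The unguarded call panics; recover so the program can continue.
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("UnsafeSlice panicked:", r)
			}
		}()
		_ = noNUMA.NodeIDs.UnsafeSlice()
	}()

	fmt.Println("Slice on zero-value set:", noNUMA.NodeIDs.Slice())
	fmt.Println("Slice on populated set:", From[uint8](1, 0).Slice())
}
```

Guarding inside Slice is only one option; the same crash could equally be avoided by initialising the set on platforms without NUMA detection or by checking it in the topology conversion. Which approach the actual fix takes is not shown in this report.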
Comment 6 Bretton Vine 2024-07-13 13:07:58 UTC
(In reply to Michael Gmelin from comment #5)

It works when I start it manually with /usr/local/bin/nomad agent -config=/usr/local/etc/nomad/client.hcl

The only warnings and errors are:

2024-07-13T13:03:55.656Z [WARN]  agent.plugin_loader: plugin not referenced in the agent configuration file, future versions of Nomad will not load this plugin until the agent configuration is updated: plugin_dir=/usr/local/libexec/nomad/plugins plugin=nomad-pot-driver
2024-07-13T13:03:55.719Z [WARN]  client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="landlock not supported on this platform"
2024-07-13T13:03:55.736Z [WARN]  client.fingerprint_mgr.cni_plugins: failed to read CNI plugins directory: cni_path=/opt/cni/bin error="open /opt/cni/bin: no such file or directory"
2024-07-13T13:04:05.759Z [ERROR] client.driver_mgr.docker: failed to list pause containers for recovery: driver=docker error="Get \"http://unix.sock/containers/json?filters=%7B%22label%22%3A%5B%22com.hashicorp.nomad.alloc_id%22%5D%7D\": dial unix /var/run/docker.sock: connect: no such file or directory"

However, it does not stay up with "service nomad restart": it starts, then dies. Will investigate some more.
Comment 7 Bretton Vine 2024-07-13 13:12:30 UTC
(In reply to Bretton Vine from comment #6)

root@server:~/bin # service nomad restart
/usr/local/etc/rc.d/nomad: DEBUG: pid file (/var/run/nomad.pid): not readable.
/usr/local/etc/rc.d/nomad: DEBUG: checkyesno: nomad_enable is set to YES.
/usr/local/etc/rc.d/nomad: DEBUG: pid file (/var/run/nomad.pid): not readable.
/usr/local/etc/rc.d/nomad: DEBUG: checkyesno: nomad_enable is set to YES.
nomad not running? (check /var/run/nomad.pid).
/usr/local/etc/rc.d/nomad: DEBUG: pid file (/var/run/nomad.pid): not readable.
/usr/local/etc/rc.d/nomad: DEBUG: checkyesno: nomad_enable is set to YES.
/usr/local/etc/rc.d/nomad: DEBUG: run_rc_command: start_precmd: nomad_startprecmd 
Starting nomad.
/usr/local/etc/rc.d/nomad: DEBUG: run_rc_command: doit:  limits -C daemon  env PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/sbin:/bin  /usr/sbin/daemon  -T nomad -f -t nomad -p /var/run/nomad.pid /usr/bin/env PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/sbin:/bin /usr/local/bin/nomad agent -config=/usr/local/etc/nomad/client.hcl -network-interface=nomad-pseudo

I get this error when I run:

/usr/local/bin/nomad agent -config=/usr/local/etc/nomad/client.hcl -network-interface=nomad-pseudo

==> Failed to parse network-interface: invalid interface name "nomad-pseudo"
Comment 8 Michael Gmelin freebsd_committer freebsd_triage 2024-07-13 13:21:26 UTC
(In reply to Bretton Vine from comment #7)

I don't know the exact context you're starting this in, but:

1. The interface passed to "-network-interface" has to exist. In my setup, nomad-pseudo is a cloned loopback interface that got renamed[0]. Not sure how you're using nomad-pseudo in your setup.
2. The access denied issues are probably due to nomad no longer running as root on a host where it previously ran as root. As you need to run nomad as root anyway to use pot, make sure to set nomad_user=root in /etc/rc.conf.


[0] Something like this in our setup:
ifconfig lo create \
  inet 10.20.20.210/24 \
  name nomad-pseudo fib 1
Comment 9 Bretton Vine 2024-07-13 13:53:56 UTC
(In reply to Michael Gmelin from comment #8)

Ah, I'm not using nomad-pseudo at all; it was still in my rc.conf from an earlier bug report. My bad.

Restart works normally with the patch.
Comment 10 Michael Gmelin freebsd_committer freebsd_triage 2024-07-13 14:17:41 UTC
@jhixson I think this should land ASAP, as it's breaking nomad deployments that use plugins. Thanks!
Comment 11 John Hixson freebsd_committer freebsd_triage 2024-07-13 18:18:13 UTC
(In reply to Michael Gmelin from comment #10)

I should have this updated today. I appreciate the debugging, patch, and heads up. Thanks!
Comment 12 commit-hook freebsd_committer freebsd_triage 2024-07-13 21:30:22 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=d799c7d268ff351136575e6f53b4a7d5420bc3b8

commit d799c7d268ff351136575e6f53b4a7d5420bc3b8
Author:     John Hixson <jhixson@FreeBSD.org>
AuthorDate: 2024-07-13 21:15:57 +0000
Commit:     John Hixson <jhixson@FreeBSD.org>
CommitDate: 2024-07-13 21:28:59 +0000

    sysutils/nomad: Fix crashing on startup

    PR: 280256
    Reported by: Bretton Vine <bv@honeyguide.eu>
    Obtained from: Michael Gremlin <grembo@FreeBSD.org>

 sysutils/nomad/Makefile | 6 +++++-
 sysutils/nomad/distinfo | 6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)
Comment 13 John Hixson freebsd_committer freebsd_triage 2024-07-13 21:30:49 UTC
Port has been updated. Thanks!
Comment 14 Michael Gmelin freebsd_committer freebsd_triage 2024-08-02 14:03:35 UTC
Hi John,

Could you MFC this to quarterly, or do you mind if I do?

Thanks
Michael
Comment 15 commit-hook freebsd_committer freebsd_triage 2024-08-08 12:54:47 UTC
A commit in branch 2024Q3 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=8bd599c5b2b06dda2f72015f8fc2bacac9e640a0

commit 8bd599c5b2b06dda2f72015f8fc2bacac9e640a0
Author:     John Hixson <jhixson@FreeBSD.org>
AuthorDate: 2024-07-13 21:15:57 +0000
Commit:     Michael Gmelin <grembo@FreeBSD.org>
CommitDate: 2024-08-08 12:52:03 +0000

    sysutils/nomad: Fix crashing on startup

    PR: 280256
    Reported by: Bretton Vine <bv@honeyguide.eu>
    Obtained from: Michael Gremlin <grembo@FreeBSD.org>

    (cherry picked from commit d799c7d268ff351136575e6f53b4a7d5420bc3b8)

 sysutils/nomad/Makefile | 6 +++++-
 sysutils/nomad/distinfo | 6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)
Comment 16 Michael Gmelin freebsd_committer freebsd_triage 2024-08-08 12:57:14 UTC
@John I took the liberty to cherry-pick it myself to unbreak the port on quarterly.