Bug 259607 - prometheus_sysctl_exporter: Need better encoding support for sysctl OIDs
Summary: prometheus_sysctl_exporter: Need better encoding support for sysctl OIDs
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Xin LI
URL:
Keywords:
Duplicates: 253862
Depends on:
Blocks:
 
Reported: 2021-11-02 06:36 UTC by Xin LI
Modified: 2022-08-20 01:11 UTC (History)
CC List: 7 users

See Also:
asomers: mfc-stable13+
asomers: mfc-stable12-


Description Xin LI freebsd_committer freebsd_triage 2021-11-02 06:36:24 UTC
node_exporter is spamming syslog with:

level=error ts=2021-11-02T06:18:08.285Z caller=stdlib.go:105 msg="error gathering metrics: 11 error(s) occurred:\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_l2arc_write_boost\" { untyped:<value:8.388608e+06 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_arc_max\" { untyped:<value:1.2884901888e+10 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_l2arc_feed_min_ms\" { untyped:<value:200 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vm_uma_tcp_log_bucket_size\" { untyped:<value:30 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_l2arc_write_max\" { untyped:<value:8.388608e+06 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_l2arc_feed_again\" { untyped:<value:1 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_l2arc_norw\" { untyped:<value:0 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_l2arc_noprefetch\" { untyped:<value:1 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_arc_min\" { untyped:<value:0 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_l2arc_feed_secs\" { untyped:<value:1 > } was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"sysctl_vfs_zfs_l2arc_headroom\" { untyped:<value:2 > } was collected before with the same name and label values"

The problem is that some ZFS values are exported from two OIDs, for example:

vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc.feed_min_ms: 200

But node_exporter is aliasing "." to "_" unconditionally.
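A minimal sketch of the collision, assuming the exporter simply prefixes "sysctl_" and maps every "." to "_" (naive_metric_name is a hypothetical helper written for illustration, not code taken from the exporter):

```sh
#!/bin/sh
# Hypothetical helper: mimics the unconditional "." -> "_" mapping.
naive_metric_name() {
    printf 'sysctl_%s\n' "$1" | tr . _
}

# Two distinct OIDs end up with the same metric name.
naive_metric_name vfs.zfs.l2arc_feed_min_ms   # sysctl_vfs_zfs_l2arc_feed_min_ms
naive_metric_name vfs.zfs.l2arc.feed_min_ms   # sysctl_vfs_zfs_l2arc_feed_min_ms
```

Because both OIDs yield one name with no distinguishing labels, the second sample is reported as "collected before with the same name and label values".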

To get a list of all affected sysctl OIDs, one can use:

sysctl -da | grep -E \("$(sysctl -Na | sed -e s,\\.,_,g | sort | uniq -c | sort -n | awk '{ if ($1 >1) print $2; }' | sed -e s,_,.,g | paste -sd \| -)"\):

and on FreeBSD 13.0-RELEASE-p4, I got:

vm.uma.tcp_log_bucket.size: Allocation size
vm.uma.tcp_log.bucket_size: Desired per-cpu cache size
vfs.zfs.arc_max: max arc size (LEGACY)
vfs.zfs.arc_min: min arc size (LEGACY)
vfs.zfs.l2arc_norw: no reads during writes (LEGACY)
vfs.zfs.l2arc_feed_again: turbo warmup (LEGACY)
vfs.zfs.l2arc_noprefetch: don't cache prefetch bufs (LEGACY)
vfs.zfs.l2arc_feed_min_ms: min interval milliseconds (LEGACY)
vfs.zfs.l2arc_feed_secs: interval seconds (LEGACY)
vfs.zfs.l2arc_headroom: number of dev writes (LEGACY)
vfs.zfs.l2arc_write_boost: extra write during warmup (LEGACY)
vfs.zfs.l2arc_write_max: max write size (LEGACY)
vfs.zfs.l2arc.norw: No reads during writes
vfs.zfs.l2arc.feed_again: Turbo L2ARC warmup
vfs.zfs.l2arc.noprefetch: Skip caching prefetched buffers
vfs.zfs.l2arc.feed_min_ms: Min feed interval in milliseconds
vfs.zfs.l2arc.feed_secs: Seconds between L2ARC writing
vfs.zfs.l2arc.headroom: Number of max device writes to precache
vfs.zfs.l2arc.write_boost: Extra write bytes during device warmup
vfs.zfs.l2arc.write_max: Max write bytes per interval
vfs.zfs.arc.max: Max arc size
vfs.zfs.arc.min: Min arc size
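
For readers who find the one-liner hard to follow, here is a rough awk-based sketch of the same idea that groups OID names by their "."-to-"_" form and prints only the groups with more than one member. find_aliases is a hypothetical helper name; it reads one OID name per line on stdin.

```sh
#!/bin/sh
# Hypothetical helper: report OID names that collapse to the same key
# once every "." is replaced by "_".
find_aliases() {
    awk '{
        key = $0
        gsub(/\./, "_", key)              # same mapping the exporter applies
        count[key]++
        names[key] = names[key] " " $0    # remember the original spellings
    }
    END {
        for (k in count)
            if (count[k] > 1)
                print k ":" names[k]
    }'
}

# Sample input; prints "vfs_zfs_arc_max: vfs.zfs.arc_max vfs.zfs.arc.max"
printf '%s\n' vfs.zfs.arc_max vfs.zfs.arc.max kern.ostype | find_aliases
```

On a FreeBSD host it could presumably be fed with `sysctl -Na | find_aliases`; the order in which groups are printed is unspecified.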

So it's not just ZFS.  I *think* the proper fix should be to change the translation code to first replace all _ with __, then replace . with _.
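
The proposed translation can be sketched in shell as follows. This only illustrates the idea; encode_oid is a hypothetical helper, and the "sysctl_" prefix is taken from the metric names in the log above.

```sh
#!/bin/sh
# Hypothetical helper: double every "_" first, then turn "." into "_",
# so a literal underscore and a separator no longer look alike.
encode_oid() {
    printf '%s\n' "$1" | sed -e 's/_/__/g' -e 's/\./_/g' -e 's/^/sysctl_/'
}

encode_oid vfs.zfs.l2arc_feed_min_ms   # sysctl_vfs_zfs_l2arc__feed__min__ms
encode_oid vfs.zfs.l2arc.feed_min_ms   # sysctl_vfs_zfs_l2arc_feed__min__ms
```

The two OIDs now map to different names. One caveat: the scheme is not strictly collision-free, because a component that begins or ends with "_" next to a "." can still alias (for instance, the made-up names "a_.b" and "a._b" both encode to "sysctl_a___b").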
Comment 1 David O'Rourke 2021-11-02 11:56:34 UTC
Hi,

Are these errors actually from the FreeBSD `prometheus_sysctl_exporter`? The `sysctl_` prefix suggests so.

Node Exporter doesn't appear to export all of the sysctls anywhere, and the metrics it does expose for ZFS and other sysctl-based things are specifically selected and named to avoid collisions.

prometheus_sysctl_exporter does appear to show this unfortunate problem:

$ prometheus_sysctl_exporter | grep l2arc | sort
sysctl_vfs_zfs_l2arc_feed_again 1
sysctl_vfs_zfs_l2arc_feed_again 1
sysctl_vfs_zfs_l2arc_feed_min_ms 200
sysctl_vfs_zfs_l2arc_feed_min_ms 200

Are you using the Node Exporter to export these `prometheus_sysctl_exporter` metrics via the --collector.textfile.directory feature, perhaps?

-David
Comment 2 Xin LI freebsd_committer freebsd_triage 2021-11-08 07:06:43 UTC
Thanks, I've created a patch for this at https://reviews.freebsd.org/D32886 .
Comment 3 Alan Somers freebsd_committer freebsd_triage 2022-04-13 17:04:06 UTC
*** Bug 253862 has been marked as a duplicate of this bug. ***
Comment 4 commit-hook freebsd_committer freebsd_triage 2022-04-19 12:57:33 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8c47d8f53854825d8e8591ccd06e32b2c798f81c

commit 8c47d8f53854825d8e8591ccd06e32b2c798f81c
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2022-04-18 21:29:37 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2022-04-19 12:56:39 +0000

    prometheus_sysctl_exporter: fix metric aliasing

    When exporting sysctls to Prometheus, the exporter replaces "." with
    "_".  This caused several metrics to alias, confusing the Prometheus
    server.  Fix it by:

    * Renaming the "tcp_log_bucket" UMA zone to "tcp_log_id_bucket".  Also,
      rename "tcp_log_node" to "tcp_log_id_node" for consistency.

    * Not exporting sysctls with "(LEGACY)" in the description.  That is
      used by ZFS sysctls that have been replaced by others, many of which
      alias to the same Prometheus metric name (like "vfs.zfs.arc_max" and
      "vfs.zfs.arc.max").

    PR:             259607
    Reported by:    delphij
    MFC after:      2 weeks
    Sponsored by:   Axcient
    Reviewed by:    delphij,rew,thj
    Differential Revision: https://reviews.freebsd.org/D34952

 sys/netinet/tcp_log_buf.c                          | 33 +++++++++++-----------
 .../prometheus_sysctl_exporter.c                   | 11 ++++++--
 2 files changed, 26 insertions(+), 18 deletions(-)
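
The "(LEGACY)" rule described in the commit message can be approximated from the shell. The real change is in prometheus_sysctl_exporter.c; the sketch below only filters `name: description` lines of the kind `sysctl -da` prints, and drop_legacy is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: drop every line whose description contains the
# fixed string "(LEGACY)"; -F makes grep treat it literally.
drop_legacy() {
    grep -vF '(LEGACY)'
}

# Sample input in "name: description" form; only the second line survives.
printf '%s\n' \
    'vfs.zfs.arc_max: max arc size (LEGACY)' \
    'vfs.zfs.arc.max: Max arc size' | drop_legacy
```

Something like `sysctl -da | drop_legacy` should then show roughly which ZFS OIDs remain candidates for export, assuming the description text is the only criterion.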
Comment 5 commit-hook freebsd_committer freebsd_triage 2022-05-12 20:41:04 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e4f508d5a211e99dd6179794b51fefa329886be3

commit e4f508d5a211e99dd6179794b51fefa329886be3
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2022-04-18 21:29:37 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2022-05-12 20:40:05 +0000

    prometheus_sysctl_exporter: fix metric aliasing

    When exporting sysctls to Prometheus, the exporter replaces "." with
    "_".  This caused several metrics to alias, confusing the Prometheus
    server.  Fix it by:

    * Renaming the "tcp_log_bucket" UMA zone to "tcp_log_id_bucket".  Also,
      rename "tcp_log_node" to "tcp_log_id_node" for consistency.

    * Not exporting sysctls with "(LEGACY)" in the description.  That is
      used by ZFS sysctls that have been replaced by others, many of which
      alias to the same Prometheus metric name (like "vfs.zfs.arc_max" and
      "vfs.zfs.arc.max").

    PR:             259607
    Reported by:    delphij
    Sponsored by:   Axcient
    Reviewed by:    delphij,rew,thj
    Differential Revision: https://reviews.freebsd.org/D34952

    (cherry picked from commit 8c47d8f53854825d8e8591ccd06e32b2c798f81c)

 sys/netinet/tcp_log_buf.c                          | 33 +++++++++++-----------
 .../prometheus_sysctl_exporter.c                   | 11 ++++++--
 2 files changed, 26 insertions(+), 18 deletions(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2022-08-20 01:11:09 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b524667411e93d1e828bbaca03d96bb8637a5357

commit b524667411e93d1e828bbaca03d96bb8637a5357
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2022-04-18 21:29:37 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2022-08-20 01:03:31 +0000

    prometheus_sysctl_exporter: fix metric aliasing

    When exporting sysctls to Prometheus, the exporter replaces "." with
    "_".  This caused several metrics to alias, confusing the Prometheus
    server.  Fix it by:

    * Renaming the "tcp_log_bucket" UMA zone to "tcp_log_id_bucket".  Also,
      rename "tcp_log_node" to "tcp_log_id_node" for consistency.

    * Not exporting sysctls with "(LEGACY)" in the description.  That is
      used by ZFS sysctls that have been replaced by others, many of which
      alias to the same Prometheus metric name (like "vfs.zfs.arc_max" and
      "vfs.zfs.arc.max").

    PR:             259607
    Reported by:    delphij
    Sponsored by:   Axcient
    Reviewed by:    delphij,rew,thj
    Differential Revision: https://reviews.freebsd.org/D34952

    (cherry picked from commit 8c47d8f53854825d8e8591ccd06e32b2c798f81c)

 sys/netinet/tcp_log_buf.c                          | 33 +++++++++++-----------
 .../prometheus_sysctl_exporter.c                   | 11 ++++++--
 2 files changed, 26 insertions(+), 18 deletions(-)