Bug 221035 - prometheus sysctl exporter asserts with nvidia driver loaded
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Ed Schouten
Reported: 2017-07-26 19:37 UTC by Nikolai Lifanov
Modified: 2017-07-29 21:04 UTC (History)
Attachments
without nvidia.ko (98.07 KB, text/plain)
2017-07-28 17:58 UTC, Nikolai Lifanov
with nvidia.ko (99.00 KB, text/plain)
2017-07-28 17:58 UTC, Nikolai Lifanov

Description Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-26 19:37:01 UTC
I get this when trying to use prometheus_sysctl_exporter with nvidia driver loaded:

Assertion failed: (name[strspn(name, "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "0123456789_")] == '\0'), function oidname_print, file /usr/src/usr.sbin/prometheus_sysctl_exporter/prometheus_sysctl_exporter.c, line 390.
Comment 1 Ed Schouten freebsd_committer freebsd_triage 2017-07-26 20:37:11 UTC
Thanks for reporting!

I suspect that the nvidia kernel module (which I don't use myself, unfortunately) exports a sysctl that has an unusual character (e.g., a dash) in its name. Such sysctls cannot be exported as proper metrics.

Could you send me the output of 'sysctl -a' prior to and after loading nvidia.ko? If you're not comfortable with sharing all of that info, the diff between the output is sufficient. Thanks!
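
For illustration, the check behind the assertion can be reproduced in a small standalone helper. This is only a sketch: the character set is copied from the assertion message, but the function names metric_name_is_valid() and metric_name_first_invalid() are hypothetical and do not come from prometheus_sysctl_exporter.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Character set taken from the assertion message in this report. */
static const char metric_name_chars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_";

/*
 * Hypothetical helper: returns true if every character of name is in
 * the allowed set. strspn() returns the length of the leading run of
 * allowed characters, so the name is valid exactly when that run
 * reaches the terminating NUL.
 */
static bool
metric_name_is_valid(const char *name)
{
    return (name[strspn(name, metric_name_chars)] == '\0');
}

/*
 * Hypothetical helper: returns a pointer to the first disallowed
 * character, or NULL if the whole name is valid.
 */
static const char *
metric_name_first_invalid(const char *name)
{
    const char *p = name + strspn(name, metric_name_chars);

    return (*p == '\0' ? NULL : p);
}
```

With this check, a name component containing a dash or a percent sign is rejected at the offending character, which is the condition the assertion in oidname_print trips on.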
Comment 2 Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-27 16:30:37 UTC
Actually, it doesn't seem to be nvidia at all:

# sysctl -Na | grep -vE '^([a-z]|[A-Z]|[0-9]|_|\.|\%)+$'
kern.timecounter.tc.ACPI-fast.quality
kern.timecounter.tc.ACPI-fast.frequency
kern.timecounter.tc.ACPI-fast.counter
kern.timecounter.tc.ACPI-fast.mask
kern.timecounter.tc.TSC-low.quality
kern.timecounter.tc.TSC-low.frequency
kern.timecounter.tc.TSC-low.counter
kern.timecounter.tc.TSC-low.mask
Comment 3 Ed Schouten freebsd_committer freebsd_triage 2017-07-28 06:56:16 UTC
Yes, but for those specific metrics we already have hints in place to map the timer names (e.g., "ACPI-fast") into labels, meaning they are allowed to contain dashes.

That said, could you please attach a diff between 'sysctl -a' output before/after loading nvidia.ko?
Comment 4 Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-28 17:56:07 UTC
I just tested it without nvidia kernel module loaded and it's still failing:
$ prometheus_sysctl_exporter -dgh                                              
Assertion failed: (name[strspn(name, "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "0123456789_")] == '\0'), function oidname_print, file /usr/src/usr.sbin/prometheus_sysctl_exporter/prometheus_sysctl_exporter.c, line 390.
Abort trap (core dumped) 

I'm going to attach sysctl -Na.
Comment 5 Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-28 17:58:09 UTC
Created attachment 184807
without nvidia.ko
Comment 6 Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-28 17:58:35 UTC
Created attachment 184808
with nvidia.ko
Comment 7 Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-28 17:58:56 UTC
I made a mistake previously: it fails both with and without nvidia module loaded.
Comment 8 Ed Schouten freebsd_committer freebsd_triage 2017-07-28 22:35:14 UTC
Hmmm... Odd. Looking at the sysctl output, I can't think of any sysctls that would cause this.

You're running "prometheus_sysctl_exporter -dgh". The disadvantage of the -g and -h flags is that they cause prometheus_sysctl_exporter to buffer its output before printing it. If you run "prometheus_sysctl_exporter -d" instead, it should still crash, but you will then see the name of the sysctl entry right before the one causing the crash. Could you please give me the output of that?
Comment 9 Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-29 00:36:50 UTC
Here are the last few lines of just -d:

# HELP sysctl_kstat_zfs_misc_zio_trim_bytes Number of bytes successfully TRIMmed
sysctl_kstat_zfs_misc_zio_trim_bytes 0
sysctl_kstat_zfs_misc_metaslab_trace_stats_metaslab_trace_over_limit 0
Assertion failed: (name[strspn(name, "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "0123456789_")] == '\0'), function oidname_print, file /usr/src/usr.sbin/prometheus_sysctl_exporter/prometheus_sysctl_exporter.c, line 390.
sysctl_dev_umsAbort trap (core dumped)
Comment 10 Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-29 00:38:38 UTC
This is the output from the one just after this one:

$ sysctl hptmv.status
hptmv.status: RocketRAID 18xx SATA Controller driver Version v1.16
Comment 11 Ed Schouten freebsd_committer freebsd_triage 2017-07-29 08:36:18 UTC
Thanks for pasting the output. That was very helpful. In your case, it tried to export dev.${driver}.${index}.%domain, which fails due to the % being present. I've just committed a fix to convert such characters to underscores.

Can you let me know whether >=r321678 works for you?
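
As a rough sketch of what "convert such characters to underscores" means, the helper below rewrites every character outside [A-Za-z0-9_] to an underscore. It is not the committed change from r321678: the function name metric_name_sanitize() and its buffer-based interface are assumptions made for illustration, and the input string used in the usage note is a made-up instance of dev.${driver}.${index}.%domain.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy in to out, replacing every character that
 * is not an ASCII letter, digit or underscore with '_'. At most
 * outlen - 1 characters are copied and the result is always
 * NUL-terminated (if outlen is non-zero).
 */
static void
metric_name_sanitize(const char *in, char *out, size_t outlen)
{
    size_t i;

    if (outlen == 0)
        return;
    for (i = 0; in[i] != '\0' && i + 1 < outlen; i++) {
        char c = in[i];

        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9') || c == '_')
            out[i] = c;
        else
            out[i] = '_';
    }
    out[i] = '\0';
}
```

Applied to an illustrative name such as "dev.ums.0.%domain", this yields "dev_ums_0__domain", which contains only characters accepted by the assertion.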
Comment 12 Nikolai Lifanov freebsd_committer freebsd_triage 2017-07-29 15:01:32 UTC
It works for me now. Thank you!
Comment 13 Ed Schouten freebsd_committer freebsd_triage 2017-07-29 21:04:33 UTC
Awesome! Enjoy!