Bug 200139 - Auditdistd suddenly stops working and leaves untransmitted files.
Summary: Auditdistd suddenly stops working and leaves untransmitted files.
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-12 03:08 UTC by Peter Wemm
Modified: 2019-01-21 17:53 UTC (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Wemm freebsd_committer freebsd_triage 2015-05-12 03:08:18 UTC
I'm not sure why, but we are routinely getting an auditidistd failure in the freebsd.org cluster.

A couple of log samples and data points..

root@ref11-amd64:~ # ls -l /var/audit/dist/
total 262049
-r--r-----  2 root  audit  10535025 May 11 23:00 20150511225958.20150511230009
-r--r-----  2 root  audit  11280061 May 11 23:23 20150511232340.20150511232357
-r--r-----  2 root  audit  10521949 May 11 23:24 20150511232417.20150511232437
-r--r-----  2 root  audit  15688526 May 11 23:25 20150511232533.20150511232553
-r--r-----  2 root  audit  15477527 May 11 23:27 20150511232646.20150511232704
-r--r-----  2 root  audit  13751547 May 11 23:27 20150511232741.20150511232753
-r--r-----  2 root  audit   9764920 May 11 23:28 20150511232840.20150511232849
-r--r-----  2 root  audit  10504847 May 11 23:29 20150511232904.20150511232912
-r--r-----  2 root  audit   9942299 May 11 23:30 20150511233004.20150511233010
-r--r-----  2 root  audit  23953683 May 11 23:30 20150511233010.20150511233025
-r--r-----  2 root  audit  10894790 May 11 23:30 20150511233031.20150511233038
-r--r-----  2 root  audit   7714981 May 11 23:31 20150511233056.20150511233101
-r--r-----  2 root  audit   7938553 May 11 23:31 20150511233138.20150511233145
-r--r-----  2 root  audit  12498960 May 11 23:31 20150511233145.20150511233156
-r--r-----  2 root  audit  10511961 May 11 23:33 20150511233258.20150511233308
-r--r-----  2 root  audit  11620541 May 11 23:33 20150511233308.20150511233319
-r--r-----  2 root  audit  10507759 May 11 23:33 20150511233343.20150511233352
-r--r-----  2 root  audit  12708772 May 11 23:35 20150511233446.20150511233500
-r--r-----  2 root  audit  10542898 May 11 23:35 20150511233524.20150511233538
-r--r-----  2 root  audit  10564417 May 11 23:36 20150511233641.20150511233655
-r--r-----  2 root  audit  10499802 May 11 23:37 20150511233728.20150511233739
-r--r-----  2 root  audit  10571378 May 11 23:38 20150511233809.20150511233818
-r--r-----  2 root  audit   8769659 May 12 02:59 20150512011254.not_terminated

root@ref11-amd64:~ # grep 20150511233809.20150511233818 /var/log/messages
root@ref11-amd64:~ # bzgrep 20150511233809.20150511233818 /var/log/messages*
/var/log/messages.0.bz2:May 11 23:38:24 ref11-amd64 auditd[797]: renamed /var/audit/20150511233809.not_terminated to /var/audit/20150511233809.20150511233818
/var/log/messages.0.bz2:May 11 23:38:24 ref11-amd64 root: audit warning: closefile /var/audit/20150511233809.20150511233818


root@ref11-amd64:~ # bzgrep 20150511233809.20150511233818 /var/log/messages*
/var/log/messages.0.bz2:May 11 23:38:24 ref11-amd64 auditd[797]: renamed /var/audit/20150511233809.not_terminated to /var/audit/20150511233809.20150511233818
/var/log/messages.0.bz2:May 11 23:38:24 ref11-amd64 root: audit warning: closefile /var/audit/20150511233809.20150511233818

root@ref11-amd64:~ # bzgrep auditdist /var/log/messages*
/var/log/messages.0.bz2:May 11 23:00:16 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511225958.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:24:15 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511232340.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:24:51 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511232417.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:26:10 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511232533.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:27:18 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511232646.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:28:07 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511232741.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:29:02 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511232840.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:29:35 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511232904.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:30:22 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233004.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:30:28 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233010.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:30:52 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233031.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:31:08 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233056.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:31:54 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233138.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:32:00 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233145.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:33:18 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233258.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:33:24 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233308.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:33:58 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233343.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:35:08 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233446.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:35:49 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233524.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:37:02 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233641.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:37:44 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233728.not_terminated", skipping: No such file or directory.
/var/log/messages.0.bz2:May 11 23:38:24 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511233809.not_terminated", skipping: No such file or directory.

This is the most telling:

/var/log/messages.0.bz2:May 11 23:00:05 ref11-amd64 auditd[797]: renamed /var/audit/20150511225947.not_terminated to /var/audit/20150511225947.20150511225958
/var/log/messages.0.bz2:May 11 23:00:05 ref11-amd64 auditd[797]: New audit file is /var/audit/20150511225958.not_terminated
/var/log/messages.0.bz2:May 11 23:00:05 ref11-amd64 root: audit warning: closefile /var/audit/20150511225947.20150511225958
/var/log/messages.0.bz2:May 11 23:00:15 ref11-amd64 auditd[797]: renamed /var/audit/20150511225958.not_terminated to /var/audit/20150511225958.20150511230009
/var/log/messages.0.bz2:May 11 23:00:16 ref11-amd64 root: audit warning: closefile /var/audit/20150511225958.20150511230009
/var/log/messages.0.bz2:May 11 23:00:16 ref11-amd64 auditdistd[84495]: [audit.ysv.FreeBSD.org] (sender) Unable to open file "/var/audit/dist/20150511225958.not_terminated", skipping: No such file or directory.

It looks like when it loses a rename race (under high loads), it ignores that audit chunk forever.
Comment 1 Alan Somers freebsd_committer freebsd_triage 2018-06-11 16:22:39 UTC
Is this still happening?
Comment 2 commit-hook freebsd_committer freebsd_triage 2018-10-04 05:48:49 UTC
A commit references this bug:

Author: pjd
Date: Thu Oct  4 05:48:10 UTC 2018
New revision: 339176
URL: https://svnweb.freebsd.org/changeset/base/339176

Log:
  When we look for a new trail file there might be a race between find trail
  file name and opening it. This race was not properly handled, because we were
  copying new name before checking for openat(2) error and when we were trying
  again we were starting with the next trail file. This could result in skipping
  distribution of such a trail file.

  Fix this problem by checking for ENOENT first (only for .not_terminated files)
  and then updating (or not) tr_filename before restarting the search.

  PR:		200139
  Reported by:	peter
  Approved by:	re (kib)

Changes:
  head/contrib/openbsm/bin/auditdistd/trail.c
Comment 3 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-21 17:53:39 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.

Thanks