Bug 201953 - Auditdistd does not recover from TLS errors and just stops
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 10.2-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2015-07-28 23:20 UTC by Peter Wemm
Modified: 2019-01-21 17:59 UTC

Description Peter Wemm 2015-07-28 23:20:44 UTC
Auditdistd does not handle transient errors gracefully when TLS is involved. Without TLS, it retries the connection. With a tls:// server, it just stops.

The last log messages are:

Jul 28 01:03:33 stream auditdistd[36170]: [audit.xxx] (sender) Unable to receive reply: Operation timed out.
Jul 28 01:03:33 stream auditdistd[36170]: [audit.xxx] (sender) Disconnected from tls://audit.xxx:7878.

After that the sender makes no further attempts, and /var/audit/dist starts filling up.
Comment 1 Peter Wemm 2016-07-24 05:33:59 UTC
For what it's worth, we still see this every now and then.
Comment 2 Robert Watson 2016-07-24 08:00:40 UTC
Just to cross reference the two sets of bug reports:

https://github.com/openbsm/openbsm/issues/3
https://github.com/openbsm/openbsm/issues/2

Filed by brueffer.
Comment 3 commit-hook 2018-10-04 05:55:56 UTC
A commit references this bug:

Author: pjd
Date: Thu Oct  4 05:54:58 UTC 2018
New revision: 339177
URL: https://svnweb.freebsd.org/changeset/base/339177

Log:
  When the adist_free list is empty and we lose connection to the receiver we
  move all elements from the adist_send and adist_recv lists back onto the
  adist_free list, but we don't wake consumers waitings for the adist_free list
  to become non-empty. This can lead to the sender process stopping audit trail
  files distribution and waiting forever.

  Fix the problem by adding the missing wakeup.

  While here slow down spinning on CPU in case of a short race in
  sender_disconnect() and add an explaination when it can occur.

  PR:		201953
  Reported by:	peter
  Approved by:	re (kib)

Changes:
  head/contrib/openbsm/bin/auditdistd/auditdistd.h
  head/contrib/openbsm/bin/auditdistd/sender.c
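
The missing-wakeup pattern described in the log above can be sketched roughly as follows. This is a minimal illustration using POSIX threads, not the actual change to sender.c: struct queues, queues_init(), free_take() and on_disconnect() are hypothetical names invented for this sketch, and only the three list names echo adist_free, adist_send and adist_recv from the commit log.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the queue elements. */
struct item {
	struct item *next;
};

/* Hypothetical container; the real code uses separate globals. */
struct queues {
	pthread_mutex_t lock;
	pthread_cond_t free_cond;	/* signalled when free_list gains items */
	struct item *free_list;		/* cf. adist_free */
	struct item *send_list;		/* cf. adist_send */
	struct item *recv_list;		/* cf. adist_recv */
};

static void
queues_init(struct queues *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->free_cond, NULL);
	q->free_list = q->send_list = q->recv_list = NULL;
}

/* Move every element of *src onto *dst; return how many were moved. */
static size_t
list_move_all(struct item **dst, struct item **src)
{
	size_t n = 0;

	while (*src != NULL) {
		struct item *it = *src;

		*src = it->next;
		it->next = *dst;
		*dst = it;
		n++;
	}
	return (n);
}

/* Consumer side: sleep until a free element is available, then take it. */
static struct item *
free_take(struct queues *q)
{
	struct item *it;

	pthread_mutex_lock(&q->lock);
	while (q->free_list == NULL)
		pthread_cond_wait(&q->free_cond, &q->lock);
	assert(q->free_list != NULL);
	it = q->free_list;
	q->free_list = it->next;
	it->next = NULL;
	pthread_mutex_unlock(&q->lock);
	return (it);
}

/* Disconnect path: recycle in-flight elements and wake any waiters. */
static void
on_disconnect(struct queues *q)
{
	size_t moved;

	pthread_mutex_lock(&q->lock);
	moved = list_move_all(&q->free_list, &q->send_list);
	moved += list_move_all(&q->free_list, &q->recv_list);
	/*
	 * Without this broadcast, a thread already sleeping in free_take()
	 * would never notice that free_list is non-empty again.
	 */
	if (moved > 0)
		pthread_cond_broadcast(&q->free_cond);
	pthread_mutex_unlock(&q->lock);
}
```

In this sketch, a thread blocked in free_take() while the free list is empty can only proceed if on_disconnect() signals the condition variable after refilling the list; dropping the pthread_cond_broadcast() call reproduces the "waiting forever" behaviour the log describes.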
Comment 4 Oleksandr Tymoshenko 2019-01-21 17:59:15 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.

Thanks