Bug 234442 - libnetgraph race condition
Summary: libnetgraph race condition
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 11.2-STABLE
Hardware: Any Any
: --- Affects Some People
Assignee: Mark Johnston
Depends on:
Reported: 2018-12-27 11:25 UTC by Eugene Grosbein
Modified: 2019-06-17 16:05 UTC (History)
6 users (show)

See Also:

fight race with mutex (1.55 KB, patch)
2018-12-27 11:25 UTC, Eugene Grosbein
no flags Details | Diff
atomically increment gMsgId (1.40 KB, patch)
2019-01-09 16:55 UTC, Mark Johnston
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eugene Grosbein freebsd_committer 2018-12-27 11:25:47 UTC
Created attachment 200557 [details]
fight race with mutex


lib/libnetgraph/msg.c defines static int gMsgId and public functions NgSendMsg() and NgSendAsciiMsg() that both increment gMsgId in racy way in attempt to produce unique id for a request sent over AF_NETGRAPH socket.

For long-lived multi-threaded application like net/mpd5 daemon: first thread can increase gMsgId upto INT_MAX and next moment another thread can increate gMsgId again to become -1. Then it is copied to unsigned msg.header.token and returned as signed integer. This means false error status returned with errno==0 and this breaks workflow of the daemon. I get this problem "in wild" from time to time.

I have very straightforward and naive patch protecting the variable with simple mutex (attached) but it has its penalty for performance.

Usage of atomic operations should be better approach but I'm not familiar with FreeBSD atomic operations. Any help will be appreciated.
Comment 1 Mark Johnston freebsd_committer 2019-01-09 16:55:12 UTC
Created attachment 200960 [details]
atomically increment gMsgId

The attached patch uses C11 atomics in an attempt to fix the problem.  It even compiles and appears to generate the intended code with gcc 4.2.
Comment 2 Mark Johnston freebsd_committer 2019-01-09 16:58:49 UTC
(In reply to Mark Johnston from comment #1)
Note, the patch permits a msgid of 0 instead of starting from 1.  The netgraph(3) man page says that tokens only need to be non-negative, so I think this is fine, but I did not test it.
Comment 3 Mark Johnston freebsd_committer 2019-05-09 22:12:39 UTC
Ping? Did you try testing the patch?
Comment 4 Eugene Grosbein freebsd_committer 2019-05-10 07:47:27 UTC
(In reply to Mark Johnston from comment #3)

Yes, thanks. Due to the nature of the race it's hard to be exact but at least the change does not make it worse. And the problem has not repeated yet with patch applied.
Comment 5 Mark Johnston freebsd_committer 2019-05-10 16:32:43 UTC
(In reply to Eugene Grosbein from comment #4)
Thanks.  I will commit the change then, with a long MFC timeout.
Comment 6 commit-hook freebsd_committer 2019-05-10 16:44:05 UTC
A commit references this bug:

Author: markj
Date: Fri May 10 16:43:47 UTC 2019
New revision: 347439
URL: https://svnweb.freebsd.org/changeset/base/347439

  Atomically update the global gMsgId in libnetgraph.

  Otherwise concurrently running threads may inadvertently use the same
  token for different messages.

  Preserve the behaviour of disallowing negative message tokens, but allow
  a message token value of zero since this simplifies the code a bit and
  tokens are documented to be non-negative.

  PR:		234442
  Reported and tested by:	eugen
  MFC after:	1 month
  Sponsored by:	The FreeBSD Foundation

Comment 7 commit-hook freebsd_committer 2019-06-09 03:31:29 UTC
A commit references this bug:

Author: markj
Date: Sun Jun  9 03:31:08 UTC 2019
New revision: 348827
URL: https://svnweb.freebsd.org/changeset/base/348827

  MFC r347439:
  Atomically update the global gMsgId in libnetgraph.

  PR:	234442

_U  stable/12/
Comment 8 Mark Johnston freebsd_committer 2019-06-17 16:05:53 UTC
I don't have the means to test this on stable/11, but please feel free to MFC the change there if you can verify that it works.