Bug 234442

Summary: libnetgraph race condition
Product: Base System Reporter: Eugene Grosbein <eugen>
Component: binAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Some People CC: avg, glebius, kib, markj, mav, net
Priority: --- Flags: koobs: mfc-stable12+
koobs: mfc-stable11+
Version: 11.2-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
fight race with mutex
none
atomically increment gMsgId none

Description Eugene Grosbein freebsd_committer freebsd_triage 2018-12-27 11:25:47 UTC
Created attachment 200557 [details]
fight race with mutex

Hi!

lib/libnetgraph/msg.c defines static int gMsgId and public functions NgSendMsg() and NgSendAsciiMsg() that both increment gMsgId in racy way in attempt to produce unique id for a request sent over AF_NETGRAPH socket.

For long-lived multi-threaded application like net/mpd5 daemon: first thread can increase gMsgId upto INT_MAX and next moment another thread can increate gMsgId again to become -1. Then it is copied to unsigned msg.header.token and returned as signed integer. This means false error status returned with errno==0 and this breaks workflow of the daemon. I get this problem "in wild" from time to time.

I have very straightforward and naive patch protecting the variable with simple mutex (attached) but it has its penalty for performance.

Usage of atomic operations should be better approach but I'm not familiar with FreeBSD atomic operations. Any help will be appreciated.
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2019-01-09 16:55:12 UTC
Created attachment 200960 [details]
atomically increment gMsgId

The attached patch uses C11 atomics in an attempt to fix the problem.  It even compiles and appears to generate the intended code with gcc 4.2.
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2019-01-09 16:58:49 UTC
(In reply to Mark Johnston from comment #1)
Note, the patch permits a msgid of 0 instead of starting from 1.  The netgraph(3) man page says that tokens only need to be non-negative, so I think this is fine, but I did not test it.
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2019-05-09 22:12:39 UTC
Ping? Did you try testing the patch?
Comment 4 Eugene Grosbein freebsd_committer freebsd_triage 2019-05-10 07:47:27 UTC
(In reply to Mark Johnston from comment #3)

Yes, thanks. Due to the nature of the race it's hard to be exact but at least the change does not make it worse. And the problem has not repeated yet with patch applied.
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2019-05-10 16:32:43 UTC
(In reply to Eugene Grosbein from comment #4)
Thanks.  I will commit the change then, with a long MFC timeout.
Comment 6 commit-hook freebsd_committer freebsd_triage 2019-05-10 16:44:05 UTC
A commit references this bug:

Author: markj
Date: Fri May 10 16:43:47 UTC 2019
New revision: 347439
URL: https://svnweb.freebsd.org/changeset/base/347439

Log:
  Atomically update the global gMsgId in libnetgraph.

  Otherwise concurrently running threads may inadvertently use the same
  token for different messages.

  Preserve the behaviour of disallowing negative message tokens, but allow
  a message token value of zero since this simplifies the code a bit and
  tokens are documented to be non-negative.

  PR:		234442
  Reported and tested by:	eugen
  MFC after:	1 month
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/lib/libnetgraph/msg.c
Comment 7 commit-hook freebsd_committer freebsd_triage 2019-06-09 03:31:29 UTC
A commit references this bug:

Author: markj
Date: Sun Jun  9 03:31:08 UTC 2019
New revision: 348827
URL: https://svnweb.freebsd.org/changeset/base/348827

Log:
  MFC r347439:
  Atomically update the global gMsgId in libnetgraph.

  PR:	234442

Changes:
_U  stable/12/
  stable/12/lib/libnetgraph/msg.c
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2019-06-17 16:05:53 UTC
I don't have the means to test this on stable/11, but please feel free to MFC the change there if you can verify that it works.
Comment 9 commit-hook freebsd_committer freebsd_triage 2019-10-11 18:05:56 UTC
A commit references this bug:

Author: eugen
Date: Fri Oct 11 18:05:06 UTC 2019
New revision: 353445
URL: https://svnweb.freebsd.org/changeset/base/353445

Log:
  MFC r347439 by markj: Atomically update the global gMsgId in libnetgraph.

  Otherwise concurrently running threads may inadvertently use the same
  token for different messages.

  Preserve the behaviour of disallowing negative message tokens, but allow
  a message token value of zero since this simplifies the code a bit and
  tokens are documented to be non-negative.

  PR:		234442

Changes:
_U  stable/11/
  stable/11/lib/libnetgraph/msg.c
Comment 10 Kubilay Kocak freebsd_committer freebsd_triage 2019-10-13 21:54:45 UTC
^Triage: Track MFC's