Bug 171615 - net/ntp-devel: ntpd 4.2.7p303 unexpected dying
net/ntp-devel: ntpd 4.2.7p303 unexpected dying
Status: Closed FIXED
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Latest
Any Any
: Normal Affects Only Me
Assigned To: Cy Schubert
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-09-13 19:20 UTC by ron
Modified: 2012-10-26 09:50 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description ron 2012-09-13 19:20:07 UTC
p303 dies unexpectedly with the following error:

12 Sep 23:40:26 ntpd[83434]: 86.47.187.7 local addr 172.16.2.10 -> <null>
12 Sep 23:40:48 ntpd[83434]: ntp_monitor.c:136: ENSURE(punlinked ==
mon) failed
12 Sep 23:40:48 ntpd[83434]: exiting (due to assertion failure)

The last version installing (p295) has not seen this problem.

Startup seems to be optimal.

13 Sep 11:31:49 ntpd[27478]: ntpd 4.2.7p303@1.2483-o Tue Sep 11
17:25:12 UTC 2012 (1): Starting
13 Sep 11:31:49 ntpd[27479]: proto: precision = 1.676 usec (-19)
13 Sep 11:31:49 ntpd[27479]: Listen and drop on 0 v4wildcard 0.0.0.0:123
13 Sep 11:31:49 ntpd[27479]: Listen and drop on 1 v6wildcard [::]:123
13 Sep 11:31:49 ntpd[27479]: Listen normally on 2 bge0 172.16.2.10:123
13 Sep 11:31:49 ntpd[27479]: Listen normally on 3 lo0 [fe80::1%8]:123
13 Sep 11:31:49 ntpd[27479]: setsockopt IPV6_MULTICAST_IF 0 for
fe80::1%8 fails: Can't assign requested address
13 Sep 11:31:49 ntpd[27479]: Listen normally on 4 lo0 [::1]:123
13 Sep 11:31:49 ntpd[27479]: Listen normally on 5 lo0 127.0.0.1:123
13 Sep 11:31:49 ntpd[27479]: peers refreshed
13 Sep 11:31:49 ntpd[27479]: Listening on routing socket on fd #26 for
interface updates
13 Sep 11:34:40 ntpd[27479]: 83.71.171.237 local addr 172.16.2.10 ->
<null>

This has only happened since I have upgraded from p295 three days ago. p295 was installed for many days and did not have these problems.

p303 was installed two days past and has failed with this same message both times.

How-To-Repeat: I do not know these things as I upgraded the system 2 days ago and all was working optimally before this upgrade.

The dying has occurred under normal operations as a member of ie.pool.ntp.org
Comment 1 Edwin Groothuis freebsd_committer 2012-09-13 20:28:12 UTC
Responsible Changed
From-To: freebsd-ports-bugs->cy

Over to maintainer (via the GNATS Auto Assign Tool)
Comment 2 ron 2012-09-14 09:26:07 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Good Morning,

Colleagues on the ntp:questions mailing list tell me to add these
things to the bug report:

In the "flags" section, please "Request blocking 4.2.8"

I am not seeing how to do this now I have submitted this bug.  Maybe
you can help?

Thank you,

Ron
- -- 
Ron Hahn, CISSP, CISA, ITIL-F
T: +353-76-603-4408 M: +353-86-600-5499 E: ron@hahn.ie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQUuofAAoJEKA8j7vu2JieqbwH/RadUmp/7LOT64CCdZJGmY5z
9MJy4Ite1TeRO3xWIA6/xMSyz+HkTgEOav7Z8mx5xXoJzm6zhigdv4Q886o7WcYn
74GMgpy9ugXRYIv8/1TOv7XZ4/8raDj+8kdkXtuLyj9LwiLzjRzoJUMYd2H/KYgA
ixIISKwKQGPN373vsXPWis1jGjXbU7hlIU18Oho45ORvP4LU1A4sp9JIJ3C+2xC1
Vbpa95IogKbkl1exccsKufQNOtk3ve1tNJyodowzaHj5BSWta5tOw/AFbmDmUl0H
Rqb7XyH/6ezOb5+yl11ZIs1PtILr/KuH/EwnH5CQDAIYTWaUH2pV6Kp6DpKTTNk=
=F3ir
-----END PGP SIGNATURE-----
Comment 3 Cy.Schubert 2012-09-14 17:35:10 UTC
We will update to patch level 304 and see if this fixes the problem.


-- 
Cheers,
Cy Schubert <Cy.Schubert@komquats.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org
Comment 4 Cy Schubert freebsd_committer 2012-09-14 17:49:53 UTC
State Changed
From-To: open->feedback

Sent email to submitter about update to p304. We'll see if it addresses this 
issue.
Comment 5 Cy.Schubert 2012-09-14 17:51:28 UTC
Hi Ron,

Updated to p304. Can you see if this fixes the problem?


-- 
Cheers,
Cy Schubert <Cy.Schubert@komquats.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org
Comment 6 ron 2012-09-14 20:35:56 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 14/09/2012 17:51, Cy Schubert wrote:
> Hi Ron,
> 
> Updated to p304. Can you see if this fixes the problem?
> 
> 
Hallo Cy,

I have updated two systems to p304. time.dhco.org is running with
ie.pool.ntp.org. tock.dhco.org is my internal server at site 2.  I
will keep the close eyes on these for the next days.

/var/log/ntpd.log looks optimal with no startup errors showing:

14 Sep 20:20:52 ntpd[27479]: 86.47.121.251 local addr 172.16.2.10 ->
<null>
14 Sep 20:21:23 ntpd[27479]: 54.248.16.192 local addr 172.16.2.10 ->
<null>
14 Sep 20:24:26 ntpd[27479]: ntpd exiting on signal 15 (Terminated: 15)
14 Sep 20:24:34 ntpd[24923]: ntpd 4.2.7p304@1.2483-o Fri Sep 14
19:23:11 UTC 2012 (1): Starting
14 Sep 20:24:34 ntpd[24924]: proto: precision = 1.676 usec (-19)
14 Sep 20:24:34 ntpd[24924]: Listen and drop on 0 v4wildcard 0.0.0.0:123
14 Sep 20:24:34 ntpd[24924]: Listen and drop on 1 v6wildcard [::]:123
14 Sep 20:24:34 ntpd[24924]: Listen normally on 2 bge0 172.16.2.10:123
14 Sep 20:24:34 ntpd[24924]: Listen normally on 3 lo0 [fe80::1%8]:123
14 Sep 20:24:34 ntpd[24924]: setsockopt IPV6_MULTICAST_IF 0 for
fe80::1%8 fails: Can't assign requested address
14 Sep 20:24:34 ntpd[24924]: Listen normally on 4 lo0 [::1]:123
14 Sep 20:24:34 ntpd[24924]: Listen normally on 5 lo0 127.0.0.1:123
14 Sep 20:24:34 ntpd[24924]: peers refreshed
14 Sep 20:24:34 ntpd[24924]: Listening on routing socket on fd #26 for
interface updates

- -- 
Ron Hahn, CISSP, CISA, ITIL-F
T: +353-76-603-4408 M: +353-86-600-5499 E: ron@hahn.ie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQU4ccAAoJEKA8j7vu2Jie90sH+waHOzxL/+EJ+fm6M33NEiSh
r8pFaXCjw3WyRPJqkCGsL3xAMHeJqFFfNi32wKYYAwnXM6i2UZhwKJ6MpAi89mAI
4ceKVhDrxEb8zkbZqyMlY4hzfAfuPyLnIobUS6Z4ZIK0lfyZrnWJ921yIGVyVoxX
y3KxhkEvLmT9vHBMTN/pWDLSSHO1H+nZxHMbqCfTrh+maQuuaEWkfNEdFQOwu0j0
N3ecR6wVbWlwIH7XeVPZ4iSObPlRjQ0gOImkFlaMuBGo1PKGg734+jRLghQnY5s5
SvTJrKIfjOwGSW8Da+eK2hgBLFU5DYoxKqswkhohWkeeZqj+pTCuy0ndPadHGak=
=d6Ge
-----END PGP SIGNATURE-----
Comment 7 ron 2012-09-18 18:02:40 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 14/09/2012 17:51, Cy Schubert wrote:
> Hi Ron,
> 
> Updated to p304. Can you see if this fixes the problem?
> 
> 
Colleagues,

I have updated two servers to p304 and this appears to have fixed the
dying problems with p303.  I have not seen this dying since first
starting of p304 some days ago.

Thanking you for addressing these issues so quickly.

Regards,

Ron

- -- 
Ron Hahn, CISSP, CISA, ITIL-F
T: +353-76-603-4408 M: +353-86-600-5499 E: ron@hahn.ie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQWKkwAAoJEKA8j7vu2JieC5sH/RauCnH3veM0QoB1XpqTigvw
tkdm2+Pwr2IWo/BRomsELPUdIaoinn45ogixxqtrr4lClOTLYl03Q2jf0oGeXjgz
FSGqEqOQ3fZD0p1x4PYrgFBJTeu8HSwGfrb9CnTW0Mn1A/uDuWRL+J9Y8uLojiGs
rfe+kvVb3ajX9RdfBQSTv+cTTwySNe5D/OaX3RUJOBCkcEbzA736YZ8GHWk7wN67
D7AxLkQF/dzpfMt+LwN2pXzfgAHXzn/6X0yGRRDNjdhIC3geny9Kp+CRQuKUxI0L
7T/M6vs0TPWJEsqgXwsHcEjTuzkAVdBqxwWSrjWL5Kl7E4c4EyGEfymE7EfrQSE=
=quWC
-----END PGP SIGNATURE-----
Comment 8 Cy.Schubert 2012-09-18 19:40:57 UTC
In message <5058A930.6000205@hahn.ie>, Ron Hahn writes:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 14/09/2012 17:51, Cy Schubert wrote:
> > Hi Ron,
> > 
> > Updated to p304. Can you see if this fixes the problem?
> > 
> > 
> Colleagues,
> 
> I have updated two servers to p304 and this appears to have fixed the
> dying problems with p303.  I have not seen this dying since first
> starting of p304 some days ago.
> 
> Thanking you for addressing these issues so quickly.

Thanks to Harlan (our ntp.org upline) for fixing this.


-- 
Cheers,
Cy Schubert <Cy.Schubert@komquats.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org
Comment 9 Cy Schubert freebsd_committer 2012-09-18 19:42:33 UTC
State Changed
From-To: feedback->closed

Fixed by update to 4.2.7p304.
Comment 10 ron 2012-09-20 14:40:22 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 18/09/2012 19:40, Cy Schubert wrote:
> In message <5058A930.6000205@hahn.ie>, Ron Hahn writes:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>> 
>> On 14/09/2012 17:51, Cy Schubert wrote:
>>> Hi Ron,
>>> 
>>> Updated to p304. Can you see if this fixes the problem?
>>> 
>>> 
>> Colleagues,
>> 
>> I have updated two servers to p304 and this appears to have fixed
>> the dying problems with p303.  I have not seen this dying since
>> first starting of p304 some days ago.
>> 
>> Thanking you for addressing these issues so quickly.
> 
> Thanks to Harlan (our ntp.org upline) for fixing this.
> 
> 
Colleagues,

As of this emailing I have not seen any dying on the patch 304 ntpd on
four servers.  One is public facing (time.dhco.org) the pool and three
are the private facing.

Harlan, I am thinking that these issues are maybe now resolved? If I
can do any further tests let me know please.

Thank you,

Ron

- -- 
Ron Hahn, CISSP, CISA, ITIL-F
T: +353-76-603-4408 M: +353-86-600-5499 E: ron@hahn.ie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQWxzFAAoJEKA8j7vu2Jie6DoH+wT4wcamUTqmJjlqI7bKHeT9
wy7TXAG7U4+rB1yRjwjTjQAg8Y2q05/EDS3Vc4Lx4yVhofSUs+RJ2zomhjBZ9aVF
XHrfPzTWSH/rmQG4kW3JpVyW9aJ/KP0tuZBv9UgINHU2Ac09cJqUu9YyIAzq5SFz
dU2AZ/3fOlmk/V85Qw+LjHflEC3G2ReN0lCO+7VG046Lrbe/5/FsUD5LJuYNgTLC
daSgJkbQA5ol/dSzmAwERFVFDSk5axsDTUEkUOz+v2pqywUI8VrqADi/xgez6L/m
mxrWivFvldoLLT+RML7s8UE2R1gBSWjdawYpmNj0/Dc//wk77i/DK09XxfB8Xqg=
=6EBu
-----END PGP SIGNATURE-----
Comment 11 ron.hahn 2012-10-24 21:40:38 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Colleagues,

This problem has reappeared in the last days after running over 1
month without dying:

21 Oct 20:28:31 ntpd[1063]: 86.43.80.58 local addr 172.16.2.10 -> <null>
21 Oct 20:28:52 ntpd[1063]: ntp_monitor.c:136: ENSURE(punlinked ==
mon) failed
21 Oct 20:28:52 ntpd[1063]: exiting (due to assertion failure)
21 Oct 22:56:31 ntpd[46501]: ntpd 4.2.7p304@1.2483-o Fri Sep 14
19:23:11 UTC 2012 (1): Starting

This dying occurred 12.5 hours after a recycle of the power to the
server.  ntpd has been running 4 days now without dying since.

Regards,

Ron
- -- 
Ron Hahn, CISA, CISSP
Rose Hill
Rosslare Strand,
Wexford, IRELAND
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQiFJGAAoJEJMEFOwWFWJ1y6IIAKg71hBQm7CPHBiTlsX9J9IY
v1RkL0YZnfJYvkmcksuFFdTkbCTKIRAWgkLsWT4gsJoJaNyYRB5ju8a1Mj5W0GlN
DybxDL2bycpRYzvfpciK1eqd2g2DRDtMvTyqLFnCLqkzuGLJm0OnV3x2ZAtu9rZU
Z6D8ZLuywrW3fnNC0OljQ39BSXgTvXFzSNSJ3yh39yxd4TjvyleatiDpxQMIRVLt
Njgqf/+HI5T9BE/7lTSTrjlLJj29GSmdiziRMdfJTxbFrN9EzrclXDnK5tIFcYiH
9j8/hJLs+7AKmiE3qLzZPh2NMD9GodPqTUyJLJkyJ7wDKIjdSztCr+IJ2m3gr70=
=npr2
-----END PGP SIGNATURE-----
Comment 12 ron 2012-10-24 21:41:26 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Colleagues,

This problem has reappeared in the last days after running over 1
month without dying:

21 Oct 20:28:31 ntpd[1063]: 86.43.80.58 local addr 172.16.2.10 -> <null>
21 Oct 20:28:52 ntpd[1063]: ntp_monitor.c:136: ENSURE(punlinked ==
mon) failed
21 Oct 20:28:52 ntpd[1063]: exiting (due to assertion failure)
21 Oct 22:56:31 ntpd[46501]: ntpd 4.2.7p304@1.2483-o Fri Sep 14
19:23:11 UTC 2012 (1): Starting

This dying occurred 12.5 hours after a recycle of the power to the
server.  ntpd has been running 4 days now without dying since.

Regards,

Ron
- -- 
Ron Hahn, CISA, CISSP
Rose Hill
Rosslare Strand,
Wexford, IRELAND

- -- 
Ron Hahn, CISSP, CISA
T: +353-76-603-4408 M: +353-86-600-5499 E: ron@hahn.ie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQiFJ2AAoJEKA8j7vu2JieB9oIAIbCginSa0A/Q1zTtGfxteto
iYDTYJ3QFa+v7ipgig28TIrcdaUxg7q0/TxDLvvwztnIgBnG46bkYEQ6gV09xqg+
i7NJEvXD0UfIupZQw3WNCguSF5Ondb3fncyxsVUPJPObKW14iU+ERbW9xVlyMXTg
Hcua+qLnTKR9p9uUgStlDIAtf8dZZB6AbkUqz0VrwkdAZFQBAX7ojmqDtlvR3Oh4
Dd1x8aElWiS2qa+g4XpMv+WPT2rWoQomaFK+7xhQtisGdDFkBAyya99NlcnkE7Bc
jzxOLAEx6nEbfqVVQ45kXzm9kV7ql3mEd7XmpXyf6WSFtNaTLkdL4JDOg+Pu1NY=
=J3Gb
-----END PGP SIGNATURE-----
Comment 13 hart 2012-10-26 06:27:43 UTC
The assertion failure in ntp_monitor.c looks to me like an impossible
condition, meaning I'd suspect compiler or hardware problems.  Given
it recurs regularly, I'd suggest adding a bit of code before the
ENSURE() line which displays the value of mon and punlinked when they
don't match -- that may shed more light on how it's failing.
Essentially what's going on in this code path is mon_clearinterface()
is called when a local address goes away, and it runs through the list
of recently seen client IP addresses removing all entries which were
associated with (received on) the local address in question.  For each
such entry, it is first removed from the doubly-linked list chaining
all mrulist entries together (the UNLINK_DLIST), then it is removed
from the singly-linked list hanging off the hash bucket corresponding
to the remote IP address by remove_from_hash(), which asserts, and
finally the mrulist entry in question is put on the mrulist free list
for later reuse.  This is "impossible" in that at the time entries are
put on the doubly-linked mrulist, they are also always put on the
appropriate singly-linked hash list for the particular remote IP.  Yet
the assertion failure is saying the removal from that singly-linked
hash bucket list is failing somehow.

If there is a way to switch malloc implementations to one that has
more debugging checks, or to use a tool like valgrind, that might help
find (hypothetical) memory corruption triggering the failure.

Please consider filing a bug report directly at https://bugs.ntp.org/
(using cacert certificate, often causes trust warning that's expected)
if you suspect the problem is in ntpd code rather than a local or
compiler issue.

Thanks,
Dave Hart
Comment 14 ron 2012-10-26 09:34:42 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 26/10/2012 06:27, Dave Hart wrote:
> The assertion failure in ntp_monitor.c looks to me like an
> impossible condition, meaning I'd suspect compiler or hardware
> problems.  Given it recurs regularly, I'd suggest adding a bit of
> code before the ENSURE() line which displays the value of mon and
> punlinked when they don't match -- that may shed more light on how
> it's failing. Essentially what's going on in this code path is
> mon_clearinterface() is called when a local address goes away, and
> it runs through the list of recently seen client IP addresses
> removing all entries which were associated with (received on) the
> local address in question.  For each such entry, it is first
> removed from the doubly-linked list chaining all mrulist entries
> together (the UNLINK_DLIST), then it is removed from the
> singly-linked list hanging off the hash bucket corresponding to the
> remote IP address by remove_from_hash(), which asserts, and finally
> the mrulist entry in question is put on the mrulist free list for
> later reuse.  This is "impossible" in that at the time entries are 
> put on the doubly-linked mrulist, they are also always put on the 
> appropriate singly-linked hash list for the particular remote IP.
> Yet the assertion failure is saying the removal from that
> singly-linked hash bucket list is failing somehow.
> 
> If there is a way to switch malloc implementations to one that has 
> more debugging checks, or to use a tool like valgrind, that might
> help find (hypothetical) memory corruption triggering the failure.
> 
> Please consider filing a bug report directly at
> https://bugs.ntp.org/ (using cacert certificate, often causes trust
> warning that's expected) if you suspect the problem is in ntpd code
> rather than a local or compiler issue.
> 
> Thanks, Dave Hart
> 
Hallo Dave and Good Morning to you.

This is being seen on FreeBSD 8.3 AMD64 with the ntp-devel port.  When
I was running the patch 303 I have seen these errors on two machines.
When I upgraded these to patch 304 I have seen the error disappear for
5 weeks. Now it is happening on the pool machine. I have not seen this
error on the internal machine.

Machine number 1 is externally facing to the pool, machine 2 is
internally facing to my company LAN/WAN.  Machine 1 is receiving much
more traffic than machine 2. Both machines are FreeBSD 8.3. Both
machines are running the ntp-devel port p304. Machine 1 is a Tyan GS14
servver (Pentium 4 3.6GHz 1GB RAM RAID1). Machine 2 is a Atom D525
(1.8Ghz 4GB RAID1).

I have built these two machines with exactly the same procedure and am
patching both at the same time.

I will open the bug at ntp.org as you have requested today.

REgards,

Ron

- -- 
Ron Hahn, CISSP, CISA
T: +353-76-603-4408 M: +353-86-600-5499 E: ron@hahn.ie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQiksiAAoJEKA8j7vu2Jie9cAH/1MA5MTbcqbdoEJJCMtWYqok
5WgKuQeFhhxYMPRj0jse3h273anaE1JYceWqmUxVpzeLtnGJmW4/kZS91UEtnwy9
VSqZM2xKDvQD4AASGFWXjYZHZnzL1syf18X2G8aQ933SGThgo7Q30abnOG06zE//
dR36ykOrHUjCufu35NeF8QP3TOwRV6dNBmZrlftglxkaQWVrwPfeRAELJwlGk06G
NOqcCEn3AuGEX73ty+IqvhH4afYbu2mvnPORFd9ESJwy2+l/CQSQ0msVYnI7V1ad
sM2dJQ7x1sBfZlfkbz1rN7B7M4bFofj7MQ9rgzrxru6jyzBSku7LD+62e3AuzrU=
=VHEP
-----END PGP SIGNATURE-----