Bug 17134

Summary: problem with 3.0-RELEASE cron forgetting jobs
Product: Base System
Reporter: Todd Hansen <tshansen>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.0-RELEASE   
Hardware: Any   
OS: Any   

Description Todd Hansen 2000-03-02 20:50:02 UTC
We run a distributed system of currently 102 active measurement probes around the Internet (all running FreeBSD 3.0).
We are noticing that periodically (almost regularly) the cron daemon forgets about some of our jobs, even
though it still lists them with the crontab -l command. This happens on about 10 systems in about 2 months.

Anyway, the problem is related to what was mentioned in bin/6004. Except we have more information and a greater need to work with you to 
figure this out. Unfortunately, we are still running 3.0 until we can determine whether this is fixed in 3.4, since it is a big deal to upgrade 102 sites remotely.

Once cron forgets about a job, it still tries to execute the job, but instead of
actually executing it we see something like this in the log:

Mar  2 12:20:00 nai-a-odun /USR/SBIN/CRON[6248]: (actmon) CMD ()

The CMD field is blank, but the entry is logged at the correct time. The interesting thing is that
other commands still run fine while this one does not. The line affected the most
by this problem is the crontab line that runs ./dogentrace every 10 minutes.
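
For anyone who wants to check their own logs for this symptom, here is a minimal sketch that scans cron log lines for entries whose CMD field is empty. It assumes the syslog layout of the sample line above; the function name find_blank_commands and the regular expression are illustrative only and are not part of cron.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout, taken from the sample line in this report:
#   Mar  2 12:20:00 nai-a-odun /USR/SBIN/CRON[6248]: (actmon) CMD ()
CRON_CMD = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"\S*CRON\[(?P<pid>\d+)\]:\s+\((?P<user>[^)]*)\)\s+"
    r"CMD\s+\((?P<cmd>.*)\)\s*$",
    re.IGNORECASE,
)


def find_blank_commands(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, host, user) for every cron entry with an empty CMD."""
    hits = []
    for line in lines:
        match = CRON_CMD.match(line.rstrip("\n"))
        # Lines that are not cron CMD entries are ignored.
        if match and match.group("cmd").strip() == "":
            hits.append(
                (match.group("stamp"), match.group("host"), match.group("user"))
            )
    return hits
```

Passing an open log file to find_blank_commands should list the times and users of the affected runs, which can then be compared against the crontab schedule.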

thanks.
Todd

Fix: 

We would love one, if it can be found.
How-To-Repeat: 
It seems to repeat within a reasonable amount of time on our systems, probably because we have so many.
Comment 1 Gregory Bond 2000-03-02 23:03:36 UTC
I have seen similar behaviour as well on various versions up to 3.4-STABLE.  In
particular, if you are testing cron jobs and repeatedly putting crontab entries
in for say 5 minutes in advance of the current time, sometimes these jobs don't
run (but the cron log entry is generated as mentioned in the PR).

I've also seen this happen with cron on Solaris 2.6, btw, which is also _very_
broken in its handling of quotes and escaped % characters. There is no way under
Solaris cron to get the literal string '%' (i.e. squote pct squote) into a
command.  Unescaped % is a newline (as per the manual), '\%' is passed as '\%'.
This does work as expected on FreeBSD.
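
To make the % rule concrete, here is a rough model of the behaviour the crontab(5) manual describes: the command ends at the first unescaped %, and later unescaped % characters become newlines in the data sent on standard input. This is a sketch only and is not taken from the cron source. The function name split_cron_field is made up, and turning '\%' into a plain '%' is an assumption about what "works as expected" means here.

```python
from typing import Tuple


def split_cron_field(field: str) -> Tuple[str, str]:
    """Split a crontab command field into (command, stdin_data).

    Simplified model: any trailing newline cron may add to the
    stdin data is not reproduced.
    """
    command = []
    stdin_data = []
    target = command  # switches to stdin_data at the first unescaped %
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] == "%":
            # Assumption: an escaped percent sign becomes a literal %.
            target.append("%")
            i += 2
            continue
        if ch == "%":
            if target is command:
                target = stdin_data  # first unescaped % ends the command
            else:
                stdin_data.append("\n")  # later ones become newlines
            i += 1
            continue
        target.append(ch)
        i += 1
    return "".join(command), "".join(stdin_data)
```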
Comment 2 Sheldon Hearn 2000-03-03 10:18:14 UTC
On Thu, 02 Mar 2000 12:40:45 PST, Todd Hansen wrote:

> Anyway, the problem is related to what was mentioned in
> bin/6004. Except we have more information and a greater need to work
> with you to figure this out.

Guy Helmer closed that PR because he couldn't get any more information
from the originator.  I'm copying him on this mail in the hopes that
he was actually interested in that PR. :-)

Ciao,
Sheldon.
Comment 3 ghelmer 2000-03-03 16:41:45 UTC
On Fri, 3 Mar 2000, Sheldon Hearn wrote:

> On Thu, 02 Mar 2000 12:40:45 PST, Todd Hansen wrote:
> 
> > Anyway, the problem is related to what was mentioned in
> > bin/6004. Except we have more information and a greater need to work
> > with you to figure this out.
> 
> Guy Helmer closed that PR because he couldn't get any more information
> from the originator.  I'm copying him on this mail in the hopes that
> he was actually interested in that PR. :-)

I closed 6004 since I was neither able to verify that it was still a
problem nor obtain further clues.  I am surprised that more people have not
encountered this problem if it is simply exhibited with
frequently-executed jobs.  I will try running cron with some debugging
options enabled (-x proc and maybe some others), and see if I can
duplicate this; if anyone else wants to do so also, that's fine :-)

It may be helpful to obtain a core dump and executable image from a
cron daemon built with "cd /usr/src/usr.sbin/cron && make CFLAGS=-g
LDFLAGS=-static clean all" -- then run cron and "kill -6" it after it has
started to exhibit this behavior. The bug's cause could be a stray pointer
or an off-by-one error, but seeing what was in the data structures may
help.

Guy
Comment 4 tshansen 2000-03-03 19:56:05 UTC
I will see what I can do with our systems to help you. Maybe I can run the
tests you mentioned below. At the very least I can get you a core from a
3.0-RELEASE cron binary.
	-todd

On Fri, 3 Mar 2000, Guy Helmer wrote:

> On Fri, 3 Mar 2000, Sheldon Hearn wrote:
> 
> > On Thu, 02 Mar 2000 12:40:45 PST, Todd Hansen wrote:
> > 
> > > Anyway, the problem is related to what was mentioned in
> > > bin/6004. Except we have more information and a greater need to work
> > > with you to figure this out.
> > 
> > Guy Helmer closed that PR because he couldn't get any more information
> > from the originator.  I'm copying him on this mail in the hopes that
> > he was actually interested in that PR. :-)
> 
> I closed 6004 since I was neither able to verify that it was still a
> problem nor obtain further clues.  I am surprised that more people have not
> encountered this problem if it is simply exhibited with
> frequently-executed jobs.  I will try running cron with some debugging
> options enabled (-x proc and maybe some others), and see if I can
> duplicate this; if anyone else wants to do so also, that's fine :-)
> 
> It may be helpful to obtain a core dump and executable image from a
> cron daemon built with "cd /usr/src/usr.sbin/cron && make CFLAGS=-g
> LDFLAGS=-static clean all" -- then run cron and "kill -6" it after it has
> started to exhibit this behavior. The bug's cause could be a stray pointer
> or an off-by-one error, but seeing what was in the data structures may
> help.
> 
> Guy
Comment 5 Mike Barcroft freebsd_committer freebsd_triage 2001-07-22 01:12:34 UTC
State Changed
From-To: open->feedback


Does this problem still occur in newer versions of FreeBSD, 
such as 4.3-RELEASE?
Comment 6 Mike Barcroft freebsd_committer freebsd_triage 2001-08-26 06:58:44 UTC
Adding to Audit-Trail.

----- Forwarded message from Todd Hansen <tshansen@nlanr.net> -----

Delivered-To: mike@freebsd.org
X-Authentication-Warning: mave.nlanr.net: tshansen owned process doing -bs
Date: Mon, 6 Aug 2001 08:59:01 -0700 (PDT)
From: Todd Hansen <tshansen@nlanr.net>
To: mike@FreeBSD.org
Cc: freebsd-bugs@FreeBSD.org, tonym@nlanr.net
Subject: Re: bin/17134: problem with 3.0-RELEASE cron forgetting jobs
In-Reply-To: <200107220012.f6M0Cmg21052@freefall.freebsd.org>

I have forgotten how far we tested, maybe tony knows. I think we tested
with 4.0 cron? I know we stopped playing with the bug well before 4.3 was
released. We were never able to find a version which did not exhibit the
bug. However, the problem was very rare and we can only see it because of
the number of machines we are running. I don't know if we have seen the
bug recently though.
	-todd


----- End forwarded message -----
Comment 7 Mike Barcroft freebsd_committer freebsd_triage 2001-08-26 07:00:36 UTC
Adding to Audit-Trail.

----- Forwarded message from Tony McGregor <tonym@cs.waikato.ac.nz> -----

Delivered-To: mike@freebsd.org
Date: Fri, 24 Aug 2001 10:19:44 +1200 (NZST)
From: Tony McGregor <tonym@cs.waikato.ac.nz>
To: Todd Hansen <tshansen@nlanr.net>
Cc: mike@FreeBSD.org, freebsd-bugs@FreeBSD.org
Subject: Re: bin/17134: problem with 3.0-RELEASE cron forgetting jobs
In-Reply-To: <Pine.BSF.4.21.0108060856300.93879-100000@mave.nlanr.net>


We haven't seen it since I added code to limit the number of concurrent
traceroutes.  That adds weight to the theory that the bug occurred when
the system ran out of memory.
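
The actual change is not shown in this thread. As a minimal sketch of the general idea, the following caps how many traceroute processes run at once by using a fixed-size worker pool. The names trace_one and trace_all, the default limit of 4, and the assumption that a traceroute binary is on PATH are all illustrative and do not come from the probe code.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def trace_one(host: str) -> str:
    """Run one traceroute and return its output (assumes traceroute is on PATH)."""
    result = subprocess.run(
        ["traceroute", host], capture_output=True, text=True, check=False
    )
    return result.stdout


def trace_all(
    hosts: Iterable[str],
    max_concurrent: int = 4,  # hypothetical limit, tune to available memory
    run_one: Callable[[str], str] = trace_one,
) -> List[str]:
    """Trace every host, never running more than max_concurrent at a time."""
    # The pool size is the cap: each worker thread runs one child process.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the input hosts.
        return list(pool.map(run_one, hosts))
```

Bounding the pool keeps peak memory use roughly proportional to max_concurrent rather than to the number of destinations, which fits the out-of-memory theory.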

I haven't tested recent versions of FreeBSD.

On Mon, 6 Aug 2001, Todd Hansen wrote:

> I have forgotten how far we tested, maybe tony knows. I think we tested
> with 4.0 cron? I know we stopped playing with the bug well before 4.3 was
> released. We were never able to find a version which did not exhibit the
> bug. However, the problem was very rare and we can only see it because of
> the number of machines we are running. I don't know if we have seen the
> bug recently though.
> 	-todd

----- End forwarded message -----
Comment 8 Sheldon Hearn freebsd_committer freebsd_triage 2002-01-30 09:20:33 UTC
State Changed
From-To: feedback->closed

Automatic feedback timeout.  This PR remained unchanged in the feedback 
state for more than 4 months. 

If additional feedback that warrants the re-opening of this PR is 
available but not included in the audit trail, please include the 
feedback in a reply to this message (preserving the Subject line) and 
ask that the PR be re-opened.