Bug 27661

Summary: >1000 ipfw rules and heavy traffic crash the system
Product: Base System Reporter: Pekka Savola <pekkas>
Component: kern Assignee: luigi-bugs
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.3-STABLE   
Hardware: Any   
OS: Any   

Description Pekka Savola 2001-05-26 15:40:00 UTC
See and the threads mentioned there: http://docs.freebsd.org/cgi/getmsg.cgifetch=856687+0+archive/2001/freebsd-stable/20010520.freebsd-stable

I noticed that if you create too many ipfw rules, through which extra
traffic must pass, rather soon you will crash the system.

In this scenario, adding >1000 non-matching rules before the
standard tcp established rule, and doing 20Mbit/s steady through the
rules, caused kernel load to go to ~8.0 (Dual P3/866) and after less than
an hour, crash the system.

==> Of course, adding >1000 non-matching rules is stupid, but that is not
==> the point.  The system should not crash this way, without any error
==> messages.

The crash causes all userspace to become totally non-responsive: ping and
traceroute from the outside work ok, but all existing connections become
non-responsive.  New TCP connections can still be established, but they
stall as soon as they need to communicate with the daemon.  The console
keyboard does not react to CTRL-ALT-DEL.

This is _not_ caused by mbuf/mbuf cluster usage; I have a cronjob saving
these statistics as debugging information every two minutes, and there was
no significant increase there; the peak never exceeded half of the
maximum.

The same crash has happened with a smaller number of non-matching rules too;
e.g. 600.  It usually took longer that way.

This had happened 3-4 times before I realized what was going wrong.

Probably not relevant, but after every crash, there were usually a _lot_
of FS inconsistencies.

Fix:

Rearrange the ipfw rules (this does not fix the _real_ problem, i.e.
the system should not crash like this without any errors).

How-To-Repeat:

Add a lot of ipfw rules that traffic must pass through.
Generate _loads_ of traffic (20+ Mbit/s).
Wait for a few hours.
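
The first How-To-Repeat step could be scripted roughly as follows. This is
only a sketch, not the ruleset used in the report: the function name, the
starting rule number 100 and the 198.18.0.x source addresses are all
assumptions chosen for illustration, and the generated rules must be
numbered below the existing "tcp established" rule to sit in front of it.
The script only prints the commands; it does not run ipfw.

```python
import ipaddress


def nonmatching_rules(count, first_rule=100, base="198.18.0.1"):
    """Build `count` ipfw commands that should never match real traffic.

    Hypothetical helper: rule numbers and source addresses are
    placeholders, not values taken from the original report.
    """
    start = ipaddress.IPv4Address(base)
    return [
        # One rule per unused source address; every packet is compared
        # against each of them before reaching later rules.
        f"ipfw add {first_rule + i} deny ip from {start + i} to any"
        for i in range(count)
    ]


if __name__ == "__main__":
    for cmd in nonmatching_rules(1000):
        print(cmd)
```

The printed commands can be reviewed and then fed to a shell on a test
machine. Do not try this on a production host.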
Comment 1 Kris Kennaway 2001-05-27 00:32:17 UTC
On Sat, May 26, 2001 at 07:31:01AM -0700, pekkas@netcore.fi wrote:

> >Description:
> See and the threads mentioned there: http://docs.freebsd.org/cgi/getmsg.cgifetch=856687+0+archive/2001/freebsd-stable/20010520.freebsd-stable


This URL does not seem to be valid.

> I noticed that if you create too many ipfw rules, through which extra
> traffic must pass, rather soon you will crash the system.
> 
> In this scenario, adding >1000 non-matching rules before the
> standard tcp established rule, and doing 20Mbit/s steady through the
> rules, caused kernel load to go to ~8.0 (Dual P3/866) and after less than
> an hour, crash the system.


When you say "crash" do you mean "panic" (the usual meaning), or "lock
up"?  If the former, please obtain a panic traceback to aid in debugging.

It sounds to me as if this is just a case of giving the system too
much work to do.  If it has to spend more time processing a packet
than the time between packet arrival, things are going to go badly.

As far as I know ipfw doesn't have an 'exit clause' which drops
packets if they are taking too long to process.  I don't know if it
would be easy to add one; the best solution, as you noted, is to not
write inefficient rulesets.

Kris
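
For a rough sense of the time budget discussed above, the arithmetic can be
sketched like this. The packet sizes (1500 and 64 bytes) are assumptions,
since the report only gives the 20 Mbit/s rate and the rule count, and the
function names are made up for this example. It ignores framing overhead and
everything else the kernel does per packet.

```python
def interarrival_us(bitrate_bps, packet_bytes):
    """Average time between packets, in microseconds, at a steady bitrate."""
    return packet_bytes * 8 / bitrate_bps * 1e6


def per_rule_budget_us(bitrate_bps, packet_bytes, rules):
    """Time available per rule if every packet walks all `rules` rules."""
    return interarrival_us(bitrate_bps, packet_bytes) / rules


if __name__ == "__main__":
    rate = 20_000_000  # 20 Mbit/s, from the report
    for size in (1500, 64):  # assumed packet sizes
        gap = interarrival_us(rate, size)
        budget = per_rule_budget_us(rate, size, 1000)
        print(f"{size:5d}-byte packets: {gap:8.2f} us between packets, "
              f"{budget:.4f} us per rule")
```

Under these assumptions, full-size packets leave well under a microsecond
per rule, and small packets leave far less.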
Comment 2 Pekka Savola 2001-05-27 07:13:53 UTC
On Sat, 26 May 2001, Kris Kennaway wrote:

> On Sat, May 26, 2001 at 07:31:01AM -0700, pekkas@netcore.fi wrote:
>
> > >Description:
> > See and the threads mentioned there: http://docs.freebsd.org/cgi/getmsg.cgifetch=856687+0+archive/2001/freebsd-stable/20010520.freebsd-stable
>
> This URL does not seem to be valid.

Hmm, cut'n'paste error perhaps.  Again:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=856687+0+archive/2001/freebsd-stable/20010520.freebsd-stable

Anyway, these were threads on freebsd-stable:

"4.3-S: >1000 ipfw rules and heavy traffic crash the system" (18 May)
"4.3-S: No buffer space available" (5 May)

> > I noticed that if you create too many ipfw rules, through which extra
> > traffic must pass, rather soon you will crash the system.
> >
> > In this scenario, adding >1000 non-matching rules before the
> > standard tcp established rule, and doing 20Mbit/s steady through the
> > rules, caused kernel load to go to ~8.0 (Dual P3/866) and after less than
> > an hour, crash the system.
>
> When you say "crash" do you mean "panic" (the usual meaning), or "lock
> up"?  If the former, please obtain a panic traceback to aid in debugging.

Lock up.  I wish it had panicked, so it could be traced :-/

> It sounds to me as if this is just a case of giving the system too
> much work to do.  If it has to spend more time processing a packet
> than the time between packet arrival, things are going to go badly.

There seems to be a point of no return there: if some amount of
processing is done, the TCP connections do not send new data anymore etc.
I haven't monitored what the bandwidth usage is like then, but I suspect
it is very little; I doubt that there is an equally high number of
incoming connections then.

So, I don't think this is just too slow processing.  It looks like too
heavy processing triggers some big problem causing the lock-up.

> As far as I know ipfw doesn't have an 'exit clause' which drops
> packets if they are taking too long to process.  I don't know if it
> would be easy to add one; the best solution, as you noted, is to not
> write inefficient rulesets.

I'm not used to (partial) kernel lockups without any messages printed on
console or syslog; it is very difficult to figure out what is causing
them.  That is why I'd like a "right" solution for this, not just "Don't
Do It". Someone is bound to do the same thing sooner or later and wonder
about fscking FreeBSD locking up all the time without explanation.

-- 
Pekka Savola                 "Tell me of difficulties surmounted,
Netcore Oy                   not those you stumble over and fall"
Systems. Networks. Security.  -- Robert Jordan: A Crown of Swords
Comment 3 Luigi Rizzo freebsd_committer freebsd_triage 2001-09-03 21:25:17 UTC
State Changed
From-To: open->closed

This report basically says that when the system is in 
livelock conditions it might crash. 

This does not seem specific to the ipfw code -- the kernel is 
full of places where you can have potentially very time consuming 
processing procedures at various priorities (or, while holding locks) 
and cause havoc to the system. 
If this report identified a specific problem, I'd have no problem 
in fixing it, but there is just nothing evident here. This is why 
I am closing this PR. 



Comment 4 Luigi Rizzo freebsd_committer freebsd_triage 2001-09-03 21:25:17 UTC
Responsible Changed
From-To: freebsd-bugs->luigi-bugs

I have been involved in ipfw maintenance lately