Bug 19479

Summary: processes stuck in 'ffsvgt' and 'FFS no' states
Product: Base System
Reporter: wessels <wessels>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED
Severity: Affects Only Me
Priority: Normal
Version: Unspecified
Hardware: Any
OS: Any

Description wessels 2000-06-24 01:20:01 UTC
I'm benchmarking the Squid proxy cache, which generates a lot
of disk I/O.  After an hour or two, all of the Squid processes
get stuck in the 'ffsvgt' state, and one of them is stuck
in 'FFS no':

  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  368 wessels   98   0   102M 99036K RUN    106:01 98.14% 98.14% squid
  375 wessels  -18   0  1272K   812K ffsvgt   2:23  0.00%  0.00% diskd
  373 wessels  -18   0  1272K   812K ffsvgt   2:22  0.00%  0.00% diskd
  374 wessels  -20   0  1272K   812K FFS no   2:22  0.00%  0.00% diskd
  372 wessels  -18   0  1272K   812K ffsvgt   2:20  0.00%  0.00% diskd
  370 wessels  -18   0  1272K   812K ffsvgt   2:19  0.00%  0.00% diskd
  371 wessels  -18   0  1272K   812K ffsvgt   2:19  0.00%  0.00% diskd

Although the system continues to run, it seems like any process that
tries to access a filesystem also becomes stuck in the ffsvgt
state.  I can't start new commands, log in, etc.  

Filesystems are mounted normally.  No special options:

/dev/da0c               /cache1         ufs     rw              0       3
/dev/da1c               /cache2         ufs     rw              0       3
/dev/da2c               /cache3         ufs     rw              0       3
/dev/da3c               /cache4         ufs     rw              0       3
/dev/da4c               /cache5         ufs     rw              0       3
/dev/da5c               /cache6         ufs     rw              0       3

I gather that 'FFS no' is related to filesystem vnodes.  I wonder
if I'm running out of vnodes, or if they're not getting freed?
Here's systat after the benchmark has been running for 20 minutes:

    2 users    Load  1.23  1.16  0.90                  Fri Jun 23 18:10

Mem:KB    REAL            VIRTUAL                     VN PAGER  SWAP PAGER
        Tot   Share      Tot    Share    Free         in  out     in  out
Act   68748    1064    74260     3860   21700 count
All  513584    1440  3026816     4408         pages    
                                                       53 cow    Interrupts
Proc:r  p  d  s  w    Csw  Trp  Sys  Int  Sof  Flt     75 zfod   2377 total
     1     7  4       527  297 6247 2377  268  152  50408 wire    100 clk0 irq0
                                                    71716 act     128 rtc0 irq8
45.3%Sys   6.3%Intr 47.1%User  0.0%Nice  1.2%Idl   370040 inact  1864 pci irq10
|    |    |    |    |    |    |    |    |    |      20564 cache   266 pci irq11
=======================+++>>>>>>>>>>>>>>>>>>>>>>     1136 free        fdc0 irq6
                                                          daefr    19 wdc0 irq14
Namei         Name-cache    Dir-cache                 106 prcfr
    Calls     hits    %     hits    %                     react
     1920     1529   80        6    0                     pdwake
                                                      525 pdpgs
Discs   da0   da1   da2   da3   da4   da5   wd0           intrn
KB/t   7.18  7.38  8.13  8.26 10.58  8.03  8.00      8342 buf
tps      55    57    50    40    18    47     1       202 dirtybuf
MB/s   0.39  0.41  0.39  0.32  0.18  0.37  0.01     32702 desiredvnodes
% busy   25    24    25    16    12    21     0     55047 numvnodes
                                                       24 freevnodes

How-To-Repeat: Install the latest squid-2.4 code and pummel it with a
Web Polygraph workload.
Comment 1 Mike Barcroft freebsd_committer freebsd_triage 2001-07-22 05:24:15 UTC
State Changed
From-To: open->feedback


Does this problem still occur in newer versions of FreeBSD, 
such as 4.3-RELEASE?
Comment 2 Mike Barcroft freebsd_committer freebsd_triage 2001-07-22 18:32:45 UTC
Adding to Audit-Trail.

On Sun, Jul 22, 2001 at 05:56:16AM -0600, Duane Wessels wrote:
> 
> 
> On Sat, 21 Jul 2001 mike@FreeBSD.org wrote:
> 
> > Synopsis: processes stuck in 'ffsvgt' and 'FFS no' states
> > 
> > State-Changed-From-To: open->feedback
> > State-Changed-By: mike
> > State-Changed-When: Sat Jul 21 21:24:15 PDT 2001
> > State-Changed-Why: 
> > 
> > Does this problem still occur in newer versions of FreeBSD,
> > such as 4.3-RELEASE?
> > 
> > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19479
> 
> Nope.  I've only seen this in 3.x versions.  I haven't checked any
> recent 3.x versions.  All my servers run 4.x now.  Back then, it
> was really easy to reproduce the problem, however.
> 
>
Comment 3 Mike Barcroft freebsd_committer freebsd_triage 2001-07-22 18:38:06 UTC
State Changed
From-To: feedback->closed


This is no longer a problem in newer versions of FreeBSD.
Comment 4 Mike Barcroft freebsd_committer freebsd_triage 2001-09-22 04:07:13 UTC
Adding to Audit-Trail.

----- Forwarded message from Chuck Cranor <chuck@research.att.com> -----

Delivered-To: mike@freebsd.org
Date: Fri, 21 Sep 2001 13:41:34 -0400
From: Chuck Cranor <chuck@research.att.com>
To: freebsd-stable@FreeBSD.org
Cc: mike@FreeBSD.org
Subject: more on 4.4-RELEASE kernel deadlock ("FFS node")
User-Agent: Mutt/1.2.5i
Organization: AT&T Labs-Research

This is a follow-up to my previous message.

First, the problem I am seeing is almost identical to the one reported
in kern/19479 (see http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19479),
which was closed by mike@freebsd back in July. My data suggests that
the PR should be re-opened, because the problem does not appear to be
fixed.


I reran a "make release" while running "vmstat -m | grep 'FFS node'"
every three seconds in another window. The memory allocation for
"FFS node" explodes during the "make release"; here is the last of
the vmstat -m output before the kernel deadlocked:

Memory statistics by type                          Type  Kern
        Type   InUse  MemUse HighUse   Limit Requests Limit Limit Size(s)
....
    FFS node  199352  99676K  99677K 102400K   648929     0     0  512
    FFS node  199871  99936K  99936K 102400K   649930     0     0  512
    FFS node  200457 100229K 100229K 102400K   651733     0     0  512
    FFS node  200794 100397K 100398K 102400K   653082     0     0  512
    FFS node  200875 100438K 100440K 102400K   653902     0     0  512
    FFS node  201466 100733K 100733K 102400K   655886     0     0  512
    FFS node  202732 101366K 101367K 102400K   658611     0     0  512
    FFS node  203300 101650K 101651K 102400K   660008     0     0  512
    FFS node  203851 101926K 101926K 102400K   661718     0     0  512
    FFS node  203923 101962K 101962K 102400K   661927     0     0  512
    FFS node  204489 102245K 102245K 102400K   663650     0     0  512
<deadlock here!>

102245K seems like an awful lot of memory for FFS nodes. What would
cause this? Does anyone else see such large "FFS node" memory
allocation during "make release"?
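The three-second monitoring loop described above, plus a helper to pull the MemUse column out of a vmstat -m line, can be sketched as follows (the live loop is commented out so the extraction logic can be demonstrated anywhere; the field positions assume the two-word "FFS node" type name):

```shell
# Hedged sketch of the monitoring loop described in the message.
# On a live system, uncomment:
#
# while true; do
#     vmstat -m | grep 'FFS node' >> ffsnode.log
#     sleep 3
# done

# Extract the MemUse column (fourth field, after the two-word type
# name) from a vmstat -m line, stripping the trailing 'K'.
memuse_kb() {
    echo "$1" | awk '{ sub(/K$/, "", $4); print $4 }'
}

sample='FFS node 204489 102245K 102245K 102400K   663650    0     0  512'
memuse_kb "$sample"   # prints 102245
```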

I got a stack trace of the process stuck in "FFS node"; it is
basically:
	open
	vn_open
	namei
	lookup
	ufs_vnoperate
	vfs_cache_lookup
	ufs_vnoperate
	ufs_lookup
	ffs_vget +0x8b
	malloc + 0x1a3
	tsleep

Clearly we are reaching the malloc limit for "FFS node" and hanging.
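The stack trace ends in tsleep under malloc because the kernel's malloc(9), when called with M_WAITOK, sleeps rather than fails once a malloc type's memory use would exceed its per-type limit. A rough, user-space illustration of that arithmetic, using the figures from the vmstat -m output (this is shell arithmetic standing in for kernel logic, not the kernel's actual code):

```shell
# Hedged sketch of the per-type limit check implied by the trace.
limit_b=$((102400 * 1024))   # Limit column: 102400K
inuse_b=$((102245 * 1024))   # last MemUse before the deadlock
node_b=512                   # Size(s) column: one FFS node

# Would allocating $3 more bytes push usage $2 over limit $1?
would_block() {
    [ $(( $2 + $3 )) -gt "$1" ] && echo yes || echo no
}

would_block "$limit_b" "$inuse_b" "$node_b"                     # prints no
echo "headroom: $(( (limit_b - inuse_b) / node_b )) more nodes" # 310
```

So at the point of deadlock only about 310 more FFS nodes fit under the limit, which is why the machine wedges shortly after the last sample.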


chuck

----- End forwarded message -----
Comment 5 Mike Barcroft freebsd_committer freebsd_triage 2001-09-22 04:12:45 UTC
State Changed
From-To: closed->open


The originator's problem disappeared, but unfortunately another user 
is seeing a similar problem.
Comment 6 iedowse freebsd_committer freebsd_triage 2001-12-02 21:26:17 UTC
State Changed
From-To: open->feedback


This problem should be fixed in recent -STABLE versions. One problem 
was that I had accidentally pushed the size of the inode struct 
over 256 bytes, but a bigger issue relating to limiting the number 
of vnodes was fixed by Matt Dillon about a month ago. Can you confirm 
that this is fixed now?
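The inode-size remark above also explains the 512 in the Size(s) column of the earlier vmstat -m output: the kernel allocator hands out power-of-two sized chunks, so a struct that grows even a few bytes past 256 is served from the 512-byte bucket, roughly doubling per-inode memory cost. A small sketch of that rounding effect (the minimum bucket size and the rounding rule are assumptions for illustration, not the allocator's exact internals):

```shell
# Hedged sketch: round a request size up to the next power-of-two
# bucket, as a generic kmem-style allocator would.
bucket_size() {
    size=$1
    bucket=16          # assumed minimum chunk size, for illustration
    while [ "$bucket" -lt "$size" ]; do
        bucket=$(( bucket * 2 ))
    done
    echo "$bucket"
}

bucket_size 256   # prints 256 -- a 256-byte inode fits exactly
bucket_size 260   # prints 512 -- a few bytes over doubles the cost
```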
Comment 7 iedowse freebsd_committer freebsd_triage 2002-06-02 12:11:39 UTC
State Changed
From-To: feedback->closed


Feedback timeout.