Bug 103455

Summary: "swap_pager: indefinite wait buffer" with page file enabled (causes lockups)
Product: Base System
Reporter: Nick Withers <nick>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 6.2-PRERELEASE   
Hardware: Any   
OS: Any   

Description Nick Withers 2006-09-21 12:40:20 UTC
This bug was (and still is) noted on the 6.1-RELEASE Release Engineering
"to do" page ("http://www.freebsd.org/releases/6.1R/todo.html"), but I'm
filing this because I have been unable to find a PR for the issue, nor anyone
else who knows of one (I emailed Don Lewis about it a few days ago but have
yet to hear back).

With a page _file_ enabled on one of my servers (which routinely, if not
pretty well always, has to use the page file), I occasionally get messages
similar to these:
____

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43638, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43947, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 44007, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 47939, size: 32768
____
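
For anyone collecting these console messages by hand, a small script can tally them. The following is a minimal Python sketch and not part of the original report: the function names `parse_swap_pager_lines` and `summarize` and the 4096-byte page size are assumptions, and the regular expression is based only on the message format quoted above.

```python
import re
from collections import Counter

# Pattern derived only from the console messages quoted in this report.
_MSG = re.compile(
    r"swap_pager: indefinite wait buffer: "
    r"bufobj: (?P<bufobj>[^,\s]+), blkno: (?P<blkno>\d+), size: (?P<size>\d+)"
)

PAGE_SIZE = 4096  # assumption: 4 KB pages, matching the smallest size seen above


def parse_swap_pager_lines(text):
    """Return a list of (blkno, size_in_bytes), one per matching message."""
    return [
        (int(m.group("blkno")), int(m.group("size")))
        for m in _MSG.finditer(text)
    ]


def summarize(text):
    """Count stalled requests per request size, expressed in pages."""
    return Counter(size // PAGE_SIZE for _, size in parse_swap_pager_lines(text))


if __name__ == "__main__":
    # The four messages quoted in the report.
    sample = """\
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43638, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43947, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 44007, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 47939, size: 32768
"""
    for pages, count in sorted(summarize(sample).items()):
        print(f"{pages} page(s): {count} message(s)")
```

Because the messages do not reach "/var/log/messages", the input would have to come from a captured console (for example a serial console log).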

Often enough, they're just "here and there", and nothing I worry too much
about. However, they're also fairly often accompanied by system lockups:
the console will be chock full of messages similar to the above and the
machine will have to be powered off nastily.

The system is still responsive to pings during these lockups and the
"swap_pager" messages do not make it to "/var/log/messages".

The system in question has 64 MB of physical RAM, a 102 MB swap partition
and a 128 MB swap file (I know this is a pretty shoddy configuration...
unfortunately, the machine in question doesn't have an abundance of hard
disk space at its disposal, and I don't have an abundance of money :-))

How-To-Repeat: If I run something particularly memory-intensive, such as a
"make -j2 buildworld", I can reproduce the issue fairly reliably.
Comment 1 rbressers 2007-01-25 08:38:21 UTC
Hi,

I noticed the same problem on 6.2-RELEASE on a heavily loaded storage
system running rsync.
When multiple clients connect to the rsync box and start filling up the
memory with file lists, the machine (once in a while) locks up with the
same messages:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43638, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43947, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 44007, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 47939, size: 32768

Unfortunately, the machine becomes non-responsive.

Regards,

Remco Bressers
Signet B.V.
Comment 2 lee 2007-11-15 10:36:29 UTC
"Me too"...

Running 7.0-CURRENT, although I've seen this issue since 6.x.

I note that many online resources claim that this error arises from disks
that are about to fail and/or bad cables or a bad controller.

In order to rule this out, I purchased brand new hardware:

atapci2: <HighPoint HPT374 (channel 0+1) UDMA133 controller> port 0xdb00-0xdb07,0xdc00-0xdc03,0xdd00-0xdd07,0xde00-0xde03,0xdf00-0xdfff irq 17 at device 11.0 on pci0
atapci3: <HighPoint HPT374 (channel 2+3) UDMA133 controller> port 0xe000-0xe007,0xe100-0xe103,0xe200-0xe207,0xe300-0xe303,0xe400-0xe4ff irq 17 at device 11.1 on pci0
[..]
ad8: 238475MB <Seagate ST3250620AS 3.AAE> at ata4-master UDMA133
ad10: 238475MB <Seagate ST3250620AS 3.AAE> at ata5-master UDMA133
ad12: 238475MB <Seagate ST3250620AS 3.AAE> at ata6-master UDMA133
[..]
ar0: 476950MB <HighPoint v3 RocketRAID RAID5 (stripe 64 KB)> status: READY
ar0: disk0 READY using ad8 at ata4-master
ar0: disk1 READY using ad10 at ata5-master

... and despite the new hardware, the issue still occurs.

I have checked the drives with smartmontools and see no issues reported.

This leads me to believe that it's not (in this case) related to failing 
hardware.

This particular machine is a home system that performs filesharing, 
firewall, DNS and mail server duties.  It's by no means heavily loaded.

When the issue does occur, the server is unresponsive on the console, and 
no other access (ssh, www, etc) is possible.  The server does respond to a 
ping.

-L
Comment 3 Jaakko Heinonen freebsd_committer freebsd_triage 2011-03-04 15:40:44 UTC
State Changed
From-To: open->feedback

Can you still reproduce this on a supported release?
Comment 4 Jaakko Heinonen freebsd_committer freebsd_triage 2011-03-26 16:02:37 UTC
State Changed
From-To: feedback->closed

Submitter can't reproduce anymore.
Comment 5 Arnaud.Miche 2011-05-17 10:25:37 UTC
Hi,

Sorry if I'm digging up an old subject, but I still have this error. It
occurs quite often, and always while building applications.

My configuration:

Hardware
-------------
Apple Powerbook G4 "Titanium" @550MHz
RAM : 256 MB (512 MB of swap space)
HD: 20 GB (filled at 30%)

OS
---
FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:18:47 UTC 2011 
root@xserve.lan.xcllnt.net:/usr/obj/usr/src/sys/GENERIC powerpc

hw.ata.ata_dma=1
hw.ata.atapi_dma=1

When the problem occurs, some other messages are interleaved with the
"swap_pager" messages. Here they are:

ad0: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
acd0: WARNING - TEST_UNIT_READY taskqueue timeout - completing request directly
ad0: WARNING - SETFEATURES SET ENABLE RCACHE taskqueue timeout - completing request directly
ad0: WARNING - SETFEATURES SET ENABLE WCACHE taskqueue timeout - completing request directly
ad0: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=215821
acd0: WARNING - TEST_UNIT_READY freeing taskqueue zombie request

I don't know if this is a clue, but isn't it odd to have some messages
coming from the CD-ROM drive?

Do you have any idea where I should look to debug and understand the error?

Many many thanks in advance.

Arnaud
Comment 6 Nick Withers 2011-05-17 14:49:30 UTC
On Tue, 2011-05-17 at 11:25 +0200, "Arnaud Miché" wrote:
> Hi,


(snip)

Not necessarily - for instance I fully expect the following on my
desktop when I boot: "cd0: Attempt to query device size failed: NOT
READY, Medium not present - tray closed"... Because there isn't a CD in
the drive, as the message suggests. Similarly, the far more scary
sounding "acd0: FAILURE - ATA_IDENTIFY status=51<READY,DSC,ERROR>
error=4<ABORTED> LBA=0" makes sense to me, as ATA_IDENTIFY's trying to
identify the media and... There isn't one :-)

That having been said, from what I understand of the "swap_pager:
indefinite wait buffer: (...)" messages, they appear because the swapper
couldn't get access to the swap within the pre-defined timeout period
(I could well be totally wrong here, but see Kris Kennaway's response at
http://groups.google.com/group/mailing.freebsd.stable/browse_thread/thread/2e7faeeaca719c52/cdcd4601ce1b90c5
[please check the URL your mail client's generated before telling me the
link doesn't work, or google "swap_pager: indefinite wait buffer"]). If
there's an issue reading from / writing to your swap area[1] such that it
takes a long time, then you *should* see this message - it's a good thing.

Probably worth bearing in mind that PowerPC is NOT a FreeBSD Tier 1
platform, so don't be too surprised if there are issues there that may
not be apparent to the vast majority of FreeBSD users. I myself used to
have some "fun" on that very architecture before DMA was supported.

[1]: Do you actually use a swap *****file*****? I really doubt you do -
a swap partition ain't the same thing, and hence not really what PR
kern/103455's about.

Are you actually seeing a lock-up at all? If not, then again this isn't
really what PR kern/103455's about. If I google "swap_pager: indefinite
wait buffer", the very first result (well, for me;
http://www.unixguide.net/freebsd/faq/05.30.shtml) tells you the
following:
____

This means that a process is trying to page memory to disk, and the page
attempt has hung trying to access the disk for more than 20 seconds. It
might be caused by bad blocks on the disk drive, disk wiring, cables, or
any other disk I/O-related hardware. If the drive itself is actually
bad, you will also see disk errors in /var/log/messages and in the
output of dmesg. Otherwise, check your cables and connections.
____
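
The behaviour the FAQ describes (a wait that times out, logs a message, and then keeps waiting) can be illustrated in user space. The sketch below is Python, not kernel code, and is not taken from the FreeBSD sources; the names `wait_for_io` and `timeout_s` are hypothetical, and the 20-second default simply mirrors the figure quoted above.

```python
import threading


def wait_for_io(done, timeout_s=20.0, log=print):
    """Block until `done` is set, logging each time the timeout expires.

    Returns the number of timeouts that elapsed before completion.
    """
    timeouts = 0
    # Event.wait() returns False when the timeout expires before the event is set.
    while not done.wait(timeout_s):
        timeouts += 1
        log(f"indefinite wait: I/O still pending after {timeouts * timeout_s:g} s")
    return timeouts


if __name__ == "__main__":
    done = threading.Event()
    # Simulate a slow device that completes the request after 0.25 s.
    threading.Timer(0.25, done.set).start()
    # Use a short timeout so the demonstration finishes quickly.
    print("timeouts:", wait_for_io(done, timeout_s=0.1))
```

In this sketch the loop never exits if the I/O never completes, which would be consistent with a console that fills with repeated messages while the waiting process makes no progress.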

It's not any kind of official source, and it does disagree with the
30 second figure from good Mr. Kennaway's response (linked to earlier),
but it is still consistent with what I recall from past research on the
issue... Dodgy cables, etc.

> Do you have any idea where I should look to debug and understand the error?


A WWW search engine - sorry man :-)

-- 
Nick Withers
email: nick@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446
Comment 7 Jason Helfman freebsd_committer freebsd_triage 2012-05-31 18:35:56 UTC
Hi,

I just spotted this today on a system of ours that is running 7.4-p4. What
information can I gather for this? I am not certain what a "page file" is,
however, so I doubt that I am running one.

This is the error:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 67, size: 4096

Only one error at this point, though. System is still responsive, and no
other issues have come up.

Thanks!
-jgh

-- 
Jason Helfman
FreeBSD Committer | http://people.freebsd.org/~jgh | The Power To Serve