| Summary: | "swap_pager: indefinite wait buffer" with page file enabled (causes lockups) | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Nick Withers <nick> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 6.2-PRERELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
Hi,

I noticed the same problem on 6.2-RELEASE on a heavily loaded storage system running rsync. When multiple clients connect to the rsync box and start filling up the memory with file lists, the machine (once in a while) locks up with the same messages:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43638, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43947, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 44007, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 47939, size: 32768

The machine becomes non-responsive, unfortunately.

Regards,
Remco Bressers
Signet B.V.

"Me too"... Running 7.0-CURRENT, although I've seen this issue since 6.x.

I note that many online resources claim that this error arises with disks about to fail and/or bad cables/controllers. In order to rule this out, I purchased brand new hardware:

atapci2: <HighPoint HPT374 (channel 0+1) UDMA133 controller> port 0xdb00-0xdb07,0xdc00-0xdc03,0xdd00-0xdd07,0xde00-0xde03,0xdf00-0xdfff irq 17 at device 11.0 on pci0
atapci3: <HighPoint HPT374 (channel 2+3) UDMA133 controller> port 0xe000-0xe007,0xe100-0xe103,0xe200-0xe207,0xe300-0xe303,0xe400-0xe4ff irq 17 at device 11.1 on pci0
[..]
ad8: 238475MB <Seagate ST3250620AS 3.AAE> at ata4-master UDMA133
ad10: 238475MB <Seagate ST3250620AS 3.AAE> at ata5-master UDMA133
ad12: 238475MB <Seagate ST3250620AS 3.AAE> at ata6-master UDMA133
[..]
ar0: 476950MB <HighPoint v3 RocketRAID RAID5 (stripe 64 KB)> status: READY
ar0: disk0 READY using ad8 at ata4-master
ar0: disk1 READY using ad10 at ata5-master

... and despite the new hardware, the issue still occurs. I have checked the drives with smartmontools and see no issues reported. This leads me to believe that it's not (in this case) related to failing hardware.

This particular machine is a home system that performs filesharing, firewall, DNS and mail server duties. It's by no means heavily loaded.
When the issue does occur, the server is unresponsive on the console, and no other access (ssh, www, etc.) is possible. The server does respond to a ping.

-L

State Changed
From-To: open->feedback

Can you still reproduce this on a supported release?

State Changed
From-To: feedback->closed

Submitter can't reproduce anymore.

Hi,

Sorry if I'm digging up an old subject, but I still have this error. It occurs quite often, and always while building applications.

My configuration:

Hardware
-------------
Apple Powerbook G4 "Titanium" @550MHz
RAM: 256 MB (512 MB of swap space)
HD: 20 GB (30% full)

OS
---
FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:18:47 UTC 2011 root@xserve.lan.xcllnt.net:/usr/obj/usr/src/sys/GENERIC powerpc
hw.ata.ata_dma=1
hw.ata.atapi_dma=1

When the problem occurs, some messages are interleaved with the "swap_pager" messages. Here they are:

ad0: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
acd0: WARNING - TEST_UNIT_READY taskqueue timeout - completing request directly
ad0: WARNING - SETFEATURES SET ENABLE RCACHE taskqueue timeout - completing request directly
ad0: WARNING - SETFEATURES SET ENABLE WCACHE taskqueue timeout - completing request directly
ad0: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=215821
acd0: WARNING - TEST_UNIT_READY freeing taskqueue zombie request

I don't know if this is a clue, but isn't it odd to see messages coming from the CD-ROM drive? Do you have any idea where I should look to debug and understand the error?

Many thanks in advance.
Arnaud
On Tue, 2011-05-17 at 11:25 +0200, "Arnaud Miché" wrote:
> Hi,
(snip)

Not necessarily - for instance, I fully expect the following on my desktop when I boot: "cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed"... Because there isn't a CD in the drive, as the message suggests. Similarly, the far scarier-sounding "acd0: FAILURE - ATA_IDENTIFY status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=0" makes sense to me, as ATA_IDENTIFY's trying to identify the media and... There isn't any :-)

That having been said, from what I understand of the "swap_pager: indefinite wait buffer: (...)" messages, they appear because the swapper couldn't get access to the swap within the pre-defined timeout period (I could well be totally wrong here, but see Kris Kennaway's response here: http://groups.google.com/group/mailing.freebsd.stable/browse_thread/thread/2e7faeeaca719c52/cdcd4601ce1b90c5 [please check the URL your mail client's generated before telling me the link doesn't work, or google "swap_pager: indefinite wait buffer"]). If there's an issue reading from / writing to your swap area[1] such that swap I/O takes a long time, then you *should* see this message - it's a good thing.

It's probably worth bearing in mind that PowerPC is NOT a FreeBSD Tier 1 platform, so don't be too surprised if there are issues there that may not be apparent to the vast majority of FreeBSD users. I myself used to have some "fun" on that very architecture before DMA was supported.

[1]: Do you actually use a swap *****file*****? I really doubt you do - a swap partition ain't the same thing, and hence not really what PR kern/103455's about. Are you actually seeing a lock-up at all? If not, then again this isn't really what PR kern/103455's about.
If I google "swap_pager: indefinite wait buffer", the very first result (well, for me; http://www.unixguide.net/freebsd/faq/05.30.shtml) tells you the following:

____
This means that a process is trying to page memory to disk, and the page attempt has hung trying to access the disk for more than 20 seconds. It might be caused by bad blocks on the disk drive, disk wiring, cables, or any other disk I/O-related hardware. If the drive itself is actually bad, you will also see disk errors in /var/log/messages and in the output of dmesg. Otherwise, check your cables and connections.
____

It's not any kind of official source, and it does disagree with the 30 second figure from good Mr. Kennaway's response (linked to earlier), but it's still consistent with what I recall from past research on the issue... Dodgy cables, etc.

> Do you have any idea where I should look to debug and understand the error?

A WWW search engine - sorry man :-)

--
Nick Withers
email: nick@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446

Hi,

I just spotted this today on a system of ours that is running 7.4-p4. What information can I gather for this? I am not certain what a "page file" is, however, so I doubt that I am running one.

This is the error:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 67, size: 4096

Only one error at this point, though. The system is still responsive, and no other issues have come up.

Thanks!
-jgh

--
Jason Helfman
FreeBSD Committer | http://people.freebsd.org/~jgh | The Power To Serve
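The explanations quoted in this thread describe the message as a timeout warning: the pager waits for swap I/O to complete, and when the wait exceeds a fixed period (20 or 30 seconds, depending on which source you believe) it logs the message. The repeated messages seen during lockups suggest it then keeps waiting. A minimal Python sketch of that wait-warn-retry pattern follows. It is only an illustration, not kernel code: `wait_for_io`, the `threading.Event` standing in for I/O completion, and the sub-second timeouts are all assumptions made for the example.

```python
import threading


def wait_for_io(done, timeout_s, describe):
    """Block until `done` is set; print a warning each time `timeout_s`
    elapses first. Returns the number of warnings printed.

    Hypothetical helper: `done` stands in for "the swap I/O completed".
    """
    warnings = 0
    # Event.wait() returns False when the timeout expires before the
    # event is set, so the loop warns once per expired timeout and
    # then goes back to waiting.
    while not done.wait(timeout_s):
        warnings += 1
        print(f"indefinite wait buffer: {describe} (warning {warnings})")
    return warnings


if __name__ == "__main__":
    io_done = threading.Event()
    # Simulate a slow disk that finishes the request after 0.35 s,
    # with a 0.1 s timeout in place of the real 20 or 30 seconds.
    threading.Timer(0.35, io_done.set).start()
    count = wait_for_io(io_done, 0.1, "blkno: 43638, size: 4096")
    print(f"I/O completed after {count} warning(s)")
```

Seen this way, an occasional isolated message means one request was slow but eventually finished, while a console full of messages means requests are not completing at all.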
This bug was (and still is) noted on the 6.1-RELEASE Release Engineering "to do" page ("http://www.freebsd.org/releases/6.1R/todo.html"), but I'm filing this because I have been unable to find a PR for the issue, nor anyone else who has (I emailed Don Lewis about it a few days ago but have yet to hear back).

With a page _file_ enabled on one of my servers (which routinely, if not pretty well always, has to use the page file), I occasionally get messages similar to these:

____
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43638, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43947, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 44007, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 47939, size: 32768
____

Often enough, they're just "here and there", and nothing I worry too much about. However, they're also fairly frequently accompanied by system lockups: the console will be chock full of messages similar to the above and the machine will have to be powered off nastily. The system is still responsive to pings during these lockups, and the "swap_pager" messages do not make it to "/var/log/messages".

The system in question has 64 MB of physical RAM, a 102 MB swap partition and a 128 MB swap file (I know this is a pretty shoddy configuration... unfortunately, the machine in question doesn't have an abundance of hard disk space at its disposal, and I don't have an abundance of money :-)).

How-To-Repeat: If I run something particularly memory intensive, such as a "make -j2 buildworld", I can generally reproduce the issue fairly reliably.
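
The console messages quoted in this report follow a fixed "bufobj / blkno / size" layout, so anyone collecting them (from a serial console capture, say, since they do not reach /var/log/messages here) can tally them mechanically. The following is a rough Python sketch of such a tally. The function names and the regular expression are assumptions written for this example, based only on the message text shown in this PR.

```python
import re
from collections import Counter

# Pattern derived from the messages quoted in this PR. The bufobj
# field is matched loosely because it may not always be "0".
PATTERN = re.compile(
    r"swap_pager: indefinite wait buffer: "
    r"bufobj: (?P<bufobj>\w+), blkno: (?P<blkno>\d+), size: (?P<size>\d+)"
)


def parse_swap_pager_warnings(text):
    """Return one dict per swap_pager warning found in `text`."""
    return [
        {
            "bufobj": m["bufobj"],
            "blkno": int(m["blkno"]),
            "size": int(m["size"]),
        }
        for m in PATTERN.finditer(text)
    ]


def count_by_size(records):
    """Count warnings per request size in bytes."""
    return Counter(r["size"] for r in records)


if __name__ == "__main__":
    sample = """\
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43638, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 43947, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 44007, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 47939, size: 32768
"""
    records = parse_swap_pager_warnings(sample)
    print(len(records), "warnings")
    for size, n in sorted(count_by_size(records).items()):
        print(f"size {size}: {n}")
```

Because the pattern is applied with `finditer`, it also works on captures where several messages have run together on one line.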