midnight (gmt) maint runs on a number of 8-current systems. has been going on for weeks. swap_pager_getswapspace(16): failed swap_pager_getswapspace(16): failed swap_pager_getswapspace(16): failed swap_pager_getswapspace(2): failed swap_pager_getswapspace(2): failed swap_pager_getswapspace(2): failed swap_pager_getswapspace(2): failed many of these and then system either recovers or is dead to serial console. command prompt remains on ssh shells, but type a command and the shell is locked. system is pingable but non-responsive. the below are from when running normally in the case where it has recovered. work0.psg.com:/root# swapinfo Device 1024-blocks Used Avail Capacity /dev/mirror/bootb 8382708 307416 8075292 4% work0.psg.com:/root# df -m Filesystem 1M-blocks Used Avail Capacity Mounted on /dev/mirror/boota 7931 603 6693 8% / devfs 0 0 0 100% /dev procfs 0 0 0 100% /proc tank/data 663002 0 663002 0% /data tank/data/nfsen 828448 165446 663002 20% /data/nfsen tank/data/rpki 663179 177 663002 0% /data/rpki tank 663002 0 663002 0% /tank tank/usr 668072 5069 663002 1% /usr tank/usr/home 669874 6871 663002 1% /usr/home tank/usr/usr 668320 5317 663002 1% /usr/usr tank/var 663109 106 663002 0% /var tank/var/log 663105 103 663002 0% /var/log tank/var/spool 663171 168 663002 0% /var/spool tank/var/spool/var 663002 0 663002 0% /var/spool/var tank/var/spool/var/spool 663016 13 663002 0% /var/spool/var/spool /dev/md0 247 0 227 0% /tmp devfs 0 0 0 100% /data/rpki/rcynic/dev tank/usr@backup 668580 5578 663002 1% /usr/.zfs/snapshot/backup tank/usr/usr@2009-03-21-00-49-06 668341 5338 663002 1% /usr/usr/.zfs/snapshot/2009-03-21-00-49-06 tank/var/spool/var/spool@2009-03-21-02-06-08 663016 13 663002 0% /var/spool/var/spool/.zfs/snapshot/2009-03-21-02-06-08 work0.psg.com:/root# cat /etc/fstab # Device Mountpoint FStype Options Dump Pass# # /dev/mirror/boota / ufs rw 1 1 # /dev/mirror/bootb none swap sw 0 0 # /dev/acd0 /cdrom cd9660 ro,noauto 0 0 # proc /proc procfs rw 0 0 #linprocfs /compat/linux/proc linprocfs rw 0 0 # # end work0.psg.com:/root# top -bores last pid: 29419; load averages: 1.03, 0.46, 0.17 up 6+06:12:06 12:46:21 147 processes: 3 running, 144 sleeping Mem: 202M Active, 39M Inact, 1173M Wired, 2756K Cache, 39M Buf, 2540M Free Swap: 8186M Total, 300M Used, 7886M Free, 3% Inuse PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 1034 ejabberd 2 44 0 124M 51244K ucond 1 7:52 0.00% beam 803 unbound 1 44 0 36420K 24988K select 0 6:05 0.00% unbound 1370 root 1 76 0 225M 22056K nanslp 1 135:43 0.00% perl5.8.9 1229 mysql 9 44 0 66984K 21428K sigwai 1 4:05 0.00% mysqld 29159 www 1 47 0 98M 12824K lockf 0 0:01 0.00% httpd 28447 www 1 44 0 99328K 11968K lockf 0 0:01 0.00% httpd 29081 www 1 44 0 99328K 11836K lockf 0 0:01 0.00% httpd 29217 www 1 45 0 99328K 11600K lockf 1 0:00 1.27% httpd 27072 www 1 44 0 98304K 11064K lockf 0 0:01 0.00% httpd 29018 www 1 44 0 98304K 10912K kqread 1 0:00 0.00% httpd 29141 www 1 44 0 98304K 10840K lockf 0 0:00 0.00% httpd 37090 root 19 44 0 170M 8492K select 0 11:33 0.00% asterisk 28098 root 1 48 0 10528K 6592K select 0 0:17 7.08% cvsup 29222 www 1 44 0 96256K 5700K lockf 0 0:00 0.00% httpd 29306 www 1 44 0 96256K 5624K lockf 0 0:00 0.00% httpd 29337 www 1 44 0 96256K 4536K lockf 0 0:00 0.00% httpd 28072 root 1 44 0 17764K 3316K select 1 0:00 0.00% sshd 1302 root 1 44 0 96256K 2616K select 0 0:15 0.00% httpd work0.psg.com:/root# vmstat -h procs memory page disks faults cpu r b w avm fre flt re pi po fr sr ad4 ad6 in sy cs us sy id 0 0 10 4503M 2628M 219 0 0 2 470 423 0 0 205 2341 4388 2 1 97 How-To-Repeat: wait for midnight work0.psg.com:/usr/src# cat /etc/daily.local #!/bin/sh # # # reset ipfw filter counts # ipfw -f resetlog ipfw -f zero # # mail and ntp # ( /usr/bin/ntpq -c peers ; echo ; echo ; \ echo irrd mirror requestors ; \ /usr/bin/gunzip -c /var/log/irrd/irrd.log.0.gz \ | for i in `grep 'rror request' | awk '{print $10}' | sort | uniq`; do \ host $i | awk '{print " " $5}'; done \ ; echo ; echo ; \ /usr/local/sbin/eximstats /var/spool/exim/log/main) \ | Mail -s "`hostname` ntp/mail log report" postmaster /usr/local/sbin/exicyclog #/usr/bin/gzip -9 main.01 # # system log # /usr/bin/gunzip -c /var/log/messages.0.gz | \ /usr/bin/egrep -iv '(last message repeated|logfile turned over|PAM: authentication error|reverse map|sshd.*(Did not receive identification string|Disabling protocol|does not map back|(ftp|uucp) not allowed|Invalid user|Failed password|accepted|connection closed|received disconnect|SSH: Server)|sshguard)' | \ /usr/bin/Mail -s "`hostname` System Log" root /usr/bin/gunzip -c /var/log/messages.0.gz | \ /usr/bin/egrep 'sshd.*((ftp|uucp) not allowed|Invalid user|Failed password)|sshguard' | \ /usr/bin/Mail -s "`hostname` attack Log" root #
bug is not restricted to amd64. here it is on an i386 system rip1.psg.com:/root# uname -a FreeBSD rip1.psg.com 8.0-CURRENT FreeBSD 8.0-CURRENT #9: Sun Apr 5 09:32:56 GMT 2009 root@rip1.psg.com:/usr/obj/usr/src/sys/RIP1 i386 Apr 25 00:07:46 rip1 kernel: swap_pager_getswapspace(16): failed Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(12): failed Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(6): failed Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(16): failed Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(12): failed Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(9): failed Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(5): failed Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(16): failed Apr 25 00:07:47 rip1 kernel: pid 40337 (exim_tidydb), uid 0, was killed: out of swap space
Responsible Changed From-To: freebsd-amd64->freebsd-bugs Not specific to amd64.
this time it actually said something interesting on console! for some values of 'interesting' :) randy --- swap_pager_getswapspace(3): failed swap_pager_getswapspace(3): failed swap_pager_getswapspace(3): failed swap_pager_getswapspace(3): failed swap_pager_getswapsp a lcatale t(ra1p 612): : pafgae ifaullte wdhi e sin wkearnepl m_odpe acpugied =r 1_; agpiec itd s= w01a afasulpt vairtcuael (ad3dre)ss := 0x0f odfalulet dc e s =w saupeprv_isopr awgrieter _dagtae, ptagse nowta ppressepnat 0inest(ru3ct)i:on pfoinateir l= e0dx2 :0sxfwfaffpff_ffp80a47gc25e6 rst_acgk eptoinsterw a p s =p 0ax2c8:0exf(3ff)ff:f80 7f9fd1a680i poieadme ntserw a p _ =p 0xag28e:0rxf_fffgff8e07t9fsd16we0 0xdep ssegpmenta c= beas(e 30x0), :lim itf 0xaffifflf,e tydpe 1bs w =a DpPL _0, ppraesg 1,e lorng_ 1g, deef3t2 s0,w garapn s1 ledaocceseso(r 3ef)la:gs =f ianteirrulpt eendab , rseswumae, pIOP_L p= a0 gcuerrren_t pgroecests s = w789a (psysslpogad) ctreap( n9umb)er: = f12 epainilc:e pdag fsauwlta pcp_uipd =a 1g eUptime: 9h50m49s Physical memory: 4083 MB Dumping 1958 MB: <end. required power cycle>
a bit better in last night's syslog, possibly during backup May 30 00:40:14 work0 kernel: lock order reversal: May 30 00:40:14 work0 kernel: 1st 0xffffff0057d019d0 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:423 May 30 00:40:14 work0 kernel: 2nd 0xffffff8052c01aa0 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2556 May 30 00:40:14 work0 kernel: 3rd 0xffffff0004b8d098 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:544 May 30 00:40:16 work0 kernel: lock order reversal: May 30 00:40:16 work0 kernel: 1st 0xffffff8052c01aa0 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2556 May 30 00:40:16 work0 kernel: 2nd 0xffffff00d35c7d30 snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:793 May 30 00:50:14 work0 kernel: lock order reversal: May 30 00:50:14 work0 kernel: 1st 0xffffff00d35c7d30 snaplk (snaplk) @ /usr/src/sys/kern/vfs_vnops.c:297 May 30 00:50:14 work0 kernel: 2nd 0xffffff0057d019d May 30 00:50:14 work0 kernel: 0 ufs (ufs) @ /u May 30 00:50:14 work0 kernel: s May 30 00:50:15 work0 kernel: r/src/sys/ May 30 00:50:15 work0 kernel: ufs/ffs/ May 30 00:50:15 work0 kernel: ffs_snap May 30 00:50:15 work0 kernel: shot.c: May 30 00:50:15 work0 kernel: 1587 May 30 01:45:21 work0 kernel: May 30 01:45:21 work0 kernel: May 30 01:45:21 work0 kernel: Fatal trap 12: page fault while in kernel mode May 30 01:45:21 work0 kernel: cpuid = 0; apic id = 00 May 30 01:45:21 work0 kernel: fault virtual address = 0x0 May 30 01:45:21 work0 kernel: fault code = supervisor write data, page not present May 30 01:45:21 work0 kernel: instruction pointer = 0x20:0xffffffff8047c256 May 30 01:45:21 work0 kernel: sta May 30 01:45:21 work0 kernel: c May 30 01:45:21 work0 kernel: k pointer = 0x28:0xffffff807a057680 May 30 01:45:21 work0 kernel: frame pointer = 0x28:0xffffff807a0576e0 May 30 01:45:21 work0 kernel: code segment = base rx0, limit 0xfffff, type 0x1b May 30 01:45:21 work0 kernel: = DPL 0, pres 1, long 1, def32 0, gran May 30 01:45:21 work0 kernel: 1 May 30 01:45:21 work0 kernel: processor eflags = interrup May 30 01:45:21 work0 kernel: t enabled, resume, May 30 01:45:21 work0 kernel: IOPL = 0 May 30 01:45:21 work0 kernel: current process May 30 01:45:21 work0 kernel: = 9181 (nfcapd) <required power cycle> May 30 02:10:04 work0 syslogd: kernel boot file is /boot/kernel/kernel
For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped
randy, is this still an issue?
not that i am seeing
^Triage: Close based on comment 7