Bug 220971

Summary:	FreeBSD 11.0p11 - system freeze on intensive I/O
Product:	Base System	Reporter:	Gautam Mani <execve>
Component:	kern	Assignee:	freebsd-bugs (Nobody) <bugs>
Status:	New ---
Severity:	Affects Some People	CC:	marklmi26-fbsd
Priority:	---
Version:	11.0-RELEASE
Hardware:	amd64
OS:	Any
See Also:	https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048

Description Gautam Mani 2017-07-24 16:33:23 UTC

Hi,

I am duplicating my text from email to freebsd-questions and also indicating clues from my investigation so far.

I came back to using FreeBSD on my laptop after many years. The machine is currently in a dual-boot configuration, and I basically installed FreeBSD over the earlier Ubuntu installation. The machine is at 11.0-RELEASE, updated to p11 via freebsd-update.

I encountered a similar outcome of total system freeze in two kinds of usage. State of the system:
1) X is not usable - I use xfce - no login manager
2) I cannot ssh into the box - I do not get the username or password prompt - connection just times out
3) the network interface is ping-able.
4) I am not able to switch to the system consoles using Ctrl-Alt-F1..8
5) No mouse movement nor screen update
6) I suspected an issue with soft-update and then turned that off. Filesystem is ufs. fsck is clean.

Recovery: hard-boot, system comes up, fsck happens, some errors fixed and resumes working normally.

The problem became visible in two kinds of usage scenarios:
1) Running the backup port duplicity to create a backup of the / filesystem. It would start but at some point get stuck. Running it in verbose mode would sometimes indicate that the freeze happens when a volume (default setting of 200M) is written. This was tried 4-5 times.
2) running split on a 6.4G file (filesystem dump of disk using dump) -- something like
split -d -b 200M -a 4 - part
This would then freeze at one point - making the system unusable. I tried this 2-3 times.

I finally got it to work by using idprio 31 before the split command. I tried this only once and haven't tried it with the duplicity command.
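
A minimal sketch, not part of the original report, of what the split step does and how its output can be checked: stdin is cut into fixed-size pieces named part0000, part0001, and so on, and the pieces are concatenated back and compared with the source. The function name split_and_verify and its arguments are hypothetical, and the sizes used with it can be small; the report used 200M chunks on a 6.4G dump.

```shell
#!/bin/sh
# Split a file (fed on stdin, as in the report) into numbered pieces,
# then verify that the pieces concatenate back to the original.
# Hypothetical helper: split_and_verify SRC CHUNK_SIZE OUTDIR
split_and_verify() {
    src=$1      # file to split
    chunk=$2    # chunk size, passed to split -b (e.g. 200M)
    dir=$3      # existing, empty output directory
    # The redirection is opened before the cd, so a relative SRC still works.
    ( cd "$dir" && split -d -b "$chunk" -a 4 - part ) < "$src" || return 1
    # part0000, part0001, ... sort in creation order under the glob.
    cat "$dir"/part* | cmp -s - "$src"
}

# Usage (the file and directory names are placeholders):
#   split_and_verify dumpfile 200M /some/empty/dir
# The workaround described above would wrap the split itself:
#   idprio 31 split -d -b 200M -a 4 - part < dumpfile
```

idprio(1) runs the command at idle priority, so it only gets CPU time when nothing else is runnable; that this avoided the freeze is a single observation, not an explanation.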

The machine has 8GB RAM and is clearly not reaching the out of memory kind of situation - basically only about 1.1 or so GB is used.

I also ran this from the system console without X and faced the same issue - no panic message - nothing in the logs as well.

I got a clue from my further searching on the freebsd mailing lists and forums.

It has to do with the swap file. I don't have a swap partition since I just put the FreeBSD root over the Ubuntu partition. And I had created a
swap file based on the instructions in 11.12.2 Creating a swap file.

https://www.freebsd.org/doc/handbook/adding-swap-space.html

I got a clue from the freebsd-forums:

https://forums.freebsd.org/threads/58266/

If the link is down (for some reason, I see the server is down quite often these days), use the below cached link:

https://webcache.googleusercontent.com/search?q=cache:y1bJLmSEjWUJ:https://forums.freebsd.org/threads/58266/+&cd=2&hl=en&ct=clnk&gl=in

So, I just did a swapoff -La and then ran the split command again - no issues whatsoever!
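
A minimal sketch, not from the original report, for spotting whether active swap sits on an md(4) memory disk, which is how a swap file created per the handbook section appears in swapinfo output (for example /dev/md99). The function name md_swap_devices is hypothetical, and it only matches on the device name; it does not check what backs the md device.

```shell
#!/bin/sh
# Read swapinfo-style output on stdin and print the swap devices that
# are md(4) memory disks. A swap file is attached through such a device.
# Hypothetical helper name; matches on the device path only.
md_swap_devices() {
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print $1 }'
}

# Usage on FreeBSD:
#   swapinfo | md_swap_devices
# If this prints a device, the workaround from the report is:
#   swapoff -La     # before the heavy I/O job
#   swapon -La      # afterwards, to re-enable swap
```

Turning swap off only sidesteps the freeze; it leaves the machine without swap for the duration of the job.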

Some idea about the configuration can be had from the information below; please let me know if any further logs / debugging information is needed.

$ freebsd-version
11.0-RELEASE-p11

$ uname -a
FreeBSD mellon 11.0-RELEASE-p9 FreeBSD 11.0-RELEASE-p9 #0: Tue Apr 11 08:48:40 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

CPU: Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz (2195.06-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x206a7 Family=0x6 Model=0x2a Stepping=7
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x1dbae3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,XSAVE,OSXSAVE,AVX>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1<LAHF>
XSAVE Features=0x1<XSAVEOPT>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics
real memory = 8589934592 (8192 MB)
avail memory = 8172896256 (7794 MB)

Comment 1 Mark Millard 2017-08-09 09:06:39 UTC

See bugzilla 206048 for more examples and notes.

It is a long term issue, not new with 11.x .

This submittal is a duplicate. There may be others
besides 206048.

Comment 2 Gautam Mani 2017-08-12 05:17:28 UTC

One update on this one. I installed a custom stable/11 kernel and world (sched = 4BSD was the only change), and the problem
is no longer seen. To my untrained eyes, it looks like some kind of swap request starvation causing a hang when ULE is in use.


root@mellon:~ # uname -a
FreeBSD mellon 11.1-STABLE FreeBSD 11.1-STABLE #0 r313908+14aefcc16ee(stable/11): Sat Aug 12 00:33:04 IST 2017     root@mellon:/usr/obj/usr/home/user1/src/freebsd/sys/MYKERNEL  amd64

Comment 3 Mark Millard 2017-08-12 07:06:24 UTC

(In reply to execve from comment #2)

I suggest the test of using the port sysutils/stress
and trying:

stress -d 2 -m 3 --vm-keep

I'd be interested to know if a swap file context
handles that.

Comment 4 Gautam Mani 2017-08-12 07:39:08 UTC

Yes, it seems to hold itself together fine for more than 10 minutes :).

root@mellon:~ # swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/md99         8388608        0  8388608     0%
root@mellon:~ # sysctl hw.physmem
hw.physmem: 8463953920
root@mellon:~ # date
Sat Aug 12 12:55:36 IST 2017
root@mellon:~ # stress -d 2 -m 3 --vm-keep
stress: info: [11727] dispatching hogs: 0 cpu, 0 io, 3 vm, 2 hdd
^C
root@mellon:~ # date
Sat Aug 12 13:06:37 IST 2017
root@mellon:~ # uname -a
FreeBSD mellon 11.1-STABLE FreeBSD 11.1-STABLE #0 r313908+14aefcc16ee(stable/11): Sat Aug 12 00:33:04 IST 2017     root@mellon:/usr/obj/usr/home/user1/src/freebsd/sys/MYKERNEL  amd64

Comment 5 Mark Millard 2017-08-12 10:58:29 UTC

(In reply to execve from comment #4)

Cool. (I suppose top or some such could
be used to confirm the expected activity,
given the amount of RAM and other such
context.)

I wonder what it would do without the
"sched = 4BSD was the only change".
(Historically 11.x has been a problem
but likely none of the examples had
changed the scheduler.)

Comment 6 Mark Millard 2017-08-12 11:09:26 UTC

(In reply to execve from comment #4)

FYI since you have more RAM than the original
context for that stress command I'll quote
from the man page:

       -m, --vm N
              spawn N workers spinning on malloc()/free()

       --vm-bytes B
              malloc B bytes per vm worker (default is 256MB)

       -d, --hdd N
              spawn N workers spinning on write()/unlink()

       --vm-keep
              redirty memory instead of freeing and reallocating

So:

stress -d 2 -m 3 --vm-keep

is only doing 3*256MB = 768MB of VM use.

That was a large percentage of the 1GB of RAM that
the related bugzilla 206048 indicated as the context
for the command. It is only a small part of the
roughly 8 GiBytes of RAM here.

Comment 7 Gautam Mani 2017-08-12 18:00:29 UTC

I do not think there is any need to increase the memory usage. Like I mentioned in the original PR description, even without X running on the same system with 8GB RAM, I could reproduce this using a split command on a 6-7GB file via the console. 

>> running split on a 6.4G file (filesystem dump of disk using dump) -- something like 
>> split -d -b 200M -a 4 - part 
>> This would then freeze at one point - making the system unusable. I tried this 2-3 times. 

It is very clear there is an issue - and from my experience it is narrowed down to when the ULE scheduler and a swap file are both in use.

Comment 8 Mark Millard 2017-08-12 22:35:17 UTC

(In reply to execve from comment #7)

I tried a couple of variations of the experiment
that I suggested. Unfortunately the results are
a little complicated to interpret.

Context: under virtualbox
(on Windows 10 Pro) with. . .
(Bugzilla 206048 has pointed out
reproducibility under virtual
machines.)

FreeBSDx64OPC11S# uname -apKU
FreeBSD FreeBSDx64OPC11S 11.1-STABLE FreeBSD 11.1-STABLE  r322433M  amd64 amd64 1101501 1101501

# svnlite diff /usr/src/
Index: /usr/src/sys/amd64/conf/GENERIC
===================================================================
--- /usr/src/sys/amd64/conf/GENERIC	(revision 322433)
+++ /usr/src/sys/amd64/conf/GENERIC	(working copy)
@@ -24,7 +24,8 @@
 makeoptions	DEBUG=-g		# Build kernel with gdb(1) debug symbols
 makeoptions	WITH_CTF=1		# Run ctfconvert(1) for DTrace support
 
-options 	SCHED_ULE		# ULE scheduler
+#options 	SCHED_ULE		# ULE scheduler
+options 	SCHED_4BSD		# 4BSD scheduler
 options 	PREEMPTION		# Enable kernel thread preemption
 options 	INET			# InterNETworking
 options 	INET6			# IPv6 communications protocols
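
A small sketch, not part of the original comment, for confirming which scheduler a kernel config file enables after an edit like the diff above. The function name enabled_scheduler is hypothetical, and it only looks at plain "options SCHED_*" lines; it does not follow include directives or handle nooptions.

```shell
#!/bin/sh
# Read a kernel config on stdin and print each enabled scheduler option.
# Commented-out lines such as "#options SCHED_ULE" are skipped because
# their first field is "#options", not "options".
# Hypothetical helper name.
enabled_scheduler() {
    awk '$1 == "options" && $2 ~ /^SCHED_/ { print $2 }'
}

# Usage:
#   enabled_scheduler < /usr/src/sys/amd64/conf/GENERIC
# On a running system the active scheduler is reported by:
#   sysctl kern.sched.name
```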

I tried:

4 processors and 1 GiByte of RAM assigned
using: stress -d 2 -m 3 --vm-keep

and separately:

8 processors and 1 GiByte of RAM assigned
using: stress -d 6 -m 3 --vm-keep

I had a top -Cawopid running in each
case with its own ssh into the virtual
machine. stress was via ssh as well.

In the 2nd case I got to a lock-up: top
stopped updating and input was ignored
to both the ssh's (top and stress) and
the console window, including input
such as ^C and ^T .

The console window did eventually show:

swap_pager: I/O error - pageout failed; blkno 7367,size 4096, error 12

(After seeing that I waited a while longer but I gave up
on waiting and eventually killed the virtual machine.)

I later found a list message reporting about such
"error 12" variants of the message:

QUOTE
> I think it might be ENOMEM from a geom when trying to g_clone_bio.
. . .
It shouldn't happen, but you should notice no ill effects (that is, the
page isn't lost, it just wasn't paged out and there's a few bytes less
that the pager could do at the moment).
END QUOTE.

As for the lock-up structure. . .

Unfortunately top did not happen to update showing any
of the lock up structure in other processes before
it locked up.

It does at least appear to be harder to get a lock-up
(or to get ENOMEM and a failure to page out) with
SCHED_4BSD (to the degree that just a couple of tests
indicate anything about such). But getting stuck
appears possible and pageouts can fail to happen
for lack of memory, or so it appears.

Comment 9 Mark Millard 2017-08-12 22:47:09 UTC

(In reply to Mark Millard from comment #8)

I should also have said:

The Windows 10 Task Manager Performance
tab display of CPU usage on threads/cores
suggested a possible live-lock instead
of a dead-lock: 7 of 8 "processors"
(in virtualbox terms) fairly busy but
not any 8th one being noticeably busy.

(Windows 10 Pro was not otherwise busy
in any sustained way.)

But technically I cannot tell from my
evidence which of these it was:

lack of overall progress
vs.
very slow overall progress