| Summary: | /bin/sh coredump | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Oliver Fromme <olli> |
| Component: | bin | Assignee: | Stefan Farfeleder <stefanf> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | CC: | olli |
| Priority: | Normal | ||
| Version: | 4.7-RELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
|
Description
Oliver Fromme
2002-11-19 12:50:01 UTC
Responsible Changed From-To: freebsd-bugs->tjr I believe this is caused by the SIGINT handler longjmp()'ing out when it's in the middle of a malloc() call. Calls to malloc() and free() should be bracketed in INTON and INTOFF. I haven't had much luck tracking this down in the past, but I'll try again to find the missing INTON/INTOFF. Responsible Changed From-To: tjr->freebsd-bugs Unassign due to lack of time and interest. Perhaps someone else will pick this up. On 2002-11-19 13:43, Oliver Fromme <olli@secnetix.de> wrote: > Responsible-Changed-By: tjr > Responsible-Changed-Why: > I believe this is caused by the SIGINT handler longjmp()'ing > out when it's in the middle of a malloc() call. Calls to malloc() > and free() should be bracketed in INTON and INTOFF. > > I haven't had much luck tracking this down in the past, but > I'll try again to find the missing INTON/INTOFF. I just happened to stumble upon this bug today. It's still with us in FreeBSD 6.0-CURRENT. It seems that the inner for loop in the following: while for true; do false; done; do true; done is not stopped by sh(1) when ^C is hit. Even after the interrupt is received, sh consumes at least 5-15% of CPU on my test here, while it appears to be sitting at a PS1 prompt, waiting for more input. PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND 2352 keramida 1 5 0 1668K 1192K ttyin 0:03 25.48% 10.79% sh After a few of these commands have been run, sh may reach CPU utilizations of even more: PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND 2352 keramida 1 123 0 1672K 1196K RUN 1:11 63.21% 63.18% sh On 2005-04-15 18:52, Giorgos Keramidas <keramida@freebsd.org> wrote: > On 2002-11-19 13:43, Oliver Fromme <olli@secnetix.de> wrote: > > Responsible-Changed-By: tjr > > Responsible-Changed-Why: > > I believe this is caused by the SIGINT handler longjmp()'ing > > out when it's in the middle of a malloc() call. Calls to malloc() > > and free() should be bracketed in INTON and INTOFF. > > > > I haven't had much luck tracking this down in the past, but > > I'll try again to find the missing INTON/INTOFF. > > I just happened to stumble upon this bug today. I managed to get sh to print "Out of space" after a few more invocations, and here's the backtrace I get either with gcore or by sending a SEGV to the process (there's no other way to stop it from printing infinite numbers of "Out of space" error messages): : (gdb) bt : #0 0x2811f2e3 in write () at write.S:2 : #1 0x0805733d in xwrite (fd=2, buf=0x806a000 "Out of space\namida/Mailbox", nbytes=13) : at output.c:318 : #2 0x080573b4 in flushout (dest=0x806132c) at output.c:206 : #3 0x08057418 in flushall () at output.c:196 : #4 0x0804c733 in exverror (cond=1, msg=0x805eb68 "Out of space", : ap=0xbfbfe7f4 "4è¿¿\031\222\006(Øó\a(Üò\021(ô\001") at error.c:156 : #5 0x0804c787 in error (msg=0x806a000 "Out of space\namida/Mailbox") at error.c:166 : #6 0x0805555c in ckmalloc (nbytes=500) at memalloc.c:61 : #7 0x0805560d in stalloc (nbytes=496) at memalloc.c:132 : #8 0x080557ad in growstackblock () at memalloc.c:247 : #9 0x0804e1f1 in padvance (path=0xbfbfe8ac, name=0x806320c "") at exec.c:192 : #10 0x08054d38 in chkmail (silent=0) at mail.c:88 : #11 0x08054f92 in cmdloop (top=1) at main.c:213 : #12 0x08055138 in main (argc=1, argv=0xbfbfea40) at main.c:183 I don't know if this helps track down the problem though. If anyone with more sh-clue wants me to send the core file or post more data out of it, please ask. I have reproduced this and have a fix. It does seem to have been
INTON/INTOFF. To find where the signal was being handled inside malloc,
in gdb you can do
handle SIGINT nostop pass
(answer yes)
break exraise
command
where
continue
end
run
which will print a stack trace every time it handles the signal. Then you
look for malloc inside the trace.
I found the following offenders:
#11 0x0805bba7 in redirect (redir=0x0, flags=1) at redir.c:111
#11 0x0805e1a3 in setvar (name=0x80dd76e "_", val=0x810d0ac "true",
flags=0) at var.c:246
However, a casual glance revealed several other un-wrapped calls. 90% of
these are made via ckmalloc/ckrealloc/ckfree. Thus I think the easiest
and safest thing to do is to call INTON/INTOFF inside these functions.
They are safe to use recursively and have trivial overhead (incrementing a
counter). Here is a patch which does this. It is against CURRENT but I
imagine it will apply to other versions as well. After this patch I
cannot make it crash anymore.
I found only one unsafe use of malloc/free directly: in the umask builtin.
umaskcmd() calls setmode() which calls malloc, and then calls free. My
patch puts INTON/INTOFF around this as well.
By the way, I think the CPU usage thing is a red herring. I assume the
numbers are from top, and its CPU field is an average. It definitely
decays slowly to 0 after a process has been running, so it can report
nonzero even for a totally idle process. sh gets commands from the user
with a blocking read, so it must be idle while waiting for input.
diff -ur /usr/src/bin/sh/memalloc.c ./src/bin/sh/memalloc.c
--- /usr/src/bin/sh/memalloc.c Tue Apr 6 13:06:51 2004
+++ ./src/bin/sh/memalloc.c Thu Oct 13 14:02:30 2005
@@ -57,8 +57,10 @@
{
pointer p;
+ INTOFF;
if ((p = malloc(nbytes)) == NULL)
error("Out of space");
+ INTON;
return p;
}
@@ -70,11 +72,20 @@
pointer
ckrealloc(pointer p, int nbytes)
{
+ INTOFF;
if ((p = realloc(p, nbytes)) == NULL)
error("Out of space");
+ INTON;
return p;
}
+void
+ckfree(pointer p)
+{
+ INTOFF;
+ free(p);
+ INTON;
+}
/*
* Make a copy of a string in safe storage.
diff -ur /usr/src/bin/sh/memalloc.h ./src/bin/sh/memalloc.h
--- /usr/src/bin/sh/memalloc.h Tue Apr 6 13:06:51 2004
+++ ./src/bin/sh/memalloc.h Thu Oct 13 14:01:27 2005
@@ -48,6 +48,7 @@
pointer ckmalloc(int);
pointer ckrealloc(pointer, int);
+void ckfree(pointer);
char *savestr(char *);
pointer stalloc(int);
void stunalloc(pointer);
@@ -72,5 +73,3 @@
#define STTOPC(p) p[-1]
#define STADJUST(amount, p) (p += (amount), sstrnleft -= (amount))
#define grabstackstr(p) stalloc(stackblocksize() - sstrnleft)
-
-#define ckfree(p) free((pointer)(p))
diff -ur /usr/src/bin/sh/miscbltin.c ./src/bin/sh/miscbltin.c
--- /usr/src/bin/sh/miscbltin.c Fri Sep 9 12:59:41 2005
+++ ./src/bin/sh/miscbltin.c Thu Oct 13 14:00:13 2005
@@ -274,12 +274,14 @@
umask(mask);
} else {
void *set;
+ INTOFF;
if ((set = setmode (ap)) == 0)
error("Illegal number: %s", ap);
mask = getmode (set, ~mask & 0777);
umask(~mask & 0777);
free(set);
+ INTON;
}
}
return 0;
--
Nate Eldredge
nge@cs.hmc.edu
Responsible Changed From-To: freebsd-bugs->stefanf I'll try to fix that one. I have applied Nate Eldredge's patch. With that patch, I couldn't reproduce the problem anymore. Thank you very much, Nate! I would vote for committing this. -- State Changed From-To: open->patched I committed a modified version of this patch to HEAD. Thanks. State Changed
From-To: patched->closed
Also fixed in RELENG_{5,6}.
|