| Summary: | [libc_r] [patch] pthread(3) dont dequeue signals | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | lwa <lwa> | ||||
| Component: | threads | Assignee: | freebsd-threads (Nobody) <threads> | ||||
| Status: | Closed FIXED | ||||||
| Severity: | Affects Only Me | ||||||
| Priority: | Normal | ||||||
| Version: | 5.0-CURRENT | ||||||
| Hardware: | Any | ||||||
| OS: | Any | ||||||
| Attachments: |
|
||||||
We're seeing what appears to be exactly the same problem on 4.5-STABLE. I also see talk on the MySQL mailing lists from other FreeBSD users who appear to be hitting this problem. We're load testing a new MySQL-based system. Under some circumstances (not easy to repeat, but we're looking), the CPU of the server will go to 100% and stay there. It remains responsive enough to queries and goes back to normal on a restart. ktrace shows a steady loop with the timeout slowly counting down: 53446 mysqld 0.000009 CALL poll(0x8229000,0x3,0x1d52) 53446 mysqld 0.000009 RET poll 1 53446 mysqld 0.000009 CALL gettimeofday(0x81cfc8c,0) 53446 mysqld 0.000013 RET gettimeofday 0 53446 mysqld 0.000028 CALL poll(0x8229000,0x3,0x1d51) 53446 mysqld 0.000010 RET poll 1 53446 mysqld 0.000011 CALL gettimeofday(0x81cfc8c,0) 53446 mysqld 0.000013 RET gettimeofday 0 I'll do some more digging next time it happens and also use strace, since it seems to show more info. Since we're not in production yet, I can do some digging with a debugger when this next happens if anyone has suggestions as to where to look. Laurent, have you had any more info since you submitted this report? -- Alan Alan Judge wrote:
> Laurent, have you had any more info since you submitted this report?
Not much. The problem hasn't been repeated on my servers since the
report. I've looked in MySQL source code, the SIGALRM signal sound to
be used to check time to time is the server or the connection hasn't
been killed. MySQL provide a CFLAGS/CXXFLAGS (described in the
INSTALL-SOURCE) named DONT_USE_THR_ALARM to avoid this, but some
feature are disabled when you use it.
It's not easy to debug random bugs on a multi-thread process with heavy
load :( but if you want to, you should seek why the signals are queued
but never flushed. Maybe the signal is blocked somewhere in MySQL or
maybe it's a bug in libc_r that does not advertise the application. I
guess it's the second option (you should compile libc_r with debugging
symbols to check here).
Responsible Changed From-To: freebsd-bugs->freebsd-threads Assign to threads mailing list Hi!
I have a small patch for resolving this problem.
Patch tested on heavy load MySQL server 4.0.16.
------------------------------------------------------
--- src/lib/libc_r/uthread/uthread_sig.c.orig Wed Mar 5 07:28:08 2003
+++ src/lib/libc_r/uthread/uthread_sig.c Thu Nov 6 17:16:13 2003
@@ -160,8 +160,10 @@
_thread_sigq[sig - 1].signo = sig;
_sigq_check_reqd = 1;
}
- else
+ else {
DBG_MSG("Got signal %d, ignored.\n", sig);
+ _sigq_check_reqd = 1;
+ }
}
/*
* The signal handlers should have been installed so that they
------------------------------------------------------
With best regards,
Andrew Belashov.
Hello, All! About one year I use my patch for open PR bin/32295. No problem detected on high load mysql server. Also, this patch tested with mozilla and firefox. Patch in attachment. Alternate location: <http://www.orel.ru/~bel/patches/uthread_sig.c.diff> Problem description: In function _thread_sig_handler() received signal queued (by writing into pipe). =============================================================================== if (_queue_signals != 0) { __sys_write(_thread_kern_pipe[1], &c, 1); DBG_MSG("Got signal %d, queueing to kernel pipe\n", sig); } =============================================================================== But flag _sigq_check_reqd (Check of queue of signals is required) is not touched if signal should be ignored: =============================================================================== if (_thread_sigq[sig - 1].blocked == 0) { [.........] /* Indicate that there are queued signals: */ _thread_sigq[sig - 1].pending = 1; _sigq_check_reqd = 1; } /* These signals need special handling: */ else if (sig == SIGCHLD || sig == SIGTSTP || sig == SIGTTIN || sig == SIGTTOU) { _thread_sigq[sig - 1].pending = 1; _thread_sigq[sig - 1].signo = sig; _sigq_check_reqd = 1; } else { DBG_MSG("Got signal %d, ignored.\n", sig); } =============================================================================== In this situation, dequeue signal handler is not executed and signal does not dequeued from pipe... This is bug and patch should be commited. -- With best regards, Andrew Belashov. State Changed From-To: open->suspended In RELENG_5,6 and HEAD libc_r is deprecated in favour of libpthread and libthr. Nobody is working on libc_r bugs so mark this PR as suspended. State Changed From-To: suspended->closed libc_r was detached from lib/Makefile in r162846 (HEAD, 8.x, 7.x). So i'm closing this PR, because i don't think the issue also exists with libpthread or libthr. Please tell me if my assumption is wrong there. |
On heavy load, our MySQL servers sometime take about 100% CPU but are still functionnal. I've traced the process. It's looping on the following system calls : % strace -p 261 gettimeofday({1006788477, 519930}, NULL) = 0 poll([{fd=3, events=POLLRDNORM, revents=POLLRDNORM}, {fd=1715, events=POLLRDNORM}, {fd=1633, events=POLLRDNORM}, {fd=1748, events=POLLRDNORM}, {fd=145, events=POLLRDNORM}, {fd=988, events=POLLRDNORM}, {fd=1734, events=POLLRDNORM}, {fd=313, events=POLLRDNORM}, (...) , 292, 7838) = 1 file descriptor 3 is checked but no treatment is done for it. according lsof : mysqld 261 mysql 3u PIPE 0xd793ff20 16384 ->0xd793fe80, cnt=10798, in=10798 The pipe is used to handle signal in the threads _thread_kern_pipe[0]. It's connected to _thread_kern_pipe[1] == 4. The number of queued bytes is increasing every few seconds. I never see him decrease. The server is in this state sin 2 days. I've ktraced the process between to pipe queue grows : 261 mysqld CALL poll(0x81eb000,0x154,0x2518) 261 mysqld PSIG SIGALRM caught handler=0x282a92fc mask=0x0 code=0x0 261 mysqld RET poll 1 261 mysqld CALL write(0x4,0x81e1e5f,0x1) 261 mysqld GIO fd 4 wrote 1 byte "\^N" 261 mysqld RET write 1 261 mysqld CALL sigreturn(0x81e1e7c) My servers on high load are in production use, so I can't run them into a debuger to pick more infos. Fix: I'm not fluent with libc_r but I suspect bug in thread_kern_poll() from libc_r/uthread/uthread_kern.c or _thread_sig_handler() in uthread_sig.c. In uthread_kern.c, _thread_kern_pipe[0] is added to the pool set but signal are dequeued only when _sigq_check_reqd != 0. If no new signal occurs it can loop forever. _sigq_check_reqd is set to 1 in _thread_sig_handler() only when the signal is not blocked but a byte is wroten in the pipe on every cases. May be some race condition occurs, also. How-To-Repeat: Not easy to repeat. Make a lot of queies to a MySQL server, wait some weeks until its happend.