Bug 32295

Summary: [libc_r] [patch] pthread(3) dont dequeue signals
Product: Base System Reporter: lwa <lwa>
Component: threadsAssignee: freebsd-threads (Nobody) <threads>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 5.0-CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
uthread_sig.c.diff none

Description lwa 2001-11-26 16:30:00 UTC
  On heavy load, our MySQL servers sometime take about 100% CPU but
are still functionnal. I've traced the process. It's looping on the
following system calls :

% strace -p 261
gettimeofday({1006788477, 519930}, NULL) = 0
poll([{fd=3, events=POLLRDNORM, revents=POLLRDNORM}, {fd=1715, events=POLLRDNORM}, {fd=1633, events=POLLRDNORM}, {fd=1748, events=POLLRDNORM}, {fd=145, events=POLLRDNORM}, {fd=988, events=POLLRDNORM}, {fd=1734, events=POLLRDNORM}, {fd=313, events=POLLRDNORM}, (...) , 292, 7838) = 1

file descriptor 3 is checked but no treatment is done for it.

according lsof :

mysqld  261 mysql    3u  PIPE 0xd793ff20     16384        ->0xd793fe80, cnt=10798, in=10798

The pipe is used to handle signal in the threads
_thread_kern_pipe[0]. It's connected to _thread_kern_pipe[1] == 4. The
number of queued bytes is increasing every few seconds. I never see
him decrease. The server is in this state sin 2 days.

I've ktraced the process between to pipe queue grows :

   261 mysqld   CALL  poll(0x81eb000,0x154,0x2518)
   261 mysqld   PSIG  SIGALRM caught handler=0x282a92fc mask=0x0 code=0x0
   261 mysqld   RET   poll 1
   261 mysqld   CALL  write(0x4,0x81e1e5f,0x1)
   261 mysqld   GIO   fd 4 wrote 1 byte
       "\^N"
   261 mysqld   RET   write 1
   261 mysqld   CALL  sigreturn(0x81e1e7c)

My servers on high load are in production use, so I can't run them
into a debuger to pick more infos.

Fix: 

I'm not fluent with libc_r but I suspect bug in thread_kern_poll()
from libc_r/uthread/uthread_kern.c or _thread_sig_handler() in
uthread_sig.c.

In uthread_kern.c, _thread_kern_pipe[0] is added to the pool set but
signal are dequeued only when _sigq_check_reqd != 0. If no new signal
occurs it can loop forever.

_sigq_check_reqd is set to 1 in _thread_sig_handler() only when the
signal is not blocked but a byte is wroten in the pipe on every cases.

May be some race condition occurs, also.
How-To-Repeat: 
  Not easy to repeat. Make a lot of queies to a MySQL server, wait
some weeks until its happend.
Comment 1 Alan Judge 2002-02-05 14:27:48 UTC
We're seeing what appears to be exactly the same problem on
4.5-STABLE.  I also see talk on the MySQL mailing lists from other
FreeBSD users who appear to be hitting this problem.

We're load testing a new MySQL-based system.  Under some circumstances
(not easy to repeat, but we're looking), the CPU of the server will go
to 100% and stay there.  It remains responsive enough to queries and
goes back to normal on a restart.

ktrace shows a steady loop with the timeout slowly counting down:
 53446 mysqld   0.000009 CALL  poll(0x8229000,0x3,0x1d52)
 53446 mysqld   0.000009 RET   poll 1
 53446 mysqld   0.000009 CALL  gettimeofday(0x81cfc8c,0)
 53446 mysqld   0.000013 RET   gettimeofday 0
 53446 mysqld   0.000028 CALL  poll(0x8229000,0x3,0x1d51)
 53446 mysqld   0.000010 RET   poll 1
 53446 mysqld   0.000011 CALL  gettimeofday(0x81cfc8c,0)
 53446 mysqld   0.000013 RET   gettimeofday 0

I'll do some more digging next time it happens and also use strace,
since it seems to show more info.  Since we're not in production yet,
I can do some digging with a debugger when this next happens if anyone
has suggestions as to where to look.

Laurent, have you had any more info since you submitted this report?
--
Alan
Comment 2 lwa 2002-02-05 15:19:24 UTC
Alan Judge wrote:
> Laurent, have you had any more info since you submitted this report?

Not much. The problem hasn't been repeated on my servers since the
report. I've looked in MySQL source code, the SIGALRM signal sound to
be used to check time to time is the server or the connection hasn't
been killed. MySQL provide a CFLAGS/CXXFLAGS (described in the
INSTALL-SOURCE) named DONT_USE_THR_ALARM to avoid this, but some
feature are disabled when you use it.

It's not easy to debug random bugs on a multi-thread process with heavy
load :( but if you want to, you should seek why the signals are queued
but never flushed. Maybe the signal is blocked somewhere in MySQL or
maybe it's a bug in libc_r that does not advertise the application. I
guess it's the second option (you should compile libc_r with debugging
symbols to check here).
Comment 3 Kris Kennaway freebsd_committer freebsd_triage 2003-07-13 02:37:00 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-threads

Assign to threads mailing list
Comment 4 bel 2003-11-08 09:17:02 UTC
Hi!

I have a small patch for resolving this problem.
Patch tested on heavy load MySQL server 4.0.16.

------------------------------------------------------
--- src/lib/libc_r/uthread/uthread_sig.c.orig	Wed Mar  5 07:28:08 2003
+++ src/lib/libc_r/uthread/uthread_sig.c	Thu Nov  6 17:16:13 2003
@@ -160,8 +160,10 @@
 			_thread_sigq[sig - 1].signo = sig;
 			_sigq_check_reqd = 1;
 		}
-		else
+		else {
 			DBG_MSG("Got signal %d, ignored.\n", sig);
+			_sigq_check_reqd = 1;
+		}
 	}
 	/*
 	 * The signal handlers should have been installed so that they
------------------------------------------------------

With best regards,
 Andrew Belashov.
Comment 5 bel 2004-09-10 11:31:27 UTC
Hello, All!

About one year I use my patch for open PR bin/32295.
No problem detected on high load mysql server. Also, this patch
tested with mozilla and firefox.

Patch in attachment. Alternate location:

<http://www.orel.ru/~bel/patches/uthread_sig.c.diff>

Problem description:

In function _thread_sig_handler() received signal queued (by writing into pipe).

===============================================================================
                 if (_queue_signals != 0) {
                         __sys_write(_thread_kern_pipe[1], &c, 1);
                         DBG_MSG("Got signal %d, queueing to kernel pipe\n", sig);
                 }
===============================================================================

But flag _sigq_check_reqd (Check of queue of signals is required) is not touched
if signal should be ignored:

===============================================================================
                 if (_thread_sigq[sig - 1].blocked == 0) {
                         [.........]

                         /* Indicate that there are queued signals: */
                         _thread_sigq[sig - 1].pending = 1;
                         _sigq_check_reqd = 1;
                 }
                 /* These signals need special handling: */
                 else if (sig == SIGCHLD || sig == SIGTSTP ||
                     sig == SIGTTIN || sig == SIGTTOU) {
                         _thread_sigq[sig - 1].pending = 1;
                         _thread_sigq[sig - 1].signo = sig;
                         _sigq_check_reqd = 1;
                 }
                 else {
                         DBG_MSG("Got signal %d, ignored.\n", sig);
                 }
===============================================================================

In this situation, dequeue signal handler is not executed and signal
does not dequeued from pipe...

This is bug and patch should be commited.

--
With best regards,
Andrew Belashov.
Comment 6 Maxim Konovalov freebsd_committer freebsd_triage 2006-05-03 21:41:39 UTC
State Changed
From-To: open->suspended

In RELENG_5,6 and HEAD libc_r is deprecated in favour of  
libpthread and libthr.  Nobody is working on libc_r bugs  
so mark this PR as suspended.
Comment 7 Alexander Best freebsd_committer freebsd_triage 2010-11-13 19:55:38 UTC
State Changed
From-To: suspended->closed

libc_r was detached from lib/Makefile in r162846 (HEAD, 8.x, 7.x). So i'm 
closing this PR, because i don't think the issue also exists with libpthread or 
libthr. Please tell me if my assumption is wrong there.