This report is about a problem occurring on a dedicated database slave server that runs five instances of MySQL inside jails. The server ran flawlessly for about six months under FreeBSD 7.0.

The server:
- Dell R905 with 2x Opteron 8347 HE
- 64 GB of RAM @ 533 MHz
- MySQL data on RAID-0 consisting of 4x Intel X25-E, connected to a 256 MB PERC 6/i
- mysqld processes running in jails
- MySQL data directories nullfs-mounted from the SSDs into the jails
- MySQL 5.0.51a compiled WITH_OPTIMIZED and WITH_PROC_SCOPE_PTH
- MyISAM tables

The SSDs are recent, but the server ran on them for almost two months prior to the 7.2 upgrade without any occurrence of this problem.

Since upgrading this machine to FreeBSD 7.2, on three separate occasions, individual queries against a particular jailed mysqld process have locked up while copying to a temporary table. dmesg and kernel config are attached.

Only one query locks up at a time, but since this is a replication slave, the read lock on the table brings replication to a halt. Other read-only queries proceed normally. In our testing we've seen the lock-up last over a day before we gave up on it. The locked thread doesn't go away when KILLed; we end up having to kill -9 the mysqld and run myisamchk on the tables. Nothing less seems to break the deadlock.

How-To-Repeat:
Since this is only happening on one of our data sets, under fairly high load, and intermittently at that, it might be more practical for us to collect and provide data that could help diagnose the problem.
A few bits of follow-up information. This machine was using SCHED_ULE under 7.0, at which time it operated flawlessly. We have other machines running ULE under 7.0 and 7.1, with no problems at all prior to 7.2. The first two of these lockups occurred while the machine was running custom packages built on 7.0-RELEASE. It's now running equivalent custom packages built directly on 7.2, but the lockup has recurred anyway. Also, s/WITH_OPTIMIZED/BUILD_OPTIMIZED/.

-nick
--
nick@desert.net - all messages cryptographically signed
Responsible Changed From-To: freebsd-ports-bugs->ale Over to maintainer (via the GNATS Auto Assign Tool)
Upgraded to 7.2-RELEASE-p2 to see if that would help. Actually, the wedge-up happened even sooner after the upgrade: a matter of hours rather than days. Of course, that could be due to factors other than the upgrade. The last two times, the MySQL client thread has been in the state "Sending data". I ran a tcpdump for a few hours on one of the stuck connections and saw exactly one packet on that connection during that time. It actually seems like this might be more of a kernel threading/locking issue. Should this bug be assigned to a different category? If we can't find a resolution, 7.2 will be off limits on our database servers. :(

-nick
Responsible Changed From-To: ale->freebsd-threads FreeBSD's threads problem.
Author: attilio
Date: Wed Sep 23 21:38:57 2009
New Revision: 197445
URL: http://svn.freebsd.org/changeset/base/197445

Log:
  rwlocks implemented in libthr need to fall through the 'hard path' and query umtx also if the shared waiters bit is set on a shared lock.
  The writer starvation avoidance technique, in fact, can lead to shared waiters on a shared lock, which can cause a missed wakeup and thus a deadlock if the right bit is not checked (a notable case is the writers counterpart being handled through expired timeouts).
  Fix that by checking for the shared waiters bit also when unlocking shared locks.

  That bug was causing a reported MySQL deadlock.
  Many thanks go to Nick Esborn and his employer DesertNet, which provided time and machines to identify and fix this issue.

  PR:           thread/135673
  Reported by:  Nick Esborn <nick at desert dot net>
  Tested by:    Nick Esborn <nick at desert dot net>
  Reviewed by:  jeff

Modified:
  head/lib/libthr/thread/thr_umtx.h

Modified: head/lib/libthr/thread/thr_umtx.h
==============================================================================
--- head/lib/libthr/thread/thr_umtx.h	Wed Sep 23 20:49:14 2009	(r197444)
+++ head/lib/libthr/thread/thr_umtx.h	Wed Sep 23 21:38:57 2009	(r197445)
@@ -171,8 +171,11 @@ _thr_rwlock_unlock(struct urwlock *rwloc
 	for (;;) {
 		if (__predict_false(URWLOCK_READER_COUNT(state) == 0))
 			return (EPERM);
-		if (!((state & URWLOCK_WRITE_WAITERS) && URWLOCK_READER_COUNT(state) == 1)) {
-			if (atomic_cmpset_rel_32(&rwlock->rw_state, state, state-1))
+		if (!((state & (URWLOCK_WRITE_WAITERS |
+		    URWLOCK_READ_WAITERS)) &&
+		    URWLOCK_READER_COUNT(state) == 1)) {
+			if (atomic_cmpset_rel_32(&rwlock->rw_state,
+			    state, state-1))
 				return (0);
 			state = rwlock->rw_state;
 		} else {
Thanks very much for the patch; we were running into this as well. The good news is that it improves the situation quite a bit. The bad news is that it does not appear to cure the problem entirely: under heavy load we see the same problem recurring. We'll be bringing up another machine with 8.0 in the coming weeks, and I hope to test it then. We're currently avoiding this bug by under-loading the 7.2 machine and handling more queries on a different 7.0 machine. Is there any information I can provide that would help diagnose this bug further?
Hi! Can anybody who frequently hits this hang with relatively simple queries against MyISAM tables please let us know whether the following solves or avoids the issue?

Add to the [mysqld] section of my.cnf:

  concurrent_insert=0

or run:

  SET GLOBAL concurrent_insert=0;
Is this still an issue?
Attilio's work resolved the problem completely. Thanks!
Submitter confirms this is fixed with Attilio's work -- thanks for the follow-up!