I run a moderately sized mpd installation servicing DSL clients. They arrive, of course, by L2TP. A problem occurs when the L2TP tunnel is cut off remotely. Think of a fiber cut as an example, but I also believe the far-end L2TP speaker can go down without cleaning up. In this case, all the ngX interfaces stay "UP" and mpd5 will not terminate. Rebooting seems to be the only solution. If I kill -9 the mpd process, it does die, but the ngX interfaces remain, as does all the other in-kernel netgraph state. When I see this happen, mpd5 is still transmitting L2TP packets to nowhere. No incoming L2TP packets... just ones leaving the machine. 1) There needs to be a way to tell mpd5 to clear out, returning all resources. 2) mpd5 should recognize the non-existence of the other end and act accordingly.
More information. When mpd5 is in this state, neither the web interface nor the command-line interface (normally accessible by telnet) is usable. That is, they answer the telnet connection, but they don't respond to any input. I've added debugging symbols to mpd5 to see if I can convince it to core dump when this happens.
I can only say that this should not happen. An L2TP disconnection is not a reason for anything to get stuck. PPP links should normally tear down after an LCP echo timeout, and of course on graceful mpd5 shutdown. Please provide any available diagnostic information, since the problem is not generic enough.
Absolutely. And I get it. What I have done so far: 1) I have recompiled mpd5 with debug info, because the threaded core file I looked at didn't really have much information. 2) I have started dumping the first 64 bytes of every packet to try and catch what happens on the wire. So far I can tell you that when this happens, while mpd5 itself is unresponsive (say: "service mpd5 restart" never returns, and telnet to the command line or web interface is unresponsive), it is killable, and packets are still sent to the wire. I'm ever-so-slightly starting to think that it's not the case that the tunnel is unilaterally dropped. I didn't get a chance to figure it out, but one machine had both wedged connections and working connections... suggesting that mpd5 can be wedged while the network part keeps working. I will update further as I have it. Likely today (the way things are going).
It's probably worth saying that I've noticed an ngX interface that is apparently stuck in setup in most of the recent sessions that have broken. So that might be our culprit.
Created attachment 177006 [details] pcap of setup/teardown activity

So... here's a pcap of the most recent failure. I've filtered this down to L2TP control messages _or_ LCP. MPD is at 66.96.16.194. At packets 1783 and 3251 you'll see the only ICRP/ICCN replies to an ICRQ. This, AFAICT, is the system working normally. Starting at 3365, however, an ICRQ isn't matched with ICRP/ICCN; it only gets a ZLB. Is something filling up? How would you like me to proceed for more information? At 3637 the far end tries an SCCRQ... which doesn't get a reply, either. I believe it does eventually terminate the broken L2TP tunnel... but not in this dump.
I've analyzed several of the packet dumps. They look like the one I've pasted here. At some point, we stop processing ICRQs (or at least we stop responding to them with ICCNs). It seems to me that we create the ngX interface before we reply with the ICCN? Is this true, or are we failing to send the ICCN... and that's the failure?
Created attachment 177041 [details] MPD log file

I'm attaching an mpd5 log file (generated by syslog) that goes from start to finish. The last timestamp in the file is 17:44:49, but I found the server not passing packets around 18:30... I'm pretty sure the packet dump will show the same pattern... with the last accepted call around 17:44 and then unanswered calls afterward. I don't know that this log tells us much. I'm beginning to feel that the bug we're chasing is related to the number of logins. I found that one of my users was spamming a (successful) login every minute and then tearing it down. I'm going to try to fix that... but the fact that I've seen this problem before... just every month or two... is in the back of my mind.
Another comment: if mpd had a flag to clean up a dead mpd's mess, that would be really spiffy. I can't "ifconfig unplumb" an ngX interface... so I can't get rid of the routing attached to it. Well... I've figured out how to delete the local IPv4 address, but not the IPv6 one. Anyway... an option to clean up any leftover mpd objects would be spiffy.
today after killing it, I tried a kill -QUIT to get a core. It was stuck here:

#0  0x00000008024cbb1a in _poll () from /lib/libc.so.7
[New Thread 803419400 (LWP 100569/<unknown>)]
[New Thread 803006800 (LWP 100568/<unknown>)]
(gdb) bt
#0  0x00000008024cbb1a in _poll () from /lib/libc.so.7
#1  0x00000008021d5d66 in pthread_suspend_all_np () from /lib/libthr.so.3
#2  0x0000000000460ad7 in ?? ()
#3  0x00000008021d3855 in pthread_create () from /lib/libthr.so.3
#4  0x0000000000000000 in ?? ()
(gdb)

... I think I've still got the symbols botched, but the _poll() is a place to start. To be clear, I had already run "service mpd5 stop" ... which hangs, then about a minute afterward "killall -QUIT mpd5" ...
Created attachment 177080 [details] Complete and short failure packet capture.

For this packet capture, I did a "service mpd5 restart" and you can see this start to happen at packet 249 in the capture (LCP termination requests start). Now, it would appear that the far side doesn't listen to "Requestor is being shut down" StopCCN messages, because one of the far-end routers immediately starts with an SCCRQ. What surprises me is that we answer with an SCCRP when we're shutting down. By packet 506, the new link is clearly broken. Same signature as before: ICRQs go unanswered (other than ZLB) and CDNs go unanswered (other than ZLB).
(In reply to dgilbert from comment #8) At my previous job we had a shell script that simply wiped every node out of netgraph; that efficiently cleaned up the system after an mpd5 crash/kill.
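The same idea can be sketched in C (a hypothetical reconstruction, untested against a live netgraph; the original was a shell script over "ngctl list" / "ngctl shutdown"):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Pull the node name out of one line of "ngctl list" output, e.g.
 *   "  Name: ng0             Type: iface           ID: 00000005   Num hooks: 1"
 * Returns 1 and fills name[] on success, 0 for non-node lines.
 */
int
ng_list_node_name(const char *line, char *name, size_t namelen)
{
	const char *p = strstr(line, "Name: ");
	size_t i = 0;

	if (p == NULL || namelen == 0)
		return (0);
	p += strlen("Name: ");
	while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n' &&
	    i < namelen - 1)
		name[i++] = *p++;
	name[i] = '\0';
	return (i > 0);
}

/* Shut down every node "ngctl list" reports, via "ngctl shutdown <name>:". */
void
ng_cleanup(void)
{
	char line[256], name[128], cmd[160];
	FILE *fp = popen("ngctl list", "r");

	if (fp == NULL)
		return;
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (ng_list_node_name(line, name, sizeof(name))) {
			snprintf(cmd, sizeof(cmd), "ngctl shutdown %s:", name);
			system(cmd);
		}
	}
	pclose(fp);
}
```

This needs root, and it indiscriminately removes every netgraph node on the box, so it is only appropriate after mpd has died.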
You wouldn't happen to have a copy of that script, would you? I'm also trying to get better (smaller) packet dumps that capture the L2TP control packets and PPP's LCP, IPCP and PAP layers. Maybe I should have IPV6CP, too... but I don't know its code. Anyway, this is not capturing the PPP... does tcpdump's pcap filter not see PPP inside L2TP?

tcpdump -nvi bge0.953 -C 200 -W 25 -w osp. udp\[8:2\] \& 0x80ff == 0x8002 or ppp\[2:2\] == 0x8021 or ppp\[2:2\] == 0xc023 or \( ppp\[2:2\] == 0xc021 and ppp\[4:1\] != 10 and ppp\[4:1\] != 9 \)

... i.e., is the ppp[] offset not valid? If you look at the packet dumps, the L2TP header sizes from the two machines are slightly different.
OK... I've instrumented the function ppp_l2tp_handle_ICRQ to spit out a log line at every step. Interestingly, the last entry in the log... typically minutes before the restart... is along the lines of:

Nov 15 17:44:49 owl mpd: [L2-23] Link: Leave bundle "B-21"

... so I suppose I'm going to instrument that function.
OK. This is the result of a kill -QUIT of a stuck mpd5 daemon:

(gdb) bt
#0  0x000000080248567c in _umtx_op_err () from /lib/libthr.so.3
#1  0x00000008024796e4 in __thr_umutex_lock (mtx=<value optimized out>, id=<value optimized out>) at /usr/src/lib/libthr/thread/thr_umtx.c:80
#2  0x000000080247f41e in mutex_lock_common (m=<value optimized out>, abstime=<value optimized out>, cvattach=Unhandled dwarf expression opcode 0x9d) at /usr/src/lib/libthr/thread/thr_mutex.c:693
#3  0x000000080247e519 in __pthread_mutex_lock (mutex=0x802a3d218) at /usr/src/lib/libthr/thread/thr_mutex.c:745
#4  0x00000008027c9797 in vsyslog (pri=6, fmt=0x4d396b "LINK: logging exit", ap=0x7fffdfffd870) at /usr/src/lib/libc/gen/syslog.c:154
#5  0x0000000000489a93 in vLogPrintf (fmt=0x4d396b "LINK: logging exit", args=0x7fffdfffd870) at log.c:267
#6  0x00000000004898b8 in LogPrintf (fmt=0x4d396b "LINK: logging exit") at log.c:244
#7  0x000000000044a230 in BundLeave (l=0x803419010) at bund.c:478
#8  0x000000000047ec82 in LcpNewPhase (l=0x803419010, new=PHASE_TERMINATE) at lcp.c:451
#9  0x000000000047f987 in LcpNewState (fp=0x803419be8, old=ST_OPENED, new=ST_STOPPING) at lcp.c:382
#10 0x00000000004635fd in FsmNewState (fp=0x803419be8, new=ST_STOPPING) at fsm.c:187
#11 0x0000000000466a3e in FsmRecvTermReq (fp=0x803419be8, lhp=0x7fffdfffdaf0, bp=0x0) at fsm.c:865
#12 0x0000000000464e65 in FsmInput (fp=0x803419be8, bp=0x0) at fsm.c:1495
#13 0x000000000047f7a6 in LcpInput (l=0x803419010, bp=0x803481008) at lcp.c:1242
#14 0x0000000000477d7a in InputDispatch (b=0x803612810, l=0x803419010, proto=49185, bp=0x803481008) at input.c:100
#15 0x0000000000477687 in InputFrame (b=0x803612810, l=0x803419010, proto=49185, bp=0x803481008) at input.c:60
#16 0x0000000000483fbe in LinkNgDataEvent (type=1, cookie=0x0) at link.c:937
#17 0x0000000000462f47 in EventHandler (arg=0x7bfbd0) at event.c:146
#18 0x00000000004b1dbe in pevent_ctx_execute (arg=0x803424188) at contrib/libpdel/util/pevent.c:884
#19 0x00000000004b199f in pevent_ctx_service (ev=0x803424188) at contrib/libpdel/util/pevent.c:773
#20 0x00000000004b141f in pevent_ctx_main (arg=0x80322c008) at contrib/libpdel/util/pevent.c:719
#21 0x0000000802477b55 in thread_start (curthread=<value optimized out>) at /usr/src/lib/libthr/thread/thr_create.c:289
#22 0x0000000000000000 in ?? ()
(gdb) frame 4
#4  0x00000008027c9797 in vsyslog (pri=6, fmt=0x4d396b "LINK: logging exit", ap=0x7fffdfffd870) at /usr/src/lib/libc/gen/syslog.c:154
154             THREAD_LOCK();
Current language:  auto; currently minimal
(gdb) l
149                     pri &= LOG_PRIMASK|LOG_FACMASK;
150             }
151
152             saved_errno = errno;
153
154             THREAD_LOCK();
155
156             /* Check priority against setlogmask values. */
157             if (!(LOG_MASK(LOG_PRI(pri)) & LogMask)) {
158                     THREAD_UNLOCK();
So the stack trace above was thread 3.

(gdb) info threads
* 3 Thread 803216a00 (LWP 101165/<unknown>)  0x00000008027c9797 in vsyslog (pri=6, fmt=0x4d396b "LINK: logging exit", ap=0x7fffdfffd870) at /usr/src/lib/libc/gen/syslog.c:154
  2 Thread 80342e000 (LWP 101168/<unknown>)  0x000000080278f0da in _poll () from /lib/libc.so.7
  1 Thread 80342f400 (LWP 101178/<unknown>)  0x000000080248567a in _umtx_op_err () from /lib/libthr.so.3
(gdb) thread 2
[Switching to thread 2 (Thread 80342e000 (LWP 101168/<unknown>))]#0  0x000000080278f0da in _poll () from /lib/libc.so.7
(gdb) bt
#0  0x000000080278f0da in _poll () from /lib/libc.so.7
#1  0x000000080247a5e6 in __thr_poll (fds=0x80322c0a8, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:306
#2  0x00000000004b0edc in pevent_ctx_main (arg=0x80340e0a8) at contrib/libpdel/util/pevent.c:668
#3  0x0000000802477b55 in thread_start (curthread=<value optimized out>) at /usr/src/lib/libthr/thread/thr_create.c:289
#4  0x0000000000000000 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 80342f400 (LWP 101178/<unknown>))]#0  0x000000080248567a in _umtx_op_err () from /lib/libthr.so.3
(gdb) bt
#0  0x000000080248567a in _umtx_op_err () from /lib/libthr.so.3
#1  0x00000008024796e4 in __thr_umutex_lock (mtx=<value optimized out>, id=<value optimized out>) at /usr/src/lib/libthr/thread/thr_umtx.c:80
#2  0x000000080247f41e in mutex_lock_common (m=<value optimized out>, abstime=<value optimized out>, cvattach=Unhandled dwarf expression opcode 0x9d) at /usr/src/lib/libthr/thread/thr_mutex.c:693
#3  0x000000080247e519 in __pthread_mutex_lock (mutex=0x802a3d218) at /usr/src/lib/libthr/thread/thr_mutex.c:745
#4  0x00000008027c9797 in vsyslog (pri=6, fmt=0x4e27b1 "[%s] RADIUS: Rec'd RAD_ACCOUNTING_RESPONSE for user '%s'", ap=0x7fffdf1f6e10) at /usr/src/lib/libc/gen/syslog.c:154
#5  0x0000000000489a93 in vLogPrintf (fmt=0x4e27b1 "[%s] RADIUS: Rec'd RAD_ACCOUNTING_RESPONSE for user '%s'", args=0x7fffdf1f6e10) at log.c:267
#6  0x00000000004898b8 in LogPrintf (fmt=0x4e27b1 "[%s] RADIUS: Rec'd RAD_ACCOUNTING_RESPONSE for user '%s'") at log.c:244
#7  0x0000000000496faf in RadiusSendRequest (auth=0x80343a810) at radius.c:1350
#8  0x0000000000497191 in RadiusAccount (auth=0x80343a810) at radius.c:184
#9  0x0000000000443146 in AuthAccount (arg=0x80343a810) at auth.c:968
#10 0x00000000004b20d5 in paction_main (arg=0x803433148) at contrib/libpdel/util/paction.c:185
#11 0x0000000802477b55 in thread_start (curthread=<value optimized out>) at /usr/src/lib/libthr/thread/thr_create.c:289
#12 0x0000000000000000 in ?? ()
So... do I read that right? vsyslog() is called by two threads and it locks up? That would explain the behavior of things.
It looks like some other thread owns the lock. Since the only other thread is in poll(), I imagine the lock was leaked or a thread was killed while it held the lock. What version of FreeBSD are you running? Run this in gdb:

set print pretty 1
thr 3
f 3
p *mutex

The latter should tell us which thread owns the lock.
After reviewing the syslog code in head, I'm quite sure the lock was not leaked.
(gdb) set print pretty 1
(gdb) thr 3
[Switching to thread 3 (Thread 803216a00 (LWP 101165/<unknown>))]#0  0x000000080248567c in _umtx_op_err () from /lib/libthr.so.3
(gdb) f 3
#3  0x000000080247e519 in __pthread_mutex_lock (mutex=0x802a3d218) at /usr/src/lib/libthr/thread/thr_mutex.c:745
745             ret = mutex_lock_common(m, NULL, false, false);
Current language:  auto; currently minimal
(gdb) p *mutex
$1 = 0x803416120

How do I decode that?
Oh... FreeBSD-11.0p3
Sorry. Do both of these:

p *m
p **(struct pthread_mutex **)mutex

They should print the same data, but run both, just in case.
(gdb) p *m
$2 = {
  m_lock = {
    m_owner = 54616352,
    m_flags = 8,
    m_ceilings = 0x802a3d220,
    m_rb_lnk = 8125440,
    m_spare = 0x802a3d230
  },
  m_flags = 0,
  m_count = 0,
  m_spinloops = 0,
  m_yieldloops = 0,
  m_ps = 0,
  m_qe = {
    tqe_next = 0x802a3d258,
    tqe_prev = 0x0
  },
  m_pqe = {
    tqe_next = 0x1,
    tqe_prev = 0x1
  },
  m_rb_prev = 0x8006fe020
}
(gdb) p **(struct pthread_mutex **)mutex
$3 = {
  m_lock = {
    m_owner = -2147382472,
    m_flags = 0,
    m_ceilings = 0x803416128,
    m_rb_lnk = 0,
    m_spare = 0x803416138
  },
  m_flags = 1,
  m_count = 0,
  m_spinloops = 0,
  m_yieldloops = 0,
  m_ps = 0,
  m_qe = {
    tqe_next = 0x0,
    tqe_prev = 0x8036576a0
  },
  m_pqe = {
    tqe_next = 0x0,
    tqe_prev = 0x8036576b0
  },
  m_rb_prev = 0x0
}
It may be worth saying: since I disabled logging, no crashes so far (and I would have expected a few by now). I realize this may just reduce the frequency... until something else can hang on a lock... but it's a data point.
The first structure is not valid, but the second is mostly valid. If so, it confirms that the mutex is owned by a thread that has exited. The m_lock.m_owner field contains the "contested" bit plus the TID 101176.

(gdb) p -2147382472 & ~0x80000000
$2 = 101176

The "info threads" output in comment #15 shows that no thread has this TID. However, even that structure doesn't look entirely valid. The m_lock.m_spare field is non-zero, but it should always be zero. The m_lock.m_ceilings field contains unlikely data, and is even displayed in an unlikely format. Do this:

ptype struct pthread_mutex
ptype struct umutex

It almost seems like something is mismanaging memory and stomping on this mutex.
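For anyone following along, the decode is just masking the "contested" flag out of the owner word; a tiny sketch (the constant name matches UMUTEX_CONTESTED in sys/umtx.h):

```c
#include <assert.h>
#include <stdint.h>

#define UMUTEX_CONTESTED 0x80000000U	/* high bit of umutex.m_owner */

/* Strip the contested flag from umutex.m_owner, leaving the owner's TID. */
uint32_t
umutex_owner_tid(int32_t m_owner)
{
	return ((uint32_t)m_owner & ~UMUTEX_CONTESTED);
}
```

Feeding it the m_owner value printed above, -2147382472, yields TID 101176.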
#0  0x000000080248567c in _umtx_op_err () from /lib/libthr.so.3
[New Thread 80342f400 (LWP 101178/<unknown>)]
[New Thread 80342e000 (LWP 101168/<unknown>)]
[New Thread 803216a00 (LWP 101165/<unknown>)]
(gdb) ptype struct pthread_mutex
type = struct pthread_mutex {
    struct umutex m_lock;
    int m_flags;
    int m_count;
    int m_spinloops;
    int m_yieldloops;
    int m_ps;
    struct {
        struct pthread_mutex *tqe_next;
        struct pthread_mutex **tqe_prev;
    } m_qe;
    struct {
        struct pthread_mutex *tqe_next;
        struct pthread_mutex **tqe_prev;
    } m_pqe;
    struct pthread_mutex *m_rb_prev;
}
(gdb) ptype struct umutex
type = struct umutex {
    volatile __lwpid_t m_owner;
    __uint32_t m_flags;
    __uint32_t m_ceilings[0];
    __uintptr_t m_rb_lnk;
    __uint32_t m_spare[0];
}
Does this mean I need to suck in some sort of memory guard thing?
I would recommend trying a hardware watchpoint in GDB. Run "help watch" in GDB or read the online docs. Run mpd under GDB (or attach GDB to a running mpd), set a watchpoint on the syslog_mutex.m_lock.m_spare field, then "continue" and let it run until it hits the watchpoint. When it stops, the current thread, stack, and instruction should be the ones that modified m_spare. By the way, GDB 7.x from pkg or ports generally works much better than the GDB 6 in base.
Do you run the same service(s) on another version of FreeBSD, like 10.x? Did you start seeing this problem when you updated from 10.x to 11.x?
At some point in the past, I didn't have this problem, but that was likely with a different version of mpd as well as of FreeBSD. When this problem started, I was running 10.1p<something> on one machine and 10.3p<something> on the other. I upgraded to 11.0p3 and there was no change in the bug. Won't your recipe stop mpd each time the lock is grabbed? Also, I don't have a lab environment; these are live users. This is why I asked about some form of automated memory snooper.
A watchpoint watches a 32-bit word of memory, so the process will only stop when something writes to m_ceilings, not on every lock/unlock. I forget the syntax, but you could tell GDB to run a script when the watchpoint is triggered. The script could drop a core file and then detach, letting the process continue (or quit GDB, killing the process).
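A GDB command file along those lines might look like this (a sketch: the symbol name syslog_mutex and the field path are assumptions based on the output above, and the static symbol may need to be replaced by the raw address from "p &m->m_lock.m_ceilings"; gcore requires GDB 7.x):

```gdb
# Attach with: gdb -x watch-mutex.gdb -p $(pgrep mpd5)
set pagination off
watch syslog_mutex.m_lock.m_ceilings[0]
commands
  silent
  # Drop a core of the process that just stomped the word, then keep going.
  gcore /var/tmp/mpd5-stomp.core
  continue
end
continue
```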
(In reply to Eric van Gyzen from comment #30) Urm... are you saying that m_ceilings is not touched by normal lock usage? I'm trying to decide if it's worth setting up a raft of bhyve guests and whatnot to test this. One thing: if it's memory corruption, anything that changes the run of the system may make it a Heisenbug. Another thought I had was to look at the core file for which structures were allocated _next_ to the lock in question. Since we're not seeing widespread misbehavior, it stands to reason that the memory corruption is possibly affecting only one nearby structure. Just a thought.
OK. Since frobbing with my real clients is frowned upon, I'm setting up a bhyve-based test platform. Of course, I already know how to set up a PPPoE client, and I already know how to set up my LNS machine; I'm wondering if anyone here has a LAC setup for a machine that concentrates PPPoE into L2TP... ?
Our vsyslog() function (and syslog(), which is a wrapper around vsyslog()) maintains a process-global mutex, "syslog_mutex". It is locked exclusively around calls to the send() or _writev() kernel functions, which may block. If one of mpd's authorization/accounting threads calls syslog() or vsyslog(), blocks there, and is pthread_cancel()'d, then "syslog_mutex" is left in an inconsistent state and any other thread calling syslog() may hang forever. To verify whether this is the real reason, we have two options: either temporarily disable thread cancellation around each call to [v]syslog(), or (preferably) fix this part of our libc to make it more robust: it should clean up its mutex itself if the owning thread is canceled. The first way needs changes to mpd itself only. The complete solution requires changing, rebuilding, and reinstalling libc. I'm going to attach preliminary versions of both patches, for mpd and for libc, for testing. Please apply only one at a time. The patches are compile-tested only, so beware.
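The failure mode can be reproduced in isolation with a minimal standalone sketch (not mpd code; sleep() stands in for the blocking send()/_writev() cancellation point inside vsyslog()):

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Grab the lock, then block forever at a cancellation point. */
static void *
holder(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);	/* like syslog_mutex in vsyslog() */
	for (;;)
		sleep(1);		/* like the blocking send()/_writev() */
	return (NULL);
}

/*
 * Cancel the holder while it sleeps with the lock held, then try the
 * lock from this thread.  EBUSY means the mutex was orphaned -- the
 * state every later syslog() caller then blocks on forever.
 */
int
leaked_lock_after_cancel(void)
{
	pthread_t t;

	pthread_create(&t, NULL, holder, NULL);
	sleep(1);		/* let the holder take the lock and block */
	pthread_cancel(t);	/* what mpd does to auth/accounting threads */
	pthread_join(t, NULL);
	return (pthread_mutex_trylock(&lock));
}
```

Wrapping the blocking call in pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, ...) / restore avoids the orphaned lock, which appears to be the approach of the mpd-side patch.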
Created attachment 183410 [details] a patch for libc/syslog by dchagin

Apply:

cd /usr/src
patch < /path/to/syslog.patch

Rebuild and reinstall libc.
Created attachment 183411 [details] my patch for mpd

Put this in /usr/ports/net/mpd5/files/patch-deadlock, then rebuild and reinstall the port.
Created attachment 183436 [details] preliminary patch for mpd

Updated patch for mpd: fix bugs (&oldstate -> oldstate) and deal with console locking too.
This PR is a duplicate of 186114 that is actively progressing now. Closing this one to move the case to single place. By the way, your debugging efforts and traces were very helpful, thanks! *** This bug has been marked as a duplicate of bug 186114 ***