Bug 203521

Summary: MongoDB or vim in a jail hangs during mi_switch
Product: Base System
Reporter: Randy Westlund <rwestlun>
Component: threads
Assignee: freebsd-threads (Nobody) <threads>
Status: Closed Not A Bug
Severity: Affects Some People
CC: jail, pi, sirl33tname
Priority: ---
Version: 10.2-RELEASE
Hardware: amd64
OS: Any

Description Randy Westlund 2015-10-03 03:13:28 UTC
I'm running a webserver on 10.2-RELEASE with MongoDB-2.6.7 in a ZFS-backed jail.  After rebooting following an unrelated crash, my jail cannot start fully; ezjail-admin just blocks.

I used jexec to get into the jail and found this:

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
20928 mongodb       1  52    0   100M 40412K wait    0   0:00   0.00% mongod
26645 root          1  20    0 23588K  3408K pause   1   0:00   0.00% tcsh
20391 root          1  20    0 17084K  2480K wait    1   0:00   0.00% sh
20898 mongodb       1  52    0 23592K  2492K pause   0   0:00   0.00% csh
20821 root          1  20    0 14512K  1748K select  1   0:00   0.00% syslogd
20870 root          1  52    0 47724K  2252K wait    1   0:00   0.00% su
21003 mongodb       1  52    0   104M 41244K umtxn   1   0:00   0.00% mongod
20862 root          1  20    0 17084K  2580K wait    0   0:00   0.00% sh
73529 root          1  20    0 21936K  2292K CPU1    1   0:00   0.00% top
21000 mongodb       1  52    0   100M 40408K wait    0   0:00   0.00% mongod


MongoDB is stuck in state 'umtxn'.

And the procstat output:

root@recipes:/ # procstat -kk 21003
  PID    TID COMM             TDNAME           KSTACK
21003 100205 mongod           -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_lock_umutex+0x1f74 __umtx_op_wait_umutex+0x78 amd64_syscall+0x357 Xfast_syscall+0xfb

root@recipes:/ # procstat -r 21003
  PID COMM             RESOURCE                          VALUE
21003 mongod           user time                    00:00:00.000000
21003 mongod           system time                  00:00:00.007805
21003 mongod           maximum RSS                            40940 KB
21003 mongod           integral shared memory                 14588 KB
21003 mongod           integral unshared data                   360 KB
21003 mongod           integral unshared stack                  128 KB
21003 mongod           page reclaims                            324
21003 mongod           page faults                                0
21003 mongod           swaps                                      0
21003 mongod           block reads                                1
21003 mongod           block writes                               1
21003 mongod           messages sent                              0
21003 mongod           messages received                          0
21003 mongod           signals received                           0
21003 mongod           voluntary context switches                 2
21003 mongod           involuntary context switches               0

root@recipes:/ # procstat -t 21003
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
21003 100205 mongod           -                  1  152 sleep   umtxn


root@recipes:/ # uname -a
FreeBSD recipes 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64


It looks like a deadlock may be preventing the context switch.  This happens every time I restart the jail.
Comment 1 Randy Westlund 2015-10-04 16:46:34 UTC
I'm seeing the same thing with vim in a separate jail.  This jail launches okay, but vim won't start.

root@jakory:~ # procstat -kk 37514
  PID    TID COMM             TDNAME           KSTACK
37514 101411 vim              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_lock_umutex+0x1f74 __umtx_op_wait_umutex+0x78 amd64_syscall+0x357 Xfast_syscall+0xfb

root@jakory:~ # procstat -r 37514
  PID COMM             RESOURCE                          VALUE
37514 vim              user time                    00:00:00.032873
37514 vim              system time                  00:00:00.064690
37514 vim              maximum RSS                            12132 KB
37514 vim              integral shared memory                 23760 KB
37514 vim              integral unshared data                  1260 KB
37514 vim              integral unshared stack                 1152 KB
37514 vim              page reclaims                            437
37514 vim              page faults                              285
37514 vim              swaps                                      0
37514 vim              block reads                              184
37514 vim              block writes                               0
37514 vim              messages sent                              0
37514 vim              messages received                          0
37514 vim              signals received                           0
37514 vim              voluntary context switches               244
37514 vim              involuntary context switches              15


root@jakory:~ # procstat -t 37514
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
37514 101411 vim              -                  0  126 sleep   umtxn



This is clearly not specific to MongoDB.
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2015-10-04 17:19:54 UTC
Is this jail-related as well?
Comment 3 Randy Westlund 2015-10-04 17:29:11 UTC
Yes.  So far, I've only seen this in jails.
Comment 4 Sir l33tname 2015-10-18 12:54:14 UTC
I saw the same thing today on a system with git, mysql, and mongodb in different jails. Is there any workaround?
Comment 5 Randy Westlund 2015-10-18 16:45:28 UTC
Not that I've found.  This problem is over my head.

Even after restarting the server, the same programs in the same jails hang during the context switch.

In the meantime, I've moved my services to a VPS (with no jails) because I can't get my jails to start.  I'm holding off on updating any system with working jails for fear of breaking my other servers.
Comment 6 Sir l33tname 2015-10-18 18:17:08 UTC
(In reply to Sir l33tname from comment #4)
Never mind, I had simply forgotten to update the libraries in the base jail to 10.2.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203765
Comment 7 Randy Westlund 2015-11-02 22:34:10 UTC
I've recreated the base jail, and the problem seems to be resolved.  My guess is that ezjail didn't handle an upgrade properly.  Closing.