Bug 201912 - panic in smbfs during mount
Summary: panic in smbfs during mount
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Rick Macklem
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-27 07:57 UTC by martin
Modified: 2015-12-03 12:36 UTC (History)

See Also:
rmacklem: mfc-stable10+
rmacklem: mfc-stable9+


Attachments
Output of "bt full" from kgdb (9.40 KB, text/plain)
2015-08-05 02:11 UTC, Mikhail T.
fix a race between smb_iod_destroy() and the smb_iod thread that destroys mutexes and frees the iod structure prematurely (688 bytes, patch)
2015-10-16 12:54 UTC, Rick Macklem
fixes smbfs so that it doesn't do a disconnect when vc_iod == NULL (1.33 KB, patch)
2015-10-17 01:40 UTC, Rick Macklem

Description martin 2015-07-27 07:57:45 UTC
System coredumped while mounting smbfs shares from remote server.

Here is the backtrace:

#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80927da2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff80928164 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80d258df in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:865
#4  0xffffffff80d25bf8 in trap_pfault (frame=0xfffffe02293fe9b0, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:676
#5  0xffffffff80d2525a in trap (frame=0xfffffe02293fe9b0) at /usr/src/sys/amd64/amd64/trap.c:440
#6  0xffffffff80d0b142 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#7  0xffffffff80974c3c in turnstile_broadcast (ts=0x0, queue=0) at /usr/src/sys/kern/subr_turnstile.c:838
#8  0xffffffff80914c20 in __mtx_unlock_sleep (c=0xfffff8007961b138, opts=<value optimized out>, file=0xffffffff81c32901 "ec", line=1) at /usr/src/sys/kern/kern_mutex.c:761
#9  0xffffffff80914ba9 in __mtx_unlock_flags (c=<value optimized out>, opts=<value optimized out>, file=0xffffffff81c32901 "ec", line=1) at /usr/src/sys/kern/kern_mutex.c:254
#10 0xffffffff81c28054 in smb_iod_sendall () at /usr/src/sys/modules/smbfs/../../netsmb/smb_iod.c:93
#11 0xffffffff81c28760 in smb_iod_thread (arg=0xfffff8007961b100) at /usr/src/sys/modules/smbfs/../../netsmb/smb_iod.c:637
#12 0xffffffff808f8b6a in fork_exit (callout=0xffffffff81c28670 <smb_iod_thread>, arg=0xfffff8007961b100, frame=0xfffffe02293fec00) at /usr/src/sys/kern/kern_fork.c:996
#13 0xffffffff80d0b67e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:606
#14 0x0000000000000000 in ?? ()
Comment 1 martin 2015-07-27 07:59:40 UTC
Sorry, forgot to append kernel version.

FreeBSD 10.1-RELEASE-p14 amd64
Comment 2 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-07-27 09:57:10 UTC
Your trace looks very similar to one that was already fixed in https://svnweb.freebsd.org/base?view=revision&revision=264600

Are you sure your smbfs module is in sync with the kernel?
Can you show the output of this command: `ident /boot/kernel/smbfs.ko`?
Also, do you use the GENERIC kernel?
Comment 3 martin 2015-07-29 07:44:54 UTC
(In reply to Andrey V. Elsukov from comment #2)

Yes, I am using the GENERIC kernel. ident says there are "no id keywords". Are you sure that IDs are in GENERIC kernels? Maybe it's because I check out from the Git repository on GitHub (https://github.com/freebsd/freebsd).

uname -a:
FreeBSD sugioarto.phiscience.local 10.1-RELEASE-p14 FreeBSD 10.1-RELEASE-p14 #0 r284985+86de4e2(releng/10.1): Fri Jul 10 11:54:22 CEST 2015     root@sugioarto.phiscience.local:/usr/obj/usr/src/sys/GENERIC  amd64

Usually I remove everything under /usr/obj and build the entire world. The timestamps in /boot/kernel are also consistent.

This is the first time I've seen this bug, and I reported it immediately. I'll rebuild world+kernel once again with the latest patches, if you say so.

OK, let's close this PR for now. I'll reopen it when I see this crash again. I use smbfs a lot and don't expect to change my configuration of it for a long time.
Comment 4 Mikhail T. 2015-08-05 02:11:53 UTC
Created attachment 159559
Output of "bt full" from kgdb

I think I'm seeing the same problem on 9.3 stable from June 23. See the attachment for "bt full". Here is the ident output:

/boot/kernel/smbfs.ko:
     $FreeBSD: stable/9/sys/kern/md4c.c 139804 2005-01-06 23:35:40Z imp $
     $FreeBSD: stable/9/sys/netsmb/smb_conn.c 249132 2013-04-05 08:22:11Z mav $
     $FreeBSD: stable/9/sys/netsmb/smb_dev.c 206361 2010-04-07 16:50:38Z joel $
     $FreeBSD: stable/9/sys/netsmb/smb_trantcp.c 264425 2014-04-13 22:00:50Z dteske $
     $FreeBSD: stable/9/sys/netsmb/smb_smb.c 230196 2012-01-16 05:15:13Z kevlo $
     $FreeBSD: stable/9/sys/netsmb/smb_subr.c 249132 2013-04-05 08:22:11Z mav $
     $FreeBSD: stable/9/sys/netsmb/smb_rq.c 249132 2013-04-05 08:22:11Z mav $
     $FreeBSD: stable/9/sys/netsmb/smb_usr.c 206361 2010-04-07 16:50:38Z joel $
     $FreeBSD: stable/9/sys/netsmb/smb_crypt.c 161523 2006-08-22 03:05:51Z marcel $
     $FreeBSD: stable/9/sys/netsmb/smb_iod.c 265246 2014-05-02 21:54:36Z ae $
     $FreeBSD: stable/8/sys/crypto/des/des_ecb.c 130443 2004-06-14 00:38:54Z obrien $
     $FreeBSD: stable/8/sys/crypto/des/des_setkey.c 130443 2004-06-14 00:38:54Z obrien $
Comment 5 Rick Macklem freebsd_committer freebsd_triage 2015-10-16 12:54:24 UTC
Created attachment 162117
fix a race between smb_iod_destroy() and the smb_iod thread that destroys mutexes and frees the iod structure prematurely

I think this patch might fix the problem that caused
your crash. Your crash does look somewhat different than
PR#172942, but it does call smb_iod_destroy() via
smb_vc_gone() and that is where the race was.

Unfortunately your crash does suggest that the mount
was trying to do another smb_iod_destroy() when it
had already happened and this might suggest an additional
race that the patch doesn't address.

Since it is somewhat different, I haven't marked it as
a duplicate of PR#172942. I will do that if Martin
reports back that the patch seems to have stopped
the crashes from occurring. (I have no idea how reproducible
these crashes are.)
Comment 6 Rick Macklem freebsd_committer freebsd_triage 2015-10-17 01:40:36 UTC
Created attachment 162138
fixes smbfs so that it doesn't do a disconnect when vc_iod == NULL

This patch (which includes the 162117 one) adds a check for
vc_iod != NULL to the code in smb_vc_disconnect(), since this
function is called when smb_vc_create() fails and vc_iod == NULL
for that case. It also fixes smb_iod_create() so it returns with
vc_iod == NULL when it fails, since it has free'd the iod and it
also adds code to destroy the mutexes for this case.

I believe this patch will fix the crash reported here.
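
For readers following along, here is a minimal userspace sketch of the two changes described above. It is not the attached patch and not kernel code: struct iod, struct vc, iod_create(), vc_disconnect(), iod_destroy() and the mutexes_live counter are hypothetical, simplified stand-ins, and only the field name vc_iod comes from this report. It assumes the failure happens after the iod has been allocated and attached to the connection.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct iod {
    int mutex_initialized;      /* stands in for the iod's mutexes */
};

struct vc {
    struct iod *vc_iod;         /* NULL whenever no usable iod exists */
};

/* Counts "mutexes" that were initialized but not yet destroyed. */
static int mutexes_live = 0;

/*
 * Sketch of iod creation. When starting the worker fails, destroy the
 * mutexes and clear vc_iod before freeing, so no dangling pointer is
 * left behind for later cleanup code to trip over.
 */
static int iod_create(struct vc *vcp, int thread_start_fails)
{
    struct iod *iod = calloc(1, sizeof(*iod));

    if (iod == NULL)
        return -1;
    iod->mutex_initialized = 1;
    mutexes_live++;
    vcp->vc_iod = iod;

    if (thread_start_fails) {
        iod->mutex_initialized = 0;     /* "destroy" the mutexes */
        mutexes_live--;
        vcp->vc_iod = NULL;             /* clear before free */
        free(iod);
        return -1;
    }
    return 0;
}

/*
 * Sketch of the disconnect path. Returns 1 if a disconnect request
 * would have been posted to the iod, 0 if it was skipped because the
 * connection never got an iod.
 */
static int vc_disconnect(struct vc *vcp)
{
    if (vcp->vc_iod == NULL)
        return 0;                       /* nothing to talk to */
    /* A real implementation would post the request to the iod here. */
    return 1;
}

/* Normal teardown for a connection whose iod was created successfully. */
static void iod_destroy(struct vc *vcp)
{
    struct iod *iod = vcp->vc_iod;

    if (iod == NULL)
        return;
    iod->mutex_initialized = 0;
    mutexes_live--;
    vcp->vc_iod = NULL;
    free(iod);
}
```

In this sketch, calling iod_create(&v, 1) on a zeroed struct vc followed by vc_disconnect(&v) takes the guarded path and touches no freed memory, which is the failed-mount sequence the patch is meant to make safe.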
Comment 7 martin 2015-10-19 07:43:36 UTC
Hi Rick,

I just tried to cause this panic, but it appears to be very hard to reproduce. I think I reported it because it occurred, not because I can reproduce it (I would have included exact steps in that case).

If you think this addresses the problem (or even improves the situation), I will take your word for it. Please close this bug, since keeping it open does not make sense until I can produce a new coredump as proof that your patch is not correct.

Thanks,
Martin
Comment 8 commit-hook freebsd_committer freebsd_triage 2015-11-18 23:04:10 UTC
A commit references this bug:

Author: rmacklem
Date: Wed Nov 18 23:04:01 UTC 2015
New revision: 291035
URL: https://svnweb.freebsd.org/changeset/base/291035

Log:
  The problem report was for a crash that happened when smbfs was
  trying to do a mount. Given the backtrace,
  it appears that the crash occurred when smb_vc_create() failed and then
  called smb_vc_put() with vcp->vc_iod == NULL. smb_vc_put() subsequently
  called smb_vc_disconnect() with vcp->vc_iod == NULL, causing the crash.
  This patch adds a check for vcp->vc_iod != NULL in smb_vc_disconnect() to
  avoid the crash. It also fixes the case in smb_vc_create() where
  kproc_create() fails so that it destroys the mutexes and sets
  vcp->vc_iod == NULL before free()'ing the iod structure.
  The person who reported the PR tested the patch, but was not able
  to reproduce the crash with or without the patch.

  PR:		201912
  Reviewed by:	jhb
  MFC after:	2 weeks

Changes:
  head/sys/netsmb/smb_conn.c
  head/sys/netsmb/smb_iod.c
Comment 9 Rick Macklem freebsd_committer freebsd_triage 2015-12-02 23:11:10 UTC
The patch that I believe fixes this crash is now MFC'd as
r291655 for stable/10 and r291656 for stable/9.