Recent updates on HEAD have been causing panics for me. I had 4 before I reverted to an older boot environment on this ZFS only/ZFS-on-root machine.

GOOD (2 weeks ago):
FreeBSD xts-bsd.pa-us.unovitch.com 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r290834M: Sun Nov 15 04:18:50 UTC 2015 jason@xts-bsd.pa-us.unovitch.com:/usr/obj/usr/src/head/sys/GENERIC amd64

BAD:
FreeBSD xts-bsd.pa-us.unovitch.com 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r291461M: Mon Nov 30 00:13:05 UTC 2015 jason@xts-bsd.pa-us.unovitch.com:/usr/obj/usr/src/head/sys/VIMAGE amd64

The only relevant change for the "M" is some coretemp changes. Nothing related to vfs.

M share/man/man4/coretemp.4
M sys/dev/coretemp/coretemp.c

Additional details forthcoming.
Captures from the web IPMI KVM are located at https://people.FreeBSD.org/~junovitch/working/PR204949/
CC the three committers to vfs_subr.c since 15 Nov 2015. Thanks in advance for any assistance!
Is that right after boot or later? If later, can you configure a dump device and get a dump?
(In reply to Gleb Smirnoff from comment #3) It's sometime after boot. Generally with some ports builds running but I do not have a 'do A; see effect B' cause and effect for this yet. I'll get the dump information shortly.
I replicated it after doing a 'boot kernel.GENERIC' just to get a sane baseline on GENERIC. The sequence of events was: poudriere bulk, let a couple of ports build, Ctrl-C, poudriere bulk. The panic occurred while starting the jail. Again, it doesn't seem to be a specific event, just heavy filesystem activity.

[00:00:00] ====>> Starting jail 110i386-default
<panic>

The core.txt, info, panicmail, and vmcore are located in:
https://people.FreeBSD.org/~junovitch/working/PR204949/r291461M-GENERIC-crashdump/

Let me know if anything else will assist.
(In reply to Jason Unovitch from comment #5)
What does 'M' mean in your kernel revision? What changes did you apply?

From the core at the reference above, in kgdb do 'frame 12\nprint *mp' and post the result.

There were some changes to the vnode handling in the last two weeks, but your problem is about the state of the struct mount, and not the vnode.
(In reply to Konstantin Belousov from comment #6)
> What does 'M' mean in your kernel revision ? What changes did you apply ?

I have applied the patch in bug 158160 to let coretemp recognize my Atom CPU. This only impacts my home router but I have it applied to my source for this NAS machine as well even though it is a no-op there. The relevant files changed:

M share/man/man4/coretemp.4
M sys/dev/coretemp/coretemp.c

I realize the core dumps are rather useless now without the kernel.debug. I'm learning as I go here. Let me know what else you may need.

> From the core at the reference above, in kgdb do 'frame 12\nprint *mp' and post the result.

https://people.FreeBSD.org/~junovitch/working/PR204949/r291461M-GENERIC-crashdump/frame12-printmp.txt
Since the vnode change, not all fields get reinitialized, and chances are that some of the "surviving" fields should not survive. In particular, the union in the vnode used to store v_mountedhere is suspicious.

Unfortunately I'm unable to reproduce the problem. Can you try running with the following?

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index ddab9f0..3c39c05 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -2784,6 +2784,7 @@ _vdrop(struct vnode *vp, bool locked)
 #endif
 	vp->v_iflag = 0;
 	vp->v_vflag = 0;
+	bzero(&vp->v_un, sizeof(vp->v_un));
 	bo->bo_flag = 0;
 	uma_zfree(vnode_zone, vp);
 }

There are a few more suspicious fields, but they should not matter for your issue ('clustering stuff').
(In reply to Mateusz Guzik from comment #8)
Mateusz, I added this patch and built a new kernel on the existing r291461M build, and I can't seem to replicate the panic. I did the heavy poudriere start/stop/Ctrl-C where it was happening before and haven't seen the issue just yet. It's still early; I will continue to monitor and provide an update tomorrow.

FreeBSD xts-bsd.pa-us.unovitch.com 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r291461M: Wed Dec 2 20:39:53 UTC 2015 jason@xts-bsd.pa-us.unovitch.com:/usr/obj/usr/src/head/sys/VIMAGE amd64

Thanks!
I can reproduce the problem in a few minutes by lowering kern.maxvnodes to a few thousand and running poudriere. The crash disappears with the patch applied. Apart from completeness (what about said clustering fields?) I would say that's the right patch.
A commit references this bug:

Author: mckusick
Date: Thu Dec 3 02:04:22 UTC 2015
New revision: 291671
URL: https://svnweb.freebsd.org/changeset/base/291671

Log:
  We need to zero out the union of pointers in a freed vnode structure.

  PR: 204949
  Fix from: Mateusz Guzik
  Tested by: Jason Unovitch

Changes:
  head/sys/kern/vfs_subr.c
The proposed fix by Mateusz Guzik is correct. Prior to my changes to vfs_subr.c the v_mountedhere union element was always zero'ed. Following my changes it was not. In the lookup function, it checks to see if v_mountedhere != NULL, and if it is not NULL assumes that it points to a struct mount and passes it to vfs_busy. If it contains a pointer to some other structure from a previous use of the vnode, the call to vfs_busy chokes when it attempts to dereference a mount mutex. I have committed the fix as noted in the above comment.
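To illustrate the failure mode described above, here is a minimal userland sketch in C. It is not the kernel code: every name in it (toy_vnode, toy_alloc, toy_free, toy_lookup_sees_mount, and the one-slot "zone") is a hypothetical stand-in, and it only models the idea that a recycled object keeps whatever its union last held unless the free path clears it. It also assumes, as is true on common platforms, that an all-zero-bytes pointer compares equal to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real FreeBSD structures. */
struct toy_mount  { int busy_count; };
struct toy_socket { int fd; };

struct toy_vnode {
	int flags;
	union {
		struct toy_mount  *mountedhere;	/* meaningful for directories */
		struct toy_socket *socket;	/* meaningful for sockets */
	} un;
};

/* A one-slot cache standing in for an allocator that recycles objects
 * without re-initializing them on every allocation. */
static struct toy_vnode slot;
static bool slot_in_use;

struct toy_vnode *
toy_alloc(void)
{
	assert(!slot_in_use);
	slot_in_use = true;
	return (&slot);		/* contents are whatever the last user left */
}

/* Free path. With zero_union == false it models the behaviour before the
 * fix: some fields are reset, the union is not. */
void
toy_free(struct toy_vnode *vp, bool zero_union)
{
	vp->flags = 0;
	if (zero_union)
		memset(&vp->un, 0, sizeof(vp->un));
	slot_in_use = false;
}

/* Models the check in lookup: any non-NULL value is taken to be a mount. */
bool
toy_lookup_sees_mount(const struct toy_vnode *vp)
{
	return (vp->un.mountedhere != NULL);
}

/* Use the object as a "socket", free it, recycle it, and ask whether the
 * recycled object now looks like a mount point. */
bool
recycled_vnode_looks_mounted(bool zero_union)
{
	static struct toy_socket so = { 3 };
	struct toy_vnode *vp;
	bool looks_mounted;

	vp = toy_alloc();
	vp->un.socket = &so;		/* previous use of the object */
	toy_free(vp, zero_union);

	vp = toy_alloc();		/* recycled for a new use */
	looks_mounted = toy_lookup_sees_mount(vp);
	toy_free(vp, true);		/* leave the slot clean for the next call */
	return (looks_mounted);
}
```

In this sketch, recycled_vnode_looks_mounted(false) reports a mount that does not exist, because the stale socket pointer passes the non-NULL check; in the kernel the equivalent pointer would then be handed on and dereferenced as a struct mount. With the union cleared on free, the same sequence reports no mount.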
Assign to committer that resolved.

@Kirk Does this need to be MFC'd? If so, please set mfc-stable{9,10} flags to '+' when committed, or to '-' if not necessary, with a comment.

@Mateusz Can you attach your diff in comment 8 as an attachment, please?
I reported on freebsd-current "Since upgrading to r291494 from r290716 my system reliably panics when running poudriere." and was referred to this patch by Mateusz Guzik. My system would easily panic in 10-20 minutes. I applied the one-line patch in "comment 8" and had kernel uptime of over 24 hours; however, a new panic has occurred. This new panic may be unrelated - I do not know.

panic: deadlkres: possible deadlock detected for 0xfffff80162828000, blocked for 1801507 ticks

https://charon.gopai.com/crash/info.2
https://charon.gppai.com/crash/core.txt.2
https://charon.gopai.com/crash/vmcore.2

Let me know what additional information I can supply or if a new PR should be opened.
(In reply to Kubilay Kocak from comment #13) The vfs_subr.c changes have not been MFC'ed and are not presently planned to be MFC'ed. If they are MFC'ed, this fix will be included.
(In reply to mikej from comment #14) Your latest panic does not appear to be related to this bug. I suggest that you open a new PR for it.
A commit references this bug:

Author: mckusick
Date: Fri Dec 4 03:54:18 UTC 2015
New revision: 291743
URL: https://svnweb.freebsd.org/changeset/base/291743

Log:
  We need to zero out the clustering variables in a freed vnode structure.
  For completeness add a VNASSERT that there are no threads waiting on a range lock (this was previously checked on every vnode free).

  Reported by: Rick Macklem
  Fix from: Mateusz Guzik
  PR: 204949

Changes:
  head/sys/kern/vfs_subr.c
This bug has been fixed, as has a related problem that it raised. If this change is MFC'ed, these fixes will be included.