Bug 147420

Summary: [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt inode)
Product: Base System Reporter: pcc <pcc>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed Overcome By Events    
Severity: Affects Only Me CC: chris, rew
Priority: Normal    
Version: 8.1-PRERELEASE   
Hardware: Any   
OS: Any   

Description pcc 2010-06-03 09:00:02 UTC
I don't know whether this is a normal operational situation, and I don't see any such report in GNATS (at least to my limited understanding), but I ran into the following kernel panic some two or three hours after attempting to load the NFS modules on an 'almost' GENERIC kernel. I copied this manually, so I hope it is accurate:

  interface nfslock.1 already present in the KLD 'kernel'!
  interface nfslock.1 already present in the KLD 'kernel'!
  panic: nfs_dirbad: /usr: bad dir ino 46895494 at offset 114368: mangled entry
  cpuid=0
  KDB: enter: panic
  [thread pid 6410 tid 100340]
  Stopped at      kdb_enter+0x3a movl    $0,kdb_why

If I can believe the clock, the failure occurred about two and a half hours after I fiddled with the modules and tried to make tinderbox NFS-mount whatever it wants to mount in a jail (that is, a real jail, not a tinderbox 'jail'). I also tried nullfs mounts in the same jail before that.

The box did respond to my keyboard input, but oddly the clock seems to have stopped running.

The KERNCONF is almost GENERIC (see the differences below) and is based on # $FreeBSD: src/sys/i386/conf/GENERIC,v 1.519.2.10 2010/04/29 22:44:04 thompsa Exp $.

It is only 'almost' GENERIC because there are some differences, most notably the VIMAGE option, which I currently don't (actively) use.

# ( cd /usr/src/sys/i386/conf/ && diff GENERIC NETSERV )
21,22c21,22
< cpu           I486_CPU
< cpu           I586_CPU
---
> #cpu          I486_CPU
> #cpu          I586_CPU
24c24,25
< ident         GENERIC
---
> ident         VNETSERV
> #ident                NETSERV
38a40
> options               ROUTETABLES=16          # max 16. 1 is back compatible
40a43
> # SCTP is not yet compatible with VIMAGE.
78c81,90
< options       INCLUDE_CONFIG_FILE     # Include this file in kernel
---
>
> # Debugging for use in -current
> options       KDB                     # Enable kernel debugger support.
> options       DDB                     # Support DDB.
> options       GDB                     # Support remote GDB.
> #options      INVARIANTS              # Enable calls of extra sanity checking
> #options      INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
> options       WITNESS                 # Enable checks to detect deadlocks and cycles
> options       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
> #options      INCLUDE_CONFIG_FILE     # Include this file in kernel
297c309
< options       USB_DEBUG       # enable debug msgs
---
> #options      USB_DEBUG       # enable debug msgs
335c347
< #device               sbp             # SCSI over FireWire (Requires scbus and da)
---
> device                sbp             # SCSI over FireWire (Requires scbus and da)
339a352,360
>
> options               SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
> options               SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN)
> options               SC_KERNEL_CONS_ATTR=(FG_RED|BG_BLACK)
> options               SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
>
> # Network stack virtualisation.
> options               VIMAGE
> options               VNET_DEBUG

Fix: 

Not known.
How-To-Repeat: I didn't try. The fsck's still running.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2010-06-03 15:54:52 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 pcc 2010-06-04 08:45:00 UTC
Re.

The situation has re-occurred at the exact same spot.

  panic: ufs_dirbad: /usr: bad dir ino 46895494 at offset 114368: mangled entry
  cpuid=1
  KDB: enter: panic
  [thread pid 42063 tid 100153]
  Stopped at      kdb_enter+0x3a movl    $0,kdb_why
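
For readers triaging similar reports, the fields of the panic line can be pulled apart mechanically. The following is a minimal Python sketch, not anything from the FreeBSD tree: the regular expression is modelled only on the message quoted above, parse_dirbad is a hypothetical helper name, and the 512-byte directory block size (DIRBLKSIZ) is an assumption about the UFS layout that should be checked against the headers of the running system.

```python
import re

# Pattern modelled on the panic line quoted in this report; the format is
# inferred from that one message, not taken from the kernel source.
PANIC_RE = re.compile(
    r"panic: (?P<func>\w+): (?P<mount>[^:]+): bad dir ino (?P<ino>\d+) "
    r"at offset (?P<offset>\d+): (?P<reason>.+)"
)

# Assumption: UFS directory blocks are 512 bytes (DIRBLKSIZ).
DIRBLKSIZ = 512


def parse_dirbad(line):
    """Return the fields of a *_dirbad panic line as a dict, or None."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    offset = int(m.group("offset"))
    return {
        "func": m.group("func"),
        "mount": m.group("mount"),
        "ino": int(m.group("ino")),
        "offset": offset,
        "reason": m.group("reason").strip(),
        # Which directory block holds the bad entry, and where inside it.
        "dir_block": offset // DIRBLKSIZ,
        "block_offset": offset % DIRBLKSIZ,
    }


info = parse_dirbad(
    "panic: ufs_dirbad: /usr: bad dir ino 46895494 at offset 114368: mangled entry"
)
print(info)
```

Under the 512-byte assumption, the message above points at directory block 223, 192 bytes into that block, of the directory with inode 46895494 on /usr.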

I have now googled with [nfs|ufs]_dirbad as a better keyword and hence think that this issue may actually be a duplicate of kern/135690.

I have a photo of the screen showing a bt and can supply it on request; there are no textual dumps because no dump was taken, sorry. I can leave the box on for another hour or two if anyone would like me to run other commands in the meantime.

What remains is the question of how to find and get rid of the entry now, and whether I should turn INVARIANTS* back on in the kernel in order to avoid further mangling.

The box has previously survived months of building (and running) RELENG_8 kernels and worlds happily over the last year with INVARIANTS on.

Thanks,

Peter.
Comment 3 pcc 2010-06-05 12:11:18 UTC
Hi,

After two more failures, I have now found the offending inode with find /usr -inum 46895494 and removed that directory with rm -rf <dir>.
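
For anyone wanting to script the same lookup, here is a rough Python sketch of a search by inode number. It is an illustration, not what was run on this box: find_by_inode is a hypothetical helper, and it deliberately stays on the file system of the starting directory, since inode numbers are only unique within one file system. Note that on a file system that is still damaged, any traversal may hit the mangled entry again; as reported further down in this comment, an ls of the affected directory was enough to trigger the panic.

```python
import os


def find_by_inode(root, inum):
    """Yield paths under root whose inode number is inum.

    Only entries on the same device as root are reported, because inode
    numbers are unique per file system only.
    """
    root_stat = os.lstat(root)
    if root_stat.st_ino == inum:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        same_fs_dirs = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if st.st_dev != root_stat.st_dev:
                continue  # belongs to another file system: do not descend
            same_fs_dirs.append(name)
            if st.st_ino == inum:
                yield path
        # Pruning in place tells os.walk which subdirectories to visit.
        dirnames[:] = same_fs_dirs
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if st.st_dev == root_stat.st_dev and st.st_ino == inum:
                yield path
```

With the values from this report, the call would be find_by_inode("/usr", 46895494), iterated like any generator.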

I re-enabled INVARIANTS, and I hope that the file system is now at least clean and that the server is stable.

One observation: I found the inode in a directory that is usually mounted into an (ez-)jail via nullfs. The jail is used to build packages and is hence fairly busy at the file system level. I don't know whether this is a useful pointer for the issue.

If the box is stable now, I will not pursue this matter further, but I am more than willing to answer questions.

Thanks to all,

Peter.

---

P.S.: For completeness' sake, here are the variations observed, although I don't expect them to add much information.

>   panic: ufs_dirbad: /usr: bad dir ino 46895494 at offset 114368: mangled entry
>   cpuid=1
>   KDB: enter: panic
>   [thread pid 42063 tid 100153]

This is, as one may expect, the only line that changes:

   [thread pid 66833 tid 100168]

or, upon ls of the directory which used to have the inode no. 46895494:

   [thread pid 3244 tid 100154]

>   Stopped at      kdb_enter+0x3a movl    $0,kdb_why
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:35 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped