Bug 273555

Summary: GPT table lost or filesystem became invalid for new disk hot added on an existing LSILogic/LSILogicSAS controller in VM with guest OS FreeBSD 13.2-RELEASE i386 running on ESXi
Product: Base System
Reporter: Yuhua Zou <zouy>
Component: kern
Assignee: freebsd-virtualization (Nobody) <virtualization>
Status: New ---
Severity: Affects Some People
CC: dpetrov67, jcfyecrayz, jsavanyo, tom, vmware-gos-qa, zlei, zouy
Priority: ---
Version: 13.2-RELEASE
Hardware: i386
OS: Any
Attachments:
   Description                Flags
   screenshots of panic       none
   post panic boot failure    none

Description Yuhua Zou 2023-09-04 06:44:46 UTC
This issue mainly happens on a new disk hot-added to an existing LSILogic/LSILogicSAS controller. After running file read/write tests on the hot-added disk and rebooting the guest OS, checking the new disk's partition shows that the GPT table is lost or the filesystem is invalid.

I captured a kernel panic once, but I can't reproduce it every time:

In /var/log/messages:

   Aug 28 16:21:03 FreeBSD-20230825202248 syslogd: kernel boot file is /boot/kernel/kernel
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: panic: ufs_dirbad: /mnt/598dc209-45be-11ee-ad23-005056a9fb8a: bad dir ino 2 at offset 0: mangled entry
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: cpuid = 0
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: time = 1693239639
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: KDB: stack backtrace:
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #0 0x10704bf at kdb_backtrace+0x4f
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #1 0x1028ab4 at vpanic+0xf4
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #2 0x10289b4 at panic+0x14
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #3 0x12c637a at ufs_lookup_ino+0xc7a
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #4 0x12c56f6 at ufs_lookup+0x16
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #5 0x10d6f7a at vfs_cache_lookup+0x9a
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #6 0x10e3e84 at lookup+0x3d4
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #7 0x10e31bb at namei+0x20b
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #8 0x1106990 at vn_open_cred+0x480
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #9 0x10fd268 at kern_openat+0x2f8
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #10 0x10fd47f at sys_openat+0x2f
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #11 0x142efc9 at syscall+0x179
   Aug 28 16:21:03 FreeBSD-20230825202248 kernel: #12 0xffc03479 at __stop_set_sysinit_set+0xd93df94d


Steps that may reproduce it:
1. Create VM in ESXi with VM settings: 
         guest OS Version: FreeBSD 13 (32-bit)
         vcpu : 2
         memory: 3 G
         disk controller: VMware Paravirtual 
         disk: 16 G
         other default

2. Install the guest OS from the FreeBSD 13.2-RELEASE i386 ISO and reboot after the installation finishes.

3. Edit the VM settings and add an LSILogic or LSILogicSAS controller.

4. Add a new disk (e.g. 1 G) to this new controller.
    The new disk will be recognized as /dev/da1

5. Create GPT partition table, add new partition and create filesystem on it.
    gpart create -s GPT /dev/da1
    gpart add -t freebsd-ufs /dev/da1
    newfs -EU /dev/da1p1

6. Mount the device
     mkdir /mnt/testdir
     mount /dev/da1p1 /mnt/testdir

7. Test file write/read: Create and write to file test.txt under /mnt/testdir/
     
8. Unmount the device /dev/da1p1

9.  Reboot VM

10. Mount the device /dev/da1p1

Step 10 may hit the issue: the GPT is lost on disk /dev/da1, or the filesystem on /dev/da1p1 is invalid.
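
Steps 5 to 8 and the check in step 10 could be scripted roughly as below. This is only a sketch: the DISK, MNT and RUN variables, the `run` helper, and the use of dd(1) and sha256(1) to make corruption visible are assumptions added here, not part of the original test. The commands are only printed unless RUN=1 is set, because they are destructive for the target disk.

```shell
#!/bin/sh
# Sketch of steps 5-8 (prepare) and step 10 (verify) from this report.
# DISK, MNT and RUN are hypothetical knobs introduced for illustration.
DISK="${DISK:-da1}"
MNT="${MNT:-/mnt/testdir}"

run() {
    # Print the command; execute it only when RUN=1 is set.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

prepare_and_write() {
    run gpart create -s GPT "/dev/${DISK}"
    run gpart add -t freebsd-ufs "/dev/${DISK}"
    run newfs -EU "/dev/${DISK}p1"
    run mkdir -p "${MNT}"
    run mount "/dev/${DISK}p1" "${MNT}"
    # Write some data and record its checksum before unmounting.
    run dd if=/dev/random of="${MNT}/test.txt" bs=1m count=64
    run sha256 "${MNT}/test.txt"
    run umount "${MNT}"
}

verify_after_reboot() {
    # After the reboot: is the GPT still there, is the filesystem still UFS,
    # and does the file still have the checksum recorded earlier?
    run gpart show "${DISK}"
    run fstyp "/dev/${DISK}p1"
    run mount "/dev/${DISK}p1" "${MNT}"
    run sha256 "${MNT}/test.txt"
}

case "${1:-prepare}" in
    prepare) prepare_and_write ;;
    verify)  verify_after_reboot ;;
esac
```

Run it once with the argument `prepare` before the reboot and once with `verify` afterwards, then compare the two sha256 lines by hand.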




Additionally, I tried the "newfs" command on the newly added disk partition /dev/da0p1 and found that it does not work reliably.

Try 1: No issue
   root@FreeBSD-20230808092435:~ # newfs -EU /dev/da0p1
     /dev/da0p1: 1022.0MB (2093056 sectors) block size 32768, fragment size 4096
	using 4 cylinder groups of 255.53MB, 8177 blks, 32768 inodes.
	with soft updates
     Erasing sectors [128...2093055]
     super-block backups (for fsck_ffs -b #) at:
      192, 523520, 1046848, 1570176
 
  root@FreeBSD-20230808092435:~ # fstyp /dev/da0p1
     ufs
  root@FreeBSD-20230808092435:~ # mount /dev/da0p1 /mnt/da0p1/
  root@FreeBSD-20230808092435:~ # mkdir /mnt/da0p1/testdir
   
Try 2: no error message from command 'newfs', but the filesystem is not recognized.

  root@FreeBSD-20230808092435:~ # newfs -EUN /dev/da0p1
  /dev/da0p1: 1022.0MB (2093056 sectors) block size 32768, fragment size 4096
	using 4 cylinder groups of 255.53MB, 8177 blks, 32768 inodes.
	with soft updates
  super-block backups (for fsck_ffs -b #) at:
   192, 523520, 1046848, 1570176

  root@FreeBSD-20230808092435:~ # newfs -EUN /dev/da0p1
  /dev/da0p1: 1022.0MB (2093056 sectors) block size 32768, fragment size 4096
	using 4 cylinder groups of 255.53MB, 8177 blks, 32768 inodes.
	with soft updates
  super-block backups (for fsck_ffs -b #) at:
  192, 523520, 1046848, 1570176


  root@FreeBSD-20230808092435:~ # fstyp /dev/da0p1
  fstyp: /dev/da0p1: filesystem not recognized


Try 3: hit issue "iput: check-hash failed for inode read from disk"

  root@FreeBSD-20230808092435:~ # newfs -EU /dev/da0p1
  /dev/da0p1: 1022.0MB (2093056 sectors) block size 32768, fragment size 4096
	using 4 cylinder groups of 255.53MB, 8177 blks, 32768 inodes.
	with soft updates
  Erasing sectors [128...2093055]
  super-block backups (for fsck_ffs -b #) at:
   192, 523520, 1046848, 1570176
  iput: check-hash failed for inode read from disk


Try 4: hit issue "cg 0: bad magic number":

  root@FreeBSD-20230808092435:~ # newfs -EU /dev/da0p1
  /dev/da0p1: 1022.0MB (2093056 sectors) block size 32768, fragment size 4096
	using 4 cylinder groups of 255.53MB, 8177 blks, 32768 inodes.
	with soft updates
  Erasing sectors [128...2093055]
  super-block backups (for fsck_ffs -b #) at:
   192, 523520, 1046848, 1570176
  cg 0: bad magic number
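
Since the newfs results above differ from try to try, a small loop could help show how often the failure occurs. The sketch below is an assumption-laden illustration: PART, TRIES and RUN are names introduced here, and the read-only `fsck_ffs -n` pass is an extra check that was not part of the original tries. Commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Repeat newfs on one partition and inspect the result after each try.
# PART, TRIES and RUN are hypothetical knobs introduced for illustration.
PART="${PART:-/dev/da0p1}"
TRIES="${TRIES:-4}"

run() {
    # Print the command; execute it only when RUN=1 is set.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

newfs_loop() {
    i=1
    while [ "$i" -le "$TRIES" ]; do
        echo "== try $i =="
        run newfs -EU "$PART"
        run fstyp "$PART"        # should print "ufs" on success
        run fsck_ffs -n "$PART"  # read-only consistency check
        i=$((i + 1))
    done
}

newfs_loop
```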





This issue is not related to the ESXi version. It seems to be a kernel/driver issue.
Comment 1 Tom 2024-01-25 15:47:20 UTC
Created attachment 247950 [details]
screenshots of panic
Comment 2 Tom 2024-01-25 15:48:08 UTC
I'm seeing a ufs_dirbad panic when attempting to upgrade a 12.4 i386 VMware-hosted (LSI Logic SAS) machine to 13.2.

The first freebsd-update run, which installs the kernel, is okay, as is the reboot (13.2 kernel, 12.4 world). During the second freebsd-update run (userland) I get the panic. On restarting I see "Not UFS" and it stops at the boot: prompt. I'm now using a clone of the VM and it happens on every attempt. The inode changes each time.
Comment 3 Tom 2024-01-25 15:48:54 UTC
Created attachment 247951 [details]
post panic boot failure