Bug 218026

Summary: GEOM / gpart: secondary table(s) are consistently corrupted
Product: Base System
Reporter: Chris Hutchinson <portmaster>
Component: misc
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New
Severity: Affects Many People
CC: cem, zlei
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any

Description Chris Hutchinson 2017-03-23 00:28:27 UTC
OK this is for a USB3 stick I use for dumps, restores && jails.
So nothing important on it, or anything. :-(
Here's the messages I receive:
GEOM: new disk da0
GEOM: new disk cd0
GEOM: da0: the secondary GPT table is corrupt or invalid.
GEOM: da0: using the primary only -- recovery suggested.
GEOM: diskid/DISK-E600665E1DC77749: the secondary GPT table is corrupt or invalid.
GEOM: diskid/DISK-E600665E1DC77749: using the primary only -- recovery suggested.

which would be understandable if the device hadn't been unmounted
properly, or the system had crashed or powered off unexpectedly.
But that's not the case. The stick is always plugged in, and is
mounted like any other system drive/slice/partition.

# gpart show da0
=>       40  247463856  da0  GPT  (118G)
         40  247463856    1  freebsd-ufs  (118G)

# gpart list da0
Geom name: da0
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 247463895
first: 40
entries: 152
scheme: GPT
Providers:
1. Name: da0p1
   Mediasize: 126701494272 (118G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 20480
   Mode: r1w1e2
   rawuuid: 30ca1b6e-0df1-11e7-b26e-3497f69fd18f
   rawtype: 516e7cb6-6ecf-11d6-8ff8-00022d09712b
   label: jails
   length: 126701494272
   offset: 20480
   type: freebsd-ufs
   index: 1
   end: 247463895
   start: 40
Consumers:
1. Name: da0
   Mediasize: 126701535232 (118G)
   Sectorsize: 512
   Mode: r1w1e3
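
The sizes in that listing can be cross-checked with plain shell arithmetic.
This is only a sketch, using the Mediasize and Sectorsize values reported for
the da0 consumer above; it shows which LBA the backup (secondary) GPT header
is expected to occupy, since that header is stored in the last sector of the
disk.

```shell
#!/bin/sh
# Values copied from the `gpart list da0` output above.
mediasize=126701535232   # Consumers: da0 Mediasize, in bytes
sectorsize=512           # Consumers: da0 Sectorsize, in bytes

# Total number of sectors on the device.
sectors=$((mediasize / sectorsize))

# The backup (secondary) GPT header lives in the very last sector.
backup_hdr_lba=$((sectors - 1))

echo "sectors=$sectors backup_hdr_lba=$backup_hdr_lba"
```

The result is LBA 247463935, which is 40 sectors past the "last: 247463895"
value gpart reports as the final usable sector.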

I use this to archive all the servers I maintain locally,
as well as for build jails. This never presented any trouble
on the RELENG_11 box I originally used it on.

The method to partition/format the drive:
gpart destroy -F da0
gpart create -s GPT da0
gpart add -t freebsd-ufs -l jails da0
newfs -U -o time /dev/gpt/jails
fsck /dev/gpt/jails
mount /jails

Everything returns as one would expect *except*
after a reboot, when it returns:
GEOM: new disk da0
GEOM: new disk cd0
GEOM: da0: the secondary GPT table is corrupt or invalid.
GEOM: da0: using the primary only -- recovery suggested.
GEOM: diskid/DISK-E600665E1DC77749: the secondary GPT table is corrupt or invalid.
GEOM: diskid/DISK-E600665E1DC77749: using the primary only -- recovery suggested.

gpart recover da0 brings it from CORRUPT to OK. But this shouldn't happen.
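
As a stopgap, the check-and-recover step can be scripted. A minimal sketch,
assuming that `gpart show` flags a damaged table with the text "[CORRUPT]" on
the geom's header line; the helper name needs_recover is hypothetical, and it
only inspects text on stdin so it can be tried without touching a disk.

```shell
#!/bin/sh
# needs_recover: read `gpart show` output on stdin and succeed (exit 0)
# if any table is flagged [CORRUPT]. Hypothetical helper name.
needs_recover() {
    grep -q '\[CORRUPT\]'
}

# Example on FreeBSD (not run here):
#   gpart show da0 | needs_recover && gpart recover da0
```

This only papers over the symptom; it does not explain why the backup header
is being damaged across reboots.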

Suggestions? Thoughts? ...?

Thanks!

--Chris
 OH!
FreeBSD 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314700:
Sun Mar 5 09:01:30 PST 2017 :/usr/obj/usr/src/sys/TESTKERN amd64
Comment 1 Chris Hutchinson 2017-03-31 16:35:56 UTC
OK. I use the USB3 flash stick I mentioned above for
dumps and other important backups.
Today, to experiment with the 12-CURRENT install that I
have a fresh dump of on the stick, I plugged in a USB2
external drive and bounced the box, with the intention
of performing a restore(8) to the external drive. I
forgot that I had an install of GhostBSD on that external
drive, and given that I usually have a USB DVD plugged
into that port, it booted to GhostBSD. So I simply
bounced the box, changed the boot order in the BIOS,
and booted single-user to my 12-CURRENT.
I then performed the following (on the external [USB] drive):
gpart destroy -F da0
gpart create -s GPT da0
gpart add -t freebsd-boot -l usbboot -b 40 -s 512k da0
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 da0
gpart add -t freebsd-ufs -l usbroot -s 8G da0
gpart add -t freebsd-swap -l usbswap -s 3G da0
gpart add -t freebsd-ufs -l usbvar -s 55G da0
gpart add -t freebsd-ufs -l usbusr da0

followed by a

newfs -U /dev/gpt/usbroot
newfs -U /dev/gpt/usbvar
newfs -U /dev/gpt/usbusr

then individually mounting those to /mnt
followed by a restore(8) to each.

Bounce the box, change the boot order in the BIOS, and the new
USB external drive is not listed. Booting into single-user
from the main drive reveals that the secondary table is corrupt!

The same is true of the USB stick I keep the dumps on.
Before performing any of the above, I am forced to
run gpart recover on the USB stick, followed by an fsck.
Both return good status.

What to do? I can now see this is not limited to USB3,
as the external drive is on a USB2 port.

This is a real problem!

Thanks for anything that might fix this!

--Chris

FreeBSD 12.0-CURRENT #0 r314700
Comment 2 Chris Hutchinson 2017-04-20 17:29:50 UTC
OK I'm now on:
FreeBSD 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r316389: amd64
and the problem still exists. I also solicited help on the mailing list.
I received a reply indicating that my BIOS may be the culprit. WTF?!
Well. It indeed appears to be true...
gpart destroy -F da0
gpart create -s GPT da0
gpart add -t freebsd-ufs -l external da0
newfs -U -o time /dev/gpt/external

All of which returned the anticipated results. So I perform the
following, to confirm the suspected culprit (BIOS) on the newly
created drive:

dd if=/dev/da0 of=./sector skip=`diskinfo da0 | awk '{print $4-1}'`
(backticks may not show correctly)
Which resulted in the following:

00000000  45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART....\...|
00000010  aa 9c 5f 36 00 00 00 00  2f 60 38 3a 00 00 00 00  |.._6..../'8:....|
00000020  01 00 00 00 00 00 00 00  28 00 00 00 00 00 00 00  |........(.......|
00000030  07 60 38 3a 00 00 00 00  91 e5 f5 c1 0d 16 e7 11  |.'8:............|
00000040  8d 49 00 24 81 ce ba 87  09 60 38 3a 00 00 00 00  |.I.$.....'8:....|
00000050  80 00 00 00 80 00 00 00  65 12 5c 16 00 00 00 00  |........e.\.....|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
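
To read a dump like this without counting bytes by hand, the fixed-offset
fields of the GPT header can be decoded with od and awk. This is a sketch
only: the helper names le_field and gpt_header_summary are hypothetical, and
the offsets (16, 24, 32, 72, 88) are the header CRC32, this header's own LBA,
the alternate header's LBA, the partition entry array LBA, and the entry
array CRC32 as laid out in the UEFI GPT header format.

```shell
#!/bin/sh
# le_field FILE OFFSET LENGTH
# Print LENGTH bytes at OFFSET as one little-endian integer in hex.
# (Helper names here are hypothetical, not part of any FreeBSD tool.)
le_field() {
    od -A n -t x1 -j "$2" -N "$3" "$1" |
        awk '{ for (i = 1; i <= NF; i++) s = $i s } END { print "0x" s }'
}

# gpt_header_summary FILE
# Decode a few fixed-offset fields of a GPT header saved in FILE.
gpt_header_summary() {
    echo "header_crc32  $(le_field "$1" 16 4)"
    echo "my_lba        $(le_field "$1" 24 8)"
    echo "alternate_lba $(le_field "$1" 32 8)"
    echo "entries_lba   $(le_field "$1" 72 8)"
    echo "entries_crc32 $(le_field "$1" 88 4)"
}

# Example: gpt_header_summary ./sector
```

For the dump above, the bytes at offset 0x18 (2f 60 38 3a 00 00 00 00) decode
to 0x000000003a38602f, and the alternate LBA at offset 0x20 decodes to 1,
which is what a backup header pointing back at the primary should contain.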

Looks as one would anticipate. Let's reboot, and see what happens
(drive plugged in, but NOT mounted)

Results AFTER reboot;
gpart informs me:
kernel: GEOM: da0: the secondary GPT table is corrupt or invalid.
kernel: GEOM: da0: using the primary only -- recovery suggested.
kernel: GEOM: new disk ada0

As suspected, something's still wrong. Let's look closer:
dd if=/dev/da0 of=./sector skip=`diskinfo da0 | awk '{print $4-1}'`
returns:

00000000  45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART....\...|
00000010  65 12 5c 16 00 00 00 00  2f 60 38 3a 00 00 00 00  |e.\...../'8:....|
00000020  01 00 00 00 00 00 00 00  28 00 00 00 00 00 00 00  |........(.......|
00000030  07 60 38 3a 00 00 00 00  91 e5 f5 c1 0d 16 e7 11  |.'8:............|
00000040  8d 49 00 24 81 ce ba 87  08 60 38 3a 00 00 00 00  |.I.$.....'8:....|
00000050  80 00 00 00 80 00 00 00  00 00 00 00 86 da fa 98  |................|
00000060  61 66 13 80 09 fe d0 54  35 59 db 8e 43 b8 7e 37  |af.....T5Y..C.~7|
00000070  c9 77 0e 9d 35 fd 45 04  de 9a d3 ff 30 83 8f b4  |.w..5.E.....0...|
00000080  b9 84 1d 41 59 44 ef fd  fd 89 3e 1e 9e c6 23 e1  |...AYD....>...#.|
00000090  83 17 a7 53 e1 e7 51 c8  5f 87 2b 76 f8 60 c4 ca  |...S..Q._.+v.'..|
000000a0  e2 3e 1e eb 12 69 12 32  33 c3 29 42 d6 aa 1a bc  |.>...i.23.)B....|
000000b0  90 af fc 4f d0 e1 58 c3  52 f5 5c 54 ca bd 05 8c  |...O..X.R.\T....|
000000c0  89 04 8d 7b 11 a3 b2 1e  07 6e fe 1b 79 00 c0 15  |...{.....n..y...|
000000d0  1a 39 79 28 91 a3 e8 24  93 1a 35 ef e9 f8 e5 17  |.9y(...$..5.....|
000000e0  e6 93 f1 a2 5d aa 3e 2f  40 dc b3 17 19 4c f6 05  |....].>/@....L..|
000000f0  cf 75 3e 88 ad a4 2a 68  8c 04 c4 99 a1 bb a2 1c  |.u>...*h........|
00000100  9c 8d fe c7 3e e4 cb 56  ce 3d 33 5b 28 a5 c9 45  |....>..V.=3[(..E|
00000110  c7 3f aa e2 1e 98 bc e2  6d 9d 91 12 84 24 d6 13  |.?......m....$..|
00000120  3d b5 14 bd 9a 44 e9 ee  3f b5 91 31 73 86 79 7e  |=....D..?..1s.y~|
00000130  09 bd 4e 01 cb 06 81 b4  41 11 cd cf 97 dd 97 a1  |..N.....A.......|
00000140  a7 73 e5 f7 c5 a4 75 c9  1f 6b 5e 88 fe 1a 92 d2  |.s....u..k^.....|
00000150  3a cc 70 21 1f b8 30 34  b9 0e 5c b2 d0 14 5e 82  |:.p!..04..\...^.|
00000160  56 60 04 35 77 c9 25 04  7a af ce e1 8d 24 37 53  |V'.5w.%.z....$7S|
00000170  a3 0c dd 63 3c 15 fe 9f  a4 46 00 97 c1 b0 27 be  |...c<....F....'.|
00000180  f5 c7 f9 b5 71 9e 1b 90  f7 9c ee 8a 8e 7b 77 61  |....q........{wa|
00000190  23 13 4a 93 0b e0 f0 9e  3f dc 8e 12 f9 19 d3 75  |#.J.....?......u|
000001a0  f2 52 6d bd 12 30 cd bf  0c 91 79 10 1a bd 5b d4  |.Rm..0....y...[.|
000001b0  0f 9c 1b ff 7b 60 74 79  d7 fa bb 02 6f 19 be e4  |....{'ty....o...|
000001c0  06 fd f4 7c cb 05 23 eb  89 2f 7f cc 9b 01 fa f7  |...|..#../......|
000001d0  4c 07 c4 72 55 9f 3d 39  f3 71 64 94 bf 7e 74 b0  |L..rU.=9.qd..~t.|
000001e0  49 80 c1 37 4f 49 91 e0  54 a7 e5 4d 83 8f b8 32  |I..7OI..T..M...2|
000001f0  62 f2 61 50 6f f2 16 05  a4 60 2f 06 be 45 a6 72  |b.aPo....'/..E.r|
00000200
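
Rather than comparing the two dumps by eye, they can be compared byte by
byte. A minimal sketch, assuming the two reads were saved under different
names (both dd commands above write to ./sector, so sector.before and
sector.after are hypothetical file names, as is the helper diff_sectors):

```shell
#!/bin/sh
# diff_sectors BEFORE AFTER
# Print "offset(hex) before after" for every differing byte, followed by
# a count of differing bytes. Byte values are shown in octal, as cmp -l
# reports them. Hypothetical helper name.
diff_sectors() {
    # cmp -l prints: 1-based decimal offset, then both byte values in octal.
    cmp -l "$1" "$2" |
        awk '{ printf "0x%03x %s %s\n", $1 - 1, $2, $3; n++ }
             END { printf "%d byte(s) differ\n", n + 0 }'
}

# Example: diff_sectors ./sector.before ./sector.after
```

In the two dumps shown, the first 16 bytes are identical and the differences
start at offset 0x10, the header CRC32 field; everything from 0x5c onward,
which was all zeros before the reboot, now holds non-zero data.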

Whoa! That doesn't look AT ALL the way it should! So, is this the NSA
planting a backdoor, or is it FreeBSD, or???
Can I use a HEX editor to save the extra bits to a separate file, and
attempt to execute it, to see if it's something evil?
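
No hex editor is needed to split off the extra bytes; dd can do it. A sketch,
assuming the header proper is 92 bytes long (the 5c 00 00 00 at offset 0x0c of
the dump), so that everything after byte 92 is the unexpected data. The helper
name save_trailing_bytes and the output file name are hypothetical.

```shell
#!/bin/sh
# save_trailing_bytes SECTOR_FILE OUT [HEADER_SIZE]
# Copy everything after the first HEADER_SIZE bytes (default 92) of
# SECTOR_FILE into OUT. Hypothetical helper name.
save_trailing_bytes() {
    src=$1 out=$2 hdr=${3:-92}
    # bs=1 makes skip count single bytes; slow, but fine for one sector.
    dd if="$src" of="$out" bs=1 skip="$hdr" 2>/dev/null
}

# Example:
#   save_trailing_bytes ./sector ./extra.bin
#   hexdump -C ./extra.bin | head
```

With a 512-byte sector this leaves 420 bytes in the output file, which can
then be inspected with hexdump -C or strings instead of being executed.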

Thoughts, suggestions, ... GREATLY appreciated!

--Chris

P.S. A BIG thanks to Andrey V. Elsukov for the insight!
Comment 3 Zhenlei Huang freebsd_committer freebsd_triage 2022-09-28 12:28:54 UTC
The last sector was overwritten or not read properly. 

Some thoughts:

1. Try a different USB stick on the same box. If the behavior is the same, the box is suspect: try a fresh install of FreeBSD, a different BIOS version, or another known-good box.
2. Try the USB stick on a known-good box. If the behavior is the same, the stick is suspect; its controller may be misbehaving.
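
To help tell "overwritten" apart from "not read properly", the last sector can
be read several times and the checksums compared. This is a rough sketch, not
a definitive test: the helper name read_is_stable is hypothetical, the sector
count has to be supplied by the caller (on FreeBSD, field 4 of `diskinfo da0`),
and caching anywhere along the path could make repeated reads look more
consistent than the device really is.

```shell
#!/bin/sh
# read_is_stable SRC SECTORS [TRIES] [SECTORSIZE]
# Read the last sector of SRC TRIES times (default 5) and compare the
# cksum of each read with the first. Returns 0 if all reads matched,
# 1 otherwise. Hypothetical helper name.
read_is_stable() {
    src=$1 sectors=$2 tries=${3:-5} bs=${4:-512}
    first=""
    i=0
    while [ "$i" -lt "$tries" ]; do
        sum=$(dd if="$src" bs="$bs" skip=$((sectors - 1)) count=1 2>/dev/null | cksum)
        if [ -z "$first" ]; then
            first=$sum
        fi
        [ "$sum" = "$first" ] || return 1
        i=$((i + 1))
    done
    return 0
}

# Example on FreeBSD (not run here):
#   read_is_stable /dev/da0 "$(diskinfo da0 | awk '{print $4}')" && echo stable
```

If the reads are stable but the contents still change across a reboot, that
points toward something writing to the sector rather than a flaky read.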