When testing Hyper-V's Online VHDX Resize feature (refer to Bug 211000), I noticed an issue: after resizing “da1”, if there was a disk read before "camcontrol reprobe da1", gpart couldn't detect the new “free space", though it could detect the new disk capacity. To reproduce the bug, we need to run FreeBSD 12-CURRENT guest on Hyper-V (we need to make sure the 2 patches mentioned by https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211000#c4 are used to avoid the known issues) and do the following 1. This is before the disk capacity change: [root@decui-b11 ~]# gpart show da1 => 63 83886017 da1 MBR (40G) 63 83886017 - free - (40G) 2. Expand the disk capacity from 40GB to 50GB by Hyper-V management tool. 3. [root@decui-b11 ~]# dd if=/dev/da1 of=/dev/null bs=512 count=1 1+0 records in 1+0 records out 512 bytes transferred in 0.000715 secs (715864 bytes/sec) [root@decui-b11 ~]# gpart show da1 => 63 83886017 da1 MBR (50G) 63 83886017 - free - (40G) We can see the 'free' space can't grow to 50GB. 4. This can't help: [root@decui-b11 ~]# camcontrol reprobe da1 [root@decui-b11 ~]# gpart show da1 => 63 83886017 da1 MBR (50G) 63 83886017 - free - (40G)
BTW, dmesg shows: This corresponds to the "dd": (da1:storvsc1:0:0:0): storvsc scsi_status = 2 (da1:storvsc1:0:0:0): Capacity data has changed (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (6) [0 b2 0 4 1] (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (5) [0 b0 0 3c 0] (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (5) [0 b1 0 3c 0] This corresponds to the "reprobe": (probe0:storvsc1:0:0:0): storvsc has passed inquiry response (36) validation (probe0:storvsc1:0:0:0): storvsc scsi_status = 2 (probe0:storvsc1:0:0:0): storvsc skips the validation for short inquiry (5) [0 0 0 8 0] (probe0:storvsc1:0:0:0): storvsc skips the validation for short inquiry (6) [0 83 0 30 1] (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (6) [0 b2 0 4 1] (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (5) [0 b0 0 3c 0] (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (5) [0 b1 0 3c 0]
(In reply to Dexuan Cui from comment #0) In comment 0, if I skip step 3, that is, I don't run the "dd" and only run "camcontrol reprobe da1", "gpart show da1" can detect the new "free" space correctly. BTW, when I get the bug, there is a workaround: dd if=/dev/da1 of=/dev/da1 bs=512 count=0 This means, when da1 is opened for writing, somehow geom will be forced to recalculate the free space.
So, disk provider has changed its size, I assume `camcontrol reprobe` has initiated g_disk_resize event. How it should work in my understanding: 1. `camcontrol reprobe` triggers calling of disk_resize() 2. disk_resize() calls g_disk_resize() event 3. GPART class receives resize event, g_part_resize() should be called 4. GPART_MBR's method g_part_mbr_resize() should be called 5. "GEOM_PART: da1 was automatically resized. Use `gpart commit da1` to save changes or `gpart undo da1` to revert them." message should be printed in the console. If you don't see the message from p.5 - something doesn't work as expected. You can use the following dtrace command to debug: # kldload dtraceall # dtrace -n 'dtrace -n 'fbt::*disk_resize:entry fbt::g_part_*resize:entry {stack(5);}'
# kldload dtraceall # dtrace -n 'fbt::*disk_resize:entry fbt::g_part_*resize:entry {stack(5);}'
(In reply to Andrey V. Elsukov from comment #3) Thank you Andrey for the detailed instructions! g_resize_provider() returns at Line 671 and at this line, both 'size' and 'pp->mediasize' are the new disk capacity! This is pretty strange, because at this time pp->mediasize is supposed to be the old disk capacity. I'm trying to find out why this happens. 661 void 662 g_resize_provider(struct g_provider *pp, off_t size) 663 { 664 struct g_hh00 *hh; 665 666 G_VALID_PROVIDER(pp); 667 if (pp->flags & G_PF_WITHER) 668 return; 669 670 if (size == pp->mediasize) 671 return; 672 673 hh = g_malloc(sizeof *hh, M_WAITOK | M_ZERO); 674 hh->pp = pp; 675 hh->size = size; 676 g_post_event(g_resize_provider_event, hh, M_WAITOK, NULL); 677 }
It looks like the da(4) driver correctly noticed the updated capacity: (da1:storvsc1:0:0:0): Capacity data has changed So "camcontrol reprobe" shouldn't really do anything.
Also - correct me if I'm wrong, but even if you don't do reprobe, you still see the new size of da1 - 50G, so GEOM already "sees" the updated size, it's just that g_part(4) doesn't update the "free" size. Does that make sense?
(In reply to Edward Tomasz Napierala from comment #7) Hi Edword, Yes, exactly. After the disk capacity change, if I do "camcontrol reprobe da1", gpart can see both the capacity & free space are updated. After the disk capacity change, if first I open /dev/da1 for reading (e.g. dd if=/dev/da1 of=tmp bs=512 count=1), da driver notices "Capacity data has changed", and gpart can see the new capacity, but the "free" space shown by gpart is still the old. Next, "camcontrol reprobe da1" can't help and the "free" space shown by gpart is still the old. Next, opening da1 for writing (e.g. dd if=/dev/da1 of=/dev/da1 bs=512 count=0) can make gpart show the new "free" space.
Hm. Okay. The read dd works, because the da(4) driver gets notified about the capacity change on the first read command sent to the disk; basically the disk responds with Unit Attention indicating this. The write dd works, because it forces GEOM retaste. So it still fits the theory that GEOM itself is fine, and it's just g_part(4) not handling it properly.
(In reply to Edward Tomasz Napierala from comment #9) > Hm. Okay. The read dd works, because the da(4) driver gets notified about > the capacity change on the first read command sent to the disk; basically > the disk responds with Unit Attention indicating this. The write dd works, > because it forces GEOM retaste. So it still fits the theory that GEOM > itself is fine, and it's just g_part(4) not handling it properly. As I said, tracing via dtrace or geom's internal capability should show that g_part receives resize event. In previous messages I didn't see this.
Hm, you're right. Dexuan - the g_resize_provider() you're seeing, is that after "camcontrol reprobe", or after the first read? Can you bump GEOM debugflags and show what happens on the first read, when da(4) is supposed to notice the updated capacity?
(In reply to Edward Tomasz Napierala from comment #11) Thank you both for the analysis! The g_resize_provider() is after the first read, and before the "camcontrol reprobe". I'm going to use try debugflags and dtrace.
(In reply to Edward Tomasz Napierala from comment #11) With kern.geom.debugflags=253 (log everything except G_T_BIO) and debug.bootverbose=1, after the disk capacity was changed from 180GB to 200GB, I got the below when running "dd if=/dev/da1 of=tmp bs=512 count=1": g_dev_open(da1, 1, 8192, 0xfffff80004433500) g_access(0xfffff80003faba00(da1), 1, 0, 0) open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xfffff80003d19c00(da1) g_disk_access(da1, 1, 0, 0) (da1:storvsc1:0:0:0): storvsc scsi_status = 2 (da1:storvsc1:0:0:0): Capacity data has changed g_post_event_x(0xffffffff80996f40, 0xfffff800043fd800, 1, 0) (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (6) [0 b2 0 4 1] (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (5) [0 b0 0 3c 0] (da1:storvsc1:0:0:0): storvsc skips the validation for short inquiry (5) [0 b1 0 3c 0] g_dev_close(da1, 131073, 8192, 0xfffff80004433500) g_access(0xfffff80003faba00(da1), -1, 0, 0) open delta:[r-1w0e0] old:[r1w0e0] provider:[r1w0e0] 0xfffff80003d19c00(da1) g_disk_access(da1, -1, 0, 0) Next, gpart can't show the new free space: [root@decui-b11 ~/bsd.git/sys]# gpart show da1 => 63 377487297 da1 MBR (200G) 63 377487297 - free - (180G)
(In reply to Dexuan Cui from comment #13) When I got the kernel messages in comment 13, dtrace showed this: [root@decui-b11 ~]# dtrace -n 'fbt::*disk_resize:entry fbt::g_part_*resize:entry {stack(5);}' dtrace: description 'fbt::*disk_resize:entry fbt::g_part_*resize:entry ' matched 7 probes CPU ID FUNCTION:NAME 0 36534 disk_resize:entry 0 13496 g_disk_resize:entry and g_part_resize() was not called.
Hi all, I know why g_disk_resize() can't work here when I open da1 for reading after a disk capacity change: When /dev/da1 is opened for reading (or writing) every time, amd64_syscall() -> kern_openat() -> ... -> g_dev_open() -> g_access() -> g_disk_access() -> daopen() -> dareprobe() sends the DA_STATE_PROBE_RC16 command; if the disk capacity was changed, in g_disk_access(), we update provide's mediasize ***immediately***: pp->mediasize = dp->d_mediasize; Later, in disk_resize(), g_disk_resize() -> g_resize_provider(): because of disk's size == pp->mediasize, g_resize_provider() doesn't post the g_resize_provider_event event at all. This is why "camcontrol reprobe da1" can't help. BTW, opening da1 for writing can work because the provider is destroyed and re-created, as you mentioned. I'll be proposing a patch.
Created attachment 172554 [details] The patch can work, but it may be a little ugly...
(In reply to Dexuan Cui from comment #16) Please review the attached patch. It may be a little ugly since I'm pretty new to GEOM code... Please suggest how we should fix the issue in a better way.
(In reply to Dexuan Cui from comment #17) > (In reply to Dexuan Cui from comment #16) > Please review the attached patch. It may be a little ugly since I'm pretty > new to GEOM code... Please suggest how we should fix the issue in a better > way. GEOM serves not only the disks, you can break g_provider_resize() for other types of providers. Can you test this patch instead? Index: geom_disk.c =================================================================== --- geom_disk.c (revision 302840) +++ geom_disk.c (working copy) @@ -126,7 +126,6 @@ g_disk_access(struct g_provider *pp, int r, int w, if (error != 0) return (error); } - pp->mediasize = dp->d_mediasize; pp->sectorsize = dp->d_sectorsize; if (dp->d_maxsize == 0) { printf("WARNING: Disk drive %s%d has no d_maxsize\n", @@ -143,6 +142,7 @@ g_disk_access(struct g_provider *pp, int r, int w, pp->stripeoffset = dp->d_stripeoffset; pp->stripesize = dp->d_stripesize; dp->d_flags |= DISKFLAG_OPEN; + g_resize_provider(pp, dp->d_mediasize); } else if ((pp->acr + pp->acw + pp->ace) > 0 && (r + w + e) == 0) { if (dp->d_close != NULL) { error = dp->d_close(dp);
(In reply to Andrey V. Elsukov from comment #18) Your patch is good and it works per my tests. Thank you very much, Andrey! Can you please commit it to the CURRENT branch and can we consider merging it to stable/11 too? I think this small patch should be riskless.
(In reply to Dexuan Cui from comment #19) The 11-beta3 build will begin on July 22. I hope we can have Andrey's patch committed before the date so that it can get more tests. I think we definitely want to make sure the bug is fixed in the final 11 release.
A commit references this bug: Author: ae Date: Tue Jul 19 05:36:21 UTC 2016 New revision: 303019 URL: https://svnweb.freebsd.org/changeset/base/303019 Log: Use g_resize_provider() to change the size of GEOM_DISK provider, when it is being opened. This should fix the possible loss of a resize event when disk capacity changed. PR: 211028 Reported by: Dexuan Cui <decui at microsoft dot com> MFC after: 3 weeks Changes: head/sys/geom/geom_disk.c
I think we need a bit more time to test this, so I don't think it will be in 11.0-release.
(In reply to Andrey V. Elsukov from comment #22) Got it. Thank you, Andrey!
We're seeing new panics on machines in the freebsd.org cluster that appear to be related to this. Sometimes it boots fine, other times it panics like this: Trying to mount root from zfs:zroot []... GEOM_PART: da0 was automatically resized. Use `gpart commit da0` to save changes or `gpart undo da0` to revert them. GEOM_PART: integrity check failed (da0, GPT) Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x48 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80740005 stack pointer = 0x28:0xfffffe01f119db10 frame pointer = 0x28:0xfffffe01f119db30 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 13 (g_event) [ thread pid 13 tid 100019 ] Stopped at g_part_resize+0x35: testb $0x8,0x48(%rbx) When it boots, the partition info for da0 looks like this: root@tiny.nyi:/home/peter # gpart show => 40 143374576 da0 GPT (68G) 40 216 1 freebsd-boot (108K) 256 8388608 2 freebsd-swap (4.0G) 8388864 134985752 3 freebsd-zfs (64G)
In case this matters it is exploding on actual SCSI disk systems.
On closer examination, the GPT headers on these drives have strange things going on. It does look like there's a 6 sector overallocation: GEOM_PART: da1 was automatically resized. Use `gpart commit da1` to save changes or `gpart undo da1` to revert them. GEOM_PART: partition 3 has end offset beyond last LBA: 143374615 > 143374610 GEOM_PART: integrity check failed (da1, GPT) However.. I am puzzled as to why this is only sometimes detected. As I said in #24 above, sometimes the machine boots fine without a peep. All 6 drives in the machine in question are in this state.
(In reply to Peter Wemm from comment #26) > On closer examination, the GPT headers on these drives have strange things > going on. It does look like there's a 6 sector overallocation: > GEOM_PART: da1 was automatically resized. > Use `gpart commit da1` to save changes or `gpart undo da1` to revert them. > GEOM_PART: partition 3 has end offset beyond last LBA: 143374615 > 143374610 > GEOM_PART: integrity check failed (da1, GPT) > > However.. I am puzzled as to why this is only sometimes detected. As I > said in #24 above, sometimes the machine boots fine without a peep. All 6 > drives in the machine in question are in this state. My guess is the following, it looks like disk reported that its size decreased. This leads to resize and integrity check in geom_part. Since integrity check has been failed, geom is going to be withered. In the meantime disk reported again about changed size, this event handled before geom has fully destroyed, but its softc now will be NULL, so this leads to panic what you see. Probably the following patch will fix this panic, but I'm not sure that there will not another panic due to disappeared GPT. Index: geom_subr.c =================================================================== --- geom_subr.c (revision 303255) +++ geom_subr.c (working copy) @@ -636,7 +636,7 @@ g_resize_provider_event(void *arg, int flag) LIST_FOREACH_SAFE(cp, &pp->consumers, consumers, cp2) { gp = cp->geom; - if (gp->resize != NULL) + if ((gp->flags & G_GEOM_WITHER) == 0 && gp->resize != NULL) gp->resize(cp); }
There's no size decreases. The sizes match what is in the initial probe messages. What's happening is that the disk is mis-partitioned like this: Disk size: 143374650 sectors sector 0: PMBR sector 1-39: GPT 40-143374615: partition space inside gpt 143374610-143374649 backup GPT Note the overlap, da0-da5 are all like this. dmesg says: da1 at mpt0 bus 0 scbus0 target 1 lun 0 da1: <FUJITSU MAS3735NC 5B08> Fixed Direct Access SCSI-3 device da1: Serial Number A107P49025H2 da1: 320.000MB/s transfers (160.000MHz, offset 127, 16bit) da1: Command Queueing enabled da1: 70007MB (143374650 512 byte sectors) There is no size change. The problem is that instead of copying the media size, you're introducing a new resize event at open time that wasn't there before - even when there's no resize. g_part is now calling its resize code paths and trying to make sense. GEOM_PART: partition 3 has end offset beyond last LBA: 143374615 > 143374610 GEOM_PART: integrity check failed (da1, GPT) This matches the bad partitioning and the GPT should have been rejected the very first time around. The problem is that it is only being detected perhaps 1 in 10 times. Aside from the panic, the second problem is the inconsistency. It should be attempting to repair all 6 drives, but isn't. Sometimes it doesn't detect any of the bad disks and boots successfully. Is the event getting lost? I think you're racing with initialization / tasting somehow and I suspect that's a bigger problem.
(In reply to Peter Wemm from comment #28) > There is no size change. The problem is that instead of copying the media > size, you're introducing a new resize event at open time that wasn't there > before - even when there's no resize. g_part is now calling its resize code > paths and trying to make sense. When geom_disk object is created its size is initialized. If its size is the same, resize event will not be invoked. So, there is definitely present size change. At least sometimes disks reported different size. > GEOM_PART: partition 3 has end offset beyond last LBA: 143374615 > 143374610 > GEOM_PART: integrity check failed (da1, GPT) > > This matches the bad partitioning and the GPT should have been rejected the > very first time around. The problem is that it is only being detected > perhaps 1 in 10 times. > > Aside from the panic, the second problem is the inconsistency. It should be > attempting to repair all 6 drives, but isn't. Sometimes it doesn't detect > any of the bad disks and boots successfully. Is the event getting lost? I > think you're racing with initialization / tasting somehow and I suspect > that's a bigger problem. The metadata on the disk cannot be sometimes correct and sometimes not.
Created attachment 172924 [details] Increase debugging in geom_disk Can you apply this patch and enable bootverbose mode?
The first few boot attempts looked like this: ... (da0:mpt0:0:0:0): UNMAPPED (da1:mpt0:0:1:0): da0 at mpt0 bus 0 scbus0 target 0 lun 0 da0: <FUJITSU MAS3735NC 5B08> Fixed Direct Access SCSI-3 device da0: Serial Number A107P49025GU da0: 320.000MB/s transfers (160.000MHz, offset 127, 16bit) da0: Command Queueing enabled da0: 70007MB (143374650 512 byte sectors) da0: Delete methods: <NONE(*),ZERO> ses0: Nfans 0 Npwr 0 Nslots 6 Lck 0 Ntherm 0 Nspkrs 0 Ntstats 0 UNMAPPED (da2:mpt0:0:2:0): UNMAPPED (da3:mpt0:0:3:0): UNMAPPED (da4:mpt0:0:4:0): UNMAPPED (da5:mpt0:0:5:GEOM: new disk da0 (73407820800 bytes) GEOM: new disk da1 (0 bytes) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ GEOM: new disk da2 (0 bytes) da3 at mpt0 bus 0 scbus0 target 3 lun 0 da3: <FUJITSU MAS3735NC 5B08> Fixed Direct Access SCSI-3 device da3: Serial Number A107P49025H7 da3: 320.000MB/s transfers (160.000MHz, offset 127, 16bit) da3: Command Queueing enabled da3: 70007MB (143374650 512 byte sectors) da3: Delete methods: <NONE(*),ZERO> da2 at mpt0 bus 0 scbus0 target 2 lun 0 da2: <FUJITSU MAS3735NC 5B08> Fixed Direct Access SCSI-3 device da2: Serial Number A107P49025GB da2: 320.000MB/s transfers (160.000MHz, offset 127, 16bit) da2: Command Queueing enabled da2: 70007MB (143374650 512 byte sectors) da2: Delete methods: <NONE(*),ZERO> da1 at mpt0 bus 0 scbus0 target 1 lun 0 da1: <FUJITSU MAS3735NC 5B08> Fixed Direct Access SCSI-3 device da1: Serial Number A107P49025H2 da1: 320.000MB/s transfers (160.000MHz, offset 127, 16bit) da1: Command Queueing enabled da1: 70007MB (143374650 512 byte sectors) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ da1: Delete methods: <NONE(*),ZERO> da4 at mpt0 bus 0 scbus0 target 4 lun 0 da4: <MAXTOR ATLAS15K_73SCA DT60> Fixed Direct Access SCSI-3 device da4: Serial Number C80666ZK da4: 320.000MB/s transfers (160.000MHz, offset 127, 16bit) da4: Command Queueing enabled da4: 70007MB (143374650 512 byte sectors) da4: Delete methods: <NONE(*),ZERO> GEOM: new disk da3 (73407820800 bytes) GEOM: new disk da4 (73407820800 bytes) 0): UNMAPPED pass0 at mpt0 bus 0 scbus0 target 0 lun 0 da5 at mpt0 bus 0 scbus0 target 5 lun 0 da5: <FUJITSU MAS3735NC 5B08> Fixed Direct Access SCSI-3 device da5: Serial Number A107P49025GP da5: 320.000MB/s transfers (160.000MHz, offset 127, 16bit) da5: Command Queueing enabled da5: 70007MB (143374650 512 byte sectors) da5: Delete methods: <NONE(*),ZERO> ... Note the "GEOM new disk, 0 bytes" happens before the probe messages. Presumably the nature of the original panic we were seeing was that was trying to resize an uninitialized disk. It no longer detects the corrupt GPT but no longer panics either. The machine in question is slow to boot, but I'll try a few more iterations.
It fairly consistently prints one or two 'GEOM new disk .. 0 bytes' before the disk probe messages but has not crashed. It is usually different disks each time. And it has not ever detected the corrupt GPT data since the patch.
With the patch from #30 the machine in question hasn't had any more hiccups. It is behaving in a bug-compatible way with the way it was before the commit that started the panics.
(In reply to Peter Wemm from comment #31) > Note the "GEOM new disk, 0 bytes" happens before the probe messages. > Presumably the nature of the original panic we were seeing was that was > trying to resize an uninitialized disk. No, at this time there are no GEOM_PART objects yet, so it is nothing to resize. > It no longer detects the corrupt GPT but no longer panics either. > > The machine in question is slow to boot, but I'll try a few more iterations. Thanks, I will commit the change to g_resize_provider_event() and wait additional time before MFC.
A commit references this bug: Author: ae Date: Mon Jul 25 09:12:08 UTC 2016 New revision: 303288 URL: https://svnweb.freebsd.org/changeset/base/303288 Log: Do not invoke resize method if geom is being withered. PR: 211028 MFC after: 2 weeks Changes: head/sys/geom/geom_subr.c
A commit references this bug: Author: ae Date: Mon Aug 1 20:54:54 UTC 2016 New revision: 303637 URL: https://svnweb.freebsd.org/changeset/base/303637 Log: Do not invoke resize event if initial disk size is zero. Some disks report the size only after first opening. And due to the events are asynchronous, some consumers can receive this event too late and this confuses them. This partially restores previous behaviour, and at the same time this should fix the problem, when already opened provider loses resize event. PR: 211028 MFC after: 3 weeks Changes: head/sys/geom/geom_disk.c
A commit references this bug: Author: ae Date: Fri Sep 30 03:45:41 UTC 2016 New revision: 306476 URL: https://svnweb.freebsd.org/changeset/base/306476 Log: MFC r303019: Use g_resize_provider() to change the size of GEOM_DISK provider, when it is being opened. This should fix the possible loss of a resize event when disk capacity changed. MFC r303288: Do not invoke resize method if geom is being withered. MFC r303637: Do not invoke resize event if initial disk size is zero. Some disks report the size only after first opening. And due to the events are asynchronous, some consumers can receive this event too late and this confuses them. This partially restores previous behaviour, and at the same time this should fix the problem, when already opened provider loses resize event. PR: 211028 Changes: _U stable/11/ stable/11/sys/geom/geom_disk.c stable/11/sys/geom/geom_subr.c
Fixed in head/ and stable/11. Thanks!