211000 – [Hyper-V] Online VHDX Resize doesn't work properly

Summary: [Hyper-V] Online VHDX Resize doesn't work properly

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	CURRENT
Hardware:	Any Any

Importance:	--- Affects Some People
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-07-11 09:43 UTC by Dexuan Cui
Modified:	2018-04-10 17:41 UTC
CC List:	4 users

See Also:

Description Dexuan Cui 2016-07-11 09:43:29 UTC

Hyper-V has a feature called Online VHDX Resize: the host can change the disk
capacity while the VM is running. Note that the host does not notify the VM of
the capacity change proactively. Instead, when the VM sends its next I/O
request, the host fails it with SCSI_STATUS_CHECK_COND and supplies sense data
indicating that the disk capacity has changed. The VM is then expected to
detect and handle the capacity change and re-submit the I/O request.
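A minimal C sketch of the check the VM is expected to make, assuming standard SPC sense data: CHECK CONDITION is SCSI status 0x02 (the value behind CAM's SCSI_STATUS_CHECK_COND), and a capacity change is reported as UNIT ATTENTION (sense key 0x6) with ASC/ASCQ 0x2A/0x09, "CAPACITY DATA HAS CHANGED". The function capacity_changed() and the macro names are hypothetical; this is not the storvsc or CAM code.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* SCSI status CHECK CONDITION; the same value as CAM's SCSI_STATUS_CHECK_COND. */
#define STATUS_CHECK_COND        0x02
#define SENSE_KEY_UNIT_ATTENTION 0x06
#define ASC_CAPACITY_CHANGED     0x2A
#define ASCQ_CAPACITY_CHANGED    0x09

/*
 * Return true if a completed command reports CHECK CONDITION with sense data
 * "CAPACITY DATA HAS CHANGED" (UNIT ATTENTION, ASC 0x2A / ASCQ 0x09).
 * Handles fixed-format (0x70/0x71) and descriptor-format (0x72/0x73) sense.
 */
static bool capacity_changed(uint8_t scsi_status, const uint8_t *sense,
                             size_t sense_len)
{
    uint8_t key, asc, ascq;

    if (scsi_status != STATUS_CHECK_COND || sense == NULL || sense_len < 1)
        return false;

    switch (sense[0] & 0x7F) {          /* mask off the VALID bit */
    case 0x70: case 0x71:               /* fixed format */
        if (sense_len < 14)
            return false;
        key  = sense[2] & 0x0F;
        asc  = sense[12];
        ascq = sense[13];
        break;
    case 0x72: case 0x73:               /* descriptor format */
        if (sense_len < 4)
            return false;
        key  = sense[1] & 0x0F;
        asc  = sense[2];
        ascq = sense[3];
        break;
    default:
        return false;                   /* unknown sense format */
    }
    return key == SENSE_KEY_UNIT_ATTENTION &&
           asc == ASC_CAPACITY_CHANGED && ascq == ASCQ_CAPACITY_CHANGED;
}
```

On a true result, the VM would re-read the capacity and re-submit the failed request; that part is left out of the sketch.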

However, the current I/O response handling in the VM has a bug: the status in
the host's response is unintentionally discarded, so the upper SCSI layer in
the VM always considers the I/O request successfully completed even when it is
not (which causes undefined behavior), and the sense data is ignored.
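The bug amounts to overwriting the host's verdict with "success". A hedged C sketch of a corrected completion path, assuming hypothetical, simplified structures (host_response, scsi_request, complete_request()) rather than the real hv_storvsc definitions: copy the SCSI status, derive the residual from the length the host actually transferred, and keep the sense data when the status is CHECK CONDITION.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified shapes; not the real hv_storvsc definitions. */
struct host_response {
    uint8_t  scsi_status;    /* e.g. 0x00 GOOD, 0x02 CHECK CONDITION */
    uint32_t data_xfer_len;  /* bytes the host actually transferred */
    uint8_t  sense_len;
    uint8_t  sense[18];
};

struct scsi_request {
    uint32_t dxfer_len;      /* bytes the upper layer asked for */
    uint8_t  scsi_status;    /* filled on completion */
    uint32_t resid;          /* dxfer_len minus bytes transferred */
    uint8_t  sense_len;
    uint8_t  sense[18];
};

/*
 * Propagate the host's result instead of assuming success.  The buggy
 * handler behaved as if it did: req->scsi_status = 0; req->resid = 0;
 */
static void complete_request(struct scsi_request *req,
                             const struct host_response *rsp)
{
    uint32_t done = rsp->data_xfer_len;

    req->scsi_status = rsp->scsi_status;
    if (done > req->dxfer_len)
        done = req->dxfer_len;          /* never report more than requested */
    req->resid = req->dxfer_len - done;

    req->sense_len = 0;
    if (rsp->scsi_status == 0x02) {     /* CHECK CONDITION: keep sense data */
        uint8_t n = rsp->sense_len;
        if (n > sizeof req->sense)
            n = sizeof req->sense;
        memcpy(req->sense, rsp->sense, n);
        req->sense_len = n;
    }
}
```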

Comment 1 Dexuan Cui 2016-07-11 11:03:51 UTC

This shows the issue:

Dexuan: before the disk capacity change, we have:

[root@decui-b11 ~/bsd.git/sys]# gpart show da1
=>       40  125829040  da1  GPT  (60G)
         40  125829040       - free -  (60G)

Dexuan: after the disk capacity change, we have:
(Note: the first failure is caused by the incorrect handling of the SCSI_STATUS_CHECK_COND error.)

[root@decui-b11 ~/bsd.git/sys]# diskinfo da1
diskinfo: da1: ioctl(DIOCGMEDIASIZE) failed, probably not a disk.
[root@decui-b11 ~/bsd.git/sys]# diskinfo da1
da1     512     75161927680     146800640       4096    0       9137    255     63
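The second diskinfo line can be sanity-checked by hand. As we understand diskinfo(8)'s default one-line output, the fields are: name, sector size, media size in bytes, media size in sectors, stripe size, stripe offset, and (when firmware geometry is known) cylinders, heads and sectors per track. This C sketch, using hypothetical helpers parse_diskinfo(), diskinfo_consistent() and diskinfo_gib(), confirms that 146800640 sectors of 512 bytes are 75161927680 bytes, exactly 70 GiB, so at this point the kernel already sees the new size.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Fields of diskinfo's default one-line output, in order (our reading). */
struct diskinfo_line {
    char     name[32];
    unsigned sectorsize;
    int64_t  mediasize;          /* bytes */
    int64_t  sectors;            /* mediasize / sectorsize */
    int64_t  stripesize;
    int64_t  stripeoffset;
    int64_t  cylinders;          /* mediasize / (heads * spt * sectorsize) */
    unsigned heads;
    unsigned sectors_per_track;
};

/* Returns 1 if all nine fields were parsed, 0 otherwise. */
static int parse_diskinfo(const char *line, struct diskinfo_line *d)
{
    return sscanf(line, "%31s %u %" SCNd64 " %" SCNd64 " %" SCNd64
                  " %" SCNd64 " %" SCNd64 " %u %u",
                  d->name, &d->sectorsize, &d->mediasize, &d->sectors,
                  &d->stripesize, &d->stripeoffset, &d->cylinders,
                  &d->heads, &d->sectors_per_track) == 9;
}

/* Check that the size and geometry fields agree with each other. */
static int diskinfo_consistent(const struct diskinfo_line *d)
{
    int64_t per_cyl = (int64_t)d->heads * d->sectors_per_track * d->sectorsize;

    return d->sectorsize != 0 && per_cyl != 0 &&
           d->mediasize == d->sectors * (int64_t)d->sectorsize &&
           d->cylinders == d->mediasize / per_cyl;
}

/* Media size in whole GiB (2^30 bytes). */
static int64_t diskinfo_gib(const struct diskinfo_line *d)
{
    return d->mediasize >> 30;
}
```

The same check applied to the 80GB line in comment 2 (167772160 sectors, 10443 cylinders) also comes out consistent, at exactly 80 GiB.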

Dexuan: now the new capacity is detected, but the 'free' space remains at the old value:

[root@decui-b11 ~/bsd.git/sys]# gpart show da1
=>       40  125829040  da1  GPT  (70G)
         40  125829040       - free -  (60G)

Dexuan: this is caused by another bug (Bug 210425): rescanning da1 makes the disk disappear:

[root@decui-b11 ~/bsd.git/sys]# camcontrol rescan 2:0:0
Re-scan of 2:0:0 was successful
[root@decui-b11 ~/bsd.git/sys]# gpart show da1
gpart: No such geom: da1.
[root@decui-b11 ~/bsd.git/sys]# camcontrol devlist
<Msft Virtual CD/ROM 1.0>          at scbus0 target 0 lun 0 (cd0,pass0)
<Msft Virtual Disk 1.0>            at scbus1 target 0 lun 0 (da0,pass1)

Comment 2 Dexuan Cui 2016-07-11 11:12:45 UTC

With the fix for Bug 210425 applied, this is what I get after resizing the disk from 70GB to 80GB:

[root@decui-b11 ~/bsd.git/sys]# diskinfo da1
diskinfo: da1: ioctl(DIOCGMEDIASIZE) failed, probably not a disk.
[root@decui-b11 ~/bsd.git/sys]# diskinfo da1
da1     512     85899345920     167772160       4096    0       10443   255     63
[root@decui-b11 ~/bsd.git/sys]# gpart show da1
=>       40  146800560  da1  GPT  (80G)
         40  146800560       - free -  (70G)

[root@decui-b11 ~/bsd.git/sys]# camcontrol rescan 2:0:0
Re-scan of 2:0:0 was successful
[root@decui-b11 ~/bsd.git/sys]# camcontrol devlist
<Msft Virtual CD/ROM 1.0>          at scbus0 target 0 lun 0 (cd0,pass0)
<Msft Virtual Disk 1.0>            at scbus1 target 0 lun 0 (da0,pass1)
<Msft Virtual Disk 1.0>            at scbus2 target 0 lun 0 (da1,pass2)
[root@decui-b11 ~/bsd.git/sys]# gpart show da1
=>       40  146800560  da1  GPT  (80G)
         40  146800560       - free -  (70G)

We can see that the "free" space still shows the old 70GB.

Comment 3 Dexuan Cui 2016-07-11 11:28:13 UTC

I made a patch following Sephe's suggestion: https://reviews.freebsd.org/D7181

Comment 4 Dexuan Cui 2016-07-12 08:59:21 UTC

Two patches have been committed to CURRENT to improve the situation:

commit 2d2d5090052a987069c501ea3a7b73691347dae1
Author: sephe <sephe@FreeBSD.org>
Date:   Tue Jul 12 02:57:13 2016 +0000

    hyperv/stor: Save the response status and xfer length properly.

    The current command response handling discards status and xfer
    length unconditionally, so that all of the commands would be
    considered successful, even if errors happened.  When errors
    really happens, this causes all kinds of wiredness, since the
    buffer will not be filled on the host side and sense data will
    be ignored.

    Most of the time, errors do not happen, however, error does
    happen for the request sent immediately after the disk resizing.
    Discarding the SCSI status (SCSI_STATUS_CHECK_COND) and sense
    data (capacity changes) prevents the disk resizing from working
    properly.

    This commit saves the response status and xfer length properly
    for later use.

    Submitted by:       Dexuan Cui <decui microsoft com>
    Noticed by: sephe
    MFC after:  3 days
    Sponsored by:       Microsoft OSTC
    Differential Revision:      https://reviews.freebsd.org/D7181

commit 187620ec479c064b89a998d295ebf81b71db24ad
Author: sephe <sephe@FreeBSD.org>
Date:   Mon Jul 11 05:17:48 2016 +0000

    hyperv/stor: Fix the INQUIRY checks

    Don't check the area that the host has not filled.

    PR:         https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209443
    PR:         https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210425
    Submitted by:       Hongjiang Zhang <honzhan microsoft com>
    Reviewed by:        sephe, Dexuan Cui <decui microsoft com>
    MFC after:  3 days
    Sponsored by:       Microsoft OSTC
    Differential Revision:      https://reviews.freebsd.org/D6955
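The idea behind the INQUIRY fix, not examining bytes the host did not fill, can be sketched generically in C. This assumes the standard SPC INQUIRY layout (byte 4 is the additional length, so the device reports additional length + 5 bytes in total; the vendor identification occupies bytes 8 to 15) and hypothetical helper names; it is not the code from D6955.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Offsets in standard INQUIRY data (SPC). */
#define INQ_ADDITIONAL_LEN 4   /* byte 4: number of bytes following byte 4 */
#define INQ_VENDOR_OFF     8   /* bytes 8..15: T10 vendor identification */
#define INQ_VENDOR_LEN     8

/*
 * Number of INQUIRY bytes that are actually valid: bounded both by what the
 * host transferred and by what the device claims (additional length + 5).
 */
static size_t inquiry_valid_len(const uint8_t *buf, size_t xfer_len)
{
    size_t claimed;

    if (xfer_len <= INQ_ADDITIONAL_LEN)
        return xfer_len;               /* length byte itself not filled */
    claimed = (size_t)buf[INQ_ADDITIONAL_LEN] + 5;
    return claimed < xfer_len ? claimed : xfer_len;
}

/* Inspect the vendor field only if the host really filled it. */
static bool inquiry_has_vendor(const uint8_t *buf, size_t xfer_len)
{
    return inquiry_valid_len(buf, xfer_len) >= INQ_VENDOR_OFF + INQ_VENDOR_LEN;
}
```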

Comment 5 Dexuan Cui 2016-07-12 09:05:47 UTC

With the 2 patches in comment 4, “camcontrol reprobe da1” reliably detects the new disk capacity, and “gpart show da1” now shows the new “free space”. For MBR mode, no extra command is required. For GPT mode, after “camcontrol reprobe da1”, we need to run “gpart commit da1” to write the GPT partition information updated by the kernel to the disk; without this, we'll have to run “gpart recover da1” after the VM is rebooted.

We're going to merge the fixes to stable/10 and stable/11.

FreeBSD 10.3 doesn't have “camcontrol reprobe”, so for now we have to use this workaround.

After resizing “da1”, run these 3 commands:

dd if=/dev/da1 of=/dev/da1 bs=512 count=0
dd if=/dev/da1 of=/dev/da1 bs=512 count=0   # same as the first line
gpart recover da1                           # not required for MBR mode

Now, “gpart show da1” should see the new disk capacity and new “free space”.

Comment 6 Dexuan Cui 2016-07-12 09:11:24 UTC

(In reply to Dexuan Cui from comment #5)
However, with the CURRENT code, after resizing “da1”, if there is a disk read before “camcontrol reprobe da1”, gpart can't detect the new “free space”, though it does detect the new disk capacity.

The workaround is to open da1 for writing (i.e., dd if=/dev/da1 of=/dev/da1 bs=512 count=0). This is probably a bug in the GEOM code; I'll open a new bug for it.
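The dd workaround opens da1 for writing, transfers nothing (count=0) and closes it again. Our understanding, which is an assumption here, is that GEOM re-tastes a provider when its last writer closes it, which is what lets the partition class re-read the table. A minimal POSIX C equivalent of that open/close step, with a hypothetical function name:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a device read/write and close it again without transferring any
 * data, mirroring what the dd line does with count=0.  Returns 0 on
 * success, -1 (with errno set) if either open() or close() fails.
 */
static int open_close_for_write(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    return close(fd);
}
```

Run as root with "/dev/da1" after the resize, this should have the same effect as the dd line; on GPT it does not replace “gpart recover” or “gpart commit”.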

Comment 7 Mark Linimon freebsd_committer 2016-07-13 17:54:03 UTC

sephe seems to be working on a fix.

Comment 8 Sepherosa Ziehau 2016-07-14 01:32:27 UTC

The disk controller fix has been committed, and it works much better now. However, we suspect there is a GEOM bug that prevents certain disk-resizing use cases from working; see:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211028

Comment 9 Eitan Adler freebsd_committer 2018-01-08 04:14:37 UTC

For the following conditions
Product: Base System, Documentation Status: New, Open, In Progress, UNCONFIRMED 
Assignee: Former FreeBSD committer 

Reset to default assignee. Reset status to "Open".

Comment 10 Dexuan Cui 2018-04-10 17:41:50 UTC

I believe the bug has been fixed, at least in 11.