Bug 212721 - FreeBSD 11.0-RC2/RC3/RELEASE fails on Hyper-V 2012r2
Summary: FreeBSD 11.0-RC2/RC3/RELEASE fails on Hyper-V 2012r2
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-RC1
Hardware: amd64 Any
Importance: Normal Affects Many People
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2016-09-16 07:20 UTC by Alexander
Modified: 2017-02-27 15:28 UTC
CC List: 12 users

See Also:
sa.inbox: mfc-stable11?
koobs: mfc-stable10?


Attachments
dmesg (103.07 KB, image/jpeg), 2016-09-16 07:20 UTC, Alexander
Install FreeBSD 11 through RC3 (272.96 KB, image/jpeg), 2016-09-18 02:58 UTC, Hongjiang
11.0-rc3 on diff IDE- error on partitioning step (239.23 KB, image/jpeg), 2016-09-18 07:40 UTC, Alexander
11.0-rc3 on same IDE- boot failed (88.33 KB, image/jpeg), 2016-09-18 07:41 UTC, Alexander
11.0-RC3 Critical install error (49.40 KB, image/jpeg), 2016-09-19 10:16 UTC, Arkadiy Yaruta
Workable 2012R2 systeminfo (14.48 KB, text/plain), 2016-09-20 01:56 UTC, Hongjiang
Systeminfo of my Hyper-V Server (14.16 KB, text/plain), 2016-09-20 09:07 UTC, Arkadiy Yaruta
dmesg screenshot of 11.0-RC3 installation boot (36.58 KB, image/png), 2016-09-20 10:25 UTC, Chris K
Patch to fix the issue (700 bytes, patch), 2016-09-26 03:39 UTC, Hongjiang
screenshot of errors compiling GENERIC w/ proposed patch on 10-STABLE (20.87 KB, image/png), 2016-09-26 16:57 UTC, Terrence Koeman
Lock order reversal on 11-STABLE with proposed patch applied (21.57 KB, image/png), 2016-09-29 21:51 UTC, Terrence Koeman

Description Alexander 2016-09-16 07:20:35 UTC
Created attachment 174824 [details]
dmesg

I tried to install 11.0-RC3 on Hyper-V 2012r2 (Generation 1 VM).
Installation failed on disk partitioning step (Auto-UFS):
------------------------Abort-----------------
|Partitioning error                          |
|                                            |
|An installation step has been aborted. Would|
|you like to restart the installation or exit|
|the installer?                              |
    <Restart>     <Exit>                     |
---------------------------------------------

dmesg available in attachment.
10.3 installs without issues to the same VM.

Such behavior is a disaster for our environment.
Comment 1 Alexander 2016-09-16 08:29:46 UTC
The same problem reported on support forum https://forums.freebsd.org/threads/57682/
Comment 2 Sepherosa Ziehau 2016-09-18 02:21:29 UTC
Can you post the Hyper-V VM configuration, i.e. a snapshot of the VM settings?
Comment 3 Hongjiang 2016-09-18 02:58:08 UTC
Created attachment 174898 [details]
Install FreeBSD 11 through RC3
Comment 4 Hongjiang 2016-09-18 02:59:05 UTC
I tried to install FreeBSD 11 from the RC3 ISO on my 2012 R2 host, and there is no issue. In my VM settings, the HDD is on "IDE Controller 0", LUN 0, and the DVD is on "IDE Controller 1", LUN 0. See my attachment, which shows my 2012 R2 version and VM settings.

But if I swap the IDE controllers for the HDD and DVD, the installation fails. That is a known issue, and a patch is under review (https://reviews.freebsd.org/D7693).

Could you please tell me what your VM settings are for the HDD and DVD?
Comment 5 Alexander 2016-09-18 07:40:06 UTC
Created attachment 174900 [details]
11.0-rc3 on diff IDE- error on partitioning step
Comment 6 Alexander 2016-09-18 07:41:22 UTC
Created attachment 174901 [details]
11.0-rc3 on same IDE- boot failed
Comment 7 Alexander 2016-09-18 07:43:58 UTC
I did tests in the following scenario:
- 2012r2 Datacenter in failover cluster configuration
- VM stored on SMB3 share (2012r2 storage server)
- all up-to-date patches installed on 2012r2 machines
- Cluster validation configuration reports no errors

I tested 2 configurations:
- FreeBSD 11.0-RC3 VM with the vhdx (200 GB) and DVD on different IDE controllers: installation failed at the disk partitioning step (for both fixed-size and dynamically sized disks)
- FreeBSD 11.0-RC3 VM with the vhdx (200 GB) and DVD on the same IDE controller: boot from CD failed

I've uploaded the relevant screenshots.

Are you able to install 11.0-RC3 on a large virtual disk (e.g. 200 GB)?
Comment 8 Sepherosa Ziehau 2016-09-18 07:46:50 UTC
(In reply to Alexander from comment #7)

Is the vhdx saved on the SMB3 too?

Thanks,
sephe
Comment 9 Alexander 2016-09-18 07:52:26 UTC
Yes. The vhdx and all VM files are located on the SMB3 share. This VM works with FreeBSD 10.3-RELEASE, but 11.0-RC3 fails to install.
BR,
Alexander
Comment 10 Arkadiy Yaruta 2016-09-19 10:13:54 UTC
I got the same disk partitioning error while trying to install 11.0-RC3 on a 2012r2 Hyper-V host with local storage. 11.0-RC3 fails to install on both a small (<32 GB) vhdx and a 200 GB vhdx on an internal (non-SMB3) server disk.
Comment 11 Arkadiy Yaruta 2016-09-19 10:16:12 UTC
Created attachment 174948 [details]
11.0-RC3 Critical install error
Comment 12 Hongjiang 2016-09-20 01:55:55 UTC
I have tried to install 11.0-RC3 (ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/11.0/FreeBSD-11.0-RC3-amd64-dvd1.iso) on my 2012r2 (Generation 1 VM). I used both SMB3 shared storage and local disk, and both work fine. The vhdx I used is dynamically sized.
I have tried a 30G vhdx and a 200G vhdx.

I guess the failure must be caused by some environment or version difference.

See my 2012r2 systeminfo. You can compare it with your local 2012r2 details using the output of the "systeminfo" command.
Comment 13 Hongjiang 2016-09-20 01:56:21 UTC
Created attachment 174979 [details]
Workable 2012R2 systeminfo
Comment 14 Hongjiang 2016-09-20 02:28:22 UTC
Do you mean that FreeBSD 11 RC3 installs successfully on a VHDX larger than 32G (other than the 200G one)? In other words, is the failure related to VHDX size?
(In reply to Arkadiy from comment #10)
Comment 15 Arkadiy Yaruta 2016-09-20 09:05:51 UTC
I was unable to install 11.0-RC3. Installation failed at the disk partitioning step on a newly created vhdx (tried 30 GB and 200 GB).
Moreover, I tried to install 11.0-RC3 and 11.0-RC2 on my laptop with Hyper-V and got the same error.
If I install 10.3 on my Hyper-V server or on the laptop with Hyper-V, it works fine.
I’ve downloaded all .iso files from ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/...
Comment 16 Arkadiy Yaruta 2016-09-20 09:07:34 UTC
Created attachment 174994 [details]
Systeminfo of my Hyper-V Server
Comment 17 Chris K 2016-09-20 10:23:31 UTC
Disk partitioning fails because the virtual disks are enumerated and then detached (and the periph destroyed).
Escaping to the shell and rescanning the bus(es) with camcontrol rescan re-attaches the disks, which allows partitioning and installation to proceed, but rebooting after installation results in the virtual disks being detached again, followed by a mountroot panic.

Can repro this on multiple 2012 R2 systems, clean install, Hyper-V role, up-to-date with all available Automatic Updates installed, new VMs using default configuration.
Comment 18 Chris K 2016-09-20 10:25:13 UTC
Created attachment 174998 [details]
dmesg screenshot of 11.0-RC3 installation boot
Comment 19 Dexuan Cui 2016-09-20 10:59:39 UTC
(In reply to Chris K from comment #17)

Just now I installed 11-RC3 VM (ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/11.0/FreeBSD-11.0-RC3-amd64-dvd1.iso) without any issue and the VM worked fine for me.

I used the default VM configuration. I tried 20GB and 40GB local vhdx.

It's strange that I couldn't reproduce the issue.
Comment 20 Hongjiang 2016-09-21 04:37:46 UTC
I cannot reproduce the issue either.

Could you please try my VHDXs in your local environment? They are fresh installs of FreeBSD 11 RC3 on Windows 2012r2 and Windows 10.

https://honzhancustomer.blob.core.windows.net/freebsd11rc3issues/f11r_2012r2.vhdx
https://honzhancustomer.blob.core.windows.net/freebsd11rc3issues/FreeBSD11RC3.vhdx

The login account: root
passwd: User@123
Comment 21 Arkadiy Yaruta 2016-09-21 15:12:15 UTC
I figured out that I can install version 11 RC1 and earlier without problems. Versions 11 RC2 and RC3 have problems installing on my laptop (Windows 8.1 Ent with the latest updates, Hyper-V, CPU Core i7 2,9G, RAM 6Gb). I do not know why this is happening.
Comment 22 Tomáš Randa 2016-09-22 06:28:14 UTC
This problem is related to MS update KB3172614 or KB3179574. Uninstalling them makes the VPS or installation work again, but I don't know the exact reason why.
Comment 23 Arkadiy Yaruta 2016-09-22 08:19:11 UTC
Thank you Tomas for this solution. I uninstalled both updates KB3172614 and KB3179574, after that I installed FreeBSD 11 RC3 on Hyper-V VM on my laptop without problems.
Comment 24 Alexander 2016-09-22 10:48:45 UTC
The problem with 11.0-RC3 definitely disappears after uninstalling both the KB3172614 and KB3179574 updates. Why does FreeBSD 10.3 work when these updates are installed?
At which layer should this be fixed: FreeBSD or Hyper-V?
Comment 25 Chris K 2016-09-22 14:48:19 UTC
(In reply to Hongjiang from comment #20)

Same mountroot panic. virtual disk is enumerated as da0, then immediately detached.
Comment 26 Chris K 2016-09-22 15:07:37 UTC
(In reply to Hongjiang from comment #20)

If I uninstall KB3172614 from both my Hyper-V hosts I'm able to successfully install and then boot into FreeBSD 11.0-RC3, as well as successfully boot into the VHDX you supplied.

The Windows 8.1 and Windows Server 2012 R2 update history page (https://support.microsoft.com/en-au/help/24717/windows-8-1-windows-server-2012-r2-update-history) calls out the following Hyper-V related change:

* When you try to configure connecting a SCSI storage device to a Windows Hyper-V Host, the Host will not recognize the SCSI storage device when Logical Unit (LUN) 0 is not present.

I suspect this change (along with the Hyper-V code changes in FreeBSD since 10.3) is triggering the virtual disk being detached from the bus.

As a reference, FreeBSD 10.3 release installs correctly with or without KB3172614 installed. So there's definitely a regression in the Hyper-V codebase in FreeBSD 11.0-RC3.
Comment 27 Hongjiang 2016-09-23 02:50:37 UTC
I had not applied either the KB3172614 or the KB3179574 update on my local 2012r2. Thanks for your investigation.
Comment 29 Terrence Koeman 2016-09-25 18:16:11 UTC
This seems to be what I'm hitting on Windows 2012 R2 with all current updates. I get:

storvsc0: <Hyper-V IDE Storage Interface> on vmbus0
(probe0:blkvsc0:0:0:0): storvsc scsi_status = 2
(probe0:blkvsc0:0:1:1): invalid LUN 1
...
(probe0:blkvsc0:0:0:1): invalid LUN 1
da0 at blkvsc0 bus 0 scbus1 target 0 lun 0
da0: <specs>
da0 at blkvsc0 bus 0 scbus1 target 0 lun 0
da0: <Msft Virtual Disk 1.0> detached


I first tried HardenedBSD 11-STABLE, did not work. Then I tried HardenedBSD 10-STABLE, that did not work either.

Then I tried FreeBSD 10.1-RELEASE and that did work, as did 10.2-REL and 10.3-REL. 11.0-REL does not work, nor does 11.0-RC3. Finally I installed from 10.3-REL CD and updated to 10-STABLE and the system didn't boot anymore after rebuilding GENERIC.

I think it's safe to assume some kind of change in FreeBSD since 11.0-RC1 (2016/08/13) but before 11-RC2 (2016/08/25) causes this.

Unless it's known what change causes this I'll start building & testing different dates of STABLE to pinpoint the exact date of the change.
Comment 30 Terrence Koeman 2016-09-25 18:21:32 UTC
I just noticed that RC3 gives me an error that I don't get on 10-STABLE or 11-REL:

(da0:blkvsc0:0:0:0): fatal error, could not acquire reference count

It does not recognize and then detach da0, it just doesn't recognize it at all. The other (error) messages are the same (storvsc scsi_status and invalid LUN).
Comment 31 Terrence Koeman 2016-09-25 23:36:18 UTC
Okay, I looked at the changes to hyper-v files in the window, then I tested a candidate and this change seems to have introduced the bug:

https://svnweb.freebsd.org/base/stable/10/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c?revision=304581&view=markup

10-STABLE r304580 works fine and r304581 and up does not.
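
When the offending change is not obvious from the commit log, the same kind of good/bad boundary can be found by bisecting the revision range. The following is only a minimal sketch of that search, not a description of how the revision above was actually found. It assumes every revision from the first bad one onward shows the bug. `first_bad_revision`, `rev_is_bad_fn`, and `example_is_bad` are hypothetical names, and `example_is_bad` merely encodes the r304580/r304581 result in place of a real build-and-boot test.

```c
#include <stdbool.h>

/* Hypothetical callback: true if the kernel built from `rev` shows the bug. */
typedef bool (*rev_is_bad_fn)(long rev);

/*
 * Return the first bad revision in (good, bad]. Assumes `good` is known to
 * work, `bad` is known to fail, and every revision after the first bad one
 * also fails.
 */
static long
first_bad_revision(long good, long bad, rev_is_bad_fn is_bad)
{
	while (bad - good > 1) {
		long mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* bug already present at mid */
		else
			good = mid;	/* mid still works */
	}
	return (bad);
}

/*
 * Stand-in for "build, boot, check whether da0 gets detached". It only
 * encodes the result reported in this comment.
 */
static bool
example_is_bad(long rev)
{
	return (rev >= 304581);
}
```

Calling `first_bad_revision(304000, 305000, example_is_bad)` narrows the range in about ten steps; in practice each step would be a kernel build and a test boot, so the number of steps is what matters.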

Could someone more knowledgeable tell me how to work around this problem? Can I just use the most recent revision of 10-STABLE and only revert the files in sys/dev/hyperv/storvsc (that were changed in r304581) back to 304580? Or are there more files that I should also revert to make it work?

I need a HardenedBSD 10-STABLE running in h-v tomorrow, so I'm hoping I can just fiddle a bit with the files and get it working until a fix is committed.

Removing the MS update on the host would be more of a pain unfortunately.

Thanks!
Comment 32 Hongjiang 2016-09-26 03:38:46 UTC
(In reply to Terrence Koeman from comment #31)
Thanks for your investigation. We have a pending patch for another issue that also fixes yours (https://reviews.freebsd.org/D7693). The simplest patch to fix it is attached.
Comment 33 Hongjiang 2016-09-26 03:39:14 UTC
Created attachment 175166 [details]
Patch to fix the issue
Comment 34 Hongjiang 2016-09-26 03:55:21 UTC
You can try my VHDX which also included this fix.
https://honzhanbug212721.blob.core.windows.net/honzhan212721/hz_testBSD11RC3.vhdx
Comment 35 Terrence Koeman 2016-09-26 10:45:43 UTC
(In reply to Hongjiang from comment #32)

I can successfully apply the patch at https://reviews.freebsd.org/D7693 to 10-STABLE, however building GENERIC then fails. I'm currently rebuilding with -j1 to see what's up.

The attached patch doesn't apply (the original code in 10-STABLE is missing the if statement just after the comment).
Comment 36 Terrence Koeman 2016-09-26 16:57:26 UTC
Created attachment 175183 [details]
screenshot of errors compiling GENERIC w/ proposed patch on 10-STABLE

Screenshot of the first errors compiling GENERIC with the proposed patch applied on 10-STABLE.
Comment 37 Dexuan Cui 2016-09-27 05:39:34 UTC
(In reply to Terrence Koeman from comment #35)
(In reply to Terrence Koeman from comment #36)

Hi Terrence, are you using the latest stable/10 branch?
The patch context in stable/10 and the HEAD branch should be the same.

If it doesn't apply cleanly to stable/10 for some reason, you may try manually updating the code by replacing this line in stable/10

ccb->ccb_h.status |= CAM_SEL_TIMEOUT;

with the related new block of code

if (storvsc_get_storage_type(sc->hs_dev) == DRIVER_STORVSC)
	ccb->ccb_h.status |= CAM_SEL_TIMEOUT;
else
	ccb->ccb_h.status |= CAM_DEV_NOT_THERE;
.

In stable/10, we also need to replace "sc->hs_dev" with "sc->hs_dev->device".

Hope this can fix the compiling issue.
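
The effect of the new block can be illustrated outside the kernel. This is a minimal user-space sketch, not the driver code: the enum names mirror the identifiers in the snippet above, the numeric status values are placeholders rather than the real CAM constants, and `status_for_error` is a hypothetical helper. The comments mapping DRIVER_BLKVSC to IDE and DRIVER_STORVSC to SCSI are my reading of the "Hyper-V IDE Storage Interface" / blkvsc0 lines in the logs above.

```c
/* Stand-in for the driver's storage-type enum (names as in the snippet). */
enum hv_storage_type {
	DRIVER_BLKVSC,		/* assumed: disk on the emulated IDE controller */
	DRIVER_STORVSC,		/* assumed: disk on the synthetic SCSI controller */
	DRIVER_UNKNOWN
};

/* Placeholder status codes; the real values are defined by CAM. */
enum sketch_cam_status {
	SKETCH_CAM_DEV_NOT_THERE = 1,
	SKETCH_CAM_SEL_TIMEOUT = 2
};

/*
 * Hypothetical helper mirroring the if/else above: only the SCSI-attached
 * type keeps reporting a selection timeout; every other type reports that
 * the device is not there.
 */
static enum sketch_cam_status
status_for_error(enum hv_storage_type type)
{
	if (type == DRIVER_STORVSC)
		return (SKETCH_CAM_SEL_TIMEOUT);
	return (SKETCH_CAM_DEV_NOT_THERE);
}
```

Under these assumptions, `status_for_error(DRIVER_BLKVSC)` yields the "device not there" status, which corresponds to the IDE-attached disks in this report no longer taking the CAM_SEL_TIMEOUT path.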
Comment 38 Terrence Koeman 2016-09-28 01:35:45 UTC
(In reply to Dexuan Cui from comment #37)

Thanks, I did use the latest 10-STABLE. It looked like everything was successfully applied, but the compile just wouldn't work.

At the moment I commented out the line "ccb->ccb_h.status |= CAM_SEL_TIMEOUT;", and that seems to have solved the problem for me.

I figured that's what was causing the detach. Is it safe to leave it like that until a fix is committed to the tree?

Thanks.
Comment 39 Dexuan Cui 2016-09-28 02:27:12 UTC
(In reply to Terrence Koeman from comment #38)
We'll get Hongjiang's fix committed to the HEAD and later will MFC it to stable/10. I suppose this would take a week or so.
Comment 40 Dexuan Cui 2016-09-28 05:13:35 UTC
(In reply to Terrence Koeman from comment #38)
Hi Terrence,
To fix this issue for 10-stable, you don't need to apply the whole patch at https://reviews.freebsd.org/D7693; instead, you only need to apply the patch in Comment 33 and change "sc->hs_dev" to "sc->hs_dev->device". This should build fine according to my test against today's 10-stable code.
Comment 41 Terrence Koeman 2016-09-29 21:51:03 UTC
Created attachment 175289 [details]
Lock order reversal on 11-STABLE with proposed patch applied

(In reply to Dexuan Cui from comment #37)

I've made the changes you recommended, and I'm noticing frequent lock order reversals, especially at boot and shutdown.

Attached is a screenshot of one, it looks like it has to do with the storage subsystem. Doesn't halt execution, and I haven't seen any corruption. It also does not appear that these reversals become more frequent when disk I/O increases.

Is this an indication of something wrong, or just a debugging notice I should ignore or disable?
Comment 42 Sepherosa Ziehau 2016-09-30 01:47:49 UTC
(In reply to Terrence Koeman from comment #41)

This is a well known LOR.
Comment 43 Sepherosa Ziehau 2016-10-10 08:21:01 UTC
It will be MFCed to 10-stable and 11-stable.
Comment 44 Sepherosa Ziehau 2016-10-19 09:00:25 UTC
The MFCs to stable/10 and stable/11 are done.
Comment 45 DmDS 2016-11-13 12:42:29 UTC
Is this working?
I'm sorry, I haven't tried this on FreeBSD 11,
but it works for me on pfSense 2.4 (FreeBSD 11).

On the installer boot prompt, select
3. Escape to the loader prompt
set hw.ata.disk_enable=1
boot

After installation, do it again.

After boot, exit to the shell with 8.
ee /boot/loader.conf
Add the line
hw.ata.disk_enable="1"
Comment 46 Hongjiang 2016-11-14 03:08:15 UTC
10-stable and 11-stable have fixed this issue. You can try 11-stable if you are using 11-release and encounter this issue.
Comment 47 Alessandro 2016-12-08 02:53:52 UTC
This issue is causing errors with FreeBSD 11.0-RELEASE (up to p4) running on Azure as well. I've seen multiple instances of VMs failing to boot because of this bug.
Comment 48 Alexander 2017-02-27 15:28:49 UTC
It is fixed now (FreeBSD-EN-17:03.hyperv)
https://www.freebsd.org/security/advisories/FreeBSD-EN-17:03.hyperv.asc