Summary: | [Hyper-V] NICs' (hn0, hn1) MAC addresses can appear in an uncertain way across reboot | ||
---|---|---|---|
Product: | Base System | Reporter: | Dexuan Cui <decui> |
Component: | kern | Assignee: | Xin LI <delphij> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | decui, delphij, honzhan, kyliel |
Priority: | --- | ||
Version: | CURRENT | ||
Hardware: | amd64 | ||
OS: | Any |
Description
Dexuan Cui
2015-12-09 12:29:37 UTC
Patched in head/ as r292859. MFC reminder. A commit references this bug: Author: delphij Date: Tue Dec 29 07:54:56 UTC 2015 New revision: 292859 URL: https://svnweb.freebsd.org/changeset/base/292859 Log: hyperv: vmbus: remove the per-channel control_work_queue Now vmbus_channel_on_offer() -> vmbus_channel_process_offer() can safely run on the global hv_vmbus_g_connection.work_queue now. We remove the per-channel control_work_queue to achieve the proper serialization of the message handling. I removed the bogus TODO in vmbus_channel_on_offer(): a vmbus offer can only come from the parent partition, i.e., the host. PR: kern/205156 Submitted by: Dexuan Cui <decui microsoft com> Reviewed by: Howard Su <howard0su gmail com>, delphij MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4597 Changes: head/sys/dev/hyperv/include/hyperv.h head/sys/dev/hyperv/vmbus/hv_channel_mgmt.c To fix this bug, we need https://reviews.freebsd.org/D4596 and https://reviews.freebsd.org/D4597 (In reply to Dexuan Cui from comment #3) Ah sorry for that. I have committed D4596 as well. Thanks a lot, Xin! I saw the other revision: r292861 | delphij | 2015-12-29 16:19:43 +0800 (Tue, 29 Dec 2015) | 35 lines hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr() We'll remove the per-channel control_work_queue because it can't properly do serialization of message handling, e.g., when there are 2 NIC devices, vmbus_channel_on_offer() -> hv_queue_work_item() has a race condition: for an SMP VM, vmbus_channel_process_offer() can run concurrently on different CPUs and if the second NIC's vmbus_channel_process_offer() -> hv_vmbus_child_device_register() runs first, the second NIC's name will be hn0 and the first NIC's name will be hn1! We can fix the race condition by removing the per-channel control_work_queue and run all the message handlers in the global hv_vmbus_g_connection.work_queue -- we'll do this in the next patch. With the coming next patch, we have to run the non-blocking handlers directly in the kernel thread vmbus_msg_swintr(), because the special handling of sub-channel: when a sub-channel (e.g., of the storvsc driver) is received and being handled in vmbus_channel_on_offer() running on the global hv_vmbus_g_connection.work_queue, vmbus_channel_process_offer() invokes channel->sc_creation_callback, i.e., storvsc_handle_sc_creation, and the callback will invoke hv_vmbus_channel_open() -> hv_vmbus_post_message and expect a further reply from the host, but the handling of the further messag can't be done because the current message's handling hasn't finished yet; as result, hv_vmbus_channel_open() -> sema_timedwait() will time out and th device can't work. Also renamed the handler type from hv_pfn_channel_msg_handler to vmbus_msg_handler: the 'pfn' and 'channel' in the old name make no sense. Submitted by: Dexuan Cui <decui microsoft com> Reviewed by: royger MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4596 A commit references this bug: Author: delphij Date: Wed Jan 13 08:22:54 UTC 2016 New revision: 293820 URL: https://svnweb.freebsd.org/changeset/base/293820 Log: MFC r292861: hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr() We'll remove the per-channel control_work_queue because it can't properly do serialization of message handling, e.g., when there are 2 NIC devices, vmbus_channel_on_offer() -> hv_queue_work_item() has a race condition: for an SMP VM, vmbus_channel_process_offer() can run concurrently on different CPUs and if the second NIC's vmbus_channel_process_offer() -> hv_vmbus_child_device_register() runs first, the second NIC's name will be hn0 and the first NIC's name will be hn1! We can fix the race condition by removing the per-channel control_work_queue and run all the message handlers in the global hv_vmbus_g_connection.work_queue -- we'll do this in the next patch. With the coming next patch, we have to run the non-blocking handlers directly in the kernel thread vmbus_msg_swintr(), because the special handling of sub-channel: when a sub-channel (e.g., of the storvsc driver) is received and being handled in vmbus_channel_on_offer() running on the global hv_vmbus_g_connection.work_queue, vmbus_channel_process_offer() invokes channel->sc_creation_callback, i.e., storvsc_handle_sc_creation, and the callback will invoke hv_vmbus_channel_open() -> hv_vmbus_post_message and expect a further reply from the host, but the handling of the further messag can't be done because the current message's handling hasn't finished yet; as result, hv_vmbus_channel_open() -> sema_timedwait() will time out and th device can't work. Also renamed the handler type from hv_pfn_channel_msg_handler to vmbus_msg_handler: the 'pfn' and 'channel' in the old name make no sense. Submitted by: Dexuan Cui <decui microsoft com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D4596 MFC r292859: hyperv: vmbus: remove the per-channel control_work_queue Now vmbus_channel_on_offer() -> vmbus_channel_process_offer() can safely run on the global hv_vmbus_g_connection.work_queue now. We remove the per-channel control_work_queue to achieve the proper serialization of the message handling. I removed the bogus TODO in vmbus_channel_on_offer(): a vmbus offer can only come from the parent partition, i.e., the host. PR: kern/205156 Submitted by: Dexuan Cui <decui microsoft com> Reviewed by: Howard Su <howard0su gmail com>, delphij Differential Revision: https://reviews.freebsd.org/D4597 Changes: _U stable/10/ stable/10/sys/dev/hyperv/include/hyperv.h stable/10/sys/dev/hyperv/vmbus/hv_channel_mgmt.c stable/10/sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c stable/10/sys/dev/hyperv/vmbus/hv_vmbus_priv.h The issue has been fixed in 10.3 and 11-CURRENT. I don't think we want to make errata for 10.1 and 10.2. So, I suppose we can close the bug. |