Bug 271456 - SR-IOV: mce(4) VF interface in a bridge can't be pinged or ping IPs outside of a VM
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 13.2-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
Reported: 2023-05-16 19:43 UTC by benoitc
Modified: 2023-08-16 22:55 UTC

Attachments
ip offered by the router is marked as active (175.32 KB, image/png)
2023-05-23 16:38 UTC, benoitc
Description benoitc 2023-05-16 19:43:49 UTC
I have set up SR-IOV on a Mellanox ConnectX-4 Lx, mce(4), with this configuration:

```
PF {
    device: "mlx5_core0";
    num_vfs: 8;
}

DEFAULT {
    passthrough: true;
}

VF-0 {
    passthrough: false;
}

VF-1 {
    mac-addr: "2e:79:38:0d:0f:9e";
    passthrough: true;
}


```

This gives me the `mce2` interface (a ConnectX-4 Lx VF),
which I added to a bridge like this:

```
mce2: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=7ead00b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,HWRXTSTMP,NOMAP,TXTLS4,TXTLS6,VXLAN_HWCSUM,VXLAN_HWTSO>
	ether ba:88:73:e7:93:7d
	media: Ethernet 25GBase-SR <full-duplex,rxpause,txpause>
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vm-public: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
	ether 6a:db:8d:8d:ff:63
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 8 priority 128 path cost 2000000
	member: mce2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 6 priority 128 path cost 55
	groups: bridge vm-switch viid-4c918@
	nd6 options=9<PERFORMNUD,IFDISABLED>
```

When I start a FreeBSD VM with vm-bhyve, the VM boots and is able to acquire an IP address via DHCP, but I can't ping it.

When I start a VM using a passthrough interface instead, everything works as expected. Any idea what the issue could be?
Comment 1 benoitc 2023-05-16 20:12:05 UTC
dmesg log of the VM:

```
Loading kernel...
/boot/kernel/kernel text=0x18aa98 text=0xdfd150 text=0x675154 data=0x140 data=0x1c38e8+0x43b718 0x8+0x18fe70+0x8+0x1ae449|
Loading configured modules...
/boot/entropy size=0x1000
/etc/hostid size=0x25
---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
VT: init without driver.
CPU: Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz (2100.00-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50657  Family=0x6  Model=0x55  Stepping=7
  Features=0x9f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT,PBE>
  Features2=0xfede7a17<SSE3,PCLMULQDQ,DTES64,DS_CPL,SSSE3,SDBG,FMA,CX16,xTPR,PCID,DCA,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0x10150fb9<FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,AVX512F,RDSEED,SMAP,AVX512CD>
  Structured Extended Features3=0x400<MD_CLEAR>
  XSAVE Features=0x1<XSAVEOPT>
  TSC: P-state invariant
Hypervisor: Origin = "bhyve bhyve "
real memory  = 536870912 (512 MB)
avail memory = 481677312 (459 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BHYVE  BVAPIC >
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-31
random: entropy device external interface
kbd1 at kbdmux0
smbios0: <System Management BIOS> at iomem 0xf1000-0xf101e
smbios0: Version: 2.6, BCD Revision: 2.6
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
acpi0: <BHYVE BVXSDT >
acpi0: Power Button (fixed)
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 16777216 Hz quality 950
Event timer "HPET" frequency 16777216 Hz quality 550
Event timer "HPET1" frequency 16777216 Hz quality 450
Event timer "HPET2" frequency 16777216 Hz quality 450
Event timer "HPET3" frequency 16777216 Hz quality 450
Event timer "HPET4" frequency 16777216 Hz quality 450
Event timer "HPET5" frequency 16777216 Hz quality 450
Event timer "HPET6" frequency 16777216 Hz quality 450
Event timer "HPET7" frequency 16777216 Hz quality 450
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: could not evaluate _ADR - AE_NOT_FOUND
pci0: <ACPI PCI bus> on pcib0
virtio_pci0: <VirtIO PCI (legacy) Block adapter> port 0x2000-0x207f mem 0xc0000000-0xc0001fff irq 16 at device 4.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci0
vtblk0: 20480MB (41943040 512 byte sectors)
virtio_pci1: <VirtIO PCI (legacy) Network adapter> port 0x2080-0x20bf mem 0xc0002000-0xc0003fff irq 17 at device 5.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci1
vtnet0: Ethernet address: 58:9c:fc:07:d3:a7
vtnet0: netmap queues/slots: TX 1/1024, RX 1/512
000.000147 [ 450] vtnet_netmap_attach       vtnet attached txq=1, txd=1024 rxq=1, rxd=512
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
vmgenc0: <VM Generation Counter> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
driver bug: Unable to set devclass (class: atkbdc devname: (unknown))
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 14.0.
psm0: model Generic PS/2 mouse, device ID 0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart2: <16550 or compatible> port 0x3e8-0x3ef irq 4 on acpi0
uart3: <16550 or compatible> port 0x2e8-0x2ef irq 3 on acpi0
vga0: <Generic ISA VGA> at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff pnpid PNP0900 on isa0
Timecounter "TSC" frequency 2095025872 Hz quality 1000
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
Setting hostuuid: 4325916e-b608-4b41-a265-b221d150d2e4.
Setting hostid: 0x8aeb4d18.
Starting file system checks:
/dev/gpt/rootfs: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/rootfs: clean, 3891969 free (49 frags, 486490 blocks, 0.0% fragmentation)
/dev/gpt/efiesp: FILESYSTEM CLEAN; SKIPPING CHECKS
Mounting local filesystems:.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat
32-bit compatibility ldconfig path: /usr/lib32
Setting hostname: freebsd.
Setting up harvesting: VMGENID,PURE_RDRAND,[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
Feeding entropy: .
lo0: link state changed to UP
vtnet0: link state changed to UP
Starting Network: lo0 vtnet0.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
	ether 58:9c:fc:07:d3:a7
	inet6 fe80::5a9c:fcff:fe07:d3a7%vtnet0 prefixlen 64 scopeid 0x1
	media: Ethernet autoselect (10Gbase-T <full-duplex>)
	status: active
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
Starting devd.
Starting dhclient.
DHCPREQUEST on vtnet0 to 255.255.255.255 port 67
DHCPREQUEST on vtnet0 to 255.255.255.255 port 67
DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 4
DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 7
DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 17
DHCPOFFER from 10.102.1.1
DHCPREQUEST on vtnet0 to 255.255.255.255 port 67
DHCPACK from 10.102.1.1
bound to 10.102.1.249 -- renewal in 900 seconds.
add host 127.0.0.1: gateway lo0 fib 0: route already in table
add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net ::ffff:0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Updating /var/run/os-release done.
Updating motd:.
Clearing /tmp (X related).
Creating and/or trimming log files.
Starting syslogd.
Mounting late filesystems:.
Starting sendmail_submit.
```

tcpdump:

```
sudo tcpdump -i mce2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on mce2, link-type EN10MB (Ethernet), capture size 262144 bytes
20:11:21.159417 18:fd:74:05:15:d7 (oui Unknown) > Broadcast, ethertype Unknown (0x9003), length 64:
	0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0030:  0000                                     ..
20:11:26.159345 18:fd:74:05:15:d7 (oui Unknown) > Broadcast, ethertype Unknown (0x9003), length 64:
	0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0030:  0000                                     ..
20:11:26.721353 IP 10.102.1.249.49307 > one.one.one.one.domain: 11545+ A? freebsd. (25)
20:11:31.159520 18:fd:74:05:15:d7 (oui Unknown) > Broadcast, ethertype Unknown (0x9003), length 64:
	0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0030:  0000
```
Comment 2 benoitc 2023-05-16 20:14:41 UTC
tcpdump inside the VM:

```
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vtnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:13:52.004897 18:fd:74:05:15:d7 (oui Unknown) > Broadcast, ethertype Unknown (0x9003), length 64:
        0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000                                     ..
20:13:57.012404 18:fd:74:05:15:d7 (oui Unknown) > Broadcast, ethertype Unknown (0x9003), length 64:
        0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000

```

I'm only pinging this machine from another.
Comment 3 benoitc 2023-05-23 16:38:09 UTC
Created attachment 242345 [details]
ip offered by the router is marked as active

Attached is a screenshot showing that the IP received by the VM was offered by the router and is marked as active.

I can't ping the VM from the router, nor the bridge from the VM.

I tried with a different VLAN, with the same result. I can't understand why the VM is able to get an IPv4 address from the router using DHCP but can't ping it...

Let me know if you need more information to debug this issue.
Comment 4 benoitc 2023-05-23 17:57:38 UTC
Same result with jails. Do virtual devices created with SR-IOV work in a bridge? I can set the IP of the interface, but once I put it in the bridge it doesn't work.

For the jail, here is the Bastille configuration:

```
testing {
  devfs_ruleset = 13;
  enforce_statfs = 2;
  exec.clean;
  exec.consolelog = /var/log/bastille/testing_console.log;
  exec.start = '/bin/sh /etc/rc';
  exec.stop = '/bin/sh /etc/rc.shutdown';
  host.hostname = testing;
  mount.devfs;
  mount.fstab = /usr/local/bastille/jails/testing/fstab;
  path = /usr/local/bastille/jails/testing/root;
  securelevel = 2;

  vnet;
  vnet.interface = e0b_bastille0;
  exec.prestart += "jib addm bastille0 mce2";
  exec.prestart += "ifconfig e0a_bastille0 description \"vnet host interface for Bastille jail testing\"";
  exec.poststop += "jib destroy bastille0";
}
```

/etc/rc.conf:

```
syslogd_flags="-ss"
sendmail_enable="NO"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"
cron_flags="-J 60"
ifconfig_e0b_bastille0_name="vnet0"
ifconfig_vnet0="inet6 XXXX:XXXX:XXXX:102::20/64"
defaultrouter="XXXX:XXXX:XXXX:102::1"
```

On the host, the network is configured like this:

```
mce2: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=7ead00b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,HWRXTSTMP,NOMAP,TXTLS4,TXTLS6,VXLAN_HWCSUM,VXLAN_HWTSO>
	ether a2:5b:2e:82:9f:09
	inet 0.0.5.220 netmask 0xff000000 broadcast 0.255.255.255
	inet6 fe80::a05b:2eff:fe82:9f09%mce2 prefixlen 64 tentative scopeid 0x6
	media: Ethernet 25GBase-SR <full-duplex,rxpause,txpause>
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
mce2bridge: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
	ether 58:9c:fc:10:fc:41
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: e0a_bastille0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 8 priority 128 path cost 2000
	member: mce2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 6 priority 128 path cost 800
	groups: bridge
	nd6 options=9<PERFORMNUD,IFDISABLED>
e0a_bastille0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
	description: vnet host interface for Bastille jail testing
	options=8<VLAN_MTU>
	ether 0a:20:98:82:9f:09
	hwaddr 02:00:91:24:1e:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
```
Comment 5 benoitc 2023-05-27 18:36:35 UTC
Is it expected behaviour that a VF cannot be used inside a bridge?
Comment 6 benoitc 2023-06-02 07:08:43 UTC
Is there anything I can do to help solve this issue?
Comment 7 Santiago Martinez 2023-08-16 22:41:04 UTC
Yes, it is expected, as the VF only receives frames addressed to its own MAC address.
Some cards allow the VF to be put in promiscuous mode for exactly this kind of case.
I'm not sure what scale you need, but ideally you should use one SR-IOV VF per jail or VM.
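
A minimal sketch of that one-VF-per-guest layout in iovctl.conf(5) form, using only the parameters already present in the reporter's configuration (`device`, `num_vfs`, `passthrough`). The VF count and the split between host and guests are assumptions for illustration, and this has not been tested on this hardware:

```
PF {
    device: "mlx5_core0";
    num_vfs: 4;          # assumption: size this to the number of guests
}

DEFAULT {
    passthrough: true;   # each remaining VF is handed to one bhyve VM via PCI passthrough
}

VF-0 {
    passthrough: false;  # stays on the host as an mce(4) interface, not added to any bridge
}
```

With this layout no VF is a bridge member: each VM owns a passthrough VF (the case the reporter found to work), and a host-visible VF could presumably be given to a single vnet jail directly instead of through an epair and bridge.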
Comment 8 Santiago Martinez 2023-08-16 22:42:12 UTC
I'm trying to dig out the SR-IOV capabilities (or options) for mlx5, but I can't find them. The documentation looks... well, just not good.
Comment 9 Santiago Martinez 2023-08-16 22:55:21 UTC
This is the output for mlx5_coreX.

iovctl -S -f /etc/iovctl-mce0.conf
The following configuration parameters may be configured on the PF:
        num_vfs : uint16_t (required)
        device : string (required)

The following configuration parameters may be configured on a VF:
        passthrough : bool (default = false)
        mac-addr : unicast-mac (optional)
        node-guid : uint64_t (optional)
        port-guid : uint64_t (optional)

This is the output for ixl:

iovctl -S -f /etc/iovctl-ixl0.conf
The following configuration parameters may be configured on the PF:
        num_vfs : uint16_t (required)
        device : string (required)

The following configuration parameters may be configured on a VF:
        passthrough : bool (default = false)
        mac-addr : unicast-mac (optional)
        mac-anti-spoof : bool (default = true)
        allow-set-mac : bool (default = false)
        allow-promisc : bool (default = false)
        num-queues : uint16_t (default = 4)
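
For comparison, the ixl schema above lists `allow-promisc`, which appears to be the per-VF promiscuous-mode option mentioned in comment 7. A sketch of how it might be set, assuming a PF named `ixl0` and two VFs (both assumptions); disabling `mac-anti-spoof` is also an assumption, on the reasoning that bridged guests transmit with source MACs other than the VF's own. Untested:

```
PF {
    device: "ixl0";          # assumed PF device name
    num_vfs: 2;              # assumed VF count
}

DEFAULT {
    passthrough: false;
}

VF-0 {
    allow-promisc: true;     # let the VF receive frames not addressed to its own MAC
    mac-anti-spoof: false;   # assumption: needed so frames from bridged guests' MACs are not dropped
}
```

The mlx5_coreX schema printed above has no equivalent parameter, which matches the behaviour reported for the mce(4) VF in a bridge.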