Bug 221317 - ifconfig down/up issue after ixgbe driver update in r320897
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Eric Joyner
URL:
Keywords: IntelNetworking, regression
Depends on:
Blocks:
 
Reported: 2017-08-07 17:19 UTC by Cassiano Peixoto
Modified: 2020-05-19 16:49 UTC
CC List: 22 users

See Also:
erj: maintainer-feedback+
erj: mfc-stable12+
erj: mfc-stable11+
erj: mfc-stable10-


Attachments
Patch to add a 1s delay before stopping ixgbe interface (no carrier issue on stable) (378 bytes, patch)
2018-03-30 16:44 UTC, Sylvain Galliano
Attempt to remove 1-second spin (409 bytes, patch)
2018-04-13 18:16 UTC, Stephen Hurd
Additional debugging in ixgbe_stop() (1.26 KB, patch)
2018-04-13 20:55 UTC, Stephen Hurd

Description Cassiano Peixoto 2017-08-07 17:19:15 UTC
After commit r320897 (ixgbe driver update to 3.2.12-k) netmap is not working well anymore. When ixgbe interface is opened many times it changes the status to "no carrier" and needs a reboot to work again.

It's quite easy to reproduce the issue: use two machines with the ixgbe driver sending and receiving packets using pkt-gen. Run pkt-gen many times with options like -p and -c.

To make sure it was a driver update issue, I downgraded to the ixgbe driver released with FreeBSD 11.1-RELEASE (version 3.1.13-k), and it worked fine.
Comment 1 Eric Joyner freebsd_committer freebsd_triage 2017-08-11 16:34:22 UTC
(In reply to Cassiano Peixoto from comment #0)

"When ixgbe interface is opened many times"

Could you clarify what that means? Does it mean just running pkt-gen repeatedly, without manipulating the state of the link at all?
Comment 2 Cassiano Peixoto 2017-08-11 16:51:20 UTC
(In reply to Eric Joyner from comment #1)
Hi Eric,

You need to use two servers like this:

serverA <ix0>----<ix0> ServerB

On serverA run:
pkt-gen -i ix0 -f rx -p4 -c4

On serverB run:
pkt-gen -i ix0 -f rx

Repeat this process, cancelling and starting over many times. Eventually ix0 changes its state to "no carrier". After that you need to reboot the server.
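
The manual start/cancel cycle can be automated. The following is only a sketch and was not posted in this thread: the function name, the round count, the run time per round, and the RUN_STRESS guard are assumptions, and it assumes timeout(1) is available to stand in for pressing ^C. The pkt-gen arguments are passed through unchanged, so either of the command lines above can be used.

```sh
#!/bin/sh
# Run pkt-gen for "secs" seconds, interrupt it, and repeat "rounds" times.
# All remaining arguments are handed to pkt-gen unchanged.
cycle_pktgen() {
    rounds=$1
    secs=$2
    shift 2
    r=1
    while [ "$r" -le "$rounds" ]; do
        echo "round $r"
        # timeout(1) sends SIGINT after $secs seconds, like a manual ^C.
        # A timed-out run exits non-zero, which is expected here.
        timeout -s INT "$secs" pkt-gen "$@" || true
        r=$((r + 1))
    done
}

# Guard (an assumption of this sketch): only start traffic when asked to.
if [ "${RUN_STRESS:-0}" = 1 ]; then
    cycle_pktgen 50 10 -i ix0 -f rx -p4 -c4
    ifconfig ix0 | grep 'status:'
fi
```

Run it as root with RUN_STRESS=1 set; the last line prints the link status once all rounds have finished.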

I had this behavior only with 3.1.13-K driver.
Comment 3 Cassiano Peixoto 2017-08-11 16:52:10 UTC
(In reply to Cassiano Peixoto from comment #2)
Sorry, my typo: I had this behavior only with the 3.2.12-k driver.
Comment 4 Eric Joyner freebsd_committer freebsd_triage 2017-09-12 23:36:31 UTC
(In reply to Cassiano Peixoto from comment #2)

I'm trying pkt-gen out. I noticed this line in the output:

299.471571 main [2597] Wait 2 secs for phy reset

And the link status in ifconfig flaps. It seems like a bizarre thing for pkt-gen or netmap to do.
Comment 5 Eric Joyner freebsd_committer freebsd_triage 2017-09-12 23:37:29 UTC
And also, what card are you using?
Comment 6 Cassiano Peixoto 2017-09-13 11:50:56 UTC
(In reply to Eric Joyner from comment #5)
Hi Eric,

The flapping is normal when running pkt-gen. It has always worked this way.

Could you reproduce the issue?

I'm using an Intel 82599ES NIC.

Thanks.
Comment 7 Eric Joyner freebsd_committer freebsd_triage 2017-09-13 22:05:56 UTC
(In reply to Cassiano Peixoto from comment #6)

I can reproduce it after many times, but I don't know what would be causing it. I don't know much about the inner workings of netmap.

Can we get the maintainer for it to comment on this? It used to be luigi, but I don't know if he still works on it.
Comment 8 Eric Joyner freebsd_committer freebsd_triage 2017-09-13 22:11:00 UTC
(In reply to Eric Joyner from comment #7)

and to add to this, I see this on ixgbe in 12-CURRENT, too. Though, with the impending conversion to iflib, this may end up being overcome/have its nature changed by that.
Comment 9 Cassiano Peixoto 2017-09-14 11:33:10 UTC
(In reply to Eric Joyner from comment #7)
Hi Eric,

It happens by chance, but I run a netmap application and many times it stops working and I need to reboot the server. To confirm, the issue came up after the ixgbe driver update to 3.2.12-k.

For the record in this PR, here is the issue happening:

(root@rt1)~# /usr/local/proapps/bin/pkt-gen -i ix0 -f tx -c4 -p4
687.258207 main [2568] interface is ix0
687.258259 main [2691] running on 4 cpus (have 8)
687.258426 extract_ip_range [465] range is 10.0.0.1:1234 to 10.0.0.1:1234
687.258443 extract_ip_range [465] range is 10.1.0.1:1234 to 10.1.0.1:1234
687.489604 nm_open [850] overriding ifname ix0 ringid 0x0 flags 0x8004
687.605494 main [2786] mapped 273540KB at 0x801600000
Sending on netmap:ix0: 8 queues, 4 threads and 4 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
687.605584 main [2883] Sending 512 packets every  0.000000000 s
687.605674 nm_open [850] overriding ifname ix0 ringid 0x1 flags 0x8004
687.721195 nm_mmap [959] do not mmap, inherit from parent
687.721300 nm_open [850] overriding ifname ix0 ringid 0x2 flags 0x8004
687.836418 nm_mmap [959] do not mmap, inherit from parent
687.836520 nm_open [850] overriding ifname ix0 ringid 0x3 flags 0x8004
687.951300 nm_mmap [959] do not mmap, inherit from parent
687.951331 start_threads [2251] Wait 2 secs for phy reset
689.997961 start_threads [2253] Ready...
689.998143 sender_body [1444] start, fd 3 main_fd 3
689.998201 sender_body [1444] start, fd 4 main_fd 3
689.998232 sender_body [1444] start, fd 5 main_fd 3
689.998283 sender_body [1444] start, fd 6 main_fd 3
689.998259 main [2896] failed to install ^C handler: Invalid argument
690.035491 sender_body [1526] drop copy
690.041994 sender_body [1526] drop copy
690.042093 sender_body [1526] drop copy
690.042681 sender_body [1526] drop copy
690.999011 main_thread [2341] 10.901 Mpps (10.907 Mpkts 5.236 Gbps in 1000562 usec) 399.70 avg_batch 0 min_space
692.000049 main_thread [2341] 10.855 Mpps (10.866 Mpkts 5.216 Gbps in 1001037 usec) 399.65 avg_batch 399996 min_space
693.063201 main_thread [2341] 10.851 Mpps (11.536 Mpkts 5.537 Gbps in 1063152 usec) 399.66 avg_batch 399996 min_space
694.087449 main_thread [2341] 10.857 Mpps (11.120 Mpkts 5.338 Gbps in 1024248 usec) 399.64 avg_batch 399996 min_space
695.087945 main_thread [2341] 10.852 Mpps (10.858 Mpkts 5.212 Gbps in 1000496 usec) 399.64 avg_batch 399996 min_space
696.089042 main_thread [2341] 10.854 Mpps (10.865 Mpkts 5.215 Gbps in 1001097 usec) 399.64 avg_batch 399996 min_space
697.090944 main_thread [2341] 10.855 Mpps (10.875 Mpkts 5.220 Gbps in 1001902 usec) 399.65 avg_batch 399996 min_space
698.091952 main_thread [2341] 10.853 Mpps (10.864 Mpkts 5.215 Gbps in 1001008 usec) 399.68 avg_batch 399996 min_space
699.150016 main_thread [2341] 10.851 Mpps (11.481 Mpkts 5.511 Gbps in 1058064 usec) 399.67 avg_batch 399996 min_space
700.151767 main_thread [2341] 10.853 Mpps (10.872 Mpkts 5.219 Gbps in 1001751 usec) 399.66 avg_batch 399996 min_space
701.151958 main_thread [2341] 10.849 Mpps (10.851 Mpkts 5.209 Gbps in 1000190 usec) 399.70 avg_batch 399996 min_space
702.155017 main_thread [2341] 10.852 Mpps (10.885 Mpkts 5.225 Gbps in 1003060 usec) 399.64 avg_batch 399996 min_space
703.156485 main_thread [2341] 10.854 Mpps (10.870 Mpkts 5.218 Gbps in 1001468 usec) 399.64 avg_batch 399996 min_space
704.219446 main_thread [2341] 10.855 Mpps (11.538 Mpkts 5.538 Gbps in 1062961 usec) 399.66 avg_batch 399996 min_space
705.229017 main_thread [2341] 10.851 Mpps (10.955 Mpkts 5.259 Gbps in 1009571 usec) 399.65 avg_batch 399996 min_space
706.230039 main_thread [2341] 10.854 Mpps (10.865 Mpkts 5.215 Gbps in 1001022 usec) 399.64 avg_batch 399996 min_space
707.290196 main_thread [2341] 10.855 Mpps (11.508 Mpkts 5.524 Gbps in 1060157 usec) 399.65 avg_batch 399996 min_space
708.353447 main_thread [2341] 10.856 Mpps (11.542 Mpkts 5.540 Gbps in 1063251 usec) 399.66 avg_batch 399996 min_space
709.354198 main_thread [2341] 10.851 Mpps (10.859 Mpkts 5.212 Gbps in 1000750 usec) 399.64 avg_batch 399996 min_space
710.355141 main_thread [2341] 10.856 Mpps (10.866 Mpkts 5.216 Gbps in 1000943 usec) 399.65 avg_batch 399996 min_space
711.392196 main_thread [2341] 10.856 Mpps (11.258 Mpkts 5.404 Gbps in 1037056 usec) 399.64 avg_batch 399996 min_space
712.393486 main_thread [2341] 10.852 Mpps (10.866 Mpkts 5.216 Gbps in 1001290 usec) 399.67 avg_batch 399996 min_space
713.393950 main_thread [2341] 10.854 Mpps (10.859 Mpkts 5.212 Gbps in 1000463 usec) 399.65 avg_batch 399996 min_space
714.395262 main_thread [2341] 10.851 Mpps (10.866 Mpkts 5.216 Gbps in 1001313 usec) 399.66 avg_batch 399996 min_space
^C

(root@rt1)~# /usr/local/proapps/bin/pkt-gen -i ix0 -f tx -c4 -p4
716.666336 main [2568] interface is ix0
716.666387 main [2691] running on 4 cpus (have 8)
716.666560 extract_ip_range [465] range is 10.0.0.1:1234 to 10.0.0.1:1234
716.666578 extract_ip_range [465] range is 10.1.0.1:1234 to 10.1.0.1:1234
716.898132 nm_open [850] overriding ifname ix0 ringid 0x0 flags 0x8004
717.014194 main [2786] mapped 273540KB at 0x801600000
Sending on netmap:ix0: 8 queues, 4 threads and 4 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
717.014285 main [2883] Sending 512 packets every  0.000000000 s
717.014376 nm_open [850] overriding ifname ix0 ringid 0x1 flags 0x8004
717.128974 nm_mmap [959] do not mmap, inherit from parent
717.129078 nm_open [850] overriding ifname ix0 ringid 0x2 flags 0x8004
717.265684 nm_mmap [959] do not mmap, inherit from parent
717.265788 nm_open [850] overriding ifname ix0 ringid 0x3 flags 0x8004
717.414345 nm_mmap [959] do not mmap, inherit from parent
717.414376 start_threads [2251] Wait 2 secs for phy reset
719.415014 start_threads [2253] Ready...
719.415243 sender_body [1444] start, fd 3 main_fd 3
719.415291 sender_body [1444] start, fd 4 main_fd 3
719.415345 sender_body [1444] start, fd 5 main_fd 3
719.415359 sender_body [1444] start, fd 6 main_fd 3
719.415351 main [2896] failed to install ^C handler: Invalid argument
720.445090 main_thread [2341] 6.213 Kpps (6.396 Kpkts 3.070 Mbps in 1029510 usec) 399.75 avg_batch 0 min_space
721.425011 sender_body [1513] poll error/timeout on queue 3: No error: 0
721.445085 sender_body [1513] poll error/timeout on queue 0: No error: 0
721.445084 sender_body [1513] poll error/timeout on queue 1: No error: 0
721.445084 sender_body [1513] poll error/timeout on queue 2: No error: 0
721.476197 main_thread [2341] 0.000 pps (0.000 pkts 0.000 bps in 1031109 usec) 0.00 avg_batch 399996 min_space
722.539507 main_thread [2341] 0.000 pps (0.000 pkts 0.000 bps in 1063310 usec) 0.00 avg_batch 399996 min_space
723.447014 sender_body [1513] poll error/timeout on queue 1: No error: 0
723.447183 sender_body [1513] poll error/timeout on queue 2: No error: 0
723.490202 sender_body [1513] poll error/timeout on queue 3: No error: 0
723.490206 sender_body [1513] poll error/timeout on queue 0: No error: 0
723.559528 main_thread [2341] 0.000 pps (0.000 pkts 0.000 bps in 1020019 usec) 0.00 avg_batch 399996 min_space
^C

(root@rt1)~# ifconfig ix0
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 84:44:64:40:a1:5e
	hwaddr 84:44:64:40:a1:5e
	inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255 
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier

Regarding the maintainer question, it really used to be Luigi, but I don't know if he is still coding netmap. Anyway, I recommend speaking with Vincenzo Maffione. He's the most active netmap developer nowadays. I'm copying him on this PR. Maybe he can help to figure out the issue.

Thanks.
Comment 10 V Maffione 2017-09-16 16:30:01 UTC
Hi,
For the time being, unfortunately, netmap needs to bring the interface down in order to put the NIC in "netmap mode" (nm_register method in the code). At the end of the registration, the interface is brought up again.
We bring the interface down because we need to detach the mbufs from the NIC rings and attach the netmap buffers as a replacement. Currently we can't do this while the interface is up, because the driver may be concurrently receiving/transmitting mbufs. The link flapping comes as a result of the IFF_DOWN+IFF_UP cycle.
In the long term we would like to get rid of this, but for now we haven't yet found a reasonable way to avoid the down/up cycle on register.

That said, the only piece of code in netmap that I think may cause the issue is the nm_register method (called to put the NIC in netmap mode):

http://fxr.watson.org/fxr/source/dev/netmap/ixgbe_netmap.h#L118

The rest of netmap never touches NIC configuration/control registers, but only the registers involved in receive/transmit operation (and the hardware rings).
I don't know why this bug shows up, but I can guess it may be some race condition triggered by repeated IFF_DOWN+IFF_UP cycles.
Note that the nm_register method runs under IFNET_RLOCK().
Comment 11 Cassiano Peixoto 2017-10-18 11:53:29 UTC
Hi guys,

Any update about this issue?

Thanks.
Comment 12 Ryan Stone freebsd_committer freebsd_triage 2017-10-20 04:04:40 UTC
(In reply to Vincenzo Maffione from comment #10)

I have some old, uncommitted ixgbe code that can stop the receive path of an
interface without resetting the entire device.  The patch is available here:

https://github.com/rysto32/freebsd/commit/db13ef817927b0c84b73906a4326a79a11823266
https://github.com/rysto32/freebsd/commit/5ae7ec4bd3d2fab61aaf7c755fe666457327f401

Similar work for the tx side can be found here:

https://github.com/rysto32/freebsd/commit/bf7186b90a26ca2df7d873961b19af5b854eb3d0

Feel free to contact me via email if you have questions.
Comment 13 Cassiano Peixoto 2018-02-01 12:56:30 UTC
Hi there,

Some months have gone by since the last reply on this issue. Did you guys give up on fixing it?
Comment 14 Sylvain Galliano 2018-03-29 08:26:54 UTC
I have the same issue after running the following script:

#!/bin/sh

for i in $(seq 1 100); do
    echo "$i"
    ifconfig ix0 down
    ifconfig ix0 up
done

After running it, ix0 interface status is 'no carrier'.

I'm running the latest 11.1-STABLE (ixgbe 3.2.12-k).

Just to be sure it's not related to netmap, I've compiled the kernel with 'nodevice netmap': same issue.

Running the same test after reverting ixgbe 3.2.12-k to 3.1.13-k, the issue is not there anymore.
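
The loop in this comment only shows the result if ifconfig ix0 is run by hand afterwards. A minimal sketch of the same test that reports the outcome itself might look like the following. It is not from the thread: the helper names, the 5-second settle time, and the RUN_STRESS guard are assumptions. Only the ifconfig down/up calls and the "status:" line come from the reports here.

```sh
#!/bin/sh
# Print the value of the "status:" line from ifconfig output read on stdin.
link_status() {
    sed -n 's/^[[:space:]]*status:[[:space:]]*//p' | head -n 1
}

# Toggle the interface back to back, as in the original loop, then check
# the link once at the end. Returns 0 if the link is "active", 1 otherwise.
stress_updown() {
    ifname=$1
    rounds=${2:-100}
    i=1
    while [ "$i" -le "$rounds" ]; do
        ifconfig "$ifname" down
        ifconfig "$ifname" up
        i=$((i + 1))
    done
    sleep 5    # assumed settle time so a healthy link can renegotiate
    st=$(ifconfig "$ifname" | link_status)
    echo "$ifname after $rounds cycles: ${st:-unknown}"
    [ "$st" = "active" ]
}

# Guard (an assumption of this sketch): only touch the NIC when asked to.
if [ "${RUN_STRESS:-0}" = 1 ]; then
    stress_updown "${1:-ix0}" "${2:-100}"
fi
```

The delay is placed after the loop on purpose: a pause inside the loop could hide the problem, since adding a delay before stopping the interface is what the later workaround does.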
Comment 15 Cassiano Peixoto 2018-03-29 12:04:46 UTC
(In reply to Sylvain Galliano from comment #14)
Hi Sylvain,

So it's worse than I thought. But unfortunately it seems nobody is willing to fix it. :(

I can't risk updating my servers until it has been fixed.

Thanks for your test.
Comment 16 Sylvain Galliano 2018-03-30 16:44:44 UTC
Created attachment 191979
Patch to add a 1s delay before stopping ixgbe interface (no carrier issue on stable)
Comment 17 Sylvain Galliano 2018-03-30 16:45:22 UTC
(In reply to Cassiano Peixoto from comment #15)

Hello Cassiano,

Can you try recompiling the kernel with the attached patch?
It adds a 1s delay before stopping the ixgbe interface (i.e. on ifconfig down or netmap initialisation).
I know it's a TERRIBLE patch, but I had no issues after stressing my servers.
If it also works for you, I hope this can help the driver maintainer find and correct the issue.
Comment 18 Cassiano Peixoto 2018-04-02 18:32:06 UTC
(In reply to Sylvain Galliano from comment #17)
Hi Sylvain,

Thanks for sharing your patch. I've tested it and it really worked. Let's wait for a maintainer's answer.

Thank you anyway :)
Comment 19 commit-hook freebsd_committer freebsd_triage 2018-04-12 19:06:54 UTC
A commit references this bug:

Author: shurd
Date: Thu Apr 12 19:06:15 UTC 2018
New revision: 332447
URL: https://svnweb.freebsd.org/changeset/base/332447

Log:
  Work around netmap issue with ixgbe

  After multiple start/stop of netmap, ixgbe will get into a bad state
  requiring a reboot to recover.  Adding a delay before stopping the interface
  appears to work around the issue.

  The -CURRENT driver has diverged too far from -STABLE for an MFC.

  PR:		221317
  Submitted by:	Sylvain Galliano <sg@efficientip.com>
  Reported by:	Cassiano Peixoto <peixoto.cassiano@gmail.com>
  Sponsored by:	Limelight Networks

Changes:
  stable/11/sys/dev/ixgbe/if_ix.c
Comment 20 Stephen Hurd freebsd_committer freebsd_triage 2018-04-12 19:10:01 UTC
I've committed your work-around just in case nobody has time to investigate this before 11.2.  Thanks for sticking with this.
Comment 21 commit-hook freebsd_committer freebsd_triage 2018-04-13 17:46:56 UTC
A commit references this bug:

Author: shurd
Date: Fri Apr 13 17:45:54 UTC 2018
New revision: 332481
URL: https://svnweb.freebsd.org/changeset/base/332481

Log:
  Move 1-second spin into ixgbe_netmap_reg()

  This should still work around the netmap issue, but should not impact other
  calls to ixgbe_stop().

  PR:		221317
  Sponsored by:	Limelight Networks

Changes:
  stable/11/sys/dev/ixgbe/if_ix.c
  stable/11/sys/dev/ixgbe/ixgbe_netmap.c
Comment 22 Stephen Hurd freebsd_committer freebsd_triage 2018-04-13 17:50:12 UTC
Can you test with r332481 and ensure it still works around the issue?
Comment 23 Stephen Hurd freebsd_committer freebsd_triage 2018-04-13 18:16:42 UTC
Created attachment 192502
Attempt to remove 1-second spin

Assuming the previous commit still works around the issue, please try the attached patch.
Comment 24 Sylvain Galliano 2018-04-13 18:37:36 UTC
(In reply to Stephen Hurd from comment #22)

Hello Stephen,

Your patch works when using netmap, but the issue with ifconfig down/up in a loop is back (see the little script in comment #14).
Comment 25 Stephen Hurd freebsd_committer freebsd_triage 2018-04-13 18:49:01 UTC
(In reply to Sylvain Galliano from comment #24)

Hrm, could you try putting an ixgbe_qflush(ifp) in ixgbe_stop() before the interrupt is disabled?  My current theory is that the TX queue is being left in a bad state (which is why the delay helps).

I don't currently have an 11-STABLE system with an ixgbe in it to test on.
Comment 26 Sylvain Galliano 2018-04-13 19:17:46 UTC
(In reply to Stephen Hurd from comment #25)

Unfortunately it's not working.

Here is the patch I applied:

--- sys/dev/ixgbe/if_ix.c       (revision 332482)
+++ sys/dev/ixgbe/if_ix.c       (working copy)
@@ -3568,6 +3568,7 @@
        mtx_assert(&adapter->core_mtx, MA_OWNED);

        INIT_DEBUGOUT("ixgbe_stop: begin\n");
+       ixgbe_qflush(ifp);
        ixgbe_disable_intr(adapter);
        callout_stop(&adapter->timer);
Comment 27 Sylvain Galliano 2018-04-13 20:00:10 UTC
(In reply to Stephen Hurd from comment #25)

In my first test, I used commit r332481 (with msec_delay moved in netmap code) -> worked with netmap only (not for ifconfig down/up)

I've just tested your attached patch (ixgbe_qflush(ifp) in ixgbe_netmap.c) and I can reproduce the issue after several netmap start/stop cycles.
Comment 28 Stephen Hurd freebsd_committer freebsd_triage 2018-04-13 20:55:26 UTC
Created attachment 192505
Additional debugging in ixgbe_stop()

This patch won't solve the problem, but it will log errors encountered in ixgbe_stop() if any.

If there are no errors logged in dmesg, I'm curious if that delay needs to be at the beginning of the call to stop, or if it can be moved to just before the init_locked() call.

If there's an error, possibly just retrying after a short delay will help, but if not, I'll see if I can get an 11-STABLE system up and running this weekend.
Comment 29 Sylvain Galliano 2018-04-16 12:06:35 UTC
(In reply to Stephen Hurd from comment #28)

Patch with error logs applied:
No errors are logged before the issue appears.
Comment 30 Lev A. Serebryakov freebsd_committer freebsd_triage 2018-11-18 21:24:15 UTC
I have the same problem with CURRENT r340586.

A script which calls ifconfig down / ifconfig up in a loop renders the NIC unusable ("media: No carrier").

Also, the driver complains about an unsupported SFP+ type before the failure.

Reboot helps.
Comment 31 Lev A. Serebryakov freebsd_committer freebsd_triage 2018-11-28 14:38:22 UTC
Any news on this? I have exactly the same problem on 12 and CURRENT, with the new iflib-based driver too.

It is very annoying: I cannot run long benchmarks unattended, because I need to watch for NICs that have hung.
Comment 32 Charles Goncalves 2018-11-28 14:52:37 UTC
(In reply to Lev A. Serebryakov from comment #31)
I applied Sylvain's patch with the delay changed to 100 ms, and it works fine for production use while I wait for someone to fix this.
Comment 33 Lev A. Serebryakov freebsd_committer freebsd_triage 2018-11-28 15:05:54 UTC
(In reply to Charles Goncalves from comment #32)
It is not clear where I should apply the patch on 12/13, as the driver is very different. Put it into iflib for ALL adapters?
Comment 34 Lev A. Serebryakov freebsd_committer freebsd_triage 2018-11-28 15:29:41 UTC
(In reply to Charles Goncalves from comment #32)
Nope, adding a delay to the common iflib_netmap_register code doesn't help, but this code is somewhat different from the 11 driver's.
Comment 35 Piotr Pietruszewski 2018-12-07 13:12:26 UTC
(In reply to Lev A. Serebryakov from comment #34)
(In reply to Charles Goncalves from comment #32)
(In reply to Sylvain Galliano from comment #29)
(In reply to Cassiano Peixoto from comment #18)

The bug seems to be fixed by applying patch D18468 which is currently under review ( https://reviews.freebsd.org/D18468 ). Please let me know if the patch solves your problem.
Comment 36 Sylvain Galliano 2018-12-07 16:25:56 UTC
(In reply to Piotr Pietruszewski from comment #35)
The patch looks good; I've stressed the NIC for one hour without any issue.
The NIC status always stays 'active' after the last 'ifconfig up'.
Comment 37 Charles Goncalves 2018-12-07 18:35:22 UTC
(In reply to Piotr Pietruszewski from comment #35)
can't apply this patch on 11.2-STABLE
Comment 38 Peter Vanek 2018-12-09 19:49:42 UTC
(In reply to Charles Goncalves from comment #37)

Hi Charles,

My colleague Sylvain applied the patch against FreeBSD-CURRENT;
he also had too many conflicts against the stable version.

Peter
Comment 39 Sergii Spivak 2018-12-27 01:14:35 UTC
Patch applied successfully to 12.0-RELEASE, and it works fine.
Comment 40 Peter Vanek 2019-01-11 10:12:39 UTC
(In reply to Piotr Pietruszewski from comment #35)

Hello Piotr,

I would just like to check with you whether there is any work we can help move forward.
I can see that https://reviews.freebsd.org/D18468 is marked for review and is waiting to be accepted.

My colleagues and I see an increased number of occurrences of this defect across different customers, as many customers are moving to new connectivity and using SFP more and more.

If there is any testing or explicit verification to be done, feel free to ask. We will be happy to run it.
Also, is there any work planned for 11.2? I understand that the driver is slightly different on FreeBSD 11.2 and the patch cannot be applied directly.

Thank you for your answer.

Best Regards,
Peter Vanek
Comment 41 commit-hook freebsd_committer freebsd_triage 2019-01-31 21:45:00 UTC
A commit references this bug:

Author: erj
Date: Thu Jan 31 21:44:34 UTC 2019
New revision: 343621
URL: https://svnweb.freebsd.org/changeset/base/343621

Log:
  ix(4): Run {mod,msf,mbx,fdir,phy}_task in if_update_admin_status

  From Piotr:

  This patch introduces adapter->task_requests register responsible for
  recording requests for mod_task, msf_task, mbx_task, fdir_task and
  phy_task calls. Instead of enqueueing these tasks with
  GROUPTASK_ENQUEUE, handlers will be called directly from
  ixgbe_if_update_admin_status() while holding ctx lock.

  SIOCGIFXMEDIA ioctl() call reads adapter->media list. The list is
  deleted and rewritten in ixgbe_handle_msf() task without holding ctx
  lock. This change is needed to maintain data coherency when sharing
  adapter info via ioctl() calls.

  Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>.

  PR:		221317
  Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
  Reviewed by:	sbruno@, IntelNetworking
  Sponsored by:	Intel Corporation
  Differential Revision:	https://reviews.freebsd.org/D18468

Changes:
  head/sys/dev/ixgbe/if_ix.c
  head/sys/dev/ixgbe/ixgbe.h
  head/sys/dev/ixgbe/ixgbe_type.h
Comment 42 Charles Goncalves 2019-02-01 12:21:26 UTC
(In reply to Piotr Pietruszewski from comment #35)

In 11.1 and 11.2 I can't apply this patch.

I tested the ifconfig down/up script on FreeBSD 12.0-RELEASE r341666 and the issue does not occur.
Comment 43 commit-hook freebsd_committer freebsd_triage 2019-02-13 15:19:41 UTC
A commit references this bug:

Author: marius
Date: Wed Feb 13 15:19:32 UTC 2019
New revision: 344100
URL: https://svnweb.freebsd.org/changeset/base/344100

Log:
  MFC: r343621

  ix(4): Run {mod,msf,mbx,fdir,phy}_task in if_update_admin_status

  From Piotr:

  This patch introduces adapter->task_requests register responsible for
  recording requests for mod_task, msf_task, mbx_task, fdir_task and
  phy_task calls. Instead of enqueueing these tasks with
  GROUPTASK_ENQUEUE, handlers will be called directly from
  ixgbe_if_update_admin_status() while holding ctx lock.

  SIOCGIFXMEDIA ioctl() call reads adapter->media list. The list is
  deleted and rewritten in ixgbe_handle_msf() task without holding ctx
  lock. This change is needed to maintain data coherency when sharing
  adapter info via ioctl() calls.

  Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>.

  PR:		221317
  Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
  Reviewed by:	sbruno@, IntelNetworking
  Differential Revision:	https://reviews.freebsd.org/D18468

Changes:
_U  stable/12/
  stable/12/sys/dev/ixgbe/if_ix.c
  stable/12/sys/dev/ixgbe/ixgbe.h
  stable/12/sys/dev/ixgbe/ixgbe_type.h
Comment 44 Kurt Jaeger freebsd_committer freebsd_triage 2019-02-13 15:34:01 UTC
Any chance of getting this released as an erratum for 12.0?
Comment 45 Timo Voelker 2019-02-28 15:39:51 UTC
I experienced this issue while configuring VLANs.

I installed FreeBSD 12.0 Release on a Dell PowerEdge R430 with an Intel Ethernet 10G 2P X520 Adapter. If I add these lines to /etc/rc.conf

ifconfig_ix0="up"
vlans_ix0="102"
ifconfig_ix0_102="inet 10.10.10.12/24 description test"

ix0 stays down. A 'ifconfig ix0 up' has no effect then. If I start FreeBSD with only

ifconfig_ix0="up"

in /etc/rc.conf, I can use the following commands to successfully add and use the vlan (ix0 is up).

ifconfig ix0 up
ifconfig ix0.102 create vlan 102 vlandev ix0 inet 10.10.10.12/24

An update to 12.0-p3 with the commands

# freebsd-update fetch
# freebsd-update install

did not fix it. I then downloaded the FreeBSD sources (base/releng/12.0), applied the patch 

https://reviews.freebsd.org/D18468

and compiled and installed kernel and world. Now, I'm able to boot FreeBSD with the above VLAN config in /etc/rc.conf.

Thanks to everyone involved here!
Comment 46 Peter Vanek 2019-03-25 20:26:04 UTC
(In reply to Piotr Pietruszewski from comment #35)

Hello Piotr,

I would still like to ask about the chance of getting this bug corrected on stable/11.
I think that some organizations may stick with stable/11, and it would be great to have this tricky defect corrected.

Thanks for reply,
Peter
Comment 47 Piotr Pietruszewski 2019-03-26 12:02:55 UTC
(In reply to Peter Vanek from comment #46)

Hello Peter,

Port of rS343621 is available for review on Phabricator ( https://reviews.freebsd.org/D19711 ). Since our validation team is currently involved in internal projects, we don't know when this change would be tested and approved for merging.

Best regards,
Piotr Pietruszewski
Comment 48 Peter Vanek 2019-03-26 12:26:29 UTC
(In reply to Piotr Pietruszewski from comment #47)

Hello Piotr,

Thanks for the details. The minimum we can do is to run this on our platform.
My colleague is going to reproduce the issue on the default kernel, apply the correction, and report to me.

I will keep you all updated.

Best Regards,
Peter
Comment 49 Peter Vanek 2019-03-28 08:50:38 UTC
Hello all,

Just brief summary of our test yesterday:
Network Adapter X520

1. issue re-tested on default kernel

FreeBSD testing-host 11.2-STABLE FreeBSD 11.2-STABLE #274 r345055M: Tue Mar 12 18:46:49 UTC 2019

The ix0 card stays down after a few rounds of up and down.

Issue present - OK

================================

2. Applied patch

A script ran down/up on both ix0 and ix1 1000 times, slept for 2 seconds, displayed the status of both interfaces, and then continued with the next 1000 rounds of down/up.

After a few hours of stressing, the result is good: both ix0 and ix1 are fully up and functional.
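
The script used for this test was not posted. A rough sketch of the procedure as described (bursts of 1000 down/up cycles on both interfaces, a 2-second sleep, then a status display) could look like this; the function names, the BURSTS variable, and the RUN_STRESS guard are assumptions made for illustration.

```sh
#!/bin/sh
# One burst: toggle every listed interface down/up "count" times.
burst() {
    count=$1
    shift
    n=0
    while [ "$n" -lt "$count" ]; do
        for ifname in "$@"; do
            ifconfig "$ifname" down
            ifconfig "$ifname" up
        done
        n=$((n + 1))
    done
}

# Print the "status:" line of each listed interface.
show_status() {
    for ifname in "$@"; do
        printf '%s: ' "$ifname"
        ifconfig "$ifname" | grep 'status:' || echo 'status: unknown'
    done
}

# Run "bursts" bursts of 1000 cycles, pausing 2 seconds after each one
# before displaying the status, as described in the test summary.
stress_bursts() {
    bursts=$1
    shift
    b=0
    while [ "$b" -lt "$bursts" ]; do
        burst 1000 "$@"
        sleep 2
        show_status "$@"
        b=$((b + 1))
    done
}

# Guard (an assumption of this sketch): only touch the NICs when asked to.
if [ "${RUN_STRESS:-0}" = 1 ]; then
    stress_bursts "${BURSTS:-10}" ix0 ix1
fi
```

On an affected kernel the status line would be expected to switch to "no carrier" after some burst and stay there; on a fixed kernel it should keep reading "active".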

Thanks once more to @Piotr for passing this patch along.

Best Regards,
Peter
Comment 50 dhulme 2020-05-12 17:53:32 UTC
Apologies for rushing anyone, but this is over a year old and no status update has been made.  Has it been forgotten?
Comment 51 Kurt Jaeger freebsd_committer freebsd_triage 2020-05-12 18:01:48 UTC
(In reply to dhulme from comment #50)
Please check

https://svnweb.freebsd.org/base?view=revision&revision=347419

if it fixes your problem. See
https://reviews.freebsd.org/D19711
Comment 52 dhulme 2020-05-12 18:05:50 UTC
Thanks for responding.  I see this patch was designed for 11.x.  Would it be easy to apply to 12.x?

I assume I would have to recompile from source to test this issue.
Comment 53 Krzysztof Galazka 2020-05-12 18:15:29 UTC
(In reply to dhulme from comment #52)

It was MFCed by marius: https://reviews.freebsd.org/rS344100 to 12.x. Do you still see this issue?
Comment 54 dhulme 2020-05-12 18:19:16 UTC
Thank you!  That is what I was trying to track down, but I couldn't seem to find where it was committed to 12.x.

I have this issue on a 12.0 box (I think), which is being upgraded to 12.1 (late, I know!).

I'll let you know if this fixed it.
Comment 55 dhulme 2020-05-12 19:44:11 UTC
(In reply to Krzysztof Galazka from comment #53)

Thanks again.  In latest builds everything seems fine.
Comment 56 Kurt Jaeger freebsd_committer freebsd_triage 2020-05-12 19:50:02 UTC
If everything is fine, what's holding up closing this ticket 8-} ?
Comment 58 Kubilay Kocak freebsd_committer freebsd_triage 2020-05-19 02:39:53 UTC
@Piotr Was CURRENT 12.x at the time of those commits? Did stable/11 end up getting the changes (mfc) or were they not applicable? If so, please set mfc-stable11 flag to + or - respectively
Comment 59 Piotr Pietruszewski 2020-05-19 14:04:32 UTC
The fix was merged to 13.x-CURRENT and then MFCed to stable/12. Also, we've ported it to stable/11, as the ix driver in stable/11 does not use iflib and the original patch was not applicable.
Comment 60 Eric Joyner freebsd_committer freebsd_triage 2020-05-19 16:49:51 UTC
@Kubilay, I set the flags for Piotr. I set stable/11 to "+" since a similar fix was directly committed there, like Piotr said.