Bug 252165

Summary:    usb network and mii bus media status race condition
Product:    Base System
Reporter:   Ali Abdallah <ali.abdallah>
Component:  kern
Assignee:   freebsd-net (Nobody) <net>
Status:     New
Severity:   Affects Many People
CC:         emaste, hselasky, oleg.nauman, pi, rgrimes
Priority:   ---
Version:    CURRENT
Hardware:   Any
OS:         Any
Attachments: Workaround race in link status change

Description Ali Abdallah 2020-12-26 16:14:32 UTC
I'm using if_ure on a USB-C docking station, and I'm seeing the following random issue:

Dec 25 12:20:10 Fryzen495 kernel: ue0: link state changed to UP
Dec 25 12:20:10 Fryzen495 kernel: ue0: link state changed to DOWN
Dec 25 12:20:10 Fryzen495 kernel: ue0: link state changed to UP
Dec 25 12:20:58 Fryzen495 kernel: ue0: link state changed to DOWN
Dec 25 12:21:01 Fryzen495 kernel: ue0: link state changed to UP
Dec 25 12:21:07 Fryzen495 dhclient[20388]: New IP Address (ue0): 192.168.1.145
Dec 25 12:21:07 Fryzen495 dhclient[21191]: New Subnet Mask (ue0): 255.255.255.0
Dec 25 12:21:07 Fryzen495 dhclient[22099]: New Broadcast Address (ue0): 192.168.1.255
Dec 25 12:21:07 Fryzen495 dhclient[22651]: New Routers (ue0): 192.168.1.254

After digging into the code, I've discovered that the issue is caused by the following race condition:

In mii/mii.c, the function miibus_linkchg(device_t dev) reads mii->mii_media_status and sets the link state accordingly. 

static void
miibus_linkchg(device_t dev)
{
...

   if (mii->mii_media_status & IFM_AVALID) {
       if (mii->mii_media_status & IFM_ACTIVE)
           link_state = LINK_STATE_UP;
       else
           link_state = LINK_STATE_DOWN;

...
}

On the other hand, the function mii_pollstat, running on another thread, polls the media status through a PHY_SERVICE call, which lands in rgephy.c in my case.

static void
rgephy_status(struct mii_softc *sc)
{
   struct mii_data *mii = sc->mii_pdata;

   mii->mii_media_status = IFM_AVALID;
   mii->mii_media_active = IFM_ETHER;

   /* HERE IS THE ISSUE:
      The read in miibus_linkchg randomly sees mii->mii_media_status
      with IFM_AVALID set but IFM_ACTIVE not yet set, which makes it
      trigger a spurious link state change...
   */

   if (rgephy_linkup(sc) != 0)
      mii->mii_media_status |= IFM_ACTIVE;
   ...

This issue exists in all miibus PHY modules: rgephy.c, truephy.c, mlphy.c, etc.

Please note that mii_pollstat also initializes mii->mii_media_status to 0 (not sure why), which races with miibus_linkchg as well.
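For reference, mii_pollstat in mii/mii.c looks roughly like this (quoted from memory, trimmed; the zeroed status stays visible to a concurrent miibus_linkchg until PHY_SERVICE fills it back in):

void
mii_pollstat(struct mii_data *mii)
{
        struct mii_softc *child;

        /* Both fields are cleared before the PHY driver recomputes them. */
        mii->mii_media_status = 0;
        mii->mii_media_active = IFM_NONE;
        LIST_FOREACH(child, &mii->mii_phys, mii_list)
                PHY_SERVICE(child, mii, MII_POLLSTAT);
}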

There are many ways to fix it; locally I'm not resetting mii->mii_media_status until rgephy_linkup(sc) actually reports link down.
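A minimal sketch of one way to express that idea (not the attached patch itself, just an illustration against rgephy_status): build the new status in a local variable and publish it with a single store, so a concurrent miibus_linkchg never observes IFM_AVALID without IFM_ACTIVE:

static void
rgephy_status(struct mii_softc *sc)
{
        struct mii_data *mii = sc->mii_pdata;
        int status;

        /* Build the new status locally first. */
        status = IFM_AVALID;
        if (rgephy_linkup(sc) != 0)
                status |= IFM_ACTIVE;

        /* Single store: a concurrent reader sees either the old value
           or the complete new one, never a half-initialized one. */
        mii->mii_media_status = status;
        mii->mii_media_active = IFM_ETHER;
        ...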
Comment 1 Rodney W. Grimes freebsd_committer 2020-12-27 10:14:11 UTC
I believe I have seen this several times in the past, mostly when I am doing diskless boots via pxe/ipxe/nfs.
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2020-12-27 12:20:28 UTC
^Triage: set to net@ as this affects miibus.
Comment 3 Ali Abdallah 2021-01-04 19:37:43 UTC
Created attachment 221274 [details]
Workaround race in link status change

(In reply to Rodney W. Grimes from comment #1)
The various mii modules are "terribly" SMP broken: they all reset the media status to

mii->mii_media_status = IFM_AVALID; 

then call the hardware specific function to check the link and eventually set

mii->mii_media_status |= IFM_ACTIVE;

But in that window, a call to miibus_linkchg might trigger a link change, as it can see the IFM_AVALID flag alone, before IFM_ACTIVE has been set.

Attached is the workaround I'm currently applying to avoid the issue on my T495 with the ThinkPad USB-C Gen2 docking station (my hardware uses rgephy.c). If you think this is the way to go, I can provide a full patch for the rest of the mii bus modules.
Comment 4 Hans Petter Selasky freebsd_committer 2021-01-07 14:47:42 UTC
Could you extend your patch to fix the other cases you see as well, and create a patch at https://reviews.freebsd.org/differential/ with full context?
Comment 5 Ali Abdallah 2021-01-08 07:19:18 UTC
(In reply to Hans Petter Selasky from comment #4)

Yes I'll do that.
Comment 6 Ali Abdallah 2021-01-11 12:30:07 UTC
I had a deeper look into the issue; applying the workaround described in comment 3 is definitely not the way to go, as the problem lies somewhere else.

Actually, the code paths that end up calling miibus_linkchg are mii_tick and mii_pollstat (through mii_phy_update). At first glance they seem to run without any lock, but looking at the USB Ethernet specific code (if_ure.c in my case), a common lock is acquired before calling those functions.

In ure_tick the lock has already been acquired beforehand by the USB Ethernet code:

static void
ure_tick(struct usb_ether *ue)
....
URE_LOCK_ASSERT(sc, MA_OWNED); 
mii_tick(mii);
....

ure_ifmedia_sts is what polls for media status, and it acquires the same lock before calling mii_pollstat.

static void
ure_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr)
...
URE_LOCK(sc);
mii_pollstat(mii);
...
URE_UNLOCK(sc);

What I'm seeing is that, despite the lock, the two sometimes run in parallel, which causes the race on the media status... I'm not sure why, and it is kind of driving me crazy.
Comment 7 Hans Petter Selasky freebsd_committer 2021-01-11 12:39:13 UTC
Did you try using dtrace, or adding simple prints:

printf("%d %s:%d\n", ticks, __FILE__, __LINE__);

--HPS
Comment 8 Ali Abdallah 2021-01-11 12:43:57 UTC
Yes, I did add simple printfs showing the thread ID, and this is what I got:

Jan  8 16:24:47 frix230 kernel: tick begin  thread 0xfffff800a2343000
Jan  8 16:24:47 frix230 kernel: tick finish thread 0xfffff800a2343000
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: poll begin  thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: poll finish thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: poll begin  thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: poll finish thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: tick begin  thread 0xfffff800a2343000
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: poll begin  thread 0xfffff8001e67c000 
Jan  8 16:24:48 frix230 kernel: Link down detected from thread 0xfffff800a2343000 
Jan  8 16:24:48 frix230 kernel: tick finish thread 0xfffff800a2343000
Jan  8 16:24:48 frix230 kernel: ue0: link state changed to DOWN
Jan  8 16:24:48 frix230 kernel:
Jan  8 16:24:48 frix230 kernel: Link up detected from thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: poll finish thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: ue0: link state changed to UP
Jan  8 16:24:48 frix230 kernel: 

All those prints happen after the lock has been acquired and before it is released. As you can see, the poll thread begins while the tick one hasn't finished yet.
Comment 9 Ali Abdallah 2021-01-11 12:59:03 UTC
How can that even be possible? I've also added the following static variable check in ure_tick:

static int ticking;
...
URE_LOCK_ASSERT(sc, MA_OWNED);
printf("\ntick begin thread %p\n", curthread);
ticking = 1;
mii_tick(mii);
ticking = 0;
printf("\ntick end thread %p\n", curthread);

And in ure_ifmedia_sts:

URE_LOCK(sc);
printf("poll begin %p\n", curthread);
if (!ticking)                                                                                                                 
  mii_pollstat(mii);
printf("poll end %p\n", curthread);
...
URE_UNLOCK(sc);

And it has been running fine since last Friday, which further proves that the poll thread is being executed while the tick one is running, despite the fact that they use the same lock!

The only difference I personally see here is that, unlike mii_tick, ure_ifmedia_sts is called at the end of a syscall; I'm not sure if that is relevant.
Comment 10 Hans Petter Selasky freebsd_committer 2021-01-11 13:24:25 UTC
Hi,

Can you print "%s" curthread->td_proc->p_comm

and

curthread->td_name
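
For example (just a sketch of the extended print):

printf("tick begin thread %p %s %s\n", curthread,
    curthread->td_proc->p_comm, curthread->td_name);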

Probably two threads are racing there!

--HPS
Comment 11 Ali Abdallah 2021-01-11 13:44:09 UTC
The additional strings to print changed the timing a bit; it is no longer racing on the media status, but on the media type (it went from 1000baseT <full-duplex> to nothing during the race below).

Jan 11 14:40:42 frix230 kernel: tick begin thread 0xfffff8027121a740 ure0 usb
Jan 11 14:40:42 frix230 kernel: 
Jan 11 14:40:42 frix230 kernel: poll begin 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: poll finish 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: 
Jan 11 14:40:42 frix230 kernel: poll begin 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: poll finish 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: 
Jan 11 14:40:42 frix230 kernel: poll begin 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: poll finish 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: tick end thread 0xfffff8027121a740 ure0 usb

For the test, I'm calling ifconfig continuously from a script, but any network monitoring applet will trigger the issue.
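Something as simple as the following loop is enough, since querying the interface media from userland ends up in ure_ifmedia_sts:

while true; do ifconfig ue0; done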
Comment 12 Ali Abdallah 2021-01-20 13:55:24 UTC
Any possible explanation of what is happening here? Thanks.
Comment 13 Ali Abdallah 2021-03-01 17:25:22 UTC
I had a better look at the code, and I think I have an explanation of the root cause of the issue.

The function uether_rxflush drops the lock in the middle of its work, messing up the lock/unlock ordering.

void
uether_rxflush(struct usb_ether *ue)
{
        ...
        UE_LOCK_ASSERT(ue, MA_OWNED);

        n = mbufq_flush(&ue->ue_rxq);
        UE_UNLOCK(ue);          /* <- lock dropped here... */
        NET_EPOCH_ENTER(et);
        ...
        NET_EPOCH_EXIT(et);
        UE_LOCK(ue);            /* <- ...and re-acquired here */
}

I've dtraced the calls:
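(Using something along these lines; probe spec from memory:)

dtrace -n 'fbt::uether_rxflush:entry, fbt::uether_rxflush:return,
    fbt::ure_ifmedia_sts:entry, fbt::ure_ifmedia_sts:return
    { printf("%d", timestamp); }'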

2  64153           ure_ifmedia_sts:return      155646206056
0  64153           ure_ifmedia_sts:return      155647214955
0  64152            ure_ifmedia_sts:entry      155647476515
2  64153           ure_ifmedia_sts:return      155648459628
2  64152            ure_ifmedia_sts:entry      155648490676
1  64220             uether_rxflush:entry      155649463422  <-
0  64153           ure_ifmedia_sts:return      155649466344  <-
1  64221            uether_rxflush:return      155649488201


When uether_rxflush:entry fires, it holds the lock, but it then releases and re-acquires it, which IMHO causes the mess with ure_ifmedia_sts.
Comment 14 Ali Abdallah 2021-03-02 07:39:28 UTC
Even with the UE_UNLOCK/UE_LOCK pair from comment 13 removed, the issue is still occurring...

The following dtrace output is observed when the link state spuriously changes:

2  64192               ue_tick_task:entry      242528044179
2  64175                   ure_tick:entry      242528050925
1  64152            ure_ifmedia_sts:entry      242529137165
2  64176                  ure_tick:return      242529369904
1  64152            ure_ifmedia_sts:entry      242529474008
2  64153           ure_ifmedia_sts:return      242530486029

For whatever reason... Normally the sequence should be 

2  64152            ure_ifmedia_sts:entry      243552084077
2  64153           ure_ifmedia_sts:return      243553071554
0  64192               ue_tick_task:entry      243553441743
0  64175                   ure_tick:entry      243553444178
2  64176                  ure_tick:return      243554572011
3  64152            ure_ifmedia_sts:entry      243663084286
3  64153           ure_ifmedia_sts:return      243664074467
Comment 15 Hans Petter Selasky freebsd_committer 2021-03-02 10:43:15 UTC
Probably all of the chip configuration and status reads need to be serialized on a USB process callback thread.

--HPS
Comment 16 Ali Abdallah 2021-03-02 16:47:22 UTC
(In reply to Hans Petter Selasky from comment #15)

Probably, but I'm really not sure how to handle the SIOCSIFMEDIA ioctl from a USB callback thread.

Also, I'm still not able to identify the root cause of the issue; the fact that the USB thread and the ifconfig ioctl thread use the same lock should prevent any race, yet somehow it doesn't. I would still like to understand what is going wrong.
Also, I'm still not able to identify the root cause of the issue, the fact that the usb thread and the ifconfig ioctl thread use the same lock should avoid any race, but for some reason it is not happening. I would still like to understand what is going wrong.