Bug 252165 - mii bus media status race condition
Summary: mii bus media status race condition
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-26 16:14 UTC by Ali Abdallah
Modified: 2021-01-11 13:44 UTC (History)
4 users (show)

See Also:


Attachments
Workaround race in link status change (1.50 KB, patch)
2021-01-04 19:37 UTC, Ali Abdallah
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ali Abdallah 2020-12-26 16:14:32 UTC
I'm using if_ure on a USB-C dock station. I'm having the following random issue:

Dec 25 12:20:10 Fryzen495 kernel: ue0: link state changed to UP
Dec 25 12:20:10 Fryzen495 kernel: ue0: link state changed to DOWN
Dec 25 12:20:10 Fryzen495 kernel: ue0: link state changed to UP
Dec 25 12:20:58 Fryzen495 kernel: ue0: link state changed to DOWN
Dec 25 12:21:01 Fryzen495 kernel: ue0: link state changed to UP
Dec 25 12:21:07 Fryzen495 dhclient[20388]: New IP Address (ue0): 192.168.1.145
Dec 25 12:21:07 Fryzen495 dhclient[21191]: New Subnet Mask (ue0): 255.255.255.0
Dec 25 12:21:07 Fryzen495 dhclient[22099]: New Broadcast Address (ue0): 192.168.1.255
Dec 25 12:21:07 Fryzen495 dhclient[22651]: New Routers (ue0): 192.168.1.254

After digging into the code, I've discovered that the issue is caused by the following race condition:

In mii/mii.c, the function miibus_linkchg(device_t dev) reads mii->mii_media_status and sets the link state accordingly. 

static void
miibus_linkchg(device_t dev)
{
...

   if (mii->mii_media_status & IFM_AVALID) {
       if (mii->mii_media_status & IFM_ACTIVE)
           link_state = LINK_STATE_UP;
       else
           link_state = LINK_STATE_DOWN;

...
}

On the other hand, the function mii_pollstat running on another thread, polls the media status, which is using rgephy.c code in my case. (via PHY_SERVICE call)

static void
rgephy_status(struct mii_softc *sc)
{
   mii->mii_media_status = IFM_AVALID;
   mii->mii_media_active = IFM_ETHER;

   /* HERE IS THE ISSUE
     The read in miibus_linkchg is randomly seeing mii->mii_media_status as IFM_AVALID,
     but it does not yet have the IFM_ACTIVE flag set, which makes it trigger a link state change...
   */

   if (rgephy_linkup(sc) != 0)
      mii->mii_media_status |= IFM_ACTIVE;
   ...

This is an issue in all miibus modules, rgephy.c, truephy.c, mlphy.c, etc...

Please note that also in mii_pollstat, mii->mii_media_status is initialized to 0 (not sure why), and that also races with miibus_linkchg.

There are many ways to fix it, locally I'm not resetting mii->mii_media_status until rgephy_linkup(sc) returns link down.
Comment 1 Rodney W. Grimes freebsd_committer 2020-12-27 10:14:11 UTC
I believe I have seen this several times in the past, mostly when I am doing diskless boots via pxe/ipxe/nfs.
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2020-12-27 12:20:28 UTC
^Triage: set to net@ as this affect miibus.
Comment 3 Ali Abdallah 2021-01-04 19:37:43 UTC
Created attachment 221274 [details]
Workaround race in link status change

(In reply to Rodney W. Grimes from comment #1)
The various mii modules are "terribly" SMP broken, they all reset the media status to

mii->mii_media_status = IFM_AVALID; 

then call the hardware specific function to check the link and eventually set

mii->mii_media_status |= IFM_ACTIVE;

But during this time the call to miibus_linkchg might trigger a link change, as it randomly sees the IFM_AVALID flag only, before the IFM_ACTIVE being set.

Attached is the workaround I'm currently applying to avoid the issue on my T495 with the Thinkpad USB-C gen2 docking station (My hardware uses rgephy.c). If you think this is the way to go, I can provide a full patch for the rest of the mii bus modules.
Comment 4 Hans Petter Selasky freebsd_committer 2021-01-07 14:47:42 UTC
Could you extend your patch to fix the other cases you see aswell, and create a patch at https://reviews.freebsd.org/differential/ with full context?
Comment 5 Ali Abdallah 2021-01-08 07:19:18 UTC
(In reply to Hans Petter Selasky from comment #4)

Yes I'll do that.
Comment 6 Ali Abdallah 2021-01-11 12:30:07 UTC
I had a deeper look into the issue, applying the work-around described in comment 3 is definitely not the way to go, the problem lies somewhere else.

Actually the code path that end up calling miibus_linkchg is mii_tick and mii_poll (through mii_phy_update). At the first look they seem to be running without lock, but looking at the usb ethernet specific code (if_ure.c in my case), a common lock is acquired before calling those functions.

in ure_tick the lock is already acquired beforehand, by the usb ethernet code 

static void
ure_tick(struct usb_ether *ue)
....
URE_LOCK_ASSERT(sc, MA_OWNED); 
mii_tick(mii);
....

ure_ifmedia_sts is what polls for media status, and it acquires the same lock before calling mii_pollstat.

static void
ure_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr)
...
URE_LOCK(sc);
mii_pollstat(mii);
...
URE_UNLOCK(sc);

What I'm seeing is that, despite the lock, sometimes they run both in parallel, which is causing the race in the media status... I'm not sure why, and it is kind of driving me crazy.
Comment 7 Hans Petter Selasky freebsd_committer 2021-01-11 12:39:13 UTC
Did you try using dtrace, or adding simple prints:

printf("%d %s:%d\n", ticks, __FILE__, __LINE__);

--HPS
Comment 8 Ali Abdallah 2021-01-11 12:43:57 UTC
Yes I did add simple printf, showing the thread id, and this is what I have

Jan  8 16:24:47 frix230 kernel: tick begin  thread 0xfffff800a2343000
Jan  8 16:24:47 frix230 kernel: tick finish thread 0xfffff800a2343000
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: poll begin  thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: poll finish thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: poll begin  thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: poll finish thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: tick begin  thread 0xfffff800a2343000
Jan  8 16:24:48 frix230 kernel: 
Jan  8 16:24:48 frix230 kernel: poll begin  thread 0xfffff8001e67c000 
Jan  8 16:24:48 frix230 kernel: Link down detected from thread 0xfffff800a2343000 
Jan  8 16:24:48 frix230 kernel: tick finish thread 0xfffff800a2343000
Jan  8 16:24:48 frix230 kernel: ue0: link state changed to DOWN
Jan  8 16:24:48 frix230 kernel:
Jan  8 16:24:48 frix230 kernel: Link up detected from thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: poll finish thread 0xfffff8001e67c000
Jan  8 16:24:48 frix230 kernel: ue0: link state changed to UP
Jan  8 16:24:48 frix230 kernel: 

All those print are after having acquired the lock and before releasing it. As you can see, the poll thread begins while the tick one didn't finish yet.
Comment 9 Ali Abdallah 2021-01-11 12:59:03 UTC
How can that be even possible? I've added also the following static variable in ure_tick

URE_LOCK_ASSERT(sc, MA_OWNED);                                                          
printf("\ntick begin thread %p\n", curthread);
ticking = 1;
mii_tick(mii);                                                                                                                
ticking = 0;
printf("\ntick end thread %p\n", curthread);

And in ure_ifmedia_sts

URE_LOCK(sc);
printf("poll begin %p\n", curthread);
if (!ticking)                                                                                                                 
  mii_pollstat(mii);
printf("poll end %p\n", curthread);
...
URE_UNLOCK(sc);

And it is running fine since last Friday, which proves further that the poll thread is being executed while the tick on is running, despite the fact that they use the same lock!

The only difference I personally see here, unlike mii_tick, ure_ifmedia_sts is called at the end of a syscall, not sure if this is relevant here.
Comment 10 Hans Petter Selasky freebsd_committer 2021-01-11 13:24:25 UTC
Hi,

Can you print "%s" curthread->td_proc->p_comm

and

curthread->td_name

Probably two threads are racing there!

--HPS
Comment 11 Ali Abdallah 2021-01-11 13:44:09 UTC
The additional strings to print changed a bit the timing, it is no longer racing on media status, but media type (it went from 1000baseT <full-duplex> to nothing during the race below).

Jan 11 14:40:42 frix230 kernel: tick begin thread 0xfffff8027121a740 ure0 usb
Jan 11 14:40:42 frix230 kernel: 
Jan 11 14:40:42 frix230 kernel: poll begin 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: poll finish 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: 
Jan 11 14:40:42 frix230 kernel: poll begin 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: poll finish 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: 
Jan 11 14:40:42 frix230 kernel: poll begin 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: poll finish 0xfffff801ca32f000 ifconfig ifconfig
Jan 11 14:40:42 frix230 kernel: tick end thread 0xfffff8027121a740 ure0 usb

For the test, I'm calling ifconfig continuously from a script, but any network monitoring applet will trigger the issue.