I found em's receiving queue handling will have the following problems if system mbuf is used up. 1. NIC will hang because receive ring's tail pointer will not be updated (in em_rxeof). 2. "ifconfig up/down" may cause system panic because em_setup_receive_ring will free already-freed mbufs. So I made some changes: 1. NIC's recieve ring's head/tail pointer is updated according to rxr's next_to_check/next_to_refresh. So, on the position of next_to_refresh, no need to fill free mbuf because datasheet says "When the head pointer(s) is equal to the tail pointer(s), the queue(s) is empty. Hardware stops storing packets in system memory until software advances the tail pointer(s), making more receive buffers available." And (next_to_refresh + 1) % num_rx_desc == next_to_check means ring queue is full. 2. no need to reallocate the mbufs on receive queue when em re-initialize. 3. The mbufs on the queue are also freed according to these two indexs. 4. If ring queue is empty, em_refresh_mbufs is also called even if it doesn't handle any packet Fix: Patch attached with submission follows:
Responsible Changed From-To: freebsd-bugs->jfv Jack is em(4) maintainer
After r215808, the last hunk no longer apply. I'll give it a try without it.
For the record, I can still quickly hang an em(4) with the attached patch.
On 2011-02-23 03:08:53, Arnaud Lacombe wrote: >For the record, I can still quickly hang an em(4) with the attached patch. That patch is not complete. I sent this new one to Jack, I think he is verifying if it can work. If you can't wait, try the new one. It's based on stable/8. Thanks Beezar
2011/2/22 beezarliu <beezarliu@yahoo.com.cn>: > On 2011-02-23 03:08:53, Arnaud Lacombe wrote: >>For the record, I can still quickly hang an em(4) with the attached patch. > > That patch is not complete. > I sent this new one to Jack, I think he is verifying if it can work. > If you can't wait, try the new one. It's based on stable/8. > On a 7-STABLE em(4) code, after applying the patch you sent with patch(1), minus a small rejection manually fixed (the point of failure?), I am still able to hang em(4) RX. I can provide you the patch I used if you want to check. - Arnaud
Hi, 2011/2/23 beezarliu <beezarliu@yahoo.com.cn>: > I think you can replace sys/dev/e1000/* =A0with the files in stable/8, > and then uses this patch. It's much simpler. > :) > agree. I switched to the driver of 8-STABLE, applied your patch (without any conflict), fixed a bad version check (see [0]), built the kernel, rebooted, ran the test... and hung em(4) RX again. 141k missed packets (growing) and about 3k cluster allocations denied (constant) currently. TX is fine, but the NIC is unable to process ARP and ICMP reply sent back by the machine on the LAN. I did not test igb(4) as I do not have any on the path I directly control, but the fix in trunk survived a few day of similar testing. - Arnaud [0]: diff --git a/sys/dev/e1000/if_igb.h b/sys/dev/e1000/if_igb.h index 4388e07..adef0af 100644 --- a/sys/dev/e1000/if_igb.h +++ b/sys/dev/e1000/if_igb.h @@ -508,7 +508,7 @@ struct igb_rx_buf { cur |=3D new; \ } -#if __FreeBSD_version < 800504 +#if __FreeBSD_version >=3D 800000 && __FreeBSD_version < 800504 static __inline int drbr_needs_enqueue(struct ifnet *ifp, struct buf_ring *br) {
Hi, What is the point to invent a complex logic to detected a situation the chip warn you about ? The attached patch has currently survived longer than anything I've been ever tested, without dirty hack, like raising `nmbclusters'. Thanks in advance, - Arnaud ps: this is a temporary patch, for my 82754 based system, I am aware there is other user of IMS_ENABLE_MASK. On Wed, Feb 23, 2011 at 2:52 AM, Arnaud Lacombe <lacombar@gmail.com> wrote: > Hi, > > 2011/2/23 beezarliu <beezarliu@yahoo.com.cn>: >> I think you can replace sys/dev/e1000/* with the files in stable/8, >> and then uses this patch. It's much simpler. >> :) >> > agree. I switched to the driver of 8-STABLE, applied your patch > (without any conflict), fixed a bad version check (see [0]), built the > kernel, rebooted, ran the test... and hung em(4) RX again. 141k missed > packets (growing) and about 3k cluster allocations denied (constant) > currently. TX is fine, but the NIC is unable to process ARP and ICMP > reply sent back by the machine on the LAN. > > I did not test igb(4) as I do not have any on the path I directly > control, but the fix in trunk survived a few day of similar testing. > > - Arnaud > > [0]: > diff --git a/sys/dev/e1000/if_igb.h b/sys/dev/e1000/if_igb.h > index 4388e07..adef0af 100644 > --- a/sys/dev/e1000/if_igb.h > +++ b/sys/dev/e1000/if_igb.h > @@ -508,7 +508,7 @@ struct igb_rx_buf { > cur |= new; \ > } > > -#if __FreeBSD_version < 800504 > +#if __FreeBSD_version >= 800000 && __FreeBSD_version < 800504 > static __inline int > drbr_needs_enqueue(struct ifnet *ifp, struct buf_ring *br) > { >
On 2011-02-26 07:52:54, Arnaud Lacombe wrote: >Hi, > >What is the point to invent a complex logic to detected a situation >the chip warn you about ? > >The attached patch has currently survived longer than anything I've >been ever tested, without dirty hack, like raising `nmbclusters'. Sorry, I didn't 'invent' the logic, it's just what rx ring queue works in hardware. You provided another way to detect the hang, which doesn't mean others are meaningless. Raising nmbcluster is not best way, but it can ease the memory shortage, and allcate more for network, which also improve system performance. Beezar
Hi, 2011/2/25 beezarliu <beezarliu@yahoo.com.cn>: > Raising nmbcluster is not best way, but it can ease the memory shortage, > and allcate more for network, which also improve system performance. > sure, but the driver should still be able to deals with such condition without stalling. :) - Arnaud
igb(4) and em(4) have been updated with new watchdog and queue handling functions. This was resolved in my testing on 10.2r and stable/10