Created attachment 194064 [details]
The patch against 12.0-CURRENT r320599

Under FreeBSD norquay.restart.bel 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r320599M
with the patches attached, I am running mpd5 to connect to the internet.

mpd.conf:

#--- RestartSoft --- ADSL with PPPoE ---

startup:
	# enable TCP-Wrapper (hosts_access(5)) to block unfriendly clients
	set global enable tcp-wrapper
	set user hlh guru admin
	# configure the console
	set console self 0.0.0.0 5005
	set console open
	# configure the web interface
	set web self 0.0.0.0 5006
	set web open

default:
	load scarlet

scarlet:
	create bundle static B0
	set bundle enable compression

	# Configure IP addressing -- we get both endpoint assignments from peer.
	set ipcp ranges 0.0.0.0/0 0.0.0.0/0
	set ipcp yes vjcomp

	# Configure the interface: default route, idle timeout.
	set iface route default
	set iface enable tcpmssfix
	set iface disable on-demand
	set iface idle 0
	set iface up-script /usr/local/etc/mpd5/adsl.linkup
	set iface down-script /usr/local/etc/mpd5/adsl.linkdown

	# Configure the (only) link. We expect to be authenticated by peer.
	create link static L0 pppoe
	set link action bundle B0
	set auth authname "hh364660@SCARLET"
	set link max-redial 0
	set link mtu 1492
	set link keep-alive 10 60
	set link no acfcomp protocomp
	set link disable chap pap
	set link accept chap
	set pppoe iface awg0
#	set pppoe service "SCARLET"
	open

I have a throughput of 2.3 MB/s.

With the same configuration, without any patch, under FreeBSD 12.0-CURRENT #0 r334436 I have a throughput of 10 KB/s :-o

Any clue?

Thanks
Henri
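P.S. For reference, mpd5 is started here via the rc script installed by the port; a minimal sketch, assuming the stock net/mpd5 rc knobs:

  # Enable mpd5 at boot and start it (rc script from net/mpd5 assumed)
  sysrc mpd_enable="YES"
  service mpd5 start

  # Check that the PPPoE session came up and ng0 got its addresses
  ifconfig ng0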
(In reply to hlh from comment #0)

These patches have nothing to do with your PPPoE throughput problem. Please add more details:

1) hardware you use to run mpd5;
2) exact package (or port) version of mpd5;
3) nominal speed of your PPPoE connection as written in the provider's contract;
4) output of "ifconfig -a" while the PPPoE connection is established (attach it);
5) add to the startup section of mpd.conf:

   log +auth +bund +iface +iface2 +link

Make sure mpd's logs are stored to a file by adding to /etc/syslog.conf:

   !mpd
   *.*	/var/log/mpd.log

Don't forget to apply the changes:

   touch /var/log/mpd.log && killall -1 syslogd

Then restart mpd5 and attach the contents of /var/log/mpd.log after the PPPoE connection re-establishes.
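Put together, that logging setup looks roughly like this (a sketch only; "service mpd5 restart" assumes the rc script installed by the net/mpd5 port):

  # Append the syslog.conf program block for mpd (same effect as editing the file by hand)
  printf '!mpd\n*.*\t/var/log/mpd.log\n' >> /etc/syslog.conf

  # Create the log file and make syslogd re-read its configuration
  touch /var/log/mpd.log
  killall -1 syslogd

  # Restart mpd5 so the new "log ..." statement takes effect
  service mpd5 restart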
(In reply to Eugene Grosbein from comment #1)

1) Hardware is a pine64+ 2G with the root filesystem on 2 USB disks under ZFS

2) mpd5-5.8_3

   Name           : mpd5
   Version        : 5.8_3
   Installed on   : Thu Jun  7 11:06:27 2018 CEST
   Origin         : net/mpd5
   Architecture   : FreeBSD:12:aarch64
   Prefix         : /usr/local
   Categories     : net
   Licenses       : BSD3CLAUSE
   Maintainer     : eugen@FreeBSD.org
   WWW            : https://sourceforge.net/projects/mpd/
   Comment        : Multi-link PPP daemon based on netgraph(4)
   Options        :
           NG_IPACCT      : off

3) Download: 50.0 Mbps / Upload: 4.0 Mbps

4) awg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
        ether 02:ba:f5:4d:34:f7
        inet 192.168.24.1 netmask 0xffffff00 broadcast 192.168.24.255
        inet 192.168.23.127 netmask 0xffffff00 broadcast 192.168.23.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
   lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
   pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33160
        groups: pflog
   gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        options=80000<LINKSTATE>
        tunnel inet 212.239.130.5 --> 5.135.182.190
        inet 192.168.25.127 --> 192.168.25.1 netmask 0xffffff00
        groups: gif
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
   ng0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1492
        inet 212.239.130.5 --> 212.239.130.1 netmask 0xffffffff
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

5) see attachment
Created attachment 194068 [details] /var/log/mpd.log
(In reply to hlh from comment #3)

Thanks, this data is very helpful. It shows there are no problems in your setup or in the PPPoE negotiation process: mpd5 does its job just fine, configuring the kernel's internal netgraph structures and the network interface ng0. From there it is the kernel's duty to forward the traffic, and the hardware/drivers must pass it flawlessly.

The problem may also be at the provider's side. Do you have the same throughput problem if you connect other equipment to the physical link and set up the PPPoE connection there? I mean, some i386/amd64 hardware running FreeBSD 11, or a Linux or Windows system. Can you saturate your 50 Mbps download rate then? If not, it is likely an uplink problem. If yes, it is a problem of the FreeBSD-12/aarch64 kernel.

How do you measure throughput through this mpd5 router?
(In reply to Eugene Grosbein from comment #4)

When I boot the same configuration with 12.0-CURRENT r320599M with the patches attached (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220140), I get good throughput. So it must be a kernel problem.

I check the throughput by simply downloading a file with Firefox and watching the download speed (e.g. https://download.freebsd.org/ftp/releases/arm64/aarch64/ISO-IMAGES/11.2/FreeBSD-11.2-RC1-arm64-aarch64-memstick.img).
(In reply to hlh from comment #5)

You have something else besides these patches that influences the throughput, as nothing in https://bz-attachments.freebsd.org/attachment.cgi?id=194064 can do that. Perhaps another patch you have for if_awg does.

So, you perform the tests from an internal machine and the traffic goes through pf/NAT, right? If so, could you please temporarily disable pf and use fetch(1) (or wget, or just ftp) directly from the router to download a large file to /dev/null:

  fetch -o /dev/null $url

Use "systat -ifstat" to monitor the average and peak download speed meanwhile. This would allow us to rule out possible pf/NAT problems.
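Put together, the test would look something like this (a sketch; $url stands for any large file reachable over the PPPoE link):

  # Temporarily disable pf so NAT/filtering is out of the picture
  pfctl -d

  # Download a large file straight from the router, discarding the data
  fetch -o /dev/null $url

  # In another terminal, watch per-interface traffic (look at the ng0 line)
  systat -ifstat

  # Re-enable pf afterwards
  pfctl -e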
(In reply to Eugene Grosbein from comment #6)

Under 12.0-CURRENT #0 r334436 I disable pf (pfctl -F all), fetch a big file on the gateway (norquay) and get:

  ng0  in   Mean: 27.105 KB/s   Peak: 52.624 KB/s

To try to rule out an if_awg problem, I download a big file from the gateway (norquay) to an internal machine and get good throughput (3 MB/s).

With 12.0-CURRENT #0 r320599M I get:

  ng0  in   Mean: 9.211 MB/s   Peak: 9.534 MB/s

To be exhaustive, on r320599M I have this patch against if_awg.c:

Index: sys/arm/allwinner/if_awg.c
===================================================================
--- sys/arm/allwinner/if_awg.c  (revision 320599)
+++ sys/arm/allwinner/if_awg.c  (working copy)
@@ -92,7 +92,7 @@
 #define TX_SKIP(n, o)   (((n) + (o)) & (TX_DESC_COUNT - 1))
 #define RX_NEXT(n)      (((n) + 1) & (RX_DESC_COUNT - 1))
 
-#define TX_MAX_SEGS     10
+#define TX_MAX_SEGS     20
 
 #define SOFT_RST_RETRY  1000
 #define MII_BUSY_RETRY  1000
@@ -419,14 +419,18 @@
             sc->tx.buf_map[index].map, m, segs, &nsegs, BUS_DMA_NOWAIT);
         if (error == EFBIG) {
                 m = m_collapse(m, M_NOWAIT, TX_MAX_SEGS);
-                if (m == NULL)
+                if (m == NULL) {
+                        device_printf(sc->miibus, "awg_setup_txbuf: m_collapse failed\n");
                         return (0);
+                }
                 *mp = m;
                 error = bus_dmamap_load_mbuf_sg(sc->tx.buf_tag,
                     sc->tx.buf_map[index].map, m, segs, &nsegs, BUS_DMA_NOWAIT);
         }
-        if (error != 0)
+        if (error != 0) {
+                device_printf(sc->miibus, "awg_setup_txbuf: bus_dmamap_load_mbuf_sg failed\n");
                 return (0);
+        }
 
         bus_dmamap_sync(sc->tx.buf_tag, sc->tx.buf_map[index].map,
             BUS_DMASYNC_PREWRITE);
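For reference, the patch is applied and the kernel rebuilt in the usual way; a rough sketch (the patch file name is an example and the kernel config is assumed to be GENERIC):

  # Apply the if_awg.c patch to the source tree
  cd /usr/src
  patch -p0 < /path/to/if_awg-tx.diff

  # Rebuild and install the kernel, then reboot into it
  make buildkernel KERNCONF=GENERIC
  make installkernel KERNCONF=GENERIC
  shutdown -r now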
It seems head r334436 has some kind of regression compared with r320599. Please re-do the test with r334436 and this time monitor "top -iFUP" output to see whether there is excessive CPU load during the download.
(In reply to Eugene Grosbein from comment #8)

Strange results during the fetch:

With r334436 the CPU is idle the whole time: idle 99%
With r320599 the CPU shows: user 3%, system 8%, interrupt 11%
(In reply to hlh from comment #9)

I suspect there is some kind of hardware/driver problem, e.g. delayed or missed interrupts. Please collect more information:

1) perform a verbose boot, if possible, and save /var/run/dmesg.boot after the boot has completed;
2) re-do the test and save the output of "vmstat -ai" after the test has completed.

Do that for both revisions and attach all the data plus a "diff -u" of the corresponding files. One way to collect it is sketched below.
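A sketch only; boot_verbose in loader.conf is just one way to get a verbose boot, and the file names are arbitrary:

  # Ask loader(8) for a verbose boot on the next reboot, then reboot
  sysrc -f /boot/loader.conf boot_verbose="YES"
  shutdown -r now

  # After the system is back up, save the verbose boot messages
  cp /var/run/dmesg.boot /root/dmesg.boot.r334436

  # Re-run the download test, then snapshot the interrupt counters
  fetch -o /dev/null $url
  vmstat -ai > /root/vmstat-ai.r334436

  # After repeating the above on the other revision:
  diff -u /root/dmesg.boot.r320599 /root/dmesg.boot.r334436
  diff -u /root/vmstat-ai.r320599 /root/vmstat-ai.r334436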
Created attachment 194150 [details] Verbose boot of r334436
Created attachment 194151 [details] Verbose boot of r320599
Created attachment 194152 [details] vmstat -ai of r334436
Created attachment 194153 [details] vmstat -ai of r320599
(In reply to Eugene Grosbein from comment #10)

On r334436 I cancelled the fetch after more than 10 MB had been downloaded.
On r320599 I fetched the complete 854 MB.

The files seem too different for a diff to be useful.
A unified diff of the dmesg.boot files is still useful; the lack of it won't make it any easier for an arm64 developer to understand the problem. Interrupts are assigned in very different ways in those revisions. Is it exactly the same piece of hardware in both cases?

We still need "vmstat -ai" output for the same volume of data transferred. Make a 10 MB file or so and repeat the test, please.

Does r334436 have any problems with time management, e.g. a wall clock that runs too slowly? Can you start ntpd, wait 20-60 minutes, then show the contents of the drift file /var/db/ntpd.drift?
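Roughly like this (a sketch; it assumes ntpd is configured in /etc/ntp.conf and that its logfile points at /var/log/ntp.log):

  # Start ntpd (even if it is not enabled in rc.conf) and let it run for a while
  service ntpd onestart
  sleep 3600

  # The drift file holds the estimated clock frequency error in PPM
  cat /var/db/ntpd.drift

  # Steps and spikes show up in ntpd's log
  egrep 'spike_detect|clock_step' /var/log/ntp.log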
Seems to be an ARM-specific problem.
Created attachment 194156 [details] diff -u verbose boot r334436 r320599
(In reply to Eugene Grosbein from comment #16)

The tests are run on the same piece of hardware: I boot rpool/ROOT/current (r334436) and rpool/ROOT/default (r320599).

There is a time management problem with r320599 _and_ r334436:

egrep 'spike_detect|clock_step' /var/log/ntp.log
15 Apr 09:47:50 ntpd[34364]: 0.0.0.0 0613 03 spike_detect -178.950279 s
15 Apr 09:47:51 ntpd[34364]: 0.0.0.0 061c 0c clock_step -178.950180 s
28 Apr 07:42:57 ntpd[39486]: 0.0.0.0 0613 03 spike_detect -178.959592 s
28 Apr 08:17:50 ntpd[39486]: 0.0.0.0 061c 0c clock_step -178.960942 s
29 Apr 07:50:51 ntpd[1811]: 0.0.0.0 0613 03 spike_detect -178.956145 s
29 Apr 08:01:38 ntpd[1811]: 0.0.0.0 061c 0c clock_step -178.958064 s
 2 May 13:40:24 ntpd[31174]: 0.0.0.0 0613 03 spike_detect -178.954557 s
 2 May 13:40:49 ntpd[31174]: 0.0.0.0 061c 0c clock_step -178.954612 s
 2 May 16:56:36 ntpd[31174]: 0.0.0.0 0613 03 spike_detect -357.918704 s
 2 May 17:09:24 ntpd[31174]: 0.0.0.0 061c 0c clock_step -357.918942 s
 2 May 20:34:21 ntpd[4932]: 0.0.0.0 c61c 0c clock_step -178.957489 s
 6 May 23:02:35 ntpd[20790]: 0.0.0.0 0613 03 spike_detect -178.955371 s
 6 May 23:36:10 ntpd[20790]: 0.0.0.0 061c 0c clock_step -178.955510 s
 7 May 04:50:38 ntpd[20790]: 0.0.0.0 0613 03 spike_detect -178.959438 s
 7 May 05:25:50 ntpd[20790]: 0.0.0.0 061c 0c clock_step -178.958614 s
 9 May 10:19:27 ntpd[52422]: 0.0.0.0 c61c 0c clock_step -178.957618 s
 9 May 16:48:19 ntpd[1046]: 0.0.0.0 c61c 0c clock_step -1.479122 s
13 May 04:47:30 ntpd[26110]: 0.0.0.0 0613 03 spike_detect -178.954317 s
13 May 05:07:12 ntpd[26110]: 0.0.0.0 061c 0c clock_step -178.952353 s
16 May 03:43:56 ntpd[51054]: 0.0.0.0 0613 03 spike_detect -178.958443 s
16 May 03:56:20 ntpd[51054]: 0.0.0.0 061c 0c clock_step -178.958603 s
20 May 11:08:27 ntpd[91061]: 0.0.0.0 0613 03 spike_detect -178.956436 s
20 May 11:19:35 ntpd[91061]: 0.0.0.0 061c 0c clock_step -178.957825 s
20 May 23:01:53 ntpd[88991]: 0.0.0.0 0613 03 spike_detect -178.953364 s
20 May 23:04:28 ntpd[88991]: 0.0.0.0 061c 0c clock_step -178.953372 s
22 May 09:13:28 ntpd[10332]: 0.0.0.0 0613 03 spike_detect -178.958062 s
22 May 09:19:54 ntpd[10332]: 0.0.0.0 061c 0c clock_step -178.957215 s
 2 Jun 10:35:09 ntpd[25713]: 0.0.0.0 0613 03 spike_detect -178.955744 s
 2 Jun 10:44:25 ntpd[25713]: 0.0.0.0 061c 0c clock_step -178.956775 s
 3 Jun 03:02:01 ntpd[25713]: 0.0.0.0 0613 03 spike_detect -178.953073 s
 3 Jun 03:12:24 ntpd[25713]: 0.0.0.0 061c 0c clock_step -178.953418 s
 7 Jun 11:32:29 ntpd[1842]: 0.0.0.0 c61c 0c clock_step +73.709562 s
 7 Jun 18:04:57 ntpd[1031]: 0.0.0.0 c61c 0c clock_step +63.829980 s
 8 Jun 11:13:28 ntpd[1031]: 0.0.0.0 0613 03 spike_detect -178.959113 s
 8 Jun 11:40:17 ntpd[1031]: 0.0.0.0 061c 0c clock_step -178.958836 s
 8 Jun 16:36:02 ntpd[1042]: 0.0.0.0 c61c 0c clock_step +11.228134 s
 9 Jun 11:08:50 ntpd[1034]: 0.0.0.0 c61c 0c clock_step +24.488987 s
11 Jun 10:43:33 ntpd[1038]: 0.0.0.0 c61c 0c clock_step +3.852690 s

The clock_step entries smaller than 30 s occur because after a power off I must do a boot -s, set the current time with a date command, and then resume the boot in multi-user mode.

Note that the drift is always n*178.9 seconds with n = 1 or 2. This occurs when I run something like `periodic daily`.

I will do the fetch of a 10 MB file.
Created attachment 194165 [details] vmstat -ai of r334436 after fetch of 10329777 bytes
Created attachment 194166 [details] vmstat -ai of r320599 after fetch of 10329777 bytes
Created attachment 194167 [details] diff -u vmstat r334436 r320599 after fetch of 10329777 bytes
Timing problems seem to be the first suspect for me. We need more eyes from ARM people here.
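If an arm64 developer wants a quick look at the timekeeping setup, this is the kind of data worth collecting (a sketch, standard sysctls only; nothing here is specific to this bug):

  # Which timecounter the kernel chose and which ones are available
  sysctl kern.timecounter.hardware kern.timecounter.choice

  # Which event timer drives the clock interrupts
  sysctl kern.eventtimer.timer kern.eventtimer.choice

  # Interrupt counters for timer/clock sources
  vmstat -ai | grep -i -e timer -e clock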
There are no patches here (yet) solving the problem.
I tried with 12.0-CURRENT r336112 and the patch from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229644. The throughput regression is still there.
I tried 12.0-ALPHA2 r338128M with the a10_timer patch - see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229644#c27. The throughput is back to normal :-) I think this bug is solved.
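For completeness, this is how I double-check that both the throughput and the timing side are fixed (a sketch; $url is any large file reachable over the PPPoE link):

  # Throughput check: watch the ng0 line while fetching a large file
  fetch -o /dev/null $url &
  systat -ifstat

  # Timekeeping check: the drift should stay small and no new clock_step
  # entries should appear in ntpd's log
  cat /var/db/ntpd.drift
  egrep 'spike_detect|clock_step' /var/log/ntp.log | tail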
Closed at submitter's request. Thank you for notification.