Created attachment 190162 [details]
cycle_clock.c

Delays in TCP connections over Gigabit Ethernet; regression from 6.3-RELEASE.

Long description of the use case:

The company I work for develops soft-realtime applications with an internal cycle of about 600 ms (0.6 s). For functional safety, the core of the application runs simultaneously on two computers with different architectures (A - FreeBSD/Intel, B - Linux/PowerPC). A and B are connected through a copper Gigabit switch, with no other devices on the same network.

A normal application cycle looks like this:
1. Process input from object controllers.
2. Cross-compare the input between computers A and B.
3. Make evaluations (simulate complex automata).
4. Produce the output to be sent to the object controllers.
5. Cross-compare the output between A and B.
6. Send the output to the object controllers.
7. Sleep until the end of the cycle.
8. Go to step 1.

The cross-compare steps use a TCP connection over Gigabit Ethernet between A and B. If A or B cannot complete all operations within the allowed time (600 ms +/- 150 ms), the whole system halts due to internal time checks. Yes, UDP might be a better fit - but even with this design, the last released hardware generation works just fine: uptime of about a year per installation, roughly 100 installations worldwide. Moreover, no halts caused by the internal time checks have ever been observed; the rare halts that did occur were caused by other software or hardware defects.

The old release uses an industrial computer with:

CPU: Intel(R) Core(TM)2 Duo CPU L7400 @ 1.50GHz (1500.12-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6fb  Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
....
em0: <Intel(R) PRO/1000 Network Connection Version - 6.7.2> port 0xdf00-0xdf1f mem 0xff880000-0xff89ffff,0xff860000-0xff87ffff irq 16 at device 0.0 on pci4
em0: Ethernet address: 00:30:64:09:6f:ee
....
FreeBSD fspa2 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 04:45:45 UTC 2008 root@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP i386

and this configuration works just fine. I wrote a test application that simulates the main cycle of the complete system; the maximum delay observed is 24 ms over roughly a million cycles.

The test application (running on two different computers) works like this:
1. Establish a TCP connection (one instance listens, the other connects).
2. send() a small fixed amount of data from both sides.
3. recv() the small fixed data.
4. send() 40 KB of data from both sides.
5. recv() 40 KB of data on both sides.
6. Perform a complex but useless computation (to simulate the actual application load).
7. Calculate how long to nanosleep() until the end of the cycle, call nanosleep(), and go back to step 2.

Every operation is bracketed with clock_gettime() calls, and the durations of send(), recv(), evaluate() and nanosleep() are printed out. I will attach the test application sources to this ticket.
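For readers without the attachment, here is a minimal sketch of one measured cycle as described above. The names (elapsed_ms, one_cycle) and the exact output format are my shorthand for this comment, not quotes from the test program; the attached cycle_clock.c is the authoritative source.

------------------
#include <stdio.h>
#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Milliseconds between two CLOCK_MONOTONIC samples. */
static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000L
         + (long)(b->tv_nsec - a->tv_nsec) / 1000000L;
}

/* One measured cycle (steps 2-5 above) over an already established TCP
 * socket 'fd'; error handling and short-read/short-write loops are omitted.
 * evaluate() and the end-of-cycle nanosleep() would follow in the real test. */
static void one_cycle(int fd, long cycle_no)
{
    static char sync_buf[16], data_buf[40 * 1024];
    struct timespec t0, t1, t2, t3, t4;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    send(fd, sync_buf, sizeof(sync_buf), 0);            /* small sync message */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    recv(fd, sync_buf, sizeof(sync_buf), MSG_WAITALL);  /* peer's sync message */
    clock_gettime(CLOCK_MONOTONIC, &t2);
    send(fd, data_buf, sizeof(data_buf), 0);            /* 40 KB of data */
    clock_gettime(CLOCK_MONOTONIC, &t3);
    recv(fd, data_buf, sizeof(data_buf), MSG_WAITALL);  /* peer's 40 KB */
    clock_gettime(CLOCK_MONOTONIC, &t4);

    printf("times %ld: send_sync %ld recv_sync %ld send_data %ld recv_data %ld\n",
           cycle_no, elapsed_ms(&t0, &t1), elapsed_ms(&t1, &t2),
           elapsed_ms(&t2, &t3), elapsed_ms(&t3, &t4));
}
------------------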
For FreeBSD 6.3-RELEASE the output looks like this, running between two identical computers with FreeBSD 6.3 installed (only a subset of the columns is shown):

fspa2# grep times fspa2_fspa2_vpu.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10}' | sort | uniq -c
1115322 send_sync 0 0 0 0
7425 send_sync 0 0 0 1
73629 send_sync 0 1 0 0
1 send_sync 0 13 0 0
66 send_sync 0 2 0 0
1 send_sync 0 24 0 0
27 send_sync 0 3 0 0
17 send_sync 0 4 0 0

As you can see, the maximum delay is 24 milliseconds (0.024 s) and it happened only once.

So now the real problem: using FreeBSD 10.3-RELEASE and much more powerful hardware (a Moxa DA-820 industrial computer):

CPU: Intel(R) Core(TM) i7-3555LE CPU @ 2.50GHz (2494.39-MHz 686-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x28100000<NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0x6600-0x661f mem 0xc0a00000-0xc0a1ffff,0xc0a27000-0xc0a27fff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:90:e8:69:ea:3c

root@fspa2:~/clock/new_res # uname -a
FreeBSD fspa2 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 03:51:29 UTC 2016 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC i386

With the same test application (running between two identical devices, both Moxa DA-820 with FreeBSD 10.3) we get:

root@fspa2:~/clock/new_res # grep times direct_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
155042 send_sync 0 0 0 0
122890 send_sync 0 0 0 1
1 send_sync 0 0 0 100
1 send_sync 0 0 0 102
1 send_sync 0 0 0 111
1 send_sync 0 0 0 12
1 send_sync 0 0 0 125
2 send_sync 0 0 0 13
1 send_sync 0 0 0 130
1 send_sync 0 0 0 131
1 send_sync 0 0 0 133
2 send_sync 0 0 0 136
1 send_sync 0 0 0 148
2 send_sync 0 0 0 149
3 send_sync 0 0 0 150
1 send_sync 0 0 0 156
1 send_sync 0 0 0 159
1 send_sync 0 0 0 16
1 send_sync 0 0 0 161
1 send_sync 0 0 0 17
1 send_sync 0 0 0 176
1 send_sync 0 0 0 19
18 send_sync 0 0 0 2
1 send_sync 0 0 0 229
1 send_sync 0 0 0 23
1 send_sync 0 0 0 24
2 send_sync 0 0 0 25
1 send_sync 0 0 0 26
1 send_sync 0 0 0 28
1 send_sync 0 0 0 282
1 send_sync 0 0 0 29
14 send_sync 0 0 0 3
1 send_sync 0 0 0 30
1 send_sync 0 0 0 31
1 send_sync 0 0 0 32
1 send_sync 0 0 0 37
1 send_sync 0 0 0 38
14 send_sync 0 0 0 4
1 send_sync 0 0 0 40
1 send_sync 0 0 0 41
1 send_sync 0 0 0 43
1 send_sync 0 0 0 45
2 send_sync 0 0 0 46
4 send_sync 0 0 0 49
16 send_sync 0 0 0 5
2 send_sync 0 0 0 53
2 send_sync 0 0 0 57
4 send_sync 0 0 0 58
14 send_sync 0 0 0 59
17 send_sync 0 0 0 6
20 send_sync 0 0 0 60
16 send_sync 0 0 0 61
8 send_sync 0 0 0 62
4 send_sync 0 0 0 63
1 send_sync 0 0 0 64
1 send_sync 0 0 0 67
1 send_sync 0 0 0 68
9 send_sync 0 0 0 7
1 send_sync 0 0 0 70
1 send_sync 0 0 0 72
1 send_sync 0 0 0 79
5 send_sync 0 0 0 8
3 send_sync 0 0 0 80
1 send_sync 0 0 0 81
1 send_sync 0 0 0 82
1 send_sync 0 0 0 84
1 send_sync 0 0 0 89
3 send_sync 0 0 0 9
1 send_sync 0 0 0 90
1 send_sync 0 0 0 93
1 send_sync 0 0 0 95
1 send_sync 0 0 0 97
147 send_sync 0 1 0 0
1 send_sync 0 33 0 0
As you can see, with only 300,000 cycles (versus 1,100,000 cycles in the old configuration) there are multiple cases of long and very long delays, including one of 229 ms.

The only difference from the standard OS configuration is the sysctl kern.timecounter.alloweddeviation=0, because without it nanosleep() returns with a random error of up to 4%.

============================================

I have tried everything I could think of:
1. Tweaking em0 sysctls - does not help. Polling mode works much worse.
2. Replacing the Intel em0 devices with a Realtek re0 device - slightly better, but still produces significant delays.
3. Tweaking kernel sysctls - disabling newer RFC compatibility options - does not help.
4. Replacing the cables and switching to a different switch model straight out of the box - does not help.

If anybody has any ideas how to fix this, please comment here.
After re-reading:
> nanosleep() returns with a random error of up to 4%.
For clarification: I mean it oversleeps by up to ~4% of the 0.6 second interval, i.e. by up to roughly 24 ms.
Did you try multiqueue?
Please attach:
1) /var/run/dmesg.boot from a verbose boot of the Moxa DA-820;
2) the output of "sysctl kern.timecounter";
3) the output of "sysctl kern.eventtimer";
4) the output of "sysctl kern.hz".

If your new system runs with kern.eventtimer.periodic=0 by default, try changing this setting to 1.

Also, why do you use version 10.3, which has an expected EoL of April 30, 2018? 10.4 reaches EoL on October 31, 2018, as does the whole 10.x line. Have you considered trying 11.1-RELEASE? It is much more likely to receive fixes, should any be found necessary.
What syscalls are you measuring delay over? It seems that the column with delay is different between your two sets of samples -- does this mean anything? Thanks.
Additional info: I have tried the following configuration: Xen (CentOS 7 in domain 0) running on the Moxa hardware, with FreeBSD 10.3-RELEASE i386 as a Xen guest and xn0 configured as a bridge. Between two copies of FreeBSD 10.3 running under Xen on different host computers, the test application produces much better results:

grep times fspa2_fspa1_virt.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
9863 send_sync 0 0 0 0
38565 send_sync 0 0 0 1
2 send_sync 0 0 0 10
9 send_sync 0 0 0 11
16 send_sync 0 0 0 12
10 send_sync 0 0 0 13
1 send_sync 0 0 0 17
301599 send_sync 0 0 0 2
1 send_sync 0 0 0 233
698526 send_sync 0 0 0 3
5195 send_sync 0 0 0 4
2770 send_sync 0 0 0 5
1361 send_sync 0 0 0 6
14 send_sync 0 0 0 7
5 send_sync 0 0 0 8
4 send_sync 0 0 0 9
1 send_sync 0 0 1 2
1 send_sync 0 0 2 0
4 send_sync 0 0 2 1
51 send_sync 0 0 2 2
79 send_sync 0 0 2 3
6545 send_sync 0 1 0 0
21849 send_sync 0 1 0 1
1 send_sync 0 1 0 10
1 send_sync 0 1 0 11
1 send_sync 0 1 0 12
39671 send_sync 0 1 0 2
244 send_sync 0 1 0 3
23 send_sync 0 1 0 4
8 send_sync 0 1 0 5
1 send_sync 0 1 0 6
1 send_sync 0 1 2 0
1 send_sync 0 14 0 2
17499 send_sync 0 2 0 0
58073 send_sync 0 2 0 1
1 send_sync 0 2 0 10
3 send_sync 0 2 0 11
2 send_sync 0 2 0 12
115462 send_sync 0 2 0 2
572 send_sync 0 2 0 3
7 send_sync 0 2 0 4
4 send_sync 0 2 0 5
1 send_sync 0 2 0 7
2 send_sync 0 2 0 9
7959 send_sync 0 3 0 0
20669 send_sync 0 3 0 1
1 send_sync 0 3 0 11
1 send_sync 0 3 0 12
49220 send_sync 0 3 0 2
267 send_sync 0 3 0 3
3 send_sync 0 3 0 4
2 send_sync 0 3 0 5
1 send_sync 0 3 1 0
23 send_sync 0 4 0 0
35 send_sync 0 4 0 1
54 send_sync 0 4 0 2
2 send_sync 0 4 0 3
1 send_sync 0 49 0 1
1 send_sync 0 5 0 1
1 send_sync 0 5 0 2
1 send_sync 0 6 0 2
1 send_sync 0 7 0 0
1 send_sync 0 7 0 3
1 send_sync 2 0 0 0
2 send_sync 2 0 0 1
16 send_sync 2 0 0 2
41 send_sync 2 0 0 3
1 send_sync 2 1 0 1
1 send_sync 2 1 0 2

Out of 1,396,680 samples, only 52 are >= 10 ms and only 2 are >= 20 ms. From my point of view, this means the problem is somewhere in the OS<->hardware layer, not in the OS itself.

(In reply to amvandemore from comment #2)
No, I have not tried multiqueue. First, it only gives an advantage under high load, and 40 KB * 3 = 120 KB/s over a 1 Gbit link is a very low load. Second, under 6.3-RELEASE it works fine without multiqueue.

(In reply to Eugene Grosbein from comment #3)
The dmesg output will be added as an attachment.
root@fspa1:~/clock/new_res # sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) i8254(0) HPET(950) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 0
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.tc.TSC-low.frequency: 1247191596
kern.timecounter.tc.TSC-low.counter: 233727093
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 10612645
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 14761
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 3051462975
kern.timecounter.tc.HPET.mask: 4294967295

root@fspa1:~/clock/new_res # sysctl kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: LAPIC(600) HPET(550) HPET1(440) HPET2(440) HPET3(440) HPET4(440) i8254(100) RTC(0)
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.HPET4.quality: 440
kern.eventtimer.et.HPET4.frequency: 14318180
kern.eventtimer.et.HPET4.flags: 3
kern.eventtimer.et.HPET3.quality: 440
kern.eventtimer.et.HPET3.frequency: 14318180
kern.eventtimer.et.HPET3.flags: 3
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET.quality: 550
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.flags: 7
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.LAPIC.frequency: 49887687
kern.eventtimer.et.LAPIC.flags: 7

root@fspa1:~/clock/new_res # sysctl kern.hz
kern.hz: 1000

root@fspa1:~/clock/new_res # sysctl kern.eventtimer.periodic
kern.eventtimer.periodic: 0

I will try changing kern.eventtimer.periodic to 1 and report the results here.

> Also, why do you use version 10.3, which has an expected EoL of April 30, 2018?
> 10.4 reaches EoL on October 31, 2018, as does the whole 10.x line.
> Have you considered trying 11.1-RELEASE? It is much more likely to receive
> fixes, should any be found necessary.

We have a very long verification/release cycle, because this software will be used in critical infrastructure (railway automation). The latest security fixes are not really important, because it will run in a very closed environment. I will try updating to 10.4 and 11.1 and check whether that helps.

(In reply to Conrad Meyer from comment #4)
> What syscalls are you measuring delay over? It seems that the column with
> delay is different between your two sets of samples -- does this mean
> anything? Thanks.

clock_gettime(CLOCK_MONOTONIC, ...); see the attached source code. The delays under 6.3-RELEASE are actually lower (and the evaluate time higher) because 6.3-RELEASE runs on different hardware; the test load (a Fibonacci calculation) executes faster on the modern hardware.
I will try to install 6.3-RELEASE on the Moxa hardware, repeat the test and report the result. Unfortunately, the test needs to run for at least 6 hours before I get reliable results.
Created attachment 190196 [details] dmesg from verbose boot of Moxa DA-820
I now have tests running on the Moxa DA-820 with kern.eventtimer.periodic=1.

One interesting thing: if you look at the tests with 6.3-RELEASE, about 93% of the samples are
1115322 send_sync 0 0 0 0
= send(small data), recv(small data), send(40 KB), recv(40 KB) - so every send() and recv() call completes in <1 ms. About 6% of the samples show
73629 send_sync 0 1 0 0
- so the first recv() takes 1 ms, but that is OK, because it is the first recv() after a nanosleep(~200 ms) and we cannot expect nanosleep() to behave exactly the same on different computers. Moreover, the nanosleep() period is computed from two local clock_gettime(CLOCK_MONOTONIC, ...) calls and a subtraction.

But for 10.3-RELEASE we have an approximately even split between
155042 send_sync 0 0 0 0
122890 send_sync 0 0 0 1
while the case with a delay on the first recv() is below 0.02%:
147 send_sync 0 1 0 0
That trailing "1" is on the second recv() - and at that point the two computers are already synchronized by the first, small recv().

P.S. The test with kern.eventtimer.periodic=1 produces the following results over a short test time:

root@fspa2:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
1484 send_sync 0 0 0 0
1314 send_sync 0 0 0 1
1 send_sync 0 0 0 230
4 send_sync 0 1 0 0

root@fspa1:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
1698 send_sync 0 0 0 0
1134 send_sync 0 0 0 1
1 send_sync 0 0 0 229
11 send_sync 0 1 0 0

A very large delay appeared very quickly. One notable detail:

fspa1:
times 2550: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2551: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2552: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2553: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 2555: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2556: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2557: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2558: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2559: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247

fspb:
times 2550: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2551: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2552: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2553: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 2555: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2556: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2557: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2558: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2559: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247

The problem arises in the SAME cycle on both computers! How is that possible? It looks as if, on one of the computers, both send and receive were blocked (buffered) at some level.
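For context, this is roughly what that end-of-cycle computation looks like - a minimal sketch assuming a 600 ms cycle; the names (sleep_until_cycle_end, CYCLE_NS, cycle_start) are illustrative, not taken from cycle_clock.c:

------------------
#include <time.h>

#define CYCLE_NS 600000000LL   /* assumed 600 ms cycle length */

/* Sleep for whatever is left of the current cycle, based on the
 * CLOCK_MONOTONIC timestamp taken at the start of the cycle. */
static void sleep_until_cycle_end(const struct timespec *cycle_start)
{
    struct timespec now, left;
    long long used_ns, left_ns;

    clock_gettime(CLOCK_MONOTONIC, &now);
    used_ns = (long long)(now.tv_sec - cycle_start->tv_sec) * 1000000000LL
            + (now.tv_nsec - cycle_start->tv_nsec);
    left_ns = CYCLE_NS - used_ns;
    if (left_ns <= 0)
        return;                 /* cycle already overran, skip the sleep */

    left.tv_sec  = (time_t)(left_ns / 1000000000LL);
    left.tv_nsec = (long)(left_ns % 1000000000LL);
    nanosleep(&left, NULL);     /* may oversleep; hence alloweddeviation=0 */
}
------------------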
You may want to do "sysctl kern.timecounter.fast_gettime=0" and re-do your tests because 6.3 did not have fast_gettime mode.
Also, you may want to decrease the influence of the kernel randomness harvester, in case it uses the network as an entropy source on your system. Please show the output of "sysctl kern.random".
Also, if you use the em0 network interface, please attach the output of "sysctl dev.em.0". You may want to disable NIC flow control to eliminate the influence of possible "pause" frames.
(In reply to Eugene Grosbein from comment #8)
I will try with sysctl kern.timecounter.fast_gettime=0 and report here.

(In reply to Eugene Grosbein from comment #9)
root@fspa2:~ # sysctl kern.random
kern.random.sys.harvest.swi: 0
kern.random.sys.harvest.interrupt: 0
kern.random.sys.harvest.point_to_point: 0
kern.random.sys.harvest.ethernet: 0
kern.random.sys.seeded: 1
kern.random.yarrow.slowoverthresh: 2
kern.random.yarrow.slowthresh: 128
kern.random.yarrow.fastthresh: 96
kern.random.yarrow.bins: 10
kern.random.yarrow.gengateinterval: 10
kern.random.live_entropy_sources: Hardware, Intel Secure Key RNG
kern.random.active_adaptor: yarrow
kern.random.adaptors: dummy,yarrow

I have already thought about this and have the following in /etc/sysctl.conf:
kern.random.sys.harvest.swi=0
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.ethernet=0

(In reply to Eugene Grosbein from comment #10)
Currently I am trying it with the (additional) Realtek card. See the attached sysctl.txt.
Created attachment 190201 [details] Full sysctl -a output from Moxa DA-820
The results of the tests with kern.eventtimer.periodic=0 seem very interesting:

A1 computer:
grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
18989 send_sync 0 0 0 0
1557 send_sync 0 0 0 1
3 send_sync 0 0 0 229
352 send_sync 0 1 0 0

A2 computer:
root@fspa2:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
11498 send_sync 0 0 0 0
9621 send_sync 0 0 0 1
3 send_sync 0 0 0 2
4 send_sync 0 0 0 230
20 send_sync 0 1 0 0

root@fspa1:~/clock/new_res # grep times periodic.txt | grep recv_data\ 229
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 7278: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 10424: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
root@fspa1:~/clock/new_res #

For comparison, A1's cycle 18296 (the fourth delayed cycle on A2 below) shows no delay:
times 18296: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248

root@fspa2:~/clock/new_res # grep times periodic.txt| grep recv_data\ 230
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 7278: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 10424: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 18296: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
root@fspa2:~/clock/new_res #

As you can see, in 3 cases the delay was caught on both sides, and in one case only on the A2 computer. Moreover, it seems the A1 computer has 92% all-zero samples:
18989 send_sync 0 0 0 0
1557 send_sync 0 0 0 1
while at the same time the A2 computer has only 54% all-zero samples:
11498 send_sync 0 0 0 0
9621 send_sync 0 0 0 1

I have found a difference in the configuration:

A1:
root@fspa1:~/clock/new_res # cat /boot/loader.conf
boot_verbose="YES"
coretemp_load="YES"
autoboot_delay="-1"
loader_logo="fbsdbw"
kern.ipc.semmns=350
hw.em.txd="4096"
hw.em.rxd="4096"
hw.em.rx_process_limit="-1"
hw.em.enable_msix="0"

A2:
root@fspa2:~/clock/new_res # cat /boot/loader.conf
console="comconsole,vidconsole"
coretemp_load="YES"
autoboot_delay="-1"
loader_logo="fbsdbw"
kern.ipc.semmns="350"
hw.em.txd="4096"
hw.em.rxd="4096"
hw.em.rx_process_limit="-1"
hw.em.enable_msix="0"

Can boot_verbose="YES" or console="comconsole,vidconsole" make such a difference?
(In reply to Eugene Grosbein from comment #10)
Regarding flow control: who would send "pause" frames if I have only two computers (A1 and A2) connected to this switch and nothing else? And the load is quite low, about 40 KB * 3 per second in each direction.
(In reply to Aleksander Derevianko from comment #14)
I would not be so trusting of the correctness of the flow control implementation in either the NIC firmware or the switch. With such a low traffic volume, you can turn flow control off without a problem.
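If the em(4) driver on that system exposes the usual flow-control knob, something like the following would turn it off. Verify the exact name against your "sysctl dev.em.0" output first - I am assuming the common dev.em.N.fc tunable here, not quoting your attachment:

# 0 = no flow control, 1 = rx pause only, 2 = tx pause only, 3 = full
sysctl dev.em.0.fc=0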
(In reply to Aleksander Derevianko from comment #13)
> Can boot_verbose="YES" or console="comconsole,vidconsole" make such a difference?
It can, if your system writes something to the console during the test.
(In reply to Aleksander Derevianko from comment #13) Why do you disable MSI-X support for em interfaces?
(In reply to Eugene Grosbein from comment #16)
No, it does not write anything to the console. The output of the test itself is redirected to a file on disk (SSD, 100 GB INTEL SSDSA2BZ100G3 6PB10362). The results were checked from a remote ssh session.

(In reply to Eugene Grosbein from comment #17)
I switched MSI-X support off in one of the previous attempts to solve this issue. It does not change anything at all in my test case.

-------------------
OK, current results from the still-running test:

Computer A1:
root@fspa1:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
212603 send_sync 0 0 0 0
6437 send_sync 0 0 0 1
10 send_sync 0 0 0 229
4177 send_sync 0 1 0 0
1 send_sync 0 1 0 1

Computer A2:
root@fspa2:~/clock/new_res # !459
grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
122102 send_sync 0 0 0 0
100857 send_sync 0 0 0 1
47 send_sync 0 0 0 2
12 send_sync 0 0 0 230
2 send_sync 0 0 0 231
170 send_sync 0 1 0 0

I set kern.timecounter.fast_gettime=0 yesterday on both A1 and A2, and over the last 12 hours 6 more cases of long delays appeared on A1 and 8 on A2.

I will try to repeat this test with console="comconsole,vidconsole" removed on both the A1 and A2 computers.
With console="comconsole,vidconsole" removed on both the A1 and A2 computers:

root@fspa1:~/clock/new_res # grep times periodic_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
5116 send_sync 0 0 0 0
2234 send_sync 0 0 0 1
1 send_sync 0 0 0 229
64 send_sync 0 1 0 0

root@fspa2:~/clock/new_res # grep times periodic_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
4000 send_sync 0 0 0 0
3235 send_sync 0 0 0 1
1 send_sync 0 0 0 230
6 send_sync 0 1 0 0

Again, the delay occurs in the same cycle on both machines.

----------------------------------------------------

With boot_verbose="YES":

root@fspa1:~/clock/new_res # grep times periodic_2.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
6539 send_sync 0 0 0 0
794 send_sync 0 0 0 1
2 send_sync 0 0 0 229
125 send_sync 0 1 0 0

root@fspa2:~/clock/new_res # grep times periodic_2.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
4210 send_sync 0 0 0 0
3249 send_sync 0 0 0 1
2 send_sync 0 0 0 230
6 send_sync 0 1 0 0
Now with the em0 devices:

A1 computer:
root@fspa1:~/clock/new_res # grep times periodic_em0.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
11256 send_sync 0 0 0 0
10968 send_sync 0 0 0 1
234 send_sync 0 1 0 0
1 send_sync 0 1 0 1

A2 computer:
root@fspa2:~/clock/new_res # grep times periodic_em0.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
18284 send_sync 0 0 0 0
2490 send_sync 0 0 0 1
1704 send_sync 0 1 0 0

The current results look just fine. I think the rare 230 ms delays in the previous tests were actually a problem with the Realtek card - it is a cheap TP-Link TG-3468 device - so if it keeps working fine with the built-in em0, I will consider the problem solved. The actual solution seems to be kern.eventtimer.periodic=1.

I will try setting only the kern.eventtimer.periodic=1 and kern.timecounter.alloweddeviation=0 parameters, leaving everything else at system defaults.
OK, problem solved. A 12-hour test does not show any delays longer than 1 ms.

For future reference, only the following parameters need to be set in /etc/sysctl.conf:
--------------------------------------------------
# Exact clock for nanosleep()
kern.timecounter.alloweddeviation=0
# Disable random delays in network/adapter code
kern.eventtimer.periodic=1
--------------------------------------------------
All other parameters (including /boot/loader.conf) can be left at system defaults.

Closing the defect as "Works As Intended", because it can be solved purely through OS configuration. Thanks everybody for the good support and good advice!
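If it helps anyone reproducing this: both knobs are ordinary sysctls, so as far as I can tell they can also be flipped at runtime for a quick test before being made persistent in /etc/sysctl.conf:

sysctl kern.timecounter.alloweddeviation=0
sysctl kern.eventtimer.periodic=1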