I'm running 'sysctl net.inet.tcp.hostcache.list' from operator's crontab every 5 minutes to gather TCP statistics. After 3 or 4 months of operation sysctl hangs. kill (and kill -9) doesn't help. Running another "sysctl net.inet.tcp.hostcache.list" command gives the same result - it hangs forever with no output. Other "sysctl SOMETHING" commands work fine.

# ps ax | grep sysctl
operator 787 0.0 0.0 4236 1004 - D Tue16 2:42.96 /sbin/sysctl net.inet.tcp.hostcache.list
operator 1908 0.0 0.0 4236 1004 - D Mon02 2:45.74 /sbin/sysctl net.inet.tcp.hostcache.list
operator 2358 0.0 0.0 4236 1004 - D Wed10 1:33.93 /sbin/sysctl net.inet.tcp.hostcache.list
operator 3270 0.0 0.0 4236 1004 - D Sun16 2:47.22 /sbin/sysctl net.inet.tcp.hostcache.list

After a reboot 'sysctl net.inet.tcp.hostcache.list' works again as expected for another 3 or 4 months. The issue reproduces on about 60 heavily loaded http(s) servers with different hardware specs.
What is the output of procstat -kk PID, where PID is the PID of a hanging sysctl process?
It will take some time to reproduce the issue. Please let me know if there is any other information I can collect when sysctl hangs again.
(In reply to Maxim Shalomikhin from comment #2) I mentioned this issue on the bi-weekly transport call, since I don't see how this problem can occur. It was asked whether the problem could be a hanging console, TCP connection, or whatever the output stream is going to.
I just noticed that sysctl actually performs the sysctl syscall twice. First with no buffer space, expecting to receive the size to allocate for a buffer; then it allocates twice as much buffer (apparently "for reasons") before doing the sysctl syscall a second time. As the system continues to run, I am curious how large the list of hostcache entries is just prior to the "freeze" (of sysctl), and whether the system may be space-constrained. Note that during the call, both kernel and userspace need memory (twice as much in userspace). Userspace seems to use a non-blocking malloc call, after the first round where the return length is checked. Possibly the system has a hard time allocating a sufficiently large chunk of memory, if the hostcache is fully utilized and extremely busy... In sbuf_new, the kernel is trying a malloc(size, M_SBUF, M_WAITOK|M_ZERO)...

Perhaps the sysctl callback procedure could be improved to
a) not actually allocate memory when the user has not yet provided a buffer to fill - only return the required size (reducing the chances of running out of space somewhat);
b) not allocate a huge chunk in sbuf_new right away, but use sbuf_extend repeatedly (although sbuf.c will actually grab a larger chunk of memory, copy the data over, and release the old chunk - thus driving temporary memory requirements higher);
c) use a different memory model supporting disjoint segments of a C string scattered in smaller memory chunks, subsequently concatenating them on the SYSCTL_OUT;
d) change the M_WAITOK in /kern/subr_sbuf.c#59 to M_NOWAIT and expect an error on very busy systems every once in a while.

As a reasonable short-term fix, I guess a combination of M_NOWAIT in sbuf.c and use of sbuf_new(smaller size) / sbuf_extend(smaller size) may be the prudent approach, if this turns out to be the culprit.
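For reference, the userspace two-phase pattern described above looks roughly like this (a minimal sketch using sysctlbyname(3); not the actual sbin/sysctl source):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	const char *name = "net.inet.tcp.hostcache.list";
	size_t len = 0;
	char *buf;

	/* First syscall: NULL buffer, the kernel only reports the required size. */
	if (sysctlbyname(name, NULL, &len, NULL, 0) == -1)
		err(1, "sysctlbyname (size probe)");

	/* Over-allocate, since the list may grow between the two calls. */
	len *= 2;
	if ((buf = malloc(len)) == NULL)
		err(1, "malloc");

	/* Second syscall: the kernel fills the buffer (and may block in sbuf_new). */
	if (sysctlbyname(name, buf, &len, NULL, 0) == -1)
		err(1, "sysctlbyname (fetch)");

	fwrite(buf, 1, len, stdout);
	free(buf);
	return (0);
}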
(Note that with sbuf_extend, the prior string could be overwritten with a more detailed error, e.g. "failed to extend sbuf at the 20th iteration"...)
Also, as a quick workaround: what is the output of sysctl net.inet.tcp.hostcache? The above problem of waiting for a contiguous chunk of kernel memory could be reduced by restricting net.inet.tcp.hostcache.cachelimit - the output tries to allocate (net.inet.tcp.hostcache.count+1)*128 bytes. With a full hostcache this is almost a 2 MB memory chunk (on my system, the limit is 15360).
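Worked out with the limit quoted above: (15360 + 1) * 128 = 1,966,208 bytes, i.e. just under 2 MB, matching the figure mentioned.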
(In reply to Richard Scheffenegger from comment #6) That might be the reason... Good idea! So it would be good to see the output of sysctl net.inet.tcp.hostcache and vmstat -m | grep sbuf
(In reply to Michael Tuexen from comment #7) Just to be clear: you can provide these values now, you don't need to wait until the hang occurs.
See https://reviews.freebsd.org/D29471 for improving the situation by addressing (b) mentioned above. Note that this may not fully alleviate the issue (I did not investigate the inner workings when sbin/sysctl is only checking how much buffer memory should be provided), but this is much more in line with what other, similar SYSCTL_PROC handlers do when providing an indeterminate amount of output.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=cb0dd7e122b8936ad61a141e65ef8ef874bfebe5 commit cb0dd7e122b8936ad61a141e65ef8ef874bfebe5 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-28 21:12:03 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-03-28 21:50:23 +0000 tcp: reduce memory footprint when listing tcp hostcache In tcp_hostcache_list, the sbuf used would need a large (~2MB) blocking allocation of memory (M_WAITOK), when listing a full hostcache. This may stall the requestor for an indeterminate time. A further optimization is to return the expected userspace buffersize right away, rather than preparing the output of each current entry of the hostcase, provided by: @tuexen. This makes use of the ready-made functions of sbuf to work with sysctl, and repeatedly drain the much smaller buffer. PR: 254333 MFC after: 2 weeks Reviewed By: #transport, tuexen Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29471 sys/netinet/tcp_hostcache.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
(In reply to commit-hook from comment #10) If it is possible, please patch your system with the above patch and give it a try. My guess is that it resolves your issue, but one doesn't know without testing. Please let us know if patching a system is an option for you.
I have one machine with a hanging sysctl:

99750 - D 22:33.32 /sbin/sysctl net.inet.tcp.hostcache.list

# procstat -kk 99750
PID TID COMM TDNAME KSTACK
99750 101008 sysctl - mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x115 vmem_xalloc+0x44b vmem_alloc+0x31 kmem_malloc+0x33 uma_large_malloc+0x46 malloc+0x3d sbuf_new+0x105 sysctl_tcp_hc_list+0x6d sysctl_root_handler_locked+0x8a sysctl_root+0x1ee userland_sysctl+0x134 sys___sysctl+0x5f amd64_syscall+0xa4e fast_syscall_common+0x101

As for cachelimit: yes, we have a large number set, but I noticed that when sysctl hangs, hostcache.count goes crazy big (and stops near 2^32, at 4294966447):

# sysctl net.inet.tcp.hostcache
net.inet.tcp.hostcache.purgenow: 0
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.hostcache.prune: 300
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.count: 4294966447
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.hashsize: 65536
net.inet.tcp.hostcache.cachelimit: 1966080
net.inet.tcp.hostcache.enable: 1

# vmstat -m | grep sbuf
sbuf 1 1K - 68235 16,32,64,128,256,512,1024,2048,4096,8192,32768,65536
mrsasbuf 949 2860K - 951 32,128,256,2048,8192

Usually, hostcache.count is about ~500k. So, is it just my fault setting cachelimit too high, or is this a bug in the hostcache.count accounting?
Thanks. I cannot comment on why hostcache.count counts up and beyond the limit. But the other data confirms that what you are seeing is the (unsuccessful) attempt to allocate a huge amount of kernel memory by the sbuf_new function. When trying to dump all the entries of the hostcache, hostcache.list tries to grab hostcache.cachelimit * 128 bytes, or 1966080 * 128 ~= 250 MB of contiguous kernel memory - twice (!). (Also, if the count really is > cachelimit, hostcache.list may eventually simply fail due to insufficient memory...)

The following diffs are under review, but should address these particular issues:
o) immediately respond with the required buffer size, without actually allocating/preparing that buffer;
o) allocate only one bucket's worth of output and move the output bucket-by-bucket to userspace (reducing the temporarily required memory footprint from 2x 250 MB down to 1x 4 kB).

See https://reviews.freebsd.org/D29471 https://reviews.freebsd.org/D29481 https://reviews.freebsd.org/D29483

Patching with only D29471 should mostly address the issue, although there remain issues around keeping a lock for an extensive period of time while moving the output to userspace repeatedly. This may have undesired side effects, so you probably want to go with all three.

To "unstuck" the system in this state - refrain from issuing hostcache.list multiple times, and try to free up the above-mentioned chunk of in-kernel memory (at least temporarily). That should allow all the stalled malloc processes to eventually succeed and return properly. But on systems with a high uptime or kernel memory churn, and a very large hostcache.cachelimit, it really is only a question of time until such a huge malloc blocks "indefinitely" - thus the above patches try to be much smarter about what kernel memory is really needed. The downside is that there is a higher chance that hash buckets may change more than they would with that huge memory allocation (if successful). But hostcache.list does not provide a "complete" snapshot of all the entries in the hostcache at a very specific moment in time - they may already change between the evaluation of different buckets (but the longer time for moving data to userspace may allow more changes to happen).
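For reference, the figure above works out to 1,966,080 * 128 = 251,658,240 bytes, i.e. roughly 240 MiB of contiguous kernel memory per allocation attempt - and, as noted, it is attempted twice.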
I agree with Richard that we know the issue and that the patches from Richard should address the problem. However, I would like to understand why net.inet.tcp.hostcache.count > net.inet.tcp.hostcache.cachelimit can happen. Could you post exactly which tunings you do (in loader.conf and/or sysctl.conf) in relation to the hostcache?
Could you also post, from the output of netstat -sptcp, the two lines with the counters for hostcache entries added and bucket overflow?
Please note that you should not be able to set net.inet.tcp.hostcache.cachelimit too high. It is set automatically to net.inet.tcp.hostcache.bucketlimit * net.inet.tcp.hostcache.hashsize and you should only be able to reduce it. If you provide a larger value in /boot/loader.conf, the setting should be ignored and the above default should be used. The settings you show indicate that you set net.inet.tcp.hostcache.hashsize to 65536, leave net.inet.tcp.hostcache.bucketlimit at the default of 30, and net.inet.tcp.hostcache.cachelimit is set automatically. However, net.inet.tcp.hostcache.count should always be between 0 (inclusive) and net.inet.tcp.hostcache.cachelimit (inclusive). So I would really like to understand how you end up with such a large net.inet.tcp.hostcache.count. This really blows up the sysctl operation, since the code wants to allocate linesize * (V_tcp_hostcache.cache_count + 1) bytes, which is much higher than the limit Richard computed...
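That default indeed matches the reported value exactly: 30 (bucketlimit) * 65536 (hashsize) = 1,966,080 = net.inet.tcp.hostcache.cachelimit.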
net.inet.tcp.hostcache.count: 4294966447 is -849 when read as a signed 32-bit value. tcp_hc_purge_internal decrements hostcache.count unconditionally, and would be the most likely candidate where that counter can rapidly go negative (becoming a huge uint32). All the adjustments to hch_length, cache_count and actual adds/deletes from the hash buckets appear to be symmetric.

One more hint (more operational): if you have high "hostcache buffer-overflows" in 'netstat -snp tcp', you may want to tweak the bucket size rather than the hashsize. Unfortunately, the hostcache does not currently provide insight into whether a wider hashsize or a deeper bucketlimit would be preferable for your workload. (You would probably want a histogram of the number of buckets of length "n". If that histogram shows few deeply used buckets, but most buckets empty or sparsely used, a narrower hashsize with deeper buckets may be the more optimal use of space. If most buckets are mostly used, a wider hashsize may be preferable over deeper buckets.)
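A minimal standalone illustration of the wrap-around interpretation (not the kernel code, just the arithmetic):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint32_t count = 0;

	/* An unguarded decrement past zero wraps around modulo 2^32. */
	count -= 849;
	printf("%u\n", count);	/* prints 4294966447, the value reported above */
	return (0);
}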
(In reply to Richard Scheffenegger from comment #17) But tcp_hc_purge_internal() decrements the counter when it removes an entry and frees it. I double-checked the code and I think the counter is handled correctly. I did look for an underflow, but I could not find one...

If the hash buckets are used highly asymmetrically, I wouldn't suggest using larger buckets. That results in long processing times. In that case I would suggest using a better hash algorithm. But this is not the issue right now.

I do see two problems:
1. Using a large amount of memory when exporting the list.
2. The counter being off.

The first issue is solved by the patches you are working on. The second issue, without your patches, results in a memory amount which can't be allocated and therefore triggers the problem. The second issue, with your patches in place, will still result in no new entries being added to the host cache anymore, since the counter is then larger than the limit.

The counter having such a large value could happen when there is an underflow. But I don't see how it can happen. To me it looks like the global and the bucket counters are handled correctly. I also looked at the 11.4 code, but I see no issue. Since the statement is that this happens every 3 to 4 months, it must be a rare event. Or some other code is writing to the memory location where the counter is...

Richard: Do you see a way how the counter could be off?
(In reply to Michael Tuexen from comment #18)

> Richard: Do you see a way how the counter could be off?

Unfortunately not. I was thinking about adding KASSERTs for when the counter is about to be decremented but is already zero... Or simply resetting all the counters to zero (hch_length, cache_count) when a purgeall is performed, to return to a "known good" state. The only feasible way I can think of right now would be if the hostcache settings were changed dynamically while the hostcache is already populated with some entries. That might leave the hc_bucket full, but the counters off... But all the normal processing of the accounting variables looks good.

> But tcp_hc_purge_internal() decrements the counter when it removes an
> entry and frees it. I double checked the code and I think the counter
> is handled correctly. I did look for an underflow, but I could not
> find it...
> I also looked at the 11.4 code, but I see no issue.

The diff between 11.4 and HEAD is minuscule. None of the logic has changed (meaning this very same issue could potentially still be impacting HEAD).

> If the hash buckets are used highly un-symmetric, I wouldn't suggest
> to use larger buckets. That results in long processing time. In that
> case I would suggest to use a better hash algorithm. But this is not
> the issue right now.

Correct. (A different salt during hashing may also help. Frequently used entries do percolate to the head of the TAILQ, though.)

> For the counter having such a large value could happen when there is
> an underflow. But I don't see how it can happen. For me it looks like
> the global and the bucket counter are handled correctly.

Still, an underflow is more likely than an overflow.

> Since the statement is that this happens every 3 to 4 month, it must
> be a rare event. Or some other code is writing in the memory location
> where the counter is...

KASSERTs for safety? (And an eventual core to potentially analyze.)
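A minimal fragment of the kind of defensive check discussed here (illustrative only, not the committed change; the field name follows the V_tcp_hostcache.cache_count usage quoted earlier in this PR):

	/* Guard against an unguarded decrement past zero before it wraps. */
	KASSERT(V_tcp_hostcache.cache_count > 0,
	    ("tcp_hostcache: cache_count would underflow"));
	V_tcp_hostcache.cache_count--;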
(In reply to Richard Scheffenegger from comment #19) As far as I can see, it is impossible to change the cache config at runtime; the relevant variables are all only settable in /boot/loader.conf. I am not sure KASSERTs help in this case, since I doubt that the reporter is willing to run a debug kernel. In general, they could be added, but it would be more helpful to understand how to trigger the issue (barring memory corruption).
One potential issue came to mind: hch_length is protected by the hch_mtx. But which mutex is protecting cache_count? Could that be an issue?
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c804c8f2c58ba42d476de07fbceff9ac4dd95f0e commit c804c8f2c58ba42d476de07fbceff9ac4dd95f0e Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 16:25:53 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-03-31 17:17:37 +0000 Export sbuf_drain to orchestrate lock and drain action While exporting large amounts of data to a sysctl request, datastructures may need to be locked. Exporting the sbuf_drain function allows the coordination between drain events and held locks, to avoid stalls. PR: 254333 Reviewed By: jhb MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29481 sys/kern/subr_sbuf.c | 2 +- sys/sys/sbuf.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=869880463cc2ce64e2e6599eaec880a981f3ced6 commit 869880463cc2ce64e2e6599eaec880a981f3ced6 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 17:24:01 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-03-31 17:24:21 +0000 tcp: drain tcp_hostcache_list in between per-bucket locks Explicitly drain the sbuf after completing each hash bucket to minimize the work performed while holding the hash bucket lock. PR: 254333 MFC after: 2 weeks Reviewed By: tuexen, jhb, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29483 sys/netinet/tcp_hostcache.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=95e56d31e348594973affd0ea81d8f8383bc3031 commit 95e56d31e348594973affd0ea81d8f8383bc3031 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 17:30:20 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-03-31 18:24:13 +0000 tcp: Make hostcache.cache_count MPSAFE by using a counter_u64_t Addressing the underlying root cause for cache_count to show unexpectedly high values, by protecting all arithmetic on that global variable by using counter(9). PR: 254333 Reviewed By: tuexen, #transport MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29510 sys/netinet/tcp_hostcache.c | 20 ++++++++++++-------- sys/netinet/tcp_hostcache.h | 20 ++++++++++---------- 2 files changed, 22 insertions(+), 18 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=529a2a0f2765f6c57c50a5af6be242c03bf714e3 commit 529a2a0f2765f6c57c50a5af6be242c03bf714e3 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-04-01 08:00:32 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-01 08:03:30 +0000 tcp: For hostcache performance, use atomics instead of counters As accessing the tcp hostcache happens frequently on some classes of servers, it was recommended to use atomic_add/subtract rather than (per-CPU distributed) counters, which have to be summed up at high cost to cache efficiency. PR: 254333 MFC after: 2 weeks Sponsored by: NetApp, Inc. Reviewed By: #transport, tuexen, jtl Differential Revision: https://reviews.freebsd.org/D29522 sys/netinet/tcp_hostcache.c | 24 +++++++++++------------- sys/netinet/tcp_hostcache.h | 2 +- 2 files changed, 12 insertions(+), 14 deletions(-)
We have one more machine with a hanging sysctl.

# sysctl net.inet.tcp.hostcache
net.inet.tcp.hostcache.purgenow: 0
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.hostcache.prune: 300
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.count: 4294961495
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.hashsize: 65536
net.inet.tcp.hostcache.cachelimit: 1966080
net.inet.tcp.hostcache.enable: 1

# netstat -sptcp
...
195494221 hostcache entries added
88163 bucket overflow
...

loader.conf:
accf_data_load="YES"
accf_dns_load="YES"
accf_http_load="YES"
net.inet.tcp.tcbhashsize=131072
net.inet.tcp.syncache.hashsize=65536
net.inet.tcp.syncache.cachelimit=1966080
net.inet.tcp.hostcache.hashsize=65536
net.inet.tcp.hostcache.cachelimit=1966080

sysctl.conf:
net.inet.icmp.drop_redirect=1
net.inet.icmp.icmplim=2000
net.inet.icmp.icmplim_output=1
net.inet.ip.fw.dyn_buckets=2048
net.inet.ip.fw.dyn_max=100000
net.inet.tcp.blackhole=2
net.inet.tcp.drop_synfin=1
net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.msl=10000
net.inet.udp.blackhole=1

The issue reproduces every 3-4 months on each of 60 servers in different locations. All servers are Dell/IBM with different hardware specs but all with ECC RAM, so I don't think this is a hardware issue. Please let me know any other information I can collect.
The "hang" is due to a waiting malloc requesting much more memory any machine will currently have. The root cause for that, however, is a race condition (likely to hit busy servers), fixed with D29522 (will apply cleanly after D29510). If you are able to patch your systems, that will address this issue. (The full set of these related hostcache improvements will be backported in about a week from now to stable/13, stable/12 and stable/11).
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=13d418a967c90cfd845f41db96383ac7eb5862aa commit 13d418a967c90cfd845f41db96383ac7eb5862aa Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 16:25:53 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 18:50:48 +0000 Export sbuf_drain to orchestrate lock and drain action While exporting large amounts of data to a sysctl request, datastructures may need to be locked. Exporting the sbuf_drain function allows the coordination between drain events and held locks, to avoid stalls. PR: 254333 Reviewed By: jhb MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29481 (cherry picked from commit c804c8f2c58ba42d476de07fbceff9ac4dd95f0e) sys/kern/subr_sbuf.c | 2 +- sys/sys/sbuf.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=dc5281a7d43917c55905d8b83a5bef69b8013071 commit dc5281a7d43917c55905d8b83a5bef69b8013071 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 16:25:53 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 19:27:15 +0000 Export sbuf_drain to orchestrate lock and drain action While exporting large amounts of data to a sysctl request, datastructures may need to be locked. Exporting the sbuf_drain function allows the coordination between drain events and held locks, to avoid stalls. PR: 254333 Reviewed By: jhb MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29481 (cherry picked from commit c804c8f2c58ba42d476de07fbceff9ac4dd95f0e) sys/kern/subr_sbuf.c | 2 +- sys/sys/sbuf.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
A commit in branch stable/11 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6ab67f3b20ac44e9e9741bd0bdf557c3aa118d32 commit 6ab67f3b20ac44e9e9741bd0bdf557c3aa118d32 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 16:25:53 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 19:37:02 +0000 Export sbuf_drain to orchestrate lock and drain action While exporting large amounts of data to a sysctl request, datastructures may need to be locked. Exporting the sbuf_drain function allows the coordination between drain events and held locks, to avoid stalls. PR: 254333 Reviewed By: jhb MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29481 (cherry picked from commit c804c8f2c58ba42d476de07fbceff9ac4dd95f0e) sys/kern/subr_sbuf.c | 2 +- sys/sys/sbuf.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=632e3363087cb6ef2a7b26a291a044b97afabea7 commit 632e3363087cb6ef2a7b26a291a044b97afabea7 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 17:30:20 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 19:43:14 +0000 tcp: Make hostcache.cache_count MPSAFE by using a counter_u64_t Addressing the underlying root cause for cache_count to show unexpectedly high values, by protecting all arithmetic on that global variable by using counter(9). PR: 254333 Reviewed By: tuexen, #transport MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29510 (cherry picked from commit 95e56d31e348594973affd0ea81d8f8383bc3031) sys/netinet/tcp_hostcache.c | 20 ++++++++++++-------- sys/netinet/tcp_hostcache.h | 20 ++++++++++---------- 2 files changed, 22 insertions(+), 18 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=831be7fa19ef10c78da51c222693a5a2083d1da4 commit 831be7fa19ef10c78da51c222693a5a2083d1da4 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-28 21:12:03 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 19:42:45 +0000 tcp: reduce memory footprint when listing tcp hostcache In tcp_hostcache_list, the sbuf used would need a large (~2MB) blocking allocation of memory (M_WAITOK), when listing a full hostcache. This may stall the requestor for an indeterminate time. A further optimization is to return the expected userspace buffersize right away, rather than preparing the output of each current entry of the hostcase, provided by: @tuexen. This makes use of the ready-made functions of sbuf to work with sysctl, and repeatedly drain the much smaller buffer. PR: 254333 MFC after: 2 weeks Reviewed By: #transport, tuexen Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29471 (cherry picked from commit cb0dd7e122b8936ad61a141e65ef8ef874bfebe5) sys/netinet/tcp_hostcache.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=4052275c2fc5f96b4a208404733713959afe68f6 commit 4052275c2fc5f96b4a208404733713959afe68f6 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-04-01 08:00:32 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 19:44:00 +0000 tcp: For hostcache performance, use atomics instead of counters As accessing the tcp hostcache happens frequently on some classes of servers, it was recommended to use atomic_add/subtract rather than (per-CPU distributed) counters, which have to be summed up at high cost to cache efficiency in a hot codepath. PR: 254333 MFC after: 2 weeks Sponsored by: NetApp, Inc. Reviewed By: #transport, tuexen, jtl Differential Revision: https://reviews.freebsd.org/D29522 (cherry picked from commit 529a2a0f2765f6c57c50a5af6be242c03bf714e3) sys/netinet/tcp_hostcache.c | 24 +++++++++++------------- sys/netinet/tcp_hostcache.h | 2 +- 2 files changed, 12 insertions(+), 14 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e610ebe5d351c0570101ec58f1642904a30051ae commit e610ebe5d351c0570101ec58f1642904a30051ae Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 17:24:01 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 19:42:58 +0000 tcp: drain tcp_hostcache_list in between per-bucket locks Explicitly drain the sbuf after completing each hash bucket to minimize the work performed while holding the hash bucket lock. PR: 254333 MFC after: 2 weeks Reviewed By: tuexen, jhb, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29483 (cherry picked from commit 869880463cc2ce64e2e6599eaec880a981f3ced6) sys/netinet/tcp_hostcache.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e9ca38abc79b5ee0cc0052fb4e213b3dd5720716 commit e9ca38abc79b5ee0cc0052fb4e213b3dd5720716 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-04-01 08:00:32 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 20:52:37 +0000 tcp: For hostcache performance, use atomics instead of counters As accessing the tcp hostcache happens frequently on some classes of servers, it was recommended to use atomic_add/subtract rather than (per-CPU distributed) counters, which have to be summed up at high cost to cache efficiency. PR: 254333 MFC after: 2 weeks Sponsored by: NetApp, Inc. Reviewed By: #transport, tuexen, jtl Differential Revision: https://reviews.freebsd.org/D29522 (cherry picked from commit 529a2a0f2765f6c57c50a5af6be242c03bf714e3) sys/netinet/tcp_hostcache.c | 24 +++++++++++------------- sys/netinet/tcp_hostcache.h | 2 +- 2 files changed, 12 insertions(+), 14 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c0f8ed6ff81260fb4a26da354d3f06aaa83e74de commit c0f8ed6ff81260fb4a26da354d3f06aaa83e74de Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 17:30:20 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 20:52:21 +0000 tcp: Make hostcache.cache_count MPSAFE by using a counter_u64_t Addressing the underlying root cause for cache_count to show unexpectedly high values, by protecting all arithmetic on that global variable by using counter(9). PR: 254333 Reviewed By: tuexen, #transport MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29510 (cherry picked from commit 95e56d31e348594973affd0ea81d8f8383bc3031) sys/netinet/tcp_hostcache.c | 20 ++++++++++++-------- sys/netinet/tcp_hostcache.h | 20 ++++++++++---------- 2 files changed, 22 insertions(+), 18 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=54beb1ef2c831394ef908ef52bac690a4fe362e0 commit 54beb1ef2c831394ef908ef52bac690a4fe362e0 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-28 21:12:03 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 20:51:57 +0000 tcp: reduce memory footprint when listing tcp hostcache In tcp_hostcache_list, the sbuf used would need a large (~2MB) blocking allocation of memory (M_WAITOK), when listing a full hostcache. This may stall the requestor for an indeterminate time. A further optimization is to return the expected userspace buffersize right away, rather than preparing the output of each current entry of the hostcase, provided by: @tuexen. This makes use of the ready-made functions of sbuf to work with sysctl, and repeatedly drain the much smaller buffer. PR: 254333 MFC after: 2 weeks Reviewed By: #transport, tuexen Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29471 (cherry picked from commit cb0dd7e122b8936ad61a141e65ef8ef874bfebe5) sys/netinet/tcp_hostcache.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=12ce9e4868d1454b263a2330ce95e91b145c1c8f commit 12ce9e4868d1454b263a2330ce95e91b145c1c8f Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 17:24:01 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 20:52:09 +0000 tcp: drain tcp_hostcache_list in between per-bucket locks Explicitly drain the sbuf after completing each hash bucket to minimize the work performed while holding the hash bucket lock. PR: 254333 MFC after: 2 weeks Reviewed By: tuexen, jhb, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29483 (cherry picked from commit 869880463cc2ce64e2e6599eaec880a981f3ced6) sys/netinet/tcp_hostcache.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
A commit in branch stable/11 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c449b8a71bce7d7838164f6a5595f71c91b5b91b commit c449b8a71bce7d7838164f6a5595f71c91b5b91b Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-28 21:12:03 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 21:33:09 +0000 tcp: reduce memory footprint when listing tcp hostcache In tcp_hostcache_list, the sbuf used would need a large (~2MB) blocking allocation of memory (M_WAITOK), when listing a full hostcache. This may stall the requestor for an indeterminate time. A further optimization is to return the expected userspace buffersize right away, rather than preparing the output of each current entry of the hostcase, provided by: @tuexen. This makes use of the ready-made functions of sbuf to work with sysctl, and repeatedly drain the much smaller buffer. PR: 254333 MFC after: 2 weeks Reviewed By: #transport, tuexen Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29471 (cherry picked from commit cb0dd7e122b8936ad61a141e65ef8ef874bfebe5) sys/netinet/tcp_hostcache.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
A commit in branch stable/11 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=254b56ac2a9ee43d10b0a4f8eeb20b722bb7fd74 commit 254b56ac2a9ee43d10b0a4f8eeb20b722bb7fd74 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-04-01 08:00:32 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 21:33:41 +0000 tcp: For hostcache performance, use atomics instead of counters As accessing the tcp hostcache happens frequently on some classes of servers, it was recommended to use atomic_add/subtract rather than (per-CPU distributed) counters, which have to be summed up at high cost to cache efficiency. PR: 254333 MFC after: 2 weeks Sponsored by: NetApp, Inc. Reviewed By: #transport, tuexen, jtl Differential Revision: https://reviews.freebsd.org/D29522 (cherry picked from commit 529a2a0f2765f6c57c50a5af6be242c03bf714e3) sys/netinet/tcp_hostcache.c | 24 +++++++++++------------- sys/netinet/tcp_hostcache.h | 2 +- 2 files changed, 12 insertions(+), 14 deletions(-)
A commit in branch stable/11 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a85003abdad402c27c12cf1a31b243c39483f263 commit a85003abdad402c27c12cf1a31b243c39483f263 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 17:24:01 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 21:33:20 +0000 tcp: drain tcp_hostcache_list in between per-bucket locks Explicitly drain the sbuf after completing each hash bucket to minimize the work performed while holding the hash bucket lock. PR: 254333 MFC after: 2 weeks Reviewed By: tuexen, jhb, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29483 (cherry picked from commit 869880463cc2ce64e2e6599eaec880a981f3ced6) sys/netinet/tcp_hostcache.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
A commit in branch stable/11 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=d20563819b9292ddcd42caf3449636b84825b1d6 commit d20563819b9292ddcd42caf3449636b84825b1d6 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2021-03-31 17:30:20 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2021-04-16 21:33:31 +0000 tcp: Make hostcache.cache_count MPSAFE by using a counter_u64_t Addressing the underlying root cause for cache_count to show unexpectedly high values, by protecting all arithmetic on that global variable by using counter(9). PR: 254333 Reviewed By: tuexen, #transport MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29510 (cherry picked from commit 95e56d31e348594973affd0ea81d8f8383bc3031) sys/netinet/tcp_hostcache.c | 20 ++++++++++++-------- sys/netinet/tcp_hostcache.h | 20 ++++++++++---------- 2 files changed, 22 insertions(+), 18 deletions(-)
I believe this PR can be marked closed, with the two fixes (the race leading to erroneous accounting of the number of global entries, and the fix to not excessively use kernel memory while providing the list of entries) having been backported. Perhaps one of the systems showing the problem can be upgraded to the latest stable branch to validate that the problem is rectified, as expected.
Thank you for the update, Richard. Re-opened pending MFC. If this will not or cannot be merged to a particular stable/ branch, please set the mfc-stableX flag to - with a comment.
I don't quite understand that last comment, and why this PR got reopened. The MFC was put into stable/13, /12 and even /11 in the middle of April already (2 months ago).
(In reply to Richard Scheffenegger from comment #45) Maybe somebody needs to set the flags according to the MFC state, i.e. stable-13? -> stable-13+, etc.