Bug 254333 - [tcp] sysctl net.inet.tcp.hostcache.list hangs
Summary: [tcp] sysctl net.inet.tcp.hostcache.list hangs
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.4-STABLE
Hardware: amd64 Any
: --- Affects Many People
Assignee: Richard Scheffenegger
URL: https://reviews.freebsd.org/D29471
Keywords: performance
Depends on:
Blocks:
 
Reported: 2021-03-16 10:15 UTC by Maxim Shalomikhin
Modified: 2021-06-24 18:56 UTC (History)
5 users (show)

See Also:
tuexen: mfc-stable13+
tuexen: mfc-stable12+
tuexen: mfc-stable11+


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Maxim Shalomikhin 2021-03-16 10:15:09 UTC
I'm running 'sysctl net.inet.tcp.hostcache.list' from operator's crontab every 5 minutes for gathering tcp statistics. 

After 3 or 4 months of work sysctl hangs. Kill (and kill -9) doesn't help.
Running one more "sysctl net.inet.tcp.hostcache.list" command gives the same result - it hangs forever with no output. Other "sysctl SOMETHING" commands works fine. 

# ps ax| grep sysctl
operator     787    0.0  0.0    4236  1004  -  D    Tue16         2:42.96 /sbin/sysctl net.inet.tcp.hostcache.list
operator    1908    0.0  0.0    4236  1004  -  D    Mon02         2:45.74 /sbin/sysctl net.inet.tcp.hostcache.list
operator    2358    0.0  0.0    4236  1004  -  D    Wed10         1:33.93 /sbin/sysctl net.inet.tcp.hostcache.list
operator    3270    0.0  0.0    4236  1004  -  D    Sun16         2:47.22 /sbin/sysctl net.inet.tcp.hostcache.list


After rebot 'sysctl net.inet.tcp.hostcache.list' works again as expected for another 3 or 4 months.

The issue is reproducing on about 60 hevy loaded http(s) servers with different hw specs.
Comment 1 Michael Tuexen freebsd_committer freebsd_triage 2021-03-17 13:52:30 UTC
What is the output of procstat -kk PID, where PID is the PID of a hanging sysctl process?
Comment 2 Maxim Shalomikhin 2021-03-18 17:32:17 UTC
It will take some time to reproduce the issue. Plese let me now if there any other information I can collect when sysctl hangs again.
Comment 3 Michael Tuexen freebsd_committer freebsd_triage 2021-03-26 10:35:51 UTC
(In reply to Maxim Shalomikhin from comment #2)
I mentioned this issue on the bi-weekly transport call, since I don't see how this problem can occur. It was asked if it is possible that the problem could be a hanging console or TCP connection or whatever the output stream is going to?
Comment 4 Richard Scheffenegger freebsd_committer freebsd_triage 2021-03-28 11:46:06 UTC
I just noticed, that sysctl will actually do a syscall to sysctl twice.
First, with no buffer space - expecting to receive the size to allocate for a buffer, then it tries to allocate twice as much buffer (apparently "for reasons"), before doing the syscall to sysctl a 2nd time.

As the system continues to run, I am curious how large the list of hostcache entries is just prior to the "freeze" (of sysctl), and if the system may be space-constrained.

Note that during the call, it appears that both kernel and userspace need memory (twice as much in the userspace). userspace seems to use a non-blocking malloc call, after the first round where the return length is checked.

Possibly the system has a hard time allocating a sufficiently large chunk of memory, if the hostcache is fully utilized and extremely busy...

In sbuf_new, the kernel is trying a malloc(size, M_SBUF, M_WAITOK|M_ZERO)...

Perhaps the sysctl callback procedure could be improved to 
a) not actually allocate memory when the user has not yet provided a buffer to fill - only return the require size (reducing the chances to run out-of-space somewhat)

b) not allocate a huge chunk in sbuf_new right away, but use sbuf_extend repeatedly (although sbuf.c will actually grab a larger chunk of memory, copy the data over,and release the old chunk - thus driving temporary memory requirements higher

c) use a different memory model supporting disjoint segments of a c-string scattered in smaller memory chunks, subsequently concatenating them on the SYSCTL_OUT.

d) change the M_WAITOK in /kern/subr_sbuf.c#59 to M_NOWAIT and expect an error on very busy systems every once in a while.


As a reasonable short term fix, I guess a combination of M_NOWAIT in sbuf.c and use of sbuf_new(smaller size) / sbuf_extend(smaller_size) may be the prudent approach, if this turns out to be the culprit.
Comment 5 Richard Scheffenegger freebsd_committer freebsd_triage 2021-03-28 11:47:50 UTC
(Note that with sbuf_extent, the prior string could be overwritten with a more detailed error (e.g. failed to extend sbuf at the 20th iteration...)
Comment 6 Richard Scheffenegger freebsd_committer freebsd_triage 2021-03-28 11:52:28 UTC
Also, as a quick workaround:

What is the output of sysctl net.inet.tcp.hostcache

and the above problem about the waiting for a contingeous chunk of kernel memory could be reduced, by restricting net.inet.tcp.hostcache.cachelimit
(the output tries to allocate (net.inet.tcp.hostcache.count+1)*128 bytes. With a full hostcache this is almost a 2 MB memory chunk (on my system, the limit is 15360).
Comment 7 Michael Tuexen freebsd_committer freebsd_triage 2021-03-28 13:13:45 UTC
(In reply to Richard Scheffenegger from comment #6)
That might be the reason... Good idea!
So it would be good to see the output of
sysctl net.inet.tcp.hostcache
and
vmstat -m | grep sbuf
Comment 8 Michael Tuexen freebsd_committer freebsd_triage 2021-03-28 13:15:10 UTC
(In reply to Michael Tuexen from comment #7)
Just to be clear. You can provide these values as of now, you don't need to wait until the hang occurs.
Comment 9 Richard Scheffenegger freebsd_committer freebsd_triage 2021-03-28 16:24:46 UTC
See https://reviews.freebsd.org/D29471 for improving the situation by addressing
(b) mentioned above.

Note that this may not fully alleviate the issue (I did not investigate the
inner workings when the sbin/sysctl is only checking how much buffer memory
should be provided), but this is much more in-line with what other, similar
SYSCTL_PROC handlers are doing when providing an indeterminate amount of 
output.
Comment 10 commit-hook freebsd_committer freebsd_triage 2021-03-28 21:53:12 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=cb0dd7e122b8936ad61a141e65ef8ef874bfebe5

commit cb0dd7e122b8936ad61a141e65ef8ef874bfebe5
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-28 21:12:03 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-03-28 21:50:23 +0000

    tcp: reduce memory footprint when listing tcp hostcache

    In tcp_hostcache_list, the sbuf used would need a large (~2MB)
    blocking allocation of memory (M_WAITOK), when listing a
    full hostcache. This may stall the requestor for an indeterminate
    time.

    A further optimization is to return the expected userspace
    buffersize right away, rather than preparing the output of
    each current entry of the hostcase, provided by: @tuexen.

    This makes use of the ready-made functions of sbuf to work
    with sysctl, and repeatedly drain the much smaller buffer.

    PR: 254333
    MFC after: 2 weeks
    Reviewed By: #transport, tuexen
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29471

 sys/netinet/tcp_hostcache.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
Comment 11 Michael Tuexen freebsd_committer freebsd_triage 2021-03-28 22:00:25 UTC
(In reply to commit-hook from comment #10)
If it is possible, please patch your system with the above patch and give it a try. My guess is that it resolves your issue, but one doesn't know without test. Please let us know if patching a system is an option for you.
Comment 12 Maxim Shalomikhin 2021-03-30 13:36:49 UTC
I have one machine with hanging sysct:
99750  -  D         22:33.32 /sbin/sysctl net.inet.tcp.hostcache.list

# procstat -kk 99750
  PID    TID COMM                TDNAME              KSTACK
99750 101008 sysctl              -                   mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x115 vmem_xalloc+0x44b vmem_alloc+0x31 kmem_malloc+0x33 uma_large_malloc+0x46 malloc+0x3d sbuf_new+0x105 sysctl_tcp_hc_list+0x6d sysctl_root_handler_locked+0x8a sysctl_root+0x1ee userland_sysctl+0x134 sys___sysctl+0x5f amd64_syscall+0xa4e fast_syscall_common+0x101

As for cachelimit: Yes, we have large number set, but I noticed, then sysctl hangs, the hostcache.count goes crazy big (and stops near 2^32 at 4294966447):
# sysctl net.inet.tcp.hostcache
net.inet.tcp.hostcache.purgenow: 0
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.hostcache.prune: 300
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.count: 4294966447
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.hashsize: 65536
net.inet.tcp.hostcache.cachelimit: 1966080
net.inet.tcp.hostcache.enable: 1
# vmstat -m | grep sbuf
         sbuf     1     1K       -    68235  16,32,64,128,256,512,1024,2048,4096,8192,32768,65536
     mrsasbuf   949  2860K       -      951  32,128,256,2048,8192

Usually, hostcache.count is about ~500k.
So, Is it just my fault setting cachelimit to high or this is a bug in syncache with hostcache.count?
Comment 13 Richard Scheffenegger freebsd_committer freebsd_triage 2021-03-30 14:33:01 UTC
Thanks. I can not comment as to why the hostcache.count counts up and beyond the limit.

But the other data confirms that what you are seeing is the (unsuccessful) attempt in allocating a huge amount of kernel memory by the sbuf_new function.

When trying to dump all the entries of the hostcache, the hostcache.list
 tries to grab hostcache.cachelimit * 128 bytes, or  1966080 *  128 ~= 250 MB of contingeous kernel memory - twice (!).

(Also, if the count really is > cachelimit, the hostcache.list may eventually simply fail, due to insufficient memory...)

The following Diffs are under review, but should address these particular issues:
o) immediately respond with buffer required without actually allocating/preparing that 
o) allocating only one bucket's worth of output and moving the output bucket-by-bucket to userspace (reducing the memory footprint temporarily required from 2x 250 MB down to 1x 4kB).

See 
https://reviews.freebsd.org/D29471
https://reviews.freebsd.org/D29481
https://reviews.freebsd.org/D29483

Patching with only D29471 should mostly address the issue, although there remain issues around keeping a lock for an extensive period of time, while moving the output to userspace repeatedly. This may have undesired sideeffects, so you want to probably go with all three.

To "unstuck" the system in this state - refrain from issuing hostcache.list multiple times, and try to free up the above mentioned chunk of in-kernel memory (at least temporarily). That should allow all the stalled malloc processes to succeed eventually, and return properly.

But on systems with a high uptime or kernel memory churn, and a very large hostcache.cachelimit, it really is only a question of time, until such a huge malloc blocks "indefinitely" - thus the above patches try to be much more smart in what kernel memory really is needed.

The downside is, that there is a higher chance, that hash buckets may change more, than they would with that huge memory allocation (if successful).

But hostcache.list does not provide a "complete" snapshot of all the entries in the hostcache at a very specific moment in time - they may change between the evaluation of different buckets already (but the longer time for moving data to userspace may allow more changes to happen).
Comment 14 Michael Tuexen freebsd_committer freebsd_triage 2021-03-30 15:47:47 UTC
I agree with Richard that we know he issue and the patches from Richard should address the problem.

However, I would like to understand why net.inet.tcp.hostcache.count > net.inet.tcp.hostcache.cachelimit can happen. Could you post what tunings you exactly do (in loader.conf and or sysctl.conf) in relation to the hostcache?
Comment 15 Michael Tuexen freebsd_committer freebsd_triage 2021-03-30 16:53:24 UTC
Could you also post for the output of netstat -sptcp the two lines with counters for hostcache entries added and bucket overflow.
Comment 16 Michael Tuexen freebsd_committer freebsd_triage 2021-03-30 17:27:02 UTC
Please note that you should not be able to set the net.inet.tcp.hostcache.cachelimit too high. It is set automatically to net.inet.tcp.hostcache.bucketlimit * net.inet.tcp.hostcache.hashsize and you should only the able to reduce it. If you provide a larger value in /boot/loader.conf, the setting should be ignored and the above default should be used.

The setting you show indicates that you set net.inet.tcp.hostcache.hashsize to 65536, leave net.inet.tcp.hostcache.bucketlimit at the default of 30, and the net.inet.tcp.hostcache.cachelimit is set automatically.

However, net.inet.tcp.hostcache.count should always be between 0 (inclusive) and net.inet.tcp.hostcache.cachelimit (inclusive).

So I would really like to understand how you end up with such a large net.inet.tcp.hostcache.count. This really blows up the sysctl operation, since
the code wants to allocate linesize * (V_tcp_hostcache.cache_count + 1) bytes, which is much higher than the limit Richard computed...
Comment 17 Richard Scheffenegger freebsd_committer freebsd_triage 2021-03-30 20:06:08 UTC
net.inet.tcp.hostcache.count: 4294966447 is -849 (decimal).

tcp_hc_purge_internal decrements hostcache.count unconditionally, and would be 
the most likely candidate where that counter can rapidely go negative (becoming a huge uint32).

All the adjustments to hch_length, cache_count and actual add/dels from hashbuckets appear to be symmetric.

One more hint (more operationally): if you have high "hostcache buffer-overflows" in 'netstat -snp tcp' you my want to tweak the bucket size rather than the hashsize.

Unfortunately, the hostcache does not currently provide insight if a wider hashsize, or a deeper bucketlimit would be preferrential for your workload...

(you would probably want a histogram of #buckets of length "n". If that histogram shows few deeply used buckets, but most buckets empty or sparsely used, a narrower hashsize with deeper buckets may be more optimal space use.

If most buckets are mostly used, a wider hashsize may be preferrably over deeper buckets...
Comment 18 Michael Tuexen freebsd_committer freebsd_triage 2021-03-30 21:20:06 UTC
(In reply to Richard Scheffenegger from comment #17)
But tcp_hc_purge_internal() decrements the counter when it removes an entry and frees it. I double checked the code and I think the counter is handled correctly. I did look for an underflow, but I could not find it...

If the hash buckets are used highly un-symmetric, I wouldn't suggest to use larger buckets. That results in long processing time. In that case I would suggest to use a better hash algorithm. But this is not the issue right now.

I do see two problems:
1. Using a large amount of memory when exporting the list.
2. The counter being off.

The first issue is solved by the patches you are working on.

The second issue without your patches results in a memory amount which can't be allocated and therefor triggers the problem. The second issue with you patches in place will still result in not adding any new entries to the host cache anymore, since the number counter is than larger than the limit.

For the counter having such a large value could happen when there is an underflow. But I don't see how it can happen. For me it looks like the global and the bucket counter are handled correctly. I also looked at the 11.4 code, but I see no issue.

Since the statement is that this happens every 3 to 4 month, it must be a rare event. Or some other code is writing in the memory location where the counter is...

Richard: Do you see a way how the counter could be off?
Comment 19 Richard Scheffenegger freebsd_committer freebsd_triage 2021-03-30 22:27:32 UTC
(In reply to Michael Tuexen from comment #18)
> Richard: Do you see a way how the counter could be off?

Unfortunately not. I was thinking about adding KASSERTS when the counter is about to be decremented, but is already zero...

Or simply reset all the counters to zero (hch_length, cache_count) when a purgeall is being performed, to return to a "known good" state.

The only feasible way I can thing of right now, would be if the hostcache settings are changed dynamically, while the hostcache is already populated with some entries. That may leave the hc_bucket full, but the counters off...

But all the normal processing of the accounting variables look good.


> But tcp_hc_purge_internal() decrements the counter when it removes an
> entry and frees it. I double checked the code and I think the counter
> is handled correctly. I did look for an underflow, but I could not
> find it...


> I also looked at the 11.4 code, but I see no issue.

The diff between 11.4 and HEAD is minuscle. None of the logic has changed (meaning this very same issue could potentially still be impacting HEAD).


> If the hash buckets are used highly un-symmetric, I wouldn't suggest
> to use larger buckets. That results in long processing time. In that
> case I would suggest to use a better hash algorithm. But this is not
> the issue right now.

Correct. (A different salt during hashing may also help. Frequently used entries do percholate to the head of the TAILQ, though.)


> For the counter having such a large value could happen when there is
> an underflow. But I don't see how it can happen. For me it looks like
> the global and the bucket counter are handled correctly. 

Still, a underflow is more likely than an overflow.

> Since the statement is that this happens every 3 to 4 month, it must
> be a rare event. Or some other code is writing in the memory location
> where the counter is...

KASSERTS for safety? (And an eventual core to potentially analyze)
Comment 20 Michael Tuexen freebsd_committer freebsd_triage 2021-03-30 22:53:01 UTC
(In reply to Richard Scheffenegger from comment #19)
As far as I can see, it is impossible to change the cache config at runtime, the relevant variables are all only settable in /boot/loader.conf.

Not sure if KASSERTs help in this case, since I doubt that the reporter is willing to run a debug kernel. In general, they could be added, but it would be more helpful to understand how to trigger the issue (except for memory corruption).
Comment 21 Michael Tuexen freebsd_committer freebsd_triage 2021-03-31 07:09:03 UTC
One potential issue came up to my mind: The hch_length is protected by the hch_mtx.
But which mutex is protecting the cache_count? Could that be an issue?
Comment 22 commit-hook freebsd_committer freebsd_triage 2021-03-31 17:19:12 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c804c8f2c58ba42d476de07fbceff9ac4dd95f0e

commit c804c8f2c58ba42d476de07fbceff9ac4dd95f0e
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 16:25:53 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-03-31 17:17:37 +0000

    Export sbuf_drain to orchestrate lock and drain action

    While exporting large amounts of data to a sysctl
    request, datastructures may need to be locked.

    Exporting the sbuf_drain function allows the
    coordination between drain events and held
    locks, to avoid stalls.

    PR:             254333
    Reviewed By:    jhb
    MFC after:      2 weeks
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29481

 sys/kern/subr_sbuf.c | 2 +-
 sys/sys/sbuf.h       | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
Comment 23 commit-hook freebsd_committer freebsd_triage 2021-03-31 17:26:15 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=869880463cc2ce64e2e6599eaec880a981f3ced6

commit 869880463cc2ce64e2e6599eaec880a981f3ced6
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 17:24:01 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-03-31 17:24:21 +0000

    tcp: drain tcp_hostcache_list in between per-bucket locks

    Explicitly drain the sbuf after completing each hash bucket
    to minimize the work performed while holding the hash
    bucket lock.

    PR:             254333
    MFC after:      2 weeks
    Reviewed By:    tuexen, jhb, #transport
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29483

 sys/netinet/tcp_hostcache.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
Comment 24 commit-hook freebsd_committer freebsd_triage 2021-03-31 18:25:24 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=95e56d31e348594973affd0ea81d8f8383bc3031

commit 95e56d31e348594973affd0ea81d8f8383bc3031
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 17:30:20 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-03-31 18:24:13 +0000

    tcp: Make hostcache.cache_count MPSAFE  by using a counter_u64_t

    Addressing the underlying root cause for cache_count to
    show unexpectedly high  values, by protecting all arithmetic on
    that global variable by using counter(9).

    PR:             254333
    Reviewed By: tuexen, #transport
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29510

 sys/netinet/tcp_hostcache.c | 20 ++++++++++++--------
 sys/netinet/tcp_hostcache.h | 20 ++++++++++----------
 2 files changed, 22 insertions(+), 18 deletions(-)
Comment 25 commit-hook freebsd_committer freebsd_triage 2021-04-01 08:04:05 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=529a2a0f2765f6c57c50a5af6be242c03bf714e3

commit 529a2a0f2765f6c57c50a5af6be242c03bf714e3
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-04-01 08:00:32 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-01 08:03:30 +0000

    tcp: For hostcache performance, use atomics instead of counters

    As accessing the tcp hostcache happens frequently on some
    classes of servers, it was recommended to use atomic_add/subtract
    rather than (per-CPU distributed) counters, which have to be
    summed up at high cost to cache efficiency.

    PR: 254333
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Reviewed By: #transport, tuexen, jtl
    Differential Revision: https://reviews.freebsd.org/D29522

 sys/netinet/tcp_hostcache.c | 24 +++++++++++-------------
 sys/netinet/tcp_hostcache.h |  2 +-
 2 files changed, 12 insertions(+), 14 deletions(-)
Comment 26 Maxim Shalomikhin 2021-04-09 08:26:29 UTC
We have one more machine with hanging sysctl.

# sysctl net.inet.tcp.hostcache
net.inet.tcp.hostcache.purgenow: 0
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.hostcache.prune: 300
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.count: 4294961495
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.hashsize: 65536
net.inet.tcp.hostcache.cachelimit: 1966080
net.inet.tcp.hostcache.enable: 1

# netstat -sptcp
...
        195494221 hostcache entries added
                88163 bucket overflow
...

loader.conf:
accf_data_load="YES"
accf_dns_load="YES" 
accf_http_load="YES"
net.inet.tcp.tcbhashsize=131072
net.inet.tcp.syncache.hashsize=65536
net.inet.tcp.syncache.cachelimit=1966080
net.inet.tcp.hostcache.hashsize=65536
net.inet.tcp.hostcache.cachelimit=1966080

sysctl.conf:
net.inet.icmp.drop_redirect=1
net.inet.icmp.icmplim=2000
net.inet.icmp.icmplim_output=1
net.inet.ip.fw.dyn_buckets=2048
net.inet.ip.fw.dyn_max=100000
net.inet.tcp.blackhole=2
net.inet.tcp.drop_synfin=1
net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.msl=10000
net.inet.udp.blackhole=1

The issue reproduces every 3-4 months on each of 60 servers in different locations. All servers are Dell/IBM with different hw specs but all with ECC RAM, so I don't think this is HW issue.

Please let me know any other information I can collect.
Comment 27 Richard Scheffenegger freebsd_committer freebsd_triage 2021-04-09 08:56:38 UTC
The "hang" is due to a waiting malloc requesting much more memory any machine will currently have. 

The root cause for that, however, is a race condition (likely to hit busy servers), fixed with D29522 (will apply cleanly after D29510).

If you are able to patch your systems, that will address this issue.

(The full set of these related hostcache improvements will be backported in about a week from now to stable/13, stable/12 and stable/11).
Comment 28 commit-hook freebsd_committer freebsd_triage 2021-04-16 19:26:59 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=13d418a967c90cfd845f41db96383ac7eb5862aa

commit 13d418a967c90cfd845f41db96383ac7eb5862aa
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 16:25:53 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 18:50:48 +0000

    Export sbuf_drain to orchestrate lock and drain action

    While exporting large amounts of data to a sysctl
    request, datastructures may need to be locked.

    Exporting the sbuf_drain function allows the
    coordination between drain events and held
    locks, to avoid stalls.

    PR:             254333
    Reviewed By:    jhb
    MFC after:      2 weeks
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29481

    (cherry picked from commit c804c8f2c58ba42d476de07fbceff9ac4dd95f0e)

 sys/kern/subr_sbuf.c | 2 +-
 sys/sys/sbuf.h       | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
Comment 29 commit-hook freebsd_committer freebsd_triage 2021-04-16 19:37:03 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=dc5281a7d43917c55905d8b83a5bef69b8013071

commit dc5281a7d43917c55905d8b83a5bef69b8013071
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 16:25:53 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 19:27:15 +0000

    Export sbuf_drain to orchestrate lock and drain action

    While exporting large amounts of data to a sysctl
    request, datastructures may need to be locked.

    Exporting the sbuf_drain function allows the
    coordination between drain events and held
    locks, to avoid stalls.

    PR:             254333
    Reviewed By:    jhb
    MFC after:      2 weeks
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29481

    (cherry picked from commit c804c8f2c58ba42d476de07fbceff9ac4dd95f0e)

 sys/kern/subr_sbuf.c | 2 +-
 sys/sys/sbuf.h       | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
Comment 30 commit-hook freebsd_committer freebsd_triage 2021-04-16 19:41:05 UTC
A commit in branch stable/11 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6ab67f3b20ac44e9e9741bd0bdf557c3aa118d32

commit 6ab67f3b20ac44e9e9741bd0bdf557c3aa118d32
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 16:25:53 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 19:37:02 +0000

    Export sbuf_drain to orchestrate lock and drain action

    While exporting large amounts of data to a sysctl
    request, datastructures may need to be locked.

    Exporting the sbuf_drain function allows the
    coordination between drain events and held
    locks, to avoid stalls.

    PR:             254333
    Reviewed By:    jhb
    MFC after:      2 weeks
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29481

    (cherry picked from commit c804c8f2c58ba42d476de07fbceff9ac4dd95f0e)

 sys/kern/subr_sbuf.c | 2 +-
 sys/sys/sbuf.h       | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
Comment 31 commit-hook freebsd_committer freebsd_triage 2021-04-16 20:46:14 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=632e3363087cb6ef2a7b26a291a044b97afabea7

commit 632e3363087cb6ef2a7b26a291a044b97afabea7
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 17:30:20 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 19:43:14 +0000

    tcp: Make hostcache.cache_count MPSAFE  by using a counter_u64_t

    Addressing the underlying root cause for cache_count to
    show unexpectedly high  values, by protecting all arithmetic on
    that global variable by using counter(9).

    PR:             254333
    Reviewed By: tuexen, #transport
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29510

    (cherry picked from commit 95e56d31e348594973affd0ea81d8f8383bc3031)

 sys/netinet/tcp_hostcache.c | 20 ++++++++++++--------
 sys/netinet/tcp_hostcache.h | 20 ++++++++++----------
 2 files changed, 22 insertions(+), 18 deletions(-)
Comment 32 commit-hook freebsd_committer freebsd_triage 2021-04-16 20:46:16 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=831be7fa19ef10c78da51c222693a5a2083d1da4

commit 831be7fa19ef10c78da51c222693a5a2083d1da4
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-28 21:12:03 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 19:42:45 +0000

    tcp: reduce memory footprint when listing tcp hostcache

    In tcp_hostcache_list, the sbuf used would need a large (~2MB)
    blocking allocation of memory (M_WAITOK), when listing a
    full hostcache. This may stall the requestor for an indeterminate
    time.

    A further optimization is to return the expected userspace
    buffersize right away, rather than preparing the output of
    each current entry of the hostcase, provided by: @tuexen.

    This makes use of the ready-made functions of sbuf to work
    with sysctl, and repeatedly drain the much smaller buffer.

    PR: 254333
    MFC after: 2 weeks
    Reviewed By: #transport, tuexen
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29471

    (cherry picked from commit cb0dd7e122b8936ad61a141e65ef8ef874bfebe5)

 sys/netinet/tcp_hostcache.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
Comment 33 commit-hook freebsd_committer freebsd_triage 2021-04-16 20:46:17 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4052275c2fc5f96b4a208404733713959afe68f6

commit 4052275c2fc5f96b4a208404733713959afe68f6
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-04-01 08:00:32 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 19:44:00 +0000

    tcp: For hostcache performance, use atomics instead of counters

    As accessing the tcp hostcache happens frequently on some
    classes of servers, it was recommended to use atomic_add/subtract
    rather than (per-CPU distributed) counters, which have to be
    summed up at high cost to cache efficiency in a hot codepath.

    PR: 254333
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Reviewed By: #transport, tuexen, jtl
    Differential Revision: https://reviews.freebsd.org/D29522

    (cherry picked from commit 529a2a0f2765f6c57c50a5af6be242c03bf714e3)

 sys/netinet/tcp_hostcache.c | 24 +++++++++++-------------
 sys/netinet/tcp_hostcache.h |  2 +-
 2 files changed, 12 insertions(+), 14 deletions(-)
Comment 34 commit-hook freebsd_committer freebsd_triage 2021-04-16 20:46:18 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e610ebe5d351c0570101ec58f1642904a30051ae

commit e610ebe5d351c0570101ec58f1642904a30051ae
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 17:24:01 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 19:42:58 +0000

    tcp: drain tcp_hostcache_list in between per-bucket locks

    Explicitly drain the sbuf after completing each hash bucket
    to minimize the work performed while holding the hash
    bucket lock.

    PR:             254333
    MFC after:      2 weeks
    Reviewed By:    tuexen, jhb, #transport
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29483

    (cherry picked from commit 869880463cc2ce64e2e6599eaec880a981f3ced6)

 sys/netinet/tcp_hostcache.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
Comment 35 commit-hook freebsd_committer freebsd_triage 2021-04-16 21:33:26 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e9ca38abc79b5ee0cc0052fb4e213b3dd5720716

commit e9ca38abc79b5ee0cc0052fb4e213b3dd5720716
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-04-01 08:00:32 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 20:52:37 +0000

    tcp: For hostcache performance, use atomics instead of counters

    As accessing the tcp hostcache happens frequently on some
    classes of servers, it was recommended to use atomic_add/subtract
    rather than (per-CPU distributed) counters, which have to be
    summed up at high cost to cache efficiency.

    PR: 254333
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Reviewed By: #transport, tuexen, jtl
    Differential Revision: https://reviews.freebsd.org/D29522

    (cherry picked from commit 529a2a0f2765f6c57c50a5af6be242c03bf714e3)

 sys/netinet/tcp_hostcache.c | 24 +++++++++++-------------
 sys/netinet/tcp_hostcache.h |  2 +-
 2 files changed, 12 insertions(+), 14 deletions(-)
Comment 36 commit-hook freebsd_committer freebsd_triage 2021-04-16 21:33:28 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c0f8ed6ff81260fb4a26da354d3f06aaa83e74de

commit c0f8ed6ff81260fb4a26da354d3f06aaa83e74de
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 17:30:20 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 20:52:21 +0000

    tcp: Make hostcache.cache_count MPSAFE  by using a counter_u64_t

    Addressing the underlying root cause for cache_count to
    show unexpectedly high  values, by protecting all arithmetic on
    that global variable by using counter(9).

    PR:             254333
    Reviewed By: tuexen, #transport
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29510

    (cherry picked from commit 95e56d31e348594973affd0ea81d8f8383bc3031)

 sys/netinet/tcp_hostcache.c | 20 ++++++++++++--------
 sys/netinet/tcp_hostcache.h | 20 ++++++++++----------
 2 files changed, 22 insertions(+), 18 deletions(-)
Comment 37 commit-hook freebsd_committer freebsd_triage 2021-04-16 21:33:29 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=54beb1ef2c831394ef908ef52bac690a4fe362e0

commit 54beb1ef2c831394ef908ef52bac690a4fe362e0
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-28 21:12:03 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 20:51:57 +0000

    tcp: reduce memory footprint when listing tcp hostcache

    In tcp_hostcache_list, the sbuf used would need a large (~2MB)
    blocking allocation of memory (M_WAITOK), when listing a
    full hostcache. This may stall the requestor for an indeterminate
    time.

    A further optimization is to return the expected userspace
    buffersize right away, rather than preparing the output of
    each current entry of the hostcase, provided by: @tuexen.

    This makes use of the ready-made functions of sbuf to work
    with sysctl, and repeatedly drain the much smaller buffer.

    PR: 254333
    MFC after: 2 weeks
    Reviewed By: #transport, tuexen
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29471

    (cherry picked from commit cb0dd7e122b8936ad61a141e65ef8ef874bfebe5)

 sys/netinet/tcp_hostcache.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
Comment 38 commit-hook freebsd_committer freebsd_triage 2021-04-16 21:33:30 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=12ce9e4868d1454b263a2330ce95e91b145c1c8f

commit 12ce9e4868d1454b263a2330ce95e91b145c1c8f
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 17:24:01 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 20:52:09 +0000

    tcp: drain tcp_hostcache_list in between per-bucket locks

    Explicitly drain the sbuf after completing each hash bucket
    to minimize the work performed while holding the hash
    bucket lock.

    PR:             254333
    MFC after:      2 weeks
    Reviewed By:    tuexen, jhb, #transport
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29483

    (cherry picked from commit 869880463cc2ce64e2e6599eaec880a981f3ced6)

 sys/netinet/tcp_hostcache.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
Comment 39 commit-hook freebsd_committer freebsd_triage 2021-04-16 22:12:36 UTC
A commit in branch stable/11 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c449b8a71bce7d7838164f6a5595f71c91b5b91b

commit c449b8a71bce7d7838164f6a5595f71c91b5b91b
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-28 21:12:03 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 21:33:09 +0000

    tcp: reduce memory footprint when listing tcp hostcache

    In tcp_hostcache_list, the sbuf used would need a large (~2MB)
    blocking allocation of memory (M_WAITOK), when listing a
    full hostcache. This may stall the requestor for an indeterminate
    time.

    A further optimization is to return the expected userspace
    buffersize right away, rather than preparing the output of
    each current entry of the hostcase, provided by: @tuexen.

    This makes use of the ready-made functions of sbuf to work
    with sysctl, and repeatedly drain the much smaller buffer.

    PR: 254333
    MFC after: 2 weeks
    Reviewed By: #transport, tuexen
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29471

    (cherry picked from commit cb0dd7e122b8936ad61a141e65ef8ef874bfebe5)

 sys/netinet/tcp_hostcache.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
Comment 40 commit-hook freebsd_committer freebsd_triage 2021-04-16 22:12:38 UTC
A commit in branch stable/11 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=254b56ac2a9ee43d10b0a4f8eeb20b722bb7fd74

commit 254b56ac2a9ee43d10b0a4f8eeb20b722bb7fd74
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-04-01 08:00:32 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 21:33:41 +0000

    tcp: For hostcache performance, use atomics instead of counters

    As accessing the tcp hostcache happens frequently on some
    classes of servers, it was recommended to use atomic_add/subtract
    rather than (per-CPU distributed) counters, which have to be
    summed up at high cost to cache efficiency.

    PR: 254333
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Reviewed By: #transport, tuexen, jtl
    Differential Revision: https://reviews.freebsd.org/D29522

    (cherry picked from commit 529a2a0f2765f6c57c50a5af6be242c03bf714e3)

 sys/netinet/tcp_hostcache.c | 24 +++++++++++-------------
 sys/netinet/tcp_hostcache.h |  2 +-
 2 files changed, 12 insertions(+), 14 deletions(-)
Comment 41 commit-hook freebsd_committer freebsd_triage 2021-04-16 22:12:39 UTC
A commit in branch stable/11 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a85003abdad402c27c12cf1a31b243c39483f263

commit a85003abdad402c27c12cf1a31b243c39483f263
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 17:24:01 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 21:33:20 +0000

    tcp: drain tcp_hostcache_list in between per-bucket locks

    Explicitly drain the sbuf after completing each hash bucket
    to minimize the work performed while holding the hash
    bucket lock.

    PR:             254333
    MFC after:      2 weeks
    Reviewed By:    tuexen, jhb, #transport
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29483

    (cherry picked from commit 869880463cc2ce64e2e6599eaec880a981f3ced6)

 sys/netinet/tcp_hostcache.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
Comment 42 commit-hook freebsd_committer freebsd_triage 2021-04-16 22:12:40 UTC
A commit in branch stable/11 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d20563819b9292ddcd42caf3449636b84825b1d6

commit d20563819b9292ddcd42caf3449636b84825b1d6
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-31 17:30:20 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-04-16 21:33:31 +0000

    tcp: Make hostcache.cache_count MPSAFE  by using a counter_u64_t

    Addressing the underlying root cause for cache_count to
    show unexpectedly high  values, by protecting all arithmetic on
    that global variable by using counter(9).

    PR:             254333
    Reviewed By: tuexen, #transport
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29510

    (cherry picked from commit 95e56d31e348594973affd0ea81d8f8383bc3031)

 sys/netinet/tcp_hostcache.c | 20 ++++++++++++--------
 sys/netinet/tcp_hostcache.h | 20 ++++++++++----------
 2 files changed, 22 insertions(+), 18 deletions(-)
Comment 43 Richard Scheffenegger freebsd_committer freebsd_triage 2021-04-18 08:13:45 UTC
I believe this PR can be marked closed, with the two fixes (the race leading to erroneous accounting of the number of global entries, and the fix to not excessively use kernel memory while providing the list of entries) having been backported.

Perhaps one of the systems with the problem can be upgraded to latest stable and validated if the problem is rectified, as expected.
Comment 44 Kubilay Kocak freebsd_committer freebsd_triage 2021-05-03 01:08:27 UTC
Thank you for the update Richard. Re-open pending MFC. If this will not aor cannot be merged to a particular stable/ branch, please set mfc-stableX flag to - with comment
Comment 45 Richard Scheffenegger freebsd_committer freebsd_triage 2021-06-14 12:17:20 UTC
I don't quite understand that last comment, and why this PR got reopened? The MFC was put in stable/13, /12 and even /11 in middle of April already (2 Months ago).
Comment 46 Lutz Donnerhacke freebsd_committer freebsd_triage 2021-06-14 12:31:28 UTC
(In reply to Richard Scheffenegger from comment #45)

May be somebody needs to set the flags accordingly to the MFC state.
So stable-13? -> stable-13+ ...