Bug 253589 - KCSAN race between tcp_do_segment and sbfree with vtnet
Summary: KCSAN race between tcp_do_segment and sbfree with vtnet
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-17 15:03 UTC by Alex Richardson
Modified: 2021-02-18 20:40 UTC
CC: 2 users

See Also:



Description Alex Richardson 2021-02-17 15:03:47 UTC
I have been trying to run a kernel with KCSAN enabled and it reports the following race for me:

CSan: Racy Access [Cpu7 Read Addr=0xfffff80016103578 Size=4 PC=0xffffffff81475058<tcp_do_segment>] [Cpu1 Write Addr=0xfffff80016103578 Size=4 PC=0xffffffff81291e55<sbfree>]
kcsan_access() at kcsan_access+0x1be/frame 0xfffffe0051a313d0
tcp_do_segment() at tcp_do_segment+0x3228/frame 0xfffffe0051a31510
tcp_input() at tcp_input+0x12c3/frame 0xfffffe0051a316c0
ip_input() at ip_input+0x2fb/frame 0xfffffe0051a31780
netisr_dispatch_src() at netisr_dispatch_src+0x15c/frame 0xfffffe0051a317f0
netisr_dispatch() at netisr_dispatch+0x21/frame 0xfffffe0051a31810
ether_demux() at ether_demux+0x2b4/frame 0xfffffe0051a31870
ether_nh_input() at ether_nh_input+0x680/frame 0xfffffe0051a318e0
netisr_dispatch_src() at netisr_dispatch_src+0x15c/frame 0xfffffe0051a31950
netisr_dispatch() at netisr_dispatch+0x21/frame 0xfffffe0051a31970
ether_input() at ether_input+0x107/frame 0xfffffe0051a319e0
vtnet_rxq_eof() at vtnet_rxq_eof+0x103f/frame 0xfffffe0051a31b40
vtnet_rx_vq_process() at vtnet_rx_vq_process+0xee/frame 0xfffffe0051a31b90
vtnet_rx_vq_intr() at vtnet_rx_vq_intr+0x1f/frame 0xfffffe0051a31bb0
virtqueue_intr() at virtqueue_intr+0x30/frame 0xfffffe0051a31bd0
vtpci_vq_intr() at vtpci_vq_intr+0x1a/frame 0xfffffe0051a31bf0
ithread_loop() at ithread_loop+0x361/frame 0xfffffe0051a31cf0
fork_exit() at fork_exit+0xaf/frame 0xfffffe0051a31d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0051a31d30

It looks like CPU7 is trying to read from a struct sockbuf that is about to be freed?

The affected line appears to be the `if (sb->sb_flags & SB_STOP)` check at sys/sys/sockbuf.h:236. GDB tells me the other CPU is executing `sb->sb_ccc -= m->m_len;` at sys/kern/uipc_sockbuf.c:303.

Since these are not even accessing the same fields, it seems like the memory may have been reallocated?
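
For reference, the write side is roughly the following (paraphrased from memory rather than quoted from the tree, so the exact code around uipc_sockbuf.c:303 may differ):

    /* sys/kern/uipc_sockbuf.c: sbfree() -- the write KCSAN attributes to CPU1 */
    void
    sbfree(struct sockbuf *sb, struct mbuf *m)
    {
    #if 0   /* the sockbuf lock assertion here is currently compiled out */
            SOCKBUF_LOCK_ASSERT(sb);
    #endif
            sb->sb_ccc -= m->m_len;         /* uipc_sockbuf.c:303, per GDB */
            /* remaining sb_acc/sb_mbcnt bookkeeping omitted */
    }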
Comment 1 Alex Richardson 2021-02-17 19:42:10 UTC
I enabled the disabled locking assertions in https://reviews.freebsd.org/D28736 and have not seen the error since.
Comment 2 Jonathan T. Looney 2021-02-18 20:40:44 UTC
I can think more about this, but my initial thought is that this is probably both expected and acceptable within the context you have given.

It appears that the two threads are doing these things:

Thread 1: Running sbfree() to free an mbuf from a socket buffer. I am going to surmise that this is an mbuf on the input socket buffer and that it has now been read by the application. This decreases sb_ccc, which in turn increases the available space in the socket buffer.

Thread 2: Running sbspace() to check on the amount of available space in a socket buffer. Because this is being called from tcp_do_segment(), I am going to surmise that this is being called from one of the two places which use sbspace() to determine the amount of data we can read from an input packet. (FWIW, even though the instruction points to the line which checks for SB_STOP, I assume that the actual conflict occurs on the next line of code which uses sb_ccc.)
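
For reference, sbspace() is roughly the following (a from-memory sketch of the inline in sys/sys/sockbuf.h, not an exact quote; the KERN_TLS special case is omitted):

    static inline long
    sbspace(struct sockbuf *sb)
    {
            int bleft, mleft;

            if (sb->sb_flags & SB_STOP)             /* the line the report points at */
                    return (0);
            bleft = sb->sb_hiwat - sb->sb_ccc;      /* the unlocked sb_ccc read */
            mleft = sb->sb_mbmax - sb->sb_mbcnt;
            return ((bleft < mleft) ? bleft : mleft);
    }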

If I understand the code correctly, neither the socket nor socket buffer can be freed as long as the TCP code holds a reference on the socket. And, if the TCP code is calling tcp_do_segment(), it should be in a state where it still has a reference on the socket. So, the only potential danger from the lack of synchronization is a stale read on the socket buffer values.

In this particular case, I believe it is acceptable to have a stale read. TCP advertises a receive window to its peer based on the latest information it has about input socket buffer space. The peer should not exceed that receive window. Prior to Thread 1 modifying the sb_ccc value, the receive window would have been X. After Thread 1 modified the sb_ccc value, the receive window would have been Y (where Y > X). Absent some corner cases (such as suddenly shrinking the input buffer high water level), we should have most recently advertised a receive window no larger than X. Therefore, whether Thread 2 reads sb_ccc before or after Thread 1 updates it, the peer should not have sent us so much data that it actually produces a functional difference. (And, if the peer did exceed our advertised receive window, our TCP stack is under no obligation to accept that data.)
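
To put made-up numbers on that: say sb_hiwat is 64k and sb_ccc is 16k before the free, so the space (roughly sb_hiwat - sb_ccc, ignoring the mbuf-count limit) is X = 48k, and the most recently advertised window should be no larger than 48k. Thread 1 then frees 8k of mbuf data, dropping sb_ccc to 8k and raising the space to Y = 56k. Whether Thread 2's read of sb_ccc sees the old or the new value, it computes a window of at least 48k, so a peer that respected our last advertisement cannot have sent more than Thread 2 will accept.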

So, for this usage within TCP, it seems to me that the locking should not make a functional difference and the lack of locking for this particular read is probably a net gain.