Bug 257368

Summary: ggatec misbehaving
Product: Base System Reporter: jo
Component: binAssignee: freebsd-geom (Nobody) <geom>
Status: Closed FIXED    
Severity: Affects Only Me CC: asomers
Priority: ---    
Version: 12.2-STABLE   
Hardware: amd64   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213479

Description jo 2021-07-23 23:29:46 UTC
Looks like there's a 32 bit integer overflow or underflow glitch in ggated or ggatec (not sure yet which).

Relevant lines from the log.
ggated first:

[...snipped...]
debug: Received hdr packet.
debug: recv_thread: offset=53686845440 length=114688
debug: disk_thread: offset=53686845440 length=114688
debug: send_thread: offset=53686845440 length=114688
debug: Sent hdr packet.
debug: Sent 114688 bytes (offset=53686845440, size=114688).
debug: Received hdr packet.
debug: recv_thread: offset=16384 length=2147467264
debug: disk_thread: offset=16384 length=2147467264
debug: send_thread: offset=16384 length=2147467264
debug: Sent hdr packet.
debug: Process 2048 exiting.

Notice the suspicious length paramter:
2147467264 = 0x7FFFC000
    114688 = 0x   1C000


Corresponding log from ggatec:

[...snipped...]
debug: Received 114688 bytes (offset=53686845440, size=114688).
debug: Sent hdr packet.
debug: Received hdr packet.
debug: Received data packet.
error: Lost connection 4.
debug: recv_thread: Died.
debug: send_thread: Died.



And ggated promptly allocates an additional 2GB of memory. Line from top:

 PID USERNAME THR PRI NICE   SIZE    RES STATE   C TIME  WCPU COMMAND
2048 root     1   20     0  2077M  1217M spa_na  1 0:05  0.00% ggated
Comment 2 jo 2021-07-23 23:55:21 UTC
ggatec connection lost message is coming from here https://github.com/freebsd/freebsd-src/blob/stable/12/sbin/ggate/ggatec/ggatec.c#L229

Looks like the code is expecting to be able to send/receive pretty huge packets as is, instead of breaking it up into smaller chunks.
Comment 3 jo 2021-07-24 00:19:48 UTC
$ truss zpool create testvolpool /dev/ggate0

[...snipped...]
openat(AT_FDCWD,"/dev/ggate0",O_RDONLY,00)       = 7 (0x7)
fstat(7,{ mode=crw-r----- ,inode=76,size=0,blksize=4096 }) = 0 (0x0)
ioctl(7,DIOCGMEDIASIZE,0x7fffff281960)           = 0 (0x0)
mmap(0x0,266240,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 42446731808768 (0x269ae65f1000)
pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,262144,0x0) = 262144 (0x40000)
pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,262144,0x40000) = 262144 (0x40000)
pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,262144,0xc7ff80000) = 262144 (0x40000)
pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,262144,0xc7ffc0000) = 262144 (0x40000)
close(7)                                         = 0 (0x0)



Looks fine to me... and shows no weird 2 GB read.
Comment 4 jo 2021-07-24 01:47:00 UTC
> Looks like the code is expecting to be able to send/receive pretty huge packets as is, instead of breaking it up into smaller chunks.

Ignore that, found the helper function that breaks it up.


Here's a truss of ggated:


debug: Received hdr packet.
 2008: write(1,"debug: Received hdr packet.\n",28) = 28 (0x1c)
debug: recv_thread: offset=16384 length=2147467264
 2008: write(1,"debug: recv_thread: offset=16384"...,51) = 51 (0x33)
 2008: mmap(0x0,2147487744,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 56795097997312 (0x33a7a3a42000)
 2008: mmap(0x0,6291456,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 56797253623808 (0x33a824206000)
 2008: munmap(0x33a824206000,6291456)            = 0 (0x0)
 2008: mmap(0x0,8384512,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 56797253255168 (0x33a8241ac000)
 2008: munmap(0x33a8241ac000,344064)             = 0 (0x0)
 2008: munmap(0x33a824800000,1748992)            = 0 (0x0)
 2008: _umtx_op(0x33a783f9f1b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
 2008: _umtx_op(0x33a77e55dfd0,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0)
debug: disk_thread: offset=16384 length=2147467264
 2008: write(1,"debug: disk_thread: offset=16384"...,51) = 51 (0x33)
 2008: pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,2147467264,0x4000) = 2147467264 (0x7fffc000)
 2008: _umtx_op(0x33a783f9e2b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
 2008: _umtx_op(0x33a794e63008,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0)
debug: send_thread: offset=16384 length=2147467264
 2008: write(1,"debug: send_thread: offset=16384"...,51) = 51 (0x33)
 2008: sendto(6,"\0\0\0\0\0\0\0@\0\^?\M^?\M-@\0\0"...,23,0,NULL,0) = 23 (0x17)
debug: Sent hdr packet.
 2008: write(1,"debug: Sent hdr packet.\n",24)   = 24 (0x18)
 2008: sendto(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,131072,0,NULL,0) = 131072 (0x20000)
 2008: sendto(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,131072,0,NULL,0) = 131072 (0x20000)
 2008: sendto(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,131072,0,NULL,0) = 131072 (0x20000)
 2008: recvfrom(8,0x33a796ff97b0,23,MSG_WAITALL,NULL,0x0) = 0 (0x0)
 2008: getpid()                                  = 2008 (0x7d8)
debug: Process 2008 exiting.
 2008: write(1,"debug: Process 2008 exiting.\n",29) = 29 (0x1d)
 2008: sendto(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,131072,0,NULL,0) = 131072 (0x20000)



We can see that ggated allocs a big chunk of mem and successfully reads data into it and then tries to send it off, in chunks of 128kB. So far so good.
But then the receive thread notices that it did not read a full header and thus exits and the whole process dies only having sent 4x128kB.

Smells like a race condition?
Comment 5 jo 2021-07-24 02:06:11 UTC
Truss for ggatec:

debug: Received hdr packet.
write(1,"debug: Received hdr packet.\n",28)      = 28 (0x1c)
recvfrom(4,0x7fffdffddf40,2147467264,MSG_WAITALL,0x0,0x0) ERR#14 'Bad address'
debug: Received data packet.
write(1,"debug: Received data packet.\n",29)     = 29 (0x1d)
error: Lost connection 4.


Notice the bad address!
It's supposed to point to a stack buffer of size 128kB.
But here we are trying to dump 2 GB into it. Oops. Stack buffer overflow.

See near https://github.com/freebsd/freebsd-src/blob/stable/12/sbin/ggate/ggatec/ggatec.c#L223
Comment 6 jo 2021-07-27 00:31:53 UTC
That 2 GB request tripping up ggatec is a BIO_FLUSH command. But ggatec only handles BIO_READ and BIO_WRITE without failing any of the other ones.

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213479
Comment 7 jo 2021-07-27 00:40:20 UTC
Or BIO_DELETE, don't know which one. Doesn't matter. Needs to explicitly fail unhandled requests.
Comment 8 jo 2021-07-27 01:03:21 UTC
Here's a fix: https://reviews.freebsd.org/D31318