Looks like there's a 32 bit integer overflow or underflow glitch in ggated or ggatec (not sure yet which). Relevant lines from the log. ggated first: [...snipped...] debug: Received hdr packet. debug: recv_thread: offset=53686845440 length=114688 debug: disk_thread: offset=53686845440 length=114688 debug: send_thread: offset=53686845440 length=114688 debug: Sent hdr packet. debug: Sent 114688 bytes (offset=53686845440, size=114688). debug: Received hdr packet. debug: recv_thread: offset=16384 length=2147467264 debug: disk_thread: offset=16384 length=2147467264 debug: send_thread: offset=16384 length=2147467264 debug: Sent hdr packet. debug: Process 2048 exiting. Notice the suspicious length paramter: 2147467264 = 0x7FFFC000 114688 = 0x 1C000 Corresponding log from ggatec: [...snipped...] debug: Received 114688 bytes (offset=53686845440, size=114688). debug: Sent hdr packet. debug: Received hdr packet. debug: Received data packet. error: Lost connection 4. debug: recv_thread: Died. debug: send_thread: Died. And ggated promptly allocates an additional 2GB of memory. Line from top: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 2048 root 1 20 0 2077M 1217M spa_na 1 0:05 0.00% ggated
ggated log message printed here https://github.com/freebsd/freebsd-src/blob/stable/12/sbin/ggate/ggated/ggated.c#L665
ggatec connection lost message is coming from here https://github.com/freebsd/freebsd-src/blob/stable/12/sbin/ggate/ggatec/ggatec.c#L229 Looks like the code is expecting to be able to send/receive pretty huge packets as is, instead of breaking it up into smaller chunks.
$ truss zpool create testvolpool /dev/ggate0 [...snipped...] openat(AT_FDCWD,"/dev/ggate0",O_RDONLY,00) = 7 (0x7) fstat(7,{ mode=crw-r----- ,inode=76,size=0,blksize=4096 }) = 0 (0x0) ioctl(7,DIOCGMEDIASIZE,0x7fffff281960) = 0 (0x0) mmap(0x0,266240,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 42446731808768 (0x269ae65f1000) pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,262144,0x0) = 262144 (0x40000) pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,262144,0x40000) = 262144 (0x40000) pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,262144,0xc7ff80000) = 262144 (0x40000) pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,262144,0xc7ffc0000) = 262144 (0x40000) close(7) = 0 (0x0) Looks fine to me... and shows no weird 2 GB read.
> Looks like the code is expecting to be able to send/receive pretty huge packets as is, instead of breaking it up into smaller chunks. Ignore that, found the helper function that breaks it up. Here's a truss of ggated: debug: Received hdr packet. 2008: write(1,"debug: Received hdr packet.\n",28) = 28 (0x1c) debug: recv_thread: offset=16384 length=2147467264 2008: write(1,"debug: recv_thread: offset=16384"...,51) = 51 (0x33) 2008: mmap(0x0,2147487744,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 56795097997312 (0x33a7a3a42000) 2008: mmap(0x0,6291456,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 56797253623808 (0x33a824206000) 2008: munmap(0x33a824206000,6291456) = 0 (0x0) 2008: mmap(0x0,8384512,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 56797253255168 (0x33a8241ac000) 2008: munmap(0x33a8241ac000,344064) = 0 (0x0) 2008: munmap(0x33a824800000,1748992) = 0 (0x0) 2008: _umtx_op(0x33a783f9f1b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0) 2008: _umtx_op(0x33a77e55dfd0,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0) debug: disk_thread: offset=16384 length=2147467264 2008: write(1,"debug: disk_thread: offset=16384"...,51) = 51 (0x33) 2008: pread(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,2147467264,0x4000) = 2147467264 (0x7fffc000) 2008: _umtx_op(0x33a783f9e2b8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0) 2008: _umtx_op(0x33a794e63008,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0) debug: send_thread: offset=16384 length=2147467264 2008: write(1,"debug: send_thread: offset=16384"...,51) = 51 (0x33) 2008: sendto(6,"\0\0\0\0\0\0\0@\0\^?\M^?\M-@\0\0"...,23,0,NULL,0) = 23 (0x17) debug: Sent hdr packet. 2008: write(1,"debug: Sent hdr packet.\n",24) = 24 (0x18) 2008: sendto(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,131072,0,NULL,0) = 131072 (0x20000) 2008: sendto(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,131072,0,NULL,0) = 131072 (0x20000) 2008: sendto(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,131072,0,NULL,0) = 131072 (0x20000) 2008: recvfrom(8,0x33a796ff97b0,23,MSG_WAITALL,NULL,0x0) = 0 (0x0) 2008: getpid() = 2008 (0x7d8) debug: Process 2008 exiting. 2008: write(1,"debug: Process 2008 exiting.\n",29) = 29 (0x1d) 2008: sendto(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,131072,0,NULL,0) = 131072 (0x20000) We can see that ggated allocs a big chunk of mem and successfully reads data into it and then tries to send it off, in chunks of 128kB. So far so good. But then the receive thread notices that it did not read a full header and thus exits and the whole process dies only having sent 4x128kB. Smells like a race condition?
Truss for ggatec: debug: Received hdr packet. write(1,"debug: Received hdr packet.\n",28) = 28 (0x1c) recvfrom(4,0x7fffdffddf40,2147467264,MSG_WAITALL,0x0,0x0) ERR#14 'Bad address' debug: Received data packet. write(1,"debug: Received data packet.\n",29) = 29 (0x1d) error: Lost connection 4. Notice the bad address! It's supposed to point to a stack buffer of size 128kB. But here we are trying to dump 2 GB into it. Oops. Stack buffer overflow. See near https://github.com/freebsd/freebsd-src/blob/stable/12/sbin/ggate/ggatec/ggatec.c#L223
That 2 GB request tripping up ggatec is a BIO_FLUSH command. But ggatec only handles BIO_READ and BIO_WRITE without failing any of the other ones. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213479
Or BIO_DELETE, don't know which one. Doesn't matter. Needs to explicitly fail unhandled requests.
Here's a fix: https://reviews.freebsd.org/D31318
fixed with https://cgit.freebsd.org/src/commit/?id=91a8bed5a49eb2d1e4e096a4c68c108cebec8818