Bug 255398

Summary: databases/mariadb105-server: deadlocks at start when wsrep is enabled
Product: Ports & Packages
Reporter: Phillip R. Jaenke <prj>
Component: Individual Port(s)
Assignee: Bernard Spil <brnrd>
Status: New
Severity: Affects Only Me
Flags: bugzilla: maintainer-feedback? (brnrd)
Priority: ---
Version: Latest
Hardware: Any
OS: Any

Description Phillip R. Jaenke 2021-04-25 18:24:47 UTC
Reproduction using 10.5.9 from ports.

On DB1 (bootstraps the cluster):

sudo -u mysql /usr/local/bin/mysql_install_db --basedir=/usr/local --datadir=$datadir --skip-test-db
sudo -u mysql /usr/local/libexec/mariadbd --wsrep-new-cluster --wsrep-on --wsrep_cluster_address=gcomm://DB1 --datadir=$datadir

On DB2 (joins the cluster on DB1):

sudo -u mysql /usr/local/bin/mysql_install_db --basedir=/usr/local --datadir=$datadir --skip-test-db
sudo -u mysql /usr/local/libexec/mariadbd --wsrep-on --wsrep_cluster_address=gcomm://DB1 --datadir=$datadir

Initial membership will succeed (check the logs, of course). Once that's confirmed, the cluster should be ready to go. Now stop mariadbd with kill -TERM on DB2 first, then on DB1. (Order matters, even though it's multi-master.)

BOTH hosts /usr/local/etc/mysql/conf.d/wsrep.cnf:

BOTH hosts rc.conf:

Then on DB1: /usr/local/etc/rc.d/mysql-server start
It'll go through the 15 second timeout waiting for the pidfile, and then exit 1, without actually killing the process. It just never writes either the pidfile or the socket. Ever. No errors are logged in either the mysql error log or wsrep error logs. The process just hangs and does not die to TERM.
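
To tell "hung" apart from "crashed" without relying on the rc script, something like the following could be used. This is only a sketch: the function name classify_start is made up for illustration and is not part of the port or the rc script, and it assumes the caller is allowed to signal the server process (run it as root or as the mysql user, otherwise kill -0 fails and the result is reported as "exited").

```sh
#!/bin/sh
# classify_start PID PIDFILE TIMEOUT   (hypothetical helper, not from the port)
# Prints "started" if PIDFILE appears (non-empty) within TIMEOUT seconds,
# "exited" if the process goes away first, and "hung" if the process is
# still alive at the timeout without having written PIDFILE.
classify_start() {
    pid=$1
    pidfile=$2
    timeout=$3
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -s "$pidfile" ]; then
            echo started
            return 0
        fi
        # kill -0 sends no signal; it only checks that the PID can be signalled.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo exited
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo hung
    else
        echo exited
    fi
    return 1
}
```

With the behavior described above, passing the PID of the mariadbd process, the path /var/run/mysql/mysqld.pid, and a timeout of 15 would be expected to print "hung" rather than "exited".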

This also reproduces if manually started with "sudo -u mysql /usr/local/libexec/mariadbd --defaults-extra-file=/usr/local/etc/mysql/my.cnf --user=mysql --datadir=/var/db/mysql/data --pid-file=/var/run/mysql/mysqld.pid". Instead of starting, it just hangs and will not respond to TERM, only to KILL.

Port was built with options:
databases_mariadb105-server_SET+=GSSAPI_HEIMDAL LZ4 WSREP
databases_mariadb105-server_UNSET+=GSSAPI_BASE GSSAPI_MIT GSSAPI_NONE

What is perplexing is that this ONLY reproduces with wsrep being configured by files in /usr/local/etc/mysql/conf.d. If the server is started with "--wsrep-on --wsrep_cluster_address=gcomm://DB1,DB2" then it works as expected. So it is specifically something with reading the wsrep configuration from files. Even putting the wsrep configuration into my.cnf causes the exact same behavior.