Bug 230564

Summary: databases/lmdb: switch to robust mutexes on FreeBSD 11
Product: Ports & Packages
Reporter: Jan Beich <jbeich>
Component: Individual Port(s)
Assignee: Xin LI <delphij>
Status: Closed Works As Intended
Severity: Affects Only Me
CC: arcade
Priority: ---
Keywords: patch, patch-ready
Version: Latest
Flags: delphij: maintainer-feedback+
Hardware: Any
OS: Any
Attachments:
  Description  Flags
  v0           none
  v0           none

Description Jan Beich freebsd_committer freebsd_triage 2018-08-12 15:50:54 UTC
Created attachment 196128
v0

Can you help with runtime testing and upstreaming? I'm only interested in this port as part of Firefox 63, see https://bugzilla.mozilla.org/show_bug.cgi?id=1445451

10.4 amd64:   https://clbin.com/ybmwW
10.4 i386:    https://clbin.com/deada
11.1 aarch64: https://clbin.com/Zxcbm
11.1 amd64:   https://clbin.com/YPFwK
11.1 armv6:   https://clbin.com/r1IAD
11.1 i386:    https://clbin.com/lff55
11.2 aarch64: https://clbin.com/K1XPb
11.2 amd64:   https://clbin.com/HB98i
11.2 armv6:   https://clbin.com/x8uby
11.2 i386:    https://clbin.com/4vknS
12.0 aarch64: https://clbin.com/84pM3
12.0 amd64:   https://clbin.com/7SzKK
12.0 armv6:   https://clbin.com/siFEo
12.0 armv7:   https://clbin.com/463IM
12.0 i386:    https://clbin.com/3l54A
Comment 1 Jan Beich freebsd_committer freebsd_triage 2018-08-12 16:07:08 UTC
Created attachment 196129
v0

Oops, regenerated via "make makepatch".
Comment 2 commit-hook freebsd_committer freebsd_triage 2018-08-28 00:01:50 UTC
A commit references this bug:

Author: jbeich
Date: Tue Aug 28 00:01:28 UTC 2018
New revision: 478269
URL: https://svnweb.freebsd.org/changeset/ports/478269

Log:
  databases/lmdb: switch to robust mutexes on FreeBSD >= 11

  PR:		230564
  Approved by:	delphij (maintainer)

Changes:
  head/databases/lmdb/Makefile
  head/databases/lmdb/files/patch-mdb.c
Comment 3 Volodymyr Kostyrko 2018-09-01 09:44:26 UTC
Ahem.

Sep  1 11:23:37 limbo kernel: Failed to fully fault in a core file segment at VA 0x8006a1000 with size 0x6000 to be written at offset 0x7a000 for process lmtpd
Sep  1 11:23:37 limbo kernel: Failed to fully fault in a core file segment at VA 0x8006ae000 with size 0x8000 to be written at offset 0x87000 for process lmtpd
Sep  1 11:23:37 limbo kernel: Failed to fully fault in a core file segment at VA 0x8006bb000 with size 0x4a000 to be written at offset 0x94000 for process lmtpd
Sep  1 11:23:37 limbo kernel: Failed to fully fault in a core file segment at VA 0x806c00000 with size 0x20000000 to be written at offset 0x835000 for process lmtpd
Sep  1 11:23:37 limbo kernel: Failed to fully fault in a core file segment at VA 0x827000000 with size 0x20000000 to be written at offset 0x20c35000 for process lmtpd
Sep  1 11:23:37 limbo kernel: Failed to fully fault in a core file segment at VA 0x847400000 with size 0x20000000 to be written at offset 0x41035000 for process lmtpd
Sep  1 11:23:37 limbo kernel: Failed to fully fault in a core file segment at VA 0x867800000 with size 0x20000000 to be written at offset 0x61435000 for process lmtpd
Sep  1 11:23:37 limbo kernel: Failed to fully fault in a core file segment at VA 0x887c00000 with size 0x20000000 to be written at offset 0x81835000 for process lmtpd
Sep  1 11:23:37 limbo kernel: pid 3572 (lmtpd), uid 60: exited on signal 6 (core dumped)

Sep  1 12:17:00 limbo imaps[4145]: could not find auxprop plugin, was searching for [all]
Sep  1 12:17:00 limbo imaps[4145]: starttls: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication
Sep  1 12:17:00 limbo imaps[4145]: login: localhost [127.0.0.1] news plaintext+TLS User logged in SESSIONID=<limbo.b1t.name-4145-1535793420-1-2028660329904255942>
Sep  1 12:17:00 limbo imaps[4145]: cyrusdb_lmdb(/var/imap/mailboxes.db): MDB_READERS_FULL: Environment maxreaders limit reached
Sep  1 12:17:00 limbo imaps[4145]: DBERROR: error fetching mboxlist gmane^os^freebsd^performance: cyrusdb error
Sep  1 12:17:00 limbo imaps[4145]: cyrusdb_lmdb(/var/imap/mailboxes.db): MDB_READERS_FULL: Environment maxreaders limit reached
Sep  1 12:17:00 limbo imaps[4145]: DBERROR: error fetching mboxlist gmane^os^freebsd^performance: cyrusdb error
Sep  1 12:17:00 limbo imaps[4145]: cyrusdb_lmdb(/var/imap/mailboxes.db): MDB_READERS_FULL: Environment maxreaders limit reached
Sep  1 12:17:00 limbo imaps[4145]: DBERROR: error fetching mboxlist gmane.os.freebsd.performance: cyrusdb error
Sep  1 12:17:00 limbo imaps[4145]: cyrusdb_lmdb(/var/imap/mailboxes.db): MDB_READERS_FULL: Environment maxreaders limit reached
Sep  1 12:17:00 limbo imaps[4145]: DBERROR: error fetching mboxlist gmane.os.freebsd.performance: cyrusdb error

(gdb) bt full
#0  0x0000000803d801fa in thr_kill () from /lib/libc.so.7
No symbol table info available.
#1  0x0000000803d801c4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
        id = <value optimized out>
#2  0x0000000803d80139 in abort () at /usr/src/lib/libc/stdlib/abort.c:65
        act = <value optimized out>
#3  0x0000000804ff116d in mdb_version () from /usr/local/lib/liblmdb.so.0
No symbol table info available.
#4  0x0000000804fee7ae in mdb_set_relctx () from /usr/local/lib/liblmdb.so.0
No symbol table info available.
#5  0x0000000804fe8838 in mdb_cursor_get () from /usr/local/lib/liblmdb.so.0
No symbol table info available.
#6  0x0000000804fe52b8 in mdb_cursor_put () from /usr/local/lib/liblmdb.so.0
No symbol table info available.
#7  0x0000000804fed186 in mdb_put () from /usr/local/lib/liblmdb.so.0
No symbol table info available.
#8  0x00000008013df14f in put () from /usr/local/lib/libcyrus.so.0
No symbol table info available.
#9  0x00000008013de9b4 in store () from /usr/local/lib/libcyrus.so.0
No symbol table info available.
#10 0x00000008010a3bcf in _init_counted () from /usr/local/lib/libcyrus_imap.so.0
No symbol table info available.
#11 0x00000008010a3a76 in conversations_open_path () from /usr/local/lib/libcyrus_imap.so.0
No symbol table info available.
#12 0x00000008010a3c62 in conversations_open_mbox () from /usr/local/lib/libcyrus_imap.so.0
No symbol table info available.
#13 0x00000008010c24c6 in mailbox_lock_index () from /usr/local/lib/libcyrus_imap.so.0
No symbol table info available.
#14 0x00000008010c0bcc in mailbox_open_advanced () from /usr/local/lib/libcyrus_imap.so.0
No symbol table info available.
#15 0x00000008010b29ae in index_open () from /usr/local/lib/libcyrus_imap.so.0
No symbol table info available.
#16 0x0000000000424663 in shut_down ()
No symbol table info available.
#17 0x0000000000412d37 in shut_down ()
No symbol table info available.
#18 0x0000000000411582 in shut_down ()
No symbol table info available.
#19 0x000000000044055a in addmbox_cb ()
No symbol table info available.
#20 0x000000000040d414 in ?? ()
No symbol table info available.
#21 0x000000080066e000 in ?? ()
No symbol table info available.
#22 0x0000000000000000 in ?? ()
No symbol table info available.

It's a home server, so no big fuss. I'll try rebuilding everything right now.
Comment 4 Volodymyr Kostyrko 2018-09-01 10:12:59 UTC
Sorry for the noise: a full rollback gives the same errors. I probably messed something up when updating -STABLE.
Comment 5 Volodymyr Kostyrko 2018-09-01 21:48:03 UTC
OK, I know what happened.

Because I'm running Cyrus, some processes start and stop from time to time. When the library was updated, old processes were still working on some databases while new ones were started with the new library and tried to access the same databases. When the locking schemes don't match, weird things happen. Now that I've restored the damaged databases from backup, everything looks fine.
Comment 6 Volodymyr Kostyrko 2018-09-03 10:51:33 UTC
LMDB might be unstable with the new patch.

In Cyrus I have one DB that takes around 40 MB. After the upgrade I restored it from backup and converted it to skiplist and back, but it still fails from time to time.

Sep  3 10:51:30 limbo imaps[16050]: skiplist: checkpointed /var/imap/user/a/arcade.conversations.NEW (232305 records, 31029232 bytes) in 11.961 sec
Sep  3 10:51:30 limbo imaps[16050]: cyrusdb: converted /var/imap/user/a/arcade.conversations from lmdb to skiplist

Sep  3 10:52:12 limbo imaps[16109]: cyrusdb_lmdb(/var/imap/user/a/arcade.conversations): MDB_INVALID: File is not an LMDB file
Sep  3 10:52:25 limbo imaps[16109]: skiplist: recovered /var/imap/user/a/arcade.conversations (232305 records, 31029280 bytes) in 13 seconds
Sep  3 10:52:37 limbo imaps[16109]: skiplist: checkpointed /var/imap/user/a/arcade.conversations (232305 records, 31029232 bytes) in 11.758 sec
Sep  3 10:52:39 limbo imaps[16109]: skiplist: longlock /var/imap/user/a/arcade.conversations for 2.0 seconds
Sep  3 10:52:39 limbo imaps[16109]: cyrusdb: converted /var/imap/user/a/arcade.conversations from skiplist to lmdb

Sep  3 13:46:55 limbo imaps[17047]: cyrusdb_lmdb(/var/imap/user/a/arcade.conversations): MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Sep  3 13:46:55 limbo imaps[17047]: IOERROR: conversations invalid status user.arcade.kworr
Sep  3 13:46:55 limbo imaps[17047]: cyrusdb_lmdb(/var/imap/user/a/arcade.conversations): MDB_BAD_TXN: Transaction must abort, has a child, or is invalid
Sep  3 13:46:55 limbo imaps[17047]: IOERROR: conversations invalid status user.arcade.kworr
Sep  3 13:46:55 limbo imaps[17047]: IOERROR: failed to commit mailbox user.arcade.kworr, probably need to reconstruct

Sorry for nagging about this; I just hope it's actually relevant. There wasn't a single LMDB error for months before.

I'll try clearing the DB completely and repopulating it from scratch.
Comment 7 commit-hook freebsd_committer freebsd_triage 2018-09-03 12:09:07 UTC
A commit references this bug:

Author: jbeich
Date: Mon Sep  3 12:08:41 UTC 2018
New revision: 478856
URL: https://svnweb.freebsd.org/changeset/ports/478856

Log:
  databases/lmdb: back out r478269 due to cyrus-imapd30 instability

  Robust mutexes were already enabled but r478269 disabled POSIX
  semaphores. It appears both are only mutually exclusive on Android and
  old GNU libc.

  PR:		230564
  Reported by:	Volodymyr Kostyrko
  Pointy hat to:	jbeich

Changes:
  head/databases/lmdb/Makefile
  head/databases/lmdb/files/patch-mdb.c
Comment 8 Jan Beich freebsd_committer freebsd_triage 2018-09-03 12:14:51 UTC
Thanks for reporting. It looks like I misread the code and introduced an unintended change. MDB_FDATASYNC is still not quite correct (the existing conditional in files/patch-mdb.c is ignored because of the defined(BSD) check above it), but that's a minor issue: fdatasync(2) is only implemented for UFS/MSDOSFS; elsewhere it's an alias for fsync(2).