I am running CouchDB 3.3.3, installed via latest package available in quarterly package repo. Package version couchdb3-3.3.3_1 Host OS: FreeBSD 14.1-RELEASE-p5 I created a jail, with a config that looks like this: db-4 { host.hostname = "db-4"; vnet; vnet.interface = "e12b_db_4"; vnet.interface += "e13b_db_4"; exec.poststop += "ifconfig e12a_db_4 destroy"; exec.poststop += "ifconfig e13a_gh_4 destroy"; exec.poststop += "ifconfig epair12 create"; exec.poststop += "ifconfig epair13 create"; sysvsem = "new"; sysvmsg = "new"; sysvshm = "new"; } The jail has the same version of FreeBSD as the host. I installed CouchDB 3.3.3 with `pkg install couchdb3`. I edited the `/usr/local/etc/couchdb3/local.ini` config file according to package message, and set it to listen to IPv6 global address :: I enabled the couchdb3 service and started it. I sent the HTTP PUT requests as instructed: curl -X PUT 'http://admin:notmyactualpasswordobviously@[::]:5984/_users' curl -X PUT 'http://admin:notmyactualpasswordobviously@[::]:5984/_replicator' curl -X PUT 'http://admin:notmyactualpasswordobviously@[::]:5984/_global_changes' The first PUT command succeeded. The second and third ones returned error responses similar to: {"error":"unknown_error","reason":"badarg","ref":1926143316} After a few minutes, CouchDB stops running, as can be observed by there no longer being any process matches for `ps wwwaux | grep couchdb3`. I turned on logging to file at level info. The first few lines of the CouchDB log file look like the following: [info] 2024-10-20T00:46:44.941085Z couchdb@127.0.0.1 <0.254.0> -------- Preflight check: Checking For Monsters [info] 2024-10-20T00:46:44.941137Z couchdb@127.0.0.1 <0.254.0> -------- Preflight check: Asserting Admin Account [info] 2024-10-20T00:46:44.943129Z couchdb@127.0.0.1 <0.254.0> -------- Apache CouchDB 3.3.3 is starting. [info] 2024-10-20T00:46:44.943167Z couchdb@127.0.0.1 <0.255.0> -------- Starting couch_sup [info] 2024-10-20T00:46:44.974439Z couchdb@127.0.0.1 <0.254.0> -------- Apache CouchDB has started. Time to relax. [notice] 2024-10-20T00:46:44.978854Z couchdb@127.0.0.1 <0.329.0> -------- rexi_server : started servers [notice] 2024-10-20T00:46:44.979659Z couchdb@127.0.0.1 <0.333.0> -------- rexi_buffer : started servers [error] 2024-10-20T00:46:45.011115Z couchdb@127.0.0.1 emulator -------- Error in process <0.353.0> on node 'couchdb@127.0.0.1' with exit value: {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{mem3_shards,fold_fun,2,[{file,"src/mem3_shards.erl"},{line,332}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]} [error] 2024-10-20T00:46:45.011204Z couchdb@127.0.0.1 emulator -------- Error in process <0.353.0> on node 'couchdb@127.0.0.1' with exit value: {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{mem3_shards,fold_fun,2,[{file,"src/mem3_shards.erl"},{line,332}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]} [error] 2024-10-20T00:46:45.011323Z couchdb@127.0.0.1 emulator -------- Error in process <0.363.0> on node 'couchdb@127.0.0.1' with exit value: {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{mem3_shards,fold_fun,2,[{file,"src/mem3_shards.erl"},{line,332}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]} [error] 2024-10-20T00:46:45.011405Z couchdb@127.0.0.1 emulator -------- Error in process <0.363.0> on node 'couchdb@127.0.0.1' with exit value: {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{mem3_shards,fold_fun,2,[{file,"src/mem3_shards.erl"},{line,332}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]} The log file is quickly filled with a lot of error messages, and the log file grows very quickly to gigabytes in size in just a few minutes. One of the error messages that keeps showing up says "custodian shard checker died": [notice] 2024-10-20T01:19:03.987934Z couchdb@127.0.0.1 <0.2139.0> -------- custodian shard checker died {badarg,[{erlang,list_to_integer,[[48,48,48,13,48,13,1,8],16],[{error_info,#{module => erl_erts_errors}}]},{mem3_util,'-build_shards_by_node/2-fun-0-',5,[{file,"src/mem3_util.erl"},{line,228}]},{lists,map,2,[{file,"lists.erl"},{line,1315}]},{lists,flatmap_1,2,[{file,"lists.erl"},{line,1335}]},{custodian_util,fold_dbs1,2,[{file,"src/custodian_util.erl"},{line,83}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1161}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1151}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,1167}]}]} And aside from that one there are similar looking ones to the ones from the beginning of the log. Would appreciate some help here, as I cannot tell from these log files what is actually going wrong.
Created attachment 254379 [details] /usr/local/etc/couchdb3/local.ini config file Attaching full `/usr/local/etc/couchdb3/local.ini` config file. (Logging to file is outcommented here again, because I don't want to fill my disk with gigabytes of log files outside of when I am actively looking at the logs.)
thanks Erik, thats indeed odd, I can replicate it here too. Can you attach your vm.args file as well please? redact the cookie ofc.
Created attachment 254470 [details] /usr/local/etc/couchdb3/vm.args Here is the vm.args file from within the jail. I haven't made any edits to it so whatever is in here is just whatever was there by the pkg install or by CouchDB or whoever creates/edits this file.
I am wondering however, if the default 127.0.0.1 values from that vm.args file could be causing any problems since as mentioned I configured CouchDB to listen on the IPv6 global address `::`. Could that be the case?
Hi again Dave, do you have any tips or advice for steps that I could take to pinpoint what is causing this problem?
thanks Erik I can replicate it here also on 3.4.1 & also next release 3.4.2 - it's nothing to do with jails - if built from sources (within same jail) this issue doesn't occur Next stage is a tiresome diff of all configs and files to see whats different or missing.
i managed to reproduce it even with a fresh new port built from source. I will try to reproduce from fresh source as well.
Erik, I think something is going wrong inside snappy, please test with: ``` ; add to /usr/local/etc/couchdb3/local.ini [couchdb] file_compression = snappy ``` and report back, this fixes it for me. Benoit, please share more details of how you build this please, I didn't get a reproduction from a plain sources build, so I assume this is a C-related setting that is used in ports tree, but not in production build. My poudriere build is https://pkg.skunkwerks.at/poudriere/data/14_1_x64-default/2024-11-14_11h51m41s/logs/couchdb3-3.4.2.log reasoning: shards doc looks good going in at creation time: _db_create:create_shard_files/2 --> ok 0:15:47.446410 <0.2049.0> fabric_db_create:create_shard_db_doc({doc,<<"_users">>, {0,[]}, {[{<<"shard_suffix">>,".1731716147"}, {<<"changelog">>, [[<<"add">>,<<"00000000-7fffffff">>,<<"couchdb@127.0.0.1">>], [<<"add">>,<<"80000000-ffffffff">>,<<"couchdb@127.0.0.1">>]]}, {<<"by_node">>, {[{<<"couchdb@127.0.0.1">>, [<<"00000000-7fffffff">>,<<"80000000-ffffffff">>]}]}}, {<<"by_range">>, {[{<<"00000000-7fffffff">>,[<<"couchdb@127.0.0.1">>]}, {<<"80000000-ffffffff">>,[<<"couchdb@127.0.0.1">>]}]}}, {<<"props">>,{[]}}]}, [],false,[]}) and then comes out as garbage: 0:15:47.485786 <0.334.0> mem3_shards:changes_callback({change,{[{<<"seq">>,1}, {<<"id">>,<<"_users">>}, {<<"changes">>, [{[{<<"rev">>,<<"1-39ee832b1fb208136e77da705cca37c3">>}]}]}, {doc,{[{<<"_id">>,<<"_users">>}, {<<"_rev">>,<<"1-39ee832b1fb208136e77da705cca37c3">>}, {<<"shard_suffix">>,".1731716147"}, {<<"changelog">>, [[<<"add">>, <<48,48,48,13,48,13,1,8,45,55,102,102,102,9,102,9,1>>, <<"couchdb@127.0.0.1">>], [<<"add">>, <<56,48,48,13,48,13,1,8,45,102,102,9,102,9,1,102,1>>, <<"couchdb@127.0.0.1">>]]}, {<<"by_node">>, {[{<<"couchdb@127.0.0.1">>, [<<48,48,48,13,48,13,1,8,45,55,102,102,102,9,102,9,1>>, <<56,48,48,13,48,13,1,8,45,102,102,9,102,9,1,102,1>>]}]}}, {<<"by_range">>, {[{<<48,48,13,48,13,1,8,8,45,55,102,102,102,9,102,9,1>>, [<<"couchdb@127.0.0.1">>]}, {<<56,48,48,13,48,13,1,8,45,102,102,9,102,9,1,102,1>>, [<<"couchdb@127.0.0.1">>]}]}}, {<<"props">>,{[]}}]}}]}, <<>>}, "continuous") looking at snappy next: 0:44:49.399321 <0.582.0> snappy:decompress/1 --> {ok,<<131,104,1,108,0,0,0,5,104,2,109,0,0,0,12,115,104,97,114,100,95,115,117, 102,102,105,120,107,0,11,46,49,55,51,49,55,49,55,56,56,57,104,2,109,0,0, 0,9,99,104,97,110,103,101,108,111,103,108,0,0,0,2,108,0,0,0,3,109,0,0,0, 3,97,100,100,109,0,0,0,17,48,48,48,13,48,13,1,8,45,55,102,102,102,9,102, 9,1,109,0,0,0,17,99,111,117,99,104,100,98,64,49,50,55,46,48,46,48,46,49, 106,108,0,0,0,3,109,0,0,0,3,97,100,100,109,0,0,0,17,56,48,48,13,48,13,1, 8,45,102,102,9,102,9,1,102,1,109,0,0,0,17,99,111,117,99,104,100,98,64, 49,50,55,46,48,46,48,46,49,106,106,104,2,109,0,0,0,7,98,121,95,110,111, 100,101,104,1,108,0,0,0,1,104,2,109,0,0,0,17,99,111,117,99,104,100,98, 64,49,50,55,46,48,46,48,46,49,108,0,0,0,2,109,0,0,0,17,48,48,48,13,48, <---- here 13,1,8,45,55,102,102,102,9,102,9,1,109,0,0,0,17,56,48,48,13,48,13,1,8, 45,102,102,9,102,9,1,102,1,106,106,104,2,109,0,0,0,8,98,121,95,114,97, 110,103,101,104,1,108,0,0,0,2,104,2,109,0,0,0,17,48,48,13,48,13,1,8,8, 45,55,102,102,102,9,102,9,1,108,0,0,0,1,109,0,0,0,17,99,111,117,99,104, 100,98,64,49,50,55,46,48,46,48,46,49,106,104,2,109,0,0,0,17,56,48,48,13, <---- here 48,13,1,8,45,102,102,9,102,9,1,102,1,108,0,0,0,1,109,0,0,0,17,99,111, 117,99,104,100,98,64,49,50,55,46,48,46,48,46,49,106,106,104,2,109,0,0,0, 5,112,114,111,112,115,104,1,106,106>>} 0:44:49.400392 <0.582.0> mem3_util:build_shards_by_node(<<"_users">>, [{<<"shard_suffix">>,".1731717889"}, {<<"changelog">>, [[<<"add">>, <<48,48,48,13,48,13,1,8,45,55,102,102,102,9,102,9,1>>, <---- now visible here <<"couchdb@127.0.0.1">>], ...
You can also try this approach: - ensure `file_compression=snappy` again Now, build a couchdb release normally, and copy `rel/couchdb/lib/snappy-1.0.5/` over top of the ports-built version also fixes the issue for me: $ ./configure --skip-deps --user couchdb --rebar /usr/local/bin/rebar --spidermonkey-version 91 $ gmake release $ cp -av rel/couchdb/lib/snappy-1.0.5/ /usr/local/libexec/couchdb3/lib/snappy-1.0.5/ so - I assume some build-time CFLAGS or similar is causing problems for snappy.
looks like problems in snappy 1> S = fun(Doc) -> {ok, C} = snappy:compress(erlang:term_to_binary(Doc)), true = snappy:is_valid(C), {ok, U} = snappy:decompress(C), New = erlang:binary_to_term(U), New == Doc end. 2> S([<<"00000000-7fffffff">>]). false 3> S([<<"7fffffff">>]). true 4> S([<<"00000000">>]). true 5> S([<<"00000000-">>]). true 6> S([<<"00000000-7">>]). true 7> S([<<"00000000-7f">>]). true 8> S([<<"00000000-7ff">>]). true 9> S([<<"00000000-7fff">>]). true 10> S([<<"00000000-7ffff">>]). true 11> S([<<"00000000-7fffff">>]). true 12> S([<<"00000000-7ffffff">>]). true 13> S([<<"00000000-7fffffff">>]). false
looks like `-O3` (used in ports tree build, but not source tree build) causes this behaviour, which probably means its a bug in snappy itself of some sort. If you run couchdb interactively[1], you can test this: S = fun(Doc) -> {ok, C} = snappy:compress(erlang:term_to_binary(Doc)), true = snappy:is_valid(C), {ok, U} = snappy:decompress(C), New = erlang:binary_to_term(U), New == Doc end. S([<<"00000000-7fffffff">>]). false S([<<"zzzzzzzz-7fffffff">>]). false S([<<"00000000-7ffffff">>]). true I have 2 choices going forwards: - build the port with CXXFLAGS to include `-O0` (see attached diff) - update snappy to 1.1.8 or newer, where this issue has been fixed The former is preferable for me, as bumping snappy should be done within CouchDB project as a whole, rather than in a single down stream project.
See https://git.sr.ht/~dch/ports/commit/235bd5f2bafa01bea2596ca685bf6d2a10522d72 for patch to test, this bumps couchdb to 3.4.2 also, and removes runtime dependency on the lang/erlang* packages, just as a full couchdb release.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/ports/commit/?id=6924dfdf7ea3b4b511c7f987727cb6933175765c commit 6924dfdf7ea3b4b511c7f987727cb6933175765c Author: Dave Cottlehuber <dch@FreeBSD.org> AuthorDate: 2024-11-22 15:02:32 +0000 Commit: Dave Cottlehuber <dch@FreeBSD.org> CommitDate: 2024-11-22 17:27:52 +0000 databases/couchdb3: update to 3.4.2, resolve PR282217 - include QuickJS engine - support xxHash for data integrity checksums - improved mango keys-only indexes - https://blog.couchdb.org/2024/10/22/3-4-2/ - https://docs.couchdb.org/en/stable/whatsnew/3.4.html - ensure snappy NIF is compiled without optimisations to avoid corrupted documents under newer clang versions, used in in 14.1-RELEASE (see PR282217 for details) Sponsored by: SkunkWerks, GmbH PR: 282217 UPDATING | 18 + databases/couchdb3/Makefile | 34 +- databases/couchdb3/distinfo | 6 +- databases/couchdb3/files/couchdb3.in | 12 +- databases/couchdb3/files/patch-configure | 21 +- .../couchdb3/files/patch-rel_overlay_etc_vm.args | 4 +- .../couchdb3/files/patch-rel_reltool.config (gone) | 10 - .../files/patch-src_snappy_rebar.config (new) | 11 + databases/couchdb3/pkg-plist | 954 +++++++++++++++++---- 9 files changed, 849 insertions(+), 221 deletions(-)
would appreciate some community feedback on this (binary packages should be available in 3 days on latest package branch), before I push it back down to quarterly branch.