The port seems to be affected by this issue:
https://github.com/paperless-ngx/paperless-ngx/discussions/2373

In short, on my instance, with 106 documents, 12 tag(s), 29 correspondent(s), 8 document type(s), 0 storage path(es), classifier training gets stuck/hangs indefinitely with 1 CPU showing 100% usage.

The logs show:

[2024-05-15 15:56:20,696] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-05-15 15:56:20,697] [DEBUG] [paperless.classifier] Gathering data from database...
[2024-05-15 15:56:20,843] [DEBUG] [paperless.classifier] 106 documents, 12 tag(s), 29 correspondent(s), 8 document type(s). 0 storage path(es)
[2024-05-15 15:56:20,877] [DEBUG] [paperless.classifier] Vectorizing data...
[2024-05-15 15:56:21,650] [DEBUG] [paperless.classifier] Training tags classifier...
[2024-05-15 15:56:22,163] [DEBUG] [paperless.classifier] Training correspondent classifier...

Running `su -l paperless -c 'paperless document_create_classifier'` shows:

OMP: Warning #96: Cannot form a team with 16 threads, using 1 instead.
OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).

As suggested in the linked discussion, adding this to /etc/profile fixes the issue for me:

# Fix document classifier hanging:
# https://github.com/paperless-ngx/paperless-ngx/discussions/2373#discussioncomment-9244780
export OMP_NUM_THREADS=1

The port could either set this variable as well (as the NixOS port did: https://github.com/NixOS/nixpkgs/pull/299008), or try replacing the dependency on openblas by mkl, but that seems to be problematic for FreeBSD:
https://github.com/paperless-ngx/paperless-ngx/discussions/2373#discussioncomment-8563927
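For anyone who would rather not edit /etc/profile, the same variable can probably be scoped to a single invocation with env(1). The following is only a minimal sketch: the `sh -c` child is a stand-in (an assumption, not part of the port) for `paperless document_create_classifier`, used so the snippet runs without paperless installed, and `seen` is an illustrative variable name.

```shell
#!/bin/sh
# Sketch: pass OMP_NUM_THREADS=1 to one command only, leaving the
# calling shell's environment and /etc/profile untouched.
# The `sh -c` child below is a stand-in for the real paperless command.
seen=$(env OMP_NUM_THREADS=1 sh -c 'printf "%s" "$OMP_NUM_THREADS"')

# Report what the child process actually received.
echo "child process saw OMP_NUM_THREADS=$seen"
```

On an affected system the equivalent call should look like `su -l paperless -c 'env OMP_NUM_THREADS=1 paperless document_create_classifier'`, which limits the workaround to that one run.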
Thank you for reporting!
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=3ace88e8ff1e9e2eaa2e7417da1ac03182188314

commit 3ace88e8ff1e9e2eaa2e7417da1ac03182188314
Author:     Michael Gmelin <grembo@FreeBSD.org>
AuthorDate: 2024-05-21 07:23:23 +0000
Commit:     Michael Gmelin <grembo@FreeBSD.org>
CommitDate: 2024-05-21 07:23:23 +0000

    deskutils/py-paperless-ngx: Fix document classifier hanging

    Set OMP_NUM_THREADS=1 as a workaround (taken from NixOS port).

    PR:             279011
    Reported by:    freebsd.bugzilla@mail.tinsuke.com

 deskutils/py-paperless-ngx/Makefile                  | 2 +-
 deskutils/py-paperless-ngx/files/paperless-worker.in | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
Thanks for the fix! I have validated that it solves the issue when classifier training is triggered by the periodically scheduled task. But running `su -l paperless -c 'paperless document_create_classifier'` manually still hangs. Would it be worth also patching manage.py to set OMP_NUM_THREADS=1? I believe that should "cover all bases", but if there are other entry points I can see how this could become a cat-and-mouse game.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=aed26383cdf5928ed5f7642954123f8099b23920

commit aed26383cdf5928ed5f7642954123f8099b23920
Author:     Michael Gmelin <grembo@FreeBSD.org>
AuthorDate: 2024-05-28 15:44:24 +0000
Commit:     Michael Gmelin <grembo@FreeBSD.org>
CommitDate: 2024-05-28 15:45:50 +0000

    deskutils/py-paperless-ngx: Fix document classifier hanging (2)

    Turn paperless symlink into wrapper, this way OMP_NUM_THREADS=1
    is also set when paperless is called outside rc scripts.

    PR:             279011
    Reported by:    freebsd.bugzilla@mail.tinsuke.com

 deskutils/py-paperless-ngx/Makefile                 |  8 +++-----
 deskutils/py-paperless-ngx/files/paperless-ngx.7.in | 13 ++++++-------
 deskutils/py-paperless-ngx/files/paperless.in (new) |  9 +++++++++
 3 files changed, 18 insertions(+), 12 deletions(-)
(In reply to freebsd.bugzilla from comment #3) Let's see if you're right and this will actually turn into a game of Whac-A-Mole. For the time being, I (stubbornly) went with turning %%PREFIX%%/bin/paperless into a wrapper that sets the environment variable.
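To illustrate the wrapper idea in general terms, here is a rough, self-contained sketch. It is not the content of the committed files/paperless.in, which is not quoted in this thread. The temporary directory, the `real-entry-point` script, and the `wrapper_output` variable are all hypothetical stand-ins so the demo runs without paperless installed.

```shell
#!/bin/sh
# Sketch of a wrapper that pins OMP_NUM_THREADS before handing over to
# the real program. All file names here are hypothetical stand-ins.
demo_dir=$(mktemp -d)

# Stand-in for the real entry point: prints what it receives.
cat > "$demo_dir/real-entry-point" <<'EOF'
#!/bin/sh
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset} args=$*"
EOF

# The wrapper: export the variable, then replace itself with the target,
# passing all arguments through unchanged. A real wrapper would exec the
# actual program instead of `sh <stand-in>` (assumption).
cat > "$demo_dir/paperless" <<'EOF'
#!/bin/sh
OMP_NUM_THREADS=1
export OMP_NUM_THREADS
exec sh "$(dirname "$0")/real-entry-point" "$@"
EOF

# Call the wrapper the way a user would call the installed command.
wrapper_output=$(sh "$demo_dir/paperless" document_create_classifier)
echo "$wrapper_output"

rm -rf "$demo_dir"
```

Using `exec` means no extra shell process lingers, and `"$@"` keeps argument quoting intact, so every subcommand invoked through the wrapper inherits the variable.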
Thanks for looking into it again. Hopefully this won't crop up anywhere else! I have verified the fix, and running `su -l paperless -c 'paperless document_create_classifier'` now works just fine.