Bug 279011 - deskutils/py-paperless-ngx: port affected by classifier training hanging
Summary: deskutils/py-paperless-ngx: port affected by classifier training hanging
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Michael Gmelin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-05-15 14:05 UTC by freebsd.bugzilla
Modified: 2024-05-31 07:32 UTC
0 users

See Also:
bugzilla: maintainer-feedback? (grembo)


Description freebsd.bugzilla 2024-05-15 14:05:57 UTC
The port seems to be affected by this issue:
https://github.com/paperless-ngx/paperless-ngx/discussions/2373

In short, on my instance (106 documents, 12 tag(s), 29 correspondent(s), 8 document type(s), 0 storage path(es)), classifier training hangs indefinitely with one CPU at 100% usage.

The logs show:
[2024-05-15 15:56:20,696] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2024-05-15 15:56:20,697] [DEBUG] [paperless.classifier] Gathering data from database...
[2024-05-15 15:56:20,843] [DEBUG] [paperless.classifier] 106 documents, 12 tag(s), 29 correspondent(s), 8 document type(s). 0 storage path(es)
[2024-05-15 15:56:20,877] [DEBUG] [paperless.classifier] Vectorizing data...
[2024-05-15 15:56:21,650] [DEBUG] [paperless.classifier] Training tags classifier...
[2024-05-15 15:56:22,163] [DEBUG] [paperless.classifier] Training correspondent classifier...

Running `su -l paperless -c 'paperless document_create_classifier'` shows:
OMP: Warning #96: Cannot form a team with 16 threads, using 1 instead.
OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).

As suggested in the linked discussion, adding this to /etc/profile fixes the issue for me:
# Fix document classifier hanging:
# https://github.com/paperless-ngx/paperless-ngx/discussions/2373#discussioncomment-9244780
export OMP_NUM_THREADS=1
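Exporting the variable from /etc/profile affects every login shell on the host. A narrower variant, sketched here under the assumption that only the classifier command needs it, is to set OMP_NUM_THREADS for a single invocation with env(1). The snippet only demonstrates the scoping: the child process sees the variable, while the calling shell does not.

```sh
#!/bin/sh
# Demonstrate that env(1) scopes a variable to one child process.
unset OMP_NUM_THREADS

# The child shell sees OMP_NUM_THREADS=1 ...
child=$(env OMP_NUM_THREADS=1 sh -c 'echo "$OMP_NUM_THREADS"')

# ... while the calling shell still has it unset.
parent=${OMP_NUM_THREADS:-unset}

echo "child=$child parent=$parent"
```

Applied to this report, the equivalent one-off command would presumably be `su -l paperless -c 'env OMP_NUM_THREADS=1 paperless document_create_classifier'` (not tested here).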

The port could either set this variable as well (as the NixOS port did: https://github.com/NixOS/nixpkgs/pull/299008) or replace the openblas dependency with mkl, although the latter seems to be problematic on FreeBSD: https://github.com/paperless-ngx/paperless-ngx/discussions/2373#discussioncomment-8563927
Comment 1 Michael Gmelin freebsd_committer freebsd_triage 2024-05-21 07:27:10 UTC
Thank you for reporting!
Comment 2 commit-hook freebsd_committer freebsd_triage 2024-05-21 07:27:51 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=3ace88e8ff1e9e2eaa2e7417da1ac03182188314

commit 3ace88e8ff1e9e2eaa2e7417da1ac03182188314
Author:     Michael Gmelin <grembo@FreeBSD.org>
AuthorDate: 2024-05-21 07:23:23 +0000
Commit:     Michael Gmelin <grembo@FreeBSD.org>
CommitDate: 2024-05-21 07:23:23 +0000

    deskutils/py-paperless-ngx: Fix document classifier hanging

    Set OMP_NUM_THREADS=1 as a workaround (taken from NixOS port).

    PR:             279011
    Reported by:    freebsd.bugzilla@mail.tinsuke.com

 deskutils/py-paperless-ngx/Makefile                  | 2 +-
 deskutils/py-paperless-ngx/files/paperless-worker.in | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
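This commit only touches the port's Makefile and the worker's rc script template. For anyone on an older port revision, a comparable local override might be the rc.conf fragment below. It is a sketch that assumes the service's rc.subr name is paperless_worker and that its script uses the default rc.subr(8) start method, which passes ${name}_env to the started command through env(1); check the installed rc.d script before relying on it.

```sh
# /etc/rc.conf fragment (sketch, not verified against the actual port).
# Assumes the rc.d script sets name="paperless_worker" and uses the
# default rc.subr(8) start method, which honors ${name}_env.
paperless_worker_env="OMP_NUM_THREADS=1"
```

The setting would only take effect after restarting the service (for example, assuming the script is installed as paperless-worker, with `service paperless-worker restart`).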
Comment 3 freebsd.bugzilla 2024-05-27 08:17:38 UTC
Thanks for the fix!

I have verified that it solves the issue when classifier training is triggered by the periodic scheduled task.

But running `su -l paperless -c 'paperless document_create_classifier'` still leaves the user hanging.

Would it be worth it to also patch manage.py to set OMP_NUM_THREADS=1?

I believe that should "cover all bases", but if there are other entry points, I can see how this could become a cat-and-mouse game.
Comment 4 commit-hook freebsd_committer freebsd_triage 2024-05-28 15:47:48 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=aed26383cdf5928ed5f7642954123f8099b23920

commit aed26383cdf5928ed5f7642954123f8099b23920
Author:     Michael Gmelin <grembo@FreeBSD.org>
AuthorDate: 2024-05-28 15:44:24 +0000
Commit:     Michael Gmelin <grembo@FreeBSD.org>
CommitDate: 2024-05-28 15:45:50 +0000

    deskutils/py-paperless-ngx: Fix document classifier hanging (2)

    Turn paperless symlink into wrapper, this way OMP_NUM_THREADS=1 is
    also set when paperless is called outside rc scripts.

    PR:             279011
    Reported by:    freebsd.bugzilla@mail.tinsuke.com

 deskutils/py-paperless-ngx/Makefile                 |  8 +++-----
 deskutils/py-paperless-ngx/files/paperless-ngx.7.in | 13 ++++++-------
 deskutils/py-paperless-ngx/files/paperless.in (new) |  9 +++++++++
 3 files changed, 18 insertions(+), 12 deletions(-)
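This commit replaces the paperless symlink with a small script (files/paperless.in), so every invocation, rc-driven or not, gets the variable. The real file is not reproduced in this report; the following is a hypothetical sketch of the idea. PAPERLESS_PYTHON, PAPERLESS_MANAGE, the default manage.py path, and the run_paperless function are placeholders, not names taken from the port.

```sh
#!/bin/sh
# Hypothetical wrapper sketch; paths and variable names are placeholders,
# not the contents of the port's files/paperless.in.
PAPERLESS_PYTHON=${PAPERLESS_PYTHON:-python3}
PAPERLESS_MANAGE=${PAPERLESS_MANAGE:-/usr/local/share/paperless/src/manage.py}

run_paperless() {
    (
        # Workaround from PR 279011: keep OpenMP single-threaded so
        # classifier training does not hang.
        OMP_NUM_THREADS=1
        export OMP_NUM_THREADS
        # exec replaces the subshell, so the exit status passes through.
        exec "$PAPERLESS_PYTHON" "$PAPERLESS_MANAGE" "$@"
    )
}

# An installed wrapper would end with:
# run_paperless "$@"
```

The function runs in a subshell so the export cannot leak into a caller that sources the file. When the script is only ever executed, a top-level `export OMP_NUM_THREADS=1` followed by `exec` is the simpler equivalent.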
Comment 5 Michael Gmelin freebsd_committer freebsd_triage 2024-05-28 15:49:40 UTC
(In reply to freebsd.bugzilla from comment #3)

Let's see if you're right and this will actually turn into a game of Whac-A-Mole. For the time being, I (stubbornly) went with turning %%PREFIX%%/bin/paperless into a wrapper that sets the environment variable.
Comment 6 freebsd.bugzilla 2024-05-31 07:32:42 UTC
Thanks for looking into it again; hopefully this won't crop up anywhere else!

I have verified the fix, and running `su -l paperless -c 'paperless document_create_classifier'` now works just fine.