The man page states, about setting up NLTK:

> NLTK DATA
> In order to process scanned documents using machine learning,
> paperless-ngx requires NLTK (natural language toolkit) data. The
> required files can be downloaded by using these commands:
>
>     /usr/local/bin/python3.9 -m nltk.downloader \
>         stopwords punkt -d /var/db/paperless/nltkdata

The command does not download the "snowball_data" package. That package is referred to in the project's docs (https://docs.paperless-ngx.com/setup/#bare_metal):

> Optional: If using the NLTK machine learning processing (see
> PAPERLESS_ENABLE_NLTK for details), download the NLTK data for the
> Snowball Stemmer, Stopwords and Punkt tokenizer to your
> PAPERLESS_DATA_DIR/nltk. Refer to the NLTK instructions for details
> on how to download the data.

I can't vouch for how much difference it makes in practice, but stemming sounds very useful from its description (https://github.com/snowballstem/snowball?tab=readme-ov-file#what-is-stemming):

> What is Stemming?
> Stemming maps different forms of the same word to a common "stem" -
> for example, the English stemmer maps connection, connections,
> connective, connected, and connecting to connect. So a search for
> connected would also find documents which only have the other forms.

I suggest adding "snowball_data" to the man page's sample NLTK download command, so that it is in line with the project's docs and useful to users of this port (thanks for it, btw!).

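For illustration, the amended command could look like the sketch below. The interpreter path, data directory and package names are taken from the man page excerpt and the report above. The variable names (`PYTHON`, `NLTK_DIR`, `PACKAGES`, `CMD`) and the `RUN` switch are assumptions made for this example and are not part of the port. By default the script only prints the command, since the actual download needs network access and write permission on the data directory.

```shell
#!/bin/sh
# Sketch only: the paths below are copied from the man page excerpt and
# may differ on a given installation (for example, another Python version).
PYTHON="/usr/local/bin/python3.9"
NLTK_DIR="/var/db/paperless/nltkdata"

# snowball_data is the suggested addition; stopwords and punkt were
# already part of the man page's command.
PACKAGES="snowball_data stopwords punkt"

CMD="${PYTHON} -m nltk.downloader ${PACKAGES} -d ${NLTK_DIR}"

# RUN is a hypothetical switch for this example: print the command by
# default, and only execute the download when RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
    ${CMD}
else
    echo "${CMD}"
fi
```

Running the script as-is prints the full command for review; running it with `RUN=1` in the environment performs the download.
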
Thanks, that's a good addition. It will be in the next port revision.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=d716cbc8e98a9410782bbad8139f7b6457920e5b

commit d716cbc8e98a9410782bbad8139f7b6457920e5b
Author:     Michael Gmelin <grembo@FreeBSD.org>
AuthorDate: 2024-04-24 12:39:17 +0000
Commit:     Michael Gmelin <grembo@FreeBSD.org>
CommitDate: 2024-04-24 21:56:50 +0000

    deskutils/py-paperless-ngx: Fixes and improvements

    - Fix NLTK configuration in paperless.conf.sample
    - Mention snowball_data in NLTK instructions
    - Fix sad defect that prevented rc scripts from picking up
      configuration data from rc.conf
    - Improve startup script documentation
    - Document required ACL when using a remote redis instance
    - Document how to enable JBIG2 and its potential pitfalls

    PR:          278424, 278425, 278426
    Reported by: anonymous bugzilla user, netchild
    MFH:         2024Q2

 deskutils/py-paperless-ngx/Makefile | 1 +
 deskutils/py-paperless-ngx/files/paperless-beat.in | 12 ++++++-----
 .../py-paperless-ngx/files/paperless-consumer.in | 8 +++++---
 .../py-paperless-ngx/files/paperless-flower.in | 12 ++++++-----
 .../py-paperless-ngx/files/paperless-migrate.in | 12 ++++++-----
 .../py-paperless-ngx/files/paperless-ngx.7.in | 24 +++++++++++++++++++---
 .../py-paperless-ngx/files/paperless-webui.in | 20 +++++++++++-------
 .../py-paperless-ngx/files/paperless-worker.in | 12 ++++++-----
 .../files/patch-paperless.conf.example | 10 ++++-----
 deskutils/py-paperless-ngx/files/pkg-message.in | 21 +++++++++++++++++++
 10 files changed, 93 insertions(+), 39 deletions(-)

A commit in branch 2024Q2 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=27a97f780c56cf51b482aa828ad0e20e24f23fcc

commit 27a97f780c56cf51b482aa828ad0e20e24f23fcc
Author:     Michael Gmelin <grembo@FreeBSD.org>
AuthorDate: 2024-04-24 12:39:17 +0000
Commit:     Michael Gmelin <grembo@FreeBSD.org>
CommitDate: 2024-04-24 23:09:18 +0000

    deskutils/py-paperless-ngx: Fixes and improvements

    - Fix NLTK configuration in paperless.conf.sample
    - Mention snowball_data in NLTK instructions
    - Fix sad defect that prevented rc scripts from picking up
      configuration data from rc.conf
    - Improve startup script documentation
    - Document required ACL when using a remote redis instance
    - Document how to enable JBIG2 and its potential pitfalls

    PR:          278424, 278425, 278426
    Reported by: anonymous bugzilla user, netchild
    MFH:         2024Q2

    (cherry picked from commit d716cbc8e98a9410782bbad8139f7b6457920e5b)

 deskutils/py-paperless-ngx/Makefile | 1 +
 deskutils/py-paperless-ngx/files/paperless-beat.in | 12 ++++++-----
 .../py-paperless-ngx/files/paperless-consumer.in | 8 +++++---
 .../py-paperless-ngx/files/paperless-flower.in | 12 ++++++-----
 .../py-paperless-ngx/files/paperless-migrate.in | 12 ++++++-----
 .../py-paperless-ngx/files/paperless-ngx.7.in | 24 +++++++++++++++++++---
 .../py-paperless-ngx/files/paperless-webui.in | 20 +++++++++++-------
 .../py-paperless-ngx/files/paperless-worker.in | 12 ++++++-----
 .../files/patch-paperless.conf.example | 10 ++++-----
 deskutils/py-paperless-ngx/files/pkg-message.in | 21 +++++++++++++++++++
 10 files changed, 93 insertions(+), 39 deletions(-)