Created attachment 235833
textproc-py-textract-1.6.5.patch

textract provides a single interface for extracting content embedded from Word documents, PowerPoint presentations, PDFs and much more, which can be used for further textual analysis and visualization.

WWW: https://github.com/deanmalmgren/textract

portlint: looks fine.
poudriere: testport is OK with all options enabled, with no options enabled, and with the default options enabled (including groups).

Requirements:
* audio/py-pocketsphinx [1]
* textproc/python-pptx [2]
* textproc/py-extract-msg [3]

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265766
[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265763
[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265765
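From Python, textract is driven through `textract.process()`, which returns the extracted text as bytes (UTF-8 by default). Below is a minimal sketch of the "further textual analysis" use case. It assumes the port is installed with the options needed for the formats in question (for example PDFTOTEXT or PDFMINER for PDFs). `extract_text` and `word_frequencies` are hypothetical helpers for illustration, not part of textract.

```python
import re
from collections import Counter


def extract_text(path, method=None, encoding="utf-8"):
    """Extract the text of *path* with textract and return it as str.

    Hypothetical helper: textract.process() returns bytes, so decode here.
    """
    # Imported lazily so the rest of this module works without textract.
    import textract

    kwargs = {}
    if method is not None:
        # Some formats have alternative backends, e.g. method="pdfminer"
        # for PDFs instead of the default pdftotext.
        kwargs["method"] = method
    return textract.process(str(path), **kwargs).decode(encoding)


def word_frequencies(text, top=10):
    """Return the *top* most common lowercase words in *text*."""
    # Words are runs of letters, optionally with one inner apostrophe ("don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(top)
```

For example, `word_frequencies(extract_text("report.pdf", method="pdfminer"))` would list the ten most common words of a hypothetical `report.pdf`. The import is kept inside `extract_text` so that the analysis half can be reused on text from any source.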
Created attachment 235984
textproc-py-python-pptx.1.6.5.patch

I have changed the name of the python-pptx dependency to match the ports collection: py-python-pptx [1].

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265763#c4
Created attachment 235985
python-pptx renamed to py-python-pptx

(I am re-uploading the same content to give the patch a more descriptive name.)

I have renamed python-pptx to py-python-pptx to match the ports collection [1].

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265763#c4
Created attachment 236300
Update port maintainer

Reason: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266157
Created attachment 236768
textract-1.6.5

Description:
* pet portclippy
* move WWW from pkg-descr to Makefile

QA:
* portlint: OK (looks fine)
* testport: poudriere (13-1, amd64), tested with ANTIWORD BEAUTIFULSOUP DOCX2TXT LIBXML2 LIBXSLT MSG PPTX PS SPREADSHEET UNRTF FFMPEG FLAC LAME POCKETSPHINX SOX SPEECH_RECOGNITION JPEG_TURBO TESSERACT PDFMINER PDFTOTEXT
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=b6e6388dab6dd78e37adebf738e568997db6d15a

commit b6e6388dab6dd78e37adebf738e568997db6d15a
Author:     Jesús Daniel Colmenares Oviedo <DtxdF@disroot.org>
AuthorDate: 2022-09-23 16:18:31 +0000
Commit:     Li-Wen Hsu <lwhsu@FreeBSD.org>
CommitDate: 2022-10-25 20:49:12 +0000

    Add textproc/py-textract: Extract text from any document

    textract provides a single interface for extracting content embedded
    from Word documents, PowerPoint presentations, PDFs and much more,
    which can be used for further textual analysis and visualization.

    WWW: https://github.com/deanmalmgren/textract

    PR: 265768

 textproc/Makefile                    |  1 +
 textproc/py-textract/Makefile (new)  | 69 ++++++++++++++++++++++++++++++++++++
 textproc/py-textract/distinfo (new)  |  3 ++
 textproc/py-textract/pkg-descr (new) |  3 ++
 4 files changed, 76 insertions(+)