Bug 265768 - [NEW PORT] textproc/py-textract: Extract text from any document
Summary: [NEW PORT] textproc/py-textract: Extract text from any document
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Li-Wen Hsu
URL: https://github.com/deanmalmgren/textract
Keywords:
Depends on: 265763 265765 265766
Blocks:
Reported: 2022-08-10 19:14 UTC by Jesús Daniel Colmenares Oviedo
Modified: 2022-10-25 20:50 UTC

See Also:


Attachments
textproc-py-textract-1.6.5.patch (5.02 KB, patch)
2022-08-10 19:14 UTC, Jesús Daniel Colmenares Oviedo
textproc-py-python-pptx.1.6.5.patch (5.08 KB, patch)
2022-08-18 05:21 UTC, Jesús Daniel Colmenares Oviedo
python-pptx renamed to py-python-pptx (5.08 KB, patch)
2022-08-18 06:02 UTC, Jesús Daniel Colmenares Oviedo
Update port maintainer (5.08 KB, patch)
2022-09-01 21:52 UTC, Jesús Daniel Colmenares Oviedo
textract-1.6.5 (5.00 KB, patch)
2022-09-23 16:33 UTC, Jesús Daniel Colmenares Oviedo

Description Jesús Daniel Colmenares Oviedo 2022-08-10 19:14:37 UTC
Created attachment 235833 [details]
textproc-py-textract-1.6.5.patch

textract provides a single interface for extracting content embedded
from Word documents, PowerPoint presentations, PDFs and much more,
which can be used for further textual analysis and visualization.

WWW: https://github.com/deanmalmgren/textract
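
For readers unfamiliar with the library, the "single interface" mentioned above can be sketched roughly as follows. This is only an illustration, not part of the port: `extract_text` and its `processor` parameter are hypothetical names introduced here, and the sketch assumes the upstream `textract.process()` call, which picks a parser from the file extension and returns bytes.

```python
from typing import Callable, Optional


def extract_text(path: str,
                 processor: Optional[Callable[[str], bytes]] = None,
                 encoding: str = "utf-8") -> str:
    """Return the text content of the document at `path` as a str.

    `processor` is a hypothetical hook (not part of textract) that lets
    the extraction backend be swapped out, e.g. for testing.
    """
    if processor is None:
        # Imported lazily so this sketch loads even without textract installed.
        import textract
        processor = textract.process
    raw = processor(path)  # assumed to return bytes, as textract.process does
    return raw.decode(encoding)
```

With the port installed, the same call would be expected to work for different document types, for example `extract_text("report.docx")` or `extract_text("slides.pptx")` (both file names are placeholders).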

portlint: looks fine.
poudriere: testport passes with all options enabled, with no options enabled, and with the default options enabled (including groups).

Requirements:

* audio/py-pocketsphinx [1]
* textproc/python-pptx [2]
* textproc/py-extract-msg [3]

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265766
[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265763
[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265765
Comment 1 Jesús Daniel Colmenares Oviedo 2022-08-18 05:21:48 UTC
Created attachment 235984 [details]
textproc-py-python-pptx.1.6.5.patch

I have changed the python-pptx name to match the ports collection: py-python-pptx [1].

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265763#c4
Comment 2 Jesús Daniel Colmenares Oviedo 2022-08-18 06:02:41 UTC
Created attachment 235985 [details]
python-pptx renamed to py-python-pptx

(I am re-uploading the patch under a more descriptive name.)

I have changed the name of python-pptx to "py-python-pptx" to match the ports collection [1].

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265763#c4
Comment 3 Jesús Daniel Colmenares Oviedo 2022-09-01 21:52:28 UTC
Created attachment 236300 [details]
Update port maintainer

Reason:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266157
Comment 4 Jesús Daniel Colmenares Oviedo 2022-09-23 16:33:39 UTC
Created attachment 236768 [details]
textract-1.6.5

Description:

* pet portclippy
* move WWW from pkg-descr to Makefile

QA:

* portlint: OK (looks fine.)
* testport: (poudriere: 13-1, amd64, ANTIWORD BEAUTIFULSOUP DOCX2TXT LIBXML2 LIBXSLT MSG PPTX PS SPREADSHEET UNRTF FFMPEG FLAC LAME POCKETSPHINX SOX SPEECH_RECOGNITION JPEG_TURBO TESSERACT PDFMINER PDFTOTEXT tested)
Comment 5 commit-hook freebsd_committer freebsd_triage 2022-10-25 20:50:03 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=b6e6388dab6dd78e37adebf738e568997db6d15a

commit b6e6388dab6dd78e37adebf738e568997db6d15a
Author:     Jesús Daniel Colmenares Oviedo <DtxdF@disroot.org>
AuthorDate: 2022-09-23 16:18:31 +0000
Commit:     Li-Wen Hsu <lwhsu@FreeBSD.org>
CommitDate: 2022-10-25 20:49:12 +0000

    Add textproc/py-textract: Extract text from any document

    textract provides a single interface for extracting content embedded
    from Word documents, PowerPoint presentations, PDFs and much more,
    which can be used for further textual analysis and visualization.

    WWW: https://github.com/deanmalmgren/textract

    PR:             265768

 textproc/Makefile                    |  1 +
 textproc/py-textract/Makefile (new)  | 69 ++++++++++++++++++++++++++++++++++++
 textproc/py-textract/distinfo (new)  |  3 ++
 textproc/py-textract/pkg-descr (new) |  3 ++
 4 files changed, 76 insertions(+)