FreeBSD Bugzilla – Attachment 236768 Details for
Bug 265768
[NEW PORT] textproc/py-textract: Extract text from any document
Home
|
New
|
Browse
|
Search
|
[?]
|
Reports
|
Help
|
New Account
|
Log In
Remember
[x]
|
Forgot Password
Login:
[x]
[patch]
textract-1.6.5
0001-textproc-py-textract-New-port-Extract-text-from-any-.patch (text/plain), 5.00 KB, created by
Jesús Daniel Colmenares Oviedo
on 2022-09-23 16:33:39 UTC
(
hide
)
Description:
textract-1.6.5
Filename:
MIME Type:
Creator:
Jesús Daniel Colmenares Oviedo
Created:
2022-09-23 16:33:39 UTC
Size:
5.00 KB
patch
obsolete
>From 9242648af51db6085992ab390217b29ec628bf0b Mon Sep 17 00:00:00 2001 >From: =?UTF-8?q?Jes=C3=BAs=20Daniel=20Colmenares=20Oviedo?= > <DtxdF@disroot.org> >Date: Fri, 23 Sep 2022 12:18:31 -0400 >Subject: [PATCH] textproc/py-textract: New port: Extract text from any > document > >textract provides a single interface for extracting content embedded >from Word documents, PowerPoint presentations, PDFs and much more, >which can be used for further textual analysis and visualization. > >WWW: https://github.com/deanmalmgren/textract >--- > textproc/py-textract/Makefile | 69 ++++++++++++++++++++++++++++++++++ > textproc/py-textract/distinfo | 3 ++ > textproc/py-textract/pkg-descr | 3 ++ > 3 files changed, 75 insertions(+) > create mode 100644 textproc/py-textract/Makefile > create mode 100644 textproc/py-textract/distinfo > create mode 100644 textproc/py-textract/pkg-descr > >diff --git a/textproc/py-textract/Makefile b/textproc/py-textract/Makefile >new file mode 100644 >index 000000000..a1a57ea56 >--- /dev/null >+++ b/textproc/py-textract/Makefile >@@ -0,0 +1,69 @@ >+PORTNAME= textract >+PORTVERSION= 1.6.5 >+CATEGORIES= textproc python >+MASTER_SITES= CHEESESHOP >+PKGNAMEPREFIX= ${PYTHON_PKGNAMEPREFIX} >+ >+MAINTAINER= DtxdF@disroot.org >+COMMENT= Extract text from any document >+WWW= https://github.com/deanmalmgren/textract >+ >+LICENSE= MIT >+LICENSE_FILE= ${WRKSRC}/LICENSE >+ >+RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}argcomplete>=1.10.0:devel/py-argcomplete@${PY_FLAVOR} \ >+ ${PYTHON_PKGNAMEPREFIX}chardet>=3:textproc/py-chardet@${PY_FLAVOR} \ >+ ${PYTHON_PKGNAMEPREFIX}six>1.12.0:devel/py-six@${PY_FLAVOR} >+ >+USES= python:3.8+ >+USE_PYTHON= autoplist distutils >+ >+OPTIONS_DEFINE= ANTIWORD BEAUTIFULSOUP DOCX2TXT MSG LIBXML2 \ >+ LIBXSLT PPTX PS SPREADSHEET UNRTF >+OPTIONS_DEFAULT= ANTIWORD BEAUTIFULSOUP DOCX2TXT FFMPEG FLAC JPEG_TURBO \ >+ LAME LIBXML2 LIBXSLT MSG PDFTOTEXT PPTX PS SOX \ >+ SPEECH_RECOGNITION SPREADSHEET TESSERACT UNRTF >+OPTIONS_GROUP= AUDIO OCR PDF RTF >+OPTIONS_GROUP_AUDIO= FFMPEG FLAC LAME POCKETSPHINX SOX SPEECH_RECOGNITION >+OPTIONS_GROUP_OCR= JPEG_TURBO TESSERACT >+OPTIONS_GROUP_PDF= PDFMINER PDFTOTEXT >+ >+ANTIWORD_DESC= DOC document support >+BEAUTIFULSOUP_DESC= HTML parsing library >+DOCX2TXT_DESC= DOCX document support >+JPEG_TURBO_DESC= SIMD-accelerated JPEG codec >+LIBXML2_DESC= Python interface for XML parser library >+LIBXSLT_DESC= XML stylesheet transformation library >+MSG_DESC= MS Outlook MSG file format support >+PDFMINER_DESC= PDF parser and analyzer >+PDFTOTEXT_DESC= Extract text from a PDF document >+POCKETSPHINX_DESC= Interface to CMU Sphinxbase and Pocketsphinx >+PPTX_DESC= MS PowerPoint PPTX presentations support >+SOX_DESC= Command-line audio processing tool >+SPEECH_RECOGNITION_DESC= Python library for performing speech recognition >+SPREADSHEET_DESC= XLS and XLSX spreadsheet support >+TESSERACT_DESC= Commercial quality open source OCR engine >+UNRTF_DESC= RTF document support >+ >+ANTIWORD_RUN_DEPENDS= antiword>0:textproc/antiword >+BEAUTIFULSOUP_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}beautifulsoup>=4.8.0:www/py-beautifulsoup@${PY_FLAVOR} >+DOCX2TXT_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}docx2txt>=0.8:textproc/py-docx2txt@${PY_FLAVOR} >+FFMPEG_RUN_DEPENDS= ffmpeg>0:multimedia/ffmpeg >+FLAC_RUN_DEPENDS= flac>0:audio/flac >+JPEG_TURBO_RUN_DEPENDS= jpeg-turbo>0:graphics/jpeg-turbo >+LAME_RUN_DEPENDS= lame>0:audio/lame >+LIBXML2_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}libxml2>0:textproc/py-libxml2@${PY_FLAVOR} >+LIBXSLT_RUN_DEPENDS= libxslt>=1.1.15:textproc/libxslt >+MSG_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}extract-msg>=0.29:textproc/py-extract-msg@${PY_FLAVOR} >+PDFMINER_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}pdfminer.six>=20191110:textproc/py-pdfminer.six@${PY_FLAVOR} >+PDFTOTEXT_RUN_DEPENDS= poppler-utils>0:graphics/poppler-utils >+POCKETSPHINX_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}pocketsphinx>0:audio/py-pocketsphinx@${PY_FLAVOR} >+PPTX_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}python-pptx>=0.6.18:textproc/py-python-pptx@${PY_FLAVOR} >+PS_RUN_DEPENDS= pstotext>0:print/pstotext >+SOX_RUN_DEPENDS= sox>0:audio/sox >+SPEECH_RECOGNITION_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}SpeechRecognition>=3.8.1:audio/py-speechrecognition@${PY_FLAVOR} >+SPREADSHEET_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}xlrd>=1.2.0:textproc/py-xlrd@${PY_FLAVOR} >+TESSERACT_RUN_DEPENDS= tesseract>0:graphics/tesseract >+UNRTF_RUN_DEPENDS= unrtf>0:textproc/unrtf >+ >+.include <bsd.port.mk> >diff --git a/textproc/py-textract/distinfo b/textproc/py-textract/distinfo >new file mode 100644 >index 000000000..14f25b8e6 >--- /dev/null >+++ b/textproc/py-textract/distinfo >@@ -0,0 +1,3 @@ >+TIMESTAMP = 1659835075 >+SHA256 (textract-1.6.5.tar.gz) = 68f0f09056885821e6c43d8538987518daa94057c306679f2857cc5ee66ad850 >+SIZE (textract-1.6.5.tar.gz) = 17871 >diff --git a/textproc/py-textract/pkg-descr b/textproc/py-textract/pkg-descr >new file mode 100644 >index 000000000..7d4986c9d >--- /dev/null >+++ b/textproc/py-textract/pkg-descr >@@ -0,0 +1,3 @@ >+textract provides a single interface for extracting content embedded >+from Word documents, PowerPoint presentations, PDFs and much more, >+which can be used for further textual analysis and visualization. >-- >2.37.3 >
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Diff
View Attachment As Raw
Actions:
View
|
Diff
Attachments on
bug 265768
:
235833
|
235984
|
235985
|
236300
| 236768