Bug 258593 - [NEW] sysutils/py-puremagic: Pure python implementation of magic file detection
Summary: [NEW] sysutils/py-puremagic: Pure python implementation of magic file detection
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s) (show other bugs)
Version: Latest
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-ports-bugs (Nobody)
URL: https://pypi.org/project/puremagic/
Keywords: feature
Depends on:
Blocks: 258597
  Show dependency treegraph
Reported: 2021-09-19 07:26 UTC by James French
Modified: 2021-09-20 15:22 UTC (History)
2 users (show)

See Also:
koobs: maintainer-feedback+

v1.10 (1.67 KB, patch)
2021-09-19 07:26 UTC, James French
no flags Details | Diff
Poudriere build log (Py38) (20.05 KB, text/plain)
2021-09-19 09:50 UTC, James French
no flags Details
v1.10 with test suite (uses github) (1.98 KB, patch)
2021-09-20 04:17 UTC, James French
no flags Details | Diff
v1.10 without test suite (uses cheesehop) (1.86 KB, patch)
2021-09-20 04:18 UTC, James French
no flags Details | Diff
Poudriere Build Log (Py38, Github patch) (20.62 KB, text/plain)
2021-09-20 04:19 UTC, James French
no flags Details
v1.10 GitHub, argparse dependency dropped (1.90 KB, patch)
2021-09-20 15:22 UTC, James French
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James French 2021-09-19 07:26:35 UTC
Created attachment 228001 [details]

Puremagic is a python module that will identify a file from magic numbers

It is designed to be minimalistic and inherently cross platform compatible. It is also designed to be a stand in for python-magic, it incorporates the functions from_file(filename[, mime]) and from_string(string[, mime]) however the magic_file() and magic_string() are more powerful and will also display confidence and duplicate matches.

It does NOT try to match files off non-magic string. In other words it will not search for a string within a certain window of bytes like others might.

WWW: https://pypi.org/project/puremagic/
Comment 1 James French 2021-09-19 09:50:59 UTC
Created attachment 228007 [details]
Poudriere build log (Py38)
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2021-09-20 00:56:21 UTC
Thanks for submitting a new port James.

Review items:

- Add LICENSE_FILE when a license file is provided with a distribution file. This one does: LICENSE

- Match COMMENT to upstream (setup.py:description). In this case "Pure python implementation of magic file detection"

- Ports must match and include all upstream dependency declarations. In this case, upstream defines:  install_requires=["argparse"], which *can* be provided by the stdlib, or not. Add a RUN_DEPENDS for this

- Add TEST_DEPENDS and a do-test target when upstream ships tests with their distfiles. This one does: pytest for tests and the following for an invocation should work:

  cd ${WRKSRC} && ${PYTHON_CMD}-m pytest -v -rs -o addopts= test/

-o addopts= overrides upstream pytest configurations that often require coverage and other dev only tests which we dont need (this one does).

See Also: https://wiki.freebsd.org/Python/PortsPolicy
Comment 3 James French 2021-09-20 03:22:22 UTC
Thanks for the review Kubilay, I'm working on an update that will address those issues and should have it back later today.

Just on the test suite, the python code for the tests is distributed with the source but the data files they depend on is not (presumably because it's quite a large set of data). The test suite will always fail from a PyPi tarball, even if the code is fine.

I can cobble something together that sources those files from GitHub to make the test suite work but it may overcomplicate the port. Not sure if this is something we should do or not.

In either case, I will probably submit two patches one with and one without fetching that data.

Comment 4 James French 2021-09-20 04:17:00 UTC
Created attachment 228039 [details]
v1.10 with test suite (uses github)

Updated patch with feedback items from review.

Uses Github to bring in the test suite data files which are not on PyPi. Will submit a PR upstream about this as the data files are only ~1.2M and probably should be on PyPi.

This is the preferred patch for commit
Comment 5 James French 2021-09-20 04:18:57 UTC
Created attachment 228040 [details]
v1.10 without test suite (uses cheesehop)

v1.10 incorporating feedback, without test suite.
Comment 6 James French 2021-09-20 04:19:37 UTC
Created attachment 228041 [details]
Poudriere Build Log (Py38, Github patch)
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2021-09-20 10:52:52 UTC
(In reply to James French from comment #3)

If the test vectors/data aren't packaged on PyPI, like some projects such as cryptography do, then we can grab them from GitHub (USE_GITHUB=yes), which is a legitimate exception to using CHEESESHOP as MASTER_SITES.

Unless the test data set is ridiculously large, having ports support tests is almost always better for everyone, and will only be fetched when 'test' target is called (same as TEST_DEPENDS)

Happy to run through this with you on IRC / Discord if you need help or have questions
Comment 8 Kubilay Kocak freebsd_committer freebsd_triage 2021-09-20 10:57:58 UTC
Aaaaaaaand I see you answered all that :)

Attachment 228039 [details] looks good, nice work.

Only other thing I think is worth pursuing, is the explicit argparse dependency.
Even though stdlib has it since 3, if they were keeping it for 2.7 support, this is at odd with the 3.x only support declaration, so it would be good to either see that go or have some clarity on that. 

While the above doesn't doesn't block this port, I think it warrants testing against our other lang/python* ports just to be safe (and again, why test suites are awesome :D)
Comment 9 James French 2021-09-20 15:22:41 UTC
Created attachment 228053 [details]
v1.10 GitHub, argparse dependency dropped

(In reply to Kubilay Kocak from comment #8)

Ah, I'm now on the same wavelength re: argparse, I probably shouldn't do this stuff at work while on my lunch break!

I've just run the test suite through on all the flavor iterations from 3.6 through to 3.10, can confirm argparse is not required with the Python 3 series, looks like legacy 2 series dependency. Upstream last released ~ 15 months ago, so that makes sense - I'll follow up with them. Attached should be the final patch to drop that dependency because it's confirmed it isn't needed.

Also uncovered a bug with 3.6 which is present whether argparse is brought in or not. That and the code is not compatible with 3.10 yet, both will need fixing upstream. For now I've marked the port as 3.7-3.9 which should stop the issues from our end for now.

Thanks again for the help Kubilay