Bug 262278 - file(1) fails to identify a JAR file
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.0-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2022-03-01 17:45 UTC by Yuri Victorovich
Modified: 2022-05-17 12:34 UTC

Description Yuri Victorovich freebsd_committer freebsd_triage 2022-03-01 17:45:33 UTC
For this JAR:
https://repo1.maven.org/maven2/org/opentest4j/opentest4j/1.1.1/opentest4j-1.1.1.jar
file prints:
> opentest4j-1.1.1.jar:                               Zip archive data, at least v1.0 to extract

whereas for JARs it normally prints: "Java archive data (JAR)"
Comment 1 Ed Maste freebsd_committer freebsd_triage 2022-03-01 18:39:29 UTC
JAR files are in fact ZIP files, just with special metadata. libmagic's JAR detection was added here: https://github.com/file/file/commit/e45cd303713418af058361f5711a768550e1c867

JAR files often have 0xcafe at a specific location in the ZIP file and libmagic keys on this, but it is not required, and the file you've linked does not have this field.
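A minimal sketch of such a check, for illustration only. It assumes the "specific location" is the start of the extra field of the first ZIP local file header, and that the marker is a little-endian extra-field header ID of 0xCAFE; the actual libmagic rule may differ in detail. The function name first_entry_has_cafe is hypothetical.

```python
import struct

# 30-byte ZIP local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def first_entry_has_cafe(data: bytes) -> bool:
    """Return True if the first entry's extra field begins with header ID 0xCAFE.

    Assumption: this mirrors the kind of test described above; it is not
    copied from libmagic.
    """
    if len(data) < LOCAL_HEADER.size:
        return False
    fields = LOCAL_HEADER.unpack_from(data, 0)
    signature, name_len, extra_len = fields[0], fields[9], fields[10]
    if signature != b"PK\x03\x04" or extra_len < 2:
        return False
    # The extra field follows the fixed header and the file name.
    extra_start = LOCAL_HEADER.size + name_len
    if len(data) < extra_start + 2:
        return False
    (header_id,) = struct.unpack_from("<H", data, extra_start)
    return header_id == 0xCAFE
```

A file whose first entry has an extra field length of 0 fails this check, even if it is a perfectly usable JAR.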

Useful links:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6211008
https://bugs.openjdk.java.net/browse/JDK-6808540

This should probably be reported/discussed at http://www.darwinsys.com/file/
Comment 2 Yuri Victorovich freebsd_committer freebsd_triage 2022-03-01 18:44:19 UTC
(In reply to Ed Maste from comment #1)

JAR files also have META-INF/MANIFEST.MF.

* https://www.baeldung.com/java-jar-manifest
Comment 3 Michael Osipov 2022-05-17 12:27:33 UTC
The 0xcafebabe magic number applies to class files, not to JAR files.

The unwritten rule for detecting a JAR:
Either the first or the second ZIP entry must be the manifest file:

> META-INF/
> META-INF/MANIFEST.MF

or just

> META-INF/MANIFEST.MF

Maven produces fully reproducible JARs in which those two entries come first:
> $ unzip -t maven-core-4.0.0-alpha-1-SNAPSHOT.jar | head
> Archive:  maven-core-4.0.0-alpha-1-SNAPSHOT.jar
>     testing: META-INF/                OK
>     testing: META-INF/MANIFEST.MF     OK
>     testing: META-INF/maven/          OK
>     testing: META-INF/sisu/           OK
>     testing: org/                     OK
>     testing: org/apache/              OK

Don't expect a JAR to contain a class file. It may contain only resources and still be a JAR file because of the manifest, whose first entry is fixed to "Manifest-Version: 1.0". This is required by the spec and guaranteed to be written by Maven libraries.
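The heuristic described in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not what file(1) or any JDK tool actually does: the function name looks_like_jar is hypothetical, and entries are ordered by their local header offset so that "first" and "second" mean physical order in the archive.

```python
import zipfile

MANIFEST = "META-INF/MANIFEST.MF"


def looks_like_jar(path) -> bool:
    """Heuristic: the manifest is the first entry, or the second after META-INF/."""
    try:
        with zipfile.ZipFile(path) as zf:
            # Sort by header offset to get the physical order of the entries.
            entries = sorted(zf.infolist(), key=lambda info: info.header_offset)
            names = [info.filename for info in entries[:2]]
            if not (names[:1] == [MANIFEST] or names == ["META-INF/", MANIFEST]):
                return False
            # The first manifest line is expected to be the version header.
            first_line = zf.read(MANIFEST).splitlines()[0:1]
            return first_line == [b"Manifest-Version: 1.0"]
    except zipfile.BadZipFile:
        return False
```

Unlike a check for a marker in the extra field, this accepts resource-only JARs, at the cost of having to read the archive's directory and the manifest itself.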
Comment 4 Michael Osipov 2022-05-17 12:34:47 UTC
In contrast:
$ hexdump -C classes/org/apache/maven/SessionScoped.class
00000000  ca fe ba be 00 00 00 34  00 12 07 00 0f 07 00 10  |.......4........|
00000010  07 00 11 01 00 0a 53 6f  75 72 63 65 46 69 6c 65  |......SourceFile|
00000020  01 00 12 53 65 73 73 69  6f 6e 53 63 6f 70 65 64  |...SessionScoped|

Valid JAR:
$ hexdump -C maven-core-4.0.0-alpha-1-SNAPSHOT.jar | head
00000000  50 4b 03 04 0a 00 00 08  00 00 89 41 85 52 00 00  |PK.........A.R..|
00000010  00 00 00 00 00 00 00 00  00 00 09 00 00 00 4d 45  |..............ME|
00000020  54 41 2d 49 4e 46 2f 50  4b 03 04 14 00 00 08 08  |TA-INF/PK.......|
00000030  00 89 41 85 52 9c ea 73  82 b1 00 00 00 4e 01 00  |..A.R..s.....N..|
00000040  00 14 00 00 00 4d 45 54  41 2d 49 4e 46 2f 4d 41  |.....META-INF/MA|
00000050  4e 49 46 45 53 54 2e 4d  46 8d 8f d1 0a 82 30 18  |NIFEST.MF.....0.|
00000060  85 ef 05 df 61 2f b0 a1  d6 45 78 a7 41 94 60 49  |....a/...Ex.A.`I|
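To make the contrast between the two dumps concrete, here is a small sketch that decodes their leading bytes. The byte strings are copied from the hexdump output above; the function name describe is hypothetical. A class file starts with the big-endian magic 0xcafebabe followed by the minor and major version, while a ZIP local file header is little-endian and stores the name and extra field lengths at offsets 26 and 28.

```python
import struct

# First 8 bytes of SessionScoped.class, taken from the dump above.
CLASS_HEAD = bytes.fromhex("ca fe ba be 00 00 00 34")

# First 39 bytes of the JAR, taken from the dump above.
JAR_HEAD = bytes.fromhex(
    "50 4b 03 04 0a 00 00 08 00 00 89 41 85 52 00 00"
    "00 00 00 00 00 00 00 00 00 00 09 00 00 00 4d 45"
    "54 41 2d 49 4e 46 2f"
)


def describe(head: bytes) -> str:
    """Classify a byte prefix as a class file or a ZIP local file header."""
    if head[:4] == b"\xca\xfe\xba\xbe":
        minor, major = struct.unpack_from(">HH", head, 4)
        return f"class file, version {major}.{minor}"
    if head[:4] == b"PK\x03\x04":
        name_len, extra_len = struct.unpack_from("<HH", head, 26)
        name = head[30:30 + name_len].decode("utf-8")
        return f"ZIP entry {name!r}, extra field length {extra_len}"
    return "unknown"


print(describe(CLASS_HEAD))  # class file, version 52.0
print(describe(JAR_HEAD))    # ZIP entry 'META-INF/', extra field length 0
```

The first entry of this JAR has an extra field length of 0, so there is no room for a 0xcafe marker in it. Major version 52 corresponds to Java 8 class files.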

I don't know whether I understand the C code from the old Java 7 launcher correctly, but it appears to search for 0xcafe to decide whether a file is a valid JAR.

Note: I am a Maven PMC member and have done a lot of work on the reproducibility topic.