Bug 199052 - [PATCH] Use 7 alphanumeric GH_COMMIT for WRKSRC, related to "legacy.tar.gz" (codeload) GitHub backend method
Status: Closed Not Accepted
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Ports Framework
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Port Management Team
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2015-03-30 22:18 UTC by lightside
Modified: 2015-04-04 07:12 UTC (History)

See Also:


Attachments
Proposed patch (since 382622 revision) (590 bytes, patch)
2015-03-30 22:18 UTC, lightside

Description lightside 2015-03-30 22:18:16 UTC
Created attachment 155032
Proposed patch (since 382622 revision)

Currently, GH_COMMIT is (considered to be) deprecated in favor of the "new" USE_GITHUB and GH_TAGNAME (see the changes after ports r381618).

But I did some tests (bug 194898, comment #24) and found that a 7-character (alphanumeric) commit hash may stop working after some changes to the repository (which also applies to GH_TAGNAME and the related "new" USE_GITHUB). It looks like the 7-character commit hash is used for newer Git commits, but it may later change to a longer abbreviated commit hash as the repository changes. For the download method to work correctly, a longer abbreviated commit hash or the full commit hash is needed. For example, the FreeBSD ports tree on GitHub has the abbreviated commit hash "f45daed3cf3", which is currently 11 characters long:
https://github.com/freebsd/freebsd-ports/tree/f45daed3cf3
The 7-character commit hash doesn't work in this case:
% curl -Is https://github.com/freebsd/freebsd-ports/tree/f45daed | grep ^HTTP
HTTP/1.1 404 Not Found

Currently, 5208 of the 381033 abbreviated commit hashes in the mentioned repository are not 7 characters long:
% git log --pretty=%h | grep -v '^.\{7\}$' | wc -l
    5208

Current counts by length of the abbreviated commit hash:
Length:	Count
7:	375825
8:	4884
9:	311
10:	12
11:	1
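
The per-length counts above can be reproduced with a small filter. This is only a sketch: the function name hash_length_histogram is made up for illustration, and it assumes one abbreviated hash per input line, as printed by `git log --pretty=%h`.

```shell
# Hypothetical helper: read abbreviated hashes (one per line) on stdin and
# print "<length>:<TAB><count>" for each distinct length, shortest first.
hash_length_histogram() {
    awk '{ count[length($0)]++ }
         END { for (len in count) printf "%d:\t%d\n", len, count[len] }' |
        sort -n
}
```

Inside a clone of the repository it would be run as `git log --pretty=%h | hash_length_histogram`.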

Personally, I think the deprecation of GH_COMMIT (and the related "legacy.tar.gz" codeload GitHub backend method) was premature. GH_TAGNAME (and the related "tar.gz" codeload GitHub backend method) has the same issue with 7-character commit hashes. A longer abbreviated (or full) commit hash can be used with the "legacy.tar.gz" method once the automatic determination of WRKSRC is fixed, which is what the patch attached to this PR does.

If you run your own tests, you will see that GitHub serves its codeload backend methods (legacy.tar.gz, legacy.zip, tar.gz, zip) through its frontend methods (archive, tarball, zipball) correctly, e.g. when given a full commit hash, tag or branch:

1. Using the "tarball" GitHub frontend method:
% curl -Lv -o freebsd-ports-f45daed3cf3-tarball.tar.gz https://github.com/freebsd/freebsd-ports/tarball/f45daed3cf3
Location: https://codeload.github.com/freebsd/freebsd-ports/legacy.tar.gz/f45daed3cf3b694a969192c615bceba0a247b4d4
% sha256 freebsd-ports-f45daed3cf3-tarball.tar.gz
SHA256 (freebsd-ports-f45daed3cf3-tarball.tar.gz) = fac04ff56d18ef6af23dd3d7b4271fd4d7e8e9d5924d84317921a07224745f93
% tar -tf freebsd-ports-f45daed3cf3-tarball.tar.gz | head -1 | cut -d'/' -f1
freebsd-freebsd-ports-f45daed

2. Using the "archive" GitHub frontend method:
% curl -Lv -o freebsd-ports-f45daed3cf3-archive.tar.gz https://github.com/freebsd/freebsd-ports/archive/f45daed3cf3.tar.gz
Location: https://codeload.github.com/freebsd/freebsd-ports/tar.gz/f45daed3cf3b694a969192c615bceba0a247b4d4
% sha256 freebsd-ports-f45daed3cf3-archive.tar.gz
SHA256 (freebsd-ports-f45daed3cf3-archive.tar.gz) = 71b5def07f84f522cba78cc9d570c70521f404ecea0bd291743f9b81a6140d0b
% tar -tf freebsd-ports-f45daed3cf3-archive.tar.gz | head -1 | cut -d'/' -f1
freebsd-ports-f45daed3cf3b694a969192c615bceba0a247b4d4
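
For reference, the two frontend URLs used in 1 and 2 follow a simple pattern. A minimal sketch, with hypothetical function names, that only builds the URL strings seen above (it does not download anything, and says nothing about how GitHub resolves them):

```shell
# Hypothetical helpers. Arguments: account, project, ref
# (commit hash, tag or branch).
github_tarball_url() {
    # "tarball" frontend, which redirected to the legacy.tar.gz backend above
    printf 'https://github.com/%s/%s/tarball/%s\n' "$1" "$2" "$3"
}

github_archive_url() {
    # "archive" frontend, which redirected to the tar.gz backend above
    printf 'https://github.com/%s/%s/archive/%s.tar.gz\n' "$1" "$2" "$3"
}
```

For example, `github_tarball_url freebsd freebsd-ports f45daed3cf3` gives the URL passed to curl in 1.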

The contents of archives 1 and 2 are the same; only the parent directory differs:
% tar -xf freebsd-ports-f45daed3cf3-tarball.tar.gz
% tar -xf freebsd-ports-f45daed3cf3-archive.tar.gz
% diff -qruN freebsd-freebsd-ports-f45daed freebsd-ports-f45daed3cf3b694a969192c615bceba0a247b4d4

The benefit of using the "legacy.tar.gz" GitHub backend with a full (or abbreviated) commit hash is the short commit hash in the parent directory name. The tests show exactly a 7-character commit hash there, even when the full commit hash is requested.
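
The parent directory names observed in the two tests can be sketched as follows. This is an assumption inferred only from the two outputs above, not documented GitHub behavior, and the function names are hypothetical; tags or branches may be named differently.

```shell
# Hypothetical helper for the "legacy.tar.gz" backend.
# Arguments: account, project, commit hash (abbreviated or full).
# Observed pattern: <account>-<project>-<first 7 characters of the hash>
legacy_topdir() {
    printf '%s-%s-%.7s\n' "$1" "$2" "$3"
}

# Hypothetical helper for the "tar.gz" backend.
# Arguments: project, FULL commit hash (the server expands an abbreviated
# hash before naming the directory, so the caller must pass the full one).
# Observed pattern: <project>-<full commit hash>
archive_topdir() {
    printf '%s-%s\n' "$1" "$2"
}
```

With the hash from the tests, `legacy_topdir freebsd freebsd-ports f45daed3cf3b694a969192c615bceba0a247b4d4` yields freebsd-freebsd-ports-f45daed, so the directory name stays the same length regardless of how long the requested hash is.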

Again, since the "legacy.tar.gz" and "tar.gz" backend methods both exist on the same https://codeload.github.com server and both work, deprecating either of them is premature, in my opinion. The existing methods remain usable, if used correctly.

There are other changes that might need attention (e.g. the removal of forced external DISTNAME changes and the constant _GH* addition, which could be a one-time feature or be solved inside a specific port when needed), but they are out of scope of this PR and face some opposition, related to previous troubles with the GitHub methods and differing opinions about their implementation in the ports framework. I attached some GitHub usage examples in bug 194898 (e.g. attachment 154942 and attachment 154943), if anyone is interested.
Comment 1 Bryan Drewery freebsd_committer freebsd_triage 2015-03-31 16:20:12 UTC
I don't see any reason to change GH_COMMIT. It will be removed soon. Please convert ports using it.
Comment 2 lightside 2015-04-01 00:46:34 UTC
(In reply to Bryan Drewery from comment #1)
> I don't see any reason to change GH_COMMIT.

Really? This is not surprising, because you are the person who committed the related changes.

(In reply to Bryan Drewery from comment #1)
> Please convert ports using it.

I already submitted the related PRs. Some of the ports benefit, because they can use available tags, but some not, if there is a need to use full commit hash, i.e. very long directory name for WRKSRC. Overall, this is just an unneeded (premature) switch from one workable solution (legacy.tar.gz) to another (tar.gz), in my opinion. It was possible to support tags with proper changes, without the need to convert other ports. One of the reasons given for the conversion was that the GH_COMMIT method does not support a longer abbreviated (or full) commit hash, which is wrong. But I agree that this PR may be too late. There are other changes and public statements that prevent it.

Feel free to close this PR if you don't need the proposed changes. At least, no other opinions have been expressed here so far.
Comment 3 lightside 2015-04-01 16:26:40 UTC
(In reply to comment #2)
> but some not, if there is a need to use full commit hash, i.e. very long directory name for WRKSRC.

Realistically, the need for a full commit hash may be rare (so far). Even creating a repository with many commits takes time. Personally, I didn't find an abbreviated commit hash as long as the full commit hash in any of the existing large repositories. But using the full commit hash is the logical conclusion if you want the download method to keep working for previous versions of a port, independent of future changes to the repository.

What I said may not be specific to the "legacy.tar.gz" or "tar.gz" GitHub backend methods. Technically they are the same, but with different output, I guess.

I think the possible issue with a long WRKSRC directory name can be solved by renaming it to a short directory name, e.g.:
WRKSRC=	${WRKDIR}/${PORTNAME}
# ...
post-extract:
	@(cd ${WRKDIR} && ${MV} ${GH_PROJECT}-* ${PORTNAME})

But this is (potentially) what the "legacy.tar.gz" GitHub backend method already produces.

In other words, there are solutions that don't depend on where the development of the ports framework is heading. If the ports framework doesn't fit (or help), it's possible to create your own proper methods when needed.
Comment 4 lightside 2015-04-04 07:12:03 UTC
Closed. The proposed patch is not acceptable, based on comment #1. There is no reason to wait for a possible GH_COMMIT deprecation.