Bug 194898 - [bsd.sites.mk] Change GitHub URL to comply with their API
Summary: [bsd.sites.mk] Change GitHub URL to comply with their API
Status: Closed Not Accepted
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Ports Framework (show other bugs)
Version: Latest
Hardware: Any Any
: --- Affects Only Me
Assignee: Bryan Drewery
URL:
Keywords: needs-qa, patch
Depends on:
Blocks:
 
Reported: 2014-11-08 12:52 UTC by Yuri Victorovich
Modified: 2016-08-26 09:08 UTC (History)
4 users (show)

See Also:


Attachments
patch (1.04 KB, patch)
2014-11-08 12:52 UTC, Yuri Victorovich
no flags Details | Diff
Include file with (possible) GitHub usage example (1.04 KB, text/plain)
2015-03-24 23:45 UTC, lightside
no flags Details
Include file with GitHub (frontend) usage example (1.09 KB, text/plain)
2015-03-29 02:36 UTC, lightside
no flags Details
Include file with GitHub (codeload backend) usage example (1.21 KB, text/plain)
2015-03-29 02:37 UTC, lightside
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yuri Victorovich freebsd_committer 2014-11-08 12:52:55 UTC
Created attachment 149183 [details]
patch

From communication with GitHub employee, I found out that URL that we currently use is a legacy one and it doesn't comply with their API.

The attached patch fixes this problem.
Comment 1 John Marino freebsd_committer 2014-11-14 10:28:56 UTC
This looks important, CC portmgr
Comment 2 John Marino freebsd_committer 2015-01-26 07:25:40 UTC
portmgr, is anyone in charge of USE_GITHUB?  This PR needs some attention I think, but I don't know who to ping.
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2015-01-26 08:47:48 UTC
Yuri, 

Thanks for the report. Our use of the legacy URL is known (as in, intentional, at least in the recent past), and if I recall correctly there have (or had been) issues with the latest version as of a number of months ago.

We will need to investigate the transition to the latest for any regressions or issues in the framework before we match a switch.

It would be great if you could have your your contact @ GitHub ping us at git-admin at FreeBSD.org with the information they passed on to you.

If they could also including any technical details as to the status of the legacy API and the nature of the compliance required, that would also be helpful
Comment 4 Yuri Victorovich freebsd_committer 2015-01-26 10:00:18 UTC
Actually, here is the GitHub documentation the URL in patch is based on (this GitHub guy provided it):
https://developer.github.com/v3/repos/contents/#get-archive-link

The only part missing there is 'dummy=' part. This might be some undocumented URL convention, but it definitely works for github.

He actually answered all questions in his previous message:
The status of the URLs currently used: undocumented.

> In the future, if we change the codeload service, those codeload
> URLs you're using directly might stop working, and you'd be in
> trouble. However, if you're using the official API endpoint -- you
> could always count on being redirected to a working codeload URL.
Comment 5 Yuri Victorovich freebsd_committer 2015-03-09 00:51:14 UTC
Github API is quite well documented.
I use this URL to regularly update systems, ans have not seen any problems.
I see no reason to leave this pending.

Is there any particular concern?
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2015-03-09 06:23:44 UTC
None, other than time (mine or portmgr) to assess and QA (not just the patch)

You could likely save time here by attaching the comparative http response headers from both the current API endpoint in use, as well as the new, to address any questions that might be centred around any potentially different behaviours on that front.

Also, if you're on IRC, id love to have a real-time catchup to go over some of the specifics of your testing. I'm available on efnet and freenode (nickname is my freebsd.org username)
Comment 7 Yuri Victorovich freebsd_committer 2015-03-11 23:32:27 UTC
GitHub download happens through https, so http headers are encrypted.

The only use case for GitHub in FreeBSD context I know, please correct me if I am wrong, is the fetch phase of the port build process. This works fine, I use the API-based URL ever since I created this PR. I didn't see any problem as of yet, both with and without poudriere. Behavior seems exactly the same.

Please clarify what else I can do to facilitate adoption of this patch.
Comment 8 Bryan Drewery freebsd_committer 2015-03-12 13:31:10 UTC
We're switching to the non-legacy one very soon. Within days. https://reviews.freebsd.org/D748
Comment 9 Bryan Drewery freebsd_committer 2015-03-12 13:32:52 UTC
Your patch would break all github ports by the way as it would change checksums.
Comment 10 Yuri Victorovich freebsd_committer 2015-03-12 13:38:46 UTC
Change checksums? Why? I build them regularly, and this never happened.
Comment 11 Bryan Drewery freebsd_committer 2015-03-12 13:47:17 UTC
(In reply to yuri from comment #10)

The reason we use legacy is that legacy and non-legacy extract to different directories. That simple difference causes the tarball to have a different checksum from what is expected in the distinfos. It's possible this api.github.com/tarball link is also returning the legacy tarball. Anyway we are trying to get all ports to using the non-legacy version.
Comment 12 Bryan Drewery freebsd_committer 2015-03-12 13:48:35 UTC
I will attempt to modify the new patch to use the API-documented link as well.
Comment 13 Yuri Victorovich freebsd_committer 2015-03-12 13:52:23 UTC
I was under impression that tarballs are identical.
Comment 14 Bryan Drewery freebsd_committer 2015-03-19 17:24:39 UTC
(In reply to yuri from comment #13)

Yup, the API points to the legacy tarball. This is a problem. Github can't seem to agree on what their distfiles should be named.

The last time I discused this with Github they told me to not use Legacy anymore, so why does their API use the Legacy URL??

~/svn/ports.commit/ports-mgmt/poudriere-devel # curl -LI 'https://api.github.com/repos/freebsd/poudriere/tarball/80b7167'
HTTP/1.1 302 Found
Server: GitHub.com
Date: Thu, 19 Mar 2015 17:19:31 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 0
Status: 302 Found
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 49
X-RateLimit-Reset: 1426788619
Cache-Control: public, must-revalidate, max-age=0
Expires: Thu, 19 Mar 2015 17:19:31 GMT
Location: https://codeload.github.com/freebsd/poudriere/legacy.tar.gz/80b7167
X-XSS-Protection: 1; mode=block
X-Frame-Options: deny
Content-Security-Policy: default-src 'none'
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval
Access-Control-Allow-Origin: *
X-GitHub-Request-Id: ADA0765E:290A:245EE31:550B0523
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Content-Type-Options: nosniff
Vary: Accept-Encoding
X-Served-By: 474556b853193c38f1b14328ce2d1b7d

HTTP/1.1 200 OK
Content-Length: 2713458
Access-Control-Allow-Origin: https://render.githubusercontent.com
Content-Security-Policy: default-src 'none'
X-XSS-Protection: 1; mode=block
X-Frame-Options: deny
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
ETag: "80b71677a5ee18ead28b6d39beaf2151f613ccb0"
Content-Type: application/x-gzip
Content-Disposition: attachment; filename=freebsd-poudriere-3.1.1-20-g80b7167.tar.gz
Date: Thu, 19 Mar 2015 17:19:32 GMT
Comment 15 Bryan Drewery freebsd_committer 2015-03-19 17:47:23 UTC
I won't change the URLS.

I could change our GITHUB_LEGACY to use the API to grab the codeload redirect
since it returns legacy currently. However it could easily be 'fixed' to return the non-legacy URL. The fact that their api.github.com does not support a VERSION in the URL is a big redflag to me that we can't be guaranteed behavior.

We'll just keep doing what we're doing. If Github changes their backend then
we will adjust our download based on how their main URL redirects.

As for just using the github.com/REPO/PROJECT/archive/HASH.tar.gz that is not
possible either since it will download a tarball that extracts as the FULLHASH
rather than whatever GH_COMMIT was set to. That's another big inconvenience for porters. The current USE_GITHUB allows any length of hash in GH_TAGNAME.
Comment 16 commit-hook freebsd_committer 2015-03-19 18:17:11 UTC
A commit references this bug:

Author: bdrewery
Date: Thu Mar 19 18:16:53 UTC 2015
New revision: 381655
URL: https://svnweb.freebsd.org/changeset/ports/381655

Log:
  Add a hint on how to deal with github download failures.

  PR:		194898

Changes:
  head/Mk/bsd.sites.mk
Comment 17 lightside 2015-03-24 23:45:30 UTC
Created attachment 154783 [details]
Include file with (possible) GitHub usage example

Hello.

I think, there is a misunderstanding (of "whole picture").
As far as I know, there are three (external) methods on GitHub, which is useful to get a repository archive(s), based on 7 alphanumeric commit or tag name: archive, tarball, zipball.

Let's do some examples (with omitting curl output, except Location).
Examples, which lead to known output for WRKSRC:
1. Get tar.gz archive, based on 7 alphanumeric commit:
% curl -Lv -o spyder-e13b93b.tar.gz https://github.com/spyder-ide/spyder/tarball/e13b93b
Location: https://codeload.github.com/spyder-ide/spyder/legacy.tar.gz/e13b93b23036878215618cd71ff8ea35dd6a5c10
% sha256 spyder-e13b93b.tar.gz
SHA256 (spyder-e13b93b.tar.gz) = dfc2c0f6f4e9a21dfd3553182b150966c558d2d5d9e82fc959d98e2f5d7f59b2
% tar -tf spyder-e13b93b.tar.gz | head -1 | cut -d'/' -f1
spyder-ide-spyder-e13b93b

2. Get zip archive, based on 7 alphanumeric commit:
% curl -Lv -o spyder-e13b93b.zip https://github.com/spyder-ide/spyder/zipball/e13b93b
Location: https://codeload.github.com/spyder-ide/spyder/legacy.zip/e13b93b23036878215618cd71ff8ea35dd6a5c10
% sha256 spyder-e13b93b.zip
SHA256 (spyder-e13b93b.zip) = 616f8fb5067c22311f8670fc87c933c48d88e07286f87784ed4d8fc042987edf
% tar -tf spyder-e13b93b.zip | head -1 | cut -d'/' -f1
spyder-ide-spyder-e13b93b

3. Get tar.gz archive, based on tag name:
3a. GH_TAGNAME=v${PORTVERSION} example:
% curl -Lv -o spyder-2.3.4.tar.gz https://github.com/spyder-ide/spyder/archive/v2.3.4.tar.gz
Location: https://codeload.github.com/spyder-ide/spyder/tar.gz/v2.3.4
% sha256 spyder-2.3.4.tar.gz
SHA256 (spyder-2.3.4.tar.gz) = d5d87dc27b085771c8878ef611a63b5a4dc7f4560e5d96421f1fb1eb51fd4883
% tar -tf spyder-2.3.4.tar.gz | head -1 | cut -d'/' -f1
spyder-2.3.4

3b. GH_TAGNAME=release-${PORTVERSION} example:
% curl -Lv -o wxlauncher-0.9.5.tar.gz https://github.com/wxLauncher/wxlauncher/archive/release-0.9.5.tar.gz
Location: https://codeload.github.com/wxLauncher/wxlauncher/tar.gz/release-0.9.5
% sha256 wxlauncher-0.9.5.tar.gz
SHA256 (wxlauncher-0.9.5.tar.gz) = 2a4ebfdf71114b5dda09713b6896eec908e16280dfb7d25bb448887afe74fc1a
% tar -tf wxlauncher-0.9.5.tar.gz | head -1 | cut -d'/' -f1
wxlauncher-release-0.9.5

4. Get zip archive, based on tag name:
% curl -Lv -o spyder-2.3.4.zip https://github.com/spyder-ide/spyder/archive/v2.3.4.zip
Location: https://codeload.github.com/spyder-ide/spyder/zip/v2.3.4
% sha256 spyder-2.3.4.zip
SHA256 (spyder-2.3.4.zip) = c55ea5b804e01333ac05f957f6c60e51669c846e74754fe94c576aad5680eab9
% tar -tf spyder-2.3.4.zip | head -1 | cut -d'/' -f1
spyder-2.3.4

Examples, which might lead to unknown (in time) output for WRKSRC:
5. Get tar.gz archive, based on 7 alphanumeric commit with using "archive" method (need to know full hash commit):
% curl -Lv -o spyder-e13b93bfull.tar.gz https://github.com/spyder-ide/spyder/archive/e13b93b.tar.gz
Location: https://codeload.github.com/spyder-ide/spyder/tar.gz/e13b93b23036878215618cd71ff8ea35dd6a5c10
% sha256 spyder-e13b93bfull.tar.gz
SHA256 (spyder-e13b93bfull.tar.gz) = ec53bb7903a03c68d5ac3ab84e3c2c2b37c57a50054584fdebce21b940c2769a
% tar -tf spyder-e13b93bfull.tar.gz | head -1 | cut -d'/' -f1
spyder-e13b93b23036878215618cd71ff8ea35dd6a5c10

6. Get zip archive, based on 7 alphanumeric commit with using "archive" method (need to know full hash commit):
% curl -Lv -o spyder-e13b93bfull.zip https://github.com/spyder-ide/spyder/archive/e13b93b.zip
Location: https://codeload.github.com/spyder-ide/spyder/zip/e13b93b23036878215618cd71ff8ea35dd6a5c10
% sha256 spyder-e13b93bfull.zip
SHA256 (spyder-e13b93bfull.zip) = 50c8e3128042a26b8b33bdb408d2c3cbaaffc44eb69e53500d687beda8975290
% tar -tf spyder-e13b93bfull.zip | head -1 | cut -d'/' -f1
spyder-e13b93b23036878215618cd71ff8ea35dd6a5c10

7. Get tar.gz archive, based on tag name with using "tarball" method (need to know (incorrect) 7 alphanumeric commit):
% curl -Lv -o spyder-2.3.4-tarball.tar.gz https://github.com/spyder-ide/spyder/tarball/v2.3.4
Location: https://codeload.github.com/spyder-ide/spyder/legacy.tar.gz/v2.3.4
% sha256 spyder-2.3.4-tarball.tar.gz
SHA256 (spyder-2.3.4-tarball.tar.gz) = 81a36df1434cad1277032530e5857042d65c161bc176d4c398fe9387fd5cb578
% tar -tf spyder-2.3.4-tarball.tar.gz | head -1 | cut -d'/' -f1
spyder-ide-spyder-a23539a

8. Get zip archive, based on tag name with using "zipball" method (need to know (incorrect) 7 alphanumeric commit):
% curl -Lv -o spyder-2.3.4-zipball.zip https://github.com/spyder-ide/spyder/zipball/v2.3.4
Location: https://codeload.github.com/spyder-ide/spyder/legacy.zip/v2.3.4
% sha256 spyder-2.3.4-zipball.zip
SHA256 (spyder-2.3.4-zipball.zip) = d48d46849c6d1ad63682ecc3a5dcfe85ed0ad3ac2b2aff4df133c57bdfd4e56d
% tar -tf spyder-2.3.4-zipball.zip | head -1 | cut -d'/' -f1
spyder-ide-spyder-a23539a

9. Get tar.gz archive, based on master branch (need to know 7 alphanumeric commit, which might be changed by "moving" branch):
% curl -Lv -o spyder-master.tar.gz https://github.com/spyder-ide/spyder/tarball/master
Location: https://codeload.github.com/spyder-ide/spyder/legacy.tar.gz/master
% sha256 spyder-master.tar.gz
SHA256 (spyder-master.tar.gz) = 2cfbdddc2ab5b625da41b5c817f239768c09f32e4ab61e89e44a1b71f32d9244
% tar -tf spyder-master.tar.gz | head -1 | cut -d'/' -f1
spyder-ide-spyder-716ead8

10. Get tar.gz archive, based on 2.3 branch (need to know 7 alphanumeric commit, which might be changed by "moving" branch):
% curl -Lv -o spyder-2.3-branch.tar.gz https://github.com/spyder-ide/spyder/tarball/2.3
Location: https://codeload.github.com/spyder-ide/spyder/legacy.tar.gz/2.3
% sha256 spyder-2.3-branch.tar.gz
SHA256 (spyder-2.3-branch.tar.gz) = 5961727ac5c8a8a8343c526e0862b4ffad13aeda8a3e770e80d4dc114ade055a
% tar -tf spyder-2.3-branch.tar.gz | head -1 | cut -d'/' -f1
spyder-ide-spyder-4b81c9a

In result, 1-4 is correct examples and 5-8(-10) is incorrect examples, in case of automatic determination of WRKSRC.

The 1 example gives following WRKSRC:
MASTER_SITES=	https://github.com/${GH_ACCOUNT}/${GH_PROJECT}/tarball/${GH_COMMIT}?dummy=/
PORTNAME=	spyder
PORTVERSION=	2.3.4
GH_ACCOUNT=	spyder-ide
GH_PROJECT=	${PORTNAME}
GH_COMMIT=	e13b93b
WRKSRC=	${WRKDIR}/${GH_ACCOUNT}-${GH_PROJECT}-${GH_COMMIT}

The 2 example is the same as 1, but with different MASTER_SITES:
MASTER_SITES=	https://github.com/${GH_ACCOUNT}/${GH_PROJECT}/zipball/${GH_COMMIT}?dummy=/

The 3a-4 examples gives following WRKSRC:
MASTER_SITES=	https://github.com/${GH_ACCOUNT}/${GH_PROJECT}/archive/${GH_TAGNAME}${EXTRACT_SUFX}?dummy=/
PORTNAME=	spyder
PORTVERSION=	2.3.4
GH_ACCOUNT=	spyder-ide
GH_PROJECT=	${PORTNAME}
GH_TAGNAME=	v${PORTVERSION}
WRKSRC=	${WRKDIR}/${GH_PROJECT}-${GH_TAGNAME:C/^v([0-9])/\1/}

The 3b example gives the same WRKSRC as 3a-4 examples:
PORTNAME=	wxlauncher
PORTVERSION=	0.9.5
GH_ACCOUNT=	wxLauncher
GH_PROJECT=	${PORTNAME}
GH_TAGNAME=	release-${PORTVERSION}
WRKSRC=	${WRKDIR}/${GH_PROJECT}-${GH_TAGNAME:C/^v([0-9])/\1/}

Therefore, "archive" method is useful for known (release) tag name and "tarball" / "zipball" methods are useful for known 7 alphanumeric commit (or not moving branch, as reference, but 7 alphanumeric commit should be known for WRKSRC).

If you compare external and internal (where external method(s) redirects) GitHub methods, then you might do following conclusions, related to FreeBSD ports changes:
1. Before ports r381618 changes:
There was "tarball" method in main use and GHR shortcut for MASTER_SITES, which represented "archive" method.
2. After ports r381618 changes:
The "tarball" method is (considered to) deprecated and GHR shortcut removed. The "archive" method in main use. The DISTNAME changes outside of concrete port, which (might) lead to incorrect distinfo for concrete port/user/maintainer (i.e. broken patch(es) for port(s), which uses GitHub, if there was a change for DISTNAME between submit and commit stages).

I think, for correct GitHub usage, there is a need to return "tarball" method (based on GH_COMMIT), add "zipball" method and correctly use "archive" method (based on GH_TAGNAME and EXTRACT_SUFX). The DISTNAME external changes should be removed, because if there are changes to external/internal methods, it could be done inside of concrete port (e.g. with using DISTVERSIONSUFFIX) as one-time change, but not on constant basis as after ports r381688 changes. The possible example is attached to this comment, based on external GitHub methods/interface (i.e. archive, tarball, zipball).

It's possible to use internal GitHub methods (i.e. legacy.tar.gz, legacy.zip, tar.gz, zip), but they might change, as said in comment #4.

I suggest to reconsider GitHub usage.

If won't, then don't prohibit (e.g. by reviewer(s) and/or Porter's Handbook) the correct usage of MASTER_SITES instead of current "new" USE_GITHUB by people, who submit patches for ports. There are Bitbucket (https://bitbucket.org), GitLab (https://gitlab.com), http://repo.or.cz, etc. public Git repositories, which provides similar method(s) (but with different name(s), type(s) of supported archive(s), WRKSRC output, etc.).

Thanks for attention.
Comment 18 Bryan Drewery freebsd_committer 2015-03-25 00:21:56 UTC
(In reply to lightside from comment #17)

Reading your very long comment I cannot find any point. What are you trying to say? What is your objection to this? It works. There is a lot of history here you are sweeping away thinking we don't know what we're doing.

Github has told us specifically to not use the legacy link anymore. So why should we? Why should GH_COMMIT be required? It is redundant and inconvenient since you must specify it even when using a tag. If the tag is *annotated* you get a totally different hash than the github interface even shows you. The new USE_GITHUB is far simpler and no longer uses the legacy URL.

There's no reason to not use archive for everything, as we do now. The old tarball (legacy) method required *exactly* 7 characters to work. The new method supports any unique hash, as well as the git-describe output.

To top it all off, we absolutely cannot use the main links rather than codeload because it does not respect the hash we download it. It changes the hash and the resulting file. So then we are shoehorned into using full hashes (which the github interface hides from you and makes inconvenient to use).

Also this new USE_GITHUB allows portscout to finally work while the old did not.

Again, you have not explained why you think the new framework is flawed. It works. It is simpler. The original request in this ticket was to use the API link, which github has specifically told us not to use. It is the legacy link. They changed the checksums before by changing the main download links to a new format and keeping legacy as the old format. There's no reason to risk using the API link (as I explained in comment #15) as someone likely will "fix it" at github someday and then break out checksums again. And using it would also remove all of the conveniences that the new USE_GITHUB has given us.

We (ports framework) have full control over DISTNAME. What we do not control is the extraction name. We do our best to determine the WRKSRC. I think you may misunderstand some of this. The change in r381688 has no detriment to porters.

The point of USE_GITHUB is to simplify fetching files from them. Having 200 ports use the long hack in MASTER_SITES, setting WRKSRC, setting DISTNAME, is not good. We are removing duplication. Combine that with the history that github has no respect for checksums, having the USE_GITHUB feature simplifies dealing with fallout.

If you submit patches that use MASTER_SITES=github.com/..../?dummy they will be changed to USE_GITHUB.
Comment 19 Bryan Drewery freebsd_committer 2015-03-25 00:41:56 UTC
I should mention too that GHR was removed because it is effectively the same
as what the current USE_GITHUB is. GHR was using /archive/ which just redirects to the same URL we use. There was no need for an additional
MASTER_SITE there.

GHR never worked as *intended* either as it was intended to download files
uploaded as "releases" from /releases/. Adding this feature back in to
work with actual "releases" would be fine.
Comment 20 lightside 2015-03-25 01:18:15 UTC
In other words, you don't consider GitHub API frontend as a reliable service (based on previous troubles, about which I don't know), but proposed to use it's codeload backend (https://codeload.github.com)?

(In reply to comment #18)
> If you submit patches that use MASTER_SITES=github.com/..../?dummy they will be changed to USE_GITHUB.

I already used correct GitHub API for some of maintained port and it not affected by these (harmful, in my opinion) changes and distfile re-creation. There are patches for other maintained ports, some of which re-created twice, because of GH0, etc. changes to DISTNAME, and MASTER_SITES finally. I don't consider (your) sentence about MASTER_SITES as noteworthy, otherwise as "vandalism" and ignorance of maintainership by other people.

(In reply to comment #19)
> GHR never worked as *intended* either as it was intended to download files
> uploaded as "releases" from /releases/. Adding this feature back in to
> work with actual "releases" would be fine.

I used it correctly (with DISTVERSIONPREFIX=v) for some port.

But I think, that what I did is doesn't matter with such attitude and direction of dialogue (e.g. the "patent" on GitHub methods is on USE_GITHUB and not MASTER_SITES anymore; at least, other Git repositories are clear from such USE_* "abuse").
Comment 21 Bryan Drewery freebsd_committer 2015-03-25 01:37:51 UTC
(In reply to lightside from comment #20)


> In other words, you don't consider GitHub API frontend as a reliable service
> (based on previous troubles, about which I don't know), but proposed to use
> it's codeload backend (https://codeload.github.com)?
> 

Correct. There is history here. We used the main www interface for downloads in the past and Github changed the checksums (due to changing the filename and extraction directory). We worked around this by using the legacy.tar.gz. This is *exactly* what the API URL redirects to as well. Please see comment #14. I understand the "big picture" very well.

> (In reply to comment #18)
> > If you submit patches that use MASTER_SITES=github.com/..../?dummy they will be changed to USE_GITHUB.
> 
> I already used correct GitHub API for some of maintained port and it not
> affected by these (harmful, in my opinion) changes and distfile re-creation.
> There are patches for other maintained ports, some of which re-created
> twice, because of GH0, etc. changes to DISTNAME, and MASTER_SITES finally. 

Can you please point to a port where the new USE_GITHUB results in an actual problem? GH0 is used to avoid rerolled distfiles (and invalidating our own
provided distfile cache) now and in the future. I see absolutely no harm
in this. If there is please explain. Distfiles fetched from Github (any method) should not be considered to match custom-rolled distfiles. As such adding GH0 makes even more sense. It is ensuring there is no confusion that this distfile was from Github. If github changes their checksumming again then GH0 will become GH1.

Another big point here is that we almost banned the usage of github in MASTER_SITES in the past due to the rerolled checksums and github just flat out not working. This pattern was constant in 2013. It has become stable now at least.

> I don't consider (your) sentence about MASTER_SITES as noteworthy, otherwise
> as "vandalism" and ignorance of maintainership by other people.
> 

> (In reply to comment #19)
> > GHR never worked as *intended* either as it was intended to download files
> > uploaded as "releases" from /releases/. Adding this feature back in to
> > work with actual "releases" would be fine.
> 
> I used it correctly (with DISTVERSIONPREFIX=v) for some port.

You missed the point entirely. For someone who tried to school us on the download methods you missed that Github also provides distinct "releases" and "downloads". The author of GHR has told me they intended it to use "releases", which is not what it was using. Yes it worked fine. It is pretty much exactly what the new USE_GITHUB is. It is no longer needed. It is redundant.

> 
> But I think, that what I did is doesn't matter with such attitude and
> direction of dialogue (e.g. the "patent" on GitHub methods is on USE_GITHUB
> and not MASTER_SITES anymore; at least, other Git repositories are clear
> from such USE_* "abuse").

As a maintainer you have no "patent" on the Makefile. You are in charge of keeping it up-to-date, using the framework appropriately and working with users.

As a committer it is my job to ensure that changes that go in are using the framework properly.

As a portmgr it is my job to maintain the framework and to make it simple to use for Users. I see the new USE_GITHUB as far simpler than the older (which required exactly 7 char GH_COMMIT, knowing the hash for annotated tags, modifying WRKSRC sometimes). Now none of that is needed and portscout works with tags. These bugs have existed since 2013 when USE_GITHUB went in. They really should not have persisted as long as they did.

At least 6 committers have been involved in getting the new USE_GITHUB changes in. No one has objected to any of it or raised any concerns beyond causing rerolled distfiles (hence GH0). All bugs that have come up have been addressed.


If you have an actual bug causing an issue in a Makefile please let me know.
Comment 22 lightside 2015-03-25 02:29:10 UTC
(In reply to comment #21)
> Can you please point to a port where the new USE_GITHUB results in an actual problem?

It's related to some ports (and their PRs), which I maintain currently (deskutils/treesheets, games/wxlauncher) and proposed one by different maintainer (devel/spyder, which used GHR in some moment in time, before DISTNAME changed, which broke the distname for shar archive).

> GH0 is used to avoid rerolled distfiles (and invalidating our own
> provided distfile cache) now and in the future. I see absolutely no harm
> in this. If there is please explain.

I think, this should be a one-time feature, which controlled inside of concrete port, for example, GH_SANITIZE=yes. When GitHub method(s) stabilized, this feature might be removed from the port, e.g. for newer version of port, which doesn't need such changes to DISTNAME.

> Another big point here is that we almost banned the usage of github in MASTER_SITES
> in the past due to the rerolled checksums and github just flat out not working.
> This pattern was constant in 2013. It has become stable now at least.

> You missed the point entirely. For someone who tried to school us on the download
> methods you missed that Github also provides distinct "releases" and "downloads".

I don't. I just haven't such (broken) experience with GitHub and based my examples on current (stable, as you said) situation.

> I see the new USE_GITHUB as far simpler than the older (which required
> exactly 7 char GH_COMMIT, knowing the hash for annotated tags, modifying WRKSRC sometimes).
> Now none of that is needed and portscout works with tags.

The 7 alphanumeric commit is needed for Git repositories, which doesn't use tags (for some reason). There are "dated" versions for them, e.g. 0.0.20150325. It's easy to copy such Git reference from GitHub commits, if needed. The currently used internal "tar.gz" method understands it, for sure.

> If you have an actual bug causing an issue in a Makefile please let me know.

I just have a different experience and opinion, which correlates with this PR and some other changes. I based my opinion on concrete examples (with correct and incorrect usage).
Comment 23 lightside 2015-03-27 00:21:26 UTC
Hello, Bryan Drewery.

First of, I'm sorry about possible interruption of your work. I think, that I should not have asked you about the reliability of GitHub API (front-end) service in public. The GitHub (API) status is monitored:
https://status.github.com
And there were issues in the past:
https://status.github.com/api/messages.json

In the end, what "bothered" me, when I used new USE_GITHUB is DISTNAME changes (which lead to duplication in names), I think. But I partially solved it by DISTNAME define in the port (except _GH0 part, just in case). And overall it might be ok, if there are no further changes to it (e.g. no broken patches and shar archives for PRs).

I think, I could still use MASTER_SITES for my private ports, if needed.

Objectively, the front-end (tarball, zipball, archive) and back-end (legacy.tar.gz, legacy.zip, tar.gz, zip) GitHub methods are workable solutions, if used correctly. But I understand a possible troubles in implementation(s) and a wish to use some "universal" method, instead of many of them. Looks like, it's tar.gz (zip) currently.

Thanks for your work (including on poudriere).
Comment 24 lightside 2015-03-29 02:31:16 UTC
I did some tests and found, that there are cases, where 7 alphanumeric commit hash doesn't work to fetch GitHub archive(s).

For examples, I chose FreeBSD ports tree on GitHub:
https://github.com/freebsd/freebsd-ports

Clone remote Git repository:
% git clone https://github.com/freebsd/freebsd-ports.git freebsd-ports

Create "short_hash_commits.txt" file with abbreviated hashes for commits:
% cd freebsd-ports && git log --pretty=%h > ../short_hash_commits.txt

Search ten last abbreviated commit hashes, which doesn't contain exact 7 alphanumeric commit hash:
% cd .. && grep -v '^.\{7\}$' short_hash_commits.txt | tail -n 10
b9bdb328
02c9c6f9
c6303297
06923f668
787658b4
b14f656b
7c15bb39
0fe103cd
d0a8b053
d61143786

Get full commit hash for last abbreviated commit hash in result, which is first (found) non 7 alphanumeric commit hash for repository:
% cd freebsd-ports && git log -1 --pretty=%H d61143786
d6114378631336543152abf4ca1c4c5f30799562

Let's use "tree" frontend method on GitHub to do following tests:
1. Test availability of abbreviated commit hash:
% curl -Is https://github.com/freebsd/freebsd-ports/tree/d61143786 | grep ^HTTP
HTTP/1.1 200 OK

2. Test availability of full commit hash:
% curl -Is https://github.com/freebsd/freebsd-ports/tree/d6114378631336543152abf4ca1c4c5f30799562 | grep ^HTTP
HTTP/1.1 200 OK

3. Test availability of 7 alphanumeric commit hash, based on tested hash commit:
% curl -Is https://github.com/freebsd/freebsd-ports/tree/d611437 | grep ^HTTP
HTTP/1.1 404 Not Found

Which shows, that 7 alphanumeric commit hash (d611437) doesn't work in this case.
But currently, even "commits" (and "commit") GitHub frontend method(s) shows "d611437" as for parent "50f79c5" abbreviated commit hash (while it uses full hashes for commits in links, of course):
https://github.com/freebsd/freebsd-ports/commits/50f79c5
https://github.com/freebsd/freebsd-ports/commit/50f79c5

The question is:
"Could other GitHub frontend and backend methods use mentioned 7 alphanumeric commit hash?"
Let's find out.

1. With abbreviated commit hash by using "tarball" frontend method:
% curl -Lv -o freebsd-ports-d61143786-tarball.tar.gz https://github.com/freebsd/freebsd-ports/tarball/d61143786
Location: https://codeload.github.com/freebsd/freebsd-ports/legacy.tar.gz/d6114378631336543152abf4ca1c4c5f30799562
% sha256 freebsd-ports-d61143786-tarball.tar.gz
SHA256 (freebsd-ports-d61143786-tarball.tar.gz) = 3b3ea684418e3fac86fcc9c12decf1e11892681edbbfdd9719abedc65905bc11
% tar -tf freebsd-ports-d61143786-tarball.tar.gz | head -1 | cut -d'/' -f1
freebsd-freebsd-ports-d611437

2. With full commit hash by using "tarball" frontend method:
% curl -Lv -o freebsd-ports-d61143786full-tarball.tar.gz https://github.com/freebsd/freebsd-ports/tarball/d6114378631336543152abf4ca1c4c5f30799562
Location: https://codeload.github.com/freebsd/freebsd-ports/legacy.tar.gz/d6114378631336543152abf4ca1c4c5f30799562
% sha256 freebsd-ports-d61143786full-tarball.tar.gz
SHA256 (freebsd-ports-d61143786full-tarball.tar.gz) = 3b3ea684418e3fac86fcc9c12decf1e11892681edbbfdd9719abedc65905bc11
% tar -tf freebsd-ports-d61143786full-tarball.tar.gz | head -1 | cut -d'/' -f1
freebsd-freebsd-ports-d611437

3. With 7 alphanumeric commit hash by using "tarball" frontend method:
% curl -Lv -o freebsd-ports-d611437-tarball.tar.gz https://github.com/freebsd/freebsd-ports/tarball/d611437
Location: https://codeload.github.com/freebsd/freebsd-ports/legacy.tar.gz/d611437
HTTP/1.1 404 Not Found
% cat freebsd-ports-d611437-tarball.tar.gz
Not Found

4. With 7 alphanumeric commit hash by using "legacy.tar.gz" backend method:
% curl -Lv -o freebsd-ports-d611437-legacy.tar.gz https://codeload.github.com/freebsd/freebsd-ports/legacy.tar.gz/d611437
HTTP/1.1 404 Not Found
% cat freebsd-ports-d611437-legacy.tar.gz
Not Found

5. With 7 alphanumeric commit hash by using "archive" frontend method:
curl -Lv -o freebsd-ports-d611437-archive.tar.gz https://github.com/freebsd/freebsd-ports/archive/d611437.tar.gz
Location: https://codeload.github.com/freebsd/freebsd-ports/tar.gz/d611437
HTTP/1.1 404 Not Found
% cat freebsd-ports-d611437-archive.tar.gz
Not Found

6. With 7 alphanumeric commit hash by using "tar.gz" backend method:
curl -Lv -o freebsd-ports-d611437-tar.tar.gz https://codeload.github.com/freebsd/freebsd-ports/tar.gz/d611437
HTTP/1.1 404 Not Found
% cat freebsd-ports-d611437-tar.tar.gz
Not Found

In conclusion, the possible answer is:
"There are cases, where 7 alphanumeric commit hash doesn't work (or ambiguous). Possible to use abbreviated commit hash or other (including full) commit hash, in these cases."

But I didn't test whether previous abbreviated commit hash is unique across future changes to the repository. Let's find out.

Create testing repository based on existing local repository:
% git clone freebsd-ports freebsd-ports-test

Create (and switch to) "test" branch, which based on 87f9190 commit:
% cd freebsd-ports-test && git checkout 87f9190 -b test

Create "first_test_commits.txt" file with abbreviated hashes for commits:
% git log --pretty=%h > ../first_test_commits.txt

Change current directory to parent directory:
% cd ..

Create "test.sh" file with following contents:
% cat test.sh
#!/bin/sh
cd freebsd-ports-test
touch test.txt
git add test.txt
for a in $(seq 1 256); do
	(strings -n 1 < /dev/urandom | tr -d '[:space:]' | head -c80; echo) >> test.txt;
	git commit -am "$a change";
done;

# The (modified) random example is from http://arp242.net/weblog/Generate_passwords_from_the_commandline.html

Make it runnable:
% chmod +x test.sh

And run (with some waiting time):
% ./test.sh

The output for first and last commit might look like as follows (but random):
[test fffaaf9] 1 change
 1 file changed, 1 insertion(+)
<cut>
[test 2a31b94] 256 change
 1 file changed, 1 insertion(+)

Create "second_test_commits.txt" file with abbreviated hashes for commits:
% cd freebsd-ports-test && git log --pretty=%h > ../second_test_commits.txt

Compare first and second files with commits:
% cd .. && diff -u0 first_test_commits.txt second_test_commits.txt | tail -n 10
+45e3d04
+6cffc99
+d969db1
+af22dd6
+680e0d4
+882d6a4
+fffaaf9
@@ -114164 +114420 @@
-46be8d0
+46be8d0d

This shows, that the abbreviated commit 46be8d0 hash changed to 46be8d0d:
https://github.com/freebsd/freebsd-ports/commits/46be8d0d

In conclusion, even previous abbreviated commit hash might change with changes to repository.

Sorry for long comment.
Comment 25 lightside 2015-03-29 02:36:00 UTC
Created attachment 154942 [details]
Include file with GitHub (frontend) usage example

I changed include file for GitHub frontend, based on conclusions in comment #24.
It's possible to use abbreviated (full) commit hash for "tarball" and "zipball" methods, while use 7 alphanumeric commit for WRKSRC:
_GH_COMMIT_LEGACY=	${GH_COMMIT:C/^(.{7}).*/\1/}
WRKSRC?=	${WRKDIR}/${GH_ACCOUNT}-${GH_PROJECT}-${_GH_COMMIT_LEGACY}
Comment 26 lightside 2015-03-29 02:37:10 UTC
Created attachment 154943 [details]
Include file with GitHub (codeload backend) usage example

Also I created include file for GitHub (codeload) backend. Just for example.
Comment 27 lightside 2015-03-29 19:43:48 UTC
(In reply to comment #23)
> And there were issues in the past:
> https://status.github.com/api/messages.json

There is other link for a browser:
https://status.github.com/messages

For example, based on status messages, there are actions against DDoS attack(s) today:
https://status.github.com/messages/2015-03-29