Bug 215968 - svn.freebsd.org and repo.freebsd.org have different metadata on some commits
Status: Closed Overcome By Events
Alias: None
Product: Services
Classification: Unclassified
Component: Core Infrastructure
Version: unspecified
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Cluster Admin
Reported: 2017-01-11 14:44 UTC by Ed Maste
Modified: 2021-08-26 13:59 UTC
Description Ed Maste freebsd_committer freebsd_triage 2017-01-11 14:44:27 UTC
% svn log -r 277323 svn://svn.freebsd.org/base       
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 16:17:45 +0000 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

------------------------------------------------------------------------
% host svn.freebsd.org
svn.freebsd.org is an alias for svnmir.geo.freebsd.org.
svnmir.geo.freebsd.org has address 96.47.72.69
svnmir.geo.freebsd.org has IPv6 address 2610:1c1:1:606c::e6a:0
svnmir.geo.freebsd.org mail is handled by 0 .


% svn log -r 277323 svn+ssh://repo.freebsd.org/base
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 16:17:27 +0000 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

------------------------------------------------------------------------
Comment 1 Ed Maste freebsd_committer freebsd_triage 2017-02-21 02:35:06 UTC
As an aside, the existing FreeBSD svn2git mirror has the correct timestamp:

master: https://github.com/freebsd/freebsd/commit/081af4da16b9046c019ca40f64b1fb7ee8c6dca1

commit 081af4da16b9046c019ca40f64b1fb7ee8c6dca1                                 
Author: dim <dim@FreeBSD.org>                                                   
Date:   Sun Jan 18 16:17:27 2015 +0000                                          

Which implies that svn2git's repo was mirrored from repo.freebsd.org, and that anyone reproducing the git conversion must run against repo.freebsd.org, not svn.freebsd.org.
Comment 2 Allan Jude freebsd_committer freebsd_triage 2017-03-01 17:27:20 UTC
I looked into this a little further; it seems only the NYI (east coast) svn mirror is out of sync.

Yahoo, Bytemark, Yandex, etc. are in sync.

%svn log -r 277323 svn://svnmir.ysv.freebsd.org/base
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 11:17:27 -0500 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

------------------------------------------------------------------------
%svn log -r 277323 svn://svnmir.nyi.freebsd.org/base
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 11:17:45 -0500 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

------------------------------------------------------------------------
%svn log -r 277323 svn://svnmir.bme.freebsd.org/base
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 11:17:27 -0500 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

------------------------------------------------------------------------
%svn log -r 277323 svn://svnmir.twn.freebsd.org/base
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 11:17:27 -0500 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

%svn log -r 277323 svn://svnmir.ydx.freebsd.org/base
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 11:17:27 -0500 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

------------------------------------------------------------------------
%svn log -r 277323 svn+ssh://repo.freebsd.org/base
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 11:17:27 -0500 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

------------------------------------------------------------------------
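
The manual per-mirror comparison above can be scripted. The following is a minimal sketch, not a tool used by the cluster admins: it reads the raw svn:date revision property (stored in UTC, so the local time zone does not matter) from a reference repository and from each mirror, and reports any mismatch. The function names are hypothetical; the URLs and revision number are the ones quoted in this report, and svn+ssh access to repo.freebsd.org is assumed to be available to the person running it.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, URLs are taken from this report.

REV=277323
REFERENCE=svn+ssh://repo.freebsd.org/base
MIRRORS="svn://svnmir.ysv.freebsd.org/base
svn://svnmir.nyi.freebsd.org/base
svn://svnmir.bme.freebsd.org/base
svn://svnmir.twn.freebsd.org/base
svn://svnmir.ydx.freebsd.org/base"

# Print the svn:date revision property of revision $2 in repository $1.
rev_date() {
    svn propget --revprop -r "$2" svn:date "$1"
}

# Compare an expected and an actual date string for the mirror named $1.
# Prints one verdict line and returns non-zero on mismatch.
compare_dates() {
    if [ "$2" = "$3" ]; then
        printf 'OK %s\n' "$1"
    else
        printf 'MISMATCH %s (expected %s, got %s)\n' "$1" "$2" "$3"
        return 1
    fi
}

# Check every mirror against the reference; returns 1 if any mirror differs.
check_mirrors() {
    expected=$(rev_date "$REFERENCE" "$REV") || return 2
    rc=0
    for url in $MIRRORS; do
        actual=$(rev_date "$url" "$REV") || actual="(unavailable)"
        compare_dates "$url" "$expected" "$actual" || rc=1
    done
    return $rc
}
```

Nothing runs until `check_mirrors` is called, so the file can be sourced and the comparison repeated for other revisions by changing `REV`.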
Comment 3 Ed Maste freebsd_committer freebsd_triage 2017-03-01 17:36:47 UTC
Thanks Allan! I guess these are GeoDNS and so I was always getting the bad one. Would it be feasible to temporarily remove NYI from DNS, re-mirror, and re-add?
Comment 4 Peter Wemm freebsd_committer freebsd_triage 2017-03-01 17:45:35 UTC
There are two issues in play.

#1 - the way the mirrors were originally set up, with pull-based svnsync, was vulnerable to a race.  svnsync writes to the destination in two transactions: one with the contents, then a second to make the metadata match.  When we had svnsync->svnsync chained, the downstream svnsync sometimes saw the intermediate state and copied the wrong metadata.  svnsync wasn't intended to be used this way - it was intended to be push based - and this is the consequence.

#2 - an internal problem - people figured out how to cheat and bypass the prevention of unsupported forced commits and this has caused corruption on an internal repository that isn't publicly visible.

I believe #1 has since been fixed and was the cause of the sync error above.  The revision appears to fall in the period when I was last working on it, and this one must have slipped through the cracks.  I have manually resynced the metadata for the revision at hand.

svnmir@svnmir.nyi:/home/svnmir % env HOME=/home/svnmir svnsync copy-revprops -r 277323 file:///home/svn/base
Copied properties for revision 277323.
svnmir@svnmir.nyi:/home/svnmir % 

Resolving #2 is a separate nightmare and an ongoing project.
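
For issue #1, other revisions that picked up wrong metadata during the race window could be located before running copy-revprops on them. The following is a rough sketch under stated assumptions, not the procedure actually used on the mirrors: it relies on `svn log -q` printing one `rN | author | date` line per revision (the format shown in this report), and the function names and temporary-file handling are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any managed script.

# Print the "rN | author | date" lines for a revision range ($2) of a
# repository ($1), with dates forced to UTC so two hosts compare equal.
log_lines() {
    env TZ=UTC svn log -q -r "$2" "$1" | grep '^r[0-9]'
}

# Given two files of such lines (reference first, mirror second), print the
# revisions whose author or date differs, or which the reference lacks.
diff_revs() {
    [ -s "$1" ] || return 2    # an empty reference would hide every mismatch
    awk -F' [|] ' '
        NR == FNR { seen[$1] = $0; next }   # first file: remember each line
        seen[$1] != $0 { print $1 }         # second file: report differences
    ' "$1" "$2"
}
```

A possible use is to save `log_lines` output for the upstream repository and for the local mirror over the suspect revision range, then pass both files to `diff_revs`. Each revision it prints is a candidate for the `svnsync copy-revprops -r` command shown above.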
Comment 5 Peter Wemm freebsd_committer freebsd_triage 2017-03-01 17:47:39 UTC
And a double check from outside:

peter@overcee[ 9:46AM]~-1004> env TZ=UTC svn log -r 277323 svn://svnmir.nyi.freebsd.org/base
------------------------------------------------------------------------
r277323 | dim | 2015-01-18 16:17:27 +0000 (Sun, 18 Jan 2015) | 3 lines

Vendor import of llvm RELEASE_360/rc1 tag r226102 (effectively, 3.6.0 RC1):
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_360/rc1@226102

------------------------------------------------------------------------

This matches the other sites now.
Comment 6 Ed Maste freebsd_committer freebsd_triage 2017-03-01 17:53:34 UTC
Thanks Peter!

> #2 - an internal problem - people figured out how to cheat and bypass the
> prevention of unsupported forced commits and this has caused corruption on an
> internal repository that isn't publicly visible.
>
> Resolving #2 is a separate nightmare and an ongoing project.

Do you know which way this one will be resolved? (i.e., can we treat the now-in-sync public mirrors as authoritative, for the purpose of svn-git mirroring?)
Comment 7 Peter Wemm freebsd_committer freebsd_triage 2017-03-01 18:20:54 UTC
With a huge hand-wavey caveat that this is my current understanding and not fully researched:

It is my understanding that the public mirrors should be correct.

It is my understanding that the svn->git process for git-beta imported incorrect metadata in the past but is no longer doing so.  It has its own private mirror that it is supposed to be treating as the source of truth.  I believe there is a revision in that private mirror that is out of sync with git due to a miscommunication over a fix request, but that mirror can be reverted to match what was imported to git for reproducibility.

I believe that our internal repository does not have old (illegal) forced commits recorded consistently with commit mail, logs and public mirrors.  Assuming it is fixed, the internal repo will be repaired to match the public view and commit logs.

The problem with the internal repository is that the illegal forced commits don't have all the no-op pathnames associated with them and don't appear in the logs for the files they were originally forced onto.
Comment 8 Ed Maste freebsd_committer freebsd_triage 2017-03-01 18:29:20 UTC
> It is my understanding that the public mirrors should be correct.

Great.

> It is my understanding that the svn->git process for git-beta imported incorrect metadata in the past but is no longer doing so.

Correct.

The tentative plan to make it reproducible is to stand up a new SVN mirror for svn2git to use and ensure it has a consistent view, and for some period of time maintain a "legacy" and "ng" git master. If we believe the public mirrors are now correct, we can start working on updating the git mirror any time now, but wait on advocating a migration until we're certain.
Comment 9 Peter Wemm freebsd_committer freebsd_triage 2017-03-01 18:52:37 UTC
The way things are set up can be summarized:

repo --{svnsync}-> svn-master --{svnsync}-> local mirrors:{svnmir.*,git-beta,viewvc,ports*,hg-beta,mkseed,ftp-master,...}

The original problem was that the second fan-out svnsync could see a transient inconsistent view of svn-master, but svn-master itself was never subject to the problem, and, as near as I can tell, wasn't subject to the forced commit problem.

As I understand it, git's svn-fast-export needs a local repository to work from so it needs a local replica.  If I understand things correctly, there's one revision that has metadata that's out of sync and that it should be easy to resolve.
Comment 10 Ed Maste freebsd_committer freebsd_triage 2017-03-02 17:37:41 UTC
> The original problem was the second fan-out svnsync could see a transient
> inconsistent view of svn-master, but svn-master itself was never subject to
> the problem, and, as near as I can tell, wasn't subject to the forced commit
> problem.

OK, to make sure I have this correct, the issue is that svn-master may have a small inconsistent window (where the data, but not the metadata, has been sync'd), but this will never persist on svn-master for more than a moment.

A 2nd-level svnsync could mirror during this window and then persist the inconsistent metadata.

Assuming svn2git's repo uses the dosync.sh script (at
https://svnweb.freebsd.org/base/user/uqs/git_conv/dosync.sh?view=annotate), it will not persist inconsistent metadata.

Finally, as long as the dosync.sh and svn2git processes do not run concurrently, we will not get inconsistent metadata into the git mirror. In other words, something logically equivalent to:

while true; do
    dosync.sh
    svn2git
    sleep
done

would not encounter inconsistent metadata in git.
Comment 11 Ed Maste freebsd_committer freebsd_triage 2017-03-02 17:40:26 UTC
(In reply to Ed Maste from comment #10)

Assuming we change dosync.sh to not loop itself, that is.
Comment 12 Peter Wemm freebsd_committer freebsd_triage 2017-03-02 21:39:31 UTC
(In reply to Ed Maste from comment #10)

Correct - svnsync does a multi-stage replication, for reasons that I don't fully understand.

The first stage replicates the contents and commits the transaction.  This shows up as a commit by the UID of the replicator (not the original author) and the current time rather than the original time.

The second stage fixes the author and timestamp as a separate transaction.

The third stage would be to call post-commit hooks, which would normally push the commit to a downstream mirror, except that we operate in pull rather than push mode to avoid having to do remote execution across the internet.

I think the theory with push based replication is that it won't ever race, but I don't see why a series of rapid fire commits couldn't trigger concurrent asynchronous post-commit callbacks.

Anyway, it looks like uqs captured a snapshot of the managed script we run everywhere, which is auto-updated.  It would be simple enough to add a hook for serializing access via a lockfile.
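
Serializing access via a lock could look roughly like the sketch below. It is an illustration, not the managed script or a proposed patch to it: `LOCKDIR` and the function names are made up, and it uses `mkdir` as the lock primitive because directory creation is atomic. On FreeBSD, lockf(1) would be another option.

```shell
#!/bin/sh
# Sketch only: LOCKDIR and these helpers are hypothetical.

LOCKDIR=${LOCKDIR:-/tmp/svn2git-sync.lock}

# mkdir either creates the directory or fails, so only one caller can win.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR"
}

# Run the given command while holding the lock. If another process holds
# the lock, skip the run and return 75 (EX_TEMPFAIL) instead of waiting.
with_lock() {
    if ! acquire_lock; then
        echo "lock held, skipping: $*" >&2
        return 75
    fi
    rc=0
    "$@" || rc=$?
    release_lock
    return $rc
}
```

With this in place, the sync step and the conversion step can each be started as `with_lock <command>`, so neither runs while the other holds the lock. A process killed while holding the lock leaves the directory behind, so a real hook would also need stale-lock cleanup (for example a `trap`), which is omitted here.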
Comment 13 Baptiste Daroussin freebsd_committer freebsd_triage 2021-08-26 13:59:34 UTC
With the migration to git being over, I think we can safely close this PR.