Re: [eigen] Bitbucket is dropping its Mercurial support!




Hi everybody,

I just came back to work, and this is sad news. It's a real pity that Bitbucket will simply delete the repositories and all the associated information (PRs, forks, discussions) without offering any conversion or archiving options.

Thank you to everyone who has already provided valuable feedback on solutions to the numerous problems we'll have to face.

On my side, I guess this is a good opportunity to please users and occasional contributors by switching to git: though I'm personally much more comfortable with hg, I don't see any mercurial provider as good as bitbucket/gitlab/github.

For the record, I created (and keep updating) the current git mirror [1] using git-remote-hg. The current mirror does not include the original hg hashes as notes, but this can easily be enabled with git-remote-hg. This tool also generates map files in hg/origin/marks-git and hg/origin/marks-hg, so updating links (in commit messages, source files, bugzilla comments, change-log [2]) should be doable.
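A minimal sketch of how the two map files could be joined into a single hg-to-git table. The file layouts are assumptions, not something verified against the current mirror: marks-hg is taken to be JSON with a "marks" object mapping each hg hash to a mark number, and marks-git is taken to be in git fast-import marks format (one ":<mark> <git hash>" per line). The function name load_hg_to_git is hypothetical.

```python
import json


def load_hg_to_git(marks_hg_text, marks_git_text):
    """Join the two marks files on their shared mark number.

    Returns a dict {hg_hash: git_hash}.
    """
    # Assumption: marks-hg is JSON containing {"marks": {hg_hash: mark_number}}.
    hg_marks = json.loads(marks_hg_text)["marks"]

    # Assumption: marks-git holds one ":<mark> <git_hash>" entry per line.
    git_by_mark = {}
    for line in marks_git_text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, git_hash = line.split()
        git_by_mark[int(mark.lstrip(":"))] = git_hash

    # Keep only the hg changesets whose mark also appears on the git side.
    return {
        hg_hash: git_by_mark[mark]
        for hg_hash, mark in hg_marks.items()
        if mark in git_by_mark
    }
```

The resulting dictionary is the kind of table that could be published next to an archived hg clone, so that old hg hashes can still be looked up by hand.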

We can still keep a clone online (e.g. on tuxfamily) together with a map file to ease following deprecated hg hashes that would not have been automatically updated.

I would also suggest archiving all of Eigen's forks that contain changes. If the bitbucket API allows us to list all forks, this will be easy.
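A rough sketch of what listing the forks could look like, assuming the Bitbucket Cloud REST API 2.0 exposes a paginated forks endpoint whose pages carry "values" and "next" fields and whose entries carry "full_name". The URL template, the workspace and slug arguments, and the helper names are assumptions to be checked against the API documentation.

```python
import json
from urllib.request import urlopen

# Assumed endpoint shape; workspace and slug are placeholders supplied by the caller.
FORKS_URL = "https://api.bitbucket.org/2.0/repositories/{workspace}/{slug}/forks"


def fetch_json(url):
    """Download one page and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def list_forks(workspace, slug, fetch=fetch_json):
    """Follow the "next" links and collect the full name of every fork."""
    url = FORKS_URL.format(workspace=workspace, slug=slug)
    forks = []
    while url:
        page = fetch(url)
        forks.extend(repo["full_name"] for repo in page.get("values", []))
        url = page.get("next")  # absent on the last page, which ends the loop
    return forks
```

The fetch parameter is injectable so the pagination logic can be exercised without network access. Deciding which forks actually contain changes would still require cloning each one and comparing it against the main repository.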

Regarding bitbucket vs github vs gitlab, like many others I would vote for gitlab. With neither mercurial support nor PR migration tools, I don't see any reason to stay with bitbucket, and arguments in favor of gitlab have already been made. Moreover, I already have a mirror on Inria's gitlab instance (https://gitlab.inria.fr/guenneba/eigen-mirror) that I use for CI through gitlab-runner. At that time, I also created a team (https://gitlab.inria.fr/eigen), just in case... Would using Inria's instance be fine with you? (I'm strongly against maintaining our own instance; that's just a waste of time and effort.)

If we go with gitlab, perhaps we could even think about migrating our bug tracker from bugzilla to gitlab. There already exist a few migration tools for that:
 - https://github.com/xmunoz/bugzilla2gitlab
 - https://gitlab.freedesktop.org/freedesktop/bztogl
and it's most likely better to make such switches all at once, because this is a unique opportunity to rewrite the history.

So now it's time to close as many PRs as possible ;) (before we archive them of course)

cheers,
Gael

[1] https://github.com/eigenteam/eigen-git-mirror
[2] http://eigen.tuxfamily.org/index.php?title=ChangeLog



On Mon, Aug 26, 2019 at 9:06 PM Brian Silverman <bsilver16384@xxxxxxxxx> wrote:
On Mon, Aug 26, 2019 at 6:05 AM David Tellenbach <david.tellenbach@xxxxxxxxxxxxx> wrote:
I somewhat doubt that any existing hg->git converters automatically translate these hashes, but I'd be very happy if someone finds out otherwise. Changing these manually is definitely not an option.
I might have good news on this one: We are apparently not the only project that is working on migrating from Mercurial to Git. The OpenJDK project (a free implementation of the Java platform) has created Skara, a set of tools to handle all kinds of stuff related to contributing to OpenJDK (https://github.com/openjdk/skara). Some of the tools could be really helpful for our issues (see https://openjdk.java.net/jeps/357).

The relevant tool seems to be git-openjdk-import, which is used to import from Mercurial to Git. I just had a quick glance at the code, but it seems to be very generic and does not seem to contain OpenJDK-related stuff at all. The interesting part is the following paragraph from https://openjdk.java.net/jeps/357

We've also prototyped new tool, git-translate. This tool uses a file called .hgcommits that is generated by the conversion tools and committed to the Git repositories. This file contains a sequence of lines, each of which contains two hexadecimal hashes: the first is the hash of a Mercurial changeset and the second is the hash of the Git commit resulting from converting that Mercurial changeset. The tool git-translate simply queries the file .hgcommits
I've been pondering how to implement something similar as a custom tool. If you can get a mapping of the hashes, then writing something custom around git filter-branch should be straightforward. Updating the mapping as commits are rewritten takes a bit of thought, but I don't think it's hard. Using somebody else's tool might be easier though.
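To illustrate the idea, here is a minimal sketch of reading a two-column mapping file of the kind the JEP describes and using it to translate hash references in a commit message. This is not the git-translate tool itself. The function names, the 12-to-40-character pattern for spotting hashes, and the choice to keep the abbreviation length of the original reference are all assumptions made for the example.

```python
import re


def parse_hgcommits(text):
    """Parse lines of "<hg hash> <git hash>" into a dict {hg_hash: git_hash}."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            hg_hash, git_hash = parts
            mapping[hg_hash] = git_hash
    return mapping


# Assumption: references in messages are lowercase hex, abbreviated to >= 12 chars.
HEX_REF = re.compile(r"\b[0-9a-f]{12,40}\b")


def translate_refs(message, mapping):
    """Replace every unambiguous hg hash (full or abbreviated) in message."""

    def replace(match):
        ref = match.group(0)
        hits = [git for hg, git in mapping.items() if hg.startswith(ref)]
        # Rewrite only when exactly one changeset matches; otherwise leave as-is.
        return hits[0][: len(ref)] if len(hits) == 1 else ref

    return HEX_REF.sub(replace, message)
```

A function like translate_refs could be the body of a message filter run by a history-rewriting tool, provided the mapping is kept up to date as commits are rewritten.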

However, even if we have a translate tool, this is still complicated: changing hashes or links in a commit message alters that commit's git hash again, so the translation becomes wrong for this particular commit. This could be a problem if a commit is referenced by more than one other commit, or if commit A references commit B, which in turn references commit C.

Traversing the commit graph in a topological order and rewriting hashes based on the mapping (updated by past rewrites) seems like it should be fine to me.

I don't see how a commit can refer to a hash of a commit that descends from it, for basically the same reason (putting the hash into commit A changes the hash of its child commit B, so A can't refer to B by hash). I know that's true for git, but I'm not familiar with hg, so I might be missing something about how hashes work there.
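The argument above can be checked on a toy model. The sketch below is not git: commit_id is a stand-in hash that depends only on the parents and the message, the old ids play the role of the hg hashes found in messages, and messages are assumed to reference only ancestors. It visits commits parents-first and extends the mapping after each rewrite, so every later message is translated with final hashes.

```python
import hashlib
from graphlib import TopologicalSorter  # Python 3.9+


def commit_id(parents, message):
    """Toy stand-in for a VCS hash: depends on the parents and the message."""
    payload = "\n".join(list(parents) + [message]).encode()
    return hashlib.sha1(payload).hexdigest()


def rewrite_history(commits):
    """commits: {old_id: (parent_old_ids, message)}, closed under parents.

    Returns (new_commits, mapping) with mapping = {old_id: new_id}.
    """
    mapping = {}
    new_commits = {}
    graph = {old_id: parents for old_id, (parents, _) in commits.items()}
    # static_order() yields every commit after all of its parents.
    for old_id in TopologicalSorter(graph).static_order():
        parents, message = commits[old_id]
        new_parents = [mapping[p] for p in parents]
        # Every ancestor already has its final hash, so this translation is stable.
        for old_ref, new_ref in mapping.items():
            message = message.replace(old_ref, new_ref)
        new_id = commit_id(new_parents, message)
        new_commits[new_id] = (new_parents, message)
        mapping[old_id] = new_id  # later commits will see the updated mapping
    return new_commits, mapping
```

Because a commit can only mention hashes that existed when it was written, the multiply-referenced and chained cases (A references B, which references C) need no special handling in this model.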

On 24/08/2019 12.30, David Tellenbach wrote:

Also, if we stayed with mercurial but used a different provider, we couldn't modify the history, because that would change all the hashes (but then only the 9 direct links to "bitbucket.org/..." you found would be broken, which is acceptable, IMO)

Of course we can just ignore these links (though I think broken links/hashes are even worse than non-existing ones ...)

Another point is the links inside the codebase that point to bitbucket.
Following the same logic as above I use
hg grep "bitbucket.org"
and get 11 links (all seem to be the same). Again something fixable manually.

Agreed, this part is easy to fix manually.
git filter-branch could also fix them all throughout the entire history (just run sed on all the files to rewrite the links). Not sure if rewriting the history is desirable, but it would definitely be easy after they're all in git.

I've used git-remote-hg to import Eigen with git-subtree before, and it worked fine. Looking now, there are more alternatives than I found 4 years ago, including forks of that project, so there are more choices to make. It does have support for putting the Mercurial revisions in Git commit notes, which addresses some of the concerns around recording the mapping.

My two cents about the larger question in this thread: I find git much more familiar to work with as an occasional contributor and debugger. Getting from a diff to a pull request with a VCS I don't use regularly is nontrivial, and Eigen is the only place I've interacted with hg. Being unfamiliar with the VCS is an even bigger barrier to understanding the history of a project than to changing it, and I find myself doing the former a lot more often than actually contributing. Trying to understand what's been cherry-picked ("grafted from" for hg I think?) into various branches, to verify whether fixes for bugs introduced in other commits are present, has been particularly problematic for me with Eigen.

