I somewhat doubt that any existing hg->git converter automatically translates these hashes, but I'd be very happy if someone finds out otherwise. Changing these manually is definitely not an option.
I might have good news on this one: we are apparently not the only project working on a migration from Mercurial to Git. The OpenJDK project (a free implementation of the Java platform) has created Skara, a set of tools that handles all kinds of things related to contributing to OpenJDK (https://github.com/openjdk/skara). Some of these tools could be really helpful for our issues (see https://openjdk.java.net/jeps/357). The relevant tool seems to be git-openjdk-import, which is used to import from Mercurial to Git. I only had a short glance at the code, but it seems to be very generic and does not seem to contain any OpenJDK-related stuff at all. The interesting part is the following paragraph from https://openjdk.java.net/jeps/357:
We've also prototyped new tool, git-translate. This tool uses a file called .hgcommits that is generated by the conversion tools and committed to the Git repositories. This file contains a sequence of lines, each of which contains two hexadecimal hashes: the first is the hash of a Mercurial changeset and the second is the hash of the Git commit resulting from converting that Mercurial changeset. The tool git-translate simply queries the file
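The lookup described in that paragraph could be sketched roughly as follows. This is only an illustration, not Skara's implementation: it assumes each line holds the two hashes separated by whitespace, and the function names `parse_hgcommits` and `translate` are hypothetical.

```python
from typing import Dict, Iterable, Optional


def parse_hgcommits(lines: Iterable[str]) -> Dict[str, str]:
    """Build a Mercurial-hash -> Git-hash mapping from '<hg> <git>' lines.

    Assumption: two whitespace-separated hex hashes per line.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        hg_hash, git_hash = parts
        mapping[hg_hash.lower()] = git_hash.lower()
    return mapping


def translate(mapping: Dict[str, str], hg_ref: str) -> Optional[str]:
    """Return the Git hash for a full or abbreviated Mercurial hash.

    Returns None if the reference is unknown or the abbreviation is ambiguous.
    """
    hg_ref = hg_ref.lower()
    if hg_ref in mapping:
        return mapping[hg_ref]
    # Fall back to prefix matching for abbreviated hashes.
    matches = {git for hg, git in mapping.items() if hg.startswith(hg_ref)}
    return matches.pop() if len(matches) == 1 else None
```

Prefix matching is included because commit messages often cite abbreviated hashes; an ambiguous prefix deliberately yields no translation rather than a guess.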
I've been pondering how to implement something similar as a custom tool. If you can get a mapping of the hashes, then writing something custom around git filter-branch should be straightforward. Updating the mapping as commits are rewritten takes a bit of thought, but I don't think it's hard. Using somebody else's tool might be easier though.
However, even if we have a translation tool, this is still complicated: changing hashes or links in a commit in turn alters that commit's git hash, so the translation is then wrong for this particular commit. This could be a problem if a commit is referenced by more than one other commit, or if commit A references commit B, which references commit C.
Traversing the commit graph in a topological order and rewriting hashes based on the mapping (updated by past rewrites) seems like it should be fine to me.
I don't see how a commit can refer to a hash of a commit that descends from it, for basically the same reason (putting the hash into commit A changes the hash of its child commit B, so A can't refer to B by hash). I know that's true for git, but I'm not familiar with hg, so I might be missing something about how hashes work there.
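A rough sketch of that traversal, under several assumptions: the history is modelled as a plain dictionary rather than a real repository, `fake_hash` is only a stand-in for how Git derives a commit hash, references are full-length hashes, and all names are hypothetical. Referenced commits are treated as extra dependencies, so a commit cited from another branch is also rewritten before the commit that cites it.

```python
import hashlib
import re
from graphlib import TopologicalSorter  # Python 3.9+

HASH_RE = re.compile(r"\b[0-9a-f]{12,40}\b")


def fake_hash(parents, message):
    """Stand-in for a Git commit hash: depends only on parents and message."""
    data = "\n".join(list(parents) + [message]).encode()
    return hashlib.sha1(data).hexdigest()


def rewrite_history(commits):
    """Rewrite hash references in messages, oldest commits first.

    commits: {old_hash: (list_of_parent_old_hashes, message)}; every parent
    must itself be a key. Returns (old -> new mapping, rewritten commits).
    """
    # A commit depends on its parents and on every commit its message cites.
    deps = {}
    for old, (parents, message) in commits.items():
        refs = {h for h in HASH_RE.findall(message) if h in commits and h != old}
        deps[old] = set(parents) | refs

    mapping = {}
    rewritten = {}
    for old in TopologicalSorter(deps).static_order():
        parents, message = commits[old]
        new_parents = [mapping[p] for p in parents]
        # Everything this commit cites has already been rewritten, so the
        # mapping holds the final hash; unknown hashes are left untouched.
        new_message = HASH_RE.sub(
            lambda m: mapping.get(m.group(0), m.group(0)), message
        )
        new = fake_hash(new_parents, new_message)
        mapping[old] = new
        rewritten[new] = (new_parents, new_message)
    return mapping, rewritten
```

Because each commit is rewritten exactly once and only after everything it points at, the chain "A references B references C" needs no second pass: C is finalised first, then B, then A.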
On 24/08/2019 12.30, David Tellenbach wrote:
Also, if we stayed with mercurial, but used a different provider, we can't modify the history, because that would influence all the hashes (but then only the 9 direct links to "bitbucket.org/..." you found would be broken, which is acceptable, IMO)
Of course we can just ignore these links (though I think broken links/hashes are even worse than non-existing ones ...)
Another point is the links inside the codebase that point to bitbucket.
Following the same logic as above, I use hg grep "bitbucket.org" and get 11 links (all seem to be the same). Again, something fixable manually.
Agreed, this part is easy to fix manually.
We could also fix them all throughout the entire history (just run sed on all the files to rewrite the links). Not sure if rewriting the history is desirable, but it would definitely be easy after they're all in git.
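As a small illustration of the kind of substitution meant here, the sketch below swaps the bitbucket.org prefix of a link for a new one. It assumes the rest of the path stays valid on the new host, which may well not hold; `rewrite_links` and the `new_prefix` value are hypothetical, since the new location has not been decided.

```python
import re

# Matches the scheme and host part of a Bitbucket link.
OLD_PREFIX_RE = re.compile(r"https?://(?:www\.)?bitbucket\.org/")


def rewrite_links(text: str, new_prefix: str) -> str:
    """Replace every bitbucket.org URL prefix in text with new_prefix.

    Assumption: the path after the host is the same on the new host.
    """
    # A function replacement keeps backslashes in new_prefix literal.
    return OLD_PREFIX_RE.sub(lambda _match: new_prefix, text)
```

Applied to each file's contents at every commit, this would do the same job as the sed pass mentioned above.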
I've used git-remote-hg to import Eigen with git-subtree before, and it worked fine. Looking now, there are more alternatives than I found 4 years ago, including forks of that project, so there are more choices to make. It does have support for putting the Mercurial revisions in Git commit notes, which addresses some of the concerns around recording the mapping.
My two cents about the larger question in this thread: I find git much more familiar to work with as an occasional contributor and debugger. Getting from a diff to a pull request with a VCS I don't use regularly is nontrivial, and Eigen is the only place I've interacted with hg. Being unfamiliar with the VCS is an even bigger barrier to understanding the history of a project than to changing it, and I find myself doing the former a lot more often than actually contributing. Trying to understand what's been cherry-picked ("grafted from" for hg I think?) into various branches, to verify whether fixes for bugs introduced in other commits are present, has been particularly problematic for me with Eigen.