Re: [eigen] Bitbucket migration



On 11/09/2019 18.03, Gael Guennebaud wrote:
To prepare the migration from Bitbucket, I started to play a bit with its
API to see what could be done. So far I've quickly drafted two (ugly) Python
scripts to archive the forks and pull requests. Since this is a one-shot
job for us, I did not care about robustness, safety, generality, beauty, etc.

You can see them here: https://gitlab.com/ggael/bitbucket-migration-tools
Feel free to contribute!

Wow, that's great!!


** Forks **

You can see the summary of the fork script here:
http://manao.inria.fr/eigen_tmp/archive_forks_log.html

The hg clones (history+checkout) represent 20GB, maybe 12GB if we remove
the checkouts. Among the 460 forks, 214 seem to have no changes at all
(according to "hg out") and could be dropped. I don't know yet where to
host them though.
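
As an illustration of the "hg out" check mentioned above, here is a minimal Python sketch. It is not taken from the migration scripts; the function names, the directory layout, and the injectable `run` parameter are assumptions made for this example. It relies only on the documented exit status of "hg outgoing" (0 if there are outgoing changes, 1 if there are none).

```python
import subprocess


def has_outgoing_changes(fork_dir, upstream, run=subprocess.run):
    """Return True if the clone in fork_dir holds changesets missing from upstream."""
    result = run(
        ["hg", "outgoing", "--quiet", upstream],
        cwd=fork_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # "hg outgoing" exits with 0 if there are outgoing changes and 1 if there are none.
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"hg outgoing failed in {fork_dir} (exit code {result.returncode})")


def split_forks(fork_dirs, upstream, run=subprocess.run):
    """Partition fork clones into (changed, unchanged) lists."""
    changed, unchanged = [], []
    for fork_dir in fork_dirs:
        target = changed if has_outgoing_changes(fork_dir, upstream, run) else unchanged
        target.append(fork_dir)
    return changed, unchanged
```

The `run` parameter exists only so that the check can be exercised without a Mercurial installation; in normal use the default `subprocess.run` would be kept.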

I think it is sufficient if we host a (frozen) version of the main repository (probably simply at tuxfamily). I think many forks don't have changes, because they actually got merged. And I don't think we need to host any of the forks, as long as we have the changesets in the archived PRs.


This script can be run incrementally.


** Pull-Requests **

You can find the output of the pull-requests script here:
http://manao.inria.fr/eigen_tmp/pullrequests/

There is a short summary, and then for each PR a static .html file plus
diff/patch files, and other details. For instance, see:
http://manao.inria.fr/eigen_tmp/pullrequests/OPEN/686/pr686.html

Currently this script cannot be run incrementally. You have to run it just
before closing the respective repository!

Also, this script does not grab inline comments. Only the main discussion
is archived. The inline comments could be obtained by iterating over the
"activity" pages, but I don't think that's worth the effort because they
would be difficult to exploit anyway.
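
For reference, iterating over the "activity" pages could look roughly like the sketch below. It assumes the usual Bitbucket Cloud 2.0 pagination format (a "values" list plus an optional "next" URL per page) and assumes that inline comments appear as "comment" entries carrying an "inline" field. The helper names are hypothetical and not part of the migration scripts.

```python
import json
import urllib.request


def fetch_json(url):
    """Download one page and decode it as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_paginated(first_url, fetch=fetch_json):
    """Yield every entry of a paginated listing, following the "next" links."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("values", [])
        url = page.get("next")  # absent on the last page, which ends the loop


def inline_comments(activity_entries):
    """Keep only the comments attached to a file or line (assumed "inline" field)."""
    for entry in activity_entries:
        comment = entry.get("comment")
        if comment and "inline" in comment:
            yield comment
```

The `fetch` argument is injectable so the pagination logic can be tried on canned pages without network access. The starting URL would presumably be the activity endpoint of a given pull request, which is not spelled out here.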

Does GitLab allow mapping "[Bb]ug (\d+)" to (an archive of) our Bugzilla page and "\#(\d+)" to either the archived old PRs or new PRs/issues? (If need be, we could manually create issues #1 to #7xx with just a link to the archived PRs.)
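
Independently of what GitLab itself supports, the mapping could also be applied as a preprocessing step on the archived text. The following Python sketch only illustrates the two patterns quoted above. Both URL templates are placeholders (hypothetical), and appending the link in angle brackets is an arbitrary choice for this example.

```python
import re

# Hypothetical targets; the real archive locations are not decided in this thread.
BUG_URL = "https://bugs.example.org/show_bug.cgi?id={}"
PR_ARCHIVE_URL = "https://archive.example.org/pullrequests/{}"

BUG_RE = re.compile(r"\b[Bb]ug (\d+)")
PR_RE = re.compile(r"(?<!\w)#(\d+)")


def link_references(text):
    """Append a link after every "bug N" and "#N" reference found in text."""
    text = BUG_RE.sub(
        lambda m: f"{m.group(0)} <{BUG_URL.format(m.group(1))}>", text
    )
    text = PR_RE.sub(
        lambda m: f"{m.group(0)} <{PR_ARCHIVE_URL.format(m.group(1))}>", text
    )
    return text


print(link_references("Fixes bug 1234, see also #686"))
```

Note that a reference written as "bug #12" would be treated as a PR number by this sketch, so such variants would need their own rule.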

Did we actually plan to migrate Bugzilla to GitLab issues as well?
Would we do this by just creating new issues with a link to the bz-archive? (This would be only slightly inconvenient if discussions are split, but that's ok, IMO)

** hg to git **

As discussed in the other thread, if we switch from hg to git, then all
hashes will have to be updated. Generating a map file is easy, and thus
updating the links/hashes in bug comments and PR comments should not be too
difficult (we only have to figure out the right regex to catch all
variants).

I guess searching for hex numbers of a minimal length should be sufficient (with a plausibility check that each one identifies a unique hg hash). Of course any "https://bitbucket.org/eigen/eigen/commits/" link or equivalent needs to be translated as well (not completely trivial, but not too difficult, either).
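
A rough Python sketch of that idea follows, under several assumptions: the map is available as a dict from full hg hashes to full git hashes, 7 hex digits is taken as the minimal length, and the new commit URL prefix is a placeholder (hypothetical). A candidate is only rewritten if it is a prefix of exactly one known hg hash; everything else is left untouched.

```python
import re

MIN_LEN = 7  # assumed minimal length for a hex string to count as a hash
NEW_COMMIT_PREFIX = "https://gitlab.example.org/eigen/-/commit/"  # hypothetical

# Optional Bitbucket commit URL prefix, followed by a hex string.
REF_RE = re.compile(
    r"(https://bitbucket\.org/eigen/eigen/commits/)?\b([0-9a-f]{%d,40})\b" % MIN_LEN
)


def resolve(prefix, hg_to_git):
    """Return the git hash if prefix identifies exactly one hg hash, else None."""
    matches = [hg for hg in hg_to_git if hg.startswith(prefix)]
    return hg_to_git[matches[0]] if len(matches) == 1 else None


def translate_hashes(text, hg_to_git):
    """Replace plausible hg hashes (and Bitbucket commit links) in text."""
    def replace(match):
        git_hash = resolve(match.group(2), hg_to_git)
        if git_hash is None:
            return match.group(0)  # not a unique hg hash: keep the text as it is
        if match.group(1):
            return NEW_COMMIT_PREFIX + git_hash
        return git_hash
    return REF_RE.sub(replace, text)
```

The same `resolve` helper could serve as the core of an on-demand lookup tool. The linear scan is slow for many lookups, so a sorted list or prefix index would be preferable for a whole archive.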



However, updating the hashes within the commit messages will require
rewriting the whole history in a careful order. Does anyone here feel brave
enough to write such a script? If not, I guess we could live with an online
PHP script doing the hash conversion on demand. I don't think we'll have to
follow such hashes so frequently.

I agree that it won't be required that often (most "grafted from" references are very close to each other anyway). If we have some tool which can look up hashes, I think we are fine. (I won't prevent anyone from trying to translate the hashes inside the history, of course *g*)


Christoph




cheers,
gael


--
 Dr.-Ing. Christoph Hertzberg

 Visiting address of the branch office:
 DFKI GmbH
 Robotics Innovation Center
 Robert-Hooke-Straße 5
 28359 Bremen, Germany

 Postal address of the main office, Bremen site:
 DFKI GmbH
 Robotics Innovation Center
 Robert-Hooke-Straße 1
 28359 Bremen, Germany

 Tel.:     +49 421 178 45-4021
 Switchboard: +49 421 178 45-0
 E-Mail:   christoph.hertzberg@xxxxxxx

 Further information: http://www.dfki.de/robotik
  -------------------------------------------------------------
  Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
  Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany

  Management:
  Prof. Dr. Jana Koehler (Chair)
  Dr. Walter Olthoff

  Chairman of the Supervisory Board:
  Prof. Dr. h.c. Hans A. Aukes
  Amtsgericht Kaiserslautern, HRB 2313
  -------------------------------------------------------------



