lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Lucene/Solr git mirror will soon turn off
Date Wed, 16 Dec 2015 22:45:07 GMT
I filed LUCENE-6937 as a parent issue for an SVN->Git migration. I've
linked the issue that Dawid is working on, as well as a new issue for
converting the build to work correctly in a Git checkout rather than SVN.

- Mark

On Tue, Dec 15, 2015 at 1:26 PM Mark Miller <markrmiller@gmail.com> wrote:

> Let's just make some JIRA issues. I'm not worried about volunteers for any
> of it yet, just a direction we agree upon. Once we know where we are going,
> we generally don't have a big volunteer problem. We haven't heard from Uwe
> yet, but it really does seem like moving to Git makes the most sense.
>
> I'm certainly willing to spend some free time on this.
>
> - Mark
>
> On Tue, Dec 15, 2015 at 1:22 PM Dawid Weiss <dawid.weiss@gmail.com> wrote:
>
>>
>> Oh, just for completeness -- moving to git is not just about the version
>> management, it's also:
>>
>> 1) all the scripts that currently do validations, etc.
>> 2) what to do with svn:* properties
>> 3) what to do with empty folders (not available in git).
>>
>> I don't volunteer to solve these :)
>>
>> Dawid
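A minimal sketch for point 3 above, not something proposed in the thread. Git tracks files rather than directories, so an empty folder only survives a conversion if it holds at least one file. The `.gitkeep` name is a common convention rather than a git feature, and `add_placeholders` is a hypothetical helper name.

```python
import os
from pathlib import Path


def add_placeholders(root, name=".gitkeep"):
    """Create an empty placeholder file in every empty directory under root.

    Returns the sorted list of placeholder paths that were created.
    The placeholder name is an assumption; git gives it no special meaning.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        # A directory with no files and no subdirectories would be lost in git.
        if not dirnames and not filenames:
            placeholder = Path(dirpath) / name
            placeholder.touch()
            created.append(placeholder)
    return sorted(created)
```

Whether the build actually depends on any empty folders would have to be checked first; if it does not, simply dropping them is the other option.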
>>
>>
>> On Tue, Dec 15, 2015 at 7:09 PM, Dawid Weiss <dawid.weiss@gmail.com>
>> wrote:
>>
>>>
>>> Ok, give me some time and I'll see what I can achieve. Now that I
>>> actually wrote an SVN dump parser (validator and serializer) things are
>>> under much better control...
>>>
>>> I'll try to achieve the following:
>>>
>>> 1) selectively drop unnecessary stuff from history (cms/, javadocs/,
>>> JARs and perhaps other binaries),
>>> 2) *preserve* history of all core sources. So svn log IndexWriter has to
>>> go all the way back to when Doug was young and pretty. Ooops, he's
>>> still pretty of course.
>>> 3) provide a way to link git history with svn revisions. I would,
>>> ideally, include an "imported from svn:rev XXX" in the commit log message.
>>> 4) annotate release tags and branches. I don't care much about interim
>>> branches -- they are not important to me (please speak up if you think
>>> otherwise).
>>>
>>> Dawid
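Goals 1 and 3 above can be sketched roughly as follows. This is an illustration under assumptions, not Dawid's converter: the helper names are hypothetical, treating `cms` and `javadocs` as droppable wherever they appear in a path is a guess, and only the "imported from svn:rev XXX" wording comes from the message.

```python
import re

# Assumption: these directory names are dropped at any depth in the tree.
DROP_DIRS = {"cms", "javadocs"}
# Assumption: JARs are recognised by file extension only.
DROP_SUFFIXES = (".jar",)

SVN_REV = re.compile(r"^imported from svn:rev (\d+)$", re.MULTILINE)


def keep_path(path):
    """Return True if a path from the SVN history should be kept."""
    parts = path.strip("/").split("/")
    if any(part in DROP_DIRS for part in parts):
        return False
    return not parts[-1].lower().endswith(DROP_SUFFIXES)


def tag_message(message, revision):
    """Append a trailer line linking a git commit to its SVN revision."""
    return f"{message.rstrip()}\n\nimported from svn:rev {revision}\n"


def svn_revision(message):
    """Read the SVN revision back out of a commit message, or None."""
    match = SVN_REV.search(message)
    return int(match.group(1)) if match else None
```

Keeping the revision on a line of its own makes it easy to recover later, for example to map an old "r712381" reference in JIRA onto a git commit.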
>>>
>>> On Tue, Dec 15, 2015 at 7:03 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>
>>>> If Dawid is volunteering to sort out this mess, +1 to let him make it
>>>> a move to git. I don't care if we disagree about JARs, I trust he will
>>>> do a good job and that is more important.
>>>>
>>>> On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <dawid.weiss@gmail.com>
>>>> wrote:
>>>> >
>>>> > It's not true that nobody is working on this. I have been working on
>>>> > the SVN dump in the meantime. You would not believe how incredibly
>>>> > complex processing that (remote) dump is. Let me highlight a few key
>>>> > issues:
>>>> >
>>>> > 1) There is no "one" Lucene SVN repository that can be transferred to
>>>> > git. The history is a mess. Trunk, branches, tags -- all change paths
>>>> > at various points in history. Entire projects are copied from
>>>> > *outside* the official Lucene ASF path (when Solr, Nutch or Tika moved
>>>> > from the incubator, for example).
>>>> >
>>>> > 2) The history of commits to Lucene's subpath of the SVN is ~50k
>>>> > commits. ASF's commit history in which those 50k commits live is 1.8
>>>> > *million* commits. I think the git-svn sync crashes due to the sheer
>>>> > number of (empty) commits in between actual changes.
>>>> >
>>>> > 3) There are a few commits that are gigantic. I mentioned Grant's 1.2G
>>>> > patch, for example, but there are others (the second largest is 190
>>>> > megs, the third is 136 megs).
>>>> >
>>>> > 4) The size of JARs is really not an issue. The entire SVN repo I
>>>> > mirrored locally (including empty interim commits to cater for
>>>> > svn:mergeinfos) is 4G. If you strip the stuff like javadocs and side
>>>> > projects (Nutch, Tika, Mahout) then I bet the entire history can fit
>>>> > in 1G total. Of course stripping JARs is also doable.
>>>> >
>>>> > 5) There is lots of junk at the main SVN path so you can't just
>>>> > version the top-level folder. If you wanted to checkout /asf/lucene
>>>> > then the size of the resulting folder is enormous -- I terminated the
>>>> > checkout after I reached over 20 gigs. Well, technically you *could*
>>>> > do it, it'd preserve perfect history, but I wouldn't want to git co a
>>>> > past version that checks out all the tags, branches, etc. This has to
>>>> > be mapped in a sensible way.
>>>> >
>>>> > What I think is that all the above makes (straightforward) conversion
>>>> > to git problematic. Especially moving paths are a problem -- how to
>>>> > mark tags/branches, where the main line of development is, etc. This
>>>> > conversion would have to be guided and hand-tuned to make sense. This
>>>> > effort would only pay for itself if we move to git, otherwise I don't
>>>> > see the benefit. Paul's script is fine for keeping short-term history.
>>>> >
>>>> > Dawid
>>>> >
>>>> > P.S. Either the SVN repo at Apache is broken or the SVN is broken,
>>>> > which makes processing SVN history even more fun. This dump indicates
>>>> > Tika being moved from the incubator to Lucene:
>>>> >
>>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/ > out
>>>> >
>>>> > But when you dump just Lucene's subpath, the output is broken (last
>>>> > changeset in the file is an invalid changeset, it carries no target):
>>>> >
>>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/lucene > out
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <yseeley@gmail.com> wrote:
>>>> >>
>>>> >> If we move to git, stripping out jars seems to be an independent
>>>> >> decision?
>>>> >> Can you even strip out jars and preserve history (i.e. not change
>>>> >> hashes and invalidate everyone's forks/clones)?
>>>> >> I did run across this:
>>>> >>
>>>> >> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>>>> >>
>>>> >> -Yonik
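On the hash question above, a small sketch of why removing a file from an old commit cannot leave later hashes untouched. Git names every object by the SHA-1 of a `<type> <size>\0` header followed by the object body, and a commit body includes its tree id and parent ids. The commit body below is simplified (real commits also carry author and committer lines), the tree ids are stand-ins computed from raw bytes, and the function names are hypothetical.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' + body, the way git names its objects."""
    header = kind + b" %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


def toy_commit_id(tree_id, parent_id, message):
    """Id of a simplified commit object (no author/committer lines)."""
    lines = [f"tree {tree_id}"]
    if parent_id:
        lines.append(f"parent {parent_id}")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_object_id(b"commit", body.encode())


# Stand-in tree ids: one history with a JAR in the first commit, one without.
tree_with_jar = git_object_id(b"blob", b"sources + lib.jar")
tree_without_jar = git_object_id(b"blob", b"sources")
tree_later = git_object_id(b"blob", b"later sources")

first = toy_commit_id(tree_with_jar, None, "initial import")
second = toy_commit_id(tree_later, first, "later change")

first_stripped = toy_commit_id(tree_without_jar, None, "initial import")
second_stripped = toy_commit_id(tree_later, first_stripped, "later change")

# The second commit's own content is identical in both histories, yet its
# id differs because the id of its parent changed.
print(second != second_stripped)
```

So stripping JARs from past commits is a history rewrite: every commit from the first modified one onwards gets a new id, and existing forks and clones no longer share those commits.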
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>>>> >>
>>>> >
>>>>
>>>>
>>>>
>>>
>> --
> - Mark
> about.me/markrmiller
>
-- 
- Mark
about.me/markrmiller
