lucene-dev mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Lucene/Solr git mirror will soon turn off
Date Fri, 18 Dec 2015 17:05:12 GMT
I've made some comments about the conversion process here:
https://issues.apache.org/jira/browse/LUCENE-6933?focusedCommentId=15064208#comment-15064208

Feel free to try it out.
https://github.com/dweiss/lucene-solr-svn2git

I don't know what the next steps are. This looks like a good starting point
to switch over to git with all the development? The only thing I still plan
on doing is getting rid of a few large binary blobs in historical
resources, but even without it this seems acceptable size-wise (~200mb).
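
As a rough sketch (not taken from the conversion scripts linked above), one
way to find the large binary blobs in a converted repository is to list every
object reachable from any ref and sort the blobs by size. The function names
`parse_batch_check` and `largest_blobs` and the repository path are
hypothetical; the sketch assumes `git` is on the PATH and uses only
`git rev-list --objects --all` and `git cat-file --batch-check`.

```python
import subprocess
from typing import List, Tuple


def parse_batch_check(output: str) -> List[Tuple[int, str]]:
    """Parse '<type> <sha> <size> <path>' lines; return (size, path) for blobs, largest first."""
    blobs = []
    for line in output.splitlines():
        # Split into at most 4 fields so paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue  # skip commits, trees, tags and malformed lines
        blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)


def largest_blobs(repo: str, top: int = 10) -> List[Tuple[int, str]]:
    """Return the `top` largest blobs in the history of the repository at `repo` (hypothetical helper)."""
    # List every object reachable from any ref, as '<sha> <path>' lines.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path back.
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(checked)[:top]
```

Calling `largest_blobs("path/to/clone")` on a local clone would give candidates
to strip; the sizes reported are uncompressed object sizes, so they overstate
what each blob contributes to the packed repository.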

Dawid



On Thu, Dec 17, 2015 at 9:13 AM, Dawid Weiss <dawid.weiss@gmail.com> wrote:

>
> > The question I had (I am sure a very dumb one): WHY do we care about history
> preserved perfectly in Git?
>
> For me it's for sentimental, archival and task-challenge reasons. Robert's
> requirement is that git praise/blame/log works on a given file and
> shows its true history of changes. Everyone has his own reasons I guess. If
> the initial clone is small enough then I see no problem in keeping the
> history if we can preserve it.
>
> Dawid
>
>
>
> On Thu, Dec 17, 2015 at 4:52 AM, david.w.smiley@gmail.com <
> david.w.smiley@gmail.com> wrote:
>
>> +1 totally agree.  Anyway, the bloat should largely be the binaries &
>> unrelated projects, not code (small text files).
>>
>> On Wed, Dec 16, 2015 at 10:36 PM Doug Turnbull <
>> dturnbull@opensourceconnections.com> wrote:
>>
>>> In defense of more history immediately available: it is often far more
>>> useful to poke around code history/run blame to figure out some code than
>>> to take it at face value. Putting this in a secondary place like the
>>> Apache SVN repo IMO reduces the readability of the code itself. This is
>>> doubly true for new developers who won't know about Apache's SVN. And
>>> Lucene can be quite intricate code. Further, in my own work poking around in
>>> github mirrors I frequently hit the current cutoff, which is one reason I
>>> stopped using them for anything but casual investigation.
>>>
>>> I'm not totally against a cutoff point, but I'd advocate for exhausting
>>> other options first, such as trimming out unrelated projects, binaries, etc.
>>>
>>> -Doug
>>>
>>>
>>> On Wednesday, December 16, 2015, Shawn Heisey <apache@elyograg.org>
>>> wrote:
>>>
>>>> On 12/16/2015 5:53 PM, Alexandre Rafalovitch wrote:
>>>> > On 16 December 2015 at 00:44, Dawid Weiss <dawid.weiss@gmail.com>
>>>> wrote:
>>>> >> 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
>>>> >> locally (including empty interim commits to cater for
>>>> svn:mergeinfos) is 4G.
>>>> >> If you strip the stuff like javadocs and side projects (Nutch, Tika,
>>>> Mahout)
>>>> >> then I bet the entire history can fit in 1G total. Of course
>>>> stripping JARs
>>>> >> is also doable.
>>>> > I think this answered one of the issues. So, this is not something to
>>>> focus on.
>>>> >
>>>> > The question I had (I am sure a very dumb one): WHY do we care about
>>>> > history preserved perfectly in Git? Because that seems to be the real
>>>> > bottleneck now. Does anybody still check out an intermediate commit
>>>> > in Solr 1.4 branch?
>>>>
>>>> I do not think we need every bit of history -- at least in the primary
>>>> read/write repository.  I wonder how much of a size difference there
>>>> would be between tossing all history before 5.0 and tossing all history
>>>> before the ivy transition was completed.
>>>>
>>>> In the interests of reducing the size and download time of a clone
>>>> operation, I definitely think we should trim history in the main repo to
>>>> some arbitrary point, as long as the full history is available
>>>> elsewhere.  It's my understanding that it will remain in svn.apache.org
>>>> (possibly forever), and I think we could also create "historical"
>>>> read-only git repos.
>>>>
>>>> Almost every time I am working on the code, I only care about the stable
>>>> branch and trunk.  Sometimes I will check out an older 4.x tag so I can
>>>> see the exact code referenced by a stacktrace in a user's error message,
>>>> but when this is required, I am willing to go to an entirely different
>>>> repository and chew up bandwidth/disk resources to obtain it, and I do
>>>> not care whether it is git or svn.  As time marches on, fewer people
>>>> will have reasons to look at the historical record.
>>>>
>>>> Thanks,
>>>> Shawn
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>> --
>>> *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
>>> <http://opensourceconnections.com>, LLC | 240.476.9983
>>> Author: Relevant Search <http://manning.com/turnbull>
>>> This e-mail and all contents, including attachments, is considered to be
>>> Company Confidential unless explicitly stated otherwise, regardless
>>> of whether attachments are marked as such.
>>>
>>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>
>
