lucene-dev mailing list archives

From Shai Erera <>
Subject Re: Proposal about Version API "relaxation"
Date Thu, 15 Apr 2010 12:30:53 GMT
Thanks Danil - you reminded me of another reason why reindexing is
impossible - fetching the data, even if it's available, is too damn costly.

Robert, I think you're driven by Analyzers changes ... been too much around
them I'm afraid :).

A major version upgrade is a move to Java 1.5 for example. I can do that,
and I don't see why I need to reindex my data because of that. And I simply
don't buy that "do this work on your own" ... people can take a snapshot of
the code, maintain it separately and you'll never hear back from them. Who
benefits? Neither side!
It's open source - true, but it's way past the "Hey look, I'm a new open
source project w/ a dozen users, I can do whatever I want". Lucene is a
respected open source project, w/ serious adoption and deployments. People
trust the select few committers here to do it right for them, so they
don't need to invest the time and resources in developing core IR stuff. And
now you're pushing a "do it yourself" approach? I simply don't get or buy it.

When were you stuck w/ maintaining backwards compatibility because the index
structure changed? I bet not so many of us, or shall I say just the few Mikes
out there? So how hard is it to require such back-compat support? I
wholeheartedly agree that we shouldn't keep back-compat on Analyzer changes,
nor on bugs such as the one which changed the position of the field from -1 to
0 (a while ago - don't remember the exact details).


On Thu, Apr 15, 2010 at 3:17 PM, Danil ŢORIN <> wrote:

> Sometimes it's REALLY impossible to reindex, or has absolutely prohibitive
> cost to do in a running production system (i can't shut it down for
> maintenance, so i need a lot of hardware to reindex ~5 billion documents, i
> have no idea what are the costs to retrieve that data all over again, but i
> estimate it to be quite a lot)
> And providing a way to migrate existing indexes to new lucene is crucial
> from my point of view.
> I don't care what this way is: calling optimize() with newer lucene or
> running some tool that takes 5 days, it's ok with me.
> Just don't put me through full reindexing as I really don't have all that
> data anymore.
> It's not my data, i just receive it from clients, and provide a search
> interface.
> It took years to build those indexes, rebuilding is not an option, and
> staying with old lucene forever just sucks.
> Danil.
> On Thu, Apr 15, 2010 at 14:57, Robert Muir <> wrote:
>> On Thu, Apr 15, 2010 at 7:52 AM, Shai Erera <> wrote:
>>> Well ... I must say that I completely disagree w/ dropping index
>>> structure back-support. Our customers will simply not hear of reindexing 10s
>>> of TBs of content because of version upgrades. Such a decision is key to
>>> Lucene adoption in large-scale projects. It's entirely not about whether
>>> Lucene is a content store or not - content is stored on other systems, I
>>> agree. But that doesn't mean reindexing it is tolerable.
>> I don't understand how it's helpful to do a MAJOR version upgrade without
>> reindexing... what in the world do you stand to gain from that?
>> The idea here, is that development can be free of such hassles.
>> Development should be this way.
>> If you, Shai, need some feature X.Y.Z from Version 4 and don't want to
>> reindex, and are willing to do the work to port it back to Version 3 in a
>> completely backwards compatible way, then under this new scheme it can
>> happen.
>> --
>> Robert Muir
