lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shai Erera <>
Subject Re: Proposal about Version API "relaxation"
Date Thu, 15 Apr 2010 18:39:59 GMT
I seriously don't understand the fuss around index format back compat.
How many times is this changed such that it is too much to ask to keep
X support X-1?

I prefer to have ongoing segment merging but can live w/ a manual
converter tool. Thing is - I'll probably not be able to develop one
myself outside the scope of Lucene because I'll miss tons of API. So
having Lucene declare and adhere to it seems reasonable to me.

BTW Earwin, we can come up w/ a migrate() method on IW to accomplish
manual migration on the segments that are still on old versions.
That's not the point about whether optimize() is good or not. It is
the difference between telling the customer to run a 5-day migration
process, or a couple of hours. At the end of the day, the same
migration code will need to be written whether for the manual or
automatic case. And probably by the same developer which changed the
index format. It's the difference of when does it happen.

And I also think that a manual migration tool will need access to some
lower level API which is not exposed today, and will generally not
perform as well as online migration. But that's a side note...


On Thursday, April 15, 2010, Earwin Burrfoot <> wrote:
> I'd like to remind that Mike's proposal has stable branches.
> We can branch off preflex trunk right now and wrap it up as 3.1.
> Current trunk is declared as future 4.0 and all backcompat cruft is
> removed from it.
> If some new features/bugfixes appear in trunk, and they don't break
> stuff - we backport them to 3.x branch, eventually releasing 3.2, 3.3,
> etc
> Thus, devs are free to work without back-compat burden, bleeding edge
> users get their blood, conservative users get their stability + a
> subset of new features from stable branches.
> On Thu, Apr 15, 2010 at 22:02, DM Smith <> wrote:
>> On 04/15/2010 01:50 PM, Earwin Burrfoot wrote:
>>>> First, the index format. IMHO, it is a good thing for a major release to
>>>> be
>>>> able to read the prior major release's index. And the ability to convert
>>>> it
>>>> to the current format via optimize is also good. Whatever is decided on
>>>> this
>>>> thread should take this seriously.
>>> Optimize is a bad way to convert to current.
>>> 1. conversion is not guaranteed, optimizing already optimized index is a
>>> noop
>>> 2. it merges all your segments. if you use BalancedSegmentMergePolicy,
>>> that destroys your segment size distribution
>>> Dedicated upgrade tool (available both from command-line and
>>> programmatically) is a good way to convert to current.
>>> 1. conversion happens exactly when you need it, conversion happens for
>>> sure, no additional checks needed
>>> 2. it should leave all your segments as is, only changing their format
>>>> It is my observation, though possibly not correct, that core only has
>>>> rudimentary analysis capabilities, handling English very well. To handle
>>>> other languages well "contrib/analyzers" is required. Until recently it
>>>> did
>>>> not get much love. There have been many bw compat breaking changes
>>>> (though
>>>> w/ version one can probably get the prior behavior). IMHO, most of
>>>> contrib/analyzers should be core. My guess is that most non-trivial
>>>> applications will use contrib/analyzers.
>>> I counter - most non-trivial applications will use their own analyzers.
>>> The more modules - the merrier. You can choose precisely what you need.
>> By and large an analyzer is a simple wrapper for a tokenizer and some
>> filters. Are you suggesting that most non-trivial apps write their own
>> tokenizers and filters?
>> I'd find that hard to believe. For example, I don't know enough Chinese,
>> Farsi, Arabic, Polish, ... to come up with anything better than what Lucene
>> has to tokenize, stem or filter these.
>>>> Our user base are those with ancient,
>>>> underpowered laptops in 3-rd world countries. On those machines it might
>>>> take 10 minutes to create an index and during that time the machine is
>>>> fairly unresponsive. There is no opportunity to "do it in the
>>>> background."
>>> Major Lucene releases (feature-wise, not version-wise) happen like
>>> once in a year, or year-and-a-half.
>>> Is it that hard for your users to wait ten minutes once a year?
>>  I said that was for one index. Multiply that times the number of books
>> available (300+) and yes, it is too much to ask. Even if a small subset is
>> indexed, say 30, that's around 5 hours of waiting.
>> Under consideration is the frequency of breakage. Some are suggesting a
>> greater frequency than yearly.
>> DM
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> --
> Kirill Zakharenko/Кирилл Захаренко (
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message