lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sanne Grinovero <>
Subject Re: Proposal about Version API "relaxation"
Date Thu, 15 Apr 2010 18:43:15 GMT
I think some compatibility breaks should really be accepted, otherwise
these requirements are going to kill the technological advancement:
the effort in backwards compatibility will grow and be more
timeconsuming and harder every day.

A mayor release won't happen every day, likely not even every year, so
it seems acceptable to have milestones defining compatibility
boundaries: you need to be able to "reset" the complexity curve

Backporting a feature would benefit from being merged in the correct
testsuite, and avoid the explosion of this matrix-like backwards
compatibility test suite. BTW the current testsuite is likely covering
all kinds of combinations which nobody is actually using or caring

Also if I where to discover a nice improvement in an Analyzer, and you
where telling me that to contribute it I would have to face this
amount of complexity.. I would think twice before trying; honestly the
current requirements are scary.



2010/4/15 Earwin Burrfoot <>:
> I'd like to remind that Mike's proposal has stable branches.
> We can branch off preflex trunk right now and wrap it up as 3.1.
> Current trunk is declared as future 4.0 and all backcompat cruft is
> removed from it.
> If some new features/bugfixes appear in trunk, and they don't break
> stuff - we backport them to 3.x branch, eventually releasing 3.2, 3.3,
> etc
> Thus, devs are free to work without back-compat burden, bleeding edge
> users get their blood, conservative users get their stability + a
> subset of new features from stable branches.
> On Thu, Apr 15, 2010 at 22:02, DM Smith <> wrote:
>> On 04/15/2010 01:50 PM, Earwin Burrfoot wrote:
>>>> First, the index format. IMHO, it is a good thing for a major release to
>>>> be
>>>> able to read the prior major release's index. And the ability to convert
>>>> it
>>>> to the current format via optimize is also good. Whatever is decided on
>>>> this
>>>> thread should take this seriously.
>>> Optimize is a bad way to convert to current.
>>> 1. conversion is not guaranteed, optimizing already optimized index is a
>>> noop
>>> 2. it merges all your segments. if you use BalancedSegmentMergePolicy,
>>> that destroys your segment size distribution
>>> Dedicated upgrade tool (available both from command-line and
>>> programmatically) is a good way to convert to current.
>>> 1. conversion happens exactly when you need it, conversion happens for
>>> sure, no additional checks needed
>>> 2. it should leave all your segments as is, only changing their format
>>>> It is my observation, though possibly not correct, that core only has
>>>> rudimentary analysis capabilities, handling English very well. To handle
>>>> other languages well "contrib/analyzers" is required. Until recently it
>>>> did
>>>> not get much love. There have been many bw compat breaking changes
>>>> (though
>>>> w/ version one can probably get the prior behavior). IMHO, most of
>>>> contrib/analyzers should be core. My guess is that most non-trivial
>>>> applications will use contrib/analyzers.
>>> I counter - most non-trivial applications will use their own analyzers.
>>> The more modules - the merrier. You can choose precisely what you need.
>> By and large an analyzer is a simple wrapper for a tokenizer and some
>> filters. Are you suggesting that most non-trivial apps write their own
>> tokenizers and filters?
>> I'd find that hard to believe. For example, I don't know enough Chinese,
>> Farsi, Arabic, Polish, ... to come up with anything better than what Lucene
>> has to tokenize, stem or filter these.
>>>> Our user base are those with ancient,
>>>> underpowered laptops in 3-rd world countries. On those machines it might
>>>> take 10 minutes to create an index and during that time the machine is
>>>> fairly unresponsive. There is no opportunity to "do it in the
>>>> background."
>>> Major Lucene releases (feature-wise, not version-wise) happen like
>>> once in a year, or year-and-a-half.
>>> Is it that hard for your users to wait ten minutes once a year?
>>  I said that was for one index. Multiply that times the number of books
>> available (300+) and yes, it is too much to ask. Even if a small subset is
>> indexed, say 30, that's around 5 hours of waiting.
>> Under consideration is the frequency of breakage. Some are suggesting a
>> greater frequency than yearly.
>> DM
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> --
> Kirill Zakharenko/Кирилл Захаренко (
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message