lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Lucene's default settings & back compatibility
Date Wed, 10 Jun 2009 20:13:35 GMT
Well .. to be honest I haven't monitored java-user for quite some time, so I
don't know whether it has been raised there.

But now there's the other thread that Yonik started, so I'm not really sure
where to answer.

I think that if we look back at 2.0 and compare to 2.9, anyone upgrading
from that version to 2.9 is going to need to learn a lot about Lucene. It's
not just deprecation, but best practices, different approaches for different
situations etc. For example, ConstantScoreQuery is not a *default* thing - I
need to know it exists and what benefits it gives me, in order to use
it. So no back-compat / deprecation stuff would teach me how to use it. Nor
will I miraculously understand that I'd better not score when sorting. Yes,
the API has changed, but not in a way that makes this obvious to me. Maybe
we've documented it well, dunno ...

If people upgrade from 2.0 to 2.9, then their lives would be a lot easier if
2.9 provided the greatest and latest right out-of-the-box. So yes, they'd
need to fix all the deprecations, but that's easy because we document the
alternative. Add that to the "best defaults" and we've got a good code
migration story.
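
As a rough illustration of that migration story (a changed default plus a
documented way back), here is a minimal Java sketch. SearchSettings,
readOnlyReader and useLegacyDefaults are hypothetical names made up for this
example, not Lucene APIs, and the old/new default values are assumed:

```java
// Hypothetical sketch: none of these names exist in Lucene.
class DefaultsSketch {

    static class SearchSettings {
        // Assumed history: older releases defaulted to false; the new
        // release ships the better default out of the box.
        private boolean readOnlyReader = true;

        boolean isReadOnlyReader() {
            return readOnlyReader;
        }

        /**
         * Reverts to the behaviour of the previous release.
         *
         * @deprecated Kept only so upgraders can opt out for a while;
         *             the documented alternative is the new default.
         */
        @Deprecated
        SearchSettings useLegacyDefaults() {
            readOnlyReader = false;
            return this;
        }
    }

    public static void main(String[] args) {
        // A new user gets the new default without touching anything.
        SearchSettings fresh = new SearchSettings();
        System.out.println(fresh.isReadOnlyReader());    // true

        // An upgrading user adds one (deprecated) call to keep old behaviour.
        SearchSettings legacy = new SearchSettings().useLegacyDefaults();
        System.out.println(legacy.isReadOnlyReader());   // false
    }
}
```

The cost to someone upgrading is one extra call, plus a deprecation warning
whose javadoc says what to move to.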

Again, as long as we release every ~6 months (and I don't think we should
release sooner), I don't think it's such a problem to ask someone to
make minor modifications/maintenance to their code once a year (!). Especially
since we believe a major release will come every ~2 years, at which point I
need to re-build my indices, which is by far a more costly operation
(sometimes out of your hands) than updating code.

So relaxing the back-compat a bit overall does not seem like a great "crime
against the Lucene users" to me - all is done (>98% of the time?) for the
better.

But maybe these days will soon be behind us. If we continue to get rid of
interfaces and adopt abstract classes, perhaps we won't have to work so hard
to improve things. In LUCENE-1614 it was quite easy to improve DISI since it
is an abstract class.
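
To make the abstract-class point concrete, here is a minimal sketch of the
idea. The names (DocIterator, ArrayDocIterator, advance) are hypothetical and
this is not the actual DISI code from LUCENE-1614. A method added to an
abstract class can ship with a fallback body, so subclasses written against
the previous release keep compiling. Adding the same method to an interface
would break every existing implementor (Java interfaces only gained default
method bodies in Java 8, well after this thread).

```java
// Hypothetical sketch: not Lucene's real iterator classes.
class AbstractClassEvolution {

    // Library class. Assume the previous release declared only next() and
    // doc(); advance() is new in this release.
    static abstract class DocIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Moves to the next document and returns its id, or NO_MORE_DOCS. */
        abstract int next();

        /** Returns the current document id (-1 before the first next()). */
        abstract int doc();

        /**
         * New method with a fallback: a linear scan over next(). Existing
         * subclasses inherit it unchanged; new ones may override it with
         * something faster.
         */
        int advance(int target) {
            int d;
            while ((d = next()) < target) {
                // keep scanning until we reach or pass the target
            }
            return d;
        }
    }

    // User code written against the previous release: it knows nothing
    // about advance(), yet still compiles and works.
    static class ArrayDocIterator extends DocIterator {
        private final int[] docs; // sorted ascending
        private int pos = -1;

        ArrayDocIterator(int... docs) {
            this.docs = docs;
        }

        @Override
        int next() {
            if (pos < docs.length) {
                pos++;
            }
            return doc();
        }

        @Override
        int doc() {
            if (pos < 0) {
                return -1;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    public static void main(String[] args) {
        DocIterator it = new ArrayDocIterator(2, 5, 9, 14);
        System.out.println(it.advance(6));   // 9
        System.out.println(it.next());       // 14
        System.out.println(it.advance(100)); // 2147483647 (NO_MORE_DOCS)
    }
}
```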

Shai

On Wed, Jun 10, 2009 at 7:32 PM, Mark Miller <markrmiller@gmail.com> wrote:

> No one really responded to this Shai? And I take it that the user list
> never saw it?
>
> Perhaps we should just ask for opinion from the user list based on what you
> already have - just to gauge the reaction on different points. Unless
> someone responds shortly, we could take a year waiting to shake it out.
> The threat of sending should prompt anyone with any issues to speak up.
>
> I think we should add though:
> explicitly what has changed (eg if we switch something, what was the policy
> before - most users won't even know)
> an overview of why we are interested in relaxing back compat
>
> - Mark
>
> Shai Erera wrote:
>
>> Ok, so digging back in this thread, I think the following proposals were
>> made (if I missed some, please add them):
>>
>> 1. API deprecation lasts *at least* one full minor release. Example: if we
>> deprecate an API in 2.4, we can remove it in 2.5. BUT, we are also free to
>> keep it there and remove it in 2.6, 2.9, 3.0, 3.5. I would like to reserve
>> that option for controversial deprecations, like TokenStream, and maybe even
>> the HitCollector recent changes. Those that we feel will have a large impact
>> on the users, we might want to keep around for a bit longer until we get
>> enough feedback from the field and are more confident with that change.
>>
>> 2. Bug fixes are backported to the last "dot" release only. Example: a bug
>> that's discovered after 2.4 is released is fixed on the 2.4.X branch. Once
>> 2.5 is released, any bug fixes happen on trunk and 2.5.X. A slight relaxation
>> would be adding something like "we may still fix bugs on the 2.4.X branch if
>> we feel it's important enough". For example if 2.5 contains a lot of API
>> changes and we think a considerable portion of our users are still on 2.4.
>>
>> 3. Jar drop-in ability is only guaranteed on point releases (this is
>> partly an outcome of (1) and (2), but (6) will also affect it).
>>
>> 4. Changes to the index format last at least one full major release.
>> Example: a change to the index format in 2.X, is supported in all 3.Y
>> releases, and removed in 4.0. Again, I write "at least" since we should have
>> the freedom to extend support for a particular change.
>>
>> 5. Changes to the default settings are allowed between minor releases,
>> provided that we give the users a way to revert back to the old behavior.
>> Examples are LUCENE-1542 and the latest issues Mike opened. Those changes
>> will be applied out-of-the-box. The provided API to revert to the old
>> behavior may be a supported API, or a deprecated API. For deprecation we can
>> decide to keep the API longer than one minor release.
>>
>> 5.1) An exception to (5) are bug fixes which break back-compat - those are
>> always visible, w/ a way to revert to the buggy behavior. That way may be
>> deprecated or not, and its support lifetime can be made on a case-by-case
>> basis.
>>
>> 6. Minor changes to APIs can happen w/o any deprecation. Example:
>> LUCENE-1614, adding one or two methods to an interface with good
>> documentation and a trivial suggested implementation etc.
>>
>> You will notice that almost every proposal has a "we may decide to keep it
>> for longer" - I wrote it following one of the early responses on this thread
>> (I think it was Grant's) - we should not attempt to set things in stone. Our
>> back-compat policy should ensure some level of SLA to our users, but
>> otherwise we should not act as robots, and if we think a certain case
>> requires a different handling than the policy states (only for the user's
>> benefit though), it should be done that way. The burden is still put on the
>> committers, only now the policy is relaxed a bit, and handles different
>> cases in different ways, and the committers/contributors don't need to feel
>> that their hands are tied.
>>
>> These set the ground/basis, but otherwise we should decide on a
>> case-by-case basis on any extension/relaxation of the policy, for our users'
>> benefits. After quite some time I've been following the discussions on this
>> mailing list, I don't remember ever seeing an issue being driven against our
>> users' benefit. All issues attempt to improve Lucene's performance and our
>> users' experience (end users as well as search application developers). I
>> think it's only fair to ask this "users" community be more forgiving and
>> open to make changes on their side too, making the life of the
>> committers/contributors a bit easier.
>>
>> I also agree that the next step would be taking this to java-user and
>> getting a sense of whether our "users" community agrees with those changes
>> or not. I hope that the above summary captures what needs to be sent to
>> that list.
>>
>> Shai
>>
>> On Sat, May 30, 2009 at 2:21 PM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>
>>    Actually, I think this is a common, and in fact natural/expected
>>    occurrence in open-source.  When a tricky topic is discussed, and the
>>    opinions are often divergent, frequently the conversation never
>>    "converges" to a consensus and the discussion dies.  Only if
>>    discussion reaches a semblance of consensus do we vote on it.
>>
>>    It's exactly like what happens when a controversial bill tries to go
>>    through the US congress.  It's heavily discussed and then dies off
>>    from lack of consensus, or, it gets far enough to be voted on.
>>
>>    Ie, this is completely normal for open source.
>>
>>    We may not like it, we may consider it inefficient, annoying,
>>    frustrating, whatever, but this is in fact a reality of all healthy
>>    open-source projects.
>>
>>    Consensus building is not easy, and if the number of people trying to
>>    build consensus, by iterating on the proposal, compromising,
>>    suggesting alternatives when others dislike an approach, etc., is
>>    dwarfed by the number of people objecting to the proposal, then
>>    consensus never emerges.
>>
>>    In this case specifically, I had a rather singular goal: the freedom
>>    to make changes to defaults inside Lucene to always favor new users,
>>    while not hurting back-compat users.  I intentionally proposed no
>>    changes to our back-compat policy (knowing reaching consensus would be
>>    that much more difficult).
>>
>>    The proposal went through several iterations (*settings,
>>    *actsAsVersion, etc) that all failed to reach consensus, so we settled
>>    back on the current approach of "make the setting explicit" which is
>>    an OK workaround.  One by one I've been doing that for the original
>>    examples I listed (readOnly IndexReader, NIOFSDir default, etc.)
>>
>>    But, then, the conversation shifted to a different topic ("how to
>>    relax our back-compat policy"), which also failed to reach consensus.
>>
>>    Maybe, the best way forward is to break out each of the separate
>>    bullets and discuss them separately?
>>
>>    Mike
>>
>>    On Fri, May 29, 2009 at 11:22 PM, Shai Erera <serera@gmail.com> wrote:
>>    > So ... I've seen this happen a lot of times (especially in my thesis
>>    > work) - someone raises a controversial topic, or one that touches a
>>    > nerve of the system, there's a flurry of activity and then it dies
>>    > unexpectedly, even though it feels to everyone that there's "an extra
>>    > mile" that should be taken in order to bring it to completion.
>>    >
>>    > And that's what I've seen in this thread. A lot has been said - lots of
>>    > comments, ideas, opinions. Lots of ranting and complaining. Then it
>>    > died ... Thank you Grant for that last "beep", I hope that was an
>>    > intention to resurrect it.
>>    >
>>    > So I ask - how come we don't have a decision? Is it because we're
>>    > "afraid" to make a decision? (that last sentence is supposed to "tease"
>>    > the community, not to pass judgement)
>>    >
>>    > I'm asking because it seems like everybody pretty much agrees on most
>>    > of the suggestions, so why not decide "let's do X, Y and Z" and change
>>    > the back-compat page starting from 2.9? If people don't remember the
>>    > decisions, I don't mind reiterating them.
>>    >
>>    > (I also ask because I'd like to take the improvements from LUCENE-1614
>>    > to TermDocs/Positions, PhrasePositions, Spans. All except
>>    > PhrasePositions are public interfaces and so it matters if I need to go
>>    > through creating abstract classes, with new names, or I can change
>>    > those interfaces, asking those that implemented their own TermDocs to
>>    > modify the code).
>>    >
>>    > Shai
>>    >
>>    > On Wed, May 27, 2009 at 10:36 PM, Grant Ingersoll
>>    > <gsingers@apache.org> wrote:
>>    >>
>>    >> So, here's a real, concrete example of the need for case by case back
>>    >> compat.  See https://issues.apache.org/jira/browse/LUCENE-1662
>>    >>
>>    >> It's completely stupid that ExtendedFieldCache even exists.  It is a
>>    >> dumb workaround for a made up problem that has nothing to do with real
>>    >> coders living in the modern age of development where IDEs make
>>    >> refactoring these types of things very cheap.  Namely, the notion that
>>    >> interfaces must never change lest every 6-9 months some minute number
>>    >> of users (I'd venture it's less than 1% of users) out there, who by any
>>    >> account are completely capable of implementing hard core Lucene
>>    >> internals (like extending FieldCache), yet are seemingly incapable of
>>    >> reading a CHANGES file with a huge disclaimer in it, have to recompile
>>    >> (GASP!) their code and put in a dummy implementation of some new
>>    >> interface method.  Yet, here we are with Yonik fixing very real
>>    >> problems that are a direct result of coding around back compat (along
>>    >> with a mistake; it took a long time for this issue to even be
>>    >> discovered) that very much affect the usability of Lucene and the day
>>    >> to day experience of a good number of users.
>>    >>
>>    >> In other words, the real fix for L-1662 is for ExtFieldCache to be
>>    >> folded into FieldCache and for the file to be removed, never to be
>>    >> heard from again.
>>    >>
>>    >> The same can be said for the whole Fieldable issue, but that's a
>>    >> different day.
>>    >>
>>    >> Ranting,
>>    >> Grant
>>    >>
>>    >>
>>    >> ---------------------------------------------------------------------
>>    >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>    >> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>    >>
>>    >
>>    >
>>
>>
>>
>>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
