lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Lucene's default settings & back compatibility
Date Thu, 21 May 2009 20:34:10 GMT
I thought we were actually on track towards not introducing any Settings
and/or actsAs, but instead just changing the policy?

Can we agree on the following:

* Changes to the index file formats need to be supported for 2 major
releases. I.e. 2.X indexes need to be read by 3.Y code, but not by 4.0.

* Method deprecations last for one full minor release. That is, a deprecation
in 2.X lasts through 2.X.1 and 2.(X+1), but is removed in 2.(X+2). If all those
X's are confusing --> a deprecation in 2.4 is kept in 2.4.X and 2.5, but we're
free to remove it in 2.6.

* Changes to default behaviors (whether they are bug fixes or improvements)
that affect only runtime code, and not the index structure or indexed
data (unlike the InvalidAcronym bug fix), are ok to go into any minor
release, w/o deprecation - so long as we document the change in CHANGES
along with some sample code on how to migrate easily.

* Changes to default behaviors, bug fixes or improvements, that may
compromise the index structure or indexed data (such as InvalidAcronym) must
keep the old behavior available for at least one major release, if not 2
(just like supporting file formats). The reason is that rebuilding indexes,
besides possibly being a heavy process, is often not acceptable to the
customers of those who develop search solutions. Therefore it may be out of
our hands. Personally, I don't think those will happen a lot, but when they
do we can choose between:
(1) Deprecating a class entirely in favor of a new one, such that anyone who
upgrades can still use that class
(2) Introducing a static setter for that behavior, like for InvalidAcronym
(3) Adding an actsAs to that class only.
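
To make the first two rules concrete, here is a minimal sketch of them as
plain version arithmetic. BackCompatPolicy and its method names are
hypothetical (nothing like this exists in Lucene), and the deprecation check
assumes both versions are on the same major line, since the proposal above
only spells out that case.

```java
// Hypothetical helper, not part of Lucene: encodes the two proposed rules.
final class BackCompatPolicy {
    private BackCompatPolicy() {}

    /**
     * Index formats: an index written by major N must be readable by code
     * from major N and major N+1, but not necessarily by N+2.
     */
    static boolean mustReadIndex(int indexMajor, int codeMajor) {
        return codeMajor == indexMajor || codeMajor == indexMajor + 1;
    }

    /**
     * Deprecations (assumption: same major line): something deprecated in
     * minor X is kept through minor X+1 and may be removed from minor X+2 on.
     */
    static boolean mayRemoveDeprecated(int deprecatedMinor, int currentMinor) {
        return currentMinor >= deprecatedMinor + 2;
    }

    public static void main(String[] args) {
        System.out.println(mustReadIndex(2, 3));       // true:  2.X index, 3.Y code
        System.out.println(mustReadIndex(2, 4));       // false: 4.0 need not read 2.X
        System.out.println(mayRemoveDeprecated(4, 5)); // false: deprecated in 2.4, still in 2.5
        System.out.println(mayRemoveDeprecated(4, 6)); // true:  free to remove in 2.6
    }
}
```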

Am I missing a back-compat issue?

What I don't like about actsAs, and perhaps I just don't understand the
proposal well, is that I'm not sure where it's added. Will it be added to
IndexWriter, which will pass it on to all the classes it will meet/use?

If I covered all the back-compat issues above, and we agree on them, then
for the first 3 we just need to document them on the back-compat page, no
code to develop.

For the last one, if we choose to adopt (1) or (2), then we don't need to
develop any mechanism up-front, but decide on a per-case basis what's the
best alternative. For example, for the InvalidAcronym we could have
deprecated that particular TokenFilter in favor of a new one and given a code
example on how to create a TokenStream with that deprecated TokenFilter.
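
As a rough sketch of what such a per-case migration example could look like,
here is the "the"-less stopword case from Mike's mail quoted below, rather
than InvalidAcronym. StopWords and SimpleStopFilter are hypothetical
stand-ins, not Lucene's StopFilter API, and the word lists are abbreviated:
the old default stays reachable but deprecated, and the new default carries
the fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; not Lucene's StopFilter API.
final class StopWords {
    private StopWords() {}

    /** Old default without "the", kept so upgraded users can match an existing index. */
    @Deprecated
    static final Set<String> LEGACY_ENGLISH = setOf("a", "an", "and");

    /** New default: the same (abbreviated) list plus the previously missing "the". */
    static final Set<String> ENGLISH = setOf("a", "an", "and", "the");

    private static Set<String> setOf(String... words) {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }
}

final class SimpleStopFilter {
    private final Set<String> stopWords;

    /** New users get the fixed default. */
    SimpleStopFilter() { this(StopWords.ENGLISH); }

    /** Upgraded users can pass the deprecated list explicitly. */
    SimpleStopFilter(Set<String> stopWords) { this.stopWords = stopWords; }

    /** Returns the tokens that are not stop words, in their original order. */
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) kept.add(token);
        }
        return kept;
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "and", "lazy");
        System.out.println(new SimpleStopFilter().filter(tokens));                         // [quick, lazy]
        System.out.println(new SimpleStopFilter(StopWords.LEGACY_ENGLISH).filter(tokens)); // [the, quick, lazy]
    }
}
```

The CHANGES entry would then only need to point upgraders at the one-line
switch to the deprecated list; no global mechanism is involved.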

Shai

On Thu, May 21, 2009 at 10:55 PM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> I'm having trouble visualizing the various methods people are talking
> about.  It seems like we could open an issue and post patches with code
> illustrating what each person is talking about?
>
> On Thu, May 21, 2009 at 10:02 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Actually, we started with the *Settings classes (to hold defaults),
>> but then realized a simple actsAsVersion (single static method) would
>> suffice for just the back-compat settings and then pushed further and
>> thought perhaps we should relax our back-compat policy entirely so
>> emulating older versions is not needed.
>>
>> So we no longer have the "defaults" class (*Settings).  We may still
>> do it for the future (for its own benefits), but for just back-compat
>> of default settings, it seems like overkill.
>>
>> But I agree, the index altering cases are spooky.  I think this'd make
>> me favor going back to the actsAsVersion option instead of the hard
>> flip on our back compat policy (at least for default settings; for API
>> changes I think 1 whole minor release may be reasonable).
>>
>> Mike
>>
>> On Thu, May 21, 2009 at 12:54 PM, Matthew Hall
>> <mhall@informatics.jax.org> wrote:
>> > Sorry, I wasn't quite sure what to call this new class you guys have been
>> > talking about.
>> >
>> > I was referring to the class that's being discussed to encapsulate all of
>> > the defaults for a given lucene release.  (Its caching strategies etc etc)
>> >
>> > I'm just not certain that something like a static list of words belongs in a
>> > higher level defaults class like you guys are talking about, especially
>> > considering that anyone using a stop enabled analyzer really should be
>> > familiar with this list, and oftentimes needs to override it.
>> >
>> > Meh, now that I'm actually typing it out though, perhaps I'm incorrect here.
>> > Assuming this class you guys are describing will be well
>> > advertised/documented, maybe it will actually make it easier for end
>> > developers to twiddle around with this list, or at least certainly make them
>> > more aware that it's even something that they have the ability to actually
>> > change.
>> >
>> > Matt
>> >
>> > Michael McCandless wrote:
>> >>
>> >> What is the "lucene defaults class"?
>> >>
>> >> Mike
>> >>
>> >> On Thu, May 21, 2009 at 12:37 PM, Matthew Hall
>> >> <mhall@informatics.jax.org> wrote:
>> >>
>> >>>
>> >>> For extreme examples like this, couldn't the stopword list be encapsulated
>> >>> into a single class that's used by the lucene defaults class?
>> >>>
>> >>> That way if you folks released updates to mostly static content like a
>> >>> stopword list, new or old users could get it easily with a simple drop-in
>> >>> fix?
>> >>>
>> >>> Just my two cents.
>> >>>
>> >>> Matt
>> >>>
>> >>> Michael McCandless wrote:
>> >>>
>> >>>>
>> >>>> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rcmuir@gmail.com> wrote:
>> >>>>
>> >>>>> even as simple as changing default stopword list for some analyzer could be
>> >>>>> an issue, if the user doesn't re-index in response to that change.
>> >>>>>
>> >>>>
>> >>>> OK, right.
>> >>>>
>> >>>> So say we forgot to include "the" in the default English stopwords
>> >>>> list (yes, an extreme example...).
>> >>>>
>> >>>> Under the proposed changes 1 & 2 to back-compat policy, we would add
>> >>>> "the" to the default stopword list, so new users get the fix, but
>> >>>> still keep the the-less list accessible (deprecated).  We'd add an
>> >>>> entry in CHANGES.txt saying this happened, and then show code on how
>> >>>> to get back to the the-less stopword list.
>> >>>>
>> >>>> New users using that StopFilter would properly see "the" filtered out.
>> >>>> Users who upgraded would need to fix their code to switch back to the
>> >>>> deprecated the-less list.
>> >>>>
>> >>>> Mike
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> >>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>> --
>> >>> Matthew Hall
>> >>> Software Engineer
>> >>> Mouse Genome Informatics
>> >>> mhall@informatics.jax.org
>> >>> (207) 288-6012
>> >>>
>> >>
>> >
>> >
>> > --
>> > Matthew Hall
>> > Software Engineer
>> > Mouse Genome Informatics
>> > mhall@informatics.jax.org
>> > (207) 288-6012
>> >
>>
>
