lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: Lucene's default settings & back compatibility
Date Thu, 21 May 2009 20:57:56 GMT
Sounds like a good proposition.

There's one problem I'd like to address. Good names for
classes/members matter, and matter a lot. They directly affect how fast
a newcomer is able to understand a particular API, and they also affect
how comfortably you work with it once you do understand it. When we
deprecate existing methods and add new, 'better' ones, bad or mediocre
names replace good names in the parts of the code that are used most
often. And there's no way around it.

It's somewhat crazy, but what if we deprecate stuff and rename it? New
stuff gets the best names, old stuff is still accessible, and with a
"Migration Guide" it's easy to patch client code.
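The deprecate-and-rename idea could look roughly like the following hypothetical Java sketch. The class names and their behaviors (WordSplitter, LegacyWordSplitter) are invented for illustration only: the old behavior survives under a deprecated, renamed class, while the good name goes to the new behavior, so a migration guide entry reduces to a rename in client code.

```java
// Hypothetical example: the good name "WordSplitter" goes to the new
// behavior, and the old behavior survives under a deprecated, renamed class.

/**
 * Old behavior, kept under a new name so existing code can be migrated
 * mechanically by replacing "WordSplitter" with "LegacyWordSplitter".
 * @deprecated use {@link WordSplitter}; to be removed in a later release.
 */
@Deprecated
class LegacyWordSplitter {
    // Old behavior: split on single spaces, so double spaces yield empty tokens.
    String[] split(String text) {
        return text.split(" ");
    }
}

/** New behavior under the best name: split on any run of whitespace. */
class WordSplitter {
    String[] split(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

Client code that wants the old behavior changes one identifier; code that does nothing picks up the new behavior under the familiar name, which is the trade-off a migration guide would need to spell out.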

On Fri, May 22, 2009 at 00:34, Shai Erera <serera@gmail.com> wrote:
> I thought we were actually on track towards not introducing any Settings
> and/or actsAs, but instead just changing the policy?
>
> Can we agree on the following:
>
> * Changes to the index file formats need to be supported for 2 major
> releases. I.e. 2.X indexes need to be read by 3.Y code, but not by 4.0.
>
> * Method deprecations last for one full minor release. That is, a deprecation
> in 2.X lasts through 2.X.1 and 2.X+1, but is removed in 2.X+2. If all those
> X's are confusing: a deprecation in 2.4 is kept in 2.4.X and 2.5, but we're
> free to remove it in 2.6.
>
> * Changes to default behaviors (whether they are bug fixes or improvements)
> that only affect runtime code, and not the index structure or indexed data
> (such as the InvalidAcronym bug fix), are OK to go into any minor release
> without deprecation, as long as we document the change in CHANGES along
> with some sample code showing how to migrate easily.
>
> * For changes to default behaviors, bug fixes or improvements, that may
> compromise the index structure or indexed data (such as InvalidAcronym), the
> old behavior will remain available for at least one major release, if not 2
> (just like supporting file formats). The reason is that rebuilding indexes,
> besides potentially being a heavy process, is often not acceptable to the
> customers of those who develop search solutions, so it may be out of our
> hands. Personally, I don't think these cases will happen a lot, but when
> they do we can choose between:
> (1) Deprecating a class entirely in favor of a new one, such that anyone who
> upgrades can still use the old class
> (2) Introducing a static setter for that behavior, like for InvalidAcronym
> (3) Adding an actsAs to that class only.
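Options (2) and (3) can be contrasted in one hypothetical class. HostFilter, its dot-handling behavior, and the assumption that the fix landed in 2.4 are all invented for illustration and only loosely echo the InvalidAcronym case; this is not Lucene's code.

```java
// Hypothetical filter whose default behavior was fixed in a minor release.
// Loosely inspired by the InvalidAcronym case; not Lucene's actual code.
class HostFilter {
    // Option (2): a static default consulted by the no-arg constructor.
    private static volatile boolean defaultFixed = true;

    static void setDefaultFixed(boolean fixed) {
        defaultFixed = fixed;
    }

    static boolean getDefaultFixed() {
        return defaultFixed;
    }

    private final boolean fixed;

    HostFilter() {
        this(defaultFixed);
    }

    HostFilter(boolean fixed) {
        this.fixed = fixed;
    }

    // Option (3): an actsAs factory scoped to this class only.
    // Assumption: the fix shipped in 2.4, so 2.3 and earlier get the old behavior.
    static HostFilter actsAs(int major, int minor) {
        boolean fixedInThatVersion = major > 2 || (major == 2 && minor >= 4);
        return new HostFilter(fixedInThatVersion);
    }

    String filter(String token) {
        if (fixed) {
            // New behavior: keep the dots, drop only a trailing one.
            return token.endsWith(".") ? token.substring(0, token.length() - 1) : token;
        }
        // Old behavior: strip every dot, as if the token were an acronym.
        return token.replace(".", "");
    }
}
```

The static setter changes the default for every instance created afterwards in the JVM, while actsAs is opt-in at each call site, which is one way to keep it scoped to "that class only".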
>
> Am I missing a back-compat issue?
>
> What I don't like about actsAs, and perhaps I just don't understand the
> proposal well, is that I'm not sure where it's added. Will it be added to
> IndexWriter, which will pass it on to all the classes it meets/uses?
>
> If I covered all the back-compat issues above, and we agree on them, then
> for the first 3 we just need to document them on the back-compat page, no
> code to develop.
>
> For the last one, if we choose to adopt (1) or (2), then we don't need to
> develop any mechanism up-front, but can decide on a per-case basis what the
> best alternative is. For example, for InvalidAcronym we could have
> deprecated that particular TokenFilter in favor of a new one and given a
> code example showing how to create a TokenStream with the deprecated
> TokenFilter.
>
> Shai
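Option (1), as described for InvalidAcronym, might look like the sketch below, which models token filters as plain list-to-list functions rather than Lucene's TokenStream API. DotStrippingFilter, DotPreservingFilter and Analyzers are hypothetical names; legacyChain() stands in for the migration snippet a CHANGES entry could carry.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, simplified token "filters" as list-to-list functions;
// this is not Lucene's TokenStream/TokenFilter API.

/**
 * Old behavior: strips every dot from a token.
 * @deprecated use {@link DotPreservingFilter}; kept for users with existing indexes.
 */
@Deprecated
final class DotStrippingFilter implements Function<List<String>, List<String>> {
    @Override
    public List<String> apply(List<String> tokens) {
        return tokens.stream().map(t -> t.replace(".", "")).collect(Collectors.toList());
    }
}

/** New behavior: keeps inner dots and drops only a trailing one. */
final class DotPreservingFilter implements Function<List<String>, List<String>> {
    @Override
    public List<String> apply(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.endsWith(".") ? t.substring(0, t.length() - 1) : t)
                .collect(Collectors.toList());
    }
}

final class Analyzers {
    private Analyzers() {}

    static Function<List<String>, List<String>> lowercase() {
        return tokens -> tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // The new default chain that users get after upgrading.
    static Function<List<String>, List<String>> defaultChain() {
        return lowercase().andThen(new DotPreservingFilter());
    }

    // The migration recipe: the same chain with the deprecated filter,
    // so newly produced tokens match those already in an old index.
    @SuppressWarnings("deprecation")
    static Function<List<String>, List<String>> legacyChain() {
        return lowercase().andThen(new DotStrippingFilter());
    }
}
```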
>
> On Thu, May 21, 2009 at 10:55 PM, Jason Rutherglen
> <jason.rutherglen@gmail.com> wrote:
>>
>> I'm having trouble visualizing the various methods people are talking
>> about.  It seems like we could open an issue and post patches with code
>> illustrating what each person is talking about?
>>
>> On Thu, May 21, 2009 at 10:02 AM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>>
>>> Actually, we started with the *Settings classes (to hold defaults),
>>> but then realized a simple actsAsVersion (single static method) would
>>> suffice for just the back-compat settings and then pushed further and
>>> thought perhaps we should relax our back-compat policy entirely so
>>> emulating older versions is not needed.
>>>
>>> So we no longer have the "defaults" class (*Settings).  We may still
>>> do it for the future (for its own benefits), but for just back-compat
>>> of default settings, it seems like overkill.
>>>
>>> But I agree, the index altering cases are spooky.  I think this'd make
>>> me favor going back to the actsAsVersion option instead of the hard
>>> flip on our back compat policy (at least for default settings; for API
>>> changes I think 1 whole minor release may be reasonable).
>>>
>>> Mike
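A single static actsAsVersion could be sketched as below. CompatVersion, BackCompat and TrailingDotRule are hypothetical names, and the version constants are placeholders; the point is only that each class whose default changed checks one global setting.

```java
// Hypothetical process-wide "acts as version" switch (names are illustrative).
enum CompatVersion {
    V2_3, V2_4, V2_9;

    boolean onOrAfter(CompatVersion other) {
        return compareTo(other) >= 0;
    }
}

final class BackCompat {
    // Defaults to the newest behavior; upgraders may pin an older one at startup.
    private static volatile CompatVersion actsAs = CompatVersion.V2_9;

    private BackCompat() {}

    static void setActsAsVersion(CompatVersion version) {
        actsAs = version;
    }

    static CompatVersion getActsAsVersion() {
        return actsAs;
    }
}

// Any class whose default changed consults the single static setting.
final class TrailingDotRule {
    private TrailingDotRule() {}

    static String apply(String token) {
        if (BackCompat.getActsAsVersion().onOrAfter(CompatVersion.V2_4)) {
            // Behavior assumed (for illustration) to be introduced in 2.4.
            return token.endsWith(".") ? token.substring(0, token.length() - 1) : token;
        }
        return token.replace(".", ""); // pre-2.4 behavior
    }
}
```

Because the setting is process-wide, it answers the question of where actsAs lives, at the cost that two indexes in the same JVM cannot emulate different versions.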
>>>
>>> On Thu, May 21, 2009 at 12:54 PM, Matthew Hall
>>> <mhall@informatics.jax.org> wrote:
>>> > Sorry, I wasn't quite sure what to call this new class you guys have been
>>> > talking about.
>>> >
>>> > I was referring to the class that's being discussed to encapsulate all of
>>> > the defaults for a given Lucene release (its caching strategies, etc.).
>>> >
>>> > I'm just not certain that something like a static list of words belongs in
>>> > a higher-level defaults class like you guys are talking about, especially
>>> > considering that anyone using a stop-enabled analyzer really should be
>>> > familiar with this list, and often needs to override it.
>>> >
>>> > Meh, now that I'm actually typing it out, perhaps I'm wrong here:
>>> > assuming this class you guys are describing will be well
>>> > advertised/documented, maybe it will actually make it easier for end
>>> > developers to twiddle with this list, or at least make them more aware
>>> > that it's something they are able to change.
>>> >
>>> > Matt
>>> >
>>> > Michael McCandless wrote:
>>> >>
>>> >> What is the "lucene defaults class"?
>>> >>
>>> >> Mike
>>> >>
>>> >> On Thu, May 21, 2009 at 12:37 PM, Matthew Hall
>>> >> <mhall@informatics.jax.org> wrote:
>>> >>
>>> >>>
>>> >>> For extreme examples like this, couldn't the stopword list be
>>> >>> encapsulated into a single class that's used by the Lucene defaults
>>> >>> class?
>>> >>>
>>> >>> That way, if you folks released updates to mostly static content like a
>>> >>> stopword list, new or old users could get them easily with a simple
>>> >>> drop-in fix?
>>> >>>
>>> >>> Just my two cents.
>>> >>>
>>> >>> Matt
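Keeping the stopword list in its own class, referenced by a defaults class, might look like this hypothetical sketch. EnglishStopWords, AnalyzerDefaults and the short word list are illustrative assumptions, not Lucene's actual list or API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical holder for mostly static content, kept separate from any
// "defaults" class so it can be updated (or replaced) on its own.
final class EnglishStopWords {
    private EnglishStopWords() {}

    // Illustrative subset only, not Lucene's actual list.
    static final Set<String> DEFAULT = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to")));
}

// Hypothetical defaults class: it refers to the holder instead of embedding
// the words, and callers can still pass their own set.
final class AnalyzerDefaults {
    private AnalyzerDefaults() {}

    static Set<String> stopWords() {
        return EnglishStopWords.DEFAULT;
    }

    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        return tokens.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
    }
}
```

Updating the list then means shipping a new EnglishStopWords, while users who need their own list simply pass a different set.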
>>> >>>
>>> >>> Michael McCandless wrote:
>>> >>>
>>> >>>>
>>> >>>> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rcmuir@gmail.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>>
>>> >>>>>
>>> >>>>> Even something as simple as changing the default stopword list for
>>> >>>>> some analyzer could be an issue, if the user doesn't re-index in
>>> >>>>> response to that change.
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>> OK, right.
>>> >>>>
>>> >>>> So say we forgot to include "the" in the default English stopwords
>>> >>>> list (yes, an extreme example...).
>>> >>>>
>>> >>>> Under the proposed changes 1 & 2 to the back-compat policy, we would
>>> >>>> add "the" to the default stopword list, so new users get the fix, but
>>> >>>> still keep the the-less list accessible (deprecated). We'd add an
>>> >>>> entry in CHANGES.txt saying this happened, and then show code for how
>>> >>>> to get back to the the-less stopword list.
>>> >>>>
>>> >>>> New users of that StopFilter would properly see "the" filtered out.
>>> >>>> Users who upgraded would need to fix their code to switch back to the
>>> >>>> deprecated the-less list.
>>> >>>>
>>> >>>> Mike
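The the-less stopword scenario could be sketched as follows. StopWordLists, SimpleStopFilter and the word lists are hypothetical; the deprecated constant keeps the old list reachable, and new SimpleStopFilter(StopWordLists.ENGLISH_PRE_FIX) is the kind of one-line migration a CHANGES.txt entry might show.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stopword constants illustrating the "add 'the', keep the
// old list deprecated" scenario; the word lists are illustrative only.
final class StopWordLists {
    private StopWordLists() {}

    /**
     * The list as shipped before the fix (missing "the").
     * @deprecated kept only so upgraded users can keep matching an existing index.
     */
    @Deprecated
    static final Set<String> ENGLISH_PRE_FIX = Set.of("a", "an", "and", "of", "to");

    /** The corrected default, now including "the". */
    static final Set<String> ENGLISH = Set.of("a", "an", "and", "of", "the", "to");
}

final class SimpleStopFilter {
    private final Set<String> stopWords;

    // New users get the corrected list by default.
    SimpleStopFilter() {
        this(StopWordLists.ENGLISH);
    }

    SimpleStopFilter(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    List<String> filter(List<String> tokens) {
        return tokens.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
    }
}
```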
>>> >>>>
>>> >>>>
>>> >>>> ---------------------------------------------------------------------
>>> >>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> >>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>
>>> >>> --
>>> >>> Matthew Hall
>>> >>> Software Engineer
>>> >>> Mouse Genome Informatics
>>> >>> mhall@informatics.jax.org
>>> >>> (207) 288-6012
>>> >>
>>> >
>>>
>>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
