lucene-dev mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Lucene's default settings & back compatibility
Date Mon, 18 May 2009 21:54:53 GMT
Yeah, makes sense. Getting in depth with Lucene and then seeing
real-world usage, most users still do use the defaults. I think
I will try to help with this by writing some wiki pages on new
features. Probably this OldSettings/NewSettings model is a good
start for a wiki page?

Our current wiki FAQ is a bit long, so it should help to have a
new page that goes over configurations for different goals.

On Mon, May 18, 2009 at 2:21 PM, Michael Busch <buschmic@gmail.com> wrote:

> +1. this would be great!
>
>  Michael
>
>
> On May 18, 2009, at 2:06 PM, Michael McCandless <lucene@mikemccandless.com>
> wrote:
>
>> As we all know, Lucene's back-compat policy necessarily hurts the
>> out-of-the-box experience for new users: because we are only allowed
>> to make substantial improvements to Lucene's default settings at a major
>> release, new users won't see the improvements to our settings until a
>> major release (typically years apart).
>>
>> Lucene has a number of default settings, eg some recent examples:
>>
>>  * Read-only IndexReader gives much better performance with threads,
>>   yet we must now default IndexReader.open to return a non-readOnly
>>   reader
>>
>>  * We can now optionally turn off scoring when sorting by field
>>   (sizable speed gain), but we had to leave it on by default until
>>   3.0
>>
>>  * Letting IndexReader.norms return null
>>
>>  * LogMergePolicy now takes deletions into account, but we had to
>>   disable it by default, since it could conceivably break back
>>   compat.
>>
>>  * Bug fixes in StandardAnalyzer must be delayed until 3.0 since
>>   there's a remote chance they'd break back compat in an app, or we
>>   end up adding confusing methods like "public static void
>>   setDefaultReplaceInvalidAcronym".
>>
>>  * NIOFSDirectory ought to be "the default" on UNIX, but it's not
>>
>>  * Constant score rewrite ought to be the default for most multi-term
>>   queries
>>
>>  * StopFilter should enable position increments by default
>>
>> The fact that we are "forced" to delay such "out of the box" improvements
>> to Lucene for so long is a frustrating cost, since it can only stunt
>> Lucene's adoption and growth and my sense is that it's a minority of
>> Lucene's users that need such strict back-compat (this has been
>> discussed before).  It also clutters our APIs because we end up
>> creating setters/getters that often only exist for the sake of a back
>> compat preservation of a bug.
>>
>> I think we can fix this.  Ie, maintain our strong back-compat policy,
>> yet still allow new users to experience the best of Lucene on every
>> release (not just on major releases), by creating an explicit class
>> that holds settings/defaults used by Lucene.
>>
>> For example, say we create a base class named Settings.  It holds the
>> defaults for settings across all of Lucene's classes. When you create
>> IndexReader, IndexWriter and others, you must pass in a Settings
>> instance.
>>
>> A subclass, SettingsMatching24, binds all settings to "match" 2.4's
>> behavior.  When we make improvements in 2.9, we'd add the back-compat
>> settings to SettingsMatching24.  So if your app wants to keep exactly
>> 2.4's behavior, you'd pass in SettingsMatching24().  On upgrading to
>> 2.9 you'd still see 2.4's behavior.
>>
>> Users who'd like to see Lucene's improvements on each minor release
>> would instead instantiate LatestAndGreatestSettings() (or
>> CurrentVersionSettings(), or something), understanding that when they
>> upgrade there might be biggish changes to Lucene's defaults.  My guess
>> is most users would use this settings class.
>>
>> Doug actually suggested this exact idea a while back:
>>
>>  http://www.gossamer-threads.com/lists/lucene/java-dev/54421#54421
>>
>> Now that I realize we could use this to strongly decouple "users
>> wanting precise back-compat" from "users wanting the latest &
>> greatest", I think it's a very compelling solution.
>>
>> If we do this I'd like to do it in 2.9, so that starting with 3.x we
>> are free to change default settings w/o breaking back compat.
>>
>> Thoughts?
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
>
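
For a wiki page, the Settings idea quoted above could be sketched roughly as follows. This is only an illustration and not actual Lucene code: the class names Settings, SettingsMatching24 and LatestAndGreatestSettings come from the proposal, but the three accessor methods and the ReaderStub class are hypothetical names made up for this sketch. The values mirror three of the defaults listed in the thread (non-readOnly reader, scoring left on when sorting by field, StopFilter position increments off in the 2.4 behavior).

```java
// Hypothetical sketch only: none of these classes exist in Lucene in this form.

/** Base class for the defaults; accessor names are assumptions for illustration. */
abstract class Settings {
    /** Whether opening an index reader should return a read-only reader. */
    abstract boolean readOnlyReader();

    /** Whether documents are still scored when sorting by field. */
    abstract boolean scoreWhenSortingByField();

    /** Whether StopFilter enables position increments. */
    abstract boolean stopFilterPositionIncrements();
}

/** Pins each setting to the 2.4 behavior described in the thread. */
class SettingsMatching24 extends Settings {
    @Override boolean readOnlyReader() { return false; }
    @Override boolean scoreWhenSortingByField() { return true; }
    @Override boolean stopFilterPositionIncrements() { return false; }
}

/** Tracks the improved defaults; these may change on any minor release. */
class LatestAndGreatestSettings extends Settings {
    @Override boolean readOnlyReader() { return true; }
    @Override boolean scoreWhenSortingByField() { return false; }
    @Override boolean stopFilterPositionIncrements() { return true; }
}

/** Stand-in for a class that must be handed a Settings instance when created. */
class ReaderStub {
    private final boolean readOnly;

    ReaderStub(Settings settings) {
        // The default comes from the Settings object, not from a hard-coded value.
        this.readOnly = settings.readOnlyReader();
    }

    boolean isReadOnly() { return readOnly; }
}

class SettingsDemo {
    public static void main(String[] args) {
        ReaderStub pinned = new ReaderStub(new SettingsMatching24());
        ReaderStub latest = new ReaderStub(new LatestAndGreatestSettings());
        System.out.println("2.4-compatible reader read-only: " + pinned.isReadOnly());
        System.out.println("latest-settings reader read-only: " + latest.isReadOnly());
    }
}
```

An app that passes SettingsMatching24 keeps the same behavior across upgrades, because any later improvement would get a matching back-compat entry in that class. An app that passes LatestAndGreatestSettings picks up new defaults on each release.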
