lucene-dev mailing list archives

From DM Smith <dmsmith...@gmail.com>
Subject Re: Lucene's default settings & back compatibility
Date Tue, 19 May 2009 12:26:13 GMT

On May 19, 2009, at 7:45 AM, Michael McCandless wrote:

> On Tue, May 19, 2009 at 6:47 AM, DM Smith <dmsmith555@gmail.com> wrote:
>
>> It is common in my application, a Bible program, that indexes each verse
>> (think of a verse as a numbered sentence) as a separate document. We index
>> everything, including words that are typically stop words as those might be
>> important to our end users. Besides this, the top 280 word roots represent
>> 90% of the occurrences.
>> And on searches, we return everything in book order, unless the user wants
>> to score the result. In that case, we return a small, user configurable
>> amount of hits ordered by score.
>
> The ability to turn off scoring when sorting by field, new in 2.9,
> should be a good performance boost for your use case (if performance
> is important).
>
>> And we are using Lucene out of the box for the most part. We've deviated
>> only to incrementally solve performance problems.
>
> Right, my impression is most people will stick w/ Lucene's defaults,
> incrementally changing only limited settings they come across, which
> is why selecting good defaults is vital to Lucene's growth/adoption
> (new users especially simply start w/ our defaults).
>
> But we can't pick good defaults when we're so heavily bound by back-compat.
>
> Which is why I find the Settings approach so appealing :)  Suddenly,
> on all improvements to Lucene, we have the freedom to change our
> defaults so a new user sees all such improvements.

From my perspective as a user:
Backward compatibility is important, but it is not a be-all and end-all.

To me, if I can drop in the new jar and get bug fixes, that's great. My
expectation is that searches against an existing index will still
return the same results or, in the case of bug fixes, better ones.

What I need to know is when that is not the case. Today, we use a
naming convention for the Lucene jars to indicate whether that is true.
I'd be just as happy if there were a compatibility level that I could
check (I already have to do that in our own code, because I change our
analyzers often enough to be embarrassed).

The problem, which might be addressed in the "fixing" of core vs.
contrib, is that we use lots of contrib (analyzers, snowball,
highlighting) and want it to maintain backward compatibility too. (I'm
happy that has been the case!) So perhaps there should be a
compatibility level per contrib module.
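To make the idea concrete, here is a minimal sketch of such a check in Java. Everything in it is hypothetical: Lucene does not expose per-module compatibility levels, and the module names, the integer levels and the class `CompatibilityCheck` are invented for illustration. The application records the levels in effect when the index was built (here as properties text) and, at startup, lists the modules whose level has changed and may therefore need a reindex.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical sketch: none of these names exist in Lucene. */
public class CompatibilityCheck {

    /** Levels the jars on the classpath would report (hard-coded here for illustration). */
    public static Map<String, Integer> runtimeLevels() {
        Map<String, Integer> levels = new TreeMap<>();
        levels.put("core", 3);
        levels.put("analyzers", 2);
        levels.put("snowball", 1);
        return levels;
    }

    /** Parses the levels recorded next to the index, stored as "module=level" lines. */
    public static Map<String, Integer> storedLevels(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> levels = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            levels.put(name, Integer.parseInt(props.getProperty(name).trim()));
        }
        return levels;
    }

    /** Returns, in module-name order, the modules whose level differs from the stored one. */
    public static List<String> incompatible(Map<String, Integer> stored, Map<String, Integer> runtime) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Integer> e : runtime.entrySet()) {
            Integer old = stored.get(e.getKey());
            // A missing entry means the index predates this module's level: treat it as changed.
            if (old == null || !old.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // The index was built when the analyzers module was still at level 1.
        String stored = "core=3\nanalyzers=1\nsnowball=1\n";
        List<String> changed = incompatible(storedLevels(stored), runtimeLevels());
        // Prints "reindex needed for [analyzers]"
        System.out.println(changed.isEmpty() ? "index is compatible" : "reindex needed for " + changed);
    }
}
```

Keeping one level per module, rather than one for all of Lucene, means an analyzer change in contrib only flags indexes that actually depend on that module.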

The packagers for jpackage consider nearly every release of Lucene to
break backward compatibility, because they treat Lucene as a whole.
Perhaps the same is true of other Linux distributions. But because
backward compatibility does not apply strictly to contrib, one cannot
reliably use Lucene from distributions unless such a policy is in
place.

In any case, I don't think anyone should just drop in a new jar
without some testing. At a minimum, they should compile with
deprecation warnings turned on.

Regarding deprecations, I'd also be just as happy if a method were marked
	@deprecated This behavior <b>has</b> changed in this release, 2.4.3.
That is, as a warning of changed behavior.

The warning could then be removed in the 3.0 release.

But then again, my use of Lucene, while very important to my  
application, is very simple and easy to change.

-- DM




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

