lucene-dev mailing list archives

From Michael McCandless
Subject Re: Lucene's default settings & back compatibility
Date Fri, 22 May 2009 17:22:24 GMT
On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey wrote:
>> when working on 3.1 if we make some great improvement, I'd like new users in
>> 3.1 to see the improvement by default.
> Sounds like an argument for more frequent major releases.

Yeah.  Or "rebranding" what we now call minor as major releases, by
changing our policy ;) Or "rebranding" to Lucene 2009.

But: localized improvements (like the sizable performance gain from
turning off scoring when sorting by field) should not have to wait for
a major release to benefit new users.  I think they should be on by
default on the next release.

Will Lucy do scoring when sorting by field, by default?

>> On thinking about it more... automagically storing the "actsAsVersion"
>> in the index, and then having IndexWriter (for example) ask the
>> analyzer for a tokenStream matching that version, seems a little too
>> sneaky.
> Can you elaborate?
> In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter
> have to be passed a Schema, which contains all the Analyzers.  Analyzers
> aren't satellite classes under this model -- they are a fixed property of a
> FullTextType field spec.  Think of them as baked into an SQL field definition.
> You can create a Schema from scratch to pass to the QueryParser, but it's
> easier to just get it from the Searcher.  Translating to Java...
>   Searcher searcher = new Searcher("/path/to/index");
>   QueryParser qparser = new QueryParser(searcher.getSchema());
> I don't see how that's so different from getting an analyzer actsAsVersion
> number from the index.

I agree that in KS/Lucy it works well, because you must explicitly pass
the Schema to each of the satellite classes.

But in Lucene, if IndexWriter passed in the actsAsVersion it had loaded
from the index whenever it asked the analyzer for a tokenStream, that
would be sneaky.  I'd rather have it explicit (like KS/Lucy), so you'd
have to call IndexWriter.getActsAsVersion, then pass that into your
analyzer when you create it.  It's the automatic under-the-hood
passing that makes me nervous, and I think it would confuse users.
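
A minimal sketch of the explicit approach, assuming stand-in classes: the
enum, SketchAnalyzer and SketchIndexWriter below are hypothetical and are
not Lucene's API, and the "2.9 lowercases tokens" rule is invented purely
to make the version visible.  The point is only that the caller fetches
the version and hands it to the analyzer itself:

```java
// Hypothetical names throughout; none of this is Lucene's actual API.
enum ActsAsVersion { V2_4, V2_9 }

// Stand-in for an analyzer whose tokenization changed between releases.
final class SketchAnalyzer {
    private final ActsAsVersion version;

    SketchAnalyzer(ActsAsVersion version) {
        this.version = version;
    }

    // Invented behavior change: from V2_9 on, tokens are lowercased.
    String[] tokenize(String text) {
        String[] tokens = text.trim().split("\\s+");
        if (version.compareTo(ActsAsVersion.V2_9) >= 0) {
            for (int i = 0; i < tokens.length; i++) {
                tokens[i] = tokens[i].toLowerCase(java.util.Locale.ROOT);
            }
        }
        return tokens;
    }
}

// Stand-in for IndexWriter: it only reports the version recorded in the index.
final class SketchIndexWriter {
    private final ActsAsVersion recorded;

    SketchIndexWriter(ActsAsVersion recorded) {
        this.recorded = recorded;
    }

    ActsAsVersion getActsAsVersion() {
        return recorded;
    }
}

final class ExplicitVersionDemo {
    public static void main(String[] args) {
        SketchIndexWriter writer = new SketchIndexWriter(ActsAsVersion.V2_4);
        // The caller, not the writer, passes the version to the analyzer.
        SketchAnalyzer analyzer = new SketchAnalyzer(writer.getActsAsVersion());
        System.out.println(String.join(",", analyzer.tokenize("Hello World")));
        // prints Hello,World
    }
}
```

Nothing here happens behind the user's back: the only place the index's
version reaches the analyzer is the constructor call in main.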

(That said, unrelated to this discussion, I would actually like to
record per-segment which version of Lucene wrote the segment; this
would be very helpful when debugging issues like LUCENE-1474 where I
need to know if the segments were written by 2.4.0 or 2.4.1).

> Now, where stuff might start to get complicated is PerFieldAnalyzerWrapper...
> is that where the sneakiness gets overwhelming?

Per-class actsAsVersion would work well here -- PFAW would just
forward the required version when requesting the tokenStream.
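
One possible reading of that forwarding, as a rough sketch: the
VersionedTokenizer interface, the wrapper class and the integer version
encoding (24, 29) are all hypothetical, and the real
PerFieldAnalyzerWrapper has no such version parameter.  The wrapper
carries no version logic of its own; it picks the per-field delegate and
passes the requested version through unchanged:

```java
// Hypothetical interface: the version is supplied with each tokenStream request.
interface VersionedTokenizer {
    String[] tokenStream(String field, String text, int actsAsVersion);
}

// Hypothetical per-field wrapper that forwards the requested version as-is.
final class PerFieldWrapperSketch implements VersionedTokenizer {
    private final VersionedTokenizer defaultAnalyzer;
    private final java.util.Map<String, VersionedTokenizer> perField =
            new java.util.HashMap<>();

    PerFieldWrapperSketch(VersionedTokenizer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, VersionedTokenizer analyzer) {
        perField.put(field, analyzer);
    }

    @Override
    public String[] tokenStream(String field, String text, int actsAsVersion) {
        VersionedTokenizer delegate = perField.getOrDefault(field, defaultAnalyzer);
        // No version handling here: the value is simply passed to the delegate.
        return delegate.tokenStream(field, text, actsAsVersion);
    }
}

final class ForwardingDemo {
    public static void main(String[] args) {
        // Version-independent analyzer: always splits on whitespace.
        VersionedTokenizer whitespace = (f, t, v) -> t.trim().split("\\s+");
        // Invented version-dependent analyzer: one token, lowercased from 29 on.
        VersionedTokenizer keyword = (f, t, v) ->
                new String[] { v >= 29 ? t.toLowerCase(java.util.Locale.ROOT) : t };

        PerFieldWrapperSketch wrapper = new PerFieldWrapperSketch(whitespace);
        wrapper.addAnalyzer("id", keyword);

        System.out.println(String.join("|", wrapper.tokenStream("id", "AB 1", 29)));
        System.out.println(String.join("|", wrapper.tokenStream("body", "AB 1", 29)));
    }
}
```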

>> I prefer the up-front "you specify actsAsVersion" when you
>> create the analyzer, only for analyzers that have changed across
>> releases.  So things like WhitespaceAnalyzer would likely never need
>> an actsAsVersion arg.
> Hmm, this is kind of hard.  I'd prefer that the argument remain optional, so
> that new users don't have to think about it.

I wouldn't mind optional, but only if it defaults to latest and
greatest.  The goal here is to have new users always see the best of
Lucene when they start out.
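
A small sketch of "optional, but defaulting to latest", using a
hypothetical ReleaseVersion enum and DefaultingAnalyzer class (neither
exists in Lucene).  The no-argument constructor is what a new user would
reach for, and it resolves to the newest behavior; existing users can pin
an older version explicitly:

```java
// Hypothetical version enum; LATEST is an alias for the newest constant.
enum ReleaseVersion {
    V2_4, V2_9;

    static final ReleaseVersion LATEST = V2_9;
}

// Hypothetical analyzer with an optional version argument.
final class DefaultingAnalyzer {
    private final ReleaseVersion version;

    // New users: no argument, newest behavior.
    DefaultingAnalyzer() {
        this(ReleaseVersion.LATEST);
    }

    // Existing users: pin the behavior their index was built with.
    DefaultingAnalyzer(ReleaseVersion version) {
        this.version = version;
    }

    ReleaseVersion getVersion() {
        return version;
    }
}

final class DefaultVersionDemo {
    public static void main(String[] args) {
        System.out.println(new DefaultingAnalyzer().getVersion());
        System.out.println(new DefaultingAnalyzer(ReleaseVersion.V2_4).getVersion());
    }
}
```

The trade-off is the one discussed above: an application that relies on
the default would see its analyzer's behavior change when it upgrades,
unless it pins the version.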

