lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: When to use addIndexes and when addIndexesNoOptimize
Date Mon, 10 May 2010 19:08:40 GMT
That's still weird, Mike: we call optimize() in addIndexes to reduce the
number of SRs, which is fair. So why don't we do that in addIndexesNoOpt?
There, we get one SR per SI. And the name of the method suggests optimize()
is avoided on purpose ... it's as if addIndexesNoOpt should be called
addDirectories, and we should let the caller decide whether to call
optimize() on all IRs (including the local one) before calling addIndexes
or NoOpt.

I mean, we give those methods confusing names, and don't follow the same
approach when handling each ... I can live w/ addIndexes existing to take IR
extensions, and w/ addDirectories if you don't need IR extensions. But
calling optimize() in one and not in the other is inconsistent, and from
what I understand, for no good reason?

I'm asking these questions b/c someone asked me the other day when one
should call each and what the hell that NoOpt is doing in the name ... I was
confused when I was asked the question, and I'm confused now :).

So how about if we:
1) Rename addIndexesNoOptimize to addDirectories
2) Remove optimize() call from addIndexes
3) Document that clearly in both, w/ a recommendation to call optimize()
on any of the Directories/Indexes before adding them, if it's a concern.

That way, we keep all the flexibility in the API: addIndexes allows for
using IR extensions, while addDirectories is more efficient, since it allows
the merges to happen concurrently (depending on the MergeScheduler) and also
takes the MergePolicy into account. So unless you have an IR extension,
addDirectories is really the one you should be using. And you have the
freedom to call optimize() beforehand if you care about it, or not if you
don't. Either way, incurring the cost of optimize() is entirely in your hands.
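To make the proposal concrete, here is a toy model in plain Java with no Lucene dependency. It is hypothetical: ToyIndex, optimize() and addDirectories() below only mimic the semantics proposed above, treat an index as a list of per-segment doc counts, and ignore the MergePolicy and MergeScheduler entirely. The point it shows is that optimize() is never called implicitly; the caller decides which indexes, if any, to optimize first.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model, NOT Lucene code: an "index" is a list of segment
 * sizes (doc counts). It mimics the proposed addDirectories semantics, where
 * no optimize() happens implicitly and the caller chooses.
 */
public class AddIndexesModel {

    public static final class ToyIndex {
        // One entry per segment; the value is that segment's doc count.
        final List<Integer> segments = new ArrayList<>();

        public ToyIndex(int... segmentSizes) {
            for (int s : segmentSizes) {
                segments.add(s);
            }
        }

        public int docCount() {
            int total = 0;
            for (int s : segments) {
                total += s;
            }
            return total;
        }

        /** Collapse all segments into a single one, as optimize() would. */
        public void optimize() {
            if (segments.size() <= 1) {
                return; // already optimized (or empty): nothing to do
            }
            int total = docCount();
            segments.clear();
            segments.add(total);
        }

        /**
         * Proposed semantics: incoming segments are added as-is, one local
         * segment per incoming segment, with no implicit optimize(). A real
         * MergePolicy might merge afterwards; this model leaves that out.
         */
        public void addDirectories(ToyIndex... others) {
            for (ToyIndex other : others) {
                segments.addAll(other.segments);
            }
        }
    }

    public static void main(String[] args) {
        ToyIndex local = new ToyIndex(100, 50);
        ToyIndex a = new ToyIndex(10, 10, 10);
        ToyIndex b = new ToyIndex(5);

        // The caller opts in to optimizing the incoming indexes,
        // but deliberately leaves the local index's segments alone.
        a.optimize();
        b.optimize();
        local.addDirectories(a, b);

        System.out.println(local.segments); // [100, 50, 30, 5]
    }
}
```

If the caller also wants a single local segment, it calls local.optimize() explicitly afterwards, paying that cost only by choice.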

What do you think?

Shai

On Mon, May 10, 2010 at 9:33 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:

> On Mon, May 10, 2010 at 2:18 PM, Shai Erera <serera@gmail.com> wrote:
> > Ahh, I see. Didn't think of IndexReader extensions. Why do we call
> > optimize() on the local dir in addIndexes then? What are the benefits?
>
> I really don't know! Maybe to handle the case where the local index has
> many segments? I.e., to reduce the net number of readers open?
>
> I would think "typically" a smallish number of foreign indexes are
> added to a largish number of local segments?
>
> We should at least make it optional to do the optimize...
>
> > We don't do the same on the incoming readers, so why does it matter if,
> > e.g., the local dir has 2 segments and the incoming ones have 100? We
> > insist on optimizing the local 2 segments ...
> >
> > BTW, addIndexesNoOpt does not obtain a reader, but rather reads the SIs
> > from each directory and then calls maybeMerge().
>
> Ahh right, thanks for the clarification.
>
> Mike
>
