lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: When to use addIndexes and when addIndexesNoOptimize
Date Tue, 11 May 2010 03:38:36 GMT
>
> Hmm addDirectories feels a bit too low level
>

I don't mind calling it addIndexes(Directory...), but I don't think it's too
low level - whoever calls the method passes Directory... and that's
exactly what the method does :). Two addIndexes overloads force you to go read
the jdoc, but so would addDirectories. I don't mind either way.

> But advertise this in back compat breaks.
>

I don't think it's a bw break? More of a runtime change IMO. True, it can
affect performance, but did we ever measure addIndexes w/ and w/o
optimize()? Are we sure that calling optimize() first makes the SM merges
that follow perform better? I mean, on paper it should. But since we do it
only for the target index, we don't really know what's happening in users'
apps. It's only documentation in CHANGES, so it can go into either section
(BACKWARDS or RUNTIME). I prefer the latter. It can go into trunk's CHANGES
the same way.

> Good!
>

I'll open an issue to track this.
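
To make the proposal concrete, here is a rough sketch of the behavior being
discussed. It is a toy model and not Lucene code: AddIndexesSketch, ToyIndex
and the per-segment doc counts are all made up for illustration, and an index
is reduced to a list of segment sizes. The only point it tries to show is that
addIndexes no longer optimizes implicitly, so the caller decides whether to
pay for optimize().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposal; none of these types are Lucene classes. */
public class AddIndexesSketch {

    /** A hypothetical index, reduced to the doc count of each segment. */
    static class ToyIndex {
        final List<Integer> segments = new ArrayList<>();

        ToyIndex(int... docCounts) {
            for (int n : docCounts) {
                segments.add(n);
            }
        }

        /** Collapses all segments into a single one, standing in for optimize(). */
        void optimize() {
            int total = docCount();
            segments.clear();
            if (total > 0) {
                segments.add(total);
            }
        }

        /** Proposed behavior: append the incoming segments as-is, no implicit optimize(). */
        void addIndexes(ToyIndex... others) {
            for (ToyIndex other : others) {
                segments.addAll(other.segments);
            }
        }

        int segmentCount() {
            return segments.size();
        }

        int docCount() {
            int total = 0;
            for (int n : segments) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        ToyIndex target = new ToyIndex(10, 20);
        ToyIndex a = new ToyIndex(5, 5, 5);
        ToyIndex b = new ToyIndex(7);

        // A caller who doesn't care about segment count just adds the indexes.
        target.addIndexes(a, b);
        System.out.println(target.segmentCount() + " segments, " + target.docCount() + " docs");

        // A caller who does care pays for optimize() explicitly.
        target.optimize();
        System.out.println(target.segmentCount() + " segments, " + target.docCount() + " docs");
    }
}
```

In the real thing the added segments would still be subject to whatever the
MP and MS decide to merge afterwards; the sketch leaves that part out.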

Shai

On Tue, May 11, 2010 at 12:13 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Mon, May 10, 2010 at 3:08 PM, Shai Erera <serera@gmail.com> wrote:
> > That's still weird Mike - we call optimize in addIndexes to reduce the
> > number of SRs, that's fair. So why don't we do that in addIndexesNoOpt?
>
> I agree it's weird and inconsistent and all that :)
>
> > There, we get a SR per SI. And name of the method suggests optimize() is
> > avoided on purpose ... it's as if addIndexesNoOpt should be called
> > addDirectories, and we should let the caller decide whether to call
> > optimize() on all IRs (including the local) before he calls addIndexes, or
> > NoOpt.
>
> Well... there used to be an addIndexes(Directory..), that did an
> optimize, I think both before and after.  So addIndexesNoOpt was
> reacting to that.
>
> > I mean, we give those methods confusing names, and don't follow the same
> > approach when handling each ... I can live w/ addIndexes existing to take IR
> > extensions, and w/ addDirectories if you don't need IR extensions. But
> > calling/not-calling optimize() is inconsistent, and from what I understand,
> > for no good reason?
>
> It's for a good reason -- it's to attempt to ensure that the single
> .merge done by that method isn't insanely slow, if your index has a lot
> of segments.
>
> But, really, those merges ought to go through a merge
> policy/scheduler, so we do mergeFactor at a time, we do up to N
> concurrently, etc.
>
> So I think this pre-optimize is a hack to try to keep the number of
> readers we merge at once contained.
>
> > I'm asking these questions b/c someone asked me the other day when one
> > should call each and what the hell that NoOpt is doing in the name ... I was
> > confused when I was asked the question, and I'm confused now :).
>
> I hear you...
>
> > So how about if we:
> > 1) Rename addIndexesNoOptimize to addDirectories
>
> Hmm addDirectories feels a bit too low level... why not call it
> addIndexes (it's a different signature since it accepts Dir not IR).
>
> > 2) Remove optimize() call from addIndexes
>
> +1
>
> But advertise this in back compat breaks.  We could also preserve old
> way under Version.
>
> > 3) Document that clearly in both, w/ a recommendation to call optimize()
> > before on any of the Directories/Indexes if it's a concern.
>
> Good.
>
> > That way, we maintain all the flexibility in the API - addIndexes allows for
> > using IR extensions, addDirectories is considered more efficient, by
> > allowing the merges to happen concurrently (depending on MS) and also
> > factors in the MP. So unless you have an IR extension, addDirectories is
> > really the one you should be using. And you have the freedom to call
> > optimize() before each if you care about it, or don't if you don't care.
> > Either way, incurring the cost of optimize() is entirely in your hands.
>
> Good!
>
> Mike
>
