lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: When to use addIndexes and when addIndexesNoOptimize
Date Mon, 10 May 2010 21:13:32 GMT
On Mon, May 10, 2010 at 3:08 PM, Shai Erera <serera@gmail.com> wrote:
> That's still weird Mike - we call optimize in addIndexes to reduce the
> number of SRs, that's fair. So why don't we do that in addIndexesNoOpt?

I agree it's weird and inconsistent and all that :)

> There, we get a SR per SI. And the name of the method suggests optimize() is
> avoided on purpose ... it's as if addIndexesNoOpt should be called
> addDirectories, and we should let the caller decide whether to call
> optimize() on all IRs (including the local) before he calls addIndexes, or
> NoOpt.

Well... there used to be an addIndexes(Directory..), that did an
optimize, I think both before and after.  So addIndexesNoOpt was
reacting to that.

> I mean, we give those methods confusing names, and don't follow the same
> approach when handling each ... I can live w/ addIndexes existing to take IR
> extensions, and w/ addDirectories if you don't need IR extensions. But
> calling/not-calling optimize() is inconsistent, and from what I understand,
> for no good reason?

It's for a good reason -- it's to attempt to ensure that the single
merge done by that method isn't insanely slow, if your index has a lot
of segments.

But, really, those merges ought to go through a merge
policy/scheduler, so we do mergeFactor at a time, we do up to N
concurrently, etc.

So I think this pre-optimize is a hack to try to contain how many
readers we merge at once.
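
To illustrate the mergeFactor point, here is a toy Java model. It is not
Lucene code: the class and method names are hypothetical, and it only
counts how many inputs each merge would read if merges were capped at
mergeFactor, instead of one merge reading every segment at once.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names exist in Lucene.
class MergeFactorSketch {

    /**
     * Simulates merging segmentCount segments down to one, where each
     * merge reads at most mergeFactor inputs and writes one output.
     * Returns the number of inputs read by each merge, in order.
     */
    static List<Integer> mergeInBatches(int segmentCount, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> mergeWidths = new ArrayList<>();
        int remaining = segmentCount;
        while (remaining > 1) {
            // Never read more than mergeFactor segments in a single merge.
            int width = Math.min(mergeFactor, remaining);
            mergeWidths.add(width);
            // 'width' inputs are replaced by one merged segment.
            remaining = remaining - width + 1;
        }
        return mergeWidths;
    }

    public static void main(String[] args) {
        // 25 segments, at most 10 per merge.
        System.out.println(mergeInBatches(25, 10));
    }
}
```

With 25 segments and a cap of 10, this prints [10, 10, 7]: three bounded
merges rather than a single 25-way merge. A real merge policy also weighs
segment sizes, and a scheduler may run independent merges concurrently;
the sketch ignores both.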

> I'm asking these questions b/c someone asked me the other day when one
> should call each and what the hell that NoOpt is doing in the name ... I was
> confused when I was asked the question, and I'm confused now :).

I hear you...

> So how about if we:
> 1) Rename addIndexesNoOptimize to addDirectories

Hmm, addDirectories feels a bit too low level... why not call it
addIndexes (it's a different signature since it accepts Dir, not IR)?

> 2) Remove optimize() call from addIndexes

+1

But advertise this as a back-compat break.  We could also preserve the
old way under Version.

> 3) Document that clearly in both, w/ a recommendation to call optimize()
> before on any of the Directories/Indexes if it's a concern.

Good.

> That way, we maintain all the flexibility in the API - addIndexes allows for
> using IR extensions, addDirectories is considered more efficient, by
> allowing the merges to happen concurrently (depending on MS) and also
> factors in the MP. So unless you have an IR extension, addDirectories is
> really the one you should be using. And you have the freedom to call
> optimize() before each if you care about it, or don't if you don't care.
> Either way, incurring the cost of optimize() is entirely in your hands.

Good!

Mike
