lucene-dev mailing list archives

From "Yonik Seeley" <>
Subject Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
Date Wed, 06 Sep 2006 01:50:19 GMT
On 9/5/06, Ning Li <> wrote:
> > What about an invariant that says the number of main index segments
> > with the same level (f(n)) should be less than M.
> That is exactly what the second property says:
> "Less than M number of segments whose doc count n satisfies B*(M^c) <=
> n < B*(M^(c+1)) for any c >= 0."
> In other words, less than M number of segments with the same f(n).

Ah, I had missed that.  But I don't believe that Lucene currently
obeys this in all cases.
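
To make the quoted property concrete, here is a minimal Python sketch (not Lucene code) of the level function f(n) and a check for levels that hold M or more segments. The names `level` and `crowded_levels`, and the use of -1 for segments with fewer than B docs, are assumptions made for illustration; B and M are as in the quoted property.

```python
from collections import Counter


def level(n, B, M):
    """Return c such that B*M**c <= n < B*M**(c+1), or -1 when n < B.

    The -1 value for small segments is an illustrative convention only.
    """
    if B < 1 or M < 2:
        raise ValueError("expected B >= 1 and M >= 2")
    if n < B:
        return -1
    c = 0
    bound = B * M  # exclusive upper bound of level c
    while n >= bound:
        bound *= M
        c += 1
    return c


def crowded_levels(doc_counts, B, M):
    """Return the levels c >= 0 that hold M or more segments.

    An empty result means the property "less than M segments per level"
    holds for the given per-segment doc counts.
    """
    per_level = Counter(level(n, B, M) for n in doc_counts)
    return sorted(c for c, k in per_level.items() if c >= 0 and k >= M)
```

With B=10 and M=3, for instance, segments of 10, 11 and 12 docs all sit at level 0, so `crowded_levels([10, 11, 12, 100], 10, 3)` reports level 0.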

> > I am concerned about corner cases causing tons of segments and slowing
> > search or causing errors due to file descriptor exhaustion.
> >
> > When merging, maybe we should count the number of segments at a
> > particular index level f(n), rather than adding up the number of
> > documents.  In the presence of deletions, this should lead to faster
> > indexing (due to less frequent merges) I think.
> Given M, B and an index which has L (0 < L < M) segments with docs
> less than B, how many ram docs should be accumulated before a merge is
> triggered? B is not good. B-sum(L) is the old strategy which has
> problems.

The new IndexWriter changes add an additional constraint: to delete
documents efficiently, the first merge must be on buffered documents
only to ensure that ids don't change.  We should also explore changing
the index invariants to accommodate this.

Do you have any ideas in this area?  Is a monotonically decreasing
segment level (your f(n)) really required?

> So between B-sum(L) and B? Once there are M segments with
> docs less than B, they'll be merged. But what if L=0? Should B ram
> docs be accumulated before flushed in that case?

It seems like it.  Examples are easier to visualize sometimes... do
you have an example where this wouldn't be advisable?

> In any case, if flushing ram docs causes the number of segments
> with <B docs to reach M in close(), a merge with those segments should
> be triggered.
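
The trigger described in that last quoted paragraph can be sketched as follows. This is a rough Python illustration under stated assumptions, not the IndexWriter implementation: `flush_and_maybe_merge` is a hypothetical name, segments are modeled only by their doc counts, deletions are ignored, and the small (<B docs) segments are assumed to sit at the tail of the list.

```python
def flush_and_maybe_merge(segments, ram_docs, B, M):
    """Flush ram_docs as a new segment, then merge the small segments
    if their number has reached M.

    segments: per-segment doc counts, oldest first (illustrative model).
    Returns the new list of per-segment doc counts.
    """
    if ram_docs > 0:
        segments = segments + [ram_docs]  # the flushed ram docs become a segment
    small = [n for n in segments if n < B]
    if len(small) < M:
        return segments  # fewer than M small segments: no merge
    large = [n for n in segments if n >= B]
    return large + [sum(small)]  # count-based trigger: merge all small segments
```

Note that the trigger counts segments rather than adding up documents: with B=10 and M=3, flushing 2 ram docs onto segments of 100, 3 and 4 docs produces a third small segment, so the three are merged into one 9-doc segment.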


-Yonik
Solr, the open-source Lucene search server
