lucene-dev mailing list archives

From "Ning Li" <>
Subject Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
Date Wed, 06 Sep 2006 15:26:41 GMT
> > "Less than M number of segments whose doc count n satisfies B*(M^c) <=
> > n < B*(M^(c+1)) for any c >= 0."
> > In other words, less than M number of segments with the same f(n).
> Ah, I had missed that.  But I don't believe that lucene currently
> obeys this in all cases.

I think it does hold for n >= B, i.e. c >= 0. But not for n < B.
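
To make the quoted invariant concrete, here is a minimal sketch in Python (not Lucene code). It assumes B is the buffered-document limit and M the merge factor, as in the discussion above. The names level and overfull_levels are hypothetical, and returning -1 for n < B is only a convention chosen for this sketch, since the invariant is not claimed to hold there.

```python
from collections import Counter

def level(n, B, M):
    """Return c such that B*M**c <= n < B*M**(c+1).

    Returns -1 when n < B (a convention assumed for this sketch).
    Integer arithmetic avoids floating-point error from logarithms.
    """
    if n < B:
        return -1
    c = 0
    upper = B * M
    while n >= upper:
        upper *= M
        c += 1
    return c

def overfull_levels(doc_counts, B, M):
    """Return the levels c >= 0 that hold M or more segments.

    An empty list means the invariant holds for all segments with n >= B.
    """
    per_level = Counter(level(n, B, M) for n in doc_counts)
    return sorted(c for c, k in per_level.items() if c >= 0 and k >= M)
```

For example, with B=10 and M=2, doc counts of 1000, 100, 100 and 10 put both 100-doc segments at the same level, so that level is reported as overfull.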

> The new IndexWriter changes add an additional constraint: to delete
> documents efficiently, the first merge must be on buffered documents
> only to ensure that ids don't change.  We should also explore changing
> the index invariants to accommodate this.
> Do you have any ideas in this area?  Is a monotonically decreasing
> segment level (your f(n)) really required?

Currently, the first merge always starts on buffered documents. Do you
want this constraint to be reflected in the index invariants, or do
you want to remove this constraint?

In any case, a monotonically decreasing f(n) is definitely a good
thing. Otherwise, cases like a sandwich (segments with small f(n)
sandwiched by two segments with large f(n)) make it even harder to
come up with a robust merge policy.
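
A sandwich can be detected with a short check like the one below. This is only a sketch under stated assumptions: segments are listed oldest first, "monotonically decreasing" is read as non-increasing, and the names level and level_increases are hypothetical rather than anything in IndexWriter.

```python
def level(n, B, M):
    """Return c such that B*M**c <= n < B*M**(c+1), or -1 when n < B
    (a convention assumed for this sketch)."""
    if n < B:
        return -1
    c = 0
    upper = B * M
    while n >= upper:
        upper *= M
        c += 1
    return c

def level_increases(doc_counts, B, M):
    """Return each index i where segment i+1 has a higher level than segment i.

    doc_counts is assumed to be ordered oldest segment first. An empty
    result means f(n) is non-increasing across the index.
    """
    levels = [level(n, B, M) for n in doc_counts]
    return [i for i in range(len(levels) - 1) if levels[i] < levels[i + 1]]
```

With B=10 and M=10, doc counts of 1000, 10 and 1000 give levels 2, 0 and 2, so the check reports index 1, the small segment in the middle of the sandwich.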

> > So between B-sum(L) and B? Once there are M segments with
> > docs less than B, they'll be merged. But what if L=0? Should B ram
> > docs be accumulated before flushed in that case?
> It seems like it.  Examples are easier to visualize sometimes... do
> you have an example where this wouldn't be advisable?

I'm ok with it. I simply wish there were one strategy that would work
for both cases.
