lucene-dev mailing list archives

From "Michael McCandless" <>
Subject Re: [jira] Created: (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize
Date Thu, 03 May 2007 19:16:07 GMT

"Ning Li" <> wrote:

> > It would merge based on size (not # docs), would be free to merge
> > adjacent segments (not just rightmost segments), and would merge N
> > (configurable) at a time.  The part that's still unclear is how it
> > chooses when to "trigger" a merge and how specifically it picks which
> > N segments to merge (maybe: the series of N adjacent segments that are
> > "most similar" in size, but favoring smaller segments over larger
> > ones).
> Those two are very good questions. It's a challenge to make it work in
> all cases. One example is the sandwich case, where two large segments
> sandwich a small one. I'll think about it... It'd be even better if we
> can take deletes into consideration: it's more beneficial to merge a
> segment with more deletes. Right now, we have to open an IndexReader
> to get the number of deletes. We could store that in segments file if
> we decide IndexWriter/MergePolicy will need that...

Yes, the sandwich case would be challenging, though how would you get
to the sandwich case in the first place?  I guess if RAM had flushed
that way, or if many deletes accumulated on the middle one.  But I
don't think merging would tend to produce sandwich cases itself (since
it would have merged that middle one).
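
To make the "which N segments" question a bit more concrete, here is a
rough Java sketch of one possible heuristic. It is not anything that
exists in Lucene: the class AdjacentMergePicker, the method pickWindow
and the maxSkew cutoff are all made-up names for illustration. It scans
every run of N adjacent segments, scores each run by the ratio of its
largest to its smallest segment, skips runs that are too lopsided, and
breaks ties in favor of the run with the smaller total size.

```java
/** Hypothetical helper, for illustration only; not part of Lucene. */
final class AdjacentMergePicker {

    /**
     * Returns the start index of the run of n adjacent segments whose
     * sizes are most similar, or -1 if no run has a largest/smallest
     * ratio at or below maxSkew. Ties go to the smaller total size.
     */
    static int pickWindow(long[] sizes, int n, double maxSkew) {
        if (n < 2 || n > sizes.length) {
            return -1; // nothing sensible to merge
        }
        int best = -1;
        double bestSkew = Double.MAX_VALUE;
        long bestTotal = Long.MAX_VALUE;
        for (int start = 0; start + n <= sizes.length; start++) {
            long min = Long.MAX_VALUE;
            long max = 0;
            long total = 0;
            for (int i = start; i < start + n; i++) {
                min = Math.min(min, sizes[i]);
                max = Math.max(max, sizes[i]);
                total += sizes[i];
            }
            // Guard against a zero-sized segment when computing the ratio.
            double skew = (double) max / Math.max(1L, min);
            if (skew > maxSkew) {
                continue; // too lopsided, e.g. a large segment next to a tiny one
            }
            if (skew < bestSkew || (skew == bestSkew && total < bestTotal)) {
                best = start;
                bestSkew = skew;
                bestTotal = total;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sandwich case: a small segment between two large ones.
        System.out.println(pickWindow(new long[] {100, 1, 100}, 2, 10.0));
        // A run of similar small segments between two large ones.
        System.out.println(pickWindow(new long[] {1000, 10, 12, 11, 900}, 3, 2.0));
    }
}
```

Under this particular scoring the sandwich case simply yields no
candidate, since every run that includes the small middle segment is
too lopsided. Whether "do nothing" is the right answer there is exactly
the open question.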

I like your idea to keep a "delete count per segment" in the segments
file.  This information is certainly useful to the merge policy: it
should proportionally reduce a segment's size according to the
percentage of its docs that are deleted, and it should favor merging
segments with a high number of deletes to free up the storage.
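
As a sketch of the "proportionally reduce" idea, assuming the delete
count were available without opening an IndexReader: the class
SegmentStats and its fields below are hypothetical and are not Lucene's
SegmentInfo. The effective size is the on-disk size scaled by the
fraction of docs that are still live.

```java
/** Hypothetical per-segment statistics, for illustration only. */
final class SegmentStats {
    final String name;
    final long sizeInBytes;
    final int docCount;
    final int deleteCount; // assumed to be recorded in the segments file

    SegmentStats(String name, long sizeInBytes, int docCount, int deleteCount) {
        this.name = name;
        this.sizeInBytes = sizeInBytes;
        this.docCount = docCount;
        this.deleteCount = deleteCount;
    }

    /** Fraction of this segment's docs that are deleted, in [0, 1]. */
    double deletedFraction() {
        return docCount == 0 ? 0.0 : (double) deleteCount / docCount;
    }

    /** On-disk size scaled down by the fraction of deleted docs. */
    long liveSizeInBytes() {
        return Math.round(sizeInBytes * (1.0 - deletedFraction()));
    }

    public static void main(String[] args) {
        SegmentStats s = new SegmentStats("_a", 1000L, 100, 25);
        System.out.println(s.name + " live size: " + s.liveSizeInBytes());
    }
}
```

A merge policy could then compare segments by liveSizeInBytes() rather
than raw size, and use deletedFraction() to give preference to segments
where a merge would reclaim the most space. Both uses are assumptions
about how such a policy might be built, not a description of existing
code.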

