lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
Date Tue, 07 Aug 2007 16:47:59 GMT


Michael McCandless commented on LUCENE-847:

> > I think we ideally would like concurrency to be fully independent of
> > the merge policy.
> I thought of that, too, while taking a fresh look at things
> again. It's my current approach, though I'm not yet sure there won't
> be stumbling blocks. More soon, hopefully.

Well, I think the current MergePolicy API (where the "merge" method
calls IndexWriter.merge itself, must cascade itself, etc.) makes it
hard to build a generic ConcurrentMergePolicy "wrapper" that you could
use to make any MergePolicy concurrent (?).  How would you do it?

E.g., I'm working on a new MergePolicy for LUCENE-845, which would be
nice to run concurrently, but I'd really rather not have to figure out
how to build my own concurrency/locking/etc. into it.  Ideally,
"concurrency" is captured as a single wrapper class that we can all
re-use on top of any MergePolicy.  I think we can do that with the
proposed simplification.

> > I think with one change to your MergePolicy API & control flow, we
> > could make this work very well: instead of requiring the MergePolicy
> > to call IndexWriter.merge, and do the cascading, it should just
> > return the one MergeSpecification that should be done right now.

> Hmm ... interesting idea. I thought about it briefly, though I
> didn't pursue it (see below). It would end up changing the possible
> space of merge policies subtly. You wouldn't be able to have any
> state in the algorithm. Arguably this is a good thing. There is also
> a bit more overhead, since starting the computation of potential
> merges from scratch each time could imply a little more computation,
> but I suspect this is not significant.

I think you can still have state (as instance variables in your
class)?  How would this simplification restrict the space of merge
policies?
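
To make that concrete, here is a rough sketch of a policy that keeps
state in an instance variable while only ever returning the one merge
to do right now.  All names here (MergeSpec, SimpleMergePolicy,
CountingMergePolicy, CascadeDemo) are made up for illustration and are
not the API in the patch; segments are modeled as plain doc counts, and
the cascading loop lives on the writer side rather than in the policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the MergeSpecification discussed in this thread:
// here it is just a range of segment positions to merge.
final class MergeSpec {
    final int start; // index of the first segment to merge
    final int end;   // index one past the last segment to merge
    MergeSpec(int start, int end) { this.start = start; this.end = end; }
}

// Hypothetical simplified API: return the one merge to do now, or null.
interface SimpleMergePolicy {
    MergeSpec findMerge(List<Integer> segmentDocCounts);
}

// Picks the first run of mergeFactor adjacent segments with equal doc counts.
// The candidate merge is recomputed from scratch on every call, yet the
// policy still carries state across calls in an instance variable.
final class CountingMergePolicy implements SimpleMergePolicy {
    private final int mergeFactor;
    private int mergesRequested = 0; // state kept across calls

    CountingMergePolicy(int mergeFactor) { this.mergeFactor = mergeFactor; }

    @Override
    public MergeSpec findMerge(List<Integer> sizes) {
        for (int i = 0; i + mergeFactor <= sizes.size(); i++) {
            boolean allEqual = true;
            for (int j = i + 1; j < i + mergeFactor; j++) {
                if (!sizes.get(j).equals(sizes.get(i))) { allEqual = false; break; }
            }
            if (allEqual) {
                mergesRequested++;
                return new MergeSpec(i, i + mergeFactor);
            }
        }
        return null; // nothing to merge right now
    }

    int mergesRequested() { return mergesRequested; }
}

// Hypothetical writer-side loop: the caller, not the policy, does the cascading
// by asking again after each merge until the policy returns null.
final class CascadeDemo {
    static List<Integer> runToCompletion(SimpleMergePolicy policy, List<Integer> initial) {
        List<Integer> sizes = new ArrayList<>(initial);
        MergeSpec spec;
        while ((spec = policy.findMerge(sizes)) != null) {
            int merged = 0;
            for (int i = spec.start; i < spec.end; i++) merged += sizes.get(i);
            sizes.subList(spec.start, spec.end).clear();
            sizes.add(spec.start, merged); // replace the merged run with one segment
        }
        return sizes;
    }
}
```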

> > When the inner MergePolicy wants to do a merge, the
> > ConcurrentMergePolicy would in turn kick off that merge in the BG but
> > then return null to the IndexWriter allowing IndexWriter to return to
> > its caller, etc.
> I'm a little unsure here. Are you saying the ConcurrentMergePolicy
> does the merges itself, rather than using the writer? That's going
> to mean a synchronization dance between the CMP and the
> writer. There's no question but that there has to be some synch
> dance, but my current thinking was to try to keep as cleanly within
> one class, IW, as I could.

Oh, no: ConcurrentMergePolicy would still call IndexWriter.merge(spec),
just on a separate thread.  So all the required synchronization is
still inside IndexWriter (I think?).
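
Something like the following sketch is what I have in mind.  The
types here (Spec, Merger, Policy, ConcurrentPolicy) are hypothetical
stand-ins, not Lucene's classes: Merger plays the role of
IndexWriter.merge(spec).  The sketch allows only one background merge
at a time and does not show how the policy gets asked again once that
merge finishes; both are simplifying assumptions.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins, not Lucene's actual classes.
interface Spec {}
interface Merger { void merge(Spec spec); }                  // role of IndexWriter.merge(spec)
interface Policy { Spec findMerge(List<Integer> segments); } // returns one merge, or null

// Wraps any Policy: hands the inner policy's chosen merge to a background
// thread and returns null so the caller can return right away.
final class ConcurrentPolicy implements Policy {
    private final Policy inner;
    private final Merger writer;
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    private boolean mergeRunning = false; // guarded by "this"

    ConcurrentPolicy(Policy inner, Merger writer) {
        this.inner = inner;
        this.writer = writer;
    }

    @Override
    public synchronized Spec findMerge(List<Integer> segments) {
        if (mergeRunning) return null;      // one background merge at a time in this sketch
        final Spec spec = inner.findMerge(segments);
        if (spec == null) return null;      // inner policy has nothing to do
        mergeRunning = true;
        background.execute(() -> {
            try {
                writer.merge(spec);         // the merge itself still goes through the writer
            } finally {
                synchronized (ConcurrentPolicy.this) { mergeRunning = false; }
            }
        });
        return null;                        // nothing for the caller to run in the foreground
    }

    // Stops accepting work and waits for the pending merge; false on timeout or interrupt.
    boolean awaitIdle(long timeoutMillis) {
        background.shutdown();
        try {
            return background.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The wrapper never touches the segments itself, so the inner policy
needs no knowledge of threads.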

In fact, if we stick with the current MergePolicy API, aren't you
going to have to put some locking into, e.g., LogDocMergePolicy when
concurrent merges might be happening?  With the new approach,
IndexWriter could invoke MergePolicy.merge under a
"synchronized(segmentInfos)", and then each MergePolicy doesn't have
to deal with locking at all.
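
As a sketch of that division of labor (hypothetical names again:
LockFreePolicy and SketchWriter are illustrations, not IndexWriter's
real code), the writer takes the lock on its segment list before
calling the policy, and the policy itself contains no synchronization:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy API: return {start, end} of the segments to merge, or null.
// Implementations contain no locking of their own.
interface LockFreePolicy {
    int[] findMerge(List<Integer> segmentInfos);
}

// Hypothetical writer that owns all locking around the segment list.
final class SketchWriter {
    private final List<Integer> segmentInfos = new ArrayList<>(); // doc count per segment
    private final LockFreePolicy policy;

    SketchWriter(LockFreePolicy policy) { this.policy = policy; }

    void addSegment(int docCount) {
        synchronized (segmentInfos) { segmentInfos.add(docCount); }
    }

    // The policy runs only while the writer holds the segmentInfos lock, so it
    // sees a consistent view of the segments.
    int[] selectMerge() {
        synchronized (segmentInfos) {
            return policy.findMerge(segmentInfos);
        }
    }

    // Applies a finished merge under the same lock: the merged range is
    // replaced by a single segment holding the summed doc count.
    void commitMerge(int[] range) {
        synchronized (segmentInfos) {
            int total = 0;
            for (int i = range[0]; i < range[1]; i++) total += segmentInfos.get(i);
            segmentInfos.subList(range[0], range[1]).clear();
            segmentInfos.add(range[0], total);
        }
    }

    List<Integer> snapshot() {
        synchronized (segmentInfos) { return new ArrayList<>(segmentInfos); }
    }
}
```

A real writer would also have to keep track of which segments are
already being merged between selectMerge and commitMerge; that part
is left out here.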

> Factor merge policy out of IndexWriter
> --------------------------------------
>                 Key: LUCENE-847
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, LUCENE-847.txt
> If we factor the merge policy out of IndexWriter, we can make it pluggable, making it
> possible for apps to choose a custom merge policy and for easier experimenting with merge
> policy variants.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
