lucene-dev mailing list archives

From "Steven Parkes (JIRA)" <>
Subject [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
Date Sat, 18 Aug 2007 18:30:31 GMT


Steven Parkes commented on LUCENE-847:

	my feeling is we should not deprecate
	setUseCompoundFile, setMergeFactor, setMaxMergeDocs

I understood that you didn't want to deprecate them in IndexWriter. I wasn't sure whether you
also meant that they should be added to the MergePolicy interface. If you did, everything makes
sense. Otherwise, it sounds like there's still a cast in there, and I'm not sure about that.

	I think IndexWriter should enforce it?  I.e., no merge policy should be
	allowed to leave segments in other dirs (= an inconsistent index) at
	the point of commit.

I think it's just a question of code location: since a merge policy might want to factor the
directories in use into its algorithm, it needs that information, and presumably it will
sometimes do the copy itself. You could provide code in MergePolicyBase so that policies decide
when to copy but don't have to write the copy loop themselves. Putting the same code in
IndexWriter as well sounds like duplication, again assuming a policy might sometimes want to do
the copy itself.
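A minimal sketch of that split, assuming a MergePolicyBase that owns the copy loop while each policy only decides when to call it. Everything except the MergePolicyBase name is hypothetical (Segment, copyForeignSegments, maybeCopy, CopyAtCommitPolicy), and "copying" is modeled by relabeling a segment's directory rather than moving files. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a segment: its name and the directory holding its files.
record Segment(String name, String dir) {}

// Hypothetical MergePolicyBase: owns the copy loop so concrete policies decide only *when* to copy.
abstract class MergePolicyBase {
    // Returns a list in which every segment outside targetDir has been moved into it.
    // A real implementation would copy segment files; this sketch only relabels the directory.
    protected List<Segment> copyForeignSegments(List<Segment> segments, String targetDir) {
        List<Segment> result = new ArrayList<>(segments.size());
        for (Segment s : segments) {
            result.add(s.dir().equals(targetDir) ? s : new Segment(s.name(), targetDir));
        }
        return result;
    }

    // Each policy decides whether to copy now; the committing flag lets it defer until commit.
    abstract List<Segment> maybeCopy(List<Segment> segments, String targetDir, boolean committing);
}

// Example policy: tolerates foreign segments between commits, copies them in at commit.
class CopyAtCommitPolicy extends MergePolicyBase {
    @Override
    List<Segment> maybeCopy(List<Segment> segments, String targetDir, boolean committing) {
        return committing ? copyForeignSegments(segments, targetDir) : segments;
    }
}
```

If IndexWriter also enforced consistency at commit, it would need a second copy of this loop, which is the duplication concern raised above.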

	I like that idea :)  It fits well w/ the stateless API.  I.e., merge
	policy returns all possible merges and "someone above" takes care of
	scheduling them.

So it returns a vector of specs?
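To make the "vector of specs" idea concrete, here is a sketch of a stateless policy that, given only the current segment doc counts, returns every merge it would like and leaves scheduling to the caller. The names (MergeSpec, StatelessPolicy, findMerges) and the level-based selection rule are illustrative assumptions loosely in the spirit of a log-style policy, not Lucene's actual implementation. Requires Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical merge spec: merge the contiguous segments [start, end) into one.
record MergeSpec(int start, int end) {}

// Hypothetical stateless policy: given only the current segment doc counts, it returns
// every merge it would like to see and keeps no state between calls.
final class StatelessPolicy {
    private final int mergeFactor;

    StatelessPolicy(int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        this.mergeFactor = mergeFactor;
    }

    // Level of a segment: floor(log base mergeFactor of docCount), in integer arithmetic.
    private int level(int docCount) {
        int lvl = 0;
        while (docCount >= mergeFactor) {
            docCount /= mergeFactor;
            lvl++;
        }
        return lvl;
    }

    // Proposes a merge for each run of mergeFactor adjacent segments at the same level.
    // Which merges actually run, and in what order, is left to "someone above".
    List<MergeSpec> findMerges(List<Integer> docCounts) {
        List<MergeSpec> specs = new ArrayList<>();
        int i = 0;
        while (i + mergeFactor <= docCounts.size()) {
            int lvl = level(docCounts.get(i));
            int j = i + 1;
            while (j < i + mergeFactor && level(docCounts.get(j)) == lvl) j++;
            if (j == i + mergeFactor) specs.add(new MergeSpec(i, j));
            i = j; // continue after the full run, or from the segment that broke it
        }
        return specs;
    }
}
```

With mergeFactor 2 and doc counts [1, 1, 5, 5, 1], the policy proposes two merges, [0, 2) and [2, 4); a serial caller or a concurrent scheduler can then pick from that list.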

That's essentially what CMP does as an above/below wrapper. I can see that the above/below
arrangement is strange enough to be less clever and more insane (I wasn't trying so much to be
clever as to be backwards compatible).

Sane is good.

	Hmm.  This means each merge policy must know whether it's talking to
	CMP or IndexWriter underneath?  With the stateless approach this
	wouldn't happen.

Well, I wouldn't really say it has to know. All it cares about is what merge returns. It doesn't
have to know who returned it or why.
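A small sketch of that point, assuming the policy talks to whatever performs merges through a single interface. Merger, DirectMerger, CountingMerger and AgnosticPolicy are hypothetical names; CountingMerger merely stands in for a wrapping layer such as CMP and is not its real behavior.

```java
import java.util.List;

// Hypothetical: whatever carries out a merge. The policy depends only on this interface.
interface Merger {
    // Merges the given segments and returns the resulting segment's doc count.
    int merge(List<Integer> docCounts);
}

// Serial implementation: performs the merge directly on the caller's thread.
final class DirectMerger implements Merger {
    @Override
    public int merge(List<Integer> docCounts) {
        return docCounts.stream().mapToInt(Integer::intValue).sum();
    }
}

// A wrapper that delegates to another Merger (standing in for a CMP-style layer).
// It counts forwarded merges; the policy cannot tell it apart from DirectMerger.
final class CountingMerger implements Merger {
    private final Merger delegate;
    int forwarded = 0;

    CountingMerger(Merger delegate) {
        this.delegate = delegate;
    }

    @Override
    public int merge(List<Integer> docCounts) {
        forwarded++;
        return delegate.merge(docCounts);
    }
}

final class AgnosticPolicy {
    // Uses only merge()'s return value; it neither knows nor cares which Merger it was given.
    static int mergeAll(List<Integer> docCounts, Merger merger) {
        return merger.merge(docCounts);
    }
}
```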

The only real difference between this and the "generate a vector of merges" approach is that,
in the serial case, the merge policy can immediately take advantage of merge results, whereas
if it generates a vector of merges up front, it can't know those results.

Of course, in that case, if IndexWriter gets a vector of merges, it can always take the lowest
one and ignore the rest, then call the merge policy again in case it wants to request a
different set. The only excess computation is then for merges that were never actually run.
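The serial strategy just described can be sketched as a loop: ask the policy for all merges, run only the lowest, then ask again so the policy sees the new segment. The policy rule here (merge any two adjacent segments with equal doc counts) and all names (PairMerge, SerialDriver, findMerges, mergeUntilStable) are hypothetical, chosen only to keep the example small. Requires Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical spec: merge the adjacent segments at index and index + 1.
record PairMerge(int index) {}

final class SerialDriver {
    // Hypothetical stateless policy: propose every adjacent pair with equal doc counts.
    static List<PairMerge> findMerges(List<Integer> segs) {
        List<PairMerge> out = new ArrayList<>();
        for (int i = 0; i + 1 < segs.size(); i++) {
            if (segs.get(i).equals(segs.get(i + 1))) out.add(new PairMerge(i));
        }
        return out;
    }

    // Serial case: run only the lowest proposed merge, then ask the policy again so it
    // can react to the result; the rest of each proposal list is simply discarded.
    static List<Integer> mergeUntilStable(List<Integer> initial) {
        List<Integer> segs = new ArrayList<>(initial);
        while (true) {
            List<PairMerge> merges = findMerges(segs);
            if (merges.isEmpty()) return segs;
            int i = merges.get(0).index(); // lowest, since findMerges scans left to right
            int merged = segs.get(i) + segs.remove(i + 1);
            segs.set(i, merged);
        }
    }
}
```

Starting from [1, 1, 2, 4], each round produces a segment the previous round's proposals could not have known about: [2, 2, 4], then [4, 4], then [8].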

	Oh I see...  that's kind of sneaky (planning on using exceptions to
	abort a merge requested by the policy).

There's always going to be a chance of a merge failing with an exception; I'm pretty sure of
that. But you're right: if the merge policy isn't in the control path, it will never see those
exceptions. They'll still happen, but the policy is out of that path.
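A sketch of what "out of the path" means, assuming a caller that runs each proposed merge and handles failures itself. MergeRunner and runAll are hypothetical; the merge operation is passed in as a plain IntBinaryOperator so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

final class MergeRunner {
    // Runs each proposed merge; a failure aborts only that merge. The policy that proposed
    // the merges is not in this path, so it never observes the exception.
    static List<String> runAll(List<int[]> proposed, IntBinaryOperator doMerge) {
        List<String> log = new ArrayList<>();
        for (int[] m : proposed) {
            try {
                int result = doMerge.applyAsInt(m[0], m[1]);
                log.add("merged " + m[0] + "+" + m[1] + "=" + result);
            } catch (RuntimeException e) {
                log.add("aborted " + m[0] + "+" + m[1] + ": " + e.getMessage());
            }
        }
        return log;
    }
}
```

Whether the policy should be told about aborted merges is exactly the design question here; in this sketch, only the runner's log records them.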

	But since you're already doing the work
	to allow a merge to run in the BG without blocking adding of docs,
	flushing, etc, wouldn't this come nearly for free?

I haven't looked at this.

	Well, e.g., flush() now synchronizes on IndexWriter

Yeah, and making it not do so is less than straightforward. I've looked at his code a fair
amount and experimented with different ideas, but I haven't gotten all the way to a working model.

You could lock segmentInfos instead, but segmentInfos is iterated over in many places, and each
of those would need its own lock if holding the lock on IW were no longer enough to guarantee
that the iteration is safe.
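A sketch of the cost described above, assuming the segment list gets its own lock instead of relying on the writer's monitor. SegmentList, add, totalNameLength and snapshot are hypothetical names; the point is that every iteration site must take the same lock, or work on a copy taken under it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a segment list guarded by its own lock rather than the writer's monitor.
final class SegmentList {
    private final Object lock = new Object(); // stands in for locking segmentInfos
    private final List<String> segments = new ArrayList<>();

    void add(String name) {
        synchronized (lock) {
            segments.add(name);
        }
    }

    // Every iteration site must take the same lock; otherwise a concurrent add could
    // corrupt the iteration or throw ConcurrentModificationException.
    int totalNameLength() {
        synchronized (lock) {
            int total = 0;
            for (String s : segments) total += s.length();
            return total;
        }
    }

    // Alternative: copy under the lock, then iterate the copy without holding it.
    List<String> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(segments);
        }
    }
}
```

The snapshot approach shortens the time the lock is held, but each of the many existing loops would still have to be converted to one of these two patterns.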

I looked at that early on, when my understanding may still have been too limited, so it may be
more feasible than I thought ...

> Factor merge policy out of IndexWriter
> --------------------------------------
>                 Key: LUCENE-847
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, LUCENE-847.patch.txt,
> If we factor the merge policy out of IndexWriter, we can make it pluggable, making it
> possible for apps to choose a custom merge policy and for easier experimenting with merge
> policy variants.
