lucene-dev mailing list archives

From Michael Busch
Subject Re: IndexWriter shutdown
Date Wed, 23 May 2007 04:24:49 GMT
Steven Parkes wrote:
>> I'm not certain, but would parts of your goal be achieved by the work I've
>> seen floating around Jira to refactor the MergePolicy so that it can be
>> handled by multiple threads?
> Well, in what I've been working on for LUCENE-847 (merge policy
> factoring) and LUCENE-870 (concurrent merge policy), what Michael's
> talking about really wouldn't be affected.
> The way I envision factoring the merge policy, the policy doesn't get
> involved in the actual merge itself. It simply defines what merges will
> occur. (This makes the merge policy variants very clean and gets them
> out of the segment merging which is a bit tricky.) So since Michael is
> asking for a way to abort an in-flight merge, the merge policy really
> doesn't get involved. 

Exactly. The merge policy decides *when* to merge. For the shutdown 
feature however we want to be able to stop an ongoing merge.

> (Well, it does a little: the merge policy will in
> general generate from the abstract merge or optimize request, a sequence
> of individual merges, each generating a new segment, so it could check
> between individual merge operations. However, since a single merge
> operation of large segments can take a long time, this isn't sufficient
> to bound the time.)

Yes, we could do this already with the current merge policy in 
IndexWriter, but you are right, a single merge operation can already 
take too long.
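
To illustrate the limitation, here is a minimal sketch of checking a shutdown flag only between individual merges. All names here (StagedMergeSketch, MergeStep, runUntilShutdown) are hypothetical and not taken from IndexWriter or any patch; each step simply stands in for one merge that produces a new segment.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Lucene code.
class StagedMergeSketch {

    // Stand-in for a single merge operation that generates one new segment.
    interface MergeStep {
        void run();
    }

    // Runs the steps in order and looks at the flag only between steps.
    // Returns the number of steps that ran to completion.
    static int runUntilShutdown(List<MergeStep> steps, AtomicBoolean shutdownRequested) {
        int completed = 0;
        for (MergeStep step : steps) {
            if (shutdownRequested.get()) {
                break; // stop before starting the next merge
            }
            step.run(); // once started, this step cannot be interrupted here
            completed++;
        }
        return completed;
    }
}
```

A step that is already running always finishes, so the shutdown delay is bounded only by the duration of the largest single merge.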

> I thought about this when the commit/rollback stuff got added to
> IndexWriter. At that point, all it would take to get an immediate abort
> would be to convince the bottom writer to throw an I/O Exception, which
> it looks like is effectively what Michael is talking about, at least for
> the FSDirectory case.
> So my thoughts:
> I think something like what Michael has suggested is a good idea, but I
> would be in favor of putting it in the core, rather than making it a
> derived thing for a single Directory implementation. Seems to me like
> it's a pretty small code change for a very nice thing to have. Doesn't
> seem to add much complexity.

Okay, it seems that this is a desired feature, so I will go ahead and 
open a Jira issue. I will attach the code that I have so far, even 
though it extends IndexWriter and FSDirectory and lacks test cases.
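
For reference, the general idea of making the bottom writer throw an I/O exception can be sketched roughly as follows. This is not the code I will attach: it wraps a plain java.io.OutputStream rather than anything in FSDirectory, and the names AbortableOutputStream, AbortableOutputDemo and writeChunks are made up for the illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper, not part of Lucene: every write fails once
// shutdown has been requested, which aborts whatever is writing.
class AbortableOutputStream extends OutputStream {
    private final OutputStream delegate;
    private final AtomicBoolean shutdownRequested;

    AbortableOutputStream(OutputStream delegate, AtomicBoolean shutdownRequested) {
        this.delegate = delegate;
        this.shutdownRequested = shutdownRequested;
    }

    private void checkAbort() throws IOException {
        if (shutdownRequested.get()) {
            throw new IOException("write aborted: shutdown requested");
        }
    }

    @Override
    public void write(int b) throws IOException {
        checkAbort();
        delegate.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        checkAbort();
        delegate.write(buf, off, len);
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}

class AbortableOutputDemo {
    // Simulates a long merge writing fixed-size chunks. Setting the flag
    // inside the loop stands in for another thread requesting shutdown.
    static int writeChunks(OutputStream out, int chunks, AtomicBoolean flag,
                           int requestShutdownAfter) throws IOException {
        byte[] chunk = new byte[1024];
        int written = 0;
        for (int i = 0; i < chunks; i++) {
            if (i == requestShutdownAfter) {
                flag.set(true);
            }
            out.write(chunk, 0, chunk.length);
            written++;
        }
        return written;
    }
}
```

With a check on every write, the abort latency is bounded by one write call rather than by a whole merge. Whoever catches the exception still has to clean up the partially written files.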

> As to what happens in the middle of a merge or optimize: I think it
> might depend on the autoCommit flag. 

In either case we have to ensure that the buffered docs get flushed to disk.

> Since an optimize may be done in
> stages, whether the intermediary stages are kept or not is going to
> depend on when the segments file gets updated (and I haven't checked the
> current status of this.) I can see it either way: keeping partial work
> (to resume) or throwing everything away on a shutdown.

Good idea. I'm not too familiar with the new autoCommit code yet. I
implemented the shutdown code before autoCommit was added. Will look 
into that...
