lucene-dev mailing list archives

From "Steven Parkes" <>
Subject RE: IndexWriter shutdown
Date Wed, 23 May 2007 01:04:06 GMT
> I'm not certain, but would parts of your goal be achieved by the work
> seen floating around Jira to refactor the MergePolicy so that it can be
> handled by multiple threads?

Well, in what I've been working on for LUCENE-847 (merge policy
factoring) and LUCENE-870 (concurrent merge policy), what Michael's
talking about really wouldn't be affected.

The way I envision factoring the merge policy, the policy doesn't get
involved in the actual merge itself. It simply defines what merges will
occur. (This keeps the merge policy variants very clean and keeps them
out of the segment-merging code, which is a bit tricky.) Since Michael
is asking for a way to abort an in-flight merge, the merge policy really
doesn't get involved. (Well, it does a little: in general, the merge
policy turns an abstract merge or optimize request into a sequence of
individual merges, each producing a new segment, so it could check for
an abort between individual merge operations. However, since a single
merge of large segments can take a long time, this isn't sufficient to
bound the time to shut down.)
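To make that split concrete, here is a minimal Java sketch. It assumes a hypothetical `MergePolicy` interface that only returns planned merges (`MergeSpec`) and a hypothetical `MergeRunner` loop that executes them; none of these names are the actual LUCENE-847 API. It shows why an abort check placed between merges bounds the wait only by the length of the longest single merge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one merge: which segments go into one new segment.
final class MergeSpec {
    final List<String> segments;
    MergeSpec(List<String> segments) { this.segments = segments; }
}

// Hypothetical factored policy: it decides WHAT to merge, never performs the merge.
interface MergePolicy {
    List<MergeSpec> findMerges(List<String> segments);
}

// Toy policy (illustration only): merge adjacent pairs of segments.
final class PairwiseMergePolicy implements MergePolicy {
    public List<MergeSpec> findMerges(List<String> segments) {
        List<MergeSpec> merges = new ArrayList<>();
        for (int i = 0; i + 1 < segments.size(); i += 2) {
            merges.add(new MergeSpec(List.of(segments.get(i), segments.get(i + 1))));
        }
        return merges;
    }
}

// Hypothetical writer-side loop that executes the planned merges one at a time.
class MergeRunner {
    private volatile boolean abortRequested = false;

    void requestAbort() { abortRequested = true; }

    // Returns how many merges completed before an abort was noticed.
    int runMerges(MergePolicy policy, List<String> segments) {
        int completed = 0;
        for (MergeSpec merge : policy.findMerges(segments)) {
            // The only abort check is here, BETWEEN merges: a merge that is
            // already in flight runs to completion, however long it takes.
            if (abortRequested) {
                break;
            }
            doMerge(merge);
            completed++;
        }
        return completed;
    }

    // Stand-in for the actual (possibly very long) segment merge.
    void doMerge(MergeSpec merge) { }
}
```

If `requestAbort()` is called while the first merge is running, that merge still finishes and only the remaining ones are skipped, which is exactly why this check alone can't bound shutdown time.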

I thought about this when the commit/rollback support was added to
IndexWriter. At that point, all it would take to get an immediate abort
would be to convince the bottom-level writer to throw an IOException,
which looks like effectively what Michael is talking about, at least for
the FSDirectory case.
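As a rough illustration of "convince the bottom writer to throw an IOException", here is a minimal sketch built on plain `java.io.FilterOutputStream`. In Lucene the corresponding layer would be the `IndexOutput` a `Directory` hands out; the `AbortableOutputStream` class and its shared abort flag are hypothetical and are not Lucene code.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: every low-level write first checks a shared abort flag.
// Once the flag is set, the next write fails with an IOException, which unwinds
// whatever merge or flush is in progress higher up the stack.
class AbortableOutputStream extends FilterOutputStream {
    private final AtomicBoolean abort;

    AbortableOutputStream(OutputStream out, AtomicBoolean abort) {
        super(out);
        this.abort = abort;
    }

    private void checkAbort() throws IOException {
        if (abort.get()) {
            throw new IOException("write aborted: writer is shutting down");
        }
    }

    @Override
    public void write(int b) throws IOException {
        checkAbort();
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        checkAbort();
        // Delegate the whole range at once instead of FilterOutputStream's
        // default byte-at-a-time loop.
        out.write(b, off, len);
    }
}
```

A shutdown thread would simply call `abort.set(true)`; the writer thread notices on its next write, so the delay is bounded by one write call rather than by a whole merge. Whether IndexWriter then rolls back cleanly depends on the commit/rollback logic mentioned above.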

So my thoughts:

I think something like what Michael has suggested is a good idea, but I
would be in favor of putting it in the core rather than making it a
derived class for a single Directory implementation. It seems to me like
a pretty small code change for a very nice thing to have, and it doesn't
seem to add much complexity.

As to what happens in the middle of a merge or optimize: I think it
might depend on the autoCommit flag. Since an optimize may be done in
stages, whether the intermediate stages are kept or not will depend on
when the segments file gets updated (I haven't checked the current
status of this). I can see it either way: keeping partial work (to
resume later) or throwing everything away on shutdown.
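A toy model of that trade-off, assuming an optimize that runs as a sequence of stages and a "segments file" that is rewritten after every stage only when autoCommit is on. `StagedOptimize`, its pairwise staging, and the `_100`-style segment names are all hypothetical; this is not how IndexWriter is actually structured, just a sketch of which state survives an abort.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene code): each optimize stage collapses the first two
// segments into one new segment until a single segment remains.
class StagedOptimize {
    // What a reader would see: the last committed "segments file".
    List<String> committed;
    private final boolean autoCommit;
    private int nextSegment = 100; // hypothetical name generator for new segments

    StagedOptimize(List<String> initial, boolean autoCommit) {
        this.committed = new ArrayList<>(initial);
        this.autoCommit = autoCommit;
    }

    // Runs optimize, aborting just before stage `abortAtStage` (0-based);
    // pass a negative value to run to completion.
    void optimize(int abortAtStage) {
        List<String> working = new ArrayList<>(committed);
        int stage = 0;
        while (working.size() > 1) {
            if (stage == abortAtStage) {
                // Shutdown: uncommitted work is discarded, committed state survives.
                return;
            }
            String merged = "_" + (nextSegment++);
            working.remove(0);
            working.remove(0);
            working.add(0, merged);
            stage++;
            if (autoCommit) {
                // Intermediate stage becomes visible (and resumable).
                committed = new ArrayList<>(working);
            }
        }
        committed = new ArrayList<>(working); // final commit
    }
}
```

Starting from `_0.._3` and aborting before the third stage, autoCommit keeps the partial result `[_101, _3]` (work a later optimize could resume from), while the non-autoCommit run is left with the original four segments, i.e. everything is thrown away.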
