lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Merging with IndexWriter.addIndexes(...)
Date Thu, 08 Dec 2005 18:39:29 GMT
J.J. Larrea wrote:
> So... I notice that both IndexWriter.addIndexes(...) merge methods start 
> and end with calls to optimize() on the target index.  I'm not sure 
> whether that is causing the unpacking and repacking I observe, but it 
> does make me wonder whether they truly need to be there:

I don't recall exactly why this was done.  (I should have written a 
comment!)

I think the concerns in addIndexes are that, before the segments file is 
written:
1. The segments must be sorted by size, with small segments on top, in 
order for future incremental merging to work correctly.
2. Segment names must be unique and less than the segment counter, so 
that they will not conflict with future segment names.
3. All segments must be stored in the same directory.
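
To make the three conditions concrete, here is a small checker over a hypothetical in-memory model of the segments file. The `Segment` record, the integer segment ids and `invariantsHold` are illustrative assumptions, not Lucene's SegmentInfos API. The stack is modeled as a list whose index 0 is the bottom (largest segment) and whose last element is the top. Records need Java 16 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model; no Lucene classes are used.
public class SegmentInvariants {
    // Illustrative stand-in for one entry in the segments file.
    record Segment(int id, int docCount, String dir) {}

    // True iff the stack satisfies the three conditions listed above.
    static boolean invariantsHold(List<Segment> stack, int counter, String thisDir) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < stack.size(); i++) {
            Segment s = stack.get(i);
            // 1. Sizes must not grow toward the top (small segments on top).
            if (i > 0 && stack.get(i - 1).docCount() < s.docCount()) return false;
            // 2. Names must be unique and below the segment counter.
            if (!seen.add(s.id()) || s.id() >= counter) return false;
            // 3. Every segment must live in this index's directory.
            if (!s.dir().equals(thisDir)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Segment> ok = List.of(
            new Segment(0, 1000, "idx"), new Segment(1, 100, "idx"), new Segment(2, 10, "idx"));
        List<Segment> foreign = List.of(
            new Segment(0, 1000, "idx"), new Segment(1, 100, "other"));
        System.out.println(invariantsHold(ok, 3, "idx"));      // true
        System.out.println(invariantsHold(foreign, 3, "idx")); // false
    }
}
```

An optimize() trivially satisfies all three, since it leaves a single freshly named segment in this directory; the checker just makes explicit what any cheaper replacement would have to preserve.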

Optimizing before and after was a cheap way to ensure these, although 
this still does not explain why the first optimize is required, only the 
last.  I'm sure there was a reason, but I'm no longer sure that it is valid.

Note that the two addIndexes methods currently use different algorithms. 
I think addIndexes(Directory[]) uses a merge algorithm that
observes mergeFactor, while addIndexes(IndexReader[]) does not, since 
all of the indexes are already open.

An improved algorithm for addIndexes(Directory[]) might be to:

0. Check that none of the Directories are the same as this directory.
1. Don't optimize first.
2. Run the existing algorithm, which combines the added segments until 
they are fewer than mergeFactor.
3. (new) Merge any segments that are not in this directory.  This will 
require first moving them to the top of the stack.
4. Re-sort the stack by size.
5. Don't optimize at the end.
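
The five steps can be sketched against a hypothetical model of the segment stack. Nothing here calls Lucene: `AddIndexesSketch`, `Segment`, `merge` and `addLocal` are invented for illustration, a "merge" simply sums document counts into a new segment written to this directory, and step 2 is simplified to merging the first mergeFactor added segments at a time, which may differ from the real merge policy. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the proposed addIndexes(Directory[]); no Lucene classes are used.
public class AddIndexesSketch {
    // Illustrative stand-in for one entry in the segments file.
    record Segment(int id, int docCount, String dir) {}

    final String dir;                              // this index's directory
    final int mergeFactor;
    final List<Segment> stack = new ArrayList<>(); // index 0 = bottom (largest)
    int counter = 0;                               // next unused segment id

    AddIndexesSketch(String dir, int mergeFactor) {
        // A factor below 2 would make step 2 loop forever.
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        this.dir = dir;
        this.mergeFactor = mergeFactor;
    }

    // Adds a segment that already lives in this directory.
    void addLocal(int docCount) {
        stack.add(new Segment(counter++, docCount, dir));
    }

    // Simulated merge: one new segment, named from the counter, written into this directory.
    Segment merge(List<Segment> parts) {
        int docs = parts.stream().mapToInt(Segment::docCount).sum();
        return new Segment(counter++, docs, dir);
    }

    void addIndexes(String[] otherDirs, List<List<Segment>> otherSegments) {
        // 0. Refuse to add this index to itself.
        for (String d : otherDirs) {
            if (d.equals(dir)) throw new IllegalArgumentException("cannot add index to itself: " + d);
        }
        // 1. No optimize() up front: existing segments are left alone.
        List<Segment> added = new ArrayList<>();
        for (List<Segment> segs : otherSegments) added.addAll(segs);
        // 2. Combine added segments, mergeFactor at a time, until fewer than mergeFactor remain.
        while (added.size() >= mergeFactor) {
            List<Segment> batch = new ArrayList<>(added.subList(0, mergeFactor));
            added.subList(0, mergeFactor).clear();
            added.add(merge(batch));
        }
        // 3. Gather segments still in another directory (the top of the stack) and merge them here.
        List<Segment> foreign = new ArrayList<>();
        for (Segment s : added) {
            if (s.dir().equals(dir)) stack.add(s); else foreign.add(s);
        }
        if (!foreign.isEmpty()) stack.add(merge(foreign));
        // 4. Re-sort so the largest segments sit at the bottom and the smallest on top.
        stack.sort(Comparator.comparingInt(Segment::docCount).reversed());
        // 5. No optimize() at the end.
    }

    public static void main(String[] args) {
        AddIndexesSketch w = new AddIndexesSketch("idx", 3);
        w.addLocal(500);
        w.addLocal(50);
        List<Segment> a = List.of(new Segment(0, 100, "a"), new Segment(1, 10, "a"));
        List<Segment> b = List.of(new Segment(0, 200, "b"), new Segment(1, 20, "b"), new Segment(2, 2, "b"));
        w.addIndexes(new String[] {"a", "b"}, List.of(a, b));
        System.out.println(w.stack.stream().map(Segment::docCount).toList()); // [500, 332, 50]
    }
}
```

With mergeFactor 3 and five added segments, step 2 merges twice and leaves one segment already in this directory, so step 3 has nothing left to move. With a larger mergeFactor the added segments all survive step 2, and step 3 folds them into one new local segment. Either way the existing segments are never rewritten, which is the saving over optimizing before and after.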

I think that should do it.

Doug


