lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Background merge hit exception
Date Mon, 22 Sep 2008 23:02:42 GMT

OK I found one path whereby optimize would detect that the  
ConcurrentMergeScheduler had hit an exception while merging in a BG  
thread, and correctly throw an IOException back to its caller, but  
fail to set the root cause in that exception.  I just committed it, so  
it should be fixed in 2.4:
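As a rough illustration of the failure mode described above (this is not the actual Lucene commit, which is not reproduced here), the sketch below records an exception caught in a background thread and then rethrows it to the caller as an IOException with the root cause attached. The class and method names are hypothetical and do not come from Lucene.

```java
import java.io.IOException;

// Hypothetical stand-in for a scheduler that runs work in a background
// thread. None of these names are Lucene APIs.
class BackgroundFailureDemo {
    // Exception captured from the background thread, if any.
    private volatile Throwable backgroundFailure;

    // Runs the work in a separate thread and waits for it to finish.
    void runInBackground(Runnable work) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable e) {
                backgroundFailure = e; // remember it for the foreground caller
            }
        });
        t.start();
        t.join();
    }

    // Called from the foreground thread. Without initCause the caller
    // would see only the generic message and lose the real reason.
    void checkForFailure() throws IOException {
        Throwable cause = backgroundFailure;
        if (cause != null) {
            IOException e = new IOException("background merge hit exception");
            e.initCause(cause);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        BackgroundFailureDemo demo = new BackgroundFailureDemo();
        demo.runInBackground(() -> {
            throw new IllegalStateException("No space left on device");
        });
        try {
            demo.checkForFailure();
        } catch (IOException e) {
            System.out.println(e.getMessage() + " / caused by: " + e.getCause());
        }
    }
}
```

With the cause attached, a stack trace printed by the caller includes a "Caused by:" section, so a disk-full condition would be visible without digging through stdout logs.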


Michael McCandless wrote:

> vivek sar wrote:
>> Thanks Mike for the insight. I did check the stdout log and found it
>> was complaining of not having enough disk space. I thought we need
>> only x2 of the index size. Our index size is 10G (max) and we had 45G
>> left on that partition - should it still complain of the space?
> Is there a reader open on the index while optimize is running?  That  
> ties up potentially another 1X.
> Are you certain you're closing all previously open readers?
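The rule of thumb discussed above can be written out as simple arithmetic. This is only a sketch that takes the figures in this thread at face value (roughly 2X the index size during optimize, plus potentially another 1X while a reader is open). The class and method names are hypothetical, and real usage depends on the merges actually performed.

```java
// Back-of-the-envelope estimate of transient disk space during optimize,
// using only the multipliers mentioned in this thread (assumptions, not
// guarantees from Lucene).
class OptimizeSpaceEstimate {
    // indexSize: current index size in any unit (here: gigabytes).
    // readerOpen: whether a reader is holding the old segments open.
    static long transientSpaceNeeded(long indexSize, boolean readerOpen) {
        long multiplier = 2;          // "x2 of the index size" for optimize
        if (readerOpen) {
            multiplier += 1;          // an open reader ties up potentially another 1X
        }
        return indexSize * multiplier;
    }

    public static void main(String[] args) {
        long indexGb = 10;            // max index size reported in the thread
        long freeGb = 45;             // free space reported in the thread
        long needed = transientSpaceNeeded(indexGb, true);
        System.out.println("estimated need: " + needed + "G, free: " + freeGb + "G");
    }
}
```

Under these assumptions even the reader-open case stays below the 45G reported free, which is why leaked readers or unexpectedly large intermediate merges are worth checking.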
> On Linux, because the semantics are "delete on last close", it's hard
> to detect when you have IndexReaders still open: an "ls" won't show
> the deleted files, yet they still consume bytes on disk until the
> last open file handle is closed.  You can try running "lsof" while
> optimize is running to see which files are held open.
> Also, if you can call IndexWriter.setInfoStream(...) for all of the
> operations below, I can peek at it to try to see why it's using up
> so much intermediate disk space.
>> Some comments/questions on other issues you raised,
>> We have 2 threads that index the data in two different indexes and
>> then we merge them into a master index with following call,
>>   masterWriter.addIndexesNoOptimize(indices);
>> Once the smaller indices have merged into the master index we delete
>> the smaller indices.
>> This process runs every 5 minutes. Master Index can grow up to 10G
>> before we partition it - move it to other directory and start a new
>> master index.
>> Every hour we then optimize the master index using,
>>      writer.optimize(optimizeSegment);    //where optimizeSegment = 10
> How long does that optimize take?  And what do you do with the  
> every-5-minutes job while optimize is running?  Do you run it,  
> anyway, sharing the same writer (ie you're calling  
> addIndexesNoOptimize while another thread is running the optimize)?
>> Here are my questions,
>> 1) Is this process flawed in terms of performance and efficiency?
>> What would you recommend?
> Actually I think your approach is the right approach.
>> 2) When you say "partial optimize" what do you mean by that?
> Actually, it's what you're already doing (passing 10 to optimize).   
> This means the index just has to reduce itself to <= 10 segments,  
> instead of the normal 1 segment for a full optimize.
> Still I find that particular merge being done somewhat odd: it was  
> merging 7 segments, the first of which was immense, and the final 6  
> were tiny.  It's not an efficient merge to do.  Seeing the  
> infoStream output might help explain what led to that...
>> 3) In Lucene 2.3 "segment merging is done in a background thread" -
>> how does it work, ie, how does it know which segments to merge? What
>> would cause this background merge exception?
> The selection of segments to merge, and when, is done by the  
> LogByteSizeMergePolicy, which you can swap out for your own merge  
> policy (should not in general be necessary).  Once a merge is  
> selected, the execution of that merge is controlled by  
> ConcurrentMergeScheduler, which runs merges in background threads.   
> You can also swap that out (eg for SerialMergeScheduler, which uses
> the FG thread to do the merging, like Lucene used to before 2.3).
> I think the background merge exception is often disk full, but in  
> general it can be anything that went wrong while merging.  Such  
> exceptions won't corrupt your index because the merge only commits  
> the changes to the index if it completes successfully.
>> 4) Can we turn off "background merge" if I'm running the optimize
>> every hour in any case? How do we turn it off?
> Yes: IndexWriter.setMergeScheduler(new SerialMergeScheduler()) gets  
> you back to the old (fg thread) way of running merges.  But in  
> general this gets you worse net performance, unless you are already  
> using multiple threads when adding documents.
> Mike
