lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: Index Merging Space Requirements
Date Thu, 13 Mar 2008 14:46:41 GMT

Well ... yes and no?

Yes, the Log*MergePolicy will still at certain times merge the index  
all the way down to one segment.  If mergeFactor is 10 then this will  
happen every "power of 10" flushed segments.  Ie, after 10 flushes a  
merge will merge them down to 1 segment, then after 100 flushes as  
well, 1000 flushes, etc.  These are done in the background with  
ConcurrentMergeScheduler.  I don't really like this quality of the  
merge policy ... it's sort of a "pay it forward" approach.  You are  
paying in advance for the expectation that the index is going to keep  
getting larger.

However, this merging does respect maxMergeMB, so if you've set that,  
then it will not merge down to 1 segment once you have segments over  
that size.  So in this aspect it's different from a real call to  


Mark Miller wrote:

> Thanks a lot more question:
> I remember reading that a regular addDocument call could basically  
> trigger an optimize on a given call. Is this true? Maybe not true  
> anymore?
> It doesnt sound right to me, but I do remember reading about it.  
> This was pre background merging when it was mentioned the  
> addDocument call could take a long time to return if basically the  
> equiv of an optmize was triggered.
> Could you clear this up for me?
> Thanks
> Mark
> Michael McCandless wrote:
>> Yes this should reduce transient (while merging) disk usage.   
>> However, optimize disregards this parameter, so it will still use  
>> the same disk space. However, if you call optimize(N) then that  
>> should use less space since it does not merge all the way down to  
>> 1 segment.
>> Note that the limit applies to segments-to-be-merged not to the  
>> final merged segment size.  Ie, any segment > maxMergeMB will  
>> never be merged, but at any given time you can easily have  
>> segments quite a bit larger than maxMergeMB.
>> Mike
>> Mark Miller wrote:
>>> If I use LogByteSizeMergePolicy#setMaxMergeMB, can I clamp down  
>>> on the space needed for optimize/merge? My thought is, if a  
>>> segment is maxed out, it will never need to be copied for a merge  
>>> right? So you could significantly reduce merge/optimize space  
>>> requirments (now at like 2x-4x if readers can still open)?
>>> - Mark
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message