lucene-java-user mailing list archives

From Michael van Rooyen <mich...@loot.co.za>
Subject Re: Lucene 4.4.0 mergeSegments OutOfMemoryError
Date Thu, 26 Sep 2013 10:25:56 GMT
Yes, it happens as part of the early morning optimize, and yes, it's a 
forceMerge(1) which I've disabled for now.

I haven't looked at the persistence mechanism for Lucene since 2.x, but 
if I remember correctly, the deleted documents would stay in an index 
segment until that segment was eventually merged.  Without forcing a 
merge (optimize in old versions), the footprint on disk could be a 
multiple of the actual space required for the live documents, and this 
would have an impact on performance (the deleted documents would clutter 
the buffer cache).
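As a rough illustration of the behaviour described in the previous paragraph, here is a minimal Java sketch of a hypothetical toy model. The Segment class and merge method are illustrative names, not Lucene's API. Each segment keeps its documents plus a bitset of deletions, and the space held by deleted documents is only reclaimed when a merge copies the live documents into a new segment. Real Lucene segments are likewise write-once with deletions recorded separately, but the on-disk details differ.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy model, not Lucene's classes: a segment is an immutable list of
// documents plus a bitset marking which of them have been deleted.
public class SegmentDeletesDemo {

    static final class Segment {
        final List<String> docs;             // documents written when the segment was created
        final BitSet deleted = new BitSet(); // set bit = deleted, but still occupying space

        Segment(List<String> docs) {
            this.docs = List.copyOf(docs);
        }

        int maxDoc() { return docs.size(); }                          // includes deleted docs
        int numDocs() { return docs.size() - deleted.cardinality(); } // live docs only
    }

    // Merging writes only the live documents into a new segment; this is the only
    // point in the model at which space held by deleted documents is reclaimed.
    static Segment merge(List<Segment> segments) {
        List<String> live = new ArrayList<>();
        for (Segment s : segments) {
            for (int i = 0; i < s.maxDoc(); i++) {
                if (!s.deleted.get(i)) {
                    live.add(s.docs.get(i));
                }
            }
        }
        return new Segment(live);
    }

    public static void main(String[] args) {
        Segment a = new Segment(List.of("d1", "d2", "d3", "d4"));
        Segment b = new Segment(List.of("d5", "d6"));
        a.deleted.set(1); // delete d2
        a.deleted.set(3); // delete d4

        System.out.println("before merge: maxDoc=" + (a.maxDoc() + b.maxDoc())
                + " numDocs=" + (a.numDocs() + b.numDocs()));
        Segment merged = merge(List.of(a, b));
        System.out.println("after merge:  maxDoc=" + merged.maxDoc()
                + " numDocs=" + merged.numDocs());
    }
}
```

Running main prints maxDoc=6 numDocs=4 before the merge and maxDoc=4 numDocs=4 after it: the two deleted documents count against maxDoc until the merge drops them.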

Is this still the case?  I would have thought it good practice to force 
the dead space out of an index periodically, but if the underlying 
storage mechanism has changed and the current index files are more 
efficient at housekeeping, this may no longer be necessary.

If someone could shed a little light on best practice for indexes where 
documents are frequently updated (i.e. deleted and re-added), that would 
be great.
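One option often discussed instead of a nightly forceMerge(1) is to let the merge policy reclaim deletions incrementally and, if needed, call IndexWriter.forceMergeDeletes(). With TieredMergePolicy that rewrites only segments whose deleted-document percentage exceeds its forceMergeDeletesPctAllowed setting (worth confirming against the 4.4 javadocs). The Java sketch below is a simplified, hypothetical stand-in for just that selection step, not Lucene's code. The real policy also groups segments into merges by size. The segment names and counts are made-up illustration values, not figures from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: select only the segments whose share of deleted documents
// exceeds a threshold, instead of rewriting the whole index as forceMerge(1) does.
// Simplified stand-in; not the selection logic of any actual Lucene MergePolicy.
public class DeletesThresholdDemo {

    static final class SegmentStats {
        final String name;
        final int maxDoc;      // documents in the segment, deleted ones included
        final int deletedDocs; // documents marked deleted but not yet merged away

        SegmentStats(String name, int maxDoc, int deletedDocs) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.deletedDocs = deletedDocs;
        }

        double deletedPct() {
            return maxDoc == 0 ? 0.0 : 100.0 * deletedDocs / maxDoc;
        }
    }

    // Returns the segments that would be rewritten under the given threshold.
    static List<SegmentStats> segmentsToRewrite(List<SegmentStats> segments, double pctAllowed) {
        List<SegmentStats> picked = new ArrayList<>();
        for (SegmentStats s : segments) {
            if (s.deletedPct() > pctAllowed) {
                picked.add(s);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = List.of(
                new SegmentStats("_a", 5_000_000, 200_000),  // 4% deleted
                new SegmentStats("_b", 3_000_000, 900_000),  // 30% deleted
                new SegmentStats("_c", 1_000_000, 150_000)); // 15% deleted
        for (SegmentStats s : segmentsToRewrite(segments, 10.0)) {
            System.out.println(s.name + " " + s.deletedPct() + "% deleted");
        }
    }
}
```

With a 10% threshold only _b and _c are selected, so roughly 4 million documents are rewritten rather than all 9 million. The merge cost tracks where the deletions are instead of the total index size.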

Michael.


On 2013/09/26 11:43 AM, Ian Lea wrote:
> Is this OOM happening as part of your early morning optimize or at
> some other point?  By optimize do you mean IndexWriter.forceMerge(1)?
> You really shouldn't have to use that. If the index grows forever
> without it then something else is going on which you might wish to
> report separately.
>
>
> --
> Ian.
>
>
> On Wed, Sep 25, 2013 at 12:35 PM, Michael van Rooyen <michael@loot.co.za> wrote:
>> We've recently upgraded to Lucene 4.4.0 and mergeSegments now causes an OOM
>> error.
>>
>> As background, our index contains about 14 million documents (growing
>> slowly) and we process about 1 million updates per day. It's about 8GB on
>> disk.  I'm not sure if the Lucene segments merge the way they used to in the
>> early versions, but we've always optimized at 3am to get rid of dead space
>> in the index, as otherwise it grows forever.
>>
>> mergeSegments was working under 4.3.1, but the index has grown somewhat
>> on disk since then, probably due to a couple of added NumericDocValues
>> fields.  The java process is assigned about 3GB (the maximum, as it's
>> running on a 32 bit i686 Linux box), and it still goes OOM.
>>
>> Any advice as to the possible cause and how to circumvent it would be great.
>> Here's the stack trace:
>>
>> org.apache.lucene.index.MergePolicy$MergeException:
>> java.lang.OutOfMemoryError: Java heap space
>> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:212)
>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:174)
>> org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
>> org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:253)
>> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:215)
>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>
>>
>> Thanks,
>> Michael.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
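The stack trace shows the OOM occurring while SegmentMerger.mergeNorms loads norms through Lucene42DocValuesProducer.loadNumeric, that is, while per-document values are read into the heap. One way to judge whether that can exhaust a roughly 3GB heap is a back-of-envelope estimate. The Java sketch below assumes one byte per document per norms field and an uncompressed 8-byte long per document per NumericDocValues field. These are worst-case assumptions, not measurements from the thread; the real codec may pack values more tightly. The count of ten normed fields is a hypothetical placeholder. Deleted documents still occupy slots in the source segments, so maxDoc rather than the live count is the relevant input. With about 1 million updates a day, maxDoc can exceed the 14 million live documents.

```java
// Back-of-envelope heap estimate for materializing per-document values on the heap.
// Field counts are hypothetical placeholders, not numbers from the thread, and the
// bytes-per-value figures are worst-case assumptions.
public class NormsHeapEstimate {

    static long estimateBytes(long maxDoc, int normsFields, int numericDvFields) {
        long normsBytes = maxDoc * normsFields * 1L;       // assume 1 byte per doc per norms field
        long numericBytes = maxDoc * numericDvFields * 8L; // assume one uncompressed long per doc
        return normsBytes + numericBytes;
    }

    public static void main(String[] args) {
        long maxDoc = 14_000_000L; // roughly the document count mentioned in the thread
        int normsFields = 10;      // hypothetical number of fields with norms
        int numericDvFields = 2;   // "a couple of added NumericDocValues fields"
        long bytes = estimateBytes(maxDoc, normsFields, numericDvFields);
        System.out.printf("~%d MiB%n", bytes / (1024 * 1024));
    }
}
```

With these placeholder inputs the estimate comes to a few hundred MiB for a single copy of the values. It scales linearly with maxDoc and with the field counts, so the actual field counts in the index are what decide whether this path alone can approach the heap limit.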



