lucene-java-user mailing list archives

From John Moylan <jo...@rte.ie>
Subject Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents
Date Mon, 13 Sep 2004 10:32:48 GMT
IBM JDK1.4.2 should work fine. AFAIK JDK1.3.1 is usable if you disable JIT.

John

Daniel Taurat wrote:
> Hi Doug,
> you are absolutely right about the older version of the JDK: it is 1.3.1 
> (IBM).
> Unfortunately we cannot upgrade since we are bound to the IBM Portalserver 4 
> environment.
> Results:
> I patched Lucene 1.4.1:
> it has not improved much: after indexing 1897 objects the number of 
> SegmentTermEnum objects is up to 17936.
> To be realistic: this is even a deterioration :(((
> My next check will be with a JDK1.4.2 for the test environment, but this 
> can only be a reference run for now.
> 
> Thanks,
> Daniel
> 
> Doug Cutting wrote:
> 
>> It sounds like the ThreadLocal in TermInfosReader is not getting 
>> correctly garbage collected when the TermInfosReader is collected. 
>> Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess 
>> is that you're running in an older JVM.  Is that right?
>>
>> I've attached a patch which should fix this.  Please tell me if it 
>> works for you.
>>
>> Doug
>>
>> Daniel Taurat wrote:
>>
>>> Okay, that (1.4rc3) worked fine, too!
>>> Got only 257 SegmentTermEnums for 1900 objects.
>>>
>>> Now I will go for the final test on the production server with the 
>>> 1.4rc3 version and about 40,000 objects.
>>>
>>> Daniel
>>>
>>> Daniel Taurat wrote:
>>>
>>>> Hi all,
>>>> here is some update for you:
>>>> I switched back to Lucene 1.3-final and now the number of 
>>>> SegmentTermEnum objects is controlled by gc again:
>>>> it goes up to about 1000 and then it is down again to 254 after 
>>>> indexing my 1900 test-objects.
>>>> Stay tuned, I will try 1.4RC3 now, the last version before 
>>>> FieldCache was introduced...
>>>>
>>>> Daniel
>>>>
>>>>
>>>> Rupinder Singh Mazara wrote:
>>>>
>>>>> hi all
>>>>>  I had a similar problem: I have a database of documents with 24 
>>>>> fields and an average content of 7K, with 16M+ records.
>>>>>
>>>>>  I had to split the job into slabs of 1M each and merge the 
>>>>> resulting indexes; submissions to our job queue looked like
>>>>>
>>>>>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>>>  
>>>>> and I still had an OutOfMemoryError. The solution I came up with 
>>>>> was to create a temp directory after every 200K documents and 
>>>>> merge them together. This was done for the first production run; 
>>>>> updates are now being handled incrementally.
>>>>>
>>>>>  
>>>>>
>>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>>     at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>>>>>     at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>>>>>     at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>>>>>     at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>>>>>     at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>>>>>     at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>>>>>     at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>>>>>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>>>>>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>>>>>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>>>     at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>>>     at lucene.Indexer.main(CDBIndexer.java:168)
>>>>>
>>>>>  
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>>>> Sent: 10 September 2004 14:42
>>>>>> To: Lucene Users List
>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number
>>>>>> of documents
>>>>>>
>>>>>>
>>>>>> Hi Pete,
>>>>>> good hint, but we actually do have physical memory of 4Gb on the
>>>>>> system. But then: we also have experienced that the gc of ibm
>>>>>> jdk1.3.1 that we use is sometimes behaving strangely with too
>>>>>> large heap space anyway. (Limit seems to be 1.2 Gb)
>>>>>> I can say that gc is not collecting these objects since I forced
>>>>>> gc runs when indexing every now and then (when parsing pdf-type
>>>>>> objects, that is): No effect.
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>> Pete Lewis wrote:
>>>>>>
>>>>>>  
>>>>>>
>>>>>>> Hi all
>>>>>>>
>>>>>>> Reading the thread with interest, there is another way I've come
>>>>>>> across out of memory errors when indexing large batches of documents.
>>>>>>>
>>>>>>> If you have your heap space settings too high, then you get
>>>>>>> swapping (which impacts performance) plus you never reach the
>>>>>>> trigger for garbage collection, hence you don't garbage collect
>>>>>>> and hence you run out of memory.
>>>>>>>
>>>>>>> Can you check whether or not your garbage collection is being
>>>>>>> triggered?
>>>>>>>
>>>>>>> Anomalously therefore if this is the case, by reducing the heap
>>>>>>> space you can improve performance and get rid of the out of
>>>>>>> memory errors.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Pete Lewis
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Daniel Taurat" <daniel.taurat@gaussvip.com>
>>>>>>> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>>>>>> number of documents
>>>>>>>
>>>>>>>> Daniel Aber wrote:
>>>>>>>>
>>>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>>>
>>>>>>>>>> I am facing an out of memory problem using Lucene 1.4.1.
>>>>>>>>>
>>>>>>>>> Could you try with a recent CVS version? There has been a fix
>>>>>>>>> about files not being deleted after 1.4.1. Not sure if that
>>>>>>>>> could cause the problems you're experiencing.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Daniel
>>>>>>>>
>>>>>>>> Well, it seems not to be files, it looks more like those
>>>>>>>> SegmentTermEnum objects accumulating in memory.
>>>>>>>> I've seen some discussion on these objects in the
>>>>>>>> developer-newsgroup that had taken place some time ago.
>>>>>>>> I am afraid this is some kind of runaway caching I have to
>>>>>>>> deal with.
>>>>>>>> Maybe not correctly addressed in this newsgroup, after all...
>>>>>>>>
>>>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------

>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>>> For additional commands, e-mail: 
>>>>>>>> lucene-user-help@jakarta.apache.org
>>>>>>>>
>> ------------------------------------------------------------------------
>>
>> Index: src/java/org/apache/lucene/index/TermInfosReader.java
>> ===================================================================
>> RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
>> retrieving revision 1.9
>> diff -u -r1.9 TermInfosReader.java
>> --- src/java/org/apache/lucene/index/TermInfosReader.java    6 Aug 2004 20:50:29 -0000    1.9
>> +++ src/java/org/apache/lucene/index/TermInfosReader.java    10 Sep 2004 17:46:47 -0000
>> @@ -45,6 +45,11 @@
>>     readIndex();
>>   }
>>
>> +  protected final void finalize() {
>> +    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
>> +    enumerators.set(null);
>> +  }
>> +
>>   public int getSkipInterval() {
>>     return origEnum.skipInterval;
>>   }
>>
>>  
>>
>> ------------------------------------------------------------------------
>>
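A note on the patch above, offered as general Java behaviour and not as a
confirmed diagnosis of the numbers reported in this thread: ThreadLocal.set()
only changes the value seen by the calling thread, and finalize() is
typically run on a JVM finalizer thread, not on the thread that did the
indexing. The sketch below illustrates just that scoping rule. The class and
method names are hypothetical stand-ins (this is not Lucene code), and the
generics and remove() call assume Java 5 or later.

```java
class ThreadLocalScopeDemo {
    // Hypothetical stand-in for a per-thread cache such as the
    // "enumerators" field touched by the patch; not the Lucene class.
    private static final ThreadLocal<Object> enumerators = new ThreadLocal<Object>();

    // Returns true if the calling thread still sees its own value after
    // a different thread has called enumerators.set(null).
    static boolean valueSurvivesClearFromOtherThread() {
        enumerators.set(new Object());

        Thread other = new Thread(new Runnable() {
            public void run() {
                // Clears only the slot that belongs to this helper thread.
                enumerators.set(null);
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }

        boolean survived = enumerators.get() != null;
        enumerators.remove(); // tidy up the calling thread's slot (Java 5+)
        return survived;
    }

    public static void main(String[] args) {
        System.out.println("value survived: " + valueSurvivesClearFromOtherThread());
    }
}
```

If that scoping applies here, clearing the value from the thread that
populated it (for example in an explicit close() call) would be the more
direct route, but whether that fits TermInfosReader is an assumption this
sketch does not test.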
> 
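On Pete's question further down the thread (is garbage collection being
triggered at all?), here is a minimal sketch of one way to check from inside
the indexing process. It is an illustration under stated assumptions: the
class name and the allocation loop standing in for an indexing batch are
made up, and the java.lang.management API it uses only exists from Java 5
on, so it would not run on the JDK 1.3.1 discussed here. On older JVMs the
Runtime-based heap figure still works, and many JVMs accept -verbose:gc on
the command line to log collections.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcActivityCheck {
    // Sums the collection counts of all collectors the JVM exposes.
    // getCollectionCount() returns -1 when a collector reports no count.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Heap currently in use according to Runtime (available on old JVMs too).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long collectionsBefore = totalCollections();

        // Hypothetical stand-in for an indexing batch: short-lived garbage.
        long allocated = 0;
        for (int i = 0; i < 200000; i++) {
            byte[] junk = new byte[1024];
            allocated += junk.length;
        }

        long collectionsAfter = totalCollections();
        System.out.println("allocated bytes: " + allocated);
        System.out.println("collections during batch: "
                + (collectionsAfter - collectionsBefore));
        System.out.println("used heap bytes: " + usedHeapBytes());
    }
}
```

Logging these figures every few thousand documents would show whether the
collection count keeps rising while used heap never drops, which is the
pattern Daniel describes for the SegmentTermEnum objects.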


