From: "Daniel Taurat" <daniel.taurat@gaussvip.com>
To: Lucene Users List <lucene-user@jakarta.apache.org>
Date: Fri, 10 Sep 2004 16:39:32 +0200
Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Hi all,

here is an update for you: I switched back to Lucene 1.3-final, and the
number of SegmentTermEnum objects is controlled by the gc again: it goes
up to about 1000 and then drops back to 254 after indexing my 1900 test
objects.

Stay tuned, I will try 1.4RC3 now, the last version before FieldCache
was introduced...
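The forced-gc check I do between batches looks roughly like this (a
minimal sketch only; the HeapWatch class and the label are illustrative
names, and System.gc() is of course just a request that the VM may
ignore):

// Minimal sketch: request a gc and log approximate heap use between
// indexing batches. HeapWatch and logHeapAndGc are made-up names.
public class HeapWatch {
    public static void logHeapAndGc(String label) {
        System.gc(); // only a request; the VM is free to ignore it
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println(label + ": ~" + usedMb + " MB in use");
    }
}

Called every few hundred documents, this makes it easy to see whether
used memory falls back to a baseline after collection (as it now does
under 1.3-final) or keeps climbing.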
Daniel

Rupinder Singh Mazara schrieb:
> hi all
>
> I had a similar problem: I have a database of documents with 24 fields
> and an average content size of 7 KB, with 16M+ records. I had to split
> the job into slabs of 1M records each and merge the resulting indexes.
> Submissions to our job queue looked like:
>
>   java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>
> and I still got OutOfMemory exceptions. The solution I came up with
> was to create a temp directory after every 200K documents and merge
> the partial indexes together. That was only needed for the first
> production run; updates are now handled incrementally.
>
> Exception in thread "main" java.lang.OutOfMemoryError
>   at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>   at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>   at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>   at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>   at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>   at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>   at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>   at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>   at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>   at lucene.Indexer.main(CDBIndexer.java:168)
>
>> -----Original Message-----
>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>> Sent: 10 September 2004 14:42
>> To: Lucene Users List
>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>> number of documents
>>
>> Hi Pete,
>> good hint, but we actually do have 4 GB of physical memory on the
>> system. Then again, we have also seen the gc of the IBM JDK 1.3.1
>> we use behave strangely with too large a heap anyway (the limit
>> seems to be 1.2 GB).
>> I can say that gc is not collecting these objects, since I forced gc
>> runs every now and then while indexing (when parsing pdf-type
>> objects, that is): no effect.
>>
>> regards,
>>
>> Daniel
>>
>> Pete Lewis wrote:
>>
>>> Hi all
>>>
>>> Reading the thread with interest, there is another way I've come
>>> across out of memory errors when indexing large batches of
>>> documents.
>>>
>>> If you have your heap space settings too high, then you get swapping
>>> (which impacts performance), plus you never reach the trigger for
>>> garbage collection, hence you don't garbage collect and hence you
>>> run out of memory.
>>>
>>> Can you check whether or not your garbage collection is being
>>> triggered?
>>>
>>> Paradoxically, therefore, if this is the case, reducing the heap
>>> space can both improve performance and get rid of the out of memory
>>> errors.
>>>
>>> Cheers
>>> Pete Lewis
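For reference, Rupinder's slab-and-merge scheme could look roughly like
the following against the Lucene 1.4 API; the class name, path handling,
and analyzer choice are assumptions for illustration, not his actual
lucene.Indexer code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Sketch: each slab of documents has already been indexed into its own
// temporary directory; this step merges the slabs into one final index.
public class SlabMerge {
    public static void mergeSlabs(String[] slabPaths, String finalPath)
            throws Exception {
        Directory[] slabs = new Directory[slabPaths.length];
        for (int i = 0; i < slabPaths.length; i++) {
            // false = open the existing slab index, don't create it
            slabs[i] = FSDirectory.getDirectory(slabPaths[i], false);
        }
        // true = create a fresh index at finalPath
        IndexWriter writer =
            new IndexWriter(finalPath, new StandardAnalyzer(), true);
        writer.addIndexes(slabs); // merges (and optimizes) the slab indexes
        writer.close();
    }
}

Indexing each slab with its own IndexWriter keeps the in-memory segment
state bounded; the final addIndexes() pass then folds the slab indexes
into a single optimized index.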
>>> ----- Original Message -----
>>> From: "Daniel Taurat"
>>> To: "Lucene Users List"
>>> Sent: Friday, September 10, 2004 1:10 PM
>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>> number of documents
>>>
>>>> Daniel Aber schrieb:
>>>>
>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>
>>>>>> I am facing an out of memory problem using Lucene 1.4.1.
>>>>>
>>>>> Could you try with a recent CVS version? There has been a fix
>>>>> about files not being deleted after 1.4.1. Not sure if that could
>>>>> cause the problems you're experiencing.
>>>>>
>>>>> Regards
>>>>> Daniel
>>>>
>>>> Well, it seems not to be files; it looks more like those
>>>> SegmentTermEnum objects accumulating in memory.
>>>> I've seen some discussion of these objects in the developer
>>>> newsgroup that took place some time ago.
>>>> I am afraid this is some kind of runaway caching I have to deal
>>>> with. Maybe not correctly addressed in this newsgroup, after all...
>>>>
>>>> Anyway: any idea if there is an API command to re-init the caches?
>>>>
>>>> Thanks,
>>>>
>>>> Daniel

--
Kind regards

Dr. Daniel Taurat
Senior Consultant
--
VIP ENTERPRISE 8 | THE POWER OF CONTENT AT WORK

Gauss Interprise AG       Phone:  +49-40-3250-1508
Weidestr. 120 a           Mobile: +49-173-2418472
D-22083 Hamburg           Fax:    +49-40-3250-191508
Germany                   E-Mail: daniel.taurat@gaussvip.com
                          Web:    http://www.gaussvip.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org