lucene-java-user mailing list archives

From Roxana Angheluta <rox...@attentio.com>
Subject Re: java on 64 bits
Date Thu, 27 Oct 2005 09:46:56 GMT
Hello everyone!

Here are the conclusions we reached after digging further into the problem;
maybe they will help someone:

1) The hard drive filling up was not caused by the 64-bit JVM; that was
a coincidence.
2) The intermediate files Yonik talked about (*.f*) were present because
the indexing process was merging very large segments, which took a while
to complete.
3) We are indexing a continuous stream of data. As documents become
out-of-date, they are deleted from the index. To ensure data throughput
we use a batch indexing strategy, setting mergeFactor to 50 but never
optimizing. The downside is that it takes a long time before deleted
documents are purged, which only happens when the out-of-date segments
holding them are merged. As a result we end up with large segments that
contain nothing but deleted documents, and that could be removed entirely
if they weren't still listed in the segments file.
4) Assuming that frequently merging into a large segment doesn't hurt
data throughput, we should probably have implemented the strategy
described by Doug Cutting here (scroll down):
http://www.gossamer-threads.com/lists/lucene/java-user/29350?page=last
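To give a feel for how long "a long time" in point 3 can be, here is a toy model. It is not Lucene code: it assumes a simplified version of the level-based merge policy of Lucene 1.x, where flushed segments hold maxBufferedDocs documents and mergeFactor segments of the same level are merged into one segment of the next level. Under that assumption, a segment is only rewritten (and its deleted documents only purged) once mergeFactor - 1 sibling segments of the same level have been built, in the worst case. The class and method names are hypothetical, maxBufferedDocs = 10 is chosen purely for illustration, and deletions are ignored when sizing segments.

```java
/**
 * Toy model (hypothetical, not Lucene code) of a level-based merge policy,
 * assumed to roughly match Lucene 1.x: flushed segments hold maxBufferedDocs
 * documents, and mergeFactor segments of one level are merged into a single
 * segment of the next level. Deletions are ignored when sizing segments.
 */
final class MergeDelay {

    /** Documents held by a segment at the given level (level 0 = freshly flushed). */
    static long segmentSize(int mergeFactor, int maxBufferedDocs, int level) {
        long size = maxBufferedDocs;
        for (int i = 0; i < level; i++) {
            size *= mergeFactor;
        }
        return size;
    }

    /**
     * Worst-case number of new documents that must be added before a segment
     * at the given level is merged again: mergeFactor - 1 sibling segments of
     * the same level have to be built first. Deleted documents in that
     * segment are only purged when this merge happens.
     */
    static long docsUntilMerge(int mergeFactor, int maxBufferedDocs, int level) {
        return (mergeFactor - 1) * segmentSize(mergeFactor, maxBufferedDocs, level);
    }

    public static void main(String[] args) {
        int maxBufferedDocs = 10; // assumed flush size, for illustration only
        for (int mergeFactor : new int[] {10, 50}) {
            for (int level = 0; level <= 3; level++) {
                System.out.printf("mergeFactor=%d level=%d size=%d docsUntilMerge=%d%n",
                        mergeFactor, level,
                        segmentSize(mergeFactor, maxBufferedDocs, level),
                        docsUntilMerge(mergeFactor, maxBufferedDocs, level));
            }
        }
    }
}
```

Under these assumptions, with mergeFactor=50 a level-2 segment (25,000 documents) may wait for 1,225,000 new documents before it is merged, compared with 9,000 for a level-2 segment (1,000 documents) with mergeFactor=10. A larger mergeFactor trades fewer merges for much longer-lived deletions.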

Hth,
casper & roxana
> Thanks everyone for the answers!
> I'm experimenting with your suggestions, I will let you know if 
> something interesting pops up.
>
> roxana
>> 1) make sure the failure was due to an OutOfMemoryError and not
>> something else.
>> 2) if you have enough memory, increase the max JVM heap size (-Xmx)
>> 3) if you don't need more than 1.5G or so of heap, use the 32 bit JVM
>> instead (depending on architecture, it can actually be a little faster
>> because more references fit in the CPU cache).
>> 4) see how many indexed fields you have and if you can consolidate
>> any of them
>> 4.5) if you don't have too many indexed fields, and have enough spare
>> file descriptors, try using the non-compound file format instead.
>> 5) run with the latest version of Lucene (1.9 dev version), which may
>> have better memory usage during optimizes & segment merges.
>> 6) If/when optional norms
>> (http://issues.apache.org/jira/browse/LUCENE-448) make it into Lucene,
>> you can apply them to any indexed fields for which you don't need
>> index-time boosting or length normalization.
>>
>> As for getting rid of your current intermediate files, I'd rebuild from
>> scratch just to ensure things are OK.
>>
>> -Yonik
>> Now hiring -- http://tinyurl.com/7m67g
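Before choosing between suggestions 2 and 3 above, it can help to log which JVM the indexer actually runs on and how much heap it was given. A minimal sketch: Runtime.maxMemory() is standard Java, while the sun.arch.data.model system property ("32" or "64") is specific to Sun/HotSpot JVMs and is an assumption here; on other JVMs it may be missing, in which case the sketch reports "unknown". The class name is hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper that reports the JVM's data model and maximum heap size. */
final class JvmInfo {

    /** "32" or "64" on HotSpot JVMs; "unknown" if the property is absent. */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    /** Maximum heap the JVM will attempt to use (governed by -Xmx), in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
                "data model: %s bit, max heap: %d MB, os.arch: %s",
                dataModel(), maxHeapMegabytes(), System.getProperty("os.arch")));
    }
}
```

Running it with the same flags as the indexer (for example the -Xmx value in use, and with or without -d32 on JVMs that accept that switch) shows whether the flags took effect.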
>>
>> On 10/21/05, Roxana Angheluta <roxana@attentio.com> wrote:
>>  
>>> Thank you, Yonik, it seems this is the case.
>>> What can we do in this case? Would running the program with
>>> java -d32 be a solution?
>>>
>>> Thanks again,
>>> roxana
>>>   
>>>> One possibility: if Lucene runs out of memory while adding or
>>>> optimizing, it can leave unused files behind that increase the size
>>>> of the index. A 64 bit JVM will require more memory than a 32 bit
>>>> one due to the size of all references being doubled.
>>>>
>>>> If you are using the compound file format (the default - check for
>>>> .cfs files), then it's easy to check if you have this problem by
>>>> seeing if there are any *.f* files in the index directory. These are
>>>> intermediate files and shouldn't exist for long in a compound-file
>>>> index.
>>>>
>>>> -Yonik
>>>> Now hiring -- http://tinyurl.com/7m67g
>>>>      
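The check for leftover intermediate files described above can be scripted. A minimal sketch using only java.nio.file: it reports whether the index directory contains any .cfs files (i.e. appears to use the compound format) and lists the files matching the glob *.f* that was mentioned. The class name and the command-line argument are hypothetical. Since such files are expected while a merge is in progress, a single non-empty result is not proof of a problem; repeated results while no indexing is running are more telling.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: look for stray intermediate files in a compound-file index. */
final class IndexFileCheck {

    /** Sorted names of regular files in dir whose file name matches the glob. */
    static List<String> matching(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    /** True if dir holds .cfs files, i.e. the index appears to use the compound format. */
    static boolean usesCompoundFormat(Path dir) throws IOException {
        return !matching(dir, "*.cfs").isEmpty();
    }

    /** Files matching *.f*; in a compound-file index these should be short-lived. */
    static List<String> intermediateFiles(Path dir) throws IOException {
        return matching(dir, "*.f*");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "."); // index directory
        System.out.println("compound format: " + usesCompoundFormat(dir));
        System.out.println("intermediate *.f* files: " + intermediateFiles(dir));
    }
}
```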
>>>    
>>
>>  
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

