lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: Out of memory - CachingWrapperFilter and multiple threads
Date Tue, 19 Feb 2008 20:01:11 GMT
You could always use an array of byte[]. Each sub-array is
allocated on its own, making the contiguous-memory requirement
much smaller.

With proper coding the offset calculation is a simple shift, so
the performance cost should be negligible compared with the
surrounding code.
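A minimal sketch of that idea (hypothetical names, not Lucene's OpenBitSet): pages are fixed power-of-two-sized byte arrays, so locating a bit is one shift for the page index plus one mask for the offset within the page, and no single allocation is larger than one page.

```java
// Paged bit set: many small byte[] pages instead of one huge array,
// so the JVM never needs a large contiguous free block.
class PagedBits {
    private static final int PAGE_SHIFT = 12;             // 4096-byte pages
    private static final int PAGE_SIZE = 1 << PAGE_SHIFT;
    private static final int PAGE_MASK = PAGE_SIZE - 1;

    private final byte[][] pages;

    PagedBits(long numBits) {
        long numBytes = (numBits + 7) >>> 3;
        int numPages = (int) ((numBytes + PAGE_MASK) >>> PAGE_SHIFT);
        pages = new byte[numPages][];
        for (int i = 0; i < numPages; i++) {
            pages[i] = new byte[PAGE_SIZE];               // small, independent allocations
        }
    }

    void set(long bit) {
        long byteIndex = bit >>> 3;
        // page index is a shift, in-page offset is a mask
        pages[(int) (byteIndex >>> PAGE_SHIFT)][(int) (byteIndex & PAGE_MASK)]
            |= (byte) (1 << (bit & 7));
    }

    boolean get(long bit) {
        long byteIndex = bit >>> 3;
        return (pages[(int) (byteIndex >>> PAGE_SHIFT)][(int) (byteIndex & PAGE_MASK)]
                & (1 << (bit & 7))) != 0;
    }
}
```

For a 64M-document index this still uses ~8 MB total, but split into ~2000 pages of 4 KB each.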

On Feb 19, 2008, at 1:48 PM, Paul Elschot wrote:

>
> Allocating large blocks while also allocating more smaller
> blocks is a known problem for memory allocators, so adding a
> pool with preallocated blocks sounds like a good idea.
>
> With 14 million of 64 million bits set, there may not be much
> room to decrease the memory needed. When the set bits
> are random, I'd expect it to be practically impossible to compress
> to less than 55%. When there are long runs of set bits,
> things change, and interval coding can help a lot.
>
> Btw. there is some room in SortedVIntList to add interval
> coding. Normally the VInt value 0 cannot occur in the current
> version, and this could be used as a prefix to encode a run of
> set bits.
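That 0-prefix run idea could look something like the following (an illustrative sketch only, not SortedVIntList's actual format): normal gaps are written as VInt deltas, which are always >= 1 for a sorted list, so the value 0 is free to mean "a run of consecutive doc ids follows".

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Delta-coded doc ids with interval coding: the otherwise-unused
// VInt value 0 acts as a run marker, followed by the run length.
class VIntRuns {
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {            // low 7 bits per byte, high bit = "more"
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    static int readVInt(byte[] b, int[] pos) {
        int shift = 0, result = 0;
        while (true) {
            int x = b[pos[0]++] & 0xFF;
            result |= (x & 0x7F) << shift;
            if ((x & 0x80) == 0) return result;
            shift += 7;
        }
    }

    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0, prev = -1;
        while (i < sortedDocs.length) {
            writeVInt(out, sortedDocs[i] - prev);    // normal delta, always >= 1
            prev = sortedDocs[i++];
            int run = 0;
            while (i < sortedDocs.length && sortedDocs[i] == prev + 1) {
                prev = sortedDocs[i++];              // absorb consecutive doc ids
                run++;
            }
            if (run > 0) {                           // 0-prefix encodes the run
                writeVInt(out, 0);
                writeVInt(out, run);
            }
        }
        return out.toByteArray();
    }

    static List<Integer> decode(byte[] bytes) {
        List<Integer> docs = new ArrayList<>();
        int[] pos = {0};
        int prev = -1;
        while (pos[0] < bytes.length) {
            int v = readVInt(bytes, pos);
            if (v == 0) {                            // run of consecutive set bits
                int run = readVInt(bytes, pos);
                for (int j = 0; j < run; j++) docs.add(++prev);
            } else {
                prev += v;
                docs.add(prev);
            }
        }
        return docs;
    }
}
```

A run of N consecutive doc ids then costs a couple of bytes instead of N deltas.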
>
> Regards,
> Paul Elschot
>
>
> Op Tuesday 19 February 2008 12:58:34 schreef eks dev:
>> hi Mark,
>>
>> just out of curiosity, do you know the distribution of set bits in
>> the terms you have tried to cache? Maybe this simple tip could help.
>> If you are lucky, as we were, the terms typically used for filters
>> are good candidates to sort your index by before indexing (once in
>> a while); then, with some sort of IntervalDocIdSet, you can reduce
>> memory requirements dramatically.
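Once the matching documents are clustered by such a sort, a filter can be held as a short list of doc id ranges. A hypothetical sketch of such an interval set (illustrative names, not Lucene's DocIdSet API):

```java
// Interval-based doc id set: stores [start, end) ranges instead of
// one bit per document. Membership is a binary search over starts.
class IntervalDocIdSet {
    private final int[] starts;  // inclusive range starts, sorted
    private final int[] ends;    // exclusive range ends

    IntervalDocIdSet(int[] starts, int[] ends) {
        this.starts = starts;
        this.ends = ends;
    }

    boolean contains(int doc) {
        // find the last interval whose start is <= doc
        int lo = 0, hi = starts.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= doc) { found = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return found >= 0 && doc < ends[found];
    }

    long memoryBytes() {
        return 8L * starts.length;  // two ints per interval
    }
}
```

For well-clustered terms a handful of intervals can replace a multi-megabyte bit set.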
>>
>>
>>
>> ----- Original Message ----
>> From: markharw00d <markharw00d@yahoo.co.uk>
>> To: java-dev@lucene.apache.org
>> Sent: Tuesday, 19 February, 2008 9:20:02 AM
>> Subject: Re: Out of memory - CachingWrappperFilter and multiple
>> threads
>>
>> I now think the main issue here is that a busy JVM gets into trouble
>> trying to find large free blocks of memory for large bitsets.
>> In my index of 64 million documents, ~8 MB of contiguous free memory
>> must be found for each bitset allocated. The terms I was trying to
>> cache had 14 million entries, so the new DocIdSet alternatives to
>> bitsets probably fare no better.
>>
>> The JVM (Sun 1.5) doesn't seem to deal with these allocations well.
>> Perhaps there's an obscure JVM option I can set to reserve a section
>> of RAM for large allocations.
>> However, I wonder if we should help the JVM out a little here by
>> having pre-allocated pools of BitSets/OpenBitSets that can be
>> reserved and reused by the application. This would imply a change to
>> the filter classes: instead of constructing BitSets/OpenBitSets
>> directly, they would get them from a pool.
>>
>> Thoughts?
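Such a pool might be sketched like this (hypothetical class and method names, not a proposed Lucene API): the large sets are allocated once up front, while the heap is still unfragmented, and borrowers block until one is free.

```java
import java.util.BitSet;
import java.util.concurrent.ArrayBlockingQueue;

// Pre-allocated pool of large BitSets: the big allocations happen
// once at startup; filters borrow, fill, and return the sets.
class BitSetPool {
    private final ArrayBlockingQueue<BitSet> pool;

    BitSetPool(int poolSize, int numBits) {
        pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new BitSet(numBits));   // one big allocation, done early
        }
    }

    // Blocks until a set is free, so concurrent threads can never
    // force more than poolSize large allocations to exist at once.
    BitSet borrow() {
        try {
            BitSet bits = pool.take();
            bits.clear();                    // reset before reuse
            return bits;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for a BitSet", e);
        }
    }

    void release(BitSet bits) {
        pool.offer(bits);
    }
}
```

The blocking borrow also caps peak memory use under the multi-threaded load described above, at the price of threads waiting when the pool is exhausted.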
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>

