lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: Out of memory - CachingWrappperFilter and multiple threads
Date Tue, 19 Feb 2008 20:12:53 GMT
You should probably limit each array segment to 64 KB or so.

That being said, I am having doubts that this is really the problem.

We perform extensive image processing and use byte[] arrays much
larger than 8 MB, and lots of them, continually allocating and
deallocating.

Also, the GC can move objects around to make larger contiguous blocks
- that is one of the biggest benefits of managed memory!

I think the submitter has other problems in the code... especially
given the large heap (1 GB) the JVM is allocated.
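
For what it's worth, the segmented array idea could look roughly like the sketch below. This is only an illustration, not code from Lucene: the class name SegmentedBitSet, its methods, and the 64 KB segment size are all assumptions made for the example. It keeps the bits in an array of byte[] so that no single allocation is larger than one segment, and finds the segment and the offset within it using shifts and masks.

```java
// Hypothetical sketch (not a Lucene class): a bit set stored in fixed-size
// 64 KB byte[] segments, so the largest contiguous allocation is one segment.
final class SegmentedBitSet {
    private static final int SEGMENT_SHIFT = 16;                // 2^16 bytes = 64 KB per segment
    private static final int SEGMENT_SIZE = 1 << SEGMENT_SHIFT;
    private static final int SEGMENT_MASK = SEGMENT_SIZE - 1;

    private final byte[][] segments;

    // numBits is assumed to be well below Integer.MAX_VALUE in this sketch.
    SegmentedBitSet(int numBits) {
        int numBytes = (numBits + 7) >>> 3;                      // round up to whole bytes
        int numSegments = (numBytes + SEGMENT_MASK) >>> SEGMENT_SHIFT;
        segments = new byte[numSegments][];
        for (int i = 0; i < numSegments; i++) {
            segments[i] = new byte[SEGMENT_SIZE];                // each segment is allocated on its own
        }
    }

    void set(int bit) {
        int byteIndex = bit >>> 3;
        // Segment index is a shift, offset within the segment is a mask.
        segments[byteIndex >>> SEGMENT_SHIFT][byteIndex & SEGMENT_MASK] |= (byte) (1 << (bit & 7));
    }

    boolean get(int bit) {
        int byteIndex = bit >>> 3;
        return (segments[byteIndex >>> SEGMENT_SHIFT][byteIndex & SEGMENT_MASK] & (1 << (bit & 7))) != 0;
    }

    int segmentCount() {
        return segments.length;
    }
}
```

With 64 million bits this comes to 123 segments of 64 KB each instead of a single ~8 MB block. The last segment is allocated at full size to keep the sketch simple, so a little memory is wasted there.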


On Feb 19, 2008, at 2:01 PM, robert engels wrote:

> You could always use an array of byte[]. Each sub-array will be  
> allocated on its own - making the contiguous need much smaller.
>
> With proper coding the offset calculation is a simple shift - so
> the performance cost should be negligible given the other code.
>
> On Feb 19, 2008, at 1:48 PM, Paul Elschot wrote:
>
>>
>> Allocating large blocks while also allocating many smaller
>> blocks is a known problem for memory allocators, so adding a
>> pool with preallocated blocks sounds like a good idea.
>>
>> With 14 million of 64 million bits set, there may not be much
>> room to decrease the memory needed. When the set bits
>> are random, I'd expect it to be practically impossible to compress
>> to less than 55%. When there are long ranges of set bits,
>> things get different, and interval coding can help a lot.
>>
>> Btw. there is some room in SortedVIntList to add interval
>> coding. Normally the VInt value 0 cannot occur in the current
>> version, and this could be used as a prefix to encode a run of
>> set bits.
>>
>> Regards,
>> Paul Elschot
>>
>>
>> On Tuesday 19 February 2008 12:58:34, eks dev wrote:
>>> hi Mark,
>>>
>>> just out of curiosity, do you know the distribution of set bits in
>>> these terms you have tried to cache? Maybe this simple tip could
>>> help.
>>> If you are lucky like we were, such terms typically used for filters
>>> are good candidates to be used to sort your index before indexing
>>> (once in a while), and then with some sort of IntervalDocIdSet you
>>> can reduce memory requirements dramatically.
>>>
>>>
>>>
>>> ----- Original Message ----
>>> From: markharw00d <markharw00d@yahoo.co.uk>
>>> To: java-dev@lucene.apache.org
>>> Sent: Tuesday, 19 February, 2008 9:20:02 AM
>>> Subject: Re: Out of memory - CachingWrappperFilter and multiple
>>> threads
>>>
>>> I now think the main issue here is that a busy JVM gets into trouble
>>> trying to find large free blocks of memory for large bitsets.
>>> In my index of 64 million documents, ~8 MB of contiguous free memory
>>> must be found for each bitset allocated. The terms I was trying to
>>> cache had 14 million entries so the new DocIdSet alternatives for
>>> bitsets probably fare no better.
>>>
>>> The JVM (Sun 1.5) doesn't seem to deal with these allocations well.
>>> Perhaps there's an obscure JVM option I can set to reserve a section
>>> of RAM for large allocations.
>>> However, I wonder if we should help the JVM out a little here by
>>> having pre-allocated pools of BitSets/OpenBitSets that can be
>>> reserved and reused by the application. This would imply a change to
>>> filter classes so instead of constructing BitSets/OpenBitSets
>>> directly they get them from a pool instead.
>>>
>>> Thoughts?
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org