lucene-dev mailing list archives

From eks dev <eks...@yahoo.co.uk>
Subject Re: Out of memory - CachingWrappperFilter and multiple threads
Date Tue, 19 Feb 2008 20:48:03 GMT
hi Paul, 

>Allocating large blocks while also allocating more smaller
>blocks is a known problem for memory allocators, so adding a
>pool with preallocated blocks sounds like a good idea.

Sure, reducing allocation pressure on the JVM is always good for performance, always and everywhere.

>Btw. there is some room in SortedVIntList to add interval
>coding. Normally the VInt value 0 cannot occur in the current
>version, and this could be used as a prefix to encode a run of
>set bits.

I like this! I was just experimenting with
  int[] leftIntervalExtreme
  int[] intervalLength
as a representation of interval lists. It has one nice feature: you can binary search the
interval starts for a really fast long skipTo(). But it has somewhat higher memory consumption
when the bit vector is badly distributed... SortedVIntList with run-length encoding could prove
more robust in that sense.
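The interval-list representation described above could be sketched roughly like this; the class and field names follow the mail, but the code itself is a hypothetical illustration, not actual Lucene code:

```java
// Sketch of the interval-list idea: store each run of set bits as
// (start, length) and binary-search the starts for a fast skipTo().
import java.util.Arrays;

public class IntervalDocIdSet {
    private final int[] leftIntervalExtreme; // sorted interval start docs
    private final int[] intervalLength;      // run length for each start

    public IntervalDocIdSet(int[] starts, int[] lengths) {
        this.leftIntervalExtreme = starts;
        this.intervalLength = lengths;
    }

    /** Smallest set doc >= target, or -1 if there is none. */
    public int skipTo(int target) {
        int i = Arrays.binarySearch(leftIntervalExtreme, target);
        if (i >= 0) return target;           // target is itself a run start
        int ins = -i - 1;                    // insertion point
        if (ins > 0) {                       // does the previous run cover target?
            int prev = ins - 1;
            if (target < leftIntervalExtreme[prev] + intervalLength[prev]) {
                return target;               // target falls inside that run
            }
        }
        if (ins < leftIntervalExtreme.length) {
            return leftIntervalExtreme[ins]; // jump to the next run start
        }
        return -1;                           // past the last interval
    }
}
```

Memory is two ints per run rather than one bit per doc, which is why it wins only when set bits cluster into few long runs.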
A friend of mine sent me this link, it looks very interesting:
http://repositories.cdlib.org/cgi/viewcontent.cgi?article=3104&context=lbnl
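Paul's run-coding suggestion could be sketched as follows. For brevity this writes plain ints instead of real VInt bytes, and the run-length threshold of three is an arbitrary illustrative choice; only the trick itself (deltas are always >= 1, so 0 is free to mark a run) comes from the mail:

```java
// Delta-encode a sorted doc-id list; a 0 token (impossible as a normal
// delta) prefixes a run: 0, <delta to run start>, <run length>.
import java.util.ArrayList;
import java.util.List;

public class RunLengthDeltas {
    public static List<Integer> encode(int[] sortedDocs) {
        List<Integer> out = new ArrayList<>();
        int prev = -1;                        // docs start at 0, so deltas >= 1
        int i = 0;
        while (i < sortedDocs.length) {
            int j = i;
            while (j + 1 < sortedDocs.length
                   && sortedDocs[j + 1] == sortedDocs[j] + 1) {
                j++;                          // extend the consecutive run
            }
            int runLen = j - i + 1;
            if (runLen >= 3) {                // long run: the 0-marker pays off
                out.add(0);
                out.add(sortedDocs[i] - prev);
                out.add(runLen);
            } else {                          // short run: plain deltas
                for (int k = i; k <= j; k++) {
                    out.add(sortedDocs[k] - prev);
                    prev = sortedDocs[k];
                }
            }
            prev = sortedDocs[j];
            i = j + 1;
        }
        return out;
    }

    public static List<Integer> decode(List<Integer> enc) {
        List<Integer> docs = new ArrayList<>();
        int prev = -1;
        for (int i = 0; i < enc.size(); i++) {
            int v = enc.get(i);
            if (v == 0) {                     // run marker
                int start = prev + enc.get(++i);
                int len = enc.get(++i);
                for (int d = start; d < start + len; d++) docs.add(d);
                prev = start + len - 1;
            } else {
                prev += v;
                docs.add(prev);
            }
        }
        return docs;
    }
}
```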


 

>Regards,
>Paul Elschot


On Tuesday 19 February 2008 12:58:34, eks dev wrote:
> hi Mark,
>
> just out of curiosity, do you know the distribution of set bits in
> the terms you have tried to cache? Maybe this simple tip could
> help.
> If you are lucky like we were, terms typically used for filters are
> good candidates to sort your index by before indexing (once in a
> while), and then with some sort of IntervalDocIdSet you can reduce
> memory requirements dramatically.
>
>
>
> ----- Original Message ----
> From: markharw00d <markharw00d@yahoo.co.uk>
> To: java-dev@lucene.apache.org
> Sent: Tuesday, 19 February, 2008 9:20:02 AM
> Subject: Re: Out of memory - CachingWrappperFilter and multiple
> threads
>
> I now think the main issue here is that a busy JVM gets into trouble
> trying to find large free blocks of memory for large bitsets.
> In my index of 64 million documents, ~8MB of contiguous free memory
> must be found for each bitset allocated. The terms I was trying to
> cache had 14 million entries, so the new DocIdSet alternatives to
> bitsets probably fare no better.
>
> The JVM (Sun 1.5) doesn't seem to deal with these allocations well.
> Perhaps there's an obscure JVM option I can set to reserve a section
> of RAM for large allocations.
> However, I wonder if we should help the JVM out a little here by
> having pre-allocated pools of BitSets/OpenBitSets that can be
> reserved and reused by the application. This would imply a change to
> the filter classes: instead of constructing BitSets/OpenBitSets
> directly, they would get them from a pool.
>
> Thoughts?
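Mark's pooling idea could look roughly like this minimal sketch; the class name, sizing, and API are illustrative assumptions, not an actual Lucene interface:

```java
// Hand out fixed-size long[] backing arrays for bitsets and take them
// back for reuse, so the JVM does not have to find a fresh contiguous
// multi-megabyte block for every filter evaluation.
import java.util.ArrayDeque;

public class BitSetPool {
    private final int numWords;               // one bit per doc, 64 per long
    private final ArrayDeque<long[]> free = new ArrayDeque<>();

    public BitSetPool(int maxDoc) {
        this.numWords = (maxDoc + 63) >>> 6;  // e.g. 64M docs -> ~8MB of longs
    }

    public synchronized long[] acquire() {
        long[] bits = free.poll();
        if (bits == null) {
            return new long[numWords];        // pool empty: allocate once
        }
        java.util.Arrays.fill(bits, 0L);      // a reused block must start clear
        return bits;
    }

    public synchronized void release(long[] bits) {
        if (bits.length == numWords) {        // only pool correctly sized blocks
            free.push(bits);
        }
    }
}
```

The filter code would then wrap the acquired long[] in an OpenBitSet-style view and release it when the cached filter is evicted; getting that release discipline right is the hard part of the proposal.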
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org


