cassandra-user mailing list archives

From Mick Semb Wever <...@apache.org>
Subject Re: OOM opening bloom filter
Date Sun, 11 Mar 2012 23:44:00 GMT
On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote:
> Are you doing RF=1? 

That is correct. So are your calculations, then :-)


> > very small, <1k. Data from this cf is only read via hadoop jobs in batch
> > reads of 16k rows at a time.
> [snip]
> > It's my understanding then for this use case that bloom filters are of
> > little importance and that i can
> 
> Depends. I'm not familiar enough with how the hadoop integration works
> so someone else will have to comment, but if your hadoop jobs are just
> performing normal reads of keys via thrift and the keys they are
> grabbing are not in token order, those reads would be effectively
> random and bloom filters should still be highly relevant to the amount
> of I/O operations you need to perform. 
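Peter's point about random reads can be made concrete. Each SSTable has a bloom filter that answers "might this key be here?", so a random point read can skip any SSTable whose filter says no. The cost is memory for every key. The sketch below is a toy filter, not Cassandra's implementation. The function and class names, the SHA-256 double-hashing scheme and the parameters are assumptions for illustration. Sizing uses the textbook formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib
import math
from typing import Iterator


def bloom_size_bits(n_keys: int, fp_rate: float) -> int:
    """Textbook optimal size: m = -n * ln(p) / (ln 2)^2 bits."""
    return math.ceil(-n_keys * math.log(fp_rate) / (math.log(2) ** 2))


def bloom_num_hashes(m_bits: int, n_keys: int) -> int:
    """Textbook optimal hash count: k = (m / n) * ln 2, at least 1."""
    return max(1, round(m_bits / n_keys * math.log(2)))


class BloomFilter:
    """Toy bloom filter (hypothetical; hash scheme is an assumption)."""

    def __init__(self, n_keys: int, fp_rate: float) -> None:
        self.m = bloom_size_bits(n_keys, fp_rate)
        self.k = bloom_num_hashes(self.m, n_keys)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent", so the read can skip this SSTable.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

At a 1% false-positive rate the formula gives roughly 9.6 bits per key. Filter memory therefore grows linearly with the number of rows a node holds, whether or not the workload ever does point reads.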

They are Thrift get_range_slice reads of 16k rows per request.
Hadoop reads are based on tokens, but in my use case the keys are also
ordered, and this cluster uses ByteOrderedPartitioner (BOP).
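Under ByteOrderedPartitioner a row's token is its raw key bytes, so a token-ordered scan walks keys in sorted order and each batch is a contiguous key range. A bloom filter only answers point-membership questions, so it has little to offer such a scan. The sketch below is a hypothetical in-memory stand-in: `range_slices` and `scan_in_batches` are invented names, not the Thrift API. It pages through sorted keys by restarting each request at the last key already returned, assuming the start key is inclusive, and drops that duplicate.

```python
from bisect import bisect_left
from typing import Iterator, List


def range_slices(sorted_keys: List[bytes], start_key: bytes, count: int) -> List[bytes]:
    """Hypothetical single range request: up to `count` keys >= start_key,
    in byte order (with BOP, byte order of keys is token order)."""
    i = bisect_left(sorted_keys, start_key)
    return sorted_keys[i:i + count]


def scan_in_batches(sorted_keys: List[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Page through the whole key range in contiguous, ordered batches."""
    batch = range_slices(sorted_keys, b"", batch_size)
    while batch:
        yield batch
        # Start is inclusive: ask for one extra key and drop the one already seen.
        batch = range_slices(sorted_keys, batch[-1], batch_size + 1)[1:]
```

With a batch size of 16k instead of 16, every request reads one contiguous slice of the ordered key space.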

~mck

-- 
"Living on Earth is expensive, but it does include a free trip around
the sun every year." Unknown 

| http://github.com/finn-no | http://tech.finn.no |
