cassandra-user mailing list archives

From Charlie Mason <>
Subject Re: Optimal Way to Tune For Searchs For Missing Keys
Date Fri, 10 Jan 2014 18:36:18 GMT
Hi Rob,

It sounds like Cassandra is actually a very good fit for this use case. I
have been seeing slower performance in my app than I was expecting,
although I am now fairly sure the cause is something other than this part
of the app. With such a large number of queries, I was just keen to know
how well Cassandra would handle this, particularly going forward.

The application is scanning a much larger data set for a much smaller
subset of items it's interested in. The list of items of interest changes
periodically, and whilst I could probably hold the list in memory on one
host at the moment, there's a good chance I may need to shard the scanning
of the big data set. So I would rather not limit everything to one node or
have to come up with a way to guarantee that the list of items of interest
has been distributed to each node.

Thanks for the info.

Charlie M

On Thu, Jan 9, 2014 at 11:11 PM, Robert Coli <> wrote:

> On Thu, Jan 9, 2014 at 1:42 PM, Charlie Mason <> wrote:
>> There are a lot more reads than writes on this particular table. All of
>> the queries are just for the partition key. Most of the queries, more
>> than 99%, are for partition keys that don't exist.
>
> Reads for partition keys that don't exist are the fastest reads Cassandra
> can do, because it can usually correctly answer the "exists?" question from
> bloom filters, which are resident in RAM. If the bloom filter says the row
> does not exist, the key cache is not even consulted. You have an extremely
> low number of bloom filter false positives relative to number of reads, so
> it is likely that you cannot further optimize requests to this column
> family.
> (Why is your application asking for so many keys which don't exist?)
> =Rob
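
To illustrate the point in the quoted reply about bloom filters answering
the "exists?" question from RAM, here is a minimal Python sketch. It is not
Cassandra's implementation: the BloomFilter and ToyTable classes, the filter
size, the hash scheme and the key names are all assumptions made up for the
example. It only shows why a lookup for a missing key can usually be
rejected in memory, without touching the slower on-disk data.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class ToyTable:
    """Hypothetical stand-in for one data file plus its in-memory filter."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.on_disk = {}      # pretend this dict is the slow, on-disk data
        self.disk_reads = 0    # counts how often we had to go to "disk"

    def put(self, key, value):
        self.bloom.add(key)
        self.on_disk[key] = value

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None        # definite miss, answered from memory
        self.disk_reads += 1   # possible hit (or false positive): read disk
        return self.on_disk.get(key)


table = ToyTable()
for i in range(50):
    table.put(f"item-{i}", i)

# Mostly-missing workload, like the one described in this thread.
misses = [table.get(f"absent-{i}") for i in range(1000)]
print("disk reads for 1000 missing keys:", table.disk_reads)
```

Every missing key returns None, and only the occasional false positive
reaches the simulated disk, so disk_reads stays far below the number of
lookups.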
