cassandra-user mailing list archives

From Richard Low <r...@acunu.com>
Subject Re: Misc Performance Questions
Date Wed, 08 Jun 2011 12:40:55 GMT
On Wed, Jun 8, 2011 at 12:30 PM, AJ <aj@dude.podzone.net> wrote:

>> There is however a difference in running multiple column families
>> versus putting everything in the same column family and separating
>> them with e.g. a key prefix.  E.g. if you have a large data set and a
>> small one, it will be quicker to query the small one if it is in its
>> own column family.
>>
>
> I assumed that a read would be O(1) for any size CF since Cass is
> implemented with hashmaps.  Do you know why size matters?  (forgive the pun)
>

You may not notice a difference, but it can happen.

A read has to consult every SSTable in the column family.  More data
means (most likely) more SSTables to check, which slows the read down.
For point queries this isn't so bad, because the Bloom filters let the
read skip most SSTables that don't contain the key, but for range
queries you will notice a big difference: you have to do more seeks to
skip over unwanted data.
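
To make that concrete, here is a rough Python sketch of the idea.  It
is a toy model, not Cassandra's actual read path: the BloomFilter and
SSTable classes, the helper functions and the key names are all made
up for illustration, and the "SSTables" are just in-memory dicts.  It
only counts how many SSTables each kind of read has to touch.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Toy stand-in for an SSTable: sorted rows plus a Bloom filter on keys."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def point_read(sstables, key):
    """Return (value, number of SSTables actually read)."""
    value, reads = None, 0
    for table in sstables:
        if not table.bloom.might_contain(key):
            continue  # filter says "definitely not here", so skip this table
        reads += 1
        if key in table.rows:
            value = table.rows[key]
    return value, reads


def range_read(sstables, start, end):
    """Return (rows, number of SSTables read).

    A Bloom filter only answers "is this exact key present?", so in this
    toy model a range read has to look at every SSTable.
    """
    result, reads = {}, 0
    for table in sstables:
        reads += 1
        for key, val in table.rows.items():
            if start <= key <= end:
                result[key] = val
    return result, reads


def make_sstables(prefix, num_tables, rows_per_table):
    """Build num_tables toy SSTables whose keys share a prefix."""
    tables = []
    for t in range(num_tables):
        rows = {f"{prefix}:{t:03d}:{r:04d}": r for r in range(rows_per_table)}
        tables.append(SSTable(rows))
    return tables


# The small data set in its own column family: one SSTable.
small_cf = make_sstables("small", 1, 50)
# The same data sharing a column family with a large data set: 21 SSTables.
shared_cf = small_cf + make_sstables("large", 20, 50)

value, point_reads_shared = point_read(shared_cf, "small:000:0007")
rows_shared, range_reads_shared = range_read(
    shared_cf, "small:000:0000", "small:000:0049")
rows_small, range_reads_small = range_read(
    small_cf, "small:000:0000", "small:000:0049")

print("point read, shared CF: ", point_reads_shared, "SSTable(s) read")
print("range read, shared CF: ", range_reads_shared, "SSTable(s) read")
print("range read, separate CF:", range_reads_small, "SSTable(s) read")
```

In this model the point read in the shared column family touches
roughly one SSTable (plus any Bloom filter false positives), while the
range read touches all 21.  With the small data set in its own column
family the same range read touches just one.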

Separating them also helps buffer caching: the small column family's
SSTables are more likely to remain in cache.

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu
