cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Cassandra Counters and Replication Factor
Date Wed, 12 Oct 2011 12:48:12 GMT
On Wed, Oct 12, 2011 at 9:37 AM, Amit Chavan <camit90@gmail.com> wrote:
> Hi,
> Looking at this talk
> (http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf)
> by Sylvain Lebresne at DataStax, I had a few questions related to my
> understanding of the Cassandra architecture.
> Assuming that we have a keyspace in Cassandra with:
> 1. Replication Factor (RF) = 1.
> 2. "Counters" as a counter column family having "row-key" as a row key which
> has "cnt" as a counter column.
> 3. We always update Counters["row-key"]["cnt"] with a Consistency level of
> ONE.
> My understanding is that in such a case, the updates/second of that counter
> will be limited by the performance of just one node in the cluster. Adding
> new nodes will not increase the rate of update.

Yes, but that's not limited to counters. In Cassandra, if RF=1, all the
columns for a given row key are on one and only one machine. It follows
that every query (write or read, counter or not) on that row will be
limited by the performance of one node of the cluster. That's why the
least scalable way to design your schema in Cassandra would be to use
1 column family and 1 row key and shove everything into that single key.
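
A toy sketch of the point above, assuming a simplified hash ring with
evenly spaced tokens. This is not Cassandra's actual partitioner, and the
names `token`, `replicas_for` and `RING_SIZE` are hypothetical. It only
shows that a single row key maps to exactly RF nodes, however many nodes
the cluster has, while many distinct keys spread over all of them.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 32  # toy token space (assumption, not Cassandra's real one)


def token(key):
    """Hash a row key to a position on the toy ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def replicas_for(key, nodes, rf):
    """Return the rf nodes holding `key`, walking clockwise on the ring."""
    # Give each node one evenly spaced token.
    ring = sorted((i * RING_SIZE // len(nodes), name)
                  for i, name in enumerate(nodes))
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around.
    start = bisect_left(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1]
            for i in range(min(rf, len(ring)))]


nodes = ["node%d" % i for i in range(6)]
print(replicas_for("row-key", nodes, 1))  # always the same single node
print(replicas_for("row-key", nodes, 3))  # three distinct nodes
```

With RF=1, every read and write for "row-key" lands on the one node
printed first, so adding nodes does not help that key.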

> However if RF was 3 (keeping everything else same), updates/second would
> roughly have been 3 times the current value. Am I correct here?

Probably not 3 times, but it should increase. More precisely, let's first
consider the non-counter case. If you go from RF=1 to RF=3 and only
consider the performance on one unique key, then you should not see a
noticeable improvement: with RF=1, one node was taking all the inserts,
but at RF=3, each of the 3 nodes is also taking all the inserts (*even*
at CL.ONE). So if all nodes are equally fast, RF=1 or RF=3 will give you
the same performance on a single key for non-counter updates.

For counters, it's a little bit different. At RF=3, for each insert, one
node is doing a write *and* a read, while the two other nodes are only
doing a write. So given that the read takes a non-negligible time, you
should see some improvement at RF=3 compared to RF=1, because each node
gets 1/3 of the reads (involved in the counter write) it would get if it
were the only replica. Now if the write time were negligible compared to
the read time, then yes, you would see roughly a 3x increase. But while
writes are still faster than reads in Cassandra, reads are now fairly
fast too (though all this depends on other factors, like how much the
caches help), so it will likely be less than a 3x increase. It should be
noticeable, though.
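
The arithmetic above can be sketched as a back-of-the-envelope model. It
assumes every replica pays one write per increment, exactly one replica
(picked evenly) also pays one read, and a node's throughput is inversely
proportional to its average cost per increment. The function name
`counter_speedup` and the cost values are hypothetical, not measurements.

```python
def counter_speedup(rf, write_cost, read_cost):
    """Estimated single-key counter throughput at RF=rf relative to RF=1.

    Per-node cost per increment is modelled as write_cost + read_cost / rf,
    since each node performs the read for only 1/rf of the increments.
    """
    if rf < 1:
        raise ValueError("rf must be >= 1")
    if write_cost <= 0 or read_cost < 0:
        raise ValueError("write_cost must be > 0 and read_cost >= 0")
    cost_at_rf1 = write_cost + read_cost
    cost_at_rf = write_cost + read_cost / rf
    return cost_at_rf1 / cost_at_rf


# read_cost=0 is the non-counter case: no gain from a higher RF.
# A read as expensive as a write gives 1.5x at RF=3.
# Only when reads dominate does the gain approach 3x.
for write_cost, read_cost in [(1.0, 0.0), (1.0, 1.0), (1.0, 100.0)]:
    speedup = counter_speedup(3, write_cost, read_cost)
    print("write=%s read=%s -> %.2fx at RF=3" % (write_cost, read_cost, speedup))
```

In this model the speedup is (w + r) / (w + r/RF), which stays strictly
below RF as long as the write cost w is not zero.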

> Moreover, any write operation to a column in a key in the above mentioned
> configuration can scale only if RF increases. Is this inference correct?

The discussion applies to a given row. For everything above, it doesn't
matter whether we are talking about updates to a single column inside
that row or updates to multiple columns.

All this being said, the takeaway is probably that Cassandra doesn't
really scale by increasing the replication factor, or when you consider
a single row key in isolation; it scales by adding more nodes and by
considering the overall throughput across all keys. The fact that counter
writes do scale a bit when you increase the replication factor is more of
an artifact of the design.
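
A rough sketch of that takeaway for plain (non-counter) writes, assuming
keys are spread evenly over the ring, every node can absorb the same
number of writes per second, and each client write costs one write on
each of its RF replicas. The function names and the 10,000 writes/s
figure are hypothetical.

```python
def cluster_write_throughput(nodes, rf, per_node_writes_per_sec):
    """Aggregate client writes/s over many evenly spread keys."""
    if not 1 <= rf <= nodes:
        raise ValueError("need 1 <= rf <= nodes")
    # Each client write consumes rf node-level writes.
    return nodes * per_node_writes_per_sec / rf


def single_key_write_throughput(per_node_writes_per_sec):
    """Writes/s on one key: every replica sees every write, so one node's
    capacity is the ceiling whatever the RF or cluster size."""
    return per_node_writes_per_sec


print(cluster_write_throughput(6, 3, 10000))   # 6 nodes
print(cluster_write_throughput(12, 3, 10000))  # doubling the nodes doubles it
print(single_key_write_throughput(10000))      # unchanged by nodes or RF
```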

--
Sylvain

> --
> Regards
> Amit S. Chavan
