From: Sylvain Lebresne
Date: Wed, 12 Oct 2011 14:48:12 +0200
Subject: Re: Cassandra Counters and Replication Factor
To: user@cassandra.apache.org

On Wed, Oct 12, 2011 at 9:37 AM, Amit Chavan wrote:
> Hi,
> Looking
> at this talk
> (http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf)
> by Sylvain Lebresne at DataStax, I had a few questions related to my
> understanding of Cassandra architecture.
> Assuming that we have a keyspace in Cassandra with:
> 1. Replication Factor (RF) = 1.
> 2. "Counters" as a counter column family having "row-key" as a row key which
> has "cnt" as a counter column.
> 3. We always update Counters["row-key"]["cnt"] with a Consistency level of
> ONE.
> My understanding is that in such a case, the updates/second of that counter
> will be limited by the performance of just one node in the cluster. Adding
> new nodes will not increase the rate of update.

Yes, but that's not limited to counters. In Cassandra, if RF=1, all the
columns for a given row key are on one and only one machine. It follows that
every query (write or read, counter or not) on that row will be limited by
the performance of one node of the cluster. That's why the least scalable way
to design your schema in Cassandra would be to model something with 1 column
family and 1 row key, and shove everything into that single key.

> However if RF was 3 (keeping everything else same), updates/second would
> roughly have been 3 times the current value. Am I correct here?

Probably not 3 times, but it should increase.
More precisely, let's first consider the non-counter case. In that case, if
you go from RF=1 to RF=3 and only consider the performance on one unique key,
then you should not see a noticeable improvement, because with RF=1 one node
was taking all the inserts, but at RF=3 all 3 nodes are also taking all the
inserts (*even* at CL.ONE). So if all nodes are equally fast, RF=1 and RF=3
will give you the same performance on a single key for non-counter updates.
For counters, it's a little bit different.
At RF=3, for each insert, one node is doing a write *and* a read, while the
two other nodes are only doing a write. So given that the read takes a
non-negligible amount of time, you should see some improvement at RF=3
compared to RF=1, because each node gets 1/3 of the reads (involved in the
counter write) it would get if it were the only replica. Now if the write
time were negligible compared to the read time, then yes, you would see
roughly a 3x increase. But while writes are still faster than reads in
Cassandra, reads are now fairly fast too (though all this depends on other
factors, like how much the caches help, etc.), so it will likely be less
than a 3x increase. It should be noticeable though.

> Moreover, any write operation to a column in a key in the above mentioned
> configuration can scale only if RF increases. Is this inference correct?

The discussion applies to a given row. Nothing above depends on whether we
are talking about updates to a single column inside our given row or updates
to multiple columns.

All this being said, the takeaway is probably that Cassandra doesn't really
scale much by increasing the replication factor or by considering a single
row key in isolation; it scales by adding more nodes and by considering the
overall throughput across all keys. The fact that counter writes do scale a
bit when you increase the replication factor is more of an artifact of the
design.

--
Sylvain

> --
> Regards
> Amit S. Chavan
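
For illustration only, the counter argument above can be turned into a rough
back-of-the-envelope model. This is a sketch, not a benchmark and not how
Cassandra computes anything. It assumes that all replicas are equally fast,
that the replica doing the read rotates evenly so each node sees 1/RF of the
reads, and that read and write costs simply add up. The names
per_node_cost and speedup, and the cost values, are hypothetical.

```python
def per_node_cost(write_cost, read_cost, rf, counter=True):
    """Average work one replica does per update to a single row key.

    Assumption: every replica applies every write. For counter updates,
    one replica per update also does a read, so on average each replica
    sees 1/rf of the reads. Non-counter updates involve no read.
    """
    reads_per_update = 1.0 / rf if counter else 0.0
    return write_cost + read_cost * reads_per_update


def speedup(write_cost, read_cost, rf, counter=True):
    """Single-key update throughput at the given RF, relative to RF=1."""
    base = per_node_cost(write_cost, read_cost, 1, counter)
    return base / per_node_cost(write_cost, read_cost, rf, counter)


if __name__ == "__main__":
    # Hypothetical relative costs (arbitrary units), not measured values.
    for w, r in [(1.0, 1.0), (1.0, 3.0), (0.0, 1.0)]:
        print(f"write={w} read={r}: counter speedup at RF=3 = "
              f"{speedup(w, r, 3):.2f}x")
    print(f"non-counter speedup at RF=3 = "
          f"{speedup(1.0, 3.0, 3, counter=False):.2f}x")
```

Under these assumptions, equal read and write costs give a 1.5x gain at RF=3,
the gain only approaches 3x as the write cost goes to zero, and non-counter
updates gain nothing, which matches the reasoning above.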