From: Roshni Rajagopal
Reply-To: user@cassandra.apache.org
Subject: Cassandra Counters
Date: Mon, 24 Sep 2012 16:21:55 +0530

Hi,

I'm trying to understand if counters are a good fit for my use case.
I've watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times over now...
and still need help!

Suppose I have a list of items to which I can add or delete a set of items at a time, and I want a count of the items, without changing the database or adding components like ZooKeeper.
I have two options: the first is a counter column family, and the second is a standard one.
1. List_Counter_CF

          TotalItems
ListId    50

2. List_Std_CF

          TimeUUID1  TimeUUID2  TimeUUID3  TimeUUID4  TimeUUID5
ListId    3          70         -20        3          -6

In the second, I can add a new column with every set of items added or deleted. Over time this row may grow wide.
To display the final count, I'd need to read the row, slice through all the columns and add them up.
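
As a minimal sketch of that read path, assuming the row has already been fetched into a plain Python mapping (the function name and the dict are illustrative stand-ins, not a Cassandra client API), the sum over the delta columns could look like this:

```python
from typing import Mapping


def total_items(delta_columns: Mapping[str, int]) -> int:
    """Sum every delta column of one List_Std_CF row to get the current count."""
    return sum(delta_columns.values())


# Hypothetical row for one ListId; the keys are placeholder TimeUUID column names
# and the values are the deltas from the example table above.
row = {
    "TimeUUID1": 3,
    "TimeUUID2": 70,
    "TimeUUID3": -20,
    "TimeUUID4": 3,
    "TimeUUID5": -6,
}

print(total_items(row))  # 50, matching TotalItems in List_Counter_CF
```

The cost of this read grows with the number of columns in the row, which is the concern behind the "wide row" question.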

In both cases the writes should be fast; in fact the standard column family should be faster, as there's no read before write. And for a CL ONE write the latency should be the same.
For reads, the first option is very good: just read one column for a key.

For the second, the read involves reading the row and adding up each column value in application code. I don't think there's a way to do math via CQL yet.
There should be no hot spotting if the key is sharded well. I could even maintain the count derived from List_Std_CF in a separate standard column family holding the final number, but I could do that as a separate process immediately after the write to List_Std_CF completes, so that it's not blocking. I understand Cassandra is faster for writes than reads, but how slow would reading by row key be? Is there any number for how many columns it takes before performance starts deteriorating, or for how much worse the performance would be?
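
A rough sketch of that derived-count idea, using in-memory dicts as stand-ins for the two column families (the name `list_total_cf` and both functions are hypothetical, and no real client calls are shown), might be:

```python
from collections import defaultdict

# Stand-in for List_Std_CF: list_id -> {column name: delta}
list_std_cf = defaultdict(dict)
# Stand-in for the separate standard column family holding the final number
list_total_cf = {}


def write_delta(list_id, column_name, delta):
    """Blocking write path: store one delta column and nothing else."""
    list_std_cf[list_id][column_name] = delta


def refresh_total(list_id):
    """Run after the write completes (for example from a background worker):
    recompute the total from the full row and store it."""
    list_total_cf[list_id] = sum(list_std_cf[list_id].values())


write_delta("list-1", "TimeUUID1", 3)
write_delta("list-1", "TimeUUID2", 70)
refresh_total("list-1")
print(list_total_cf["list-1"])  # 73
```

Because `refresh_total` recomputes from the whole row instead of applying an increment, running it twice gives the same stored value, at the price of re-reading the row each time.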

The advantage I see is that I can use the same consistency rules as for the rest of the column families. If I use QUORUM for reads & writes, then I get strongly consistent values.
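
The reasoning behind that is the usual overlap rule: a read sees the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. A small check of that arithmetic, with illustrative function names and an assumed replication factor of 3:

```python
def quorum(replication_factor: int) -> int:
    """Size of a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3  # assumed replication factor, for illustration only
q = quorum(rf)
print(q, read_overlaps_write(rf, q, q))  # 2 True
print(read_overlaps_write(rf, 1, 1))     # False: CL ONE for both reads and writes
```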
With counters, I see that on timeout exceptions, because the first replica is down or not responding, there's a chance of the values getting messed up, and retrying can mess them up further. It's not idempotent like a standard column family design can be.
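
To make the idempotency difference concrete, here is a toy simulation (plain Python objects, not Cassandra itself; the class names are made up for illustration). It assumes the first attempt was actually applied but the client saw a timeout and retried, reusing the same column name on the retry:

```python
class CounterRow:
    """Models a counter column: every applied write adds to the running total."""

    def __init__(self):
        self.total = 0

    def add(self, delta):
        self.total += delta


class DeltaRow:
    """Models a List_Std_CF row: each write stores a delta under its own column name."""

    def __init__(self):
        self.columns = {}

    def put(self, column_name, delta):
        # Writing the same column name again overwrites the same value.
        self.columns[column_name] = delta

    def total(self):
        return sum(self.columns.values())


counter, deltas = CounterRow(), DeltaRow()

# Attempt 1 is applied but reported as a timeout; attempt 2 is the client's retry.
for _attempt in range(2):
    counter.add(70)
    deltas.put("TimeUUID2", 70)

print(counter.total)   # 140: the increment was counted twice
print(deltas.total())  # 70: the retry rewrote the same column
```

The standard-column design is only idempotent if the retry reuses the column name generated for the first attempt; generating a fresh TimeUUID per retry would double-count as well.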

If it gets messed up, it would need an administrator's help (is there a document on how we could resolve counter values going wrong?)

I believe the rest of the limitations still hold good. Has anything changed in recent versions? In my opinion, they are not as major as the consistency question.
- removing a counter & then modifying its value: behaviour is undetermined
- special process for counter column family sstable loss (need to remove all files)
- no TTL support
- no secondary indexes


In short, I'd recommend counters for analytics, or when dealing with data where the exact numbers are not important, or
when it's OK to take some time to fix a mismatch and the performance requirements are most important.
However, where the numbers should match, it's better to use a standard column family and a manual implementation.

Please share your thoughts on this.

Regards,
roshni