Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of roshni_rajagopal@hotmail.com
 designates 65.55.34.146 as permitted sender)
Message-ID: <COL121-W579388C87029DF6AB7B412FC9E0@phx.gbl>
Content-Type: multipart/alternative;
	boundary="_3c9d89d3-b84a-4783-89c7-837af5617b7f_"
From: Roshni Rajagopal <roshni_rajagopal@hotmail.com>
To: <user@cassandra.apache.org>
Subject: Cassandra Counters
Date: Mon, 24 Sep 2012 16:21:55 +0530
Importance: Normal
MIME-Version: 1.0

--_3c9d89d3-b84a-4783-89c7-837af5617b7f_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable


Hi,
I'm trying to understand whether counters are a good fit for my use case. I've watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times over now... and still need help!
Suppose I have a list of items to which I can add or delete a set of items at a time, and I want a count of the items, without considering changing the database or adding components like ZooKeeper. I have 2 options: the first is a counter column family, and the second is a standard one.


1. List_Counter_CF

              TotalItems
    ListId            50

2. List_Std_CF

              TimeUUID1   TimeUUID2   TimeUUID3   TimeUUID4   TimeUUID5
    ListId            3          70         -20           3          -6


And in the second I can add a new column with every set of items added or deleted. Over time this row may grow wide. To display the final count, I'd need to read the row, slice through all columns, and add them up.
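
As a rough sketch of that read path for option 2: this is plain Python with an in-memory list standing in for the row, not Cassandra client code, and the names (deltas, row, total_items) are made up for illustration. It only shows the summing step, using the example values from the List_Std_CF table.

```python
import uuid

# Hypothetical stand-in for one List_Std_CF row: each entry is
# (column name as a TimeUUID, signed delta for that batch of items).
deltas = [3, 70, -20, 3, -6]  # example values from the List_Std_CF table
row = [(uuid.uuid1(), delta) for delta in deltas]


def total_items(columns):
    """Add up every delta column in the row to get the current item count."""
    return sum(delta for _name, delta in columns)


print(total_items(row))  # 50, the same value as TotalItems in List_Counter_CF
```

The cost of this read grows with the number of columns in the row, which is why the width of the row matters for option 2.
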
In both cases the writes should be fast; in fact, the standard column family should be faster, as there's no read before write. And for a CL ONE write the latency should be the same. For reads, the first option is very good: just read one column for a key.
For the second, the read involves reading the row and adding each column value via application code. I don't think there's a way to do math via CQL yet. There should be no hot spotting if the key is sharded well. I could even maintain the count derived from List_Std_CF in a separate column family, which is a standard column family holding the final number, and I could do that as a separate process immediately after the write to List_Std_CF completes, so that it's not blocking. I understand Cassandra is faster for writes than reads, but how slow would reading by row key be? Is there any number around for how many columns it takes before performance starts deteriorating, or how much worse the performance would be?
The advantage I see is that I can use the same consistency rules as for the rest of the column families. With quorum for reads and writes, you get strongly consistent values. With counters, I see that if a timeout exception occurs because the first replica is down or not responding, there's a chance of the values getting messed up, and retrying can mess them up further. A counter update is not idempotent the way a standard column family design can be.
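
To make the idempotency point concrete, here is a toy in-memory model. It is an assumption-laden sketch, not Cassandra client code: write_column and increment_counter are hypothetical helpers, and the retry is simulated by simply calling each one twice, as a client might after a timeout on a write that had actually been applied.

```python
import uuid


def write_column(row, name, delta):
    """Standard column write: the same name and value just overwrite."""
    row[name] = delta


def increment_counter(counters, key, delta):
    """Counter-style update: every applied call changes the total again."""
    counters[key] = counters.get(key, 0) + delta


# Standard column family: the client picks the TimeUUID once and
# reuses it on retry, so the retried write lands on the same column.
std_row = {}
name = uuid.uuid1()
write_column(std_row, name, 70)
write_column(std_row, name, 70)  # retry after a timeout
std_total = sum(std_row.values())

# Counter: if the first attempt was in fact applied, the retry counts twice.
counters = {}
increment_counter(counters, "ListId", 70)
increment_counter(counters, "ListId", 70)  # retry after a timeout
counter_total = counters["ListId"]

print(std_total, counter_total)  # 70 140
```

The standard design stays correct only if the retry reuses the same column name; generating a fresh TimeUUID on each attempt would double count as well.
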
If it gets messed up, it would need an administrator's help (is there a document on how we could resolve counter values going wrong?).
I believe the rest of the limitations still hold. Has anything changed in recent versions? In my opinion, they are not as major as the consistency question.
- removing a counter and then modifying its value: behaviour is undetermined
- special process for counter column family sstable loss (need to remove all files)
- no TTL support
- no secondary indexes

In short, I would recommend counters for analytics, or when dealing with data where the exact numbers are not important, or when it's OK to take some time to fix a mismatch and the performance requirements are most important. However, where the numbers must match, it's better to use a standard column family and a manual implementation.
Please share your thoughts on this.
Regards,
roshni

--_3c9d89d3-b84a-4783-89c7-837af5617b7f_--