incubator-cassandra-user mailing list archives

From Roshni Rajagopal <roshni_rajago...@hotmail.com>
Subject RE: Cassandra Counters
Date Mon, 24 Sep 2012 17:27:34 GMT

Hi folks,
   I looked at my mail below, and I'm rambling a bit, so I'll try to restate my queries pointwise.

a) What are the performance tradeoffs on reads & writes between creating a standard column
family and manually doing the counts by a lookup on a key, versus using counters?
b) What's the current state of counter limitations in the latest version of Apache Cassandra?
c) With there being a possibility of counter values getting out of sync, would counters not
be recommended where strong consistency is desired? The normal benefits of Cassandra's tunable
consistency would not be applicable, as retries may cause overstating. So the normal use
case is high performance, where consistency is not paramount.
Regards,
roshni


From: roshni_rajagopal@hotmail.com
To: user@cassandra.apache.org
Subject: Cassandra Counters
Date: Mon, 24 Sep 2012 16:21:55 +0530





Hi,
I'm trying to understand if counters are a good fit for my use case. I've watched http://blip.tv/datastax/counters-in-cassandra-5497678
many times over now... and still need help!
Suppose I have a list of items, to which I can add or delete a set of items at a time, and
I want a count of the items, without considering changing the database or additional components
like ZooKeeper. I have 2 options: the first is a counter column family, and the second is a standard
one.

1. List_Counter_CF

            | TotalItems |
   ListId   | 50         |

2. List_Std_CF

            | TimeUUID1 | TimeUUID2 | TimeUUID3 | TimeUUID4 | TimeUUID5 |
   ListId   | 3         | 70        | -20       | 3         | -6        |


In the second, I can add a new column with every set of items added or deleted. Over time
this row may grow wide. To display the final count, I'd need to read the row, slice through
all columns and add them.
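
As a rough illustration of that read path, here is a minimal Python sketch that stands in for the row with a plain dict. No Cassandra client is involved, and `list_std_cf_row` and `total_items` are hypothetical names. The column names use `uuid.uuid1()` as a stand-in for TimeUUIDs, and the values are the deltas from the List_Std_CF example above.

```python
import uuid

def total_items(row):
    """Sum every delta column in one row to get the current item count."""
    return sum(row.values())

# Hypothetical in-memory stand-in for one List_Std_CF row:
# column name = time-based UUID, column value = items added (+) or deleted (-).
deltas = [3, 70, -20, 3, -6]
list_std_cf_row = {uuid.uuid1(): delta for delta in deltas}

print(total_items(list_std_cf_row))  # 50
```

The sum matches the TotalItems value in the counter layout, but the cost of computing it grows with the number of columns in the row.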
In both cases the writes should be fast; in fact the standard column family should be faster, as there's
no read before write. And for a CL ONE write the latency should be the same. For reads, the first
option is very good: just read one column for a key.
For the second, the read involves reading the row and adding each column value via application
code. I don't think there's a way to do math via CQL yet. There should be no hot spotting
if the key is sharded well. I could even maintain the count derived from List_Std_CF in
a separate column family, a standard column family holding the final number, but I could
do that as a separate process immediately after the write to List_Std_CF completes, so that
it's not blocking. I understand Cassandra is faster for writes than reads, but how slow would
reading by row key be? Is there any number for how many columns it takes before performance
starts deteriorating, or how much worse the performance would be?
The advantage I see is that I can use the same consistency rules as for the rest of the column
families. With quorum for reads & writes, you get strongly consistent values. With
counters, I see that on timeout exceptions, because the first replica is down or
not responding, there's a chance of the values getting messed up, and retrying can mess them
up further. It's not idempotent like a standard column family design can be.
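
To make the idempotency point concrete, here is a simplified Python simulation, not Cassandra's actual implementation. It assumes the first attempt was applied even though the client saw a timeout, and that the client reuses the same column name on retry. `write_delta` and `increment_counter` are hypothetical helpers.

```python
import uuid

def write_delta(row, column_name, delta):
    """Standard column write: the same column name overwrites, so a retry is harmless."""
    row[column_name] = delta

def increment_counter(counters, key, delta):
    """Counter-style update: every application of the delta changes the total."""
    counters[key] = counters.get(key, 0) + delta

# Standard column family: the client picks the column name once and reuses it on retry.
row = {}
column_name = uuid.uuid1()
write_delta(row, column_name, 3)
write_delta(row, column_name, 3)  # retry after a timeout
std_total = sum(row.values())

# Counter: the retry is applied on top of the first attempt, which had already landed.
counters = {}
increment_counter(counters, "ListId", 3)
increment_counter(counters, "ListId", 3)  # retry after a timeout
counter_total = counters["ListId"]

print(std_total, counter_total)  # 3 6
```

The retried column write leaves the total at 3, while the retried increment overstates it as 6.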
If it gets messed up, it would need an administrator's help (is there a document on how we
could resolve counter values going wrong?).
I believe the rest of the limitations still hold good. Has anything changed in recent versions?
In my opinion, they are not as major as the consistency question:
- removing a counter & then modifying its value: behaviour is undetermined
- special process for counter column family sstable loss (need to remove all files)
- no TTL support
- no secondary indexes

In short, I can recommend counters for analytics or for data where
the exact numbers are not important, or when it's OK to take some time to fix a mismatch,
and the performance requirements are most important. However, where the numbers must match,
it's better to use a standard column family and a manual implementation.
Please share your thoughts on this.
Regards,
roshni