cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Cassandra and count
Date Sun, 30 Jan 2011 08:29:07 GMT
The 0.7 API (http://wiki.apache.org/cassandra/API) has two functions for counting the columns
in a row: get_count() and multiget_count() (not listed on the wiki yet). Both take a
SlicePredicate, which may have an empty start and end.

The only way to count rows is to use get_range_slices(), which returns the columns you request
for each row in the range. To reduce the bandwidth the query uses, ask it to return a single column.
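
A minimal sketch of that row-counting approach, in Python. It does not talk to Cassandra: ROWS, fetch_page() and count_rows() are hypothetical names, and fetch_page() is an in-memory stand-in for a range query that asks for one column per row. It assumes the range start key is inclusive, so each page after the first begins with the last key already seen, and it uses sorted key order only to keep the stand-in simple.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for a column family: row key -> {column: value}.
ROWS = {"user%03d" % i: {"name": "n%d" % i, "email": "e%d" % i} for i in range(250)}


def fetch_page(start_key, page_size, column_name):
    """Stand-in for one range query (an assumption, not the real client call).

    Returns up to page_size (key, columns) pairs with key >= start_key,
    carrying only the single requested column to keep the response small.
    """
    keys = sorted(ROWS)
    first = bisect_left(keys, start_key)
    page = keys[first:first + page_size]
    return [(k, {column_name: ROWS[k].get(column_name)}) for k in page]


def count_rows(page_size=100, column_name="name"):
    """Count rows by paging through the whole key range."""
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    total = 0
    last_key = None  # None means "start from the beginning"
    while True:
        page = fetch_page(last_key or "", page_size, column_name)
        # Skip the row we already counted at the end of the previous page.
        if last_key is not None and page and page[0][0] == last_key:
            page = page[1:]
        if not page:
            return total
        total += len(page)
        last_key = page[-1][0]


if __name__ == "__main__":
    print(count_rows())
```

Against a real cluster the counting loop would stay the same and only fetch_page() would change to issue the range query. A larger page size means fewer round trips but bigger responses.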

However, the value returned by these functions is not guaranteed to be correct. Cassandra does not
lock its internal structures, so while it is busy processing your request other connections
may be adding columns and rows. By the time the count gets back to you, it may already be
wrong. The same reasoning explains why there are no aggregate functions.

Do you need to count the rows as a one-off, or is it part of your application design?

Hope that helps
Aaron

On 29 Jan 2011, at 05:02, Victor Kabdebon wrote:

> Buddasystem is right.
> A count returns columns to the client, which counts them. My advice: do not count big columns
> / supercolumns. People on the dev team are trying to develop distributed counters, but I don't
> know the state of this research.
> 
> Best regards,
> Victor Kabdebon
> http://www.voxnucleus.fr
> 
> 2011/1/28 buddhasystem <potekhin@bnl.gov>
> 
> As far as I know, there are no aggregate operations built into Cassandra,
> which means you'll have to retrieve all of the data to count it in the
> client. I had a thread on this topic 2 weeks ago. It's pretty bad.
> 
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-count-tp5969159p5970315.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
> 

