incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Counter consistency - are counters idempotent?
Date Mon, 25 Jul 2011 18:24:43 GMT
On Mon, Jul 25, 2011 at 7:35 PM, Aaron Turner <synfinatic@gmail.com> wrote:
> On Sun, Jul 24, 2011 at 3:36 PM, aaron morton <aaron@thelastpickle.com> wrote:
>> What's your use case? There are people out there having good times with counters, see
>>
>> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
>> http://www.scribd.com/doc/59830692/Cassandra-at-Twitter
>
> It's actually pretty similar to Twitter's click counting, but
> apparently we have different requirements for accuracy.  It's possible
> Rainbird does something on the front end to solve for this issue- I'm
> honestly not sure since they haven't released the code yet.
>
> Anyways, when you're building network aggregate graphs and fail to add
> the +100G of traffic from one switch to your site or metro aggregate,
> people around here notice.  And people quickly start distrusting
> graphs which don't look "real" and either ignore them completely or
> complain.
>
> Obviously, one should manage their Cassandra cluster to limit the
> occurrence of Timeouts, but frankly I don't want to be paged at 2am to
> "fix" these kind of problems.  If I knew "timeout" meant "failed to
> increment counter", I could spool my changes on the client and try
> again later, but that's not what timeout means.  Without any means to
> recover I've actually lost a lot of reliability that I currently have
> with my single PostgreSQL server backed data store.

Just to make it very clear: *nobody* is arguing this is not a limitation.

The thing is, some people find counters useful even while perfectly aware
of that limitation and seem to be very productive with them, so we have
added them. Truth is, if you can live with the limitations and keep
timeouts to a bare minimum (hopefully 0), then you won't find many
systems that are able to scale counting both in terms of number of
counters and number of ops/s on each counter, and across
datacenters, the way Cassandra counters do. And let's recall that
while you don't know what happened on a timeout, you at least know
when timeouts happen, so you can compute the error margin.
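
To illustrate the error margin point, here is a minimal client-side sketch. It is not part of Cassandra or of any client library: `CounterLedger`, `CounterTimeout` and the `send` callable are hypothetical names made up for this example, and it assumes a timeout is the only ambiguous outcome and that each increment is sent exactly once (no retries, since a timed-out increment may or may not have been applied).

```python
from dataclasses import dataclass


class CounterTimeout(Exception):
    """Hypothetical stand-in for the timeout error your client library raises."""


@dataclass
class CounterLedger:
    """Client-side record of increments with known and unknown outcomes."""
    confirmed: int = 0      # sum of deltas the cluster acknowledged
    uncertain_pos: int = 0  # sum of positive deltas that timed out
    uncertain_neg: int = 0  # sum of negative deltas that timed out
    timeouts: int = 0       # number of increments with unknown outcome

    def apply(self, delta, send):
        """Call send(delta) once and record the outcome; never retries."""
        try:
            send(delta)
        except CounterTimeout:
            # Unknown outcome: the increment may or may not have been applied.
            self.timeouts += 1
            if delta >= 0:
                self.uncertain_pos += delta
            else:
                self.uncertain_neg += delta
            return False
        self.confirmed += delta
        return True

    def bounds(self):
        """(low, high) range containing this client's true contribution."""
        return (self.confirmed + self.uncertain_neg,
                self.confirmed + self.uncertain_pos)
```

With a ledger like this per counter, the value read back from the cluster can be reported together with the (low, high) range, so a graph can at least flag the points where a timeout made the total uncertain.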

Again, this does not mean we don't want to fix the limitations, nor
that we want you to wake up at 2am, and there is actually a ticket
open for that:
https://issues.apache.org/jira/browse/CASSANDRA-2495
The problem is, so far, we haven't found any satisfying solution to
that problem. If someone has a solution, please, please, share!

But yes, counters in their current state don't fit everyone's needs,
and we certainly don't want to hide that.

--
Sylvain
