From: Sylvain Lebresne
Date: Mon, 25 Jul 2011 20:24:43 +0200
References: <4E29A50C.9080903@ihep.ac.cn>
Subject: Re: Counter consistency - are counters idempotent?
To: user@cassandra.apache.org

On Mon, Jul 25, 2011 at 7:35 PM, Aaron Turner wrote:
> On Sun, Jul 24, 2011 at 3:36 PM, aaron morton wrote:
>> What's your use case ? There are people out there having good times with counters, see
>>
>> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
>> http://www.scribd.com/doc/59830692/Cassandra-at-Twitter
>
> It's actually pretty similar to Twitter's click counting, but
> apparently we have different requirements for accuracy.  It's possible
> Rainbird does something on the front end to solve for this issue- I'm
> honestly not sure since they haven't released the code yet.
>
> Anyways, when you're building network aggregate graphs and fail to add
> the +100G of traffic from one switch to your site or metro aggregate,
> people around here notice.  And people quickly start distrusting
> graphs which don't look "real" and either ignore them completely or
> complain.
>
> Obviously, one should manage their Cassandra cluster to limit the
> occurrence of Timeouts, but frankly I don't want to be paged at 2am to
> "fix" these kind of problems.  If I knew "timeout" meant "failed to
> increment counter", I could spool my changes on the client and try
> again later, but that's not what timeout means.  Without any means to
> recover I've actually lost a lot of reliability that I currently have
> with my single PostgreSQL server backed data store.

Just to make it very clear: *nobody* is arguing this is not a
limitation. The thing is, some find counters useful even while perfectly
aware of that limitation and seem to be very productive with them, so
we have added them.

Truth is, if you can live with the limitations and keep timeouts to a
bare minimum (hopefully 0), then you won't find many systems that are
able to scale counting both in terms of number of counters and number
of ops/s on each counter, and across datacenters, like Cassandra
counters do. And let's recall that while you don't know what happened
on a timeout, you at least know when those happen, so you can compute
the error margin.

Again, this does not mean we don't want to fix the limitations, nor
that we want you to wake up at 2am, and there is actually a ticket open
for that:
https://issues.apache.org/jira/browse/CASSANDRA-2495
The problem is, so far, we haven't found any satisfying solution to
that problem. If someone has a solution, please, please, share!

But yes, counters in their current state don't fit everyone's needs and
we certainly don't want to hide it.

--
Sylvain
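
The "compute the error margin" remark above can be made concrete with a little client-side bookkeeping. What follows is only a sketch, not anything Cassandra or its client libraries provide: `IncrementTimeout`, `CounterLedger`, and `flaky_send` are hypothetical names, the transport is simulated, and it assumes non-negative increments issued by a single writer. Each increment is filed as either acknowledged or uncertain (timed out), which bounds the value the stored counter can hold.

```python
from dataclasses import dataclass


class IncrementTimeout(Exception):
    """Hypothetical stand-in for the timeout error a client library raises."""


@dataclass
class CounterLedger:
    """Client-side bookkeeping for one counter (illustrative only)."""
    acked: int = 0      # sum of increments the cluster acknowledged
    uncertain: int = 0  # sum of increments that timed out (applied or not: unknown)

    def record(self, delta, send):
        """Send one non-negative increment via `send` and file the outcome."""
        try:
            send(delta)
        except IncrementTimeout:
            # Outcome unknown: a blind retry could double count, so we only
            # remember how much is at stake.
            self.uncertain += delta
        else:
            self.acked += delta

    def bounds(self):
        """Return (lowest, highest) value this writer's contribution can have."""
        return self.acked, self.acked + self.uncertain


def flaky_send(delta):
    """Hypothetical transport: only the 100-unit increment times out."""
    if delta == 100:
        raise IncrementTimeout()


ledger = CounterLedger()
for d in (5, 100, 7):
    ledger.record(d, flaky_send)

low, high = ledger.bounds()
print(f"counter is between {low} and {high}")
```

The gap between the two bounds is the error margin: it grows only when a timeout occurs, so a cluster that rarely times out keeps it near zero. It does not recover the lost certainty, which is the open problem tracked in the ticket.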