From: Jonathan Ellis
Date: Thu, 2 Sep 2010 13:55:40 -0700
Subject: Re: [DISCUSSION] High-volume counters in Cassandra
To: dev@cassandra.apache.org

I still have not seen any response to my other misgivings about 1072 that I have raised on the ticket. Specifically, the existing patch is based around a Clock structure that, since 580 is a dead end, is no longer necessary.

I'm also uneasy about adding 200k of code that meshes as poorly with the rest of Cassandra as this does. The more it can be split off into separate code paths, the better. Adding its own thrift method is a good start, but it should go deeper than that.

On Thu, Sep 2, 2010 at 12:01 PM, Johan Oskarsson wrote:
> In the last few months Digg and Twitter have been using a counter patch that lets Cassandra act as a high-volume realtime counting system. Atomic counters enable new applications that were previously difficult to implement at scale, including realtime analytics and large-scale systems monitoring.
>
> Discussion
> There are currently two different suggestions for how to implement counters in Cassandra. The discussion has so far been limited to those following the jiras (CASSANDRA-1072 and CASSANDRA-1421) closely and we don't seem to be nearing a decision.
> I want to open it up to the Cassandra community at large to get additional feedback.
>
> Below are very basic and brief introductions to the alternatives. Please help us move forward by reading through the docs and jiras and reply to this thread with your thoughts. Would one or the other, both or neither be suitable for inclusion in Cassandra? Is there a third option? What can we do to reach a decision?
>
> We believe that both options can coexist; their strengths and weaknesses make them suitable for different use cases.
>
>
> CASSANDRA-1072 + CASSANDRA-1397
> https://issues.apache.org/jira/browse/CASSANDRA-1072 (see design doc)
> https://issues.apache.org/jira/browse/CASSANDRA-1397
>
> How does it work?
> A node is picked as the primary replica for each write. The context byte array for a column contains (primary replica ip, value). Any previous data with the same ip is reconciled with the new increment and put as the column value.
>
> Concerns raised
> * an increment in flight will be lost if the wrong node goes down
> * if an increment operation times out it's impossible to know if it has been executed or not
>
> The most recent jira comment proposes a new API method for increments that reflects the different consistency level guarantees.
>
>
> CASSANDRA-1421
> https://issues.apache.org/jira/browse/CASSANDRA-1421
>
> How does it work?
> Each increment for a counter is stored as a (UUID, value) tuple. The read operations will read all these increment tuples for a counter, reconcile and return. On a regular interval the values are all read and reconciled into one value to reduce the amount of data required for each read operation.
>
> Concerns raised
> * poor read performance, especially for time-series data
> * post aggregation reconciliation issues
>
>
> Again, we feel that both options can co-exist, especially if the 1072 patch uses a new API method that reflects its different consistency level guarantees.
> Our proposal is to accept 1072 into trunk with the new API method, and when an implementation of 1421 is completed it can be accepted alongside.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
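
For readers comparing the two "How does it work?" summaries quoted above, here is a rough in-memory sketch of each bookkeeping scheme. It is only an illustration under stated assumptions, not code from either patch: the class and method names are hypothetical, a Python dict stands in for the context byte array, reconciliation is assumed to be simple addition, and the retry behaviour shown for the (UUID, value) scheme is an assumption rather than something stated in the thread.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ReplicaContextCounter:
    """1072-style sketch: one running subtotal per primary replica IP.

    Hypothetical stand-in for the (primary replica ip, value) context.
    """
    context: dict = field(default_factory=dict)

    def increment(self, primary_ip, delta):
        # Previous data recorded for the same IP is reconciled (here: summed)
        # with the new increment and stored back as the column value.
        self.context[primary_ip] = self.context.get(primary_ip, 0) + delta

    def value(self):
        # The counter total is the sum of all per-replica subtotals.
        return sum(self.context.values())


@dataclass
class TupleCounter:
    """1421-style sketch: every increment kept as its own (UUID, value) tuple."""
    increments: dict = field(default_factory=dict)

    def increment(self, delta, op_id=None):
        # Assumption: re-sending an increment with the same UUID overwrites
        # the earlier tuple instead of being counted twice.
        if op_id is None:
            op_id = uuid.uuid4()
        self.increments[op_id] = delta
        return op_id

    def value(self):
        # A read walks every stored tuple and reconciles them into one total.
        return sum(self.increments.values())

    def compact(self):
        # Periodic reconciliation: collapse all tuples into a single one so
        # later reads touch less data.
        total = self.value()
        self.increments = {uuid.uuid4(): total}


if __name__ == "__main__":
    a = ReplicaContextCounter()
    a.increment("10.0.0.1", 2)
    a.increment("10.0.0.2", 3)
    a.increment("10.0.0.1", 5)
    print(a.context, a.value())

    t = TupleCounter()
    first = t.increment(4)
    t.increment(4, op_id=first)  # simulated retry of the same operation
    t.increment(1)
    print(len(t.increments), t.value())
    t.compact()
    print(len(t.increments), t.value())
```

The sketch leaves out everything that makes the real problem hard (replication, timeouts, node failure), so it says nothing about the concerns raised for either ticket. It only shows why the first scheme keeps reads cheap while the second stores more data per counter until it is reconciled.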