Date: Mon, 27 Sep 2010 11:32:51 +0800
Subject: Re: [DISCUSSION] High-volume counters in Cassandra
From: Zhu Han
To: dev@cassandra.apache.org

I have proposed a new way to solve the counter problem in CASSANDRA-1502 [1]. Since not everyone follows the JIRA updates closely, I am pasting it here so that more people can comment on it and we can see whether it is feasible.

"It seems we have not found a solution acceptable to everybody, so I'd like to propose a new approach. Let's see whether anybody can shed some light on it and make it a reality.

1) We add a basic data structure, called a counter, which is a special type of super column.

2) The name of each column in the counter super column is the host name of a Cassandra node, and its value is the result calculated by that node.

3) WRITE PATH: When a node receives an add/dec request for a counter, it deserializes its local counter super column and atomically updates the column named after itself. It then propagates the updated column value to the other replicas, just as a mutation of a normal column is propagated. Different consistency levels can be supported as before.
4) READ PATH: Depending on the consistency level, contact several replicas, read back the counter super column as a whole, and compute the latest counter value by summing all columns in the counter. Read-repair logic can work as before.

IMHO, the biggest advantage of this approach is that it reuses as many of the mechanisms already in the code as possible, so it should not be very disruptive. Adding a new Thrift API is unavoidable, though."

NB: If it's feasible, I might not be the right person to work on it, as I have not touched the internals of Cassandra for more than a year. I just want to contribute something to help us reach consensus.

[1] https://issues.apache.org/jira/browse/CASSANDRA-1502?focusedCommentId=12915103&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12915103

best regards,
hanzhu

On Sun, Sep 26, 2010 at 9:49 PM, Jonathan Ellis wrote:
> you have misunderstood. if we continue the 1072 approach of writing
> counter data to the clock field, this is necessarily incompatible with
> the right way of writing counter data to the value field. it's no
> longer simply a matter of reversing 1070.
>
> On Sat, Sep 25, 2010 at 11:50 PM, Zhu Han wrote:
> > Jonathan,
> >
> > This is a personal email.
> >
> > On Sun, Sep 26, 2010 at 1:27 PM, Jonathan Ellis wrote:
> >>
> >> On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han wrote:
> >> > Can we just let the patch be committed but mark it as "alpha" or
> >> > "experimental"?
> >>
> >> I explained exactly why that is not a good approach here:
> >> http://www.mail-archive.com/dev@cassandra.apache.org/msg00917.html
> >>
> > Yes, I see. But the clock structure has been in trunk since Cassandra-1070.
> > We still need to clean it out, whatever happens, and we need somebody to
> > volunteer for this work. Considering the complexity of Cassandra-1070, a
> > programmer with in-depth knowledge of this patch is preferable, and it
> > will take some time to do it.
> >
> > Fortunately, Johan Oskarsson has promised to take it on in his comment on
> > Cassandra-1072 [1]:
> >
> > "The clock changes would get into trunk quicker if we didn't, avoiding the
> > extra overhead of a big patch during reviews, merge with trunk, code updates
> > and publication of a new patch.
> > If the concern is that we won't attend to the clocks once this patch is in I
> > can promise that we'll look at it straight away. "
> >
> > And if Twitter/Digg/SimpleGeo fork their own trees of Cassandra, this will
> > give big marketing opportunities to supporters of other NoSQL systems. As
> > you know, the competition is quite fierce right now.
> >
> > So, instead of staying stuck in this awkward situation, why not switch to
> > another strategy:
> >
> >> "Fork another experimental tree from 0.7 beta 1 and accept
> >> Cassandra-1072. At the same time, start the clean-up work on this tree.
> >> Once it's finalized, merge it back into 0.7, whether that is 0.7.1 or 0.7.2.
> >>
> >> That way, the folks from Twitter do not need to maintain a huge
> >> out-of-tree patch, while the quality impact of Cassandra-1072 is still
> >> limited.
> >
> > I do know the pain of maintaining a large patch outside the official tree.
> > Once it gets in, everybody will feel much better.
> >
> > If you give this patch a chance, Johan and others will be highly
> > motivated, because the whole community is working together. It's a
> > compromise, but it's worth it.
> >
> > [1]
> > https://issues.apache.org/jira/browse/CASSANDRA-1072?focusedCommentId=12909234&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12909234
> >
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of Riptano, the source for professional Cassandra support
> >> http://riptano.com
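For anyone weighing the counter proposal at the top of this message, here is a minimal Python sketch of its data model. It is not Cassandra code and not an existing API: `CounterSuperColumn`, `local_update`, `apply_replicated`, `merge` and `total` are hypothetical names. It assumes each column maps a node name to a (timestamp, running total) pair, that each node writes only its own column with strictly increasing timestamps, and that replicas reconcile each column by ordinary last-write-wins on the timestamp. The lock stands in for the "atomically" in step 3.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical model of the proposed counter super column (not Cassandra's API).
# Column name = node name; column value = (timestamp, running total from that node).


@dataclass
class CounterSuperColumn:
    columns: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def local_update(self, node: str, delta: int, timestamp: int) -> Tuple[str, int, int]:
        """WRITE PATH: the receiving node updates only the column named after itself.

        Assumes each node issues strictly increasing timestamps for its own column.
        """
        with self._lock:  # read-modify-write of the local column, done atomically
            _, current = self.columns.get(node, (0, 0))
            new_value = current + delta
            self.columns[node] = (timestamp, new_value)
        # The updated column value (not the delta) is what gets sent to other replicas.
        return node, timestamp, new_value

    def apply_replicated(self, node: str, timestamp: int, value: int) -> None:
        """Replica side: ordinary last-write-wins reconciliation of one column."""
        with self._lock:
            existing = self.columns.get(node)
            if existing is None or timestamp > existing[0]:
                self.columns[node] = (timestamp, value)

    def merge(self, other: "CounterSuperColumn") -> "CounterSuperColumn":
        """Read repair: reconcile two replicas' copies column by column."""
        with self._lock:
            mine = dict(self.columns)
        with other._lock:
            theirs = dict(other.columns)
        merged = CounterSuperColumn(mine)
        for node, (ts, value) in theirs.items():
            merged.apply_replicated(node, ts, value)
        return merged

    def total(self) -> int:
        """READ PATH: the counter value is the sum of all per-node columns."""
        with self._lock:
            return sum(value for _, value in self.columns.values())


# Usage: two replicas, on hypothetical nodes "A" and "B".
a = CounterSuperColumn()
b = CounterSuperColumn()

b.apply_replicated(*a.local_update("A", 5, timestamp=1))  # A: +5, propagated to B
a.local_update("A", 2, timestamp=2)                        # A: +2, propagation to B delayed
a.apply_replicated(*b.local_update("B", -3, timestamp=1))  # B: -3, propagated to A

print(a.total())           # 4  (7 + -3)
print(b.total())           # 2  (5 + -3); B has not yet seen A's second update
print(a.merge(b).total())  # 4  after read repair
```

In this sketch, because replicas ship a node's running total rather than the delta, a duplicated or replayed propagation is harmless (same timestamp, so it is ignored), but a replica that missed an update returns a stale sum until read repair merges the copies, as the `b.total()` line shows.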