From: "Sylvain Lebresne (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Date: Tue, 28 Sep 2010 05:30:38 -0400 (EDT)
Subject: [jira] Commented: (CASSANDRA-1546) (Yet another) approach to counting
Message-ID: <2897720.439391285666238385.JavaMail.jira@thor>
In-Reply-To: <14882558.418751285581874401.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/CASSANDRA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915683#action_12915683 ]

Sylvain Lebresne commented on CASSANDRA-1546:
---------------------------------------------

(I updated the patch because I found a way to simplify a bit of the code: I removed the special deserialization function in ColumnSerializer, in case anyone had already read the code.)

Allow me to explain a little further how this works before answering the preceding questions (sorry if this is a tad long).

Let's consider a counter c whose replicas are nodes A, B and C. Let's say that we have updated the counter 3 times, with values 1, 2 and 3 respectively, and with nodes A, B and C as the respective 'update leaders'. The row for c (I don't consider marker columns here) will be the following *on node A*:
{noformat}
c : {
  A : 1, (LocalCounterColumn)
  B : 2, (CounterColumn)
  C : 3, (CounterColumn)
}
{noformat}
and *on node B*, the row for c will be:
{noformat}
c : {
  A : 1, (CounterColumn)
  B : 2, (LocalCounterColumn)
  C : 3, (CounterColumn)
}
{noformat}
In parentheses is the actual class implementing each column. Note that on each node, the column named after the node's own id is special. The difference is that when a LocalCounterColumn c1 conflicts with another LocalCounterColumn c2, we resolve this by returning a new LocalCounterColumn c3 whose value is c1.value() + c2.value() (and whose timestamp is the max of the c1 and c2 timestamps). A CounterColumn, in contrast, has exactly the same resolution as a standard column (that is, if two CounterColumns conflict, the result is the one with the higher timestamp).

So, to answer the question about serializing the writes: there is no need (and I believe that's a good thing performance-wise). When a leader receives an update, it doesn't read-then-write. It writes-then-reads. As part of the read, the newly inserted LocalCounterColumn will be 'merged' with the other, already present LocalCounterColumns and yield the actual value of the column, without the risk of losing an increment.

But now we see that the data is not exactly mirrored across the nodes.
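To make the two resolution rules and the write-then-read path above concrete, here is a minimal sketch in Python. It is not the code from the patch: the class names mirror the ones in this comment, but `reconcile`, `lead_update`, `read_counter`, the list of unmerged fragments, and the idea that the counter total is the sum over all columns are assumptions made for illustration. Conflicts between a LocalCounterColumn and a CounterColumn are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterColumn:
    name: str       # id of the node that led the updates (assumed naming)
    value: int
    timestamp: int


@dataclass(frozen=True)
class LocalCounterColumn(CounterColumn):
    """Column holding the updates led by the node that stores it."""


def reconcile(c1, c2):
    """Resolve a conflict between two columns with the same name."""
    if isinstance(c1, LocalCounterColumn) and isinstance(c2, LocalCounterColumn):
        # Local vs local: values add up, timestamp is the max of the two.
        return LocalCounterColumn(c1.name, c1.value + c2.value,
                                  max(c1.timestamp, c2.timestamp))
    if type(c1) is CounterColumn and type(c2) is CounterColumn:
        # Same rule as a standard column: the higher timestamp wins.
        # Keeping c1 on a tie is an assumption of this sketch.
        return c1 if c1.timestamp >= c2.timestamp else c2
    raise TypeError("mixed local/non-local conflicts are not modeled here")


def lead_update(fragments, node_id, delta, timestamp):
    """Write path on the leader: append only, no read beforehand."""
    fragments.append(LocalCounterColumn(node_id, delta, timestamp))


def read_counter(fragments):
    """Read path: merge fragments by name, then sum the merged values."""
    merged = {}
    for col in fragments:
        prev = merged.get(col.name)
        merged[col.name] = col if prev is None else reconcile(prev, col)
    return sum(col.value for col in merged.values())


# Node A's view of counter c, followed by two more updates led by A.
node_a = [LocalCounterColumn("A", 1, 1),
          CounterColumn("B", 2, 2),
          CounterColumn("C", 3, 3)]
lead_update(node_a, "A", 5, 4)
lead_update(node_a, "A", 7, 5)
total = read_counter(node_a)
```

Because the write only appends a new LocalCounterColumn and the sum happens at merge time, two concurrent updates led by the same node both survive, which is why the writes need no serialization in this model.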
In particular, there is one thing that we must absolutely avoid: we should never have a repair operation (read repair or AE repair) that inserts on node A a LocalCounterColumn whose name is A (otherwise, it would get added to the actual value and screw up the total counter value). Another way to say this is that the value of column A on node A is always equal to the sum of all the updates A has led, and we are sure of that. So we need not repair the value of this column on node A, and we *must never do it*. Moreover, when A sends its part of the value to B, it sends a LocalCounterColumn, but when received by B (or any other host, for that matter), it should become a CounterColumn.

The implementation enforces this in the ColumnSerializer, during deserialization. When a node deserializes a (serialized) LocalCounterColumn, it will always deserialize it as a CounterColumn unless it is its own local counter column. So when A sends its LocalCounterColumn to B (for a read repair, say), B will deserialize it as a CounterColumn. If B now sends this back to A, A will receive a CounterColumn for its local counter column and will discard it.

So, because we ensure that a host other than A will never 'see' a LocalCounterColumn whose name is A (but it will see such a CounterColumn), we know that we will never wrongfully repair the local counter of A.

During AE repair, because we use streaming, we could end up with an sstable on B having a LocalCounterColumn of name A. However, as soon as this column is deserialized, it is deserialized as a CounterColumn. So here again, we will not wrongfully repair A. Unless ... we stream back the exact same sstable to A. But I think this can never happen (could anybody more familiar with AE repair and streaming confirm?).
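The deserialization rule and the discard on repair can be sketched as follows. This is only an illustration in Python, not the ColumnSerializer code from the patch: the `Column` record with a `local` flag, the tuple wire format, and the `serialize`, `deserialize` and `apply_repair` helpers are all hypothetical. The stream-back case raised above is not modeled and simply raises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str       # id of the node that led the updates (assumed naming)
    value: int
    timestamp: int
    local: bool     # True models LocalCounterColumn, False models CounterColumn


def serialize(col):
    """Hypothetical wire format: a plain tuple that keeps the local flag."""
    return (col.name, col.value, col.timestamp, col.local)


def deserialize(data, my_id):
    """A serialized local column stays local only on the node it is named after."""
    name, value, timestamp, local = data
    return Column(name, value, timestamp, local and name == my_id)


def apply_repair(row, incoming, my_id):
    """Apply a repaired column to row (dict name -> Column); True if applied."""
    if incoming.name == my_id:
        if not incoming.local:
            # Never repair our own local counter: discard the column.
            return False
        raise NotImplementedError("stream-back of a local column is not modeled")
    current = row.get(incoming.name)
    if current is None or incoming.timestamp > current.timestamp:
        row[incoming.name] = incoming
        return True
    return False


# A sends its local column to B (say, for a read repair); B sends it back.
row_a = {"A": Column("A", 1, 1, True)}
row_b = {}
seen_by_b = deserialize(serialize(row_a["A"]), my_id="B")
applied_on_b = apply_repair(row_b, seen_by_b, my_id="B")
echoed_to_a = deserialize(serialize(seen_by_b), my_id="A")
applied_on_a = apply_repair(row_a, echoed_to_a, my_id="A")
```

In this run B stores the column as a non-local one, and the copy echoed back to A is discarded, so A's local value is never summed with itself.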
> (Yet another) approach to counting
> ----------------------------------
>
>                 Key: CASSANDRA-1546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1546
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 0.7.0
>
>         Attachments: 0001-Remove-IClock-from-internals.patch, 0002-Counters.patch, 0003-Generated-thrift-files-changes.patch
>
>
> This could be described as a mix between CASSANDRA-1072 without clocks and CASSANDRA-1421.
> More details in the comment below.