From: AJ
Date: Fri, 24 Jun 2011 22:27:34 -0600
To: user@cassandra.apache.org
Subject: Re: Cassandra ACID

Ok, here it is reworked; consider it a summary of the thread.
If I left out an important point that you think is 100% correct, even if you already mentioned it, then make some noise about it and provide some evidence so it's captured sufficiently.  And, if you're in a debate, please try to get to a resolution; all will appreciate it.

It will be evident below that Consistency is not the only thing that is "tunable", at least indirectly.  Unfortunately, you still can't tunafish.  Ar ar ar.

Atomicity
All individual writes are atomic at the row level.  So, a batch mutate for one specific key will apply updates to all the columns for that one specific row atomically.  If part of the single-key batch update fails, then all of the updates will be reverted since they all pertained to one key/row.  Notice, I said 'reverted' not 'rolled back'.  Note: atomicity and isolation are related to the topic of transactions but one does not imply the other.  Even though row updates are atomic, they are not isolated from other users' updates or reads.   
Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
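
To make the all-or-nothing behavior above concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Cassandra's implementation or any client API: `ToyRowStore` and its `batch_mutate` method are hypothetical names invented for this illustration, and the "failure" is simulated by an invalid column value.

```python
class ToyRowStore:
    """Hypothetical in-memory stand-in for a column family: {row_key: {column: value}}."""

    def __init__(self):
        self.rows = {}

    def batch_mutate(self, key, updates):
        """Apply every column update for one key, or none of them."""
        # Validate the whole batch before touching the stored row.
        for column, value in updates.items():
            if not isinstance(column, str) or value is None:
                raise ValueError(f"bad mutation for column {column!r}")
        # Build the new row off to the side, then swap it in as one step.
        new_row = dict(self.rows.get(key, {}))
        new_row.update(updates)
        self.rows[key] = new_row


store = ToyRowStore()
store.batch_mutate("user:1", {"first": "Ada", "last": "Lovelace"})

try:
    # The second column is invalid, so the first must not be applied either.
    store.batch_mutate("user:1", {"first": "Grace", "last": None})
except ValueError:
    pass

print(store.rows["user:1"])  # {'first': 'Ada', 'last': 'Lovelace'}
```

The failed batch leaves the row exactly as it was. Nothing here says anything about isolation; that is a separate property.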

Consistency
Cassandra does not provide the same scope of Consistency as defined in the ACID standard.  Consistency in C* does not include referential integrity since C* is not a relational database.  Any referential integrity required would have to be handled by the client.  Also, even though the official docs say that QUORUM writes/reads are the minimal consistency_level setting to guarantee full consistency, this assumes that the write preceding the read does not fail (see comments below).  Therefore, an ALL write would be necessary prior to a QUORUM read of the same data.  For a multi-dc scenario use an ALL write followed by an EACH_QUORUM read.
Refs: http://wiki.apache.org/cassandra/ArchitectureOverview
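
The QUORUM claim rests on the usual replica-overlap arithmetic: a read is guaranteed to touch at least one replica that took the write when W + R > N, where N is the replication factor. The sketch below just does that arithmetic. The function names are made up for illustration, only ONE, QUORUM and ALL are modeled, and it covers successful writes only, which is exactly the caveat raised above.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Replicas that must respond for a given consistency level (single DC)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unmodeled consistency level: {level}")


def read_overlaps_write(write_level, read_level, replication_factor):
    """True when W + R > N, i.e. the read set must intersect the write set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


for write_level, read_level in [("QUORUM", "QUORUM"), ("ALL", "QUORUM"), ("ONE", "ONE")]:
    print(write_level, read_level, read_overlaps_write(write_level, read_level, 3))
```

With a replication factor of 3, QUORUM/QUORUM gives 2 + 2 > 3 and ALL/QUORUM gives 3 + 2 > 3, while ONE/ONE gives 1 + 1, which is not greater than 3.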

Isolation
NOTHING is isolated, because there is no transaction support in the first place.  This means that two or more clients can update the same row at the same time.  Their updates of the same or different columns may be interleaved and leave the row in a state that may not make sense depending on your application.  Note: this doesn't mean to say that two updates of the same column will be corrupted, obviously; columns are the smallest atomic unit ('atomic' in the more general thread-safe context).
Refs: None that directly address this explicitly and clearly and in one place.
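
A small sketch of the interleaving described above, again as a toy model and not Cassandra's internals. Two hypothetical clients, A and B, each want to set both columns of the same row; the `schedule` list is an assumed ordering in which their individual column writes happen to land.

```python
def apply_interleaved(row, schedule):
    """Apply column writes one at a time, in the order they arrive."""
    for _client, column, value in schedule:
        # Each single-column write is intact; only the combination is at risk.
        row[column] = value
    return row


# Client A wants Denver/CO, client B wants Austin/TX.
schedule = [
    ("A", "city", "Denver"),
    ("B", "city", "Austin"),
    ("B", "state", "TX"),
    ("A", "state", "CO"),
]

row = apply_interleaved({}, schedule)
print(row)  # {'city': 'Austin', 'state': 'CO'}
```

Neither column is corrupted, but the row ends up as Austin, CO, which neither client wrote. If that combination matters to your application, the client has to guard against it.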

Durability
Updates are made highly durable, at a level comparable to a DBMS, by the use of the commit log.  However, this requires "commitlog_sync: batch" in cassandra.yaml.  For "some" performance improvement with "some" cost in durability you can specify "commitlog_sync: periodic".  See discussion below for more details.
Refs: Plenty + this thread.
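
The difference between the two commitlog_sync modes comes down to when a write is acknowledged relative to the sync to disk. The sketch below is a toy model of that ordering only; `ToyCommitLog` and `lost_after_crash` are hypothetical names, and the explicit `sync()` call stands in for the periodic timer firing.

```python
class ToyCommitLog:
    """Hypothetical model of ack-vs-sync ordering; not Cassandra's commit log."""

    def __init__(self, mode):
        if mode not in ("batch", "periodic"):
            raise ValueError(mode)
        self.mode = mode
        self.pending = []  # written but not yet synced to disk
        self.synced = []   # would survive a crash
        self.acked = []    # acknowledged to the client

    def write(self, mutation):
        self.pending.append(mutation)
        if self.mode == "batch":
            # Batch mode: sync first, acknowledge afterwards.
            self.sync()
        self.acked.append(mutation)

    def sync(self):
        self.synced.extend(self.pending)
        self.pending.clear()

    def crash(self):
        """Acknowledged mutations that a crash right now would lose."""
        return [m for m in self.acked if m not in self.synced]


def lost_after_crash(mode):
    log = ToyCommitLog(mode)
    log.write("m1")
    log.sync()  # stands in for the periodic sync timer
    log.write("m2")
    log.write("m3")
    return log.crash()


print(lost_after_crash("batch"))     # []
print(lost_after_crash("periodic"))  # ['m2', 'm3']
```

In the periodic case, anything acknowledged since the last sync is what you are trading away for the performance gain.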



On 6/24/2011 1:46 PM, Jim Newsham wrote:
> On 6/23/2011 8:55 PM, AJ wrote:
>> Can any Cassandra contributors/gurus confirm my understanding of Cassandra's degree of support for the ACID properties?
>>
>> I provide official references when known.  Please let me know if I missed some good official documentation.
>>
>> Atomicity
>> All individual writes are atomic at the row level.  So, a batch mutate for one specific key will apply updates to all the columns for that one specific row atomically.  If part of the single-key batch update fails, then all of the updates will be reverted since they all pertained to one key/row.  Notice, I said 'reverted' not 'rolled back'.  Note: atomicity and isolation are related to the topic of transactions but one does not imply the other.  Even though row updates are atomic, they are not isolated from other users' updates or reads.
>> Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>>
>> Consistency
>> If you want 100% consistency, use consistency level QUORUM for both reads and writes and EACH_QUORUM in a multi-dc scenario.
>> Refs: http://wiki.apache.org/cassandra/ArchitectureOverview
>
> This is a pretty narrow interpretation of consistency.  In a traditional database, consistency prevents you from getting into a logically inconsistent state, where records in one table do not agree with records in another table.  This includes referential integrity, cascading deletes, etc.  It seems to me Cassandra has no support for this concept whatsoever.
>
>> Isolation
>> NOTHING is isolated; because there is no transaction support in the first place.  This means that two or more clients can update the same row at the same time.  Their updates of the same or different columns may be interleaved and leave the row in a state that may not make sense depending on your application.  Note: this doesn't mean to say that two updates of the same column will be corrupted, obviously; columns are the smallest atomic unit ('atomic' in the more general thread-safe context).
>> Refs: None that directly address this explicitly and clearly and in one place.
>>
>> Durability
>> Updates are made durable by the use of the commit log.  No worries here.
>> Refs: Plenty.
>
> Jim
