Subject: svn commit: r936112 - in /hadoop/hbase/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/acid-semantics.xml src/docs/src/documentation/content/xdocs/site.xml
Date: Tue, 20 Apr 2010 23:15:45 -0000
To: hbase-commits@hadoop.apache.org
From: stack@apache.org

Author: stack
Date: Tue Apr 20 23:15:45 2010
New Revision: 936112
URL: http://svn.apache.org/viewvc?rev=936112&view=rev
Log: HBASE-2294 Enumerate ACID properties of HBase in a well defined spec

Added:
hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/acid-semantics.xml Modified: hadoop/hbase/trunk/CHANGES.txt hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/site.xml Modified: hadoop/hbase/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/CHANGES.txt?rev=936112&r1=936111&r2=936112&view=diff ============================================================================== --- hadoop/hbase/trunk/CHANGES.txt (original) +++ hadoop/hbase/trunk/CHANGES.txt Tue Apr 20 23:15:45 2010 @@ -18,6 +18,8 @@ Release 0.21.0 - Unreleased HBASE-2378 Bulk insert with multiple reducers broken due to improper ImmutableBytesWritable comparator (Todd Lipcon via Stack) HBASE-2392 Upgrade to ZooKeeper 3.3.0 + HBASE-2294 Enumerate ACID properties of HBase in a well defined spec + (Todd Lipcon via Stack) BUG FIXES HBASE-1791 Timeout in IndexRecordWriter (Bradford Stephens via Andrew Added: hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/acid-semantics.xml URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/acid-semantics.xml?rev=936112&view=auto ============================================================================== --- hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/acid-semantics.xml (added) +++ hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/acid-semantics.xml Tue Apr 20 23:15:45 2010 @@ -0,0 +1,227 @@ + + + + + + + + +
+ + HBase ACID Properties + +
+ + +
+ About this Document +

HBase is not an ACID-compliant database. However, it does guarantee certain specific properties.

+

This specification enumerates the ACID properties of HBase.

+
+
+ Definitions +

For the sake of common vocabulary, we define the following terms:

+
+
Atomicity
+
an operation is atomic if it either completes entirely or not at all
+ +
Consistency
+
all actions cause the table to transition from one valid state directly to another (e.g., a row will not disappear during an update)
+ +
Isolation
+
an operation is isolated if it appears to complete independently of any other concurrent transaction
+ +
Durability
+
any update that reports "successful" to the client will not be lost
+ +
Visibility
+
an update is considered visible if any subsequent read will see the update as having been committed
+
+

The terms "must" and "may" are used as specified by RFC 2119. In short, the word "must" implies that, if some case exists where the statement is not true, it is a bug. The word "may" implies that, even if the guarantee is provided in a current release, users should not rely on it.

+
+
+ APIs to consider +
  • Read APIs
    • get
    • scan
  • Write APIs
    • put
    • batch put
    • delete
  • Combination (read-modify-write) APIs
    • incrementColumnValue
    • checkAndPut
+
+ +
+ Guarantees Provided + +
+ Atomicity + +
  1. All mutations are atomic within a row. Any put will either wholly succeed or wholly fail.
    1. An operation that returns a "success" code has completely succeeded.
    2. An operation that returns a "failure" code has completely failed.
    3. An operation that times out may have succeeded or may have failed. However, it will not have partially succeeded or failed.
  2. This is true even if the mutation crosses multiple column families within a row.
  3. APIs that mutate several rows will _not_ be atomic across the multiple rows. For example, a multiput that operates on rows 'a', 'b', and 'c' may return having mutated some but not all of the rows. In such cases, these APIs will return a list of success codes, each of which may be succeeded, failed, or timed out as described above.
  4. The checkAndPut API happens atomically, like the typical compareAndSet (CAS) operation found in many hardware architectures.
  5. Mutations are seen to happen in a well-defined order for each row, with no interleaving. For example, if one writer issues the mutation "a=1,b=1,c=1" and another writer issues the mutation "a=2,b=2,c=2", the row must either be "a=1,b=1,c=1" or "a=2,b=2,c=2" and must not be something like "a=1,b=2,c=1".
    1. Please note that this is not true _across rows_ for multirow batch mutations.
+
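The "no interleaving" guarantee can be pictured with a small in-memory model. The sketch below is only an illustration, not HBase code: `AtomicRowModel` and its methods are hypothetical names, and it assumes, for simplicity, that each put supplies every column of the row. The row is held as one immutable map behind an `AtomicReference`, so a reader observes one complete version or the other.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Toy in-memory model of a single row. This is NOT HBase code; every name is hypothetical.
final class AtomicRowModel {
    // The whole row lives in one immutable map, so a reader sees all of a put or none of it.
    private final AtomicReference<Map<String, Integer>> row = new AtomicReference<>(Map.of());

    // Replaces every column of the row in one atomic step.
    void put(Map<String, Integer> cells) {
        row.set(Map.copyOf(cells));
    }

    Map<String, Integer> get() {
        return row.get();
    }

    // Two writers race on the same row; returns the version a reader sees once both finish.
    static Map<String, Integer> raceTwoWriters() {
        AtomicRowModel model = new AtomicRowModel();
        Thread w1 = new Thread(() -> model.put(Map.of("a", 1, "b", 1, "c", 1)));
        Thread w2 = new Thread(() -> model.put(Map.of("a", 2, "b", 2, "c", 2)));
        w1.start();
        w2.start();
        try {
            w1.join();
            w2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return model.get();
    }

    public static void main(String[] args) {
        // Prints one of the two complete versions, never a mix such as a=1,b=2,c=1.
        System.out.println(raceTwoWriters());
    }
}
```

Which of the two versions wins depends on thread scheduling; the model only shows that a mixed row cannot be observed.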
+
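The compare-and-set behaviour described for checkAndPut can be sketched with the JDK's `ConcurrentHashMap`, whose `replace(key, oldValue, newValue)` and `putIfAbsent` calls are themselves atomic. This is a hedged model rather than the HBase client API: the class name, the method signature, and the convention that a `null` expected value means "cell is absent" are assumptions made for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model of compare-and-set on a single cell. This is NOT the HBase client API.
final class CheckAndPutModel {
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    // Writes newValue only if the cell currently holds expected.
    // In this model, expected == null means "only if the cell is absent".
    boolean checkAndPut(String cell, String expected, String newValue) {
        if (expected == null) {
            return cells.putIfAbsent(cell, newValue) == null;
        }
        return cells.replace(cell, expected, newValue);
    }

    String get(String cell) {
        return cells.get(cell);
    }

    public static void main(String[] args) {
        CheckAndPutModel m = new CheckAndPutModel();
        System.out.println(m.checkAndPut("lock", null, "client-1"));       // true
        System.out.println(m.checkAndPut("lock", null, "client-2"));       // false
        System.out.println(m.checkAndPut("lock", "client-1", "client-3")); // true
        System.out.println(m.get("lock"));                                 // client-3
    }
}
```

Because the check and the write happen as one step, two clients racing to claim the same cell cannot both succeed.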
+ Consistency and Isolation +
  1. All rows returned via any access API will consist of a complete row that existed at some point in the table's history.
  2. This is true across column families, i.e., a get of a full row that occurs concurrently with some mutations 1,2,3,4,5 will return a complete row that existed at some point in time between mutation i and i+1 for some i between 1 and 5.
  3. The state of a row will only move forward through the history of edits to it.
+ +
Consistency of Scans +

A scan is not a consistent view of a table. Scans do not exhibit snapshot isolation.

+

Rather, scans have the following properties:

+ +
  1. Any row returned by the scan will be a consistent view (i.e., that version of the complete row existed at some point in time).
  2. A scan will always reflect a view of the data at least as new as the beginning of the scan. This satisfies the visibility guarantees enumerated below.
    1. For example, if client A writes data X and then communicates via a side channel to client B, any scans started by client B will contain data at least as new as X.
    2. A scan _must_ reflect all mutations committed prior to the construction of the scanner, and _may_ reflect some mutations committed subsequent to the construction of the scanner.
    3. Scans must include all data written prior to the scan (except in the case where data is subsequently mutated, in which case it _may_ reflect the mutation).
+

Those familiar with relational databases will recognize this isolation level as "read committed".

+
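The scan guarantees above resemble the weakly consistent iterators of the JDK's concurrent collections, which traverse every element present at construction and may or may not reflect later changes. The sketch below uses that resemblance as an analogy only; `ScanModel` is a hypothetical in-memory table, not HBase code, and each row is stored as an immutable map so that any row the scan returns is a complete version.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy model of a table scan. This is NOT HBase code; every name is hypothetical.
final class ScanModel {
    private final ConcurrentSkipListMap<String, Map<String, Integer>> table =
            new ConcurrentSkipListMap<>();

    // Replaces the whole row, so readers only ever see complete versions of it.
    void put(String rowKey, Map<String, Integer> cells) {
        table.put(rowKey, Map.copyOf(cells));
    }

    // The returned iterator is weakly consistent: it is not a snapshot of the table.
    Iterator<Map.Entry<String, Map<String, Integer>>> scanner() {
        return table.entrySet().iterator();
    }

    // Returns the row keys seen by a scan that overlaps with later writes.
    static List<String> demo() {
        ScanModel t = new ScanModel();
        t.put("row1", Map.of("a", 1, "b", 1));
        t.put("row2", Map.of("a", 1, "b", 1));
        Iterator<Map.Entry<String, Map<String, Integer>>> it = t.scanner();
        // Committed after the scanner was built: the scan may or may not reflect these.
        t.put("row2", Map.of("a", 2, "b", 2));
        t.put("row3", Map.of("a", 3, "b", 3));
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            Map.Entry<String, Map<String, Integer>> e = it.next();
            Map<String, Integer> row = e.getValue();
            // Each returned row is a complete version: a and b always match.
            if (!row.get("a").equals(row.get("b"))) {
                throw new IllegalStateException("torn row " + e.getKey());
            }
            seen.add(e.getKey());
        }
        return seen;
    }

    public static void main(String[] args) {
        // Always contains row1 and row2; row3 is allowed but not required.
        System.out.println(demo());
    }
}
```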

Please note that the guarantees listed above regarding scanner consistency are referring to "transaction commit time", not the "timestamp" field of each cell. That is to say, a scanner started at time t may see edits with a timestamp value greater than t, if those edits were committed with a "forward dated" timestamp before the scanner was constructed.

+
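The difference between commit time and the cell timestamp can be made concrete with a small model. The sketch below is illustrative and does not describe HBase internals: `commitSeq` and `readPoint` are names invented for this example. Each edit carries a user-supplied timestamp and a commit sequence number, and a scanner is obliged to see an edit based on the sequence number alone.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model separating commit order from the cell timestamp. NOT HBase internals.
final class CommitTimeModel {
    // One committed edit: a user-supplied timestamp plus its position in commit order.
    private static final class Edit {
        final String value;
        final long timestamp;
        final long commitSeq;

        Edit(String value, long timestamp, long commitSeq) {
            this.value = value;
            this.timestamp = timestamp;
            this.commitSeq = commitSeq;
        }
    }

    private final List<Edit> log = new ArrayList<>();
    private long nextSeq = 1;

    synchronized void commit(String value, long timestamp) {
        log.add(new Edit(value, timestamp, nextSeq++));
    }

    // The commit point that a scanner constructed now is guaranteed to reflect.
    synchronized long openScanner() {
        return nextSeq - 1;
    }

    // Edits a scanner opened at readPoint must see: chosen by commit order, not timestamp.
    synchronized List<String> mustSee(long readPoint) {
        List<String> out = new ArrayList<>();
        for (Edit e : log) {
            if (e.commitSeq <= readPoint) {
                out.add(e.value);
            }
        }
        return out;
    }

    static List<String> demo() {
        CommitTimeModel m = new CommitTimeModel();
        m.commit("x", 100L);
        m.commit("forward-dated", 9_999L); // timestamp far ahead, committed before the scan
        long readPoint = m.openScanner();
        m.commit("late", 50L);             // older timestamp, committed after the scan began
        return m.mustSee(readPoint);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [x, forward-dated]
    }
}
```

The forward-dated edit is included despite its large timestamp, while the edit with the smallest timestamp is not required because it was committed after the scanner was constructed.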
+
+
+ Visibility +
  1. When a client receives a "success" response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels.
  2. A row must never exhibit so-called "time-travel" properties. That is to say, if a series of mutations moves a row sequentially through a series of states, any sequence of concurrent reads will return a subsequence of those states.
    1. For example, if a row's cells are mutated using the "incrementColumnValue" API, a client must never see the value of any cell decrease.
    2. This is true regardless of which read API is used to read back the mutation.
  3. Any version of a cell that has been returned to a read operation is guaranteed to be durably stored.
+ +
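The "no time travel" property for increments can be checked against a simple model. The sketch below is an assumption-laden illustration rather than HBase code: the cell is an `AtomicLong`, `incrementColumnValue` here is a hypothetical method that merely borrows the API's name, and one reader samples the cell while a writer increments it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of a counter cell. This is NOT HBase code; every name is hypothetical.
final class NoTimeTravelModel {
    private final AtomicLong cell = new AtomicLong();

    long incrementColumnValue(long amount) {
        return cell.addAndGet(amount);
    }

    long get() {
        return cell.get();
    }

    // A writer increments the cell while this thread samples it; returns the samples in order.
    static List<Long> observeDuringIncrements(int increments, int reads) {
        NoTimeTravelModel m = new NoTimeTravelModel();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                m.incrementColumnValue(1);
            }
        });
        writer.start();
        List<Long> seen = new ArrayList<>();
        for (int i = 0; i < reads; i++) {
            seen.add(m.get());
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        seen.add(m.get()); // final read after every increment has completed
        return seen;
    }

    // True if the observed values form a non-decreasing sequence.
    static boolean neverDecreases(List<Long> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i) < values.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> seen = observeDuringIncrements(100_000, 1_000);
        System.out.println(neverDecreases(seen)); // true
    }
}
```

The reader may skip intermediate states, which the guarantee permits, but the values it does observe form a subsequence of the states the cell passed through.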
+
+ Durability +
  1. All visible data is also durable data. That is to say, a read will never return data that has not been made durable on disk[1].
  2. Any operation that returns a "success" code (e.g., does not throw an exception) will be made durable.
  3. Any operation that returns a "failure" code will not be made durable (subject to the Atomicity guarantees above).
  4. No reasonable failure scenario will affect any of the guarantees of this document.
+
+
+ Tunability +

All of the above guarantees must be possible within HBase. For users who would like to trade off some guarantees for performance, HBase may offer several tuning options. For example:

+
  • Visibility may be tuned on a per-read basis to allow stale reads or time travel.
  • Durability may be tuned to only flush data to disk on a periodic basis.
+
+
+
+ Footnotes + +

[1] In the context of HBase, "durably on disk" implies an hflush() call on the transaction log. This does not actually imply an fsync() to magnetic media, but rather just that the data has been written to the OS cache on all replicas of the log. In the case of a full datacenter power loss, it is possible that the edits are not truly durable.

+
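The distinction drawn in footnote [1] has a rough local analogue in the JDK. The sketch below is a single-file toy write-ahead log, not HBase or HDFS code: `WalModel` and its `syncToDevice` flag are hypothetical, and a local file has none of the replication that hflush() involves. With the flag off, a successful `append` means only that the bytes were handed to the operating system; with it on, `FileChannel.force(true)` additionally asks the OS to push them to the storage device.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Toy local write-ahead log. This is NOT HBase or HDFS code; every name is hypothetical.
final class WalModel implements AutoCloseable {
    private final FileChannel channel;
    private final boolean syncToDevice;

    WalModel(Path file, boolean syncToDevice) {
        try {
            this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.syncToDevice = syncToDevice;
    }

    // Returns normally ("success") only after the edit has been handed to the OS.
    void append(String edit) {
        try {
            ByteBuffer buf = ByteBuffer.wrap((edit + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            if (syncToDevice) {
                channel.force(true); // ask the OS to flush data and metadata to the device
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends two edits to a temporary log and reads them back.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("wal-model", ".log");
            try (WalModel wal = new WalModel(file, false)) {
                wal.append("put row1");
                wal.append("put row2");
            }
            List<String> lines = Files.readAllLines(file);
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [put row1, put row2]
    }
}
```

Syncing on every append trades throughput for a stronger guarantee, which is the kind of trade-off the Tunability section describes for durability.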
+ + +
Modified: hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/site.xml URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=936112&r1=936111&r2=936112&view=diff ============================================================================== --- hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/site.xml (original) +++ hadoop/hbase/trunk/src/docs/src/documentation/content/xdocs/site.xml Tue Apr 20 23:15:45 2010 @@ -36,6 +36,7 @@ See http://forrest.apache.org/docs/linki +