hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2294) Enumerate ACID properties of HBase in a well defined spec
Date Wed, 31 Mar 2010 23:56:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852183#action_12852183 ]

Todd Lipcon commented on HBASE-2294:

Thanks for reviving this issue, Stack.

I thought a bit more about the stale reads thing, and I think the safest bet is this: by default
we do _not_ allow stale reads, but in the future we could add a flag on get() calls that explicitly
allows it. I think this is more what people expect out of a datastore, and if people want
to make the tradeoff they should ask for it. Since we determined above it should be perfectly
efficient to be correct, we might as well be correct by default.

Here's the current state of the gist:

Here's a first pass at some kind of spec. These aren't meant to be final - just posting for
discussion. I anticipate that after we (developers) come to some kind of conclusion here we
will want to run this by the user list to see if we're missing use cases, etc.

h1. Definitions

For the sake of common vocabulary, we define the following terms:

*ATOMICITY*: an operation is atomic if it either completes entirely or not at all
*CONSISTENCY*: all actions cause the table to transition from one valid state directly to
another (e.g. a row will not disappear during an update, etc.)
*ISOLATION*: an operation is isolated if it appears to complete independently of any other
concurrent transaction
*DURABILITY*: any update that reports "successful" to the client will not be lost
*VISIBILITY*: an update is considered visible if any subsequent read will see the update as
having been committed

The terms _must_ and _may_ are used as specified by RFC 2119. In short, the word "must" implies
that, if some case exists where the statement is not true, it is a bug. The word "may" implies
that, even if the guarantee is provided in a current release, users should not rely on it.

h1. APIs to consider

* Read APIs
** get
** scan

* Write APIs
** put
** batch put
** delete

* Combination (read-modify-write) APIs
** incrementColumnValue
** checkAndPut

h1. Guarantees Provided

h2. Atomicity

# All mutations are atomic within a row. Any put will either wholly succeed or wholly fail.
## An operation that returns a "success" code has completely succeeded.
## An operation that returns a "failure" code has completely failed.
## An operation that times out may have succeeded and may have failed. However, it will not
have partially succeeded or failed.
# This is true even if the mutation crosses multiple column families within a row.
# APIs that mutate several rows will _not_ be atomic across the multiple rows. For example,
a multiput that operates on rows 'a','b', and 'c' may return having mutated some but not all
of the rows. In such cases, these APIs will return a list of result codes, each of which
may be success, failure, or timeout as described above.
# The checkAndPut API happens atomically like the typical compareAndSet (CAS) operation found
in many hardware architectures.
# Mutations are seen to happen in a well-defined order for each row, with no interleaving.
For example, if one writer issues the mutation "a=1,b=1,c=1" and another writer issues the
mutation "a=2,b=2,c=2", the row must either be "a=1,b=1,c=1" or "a=2,b=2,c=2" and must _not_
be something like "a=1,b=2,c=1".
## Please note that this is not true _across rows_ for multirow batch mutations.
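
The row-level guarantees above can be illustrated with a small in-memory model. This is only a sketch in Python, not HBase client code: the {{ToyTable}} class, its one-lock-per-row design, the status strings, and the use of a {{None}} mutation to force a failure are all assumptions made for illustration.

```python
import threading
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table (hypothetical; not the HBase API)."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock map itself

    def _row_lock(self, row):
        with self._guard:
            return self._locks[row]

    def put(self, row, cells):
        # Every cell of the row changes under one lock, so two puts to the
        # same row are applied one after the other, never interleaved.
        with self._row_lock(row):
            self._rows[row].update(cells)

    def get(self, row):
        with self._row_lock(row):
            return dict(self._rows[row])

    def check_and_put(self, row, column, expected, cells):
        # The comparison and the write share one critical section (CAS).
        with self._row_lock(row):
            if self._rows[row].get(column) != expected:
                return False
            self._rows[row].update(cells)
            return True

    def batch_put(self, mutations):
        # One status per row; nothing is rolled back across rows.
        statuses = []
        for row, cells in mutations:
            try:
                self.put(row, cells)
                statuses.append("success")
            except TypeError:
                statuses.append("failure")
        return statuses


table = ToyTable()
writers = [
    threading.Thread(target=table.put, args=("r", {"a": 1, "b": 1, "c": 1})),
    threading.Thread(target=table.put, args=("r", {"a": 2, "b": 2, "c": 2})),
]
for w in writers:
    w.start()
for w in writers:
    w.join()
final = table.get("r")  # one writer's complete row, never a mix of both

cas_first = table.check_and_put("r", "a", final["a"], {"a": 10})
cas_second = table.check_and_put("r", "a", final["a"], {"a": 20})

# A None mutation stands in for a row whose write fails.
statuses = table.batch_put([("x", {"v": 1}), ("y", None), ("z", {"v": 1})])
```

In this model the batch leaves rows 'x' and 'z' mutated even though 'y' failed, which is the behaviour the multi-row item describes.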

h2. Consistency and Isolation

# All rows returned via any access API will consist of a complete row that existed at some
point in the table's history.
# This is true across column families - i.e. a get of a full row that occurs concurrently with
some mutations 1,2,3,4,5 will return a complete row that existed at some point in time between
mutation i and i+1 for some i between 1 and 5.
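
One way to picture "a complete row that existed at some point in the table's history" is a row that only ever publishes whole, immutable snapshots. The sketch below is a toy model under that assumption; {{VersionedRow}}, its {{history}} list, and the column names are hypothetical and say nothing about how HBase implements this internally.

```python
import threading


class VersionedRow:
    """Toy row that publishes immutable snapshots (hypothetical model)."""

    def __init__(self, cells):
        self._lock = threading.Lock()
        self._current = dict(cells)
        self.history = [self._current]  # every complete state the row has had

    def mutate(self, cells):
        with self._lock:
            new_state = {**self._current, **cells}  # build the full new row
            self.history.append(new_state)
            self._current = new_state  # publish it in a single step

    def get(self):
        # A reader copies whichever complete snapshot is current.
        return dict(self._current)


row = VersionedRow({"cf1:x": 0, "cf2:y": 0})
observed = []


def writer():
    # Five mutations, each touching two column families at once.
    for i in range(1, 6):
        row.mutate({"cf1:x": i, "cf2:y": i})


def reader():
    for _ in range(100):
        observed.append(row.get())


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every entry in {{observed}} equals some entry in {{row.history}}, so no read ever mixes cells from two different mutations.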

h3. Consistency of Scans

A scan is *not* a consistent view of a table. Scans do *not* exhibit _snapshot isolation_.

Rather, scans have the following properties:

# Any row returned by the scan will be a consistent view (i.e. that version of the complete
row existed at some point in time)
# A scan will always reflect a view of the data _at least as new as_ the beginning of the
scan. This satisfies the visibility guarantees enumerated below.
## For example, if client A writes data X and then communicates via a side channel to client
B, any scans started by client B will contain data at least as new as X.
## A scan _must_ reflect all mutations committed prior to the construction of the scanner,
and _may_ reflect some mutations committed subsequent to the construction of the scanner.
## Scans must include _all_ data written prior to the scan (except in the case where data
is subsequently mutated, in which case it _may_ reflect the mutation)

Those familiar with relational databases will recognize this isolation level as "read committed".

Please note that the guarantees listed above regarding scanner consistency are referring to
"transaction commit time", not the "timestamp" field of each cell. That is to say, a scanner
started at time t may see edits with a timestamp value less than t, if those edits were committed
with a "backdated" timestamp after the scanner was constructed.
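
To make the difference from snapshot isolation concrete, here is a toy scanner in Python. It is a sketch under stated assumptions, not HBase's scanner: {{ToyScanTable}} is hypothetical, it fixes the set of row keys when the scanner is constructed, and it reads each row's contents only when the scan reaches that row.

```python
class ToyScanTable:
    """Toy table with a non-snapshot scanner (hypothetical model)."""

    def __init__(self):
        self._rows = {}

    def put(self, row, cells):
        # Replace the whole row object so a reader sees old or new, never both.
        self._rows[row] = {**self._rows.get(row, {}), **cells}

    def scan(self):
        # Keys are captured now; row contents are read lazily, one at a time.
        keys = sorted(self._rows)

        def rows():
            for key in keys:
                yield key, dict(self._rows[key])

        return rows()


table = ToyScanTable()
table.put("a", {"v": 1})
table.put("b", {"v": 1})

scanner = table.scan()
first = next(scanner)

table.put("b", {"v": 2})  # committed after the scanner was constructed
table.put("c", {"v": 1})  # new row, also committed after construction

rest = list(scanner)
later_keys = [key for key, _ in table.scan()]
```

Here the scan returns row 'b' with the newer value and does not return row 'c'. Both outcomes fall under _may_: the only thing the scanner _must_ do is reflect everything committed before it was constructed, which a fresh scan over all three rows also does.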

h2. Visibility

# When a client receives a "success" response for any mutation, that mutation is immediately
visible to both that client and any client with whom it later communicates through side channels.
# A row must never exhibit so-called "time-travel" properties. That is to say, if a series
of mutations moves a row sequentially through a series of states, any sequence of concurrent
reads will return a subsequence of those states.
## For example, if a row's cells are mutated using the "incrementColumnValue" API, a client
must never see the value of any cell decrease.
## This is true regardless of which read API is used to read back the mutation.
# Any version of a cell that has been returned to a read operation is guaranteed to be durably
stored.
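
The "no time travel" rule can be written down as a check over a list of observed reads. This is a small Python sketch, not part of HBase; the function name is hypothetical, and treating repeated reads of the same state as acceptable is an assumption on my part.

```python
def no_time_travel(observed, states):
    """Return True if `observed` walks forward through `states`.

    `states` is the sequence of states the row moved through, in order.
    `observed` is what one client read, in the order it read it.
    Gaps are allowed, and so is reading the same state more than once.
    """
    position = 0
    for value in observed:
        # Search forward from the last matched state, never backward.
        while position < len(states) and states[position] != value:
            position += 1
        if position == len(states):
            return False
    return True


counter_states = [0, 1, 2, 3]  # e.g. a cell bumped by incrementColumnValue

forward_reads = no_time_travel([0, 0, 2, 3], counter_states)
backward_reads = no_time_travel([0, 2, 1], counter_states)
```

The first call passes because the reads only skip or repeat states; the second fails because the value 1 is seen after 2, i.e. the counter appears to decrease.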

h2. Durability

# All visible data is also durable data. That is to say, a read will never return data that
has not been made durable on disk[1]
# Any operation that returns a "success" code (e.g. does not throw an exception) will be made
durable.
# Any operation that returns a "failure" code will not be made durable (subject to the Atomicity
guarantees above)
# No reasonable failure scenario will affect any of the guarantees of this document.

h1. Tunability

All of the above guarantees must be possible within HBase. For users who would like to trade
off some guarantees for performance, HBase may offer several tuning options. For example:
- Visibility may be tuned on a per-read basis to allow stale reads or time travel.
- Durability may be tuned to only flush data to disk on a periodic basis.


[1] In the context of HBase, "durably on disk" implies an hflush() call on the transaction
log. This does not actually imply an fsync() to magnetic media, but rather just that the data
has been written to the OS cache on all replicas of the log. In the case of a full datacenter
power loss, it is possible that the edits are not truly durable.
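
The gap between "written to the OS cache" and "synced to media" in [1] has a rough single-machine analogue. The Python sketch below is only an analogy on a local file, not HDFS: {{flush()}} hands the bytes to the OS, while {{os.fsync()}} asks the OS to push them to the storage device. The {{append_edit}} helper and the file name are hypothetical.

```python
import os
import tempfile


def append_edit(path, edit, sync_to_disk=False):
    """Append one edit to a toy log file (hypothetical helper)."""
    with open(path, "ab") as log:
        log.write(edit)
        log.flush()  # user-space buffer -> OS page cache
        if sync_to_disk:
            os.fsync(log.fileno())  # OS page cache -> storage device


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "toy.log")
    append_edit(log_path, b"edit-1\n")                     # cache only
    append_edit(log_path, b"edit-2\n", sync_to_disk=True)  # forced to device
    with open(log_path, "rb") as log:
        contents = log.read()
```

Both edits are readable immediately, which is why visibility alone says nothing about whether an edit would survive a power loss.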

> Enumerate ACID properties of HBase in a well defined spec
> ---------------------------------------------------------
>                 Key: HBASE-2294
>                 URL: https://issues.apache.org/jira/browse/HBASE-2294
>             Project: Hadoop HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.20.4, 0.21.0
> It's not written down anywhere what the guarantees are for each operation in HBase with
regard to the various ACID properties. I think the developers know the answers to these questions,
but we need a clear spec for people building systems on top of HBase. Here are a few sample
questions we should endeavor to answer:
> - For a multicell put within a CF, is the update made durable atomically?
> - For a put across CFs, is the update made durable atomically?
> - Can a read see a row that hasn't been sync()ed to the HLog?
> - What isolation do scanners have? Somewhere between snapshot isolation and no isolation?
> - After a client receives a "success" for a write operation, is that operation guaranteed
to be visible to all other clients?
> etc
> I see this JIRA as having several points of discussion:
> - Evaluation of what the current state of affairs is
> - Evaluate whether we currently provide any guarantees that aren't useful to users of
the system (perhaps we can drop in exchange for performance)
> - Evaluate whether we are missing any guarantees that would be useful to users of the

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
