hbase-issues mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2294) Enumerate ACID properties of HBase in a well defined spec
Date Mon, 15 Mar 2010 22:42:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845575#action_12845575 ]

ryan rawson commented on HBASE-2294:

Previously, without durability, some of the things in here were just not applicable.  Sync
= noop really doesn't lend itself to answering these questions.

I postponed commenting until I had fixed HBASE-2248, but now that I have, here are my suggestions
on how we should do things:

- Row mutate operations should be atomic. Concurrent gets/scans do not see the results of
a row mutation until it is "finished".  In the code, this means "when rwcc.completeMemstoreInsert()
is called".  This has to happen _after_ all KVs have been put in memstore.  We have to call
HLog.sync() _before_ we start modifying the memstore, so that if there is any HLog issue we don't
mutate memstore.  Thus rows become visible _very shortly_ after an HLog.sync() occurs: the delay is
the time it takes to modify in-memory structures and call rwcc.completeMemstoreInsert().
- Row mutates across multiple families should be atomic. This was not too hard to implement
in HBASE-2248 and represents a good level of service I think.
- Reads cannot see rows that have not been sync()ed to HLog.
- Scanners have weak isolation: a scanner continuously sees an updated view of the table
as it runs across rows.  That means a scanner can see rows inserted _after_ its creation.
 Providing stronger isolation doesn't make sense since there are no cross-row atomicity guarantees.

- Once a client gets a success after a mutation operation, all other clients, including itself,
will be able to see the new data.
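
The ordering proposed in the bullets above can be sketched as a toy model.  This is not HBase
code: ToyRegion, mutateRow, readPoint and failSync are hypothetical names made up for the
illustration, the "HLog" is an in-memory list, and synchronized methods stand in for the real
locking.  It only shows the three steps in order: sync the log, insert all KVs, then publish.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the proposed write ordering; hypothetical names, not HBase code. */
class ToyRegion {
    /** One versioned cell: a value plus the write number that produced it. */
    private static final class Cell {
        final String value;
        final long writeNumber;

        Cell(String value, long writeNumber) {
            this.value = value;
            this.writeNumber = writeNumber;
        }
    }

    private final List<String> wal = new ArrayList<>();               // stand-in for the HLog
    private final Map<String, List<Cell>> memstore = new HashMap<>(); // key = row + "/" + column
    private long nextWriteNumber = 1;
    private long readPoint = 0;   // reads only see cells with writeNumber <= readPoint
    boolean failSync = false;     // test hook: simulate an HLog failure

    /** Applies all columns of one row (possibly across families) as a single unit. */
    synchronized void mutateRow(String row, Map<String, String> columns) {
        // Step 1: log append + sync happen BEFORE memstore is touched.
        if (failSync) {
            throw new IllegalStateException("simulated HLog sync failure");
        }
        wal.add(row + "=" + columns);

        // Step 2: insert every KV, tagged with a write number readers cannot see yet.
        long writeNumber = nextWriteNumber++;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            memstore.computeIfAbsent(row + "/" + e.getKey(), k -> new ArrayList<>())
                    .add(new Cell(e.getValue(), writeNumber));
        }

        // Step 3: publish, the analogue of completing the memstore insert.
        readPoint = writeNumber;
    }

    /** Returns the newest value visible at the current read point, or null. */
    synchronized String get(String row, String column) {
        List<Cell> versions = memstore.get(row + "/" + column);
        if (versions == null) {
            return null;
        }
        for (int i = versions.size() - 1; i >= 0; i--) {
            if (versions.get(i).writeNumber <= readPoint) {
                return versions.get(i).value;
            }
        }
        return null;
    }

    synchronized int walSize() {
        return wal.size();
    }
}
```

Because the sync comes first, a failed sync leaves memstore untouched, and because the read
point moves only after the last KV is inserted, a read sees either none or all of a row mutation.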

In my work for HDFS-0.21, it was pretty obvious that hflush was fairly slow.  For high volume
updates with lower value data (eg: calling ICV on a row many thousands of times a second),
it seemed to make sense to use a time-based flush.  That is, the durability promise is relaxed
slightly to say that the row is only durable after at most X milliseconds (configurable).
 This is a per-table setting (see: HBASE-1944).
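
A minimal sketch of the time-based flush idea, under stated assumptions: DeferredSyncLog and
its methods are hypothetical names, the clock is passed in explicitly so the behaviour is
deterministic, and "sync" just moves edits between two in-memory lists.  It is not how HBase
or HDFS implement it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy log that syncs at most once per interval; hypothetical names, not HBase code. */
class DeferredSyncLog {
    private final long intervalMillis;                       // the configurable "X milliseconds"
    private final List<String> unsynced = new ArrayList<>(); // acknowledged but not yet durable
    private final List<String> durable = new ArrayList<>();  // stand-in for synced log contents
    private long lastSyncMillis;

    DeferredSyncLog(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.lastSyncMillis = startMillis;
    }

    /** Buffers the edit; syncs only if the interval has elapsed since the last sync. */
    void append(String edit, long nowMillis) {
        unsynced.add(edit);
        tick(nowMillis);
    }

    /** Would also be driven by a timer, so the bound holds when no new edits arrive. */
    void tick(long nowMillis) {
        if (nowMillis - lastSyncMillis >= intervalMillis) {
            durable.addAll(unsynced);
            unsynced.clear();
            lastSyncMillis = nowMillis;
        }
    }

    int durableCount() {
        return durable.size();
    }

    int unsyncedCount() {
        return unsynced.size();
    }
}
```

The trade-off is visible in unsyncedCount(): those edits have been acknowledged but would be
lost on a crash, which is why this only suits lower value data.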

Right now we have the in-memory atomic reads.  The durability story is being improved in HBASE-2283
with restructuring hlog appends/syncs and memstore mutations.  The performance and locking
of in-memory atomic reads is being improved in HBASE-2248.

> Enumerate ACID properties of HBase in a well defined spec
> ---------------------------------------------------------
>                 Key: HBASE-2294
>                 URL: https://issues.apache.org/jira/browse/HBASE-2294
>             Project: Hadoop HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.20.4, 0.21.0
> It's not written down anywhere what the guarantees are for each operation in HBase with
regard to the various ACID properties. I think the developers know the answers to these questions,
but we need a clear spec for people building systems on top of HBase. Here are a few sample
questions we should endeavor to answer:
> - For a multicell put within a CF, is the update made durable atomically?
> - For a put across CFs, is the update made durable atomically?
> - Can a read see a row that hasn't been sync()ed to the HLog?
> - What isolation do scanners have? Somewhere between snapshot isolation and no isolation?
> - After a client receives a "success" for a write operation, is that operation guaranteed
to be visible to all other clients?
> etc
> I see this JIRA as having several points of discussion:
> - Evaluation of what the current state of affairs is
> - Evaluate whether we currently provide any guarantees that aren't useful to users of
the system (perhaps we can drop in exchange for performance)
> - Evaluate whether we are missing any guarantees that would be useful to users of the

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
