hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2294) Enumerate ACID properties of HBase in a well defined spec
Date Tue, 16 Mar 2010 06:32:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845736#action_12845736 ]

Todd Lipcon commented on HBASE-2294:

bq. IMHO having the scanner stay 'up to date' as much as possible is a nice-to-have, definitely
not important enough to hurt performance.

I think I agree with you. I don't want to sidetrack this particular JIRA towards implementation
details, so I'll leave it at that. Without regard to the specifics of the other JIRA, it seems
likely to me that "as up to date as possible" semantics can often be implemented _more_ efficiently
than a "snapshot iterator". The current implementation may not be up to snuff, so I'll leave
it at this: I think the scanner semantics should be as loose as possible to achieve the maximum
speed, and I view "up to date" as _looser_ than snapshot.
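
To make the distinction concrete, here is a toy sketch of the two scanner semantics. It is not HBase code: `ToyTable`, `snapshot_scan`, and `live_scan` are hypothetical names invented for illustration, and the snapshot is modelled by simply copying the rows, which a real store would not necessarily do. It only shows what each semantic lets a scanner observe, not what either one costs.

```python
import bisect


class ToyTable:
    """Hypothetical in-memory sorted table (illustration only, not HBase)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def snapshot_scan(self):
        # Snapshot semantics: freeze the view when the scanner is opened,
        # so puts made after this call are never observed.
        frozen = sorted(self._rows.items())
        return iter(frozen)

    def live_scan(self):
        # "Up to date" semantics: consult the current table on every step,
        # so a put that lands ahead of the cursor is observed.
        last = None
        while True:
            keys = sorted(self._rows)
            idx = 0 if last is None else bisect.bisect_right(keys, last)
            if idx >= len(keys):
                return
            last = keys[idx]
            yield last, self._rows[last]


def demo():
    table = ToyTable()
    table.put("a", 1)
    table.put("c", 3)

    snap = table.snapshot_scan()
    live = table.live_scan()
    first_snap = next(snap)
    first_live = next(live)

    # A concurrent write arrives while both scanners are mid-scan.
    table.put("b", 2)

    return [first_snap] + list(snap), [first_live] + list(live)
```

In `demo()`, the snapshot scanner never returns row "b", while the live scanner does. The live scanner promises less about consistency across the scan, which is the sense in which "up to date" is the looser guarantee.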

bq. I would think that clients which do 'lengthy scans' don't particularly care about performance

I disagree - MR jobs are a typical "lengthy scan" application and throughput is certainly
important. Especially important is the ability to have the bulk (MR) jobs coexist with a high
concurrent live load on the table.

> Enumerate ACID properties of HBase in a well defined spec
> ---------------------------------------------------------
>                 Key: HBASE-2294
>                 URL: https://issues.apache.org/jira/browse/HBASE-2294
>             Project: Hadoop HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.20.4, 0.21.0
> It's not written down anywhere what the guarantees are for each operation in HBase with
> regard to the various ACID properties. I think the developers know the answers to these questions,
> but we need a clear spec for people building systems on top of HBase. Here are a few sample
> questions we should endeavor to answer:
> - For a multicell put within a CF, is the update made durable atomically?
> - For a put across CFs, is the update made durable atomically?
> - Can a read see a row that hasn't been sync()ed to the HLog?
> - What isolation do scanners have? Somewhere between snapshot isolation and no isolation?
> - After a client receives a "success" for a write operation, is that operation guaranteed
>   to be visible to all other clients?
> etc
> I see this JIRA as having several points of discussion:
> - Evaluation of what the current state of affairs is
> - Evaluate whether we currently provide any guarantees that aren't useful to users of
>   the system (perhaps we can drop in exchange for performance)
> - Evaluate whether we are missing any guarantees that would be useful to users of the
