hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-7212) Globally Barriered Procedure mechanism
Date Sun, 02 Dec 2012 02:01:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508128#comment-13508128 ]

Andrew Purtell edited comment on HBASE-7212 at 12/2/12 2:00 AM:
----------------------------------------------------------------

bq. The main questions I had when I was initially understanding the previous implementation
were "Is this 2pc?" and "Do we need 2pc?". The answers are: what we have implemented here has
two phases but is not true two-phase commit. 2pc, as defined in the literature (http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf),
requires that once the coordinator says something is committed, any failures at a member or
coordinator must be recovered by failing forward and completing it. The key point here is that
while we will need a global barrier for one of the snapshot flavors (global), it doesn't need
full 2PC because 1) we don't need to undo work (like a log roll or flush) if some sub
part of the first phase (our acquire/2pc's prepare) fails, and because 2) we don't need to
recover by failing forward if anything fails in the second phase (our release/2pc's commit).
In the latter case we just fail and delete .snapshot/.tmp remnants in the fs, and carry on
with extra flushed/rolled hlogs.

+1 

This makes a good case. I like the "keep it as simple as possible and only do as much as we
actually need to" approach.
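
To make the quoted failure handling concrete, here is a minimal sketch, not the code in the attached patch. The class name, the Phase interface, the attempt method and the file names are all hypothetical; the only things taken from the quote are the .snapshot/.tmp location and the rule "on failure, delete the remnants and leave flushed files alone". What happens on success is deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FailAndCleanSketch {

    /** Hypothetical phase body; throwing IOException means the phase failed. */
    interface Phase {
        void run(Path tmpDir) throws IOException;
    }

    /**
     * Runs acquire then release against a working directory under .snapshot/.tmp.
     * On any failure the working directory is deleted; nothing is undone or replayed.
     */
    static boolean attempt(Path root, String name, Phase acquire, Phase release) throws IOException {
        Path tmp = root.resolve(".snapshot").resolve(".tmp").resolve(name);
        Files.createDirectories(tmp);
        try {
            acquire.run(tmp);  // e.g. flush or roll logs; these side effects are NOT rolled back
            release.run(tmp);  // e.g. write the snapshot files
            return true;
        } catch (IOException e) {
            deleteRecursively(tmp);  // the only cleanup: drop the .tmp remnants
            return false;
        }
    }

    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("barrier-sketch");
        Path flushed = root.resolve("flushed-file");  // stands in for an extra flushed/rolled hlog
        boolean ok = attempt(root, "snap1",
            tmp -> { Files.createFile(flushed); Files.createFile(tmp.resolve("region-manifest")); },
            tmp -> { throw new IOException("member failed in release phase"); });
        Path tmpDir = root.resolve(".snapshot").resolve(".tmp").resolve("snap1");
        System.out.println("completed: " + ok);
        System.out.println("tmp remnants left: " + Files.exists(tmpDir));
        System.out.println("flushed file kept: " + Files.exists(flushed));
    }
}
```

The point of the sketch is what is absent: there is no per-member log and no retry loop, which is what separates this from 2pc as described in the quote.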

Edit: Moved unrelated comment to HBASE-7254
                
      was (Author: apurtell):
    bq. The main questions I had when I was initially understanding the previous implementation
were "Is this 2pc?" and "Do we need 2pc?". The answers are: what we have implemented here has
two phases but is not true two-phase commit. 2pc, as defined in the literature (http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf),
requires that once the coordinator says something is committed, any failures at a member or
coordinator must be recovered by failing forward and completing it. The key point here is that
while we will need a global barrier for one of the snapshot flavors (global), it doesn't need
full 2PC because 1) we don't need to undo work (like a log roll or flush) if some sub
part of the first phase (our acquire/2pc's prepare) fails, and because 2) we don't need to
recover by failing forward if anything fails in the second phase (our release/2pc's commit).
In the latter case we just fail and delete .snapshot/.tmp remnants in the fs, and carry on
with extra flushed/rolled hlogs.

+1 

This makes a good case. I like the "keep it as simple as possible and only do as much as we
actually need to" approach.

I can see a use for this in security too. We could tighten up the permissions cache using
a barrier for grant and revoke ops. In other words, replace the current ZK watcher based permissions
cache "RPC via ZK" with the Procedure mechanism that provides much the same, but with the
added benefit that we can fail the grant or revoke op if one or more RSes fail to ack the
update.
                  
> Globally Barriered Procedure mechanism
> --------------------------------------
>
>                 Key: HBASE-7212
>                 URL: https://issues.apache.org/jira/browse/HBASE-7212
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: hbase-6055
>
>         Attachments: 121127-global-barrier-proc.pdf, hbase-7212.patch, pre-hbase-7212.patch
>
>
> This is a simplified version of what was proposed in HBASE-6573.  Instead of claiming
to be a 2pc or 3pc implementation (which implies logging at each actor, and recovery operations)
this just provides a best-effort global barrier mechanism called a Procedure.  
> Users need only implement methods to acquireBarrier, to act when insideBarrier,
and to releaseBarrier that use the ExternalException cooperative error checking mechanism.
> Globally consistent snapshots require the ability to quiesce writes to a set of region
servers before the snapshot operation is executed.  Also, if any node fails, the mechanism needs
to be able to notify the other nodes so that they abort.
> The first cut of other online snapshots doesn't need the full barrier but may still use
this for its error propagation mechanisms.
> This version removes the extra layer incurred in the previous implementation due to the
use of generics, separates the coordinator and members, and reduces the amount of inheritance
used in favor of composition.
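
For readers who have not opened the attachments, the description above can be sketched in a single process as follows. This is an illustration under stated assumptions, not the API in hbase-7212.patch: BarrierMember, runProcedure and RecordingMember are hypothetical names, only the three method names come from the description, and the ExternalException error checking and the coordinator/member transport are omitted entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class BarrierSketch {

    /** Hypothetical member-side hooks, named after the three methods in the description. */
    interface BarrierMember {
        void acquireBarrier() throws Exception;  // e.g. quiesce writes
        void insideBarrier() throws Exception;   // e.g. take this member's part of the snapshot
        void releaseBarrier();                   // e.g. resume writes
    }

    /**
     * Runs one procedure across all members.
     * Returns true if every member got through insideBarrier, false if the procedure aborted.
     */
    static boolean runProcedure(List<? extends BarrierMember> members) {
        List<BarrierMember> acquired = new ArrayList<>();
        try {
            // Phase 1: every member must reach the barrier before anyone proceeds.
            for (BarrierMember m : members) {
                m.acquireBarrier();
                acquired.add(m);
            }
            // Phase 2: all members act while the barrier is held.
            for (BarrierMember m : members) {
                m.insideBarrier();
            }
            return true;
        } catch (Exception e) {
            // Best effort: no per-actor log, no recovery, no undo; just report the abort.
            return false;
        } finally {
            // Release only the members that actually acquired.
            for (BarrierMember m : acquired) {
                m.releaseBarrier();
            }
        }
    }

    /** Toy member that records which hooks were called on it. */
    static class RecordingMember implements BarrierMember {
        final boolean failOnAcquire;
        final List<String> events = new ArrayList<>();

        RecordingMember(boolean failOnAcquire) {
            this.failOnAcquire = failOnAcquire;
        }

        public void acquireBarrier() throws Exception {
            if (failOnAcquire) throw new Exception("could not quiesce writes");
            events.add("acquire");
        }

        public void insideBarrier() { events.add("inside"); }

        public void releaseBarrier() { events.add("release"); }
    }

    public static void main(String[] args) {
        RecordingMember a = new RecordingMember(false);
        RecordingMember b = new RecordingMember(true);
        System.out.println(runProcedure(List.of(a)));     // all members healthy
        System.out.println(runProcedure(List.of(a, b)));  // b fails to acquire, so a never runs insideBarrier
        System.out.println(a.events);
    }
}
```

A real implementation would run the members on separate region servers and propagate the abort to them; the loop here only shows the ordering guarantee, namely that no member enters insideBarrier until all have acquired.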

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
