hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7212) Globally Barriered Procedure mechanism
Date Tue, 04 Dec 2012 03:01:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509465#comment-13509465 ]

Jonathan Hsieh commented on HBASE-7212:
---------------------------------------

bq. What happens when the coordinator dies (in this case, the HMaster)? Does the new HMaster
discover the previous procedure and abort it?

The new HMaster deletes all znodes associated with the procedure class (here, all znodes
associated with snapshotting procedures). Any members still using those znodes should time
out and fail, and new operations need to be issued.  For snapshots in particular, there is
no real chance of a partial snapshot being present, because all of the snapshot work is done
in a temp dir and atomically put into place with a dir rename op once the coordinator sees
that all the members have released (left the barrier) successfully.  Junk will be left over
in these tmp dirs, but it gets cleaned up on the next take-snapshot attempt or when the new
master starts.
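
To make the temp-dir-plus-rename point concrete, here is a rough sketch on the local
filesystem using java.nio.file. It is not the HBase code: the class TmpDirSnapshot, the
".tmp" directory name, and the allMembersReleased flag are hypothetical stand-ins for the
real snapshot layout and for the coordinator's view of the members.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration of "work in a tmp dir, publish by rename". */
final class TmpDirSnapshot {
    // Hypothetical name for the working area under the snapshot root.
    static final String TMP = ".tmp";

    /** Removes junk left behind by earlier failed attempts. */
    static void cleanTmp(Path root) throws IOException {
        Path tmp = root.resolve(TMP);
        if (!Files.exists(tmp)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(tmp)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /**
     * Does all the work under the tmp dir and publishes it with a single rename.
     * Returns the published path, or null if the attempt failed.
     */
    static Path take(Path root, String name, boolean allMembersReleased) throws IOException {
        cleanTmp(root); // leftovers are cleaned up on the next attempt
        Path working = root.resolve(TMP).resolve(name);
        Files.createDirectories(working);
        Files.write(working.resolve("data"), "snapshot contents".getBytes(StandardCharsets.UTF_8));
        if (!allMembersReleased) {
            return null; // failed attempt: junk stays in the tmp dir, nothing is published
        }
        // One rename makes the whole directory visible at once.
        return Files.move(working, root.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this layout a reader of the snapshot root either sees a complete snapshot directory or
none at all; a failed attempt only ever leaves files under the tmp dir.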


                
> Globally Barriered Procedure mechanism
> --------------------------------------
>
>                 Key: HBASE-7212
>                 URL: https://issues.apache.org/jira/browse/HBASE-7212
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: hbase-6055
>
>         Attachments: 121127-global-barrier-proc.pdf, hbase-7212.patch, pre-hbase-7212.patch
>
>
> This is a simplified version of what was proposed in HBASE-6573.  Instead of claiming
to be a 2pc or 3pc implementation (which implies logging at each actor, and recovery operations),
this just provides a best-effort global barrier mechanism called a Procedure.
> Users need only implement methods to acquireBarrier, to act when insideBarrier,
and to releaseBarrier that use the ExternalException cooperative error-checking mechanism.
> Globally consistent snapshots require the ability to quiesce writes to a set of region
servers before the snapshot operation is executed.  Also, if any node fails, it needs to
be able to notify the others so that they abort.
> The first cut of other online snapshots doesn't need the full barrier but may still use
this for its error propagation mechanisms.
> This version removes the extra layer incurred in the previous implementation due to the
use of generics, separates the coordinator and members, and reduces the amount of inheritance
used in favor of composition.
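
The three user-implemented steps named in the description can be pictured with a small
in-process sketch. This is only an illustration under assumptions: the BarrierMember
interface and BarrierSketch class are hypothetical, threads stand in for region servers,
latches stand in for the znodes, and a shared error slot plus timeouts stand in for the
ExternalException mechanism. It is not the API in the attached patch.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical member-side hooks, named after the three steps in the description. */
interface BarrierMember {
    void acquireBarrier() throws Exception; // prepare, e.g. quiesce writes
    void insideBarrier() throws Exception;  // work done once every member has acquired
    void releaseBarrier() throws Exception; // leave the barrier
}

/** Hypothetical best-effort coordinator: no logging and no recovery, only timeouts. */
final class BarrierSketch {
    /** Returns true only if every member released within the timeout and none failed. */
    static boolean run(List<? extends BarrierMember> members, long timeoutMs) {
        int n = members.size();
        CountDownLatch acquired = new CountDownLatch(n);
        CountDownLatch released = new CountDownLatch(n);
        AtomicReference<Exception> error = new AtomicReference<>();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (BarrierMember m : members) {
            pool.execute(() -> {
                try {
                    m.acquireBarrier();
                    acquired.countDown();
                    // Global barrier: wait until all members have acquired, or give up.
                    boolean all = acquired.await(timeoutMs, TimeUnit.MILLISECONDS);
                    if (!all || error.get() != null) {
                        return; // timed out or another member failed: abort this member
                    }
                    m.insideBarrier();
                    m.releaseBarrier();
                    released.countDown();
                } catch (Exception e) {
                    error.compareAndSet(null, e); // record the first failure for the others
                }
            });
        }
        boolean done;
        try {
            done = released.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            done = false;
        }
        pool.shutdownNow(); // interrupt any member still waiting
        return done && error.get() == null;
    }
}
```

If any member throws before the barrier is reached, the remaining members never enter
insideBarrier; they time out and the whole run reports failure, which mirrors the
best-effort behaviour described above.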

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
