hbase-issues mailing list archives

From "stack (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
Date Thu, 19 Jan 2012 06:46:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188954#comment-13188954 ]

stack commented on HBASE-5179:

Regarding the new splitLog method in v11, I don't follow this justification, Zhihong:

"What I am thinking is that maybe we should split currentMetaServer's log in a non-distributed
fashion because the splitting is of high priority."

Is the thought that local splitting will run faster?  Is this true?

The areDeadServersInProgress method name should match the other method names, so it should be
areDeadServersBeingProcessed (minor).  Ditto for methods such as isDeadRootServerInProgress.  What's the
difference between InProgress and BeingProcessed?  We also seem to have an active-voice UnderProcessing
going on.  Should these be consistent?

Do these need to be public?  They seem to be used only within the same package, by the master.
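
To make the naming and visibility suggestions concrete, here is a rough sketch of what a consistently named, package-private tracker might look like. The DeadServerTracker class, its fields, and the mark*/set* helpers are hypothetical and made up for illustration. Only the areDeadServersBeingProcessed name comes from the suggestion above; isDeadRootServerBeingProcessed simply applies the same suffix to isDeadRootServerInProgress.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not HBase's API: the class, fields, and most method names are
// illustrative. It uses a single suffix ("BeingProcessed") throughout and omits the
// 'public' modifier, so these methods are visible only within the master's own package.
class DeadServerTracker {
  private final Set<String> deadServersBeingProcessed = new HashSet<>();
  private String deadRootServer; // hypothetical: the dead server that was carrying -ROOT-

  synchronized void markBeingProcessed(String serverName) {
    deadServersBeingProcessed.add(serverName);
  }

  synchronized void markProcessed(String serverName) {
    deadServersBeingProcessed.remove(serverName);
  }

  synchronized void setDeadRootServer(String serverName) {
    deadRootServer = serverName;
  }

  // Called areDeadServersInProgress in the patch under review; renamed to match.
  synchronized boolean areDeadServersBeingProcessed() {
    return !deadServersBeingProcessed.isEmpty();
  }

  // Called isDeadRootServerInProgress in the patch under review; renamed to match.
  synchronized boolean isDeadRootServerBeingProcessed() {
    return deadRootServer != null && deadServersBeingProcessed.contains(deadRootServer);
  }
}
```

Leaving off 'public' relies on Java's default (package-private) access. If the master and its handlers all live in the same package, nothing outside that package can call these methods.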

Should the zk callback be up and operating before the master comes completely online?

Are the knownServers in HMaster the heartbeating servers that checked in before the master
came online?  If so, that seems like an important fix.

I'm now a little confused as to the scope of this patch.  I follow Jinchao's descriptions above of
how to reproduce the pathological situations.  It'd be great to write these up as unit tests.
I'm not sure which of Jinchao's descriptions apply to TRUNK as opposed to 0.90.  Any chance of
getting a list of the scenarios this patch is supposed to fix?  If we had that, I'd be up for
writing unit tests for TRUNK at least (I think it has sufficient primitives for mocking up
Jinchao's descriptions without needing a cluster).

That said, this patch and the discussion above in this issue are uncovering critical stuff;
thanks for all the work, lads.

> Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-5179
>                 URL: https://issues.apache.org/jira/browse/HBASE-5179
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.6
>         Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v2.patch,
>                      5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch,
>                      5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt,
>                      5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch,
>                      hbase-5179v7.patch, hbase-5179v8.patch,
> If the master's failover processing and ServerShutdownHandler's processing happen concurrently,
> the following case may occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and ServerShutdownHandler is processing it.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a dead server.
> 4. The master starts to assign RegionserverA's regions, because step 3 found it to be a dead server.
> However, while step 4 (assigning regions) is in progress, ServerShutdownHandler may still be splitting
> RegionserverA's logs, so its regions can be opened before all of their edits are recovered. This may cause data loss.
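
One cluster-free way to reason about (and eventually unit test) the interleaving above is to model it directly. The sketch below is a hypothetical model, not the HBase patch. LogSplitGate and RaceDemo are made-up names, and the model shows only one possible guard: the failover path waits until ServerShutdownHandler has finished splitting a dead server's logs before it assigns that server's regions. The model assumes the shutdown handler registers the server before failover can observe it as dead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model, not HBase code: one future per dead server, completed once that
// server's logs have been split. Assignment of the server's regions waits on it.
// Assumption: splitStarted() runs before failover can see the server as dead.
class LogSplitGate {
  private final Map<String, CompletableFuture<Void>> splits = new ConcurrentHashMap<>();

  // Shutdown-handler side: log splitting for this server has begun.
  void splitStarted(String server) {
    splits.computeIfAbsent(server, s -> new CompletableFuture<>());
  }

  // Shutdown-handler side: all of this server's logs have been split.
  void splitFinished(String server) {
    splits.computeIfAbsent(server, s -> new CompletableFuture<>()).complete(null);
  }

  // Failover side: block until any registered split for this server has finished.
  void awaitSplit(String server) {
    CompletableFuture<Void> f = splits.get(server);
    if (f != null) {
      f.join();
    }
  }
}

class RaceDemo {
  // Replays the reported interleaving: shutdown handling of "rsA" is under way (step 2)
  // when failover tries to assign rsA's regions (step 4). Returns the observed event order.
  static List<String> run() {
    List<String> events = Collections.synchronizedList(new ArrayList<>());
    LogSplitGate gate = new LogSplitGate();
    gate.splitStarted("rsA");

    Thread failover = new Thread(() -> {
      gate.awaitSplit("rsA"); // without this wait, "assign" could precede "split-done"
      events.add("assign");
    });
    failover.start();

    events.add("split-done"); // shutdown handler finishes splitting rsA's logs
    gate.splitFinished("rsA");
    try {
      failover.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(events);
  }
}
```

RaceDemo.run() starts the assignment while the split for rsA is still in progress. Because awaitSplit blocks until splitFinished completes the future, the returned order is always split-done followed by assign. Dropping the awaitSplit call reopens the window in which regions are assigned before their logs are split. An alternative guard would be for failover to skip such servers entirely and let ServerShutdownHandler assign their regions after splitting.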

This message is automatically generated by JIRA.

