hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
Date Tue, 13 Mar 2012 03:51:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228174#comment-13228174

stack commented on HBASE-5270:

@Chunhui Sorry.  I thought I'd provided comments already on this issue.  It seems like my
comments were lost by me.

So, to me its odd that we disable one handler only; i.e. serverShutdownHandlerEnabled

If a split or a close or an open come in while master is initializing, won't that mess up
our view of the cluster?  Why not turn all all handlers on initialization queuing tasks?

I would tolerate our doing this in another issue if thats what you think we should do.

Who needs this:


+  void joinCluster() throws IOException, KeeperException, InterruptedException {

I think you need to add comments explaining what deadNotExpired servers are.  Its a weird
concept and folks can go read over where they are defined but it could help if up in HMaster
and AssingmentManager where they are used, you explain what they are to help those coming
after trying to figure whats going on.  Maybe point at the explaination over in ServerManager?

+    Iterator<ServerName> serverIterator=deadNotExpiredServers.iterator();

Notice how there are spaces around operators in all of the surrounding code.

What happens if when we call expireDeadNotExpiredServers, the server has already been expired?

Why do the following:

+    serverShutdownHandlerEnabled = true;
+    this.serverManager.expireDeadNotExpiredServers();

.. then in expireDeadNotExpiredServers, first thing we do is check if serverShutdownHandlerEnabled
is true or not.  Under what cirumstance would it not be true if we just set it?  This is only
place we call expireDeadNotExpiredServers.

In general I'm in favor of committing this patch because it has a test.  Thanks Chunhui.

> Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
> -----------------------------------------------------------------------------------------------------
>                 Key: HBASE-5270
>                 URL: https://issues.apache.org/jira/browse/HBASE-5270
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master
>            Reporter: Zhihong Yu
>            Assignee: chunhui shen
>             Fix For: 0.92.2
>         Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch,
5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch,
hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch,
hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt
> This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about
patches for 0.92 and TRUNK:
> Reviewing 0.92v17
> isDeadServerInProgress is a new public method in ServerManager but it does not seem to
be used anywhere.
> Does isDeadRootServerInProgress need to be public? Ditto for meta version.
> This method param names are not right 'definitiveRootServer'; what is meant by definitive?
Do they need this qualifier?
> Is there anything in place to stop us expiring a server twice if its carrying root and
> What is difference between asking assignment manager isCarryingRoot and this variable
that is passed in? Should be doc'd at least. Ditto for meta.
> I think I've asked for this a few times - onlineServers needs to be explained... either
in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I
think I know but am unsure. God love the poor noob that comes awandering this code trying
to make sense of it all.
> It looks like we get the list by trawling zk for regionserver znodes that have not checked
in. Don't we do this operation earlier in master setup? Are we doing it again here?
> Though distributed split log is configured, we will do in master single process splitting
under some conditions with this patch. Its not explained in code why we would do this. Why
do we think master log splitting 'high priority' when it could very well be slower. Should
we only go this route if distributed splitting is not going on. Do we know if concurrent distributed
log splitting and master splitting works?
> Why would we have dead servers in progress here in master startup? Because a servershutdownhandler
> This patch is different to the patch for 0.90. Should go into trunk first with tests,
then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this
issue is for 0.90.x and new issue for more work on this trunk patch?
> This patch needs to have the v18 differences applied.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message