hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7551) nodeChildrenChange event may happen after the transition to RS_ZK_REGION_SPLITTING in SplitTransaction causing the SPLIT event to be missed in the master side.
Date Tue, 15 Jan 2013 04:20:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553472#comment-13553472 ]

ramkrishna.s.vasudevan commented on HBASE-7551:

bq. While with the proposal we would have
bq. Regionserver RPC master
bq. Master creates znode
bq. Does master RPC region servers now?
So you mean the master should RPC the region server to say it has created the znode?
Maybe that is not needed. Why not let the RS act based on the znode?
bq. If I'm not wrong, we would still have to handle the case where the master misses the 'new child' event for this specific znode?
I am not sure I follow you. I feel the master is sure to see the new child; the only question is when it sees it. So once the master sees it, let it change the state so that the znode version changes, and let the RS proceed based on that.
What do you think of this suggestion:
bq. Another way could be to make the RS create the znode in SPLITTING state and let the master do the transition from SPLITTING to SPLITTING. This ensures that the znode version changes. Only once the version has changed does the RS carry on with its further steps.
> nodeChildrenChange event may happen after the transition to RS_ZK_REGION_SPLITTING in SplitTransaction causing the SPLIT event to be missed in the master side.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-7551
>                 URL: https://issues.apache.org/jira/browse/HBASE-7551
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.4
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.5
> This came from HBASE-7468.
> I found the problem and am able to reproduce it. See the logs:
> {code}
> 2013-01-14 14:37:21,760 INFO  [main] regionserver.SplitTransaction(216): Starting split of region testShouldClearRITWhenNodeFoundInSplittingState,,1358154439514.a9e57d09c58b3ef3b949d602232fb2c2.
> 2013-01-14 14:37:21,760 DEBUG [main] regionserver.SplitTransaction(871): regionserver:61665-0x13c384e4e4f0002 Creating ephemeral node for a9e57d09c58b3ef3b949d602232fb2c2 in SPLITTING state
> 2013-01-14 14:37:21,844 DEBUG [main] zookeeper.ZKAssign(757): regionserver:61665-0x13c384e4e4f0002 Attempting to transition node a9e57d09c58b3ef3b949d602232fb2c2 from RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,849 DEBUG [Thread-873-EventThread] zookeeper.ZooKeeperWatcher(277): master:62334-0x13c384e4e4f001b Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected,
> 2013-01-14 14:37:21,853 DEBUG [main] zookeeper.ZKUtil(1565): regionserver:61665-0x13c384e4e4f0002 Retrieved 140 byte(s) of data from znode /hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2; origin=Ram.Home,61665,1358154325430, state=RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,918 DEBUG [main] zookeeper.ZKAssign(820): regionserver:61665-0x13c384e4e4f0002 Successfully transitioned node a9e57d09c58b3ef3b949d602232fb2c2 from RS_ZK_REGION_SPLITTING
> 2013-01-14 14:37:21,919 DEBUG [Thread-873-EventThread] zookeeper.ZKUtil(417): master:62334-0x13c384e4e4f001b Set watcher on existing znode /hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2
> {code}
> Here we can observe that the SPLITTING node is created first. Then we transition it from SPLITTING to SPLITTING so that the AM receives a nodeDataChange event. But for the nodeDataChange event to be delivered, the nodeChildrenChange event must be processed first, so that the master can set a watcher on the node.
> When this hang happens, we can see that the watcher is set by the nodeChildrenChange handling only after the transition has already happened, so the SPLITTING to SPLITTING event itself is missed.
> Normally the nodeChildrenChange handling iterates through the list of new znodes under /hbase/unassigned and then sets a watcher on each of them. One possible reason for the delay is that there is more than one znode, so setting the watches takes time. The order of execution also differs between running the test from Eclipse and running it with mvn.
> My conclusion is that the test case actually reveals a real problem: the same thing can happen outside the test, in any scenario, causing the SPLITTING event to be missed. Maybe some of the split-related bugs that were raised are due to this? This needs analysis.
> Any suggestions are welcome. We should ensure that the transition from SPLITTING to SPLITTING happens only after the master has set the watch on the znode, and we should be sure of
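A minimal, deterministic simulation of the race described in the issue, using a hypothetical in-memory `Znode` rather than the real ZooKeeper client (Python is used for illustration; no HBase or ZooKeeper API appears here). It models only the watch semantics the description relies on: a data watch is one-shot and fires only for writes made after it is registered. It does not model any data the master may read while registering the watch. The names `Znode`, `run` and `on_data_changed` are assumptions.

```python
class Znode:
    """Hypothetical stand-in for a znode. Data watches mimic ZooKeeper's:
    one-shot, and fired only by writes that happen after registration."""

    def __init__(self, state):
        self.state = state
        self.version = 0
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def set_state(self, new_state):
        self.state = new_state
        self.version += 1
        # One-shot semantics: take the current watchers, clear them, then fire.
        fired, self.watchers = self.watchers, []
        for callback in fired:
            callback(self)


def run(master_watches_first):
    """Return the data-change events the master receives for one split."""
    seen = []

    def on_data_changed(node):  # hypothetical master nodeDataChanged handler
        seen.append((node.state, node.version))

    node = Znode("RS_ZK_REGION_SPLITTING")  # RS creates the node (version 0)
    if master_watches_first:
        node.watch(on_data_changed)  # nodeChildrenChanged handled in time
        node.set_state("RS_ZK_REGION_SPLITTING")  # RS: SPLITTING -> SPLITTING
    else:
        node.set_state("RS_ZK_REGION_SPLITTING")  # RS transitions first
        node.watch(on_data_changed)  # master's watch is registered too late
    return seen


print(run(master_watches_first=True))   # [('RS_ZK_REGION_SPLITTING', 1)]
print(run(master_watches_first=False))  # []
```

In the second ordering, which matches the log above (transition at 21,918, watcher set at 21,919), the SPLITTING to SPLITTING write happens while no watch is registered, so the master never gets the corresponding nodeDataChanged event.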

This message is automatically generated by JIRA.
