hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7607) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in 0.94
Date Sun, 03 Feb 2013 15:20:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569798#comment-13569798 ]

Himanshu Vashishtha commented on HBASE-7607:
--------------------------------------------

Interestingly, with this patch, the aborted regionserver is processed normally, and the test
passes its normal phase. The problem is in the cluster shutdown process: sometimes the master
is not able to process the death of the other regionserver, yet JVMClusterUtil considers the
cluster shut down.
{code}
2013-01-30 19:40:14,048 INFO  [RegionServer:0;localhost,49074,1359600001555] regionserver.HRegionServer(851): stopping server localhost,49074,1359600001555; zookeeper connection closed.
2013-01-30 19:40:14,048 INFO  [RegionServer:0;localhost,49074,1359600001555] regionserver.HRegionServer(854): RegionServer:0;localhost,49074,1359600001555 exiting
2013-01-30 19:40:14,048 INFO  [localhost,35387,1359600001393.timerUpdater] hbase.Chore(80): localhost,35387,1359600001393.timerUpdater exiting
2013-01-30 19:40:14,048 INFO  [Shutdown of org.apache.hadoop.hbase.fs.HFileSystem@32d35f5f] hbase.MiniHBaseCluster$SingleFileSystemShutdownThread(182): Hook closing fs=org.apache.hadoop.hbase.fs.HFileSystem@32d35f5f
2013-01-30 19:40:14,049 INFO  [main] util.JVMClusterUtil(262): Shutdown of 1 master(s) and 2 regionserver(s) complete
{code}

{code}
2013-01-30 19:40:14,168 INFO  [RegionServer:0;localhost,49074,1359600001555.leaseChecker] regionserver.Leases(132): RegionServer:0;localhost,49074,1359600001555.leaseChecker closed leases
2013-01-30 19:40:14,227 INFO  [Master:0;localhost,35387,1359600001393] master.ServerManager(357): Waiting on regionserver(s) to go down localhost,49074,1359600001555
{code}

However, the master thread is still looping in its ServerManager#letRegionServersShutdown
method, waiting to process the dead regionserver, and that notification never arrives. I am
looking into why this happens only with this patch (the failure frequency is around 1 in 5).
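
To make the failure mode concrete, here is a minimal sketch of a "wait until all regionservers are gone" loop. It is not the actual ServerManager#letRegionServersShutdown code; the class ShutdownWaitSketch, its methods, and the timeout parameters are all hypothetical names made up for illustration. It only shows why a wait that depends on an expiration event which never arrives keeps looping, and how a deadline would bound it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT HBase code: models a master waiting for
// regionservers to be reported as gone during cluster shutdown.
public class ShutdownWaitSketch {
    // Assumed stand-in for the master's view of live regionservers.
    private final Set<String> onlineServers = ConcurrentHashMap.newKeySet();

    void serverStarted(String name) { onlineServers.add(name); }

    // Assumed stand-in for the "regionserver died" notification.
    void serverExpired(String name) { onlineServers.remove(name); }

    /**
     * Polls until no servers remain or the timeout elapses.
     * Returns true if every server went down, false on timeout or interrupt.
     */
    boolean waitForServersDown(long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!onlineServers.isEmpty()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // without this check the loop would spin forever
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ShutdownWaitSketch master = new ShutdownWaitSketch();
        master.serverStarted("localhost,49074,1359600001555");
        // The expiration event is never delivered, mirroring the logs above.
        boolean down = master.waitForServersDown(200, 50);
        System.out.println("all regionservers down: " + down);
    }
}
```

Whether a bounded wait is the right fix here is a separate question; the sketch only illustrates the symptom of a master that keeps waiting on a regionserver that has already exited.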
                
> Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in 0.94
> -------------------------------------------------------------------
>
>                 Key: HBASE-7607
>                 URL: https://issues.apache.org/jira/browse/HBASE-7607
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, test
>    Affects Versions: 0.94.4
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.6
>
>         Attachments: HBASE-7607-v2.patch
>
>
> TestRegionServerCoprocessorExceptionWithAbort sometimes fails on both trunk and 0.94.x. The codebase differs between the two.
> In 0.94.x, the client retries looking up the root region while the cluster is down and the /hbase znode is no longer present:
> "Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master."
> I will file a separate jira for trunk, as the code is different there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
