hbase-issues mailing list archives

From "Francis Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17306) IntegrationTestRSGroup#testRegionMove may fail due to region server not online
Date Wed, 21 Dec 2016 03:06:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765953#comment-15765953 ]

Francis Liu commented on HBASE-17306:
-------------------------------------

{quote}
Francis Liu:
Can you give us some background on the above requirement ?
{quote}
You can't move a regionserver that is not online out of the default group. Membership in the
default group is dynamic (all online regionservers that are not members of any other group),
so there is no way to determine whether an offline RS being moved is a valid RS or not.
Allowing such a move would just lead to more problems.
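
As a rough illustration of why the online check is needed, here is a minimal sketch of the dynamic membership rule described above. It is not the RSGroupAdminServer code; the class name, method names, and server strings are hypothetical, and plain collections stand in for the real cluster state.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not HBase code: models the rule stated in the comment.
final class DefaultGroupSketch {

    // Default group = all online servers that no other group has claimed.
    static Set<String> defaultGroup(Set<String> onlineServers,
                                    Map<String, Set<String>> otherGroups) {
        Set<String> result = new TreeSet<>(onlineServers);
        for (Set<String> members : otherGroups.values()) {
            result.removeAll(members);
        }
        return result;
    }

    // The default group has no stored member list, so a server can only be
    // validated as a default-group member while it is online.
    static boolean canMoveFromDefault(String server,
                                      Set<String> onlineServers,
                                      Map<String, Set<String>> otherGroups) {
        return defaultGroup(onlineServers, otherGroups).contains(server);
    }

    public static void main(String[] args) {
        Set<String> online =
            new HashSet<>(Arrays.asList("rs1:16020", "rs2:16020", "rs3:16020"));
        Map<String, Set<String>> groups =
            Collections.singletonMap("batch", Collections.singleton("rs3:16020"));

        System.out.println(defaultGroup(online, groups));
        // rs5 is offline, so nothing distinguishes it from a mistyped host name.
        System.out.println(canMoveFromDefault("rs5:16020", online, groups));
    }
}
```

In this model an offline server and a nonexistent server look identical, which is the reason given above for rejecting the move.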

In any case, it seems the problem here is more about stabilizing the test itself, i.e. avoiding
the race of moving an RS that is not yet online.
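
One way the test could avoid that race is to poll until the restarted server shows up as online before asking for the move. The following is only a sketch under that assumption, not the attached patch; `WaitForOnlineSketch`, `waitUntilOnline`, and the `Supplier` that reports online servers are hypothetical stand-ins for whatever the test uses to query the cluster.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch, not HBase code: wait for a server to be reported
// online before attempting to move it between groups.
final class WaitForOnlineSketch {

    // Polls onlineServers until it contains server or the timeout expires.
    // Returns true if the server was seen online, false on timeout or interrupt.
    static boolean waitUntilOnline(String server,
                                   Supplier<Set<String>> onlineServers,
                                   long timeoutMs,
                                   long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (onlineServers.get().contains(server)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call `waitUntilOnline(...)` right after restarting the server and only issue the move once it returns true, failing with a clear timeout message otherwise.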

> IntegrationTestRSGroup#testRegionMove may fail due to region server not online
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-17306
>                 URL: https://issues.apache.org/jira/browse/HBASE-17306
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Priority: Minor
>         Attachments: 17306.v1.txt
>
>
> {code}
> 2016-12-13 05:26:57,965|INFO|MainThread|machine.py:145 - run()|2) testRegionMove(org.apache.hadoop.hbase.rsgroup.IntegrationTestRSGroup)
> 2016-12-13 05:26:57,965|INFO|MainThread|machine.py:145 - run()|org.apache.hadoop.hbase.constraint.ConstraintException: org.apache.hadoop.hbase.constraint.ConstraintException: Server ctr-e77-1481596162056-0240-01-000005.a.com:16020 is not an online server in default group.
> 2016-12-13 05:26:57,966|INFO|MainThread|machine.py:145 - run()|at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveServers(RSGroupAdminServer.java:135)
> 2016-12-13 05:26:57,966|INFO|MainThread|machine.py:145 - run()|at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.moveServers(RSGroupAdminEndpoint.java:169)
> 2016-12-13 05:26:57,966|INFO|MainThread|machine.py:145 - run()|at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:11136)
> 2016-12-13 05:26:57,966|INFO|MainThread|machine.py:145 - run()|at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:679)
> 2016-12-13 05:26:57,966|INFO|MainThread|machine.py:145 - run()|at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2
> {code}
> Shortly before the test failure, the server was shut down:
> {code}
> 2016-12-13 05:21:25,428 INFO  [MASTER_SERVER_OPERATIONS-ctr-e77-1481596162056-0240-01-000008:20000-4] handler.ServerShutdownHandler: Finished processing of shutdown of ctr-e77-1481596162056-0240-01-000005.a.com,16020,1481606309159
> ...
> 2016-12-13 05:26:57,935 INFO  [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=20000] master.ServerManager: Registering server=ctr-e77-1481596162056-0240-01-000005.hwx.site,16020,1481606803303
> 2016-12-13 05:27:06,219 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking of RS /hbase-secure/rs/ctr-e77-1481596162056-0240-01-000005.a.com,16020,1481606803303
> {code}
> The registration of the new server (start code 1481606803303) happened shortly after the
> test failure.



