hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry
Date Sat, 15 Aug 2015 01:34:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698037#comment-14698037 ]

Hadoop QA commented on HBASE-14207:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12750536/HBASE-14207-0.98-V2.patch
  against 0.98 branch at commit 45aafb25b7911b5917ab47133e3e4268806e4c91.
  ATTACHMENT ID: 12750536

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of protoc compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 23 warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

    {color:red}-1 site{color}.  The patch appears to cause mvn post-site goal to fail.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15106//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15106//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15106//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15106//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15106//console

This message is automatically generated.

> Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14207
>                 URL: https://issues.apache.org/jira/browse/HBASE-14207
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.98.6
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>            Priority: Critical
>             Fix For: 0.98.15
>
>         Attachments: HBASE-14207-0.98-V2.patch, HBASE-14207-0.98.patch
>
>
> In a production environment, the following events occurred:
> 1. The master tried to assign a region to an RS, but the RS failed to open the region because of a KeeperException$SessionExpiredException.
> 	The RS log showed multiple WARN entries related to KeeperException$SessionExpiredException:
> 		> KeeperErrorCode = Session expired for /hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
> 		> Unable to get data of znode /hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
> 2. The master retried assigning the region to the same RS, but the RS failed again.
> 3. On the second retry a new plan was formed, this time with a different destination RS, so the master sent the open request to the new RS. The new RS also failed to open the region, because the server name recorded in the znode did not match the expected current server name.
> Log snippets:
> {noformat}
> HM
> 2015-07-14 03:50:29,759 | INFO  | master:T101PC03VM13:21300 | Processing 08f1935d652e5dbdac09b423b8f9401b in state: M_ZK_REGION_OFFLINE | org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:644)
> 2015-07-14 03:50:29,759 | INFO  | master:T101PC03VM13:21300 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817029679, server=null} to {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817029759, server=T101PC03VM13,21302,1436816690692} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:50:29,760 | INFO  | master:T101PC03VM13:21300 | Processed region 08f1935d652e5dbdac09b423b8f9401b in state M_ZK_REGION_OFFLINE, on server: T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:768)
> 2015-07-14 03:50:29,800 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:29,801 | WARN  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=1 of 10 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
> 2015-07-14 03:50:29,802 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Trying to re-assign INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to the same failed server. | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2123)
> 2015-07-14 03:50:31,804 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:31,806 | WARN  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=2 of 10 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
> 2015-07-14 03:50:31,807 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031804, server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817031807, server=T101PC03VM13,21302,1436816690692} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:50:31,807 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM14,21302,1436816997967 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:31,807 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817031807, server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031807, server=T101PC03VM14,21302,1436816997967} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:51:09,501 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-4 | Skip assigning region in transition on other server{08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031807, server=T101PC03VM14,21302,1436816997967} | org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:250)
> {noformat}
> {noformat}
> RS - T101PC03VM14
> 2015-07-14 03:50:31,809 | INFO  | PriorityRpcServer.handler=2,queue=0,port=21302 | Open INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. | org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:3671)
> 2015-07-14 03:50:31,830 | WARN  | RS_OPEN_REGION-T101PC03VM14:21302-2 | regionserver:21302-0xe4e88f6f1b70002, quorum=t101pc03vm12:24002,t101pc03vm13:24002,t101pc03vm14:24002, baseZNode=/hbase Attempt to transition the unassigned node for 08f1935d652e5dbdac09b423b8f9401b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to transition was T101PC03VM14,21302,1436816997967 not the expected T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:875)
> 2015-07-14 03:50:31,830 | WARN  | RS_OPEN_REGION-T101PC03VM14:21302-2 | Failed transition from OFFLINE to OPENING for region=08f1935d652e5dbdac09b423b8f9401b | org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:539)
> 2015-07-14 03:50:31,831 | WARN  | RS_OPEN_REGION-T101PC03VM14:21302-2 | Region was hijacked? Opening cancelled for encodedName=08f1935d652e5dbdac09b423b8f9401b | org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:132)
> 2015-07-14 03:50:31,831 | INFO  | RS_OPEN_REGION-T101PC03VM14:21302-2 | Opening of region {ENCODED => 08f1935d652e5dbdac09b423b8f9401b, NAME => 'INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b.', STARTKEY => '', ENDKEY => '200'} failed, transitioning from OFFLINE to FAILED_OPEN in ZK, expecting version -1 | org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tryTransitionFromOfflineToFailedOpen(OpenRegionHandler.java:436)
> 2015-07-14 03:50:31,834 | WARN  | RS_OPEN_REGION-T101PC03VM14:21302-2 | regionserver:21302-0xe4e88f6f1b70002, quorum=t101pc03vm12:24002,t101pc03vm13:24002,t101pc03vm14:24002, baseZNode=/hbase Attempt to transition the unassigned node for 08f1935d652e5dbdac09b423b8f9401b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_FAILED_OPEN failed, the server that tried to transition was T101PC03VM14,21302,1436816997967 not the expected T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:875)
> 2015-07-14 03:50:31,834 | WARN  | RS_OPEN_REGION-T101PC03VM14:21302-2 | Unable to mark region {ENCODED => 08f1935d652e5dbdac09b423b8f9401b, NAME => 'INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b.', STARTKEY => '', ENDKEY => '200'} as FAILED_OPEN. It's likely that the master already timed out this open attempt, and thus another RS already has the region. | org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tryTransitionFromOfflineToFailedOpen(OpenRegionHandler.java:444)
> {noformat}
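
The sequence in the logs above can be illustrated with a small toy model. This is only a sketch inferred from the quoted log messages, not HBase code: the `UnassignedZNode` class and the `transition_node` function are hypothetical names, and the assumption that a transition is refused whenever the calling server differs from the server recorded in the znode is taken from the "not the expected" warning, not from the ZKAssign source.

```python
from dataclasses import dataclass


@dataclass
class UnassignedZNode:
    """Toy stand-in for a region-in-transition znode (hypothetical)."""
    state: str
    server: str  # server name recorded in the znode


def transition_node(znode, caller, begin_state, end_state):
    """Move the znode from begin_state to end_state on behalf of caller.

    Assumption inferred from the log: the transition is refused when the
    znode is not in begin_state, or when the caller is not the server
    recorded in the znode.
    """
    if znode.state != begin_state or znode.server != caller:
        return False
    znode.state = end_state
    return True


RS_VM13 = "T101PC03VM13,21302,1436816690692"
RS_VM14 = "T101PC03VM14,21302,1436816997967"

# The znode still names the first destination after the plan changed.
znode = UnassignedZNode("M_ZK_REGION_OFFLINE", RS_VM13)

# The new destination can neither start opening nor report FAILED_OPEN.
opened = transition_node(
    znode, RS_VM14, "M_ZK_REGION_OFFLINE", "RS_ZK_REGION_OPENING")
marked_failed = transition_node(
    znode, RS_VM14, "M_ZK_REGION_OFFLINE", "RS_ZK_REGION_FAILED_OPEN")

# Hypothetical contrast: a znode that names the new destination.
refreshed = UnassignedZNode("M_ZK_REGION_OFFLINE", RS_VM14)
reopened = transition_node(
    refreshed, RS_VM14, "M_ZK_REGION_OFFLINE", "RS_ZK_REGION_OPENING")
```

In this model both calls against the stale znode are refused and its state stays M_ZK_REGION_OFFLINE, which matches the two ZKAssign warnings from T101PC03VM14, while the master's in-memory state remains PENDING_OPEN on that server. The `refreshed` case is only a contrast and makes no claim about what the attached patch changes.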



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
