Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 23 Dec 2014 01:49:14 +0000 (UTC)
From: "Enis Soztutar (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12763276.1419270443000.88372.1419299354225@Atlassian.JIRA>
In-Reply-To: <JIRA.12763276.1419270443000@Atlassian.JIRA>
References: <JIRA.12763276.1419270443000@Atlassian.JIRA>
 <JIRA.12763276.1419270443602@arcas>
Subject: [jira] [Updated] (HBASE-12743) [ITBLL] Master fails rejoining
 cluster stuck splitting logs; Distributed log replay=true
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HBASE-12743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-12743:
----------------------------------
    Fix Version/s: 1.1.0
                   2.0.0
                   1.0.0

> [ITBLL] Master fails rejoining cluster stuck splitting logs; Distributed log replay=true
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-12743
>                 URL: https://issues.apache.org/jira/browse/HBASE-12743
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 1.0.0, 2.0.0, 1.1.0
>
>
> The Master has been stuck for two days trying to rejoin the cluster after the chaos monkey killed and restarted it.
> After 350 failed attempts to read the 'default' row of hbase:namespace, the Master goes down:
> {code}
> 2014-12-20 18:43:54,285 INFO  [c2020:16020.activeMasterManager] client.RpcRetryingCaller: Call exception, tries=349, retries=350, started=6885331 ms ago, cancelled=false, msg=row 'default' on table 'hbase:namespace' at region=hbase:namespace,,1417551886199.ecdcd0172cd3e32d291bc282771895da., hostname=c2023.halxg.cloudera.com,16020,1418988286696, seqNum=6000000190
> 2014-12-20 18:43:54,303 WARN  [c2020:16020.activeMasterManager] master.TableNamespaceManager: Caught exception in initializing namespace table manager
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=350, exceptions:
> Sat Dec 20 16:49:08 PST 2014, RpcRetryingCaller{globalStartTime=1419122948954, pause=100, retries=350}, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:namespace,,1417551886199.ecdcd0172cd3e32d291bc282771895da. is not online on c2023.halxg.cloudera.com,16020,1418988286696
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2722)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:851)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1695)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30434)
> {code}
> It seems distributed log replay was enabled:
> {code}
> 2014-12-20 16:49:03,665 INFO  [RS_LOG_REPLAY_OPS-c2021:16020-0] wal.WALSplitter: DistributedLogReplay = true
> {code}
> This seems easy enough to reproduce.
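
As a rough sanity check on the RpcRetryingCaller line in the description (pause=100, tries=349, started=6885331 ms ago), the sketch below sums the nominal sleep between attempts. It assumes the client backoff multipliers of HBase 1.x (HConstants.RETRY_BACKOFF, reproduced here from memory and to be treated as an assumption), where attempts past the end of the table reuse the last multiplier. It ignores the small random jitter HBase adds and the time each RPC itself takes, so it is an estimate, not a model of the actual client.

```java
/**
 * Rough estimate of how long a retrying client sleeps between attempts.
 * ASSUMPTION: the multipliers below mirror HBase 1.x HConstants.RETRY_BACKOFF;
 * jitter and per-RPC latency are ignored.
 */
public class RetryBackoffEstimate {
    // Assumed per-attempt multipliers; attempts beyond the table reuse the last entry.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Nominal pause in ms before the retry with 0-based index {@code tries}. */
    static long pauseMillis(long basePauseMs, int tries) {
        int idx = Math.min(tries, RETRY_BACKOFF.length - 1);
        return basePauseMs * RETRY_BACKOFF[idx];
    }

    /** Total nominal sleep in ms across {@code tries} retries, excluding RPC time. */
    static long totalSleepMillis(long basePauseMs, int tries) {
        long total = 0;
        for (int t = 0; t < tries; t++) {
            total += pauseMillis(basePauseMs, t);
        }
        return total;
    }

    public static void main(String[] args) {
        long pause = 100; // pause=100 from the RpcRetryingCaller log line
        int tries = 349;  // tries=349 at the time of the INFO line
        long sleep = totalSleepMillis(pause, tries);
        System.out.println("nominal sleep ms: " + sleep);
        System.out.println("observed elapsed ms: 6885331");
    }
}
```

Under these assumptions the nominal sleep comes to 6,808,100 ms (about 113 minutes), close to the roughly 6.89 million ms the log reports; once the multiplier table is exhausted, each further attempt waits 20 s, which is why 350 retries stretch to nearly two hours.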


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)