Date: Tue, 25 Mar 2014 23:01:16 +0000 (UTC)
From: "Devaraj Das (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10833) Region assignment may fail during cluster start up

    [ https://issues.apache.org/jira/browse/HBASE-10833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947310#comment-13947310 ]

Devaraj Das commented on HBASE-10833:
-------------------------------------

+1

> Region assignment may fail during cluster start up
> --------------------------------------------------
>
>                 Key: HBASE-10833
>                 URL: https://issues.apache.org/jira/browse/HBASE-10833
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>         Attachments: hbase-10833.patch
>
>
> This is an intermittent & infrequent issue.
> It happens when a cluster is starting up, only when a single region server is temporarily the only one available for region assignment and the master has a transient problem talking to that RS.
> When the RPC layer has a hiccup talking to an RS, the RS is put in the failed server list and stays there for 2 seconds (the default value).
> Because the 20 region assignment retries complete in much less than 2 seconds, the end result is a failed assignment.
> Below is the logging for the total time spent on the 20 assignment retries; all of them together took only about 36 ms.
> {code}
> 2014-03-24 18:14:43,451 WARN [AM.ZK.Worker-pool2-t59] master.AssignmentManager: Failed assignment of hbase:labels,,1395668310177.f7372ede6c8bd7de4e91bfeda884cffb. to hor15n18.gq1.ygridcore.net,60020,1395684489232, trying to assign elsewhere instead; try=1 of 20
> ….
> 2014-03-24 18:14:43,487 WARN [AM.ZK.Worker-pool2-t59] master.AssignmentManager: Failed assignment of hbase:labels,,1395668310177.f7372ede6c8bd7de4e91bfeda884cffb. to hor15n18.gq1.ygridcore.net,60020,1395684489232, trying to assign elsewhere instead; try=20 of 20
> {code}
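
The timing problem described in the report can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FailedServerRetrySketch`, `FailedServerList`, `firstSuccessfulAttempt`, and the server name are hypothetical names invented for illustration, not HBase classes or APIs, and the pause between retries is just one conceivable mitigation, not necessarily what hbase-10833.patch does. It uses a simulated clock so the result does not depend on real timing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not HBase's actual assignment or RPC code. */
public class FailedServerRetrySketch {

    /** Minimal stand-in for a failed-server list with a fixed expiry window. */
    static final class FailedServerList {
        private final long expiryMs;
        private final Map<String, Long> expiresAt = new HashMap<>();

        FailedServerList(long expiryMs) {
            this.expiryMs = expiryMs;
        }

        /** Records that the server failed at nowMs; it stays listed for expiryMs. */
        void markFailed(String server, long nowMs) {
            expiresAt.put(server, nowMs + expiryMs);
        }

        /** Returns true while the server's failure entry has not yet expired. */
        boolean isFailed(String server, long nowMs) {
            Long until = expiresAt.get(server);
            if (until == null) {
                return false;
            }
            if (nowMs >= until) {
                expiresAt.remove(server);
                return false;
            }
            return true;
        }
    }

    /**
     * Simulates retrying assignment to the only available server, which was
     * marked failed at simulated time 0. Returns the 1-based attempt that
     * succeeds, or -1 if every attempt lands inside the expiry window.
     */
    public static int firstSuccessfulAttempt(int maxAttempts, long msBetweenAttempts, long expiryMs) {
        FailedServerList failed = new FailedServerList(expiryMs);
        String onlyServer = "rs1.example.org,60020,1"; // made-up server name
        long now = 0; // simulated clock in milliseconds
        failed.markFailed(onlyServer, now);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!failed.isFailed(onlyServer, now)) {
                return attempt;
            }
            now += msBetweenAttempts;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Roughly the reported case: 20 tries about 2 ms apart, 2 s expiry.
        System.out.println(firstSuccessfulAttempt(20, 2, 2000));   // prints -1
        // Same retry budget with an assumed 200 ms pause between tries.
        System.out.println(firstSuccessfulAttempt(20, 200, 2000)); // prints 11
    }
}
```

With retries about 2 ms apart, all 20 attempts finish within roughly 40 ms, well inside the 2-second window, so none can succeed. Spacing the same 20 attempts 200 ms apart lets the 11th attempt fall after the entry expires.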