From: "Benoit Sigoure (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Thu, 14 Jan 2010 06:50:54 +0000 (UTC)
Subject: [jira] Updated: (HBASE-2121) HBase client doesn't retry the right number of times when a region is unavailable
Message-ID: <1765999918.235131263451854495.JavaMail.jira@brutus.apache.org>
In-Reply-To: <874407477.230511263434454588.JavaMail.jira@brutus.apache.org>

     [
https://issues.apache.org/jira/browse/HBASE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Sigoure updated HBASE-2121:
----------------------------------

    Summary: HBase client doesn't retry the right number of times when a region is unavailable  (was: HBase client doesn't retry the right number of times when a region server is unavailable)

Actually, this issue doesn't require a region *server* to be unavailable, just a region itself.

> HBase client doesn't retry the right number of times when a region is unavailable
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2121
>                 URL: https://issues.apache.org/jira/browse/HBASE-2121
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.2, 0.21.0
>            Reporter: Benoit Sigoure
>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries retries 10 times (by default).  It ends up calling HConnectionManager$TableServers.locateRegionInMeta, which retries 10 times on its own.  So the HBase client is effectively retrying 100 times before giving up, instead of 10 (10 is the default hbase.client.retries.number).
> I'm using hbase trunk HEAD.  I verified this bug is also in 0.20.2.
> Sample call stack:
> org.apache.hadoop.hbase.client.RegionOfflineException: region offline: mytable,,1263421423787
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:709)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:640)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:609)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:430)
>     at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>     at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:62)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1047)
>     at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:836)
>     at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:756)
>     at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:354)
>     at
> How to reproduce:
> with a trivial HBase client (mine was just trying to scan the table), start the client, take offline the table the client uses, tell the client to start the scan.  The client will not give up after 10 attempts, unlike what it's supposed to do.
> If locateRegionInMeta is only ever called from getRegionServerWithRetries, then the fix is trivial: just remove the retry logic in there.  If it has some other callers who possibly relied on the retry logic in locateRegionInMeta, then the fix is going to be a bit more involved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
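
The multiplication of attempts described in the report (an outer retry loop wrapping an inner one) can be illustrated with a small standalone sketch. This is not the HBase client code: the class NestedRetryDemo, its methods, and the simulated "region offline" failure are all hypothetical stand-ins, and the value 10 is taken from the default hbase.client.retries.number mentioned in the report. It only shows why two nested loops of 10 yield 100 lookups, and why removing the inner loop brings the total back to 10.

```java
public class NestedRetryDemo {
    /** Hypothetical stand-in for hbase.client.retries.number (default 10 per the report). */
    public static final int RETRIES = 10;

    /** Number of simulated region lookups performed so far. */
    private static int attempts = 0;

    /** Simulated lookup of a region that is always offline (assumption for the demo). */
    private static void lookupOnce() {
        attempts++;
        throw new IllegalStateException("region offline");
    }

    /** Inner layer: retries the lookup up to innerRetries times, then rethrows the last failure. */
    private static void locateWithRetries(int innerRetries) {
        IllegalStateException last = new IllegalStateException("no attempts made");
        for (int i = 0; i < innerRetries; i++) {
            try {
                lookupOnce();
                return;
            } catch (IllegalStateException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Outer layer: retries the whole inner call up to outerRetries times and reports the lookup count. */
    public static int totalAttempts(int outerRetries, int innerRetries) {
        attempts = 0;
        for (int i = 0; i < outerRetries; i++) {
            try {
                locateWithRetries(innerRetries);
                break;
            } catch (IllegalStateException e) {
                // the outer layer has no idea the inner layer already retried
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println("nested retries: " + totalAttempts(RETRIES, RETRIES));
        System.out.println("inner retry removed: " + totalAttempts(RETRIES, 1));
    }
}
```

With both layers retrying, the simulated client performs 100 lookups before giving up; passing 1 for the inner count, which mimics removing the inner retry logic as the report suggests, gives 10.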