Return-Path:
X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io
Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183])
	by cust-asf2.ponee.io (Postfix) with ESMTP id 90F86200B9F
	for ; Tue, 11 Oct 2016 22:41:22 +0200 (CEST)
Received: by cust-asf.ponee.io (Postfix)
	id 8DFFE160AE6; Tue, 11 Oct 2016 20:41:22 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by cust-asf.ponee.io (Postfix) with SMTP id D4E37160AC3
	for ; Tue, 11 Oct 2016 22:41:21 +0200 (CEST)
Received: (qmail 69142 invoked by uid 500); 11 Oct 2016 20:41:21 -0000
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list issues@hbase.apache.org
Received: (qmail 69131 invoked by uid 99); 11 Oct 2016 20:41:21 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 11 Oct 2016 20:41:21 +0000
Received: from arcas.apache.org (localhost [127.0.0.1])
	by arcas (Postfix) with ESMTP id 8C5D82C4C79
	for ; Tue, 11 Oct 2016 20:41:20 +0000 (UTC)
Date: Tue, 11 Oct 2016 20:41:20 +0000 (UTC)
From: "David Pope (JIRA)"
To: issues@hbase.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Updated] (HBASE-16810) HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers in /hbase/draining znode and unloaded
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
archived-at: Tue, 11 Oct 2016 20:41:22 -0000

     [ https://issues.apache.org/jira/browse/HBASE-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Pope updated HBASE-16810:
-------------------------------
    Attachment: master.patch

The issue is that the wrong value from cluster.regionLocations is sent as the server index to
cluster.getLocalityOfRegion(int region, int server) in LocalityCostFunction.cost(). Specifically, the index of the matching entry in the region's row of cluster.regionLocations is passed instead of the value stored at that index, and it is the value that is the server index. As a result, the calculation is based on an effectively arbitrary server rather than the server hosting the region, so it produces the wrong value even when it doesn't throw an exception.

When region servers are draining, the draining servers are removed from the servers array and a "-1" is stored in their place in cluster.regionLocations. If the entry for the server hosting the region sits at an index in the region's row of cluster.regionLocations that is greater than or equal to the length of the servers array, cluster.getLocalityOfRegion(int region, int server) throws an ArrayIndexOutOfBoundsException.

E.g.,

servers:
[0] = "server0"
[1] = "server1"
[2] = "server2"

regions:
[0] = "region0"
[1] = "region1"
[2] = "region2"

regionIndexToServerIndex:
[0] = 2  // region0 is hosted on server2
[1] = 0
[2] = 1

regionLocations:
[0][0] = 1   // region0 has blocks on server1
[0][1] = 0
[0][2] = -1  // region0 has blocks on a server in draining
[0][3] = 2   // this is the matching entry, but 3 is used as the server index instead of 2 when calling getLocalityOfRegion(int region, int server)

> HBase Balancer throws ArrayIndexOutOfBoundsException when regionservers in /hbase/draining znode and unloaded
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16810
>                 URL: https://issues.apache.org/jira/browse/HBASE-16810
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions: 2.0.0, 1.3.0
>            Reporter: Ashu Pachauri
>            Assignee: David Pope
>         Attachments: master.patch
>
>
> 1. Add a regionserver znode under /hbase/draining znode.
> 2. Use RegionMover to unload all regions from the regionserver.
> 3. Run balancer.
> {code}
> 16/09/21 14:17:33 ERROR ipc.RpcServer: Unexpected throwable object
> java.lang.ArrayIndexOutOfBoundsException: 75
>         at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.getLocalityOfRegion(BaseLoadBalancer.java:867)
>         at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$LocalityCostFunction.cost(StochasticLoadBalancer.java:1186)
>         at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.computeCost(StochasticLoadBalancer.java:521)
>         at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:309)
>         at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:264)
>         at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1339)
>         at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:442)
>         at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58555)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2268)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
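
For readers of the archive, the index mix-up described in the update can be reproduced in isolation. The following is only a rough sketch and is not the HBase source or the attached master.patch: the class LocalityIndexSketch, its methods, and the plain arrays are hypothetical stand-ins for the Cluster fields named in the message, filled with the values from the example (only region0's row of regionLocations is given there).

```java
public class LocalityIndexSketch {
    // Hypothetical stand-ins for the Cluster fields, using the example's values.
    static final String[] SERVERS = {"server0", "server1", "server2"};
    // region0 is hosted on server2, region1 on server0, the last region on server1.
    static final int[] REGION_INDEX_TO_SERVER_INDEX = {2, 0, 1};
    // Row for region0 only; -1 marks a server that is draining.
    static final int[][] REGION_LOCATIONS = {{1, 0, -1, 2}};

    /** Returns the position in the region's row whose value is the hosting server index, or -1. */
    static int positionOfHostingServer(int region) {
        int hosting = REGION_INDEX_TO_SERVER_INDEX[region];
        int[] locations = REGION_LOCATIONS[region];
        for (int j = 0; j < locations.length; j++) {
            if (locations[j] >= 0 && locations[j] == hosting) {
                return j;
            }
        }
        return -1;
    }

    /** Mirrors the reported mistake: the position itself is used as the server index. */
    static String serverNameBuggy(int region) {
        int j = positionOfHostingServer(region);
        if (j < 0) {
            return null;
        }
        // For region0, j is 3 and SERVERS has length 3, so this lookup throws.
        return SERVERS[j];
    }

    /** Uses the value stored at the position, which is the actual server index. */
    static String serverNameFixed(int region) {
        int j = positionOfHostingServer(region);
        if (j < 0) {
            return null;
        }
        return SERVERS[REGION_LOCATIONS[region][j]];
    }

    public static void main(String[] args) {
        System.out.println("fixed lookup: " + serverNameFixed(0));
        try {
            serverNameBuggy(0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buggy lookup threw ArrayIndexOutOfBoundsException");
        }
    }
}
```

In this sketch the position happens to fall outside the servers array, so the mistake is visible as an exception. If the matching entry sat at position 0, 1, or 2 instead, the buggy lookup would silently return an unrelated server, which corresponds to the wrong-but-silent locality calculation mentioned in the update.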