Date: Sat, 22 Oct 2016 06:09:58 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16815) Low scan ratio in RPC queue tuning triggers divide by zero exception

    [ https://issues.apache.org/jira/browse/HBASE-16815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15597264#comment-15597264 ]

Hudson commented on HBASE-16815:
--------------------------------

SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #52 (See [https://builds.apache.org/job/HBase-1.2-JDK7/52/])
HBASE-16815 Low scan ratio in RPC queue tuning triggers divide by zero (matteo.bertozzi: rev 5526c947082ce37e93f0a6c330e6828f2fadaede)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RWQueueRpcExecutor.java


> Low scan ratio in RPC queue tuning triggers divide by zero exception
> --------------------------------------------------------------------
>
>                 Key: HBASE-16815
>                 URL: https://issues.apache.org/jira/browse/HBASE-16815
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, rpc
>    Affects Versions: 2.0.0, 1.3.0
>            Reporter: Lars George
>            Assignee: Guanghao Zhang
>             Fix For: 2.0.0, 1.4.0, 1.2.4, 1.1.8
>
>         Attachments: HBASE-16815-branch-1.2.patch, HBASE-16815.patch
>
>
> Trying the following settings:
> {noformat}
> <property>
>   <name>hbase.ipc.server.callqueue.handler.factor</name>
>   <value>0.5</value>
> </property>
> <property>
>   <name>hbase.ipc.server.callqueue.read.ratio</name>
>   <value>0.5</value>
> </property>
> <property>
>   <name>hbase.ipc.server.callqueue.scan.ratio</name>
>   <value>0.1</value>
> </property>
> {noformat}
> With 30 default handlers, this means 15 queues. Further, it means 8 write queues and 7 read queues. 10% of the 7 read queues is {{0.7}}, which is then floored to {{0}}. The debug log confirms it, as the ternary check omits the scan details when they are zero:
> {noformat}
> 2016-10-12 12:50:27,305 INFO  [main] ipc.SimpleRpcScheduler: Using fifo as user call queue, count=15
> 2016-10-12 12:50:27,311 DEBUG [main] ipc.RWQueueRpcExecutor: FifoRWQ.default writeQueues=7 writeHandlers=15 readQueues=8 readHandlers=14
> {noformat}
> But the code in {{RWQueueRpcExecutor}} calls {{RpcExecutor.startHandlers()}} nevertheless, and that does this:
> {code}
>     for (int i = 0; i < numHandlers; i++) {
>       final int index = qindex + (i % qsize);
>       String name = "RpcServer." + threadPrefix + ".handler=" + handlers.size() + ",queue=" +
>           index + ",port=" + port;
> {code}
> With {{qsize}} equal to zero, the modulo then triggers:
> {noformat}
> 2016-10-12 11:41:22,810 ERROR [main] master.HMasterCommandLine: Master exiting
> java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster
>         at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
>         at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:220)
>         at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:155)
>         at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:222)
>         at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:137)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
>         at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2524)
> Caused by: java.lang.ArithmeticException: / by zero
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.startHandlers(RpcExecutor.java:125)
>         at org.apache.hadoop.hbase.ipc.RWQueueRpcExecutor.startHandlers(RWQueueRpcExecutor.java:178)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.start(RpcExecutor.java:78)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.start(SimpleRpcScheduler.java:272)
>         at org.apache.hadoop.hbase.ipc.RpcServer.start(RpcServer.java:2212)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.start(RSRpcServices.java:1143)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:615)
>         at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:396)
>         at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.<init>(HMasterCommandLine.java:312)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>         at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:140)
>         ... 7 more
> {noformat}
> That causes the server to not even start. I would suggest we either skip the {{startHandlers()}} call altogether when there are no scan queues, or make it zero aware.
> Another possible option is to reserve at least _one_ scan handler/queue when the scan ratio is greater than zero, but only if there is more than one read handler/queue to begin with. Otherwise the scan handler/queue count should be zero and scans should share the one read handler/queue.
> Makes sense?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
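
For readers following the arithmetic in the quoted report, here is a minimal, self-contained sketch of the failure and of the two remedies the reporter suggests. It is not the HBase code and not the committed patch: the class `ScanQueueSketch` and the methods `naiveScanQueues`, `reservedScanQueues`, and `assignQueues` are hypothetical names invented for illustration. Only the expression `qindex + (i % qsize)` and the numbers (7 read queues, scan ratio 0.1) come from the report. The handler counts in the debug log (15 write, 14 read, out of 30) would be consistent with one handler being set aside for scans while the scan queue count is zero, but that is an inference from the log, not something the report states.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and structure are assumptions,
// not the actual RWQueueRpcExecutor / RpcExecutor implementation.
class ScanQueueSketch {

    // Split described in the report: floor(readQueues * scanRatio).
    // With 7 read queues and a ratio of 0.1 this floors 0.7 down to 0.
    static int naiveScanQueues(int readQueues, float scanRatio) {
        return Math.max(0, (int) Math.floor(readQueues * scanRatio));
    }

    // Reporter's alternative: reserve at least one scan queue when the ratio
    // is greater than zero, but only if there is more than one read queue.
    // At least one queue is always left for non-scan reads.
    static int reservedScanQueues(int readQueues, float scanRatio) {
        if (scanRatio <= 0f || readQueues <= 1) {
            return 0; // scans share the read queue(s)
        }
        int floored = (int) Math.floor(readQueues * scanRatio);
        return Math.min(readQueues - 1, Math.max(1, floored));
    }

    // Zero-aware version of the round-robin from the quoted loop: returns the
    // queue index for each handler, or an empty array when there is nothing
    // to start, so "i % qsize" is never evaluated with qsize == 0.
    static int[] assignQueues(int numHandlers, int qindex, int qsize) {
        if (numHandlers <= 0 || qsize <= 0) {
            return new int[0];
        }
        int[] indexes = new int[numHandlers];
        for (int i = 0; i < numHandlers; i++) {
            indexes[i] = qindex + (i % qsize);
        }
        return indexes;
    }

    public static void main(String[] args) {
        int readQueues = 7;
        float scanRatio = 0.1f;

        System.out.println("naive scan queues:    " + naiveScanQueues(readQueues, scanRatio));
        System.out.println("reserved scan queues: " + reservedScanQueues(readQueues, scanRatio));

        // Unguarded modulo with zero queues reproduces the exception type from the trace.
        int qsize = naiveScanQueues(readQueues, scanRatio);
        try {
            int index = 15 + (0 % qsize);
            System.out.println("unexpected index: " + index);
        } catch (ArithmeticException e) {
            System.out.println("unguarded modulo failed: " + e.getMessage());
        }

        // The guarded variant simply starts no handlers.
        System.out.println("guarded assignment:   " + Arrays.toString(assignQueues(1, 15, qsize)));
    }
}
```

The guard in `assignQueues` corresponds to the "make it zero aware" suggestion, while `reservedScanQueues` corresponds to the "reserve at least one" suggestion. Either one alone avoids the modulo by zero in this sketch; which of them (if either) matches the committed fix is not stated in this message.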