Date: Wed, 12 Feb 2014 04:21:19 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10506) Fail-fast if client connection is lost before the real call be executed in RPC layer

[ https://issues.apache.org/jira/browse/HBASE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898773#comment-13898773 ]

Lars Hofhansl commented on HBASE-10506:
---------------------------------------

Looks good to me.
+1

> Fail-fast if client connection is lost before the real call be executed in RPC layer
> ------------------------------------------------------------------------------------
>
> Key: HBASE-10506
> URL: https://issues.apache.org/jira/browse/HBASE-10506
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 0.94.3
> Reporter: Liang Xie
> Assignee: Liang Xie
> Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17
>
> Attachments: HBASE-10506-0.94.txt, HBASE-10506-trunk.txt
>
>
> In the current HBase RPC implementation, the connection is not re-checked just before the "call" is invoked. Given a GC pause, OS scheduling delays, or a nearly full call queue (e.g. the server side is slow or hung due to some issue), and a small RPC timeout on the client side, the client connection may already be lost by the time the request is taken from the call queue. We should add a fail-fast check before the real "call" is invoked; otherwise executing it just wastes server-side resources.
> Here is a stack trace from our production env:
> 2014-02-11,18:16:19,525 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call get([B@3eae6c77, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"X":["T"]},"maxVersions":1,"row":"074103000000001-m8997060"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43252: output error
> 2014-02-11,18:16:19,526 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 151 on 12600 caught a ClosedChannelException, this means that the server was processing a request but the client went away.
The error message was: null
> 2014-02-11,18:16:19,797 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call get([B@3f10ffd2, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"X":["T"]},"maxVersions":1,"row":"4245978-m7281526"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43259 after 0 ms, since caller disconnected
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:450)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3633)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3590)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3615)
>         at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4414)
>         at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4387)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2075)
>         at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:460)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1457)
> 2014-02-11,18:16:19,802 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call get([B@3f10ffd2, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"X":["T"]},"maxVersions":1,"row":"4245978-m7281526"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43259: output error
> 2014-02-11,18:16:19,802 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 46 on 12600 caught a ClosedChannelException, this means that the server was processing a
request but the client went away. The error message was: null
> With this fix, we can at least reduce the probability of hitting this :) Upstream Hadoop already has this check, see: https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L2034-L2036

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
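
For illustration only, the fail-fast idea described in the issue could be sketched roughly as below. This is not the attached patch or the actual HBase/Hadoop code: `FailFastHandlerSketch`, `FakeChannel`, `Call`, and `drain` are hypothetical names made up for this example, and the only real API relied on is `java.nio.channels.Channel.isOpen()`. The sketch assumes a handler that takes calls off a queue and checks the client channel right before doing the work.

```java
import java.nio.channels.Channel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FailFastHandlerSketch {

    /** Hypothetical in-memory channel; a real server would hold a socket channel. */
    public static final class FakeChannel implements Channel {
        private volatile boolean open = true;
        @Override public boolean isOpen() { return open; }
        @Override public void close() { open = false; }
    }

    /** Hypothetical stand-in for a queued RPC call and its client connection. */
    public static final class Call {
        final String name;
        final Channel channel;
        public Call(String name, Channel channel) {
            this.name = name;
            this.channel = channel;
        }
    }

    /**
     * Drains the queue the way a handler thread might, skipping calls whose
     * client channel is already closed. Returns the names of executed calls.
     */
    public static List<String> drain(BlockingQueue<Call> queue) {
        List<String> executed = new ArrayList<>();
        Call call;
        while ((call = queue.poll()) != null) {
            if (!call.channel.isOpen()) {
                // Fail fast: the client is gone, so doing the work would be wasted.
                System.out.println("skipped " + call.name + ": client disconnected");
                continue;
            }
            executed.add(call.name); // placeholder for the real call execution
        }
        return executed;
    }

    public static void main(String[] args) {
        BlockingQueue<Call> queue = new LinkedBlockingQueue<>();
        FakeChannel gone = new FakeChannel();
        queue.add(new Call("get-1", new FakeChannel()));
        queue.add(new Call("get-2", gone));
        queue.add(new Call("get-3", new FakeChannel()));
        gone.close(); // simulates a client timing out while its call waits in the queue
        System.out.println("executed " + drain(queue));
    }
}
```

A check like this only narrows the window: a client can still disconnect after the check and before the response is written, which matches the issue's wording that the fix reduces the probability rather than eliminating it.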