Date: Wed, 1 Jul 2015 02:30:05 +0000 (UTC)
From: "Kurt Young (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13960) HConnection stuck with UnknownHostException

    [ https://issues.apache.org/jira/browse/HBASE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609468#comment-14609468 ]

Kurt Young commented on HBASE-13960:
------------------------------------

Sorry for the typo: path -> patch.

> HConnection stuck with UnknownHostException
> --------------------------------------------
>
>                 Key: HBASE-13960
>                 URL: https://issues.apache.org/jira/browse/HBASE-13960
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.8
>            Reporter: Kurt Young
>         Attachments: 1.patch, HBASE-13960-v1.patch, HBASE-13960-v1.patch-0.98
>
>
> When doing a put/get against HBase, if a temporary DNS failure prevents the client from resolving a RegionServer's host, the error never recovers: every later put/get fails with UnknownHostException.
> I checked the code, and the reason may be:
> 1. When RegionServerCallable or MultiServerCallable runs prepare(), it gets a ClientService.BlockingInterface stub from HConnection.
> 2. HConnectionImplementation::getClient caches the stub with a BlockingRpcChannelImplementation.
> 3. The BlockingRpcChannelImplementation constructor does
>     this.isa = new InetSocketAddress(sn.getHostname(), sn.getPort());
>    If a temporary DNS failure happens at that moment, the "address" in isa will be null.
> 4. When we then launch the real RPC call, the stack is:
> Caused by: java.net.UnknownHostException: unknown host: xxx.host2
>         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
>         at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
>         at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1523)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435)
>         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
> Besides, I noticed there is a guard in RpcClient:
>     if (remoteId.getAddress().isUnresolved()) {
>         throw new UnknownHostException("unknown host: " + remoteId.getAddress().getHostName());
>     }
> Shouldn't we do something when this situation occurs?
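
To illustrate step 3 and the RpcClient guard from the description: below is a minimal sketch, not the attached patch and not HBase code, of why a cached InetSocketAddress stays unresolved and how a caller could rebuild it before giving up. The class and method names (StaleAddressSketch, refreshIfUnresolved) and the port 60020 are made up for illustration; only the java.net.InetSocketAddress calls are real. An IP literal stands in for the host name so the example runs without a DNS lookup.

```java
import java.net.InetSocketAddress;

// Hypothetical sketch; names are illustrative and not taken from HBase.
class StaleAddressSketch {

    // An InetSocketAddress never re-resolves itself. If the lookup failed when
    // it was constructed, the only way to retry is to build a new instance.
    static InetSocketAddress refreshIfUnresolved(InetSocketAddress isa) {
        if (!isa.isUnresolved()) {
            return isa; // already resolved, keep the cached instance
        }
        // This constructor attempts resolution again; the result is still
        // unresolved if the lookup fails again.
        return new InetSocketAddress(isa.getHostString(), isa.getPort());
    }

    public static void main(String[] args) {
        // Stand-in for an address whose lookup failed at construction time.
        InetSocketAddress cached = InetSocketAddress.createUnresolved("127.0.0.1", 60020);
        System.out.println("cached unresolved: " + cached.isUnresolved());
        System.out.println("cached address is null: " + (cached.getAddress() == null));

        // Rebuilding the address gives resolution a second chance.
        InetSocketAddress fresh = refreshIfUnresolved(cached);
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

Whether such a retry belongs in the channel constructor, in the stub cache, or next to the isUnresolved() guard is a separate design question; the sketch only shows that the stale object has to be replaced, since it cannot be refreshed in place.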