Date: Thu, 28 Mar 2013 18:49:17 +0000 (UTC)
From: "Jagane Sundar (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Created] (HDFS-4646) createNNProxyWithClientProtocol ignores configured timeout value

Jagane Sundar created HDFS-4646:
-----------------------------------

             Summary: createNNProxyWithClientProtocol ignores configured timeout value
                 Key: HDFS-4646
                 URL: https://issues.apache.org/jira/browse/HDFS-4646
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.0.3-alpha, 3.0.0, 2.0.4-alpha
         Environment: Linux
            Reporter: Jagane Sundar
            Priority: Minor
             Fix For: 3.0.0, 2.0.4-alpha

The Client RPC I/O timeout mechanism appears to be configured by two core-site.xml parameters:
1. A boolean, ipc.client.ping
2. A numeric value, ipc.ping.interval

If ipc.client.ping is true, then we send an RPC ping every ipc.ping.interval milliseconds.
If ipc.client.ping is false, then ipc.ping.interval turns into the socket timeout value.

The bug here is that while creating a non-HA proxy, the configured timeout value is ignored, and 0 is passed in. 0 is taken to mean 'wait forever', and the client RPC socket never times out.

Note that this bug is reproducible only in the case where the NN machine dies, i.e. the TCP stack with the NN IP address stops responding completely. The code does not take this path when you do a 'kill -9' of the NN process, since there is a TCP stack that is alive and sends out a TCP RST to the client, and that results in a socket error (not a timeout).

The fix is to pass in the correct configured value for timeout by calling Client.getTimeout(conf) instead of passing in 0.
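
For illustration, the timeout selection described in this report can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Hadoop's code: RpcTimeoutSketch and its methods are hypothetical names, java.util.Properties stands in for the Hadoop configuration object, the default values are assumed, and the ping-enabled case is left unmodelled because the report does not say which socket timeout applies there.

```java
import java.util.OptionalInt;
import java.util.Properties;

// Illustrative stand-in only; this is NOT Hadoop's Client or Configuration class.
final class RpcTimeoutSketch {
    // Key names as quoted in the report.
    static final String PING_KEY = "ipc.client.ping";
    static final String PING_INTERVAL_KEY = "ipc.ping.interval";

    // Assumed defaults for this sketch; check the defaults of your Hadoop version.
    static final boolean ASSUMED_PING_DEFAULT = true;
    static final int ASSUMED_PING_INTERVAL_MS = 60_000;

    // Per the report, a timeout of 0 is taken to mean "wait forever".
    static final int WAIT_FOREVER = 0;

    static boolean pingEnabled(Properties conf) {
        String raw = conf.getProperty(PING_KEY);
        return raw == null ? ASSUMED_PING_DEFAULT : Boolean.parseBoolean(raw.trim());
    }

    static int pingIntervalMs(Properties conf) {
        String raw = conf.getProperty(PING_INTERVAL_KEY);
        return raw == null ? ASSUMED_PING_INTERVAL_MS : Integer.parseInt(raw.trim());
    }

    // Behaviour the report describes as the bug: the configuration is ignored.
    static int buggyTimeoutMs(Properties conf) {
        return WAIT_FOREVER;
    }

    // Behaviour the report asks for when pings are disabled: the ping interval
    // becomes the socket timeout. Returns empty when pings are enabled, since
    // this sketch does not model that case.
    static OptionalInt configuredTimeoutMs(Properties conf) {
        if (pingEnabled(conf)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(pingIntervalMs(conf));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PING_KEY, "false");
        conf.setProperty(PING_INTERVAL_KEY, "5000");

        System.out.println("buggy timeout (ms): " + buggyTimeoutMs(conf));
        System.out.println("configured timeout (ms): " + configuredTimeoutMs(conf));
    }
}
```

With pings disabled and an interval of 5000, the buggy path still yields 0 (wait forever), while the configured path yields the interval as the socket timeout, which is what lets a client notice an NN machine that has stopped responding entirely.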