Date: Sun, 7 Aug 2011 00:25:27 +0000 (UTC)
From: "Konstantin Shvachko (JIRA)"
To: common-issues@hadoop.apache.org
Message-ID: <1948653097.14272.1312676727124.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1720286243.52045.1302604686655.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HADOOP-7488) When Namenode network is unplugged, DFSClient operations waits for ever

    [ https://issues.apache.org/jira/browse/HADOOP-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080491#comment-13080491 ]

Konstantin Shvachko commented on HADOOP-7488:
---------------------------------------------

If {{rpcTimeout > 0}} then {{handleTimeout()}} will throw {{SocketTimeoutException}} instead of going into the ping loop. Can you get the required behavior by setting {{rpcTimeout > 0}} rather than introducing a limit on the number of pings?

DataNodes and TaskTrackers are designed to ping the NN and JT indefinitely, because during startup you cannot predict when the NN will come online, as that depends on the size of the image and edits. Also, when the NN becomes busy it is important for DNs to keep retrying rather than assuming the NN is dead.

For DFSClient this may make sense, but I think they already time out. At least DFSShell ls does. And even if they don't, this should be an HDFS change, not a generic IPC change, which affects many Hadoop components.

As for HA, I don't know what you did for HA and therefore cannot understand what problem you are trying to solve here. I can guess that you want DNs to switch to another NN when they time out rather than retrying. In this case you should be able to use rpcTimeout.

> When Namenode network is unplugged, DFSClient operations waits for ever
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-7488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7488
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HADOOP-7488.patch
>
>
> When the NN/DN is shut down gracefully, DFSClient operations that are waiting for a response from the NN/DN throw an exception and return quickly.
> But when the NN/DN network is unplugged, DFSClient operations that are waiting for a response from the NN/DN wait forever.
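
To make the rpcTimeout suggestion in the comment concrete, here is a minimal, self-contained Java sketch of the two behaviors it contrasts: rethrowing the read timeout when rpcTimeout > 0, and pinging and waiting again otherwise. It is not the Hadoop IPC code. The class RpcTimeoutSketch, the Read interface, readWithPing() and flakyServer() are hypothetical names invented for illustration; only rpcTimeout, handleTimeout() and SocketTimeoutException come from the comment.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Illustrative only: hypothetical names, not the Hadoop IPC client.
final class RpcTimeoutSketch {
    /** Hypothetical stand-in for a blocking socket read. */
    interface Read {
        int read() throws IOException;
    }

    private final int rpcTimeout; // milliseconds; 0 or less means "no RPC timeout"
    private int pingsSent = 0;

    RpcTimeoutSketch(int rpcTimeout) {
        this.rpcTimeout = rpcTimeout;
    }

    int pingsSent() {
        return pingsSent;
    }

    /** Rethrows the timeout when rpcTimeout > 0; otherwise records a ping. */
    void handleTimeout(SocketTimeoutException e) throws SocketTimeoutException {
        if (rpcTimeout > 0) {
            throw e;
        }
        pingsSent++; // stand-in for writing a ping to the server
    }

    /** Retries the read after each timeout until it succeeds or handleTimeout rethrows. */
    int readWithPing(Read in) throws IOException {
        while (true) {
            try {
                return in.read();
            } catch (SocketTimeoutException e) {
                handleTimeout(e);
            }
        }
    }

    /** Simulated server that times out a fixed number of times, then replies 42. */
    static Read flakyServer(int timeoutsBeforeReply) {
        int[] remaining = {timeoutsBeforeReply};
        return () -> {
            if (remaining[0]-- > 0) {
                throw new SocketTimeoutException("simulated read timeout");
            }
            return 42;
        };
    }

    public static void main(String[] args) throws IOException {
        // No RPC timeout: keep pinging until the server answers.
        RpcTimeoutSketch pinging = new RpcTimeoutSketch(0);
        int reply = pinging.readWithPing(flakyServer(3));
        System.out.println(reply + " after " + pinging.pingsSent() + " pings");

        // RPC timeout set: the first read timeout reaches the caller.
        RpcTimeoutSketch bounded = new RpcTimeoutSketch(5000);
        try {
            bounded.readWithPing(flakyServer(3));
        } catch (SocketTimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

With rpcTimeout at 0 the loop only ends when the server replies, which matches the indefinite pinging described for DataNodes and TaskTrackers. With a positive rpcTimeout the caller sees the exception and can decide what to do next, for example try another NN.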