Date: Mon, 5 Mar 2012 19:21:58 +0000 (UTC)
From: "Kihwal Lee (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <1403956945.23443.1330975318100.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <24185553.5412.1330559038623.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-3032) Lease renewer tries forever even if renewal is not possible

[
https://issues.apache.org/jira/browse/HDFS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222530#comment-13222530 ]

Kihwal Lee commented on HDFS-3032:
----------------------------------

Thanks for the review, Nicholas. In that case, we can add the retry limit in LeaseRenewer, at the point where IOException is caught and the renewal is retried forever. After all, it doesn't make sense to keep trying to renew after HdfsConstants.LEASE_SOFTLIMIT_PERIOD has passed.

I will upload a new patch soon. I am adding a test case for the limited retry right now.

> Lease renewer tries forever even if renewal is not possible
> -----------------------------------------------------------
>
>                 Key: HDFS-3032
>                 URL: https://issues.apache.org/jira/browse/HDFS-3032
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.23.0, 0.24.0, 0.23.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 0.23.2, 0.23.3
>
>         Attachments: hdfs-3032.patch.txt
>
>
> When LeaseRenewer gets an IOException while attempting to renew for a client, it retries after sleeping 500ms. If the exception is caused by a condition that will never change, it keeps talking to the name node until the DFSClient object is closed or aborted. With the FileSystem cache, a DFSClient can stay alive for a very long time. We've seen cases in which node managers and long-living jobs flood the name node with this type of call.
> The current proposal is to abort the client when RemoteException is caught during renewal. LeaseRenewer already aborts all clients when it sees a SocketTimeoutException.
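
For readers following the thread, the retry limit discussed in the comment could look roughly like the sketch below. This is not the HDFS-3032 patch and not the real LeaseRenewer code. The class BoundedLeaseRenewal, the RenewCall interface, and the renewWithLimit method are hypothetical names made up for illustration, and the time limit is a plain parameter standing in for HdfsConstants.LEASE_SOFTLIMIT_PERIOD. As a simplification, the clock starts at the first attempt rather than at the last successful renewal.

```java
import java.io.IOException;

// Hypothetical sketch only; none of these names come from Hadoop.
public class BoundedLeaseRenewal {

    /** Stand-in for the renewal RPC, which may fail with an IOException. */
    @FunctionalInterface
    public interface RenewCall {
        void renew() throws IOException;
    }

    /**
     * Retries the renewal until it succeeds or softLimitMs has elapsed.
     * Once the limit has passed, runs the abort action and gives up
     * instead of retrying forever.
     *
     * @return true if the renewal succeeded, false if it was abandoned
     */
    public static boolean renewWithLimit(RenewCall renewCall, Runnable abort,
                                         long softLimitMs, long retrySleepMs) {
        final long startNanos = System.nanoTime();
        while (true) {
            try {
                renewCall.renew();
                return true;
            } catch (IOException e) {
                long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
                if (elapsedMs >= softLimitMs) {
                    // Renewing past the soft limit is pointless: abort the client.
                    abort.run();
                    return false;
                }
            }
            try {
                // The issue description mentions a 500ms sleep between retries.
                Thread.sleep(retrySleepMs);
            } catch (InterruptedException ie) {
                // Preserve the interrupt status and stop retrying.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller would pass the renewal RPC as the first argument and the client's abort action as the second. With a permanently failing renewal, the loop ends after roughly softLimitMs and aborts once, so the name node is no longer contacted indefinitely.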