From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Fri, 9 May 2008 08:44:55 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-3232) Datanodes time out
Message-ID: <1229080450.1210347895777.JavaMail.jira@brutus>
In-Reply-To: <555922254.1207839487710.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595645#action_12595645 ]

Raghu Angadi commented on HADOOP-3232:
--------------------------------------

Regarding the patch,

> It does change the behavior a bit, but in most cases it shouldn't be a problem.

I haven't looked at it properly yet; could you describe what the change in behavior is? Also, I'm not sure why it needs to change the Shell stuff. Could the desired behavior for DF be implemented in the DF class?

> Datanodes time out
> ------------------
>
>                 Key: HADOOP-3232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.2
>         Environment: 10 node cluster + 1 namenode
>            Reporter: Johan Oskarsson
>            Priority: Critical
>             Fix For: 0.18.0
>
>         Attachments: du-nonblocking-v1.patch, hadoop-hadoop-datanode-new.log, hadoop-hadoop-datanode-new.out, hadoop-hadoop-datanode.out, hadoop-hadoop-namenode-master2.out
>
>
> I recently upgraded to 0.16.2 from 0.15.2 on our 10 node cluster.
> Unfortunately we're seeing datanode timeout issues. In previous versions we've often seen in the nn webui that one or two datanodes' "last contact" goes from the usual 0-3 sec to ~200-300 before it drops down to 0 again.
> This causes mild discomfort, but the big problems appear when all nodes do this at once, as happened a few times after the upgrade.
> It was suggested that this could be due to namenode garbage collection, but looking at the gc log output it doesn't seem to be the case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.