Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <1451903823.1210752355732.JavaMail.jira@brutus>
Date: Wed, 14 May 2008 01:05:55 -0700 (PDT)
From: "Raghu Angadi (JIRA)" <jira@apache.org>
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3232) Datanodes time out
In-Reply-To: <555922254.1207839487710.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596666#action_12596666 ] 

Raghu Angadi commented on HADOOP-3232:
--------------------------------------

> I'm not sure what you mean by not having a permanent thread. How would we update the value without blocking on getUsed in that case?

This thread will stay idle most of the time. So we could start a thread inside getUsed() (and possibly in other accessor methods) if the interval has passed, and make the thread exit after running du. This is no less accurate than the current implementation. Or you could schedule a periodic task using a Java 'Executor'. Even if you do keep the persistent thread, could you add a comment if you agree that it need not be persistent? We might implement that later.
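
A minimal sketch of the on-demand variant described above, not the code from the attached patches. The class name OnDemandUsage, the LongSupplier that stands in for running du, and the intervalMs parameter are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch: caches a usage value and refreshes it with a short-lived thread. */
class OnDemandUsage {
    private final LongSupplier measure;   // assumption: stands in for running `du`
    private final long intervalMs;        // minimum time between refreshes
    private final AtomicLong used = new AtomicLong();
    private final AtomicBoolean refreshing = new AtomicBoolean(false);
    private volatile long lastRefreshMs;

    OnDemandUsage(LongSupplier measure, long intervalMs) {
        this.measure = measure;
        this.intervalMs = intervalMs;
        this.used.set(measure.getAsLong());   // one blocking measurement at startup
        this.lastRefreshMs = System.currentTimeMillis();
    }

    /** Returns the last known value; never waits for a measurement to finish. */
    long getUsed() {
        long now = System.currentTimeMillis();
        // Start at most one refresh thread, and only once the interval has passed.
        if (now - lastRefreshMs >= intervalMs && refreshing.compareAndSet(false, true)) {
            Thread t = new Thread(() -> {
                try {
                    used.set(measure.getAsLong());
                    lastRefreshMs = System.currentTimeMillis();
                } finally {
                    refreshing.set(false);    // thread exits right after this
                }
            }, "usage-refresh");
            t.setDaemon(true);
            t.start();
        }
        return used.get();
    }
}
```

With this shape no thread exists between refreshes, and a caller sees a value that is at most one interval plus one du run old, which matches what a persistent refresh thread would provide.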

Regarding the patch:
# 'lock' is not required. You can synchronize on DU.this.
# Also, DURefreshThread should either be a static class or not keep a ref to du.
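
To illustrate the two points above, here is a hedged sketch rather than the patch itself. DiskUsage, InnerRefresher and StaticRefresher are hypothetical names standing in for DU and DURefreshThread, and the measured value is passed in directly instead of being read from du.

```java
/** Hypothetical stand-in for the DU class discussed in the patch. */
class DiskUsage {
    private long used;   // guarded by this DiskUsage instance's monitor

    synchronized long getUsed() { return used; }
    synchronized void setUsed(long value) { used = value; }

    /** Option 1: inner class. It already carries an implicit DiskUsage.this. */
    class InnerRefresher implements Runnable {
        private final long measured;
        InnerRefresher(long measured) { this.measured = measured; }
        @Override public void run() {
            // Same monitor as the synchronized methods, so no separate lock object.
            synchronized (DiskUsage.this) {
                used = measured;
            }
        }
    }

    /** Option 2: static nested class. The outer instance is passed in explicitly. */
    static class StaticRefresher implements Runnable {
        private final DiskUsage du;
        private final long measured;
        StaticRefresher(DiskUsage du, long measured) {
            this.du = du;
            this.measured = measured;
        }
        @Override public void run() { du.setUsed(measured); }
    }
}
```

Either option works; what is redundant is a non-static inner class that also stores its own explicit reference to the outer object.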

> Datanodes time out
> ------------------
>
>                 Key: HADOOP-3232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.2
>         Environment: 10 node cluster + 1 namenode
>            Reporter: Johan Oskarsson
>            Priority: Critical
>             Fix For: 0.18.0
>
>         Attachments: du-nonblocking-v1.patch, du-nonblocking-v2-trunk.patch, du-nonblocking-v4-trunk.patch, hadoop-hadoop-datanode-new.log, hadoop-hadoop-datanode-new.out, hadoop-hadoop-datanode.out, hadoop-hadoop-namenode-master2.out
>
>
> I recently upgraded to 0.16.2 from 0.15.2 on our 10 node cluster.
> Unfortunately we're seeing datanode timeout issues. In previous versions we've often seen in the nn webui that one or two datanodes' "last contact" goes from the usual 0-3 sec to ~200-300 before it drops down to 0 again.
> This causes mild discomfort but the big problems appear when all nodes do this at once, as happened a few times after the upgrade.
> It was suggested that this could be due to namenode garbage collection, but looking at the gc log output it doesn't seem to be the case.
