hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3232) Datanodes time out
Date Fri, 09 May 2008 17:08:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595673#action_12595673 ]

Doug Cutting commented on HADOOP-3232:

Oops.  I posted my previous comment before I saw your last comment.  You're
right, I was confusing DU and DF.  And I missed the refresh in the ctor.
Sorry!  That resolves most of my concerns.  Since this is called much more
frequently than I was thinking, it makes sense for it not to be synchronous.
I think my only remaining concern is the default interval, which you've said
you'd address.  Thanks!

> Datanodes time out
> ------------------
>                 Key: HADOOP-3232
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3232
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.2
>         Environment: 10 node cluster + 1 namenode
>            Reporter: Johan Oskarsson
>            Priority: Critical
>             Fix For: 0.18.0
>         Attachments: du-nonblocking-v1.patch, du-nonblocking-v2-trunk.patch, hadoop-hadoop-datanode-new.log, hadoop-hadoop-datanode-new.out, hadoop-hadoop-datanode.out, hadoop-hadoop-namenode-master2.out
>
> I recently upgraded to 0.16.2 from 0.15.2 on our 10 node cluster.
> Unfortunately we're seeing datanode timeout issues. In previous versions
> we've often seen in the nn webui that one or two datanodes "last contact"
> goes from the usual 0-3 sec to ~200-300 before it drops down to 0 again.
> This causes mild discomfort but the big problems appear when all nodes do
> this at once, as happened a few times after the upgrade.
> It was suggested that this could be due to namenode garbage collection,
> but looking at the gc log output it doesn't seem to be the case.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.