hadoop-common-dev mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4517) unstable dfs when running jobs on 0.18.1
Date Fri, 24 Oct 2008 20:27:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642531#action_12642531

Christian Kunz commented on HADOOP-4517:

The thread dump was taken a long time (about 10 hours) after the last log message containing
the exception above for this datanode.

In general, from what I observed over the whole job, there was an unusually high number of write
errors, including reduce task failures, compared to 0.17.2. In the 12 hours leading up to the
last exception, there were 300+ exceptions like the one above on this datanode alone. I checked a
datanode that did not die, and it showed a similar order of magnitude of exceptions.

> unstable dfs when running jobs on 0.18.1
> ----------------------------------------
>                 Key: HADOOP-4517
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4517
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.1
>         Environment: hadoop-0.18.1 plus patches HADOOP-4277 HADOOP-4271 HADOOP-4326 HADOOP-4314
>            Reporter: Christian Kunz
>         Attachments: datanode.out
> 2 attempts of a job using 6000 maps and 1900 reduces:
> 1st attempt: failed during the reduce phase after 22 hours with 31 dead datanodes, most of
> which became unresponsive due to an exception; dfs lost blocks.
> 2nd attempt: failed during the map phase after 5 hours with 5 dead datanodes due to an exception;
> dfs lost blocks, which caused the job failure.
> I will post a typical datanode exception and attach a thread dump.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
