hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1296) Tasks fail after the first disk (/grid/0/) of all TTs reaches 100%, even though other disks still have space.
Date Tue, 15 Dec 2009 17:02:18 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790820#action_12790820 ]

Hong Tang commented on MAPREDUCE-1296:
--------------------------------------

The one slight difference I can see between the two descriptions is that this Jira states
that the disk that fills up is deterministic (either the first disk in the list of disks,
or the disk that is also configured to store logs).

> Tasks fail after the first disk (/grid/0/) of all TTs reaches 100%, even though other disks still have space.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1296
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1296
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.20.2
>            Reporter: Iyappan Srinivasan
>
> Tasks fail after the first disk (/grid/0/) of all TTs reaches 100%, even though other disks still have space.
> In a cluster, data is distributed almost uniformly. Disk /grid/0/ reaches 100% first because it also holds extra data such as logs. After it reaches 100%, tasks start to fail with the error:
> java.lang.Throwable: Child Error
> 	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:516)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> 	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:503)
> This happens even though the other disks are only at 80% and still have room for more data.
> Steps to reproduce:
> 1) Bring up a cluster with the Linux task controller.
> 2) Start filling the DFS with data using randomwriter or teragen.
> 3) Once the first disk reaches 100%, tasks start to fail.
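
The behavior the report expects (keep using the disks that still have space once /grid/0/ is full) can be illustrated with a small sketch. This is not Hadoop's code and is not taken from the issue: the class name LocalDirPicker, the method pickDir, and the paths /grid/1/tmp and /grid/2/tmp are hypothetical, and it relies only on the standard java.io.File API.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not part of Hadoop.
public class LocalDirPicker {

    /**
     * Returns the first directory in the list that exists and reports at
     * least requiredBytes of usable space, or an empty Optional if none does.
     */
    public static Optional<File> pickDir(List<File> dirs, long requiredBytes) {
        for (File dir : dirs) {
            // A full or missing directory is skipped instead of failing the caller.
            if (dir.isDirectory() && dir.getUsableSpace() >= requiredBytes) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // /grid/0/ comes from the report; the other paths are assumed for the example.
        List<File> dirs = Arrays.asList(
                new File("/grid/0/tmp"),
                new File("/grid/1/tmp"),
                new File("/grid/2/tmp"));

        long required = 64L * 1024 * 1024; // assumed space needed by one task, 64 MB

        Optional<File> chosen = pickDir(dirs, required);
        if (chosen.isPresent()) {
            System.out.println("Using " + chosen.get());
        } else {
            System.out.println("No local directory has enough free space");
        }
    }
}
```

With a check of this kind, a full first disk would cause the next directory in the list to be chosen, rather than the task exiting with a nonzero status.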

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

