hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5241) Reduce tasks get stuck because of over-estimated task size (regression from 0.18)
Date Mon, 23 Feb 2009 09:03:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675826#action_12675826 ]

Hadoop QA commented on HADOOP-5241:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12400731/5241_v1.patch
  against trunk revision 746864.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3901/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3901/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3901/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3901/console

This message is automatically generated.

> Reduce tasks get stuck because of over-estimated task size (regression from 0.18)
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-5241
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5241
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>         Environment: Red Hat Enterprise Linux Server release 5.2
> JDK 1.6.0_11
> Hadoop 0.19.0
>            Reporter: Andy Pavlo
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.19.1
>
>         Attachments: 5241_v1.patch, 5241_v1.patch, hadoop-jobtracker.log.gz, hadoop-patched-jobtracker.log.gz, hadoop_task_screenshot.png
>
>
> I have a simple MR benchmark job that computes PageRank on about 600 GB of HTML files
> using a 100 node cluster. For some reason, my reduce tasks get caught in a pending state.
> The JobTracker's log gets filled with the following messages:
> 2009-02-12 15:47:29,839 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_d-59.cs.wisc.edu:localhost/127.0.0.1:33227 has 110125027328 bytes free; but we expect reduce input to take 399642198235
> 2009-02-12 15:47:29,852 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_d-67.cs.wisc.edu:localhost/127.0.0.1:48626 has 107537776640 bytes free; but we expect reduce input to take 399642198235
> 2009-02-12 15:47:29,885 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_d-73.cs.wisc.edu:localhost/127.0.0.1:58849 has 113631690752 bytes free; but we expect reduce input to take 399642198235
> <SNIP>
> The weird thing is that I get through about 70 completed reduce tasks before it hangs.
> If I reduce the amount of input data on 100 nodes down to 200 GB, then it seems to work.
> As I scale the amount of input to the number of nodes, I can get it to work some of the time
> on 50 nodes and without any problems on 25 nodes or fewer.
> Note that it worked without any problems on Hadoop 0.18 late last year without changing
> any of the input data or the actual MR code.
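
The warnings quoted in the description each carry two byte counts: the free space on the node and the estimated reduce input. As a rough illustration of how to read them, the sketch below extracts both numbers from one of the quoted lines and computes how many times larger the estimate is than the free space (for the first sample line, roughly 3.6 times). The regular expression is inferred only from the three sample lines above and may not match other Hadoop versions; `parse_warning` and `shortfall_ratio` are hypothetical helper names, not part of Hadoop.

```python
import re

# Pattern inferred from the three sample warnings in the report (an assumption;
# the exact message format may differ in other Hadoop versions).
WARNING = re.compile(
    r"No room for reduce\s+task\. Node (?P<node>\S+) has (?P<free>\d+) bytes free;"
    r"\s+but we expect reduce input to take (?P<expected>\d+)"
)


def parse_warning(line):
    """Return (node, free_bytes, expected_bytes), or None if the line does not match."""
    m = WARNING.search(line)
    if m is None:
        return None
    return m.group("node"), int(m.group("free")), int(m.group("expected"))


def shortfall_ratio(free_bytes, expected_bytes):
    """How many times larger the estimated reduce input is than the free space."""
    return expected_bytes / free_bytes


# First warning from the report, joined onto one line.
SAMPLE = (
    "2009-02-12 15:47:29,839 WARN org.apache.hadoop.mapred.JobInProgress: "
    "No room for reduce task. Node tracker_d-59.cs.wisc.edu:localhost/127.0.0.1:33227 "
    "has 110125027328 bytes free; but we expect reduce input to take 399642198235"
)

if __name__ == "__main__":
    node, free, expected = parse_warning(SAMPLE)
    gib = 2 ** 30
    print(f"{node}: free={free / gib:.1f} GiB, "
          f"expected={expected / gib:.1f} GiB, "
          f"ratio={shortfall_ratio(free, expected):.2f}x")
```

Running the same parser over a full JobTracker log would show whether the estimate is the same for every node, as it is in the three quoted lines.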

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

