hadoop-common-issues mailing list archives

From "vikash kumar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-10145) Reduce task stuck on 0.16666667%
Date Wed, 04 Dec 2013 17:26:35 GMT
vikash kumar created HADOOP-10145:
-------------------------------------

             Summary: Reduce task stuck on 0.16666667%
                 Key: HADOOP-10145
                 URL: https://issues.apache.org/jira/browse/HADOOP-10145
             Project: Hadoop Common
          Issue Type: Bug
          Components: conf
    Affects Versions: 0.20.2
         Environment: OS:  RHEL 6.4
Hadoop version:  0.20.2-cdh3u6
            Reporter: vikash kumar


All of a sudden, one of our Hadoop jobs is stuck: the reduce phase takes forever to complete
(we have waited 30 hours; it usually takes an hour).
In the TaskTracker logs I see tons of the following messages. However, at times, resubmitting
the same job works fine.

2013-12-04 00:00:00,381 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000041_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:00,750 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000048_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:01,729 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000046_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,918 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000055_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,919 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000021_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,922 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000031_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,940 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000057_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:02,443 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000047_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
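
To narrow down which jobs and reduce attempts are sitting in the copy phase, the progress lines above can be grouped by job. The following is only a rough sketch in Python, not a Hadoop tool: the function name summarize and the SAMPLE text are illustrative, and the regular expression assumes the exact line shape shown above (it tolerates the attempt ID and the progress text being on separate lines). The job ID is derived from the attempt ID on the assumption that attempt_<timestamp>_<number>_r_... belongs to job_<timestamp>_<number>.

```python
import re
from collections import defaultdict

# Matches TaskTracker copy-phase progress lines of the shape quoted above.
# \s+ between the attempt ID and the progress value also spans a line wrap.
PROGRESS = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\d+_\d+_r_\d+_\d+)\s+"
    r"(?P<progress>[\d.]+)% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)

# Two entries copied from the log excerpt above.
SAMPLE = (
    "2013-12-04 00:00:00,381 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201310070546_159167_r_000041_0\n"
    "0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >\n"
    "2013-12-04 00:00:01,729 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201310070546_159262_r_000046_0\n"
    "0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >\n"
)


def summarize(log_text):
    """Group copy-phase progress lines by job ID (hypothetical helper)."""
    jobs = defaultdict(lambda: {"attempts": set(), "rates": []})
    for m in PROGRESS.finditer(log_text):
        # Assumed layout: attempt_<timestamp>_<job number>_r_<task>_<try>
        parts = m.group("attempt").split("_")
        job_id = "job_{}_{}".format(parts[1], parts[2])
        jobs[job_id]["attempts"].add(m.group("attempt"))
        jobs[job_id]["rates"].append(float(m.group("rate")))
    return dict(jobs)


if __name__ == "__main__":
    for job_id, info in sorted(summarize(SAMPLE).items()):
        print(job_id, len(info["attempts"]), "attempt(s), rates:", info["rates"])
```

Running this over a full TaskTracker log (read with open(path).read()) would show whether the stalled attempts all belong to the same few jobs, which may help decide where to look next.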


There are no other reasonable clues in the logs to point me toward what I should be looking
for. With my setup, upgrading to a newer version is not an option.

Please help!



--
This message was sent by Atlassian JIRA
(v6.1#6144)