Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-issues@hadoop.apache.org
Date: Wed, 4 Dec 2013 17:26:35 +0000 (UTC)
From: "vikash kumar (JIRA)" <jira@apache.org>
To: common-issues@hadoop.apache.org
Message-ID: <JIRA.12682729.1386177982563.71039.1386177995585@arcas>
In-Reply-To: <JIRA.12682729.1386177982563@arcas>
References: <JIRA.12682729.1386177982563@arcas>
Subject: [jira] [Created] (HADOOP-10145) Reduce task stuck on 0.16666667%
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

vikash kumar created HADOOP-10145:
-------------------------------------

             Summary: Reduce task stuck on 0.16666667%
                 Key: HADOOP-10145
                 URL: https://issues.apache.org/jira/browse/HADOOP-10145
             Project: Hadoop Common
          Issue Type: Bug
          Components: conf
    Affects Versions: 0.20.2
         Environment: OS:  RHEL 6.4
Hadoop version:  0.20.2-cdh3u6
            Reporter: vikash kumar


All of a sudden, one of our Hadoop jobs is stuck: the reduce phase takes forever to complete (we have waited 30 hours; it usually completes in about an hour).
In the TaskTracker logs I see many messages like the following. At times, however, resubmitting the same job works fine.

2013-12-04 00:00:00,381 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000041_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:00,750 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000048_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:01,729 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000046_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,918 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000055_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,919 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000021_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,922 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000031_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,940 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000057_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:02,443 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000047_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
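
To make it easier to see which jobs and attempts are sitting in the copy phase, here is a minimal sketch that groups log lines like the ones above by job ID. The regular expression is only an assumption based on the lines quoted in this report, and `summarize` is a hypothetical helper, not part of Hadoop. The sketch also assumes the reduce progress is split evenly across the copy, sort and reduce phases, in which case "1 of 2" copied would correspond to 1/2 x 1/3, roughly 0.1667, which matches the value printed before the "%" sign.

```python
import re
from collections import defaultdict

# Pattern inferred from the TaskTracker lines quoted in this report
# (an assumption, not an official log format specification).
LINE_RE = re.compile(
    r"(?P<attempt>attempt_(?P<job>\d+_\d+)_r_\d+_\d+) "
    r"(?P<progress>[\d.]+)% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)


def summarize(lines):
    """Group reduce attempts that are in the copy phase by job ID.

    Returns {job_id: {attempt_id: (done, total, rate_mb_s, progress)}},
    keeping the latest reading seen for each attempt.
    """
    jobs = defaultdict(dict)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a reduce copy-phase progress line
        jobs[m.group("job")][m.group("attempt")] = (
            int(m.group("done")),
            int(m.group("total")),
            float(m.group("rate")),
            float(m.group("progress")),
        )
    return {job: dict(attempts) for job, attempts in jobs.items()}


sample = [
    "2013-12-04 00:00:00,381 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201310070546_159167_r_000041_0 0.16666667% reduce > copy "
    "(1 of 2 at 0.01 MB/s) >",
    "2013-12-04 00:00:01,729 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201310070546_159262_r_000046_0 0.16666667% reduce > copy "
    "(1 of 2 at 0.03 MB/s) >",
]

for job, attempts in summarize(sample).items():
    for attempt, (done, total, rate, progress) in attempts.items():
        # Assumed convention: the copy phase is one third of reduce progress.
        expected = done / total / 3
        print(job, attempt, f"{done}/{total} at {rate} MB/s",
              f"reported={progress:.4f} expected={expected:.4f}")
```

Running this over a full TaskTracker log (for example `summarize(open(path))`, where `path` is wherever the log lives on the node) should show whether the stalled attempts belong to one job or several, and whether the copy rate ever changes.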


There are no other reasonable clues in the logs to point me in a direction, and I am not sure what I should be looking for. With my setup, upgrading to a newer version is not an option.

Please help!


--
This message was sent by Atlassian JIRA
(v6.1#6144)