Date: Wed, 4 Dec 2013 17:26:35 +0000 (UTC)
From: "vikash kumar (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Subject: [jira] [Created] (HADOOP-10145) Reduce task stuck on 0.16666667%

vikash kumar created HADOOP-10145:
-------------------------------------

             Summary: Reduce task stuck on 0.16666667%
                 Key: HADOOP-10145
                 URL: https://issues.apache.org/jira/browse/HADOOP-10145
             Project: Hadoop Common
          Issue Type: Bug
          Components: conf
    Affects Versions: 0.20.2
         Environment: OS: RHEL 6.4
                      Hadoop version: 0.20.2-cdh3u6
            Reporter: vikash kumar

All of a sudden, one of our Hadoop jobs is stuck: the reduce takes forever to complete (we have waited 30 hours; it usually takes about an hour). Resubmitting the same job sometimes works fine. When the job is stuck, the TaskTracker logs show a flood of messages like the following:
2013-12-04 00:00:00,381 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000041_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:00,750 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000048_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:01,729 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000046_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,918 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000055_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,919 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000021_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,922 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000031_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,940 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000057_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:02,443 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000047_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >

There are no other reasonable clues in the logs to point me in a direction, and I am not sure what I should be looking for. With my setup, upgrading to a newer version is not an option. Please help!

--
This message was sent by Atlassian JIRA
(v6.1#6144)
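
For anyone triaging a similar log, the TaskTracker lines quoted in this report can be grouped per job with a short script. The following is a minimal Python sketch, not part of Hadoop: the function names `summarize` and `expected_copy_progress` are hypothetical, and the regular expression is written only against the line format shown in this report. The progress helper assumes that reduce progress is split equally across the copy, sort, and reduce phases, which would make "1 of 2" map outputs copied correspond to the reported 0.16666667 (a fraction of 1, despite the percent sign in the log).

```python
import re
from collections import defaultdict

# Matches the TaskTracker status lines quoted in the report (format assumed
# from those samples only).
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\d+_\d+_r_\d+_\d+) "
    r"(?P<progress>[\d.]+)% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)


def summarize(lines):
    """Group reduce attempts in the copy phase by job (hypothetical helper)."""
    jobs = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that are not copy-phase status reports
        # attempt_<cluster start>_<job number>_r_<task>_<attempt number>
        parts = m.group("attempt").split("_")
        job_id = "job_{}_{}".format(parts[1], parts[2])
        jobs[job_id].append({
            "attempt": m.group("attempt"),
            "progress": float(m.group("progress")),
            "copied": int(m.group("done")),
            "total": int(m.group("total")),
            "rate_mb_s": float(m.group("rate")),
        })
    return dict(jobs)


def expected_copy_progress(copied, total):
    """Assumption: copy, sort, and reduce each account for one third."""
    return (copied / total) / 3.0


# Two of the lines from the report, used as sample input.
SAMPLE = [
    "2013-12-04 00:00:00,381 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201310070546_159167_r_000041_0 0.16666667% reduce > copy "
    "(1 of 2 at 0.01 MB/s) >",
    "2013-12-04 00:00:01,729 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201310070546_159262_r_000046_0 0.16666667% reduce > copy "
    "(1 of 2 at 0.03 MB/s) >",
]

if __name__ == "__main__":
    for job, attempts in summarize(SAMPLE).items():
        slowest = min(a["rate_mb_s"] for a in attempts)
        print(job, len(attempts), "attempt(s), slowest copy rate", slowest, "MB/s")
```

Under that assumption, every attempt in the report has fetched one of two map outputs and is waiting on the second, at copy rates of 0.01 to 0.03 MB/s, across two different jobs.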