From: Hemanth Yamijala
To: "user@hadoop.apache.org"
Date: Mon, 25 Mar 2013 11:00:45 +0530
Subject: Re: MapReduce Failed and Killed

Any MapReduce task needs to communicate periodically with the tasktracker
that launched it, to let the tasktracker know it is still alive and active.
The length of silence that is tolerated is controlled by the configuration
property mapred.task.timeout.

It looks like in your case this has already been raised to 20 minutes (from
the default of 10 minutes), and that this is still not sufficient. You could
raise the value further. However, the better approach would be to find out
what the reducer is actually doing that makes it inactive for this long. Can
you look at the reducer attempt's logs (which you can access from the web UI
of the Jobtracker) and post them here?

Thanks
hemanth

On Fri, Mar 22, 2013 at 5:32 PM, Jinchun Kim wrote:

> Hi, All.
>
> I'm trying to create category-based splits of Wikipedia dataset(41GB) and
> the training data set(5GB) using Mahout.
> I'm using following command.
>
> $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt
>
> I had no problem with the training data set, but Hadoop showed following
> messages when I tried to do a same job with Wikipedia dataset,
>
> .........
> 13/03/21 22:31:00 INFO mapred.JobClient:  map 27% reduce 1%
> 13/03/21 22:40:31 INFO mapred.JobClient:  map 27% reduce 2%
> 13/03/21 22:58:49 INFO mapred.JobClient:  map 27% reduce 3%
> 13/03/21 23:22:57 INFO mapred.JobClient:  map 27% reduce 4%
> 13/03/21 23:46:32 INFO mapred.JobClient:  map 27% reduce 5%
> 13/03/22 00:27:14 INFO mapred.JobClient:  map 27% reduce 6%
> 13/03/22 01:06:55 INFO mapred.JobClient:  map 27% reduce 7%
> 13/03/22 01:14:06 INFO mapred.JobClient:  map 27% reduce 3%
> 13/03/22 01:15:35 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_r_000000_1, Status : FAILED
> Task attempt_201303211339_0002_r_000000_1 failed to report status for 1200 seconds. Killing!
> 13/03/22 01:20:09 INFO mapred.JobClient:  map 27% reduce 4%
> 13/03/22 01:33:35 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_m_000037_1, Status : FAILED
> Task attempt_201303211339_0002_m_000037_1 failed to report status for 1228 seconds. Killing!
> 13/03/22 01:35:12 INFO mapred.JobClient:  map 27% reduce 5%
> 13/03/22 01:40:38 INFO mapred.JobClient:  map 27% reduce 6%
> 13/03/22 01:52:28 INFO mapred.JobClient:  map 27% reduce 7%
> 13/03/22 02:16:27 INFO mapred.JobClient:  map 27% reduce 8%
> 13/03/22 02:19:02 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_m_000018_1, Status : FAILED
> Task attempt_201303211339_0002_m_000018_1 failed to report status for 1204 seconds. Killing!
> 13/03/22 02:49:03 INFO mapred.JobClient:  map 27% reduce 9%
> 13/03/22 02:52:04 INFO mapred.JobClient:  map 28% reduce 9%
> ........
>
> Because I just started to learn how to run Hadoop, I have no idea how to
> solve this problem...
> Does anyone have an idea how to handle this weird thing?
>
> --
> *Jinchun Kim*
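For reference, the reply's point about a task staying silent can be sketched in plain Java. This is only an illustration, not Mahout's or Hadoop's actual code: ProgressSketch, ProgressReporter, processValues and expensiveStep are hypothetical names invented here. In a real job, the callback would be something like Reporter.progress() (old mapred API) or context.progress() (new mapreduce API), and the assumption is that the reducer spends a long time in a loop without reporting.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressSketch {

    /** Hypothetical stand-in for Hadoop's progress callback; not a Hadoop type. */
    public interface ProgressReporter {
        void progress();
    }

    /**
     * Processes every value and pings the reporter once per `reportEvery`
     * values, so a long-running loop never stays silent for the whole run.
     * Returns the number of values processed.
     */
    public static long processValues(Iterable<String> values,
                                     ProgressReporter reporter,
                                     int reportEvery) {
        long processed = 0;
        for (String value : values) {
            expensiveStep(value);
            processed++;
            if (processed % reportEvery == 0) {
                reporter.progress(); // tell the framework the task is still alive
            }
        }
        return processed;
    }

    /** Placeholder for the real per-value work (assumed to be slow). */
    static void expensiveStep(String value) {
        // intentionally empty in this sketch
    }

    public static void main(String[] args) {
        // Count the progress calls instead of talking to a real tasktracker.
        AtomicInteger pings = new AtomicInteger();
        long n = processValues(Arrays.asList("a", "b", "c", "d", "e"),
                               pings::incrementAndGet, 2);
        System.out.println(n + " values, " + pings.get() + " progress calls");
    }
}
```

Whether this pattern applies depends on what the attempt logs show: if the task is stuck rather than merely busy (for example, waiting on memory or disk), reporting progress would only hide the problem.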