Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of hemanty@thoughtworks.com
 designates 64.18.0.28 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAKi5+mMfh597H2E6s6xhQQWVTn2mYSdK2-3tUjuGcnH6rujVXw@mail.gmail.com>
References: 
 <CAKi5+mMfh597H2E6s6xhQQWVTn2mYSdK2-3tUjuGcnH6rujVXw@mail.gmail.com>
Date: Mon, 25 Mar 2013 11:00:45 +0530
Message-ID: 
 <CAEAKFL_tNTSbJ3upSjhV_kz8WLWfDS0XfeNFes+pVwXA5kO64w@mail.gmail.com>
Subject: Re: MapReduce Failed and Killed
From: Hemanth Yamijala <yhemanth@thoughtworks.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=bcaec54b534857487504d8b91b11

--bcaec54b534857487504d8b91b11
Content-Type: text/plain; charset=ISO-8859-1

Every MapReduce task must periodically report to the tasktracker that
launched it, so that the tasktracker knows the task is still alive and
active. A task counts as silent when it neither reads input, writes output,
nor updates its status. How long that silence is tolerated is controlled by
the configuration property mapred.task.timeout.

It looks like in your case this has already been raised to 20 minutes
(from the default of 10 minutes), and that is still not sufficient. You
could raise the value further. However, the better approach would be to
find out what the reducer is actually doing when it goes silent for that
long. Can you look at the reducer attempt's logs (which you can access from
the web UI of the Jobtracker) and post them here?
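
If you do want to raise the timeout in the meantime, here is a minimal sketch
of the setting. I am assuming it goes into mapred-site.xml (or the job's own
configuration) on your setup, and the 30-minute value is only an
illustration, not a recommendation:

```xml
<!-- Assumed location: conf/mapred-site.xml, or the job configuration. -->
<property>
  <name>mapred.task.timeout</name>
  <!-- Value is in milliseconds: 30 min * 60 s * 1000 ms = 1800000.
       Illustrative value only; pick one that fits your job. -->
  <value>1800000</value>
</property>
```

The 1200 seconds reported in your log would correspond to a current value of
1200000.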

Thanks
hemanth


On Fri, Mar 22, 2013 at 5:32 PM, Jinchun Kim <cienlux@gmail.com> wrote:

> Hi, All.
>
> I'm trying to create category-based splits of the Wikipedia dataset (41GB)
> and the training data set (5GB) using Mahout.
> I'm using the following command:
>
> $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o
> wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt
>
> I had no problem with the training data set, but Hadoop showed the
> following messages when I tried to run the same job on the Wikipedia
> dataset:
>
> .........
> 13/03/21 22:31:00 INFO mapred.JobClient:  map 27% reduce 1%
> 13/03/21 22:40:31 INFO mapred.JobClient:  map 27% reduce 2%
> 13/03/21 22:58:49 INFO mapred.JobClient:  map 27% reduce 3%
> 13/03/21 23:22:57 INFO mapred.JobClient:  map 27% reduce 4%
> 13/03/21 23:46:32 INFO mapred.JobClient:  map 27% reduce 5%
> 13/03/22 00:27:14 INFO mapred.JobClient:  map 27% reduce 6%
> 13/03/22 01:06:55 INFO mapred.JobClient:  map 27% reduce 7%
> 13/03/22 01:14:06 INFO mapred.JobClient:  map 27% reduce 3%
> 13/03/22 01:15:35 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_r_000000_1, Status : FAILED
> Task attempt_201303211339_0002_r_000000_1 failed to report status for 1200
> seconds. Killing!
> 13/03/22 01:20:09 INFO mapred.JobClient:  map 27% reduce 4%
> 13/03/22 01:33:35 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_m_000037_1, Status : FAILED
> Task attempt_201303211339_0002_m_000037_1 failed to report status for 1228
> seconds. Killing!
> 13/03/22 01:35:12 INFO mapred.JobClient:  map 27% reduce 5%
> 13/03/22 01:40:38 INFO mapred.JobClient:  map 27% reduce 6%
> 13/03/22 01:52:28 INFO mapred.JobClient:  map 27% reduce 7%
> 13/03/22 02:16:27 INFO mapred.JobClient:  map 27% reduce 8%
> 13/03/22 02:19:02 INFO mapred.JobClient: Task Id :
> attempt_201303211339_0002_m_000018_1, Status : FAILED
> Task attempt_201303211339_0002_m_000018_1 failed to report status for 1204
> seconds. Killing!
> 13/03/22 02:49:03 INFO mapred.JobClient:  map 27% reduce 9%
> 13/03/22 02:52:04 INFO mapred.JobClient:  map 28% reduce 9%
> ........
>
> Because I have just started learning how to run Hadoop, I have no idea how
> to solve this problem.
> Does anyone have an idea how to handle this?
>
> --
> *Jinchun Kim*
>

--bcaec54b534857487504d8b91b11--