From: Harsh J
Date: Fri, 11 Jan 2013 11:43:50 +0530
Subject: Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.
To: user@hadoop.apache.org

If the per-record processing time is very high, you will need to report
status periodically. Without a status report from the task to the
tracker, the task is killed as dead after a default timeout of 10
minutes (600s).

Also, beware of holding too much memory in a reduce JVM - you are still
limited there. It is best to let the framework do the sort or secondary
sort.

On Fri, Jan 11, 2013 at 10:58 AM, yaotian wrote:

> Yes, you are right. The data is GPS traces keyed by the corresponding
> uid. The reduce is doing this: sort per user to get results of the
> form: uid, gps1, gps2, gps3, ...
> Yes, the GPS data is big because this is 30G of data.
>
> How do I solve this?
>
>
> 2013/1/11 Mahesh Balija
>
>> Hi,
>>
>> 2 reducers completed successfully and 1498 were killed. I assume you
>> have data issues (either the data is huge, or there is some problem
>> with the data you are trying to process).
>> One possibility is that you have many values associated with a single
>> key, which can cause this kind of issue depending on the operation
>> you perform in your reducer.
>> Can you put some logging in your reducer and try to trace out what is
>> happening?
>>
>> Best,
>> Mahesh Balija,
>> Calsoft Labs.
>>
>>
>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian wrote:
>>
>>> I have 1 Hadoop master, where the namenode runs, and 2 slaves, where
>>> the datanodes run.
>>>
>>> If I choose a small data set, around 200M, the job completes.
>>>
>>> But if I run it on 30G of data, the map phase finishes and the
>>> reduce phase reports errors. Any suggestion?
>>>
>>> This is the information:
>>>
>>> Black-listed TaskTrackers: 1
>>>
>>> Kind   | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
>>> map    | 100.00%    | 450       | 0       | 0       | 450      | 0      | 0 / 1
>>> reduce | 100.00%    | 1500      | 0       | 0       | 2        | 1498   | 12 / 3
>>>
>>> Task task_201301090834_0041_r_000001 (0.00% complete)
>>> Start: 10-Jan-2013 04:18:54, Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)
>>>   Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>>>
>>> Task task_201301090834_0041_r_000002 (0.00% complete)
>>> Start: 10-Jan-2013 04:18:54, Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)
>>>   Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>>>
>>> Task task_201301090834_0041_r_000003 (0.00% complete)
>>> Start: 10-Jan-2013 04:18:57, Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)
>>>   Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>>>
>>> Task task_201301090834_0041_r_000005 (0.00% complete)
>>> Start: 10-Jan-2013 06:11:07, Finish: 10-Jan-2013 06:46:38 (35mins, 31sec)
>>>   Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!

--
Harsh J
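The periodic-status fix Harsh describes can be sketched as below. This is a minimal, self-contained sketch, not code from the thread: `ProgressReporter` is a hypothetical stand-in for Hadoop's `Reporter`/`Context` object, whose real `progress()` call resets the TaskTracker's liveness timer; the loop also streams over a key's values rather than buffering them in a list, which addresses the reduce-JVM memory concern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's Reporter.progress() / Context.progress().
interface ProgressReporter {
    void progress();
}

class LongRunningReduce {
    // How many records to process between progress pings; pick a value so
    // that PING_EVERY records always take well under the 600s task timeout.
    static final int PING_EVERY = 1000;

    // Processes all values for one key (e.g. all GPS points for one uid),
    // streaming rather than buffering, and pinging the tracker periodically
    // so a slow but live reducer is not killed as a dead task.
    static int reduceKey(Iterable<String> values, ProgressReporter reporter) {
        int seen = 0;
        for (String v : values) {
            // ... expensive per-record work would go here ...
            seen++;
            if (seen % PING_EVERY == 0) {
                reporter.progress();  // resets the task's liveness timer
            }
        }
        return seen;
    }
}
```

In a real reducer the same pattern applies inside `reduce()`: do the per-record work, and call `progress()` (or update a counter, which also reports liveness) every N records instead of only when the key is finished.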