From: yaotian
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Date: Fri, 11 Jan 2013 13:28:55 +0800
Subject: Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=bcaec54d3f8452a01904d2fc92b8

--bcaec54d3f8452a01904d2fc92b8
Content-Type: text/plain; charset=ISO-8859-1

Yes, you are right. The data is GPS traces, each tied to a uid. The reduce
does this: sort by user to get this kind of result:

uid, gps1, gps2, gps3, ...

Yes, the GPS data is big, because this is 30G of data.

How can I solve this?

2013/1/11 Mahesh Balija

> Hi,
>
> 2 reducers completed successfully and 1498 were killed. I assume that
> you have data issues (either the data is huge, or there is some problem
> with the data you are trying to process).
> One possibility is that you have many values associated with a single
> key, which can cause this kind of issue depending on the operation you
> do in your reducer.
> Can you put some logging in your reducer and try to trace what is
> happening?
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
>
> On Fri, Jan 11, 2013 at 8:53 AM, yaotian wrote:
>
>> I have 1 Hadoop master, where the NameNode runs, and 2 slaves, where
>> the DataNodes run.
>>
>> If I choose a small data set, like 200M, the job completes.
>>
>> But if I run 30G of data, the map phase finishes but the reduce phase
>> reports errors. Any suggestions?
>>
>>
>> This is the information.
>>
>> Black-listed TaskTrackers: 1
>> ------------------------------
>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>> map     100.00%     450        0        0        450       0       0 / 1
>> reduce  100.00%     1500       0        0        2         1498    12 / 3
>>
>>
>> Task:        task_201301090834_0041_r_000001
>> Complete:    0.00%
>> Start Time:  10-Jan-2013 04:18:54
>> Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)
>> Errors:
>>   Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>   Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>   Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>   Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>> Counters:    0
>>
>> Task:        task_201301090834_0041_r_000002
>> Complete:    0.00%
>> Start Time:  10-Jan-2013 04:18:54
>> Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)
>> Errors:
>>   Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>   Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>> Counters:    0
>>
>> Task:        task_201301090834_0041_r_000003
>> Complete:    0.00%
>> Start Time:  10-Jan-2013 04:18:57
>> Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)
>> Errors:
>>   Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>   Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>   Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>> Counters:    0
>>
>> Task:        task_201301090834_0041_r_000005
>> Complete:    0.00%
>> Start Time:  10-Jan-2013 06:11:07
>> Finish Time: 10-Jan-2013 06:46:38 (35mins, 31sec)
>> Errors:
>>   Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!
>> Counters:    0
>>
>
>

--bcaec54d3f8452a01904d2fc92b8
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--bcaec54d3f8452a01904d2fc92b8--
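
A note on the "failed to report status for 600 seconds" errors quoted above: they suggest the reducer spends more than the task timeout (mapred.task.timeout in Hadoop 1.x, 600000 ms by default) on one key without reporting progress. The sketch below shows one possible way around that, assuming the job can be run as a Hadoop Streaming reducer and that each input line is "uid<TAB>gps"; both are assumptions, since the thread does not show the actual reducer or record layout. The names parse, reduce_stream and report_every are hypothetical. It writes each user's points as they arrive instead of collecting them in memory, and periodically writes a "reporter:status:" line to stderr, which Hadoop Streaming interprets as a status update.

```python
import sys
from itertools import groupby


def parse(line):
    """Split one 'uid<TAB>gps' record (the tab-separated layout is an assumption)."""
    uid, _, gps = line.rstrip("\n").partition("\t")
    return uid, gps


def reduce_stream(lines, out=sys.stdout, err=sys.stderr, report_every=100_000):
    """Write 'uid<TAB>gps1,gps2,...' per uid without buffering a uid's values."""
    seen = 0
    records = (parse(line) for line in lines if line.strip())
    # Reducer input arrives sorted by key, so groupby yields one group per uid.
    for uid, group in groupby(records, key=lambda rec: rec[0]):
        out.write(uid)
        sep = "\t"
        for _, gps in group:
            # Emit each point immediately; nothing accumulates in memory.
            out.write(sep + gps)
            sep = ","
            seen += 1
            if seen % report_every == 0:
                # Status line on stderr tells the framework the task is alive.
                err.write("reporter:status:processed %d records\n" % seen)
                err.flush()
        out.write("\n")
    return seen


if __name__ == "__main__":
    reduce_stream(sys.stdin)
```

Raising the timeout would only postpone the kill; reporting status while streaming the values addresses the case Mahesh describes, where a single key carries a very large number of values. If the points must be ordered within each user, that ordering would have to come from the shuffle (for example a composite key), which this sketch does not attempt.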