From: Harsh J
Date: Fri, 11 Jan 2013 11:43:50 +0530
Subject: Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.
To: user@hadoop.apache.org

If the per-record processing time is very high, you will need to report
status periodically. Without a status report from the task to the
tracker, the task is killed as dead after a default timeout of 10
minutes (600s).

Also, beware of holding too much memory in a reduce JVM - you are still
limited there. It is best to let the framework do the sort or secondary
sort.

On Fri, Jan 11, 2013 at 10:58 AM, yaotian wrote:

> Yes, you are right. The data is GPS traces keyed by the corresponding
> uid. The reduce is doing this: sort per user to get results of the
> form: uid, gps1, gps2, gps3, ...
> Yes, the GPS data is big because this is 30G of data.
>
> How do I solve this?
>
>
> 2013/1/11 Mahesh Balija
>
>> Hi,
>>
>> 2 reducers completed successfully and 1498 were killed. I assume you
>> have data issues (either the data is huge, or there is some problem
>> with the data you are trying to process).
>> One possibility is that you have many values associated with a single
>> key, which can cause this kind of issue depending on the operation
>> you perform in your reducer.
>> Can you put some logging in your reducer and try to trace out what is
>> happening?
>>
>> Best,
>> Mahesh Balija,
>> Calsoft Labs.
>>
>>
>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian wrote:
>>
>>> I have 1 Hadoop master, where the namenode runs, and 2 slaves, where
>>> the datanodes run.
>>>
>>> If I choose a small data set, around 200M, the job completes.
>>>
>>> But if I run it on 30G of data, the map phase finishes and the
>>> reduce phase reports errors. Any suggestion?
>>>
>>> This is the information:
>>>
>>> Black-listed TaskTrackers: 1
>>>
>>> Kind   | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
>>> map    | 100.00%    | 450       | 0       | 0       | 450      | 0      | 0 / 1
>>> reduce | 100.00%    | 1500      | 0       | 0       | 2        | 1498   | 12 / 3
>>>
>>> Task task_201301090834_0041_r_000001 (0.00% complete)
>>> Start: 10-Jan-2013 04:18:54, Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)
>>>   Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>>>
>>> Task task_201301090834_0041_r_000002 (0.00% complete)
>>> Start: 10-Jan-2013 04:18:54, Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)
>>>   Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>>>
>>> Task task_201301090834_0041_r_000003 (0.00% complete)
>>> Start: 10-Jan-2013 04:18:57, Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)
>>>   Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>>>
>>> Task task_201301090834_0041_r_000005 (0.00% complete)
>>> Start: 10-Jan-2013 06:11:07, Finish: 10-Jan-2013 06:46:38 (35mins, 31sec)
>>>   Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!

--
Harsh J
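The periodic-status fix Harsh describes can be sketched as below. This is a minimal, self-contained sketch, not code from the thread: `ProgressReporter` is a hypothetical stand-in for Hadoop's `Reporter`/`Context` object, whose real `progress()` call resets the TaskTracker's liveness timer; the loop also streams over a key's values rather than buffering them in a list, which addresses the reduce-JVM memory concern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's Reporter.progress() / Context.progress().
interface ProgressReporter {
    void progress();
}

class LongRunningReduce {
    // How many records to process between progress pings; pick a value so
    // that PING_EVERY records always take well under the 600s task timeout.
    static final int PING_EVERY = 1000;

    // Processes all values for one key (e.g. all GPS points for one uid),
    // streaming rather than buffering, and pinging the tracker periodically
    // so a slow but live reducer is not killed as a dead task.
    static int reduceKey(Iterable<String> values, ProgressReporter reporter) {
        int seen = 0;
        for (String v : values) {
            // ... expensive per-record work would go here ...
            seen++;
            if (seen % PING_EVERY == 0) {
                reporter.progress();  // resets the task's liveness timer
            }
        }
        return seen;
    }
}
```

In a real reducer the same pattern applies inside `reduce()`: do the per-record work, and call `progress()` (or update a counter, which also reports liveness) every N records instead of only when the key is finished.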