From: Mahesh Balija
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Date: Fri, 11 Jan 2013 10:41:43 +0530
Subject: Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.

Hi,

2 reducers completed successfully and 1498 were killed, so I suspect a data issue (either the data is very large, or there is something wrong with the data you are trying to process).
One possibility is that many values are associated with a single key, which can cause this kind of failure depending on what your reducer does with them.
Can you put some logging in your reducer and trace what is happening? (A sketch of what I mean is below your quoted report.)

Best,
Mahesh Balija,
Calsoft Labs.

On Fri, Jan 11, 2013 at 8:53 AM, yaotian wrote:

> I have 1 Hadoop master, where the namenode runs, and 2 slaves, where the datanodes run.
>
> If I choose a small data set, say 200M, it completes fine.
>
> But when I run it on 30G of data, the map phase completes while the reduce phase reports errors. Any suggestion?
>
> This is the information:
> Black-listed TaskTrackers: 1
>
> Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
> map      100.00%      450         0         0         450        0        0 / 1
> reduce   100.00%      1500        0         0         2          1498     12 / 3
>
> Task: task_201301090834_0041_r_000001   Complete: 0.00%
> Start Time: 10-Jan-2013 04:18:54   Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)
> Errors:
>   Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>   Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>   Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>   Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
> Counters: 0
>
> Task: task_201301090834_0041_r_000002   Complete: 0.00%
> Start Time: 10-Jan-2013 04:18:54   Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)
> Errors:
>   Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>   Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
> Counters: 0
>
> Task: task_201301090834_0041_r_000003   Complete: 0.00%
> Start Time: 10-Jan-2013 04:18:57   Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)
> Errors:
>   Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>   Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>   Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
> Counters: 0
>
> Task: task_201301090834_0041_r_000005   Complete: 0.00%
> Start Time: 10-Jan-2013 06:11:07   Finish Time: 10-Jan-2013 06:46:38 (35mins, 31sec)
> Errors:
>   Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!
> Counters: 0
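P.S. Here is a minimal sketch of the kind of reducer-side tracing I mean. The key/value types, the summing logic, and the TracingReducer name are only illustrative assumptions (I don't know what your job actually does); the point is the periodic setStatus()/progress() calls, which keep a long-running attempt from being killed for failing to report status.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative only: assumes a reducer that sums LongWritable values per key.
public class TracingReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    long seen = 0;
    for (LongWritable value : values) {
      sum += value.get();
      seen++;
      // Every 100,000 values, record how far we are and tell the framework the
      // attempt is still alive, so it is not killed for failing to report status.
      if (seen % 100000 == 0) {
        context.setStatus(key + ": processed " + seen + " values");
        context.progress();
      }
    }
    context.write(key, new LongWritable(sum));
  }
}

If a single key legitimately needs more than 10 minutes of work between progress reports, you can also raise mapred.task.timeout (it defaults to 600000 ms, which is where the "600 seconds" in your log comes from), but reporting progress from the reducer is usually the better fix.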
