Date: Thu, 2 Jun 2011 15:06:58 +0530
Subject: Debugging killed task attempts
From: Mayuresh
To: mapreduce-user@hadoop.apache.org

Hi,

I am trying to scan around 4,600,000 rows of HBase data, using Hive to query them back. I start the job with around 25 maps, and there are 11 nodes in my cluster, each running 2 maps at a time.

I saw that it took around 7 minutes to scan all this data with 7 nodes. However, after I added 4 more nodes, it is taking even more time. In the map task that is taking the longest, I see the following:

attempt_201106011013_0010_m_000009_0    Task attempt: /default-rack/domU-12-31-39-0F-75-13.compute-1.internal
Cleanup Attempt: /default-rack/domU-12-31-39-0F-75-13.compute-1.internal    KILLED    100.00%
2-Jun-2011 08:53:16    2-Jun-2011 09:01:48 (8mins, 32sec)

and

attempt_201106011013_0010_m_000009_1    /default-rack/ip-10-196-198-48.ec2.internal    SUCCEEDED    100.00%
2-Jun-2011 08:57:28    2-Jun-2011 09:01:44 (4mins, 15sec)

The first attempt ran for 8 mins 32 secs before getting killed. I checked the datanode logs, and all I see there is some data coming in and some going out. Can someone point me to how I can debug what exactly was going on, and how I can avoid running such long, unproductive task attempts?

Thanks,
-Mayuresh
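(For context: the overlapping timestamps of the two attempts make me suspect the killed one was a speculative duplicate. If I understand correctly, speculative execution can be toggled per-job from the Hive session with the Hadoop 0.20-era property names below; the names may differ in other versions, so treat this as a guess, not a confirmed fix:)

```sql
-- Hedged sketch: disable speculative execution for this job from Hive.
-- Property names are the Hadoop 0.20-era ones; newer releases rename them.
SET mapred.map.tasks.speculative.execution=false;
SET mapred.reduce.tasks.speculative.execution=false;
```

I have not yet verified whether this is actually the cause, which is why I am asking how to debug the killed attempt itself.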