Date: Thu, 2 Jun 2011 15:06:58 +0530
Subject: Debugging killed task attempts
From: Mayuresh
To: mapreduce-user@hadoop.apache.org

Hi,

I am trying to scan around 4,600,000 rows of HBase data, using Hive to query them back. I start the job with around 25 maps, and there are 11 nodes in my cluster, each running 2 maps at a time.

I saw that it took around 7 minutes to scan all this data with 7 nodes. However, after I added 4 more nodes, it is taking even more time. In the map task that is taking the longest, I see the following:

attempt_201106011013_0010_m_000009_0    Task attempt: /default-rack/domU-12-31-39-0F-75-13.compute-1.internal
Cleanup Attempt: /default-rack/domU-12-31-39-0F-75-13.compute-1.internal    KILLED    100.00%
2-Jun-2011 08:53:16    2-Jun-2011 09:01:48 (8mins, 32sec)

and

attempt_201106011013_0010_m_000009_1    /default-rack/ip-10-196-198-48.ec2.internal    SUCCEEDED    100.00%
2-Jun-2011 08:57:28    2-Jun-2011 09:01:44 (4mins, 15sec)

The first attempt ran for 8 mins 32 secs before getting killed. I checked the datanode logs, and all I see there is some data coming in and some going out. Can someone point me to how I can debug what exactly was going on, and how I can avoid running such long, unproductive task attempts?

Thanks,
-Mayuresh
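(For context: the overlapping timestamps of the two attempts make me suspect the killed one was a speculative duplicate. If I understand correctly, speculative execution can be toggled per-job from the Hive session with the Hadoop 0.20-era property names below; the names may differ in other versions, so treat this as a guess, not a confirmed fix:)

```sql
-- Hedged sketch: disable speculative execution for this job from Hive.
-- Property names are the Hadoop 0.20-era ones; newer releases rename them.
SET mapred.map.tasks.speculative.execution=false;
SET mapred.reduce.tasks.speculative.execution=false;
```

I have not yet verified whether this is actually the cause, which is why I am asking how to debug the killed attempt itself.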