hadoop-mapreduce-user mailing list archives

From Iman E <hadoop_...@yahoo.com>
Subject Re: Performance of mappers
Date Fri, 05 Aug 2011 20:34:03 GMT
Milind, are you talking about the cache specified by the parameter local.cache.size? I have
not actually changed its value, and I can see that the default is 10GB.

From: "Milind.Bhandarkar@emc.com" <Milind.Bhandarkar@emc.com>
To: mapreduce-user@hadoop.apache.org; hadoop_ami@yahoo.com
Sent: Friday, August 5, 2011 3:47 PM
Subject: Re: Performance of mappers


Are you using cache archives ? If yes, what's the size of the cache archive?

- milind
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do
not necessarily represent the views of any organization, past or present, the author might
be affiliated with.)

From: Iman E <hadoop_ami@yahoo.com>
Reply-To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>, Iman E <hadoop_ami@yahoo.com>
Date: Fri, 5 Aug 2011 15:31:51 -0400
To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Subject: Re: Performance of mappers

Hi Arun,
Thanks for your reply. I am running Hadoop-0.20.1, and I also tried Cloudera's hadoop-0.20.1+152.

The TaskTracker logs do not show any problem. These are the log entries for a task attempt
that is too slow:
2011-08-05 14:28:01,644 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201108041814_0035_m_000000_0 task's state:UNASSIGNED
2011-08-05 14:28:01,644 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201108041814_0035_m_000000_0
2011-08-05 14:28:01,644 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201108041814_0035_m_000000_0
2011-08-05 14:28:03,097 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201108041814_0035_m_1371719584 given task: attempt_201108041814_0035_m_000000_0
2011-08-05 14:32:52,341 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:32:55,398 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:32:58,402 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:33:01,419 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:33:04,804 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:33:06,617 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:33:06,625 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201108041814_0035_m_000000_0 is done.
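To see where the time goes in an excerpt like this, the timestamps can be diffed line by line. The Python sketch below is illustrative only: the LOG string re-types the timestamps from the excerpt above with the message text shortened, and parse_timestamps and largest_gap are hypothetical helper names, not part of Hadoop.

```python
from datetime import datetime

# Timestamps re-typed from the TaskTracker excerpt above; message text shortened.
LOG = """\
2011-08-05 14:28:01,644 INFO TaskTracker: Trying to launch : attempt_201108041814_0035_m_000000_0
2011-08-05 14:28:03,097 INFO TaskTracker: JVM with ID: jvm_201108041814_0035_m_1371719584 given task
2011-08-05 14:32:52,341 INFO TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:32:55,398 INFO TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:32:58,402 INFO TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:33:01,419 INFO TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:33:04,804 INFO TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:33:06,617 INFO TaskTracker: attempt_201108041814_0035_m_000000_0
2011-08-05 14:33:06,625 INFO TaskTracker: Task attempt_201108041814_0035_m_000000_0 is done.
"""

# Log4j-style timestamp: milliseconds after a comma (%f accepts 1-6 digits).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_timestamps(text: str) -> list[datetime]:
    """Return the timestamp at the start of each non-empty log line."""
    stamps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The first two whitespace-separated fields are the date and the time.
        date_part, time_part = line.split()[:2]
        stamps.append(datetime.strptime(f"{date_part} {time_part}", TS_FORMAT))
    return stamps


def largest_gap(stamps: list[datetime]) -> tuple[int, float]:
    """Return (index, seconds) of the largest gap between consecutive entries.

    The index refers to the entry *before* the gap.
    """
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    idx = max(range(len(gaps)), key=gaps.__getitem__)
    return idx, gaps[idx]


if __name__ == "__main__":
    stamps = parse_timestamps(LOG)
    idx, secs = largest_gap(stamps)
    print(f"largest gap: {secs:.3f} s after log entry #{idx}")
```

For this excerpt the largest gap, about 289 s, lies between the JVM being given the task at 14:28:03 and the next logged line at 14:32:52. That suggests the time is lost after the task is handed to the child JVM rather than while the TaskTracker is queuing the launch.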


From: Arun C Murthy <acm@hortonworks.com>
To: mapreduce-user@hadoop.apache.org; Iman E <hadoop_ami@yahoo.com>
Sent: Friday, August 5, 2011 2:05 PM
Subject: Re: Performance of mappers

Which release of Hadoop are you running?

What do the logs on the TaskTracker tell you during the time the slow tasks are getting launched?

hadoop-0.20.203 has a ton of bug fixes since hadoop-0.20.2, including fixes for slow task
launches; you might want to upgrade.


On Aug 5, 2011, at 11:02 AM, Iman E wrote:

Hello all,
I have a question regarding the mappers. I can see from the logs that the start time of a
mapper differs from the time it starts logging. This is a problem because the difference
is sometimes a few seconds, but other times it is

For example, one mapper is supposed to read 65 MB. Its start time is 12:30:53, whereas its
logging start time is 12:33:01 and its end time is 12:33:20. All the loaded data are local
to the same rack.
In a perfect run, the numbers are: start time 18:15:45, logging start time 18:15:48, and
end time 18:16:02.
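The launch delay can be separated from the actual processing time with simple arithmetic on the timestamps quoted above. A minimal Python sketch, assuming all three times for a run fall on the same day; seconds_between and breakdown are hypothetical helper names.

```python
from datetime import datetime

# (task start, logging start, task end) for the two mappers described above.
RUNS = {
    "slow": ("12:30:53", "12:33:01", "12:33:20"),
    "perfect": ("18:15:45", "18:15:48", "18:16:02"),
}


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end, both HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def breakdown(start: str, log_start: str, end: str) -> tuple[int, int, int]:
    """Split a task's wall time into (launch delay, processing, total)."""
    launch_delay = seconds_between(start, log_start)
    processing = seconds_between(log_start, end)
    return launch_delay, processing, launch_delay + processing


if __name__ == "__main__":
    for name, times in RUNS.items():
        delay, work, total = breakdown(*times)
        print(f"{name}: launch delay {delay} s, processing {work} s, total {total} s")
```

This gives a launch delay of 128 s versus 3 s, while processing takes 19 s versus 14 s, so nearly all of the extra time in the slow mapper is spent before it begins logging.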

I am running a job with more than 2400 mappers. Because of the above problem, instead of the
job taking 15-20 mins on 14 machines (as happened in a few runs), other times it takes
more than 70 mins. Any suggestions on how to fix this problem or what could possibly be causing
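To see how a per-task delay adds up across a job of this size, a rough wave estimate helps. This Python sketch assumes identical task durations and a hypothetical 2 map slots per node (the log above only shows 2 free slots at one moment, not the configured total); map_phase_estimate is an illustrative helper, and real TaskTrackers refill slots as tasks finish rather than running in strict waves.

```python
import math


def map_phase_estimate(num_tasks: int, nodes: int, slots_per_node: int,
                       seconds_per_task: float) -> tuple[int, float]:
    """Rough map-phase length assuming every task takes the same time.

    Returns (waves, minutes), where a wave is one round of tasks filling
    every map slot in the cluster. This is an approximation only.
    """
    total_slots = nodes * slots_per_node
    waves = math.ceil(num_tasks / total_slots)
    return waves, waves * seconds_per_task / 60


if __name__ == "__main__":
    # HYPOTHETICAL: 2 map slots per node; per-task times from the two mappers above.
    for label, per_task in (("fast launch", 17), ("slow launch", 147)):
        waves, minutes = map_phase_estimate(2400, 14, 2, per_task)
        print(f"{label}: {waves} waves, about {minutes:.0f} min")
```

Under these assumptions, 2400 mappers need 86 waves, so a fixed per-task launch delay is paid roughly 86 times in sequence; a delay of about two minutes per task would dominate the job's run time. Substitute the cluster's actual slot count and measured task times before drawing conclusions.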
