hadoop-user mailing list archives

From Jon Allen <jayaye...@gmail.com>
Subject Re: Hadoop 1.0.4 Performance Problem
Date Mon, 26 Nov 2012 21:49:13 GMT
Problem solved, but worth warning others about.

Before the upgrade the reducers for the terasort process had been evenly
distributed around the cluster - one per task tracker in turn, looping
around the cluster until all tasks were allocated.  After the upgrade all
reduce tasks were submitted to a small number of task trackers - tasks were
submitted until a task tracker's slots were full before moving on to the
next task tracker.  Skewing the reducers like this quite clearly hit the
benchmark performance.

The reason for this turns out to be the fair scheduler rewrite
(MAPREDUCE-2981), which appears to have subtly modified the behaviour of
the assign-multiple property. Previously this property caused a single map
and a single reduce task to be allocated per task tracker heartbeat (rather
than the default of a map or a reduce).  After the upgrade it allocates as
many tasks as there are available task slots.  Turning off the multiple
assignment feature returned the terasort to its pre-upgrade performance.
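
For anyone hitting the same thing, the switch we flipped is the fair
scheduler's assign-multiple setting in mapred-site.xml. If I've got the
property name right it's mapred.fairscheduler.assignmultiple - treat the
snippet below as a sketch to check against your own version's docs rather
than gospel:

  <property>
    <name>mapred.fairscheduler.assignmultiple</name>
    <value>false</value>
    <description>With this set to false the fair scheduler hands out only
    a single task (a map or a reduce) per task tracker heartbeat, which in
    our case restored the even spread of reduce tasks across the
    cluster.</description>
  </property>

If I'm reading the scheduler docs right there are also
mapred.fairscheduler.assignmultiple.maps / .reduces caps that might let you
keep the feature but bound how many tasks go out per heartbeat - I haven't
tried those, so verify before relying on them.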

I can see potential benefits to this change and need to think through the
consequences for real world applications (though in practice we're likely to
move away from the fair scheduler due to MAPREDUCE-4451).  Investigating this
has been a pain, so to warn other users: is there anywhere central that can
be used to record upgrade gotchas like this?


On Fri, Nov 23, 2012 at 12:02 PM, Jon Allen <jayayedev@gmail.com> wrote:

> Hi,
>
> We've just upgraded our cluster from Hadoop 0.20.203 to 1.0.4 and have hit
> performance problems.  Before the upgrade a 15TB terasort took about 45
> minutes, afterwards it takes just over an hour.  Looking in more detail it
> appears the shuffle phase has increased from 20 minutes to 40 minutes.
>  Does anyone have any thoughts about what's changed between these releases
> that may have caused this?
>
> The only change to the system has been to Hadoop.  We moved from a tarball
> install of 0.20.203 with all processes running as hadoop to an RPM
> deployment of 1.0.4 with processes running as hdfs and mapred.  Nothing
> else has changed.
>
> As a related question, we're still running with a configuration that was
> tuned for version 0.20.1. Are there any recommendations for tuning
> properties that have been introduced in recent versions that are worth
> investigating?
>
> Thanks,
> Jon
