hadoop-user mailing list archives

From Jon Allen <jayaye...@gmail.com>
Subject Hadoop 1.0.4 Performance Problem
Date Fri, 23 Nov 2012 12:02:25 GMT

We've just upgraded our cluster from Hadoop 0.20.203 to 1.0.4 and have hit performance problems.
Before the upgrade, a 15TB terasort took about 45 minutes; it now takes just over an hour.
Looking in more detail, the shuffle phase appears to have gone from 20 minutes to 40 minutes.
Does anyone have any thoughts about what changed between these releases that may have caused
this?

The only change to the system has been to Hadoop.  We moved from a tarball install of 0.20.203
with all processes running as hadoop to an RPM deployment of 1.0.4 with processes running
as hdfs and mapred.  Nothing else has changed.

As a related question, we're still running with a configuration that was tuned for version
0.20.1. Are there any tuning properties introduced in more recent versions that are worth
investigating?
