hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Reduce shuffle data transfer takes excessively long
Date Tue, 31 Jan 2012 21:22:04 GMT
If just changing the buffer to 4k makes a big difference could you at a minimum file a JIRA
to change that buffer size?  I know that it is not a final fix but it sure seems like a very
nice Band-Aid to put on until we can get to the root of the issues.

--Bobby Evans

On 1/27/12 9:23 PM, "Sven Groot" <sgroot@gmail.com> wrote:

Hi Nick,

Thanks for your reply. I don't think what you are saying is related, as the problem happens
when the data is transferred; it's not deserialized or anything during that step. Note that
my code isn't involved at all: it's purely Hadoop's own code that's running here.

I have done additional work in trying to find the cause, and it's definitely in Jetty. I have
created a simple test with Jetty that transfers a file in a manner similar to Hadoop, and
it shows the same behavior. It appears to be linked to the buffer size used by Jetty for chunked
transfer encoding. Hadoop uses a hardcoded buffer of 64KB for that, which exhibits the problem.
If I change the buffer to 4KB, Jetty's transfer speed increases by an order of magnitude.
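The shape of that experiment can be sketched with only the JDK. This is not Hadoop's MapOutputServlet and not Jetty (the built-in com.sun.net.httpserver stands in for it, so it will not necessarily reproduce the Jetty-specific slowdown); the `/mapOutput` path and the payload/buffer sizes here are made up for illustration. Passing a response length of 0 is what forces chunked transfer encoding, and the server-side buffer size is the variable under test:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;

public class ChunkedBufferDemo {

    // Serve `payload` bytes with chunked encoding, writing `serverBuffer`
    // bytes per write() call, then read everything back over HTTP.
    // Returns the number of bytes the client received.
    static long transfer(final int serverBuffer, final int payload) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/mapOutput", new HttpHandler() {
            public void handle(HttpExchange exchange) throws IOException {
                // Response length 0 => HTTP chunked transfer encoding.
                exchange.sendResponseHeaders(200, 0);
                OutputStream out = exchange.getResponseBody();
                byte[] chunk = new byte[serverBuffer];
                int remaining = payload;
                while (remaining > 0) {
                    int n = Math.min(chunk.length, remaining);
                    out.write(chunk, 0, n); // one server-side buffer per write
                    remaining -= n;
                }
                out.close();
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/mapOutput");
            URLConnection conn = url.openConnection();
            InputStream in = conn.getInputStream();
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            int read;
            while ((read = in.read(buf)) != -1) {
                total += read;
            }
            in.close();
            return total;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        int payload = 16 * 1024 * 1024; // small stand-in for a 256MB map output
        for (int serverBuffer : new int[] {64 * 1024, 4 * 1024}) {
            long start = System.nanoTime();
            long received = transfer(serverBuffer, payload);
            System.out.printf("buffer=%dKB received=%d bytes in %.1f ms%n",
                    serverBuffer / 1024, received, (System.nanoTime() - start) / 1e6);
        }
    }
}
```

Timing the same loop against a real Jetty instance, with only the buffer size changed, is what isolated the 64KB-vs-4KB difference.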

I have posted a question on StackOverflow regarding this behavior in Jetty: http://stackoverflow.com/questions/9031311/slow-transfers-in-jetty-with-chunked-transfer-encoding-at-certain-buffer-size.
So far, there are no answers posted.

I've always found it a strange decision that Hadoop uses HTTP to transfer intermediate data.
Let's just say that this issue reinforces that opinion.


From: Nussbaum, Nick [mailto:Nick.Nussbaum@fticonsulting.com]
Sent: Saturday, 28 January 2012 3:52
To: sgroot@gmail.com
Subject: FW: Reduce shuffle data transfer takes excessively long

I'm no expert on Hadoop, but I have already encountered a surprising gotcha that may be your problem.

If you repeatedly use a function like String.getBytes() (<http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#getBytes%28%29>)
that needs to know the default OS character set, it can take a surprisingly long time. I speculate
this is due to having to go through hoops in various sandboxes to read the OS default locale.

If that is the case, getting the system locale and character set once and specifying it explicitly
in the call to getBytes() (or whichever function you use) should make a big difference.

Let me know if it works for you.


From: Sven Groot [mailto:sgroot@gmail.com]
Sent: Thursday, January 26, 2012 10:25 PM
To: mapreduce-user@hadoop.apache.org
Subject: Reduce shuffle data transfer takes excessively long


I have been working on profiling the performance of certain parts of Hadoop. For
this reason, I have set up a simple cluster that uses one node as the NameNode/JobTracker,
and one node as the sole DataNode/TaskTracker.

In this experiment, I run a job consisting of a single map task and a single reduce task.
Both are simply using the default Mapper/Reducer implementations (the identity functions).
The input of the job is a file with a single 256MB block. Therefore, the output of the map
task is 256MB, and the reduce task must shuffle that 256MB from the local host.

To my surprise, shuffling this amount of data takes around 9 seconds, which is excessively
slow. First I turned my attention to the ReduceTask.ReduceOutputCopier. I determined that
about 1.1 seconds is spent calculating checksums (this is the expected value), and the remaining
time is spent reading from the stream returned by URLConnection.getInputStream(). Some simple
tests with URLConnection could not reproduce the issue unless they were actually reading
from the TaskTracker's MapOutputServlet, so the problem seemed to be on the server side. Reading
the same amount of data from any other local web server takes only 0.2s.
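The ~1.1s checksum figure is easy to sanity-check in isolation, since Hadoop's shuffle checksums are CRC32-based. This is a rough stdlib sketch, not Hadoop's own checksum code (which differs in detail), and the buffer size is arbitrary:

```java
import java.util.zip.CRC32;

public class ChecksumCost {
    // CRC32 over a byte array, roughly the per-chunk cost the reduce-side
    // copier pays while verifying map output (sketch only).
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[64 * 1024 * 1024]; // 64MB stand-in for map output
        long start = System.nanoTime();
        long value = crcOf(data);
        System.out.printf("crc=%d over %d bytes in %.1f ms%n",
                value, data.length, (System.nanoTime() - start) / 1e6);
    }
}
```

Scaling the measured per-MB cost up to 256MB gives a number on the order of a second, consistent with the 1.1s observed.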

I inserted some measurements into the MapOutputServlet and determined that 0.1s was spent
reading the intermediate file (unsurprising, as it was still in the page cache) and 7.7s was
spent writing to the stream returned by response.getOutputStream(). The slowdown therefore
appears to be in Jetty.

CPU usage during the transfer appears to be low, so it feels like the transfer is getting
throttled somehow. But if that's the case I can't figure out how that's happening. There's
nothing in the source code to lead me to believe Hadoop is deliberately throttling anything,
and as far as I know Jetty doesn't throttle by default.

I was seeing some warnings in the tasktracker log file related to this: http://wiki.eclipse.org/Jetty/Feature/JVM_NIO_Bug
However, running Hadoop under Java 7 made those warnings disappear and the transfer is still
slow, so I don't think that's it.

I'm out of ideas as to what could be causing this. Any insights?


