hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Doubt from the book "Definitive Guide"
Date Thu, 05 Apr 2012 03:42:29 GMT
Hi Mohit,

On Thu, Apr 5, 2012 at 5:26 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> I am going through the chapter "How mapreduce works" and have some
> confusion:
> 1) The description of the Mapper below says that reducers get the output
> file using an HTTP call. But the description under "The Reduce Side"
> doesn't specifically say whether it's copied using HTTP. So, first
> confusion: is the output copied from mapper -> reducer or from reducer ->
> mapper? And second, is the call http:// or hdfs://?

The flow is as simple as this:
1. For a map+reduce job, a map task completes after writing all of its
partitions down to the tasktracker's local filesystem (under
mapred.local.dir).
2. Reducers fetch map-completion locations from events at the JobTracker,
and ask the TaskTracker on each such node for the specific partition they
need, which is served over the TaskTracker's HTTP service (port 50060);
see the sketch below.
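
To make this concrete, here is a rough sketch of what a reduce-side fetch
amounts to. This is not the actual ReduceTask copier code, and the host,
job ID, and attempt ID below are made-up placeholders; the URL just follows
the general shape of the TaskTracker's map-output servlet:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ShuffleFetchSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical values -- in a real job these come from the
        // task-completion events the reducer polls from the JobTracker.
        String ttHost = "tt-node-01";
        String jobId = "job_201204050001_0001";
        String mapId = "attempt_201204050001_0001_m_000003_0";
        int partition = 2; // this reducer's partition number

        // The TaskTracker's HTTP service (default port 50060) serves the
        // map output; the query names the job, map attempt, and partition.
        URL url = new URL("http://" + ttHost + ":50060/mapOutput"
                + "?job=" + jobId + "&map=" + mapId + "&reduce=" + partition);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n; // a real fetcher would feed this into the merge
            }
            System.out.println("Fetched " + total + " bytes of map output");
        } finally {
            conn.disconnect();
        }
    }
}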

So, to clear things up: the map task doesn't send its output to the reduce
task, nor does the reduce task ask the actual map task for it. It is the
TaskTracker itself that bridges the two.

Note, however, that in Hadoop 2.0 the transfer via the ShuffleHandler is
done over Netty connections, which should be much faster.
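
(For reference, in Hadoop 2 the shuffle is served by an auxiliary service
plugged into the NodeManager rather than by a TaskTracker servlet. A
typical yarn-site.xml stanza is sketched below; note the exact service
name has varied across 2.x releases, e.g. mapreduce.shuffle in early
alphas vs. mapreduce_shuffle later, so check your release's docs.)

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>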

> 2) My understanding was that mapper output gets written to HDFS, since
> I've seen part-m-00000 files in HDFS. If mapper output is written to HDFS,
> then shouldn't reducers simply read it from HDFS instead of making HTTP
> calls to the tasktrackers' locations?

A map-only job usually writes its output to HDFS directly (no sorting is
done, since no reducer is involved). If the job is a map+reduce one, the
map output is instead collected on the local filesystem for partitioning
and sorting at the map end, and eventually grouping at the reduce end.
Basically: data you want to send from mappers to reducers goes to the
local FS, where multiple actions have to be performed on it; other data
may go directly to HDFS.
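
For instance, here is a minimal driver sketch of the map-only case (the
class name and path handling are illustrative, not from any particular
example): setting the reduce count to zero makes map output land directly
in HDFS as part-m-* files, while any non-zero count routes it through the
local-disk shuffle first.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "map-only-example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(Mapper.class); // identity mapper, for illustration
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Zero reducers: map output goes straight to HDFS as
        // part-m-00000, part-m-00001, ... with no sort or shuffle.
        // Any value > 0 would instead spill map output to
        // mapred.local.dir for partitioning/sorting before reducers
        // pull it over HTTP.
        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}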

Reducers are currently scheduled pretty randomly, and yes, their scheduling
can be improved for certain scenarios. However, if you are suggesting that
map partitions ought to be written to HDFS itself (with or without
replication), I don't see performance improving. Note that the partitions
aren't merely written but need to be sorted as well (at either end). Doing
that requires the ability to spill frequently (since we don't have infinite
memory to do it all in RAM), and doing such a thing on HDFS would only mean
a slowdown.
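
That spill behavior is tunable, by the way. As a rough sketch (the
property names are the Hadoop 1.x ones; the values shown are just
examples, not recommendations):

import org.apache.hadoop.mapred.JobConf;

public class SpillTuningSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // In-memory buffer the map side sorts into before spilling
        // to local disk (default is 100 MB in Hadoop 1.x).
        conf.setInt("io.sort.mb", 200);
        // Fraction of that buffer that triggers a background spill.
        conf.setFloat("io.sort.spill.percent", 0.80f);
        // How many spill files/streams get merged at once on disk.
        conf.setInt("io.sort.factor", 10);
        System.out.println("sort buffer = " + conf.get("io.sort.mb") + " MB");
    }
}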

I hope this helps clear some things up for you.

Harsh J
