hadoop-common-user mailing list archives

From bharath vissapragada <bharathvissapragada1...@gmail.com>
Subject Re: MR job scheduler
Date Fri, 21 Aug 2009 16:57:45 GMT
I discussed the same doubt on the HBase forums. I am pasting the reply I got
(for those who aren't subscribed to that list).

Regarding optimizing the reduce phase (similar to what Harish was pointing out),
I got the following reply from Ryan:

"I think people are confused about how optimal map reduces have to be.
Keeping all the data super-local on each machine is not always helping
you, since you have to read via a socket anyways. Going remote doesn't
actually make things that much slower, since on a modern LAN ping
times are < 0.1ms.  If your entire cluster is hanging off a single
switch, there is nearly unlimited bandwidth between all nodes
(certainly much higher than any single system could push). Only once
you go multi-switch does switch-locality (aka rack locality) become a factor.

Remember, hadoop isn't about the instantaneous speed of any job, but
about running jobs in a highly scalable manner that works on tens or
tens of thousands of nodes. You end up blocking on single machine
limits anyways, and the r=3 of HDFS helps you transcend a single
machine read speed for large files. Keeping the data transfer local in
this case results in lower performance."

Just FYI!

On Fri, Aug 21, 2009 at 1:43 PM, Harish Mallipeddi <
harish.mallipeddi@gmail.com> wrote:

> On Fri, Aug 21, 2009 at 12:11 PM, bharath vissapragada <
> bharathvissapragada1990@gmail.com> wrote:
> > Yes, my doubt is how the location of the reducer is selected. Is it
> > selected arbitrarily, or is it selected on a particular machine that
> > already has more of the values corresponding to that reducer's key,
> > which would reduce the cost of transferring data across the network
> > (because many values for that key are already on the machine where the
> > map phase completed)?
> >


> I think what you're asking for is whether a ReduceTask is scheduled on a
> node which has the largest partition among all the mapoutput partitions
> (p1-pN) that the ReduceTask has to fetch in order to do its job. The answer
> is "no" - the ReduceTasks are assigned arbitrarily (no such optimization is
> done and I think this can really be an optimization only if 1 of your
> partitions is heavily skewed for some reason). Also as Amogh pointed out,
> the ReduceTasks start fetching their map-output partitions (shuffle phase)
> as and when they hear about completed ones. So it would not be possible to
> schedule ReduceTasks only on nodes with the largest partitions.
> --
> Harish Mallipeddi
> http://blog.poundbang.in
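
To make the partition discussion above concrete: each map output key is assigned to one of the N reduce partitions (p1-pN) by the job's partitioner. Below is a minimal standalone sketch of the arithmetic used by Hadoop's default HashPartitioner (mask off the sign bit of the key's hash, then take the modulo); it does not depend on the Hadoop jars, and the class name here is my own.

```java
// Standalone sketch of Hadoop's default HashPartitioner logic.
// Mirrors the formula in
// org.apache.hadoop.mapreduce.lib.partition.HashPartitioner,
// without depending on the Hadoop libraries.
public class PartitionSketch {

    // Mask off the sign bit so the result is non-negative,
    // then take the remainder modulo the number of reducers.
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String k : keys) {
            System.out.println(k + " -> partition " + getPartition(k, reducers));
        }
    }
}
```

Because identical keys always hash to the same partition, one "hot" key can make a single partition much larger than the rest, which is the skew scenario Harish mentions: only in that case could scheduling the ReduceTask on the node holding the largest partition plausibly pay off.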
