hadoop-mapreduce-user mailing list archives

From Kevin Burton <burtona...@gmail.com>
Subject Re: Performance of direct vs indirect shuffling
Date Wed, 21 Dec 2011 02:12:11 GMT
> We've discussed 'push' v/s 'pull' shuffle multiple times and each time
> turned away due to complexities in MR1. With MRv2 (YARN) this would be much
> more doable.
Ah.... gotcha. This is what I expected as well.  It would be interesting to
see a list of changes like this in MR1 vs MR2 to see what could POTENTIALLY
happen in the future should everyone get spare time.

> So, to really not do any disk i/o during the shuffle you'd need very large
> amounts of RAM...
Why is that?  I'm not suggesting buffering it *all* but send it directly
when it is generated.

I think there should be a SMALL amount of buffer for combining and
compressing the data, though.  Normally something like 250-500MB per mapper,
but that is when running, say, a 250GB job, so this buffer is just there to
reduce the IO sent to the remote node.
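
As a rough illustration of that kind of bounded buffer, here is a minimal Python sketch. Everything in it is hypothetical (the `CombiningBuffer` name, the `send` callback, the crude size estimate, the JSON plus zlib encoding); it is not Hadoop code, only one way a mapper could combine values per key and push a compressed chunk once a small budget is exceeded.

```python
import json
import zlib


class CombiningBuffer:
    """Hypothetical map-side buffer: combine per key, push when over budget."""

    def __init__(self, max_bytes, send, combine=lambda a, b: a + b):
        self.max_bytes = max_bytes  # small budget, not the whole map output
        self.send = send            # assumed callback that ships bytes to a reducer
        self.combine = combine      # combiner applied to values of the same key
        self.records = {}
        self.approx_bytes = 0

    def emit(self, key, value):
        # Keys are assumed to be strings so the chunk can be JSON-encoded.
        if key in self.records:
            self.records[key] = self.combine(self.records[key], value)
        else:
            self.records[key] = value
            self.approx_bytes += len(key) + 8  # crude per-record size estimate
        if self.approx_bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        # Compress the combined chunk and push it; nothing is spilled to local disk.
        if self.records:
            payload = zlib.compress(json.dumps(self.records).encode("utf-8"))
            self.send(payload)
            self.records = {}
            self.approx_bytes = 0
```

A mapper would call `emit` for each record and `flush` once at the end; the budget only bounds how much is held before a chunk goes over the wire.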

> Also, currently, the shuffle is effected by the reduce task. This has two
> major benefits :
> # The 'pull' can only be initiated after the reduce is scheduled. The
> 'push' model would be hampered if the reduce hasn't been started.

I've gone over this problem a number of times.  The way I'm handling it is
that every map attempt is recorded and only successful maps actually have
their data reduced.  You end up having MORE intermediate data if machines
are crashing, but it's only CHUNK_SIZE per crash, and even in LARGE
clusters with lots of crashes this won't end up being a significant
percentage of the data.
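
A minimal Python sketch of that bookkeeping, under assumptions of my own: the `AttemptLedger` class and its method names are hypothetical, chunks are keyed by (map id, attempt number), and the first attempt reported as successful for a map is the one whose data gets reduced. Chunks pushed by crashed attempts are simply never read.

```python
from collections import defaultdict


class AttemptLedger:
    """Hypothetical receiver-side record of pushed chunks per map attempt."""

    def __init__(self):
        self.chunks = defaultdict(list)  # (map_id, attempt) -> chunks received
        self.committed = {}              # map_id -> attempt that succeeded

    def receive(self, map_id, attempt, chunk):
        # Every attempt is recorded, whether or not it later succeeds.
        self.chunks[(map_id, attempt)].append(chunk)

    def commit(self, map_id, attempt):
        # Assumption: the first successful attempt for a map wins.
        self.committed.setdefault(map_id, attempt)

    def reducible_chunks(self):
        # Only data from successful attempts is handed to the reduce.
        for map_id, attempt in sorted(self.committed.items()):
            yield from self.chunks.get((map_id, attempt), [])

    def wasted_chunks(self):
        # Chunks left behind by attempts that did not win (e.g. crashed maps).
        return sum(
            len(chunks)
            for (map_id, attempt), chunks in self.chunks.items()
            if self.committed.get(map_id) != attempt
        )
```

`wasted_chunks` is the extra intermediate data described above: it grows by roughly one in-flight chunk per crashed attempt.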

> # The 'pull' is more resilient to failure of a single reduce. In the push
> model, it's harder to deal with a reduce failing after a push from the map.
I don't see how this would be the case ... I'm replicating all the shuffle
data ... so if a reducer crashes I just start up a new one.

There IS the problem of whether we replicate the intermediate data from the
reducer but this can be a configuration option...

> Again, with MR2 we could experiment with push v/s pull where it makes
> sense (small jobs etc.). I'd love to mentor/help someone interested in
> putting cycles into it.
I'm going to be doing a ton of work in this area and I'll publish it if I
come across anything interesting.
