hadoop-user mailing list archives

From Minh Duc Nguyen <mdngu...@gmail.com>
Subject Re: How to reduce total shuffle time
Date Tue, 28 Aug 2012 16:06:55 GMT
Without knowing your exact workload, using a Combiner (if possible) as
Tsuyoshi recommended should decrease your total shuffle time.  You can also
try compressing the map output so that there's less disk and network IO.
 Here's an example configuration using Snappy:

conf.set("mapred.compress.map.output","true");
conf.set("mapred.map.output.compression.codec","org.apache.hadoop.io.compress.SnappyCodec");

HTH,
Minh

On Tue, Aug 28, 2012 at 4:37 AM, Tsuyoshi OZAWA <ozawa.tsuyoshi@lab.ntt.co.jp> wrote:

> It depends on the workload. Could you tell us more details about
> your job? In the general case where the reducers are the bottleneck,
> there are some tuning techniques, as follows:
> 1. Allocate more memory to the reducers. This decreases reducer-side
> disk IO while merging and while running the reduce function (see the
> configuration sketch after this list).
> 2. Use a combine function, which enables map-side aggregation, if
> your MR job consists of operations that satisfy both the commutative
> and the associative laws (see the word-count sketch below).
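>
> As a rough sketch of (1) using the 1.x-era property names (the values below
> are only placeholders to tune for your own cluster and heap sizes):
>
> conf.set("mapred.child.java.opts", "-Xmx1024m");             // larger task heap
> conf.set("mapred.job.shuffle.input.buffer.percent", "0.70"); // heap fraction for shuffle buffers
> conf.set("mapred.job.reduce.input.buffer.percent", "0.50");  // retain some map output in memory during reduce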
>
> See also about combine functions:
> http://wiki.apache.org/hadoop/HadoopMapReduce
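>
> And as a minimal sketch of (2), a word-count-style job (the class names and
> input/output paths here are only placeholders): because summing counts is
> commutative and associative, the same class can serve as both the combiner
> and the reducer, so partial sums are computed on the map side and far less
> data is shuffled.
>
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> public class WordCountWithCombiner {
>
>   public static class TokenMapper
>       extends Mapper<LongWritable, Text, Text, IntWritable> {
>     private static final IntWritable ONE = new IntWritable(1);
>     private final Text word = new Text();
>     @Override
>     protected void map(LongWritable offset, Text line, Context context)
>         throws IOException, InterruptedException {
>       for (String token : line.toString().split("\\s+")) {
>         if (!token.isEmpty()) {
>           word.set(token);
>           context.write(word, ONE);   // one record per token
>         }
>       }
>     }
>   }
>
>   // Summing is commutative and associative, so the same class works
>   // as the combiner (map-side partial sums) and as the final reducer.
>   public static class SumReducer
>       extends Reducer<Text, IntWritable, Text, IntWritable> {
>     private final IntWritable total = new IntWritable();
>     @Override
>     protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
>         throws IOException, InterruptedException {
>       int sum = 0;
>       for (IntWritable c : counts) {
>         sum += c.get();
>       }
>       total.set(sum);
>       context.write(word, total);
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Job job = new Job(conf, "word count with combiner");
>     job.setJarByClass(WordCountWithCombiner.class);
>     job.setMapperClass(TokenMapper.class);
>     job.setCombinerClass(SumReducer.class);   // map-side aggregation shrinks the shuffle
>     job.setReducerClass(SumReducer.class);
>     job.setOutputKeyClass(Text.class);
>     job.setOutputValueClass(IntWritable.class);
>     FileInputFormat.addInputPath(job, new Path(args[0]));
>     FileOutputFormat.setOutputPath(job, new Path(args[1]));
>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>   }
> }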
>
> Tsuyoshi
>
> On Tuesday, August 28, 2012, Gaurav Dasgupta wrote:
> >
> > Hi,
> >
> > I have run some large and small jobs and calculated the Total Shuffle
> > Time for each job. I can see that the Total Shuffle Time is almost half
> > of the total time taken by the full job to complete.
> >
> > My question here is: how can we decrease the Total Shuffle Time? And in
> > doing so, what effect will it have on the job?
> >
> > Thanks,
> > Gaurav Dasgupta
>
