hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: Best practices for hadoop shuffling/tuning ?
Date Tue, 31 Jan 2012 21:31:22 GMT
Moving to mapreduce-user@, bcc common-user@. Please use project-specific lists.

Your io.sort.mb is too high: the sort buffer is allocated inside the map task's
JVM heap, and you only have 1 GB of heap per map. mapred.reduce.parallel.copies
is too high as well.
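
As a quick sanity check of that point, the arithmetic can be sketched as follows. This is illustrative only: the variable names are not Hadoop APIs, and the numbers are copied from the configuration quoted below.

```python
# Sanity-check the memory arithmetic: io.sort.mb is a buffer allocated
# inside the map task's JVM heap, so it must be well below the heap size.
# Numbers below are taken from the configuration quoted in this thread.

map_heap_mb = 1024      # map task heap (1024 MB)
io_sort_mb = 2000       # io.sort.mb
reduce_heap_mb = 1024   # reduce task heap (1024 MB)
map_slots = 24          # mapred.tasktracker.map.tasks.maximum
reduce_slots = 12       # mapred.tasktracker.reduce.tasks.maximum
node_ram_mb = 48 * 1024 # 48 GB per node

# A 2000 MB sort buffer cannot fit in a 1024 MB heap.
print("sort buffer fits in map heap:", io_sort_mb < map_heap_mb)  # False

# Worst case with every slot busy: task heaps alone, before the
# TaskTracker, DataNode, and OS page cache get any memory.
peak_heap_mb = map_slots * map_heap_mb + reduce_slots * reduce_heap_mb
print("peak task heap demand: %d MB of %d MB" % (peak_heap_mb, node_ram_mb))
```

With these settings each map task would try to allocate a sort buffer roughly twice the size of its own heap, which forces constant spilling at best and OutOfMemoryError at worst.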

On Jan 30, 2012, at 4:50 AM, praveenesh kumar wrote:

> Hey guys,
> 
> Just wanted to ask: are there any best practices to follow for improving
> Hadoop shuffle performance?
> 
> I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24
> cores/CPUs and 48 GB RAM.
> 
> I have set the following parameters:
> 
> fs.inmemory.size.mb=2000
> io.sort.mb=2000
> io.sort.factor=200
> io.file.buffer.size=262544
> 
> mapred.map.tasks=200
> mapred.reduce.tasks=40
> mapred.reduce.parallel.copies=80
> mapred.map.child.java.opts = 1024 MB
> mapred.reduce.child.java.opts = 1024 MB
> 
> mapred.job.tracker.handler.count=60
> tasktracker.http.threads=50
> mapred.job.reuse.jvm.num.tasks = -1
> mapred.compress.map.output = true
> mapred.reduce.slowstart.completed.maps = 0.5
> 
> mapred.tasktracker.map.tasks.maximum=24
> mapred.tasktracker.reduce.tasks.maximum=12
> 
> 
> Can anyone please validate the above tuning parameters and suggest any
> further improvements?
> My mappers run fine, but the shuffle and reduce phases are slower than
> expected for normal jobs. I want to know what I am doing wrong or
> missing.
> 
> Thanks,
> Praveenesh
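
Concretely, settings consistent with the advice above might look like the following mapred-site.xml fragment. The values are illustrative assumptions for this node layout (1 GB map heaps, 24 cores, 48 GB RAM), not tested recommendations, and should be validated against the actual workload:

```
<!-- Example only: sort buffer sized to fit comfortably inside a 1 GB map heap -->
<property>
  <name>io.sort.mb</name>
  <value>512</value>
</property>
<!-- Example only: fewer parallel fetch threads per reducer -->
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
```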

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


