hadoop-hdfs-user mailing list archives

From Felix Chern <idry...@gmail.com>
Subject Re: Ideal number of mappers and reducers to increase performance
Date Mon, 04 Aug 2014 17:10:13 GMT
The right number of mappers and reducers really depends on what your program is trying to do. Without
your actual query it’s really difficult to tell why you are having this problem.

For example, if you perform a global sum or count, cascalog will use only one reducer,
since that is the only way to compute a global sum/count. To avoid this, you can choose an
output key that spreads the work across reducers; e.g. the word count example uses the word as the
output key. You can then sum up the word count output serially, or run the global map
reduce job on this much smaller input.
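
To illustrate the point about output keys, here is a minimal Python sketch. It is a simulation, not cascalog or Hadoop code: run_reducers is a hypothetical helper that assigns each (key, value) pair to a reducer by hashing the key, and the sample words and the reducer count of 4 are made up for the example.

```python
from collections import defaultdict

def run_reducers(pairs, num_reducers):
    """Hypothetical stand-in for the shuffle: route each (key, value) pair
    to a reducer bucket by hash of the key, then sum the values per key."""
    buckets = [defaultdict(int) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash(key) % num_reducers][key] += value
    return buckets

words = "a b a c b a d".split()

# Global count: every record carries the same key, so exactly one
# reducer receives all the data no matter how many reducers exist.
global_buckets = run_reducers([("all", 1) for _ in words], num_reducers=4)
busy_reducers = sum(1 for bucket in global_buckets if bucket)

# Word count: the word is the output key, so records can be spread
# over several reducers (how many depends on the hash values).
keyed_buckets = run_reducers([(w, 1) for w in words], num_reducers=4)
per_word = {k: v for bucket in keyed_buckets for k, v in bucket.items()}

# The per-word output is small, so the global total can be computed
# serially afterwards.
total = sum(per_word.values())

print(busy_reducers, per_word, total)
```

With a single shared key, only one reducer is ever busy; with the word as key, the final serial sum only has to touch one small record per distinct word.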

The number of mappers is usually not a performance bottleneck. In case you are curious: if the file is
splittable (i.e., uncompressed text or a sequence file), the number of mappers is controlled
by the split size in the configuration. The smaller the split size, the more mappers are
launched.
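
As a rough illustration of how split size relates to the mapper count, here is a small Python sketch. It is only an approximation: estimate_num_mappers is a hypothetical helper, the split sizes are assumed values, and the real Hadoop split logic has extra details (splits are computed per file, and minimum/maximum split size settings also apply). The 280MB figure is the data size from the original question.

```python
import math

MB = 1024 * 1024

def estimate_num_mappers(file_size_bytes, split_size_bytes):
    """Approximate mapper count for one splittable file:
    one mapper per split, rounding the last partial split up."""
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / split_size_bytes)

file_size = 280 * MB  # data size mentioned in the question

# Assumed split sizes, for illustration only.
estimates = {
    split_mb: estimate_num_mappers(file_size, split_mb * MB)
    for split_mb in (128, 64, 32, 16)
}

for split_mb, mappers in estimates.items():
    print(f"split size {split_mb} MB -> about {mappers} mappers")
```

Halving the split size roughly doubles the number of mappers, which only helps if there are enough free map slots and the job is actually limited by the map phase.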

In short, your problem is unlikely to be a configuration problem; more likely it is a misunderstanding
of the map reduce logic. To solve your problem, can you paste your cascalog query and let people
take a look?


On Aug 3, 2014, at 1:51 PM, Sindhu Hosamane <sindhuht@gmail.com> wrote:

> I am not coding in mapreduce. I am running my cascalog queries on a hadoop cluster (1 node)
> on data of size 280MB. So all the config settings have to be made on the hadoop cluster itself.
> As you said, I set the values of mapred.tasktracker.map.tasks.maximum = 4
> and mapred.tasktracker.reduce.tasks.maximum = 4
> and then kept tuning them up and down, like below:
> (4+4) (5+3) (6+2) (2+6) (3+5) (3+3) (10+10)
> But the performance remains the same every time.
> Whatever combination of mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
> I use, it produces the same execution time.
> When the above failed, I also tried mapred.reduce.tasks = 4;
> the results are still the same. No reduction in execution time.
> What other things should I set? Also, I made sure hadoop is restarted every time after
> changing the config.
> I have attached my conf folder. Please tell me what should be added where.
> I am really stuck. Your help would be much appreciated. Thank you.
> <(singlenodecuda)conf.zip>
> Regards,
> Sindhu
