hadoop-common-user mailing list archives

From Abhishek Pratap Singh <manu.i...@gmail.com>
Subject Re: Resource underutilization / final reduce tasks only uses half of cluster ( tasktracker map/reduce slots )
Date Mon, 14 May 2012 19:03:20 GMT
Hi JD,

How the reduce work is spread across the cluster depends on the keys that
come out once all the mappers are done. If every record has the same key,
then all the data goes to a single reduce task on one node. Likewise, how
many nodes the reduce phase can keep busy depends on the number of distinct
keys it has to process.
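
To illustrate, here is a minimal sketch, assuming the job uses the default
hash partitioning, where a key goes to reduce task
(hashCode & Integer.MAX_VALUE) % numReduceTasks. The class name, the sample
keys, and the count of 4 reduce tasks are made up for the example. It runs
without a cluster or any Hadoop jars.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Same arithmetic as the default hash partitioner: mask off the sign
    // bit, then take the remainder by the number of reduce tasks.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many records each reduce task would receive.
    public static int[] recordsPerReducer(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int reducers = 4; // hypothetical number of reduce tasks

        // One distinct key: every record lands on the same reduce task.
        List<String> skewed = Arrays.asList("a", "a", "a", "a");
        // Four distinct keys: the records spread over the reduce tasks.
        List<String> spread = Arrays.asList("a", "b", "c", "d");

        System.out.println(Arrays.toString(recordsPerReducer(skewed, reducers)));
        System.out.println(Arrays.toString(recordsPerReducer(spread, reducers)));
    }
}
```

With the single key, one reduce task gets all four records and the other
three sit idle. With four distinct keys, each task gets one record.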


Regards,
Abhishek

On Fri, May 11, 2012 at 4:57 PM, Jeremy Davis
<jdavis@upstreamsoftware.com> wrote:

>
> I see mapred.tasktracker.reduce.tasks.maximum and
> mapred.tasktracker.map.tasks.maximum, but I'm wondering if there isn't
> another tuning parameter I need to look at.
>
> I can tune the task tracker so that when I have many jobs running, with
> many simultaneous maps and reduces, I utilize 95% of CPU and memory.
>
> Inevitably, though, I end up with a huge final reduce task that only uses
> half of my cluster because I have reserved the other half for mapping.
>
> Is there a way around this problem?
>
> Seems like there should also be a maximum number of reducers conditional
> on no Map tasks running.
>
> -JD
