hadoop-common-user mailing list archives

From Nathan Marz <nat...@rapleaf.com>
Subject Re: Control over max map/reduce tasks per job
Date Tue, 03 Feb 2009 21:14:46 GMT
Another use case for per-job task limits is being able to use every  
core in the cluster on a map-only job.
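For reference, a minimal map-only driver, assuming the 0.19-era org.apache.hadoop.mapred API (the MapOnlyJob class name and the argument paths are placeholders): with zero reduces every task is a mapper, but how many of them run at once on a node is still fixed by the cluster-wide TaskTracker setting rather than by the job.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapOnlyJob {                                  // placeholder driver class
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapOnlyJob.class);
    conf.setJobName("map-only-example");
    conf.setNumReduceTasks(0);                             // map-only: mapper output is written directly
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Per-node concurrency is still capped by the TaskTracker's cluster-wide
    // mapred.tasktracker.map.tasks.maximum -- there is no per-job override today,
    // which is the limitation under discussion.
    JobClient.runJob(conf);
  }
}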



On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote:

> Chris,
>
> For my specific use cases, it would be best to be able to set N mappers/reducers
> per job per node (so I can explicitly say: run at most 2 of this CPU-bound task
> at a time on any given node). However, the other way would work as well (on a
> 10-node cluster, I would set the job to at most 20 tasks at a time globally),
> though it opens up the possibility that a node could be assigned more than 2 of
> that task.
>
> I would work with whichever is easiest to implement, as either would be a vast
> improvement for me (I could run high numbers of network-latency-bound tasks
> without fear of CPU-bound tasks killing the cluster).
>
> JG
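To make the two semantics concrete, here is a sketch of what a per-job knob might look like; both property names below are hypothetical and do not exist in Hadoop today.

import org.apache.hadoop.mapred.JobConf;

public class PerJobLimitSketch {
  public static void main(String[] args) {
    JobConf cpuBoundJob = new JobConf();
    // Per-node semantics: at most 2 of this job's tasks on any one TaskTracker.
    cpuBoundJob.setInt("mapred.job.max.maps.per.node", 2);   // hypothetical property
    // Global semantics: at most 20 of this job's tasks running cluster-wide,
    // but any single node could still be assigned more than 2 of them.
    cpuBoundJob.setInt("mapred.job.max.maps", 20);           // hypothetical property
  }
}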
>
>
>
>> -----Original Message-----
>> From: Chris K Wensel [mailto:chris@wensel.net]
>> Sent: Tuesday, February 03, 2009 11:34 AM
>> To: core-user@hadoop.apache.org
>> Subject: Re: Control over max map/reduce tasks per job
>>
>> Hey Jonathan,
>>
>> Are you looking to limit the total number of concurrent mappers/reducers a
>> single job can consume cluster-wide, or to limit the number per node?
>>
>> That is, you have X mappers/reducers, but only N of them may run at a time
>> globally for a given job.
>>
>> Or are you fine with all X running concurrently across the cluster, but want
>> to guarantee that no node runs more than N tasks from that job?
>>
>> Or both?
>>
>> Just reconciling the conversation we had last week with this thread.
>>
>> ckw
>>
>> On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:
>>
>>> All,
>>>
>>> I have a few relatively small clusters (5-20 nodes) and am having trouble
>>> keeping them loaded with my MR jobs.
>>>
>>> The primary issue is that my jobs have drastically different patterns. I have
>>> jobs that read/write to/from HBase or Hadoop with minimal logic (network
>>> throughput or IO bound), others that perform crawling (network latency bound),
>>> and one huge parsing streaming job (very CPU bound; each task eats a core).
>>>
>>> I'd like to launch very large numbers of tasks for the network-latency-bound
>>> jobs, but the large CPU-bound job means I have to keep the maximum maps allowed
>>> per node low enough not to starve the DataNode and RegionServer.
>>>
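For context, the only per-node cap available today is the TaskTracker slot count, which is read from the cluster configuration and applies to every job equally; a small sketch, assuming the standard property names and their shipped default of 2.

import org.apache.hadoop.conf.Configuration;

public class SlotSettings {
  public static void main(String[] args) {
    // Loads the cluster configuration (hadoop-site.xml on 0.19-era installs).
    Configuration conf = new Configuration();
    // Cluster-wide per-node ceilings; one CPU-bound job forces these down for all jobs.
    int maxMaps    = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
    int maxReduces = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
    System.out.println("map slots per node:    " + maxMaps);
    System.out.println("reduce slots per node: " + maxReduces);
  }
}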
>>> I'm an HBase dev, but I'm not familiar enough with the Hadoop MR code to know
>>> what would be involved in implementing this. However, in talking with other
>>> users, it seems like this would be a well-received option.
>>>
>>> I wanted to ping the list before filing an issue, because it seems like someone
>>> may have thought about this in the past.
>>>
>>> Thanks.
>>>
>>> Jonathan Gray
>>>
>>
>> --
>> Chris K Wensel
>> chris@wensel.net
>> http://www.cascading.org/
>> http://www.scaleunlimited.com/
>

