hadoop-common-user mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: Control over max map/reduce tasks per job
Date Tue, 03 Feb 2009 19:33:43 GMT
Hey Jonathan

Are you looking to limit the total number of concurrent mappers/reducers a single job can consume cluster-wide, or limit the number per node?

That is, you have X mappers/reducers, but can only allow N mappers/reducers to run at a time globally, for a given job.

Or, you are cool with all X running concurrently globally, but want to guarantee that no node can run more than N tasks from that job?

Or both?

Just reconciling the conversation we had last week with this thread.
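
For reference, the only per-node cap I'm aware of today is the TaskTracker setting below, which is read once at TaskTracker startup and applies across all jobs on that node (a sketch with illustrative values, not a per-job control):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

Since there's no per-job variant of these, a CPU-heavy job and a latency-bound job end up sharing the same node-wide limit.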


On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:

> All,
> I have a few relatively small clusters (5-20 nodes) and am having trouble keeping them loaded with my MR jobs.
> The primary issue is that I have different jobs with drastically different patterns.  I have jobs that read/write to/from HBase or Hadoop with minimal logic (network-throughput-bound or I/O-bound), others that perform crawling (network-latency-bound), and one huge parsing streaming job (very CPU-bound; each task eats a core).
> I'd like to launch very large numbers of tasks for the network-latency-bound jobs; however, the large CPU-bound job means I have to keep the max maps allowed per node low enough not to starve the DataNode and RegionServer.
> I'm an HBase dev but not familiar enough with the Hadoop MR code to know what would be involved in implementing this.  However, in talking with other users, it seems like this would be a well-received option.
> I wanted to ping the list before filing an issue because it seems someone may have thought about this in the past.
> Thanks.
> Jonathan Gray

Chris K Wensel
