hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: max 1 mapper per node
Date Thu, 10 May 2012 13:29:55 GMT
Yes, adding more resources to the scheduling request would be the ideal solution to the problem.
But sadly that is not a trivial change. The initial solution I suggested is an ugly hack,
and will not work for the cases you have described. If you feel that this is important work,
please feel free to file a JIRA for it. We can continue the discussion on that JIRA about
the details of how to add this type of functionality. I am very interested in the scheduler
and would be happy to help out, but sadly my time right now is very limited.

--Bobby Evans

On 5/10/12 6:56 AM, "Radim Kolar" <hsn@filez.com> wrote:

> We've been against these 'features' since it leads to very bad
> behaviour across the cluster with multiple apps/users etc.
It's not a new feature; it's an extension of the existing resource scheduling, which
works well enough only for RAM. There are two other resources, CPU cores
and network IO, which need to be considered.

We have a job which does a lot of network IO in the mapper, and it's desirable
to run its mappers on different nodes even if the block reads from HDFS will
not be local.

Our second job burns all CPU cores on a machine while doing
computations, so it's important for its mappers not to land on the same node.
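
To make the request above concrete, here is a minimal sketch of what a scheduling
request carrying more than RAM could look like. It is not Hadoop or YARN code: the
Resources class, the place function, the node names and all the numbers are
hypothetical, and the greedy first-fit policy is only an assumption chosen to keep
the example short. It shows that once CPU cores are part of the request, a mapper
that asks for every core on a machine cannot share a node with another such mapper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector (not a Hadoop/YARN class)."""
    ram_mb: int
    cpu_cores: int
    net_mbps: int

    def fits_in(self, free: "Resources") -> bool:
        # A request fits only if every dimension fits, not just RAM.
        return (self.ram_mb <= free.ram_mb
                and self.cpu_cores <= free.cpu_cores
                and self.net_mbps <= free.net_mbps)

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.ram_mb - other.ram_mb,
                         self.cpu_cores - other.cpu_cores,
                         self.net_mbps - other.net_mbps)


def place(requests, nodes):
    """Greedy first-fit: each request goes to the first node with enough
    free capacity in all dimensions, or None if no node can take it."""
    free = dict(nodes)  # copy so the caller's capacities are untouched
    placement = []
    for req in requests:
        chosen = None
        for name, avail in free.items():
            if req.fits_in(avail):
                chosen = name
                break
        if chosen is not None:
            free[chosen] = free[chosen].minus(req)
        placement.append(chosen)
    return placement


# Two identical hypothetical nodes: 8 GB RAM, 8 cores, 1000 Mbit/s.
nodes = {
    "node1": Resources(8192, 8, 1000),
    "node2": Resources(8192, 8, 1000),
}

# A CPU-bound mapper that needs little RAM but all 8 cores.
cpu_heavy = Resources(ram_mb=1024, cpu_cores=8, net_mbps=0)

result = place([cpu_heavy] * 3, nodes)
print(result)
```

With RAM as the only dimension, all three mappers would fit on node1. With cores
included, the first two land on different nodes and the third has to wait. The same
shape would cover the network-heavy job by requesting most of a node's net_mbps.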
