hadoop-common-user mailing list archives

From Craig Macdonald <cra...@dcs.gla.ac.uk>
Subject Re: HOD questions
Date Thu, 18 Dec 2008 16:42:35 GMT
> Just FYI, at Yahoo! we've set torque to allocate separate nodes for 
> the number specified to HOD. In other words, the number corresponds to 
> the number of nodes, not processors. This has proved simpler to 
> manage. I forget right now, but I think you can make Torque behave 
> like this (to not treat processors as individual nodes).
Thanks - I think it's a Maui directive, set either at the job level or 
globally. I'm looking into this currently.
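If it helps, the Maui parameter I have in mind is NODEACCESSPOLICY (this is an assumption on my part - please check against the Maui admin guide before relying on it):

```
# maui.cfg - hedged sketch: give each job exclusive use of its nodes,
# so Torque/Maui hands out whole nodes rather than individual processors
NODEACCESSPOLICY SINGLEJOB
```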
>> However, on inspection of the Jobtracker UI, it tells us that node19 
>> has "Max Map Tasks" and "Max Reduce Tasks" both set to 2, when for 
>> node19, it should only be allowed one map task. 
> While HOD does not do this automatically, please note that since you 
> are bringing up a Map/Reduce cluster on the allocated nodes, you can 
> submit map/reduce parameters with which to bring up the cluster when 
> allocating jobs. The relevant options are 
> --gridservice-mapred.server-params (or -M in shorthand). Please refer to
> http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop

> for details.
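For anyone following along, a sketch of what that would look like (the `-M` syntax is as I read it from the HOD user guide linked above; the cluster directory path and node count are made-up examples, and the two `mapred.tasktracker.*` properties are the standard Hadoop 0.19 per-tasktracker slot limits):

```shell
# Hedged sketch: allocate a 4-node HOD cluster whose tasktrackers each
# run at most one map and one reduce task. ~/hodcluster is a
# hypothetical cluster directory.
hod allocate -d ~/hodcluster -n 4 \
  -Mmapred.tasktracker.map.tasks.maximum=1 \
  -Mmapred.tasktracker.reduce.tasks.maximum=1
```

Note this applies the same limit to every tasktracker in the cluster, which is exactly the limitation discussed below.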
I was aware of this, but the issue is that unless you obtain dedicated 
nodes (as above), this option is not suitable, as it isn't set on a 
per-node basis. I think it would be /fairly/ straightforward to add to 
HOD, as I detailed in my initial email, so that it "does the correct 
thing" out of the box.
>> (2) In our InputFormat, we use the numSplits to tell us how many map 
>> tasks the job's files should be split into. However, HOD does not 
>> override the mapred.map.tasks property (nor the mapred.reduce.tasks), 
>> while they should be set dependent on the number of available task 
>> trackers and/or nodes in the HOD session.
> Can this not be submitted via the Hadoop job's configuration ? Again, 
> HOD cannot do this automatically currently. But you could use the 
> hod.client-params to set up a client side hadoop-site.xml that would 
> work like this for all jobs submitted to the cluster.
According to hadoop-default.xml, the number of maps is "Typically set to 
a prime several times greater than number of available hosts." Say that 
we relax this recommendation to read "typically set to a NUMBER several 
times greater than the number of available hosts" - then it should be 
straightforward for HOD to set it automatically?
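In the meantime, the workaround you describe would look something like this (a sketch, assuming `-C` is the shorthand for hod.client-params as in the HOD user guide; the path, node count, and task counts are illustrative - 17 and 7 chosen only because they are primes a few times larger than 4 hosts):

```shell
# Hedged sketch: bake default map/reduce task counts into the
# client-side hadoop-site.xml that HOD generates, so every job
# submitted against this cluster picks them up.
hod allocate -d ~/hodcluster -n 4 \
  -Cmapred.map.tasks=17 \
  -Cmapred.reduce.tasks=7
```

Of course this still has to be chosen by hand per allocation, which is why computing it from the number of allocated tasktrackers inside HOD would be nicer.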

