hadoop-common-user mailing list archives

From Craig Macdonald <cra...@dcs.gla.ac.uk>
Subject Re: HOD questions
Date Fri, 19 Dec 2008 13:06:23 GMT
Hi Hemanth,

>>> While HOD does not do this automatically, please note that since you 
>>> are bringing up a Map/Reduce cluster on the allocated nodes, you can 
>>> submit map/reduce parameters with which to bring up the cluster when 
>>> allocating jobs. The relevant options are 
>>> --gridservice-mapred.server-params (or -M in shorthand). Please 
>>> refer to
>>> http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop
>>> for details.
>> I was aware of this, but the issue is that unless you obtain 
>> dedicated nodes (as above), this option is not suitable, as it isn't 
>> set on a per-node basis. I think it would be /fairly/ straightforward 
>> to add to HOD, as I detailed in my initial email, so that it "does 
>> the correct thing" out of the box.
> True, I did assume you obtained dedicated nodes. It has been much 
> simpler to operate HOD in this manner and, if I understand correctly, 
> it would also help to meet the requirement you have.
I think obtaining dedicated nodes needs a Maui change (or a QOS 
directive). I'm looking into it at present, but I'm not sure that the 
following incantation is correct:
-W x="NACCESSPOLICY=SINGLETASK"

In mixed-job environments (e.g. universities), where many users run 
non-HOD jobs that often need only a single CPU, asking for dedicated 
nodes gives a job more complicated requirements, so it will take longer 
to reach the head of the queue.

>> According to hadoop-default.xml, the number of maps is "Typically set 
>> to a prime several times greater than number of available hosts." - 
>> Say that we relax this recommendation to read "Typically set to a 
>> NUMBER several times greater than number of available hosts" then it 
>> should be straightforward for HOD to set it automatically then?
> Actually, AFAIK, the number of maps for a job is determined more or 
> less exclusively by the M/R framework based on the number of splits. 
> I've seen messages on this list before about how the documentation for 
> this configuration item is misleading. So, this might actually not 
> make a difference at all, whatever is specified.
The reason we were asking is that mapred.map.tasks is passed to the 
InputFormat as the "hint" for how many input splits to create.
We were using this number to control the number of maps. From what I 
can see, it's just that FileInputFormat doesn't honour the hint 
exactly. Pig's InputFormat ignores the hint.
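
To illustrate why the hint is only approximate, here is a rough Python
sketch of the split sizing as I understand it from the old
org.apache.hadoop.mapred.FileInputFormat.getSplits(). The function name
and the defaults (64 MB blocks, a minimum split size of 1 byte) are my
own assumptions for illustration. It treats every file as splittable and
skips empty files, so it is a simplification, not the real code.

```python
def compute_split_sizes(file_sizes, num_splits_hint,
                        block_size=64 * 1024 * 1024, min_split_size=1):
    """Approximate the per-file split sizes FileInputFormat would produce.

    file_sizes:      list of input file lengths in bytes
    num_splits_hint: the value of mapred.map.tasks (only a hint)
    """
    split_slop = 1.1  # the last split of a file may be up to 10% larger

    total_size = sum(file_sizes)
    # The hint only sets a *goal* size; it is then capped by the block size.
    goal_size = total_size // max(num_splits_hint, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = []
    for length in file_sizes:
        remaining = length
        # Splits never span files, so each file is carved up separately.
        while remaining / split_size > split_slop:
            splits.append(split_size)
            remaining -= split_size
        if remaining > 0:
            splits.append(remaining)
    return splits


if __name__ == "__main__":
    mb = 1024 * 1024
    # One 256 MB file with a hint of 2 maps.
    print(len(compute_split_sizes([256 * mb], 2)))
    # Three 10 MB files with a hint of 2 maps.
    print(len(compute_split_sizes([10 * mb] * 3, 2)))
```

Under these assumptions a hint smaller than the number of blocks is
overridden by the block size, and a hint smaller than the number of
files is overridden because a split cannot cover more than one file.
A larger hint does take effect, since it shrinks the goal size below
the block size.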



Craig
