hadoop-common-user mailing list archives

From Hemanth Yamijala <yhema...@yahoo-inc.com>
Subject Re: Integrate HADOOP and Map/Reduce paradigm into HPC environment
Date Fri, 05 Sep 2008 09:27:04 GMT
Hemanth Yamijala wrote:
> Filippo Spiga wrote:
>> This procedure allows me to:
>> - use a persistent HDFS across the whole cluster, placing the namenode on
>> the frontend (always up and running) and datanodes on the other nodes
>> - submit many jobs to the resource manager transparently, without any
>> problem, and manage job priority/reservation with MAUI just like any
>> other classical HPC job
>> - execute the jobtracker and tasktracker services on the nodes chosen by
>> TORQUE (in particular, the first node selected becomes the jobtracker)
>> - store logs for different users in separate directories
>> - run only one job at a time (though multiple map/reduce jobs can
>> probably run together, because different jobs use different subsets of
>> nodes)
>>
>> Probably HOD does what I can do with my raw script... it's possible
>> that I haven't understood the user guide well...
>>
>>   
> Filippo, HOD indeed allows you to do all these things, and a little
> bit more. On the other hand, your script always executes the jobtracker
> on the first node, which also seems useful to me. It would be nice if
> you could still try HOD and see if it makes your life simpler in any
> way. :-)
>
Some things that HOD does automatically:
- Sets up log directories differently for different users
- Port numbers need not be fixed; HOD detects free ports and provisions
the services to use them
- Depending on need, you can also deploy a custom tarball of Hadoop,
rather than using a pre-installed version.
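
For example, an allocation might look roughly like this (the option names
are from memory and worth double-checking against the HOD user guide):

  # allocate a 5-node Hadoop cluster; HOD picks free ports and per-user log dirs
  hod allocate -d ~/hod-clusters/test -n 5

  # or provision a custom Hadoop tarball instead of a pre-installed version
  hod allocate -d ~/hod-clusters/test -n 5 -t ~/hadoop-0.18.0.tar.gz

  # when the jobs are done, give the nodes back to the resource manager
  hod deallocate -d ~/hod-clusters/test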

Also, since HOD is only a thin wrapper around the resource manager, all
policies that you can set up for Maui automatically apply to HOD-run
clusters.
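
For comparison, a raw Torque wrapper of the kind described in your mail
might look roughly like the sketch below. It is only an illustration, not
your actual script: the paths, the use of $PBS_NODEFILE, and the
hadoop-daemon.sh calls are assumptions, and the Hadoop configuration would
still need mapred.job.tracker pointed at the first allocated node.

  #!/bin/bash
  #PBS -N hadoop-mapred
  #PBS -l nodes=4

  # Unique list of nodes Torque gave us; the first one becomes the jobtracker.
  NODES=$(sort -u "$PBS_NODEFILE")
  JT_NODE=$(echo "$NODES" | head -n 1)

  # Start the jobtracker on the first allocated node.
  ssh "$JT_NODE" "$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker"

  # Start a tasktracker on every remaining node.
  for node in $(echo "$NODES" | tail -n +2); do
      ssh "$node" "$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker"
  done

  # ... run the user's map/reduce job here, then stop the daemons before exiting.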

>> Sorry for my english :-P
>>
>> Regards
>>
>> 2008/9/2 Hemanth Yamijala <yhemanth@yahoo-inc.com>
>>
>>  
>>> Allen Wittenauer wrote:
>>>
>>>    
>>>> On 8/18/08 11:33 AM, "Filippo Spiga" <spiga.filippo@gmail.com> wrote:
>>>>
>>>>> Well, but I haven't understood how I should configure HOD to work
>>>>> in this manner.
>>>>>
>>>>> For HDFS I follow this sequence of steps:
>>>>> - conf/masters contains only the master node of my cluster
>>>>> - conf/slaves contains all nodes
>>>>> - I start HDFS using bin/start-dfs.sh
>>>>>
>>>>    Right, fine...
>>>>
>>>>> Potentially I would like all nodes to be usable for MapReduce.
>>>>> For HOD, which parameter should I set in contrib/hod/conf/hodrc?
>>>>> Should I change only the gridservice-hdfs section?
>>>>>
>>>>    I was hoping the HOD folks would answer this question for you,
>>>> but they are apparently sleeping. :)
>>>>
>>> Whoops! Sorry, I missed this.
>>>
>>>    
>>>>    Anyway, yes, if you point gridservice-hdfs to a static HDFS, it
>>>> should use that as the -default- HDFS. That doesn't prevent a user
>>>> from using HOD to create a custom HDFS as part of their job
>>>> submission.
>>>>
>>> Allen's answer is perfect. Please refer to
>>> http://hadoop.apache.org/core/docs/current/hod_user_guide.html#Using+an+external+HDFS
>>> for more information about how to set up the gridservice-hdfs section
>>> to use a static or external HDFS.
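
A hodrc fragment for that setup might look roughly like this (the option
names and ports are placeholders from memory; please verify them against
the HOD configuration documentation):

  [gridservice-hdfs]
  # point HOD at an already-running (static/external) HDFS instead of
  # provisioning a new one for every allocation
  external  = True
  host      = frontend.mycluster.example
  fs_port   = 9000
  info_port = 50070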

