hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hemanth Yamijala <yhema...@yahoo-inc.com>
Subject Re: GridEngine module for Hadoop on Demand
Date Sat, 09 May 2009 09:55:26 GMT
Daniel Templeton wrote:
> Thanks for the reply.  Grid Engine works roughly the same way wrt 
> parallel jobs, except that we call the tasks the master and the 
> slaves.  Grid Engine does not have a pbsdsh equivalent, but it would 
> be a really trivial wrapper script to write for qrsh, which is pbsdsh 
> minus the automatic use of the nodes files (called the pe_hostfile in 
> Grid Engine).
> I assume from the 3-task minimum that the JobTracker gets a slot, the 
> NameNode gets a slot, and there has to be at least one slot running a 
> DataNode/TaskTracker.  Correct?  
> Should a single job be prevented from running more than on hodring on 
> a single host?
More than one hodrings  can be launched on a single host. However, this 
means more than 1 instance of a slave would get launched - like 2 
tasktrackers and 2 datanodes. In practice, we've seen that while this is 
also OK, when we start running M/R tasks on such a system, it slows down 
the system quite a bit. Hence I don't think this is really useful.
> How do I go about contributing this Grid Engine extension to the HoD 
> source base?
Please feel free to submit a patch if you've figured out all the 
details. It should be against the current code base. Please refer to 
http://wiki.apache.org/hadoop/HowToContribute for details on contributing.

> Hemanth Yamijala wrote:
>> Daniel Templeton wrote:
>>> Hi,
>>> I have a functioning module for Grid Engine for HoD, but some parts 
>>> of it are currently hard-coded to my workstation.  In cleaning up 
>>> those elements, I need some advice.  Hopefully this is the right forum.
>>> So, in the hodlib/NodePools/torque.py file, there's a runWorkers() 
>>> method.  In that method, it makes a single call to pbsdsh to start 
>>> the NameNode, DataNodes, JobTracker, and TaskTracker.  I know nada 
>>> about Torque, so please tell me if I'm interpreting this correctly.  
>>> It would appear that the pbsdsh somehow reads out of the environment 
>>> how many hodring processes it should start up and executes them 
>>> remotely, and each hodring then figures out what service it should run.
>> Roughly right. In Torque, when a set of nodes are assigned to a job, 
>> the first node in that list is special (it's called mother superior - 
>> MS), the other nodes are called sisters. The job that's submitted to 
>> torque is a HOD process called 'ringmaster'. The ringmaster starts on 
>> the MS and invokes runWorkers which executes pbsdsh. AFAIK, pbsdsh 
>> reads the environment and gets a 'nodes' file that Torque writes out. 
>> This file contains all the sisters allocated for the job (including 
>> the MS). It executes the command passed to pbsdsh - another HOD 
>> process, called hodring - on all of these nodes. The Hodring 
>> processes work with the ringmaster and decide which service to run. 
>> In a sense the ringmaster coordinates which service to start where, 
>> and inform the hodring to start that service.
>>> In Grid Engine, the rough equivalent of pbsdsh is qrsh.  (I think.)  
>>> With qrsh, the master assigns the HoD job a set of nodes, and I then 
>>> have to step through that set of nodes and qrsh to each one to start 
>>> the hodring services.  As far as I can tell, the total number of 
>>> hodring services I need to start is 1 for the NameNode + 1 for the 
>>> JobTracker + n for the DataNodes + m for the TaskTrackers.  
>> HOD has a facility to use a HDFS service that's started outside of 
>> HOD. In that mode, it does not start NameNode or DataNodes. Also, the 
>> number of DataNodes always equals the number of TaskTrackers (if HDFS 
>> services are started with HOD).
>>> The thing that I'm not grokking is how the hodrings know what 
>>> services to start, and how I should be parceling them out across the 
>>> nodes of the cluster.  
>> This is decided by the ringmaster process. The logic is independent 
>> of the resource manager in use, and hence need not be worried about 
>> when porting to a new resource manager.
>>> Should I be making sure I have two hodrings per node, one for the 
>>> DataNode and one of the TaskTracker?  
>> No, a single hodring gets to start both the daemons.
>>> If I were to go start a dozen hodrings, one on each of a dozen 
>>> machines, would they work out among themselves how many should be 
>>> DataNodes and how many should be TaskTrackers? One more thing.  If 
>>> the above is on the mark, that means you're consuming a queue slot 
>>> for each DataNode unless you use an external hdfs service.  That 
>>> seems like a waste of cluster resources since slots tend to 
>>> correspond more to compute resources than I/O.  I have to wonder if 
>>> it wouldn't be more efficient from a cluster perspective to have 
>>> each hodring start a DataNode and a TaskTracker.  It would slightly 
>>> oversubscribe that job slot, but that may be better than grossly 
>>> undersubscribing two.
>> Explained above.
>> Thanks
>> Hemanth

View raw message