hadoop-common-dev mailing list archives

From Daniel Templeton <Dan.Temple...@Sun.COM>
Subject GridEngine module for Hadoop on Demand
Date Thu, 07 May 2009 20:06:06 GMT

I have a functioning module for Grid Engine for HoD, but some parts of 
it are currently hard-coded to my workstation.  In cleaning up those 
elements, I need some advice.  Hopefully this is the right forum.

So, in the hodlib/NodePools/torque.py file, there's a runWorkers() 
method.  That method makes a single call to pbsdsh to start the 
NameNode, DataNodes, JobTracker, and TaskTrackers.  I know nada about 
Torque, so please tell me if I'm interpreting this correctly.  It 
appears that pbsdsh somehow reads from the environment how many 
hodring processes it should start and executes them remotely, and 
each hodring then figures out what service it should run.

In Grid Engine, the rough equivalent of pbsdsh is qrsh.  (I think.)  
With qrsh, the master assigns the HoD job a set of nodes, and I then 
have to step through that set of nodes and qrsh to each one to start the 
hodring services.  As far as I can tell, the total number of hodring 
services I need to start is 1 for the NameNode + 1 for the JobTracker + 
n for the DataNodes + m for the TaskTrackers.  The thing that I'm not 
grokking is how the hodrings know what services to start, and how I 
should be parceling them out across the nodes of the cluster.  Should I 
be making sure I have two hodrings per node, one for the DataNode and 
one for the TaskTracker?  If I were to start a dozen hodrings, one on 
each of a dozen machines, would they work out among themselves how many 
should be DataNodes and how many should be TaskTrackers?
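
To make the "step through the nodes and qrsh to each one" part concrete, 
here is a minimal sketch in Python (the language HoD is written in).  It 
assumes the job runs under a Grid Engine parallel environment, that the 
node list is in the file named by $PE_HOSTFILE with the host name and 
slot count as the first two columns, and that "qrsh -inherit <host>" is 
the way to start a task on an assigned node.  The hodring argument shown 
is a placeholder, not a real option, and the function names are made up 
for illustration.  The sketch only builds the command lines; it does not 
run them.

```python
import os
import shlex


def parse_pe_hostfile(text):
    """Return [(hostname, slots), ...] from $PE_HOSTFILE-style text.

    Assumes each line starts with a host name and a slot count;
    any further columns are ignored.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.append((fields[0], int(fields[1])))
    return hosts


def build_qrsh_commands(hosts, hodring_cmd, per_slot=False):
    """Build one qrsh command per host, or one per slot if per_slot is True."""
    commands = []
    for host, slots in hosts:
        count = slots if per_slot else 1
        for _ in range(count):
            commands.append(["qrsh", "-inherit", host] + list(hodring_cmd))
    return commands


if __name__ == "__main__":
    # Sample data used when not running inside a Grid Engine job.
    sample = (
        "node01 2 all.q@node01 UNDEFINED\n"
        "node02 2 all.q@node02 UNDEFINED\n"
    )
    path = os.environ.get("PE_HOSTFILE")
    if path:
        with open(path) as f:
            text = f.read()
    else:
        text = sample
    # "--placeholder-args" is hypothetical; the real hodring options differ.
    hodring_cmd = ["hodring", "--placeholder-args"]
    for cmd in build_qrsh_commands(parse_pe_hostfile(text), hodring_cmd):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Whether per_slot should be True or False is exactly the open question 
above: one hodring per node, or one per granted slot.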

One more thing.  If the above is on the mark, that means you're 
consuming a queue slot for each DataNode unless you use an external HDFS 
service.  That seems like a waste of cluster resources since slots tend 
to correspond more to compute resources than I/O.  I have to wonder if 
it wouldn't be more efficient from a cluster perspective to have each 
hodring start a DataNode and a TaskTracker.  It would slightly 
oversubscribe that job slot, but that may be better than grossly 
undersubscribing two.
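
The slot arithmetic behind that comparison can be sketched as follows.  
This assumes the count given earlier (1 NameNode + 1 JobTracker + n 
DataNodes + m TaskTrackers, one slot each) and, for the combined case, 
that every worker hodring would start both a DataNode and a 
TaskTracker.  The function names are made up for illustration.

```python
def slots_separate(n_datanodes, m_tasktrackers):
    """One slot per service: NameNode + JobTracker + n DataNodes + m TaskTrackers."""
    return 1 + 1 + n_datanodes + m_tasktrackers


def slots_combined(n_workers):
    """One slot per worker, each running a DataNode and a TaskTracker together."""
    return 1 + 1 + n_workers


if __name__ == "__main__":
    workers = 12
    print(slots_separate(workers, workers))  # 26
    print(slots_combined(workers))           # 14
```

For a dozen workers, that is 26 slots with separate hodrings versus 14 
when each hodring runs both services.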

