hadoop-common-dev mailing list archives

From "Nate Woody (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque
Date Mon, 09 Mar 2009 12:44:50 GMT
HOD refactoring to ease integration with scheduler/resource managers other than torque
--------------------------------------------------------------------------------------

                 Key: HADOOP-5441
                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
             Project: Hadoop Core
          Issue Type: Improvement
          Components: contrib/hod
         Environment: All
            Reporter: Nate Woody


Situation: HOD currently uses the pbsdsh command (a distributed shell that starts remote
processes through Torque's TM interface) to start processes on all nodes in the job.  This call
is provided as part of a torqueInterface class that is meant to abstract interactions with
the Torque resource manager (RM).  However, other RMs typically do not provide this
functionality; remote start is instead usually performed by a distributed command available
on the HPC system, such as mpiexec, ssh, or site-specific scripts.  Because pbsdsh is specific
to Torque, writing HOD interfaces to other RMs is somewhat difficult: it forces the implementer
to tie the choice of remote start method to the RM, which is a somewhat faulty basis for that
choice.
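
To illustrate the point above, here is a minimal sketch (modern Python, not HOD code) of how
the same remote command could be launched through pbsdsh, mpiexec, or ssh.  The function names,
host names, and the sample command are hypothetical.  Only the basic invocation forms are
assumed: pbsdsh followed by the program, mpiexec with -n, and ssh with a host and a command
string.  Real sites may need extra, implementation-specific options (for example to place one
process per node with mpiexec).

```python
import shlex


def pbsdsh_launch(remote_cmd):
    """One pbsdsh call; Torque's TM interface fans it out to the job's nodes."""
    return [["pbsdsh", *remote_cmd]]


def mpiexec_launch(num_procs, remote_cmd):
    """One mpiexec call asking for num_procs copies of the command."""
    return [["mpiexec", "-n", str(num_procs), *remote_cmd]]


def ssh_launch(hosts, remote_cmd):
    """One ssh call per host; the remote command is quoted into a single string."""
    return [["ssh", host, shlex.join(remote_cmd)] for host in hosts]


# Hypothetical allocation and command, used only to print the resulting command lines.
hosts = ["node01", "node02"]
cmd = ["hostname"]
for argv in pbsdsh_launch(cmd) + mpiexec_launch(len(hosts), cmd) + ssh_launch(hosts, cmd):
    print(" ".join(argv))
```

None of the three builders needs to know which RM allocated the nodes; only the ssh variant
even needs the host list.  That is the sense in which the launch method is a separate choice
from the RM.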

Proposal: Refactor the torqueInterface and nodePool classes so that the choice of remote start
method is available as a configuration option in hodrc.  This involves fairly simple changes:
removing the pbsdsh command from the Scheduler class and adding a configuration step that
starts the appropriate remote start wrapper.  The selection of the nodePool class will be
altered to allow dynamic loading of classes, so that new interfaces people choose to write
will not require altering HOD code.  Provide remote start classes for pbsdsh, mpiexec, and
ssh, as well as for custom scripts (sites often provide mpiexec wrappers that ensure proper
selection of network interfaces, etc.).  Provide interface classes for SGE and Moab, as well
as an updated Torque class.
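
The configuration-driven selection and dynamic loading described above might look roughly like
the following sketch.  It is written in modern Python and is not taken from HOD.  The
remote-start option name, the placeholder TorqueNodePool class, and the convention that an
unrecognised id is treated as a dotted class path are all assumptions made for illustration;
only the [resource_manager] section and its id option are meant to mirror an existing hodrc.

```python
import configparser
import importlib


class TorqueNodePool:
    """Placeholder standing in for the existing Torque nodePool implementation."""


# Short names shipped with the tool; anything else is treated as a dotted class path.
BUILTIN_NODE_POOLS = {"torque": TorqueNodePool}


def load_class(dotted_path):
    """Import 'package.module.ClassName' at run time and return the class object."""
    module_name, sep, class_name = dotted_path.rpartition(".")
    if not sep:
        raise ValueError(f"not a dotted class path: {dotted_path!r}")
    return getattr(importlib.import_module(module_name), class_name)


def _parse(hodrc_text):
    config = configparser.ConfigParser()
    config.read_string(hodrc_text)
    return config


def node_pool_class(hodrc_text):
    """Pick the nodePool class named by resource_manager.id."""
    rm_id = _parse(hodrc_text).get("resource_manager", "id")
    if rm_id in BUILTIN_NODE_POOLS:
        return BUILTIN_NODE_POOLS[rm_id]
    return load_class(rm_id)  # third-party interface, no change to the tool's code


def remote_start_method(hodrc_text):
    """Read the (hypothetical) remote-start option, defaulting to today's pbsdsh."""
    return _parse(hodrc_text).get("resource_manager", "remote-start", fallback="pbsdsh")


# Hypothetical hodrc fragment: keep the Torque interface but launch over ssh.
SAMPLE_HODRC = """\
[resource_manager]
id = torque
remote-start = ssh
"""

print(node_pool_class(SAMPLE_HODRC).__name__, remote_start_method(SAMPLE_HODRC))
```

With a scheme like this, a site-specific interface would be selected by writing its dotted
class path in hodrc, and the RM interface and the launch method could be varied independently.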

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

