hadoop-common-dev mailing list archives

From Amin Astaneh <aasta...@rc.usf.edu>
Subject Re: Integration with SGE
Date Wed, 18 Feb 2009 14:40:47 GMT
> Hi Amin,
> I am not familiar with SGE, do you think you could tell me what did you get
> from this combination? What is the benefit of running Hadoop on SGE?
Sun Grid Engine is a distributed resource management platform for 
supercomputing centers. We use it to allocate resources to a 
supercomputing task, such as requesting 32 processors to run a 
particular simulation. This mechanism is analogous to the scheduler on a 
multi-user OS. What I accomplished was turning Hadoop into an
on-demand service. When you submit a job request to run Hadoop as the
documentation describes, the integration generates a cluster
configuration specific to that request and instantiates a Hadoop cluster
sized to the number of nodes requested. This lets the Hadoop cluster be
deployed within the context of Grid Engine and coexist with other
simulations running on the cluster.
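
As a rough illustration of the configuration-generation step (this is
not the helper scripts from the wiki page), the sketch below assumes
that SGE hands the job a PE_HOSTFILE-style listing with one
"hostname slots queue processor-range" line per granted host, and that
the first granted host acts as the master. The function names and the
port numbers are hypothetical placeholders.

```python
def parse_pe_hostfile(text):
    """Return the host names from PE_HOSTFILE-style text, in listed order."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            hosts.append(fields[0])
    return hosts


def build_job_config(hosts, fs_port=9000, jt_port=9001):
    """Build per-job Hadoop settings from the hosts granted to this job.

    Assumption: the first host is the master. The ports are placeholders,
    not values taken from the actual integration.
    """
    if not hosts:
        raise ValueError("no hosts were granted to this job")
    master = hosts[0]
    # With a single granted host, that host runs the slave processes too.
    slaves = hosts[1:] or [master]
    return {
        "fs.default.name": "hdfs://%s:%d" % (master, fs_port),
        "mapred.job.tracker": "%s:%d" % (master, jt_port),
        "slaves": "".join(h + "\n" for h in slaves),
    }
```

The "slaves" value would be written to the job's conf/slaves file and
the two properties into its site configuration, so every job request
gets its own throwaway cluster definition.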

To the researcher or user who needs to run MapReduce code, all they need
to worry about is telling Hadoop to execute it and deciding how many
machines to dedicate to the task. This makes Hadoop very accessible,
since they don't need to configure a cluster themselves: SGE and its
helper scripts do it for them.

As Steve Loughran accurately commented, as of now we can only run one
set of Hadoop slave processes per machine, due to the network binding
issue. We mitigate that problem by configuring SGE to place the slaves
one per machine automatically, which avoids the resulting failures.
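
A simple guard for this constraint is to check the granted host list
for repeats before starting any daemons. The sketch below is only an
illustration under that assumption; the helper name is hypothetical and
it is not part of the published scripts.

```python
from collections import Counter


def hosts_with_multiple_slaves(hosts):
    """Return the hosts that appear more than once in the slave list, sorted."""
    counts = Counter(hosts)
    return sorted(h for h, n in counts.items() if n > 1)
```

An empty result means each machine would run a single set of slave
processes; anything else names the machines where a second set would
fail to bind.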

In short, this solution makes Hadoop accessible for use in the HPC setting.

> Regards,
> Lukas
> On Tue, Feb 17, 2009 at 4:04 PM, Amin Astaneh <aastaneh@rc.usf.edu> wrote:
>> Developers-
>> I am happy to announce that I have managed to perform tight integration of
>> Hadoop with Sun Grid Engine. This can benefit Hadoop by making it more
>> accessible to supercomputing centers. I would like to contribute my
>> documentation so that other SGE admins may benefit from the resource:
>> https://rc.usf.edu/trac/hadoop/wiki/SGEIntegration
>> -Amin Astaneh
>> USF Research Computing
