hadoop-common-dev mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Integration with SGE
Date Wed, 18 Feb 2009 16:24:32 GMT
Amin Astaneh wrote:
> Lukáš-
>> Hi Amin,
>> I am not familiar with SGE. Could you tell me what you get from this
>> combination? What is the benefit of running Hadoop on SGE?
> Sun Grid Engine is a distributed resource management platform for 
> supercomputing centers. We use it to allocate resources to a 
> supercomputing task, such as requesting 32 processors to run a 
> particular simulation. This mechanism is analogous to the scheduler on a 
> multi-user OS. What I was able to accomplish was to turn Hadoop into an 
> as-needed service. When you submit a job request to run Hadoop as the 
> documentation describes, a Hadoop cluster of arbitrary size is 
> instantiated depending on how many nodes were requested by generating a 
> cluster configuration specific to that job request. This allows the 
> Hadoop cluster to be deployed within the context of Gridengine, as well 
> as being able to coexist with other running simulations on the cluster.
> To the researcher or user needing to run a mapreduce code, all they need 
> to worry about is telling Hadoop to execute it as well as determining 
> how many machines should be dedicated to the task. This makes 
> Hadoop very accessible, since users don't need to worry about 
> configuring a cluster; SGE and its helper scripts do it for them.
> As Steve Loughran accurately commented, as of now we can only run one 
> set of Hadoop slave processes per machine, due to the network binding 
> issue. That problem is mitigated by configuring SGE to spread the slaves 
> one per machine automatically to avoid failures.
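
To make the "generate a cluster configuration per job request" step concrete, here is a rough sketch in Python. It is not Amin's actual helper scripts. It assumes SGE hands the job a hostfile whose lines start with a hostname (as the parallel environment's $PE_HOSTFILE does), treats the first host as the master, and uses ports 9000 and 9001 purely as illustrative defaults. The function names are made up for this example.

```python
from xml.sax.saxutils import escape


def parse_hostfile(text):
    """Return the unique hostnames, in order, from hostfile-style text.

    Assumes each non-empty line starts with a hostname; any further
    fields (slots, queue, ...) are ignored. Deduplicating gives one
    slave entry per machine.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def build_cluster_config(hosts, namenode_port=9000, jobtracker_port=9001):
    """Pick a master and slaves, and the two well-known addresses.

    The port defaults are assumptions for illustration only.
    """
    if not hosts:
        raise ValueError("no hosts allocated")
    master = hosts[0]
    # With a single host, that host runs the slave daemons as well.
    slaves = hosts[1:] or [master]
    props = {
        "fs.default.name": "hdfs://%s:%d" % (master, namenode_port),
        "mapred.job.tracker": "%s:%d" % (master, jobtracker_port),
    }
    return master, slaves, props


def render_site_xml(props):
    """Render properties in the Hadoop site-file XML layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in sorted(props.items()):
        lines.append(
            "  <property><name>%s</name><value>%s</value></property>"
            % (escape(name), escape(value))
        )
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"
```

A job script would then write the master and slave lists to the conf/masters and conf/slaves files and the rendered XML to the site file in a per-job conf directory before starting the daemons.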

Only the Namenode and JobTracker need hard-coded/well-known port 
numbers; the rest could all be assigned dynamically.
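
By "dynamically" I mean the usual trick of binding to port 0 and letting the OS pick a free port, which the daemon then reports to its master. A minimal sketch in Python of just that trick (this is not Hadoop code, and bind_dynamic_port is a made-up name):

```python
import socket


def bind_dynamic_port(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-chosen free port.

    Port 0 asks the kernel to pick any unused port; getsockname()
    then reveals which one was assigned.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(1)
    return sock, sock.getsockname()[1]
```

Two processes on the same machine that both do this get different ports, so they never collide on a fixed number.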

One thing SGE does offer over Xen-hosted images is better performance, 
for both CPU and storage: virtualised disk performance can be awful, 
and even on the latest x86 parts there is a measurable hit from VM 
overheads.
