hadoop-common-dev mailing list archives

From Amin Astaneh <aasta...@rc.usf.edu>
Subject Re: Integration with SGE
Date Wed, 18 Feb 2009 18:37:54 GMT

Well, we have a graduate student who is using our facilities for a 
Master's thesis on Map/Reduce. You guys are generating topics in 
computer science research.

What do we need to do in order to get our documentation on the Hadoop pages?

> Thanks guys, it is good to hear that Hadoop is spreading... :-)
> Regards,
> Lukas
> On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran <stevel@apache.org> wrote:
>> Amin Astaneh wrote:
>>> Lukáš-
>>>> Hi Amin,
>>>> I am not familiar with SGE. Could you tell me what you get
>>>> from this combination? What is the benefit of running Hadoop on SGE?
>>> Sun Grid Engine is a distributed resource management platform for
>>> supercomputing centers. We use it to allocate resources to a supercomputing
>>> task, such as requesting 32 processors to run a particular simulation. This
>>> mechanism is analogous to the scheduler on a multi-user OS. What I was able
>>> to accomplish was to turn Hadoop into an as-needed service. When you submit
>>> a job request to run Hadoop as the documentation describes, a cluster
>>> configuration specific to that request is generated, and a Hadoop cluster
>>> sized to the number of nodes requested is instantiated from it. This
>>> allows the Hadoop cluster to be deployed within the context of Gridengine
>>> and to coexist with other running simulations on the cluster.
>>> A researcher or user who needs to run MapReduce code only has to tell
>>> Hadoop to execute it and decide how many machines to dedicate to the
>>> task. This makes Hadoop very accessible, since users don't need to worry
>>> about configuring a cluster; SGE and its helper scripts do it for them.
>>> As Steve Loughran accurately commented, as of now we can only run one set
>>> of Hadoop slave processes per machine, due to the network binding issue.
>>> That problem is mitigated by configuring SGE to spread the slaves one per
>>> machine automatically to avoid failures.
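
To make the per-job configuration step above concrete, here is a minimal sketch of how a cluster configuration might be derived from the hosts SGE allocates to a job. It is not the actual helper scripts described in this thread. It assumes the parallel environment exposes the allocation through SGE's $PE_HOSTFILE (one "host slots queue processor-range" line per allocation), that the first host acts as master while every host runs one set of slave daemons, and that the port numbers and function names are hypothetical choices made only for illustration.

```python
import os
from xml.sax.saxutils import escape

# Hypothetical well-known ports, chosen only for this illustration.
NAMENODE_PORT = 9000
JOBTRACKER_PORT = 9001


def parse_pe_hostfile(text):
    """Return the unique hostnames from SGE $PE_HOSTFILE content, in order.

    Each line is expected to look like "<host> <slots> <queue> <range>".
    A host that appears more than once is kept only once, so that each
    machine ends up with a single set of Hadoop slave processes.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def build_cluster_config(hosts):
    """Map config file names to file contents for a per-job cluster.

    Assumption: the first allocated host runs the NameNode and the
    JobTracker, and every allocated host is listed as a slave.
    """
    if not hosts:
        raise ValueError("no hosts were allocated to this job")
    master = hosts[0]
    properties = {
        "fs.default.name": "hdfs://%s:%d" % (master, NAMENODE_PORT),
        "mapred.job.tracker": "%s:%d" % (master, JOBTRACKER_PORT),
    }
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in sorted(properties.items()):
        lines.append(
            "  <property><name>%s</name><value>%s</value></property>"
            % (escape(name), escape(value))
        )
    lines.append("</configuration>")
    return {
        "slaves": "\n".join(hosts) + "\n",
        "hadoop-site.xml": "\n".join(lines) + "\n",
    }


def write_config(conf_dir, files):
    """Write the generated files into a job-specific conf directory."""
    os.makedirs(conf_dir, exist_ok=True)
    for name, content in files.items():
        with open(os.path.join(conf_dir, name), "w") as handle:
            handle.write(content)
```

Inside a job script, one would read the file named by the PE_HOSTFILE environment variable, pass its contents to parse_pe_hostfile, and write the result to a conf directory private to that job before starting the daemons. Deduplicating the hosts is a second line of defence; the one-slave-per-machine spread itself is still something SGE has to be configured to do.
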
>> Only the Namenode and JobTracker need hard-coded/well-known port numbers,
>> the rest could all be done dynamically.
>> One thing SGE does offer over Xen-hosted images is better performance than
>> virtual machines, for both CPU and storage, as virtualised disk
>> performance can be awful, and even on the latest x86 parts, there is a
>> measurable hit from VM overheads.
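
As a small illustration of the point about dynamic ports: a service that does not need a well-known port can ask the operating system for any free one by binding to port 0 and then reporting the port it was given. The sketch below shows only that generic socket mechanism. It is not Hadoop code, and the function name is made up for this example.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-assigned free port.

    Returns the socket and the port number the OS picked. Two services
    started this way on one machine cannot collide on a port, which is
    what a hard-coded port number cannot guarantee.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "pick any free port"
    sock.listen(1)
    return sock, sock.getsockname()[1]
```

A daemon started this way would still have to tell its master which port it ended up on, which is why the master itself needs an address that is known in advance.
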
