hadoop-common-dev mailing list archives

From Lukáš Vlček <lukas.vl...@gmail.com>
Subject Re: Integration with SGE
Date Wed, 18 Feb 2009 18:01:30 GMT
Thanks guys, it is good to hear that Hadoop is spreading... :-)
Regards,
Lukas

On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran <stevel@apache.org> wrote:

> Amin Astaneh wrote:
>
>> Lukáš-
>>
>>> Hi Amin,
>>> I am not familiar with SGE. Could you tell me what you gained from this
>>> combination? What is the benefit of running Hadoop on SGE?
>>>
>>>
>> Sun Grid Engine is a distributed resource management platform for
>> supercomputing centers. We use it to allocate resources to a supercomputing
>> task, such as requesting 32 processors to run a particular simulation. This
>> mechanism is analogous to the scheduler on a multi-user OS. What I was able
>> to accomplish was to turn Hadoop into an as-needed service. When you submit
>> a job request to run Hadoop as the documentation describes, a Hadoop cluster
>> of arbitrary size is instantiated depending on how many nodes were requested
>> by generating a cluster configuration specific to that job request. This
>> allows the Hadoop cluster to be deployed within the context of Gridengine,
>> as well as being able to coexist with other running simulations on the
>> cluster.
>>
>> To the researcher or user needing to run a mapreduce code, all they need
>> to worry about is telling Hadoop to execute it as well as determining how
>> many machines should be dedicated to the task. This benefit makes Hadoop
>> very accessible to people since they don't need to worry about configuring a
>> cluster; SGE and its helper scripts do it for them.
>>
>> As Steve Loughran accurately commented, as of now we can only run one set
>> of Hadoop slave processes per machine, due to the network binding issue.
>> That problem is mitigated by configuring SGE to spread the slaves one per
>> machine automatically to avoid failures.
>>
>
> Only the Namenode and JobTracker need hard-coded/well-known port numbers,
> the rest could all be done dynamically.
>
> One thing SGE does offer over Xen-hosted images is better performance than
> virtual machines, for both CPU and storage, as virtualised disk
> performance can be awful, and even on the latest x86 parts, there is a
> measurable hit from VM overheads.
>

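To make the per-job configuration step Amin describes a bit more concrete, here is a minimal sketch. It is not his helper scripts, which are not shown in this thread. It assumes the scheduler hands the job a host list in the usual SGE parallel-environment hostfile layout (one line per host, host name in the first column, path in $PE_HOSTFILE), that the first allocated host runs the NameNode and JobTracker, and that ports 9000 and 9001 are free. The function names and the port numbers are made up for illustration.

```python
import os


def parse_pe_hostfile(text):
    """Return the unique host names from an SGE-style PE hostfile, in order.

    Assumed layout: the host name is the first whitespace-separated
    field of each non-empty line.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def build_cluster_files(hosts, namenode_port=9000, jobtracker_port=9001):
    """Build per-job Hadoop config contents from the allocated hosts.

    The first host becomes the master (NameNode and JobTracker); the
    remaining hosts become slaves, one set of slave daemons per machine.
    The port numbers are illustrative assumptions, not values from the thread.
    """
    if not hosts:
        raise ValueError("no hosts allocated")
    master = hosts[0]
    slaves = hosts[1:] or [master]  # single-node job: master doubles as slave
    site_xml = (
        '<?xml version="1.0"?>\n'
        "<configuration>\n"
        "  <property>\n"
        "    <name>fs.default.name</name>\n"
        f"    <value>hdfs://{master}:{namenode_port}</value>\n"
        "  </property>\n"
        "  <property>\n"
        "    <name>mapred.job.tracker</name>\n"
        f"    <value>{master}:{jobtracker_port}</value>\n"
        "  </property>\n"
        "</configuration>\n"
    )
    return {
        "slaves": "\n".join(slaves) + "\n",
        "hadoop-site.xml": site_xml,
    }


if __name__ == "__main__":
    # Only does something when run inside a job that sets PE_HOSTFILE.
    hostfile = os.environ.get("PE_HOSTFILE")
    if hostfile:
        with open(hostfile) as fh:
            files = build_cluster_files(parse_pe_hostfile(fh.read()))
        print(files["hadoop-site.xml"])
```

A real job script would write these contents into a job-specific conf directory and start the daemons from there; that part depends on the site setup and is left out.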

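On Steve's point that everything except the NameNode and JobTracker could use dynamically chosen ports: the general operating-system mechanism is to bind to port 0 and let the kernel pick a free one. The sketch below only illustrates that mechanism in isolation; it says nothing about how Hadoop's daemons handle it, and the function name is hypothetical.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "any free port"
        return sock.getsockname()[1]  # the port the kernel actually assigned


if __name__ == "__main__":
    print(pick_free_port())
```

Note that the port is released when the socket closes, so another process could grab it before a daemon binds to it. A service that wants to be robust would bind to port 0 itself and then report the port it received.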

-- 
http://blog.lukas-vlcek.com/
