hadoop-common-dev mailing list archives

From Dhruba Borthakur <dhr...@gmail.com>
Subject Re: Integration with SGE
Date Wed, 18 Feb 2009 18:49:36 GMT
This is cool work! A convenient place to document this information is in the
hadoop wiki:

http://wiki.apache.org/hadoop/

At the bottom of this page, there is a section titled "Related Projects".
You might want to insert a link in that section.

thanks,
dhruba


On Wed, Feb 18, 2009 at 10:37 AM, Amin Astaneh <aastaneh@rc.usf.edu> wrote:

> Lukáš-
>
> Well, we have a graduate student who is using our facilities for a
> Master's thesis on Map/Reduce. You guys are generating topics in computer
> science research.
>
> What do we need to do in order to get our documentation on the Hadoop
> pages?
>
> -Amin
>
>> Thanks guys, it is good to hear that Hadoop is spreading... :-)
>> Regards,
>> Lukas
>>
>> On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran <stevel@apache.org>
>> wrote:
>>
>>> Amin Astaneh wrote:
>>>
>>>> Lukáš-
>>>>
>>>>> Hi Amin,
>>>>> I am not familiar with SGE, do you think you could tell me what did you
>>>>> get
>>>>> from this combination? What is the benefit of running Hadoop on SGE?
>>>>>
>>>> Sun Grid Engine is a distributed resource management platform for
>>>> supercomputing centers. We use it to allocate resources to a computing
>>>> task, such as requesting 32 processors to run a particular simulation;
>>>> the mechanism is analogous to the scheduler on a multi-user OS. What I
>>>> was able to accomplish was to turn Hadoop into an on-demand service.
>>>> When you submit a job request to run Hadoop as the documentation
>>>> describes, a Hadoop cluster of the requested size is instantiated by
>>>> generating a cluster configuration specific to that job request. This
>>>> lets the Hadoop cluster be deployed within the context of Grid Engine
>>>> and coexist with other simulations running on the cluster.
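[Editor's note: a minimal sketch of how such a per-job configuration might be
generated. Under SGE, $PE_HOSTFILE lists one "host slots queue flag" line per
allocated node; the script below is illustrative, not Amin's actual helper
scripts, and the paths are hypothetical.]

```shell
#!/bin/sh
# Hypothetical sketch: build a per-job Hadoop conf dir from SGE's
# machine file.  When run outside SGE, synthesise a demo hostfile.
if [ -z "$PE_HOSTFILE" ]; then
    PE_HOSTFILE=/tmp/demo_pe_hostfile
    printf 'node1 4 all.q UNDEFINED\nnode2 4 all.q UNDEFINED\n' > "$PE_HOSTFILE"
fi
JOB_ID=${JOB_ID:-demo}
CONF_DIR=/tmp/hadoop-conf.$JOB_ID
mkdir -p "$CONF_DIR"

# First allocated host becomes the master (NameNode + JobTracker);
# every allocated host runs the slave daemons.
awk 'NR==1 {print $1}' "$PE_HOSTFILE" > "$CONF_DIR/masters"
awk '{print $1}'       "$PE_HOSTFILE" > "$CONF_DIR/slaves"

MASTER=$(head -n 1 "$CONF_DIR/masters")
cat > "$CONF_DIR/hadoop-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>fs.default.name</name><value>hdfs://$MASTER:9000</value></property>
  <property><name>mapred.job.tracker</name><value>$MASTER:9001</value></property>
</configuration>
EOF
echo "wrote $CONF_DIR for master $MASTER"
```

The job script would then point the Hadoop start scripts at this directory
(e.g. `hadoop-daemon.sh --config "$CONF_DIR" start namenode` on the master).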
>>>>
>>>> To the researcher or user needing to run a MapReduce job, all they need
>>>> to worry about is telling Hadoop to execute it and deciding how many
>>>> machines should be dedicated to the task. This makes Hadoop very
>>>> accessible, since users don't need to worry about configuring a
>>>> cluster; SGE and its helper scripts do it for them.
>>>>
>>>> As Steve Loughran accurately commented, for now we can only run one set
>>>> of Hadoop slave processes per machine, due to the network binding issue.
>>>> We mitigate that by configuring SGE to automatically spread the slaves
>>>> one per machine, to avoid failures.
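[Editor's note: the one-slave-per-host placement described above maps
naturally onto an SGE parallel environment with a fixed allocation rule.
A hedged sketch of such a PE definition (names illustrative, created with
`qconf -ap hadoop`); `allocation_rule 1` grants exactly one slot per host:]

```
# Hypothetical SGE parallel environment for Hadoop jobs.
pe_name            hadoop
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    1
control_slaves     FALSE
job_is_first_task  TRUE
```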
>>>>
>>> Only the Namenode and JobTracker need hard-coded/well-known port numbers,
>>> the rest could all be done dynamically.
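[Editor's note: Steve's point suggests a hedged configuration sketch: in
hadoop-site.xml of that era, a port of 0 asks a daemon to bind an ephemeral
port. The property names below are as in Hadoop ~0.19 and should be checked
against the version in use; the master addresses (fs.default.name,
mapred.job.tracker) would stay fixed.]

```
<!-- Hypothetical fragment: let slave daemons pick free ports so that,
     in principle, several could coexist on one host. -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:0</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:0</value>
</property>
<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:0</value>
</property>
```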
>>>
>>> One thing SGE does offer over Xen-hosted images is better performance
>>> than virtual machines, for both CPU and storage: virtualised disk
>>> performance can be awful, and even on the latest x86 parts there is a
>>> measurable hit from VM overheads.
