hadoop-common-dev mailing list archives

From Amin Astaneh <aasta...@rc.usf.edu>
Subject Re: Integration with SGE
Date Wed, 18 Feb 2009 19:05:01 GMT
Dhruba-

Just did. Thanks!

-Amin
> This is cool work! A convenient place to document this information is in the
> hadoop wiki:
>
> http://wiki.apache.org/hadoop/
>
> At the bottom of this page, there is a section titled "Related Projects".
> You might want to insert a link in that section.
>
> thanks,
> dhruba
>
>
> On Wed, Feb 18, 2009 at 10:37 AM, Amin Astaneh <aastaneh@rc.usf.edu> wrote:
>
>> Lukáš-
>>
>> Well, we have a graduate student that is using our facilities for a
>> Masters' thesis in Map/Reduce. You guys are generating topics in computer
>> science research.
>>
>> What do we need to do in order to get our documentation on the Hadoop
>> pages?
>>
>> -Amin
>>
>>> Thanks guys, it is good to hear that Hadoop is spreading... :-)
>>> Regards,
>>> Lukas
>>>
>>> On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran <stevel@apache.org>
>>> wrote:
>>>
>>>> Amin Astaneh wrote:
>>>>
>>>>> Lukáš-
>>>>>
>>>>>> Hi Amin,
>>>>>> I am not familiar with SGE. Do you think you could tell me what you
>>>>>> got from this combination? What is the benefit of running Hadoop on SGE?
>>>>> Sun Grid Engine is a distributed resource management platform for
>>>>> supercomputing centers. We use it to allocate resources to a
>>>>> supercomputing
>>>>> task, such as requesting 32 processors to run a particular simulation.
>>>>> This
>>>>> mechanism is analogous to the scheduler on a multi-user OS. What I was
>>>>> able
>>>>> to accomplish was to turn Hadoop into an as-needed service. When you
>>>>> submit
>>>>> a job request to run Hadoop as the documentation describes, a Hadoop
>>>>> cluster
>>>>> of arbitrary size is instantiated depending on how many nodes were
>>>>> requested
>>>>> by generating a cluster configuration specific to that job request. This
>>>>> allows the Hadoop cluster to be deployed within the context of
>>>>> Gridengine,
>>>>> as well as being able to coexist with other running simulations on the
>>>>> cluster.
>>>>>
>>>>> To the researcher or user needing to run a mapreduce code, all they need
>>>>> to worry about is telling Hadoop to execute it as well as determining
>>>>> how
>>>>> many machines should be dedicated to the task. This benefit makes Hadoop
>>>>> very accessible to people since they don't need to worry about
>>>>> configuring a cluster; SGE and its helper scripts do it for them.
>>>>>
>>>>> As Steve Loughran accurately commented, as of now we can only run one
>>>>> set
>>>>> of Hadoop slave processes per machine, due to the network binding issue.
>>>>> That problem is mitigated by configuring SGE to spread the slaves one
>>>>> per
>>>>> machine automatically to avoid failures.
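The per-job configuration generation Amin describes above could be sketched roughly as below. This is a minimal sketch, not the actual USF helper scripts (which are not shown in this thread): the parallel environment name, hostnames, and file layout are all illustrative assumptions.

```shell
#!/bin/sh
# Sketch of generating a per-job Hadoop cluster config under SGE.
# A real submission script would carry SGE directives such as:
#   #$ -pe hadoop 8    (request 8 slots in a hypothetical "hadoop" PE)
# and SGE would publish the granted hosts in $PE_HOSTFILE, one
# "host slots queue ..." line per host. We fake that file here so the
# sketch runs outside SGE:
HOSTFILE=$(mktemp)
printf 'node01 2 all.q\nnode02 2 all.q\nnode03 2 all.q\n' > "$HOSTFILE"

# Generate a cluster configuration specific to this job request:
CONF_DIR=${TMPDIR:-/tmp}/hadoop-conf-$$
mkdir -p "$CONF_DIR"
awk '{print $1}' "$HOSTFILE" > "$CONF_DIR/all_hosts"
head -1 "$CONF_DIR/all_hosts" > "$CONF_DIR/masters"   # first host: NameNode/JobTracker
tail -n +2 "$CONF_DIR/all_hosts" > "$CONF_DIR/slaves" # the rest: DataNode/TaskTracker

echo "master: $(cat "$CONF_DIR/masters")"
echo "slaves: $(wc -l < "$CONF_DIR/slaves" | tr -d ' ')"
```

Setting the parallel environment's allocation_rule to 1 is one way to have SGE spread slots one per host, which matches the one-set-of-slaves-per-machine constraint mentioned above.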
>>>> Only the Namenode and JobTracker need hard-coded/well-known port numbers,
>>>> the rest could all be done dynamically.
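Steve's point could be illustrated with a generated hadoop-site.xml fragment of this shape (a sketch with a hypothetical master hostname, not the actual USF configuration): only the NameNode and JobTracker endpoints are pinned, while slave daemon ports are set to 0 so the OS assigns a free ephemeral port at startup.

```
<configuration>
  <!-- Fixed, well-known endpoints: clients and slaves must find these. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master01:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master01:9001</value>
  </property>
  <!-- Slave daemons: port 0 asks the OS for a free ephemeral port, so
       slaves from more than one cluster could coexist on a machine. -->
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:0</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:0</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
```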
>>>>
>>>> One thing SGE does offer over Xen-hosted images is better performance
>>>> than virtual machines, for both CPU and storage, as virtualised disk
>>>> performance can be awful, and even on the latest x86 parts there is a
>>>> measurable hit from VM overheads.

