Message-ID: <499C5BDD.2090103@rc.usf.edu>
Date: Wed, 18 Feb 2009 14:05:01 -0500
From: Amin Astaneh
To: core-dev@hadoop.apache.org
Subject: Re: Integration with SGE
References: <499AD210.5010900@rc.usf.edu> <52c3ddca0902180149v5f6ee302lb6d0be0effc6d79b@mail.gmail.com> <499C1DEF.6060808@rc.usf.edu> <499C3640.1080703@apache.org>
  <52c3ddca0902181001l5ba51e2cl14fab75560bb4694@mail.gmail.com> <499C5582.7070409@rc.usf.edu> <4aa34eb70902181049n6f2b6b96k6adaaf3578b9ee03@mail.gmail.com>
In-Reply-To: <4aa34eb70902181049n6f2b6b96k6adaaf3578b9ee03@mail.gmail.com>

Dhruba-

Just did. Thanks!

-Amin

> This is cool work! A convenient place to document this information is in
> the Hadoop wiki:
>
> http://wiki.apache.org/hadoop/
>
> At the bottom of this page, there is a section titled "Related Projects".
> You might want to insert a link in that section.
>
> thanks,
> dhruba
>
> On Wed, Feb 18, 2009 at 10:37 AM, Amin Astaneh wrote:
>
>> Lukas-
>>
>> Well, we have a graduate student that is using our facilities for a
>> Master's thesis in Map/Reduce. You guys are generating topics in
>> computer science research.
>>
>> What do we need to do in order to get our documentation on the Hadoop
>> pages?
>>
>> -Amin
>>
>>> Thanks guys, it is good to hear that Hadoop is spreading... :-)
>>>
>>> Regards,
>>> Lukas
>>>
>>> On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran wrote:
>>>
>>>> Amin Astaneh wrote:
>>>>
>>>>> Lukas-
>>>>>
>>>>>> Hi Amin,
>>>>>> I am not familiar with SGE, do you think you could tell me what you
>>>>>> got from this combination? What is the benefit of running Hadoop
>>>>>> on SGE?
>>>>>
>>>>> Sun Grid Engine is a distributed resource management platform for
>>>>> supercomputing centers. We use it to allocate resources to a
>>>>> supercomputing task, such as requesting 32 processors to run a
>>>>> particular simulation. This mechanism is analogous to the scheduler
>>>>> on a multi-user OS.
>>>>> What I was able to accomplish was to turn Hadoop into an as-needed
>>>>> service. When you submit a job request to run Hadoop as the
>>>>> documentation describes, a Hadoop cluster of arbitrary size is
>>>>> instantiated, depending on how many nodes were requested, by
>>>>> generating a cluster configuration specific to that job request.
>>>>> This allows the Hadoop cluster to be deployed within the context of
>>>>> Gridengine, as well as to coexist with other running simulations on
>>>>> the cluster.
>>>>>
>>>>> To the researcher or user needing to run a mapreduce code, all they
>>>>> need to worry about is telling Hadoop to execute it and determining
>>>>> how many machines should be dedicated to the task. This makes Hadoop
>>>>> very accessible, since people don't need to worry about configuring
>>>>> a cluster; SGE and its helper scripts do it for them.
>>>>>
>>>>> As Steve Loughran accurately commented, as of now we can only run
>>>>> one set of Hadoop slave processes per machine, due to the network
>>>>> binding issue. That problem is mitigated by configuring SGE to
>>>>> spread the slaves one per machine automatically to avoid failures.
>>>>
>>>> Only the Namenode and JobTracker need hard-coded/well-known port
>>>> numbers; the rest could all be done dynamically.
>>>>
>>>> One thing SGE does offer over Xen-hosted images is better performance
>>>> than virtual machines, for both CPU and storage, as virtualised disk
>>>> performance can be awful, and even on the latest x86 parts there is
>>>> a measurable hit from VM overheads.
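[A minimal sketch of the kind of SGE job script the thread describes: generating a per-job Hadoop cluster configuration from the hosts Gridengine allocates. `$PE_HOSTFILE`, `$JOB_ID`, and `allocation_rule` are real Gridengine features; the "hadoop" parallel-environment name, the paths, and the config layout are assumptions for illustration, not the authors' actual helper scripts.]

```shell
#!/bin/sh
# Hypothetical SGE job script: build a per-job Hadoop cluster config
# from the hosts Gridengine allocated to this job. The "hadoop" PE
# would be defined with "allocation_rule 1" so SGE places one slot
# per host, giving one set of slave processes per machine.
#$ -pe hadoop 8          # ask SGE for 8 slots from a "hadoop" PE
#$ -cwd

CONF_DIR="${TMPDIR:-/tmp}/hadoop-conf-${JOB_ID:-local}"
mkdir -p "$CONF_DIR"

# SGE lists the allocated hosts in $PE_HOSTFILE, one line per host:
#   hostname slots queue processor-range
# Take the first field, dedupe into a slaves file, and use the first
# host as the master (Namenode/JobTracker).
awk '{print $1}' "$PE_HOSTFILE" | sort -u > "$CONF_DIR/slaves"
head -n 1 "$CONF_DIR/slaves"             > "$CONF_DIR/masters"

# From here the per-job cluster would be started against the
# generated config, e.g.:
#   HADOOP_CONF_DIR="$CONF_DIR" bin/start-all.sh
```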