Message-ID: <499C5BDD.2090103@rc.usf.edu>
Date: Wed, 18 Feb 2009 14:05:01 -0500
From: Amin Astaneh
To: core-dev@hadoop.apache.org
Subject: Re: Integration with SGE
References: <499AD210.5010900@rc.usf.edu> <52c3ddca0902180149v5f6ee302lb6d0be0effc6d79b@mail.gmail.com> <499C1DEF.6060808@rc.usf.edu> <499C3640.1080703@apache.org>
  <52c3ddca0902181001l5ba51e2cl14fab75560bb4694@mail.gmail.com> <499C5582.7070409@rc.usf.edu> <4aa34eb70902181049n6f2b6b96k6adaaf3578b9ee03@mail.gmail.com>
In-Reply-To: <4aa34eb70902181049n6f2b6b96k6adaaf3578b9ee03@mail.gmail.com>

Dhruba-

Just did. Thanks!

-Amin

> This is cool work! A convenient place to document this information is in
> the Hadoop wiki:
>
> http://wiki.apache.org/hadoop/
>
> At the bottom of this page, there is a section titled "Related Projects".
> You might want to insert a link in that section.
>
> thanks,
> dhruba
>
> On Wed, Feb 18, 2009 at 10:37 AM, Amin Astaneh wrote:
>
>> Lukas-
>>
>> Well, we have a graduate student that is using our facilities for a
>> Master's thesis in Map/Reduce. You guys are generating topics in
>> computer science research.
>>
>> What do we need to do in order to get our documentation on the Hadoop
>> pages?
>>
>> -Amin
>>
>>> Thanks guys, it is good to hear that Hadoop is spreading... :-)
>>>
>>> Regards,
>>> Lukas
>>>
>>> On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran wrote:
>>>
>>>> Amin Astaneh wrote:
>>>>
>>>>> Lukas-
>>>>>
>>>>>> Hi Amin,
>>>>>> I am not familiar with SGE, do you think you could tell me what you
>>>>>> got from this combination? What is the benefit of running Hadoop
>>>>>> on SGE?
>>>>>
>>>>> Sun Grid Engine is a distributed resource management platform for
>>>>> supercomputing centers. We use it to allocate resources to a
>>>>> supercomputing task, such as requesting 32 processors to run a
>>>>> particular simulation. This mechanism is analogous to the scheduler
>>>>> on a multi-user OS.
>>>>> What I was able to accomplish was to turn Hadoop into an as-needed
>>>>> service. When you submit a job request to run Hadoop as the
>>>>> documentation describes, a Hadoop cluster of arbitrary size is
>>>>> instantiated, depending on how many nodes were requested, by
>>>>> generating a cluster configuration specific to that job request.
>>>>> This allows the Hadoop cluster to be deployed within the context of
>>>>> Gridengine, as well as to coexist with other running simulations on
>>>>> the cluster.
>>>>>
>>>>> To the researcher or user needing to run a mapreduce code, all they
>>>>> need to worry about is telling Hadoop to execute it and determining
>>>>> how many machines should be dedicated to the task. This makes Hadoop
>>>>> very accessible, since people don't need to worry about configuring
>>>>> a cluster; SGE and its helper scripts do it for them.
>>>>>
>>>>> As Steve Loughran accurately commented, as of now we can only run
>>>>> one set of Hadoop slave processes per machine, due to the network
>>>>> binding issue. That problem is mitigated by configuring SGE to
>>>>> spread the slaves one per machine automatically to avoid failures.
>>>>
>>>> Only the Namenode and JobTracker need hard-coded/well-known port
>>>> numbers; the rest could all be done dynamically.
>>>>
>>>> One thing SGE does offer over Xen-hosted images is better performance
>>>> than virtual machines, for both CPU and storage, as virtualised disk
>>>> performance can be awful, and even on the latest x86 parts there is
>>>> a measurable hit from VM overheads.
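[A minimal sketch of the kind of SGE job script the thread describes: generating a per-job Hadoop cluster configuration from the hosts Gridengine allocates. `$PE_HOSTFILE`, `$JOB_ID`, and `allocation_rule` are real Gridengine features; the "hadoop" parallel-environment name, the paths, and the config layout are assumptions for illustration, not the authors' actual helper scripts.]

```shell
#!/bin/sh
# Hypothetical SGE job script: build a per-job Hadoop cluster config
# from the hosts Gridengine allocated to this job. The "hadoop" PE
# would be defined with "allocation_rule 1" so SGE places one slot
# per host, giving one set of slave processes per machine.
#$ -pe hadoop 8          # ask SGE for 8 slots from a "hadoop" PE
#$ -cwd

CONF_DIR="${TMPDIR:-/tmp}/hadoop-conf-${JOB_ID:-local}"
mkdir -p "$CONF_DIR"

# SGE lists the allocated hosts in $PE_HOSTFILE, one line per host:
#   hostname slots queue processor-range
# Take the first field, dedupe into a slaves file, and use the first
# host as the master (Namenode/JobTracker).
awk '{print $1}' "$PE_HOSTFILE" | sort -u > "$CONF_DIR/slaves"
head -n 1 "$CONF_DIR/slaves"             > "$CONF_DIR/masters"

# From here the per-job cluster would be started against the
# generated config, e.g.:
#   HADOOP_CONF_DIR="$CONF_DIR" bin/start-all.sh
```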