From: Amin Astaneh
Date: Wed, 18 Feb 2009 13:37:54 -0500
To: core-dev@hadoop.apache.org
Subject: Re: Integration with SGE
Message-ID: <499C5582.7070409@rc.usf.edu>
Lukáš-

Well, we have a graduate student who is using our facilities for a
Master's thesis in Map/Reduce. You guys are generating topics in
computer science research.

What do we need to do in order to get our documentation on the Hadoop
pages?

-Amin

> Thanks guys, it is good to hear that Hadoop is spreading... :-)
>
> Regards,
> Lukas
>
> On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran wrote:
>
>> Amin Astaneh wrote:
>>
>>> Lukáš-
>>>
>>>> Hi Amin,
>>>> I am not familiar with SGE; could you tell me what you got from this
>>>> combination? What is the benefit of running Hadoop on SGE?
>>>
>>> Sun Grid Engine is a distributed resource management platform for
>>> supercomputing centers. We use it to allocate resources to a
>>> supercomputing task, such as requesting 32 processors to run a
>>> particular simulation. This mechanism is analogous to the scheduler
>>> on a multi-user OS. What I was able to accomplish was to turn Hadoop
>>> into an as-needed service. When you submit a job request to run
>>> Hadoop as the documentation describes, a Hadoop cluster of arbitrary
>>> size is instantiated, depending on how many nodes were requested, by
>>> generating a cluster configuration specific to that job request. This
>>> allows the Hadoop cluster to be deployed within the context of
>>> Gridengine, and to coexist with other running simulations on the
>>> cluster.
>>> To the researcher or user needing to run a MapReduce code, all they
>>> need to worry about is telling Hadoop to execute it and deciding how
>>> many machines should be dedicated to the task. This makes Hadoop very
>>> accessible, since people don't need to worry about configuring a
>>> cluster; SGE and its helper scripts do it for them.
>>>
>>> As Steve Loughran accurately commented, as of now we can only run one
>>> set of Hadoop slave processes per machine, due to the network binding
>>> issue. That problem is mitigated by configuring SGE to spread the
>>> slaves one per machine automatically, to avoid failures.
>>
>> Only the Namenode and JobTracker need hard-coded/well-known port
>> numbers; the rest could all be done dynamically.
>>
>> One thing SGE does offer over Xen-hosted images is better performance
>> than virtual machines, for both CPU and storage: virtualised disk
>> performance can be awful, and even on the latest x86 parts there is a
>> measurable hit from VM overheads.
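[Steve's point about ports can be illustrated with a configuration sketch. In Hadoop configuration of that era, setting a service address's port to 0 asks the OS for any free port, so only the two master addresses need to be fixed. The exact property names vary by Hadoop version, and the hostname/ports below are placeholders.]

```xml
<!-- Sketch of a per-job hadoop-site.xml fragment (0.18/0.19-era
     property names; placeholder host and ports). Only the NameNode
     and JobTracker addresses are hard-coded; slave-side services
     use port 0 to bind to any free port. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.example.org:9000</value> <!-- fixed NameNode port -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.org:9001</value> <!-- fixed JobTracker port -->
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:0</value> <!-- DataNode: pick any free port -->
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:0</value>
  </property>
</configuration>
```

[With something like this, multiple slave processes could in principle share a machine, which is the binding issue Amin mentions working around by spreading slaves one per host.]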
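[The per-job cluster instantiation Amin describes could look roughly like the sketch below. Under SGE, a parallel job receives its granted hosts in the file named by `$PE_HOSTFILE`; a helper script can turn that list into a Hadoop slaves file and a job-specific configuration. The script and file names here are hypothetical illustrations, not the actual USF helper scripts.]

```shell
#!/bin/sh
# Hypothetical sketch of the per-job Hadoop setup under SGE.
# An SGE $PE_HOSTFILE lists granted hosts, one per line, in the form:
#   "hostname slots queue processor-range"

# Extract the first column (hostnames) to build a Hadoop slaves file:
make_slaves_file() {
    awk '{ print $1 }' "$1"
}

# Simulate an SGE allocation for demonstration (a real job would
# read the file named by $PE_HOSTFILE instead):
cat > pe_hostfile.example <<'EOF'
node01 1 all.q@node01 UNDEFINED
node02 1 all.q@node02 UNDEFINED
EOF

make_slaves_file pe_hostfile.example > slaves
cat slaves

# A real job script would then write a job-specific configuration
# directory, point HADOOP_CONF_DIR at it, and start the daemons.
```

[In a real setup the job script would be submitted with something like `qsub -pe <hadoop_pe> 32`, so the number of slaves tracks the number of slots the researcher requested.]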