From: "Dmitry Pushkarev" <umka@stanford.edu>
To: common-user@hadoop.apache.org
Subject: Hadoop and SGE
Date: Wed, 30 Jun 2010 03:59:48 -0700
Dear Hadoop users,

I'm in the process of building a new cluster for our lab, and I'm trying to run SGE alongside Hadoop. The idea is that each node would function as a datanode at all times, but depending on the situation, a fraction of the nodes would run SGE jobs instead of Hadoop tasks. The SGE jobs will not have access to HDFS or the local filesystem (except for /tmp) and will run out of an external NAS, so they aren't supposed to be I/O bound.

I'm trying to figure out the best way to set up this resource sharing. One way would be to shut down the tasktrackers on the reserved nodes and add those nodes to the SGE pool. Another would be to run the tasktrackers themselves as SGE jobs, with each tasktracker shutting down after some idle time.

Has anyone tried something like this? I'd appreciate any advice.

Thanks.
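For reference, the first approach (stopping tasktrackers on reserved nodes and handing those nodes to SGE) could be sketched roughly as below. This is only a sketch under stated assumptions: it assumes the stock hadoop-daemon.sh script from $HADOOP_HOME, and an SGE queue instance named hadoop.q covering these hosts; both names are placeholders for whatever a given installation actually uses. RUN=echo makes it a dry run that just prints the commands.

```shell
#!/bin/sh
# Sketch: move a node between the Hadoop tasktracker pool and the SGE pool.
# The datanode is left running in both directions, so HDFS is unaffected.
# Placeholders (site-specific): $HADOOP_HOME, the queue name "hadoop.q".
# With RUN=echo (the default here) commands are printed, not executed.
RUN="${RUN:-echo}"

to_sge() {    # reserve node $1 for SGE
  # stop only the tasktracker; the datanode keeps serving HDFS blocks
  $RUN ssh "$1" "\$HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker"
  # re-enable the node's slots in the SGE queue instance
  $RUN qmod -e "hadoop.q@$1"
}

to_hadoop() { # return node $1 to the Hadoop pool
  # drain the node from SGE first, then bring the tasktracker back
  $RUN qmod -d "hadoop.q@$1"
  $RUN ssh "$1" "\$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker"
}

# dry run: print what would happen when node07 changes hands
to_sge node07
to_hadoop node07
```

Whether the node flip is driven by a cron job, by SGE load sensors, or by hand is a separate policy decision; the second approach (tasktrackers submitted as SGE jobs with an idle timeout) avoids this external switcher but needs the tasktracker wrapped in a job script that can detect idleness.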