Message-ID: <14835053.1155946994865.JavaMail.jira@brutus>
Date: Fri, 18 Aug 2006 17:23:14 -0700 (PDT)
From: "Kimoon Kim (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-428) Condor and Hadoop Map Reduce integration
In-Reply-To: <17661487.1154983273932.JavaMail.jira@brutus>

    [ http://issues.apache.org/jira/browse/HADOOP-428?page=comments#action_12429160 ]

Kimoon Kim commented on HADOOP-428:
-----------------------------------

Found a 2005 paper where the authors
modified condor_startd so that each node's ClassAd includes the list of SRM input files available on that execute node (see http://sdm.lbl.gov/~arie/papers/CoScheduling.SSDBM05.pdf). This allows a task to be scheduled on a node that already has its input file, similar to how a Hadoop map task gets scheduled. The authors claim that modifying condor_startd was not very hard for them. The key optimization was to import only the files for the current job(s) rather than all files, so that Condor matchmaking does not hit a scalability barrier.

> Condor and Hadoop Map Reduce integration
> ----------------------------------------
>
>                 Key: HADOOP-428
>                 URL: http://issues.apache.org/jira/browse/HADOOP-428
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>
> The issue is about using/enhancing Condor's features for Hadoop's Map Reduce framework. Some of the early thoughts in this respect:
> * One should be able to submit a MR job that takes advantage of Condor's features like node reservation according to a job's requirements, monitoring of jobs, etc.
> * JobTracker and TaskTrackers work as Master/Workers in the Condor environment. One should be able to simply start a MR cluster and the cluster goes down when the job is done.
> * The classads can have an attribute for input file block locations that will be an input to Condor's scheduling decisions.
> * Condor's features of monitoring jobs can be leveraged to reschedule failed TaskTrackers. Checkpointing of JobTrackers can also probably be done so that if the JobTracker job dies for some reason, the failed jobs can be restarted to start from the point where the JobTracker was last checkpointed at (assuming the input data has not changed).
> * User priorities, job priorities should also be handled.
>   If nodes are currently in use due to a job being run by one user, and another user of the same priority submits a new job, it gets queued and the second user's job is scheduled opportunistically, e.g., one master and 1 worker to start with, then 2 workers, and so on. If the second user is of a higher priority, then the first user's job is completely suspended.
> Please add your thoughts on this topic.
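
To make the idea in the comment concrete, here is a rough Python sketch of locality-aware matching in which each node advertises only the input files that some current task needs. It is not Condor code and not the paper's implementation: the Node and Task classes, the "CachedInputFiles" attribute name, and the two-pass greedy matching are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    local_files: set  # every file stored on this node


@dataclass
class Task:
    task_id: str
    input_file: str


def build_ad(node, wanted_files):
    """Build a ClassAd-like dict for one node (attribute names are hypothetical)."""
    return {
        "Name": node.name,
        # Advertise only files needed by current tasks, not the node's full listing.
        "CachedInputFiles": sorted(node.local_files & wanted_files),
    }


def match(tasks, nodes):
    """Assign at most one task per node, preferring nodes that hold the task's input."""
    wanted = {t.input_file for t in tasks}
    ads = {n.name: build_ad(n, wanted) for n in nodes}
    free = sorted(ads)  # names of nodes with no task assigned yet
    assignment = {}

    # Pass 1: place each task on a free node that advertises its input file.
    for task in tasks:
        local = [n for n in free if task.input_file in ads[n]["CachedInputFiles"]]
        if local:
            assignment[task.task_id] = local[0]
            free.remove(local[0])

    # Pass 2: place the remaining tasks on whatever nodes are still free.
    for task in tasks:
        if task.task_id not in assignment and free:
            assignment[task.task_id] = free.pop(0)

    return assignment, ads
```

Calling match(tasks, nodes) returns the task-to-node assignment together with the filtered ads. The point of the filtering in build_ad is that the size of each ad depends on the number of files the current jobs need, not on how many files a node stores.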