Date: Thu, 29 Nov 2012 09:30:33 +0900
From: Hiroyuki Yamada
To: user@hadoop.apache.org
Subject: Re: Assigning reduce tasks to specific nodes

Thank you all for the comments and advice.

I know it is not recommended to assign mapper locations myself.
But in some cases there needs to be one mapper running on each node,
so I need a strict way to do it.
So, locations are taken care of by the JobTracker (scheduler), but not strictly.
And the only way to do it strictly is to write my own scheduler, right?
I have checked the source, but I am not sure where to modify it to do this.
My understanding is that the FairScheduler and the others are for scheduling
multiple jobs. Is this right?
What I want to do is schedule the tasks within one job.
Can this be achieved with the FairScheduler or the others?

Regards,
Hiroyuki

On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel wrote:
> Mappers? Uhm... yes, you can do it.
> Yes, it is non-trivial.
> Yes, it is not recommended.
>
> I think we talk a bit about this in an InfoQ article written by Boris
> Lublinsky.
>
> It's kind of wild when your entire cluster map goes red in Ganglia... :-)
>
>
> On Nov 28, 2012, at 2:41 AM, Harsh J wrote:
>
> Hi,
>
> Mapper scheduling is indeed influenced by the results returned by the
> InputSplit's getLocations().
>
> The map task itself does not care about deserializing the location
> information, as it is of no use to it. The location information is vital to
> the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly
> when a job is submitted. The locations are used pretty well here.
>
> You should be able to control (or rather, influence) mapper placement by
> working with the InputSplits, but not strictly so, because in the end it's up
> to your MR scheduler to make data-local or non-data-local assignments.
>
>
> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada wrote:
>>
>> Hi Harsh,
>>
>> Thank you for the information.
>> I understand the current circumstances.
>>
>> What about mappers?
>> As far as I have tested, the location information in InputSplit is ignored
>> in 0.20.2, so there seems to be no easy way to assign mappers to specific
>> nodes. (I checked the source earlier and noticed that the location
>> information is not restored when deserializing the InputSplit instance.)
>>
>> Thanks,
>> Hiroyuki
>>
>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J wrote:
>> > This is not supported/available currently even in MR2, but take a look
>> > at https://issues.apache.org/jira/browse/MAPREDUCE-199.
>> >
>> >
>> > On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am wondering how I can assign reduce tasks to specific nodes.
>> >> What I want to do is, for example, assign the reducer that produces
>> >> part-00000 to node xxx000, the one that produces part-00001 to node
>> >> xxx001, and so on.
>> >>
>> >> I think it's about task assignment scheduling, but
>> >> I am not sure where to customize to achieve this.
>> >> Is this done by writing some extensions,
>> >> or is there an easier way to do this?
>> >>
>> >> Regards,
>> >> Hiroyuki
>> >
>> > --
>> > Harsh J
>
> --
> Harsh J
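To make Harsh's point concrete (split locations influence placement but do not enforce it), here is a minimal, hypothetical model in plain Java. It does not use Hadoop's API: `LocalityScheduler`, `PendingSplit` and `assign` are illustrative names, and the "data-local first, otherwise hand out any remaining split" rule is a simplification of the data-local versus non-data-local choice the thread describes, not the JobTracker's actual code. With `strict = false`, a node that asks for work early can take a split meant for another node, which is why one mapper per node cannot be guaranteed. With `strict = true`, the model shows the kind of rule a custom scheduler would have to enforce instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of how a scheduler might use the hosts
// reported by InputSplit.getLocations(). Not Hadoop's JobTracker code.
public class LocalityScheduler {

    /** A pending map task: a split id plus the hosts its split reports. */
    public static final class PendingSplit {
        private final String id;
        private final List<String> locations;

        public PendingSplit(String id, List<String> locations) {
            this.id = id;
            this.locations = List.copyOf(locations);
        }

        public String id() { return id; }
        public List<String> locations() { return locations; }
    }

    private final List<PendingSplit> pending;
    private final boolean strict;

    public LocalityScheduler(List<PendingSplit> splits, boolean strict) {
        this.pending = new ArrayList<>(splits);
        this.strict = strict;
    }

    /**
     * Called when a node asks for work (analogous to a TaskTracker heartbeat).
     * A split whose locations include the host is preferred (data-local).
     * Non-strict mode otherwise hands out any remaining split (non-data-local);
     * strict mode returns nothing and the node stays idle.
     */
    public Optional<PendingSplit> assign(String host) {
        Iterator<PendingSplit> it = pending.iterator();
        while (it.hasNext()) {
            PendingSplit split = it.next();
            if (split.locations().contains(host)) {
                it.remove();
                return Optional.of(split);
            }
        }
        if (!strict && !pending.isEmpty()) {
            return Optional.of(pending.remove(0));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<PendingSplit> splits = List.of(
                new PendingSplit("split-0", List.of("xxx000")),
                new PendingSplit("split-1", List.of("xxx001")));

        // xxx000 asks for work twice before xxx001 asks once.
        for (boolean strictMode : new boolean[] {false, true}) {
            LocalityScheduler s = new LocalityScheduler(splits, strictMode);
            String first = s.assign("xxx000").map(PendingSplit::id).orElse("none");
            String second = s.assign("xxx000").map(PendingSplit::id).orElse("none");
            String third = s.assign("xxx001").map(PendingSplit::id).orElse("none");
            System.out.println("strict=" + strictMode + ": " + first + ", " + second + ", " + third);
        }
    }
}
```

Running `main` prints `strict=false: split-0, split-1, none` and then `strict=true: split-0, none, split-1`. In the non-strict case xxx000 ends up running both splits; the strict variant keeps each split on its listed host at the cost of leaving a node idle while it waits.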