From: "David Parks" <davidparks21@yahoo.com>
To: user@hadoop.apache.org
Subject: RE: How can I limit reducers to one-per-node?
Date: Mon, 11 Feb 2013 09:29:39 +0700

I guess the FairScheduler is doing multiple assignments per heartbeat, hence the behavior of multiple reduce tasks per node even when they should otherwise be fully distributed.
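(If multiple assignment per heartbeat is indeed the cause, the 1.x FairScheduler seems to expose a switch for it; a sketch of what I'd try in the JobTracker's mapred-site.xml, assuming mapred.fairscheduler.assignmultiple behaves as its name suggests and covers reduce tasks:)

    <!-- Sketch only: assumes the 1.x FairScheduler's assignmultiple
         property limits task assignment to one per heartbeat.
         Goes in mapred-site.xml on the JobTracker; restart to apply. -->
    <property>
      <name>mapred.fairscheduler.assignmultiple</name>
      <value>false</value>
    </property>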

 

Will adding a combiner change this behavior? Could you explain more?

 

Thanks!

David

 

 

From: Michael Segel [mailto:michael_segel@hotmail.com]
Sent: Monday, February 11, 2013 8:30 AM
To: user@hadoop.apache.org
Subject: Re: How can I limit reducers to one-per-node?

 

Adding a combiner step first, then reduce?

 

 

On Feb 8, 2013, at 11:18 PM, Harsh J <harsh@cloudera.com> wrote:



Hey David,

There's no readily available way to do this today (you may be interested in MAPREDUCE-199, though), but if your job scheduler isn't doing multiple assignments of reduce tasks, then only one is assigned per TT heartbeat, which gives you almost what you're looking for: one reduce task per node, round-robin'd (roughly).
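As a rough sketch (my assumptions: a 1.x cluster, the old mapred API, and placeholder class/job wiring), you can size the job at one reduce task per live TaskTracker and let per-heartbeat assignment spread them around:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class OneReducePerNode {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(OneReducePerNode.class);
        // ... configure mapper/reducer and input/output paths here ...
        // One reduce task per live TaskTracker; with single assignment
        // per heartbeat these land roughly one per node.
        JobClient client = new JobClient(conf);
        conf.setNumReduceTasks(client.getClusterStatus().getTaskTrackers());
        JobClient.runJob(conf);
      }
    }

Nothing pins a task to a node, of course; it just makes over-subscription unlikely when the scheduler hands out one reduce task per heartbeat.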

On Sat, Feb 9, 2013 at 9:24 AM, David Parks <davidparks21@yahoo.com> wrote:

I have a cluster of boxes with 3 reducers per node. I want to limit a particular job to only run 1 reducer per node.



This job is network IO bound, gathering images from a set of webservers.



My job has certain parameters set to meet "web politeness" standards (e.g., limits on connections and connection frequency).



If this job runs from multiple reducers on the same node, those per-host limits will be violated. Also, this is a shared environment and I don't want long-running, network-bound jobs uselessly taking up all the reduce slots.
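(For context, the per-host limits are enforced inside the reducer with a small throttle along these lines; the class name and the two-second interval are illustrative, not from any library:)

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative per-host throttle: enforce a minimum interval between
    // requests to the same host. One instance lives in each reduce task's
    // JVM, so three reducers on one node would triple the effective
    // per-host rate, which is exactly why one reducer per node matters.
    public class HostThrottle {
      private static final long HOST_INTERVAL_MS = 2000L;
      private final Map<String, Long> lastRequest = new HashMap<String, Long>();

      public synchronized void awaitTurn(String host) throws InterruptedException {
        Long last = lastRequest.get(host);
        long now = System.currentTimeMillis();
        if (last != null && now - last < HOST_INTERVAL_MS) {
          Thread.sleep(HOST_INTERVAL_MS - (now - last));
        }
        lastRequest.put(host, System.currentTimeMillis());
      }
    }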




--
Harsh J

 

Michael Segel | (m) 312.755.9623

Segel and Associates

 
